Archive Page 2

The Data Pyramid is a Lie

If you work with data, at some point you will be presented with a PowerPoint slide similar to this:

Data Pyramid Lie

It shows a wonderful fictional land, where we cleanly build everything on the layer below until we reach the heavens (in the past this was wisdom or visualization; increasingly it’s mythical AI).

There are two essential things missing from this:

  1. At the end of every data sequence there should be an Action.
    If there isn’t, what are we even attempting to do?
    Wisdom should lead to action. A visualization or email alert should prompt action. There MUST always be action.
  2. At every stage, there is feedback. It’s a cycle, not a mythical pyramid or promised land.
    I’ve never met anyone working with data who didn’t find something out at a later stage that meant having to go back and rework their previous steps.
    e.g.

    1. Looking at the average height of males, the United States shows 5.5m. Oops, I guess I’d better go back and interpret that as feet instead of metres.
    2. Based on analysis, you tried emailing a subset of customers who should have converted to paying customers at a 5% rate, but they didn’t. So based on action, you discovered you were wrong. Time to go back to the start and examine why.
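The first example is the kind of error a cheap range check can catch and feed back to an earlier step. A minimal Python sketch, where the function name, the plausible range and the sample figures are all illustrative assumptions rather than real data:

```python
# Hypothetical sanity check: flag values outside a plausible range so a
# unit mix-up is caught early and fed back to the ingestion step.
FEET_TO_METRES = 0.3048

def implausible_heights(avg_height_m, low=1.4, high=2.1):
    """Return the entries whose average height (in metres) is outside [low, high]."""
    return {c: h for c, h in avg_height_m.items() if not (low <= h <= high)}

heights = {"United States": 5.5, "Netherlands": 1.8}  # made-up sample values
suspects = implausible_heights(heights)

# Feedback loop: reinterpret the suspect values as feet, then check again.
fixed = {**heights, **{c: h * FEET_TO_METRES for c, h in suspects.items()}}
```

Here only the United States entry is flagged, and after conversion nothing is flagged on the second pass.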

Therefore the diagram should look more like this:

Data Cycle

You start with data and you reach Action, but at any stage, including after action, you can loop back to earlier stages in the cycle.

I’ve purposely blurred out the steps because it doesn’t matter what’s in between. In between should be whatever gets your team to the action quickest with an acceptable level of risk. Notice this is the software development lifecycle (SDLC). Software people spent years learning this lesson, and it’s still an ongoing effort to make it a proper science.

What do you think? Am I wrong?

 

 

New Streaming Order Book Depth-map

We want to be the best finance streaming visualization solution. To achieve that, we can’t just use off-the-shelf parts, so we have built our own market data order book visualization component from scratch; its only dependency is WebGL. We call it DepthMap. It plots price levels over time, with the shading showing the amount of liquidity at each level. It’s experimental right now, but we are already receiving a lot of great feedback and ideas.

Faster Streaming Data
A lot of our users were capturing crypto data to a database, then polling that database. We want to remove that step so Pulse is faster and simpler. The first step is releasing our Binance Streaming Connection. In addition to our existing kdb streaming connection, we are trialling WebSockets and Kafka. If this is something that interests you, please get in touch.

QuestDB – Review 2023

Our latest product, Pulse, is for displaying real-time interactive data direct from any database. To get the most benefit, the underlying databases need to be fast (<200ms queries). For our purposes, databases fall into 2 categories:

  1. Really really fast, can handle queries every 200ms or less and seamlessly show data scrolling in
  2. All Other Databases. The 95%+.

It’s very exciting when we find a new database that meets that speed requirement. I went to the website, downloaded QuestDB and ran it. Coming from kdb+, imagine my excitement at seeing this UI:
QuestDB console

Good News:

  • A very tiny download (7MB .jar file)
  • There’s a free open source version
  • They are focussed on time-series queries
  • Did I mention it’s fast?


I wanted to take it for a spin and to test the full ingestion->store->query cycle. So I decided to prototype a crypto dashboard. Consume data from various exchanges and produce a dashboard of latest prices, trades and a nice bid/ask graph as shown below.

QuestDB database crypto dashboard
Good Points

  • It simply worked.
  • QuestDB chose to be PostgreSQL wire and query compatible. A great technical choice as:
    • It will work with many tools including Pulse without complication
    • Many people already know SQL. I’ve been teaching q/kdb for years and, once people learn it, they can use it for absolutely amazing things that standard SQL is terrible at. However, most people do not reach that level of expertise. By using standard SQL, more people can reuse their existing knowledge.
  • They then added time-series specific extensions on top for querying, including:
    • “Latest on” – that’s equivalent to kdb’s “last by”. It’s used to generate the “latest prices” table in the dashboard with a 1/5/15 minute lag.
    • ASOF Joins
  • QuestDB can automatically create tables when you first send data, there’s no need to send “Create Table …”. This was useful when I was tweaking the data layout from the crypto feeds.
  • In places my SQL was rusty, so I asked for help on their Slack channel. Within an hour I got helpful responses to both questions.
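For readers who know neither “latest on” nor “last by”, the idea is simply “keep the most recent row per key”. The sketch below shows that meaning in plain Python. It does not use QuestDB or kdb+, and the row layout, the function name and the `as_of` parameter (used here to mimic a lagged view) are illustrative assumptions:

```python
# Illustrative only: what "latest per key" means, in plain Python.
# Rows are (time, sym, price); the sample values are made up.
trades = [
    (1, "BTC", 100.0),
    (2, "ETH", 10.0),
    (3, "BTC", 101.0),
    (4, "ETH", 11.0),
]

def latest_by_sym(rows, as_of=None):
    """Keep the last row per sym, optionally ignoring rows after `as_of`."""
    latest = {}
    for t, sym, price in sorted(rows):
        if as_of is None or t <= as_of:
            latest[sym] = (t, price)  # later rows overwrite earlier ones
    return latest

now = latest_by_sym(trades)              # most recent price per sym
lagged = latest_by_sym(trades, as_of=2)  # the same view as it stood at time 2
```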

Within a very short time, I managed to get the database populated and the dashboard running live. This is the first time in a long time that a database has gotten me excited. It seems these guys are trying to solve the same user problems that I’ve seen everywhere. There were, however, some significant feature gaps.

Feature Gaps

  • No nested arrays. If I want to store bid/asks, I can only currently do it with columns bid1/bid2/bid3, no arbitrary length arrays.
  • Very limited window analytics. Other than “LATEST ON” QuestDB won’t let me perform analysis within that time window or within arrays in general.
  • I really missed my
    `time xasc (uj/)(table1;table2)

    pattern for combining multiple tables into one. For the graph I had to use a lengthy SQL UNION.
    In general kdb+ has array types and amazingly lets you use all the same functions that work on columns on nested structures. I missed that power.

  • No security on connections. It seems security integration will be an enterprise feature.
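For anyone unfamiliar with the q pattern mentioned above: it union-joins tables that may have different columns, then sorts the result by time. A rough Python equivalent, using lists of dicts as stand-in tables (the helper name and sample rows are assumptions for illustration):

```python
def union_join(*tables):
    """Concatenate lists of row-dicts, filling columns a row lacks with None."""
    columns = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [{c: row.get(c) for c in columns} for table in tables for row in table]

# Made-up sample tables with different columns.
bids = [{"time": 2, "bid": 99.5}, {"time": 4, "bid": 99.7}]
asks = [{"time": 1, "ask": 100.1}, {"time": 3, "ask": 100.2}]

# Union the tables, then sort ascending by time.
combined = sorted(union_join(bids, asks), key=lambda r: r["time"])
```

Each combined row carries every column, with `None` where the source table had no value, which is what makes the result directly usable for a single bid/ask graph.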

Open Source Alternative to kdb+ ?

Overall I would say not yet, but they seem to be aiming at a similar market and they are moving fast.

QuestDB Database Structure

In fact, if you look at their architecture on the right, it’s obvious some of their team have used kdb+. Data is partitioned by date, with a separate folder per table and one file per column. Data is mapped in when read and appended to when new data arrives.
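To make that layout concrete, here is a small sketch of how such a store could map a table, date and column to a file. The folder order, the `.col` extension and the function names are assumptions for illustration only, not QuestDB’s or kdb+’s actual on-disk format:

```python
from pathlib import PurePosixPath

def column_file(root, table, date, column):
    """Hypothetical path: one folder per table, one per date, one file per column."""
    return PurePosixPath(root) / table / date / f"{column}.col"

def files_to_read(root, table, dates, columns):
    """A query touching two columns over two dates only needs four files."""
    return [column_file(root, table, d, c) for d in dates for c in columns]

needed = files_to_read("db", "trades", ["2023-05-20", "2023-05-21"], ["time", "price"])
```

The appeal of this design is that a query reads only the columns and dates it asks for, and new data is appended to the end of each column file in the current partition.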

In some ways this architecture predates kdb+ and originates from APL. It’s good to see new entrants like QuestDB and Apache Arrow pick up these ideas, make them their own and take them to new heights. I think kdb+ and q are excellent, and I was always frustrated that they have remained niche while inferior technical solutions became massively popular. If QuestDB can take time-series databases and good technical ideas to new audiences, I wish them the best of luck!

Please leave any of your thoughts or comments below as I would love to hear what others think.

If you want to see how to set up QuestDB and a crypto dashboard yourself, we have a video tutorial:

 

Support for 30+ databases added to qStudio and Pulse.

Support for 30+ databases has now been added to both qStudio and Pulse.
ClickHouse, Redis, MongoDB, Timescale, DuckDB, TDEngine and many more are now supported; the full list is shown below.

// Supports Every Popular Database

Pulse is being used successfully to deliver data apps including TCA, algo controls, trade blotters and various other financial analytics. Our users wanted to see all of their data in one place without the cost of duplication. Today we released support for 30+ databases.

“My market data travels over ZeroMQ, is cached in Redis and stored into QuestDB, while static security data is in SQL Server. With this change to Pulse I can view all my data easily in one place.” – Mark – Platform Lead at Crypto Algo Trading Firm.

// Highlighted Partners

In particular we have worked closely with chosen vendors to ensure compatibility.
A number of vendors have tested the system and documented setup on their own websites:

  • TDEngine – Open-source time series database purpose-built for the IoT (Internet of Things).
  • QuestDB – Open source time-series database with a similar architecture to kdb+ that supports last-by and asof joins. See our crypto Pulse demo.
  • TimeScale – PostgreSQL++ for time series and events, engineered for speed.
  • ClickHouse and DuckDB – Were tested by members of their community and a number of improvements were made.

// The Big Picture

qStudio supported databases

Download Pulse 1.36

Download qStudio 2.52

The Ultimate Pivot table

Over the last few months, I’ve discussed grid components, aggregating and pivoting with a lot of people. You would not believe how much users want to see a good grid component that allows drill down, and how strongly they hold opinions on certain solutions. I have examined a lot of existing solutions, everything from Excel to Power BI, Oracle, DuckDB, HyperTree, Grafana and Tableau. I think I’m beginning to converge these ideas and requests into a pivot table that will be a good solution for our users:

  • Like all of our work, it should be really really fast
  • It must work with Big Data
  • It should be Friendly
  • It must allow changing aggregations – e.g. Group by exchange OR group by exchange and sym
  • Allow pivoting some calculations – from one column to a breakdown in separate columns
  • It must work for all databases.
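To pin down the two middle requirements, changing the aggregation and pivoting a breakdown into columns, here is a tiny plain-Python sketch. It is not how Pulse implements its pivot; the sample rows and function names are made up for illustration:

```python
from collections import defaultdict

# Made-up sample rows: (exchange, sym, volume).
trades = [
    ("NYSE", "IBM", 100),
    ("NYSE", "IBM", 50),
    ("NYSE", "GE", 30),
    ("LSE", "VOD", 70),
]
COLS = {"exchange": 0, "sym": 1}

def group_sum(rows, by):
    """Sum volume grouped by the chosen key columns, e.g. ["exchange"]."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[COLS[c]] for c in by)] += row[2]
    return dict(totals)

def pivot_sum(rows):
    """Pivot: one output row per exchange, one column per sym."""
    out = defaultdict(dict)
    for (exchange, sym), total in group_sum(rows, ["exchange", "sym"]).items():
        out[exchange][sym] = total
    return dict(out)

by_exchange = group_sum(trades, ["exchange"])             # coarse grouping
by_exchange_sym = group_sum(trades, ["exchange", "sym"])  # drill down one level
pivoted = pivot_sum(trades)                               # sym values become columns
```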

The proposed interface now looks like this:

Pulse - Pivot

A lot of the inspiration for the functionality should be credited to Stevan Apter and HyperTree. Ryan had seen HyperTree and loved the functionality and the beautiful kdb-only implementation. The challenge was to allow similar functionality for all databases while making it more accessible. We now have a working demo version.

If you love pivot tables and have never got to see your dream grid component come to fruition, we want to build it, so get in touch.

Kx Con 2023 – Emerging kdb Trends

I won’t go through the full list of great presentations, as Gary Davies has that excellently covered, but I will highlight some trends I saw at kxcon 2023:

  • APIs are powerful abstractions that users need and love
    • Erin Stanton (Virtu) – Brought massive amounts of energy to her presentation, showing how accessing powerful “getData” APIs from python->kdb allowed Erin to run machine learning models in minutes rather than hours. As a python enthusiast she was very happy to use the power of kdb without having to know it well. Erin demonstrated an easily browsable web interface that allowed data discovery, provided documentation and could be used to run live data queries.
    • Alex (TD) – Similarly discussed how his team all used python wrapped APIs to allow sharing smart query defaults such as ignoring weekend data to prevent less sophisticated users from shooting themselves in the foot.
    • Igor (Pimco) – Mentioned the power of APIs and how hiding the tables allowed changing implementations later.
  • Python is super popular
    • Citadel, Alex (TD), Erin (Virtu), Nick Psaris, Rebecca (Inqdata) all included significant python demonstrations.
    • Citadel – Has built an improved Python/Pykx process that uses a proxy thread to subscribe and publish updates extremely fast. They are using this as part of a framework to allow quants to publish data and construct DAGs (directed acyclic graphs) of calculations to produce analytics.

Why do users love APIs and Python?

Alex Donohue’s presentation was packed with years of condensed knowledge, including the excellent diagram below showing typical user expertise. Notice as it transitions from backend kdb developers to frontend business users:

  • The level of kdb expertise drops off quickly (non-linear)
  • The data engineers and quants know much more python than kdb
  • I would like to add one more graph, showing business / domain knowledge.
    That graph would be low for many kdb developers but higher for sales and trading.

Looking at it this way makes it clear we need to provide APIs, as users want to express queries in their own language.
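As a flavour of the “smart defaults” idea Alex described, here is a hypothetical Python wrapper that ignores weekend data unless the caller opts in. The function name, parameters and sample rows are assumptions for illustration, not any firm’s actual API:

```python
from datetime import date, timedelta

def get_data(rows, start, end, include_weekends=False):
    """Hypothetical getData-style wrapper over (date, value) rows.

    Weekends are dropped unless the caller explicitly asks for them.
    """
    return [
        (d, v) for d, v in rows
        if start <= d <= end and (include_weekends or d.weekday() < 5)
    ]

# Made-up sample: one value per day, Friday 2023-05-19 to Monday 2023-05-22.
rows = [(date(2023, 5, 19) + timedelta(days=i), i) for i in range(4)]

weekdays_only = get_data(rows, date(2023, 5, 19), date(2023, 5, 22))
everything = get_data(rows, date(2023, 5, 19), date(2023, 5, 22), include_weekends=True)
```

The user never sees the underlying table, so the filter and even the storage behind it can change later without breaking their code.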

Python Kdb Users

Other Patterns at kx Con:

  • Small Teams can really deliver with kdb – Numerous times we heard how a small team used kdb to deploy a full solution quicker, that scaled better and ran faster than all alternatives.
    • But costly per CPU licensing can be restricting to those teams.
  • ChatGPT is everywhere – Rebecca’s QuBot chatbot and Aaron Davies’ presentation both demoed GPTs.

 

 

Official KX announcements 2023

  1. kdb now on AWS – kdb as a fully managed service under FinSpace. The website says available in June, but at least one big investment bank is already trialling it and, having talked to a representative at AWS, a significant amount of work and effort has gone into this. This is great to see. I think for the future of KX this needs to work. It doesn’t make sense for every firm to reinvent the wheel; banks could afford to do it but smaller firms cannot.
  2. kdb.ai – seems to be repositioning kdb as a vector database for AI. Currently it’s a few blog posts; whether there’s a real product or it’s just riding the AI hype train, we will have to wait and see. Given the hype other inferior databases have received in the past, kdb deserves some attention.
  3. Run q code on Snowflake – Snowflake is a column-oriented database that only runs in the cloud and uses central storage with compute nodes to service SQL queries. They provide Snowpark, which allows running Java, Python and now q close to the data. I’m unsure who the target of this is; many users struggle to fully understand one database, let alone one nested inside another.

snowflake kdb

 

Additionally kx:

  • Announced PyKX will be open source?? (exact details to be confirmed, as the repo is not available today, 2023-05-21)
    • PyKX may add the ability to act as a very fast event processor
  • Announced improved VS Code support will arrive shortly
  • The Core team demoed some really cool functionality. I’m not sure it was all meant to be public, so I will just say some parts were similar to destructuring assignment in JavaScript.

Pulse 1.0 – Build interactive data tools fast

Are you a Quant or kdb Developer struggling to get the UI team to work on your app?
With Pulse, you can now build interactive data tools fast by yourself.

Pulse version 1.0 is available to download now and allows you to build real-time interactive data apps, free for up to 3 users.

After an intense year working closely with groups of quants and data analysts, with releases every week, Pulse has reached a major milestone.
Pulse 1.0 is being used successfully to deliver data apps including:

  • TCA – Transaction Cost Analysis
  • Algo start/stop and limit controls
  • Trade Blotters that update in real-time
  • Live Price Charts

Find Out More

Pulse now provides:

A massive thanks to the many beta testers, early adopters, users and companies that invested in and purchased Pulse.

Particular thanks to Rahul, Ruairi, Ian, Steve, Chris, WooiKent, Franco, Palmit, JP, JD, PN, SG, JM, KF, AR, MC, JC, CA, SS.  Thanks for raising numerous feature requests and providing excellent feedback that helped make Pulse what it is today.

If you want to hear more, join one of our scheduled demos or contact us.

If you tried an old version or if you have never tried Pulse:

Download Pulse Now

qStudio – 2.05 – Dark Theme and High DPI Support

qStudio release 2.05 added:

  1. Mac / IntelliJ / Flat / Material Theme support
  2. Including 20+ IntelliJ and 20+ Material Themes built in
  3. Scaling font size in settings increases code size and all font sizes throughout the UI
  4. JetBrains Mono is now bundled as the default font for development

Notice Also

  • The menu bar is now integrated with the title on platforms with that enabled (Windows 10/11)
  • File chooser now includes shortcuts to popular locations
  • Native window decorations on Windows 10 – Snapping / Shadows / etc.

 

qStudio Dark Theme


qStudio Theme Settings

qStudio Light Mac Theme


Pulse 0.15.4 Released – Linked Tables/Charts

Linked Tables and Charts

Since our last blogged release our biggest new feature is Linked Tables and Charts.
When a user clicks on a table or chart, it populates variables that can be used within other charts and tables.
One very creative user already used this to create tableA that, when clicked, populates tableB, which when clicked populates tableC, and so on, 4 levels deep.
Click the image below to see details and to find a tutorial video.

Clicking Table

Stability

With increasing users come more edge cases that are hard to predict in advance. We’ve invested a lot in stability in the last 2 months, not all of which will be immediately visible. One hotspot involved a number of issues related to websockets, including internal firewall rules, Cloudflare websocket timeouts and slow subscribers. One of the more interesting changes was introducing a heartbeat on the websocket to prevent timeouts. We then reused that heartbeat to detect slow subscribers, for example when someone moves a tab to the background or minimizes their browser. We now smartly throttle back their querying until they catch up or bring the browser back to being visible. We’ve addressed all known issues and added a number of stress-test runs to ensure they continue to work in future.
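The heartbeat-based throttling can be pictured roughly as below. This is a simplified sketch of the general technique, not Pulse’s actual implementation; the class name, the doubling back-off and the interval values are all assumptions:

```python
class Subscriber:
    """Tracks heartbeat replies for one websocket client (illustrative only)."""

    def __init__(self, base_interval=1.0, max_interval=30.0):
        self.base_interval = base_interval  # seconds between queries when healthy
        self.max_interval = max_interval    # upper bound on the back-off
        self.interval = base_interval
        self.unanswered = 0                 # heartbeats sent without a reply

    def heartbeat_sent(self):
        self.unanswered += 1
        if self.unanswered > 1:             # client is falling behind: back off
            self.interval = min(self.interval * 2, self.max_interval)

    def heartbeat_acked(self):
        self.unanswered = 0                 # client caught up: full speed again
        self.interval = self.base_interval

sub = Subscriber()
history = []
for _ in range(3):                          # tab in background: no replies arrive
    sub.heartbeat_sent()
    history.append(sub.interval)
sub.heartbeat_acked()                       # tab visible again
```

In this sketch a single missed reply is tolerated, each further miss doubles the query interval up to the cap, and one acknowledgement restores the normal rate.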

REST API

Lastly, some of our advanced kdb users spin up dynamic processes for users. They wanted to make those servers available in Pulse.
We’ve added a REST API to allow setting servers dynamically using an API key.