Archive for the 'kdb+' Category

kdb 5.0 – The Roadmap Ahead

In 1998 kdb was released and changed the finance database industry. We want to do it again.

Today we are releasing kdb+ 5.0, which Works Easily for Everyone, Everywhere, with Everything.

A Data Platform that Easily Works for:
  • Everyone – the most user-friendly q ever
  • Everywhere – finance and beyond
  • Everything
    • Works seamlessly with every major database tool.
    • Interoperates with R/Python and almost every major data tool using high-speed standards.

The Past – What we have done


15 years ago we had a product that was light years ahead of our competition. When you download q today, it looks fundamentally similar to how it looked then. Users are presented with a bare q prompt and left to build a tickerplant, a framework and various other parts themselves to get real work done.

The landscape has changed and we need to change faster with it. Today we address that. How?

 


1. We are going to listen and embed ourselves with customers. Pierre and Oleg have been sitting and working with kdb teams at every major bank and hedge fund. They have seen the problems that are being solved, what amazing work those teams have done and where we can improve the core to help them.

2. We are working with the community. Data Intellect invented the marvellous Torq framework, Jo Shinonome has created Kola, Daniel Nugent wrote a wonderful testing framework and numerous others have written useful q modules. They’ve built some genuinely useful components and given us lots of insight.

3. We are learning from the competition. Andrew and Ashok have gone round every database and technology similar to ours and examined their strengths and weaknesses. They have coded on each and found some amazing parts; going further, they have looked at how those businesses operate and how they attract users.

 

The Past


Previously. Someone who downloaded kdb needed to email us before using it commercially, then wait months for their company to negotiate a contract.

Previously. Someone starting with kdb had to recreate much of the framework work that teams in banks have done, and had to discover and adapt the wonderful work the community has done. We want to unleash that creativity.

Previously. Someone trying to use kdb with Tableau, Pulse, Java or C# had to learn our own driver and struggle to get it to communicate.

Previously. Someone trying to write queries had to write qSQL.

Today we are releasing an amazing version of kdb+ that Works Easily for Everyone, Everywhere with Everything.

Everyone = Modules

Today: We are revealing a Module Framework built into kdb+. This is going to make it easier for everyone to get started,
bringing current enterprise-quality code to everyone AND enabling existing community contributions to be reused easily.

The great news is that we’ve worked with partners so that production-quality modules are available from day zero:

  • Torq – from Data Intellect
  • qSpec – Testing framework from Daniel Nugent
  • QML – q math library – by Andrey Zholos
  • qTips – analytics library from Nick Psaris
  • S3 – querying from KX
import `qml
import `:https://github.com/nugend/qspec as qspec
import `torq/utils
import `log
q).qml.nicdf .25 .5 .975
-0.6744898 0 1.959964

The framework is documented and public, so you can even load modules from GitHub or your own git URL. (This required making namespaces stricter, so that one module cannot affect another. No more IPC vs local loading oddities.) kdb+ now ships with a packaging tool called qpm, based on concepts similar to npm.

This will allow both KX and the community to experiment with modules and, if successful, integrate those libraries into the core.
It will let you get up and running with kdb+ faster and at lower cost, and receive production-quality maintenance and feature updates for larger parts of your stack.

Everyone = SQL = Becoming as SQL compatible as possible.


Before – custom q idioms such as piv:{…….}, ij and -100 sublist.

  1. Example: Select *
  2. Example: Select * from t inner join v LIMIT 50
  3. Example: Pivot using DuckDB notation
  4. Example: sums, prods, finance functions.
  5. Query it as if it were a standard PostgreSQL database – the old driver is loadable via a module.
  6. Partitioned databases now allow “date=…” to be placed anywhere in the query. If it is missing, a clear error message is returned.

q)select * from partitionedtable where (price<10) AND (date=.z.d)
q)PIVOT Cities ON Year USING first Population as POP,Population as P
Country Name          | 2000_POP 2000_P 2010_POP 2010_P 2020_POP 2020_P
----------------------|------------------------------------------------
NL      Amsterdam     | 1005     [1005] 1065     [1065] 1158     [1158]
US      Seattle       | 564      [564]  608      [608]  738      [738]
US      New York City | 8015     [8015] 8175     [8175] 8772     [8772]
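For readers coming from SQL, the PIVOT above turns one row per (country, city, year) into one row per city, with one column per year and aggregate. Below is a minimal Python sketch of that reshaping. It uses the population figures from the table above, treating `first Population as POP` as "first value" and `Population as P` as "all values as a list". It illustrates the semantics only and is not how DuckDB or kdb+ implement PIVOT:

```python
from collections import defaultdict

# Long-format input rows: (country, name, year, population), taken from the table above.
cities = [
    ("NL", "Amsterdam", 2000, 1005), ("NL", "Amsterdam", 2010, 1065), ("NL", "Amsterdam", 2020, 1158),
    ("US", "Seattle", 2000, 564), ("US", "Seattle", 2010, 608), ("US", "Seattle", 2020, 738),
    ("US", "New York City", 2000, 8015), ("US", "New York City", 2010, 8175), ("US", "New York City", 2020, 8772),
]

def pivot(rows):
    """Pivot ON year: one output row per (country, name), two columns per year.

    <year>_POP holds the first population seen for that year (like `first`),
    <year>_P holds every population for that year as a list (unaggregated).
    """
    groups = defaultdict(lambda: defaultdict(list))
    years = set()
    for country, name, year, pop in rows:
        groups[(country, name)][year].append(pop)
        years.add(year)
    result = {}
    for key, by_year in groups.items():
        out = {}
        for year in sorted(years):
            values = by_year.get(year, [])
            out[f"{year}_POP"] = values[0] if values else None
            out[f"{year}_P"] = values
        result[key] = out
    return result

table = pivot(cities)
print(table[("US", "Seattle")])
# {'2000_POP': 564, '2000_P': [564], '2010_POP': 608, '2010_P': [608], '2020_POP': 738, '2020_P': [738]}
```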

 

With Everything = Postgres Wire Compatible

We’ve listened to user problems with ODBC, Tableau and the KX drivers over the years, and we are now bundling PostgreSQL wire protocol (pgwire) compatibility within the default kdb+ engine.
Anything that bundles a Postgres driver will now work with kdb+.
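"Wire compatible" means kdb+ would have to understand the messages a standard Postgres driver sends. The first of these is the protocol 3.0 StartupMessage: a big-endian Int32 length that counts itself, the Int32 protocol number 196608, then null-terminated name/value pairs and a final null byte. Below is a minimal Python sketch that builds and parses that one message; the user and database names are hypothetical. A real server must also handle SSL negotiation, authentication and the query messages that follow:

```python
import struct

PROTOCOL_V3 = 3 << 16  # 196608: major version 3, minor version 0

def startup_message(params: dict) -> bytes:
    """Frame a PostgreSQL v3 StartupMessage from parameter name/value pairs."""
    body = struct.pack("!i", PROTOCOL_V3)
    for key, value in params.items():
        body += key.encode() + b"\0" + value.encode() + b"\0"
    body += b"\0"  # terminator after the last name/value pair
    # The length prefix is a big-endian Int32 that counts itself (4 bytes) plus the body.
    return struct.pack("!i", len(body) + 4) + body

def parse_startup_message(data: bytes) -> dict:
    """Server side: validate the framing and recover the parameters."""
    (length,) = struct.unpack("!i", data[:4])
    (version,) = struct.unpack("!i", data[4:8])
    if length != len(data) or version != PROTOCOL_V3:
        raise ValueError("not a v3 StartupMessage")
    fields = data[8:-1].split(b"\0")[:-1]  # drop the final terminator, then the trailing empty field
    return {fields[i].decode(): fields[i + 1].decode() for i in range(0, len(fields), 2)}

msg = startup_message({"user": "alice", "database": "trades"})
print(parse_startup_message(msg))
```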

With Everything = PyArrow + Parquet

Select from and save to a wide range of open standards: Parquet, Arrow, Delta Lake, Iceberg.

q)select * from file.parquet
q)select * from s3://blah.com/foo
q)select * from http://homer.internal/data.csv
q)`:asd.parquet 0: table
`:asd.parquet

 

Type Hints

func:{[argA;argB] if[not -6h=type argA;'`wrongType]; if[not -9h=type argB;'`wrongTypeB]; }
/ now
func:{[argA:int; argB:float] }

 

This will provide runtime type checking and code optimization, and we’ve worked with qStudio and VS Code to automate the checks in the UI.
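As a rough analogue of the runtime checking described above, a Python decorator can read a function's annotations and reject mismatched argument types before the body runs. This replaces hand-written `if[not -6h=type argA;...]` guards. The decorator name `typechecked` is hypothetical, it only handles plain classes such as `int` and `float`, and it says nothing about how kdb+ or qStudio would actually implement type hints:

```python
import functools
import inspect
import typing

def typechecked(func):
    """Check each annotated argument's type at call time, like the q type guards above."""
    sig = inspect.signature(func)
    hints = typing.get_type_hints(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # bool is a subclass of int in Python, so reject it explicitly for int parameters.
            if expected is int and isinstance(value, bool):
                raise TypeError(f"{name}: expected int, got bool")
            if expected is not None and not isinstance(value, expected):
                raise TypeError(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
        return func(*args, **kwargs)

    return wrapper

@typechecked
def func(arg_a: int, arg_b: float) -> float:
    return arg_a * arg_b

print(func(2, 1.5))  # 3.0
```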

Previously

You had to spend months acquiring kdb+, then setting it up, building a platform, integrating it with other systems and finding experts.

Today

Download it, reuse the existing modules and connect it to all your existing tools. The greater SQL and typing support allows more people to run queries safely.

Works Easily for Everyone, Everywhere with Everything.

  1. Everyone = friendlier SQL, type hints, more built-in functions including PIVOT.
  2. With Everything = S3 / Parquet / HTTP / PostgreSQL wire compatible.

With modules allowing greater community contribution and reuse.

One Last Thing: Everywhere = We are releasing the 32-bit version of q FREE for all uses, including commercial.

Disclaimer: The above is entirely fictional, based on some wishes of the author; no proprietary information is known or being shared. If you like the ideas, let KX know. If you dislike the ideas, let me know and post your thoughts for improvement.

The Future of kdb+?

It’s been 2 years since I worked full time in kdb+ but people seem to always want to talk to me about kdb+ and where I think it’s going, so to save rehashing the same debates I’m going to put it here and refer to it in future. Please leave a comment if you want and I will reply.

Let’s first look at the use cases for kdb+ and the alternatives, then at which I think will win for each use case and why.

Use Cases

A. Historical market data storage and analysis. – e.g. MS Horizon, Citi CloudKDB, UBS Krypton (3 I worked on).
B. Local quant analysis – e.g. Liquidity analysis, PnL analysis, profitability per client.
C. Real-time Streaming Calculation Engines – e.g. Streaming VWAP, Streaming TCA…
D. Distributed Computing – e.g. Margin calculations for stock portfolios or risk analysis. Spread data out, perform costly calcs, recombine.

Alternatives

Historical Market Data – kdb+ Alternatives

A large number of users want to query big data to get minute bars, perform asof joins or more advanced time-series analysis.

  • New Database Technologies – ClickHouse, QuestDB.
  • Cloud Vendors – BigQuery / Redshift
  • Market Data as a Service
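Minute bars, the first query pattern mentioned above, floor each trade's timestamp to the start of its minute and aggregate per bucket. In q this is typically written with `xbar`, e.g. `select open:first price, high:max price, low:min price, close:last price by 1 xbar time.minute from trade`. Below is a Python sketch of the same bucketing, with made-up trades and the assumption that trades arrive in time order:

```python
from datetime import datetime

def minute_bars(trades):
    """Aggregate (timestamp, price, size) trades into OHLCV bars keyed by minute start.

    Assumes trades arrive in time order, so the first/last trade in a bucket is the open/close.
    """
    bars = {}
    for ts, price, size in trades:
        bucket = ts.replace(second=0, microsecond=0)  # floor to the minute, like 1 xbar time.minute
        bar = bars.get(bucket)
        if bar is None:
            bars[bucket] = {"open": price, "high": price, "low": price, "close": price, "volume": size}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += size
    return bars

# Made-up example trades for illustration.
trades = [
    (datetime(2024, 1, 2, 9, 30, 5), 100.0, 200),
    (datetime(2024, 1, 2, 9, 30, 40), 101.5, 100),
    (datetime(2024, 1, 2, 9, 30, 59), 99.5, 300),
    (datetime(2024, 1, 2, 9, 31, 10), 100.25, 50),
]
bars = minute_bars(trades)
for minute, bar in bars.items():
    print(minute.time(), bar)
```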

Let me tell you three secrets: 1. Most users don’t need the “speed” of kdb+. 2. Most internal bank platforms don’t fully unleash the speed of kdb+. 3. The competitors are now fast enough, and you can check for yourself: ClickBench is totally transparent about its benchmarking.

Likely Outcome: kdb+ can hold its existing clients but hasn’t won, and won’t win, the second-tier firms, as they want either cloud native or something else. The previous major customers for this use case had to invest heavily to build their own platforms. From what I’m hearing, the kdb cloud platform still needs work.

Local Quant Analysis – Alternatives

  • Python – with DuckDB
  • Python – with Polars
  • Python – with PyKX
  • Python – with dataframe/modin/….

Now I’m exaggerating slightly but the local quant analysis game is over and everyone has realised Python has won. The only question is who will provide the speedy add-on. In one corner we have widely popular free community tools that know how to generate interest at huge scale, are fast and well funded. In the other we have a niche company that never spread outside finance, wants to charge $300K to get started and has an exotic syntax.

Likely Outcome: DuckDB or Polars. Why? They’re free. People at university will start with them and not change. Any sensible quant currently in a firm will want to use a free tool so that they are guaranteed to be able to use similar analytics at their next firm. Without that ability, they can only move to firms that have kdb+, or else lose a large percentage of their skillset.

Real-time Streaming / Distributed Computing

These were always the less popular use cases for kdb+ and never the ones that “won” the contract. The ironic thing is that combining streaming with historical data in one model is kdb+’s largest strength. However, the few times I’ve seen it done, it has either taken someone very experienced and skilful, or it has become a mess. These messes have been so bad that they put other parts of the firm off adopting kdb+ for other use cases.

Likely Outcome: Unsure which will win, but not kdb+. Kafka has won mindshare and is deployed at scale, but Flink, RisingWave etc. are rising stars.

Summary

Kdb+ is an absolutely amazing technology but it’s about the same amazing today as it was 15 years ago when I started. In that time the world has moved on. The best open source companies have stolen the best kdb+ ideas:

  • Parquet/Iceberg is basically the kdb+ on-disk format for optimized column storage.
  • Apache Arrow – its in-memory format is the kdb+ in-memory column format.
  • Even Kafka’s log/replay/ksql concept could be viewed, from a certain angle, as similar to a tickerplant log.
  • QuestDB / DuckDB / ClickHouse all have asof joins.
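An asof join matches each left-hand row (e.g. a trade) with the most recent right-hand row (e.g. a quote) for the same key at or before its timestamp, which is what q's `aj` does. Below is a minimal Python sketch that uses binary search over sorted quote times; the data is made up and integer timestamps are used for brevity:

```python
import bisect
from collections import defaultdict

def asof_join(trades, quotes):
    """For each (sym, time, price) trade, attach the latest (bid, ask) with quote time <= trade time.

    Trades with no earlier quote get None for bid/ask. Quotes need not be pre-sorted.
    """
    by_sym = defaultdict(list)
    for sym, time, bid, ask in quotes:
        by_sym[sym].append((time, bid, ask))
    times = {}
    for sym, rows in by_sym.items():
        rows.sort(key=lambda r: r[0])  # stable sort keeps input order for equal times
        times[sym] = [r[0] for r in rows]

    joined = []
    for sym, time, price in trades:
        idx = bisect.bisect_right(times.get(sym, []), time) - 1  # last quote at or before `time`
        bid, ask = (by_sym[sym][idx][1], by_sym[sym][idx][2]) if idx >= 0 else (None, None)
        joined.append((sym, time, price, bid, ask))
    return joined

quotes = [("AAPL", 1, 99.0, 99.2), ("AAPL", 5, 99.5, 99.7), ("MSFT", 2, 300.0, 300.5)]
trades = [("AAPL", 3, 99.1), ("AAPL", 5, 99.6), ("MSFT", 1, 300.1)]
result = asof_join(trades, quotes)
print(result)
```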

Not only have the competitors learnt and taken the best parts of kdb+ but they have standardised on them. e.g. Snowflake, Dremio, Confluent, Databricks are all going to support Apache Iceberg/parquet. QuestDB / DuckDB / Python are all going to natively support parquet. This means in comparisons it’s no longer KX against one competitor, it’s KX against many competitors at once. If your data is parquet, you can run any of them against your data.

As many at KX would agree, I’ve talked to them for years about these issues and, to be fair, they have changed, but they are not changing quickly enough.
They need to do four things:

  1. Get a free version out there that can be used for many things, with an easy, reasonable license for customers with less money.
  2. Focus on making the core product great. For years we had Delta this and now it’s kdb.ai. In the meantime MongoDB/InfluxDB won huge contracts with a good database alone.
  3. Reduce the steep learning curve. Make kdb+ easier to learn, even by changing the language and technology if need be.
  4. Become more popular, or face a slow death.

This focuses on the core tech product.
Looking more widely at their financials and other huge costs/initiatives, such as AI and massive marketing spend, wider changes at the firm should also be considered.

2024-08-03: This post got 10K+ views on the front page of Hacker News. To see the follow-up discussion, go here.

Author: Ryan Hamilton

 

10+ Years of kdb+

I decided to go check what KX had done with the core platform over the last 10+ years.

Did I miss anything? Thoughts?

Tech Changes:

  • 2012.05.29 – 3.0 – Huge move to 64-bit
  • 2013.06.09 – 3.1 – Improved performance / parallel
  • 2014.08.22 – 3.2 – Added JSON / Websocket
  • 2015.06.01 – 3.3 – Improved performance / parallel
  • 2016.05.31 – 3.4 – SSL/TLS Security. Improved performance / IPC.
  • 2017.03.15 – 3.5 – Improved performance / parallel. Socket sharding. Debugger.
  • 2018.05.16 – 3.6 – AnyMap
  • 2020.03.17 – 4.0 – Improved performance / Limits. Multithreaded primitives. Data encryption.
  • 2024.02.13 – 4.1 – Improved performance / parallel. New dictionary syntax.

One user suggested Deferred Sync. I’m not including it as I think the implementation is bad and encourages code that would be unsafe and dangerous. To get an idea of why, see this excellent article: https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/

 

qStudio Version 3.0 Released

Powerful local qDuckDB

qStudio is the best SQL IDE for data analysis.
DuckDB is the best free database for OLAP analytical queries.
Together they provide a powerful desktop platform for data analysis.

  • Powerful Local Database – qDuckDB is now at the core of qStudio.
  • Save Locally – Convert a query result from any database to store it locally.
  • Parquet File Viewer – Click to open a Parquet file on Windows.
    Parquet is the data storage standard of the future and works with everything.
  • Join Data from Different Data Sources

Pulse News – April 2024

It’s approaching 2 years since we launched Pulse and it’s a privilege to continue to listen to users and improve the tool to deliver more for them. A massive thanks to everyone who has joined us on the journey, including our free users, who have provided a huge amount of feedback. We are committed to maintaining a free version forever.

We want to keep moving at speed to enable you to build the best data applications.
Below are some features we have added recently.

Pivot Like a Pro

Pulse enables authors to simply write a select query, then choose columns for group-by, pivot and aggregation. Users can then change the pivoted columns to get different views of the data. The technically cool part is:

  • Unlike other platforms, Pulse does not attempt to pull back all data.
  • All aggregation of data is performed on the SQL server, making it really fast.
  • For kdb+, aggregation and pivoting occur on the kdb+ end using the common piv function.
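Below is a minimal sketch of pushing the group-by and pivot to the database. It wraps the author's select query as a subquery and generates one conditional aggregate per pivot value, so only the aggregated result leaves the server. Python's built-in sqlite3 stands in for the SQL server. The function name, table and columns are hypothetical, and this is not Pulse's actual query generator:

```python
import sqlite3

def pivot_sql(base_query, group_col, pivot_col, pivot_values, agg, value_col):
    """Build SQL that groups by group_col and pivots pivot_col into one column per value.

    Identifiers are interpolated directly, so they must come from a trusted list
    (e.g. the dashboard author's column choices); pivot values are bound as parameters.
    """
    cases = ", ".join(
        f'{agg}(CASE WHEN {pivot_col} = ? THEN {value_col} END) AS "{v}"' for v in pivot_values
    )
    sql = (
        f"SELECT {group_col}, {cases} FROM ({base_query}) AS base "
        f"GROUP BY {group_col} ORDER BY {group_col}"
    )
    return sql, list(pivot_values)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (sym TEXT, side TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("AAPL", "BUY", 100), ("AAPL", "SELL", 40), ("MSFT", "BUY", 70), ("AAPL", "BUY", 10)],
)
sql, params = pivot_sql("SELECT sym, side, qty FROM trades", "sym", "side", ["BUY", "SELL"], "SUM", "qty")
rows = conn.execute(sql, params).fetchall()  # aggregation runs inside the database
print(rows)
```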

 

Caching for Faster Results and Less load

  • As we have deployed Pulse at larger firms with more users, the backend databases began to become a bottleneck.
  • Polling queries to data sources can now be cached and results reused within the time intervals selected.
  • This means whether there are 1, 10 or 100 users looking at the same dashboard with the same variables, it will only query once, not separately for every user.
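Below is a minimal sketch of that behaviour. It assumes the cache key is the query text plus the dashboard's variable values, and that a result stays valid until the current polling interval ends. The class name, the `fetch` callback and the injectable clock are hypothetical:

```python
import threading
import time

class IntervalQueryCache:
    """Share one database call per (query, variables) per polling interval across all viewers."""

    def __init__(self, fetch, interval_seconds, clock=time.monotonic):
        self._fetch = fetch              # function(query, variables) -> result; hits the database
        self._interval = interval_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}               # key -> (interval_number, result)

    def get(self, query, variables):
        key = (query, tuple(sorted(variables.items())))
        bucket = int(self._clock() // self._interval)  # which polling interval we are in
        with self._lock:
            cached = self._entries.get(key)
            if cached is not None and cached[0] == bucket:
                return cached[1]
            result = self._fetch(query, variables)
            self._entries[key] = (bucket, result)
            return result

# Usage with a fake fetch function and a manually controlled clock.
calls = []
now = [0.0]

def fake_fetch(query, variables):
    calls.append(query)  # stands in for a real database round trip
    return len(calls)

cache = IntervalQueryCache(fake_fetch, interval_seconds=5, clock=lambda: now[0])
for _viewer in range(100):  # 100 viewers polling the same dashboard in one interval
    cache.get("select from trade", {"sym": "AAPL"})
print(len(calls))  # 1
```

Holding the lock while fetching is the simplest way to guarantee a single database call per key and interval when many viewers poll at once. The cost is that different queries are also serialized, which a per-key lock would avoid.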

What Changed? Diff.

  • Pulse has allowed users to open or restore previous versions since version 1.
  • Now you can view a diff to see which code or query changed.
  • Next we want to add support for file/git based deployment.
  • If this is something that interests you, add a note to the issue.
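How Pulse computes its diff isn't described here. As a rough illustration of what a version diff shows, Python's standard `difflib` can produce a unified diff between two saved versions of a query (the queries are made up):

```python
import difflib

def query_diff(old: str, new: str, old_label="version 1", new_label="version 2") -> str:
    """Return a unified diff of two saved query versions, line by line."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=old_label,
            tofile=new_label,
        )
    )

v1 = "select sym, price\nfrom trade\nwhere date=.z.d\n"
v2 = "select sym, price, size\nfrom trade\nwhere date=.z.d\n"
diff = query_diff(v1, v2)
print(diff)
```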

 

Tabs and Sheets
Pulse now supports both a tabbed interface and top level sheets.

kdb+ Community News – March 2024

  • FD / KX / MRP to be split into three companies – BBC News
  • DefconQ is a new kdb+ blog by Alexander Unterrainer.
  • The kdb+ LinkedIn poll results are complete: 44% of users wanted easier / cheaper licensing.

qStudio – You must update this year

The core Java feature that qStudio relied on for license key authentication has been removed in Java 17. If you are using a version of qStudio lower than 2.53, you MUST upgrade this year. Download qStudio now.

  • Old license keys and old versions of qStudio cannot work on Java 17+ as the core Java library has been removed.
  • qStudio 2.53+, released 2023-08-14, accepts both old and new license keys and works on Java 8/9/11/17 …. everything. Download it now.
  • All keys provided from 2024-01-01 have been using the new key format. These keys start with QSV3.

We really wish 11+ years ago we hadn’t chosen this particular library but what can you do 🙂

Existing enterprise customers may be issued an old key in exceptional circumstances. If absolutely required, get in touch.

HTML Grid Live Update Performance

Pulse is specialized for real-time interactive data; as such, it needs to be fast, very fast. When we first started building Pulse, we benchmarked all the grid components we could find and found that SlickGrid was just awesome; 60East did a fantastic write-up on how SlickGrid compares to others. As we have added more features, e.g. column formatting, row formatting, sparklines….., it’s important to constantly monitor and test performance. We have:

  • Automated tests that check the visual output is correct.
  • Throughput tests to check we can process data fast enough
  • Manual tests to ensure subtle human interactions work.
  • Memory leak checks as our dashboards can be very long running.

Today I wanted to highlight how our throughput tests work by looking at our grid component.

HTML Table Throughput Testing

To test throughput we:

  1. Use scenarios as close to our customers’ typical use cases as possible.
  2. Focus on the most common case: a medium-sized scrolling trade blotter with numerical/date formatting and row highlighting.
    1. 200 rows of data, scrolling 50 rows each update.
  3. Use a subscription connection to replay and render 1000s of data points as fast as possible.

Video: 21,781 rows replayed as 435 snapshots in 16 seconds, i.e. 435 / 16 ≈ 27 updates per second (European TV updates at 25 FPS).
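The replay described above can be sketched as a loop that slides a 200-row window forward by 50 rows per snapshot, renders each snapshot and reports updates per second. The `render` callback is a hypothetical stand-in for the real grid update; Pulse's actual test harness is not shown in the post:

```python
import time

def replay_throughput(rows, render, window=200, step=50):
    """Replay rows as snapshots of the newest `window` rows, advancing `step` rows per update.

    Returns (snapshot_count, updates_per_second).
    """
    snapshots = 0
    start = time.perf_counter()
    for end in range(step, len(rows) + step, step):
        end = min(end, len(rows))                 # final snapshot may advance by fewer rows
        visible = rows[max(0, end - window):end]  # the scrolling blotter view
        render(visible)
        snapshots += 1
    elapsed = time.perf_counter() - start
    rate = snapshots / elapsed if elapsed > 0 else float("inf")
    return snapshots, rate

seen = []
count, rate = replay_throughput(list(range(1000)), lambda view: seen.append(len(view)))
print(count)  # 20
```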

Update: After this video we continued making improvements and with a few days more work got to 40 FPS.

Profiling Slick Grid Table

 

Breakdown

 

We then examine in detail where time is being spent. For example we:

  • Turn on/off all formatting, all rendering options.
  • Add/Remove columns
  • Change screen sizes
  • Change whether edit mode is on or table cells have been selected (Odd fact: selecting a cell makes the grid 30% slower to update)

Then we try to improve it!

Often this means micro-optimizations such as reducing the number of objects created. For example, the analysis of how to format columns is only performed when the columns change, not when data arrives with the same schema. The really large wins tend to come from optimizing for specific scenarios, e.g. a lot of our data is timestamped and received mostly in order. But those optimizations are for a later post.
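The "only analyse formatting when columns change" idea can be sketched as caching per-column formatters keyed on the schema, so an update with the same column names and types reuses them. The class name and formatting rules are hypothetical placeholders, not Pulse's:

```python
class GridFormatter:
    """Format rows for display, rebuilding per-column formatters only when the schema changes."""

    def __init__(self):
        self._schema = None
        self._formatters = []
        self.rebuilds = 0  # counts how often the (relatively costly) analysis ran

    def _build(self, schema):
        formatters = []
        for _name, col_type in schema:
            if col_type is float:
                formatters.append(lambda v: f"{v:,.2f}")  # hypothetical numeric rule
            elif col_type is int:
                formatters.append(lambda v: f"{v:,d}")
            else:
                formatters.append(str)
        self.rebuilds += 1
        return formatters

    def format(self, schema, rows):
        schema = tuple(schema)  # e.g. (("sym", str), ("price", float))
        if schema != self._schema:  # new columns or types: redo the analysis
            self._schema = schema
            self._formatters = self._build(schema)
        return [[fmt(v) for fmt, v in zip(self._formatters, row)] for row in rows]

g = GridFormatter()
schema = [("sym", str), ("price", float), ("size", int)]
first = g.format(schema, [("AAPL", 1234.5, 1000)])
second = g.format(schema, [("MSFT", 10.0, 5)])  # same schema: formatters reused
print(g.rebuilds)  # 1
```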

 

Modern Databases for Finance 2023

We just announced a unique event that gathers 4 of the newest, most advanced databases for Finance into 1 hour:

If you work on big data in finance, this is your chance to get an overview of the rapidly changing database landscape. TimeStored will be organizing a free online presentation where each database company will present for 10 minutes on what is unique to their solution, bringing the top new technologies together in one place.

Tuesday 24th October – 2-3PM UK


 

QuestDB – Review 2023

Our latest product, Pulse, is for displaying real-time interactive data direct from any database. To get the most benefit, the underlying databases need to be fast (<200ms queries). For our purposes, databases fall into 2 categories:

  1. Really really fast, can handle queries every 200ms or less and seamlessly show data scrolling in
  2. All Other Databases. The 95%+.

It’s very exciting when we find a new database that meets that speed requirement. I went to the website, downloaded QuestDB and ran it. Coming from kdb+ imagine my excitement at seeing this UI:
QuestDB console

Good News:

  • A very tiny download (7MB .jar file)
  • There’s a free open source version
  • They are focussed on time-series queries
  • Did I mention it’s fast


I wanted to take it for a spin and to test the full ingestion->store->query cycle. So I decided to prototype a crypto dashboard. Consume data from various exchanges and produce a dashboard of latest prices, trades and a nice bid/ask graph as shown below.

Good Points

  • It simply worked.
  • QuestDB chose to be PostgreSQL wire and query compatible. A great technical choice as:
    • It will work with many tools including Pulse without complication
    • Many people already know SQL. I’ve been teaching q/kdb for years and when people learn it, you can use it for absolutely amazing things that standard SQL is terrible at. However most people do not reach that level of expertise. By using standard SQL more people can reuse their existing knowledge.
  • They then added time-series-specific extensions on top for querying, including:
    • “LATEST ON” – equivalent to kdb+’s “last by”. It’s used to generate the “latest prices” table in the dashboard with a 1/5/15 minute lag.
    • ASOF Joins
  • QuestDB can automatically create tables when you first send data; there’s no need to send “Create Table …”. This was useful when I was tweaking the data layout from the crypto feeds.
  • At times my SQL was rusty and I asked for help on their Slack channel. Within an hour I got helpful responses to both questions.
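“LATEST ON” and “last by” both keep only the most recent row per key. In QuestDB this reads along the lines of `SELECT * FROM trades LATEST ON timestamp PARTITION BY symbol`, and in q `select last price by sym from trade`. Below is a minimal Python sketch of the semantics, with made-up ticks as (timestamp, symbol, price) tuples:

```python
def latest_on(rows, key_index=1, time_index=0):
    """Return the most recent row for each key, like LATEST ON ... PARTITION BY or `last by`.

    Rows need not be sorted; on equal timestamps the row seen last wins.
    """
    latest = {}
    for row in rows:
        key = row[key_index]
        current = latest.get(key)
        if current is None or row[time_index] >= current[time_index]:
            latest[key] = row
    return latest

# Made-up ticks for illustration.
ticks = [
    (1, "BTC-USD", 64000.0),
    (2, "ETH-USD", 3100.0),
    (3, "BTC-USD", 64010.5),
    (4, "ETH-USD", 3099.5),
    (5, "BTC-USD", 63995.0),
]
latest = latest_on(ticks)
print(latest)
```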

Within a very short time, I managed to get the database populated and the dashboard running live. This is the first time in a long while that a database has gotten me excited. It seems these guys are trying to solve the same user problems I’ve seen everywhere. There were, however, some significant feature gaps.

Feature Gaps

  • No nested arrays. If I want to store bid/asks, I can only currently do it with columns bid1/bid2/bid3, no arbitrary length arrays.
  • Very limited window analytics. Other than “LATEST ON” QuestDB won’t let me perform analysis within that time window or within arrays in general.
  • I really missed my
    `time xasc (uj/)(table1;table2)

    pattern for combining multiple tables into one. For the graph I had to use a lengthy SQL UNION.
    In general kdb+ has array types and amazingly lets you use all the same functions that work on columns on nested structures. I missed that power.

  • No security on connections. It seems security integration will be an enterprise feature.
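For readers unfamiliar with q, `time xasc (uj/)(table1;table2)` union-joins the tables and then sorts ascending by `time`. The union join keeps every column of every table, with nulls where a table lacks a column. Below is a Python sketch of the same semantics, using lists of dicts as tables and `None` in place of q's typed nulls. Note that a list of dicts cannot represent the columns of an empty table:

```python
from functools import reduce

def uj(left, right):
    """Union join of two unkeyed tables (lists of dicts): all columns, all rows, None for gaps."""
    columns = list(dict.fromkeys([c for t in (left, right) for row in t for c in row]))
    return [{c: row.get(c) for c in columns} for row in left + right]

def xasc(column, table):
    """Stable ascending sort by one column, like q's xasc."""
    return sorted(table, key=lambda row: row[column])

# Made-up trades and quotes with different columns.
trades = [{"time": 3, "sym": "BTC", "price": 64010.5}, {"time": 1, "sym": "BTC", "price": 64000.0}]
quotes = [{"time": 2, "sym": "BTC", "bid": 64001.0, "ask": 64003.0}]
combined = xasc("time", reduce(uj, [trades, quotes]))  # (uj/) folds uj over the list
for row in combined:
    print(row)
```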

Open Source Alternative to kdb+ ?

Overall I would say not yet but they seem to be aiming at a similar market and they are moving fast.

QuestDB Database Structure

In fact, if you look at their architecture diagram, it’s obvious some of their team have used kdb+. Data is partitioned on date, with a separate folder per table and one file per column. Data is memory-mapped when read and appended when new data arrives.
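To make the layout concrete, below is a minimal Python sketch that writes one directory per date, one subdirectory per table and one file per column. It then reads back a single column for a single date without opening anything else. Storing each column as a JSON array is a hypothetical simplification; kdb+ and QuestDB use typed binary column files that are memory-mapped and appended to:

```python
import json
import tempfile
from pathlib import Path

def write_partition(root: Path, date: str, table: str, columns: dict) -> None:
    """Write each column of `table` for `date` to root/date/table/<column>."""
    table_dir = root / date / table
    table_dir.mkdir(parents=True, exist_ok=True)
    for name, values in columns.items():
        (table_dir / name).write_text(json.dumps(values))

def read_column(root: Path, date: str, table: str, column: str) -> list:
    """Read one column for one date: other dates and columns are never opened."""
    return json.loads((root / date / table / column).read_text())

def dates(root: Path) -> list:
    """Partitions present on disk, in date order (these date strings sort lexically)."""
    return sorted(p.name for p in root.iterdir() if p.is_dir())

root = Path(tempfile.mkdtemp())
write_partition(root, "2023.10.23", "trade", {"sym": ["BTC", "ETH"], "price": [34000.0, 1800.0]})
write_partition(root, "2023.10.24", "trade", {"sym": ["BTC"], "price": [34500.0]})
print(dates(root), read_column(root, "2023.10.24", "trade", "price"))
```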

In some ways this architecture predates kdb+ and originates from APL. It’s good to see new entrants like QuestDB and Apache Arrow pick up these ideas, make them their own and take them to new heights. I think kdb+ and q are excellent, and I was always frustrated that they remained niche while technically inferior solutions became massively popular. If QuestDB can take time-series databases and good technical ideas to new audiences, I wish them the best of luck!

Please leave any of your thoughts or comments below as I would love to hear what others think.

If you want to see how to set up QuestDB and a crypto dashboard yourself, we have a video tutorial: