Archive for the 'timeseries' Category

New Streaming Order Book Depth-map

We want to be the best finance streaming visualization solution. To achieve that we can’t rely only on off-the-shelf parts, so we have built our own market data order book visualization component from scratch; its only dependency is WebGL. We call it DepthMap. It plots price levels over time, with the shading showing the amount of liquidity at each level. It’s experimental right now but we are already receiving a lot of great feedback and ideas.

Faster Streaming Data
A lot of our users were capturing crypto data to a database, then polling that database. We want to remove that step so Pulse is faster and simpler. The first step is releasing our Binance Streaming Connection. In addition to our existing kdb streaming connection, we are trialling WebSockets and Kafka. If this is something that interests you, please get in touch.

QuestDB – Review 2023

Our latest product Pulse is for displaying real-time interactive data direct from any database. To get the most benefit, the underlying database needs to be fast (<200ms queries). For our purposes databases fall into two categories:

  1. Really really fast, can handle queries every 200ms or less and seamlessly show data scrolling in
  2. All Other Databases. The 95%+.

It’s very exciting when we find a new database that meets that speed requirement. I went to the website, downloaded QuestDB and ran it. Coming from kdb+, imagine my excitement at seeing this UI:
QuestDB console

Good News:

  • A very tiny download (7MB .jar file)
  • There’s a free open source version
  • They are focussed on time-series queries
  • Did I mention it’s fast?


I wanted to take it for a spin and test the full ingest -> store -> query cycle, so I decided to prototype a crypto dashboard: consume data from various exchanges and produce a dashboard of latest prices, trades and a bid/ask graph, as shown below.

QuestDB database crypto dashboard
Good Points

  • It simply worked.
  • QuestDB chose to be PostgreSQL wire and query compatible. A great technical choice as:
    • It will work with many tools, including Pulse, without complication
    • Many people already know SQL. I’ve been teaching q/kdb for years and, once people learn it, they can use it for absolutely amazing things that standard SQL is terrible at. However most people never reach that level of expertise. By using standard SQL, more people can reuse their existing knowledge.
  • They then added time-series specific extensions on top for querying, including:
    • “LATEST ON” – the equivalent of kdb’s “last by”. It’s used to generate the “latest prices” table in the dashboard with a 1/5/15 minute lag (a sketch follows after this list).
    • ASOF joins
  • QuestDB can automatically create tables when you first send data, there’s no need to send “Create Table …”. This was useful when I was tweaking the data layout from the crypto feeds.
  • In parts my SQL was rusty and I asked for help on their Slack channel. Within an hour I had helpful responses to both questions.
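To give a flavour, here is a rough sketch of the latest-prices query; the trades table and its symbol, price and timestamp columns are illustrative rather than the exact dashboard schema:

    -- latest row per symbol in QuestDB (roughly q's: select by sym from trades)
    SELECT symbol, price, timestamp
    FROM trades
    LATEST ON timestamp PARTITION BY symbol;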

Within a very short time I managed to get the database populated and the dashboard running live. This is the first database in a long time that has got me excited. It seems these guys are trying to solve the same user problems and ideas that I’ve seen everywhere. There were, however, some significant feature gaps.

Feature Gaps

  • No nested arrays. If I want to store bids/asks, I can currently only do it with columns bid1/bid2/bid3; there are no arbitrary-length arrays.
  • Very limited window analytics. Other than “LATEST ON”, QuestDB won’t let me perform analysis within a time window or within arrays in general.
  • I really missed my
    `time xasc (uj/)(table1;table2)

    pattern for combining multiple tables into one. For the graph I had to use a lengthy SQL UNION.
    In general kdb+ has array types and, amazingly, lets you use all the same functions that work on columns on nested structures. I missed that power (a tiny q illustration follows after this list).

  • No security on connections. It seems security integration will be an enterprise feature.
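As a tiny q illustration of the nested-array point above, the same functions that work on flat columns also work on arbitrary-length nested lists:

    bids:(1.1 1.2 1.3; 2.0 2.1; 3.5 3.6 3.7 3.8)   / three rows, each an arbitrary-length bid ladder
    avg each bids                                  / per-row average bid
    max each bids                                  / per-row best bid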

Open Source Alternative to kdb+ ?

Overall I would say not yet, but they seem to be aiming at a similar market and they are moving fast.

QuestDB Database Structure

In fact, if you look at their storage architecture, it’s obvious some of their team have used kdb+. Data is partitioned by date, with a separate folder per table and a file per column. Data is memory-mapped when read and appended to when new data arrives.

In some ways this architecture predates kdb+ and originates with APL. It’s good to see new entrants like QuestDB and Apache Arrow pick up these ideas, make them their own and take them to new heights. I think kdb+ and q are excellent; I was always frustrated that they remained niche while inferior technical solutions became massively popular. If QuestDB can take time-series databases and good technical ideas to new audiences, I wish them the best of luck!

Please leave any of your thoughts or comments below as I would love to hear what others think.

If you want to see how to setup QuestDB and a crypto dashboard yourself, we have a video tutorial:

 

The Ultimate Pivot table

Over the last few months I’ve discussed grid components, aggregating and pivoting with a lot of people. You would not believe how much users want to see a good grid component that allows drill-down, and how strongly they hold opinions on certain solutions. I have examined a lot of existing solutions: everything from Excel to Power BI, Oracle, DuckDB, HyperTree, Grafana, Tableau… I think I’m beginning to converge these ideas and requests into a pivot table that will be a good solution for our users:

  • Like all of our work, it should be really really fast
  • It must work with Big Data
  • It should be Friendly
  • It must allow changing aggregations – e.g. Group by exchange OR group by exchange and sym
  • Allow pivoting some calculations – from one column to a breakdown in separate columns
  • It must work for all databases.

The proposed interface now looks like this:

Pulse - Pivot

A lot of the inspiration for the functionality should be credited to Stevan Apter and HyperTree. Ryan had seen HyperTree and loved the functionality and the beautiful kdb-only implementation. The challenge was to allow similar functionality for all databases while making it more accessible. We now have a working demo version.

If you love pivot tables and have never got to see your dream grid component come to fruition, we want to build it, so get in touch.

qStudio adds Step Plots for displaying price steps

Our standard time-series graph interpolates between points. When the data you are displaying is price points, it’s not really valid to always interpolate. If the price was 0.40 at 2pm and 0.46 at 3pm, that does not mean it can be interpolated to 0.43 at 2:30pm. Amazingly, until now sqlDashboards had no sensible way to show that data. Now we do:

For comparison here is the same data as a time-series graph:

The step-plot is usable for time-series and numerical XY data series. The format is detailed on the usual chart format pages.

KX closes down commercial 32-bit kdb, open alternatives?

Previously on our blog we had a lively debate about a possibly open-sourced kdb+; unfortunately KX now seems to be moving in the opposite direction. In a recent announcement they restricted “32-bit kdb+ for non-commercial use only”. The timing is particularly unfortunate, as alternative (far less enterprise-proven) solutions are available:

  • MAN AHL have released Arctic, an open source market data platform based on Python and MongoDB
  • Kerf Database – a DB aimed at the same market as kdb – has now partnered with Briarcliff-Hall and is making greater sales inroads

This renewed interest in kdb alternatives hasn’t so far delivered a kdb+ killer, but I fear in time one will.

Smart Meter Data Analytics Benchmark – Open vs Closed

Benchmarking Smart Meter Data Analytics – I was forwarded this interesting paper, which compares how quickly smart meter data can be analysed using:

  1. a Relational Database
  2. Matlab
  3. An in-memory Column-Oriented database
  4. Two new NoSQL alternatives

Smart electricity grids, which incorporate renewable energy sources such as solar and wind, and allow information sharing among producers and consumers, are beginning to replace conventional power grids worldwide. Smart electricity meters are a fundamental component of the smart grid, enabling automated collection of fine-grained (usually every 15 minutes or hourly) consumption data. This enables dynamic electricity pricing strategies, in which consumers are charged higher prices during peak times to help reduce peak demand. Additionally, smart meter data analytics, which aims to help utilities and consumers understand electricity consumption patterns, has become an active area in research and industry. According to a recent report, utility data analytics is already a billion dollar market and is expected to grow to nearly 4 billion dollars by year 2020.

 

Open Sourced kdb+

In a world overrun with open source big data solutions, is kdb+ going to be left behind? I hope not…

Every few weeks someone comes to me with a big data problem, often with a half-done MongoDB/NoSQL solution that they “want a little help with”. Usually after looking at it I think to myself:

“I could solve this in 20 minutes with 5 lines of q code”

But how do I tell someone they should use a system which may end up costing them £1,000s per core in licensing? Yes, it’s great that there’s a free 32-bit trial version, but the risk that you may end up needing the 64-bit version is too great.

kdb+ vs mongoDB database popularity

Given the ever-increasing number of NoSQL solutions, and in particular the rising popularity of Hadoop, R, Python and MongoDB, it’s not hard to see that open source is taking over the world. Right now kdb+ still has the edge: it’s faster, sleeker, sexier… but I don’t think that will save it in the long run. The power of open source is that it lets everyone contribute: witness the hundreds of libraries available for R and the thousands of JavaScript frameworks. The truly sad thing is that it’s not always the best underlying technology that wins. A thousand amateurs creating a vibrant ecosystem of plug-ins, add-ons and tutorials can beat other technologies through sheer force of numbers.

  • APL was a great language, yet it remains relegated to history while PHP flourishes.
  • PostgreSQL was technically superior to MySQL, yet MySQL is deployed everywhere.

I believe kdb+ is the best solution to a large number of “big data” problems (small data to kdb+). When you stop and think, time-series data is everywhere; open sourcing kdb+ would open up entirely new sectors to kdb+, and I hope it’s a step KX takes before it’s too late.

What do you think? Leave your comments below.

Pipe-lining Time Series Calculations for Cache Efficiency

I always like to investigate new technology, and this week I found a nice automatic technique for improved cache use that I had previously only seen people implement by hand.

Consider a database query with three steps (three SQL SELECTs). Some databases pass the results of each step to temporary tables in main memory: when the first step is finished, those intermediate results are read back into CPU cache to be transformed by the second step, written back out to a new temporary table in main memory, and so on.

To eliminate this back-and-forth, vector-based statistical functions can be pipelined, with the output of one function becoming input for the next, whose output feeds a third function, etc. Intermediate results stay in the pipeline inside CPU cache, with only the full result being materialized at the end.

This technology is part of ExtremeDB; they have a video that explains it well:

Time Series Calculations

Moving Averages Stock Price Example

The code to calculate the 5-day and 21-day moving averages for a stock, and to detect the points where the faster moving average (5-day) crosses over or under the slower moving average (21-day), works as follows:

  1. Two invocations of ‘seq_window_agg_avg’ execute over the closing price sequence, ‘ClosePrice’, to obtain the 5-day and 21-day moving averages.
  2. The function ‘seq_sub’ subtracts the 21-day from the 5-day moving average.
  3. The result “feeds” a fourth function, ‘seq_cross’, to identify where the 5- and 21-day moving averages cross.
  4. Finally, the function ‘seq_map’ maps the crossovers to the original ‘ClosePrice’ sequence, returning closing prices where the moving averages crossed.

This approach eliminates the need to create, populate and query temporary tables outside CPU cache in main memory. One “tile” of input data is processed by each invocation of ‘mco_seq_window_agg_avg_double()’, each time producing one tile of output that is passed to ‘mco_seq_sub_double()’ which, in turn, produces one tile of output that is passed as input to mco_seq_cross_double(), which produces one tile of output that is passed to mco_seq_map_double(). When the last function, mco_seq_map_double() has exhausted its input, the whole process repeats from the top to produce new tiles for additional processing.
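For readers of this blog more used to q, here is a rough sketch of the same 5/21-day crossover calculation in plain q (no tile pipelining; the table and column names are invented for the example):

    t:([] dt:2020.01.01+til 120; close:100+sums -1+120?2f)   / hypothetical daily closing prices
    r:update ma5:mavg[5;close], ma21:mavg[21;close] from t   / 5-day and 21-day moving averages
    r:update diff:ma5-ma21 from r                            / subtract the 21- from the 5-day average
    select from r where differ signum diff                   / sign changes mark crossovers (the very first row is a spurious match)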

A very cool idea!

And yes, ExtremeDB are the same guys that posted the top Stac M3 benchmark for a while (in 2012/13 I think).

Kdb qSQL vs standard SQL queries

Often at the start of one of our training courses I’m asked why banks use the kdb database for their tick data.

  • One well-known reason is that kdb is really fast at typical financial time-series queries
    (due to kdb’s column-oriented architecture).
  • Another reason is that qSQL is extremely expressive and well suited to time-series queries.

To demonstrate this I’d like to look at three example queries, comparing qSQL to standard SQL.

SQL Queries dependent on order

From the table below we would like to find the price change between consecutive rows.

time price
07:00 0.9
08:30 1.5
09:59 1.9
10:00 2
12:00 9

q Code

In kdb qSQL this would be the extremely simple and readable code:
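Something along these lines, assuming the table is named t:

    update change:price - prev price from t   / subtract the previous row's price; the first row is null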

Standard SQL

In standard SQL there are a few methods we can use. The simplest applies if we already have a sequential id column present:
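For example, a self-join on the id column (a sketch; table and column names as above):

    SELECT t1.time,
           t1.price,
           t1.price - t2.price AS change
    FROM   t t1
    LEFT JOIN t t2 ON t2.id = t1.id - 1;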

Even for this simple query our code is much longer and not as clear to read. If we hadn’t had the id column we would have needed much more code to create a temporary table with row numbers. As our queries get more complex the situation gets worse.

Select top N by category

Given a table of stock trade prices at various times today, find the top two trade prices for each ticker.

trade table
time sym price
09:00 a 80
09:03 b 10
09:05 c 30
09:10 a 85
09:20 a 75
09:30 b 13
09:40 b 14

qSQL Code

In q code this would be: select 2 sublist desc price by sym from trade. Anyone that has used kdb for a few days could write this query, and it almost reads like English: select the prices in descending order by sym, and from those lists take the first two items (sublist).

SQL Code

In standard SQL a query that depends on order is much more difficult; witness the numerous online posts of people having problems: stackoverflow top 10 by category, mysql first n rows by group, MS-SQL top N by group. The nicest solution, if your database supports it, is:
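A sketch using the ROW_NUMBER() window function over the trade table above:

    SELECT time, sym, price
    FROM (
      SELECT time, sym, price,
             ROW_NUMBER() OVER (PARTITION BY sym ORDER BY price DESC) AS rn
      FROM trade
    ) ranked
    WHERE rn <= 2;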

The SQL version is much harder to read and will require someone with more experience to be able to write it.

Joining Records on Nearest Time

Lastly we want to consider performing a time-based join. A common finance query is to find the prevailing quote for a given set of trades, i.e. given the trade table t and quote table q shown below, we want to find the prevailing quote before or at the exact time of each trade.

trades t
time sym price size
07:00 a 0.9 100
08:30 a 1.5 700
09:59 a 1.9 200
10:00 a 2 400
12:00 b 9 500
16:00 a 10 800
quotes q
time sym bid
08:00 a 1
09:00 b 9
10:00 a 2
11:00 b 8
12:00 b 8.5
13:00 a 3
14:00 b 7
15:00 a 4

In qSQL this is: aj[`sym`time; t; q], which means perform an asof join on t, looking up the nearest match from table q based on the sym and time columns.

In standard SQL, again you’ll have difficulty: sql nearest date, sql closest date, even just the closest lesser date isn’t elegant. One solution would be:
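A sketch using a correlated subquery to pick the most recent quote at or before each trade (the LIMIT 1 syntax varies by database):

    SELECT t.time, t.sym, t.price, t.size,
           (SELECT q.bid
            FROM   q
            WHERE  q.sym = t.sym AND q.time <= t.time
            ORDER BY q.time DESC
            LIMIT 1) AS bid
    FROM t;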

It’s worth pointing out that this is one of the queries that is typically extremely slow (minutes) on row-oriented databases compared to column-oriented databases (at most a few seconds).

qSQL vs SQL

Looking at the simplicity of the qSQL code compared to the standard SQL, we can see how basing our database on ordered lists rather than set theory is much better suited to time-series data analysis. By being built from the ground up for ordered data, and by providing special time-series joins, kdb lets us write these example queries as very simple expressions. Once we need more complex queries and nested selects, attempting to use standard SQL can quickly spiral into a verbose, unmaintainable mess.

I won’t say qSQL can’t be cryptic 🙂 but for time-series queries qSQL will mostly be shorter and simpler than trying to use standard SQL.

If you think you have shorter SQL code that solves one of these examples or you are interested in one of our kdb training courses please get in touch.

RSI Relative Strength Index in Kdb

In this tutorial we are going to recreate this example RSI calculation in q, the language of the kdb database.

The relative strength index (RSI) is a technical indicator used in the analysis of financial markets. It is intended to chart the current and historical strength or weakness of a stock or market based on the closing prices of a recent trading period. The RSI is classified as a momentum oscillator, measuring the velocity and magnitude of directional price movements. Momentum is the rate of the rise or fall in price.

The RSI computes momentum as the ratio of higher closes to lower closes: stocks which have had more or stronger positive changes have a higher RSI than stocks which have had more or stronger negative changes. The RSI is most typically used on a 14 day timeframe, measured on a scale from 0 to 100, with high and low levels marked at 70 and 30, respectively. Shorter or longer timeframes are used for alternately shorter or longer outlooks. More extreme high and low levels—80 and 20, or 90 and 10—occur less frequently but indicate stronger momentum.


Stock Price Time-Series Data

We are going to use the following example data, you can download the csv here or the excel version here.

Date QQQQ Close Change Gain Loss Avg Gain Avg Loss RS 14-day RSI
2009-12-14 44.34              
2009-12-15 44.09 -0.25 0.00 0.25        
2009-12-16 44.15 0.06 0.06 0.00        
2009-12-17 43.61 -0.54 0.00 0.54        
2009-12-18 44.33 0.72 0.72 0.00        
2009-12-21 44.83 0.50 0.50 0.00        
2009-12-22 45.10 0.27 0.27 0.00        
2009-12-23 45.42 0.33 0.33 0.00        
2009-12-24 45.84 0.42 0.42 0.00        
2009-12-28 46.08 0.24 0.24 0.00        
2009-12-29 45.89 -0.19 0.00 0.19        
2009-12-30 46.03 0.14 0.14 0.00        
2009-12-31 45.61 -0.42 0.00 0.42        
2010-01-04 46.28 0.67 0.67 0.00
2010-01-05 46.28 0.00 0.00 0.00 0.24 0.10 2.39 70.53
2010-01-06 46.00 -0.28 0.00 0.28 0.22 0.11 1.97 66.32
2010-01-07 46.03 0.03 0.03 0.00 0.21 0.10 1.99 66.55
2010-01-08 46.41 0.38 0.38 0.00 0.22 0.10 2.27 69.41
2010-01-11 46.22 -0.19 0.00 0.19 0.20 0.10 1.97 66.36
2010-01-12 45.64 -0.58 0.00 0.58 0.19 0.14 1.38 57.97
2010-01-13 46.21 0.57 0.57 0.00 0.22 0.13 1.70 62.93
2010-01-14 46.25 0.04 0.04 0.00 0.20 0.12 1.72 63.26
2010-01-15 45.71 -0.54 0.00 0.54 0.19 0.15 1.28 56.06
2010-01-19 46.45 0.74 0.74 0.00 0.23 0.14 1.66 62.38
2010-01-20 45.78 -0.67 0.00 0.67 0.21 0.18 1.21 54.71
2010-01-21 45.35 -0.43 0.00 0.43 0.20 0.19 1.02 50.42
2010-01-22 44.03 -1.33 0.00 1.33 0.18 0.27 0.67 39.99
2010-01-25 44.18 0.15 0.15 0.00 0.18 0.26 0.71 41.46
2010-01-26 44.22 0.04 0.04 0.00 0.17 0.24 0.72 41.87
2010-01-27 44.57 0.35 0.35 0.00 0.18 0.22 0.83 45.46
2010-01-28 43.42 -1.15 0.00 1.15 0.17 0.29 0.59 37.30
2010-01-29 42.66 -0.76 0.00 0.76 0.16 0.32 0.49 33.08
2010-02-01 43.13 0.47 0.47 0.00 0.18 0.30 0.61 37.77

RSI Formulas

The formulas behind the calculations used in the table are:

  • First Average Gain = Sum of Gains over the past 14 periods / 14
  • First Average Loss = Sum of Losses over the past 14 periods / 14

All averages after the first use the previous average and the current value:

  • Average Gain = [(previous Average Gain) x 13 + current Gain] / 14.
  • Average Loss = [(previous Average Loss) x 13 + current Loss] / 14.
  • RS = Average Gain / Average Loss
  • RSI = 100 – 100/(1+RS)

Writing the Analytic in q

We can load our data (rsi.csv) in, then apply updates at each step to recreate the table above:

Code kindly donated by Terry Lynch
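A simplified sketch of those first steps (rather than the full donated version), assuming rsi.csv has two columns named date and close:

    t:("DF";enlist csv) 0: `:rsi.csv                 / read the date and closing price columns
    t:update chg:close - prev close from t           / the Change column
    t:update gain:0f|chg, loss:0f|neg chg from t     / the Gain and Loss columns
    / the Avg Gain, Avg Loss, RS and RSI columns then follow from the formulas above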

Rather than create all the intermediate columns we can create a calcRsi function like so:
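A rough sketch of one way such a function could look, using Wilder’s smoothing (n is the lookback period, 14; column names as assumed above):

    calcRsi:{[n;close]
      chg:close - prev close;                         / day-on-day change (first item null)
      gain:0f|chg;                                    / upward moves, zero otherwise
      loss:0f|neg chg;                                / downward moves as positive numbers, zero otherwise
      f:{[n;p;c] ((p*n-1)+c)%n}[n];                   / Wilder smoothing step: (prev*(n-1)+current)%n
      sm:{[n;f;x] sd:avg x 1+til n; (n#0n),sd,f\[sd;(1+n) _ x]}[n;f];  / seed with a simple average, then smooth
      100 - 100 % 1 + sm[gain] % sm[loss] }           / RSI = 100 - 100/(1+RS)

    update rsi:calcRsi[14;close] from t               / apply it to the table loaded above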

Finally we can visualize our data using the charting functionality of qStudio (an IDE for kdb):

RSI Relative Strength Index stock chart for QQQQ created using qStudio


Or to plot the RSI by itself (similar to the original article):

RSI Line Graph in qStudio


Writing kdb analytics such as the Relative Strength Index is covered in our kdb training course. We offer both public kdb training courses in New York, London and Asia, and on-site kdb courses at your offices, tailored to your needs.