Splayed Table 10x Faster Speed Trick

Let's consider a fairly standard splayed table and then how we can speed up querying it. The table:

q)n:10200300
q)t:([] a:til n; s:n?10000?`3; ex:n?4?`3; size:n?100; price:n?1000.)
q)`:t/ set .Q.en[`:.; t]   / save t as splayed
`:t/
q)\l .
q)t
a s   ex  size price
-----------------------
0 oic ghj 46   387.7701
1 chf hgn 4    15.4704
2 opf hgn 15   654.2458
3 gim hgn 83   453.4827
4 dhe hgn 91   921.5253
5 dhl ghj 87   78.32508
6 adb ghj 4    56.29537
7 dfb boh 3    17.1134
..

The query we want to optimize our system for is findRows:

q)findRows:{ [syms; exchanges] select from t where ex in exchanges,s in syms }
q)findRows[ `oic`chf; `hgn]
a s   ex  size price
-----------------------
1 chf hgn 4    15.4704
..

First Optimize the Query

The first constraint in a select query's where clause should always be the most restrictive one [SQL Notes], i.e. the one that returns the fewest rows. The s column has far more distinct values than the ex column (up to 10000 syms versus 4 exchanges), so filtering on it first narrows the result faster. Let's compare the timings:

q)\t:10 r1:{ [syms; exchanges] select from t where ex in exchanges,s in syms }[ `oic`chf; `hgn]
1821
q)\t:10 r2:{ [syms; exchanges] select from t where s in syms,ex in exchanges }[ `oic`chf; `hgn]
1452
q)r1~r2 / results match
1b
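Here is a minimal sketch of why the order matters. It uses a small hypothetical in-memory table `tt` shaped like `t`, with about 1000 distinct syms and 4 exchanges. q applies the constraints of a where clause left to right, and each one is tested only against the rows that passed the earlier ones. The manual two-step filter at the end should therefore return the same rows as the select. The names `tt`, `syms`, `exchanges` and `r` are illustrative only.

```q
/ hypothetical demo table shaped like t: ~1000 distinct syms, 4 exchanges
n:100000
tt:([] s:n?1000?`3; ex:n?4?`3)
syms:2#distinct tt`s                / two syms that certainly occur
exchanges:1#tt`ex                   / one exchange that certainly occurs
/ fraction of rows each constraint keeps on its own; smaller means more restrictive
avg tt[`s] in syms
avg tt[`ex] in exchanges
/ manual equivalent of: select from tt where s in syms, ex in exchanges
i:where tt[`s] in syms              / row indices passing the first constraint
i:i where tt[`ex][i] in exchanges   / second constraint tested on survivors only
r:tt i
r~select from tt where s in syms, ex in exchanges
```

Putting the rarer constraint first means the second test runs on a handful of rows instead of the whole column.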

Notice that I am using the \t command, which runs a query and returns the time it took to run in milliseconds. If I supply a number, \t:n, the query is run n times and the total time is returned. This is explained further in our query timing tutorial.
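Here is a small sketch of these timing commands, run against a hypothetical in-memory table `tt` that is not part of the original example. `\ts` is the related command that also reports memory. `avgTime` is a hypothetical helper, not a built-in. It divides the `\t:n` total by n to give a per-run figure.

```q
/ hypothetical demo table so the block runs on its own
n:1000000
tt:([] s:n?1000?`3; ex:n?4?`3)
/ milliseconds for a single run
\t select from tt where s in `abc, ex in `def
/ total milliseconds for 10 runs
\t:10 select from tt where s in `abc, ex in `def
/ total milliseconds and bytes of memory used, for 10 runs
\ts:10 select from tt where s in `abc, ex in `def
/ hypothetical helper: average milliseconds per run, via the same system command
avgTime:{[k;expr] (system "t:",string[k]," ",expr)%k}
avgTime[10;"select from tt where s in `abc, ex in `def"]
```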

Add on-disk Attributes?

The next step we can consider is adding on-disk attributes to the columns used in the where clause. We will add a `g# (grouped) attribute to both columns and use getFileSizes to show the space required on disk.

q)getFileSizes:{{hcount ` sv x } each a!a:x,/:key x}
q)getFileSizes `:t
:t .d   | 26
:t a    | 81602416
:t ex   | 40801224
:t price| 81602416
:t s    | 40801224
:t size | 81602416
q)before:getFileSizes `:t
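getFileSizes pairs the directory handle with each file name returned by `key`. It joins each pair into a path with `` ` sv `` and reads the file size with `hcount`, which is why the keys print as `:t .d` and so on. Below is a minimal sketch of a variant, `fileSizes`, that keys the result by file name alone. Both `fileSizes` and the scratch table `tmp` are hypothetical. `tmp` has no symbol columns, so it can be saved without `.Q.en`.

```q
/ hypothetical scratch table with no symbol columns, saved splayed
tmp:([] a:til 1000; price:1000?100.)
`:tmp/ set tmp
/ hypothetical variant: build every path, then key the sizes by file name
fileSizes:{[dir] f!hcount each ` sv/: dir,/:f:key dir}
fileSizes `:tmp
```

For the 1000-row long column `a`, this should report 8016 bytes: 8 bytes per value plus a 16-byte header. That is the same pattern as the `a` file above (16 + 8 × 10200300 = 81602416).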

q) / apply the attributes on disk
q){x set `g#get x}`:t/s
`:t/s
q){x set `g#get x}`:t/ex
`:t/ex
q)meta t
c    | t f a
-----| -----
a    | j
s    | s   g
ex   | s   g
size | j
price| f

q) / faster queries
q)\t:10 r3:{ [syms; exchanges] select from t where s in syms,ex in exchanges }[ `oic`chf; `hgn]
235
q)r3~r2
1b
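The speedup comes from the index that `g# maintains, which maps each distinct value to its row indices. The sketch below is a conceptual model only; kdb+'s internal layout differs. It uses `group` on a hypothetical vector `v` to build such a map, so an `in` lookup can jump straight to the matching rows instead of scanning the whole column.

```q
/ conceptual model only: roughly what `g# keeps, a map from value to row indices
v:`b`a`b`c`a`b
idx:group v                    / `b`a`c!(0 2 5;1 4;,3)
/ rows where v is `a or `c, found by lookup rather than by scanning v
rows:raze idx `a`c             / 1 4 3: grouped by value, not in row order
rows:rows iasc rows            / sort back into row order: 1 3 4
rows~where v in `a`c           / same answer as a full scan
/ the real attribute: q builds and maintains its own index
attr `g#v
```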

q) / much more disk space required
q)after:getFileSizes `:t
q)after
:t .d   | 26
:t a    | 81602416
:t ex   | 122403792
:t price| 81602416
:t s    | 122515512
:t size | 81602416
q)after%before
:t .d   | 1
:t a    | 1
:t ex   | 3.000003
:t price| 1
:t s    | 3.002741
:t size | 1

We've now made the same query more than 6 times faster (1452 ms down to 235 ms for ten runs) by adding on-disk attributes to our splayed table, but can we do better?

Use in-memory Attributes

Rather than storing the attributes on disk, which costs disk space and slows access, we could apply them in memory. This assumes we have sufficient RAM, which for a splayed table is often the case.
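You can estimate directly whether there is enough RAM. This minimal sketch works on a hypothetical in-memory table `tt`. It uses `.Q.w[]`, which reports the process's memory statistics, to estimate how many bytes the two in-memory indexes add. The figure is approximate, because other allocations can happen in between.

```q
/ hypothetical demo table shaped like t
n:1000000
tt:([] s:n?1000?`3; ex:n?4?`3)
used0:.Q.w[]`used                 / bytes in use before indexing
update `g#s,`g#ex from `tt        / apply grouped attributes in memory
extra:(.Q.w[]`used)-used0         / approximate bytes taken by the two indexes
meta tt
```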

q) / remove the on-disk attributes
q){x set `#get x}`:t/s
`:t/s
q){x set `#get x}`:t/ex
`:t/ex
q)before=getFileSizes `:t / restored to original size
:t .d   | 1
:t a    | 1
:t ex   | 1
:t price| 1
:t s    | 1
:t size | 1

q)update `g#s,`g#ex from `t   / apply in-memory attributes
`t

q) / even faster with the same result
q)\t:10 r4:{ [syms; exchanges] select from t where s in syms,ex in exchanges }[ `oic`chf; `hgn]
16
q)r4~r3
1b
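In-memory attributes exist only in the running process. Nothing is written back to disk, so they are lost on restart and must be reapplied each time the database is loaded. This minimal sketch assumes a hypothetical scratch database directory `db` and a hypothetical helper `loadDb` that wraps the `\l` and `update` steps above. Note that it writes to `./db` and redefines `t`.

```q
/ hypothetical scratch database holding a small splayed table t
t:([] s:1000?`aa`bb`cc; ex:1000?`x`y; size:1000?100)
`:db/t/ set .Q.en[`:db; t]
/ hypothetical helper: load the database, then rebuild the in-memory indexes
loadDb:{[dir]
  system "l ",1_string dir;       / same as \l: map the splayed tables in dir
  update `g#s,`g#ex from `t;      / indexes live only in RAM, so reapply on every load
  }
loadDb `:db
meta t
```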

We've gone from 1821 milliseconds to 16 milliseconds for ten runs of the same query, a speedup of more than 100 times. We hope you found this useful. This tutorial is a small subsection of one module from our full kdb+ training courses. If you are new to kdb+, you will find our Introductory kdb+ training course the best way to master kdb+.
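You can check the speedups with a little arithmetic on the four timings, each a total over ten runs. The dictionary keys are hypothetical labels for the four stages.

```q
/ total milliseconds for 10 runs, copied from the timings above
ms:`original`reordered`onDiskG`inMemoryG!1821 1452 235 16
/ speedup of each stage relative to the original query; the last is about 114
ms[`original]%ms
/ speedup of each step relative to the step before it: about 1.25, 6.2 and 14.7
v:value ms
(-1_v)%1_v
```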
