{"id":147,"date":"2014-03-30T21:47:19","date_gmt":"2014-03-30T21:47:19","guid":{"rendered":"http:\/\/www.timestored.com\/b\/?p=147"},"modified":"2014-03-31T13:27:31","modified_gmt":"2014-03-31T13:27:31","slug":"kdb-qsql-query-vs-sql","status":"publish","type":"post","link":"https:\/\/www.timestored.com\/b\/kdb-qsql-query-vs-sql\/","title":{"rendered":"Kdb qSQL vs standard SQL queries"},"content":{"rendered":"<p>Often at the start of one of our training courses I&#8217;m asked why banks use the Kdb Database for their tick data. <\/p>\n<ul>\n<li>One well known reason is that kdb is really fast at typical financial time-series queries<br \/>\n       (due to <a href=\"\/kdb-guides\/kdb-database-intro\">Kdbs column-oriented architecture<\/a>).<\/li>\n<li>Another reason is that <strong>qSQL is extremely expressive and well suited for time-series queries<\/strong>.<\/li>\n<\/ul>\n<p>To demonstrate this I&#8217;d like to look at three example queries, comparing qSQL to standard SQL. <\/p>\n<style type=\"text\/css\">\ntable.tableizer-table {\tborder: 1px solid #CCC; font-family: Arial, Helvetica, sans-serif\tfont-size: 12px;} \n.tableizer-table td { padding: 0 4px; margin: 0; border: 1px solid #ccc;}\n.tableizer-table th { background-color: #104E8B;  color: #FFF; font-weight: bold;}\n<\/style>\n<h3>SQL Queries dependent on order<\/h3>\n<p>From the table below we would like to find the price change between consecutive rows.<\/p>\n<table class=\"nolines\">\n<tr>\n<td>\n<table class=\"tableizer-table\">\n<tr class=\"tableizer-firstrow\">\n<th>time<\/th>\n<th>price<\/th>\n<\/tr>\n<tr>\n<td>07:00<\/td>\n<td>0.9<\/td>\n<\/tr>\n<tr>\n<td>08:30<\/td>\n<td>1.5<\/td>\n<\/tr>\n<tr>\n<td>09:59<\/td>\n<td>1.9<\/td>\n<\/tr>\n<tr>\n<td>10:00<\/td>\n<td>2<\/td>\n<\/tr>\n<tr>\n<td>12:00<\/td>\n<td>9<\/td>\n<\/tr>\n<\/table>\n<\/td>\n<td>\n\r\n <textarea rows='2' cols='80' class='code' id='qcode-t-q-tab123'>\r\na:([] time:07:00 08:30 09:59 10:00 12:00; \r\n      price:0.9 1.5 1.9 2 9.)\r\n<\/textarea><script type='text\/javascript'>var myCodeMirror = CodeMirror.fromTextArea(document.getElementById('qcode-t-q-tab123'), {lineNumbers: true, matchBrackets: true, indentUnit: 4, tabMode: 'default', mode: 'text\/x-plsql', readOnly:true });<\/script>\r\n\n<\/tr>\n<\/table>\n<h4>q Code<\/h4>\n<p>In kdb qSQL this would be the extremely simple and readable code:<\/p>\n\r\n <textarea rows='2' cols='80' class='code' id='qcode-t-q-tab534'>\r\nq)update change:price-prev price from a\r\ntime  price change\r\n------------------\r\n07:00 0.9\r\n08:30 1.5   0.6\r\n09:59 1.9   0.4\r\n10:00 2     0.1\r\n12:00 9     7\r\n<\/textarea><script type='text\/javascript'>var myCodeMirror = CodeMirror.fromTextArea(document.getElementById('qcode-t-q-tab534'), {lineNumbers: true, matchBrackets: true, indentUnit: 4, tabMode: 'default', mode: 'text\/x-plsql', readOnly:true });<\/script>\r\n\n<h4>Standard SQL<\/h4>\n<p>In standard SQL there are a few methods, we can use. The simplest is if we already have a <a href=\"http:\/\/stackoverflow.com\/questions\/919136\/subtracting-one-row-of-data-from-another-in-sql\">sequential id column present<\/a>:<\/p>\n\r\n <textarea rows='2' cols='80' class='code' id='qcode-t-q-tab978'>\r\nselect a.time, a.price -(select b.price from tab b where b.id = a.id + 1), \r\n  as diff from tab a\r\n<\/textarea><script type='text\/javascript'>var myCodeMirror = CodeMirror.fromTextArea(document.getElementById('qcode-t-q-tab978'), {lineNumbers: true, matchBrackets: true, indentUnit: 4, tabMode: 'default', mode: 'text\/x-plsql', readOnly:true });<\/script>\r\n\n<p>Even for this simple query our code is much longer and not as clear to read. If we hadn&#8217;t had the id column we would have needed much more code to create a temporary table with row numbers. As our queries get more complex the situation gets worse.<\/p>\n<h3>Select top N by category<\/h3>\n<p>Given a table of stock trade prices at various times today, find the top two trade prices for each ticker.<\/p>\n<table class=\"nolines\">\n<tr>\n<td>\n<table class=\"tableizer-table\">\n<caption>trade table<\/caption>\n<tr class=\"tableizer-firstrow\">\n<th>time<\/th>\n<th>sym<\/th>\n<th>price<\/th>\n<\/tr>\n<tr>\n<td>09:00<\/td>\n<td>a<\/td>\n<td>80<\/td>\n<\/tr>\n<tr>\n<td>09:03<\/td>\n<td>b<\/td>\n<td>10<\/td>\n<\/tr>\n<tr>\n<td>09:05<\/td>\n<td>c<\/td>\n<td>30<\/td>\n<\/tr>\n<tr>\n<td>09:10<\/td>\n<td>a<\/td>\n<td>85<\/td>\n<\/tr>\n<tr>\n<td>09:20<\/td>\n<td>a<\/td>\n<td>75<\/td>\n<\/tr>\n<tr>\n<td>09:30<\/td>\n<td>b<\/td>\n<td>13<\/td>\n<\/tr>\n<tr>\n<td>09:40<\/td>\n<td>b<\/td>\n<td>14<\/td>\n<\/tr>\n<\/table>\n<\/td>\n<td>\n\r\n <textarea rows='1' cols='80' class='code' id='qcode-select-top-n-tab'>\r\ntrade:([] time:09:00+0 3 5 10 20 30 40; \r\n    sym:`a`b`c`a`a`b`b; \r\n    price:80 10 30 85 75 13 14);\r\n<\/textarea><script type='text\/javascript'>var myCodeMirror = CodeMirror.fromTextArea(document.getElementById('qcode-select-top-n-tab'), {lineNumbers: true, matchBrackets: true, indentUnit: 4, tabMode: 'default', mode: 'text\/x-plsql', readOnly:true });<\/script>\r\n\n<\/tr>\n<\/table>\n<h4>qSQL Code<\/h4>\n\r\n <textarea rows='1' cols='80' class='code' id='qcode-select-top-n'>\r\nselect 2 sublist desc price by sym from trade\r\n<\/textarea><script type='text\/javascript'>var myCodeMirror = CodeMirror.fromTextArea(document.getElementById('qcode-select-top-n'), {lineNumbers: true, matchBrackets: true, indentUnit: 4, tabMode: 'default', mode: 'text\/x-plsql', readOnly:true });<\/script>\r\n\n<p>In q code this would be <code>select 2 sublist desc price by sym from trade<\/code>, anyone that has used kdb for a few days could write the query and it almost reads like english. Select the prices in descending order by sym and from those lists take the first 2 items (sublist).<\/p>\n<h4>SQL Code<\/h4>\n<p>In standard SQL a query that depends on order is much more difficult , witness the numerous online posts with people having problems: <a href=\"http:\/\/stackoverflow.com\/questions\/176964\/select-top-10-records-for-each-category\">stackoverflow top 10 by category<\/a>, <a href=\"http:\/\/stackoverflow.com\/questions\/19168834\/mysql-select-the-first-n-rows-per-group\">mysql first n rows by group<\/a>, <a href=\"http:\/\/archive.msdn.microsoft.com\/SQLExamples\/Wiki\/View.aspx?title=SelectTopNByGroup\">MS-SQL top N by group<\/a>. The nicest solution, if your database supports it, is:<\/p>\n\r\n <textarea rows='2' cols='80' class='code' id='sqlcode-select-top-n1'>\r\nSELECT sym, price FROM (\r\n         SELECT \r\n             ROW_NUMBER() OVER ( PARTITION BY sym ORDER BY price DESC ) AS 'RowNumber', \r\n             sym, price FROM trade\r\n      ) dt WHERE RowNumber <= 2\r\n<\/textarea><script type='text\/javascript'>var myCodeMirror = CodeMirror.fromTextArea(document.getElementById('sqlcode-select-top-n1'), {lineNumbers: true, matchBrackets: true, indentUnit: 4, tabMode: 'default', mode: 'text\/x-plsql', readOnly:true });<\/script>\r\n\n<p>The SQL version is much harder to read and will require someone with more experience to be able to write it.<\/p>\n<h3>Joining Records on Nearest Time<\/h3>\n<p>Lastly we want to consider performing a time based join. A common finance query is to find the prevailing quote for a given set of trades. i.e. Given the following trade table t and quote table q shown below, we want to find the prevailing quote before or at the exact time of each trade.<\/p>\n<table class=\"nolines\">\n<tr>\n<td>\n<table class=\"tableizer-table\">\n<caption>trades t<\/caption>\n<tr class=\"tableizer-firstrow\">\n<th>time<\/th>\n<th>sym<\/th>\n<th>price<\/th>\n<th>size<\/th>\n<\/tr>\n<tr>\n<td>07:00<\/td>\n<td>a<\/td>\n<td>0.9<\/td>\n<td>100<\/td>\n<\/tr>\n<tr>\n<td>08:30<\/td>\n<td>a<\/td>\n<td>1.5<\/td>\n<td>700<\/td>\n<\/tr>\n<tr>\n<td>09:59<\/td>\n<td>a<\/td>\n<td>1.9<\/td>\n<td>200<\/td>\n<\/tr>\n<tr>\n<td>10:00<\/td>\n<td>a<\/td>\n<td>2<\/td>\n<td>400<\/td>\n<\/tr>\n<tr>\n<td>12:00<\/td>\n<td>b<\/td>\n<td>9<\/td>\n<td>500<\/td>\n<\/tr>\n<tr>\n<td>16:00<\/td>\n<td>a<\/td>\n<td>10<\/td>\n<td>800<\/td>\n<\/tr>\n<\/table>\n<\/td>\n<td>\n<table class=\"tableizer-table\">\n<caption>quotes q<\/caption>\n<tr class=\"tableizer-firstrow\">\n<th>time<\/th>\n<th>sym<\/th>\n<th>bid<\/th>\n<\/tr>\n<tr>\n<td>08:00<\/td>\n<td>a<\/td>\n<td>1<\/td>\n<\/tr>\n<tr>\n<td>09:00<\/td>\n<td>b<\/td>\n<td>9<\/td>\n<\/tr>\n<tr>\n<td>10:00<\/td>\n<td>a<\/td>\n<td>2<\/td>\n<\/tr>\n<tr>\n<td>11:00<\/td>\n<td>b<\/td>\n<td>8<\/td>\n<\/tr>\n<tr>\n<td>12:00<\/td>\n<td>b<\/td>\n<td>8.5<\/td>\n<\/tr>\n<tr>\n<td>13:00<\/td>\n<td>a<\/td>\n<td>3<\/td>\n<\/tr>\n<tr>\n<td>14:00<\/td>\n<td>b<\/td>\n<td>7<\/td>\n<\/tr>\n<tr>\n<td>15:00<\/td>\n<td>a<\/td>\n<td>4<\/td>\n<\/tr>\n<\/table>\n<\/td>\n<td>\n\r\n <textarea rows='8' cols='80' class='code' id='qcode-t-q-tab'>\r\nt:([] time:07:00 08:30 09:59 10:00 12:00 16:00; \r\n      sym:`a`a`a`a`b`a; \r\n      price:0.9 1.5 1.9 2 9. 10.; \r\n      size:100*1 7 2 4 5 8);\r\n\r\nq:([] time:08:00+60*til 8; \r\n      sym:`a`b`a`b`b`a`b`a;\r\n      bid:1 9 2 8 8.5 3 7 4.);\r\n<\/textarea><script type='text\/javascript'>var myCodeMirror = CodeMirror.fromTextArea(document.getElementById('qcode-t-q-tab'), {lineNumbers: true, matchBrackets: true, indentUnit: 4, tabMode: 'default', mode: 'text\/x-plsql', readOnly:true });<\/script>\r\n\n<\/tr>\n<\/table>\n<p>In qSQL this is: <code>aj[`sym`time; t; q]<\/code>, which means perform an asof-join on t, looking up the nearest match from table q based on the sym and time column.<\/p>\n<p>In standard SQL, again you&#8217;ll have difficulty: <a href=\"http:\/\/stackoverflow.com\/questions\/4649665\/t-sql-select-by-nearest-date-and-grouped-by-id\">sql nearest date<\/a>, <a href=\"http:\/\/www.sqlservercentral.com\/Forums\/Topic1374454-338-1.aspx\">sql closest date<\/a> even just the  <a href=\"http:\/\/dba.stackexchange.com\/questions\/27823\/query-to-find-closest-lesser-date\">closest lesser date<\/a> isn&#8217;t elegant. One solution would be:<\/p>\n\r\n <textarea rows='8' cols='80' class='code' id='qcode-t-q-574'>\r\nWITH cte AS\r\n(SELECT t.sym, t.time, q.bid,\r\n\tROW_NUMBER() OVER (PARTITION BY t.ID, t.time \r\n           ORDER BY ABS(DATEDIFF(dd, t.time, p.time))) AS rowNum\r\n\tFROM t LEFT JOIN q ON t.sym = q.sym)\r\nSELECT\tsym,time,bid FROM cte WHERE rowNum = 1\r\n<\/textarea><script type='text\/javascript'>var myCodeMirror = CodeMirror.fromTextArea(document.getElementById('qcode-t-q-574'), {lineNumbers: true, matchBrackets: true, indentUnit: 4, tabMode: 'default', mode: 'text\/x-plsql', readOnly:true });<\/script>\r\n\n<p>It&#8217;s worth pointing out this is one of the queries that is typically extremely slow (minutes) on row-oriented databases compared to column-oriented databases (at most a few seconds).<\/p>\n<h3>qSQL vs SQL<\/h3>\n<p>Looking at the simplicity of the qSQL code compared to the standard SQL code we can see how basing our database on ordered lists rather than set theory is much more suited to time-series data analysis. By being built from the ground up for ordered data and by providing special time-series based joins, kdb let&#8217;s us form these example queries using very simple expressions. Once we need to create more complex queries and nested selects, attempting to use standard SQL can quickly spiral into a verbose unmaintainable mess.<\/p>\n<p>I won&#8217;t say qSQL can&#8217;t be cryptic \ud83d\ude42 but <strong>for time-series queries qSQL will mostly be shorter and simpler<\/strong> than trying to use standard SQL. <\/p>\n<p>If you think you have shorter SQL code that solves one of these examples or you are interested in one of our <a href=\"\/kdb-training\/\">kdb training courses<\/a> please <a href=\"\/contact\">get in touch<\/a>. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Often at the start of one of our training courses I&#8217;m asked why banks use the Kdb Database for their tick data. One well known reason is that kdb is really fast at typical financial time-series queries (due to Kdbs column-oriented architecture). Another reason is that qSQL is extremely expressive and well suited for time-series [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0},"categories":[2,23],"tags":[90,30,31,32,24],"_links":{"self":[{"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/posts\/147"}],"collection":[{"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/comments?post=147"}],"version-history":[{"count":60,"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/posts\/147\/revisions"}],"predecessor-version":[{"id":51859,"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/posts\/147\/revisions\/51859"}],"wp:attachment":[{"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/media?parent=147"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/categories?post=147"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.timestored.com\/b\/wp-json\/wp\/v2\/tags?post=147"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}