Archive Page 3

kdb code highlighting in intellij

An IntelliJ keyword file is now available to provide syntax highlighting of kdb code in IntelliJ:

q Code Intellij Highlighting

To install it, copy this XML file to this directory:
C:\Users\USERNAME\.IdeaIC14\config\filetypes
where USERNAME is your Windows username. Then restart IntelliJ and open a .q file.

We’ve updated our notepad++ qlang.xml to provide code folding and highlighting of the .Q/.z namespaces.

kdb qunit testing now open source on github

We’ve now posted all source code from this website on our github kdb page.

Additionally we are open sourcing qunit, our kdb testing framework.
We look forward to receiving pull requests to fix our (hopefully few) bugs.

qStudio adds Nested Server Folder Support

Since our last qStudio kdb+ IDE announcement we have added a lot of new features:

Bulk importing kdb server lists

There are a lot of new features for supporting a huge number of servers efficiently:

  • Support importing HUGE number of servers:
    • 5000+ server connections are now supported
    • To prevent massive memory use, the object tree for a server is no longer refreshed at startup, only on connection.
    • Allow specifying default username/password once for all servers
    • Allow nested connection folders
    • Add critical color option – servers with prod in name get highlighted in red
  • Sort File Tree Alphabetically
  • Numerous bugfixes including:
    • Fix critical Mac bug that prevented launching in some instances
    • Fix query cancelling

Smart Meter Data Analytics Benchmark – Open vs Closed

Benchmarking Smart Meter Data Analytics – I was forwarded this interesting paper, which compares how quickly smart meter data can be analysed using:

  1. a Relational Database
  2. Matlab
  3. An in-memory Column-Oriented database
  4. Two new NoSQL alternatives

Smart electricity grids, which incorporate renewable energy sources such as solar and wind, and allow information sharing among producers and consumers, are beginning to replace conventional power grids worldwide. Smart electricity meters are a fundamental component of the smart grid, enabling automated collection of fine-grained (usually every 15 minutes or hourly) consumption data. This enables dynamic electricity pricing strategies, in which consumers are charged higher prices during peak times to help reduce peak demand. Additionally, smart meter data analytics, which aims to help utilities and consumers understand electricity consumption patterns, has become an active area in research and industry. According to a recent report, utility data analytics is already a billion dollar market and is expected to grow to nearly 4 billion dollars by year 2020.

 

Open Sourced kdb+

In a world overrun with open source big data solutions, is kdb+ going to be left behind? I hope not…

Every few weeks someone comes to me with a big data problem, often with a half-done mongoDB/NoSQL solution that they “want a little help with”. Usually, after looking at it, I think to myself:

“I could solve this in 20 minutes with 5 lines of q code”

But how do I tell someone they should use a system which may end up costing them £1,000s per core in licensing costs? Yes, it’s great that there’s a free 32-bit trial version, but the risk that you may end up needing the 64-bit version is too great.

kdb+ vs mongoDB database popularity

Given the ever-increasing number of NoSQL solutions, and in particular the rising popularity of Hadoop, R, Python and MongoDB, it’s not hard to see that open source is taking over the world. Right now kdb+ still has the edge in that it’s faster, sleeker, sexier... but I don’t think that will save it in the long run. The power of open source is that it lets everyone contribute: witness the 100s of libraries available for R and the 1000s of JavaScript frameworks. The truly sad thing is that it’s not always the best underlying technology that wins. A thousand amateurs creating a vibrant ecosystem of plug-ins, add-ons and tutorials can beat other technologies through sheer force of numbers.

  • APL was a great language yet it remains relegated to history while PHP flourishes.
  • PostgreSQL was technically superior to MySQL yet MySQL is deployed everywhere

I believe kdb+ is the best solution to a large number of “big data” problems (small data to kdb+). When you stop and think, time-series data is everywhere. Open sourcing kdb+ would open up entirely new sectors to kdb+, and I hope it’s a step Kx take before it’s too late.

What do you think? Leave your comments below.

qStudio kdb+ GUI adds Dark Theme and Chinese Language

Based on user requests we have released a number of new features with qStudio 1.36:

Download the latest ->qStudio<- now.

Dark Code Editor Themes

qstudio-kdb-dark-theme-gui

The theme can be set under Settings -> Preferences:

qstudio-settings-preferences

Open Results and Charts in New Window

To expand a panel into a new window click the “pop-out” icon.
pop-out

This will bring up the result in a new window:

mutliple-chart-windows

UTF-8 Chinese Language Support

qstudio-utf8

First Derivatives buy a majority stake in KX

First Derivatives buys €36m majority stake in Kx

Newry-based financial software firm First Derivatives has acquired a majority shareholding in big data analytics company Kx Systems for £36m (€44m).

KX has historically had a hard time penetrating markets outside finance. FD have a good sales team and previously acquired a marketing company in Philadelphia, so hopefully this is the chance for kdb+ to go mainstream.

However, it’s a worry that FD (First Derivatives) may increasingly “encourage” purchasing of the delta platform bundle rather than stand-alone kdb+. With the smaller margins outside of finance, will FD take a risk and open up the database? (Would FD have been a supporter of the 32-bit version becoming free for commercial use?) There are a large number of individuals in off-shore locations who want to learn kdb+; FD could be incentivized to discourage that, as it would hurt their consulting business.

It’s also interesting to consider companies that have already invested in KX technology and whether they will continue to do so:

  • Competing consulting firms that specialise in kdb+ won’t take this as good news.
  • Panopticon/Datawatch based their visualization system on kdb+ (OEM license); they probably regret that now, given that their visualization software directly competes with FD’s dashboards.
  • Companies that had used kdb+ as part of their trading platform stack may consider FD a competitor as they also offer a trading platform.

What do you think? Will this lead to wider adoption? A growing platform?

Developer Salary by Location

kdb+ London Contract – £800 p/day
kdb+ Belfast Citigroup Contract – £400 p/day
Java Poland – £150 p/day

Command Line Kdb+ Charts

sqlDashboards is bundled with qStudio. Part of that package is a command line utility called sqlChart, which generates customized SQL charts from the command line.

Check out the video to see how you can create a chart based on data from a kdb+ database in 2 minutes:

The sqlChart page has all the documentation you need. Download the qstudio.zip to try it now.

The q Code

Help Screen

Pipe-lining Time Series Calculations for Cache Efficiency

I always like to investigate new technology, and this week I found a nice automatic technique for improved cache use that I had previously only seen people implement by hand.

Consider a database query with three steps (three SQL SELECTs). Some databases pass the results of each step to temporary tables in main memory. When the first step is finished, these intermediate results are passed back into CPU cache to be transformed by the second step, then back into a new temporary table in main memory, and so on.

To eliminate this back-and-forth, vector-based statistical functions can be pipelined, with the output of one function becoming input for the next, whose output feeds a third function, etc. Intermediate results stay in the pipeline inside CPU cache, with only the full result being materialized at the end.
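As a rough illustration of that dataflow, here is a minimal Python sketch using generators. It is not ExtremeDB code: the function names (`tiles`, `scale`, `subtract`, `total`) and the tile size are hypothetical, and Python gives no control over what actually sits in CPU cache. It only shows the shape of the idea, where each step consumes one small tile at a time and no step ever builds a full intermediate result.

```python
from itertools import islice

TILE_SIZE = 4  # hypothetical tile length; a real engine would size tiles to fit CPU cache


def tiles(values, size=TILE_SIZE):
    """Yield successive fixed-size chunks ("tiles") of the input sequence."""
    it = iter(values)
    while True:
        tile = list(islice(it, size))
        if not tile:
            return
        yield tile


def scale(tile_stream, factor):
    """Step 1: multiply every element by a constant, one tile at a time."""
    for tile in tile_stream:
        yield [x * factor for x in tile]


def subtract(tile_stream, offset):
    """Step 2: subtract a constant from every element, one tile at a time."""
    for tile in tile_stream:
        yield [x - offset for x in tile]


def total(tile_stream):
    """Step 3: reduce the whole stream to a single number."""
    return sum(sum(tile) for tile in tile_stream)


# The three steps are chained; only one tile per step exists at any moment,
# and only the final result is materialized.
values = range(1, 11)
result = total(subtract(scale(tiles(values), 2), 1))
```

The contrast with the temporary-table approach is that `scale` never produces a complete list for `subtract` to read back; each tile flows through all three steps before the next tile is fetched.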

This technology is part of ExtremeDB; they have a video that explains it well:

Time Series Calculations

Moving Averages Stock Price Example

This is what the actual code would look like to calculate the 5-day and 21-day moving averages for a stock and detect the points where the faster moving average (5-day) crosses over or under the slower moving average (21-day):

  1. Two invocations of ‘seq_window_agg_avg’ execute over the closing price sequence, ‘ClosePrice’, to obtain 5-day and 21-day moving averages.
  2. The function ‘seq_sub’ subtracts the 21-day moving average from the 5-day moving average.
  3. The result “feeds” a fourth function, ‘seq_cross’, to identify where the 5- and 21-day moving averages cross.
  4. Finally, the function ‘seq_map’ maps the crossovers to the original ‘ClosePrice’ sequence, returning closing prices where the moving averages crossed.

This approach eliminates the need to create, populate and query temporary tables outside CPU cache in main memory. One “tile” of input data is processed by each invocation of ‘mco_seq_window_agg_avg_double()’, each time producing one tile of output that is passed to ‘mco_seq_sub_double()’ which, in turn, produces one tile of output that is passed as input to mco_seq_cross_double(), which produces one tile of output that is passed to mco_seq_map_double(). When the last function, mco_seq_map_double() has exhausted its input, the whole process repeats from the top to produce new tiles for additional processing.
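For readers without ExtremeDB to hand, the numbered steps above can be sketched in plain Python. This is only an illustration of the calculation, under stated assumptions: `moving_average` and `crossover_points` are hypothetical helper names, not the ExtremeDB API, and the sketch works on whole lists rather than cache-sized tiles, so it shows what is computed and not how the pipelining is done.

```python
def moving_average(values, window):
    """Trailing moving average; None until a full window is available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that has just fallen out of the window.
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


def crossover_points(close, fast=5, slow=21):
    """Return (index, close price) pairs where the fast average crosses the slow one."""
    fast_ma = moving_average(close, fast)
    slow_ma = moving_average(close, slow)
    # Fast minus slow; undefined until both windows are full.
    diff = [
        None if f is None or s is None else f - s
        for f, s in zip(fast_ma, slow_ma)
    ]
    crosses = []
    for i in range(1, len(diff)):
        prev, cur = diff[i - 1], diff[i]
        if prev is None or cur is None:
            continue
        # A sign change in the difference marks a crossover in either direction.
        if (prev < 0 <= cur) or (prev > 0 >= cur):
            crosses.append((i, close[i]))
    return crosses
```

Calling `crossover_points(close_prices)` with a list of daily closing prices uses the 5-day and 21-day windows from the example; shorter windows can be passed via `fast` and `slow` when experimenting with small inputs.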

A very cool idea!

And yes, ExtremeDB are the same guys that posted the top STAC-M3 benchmark for a while (in 2012/13, I think).