New York kdb Training 2019
Spaces still available. Register now to learn Kdb+.

Import Bitcoin Historical Csv Data to Kdb+

This article is part of a wider tutorial, that outlines the major components that could be used to make a bitcoin trading system based upon kdb+ tick. It's intended as an exercise for those that have completed the TimeStored online kdb+ training course, if you have completed one of our courses contact us to download the source code.

Loading Data into kdb+

Bitcoin Historical Data

Historical data within kdb+ is stored in a process called the HDB. However before we can load up that process we must first import our data from our raw data files then agree on and implement a database structure optimised for our system and the queries we would like to consider.

Parsing and Importing Files

Recommended Tutorials: Casting and Parsing, Importing Data into kdb+

Bitcoin historical data is available as CSV files here. We want to create a function that will:

  • Dynamically pull the list of files (hint: use wget)
  • Create an empty table on disk (splayed table tutorial)
  • Download each zip file, expand, parse the csv's and insert their data to the master table (hint: use gunzip / 7z)
  • Finally we'll add exchange columns, a kdb time format column and a currency column

Starting from a CSV file such as mtgoxUSD.csv that has three columns (timestamp, price, size):

Once you have

  • Read the data from file and parsed it using 0:
  • Pulled the currency and exchange from the filename.
  • Converted the unix timestamp to kdb format
You should end up with the table:

table showing bitcoin trades, one per row

Structuring Our Database

Our bitcoin data is tiny (1.5GB in kdb+ format) on a good system we could easily store it fully in RAM. However for the purposes of learning and to allow efficient access from a slow laptop with a single drive and limited RAM, we want to consider partitioning our data. Partioning is logically dividing a database into distinct independent parts to increase manageability and performance. We cover partitioning in our Partitioned Database Tables tutorial, you should remember the diagram:

How partitioning works in kdb+

Partitioning by date as is common with typical stock data is too fine grained for our data, it would result in thousands of files less than 1kb in size. I suggest partitioning on currency, then sorting the data by time. This will allow fast access for queries such as:
What was the USD Bitcoin rate between 17th and 23rd of June

Pseudocode

The functions you will want to create are:

FunctionDescription
getCsvFileList return table of csv zip file names at bitcoincharts (http://api.bitcoincharts.com/v1/csv/)
Columns are: file, d-date, t-time, size, link
downToFolder:[urls; folder] Delete folder and then fill with files from urls.
unzipAndGet[zdir; csvZipFile] Unzips file to csv directory, parses csv files and returns table
return table with columns ts,price,size,cur,ex,tp where each row is a trade

These then give a main loop of events:

  1. Get the csv zip file list.
  2. Download the files.
  3. Create empty on disk table.
  4. Unzip and load each file to disk.
  5. Sort and apply attributes.