Import Bitcoin Historical Csv Data to Kdb+

This article is part of a wider tutorial, that outlines the major components that could be used to make a bitcoin trading system based upon kdb+ tick. It's intended as an exercise for those that have completed the TimeStored online kdb+ training course, if you have completed one of our courses contact us to download the source code.

Parsing and Importing Files
Structuring our Database
Pseudocode

Loading Data into kdb+

Bitcoin Historical Data

Historical data within kdb+ is stored in a process called the HDB. However before we can load up that process we must first import our data from our raw data files then agree on and implement a database structure optimised for our system and the queries we would like to consider.

Parsing and Importing Files

Recommended Tutorials: Casting and Parsing, Importing Data into kdb+

Bitcoin historical data is available as CSV files here. We want to create a function that will:

Dynamically pull the list of files (hint: use wget)
Create an empty table on disk (splayed table tutorial)
Download each zip file, expand, parse the csv's and insert their data to the master table (hint: use gunzip / 7z)
Finally we'll add exchange columns, a kdb time format column and a currency column

Starting from a CSV file such as mtgoxUSD.csv that has three columns (timestamp, price, size):

1315150133,51.205000000000,6.000000000000
1315150278,51.205000000000,6.000000000000
1315151033,51.205000000000,8.000000000000
1315319733,43.597000000000,0.012500000000	
......

Once you have

Read the data from file and parsed it using 0:
Pulled the currency and exchange from the filename.
Converted the unix timestamp to kdb format

You should end up with the table:

table showing bitcoin trades, one per row

Structuring Our Database

Our bitcoin data is tiny (1.5GB in kdb+ format) on a good system we could easily store it fully in RAM. However for the purposes of learning and to allow efficient access from a slow laptop with a single drive and limited RAM, we want to consider partitioning our data. Partioning is logically dividing a database into distinct independent parts to increase manageability and performance. We cover partitioning in our Partitioned Database Tables tutorial, you should remember the diagram:

Partitioning by date as is common with typical stock data is too fine grained for our data, it would result in thousands of files less than 1kb in size. I suggest partitioning on currency, then sorting the data by time. This will allow fast access for queries such as:
What was the USD Bitcoin rate between 17th and 23rd of June

Pseudocode

The functions you will want to create are:

Function	Description
getCsvFileList	return table of csv zip file names at bitcoincharts (http://api.bitcoincharts.com/v1/csv/) Columns are: file, d-date, t-time, size, link
downToFolder:[urls; folder]	Delete folder and then fill with files from urls.
unzipAndGet[zdir; csvZipFile]	Unzips file to csv directory, parses csv files and returns table return table with columns ts,price,size,cur,ex,tp where each row is a trade

These then give a main loop of events:

Get the csv zip file list.
Download the files.
Create empty on disk table.
Unzip and load each file to disk.
Sort and apply attributes.

Import Bitcoin Historical Csv Data to Kdb+

Contents

Loading Data into kdb+

Bitcoin Historical Data

Parsing and Importing Files

Structuring Our Database

Pseudocode

See Also

QStudio