Time Series Database

What is a time-series database?
Who uses them? Why? What for?

Time-Series Database -> A database with a specialized structure that allows time-series data to be stored and analysed efficiently.

Time-Series -> A series of data points with associated timestamps.
Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data.
Examples of time series are heights of ocean tides, counts of sunspots, and the daily closing value of the Dow Jones Industrial Average.

The Time-Series Problem

Typically, users who have a time-series problem:

  • Have 1000s of events occurring at timestamps
  • Want to record every single event with highly accurate timestamps
  • Have Big Data, because even one value measured over time can create a lot of data
  • Want to be able to analyse the data by:
    • Performing aggregate queries on date/time buckets
    • Performing time-window functions within the time buckets, e.g. finding the first/last/max/min within a time window
    • Performing time-joins, e.g. given an event X at time T0, finding the closest event in time from table Y
  • Want to archive older data, which is less valuable, to save on storage
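The first two kinds of analysis in the list above, aggregating into date/time buckets and finding the first/last/max/min within each window, can be sketched in plain Python. This is a minimal illustration, not how any particular database implements it: `bucket_stats` is a hypothetical helper, and it assumes the events are already sorted by timestamp and fit in memory.

```python
from datetime import datetime, timedelta


def bucket_stats(events, width):
    """Aggregate (timestamp, value) pairs, sorted by time, into fixed-width buckets.

    Returns a dict mapping bucket start -> {"first", "last", "max", "min", "count"}.
    """
    buckets = {}
    for ts, value in events:
        # Floor the timestamp to the start of its bucket. Buckets are aligned to
        # datetime.min, so 1-minute buckets start on whole minutes, and so on.
        start = datetime.min + ((ts - datetime.min) // width) * width
        b = buckets.get(start)
        if b is None:
            # First event seen in this bucket.
            buckets[start] = {"first": value, "last": value,
                              "max": value, "min": value, "count": 1}
        else:
            b["last"] = value  # input is sorted, so the latest event wins
            b["max"] = max(b["max"], value)
            b["min"] = min(b["min"], value)
            b["count"] += 1
    return buckets


# Hypothetical price ticks spread over two minutes.
ticks = [
    (datetime(2024, 1, 2, 9, 0, 5), 100.0),
    (datetime(2024, 1, 2, 9, 0, 40), 101.5),
    (datetime(2024, 1, 2, 9, 1, 10), 99.0),
    (datetime(2024, 1, 2, 9, 1, 55), 100.5),
]
for start, stats in bucket_stats(ticks, timedelta(minutes=1)).items():
    print(start.time(), stats)
```

A time-series database performs the same grouping inside its query engine. In finance, first/max/min/last per bucket is the familiar open/high/low/close (OHLC) bar.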

Example Time-Series Queries

[Figure: Common Time-Series Operations]
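A time-join (given event X at time T0, find the event in table Y that is closest in time) can be sketched with a binary search over Y's sorted timestamps. This is a minimal Python illustration under stated assumptions: `closest_event` and `time_join` are hypothetical helpers, timestamps are plain integers, Y is sorted by time, and "closest" means nearest in either direction, with ties going to the earlier event.

```python
from bisect import bisect_left


def closest_event(y_times, t):
    """Return the index of the timestamp in sorted y_times closest to t.

    Ties go to the earlier event. Returns None if y_times is empty.
    """
    if not y_times:
        return None
    i = bisect_left(y_times, t)  # first position with y_times[i] >= t
    if i == 0:
        return 0
    if i == len(y_times):
        return len(y_times) - 1
    before, after = y_times[i - 1], y_times[i]
    return i - 1 if t - before <= after - t else i


def time_join(x_events, y_events):
    """Attach the closest-in-time Y event to each (time, value) X event.

    y_events must be sorted by time; each lookup is an O(log n) binary search.
    """
    y_times = [ts for ts, _ in y_events]
    joined = []
    for ts, x_val in x_events:
        j = closest_event(y_times, ts)
        joined.append((ts, x_val, None if j is None else y_events[j]))
    return joined


trades = [(100, "trade A"), (205, "trade B")]                 # hypothetical X events
quotes = [(90, 10.0), (110, 10.1), (200, 10.2), (230, 10.3)]  # hypothetical Y events
for row in time_join(trades, quotes):
    print(row)
# (100, 'trade A', (90, 10.0))   <- tie between 90 and 110 goes to the earlier quote
# (205, 'trade B', (200, 10.2))
```

Many systems instead offer an as-of join, which matches each X with the most recent Y at or before T0 (for example kdb's `aj`, or pandas `merge_asof` with its default `direction="backward"`). With the search above, that corresponds to taking index `i - 1` after a `bisect_right`.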

Time-Series Database Features

Increasingly, businesses are realizing that a one-size-fits-all approach isn't working for databases. For certain time-based analyses, a time-series database can provide a 100x speedup over a general-purpose database.

A time-series database is specialized to quickly and efficiently:

  • Select a time range
  • Answer queries requiring time-joins, e.g. event X occurred at 9am; when was the closest Y event to that?
  • Provide elegant date/time functions: specialized functions for handling date/time types well, e.g. bucketing and range querying
  • Compress data, as the data is large and often repetitive
  • Support nanosecond timestamps, as in some industries, e.g. trading, the exact timing of events matters
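Two of the features above, selecting a time range and compressing timestamps, follow from keeping data sorted by time. The sketch below is a simplified Python illustration, not any specific database's storage format: `select_range`, `delta_encode` and `delta_decode` are hypothetical names, timestamps are assumed to be integer nanoseconds since the Unix epoch, and plain delta encoding stands in for the more elaborate schemes real systems use.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def select_range(times, start, end):
    """Return (lo, hi) so that times[lo:hi] holds every timestamp in [start, end].

    `times` must be sorted ascending; two binary searches replace a full scan.
    """
    return bisect_left(times, start), bisect_right(times, end)


def delta_encode(times):
    """Keep the first timestamp, then store only the gap to each next one."""
    if not times:
        return []
    return [times[0]] + [b - a for a, b in zip(times, times[1:])]


def delta_decode(deltas):
    """Undo delta_encode with a running (cumulative) sum."""
    return list(accumulate(deltas))


# Hypothetical data: six events 1 ms apart, as nanoseconds since the Unix epoch.
base = 1_700_000_000_000_000_000
times = [base + i * 1_000_000 for i in range(6)]

lo, hi = select_range(times, base + 1_500_000, base + 4_000_000)
print(hi - lo)              # events at +2 ms, +3 ms and +4 ms -> 3
print(delta_encode(times))  # first timestamp, then five gaps of 1_000_000 ns
assert delta_decode(delta_encode(times)) == times
```

Because regularly spaced timestamps turn into a run of identical small deltas, the encoded column is highly repetitive and typically compresses much better than the raw 64-bit values.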

Why are time-series databases important now?

In the past, the cost of machines and storage would have made storing the full stream of time values too expensive. However, with plummeting compute and storage costs, storing every single user click or stock price movement began to make sense. This has led to renewed investment in time-series databases to broaden their audience. As you can see below, the number of time-series databases available has exploded:

[Figure: Fastest Growing Database Segment. Time-Series Database popularity]

Who uses time-series databases?

  • Banks (Barclays, DB, Citi, UBS, HSBC, Bank of America) - kdb: storing tick data, i.e. fine-grained stock/FX price movements.
  • User analytics (Google Analytics; Baidu - Doris; Yandex - ClickHouse): tracking website visitor metrics.
  • IoT, the Internet of Things (InfluxDB): gathering metrics from 1000s of deployed devices and performing problem-identification or aggregation queries.
  • Monitoring (Prometheus/Grafana): gathering metrics and logs from 1000s of deployed software services.

Top Time-Series Databases

Next, read our list of Top Time-Series Databases.