DuckDB Calculate Moving Average
What is a Moving Average?
The moving average (MA) is a statistical tool used to analyze data points by creating a series of averages of different subsets of the full data set. It's commonly used in time series analysis, particularly in finance and economics, to smooth out short-term fluctuations and highlight longer-term trends or cycles.
There are different types of moving averages, but the two most common ones are:
- Simple Moving Average (SMA) - Calculated by taking the arithmetic mean of a given number of data points.
- Exponential Moving Average (EMA) - Gives more weight to recent data points, making it more responsive to recent price changes.
For example, a 3-day SMA in stock prices would take the average of the stock prices for the last 3 days, then move forward one day and recalculate the average for the next 3 days, and so on. This moving window is highlighted in green in the example below.
Moving Average SQL Example
The leftmost table contains the original close price by date for the stock XOM (Exxon Mobil Corp) with added columns showing moving average and the values used to calculate that average:
You can see how the moving average smooths a graph in this 7-day and 30day moving average trend indicators:
Window functions and types
Window functions can use multiple rows to calculate a value for each row. DuckDB Window functions are blocking operators, i.e., they require their entire input to be buffered, making them one of the most memory-intensive operators in SQL.
Particular interesting window functions for stocks and finance include:
Name | Description |
---|---|
first_value(expr[ IGNORE NULLS]) |
Returns expr evaluated at the row that is the first row (with a non-null value of expr if IGNORE NULLS is set) of the window frame. |
last_value(expr[ IGNORE NULLS]) |
Returns expr evaluated at the row that is the last row (among rows with a non-null value of expr if IGNORE NULLS is set) of the window frame. |
rank() |
The rank of the current row with gaps; same as row_number of its first peer. |
Rows or DAYS
Notice in the table above how between 2024-01-05 and 2024-01-08 there is a gap for the weekend.
This is because the example above used ROWS as the window function.
You may want to filter on DAYS rather than rows. To do this we would use the below SQL.
Notice how I'm specifying and naming the window threew
, this allows reusing the window name rather than repeating code.
If you get an error saying "cannot copy window..." see this page.