Buddy Memory System - reference counting

Kdb uses a variant of the buddy memory allocation system, with reference counting to track live objects.

  • Objects are allocated memory in blocks of powers of 2
  • Memory for objects < 32MB comes from an internal heap which can only ever grow. When the object is no longer referenced, its memory is given back to the heap and can be reused for further allocations < 32MB.
  • Memory for objects > 32MB is given back to the OS when the object is no longer referenced.
  • Symbols are stored as an interned pool
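
The power-of-2 rule above can be illustrated with a small sketch. The helper name blockSize is hypothetical, and the calculation ignores the per-object header that kdb adds, so treat it as an approximation of the block size rather than the allocator's actual logic.

```q
/ hypothetical helper: smallest power of 2 that can hold n bytes
/ (assumption: object header overhead is ignored)
blockSize:{[n] `long$2 xexp ceiling 2 xlog n}

blockSize 1000      / 1024
blockSize 5000000   / 8388608
```

A request for roughly 5MB would therefore occupy an 8MB block, which is why memory usage can appear to grow in steps.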

Version Differences

Version    Behaviour
2.4        Memory never returned
2.5/2.6    Unreferenced memory blocks over 32MB/64MB are returned immediately
2.7        Unreferenced memory blocks returned when memory full or .Q.gc[] called

Finding the memory used

The .Q.w[] command returns the following memory usage statistics.

All values are in bytes, except syms, which is a count.

  • used - subset of heap in actual use.
  • heap - physical memory allocated to this process.
  • peak - largest heap size that this q process has reached so far.
  • wmax - the memory limit as set using the -w command line argument.
  • mmap - memory used for memory mapping files on disk.
  • mphy - physical memory available on the machine.
  • syms - number of distinct syms in this q process.
  • symw - memory footprint of interned string pool.
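
As a brief illustration, the statistics can be read in a q session as follows. The variable name w is arbitrary, and output is omitted because the values depend on the process and the machine.

```q
/ .Q.w[] returns a dictionary keyed by the names listed above
w:.Q.w[]
w`used            / bytes of heap currently in use
w`heap`peak       / current heap size and largest heap size so far
w[`heap]-w`used   / heap allocated to the process but currently free
```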

In older versions of q, .Q.w[] was not present. The older, less user-friendly way of obtaining the above statistics is:

  • \w - used heap peak wmax mmap
  • \w 0 - syms symw
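
A minimal sketch of both forms follows. The system"..." variants return the same values as lists, which is convenient when the numbers need to be stored; newer versions may report more fields than those listed above.

```q
/ used heap peak wmax mmap
\w
/ syms symw
\w 0
/ the same values captured programmatically
stats:system"w"
symstats:system"w 0"
```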

Garbage Collection

.Q.gc[]

(since 2.7) invokes the garbage collector. Returns the amount of memory that was returned to the OS.
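
The following sketch shows one way to observe the effect. The variable names and the vector size are chosen arbitrarily, and no output is shown because the result depends on the version and garbage collection mode: if the block was already returned to the OS when it became unreferenced, freed may be 0.

```q
/ allocate a large vector (size chosen only for illustration)
big:til 50000000
before:.Q.w[]`heap
/ remove the only reference so the block becomes unreferenced
delete big from `.
/ ask q to return unused memory to the OS
freed:.Q.gc[]
after:.Q.w[]`heap
```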

Command line -g parameter

Switch garbage collection between immediate (1) and deferred (0) modes.
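
For example, starting the process with q -g 1 selects immediate mode. The sketch below assumes the \g system command is available in your version of q to inspect and change the mode at runtime.

```q
/ show the current garbage collection mode: 0 deferred, 1 immediate
system"g"
/ switch to immediate mode at runtime (assumed equivalent to starting with -g 1)
system"g 1"
```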

Reference counting in detail

The C API documentation details reference counting as encountered when extending kdb using C.

From within kdb we can use -16!, which returns the number of references to an object.

Vectors are copied by reference when possible, but editing just one value causes another entire vector to be allocated. Note that columns in a table are just vectors, so the same behaviour applies to them.
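
The copy-on-write behaviour can be observed with -16! in a short session. This is a sketch with arbitrary variable names; exact counts are not shown since they can vary with how the object was created.

```q
a:til 10
-16!a          / reference count of the vector
b:a            / b shares the same vector, no copy is made
-16!a          / count is expected to be one higher
b[0]:100       / amending b forces a full copy for b
-16!a          / a is no longer shared with b
a[0]           / a itself is unchanged

/ table columns are ordinary vectors and are shared the same way
t:([]x:til 5;y:til 5)
c:t`x
-16!c          / counts references held by both t and c
```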

Memory Mapped files

There are two modes of memory mapping - immediate and deferred:

  • deferred mode - column is memory mapped on demand as needed for the duration of the query.
  • immediate mode - the column's memory map is maintained; whether or not the memory is actually used is down to OS details.
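
A rough way to see mapped memory being accounted for is to compare the mmap statistic before and after loading a file. The path below is hypothetical, and the sketch assumes that get memory maps a simple vector file rather than reading it into the heap; the result may differ by version and mapping mode.

```q
/ hypothetical file path, used only for illustration
`:/tmp/mmapdemo set til 1000000
before:.Q.w[]`mmap
/ assumption: a simple vector on disk is memory mapped by get
v:get `:/tmp/mmapdemo
after:.Q.w[]`mmap
/ bytes newly accounted for as memory mapped
after-before
```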

Compression and Memory

Compressed columns are held uncompressed in memory for the duration of a query; this can significantly increase memory requirements.
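
To estimate how much memory a compressed column may need once uncompressed, the file's compression statistics can be inspected with -21!. The path and compression parameters below are illustrative assumptions, and algorithm 2 (gzip) requires zlib to be available to q.

```q
/ hypothetical path; write a vector compressed with
/ logical block size 2 xexp 17, algorithm 2 (gzip), level 6
(`:/tmp/zdemo;17;2;6) set til 1000000
/ -21! reports compression statistics for a file
s:-21!`:/tmp/zdemo
/ size on disk versus size once uncompressed in memory
s`compressedLength`uncompressedLength
```

The uncompressedLength figure gives an indication of the memory a query touching the whole column would need.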