Storage
Prometheus includes a local on-disk time series database, but also optionally integrates with remote storage systems.
Local storage
Prometheus's local time series database stores data in a custom, highly efficient format on local storage.
On-disk layout
Ingested samples are grouped into blocks of two hours. Each two-hour block is a directory that contains a chunks subdirectory holding all the time series samples for that window of time, a metadata file, and an index file (which maps metric names and labels to time series in the chunks directory). The samples in the chunks directory are grouped together into one or more segment files of up to 512MB each by default. When series are deleted via the API, deletion records are stored in separate tombstone files (instead of deleting the data immediately from the chunk segments).
The current block for incoming samples is kept in memory and is not fully
persisted. It is secured against crashes by a write-ahead log (WAL) that can be
replayed when the Prometheus server restarts. Write-ahead log files are stored
in the wal directory in 128MB segments. These files contain raw data that
has not yet been compacted; thus they are significantly larger than regular block
files. Prometheus will retain a minimum of three write-ahead log files.
High-traffic servers may retain more than three WAL files in order to keep at
least two hours of raw data.
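The WAL retention rule above can be turned into a rough sizing estimate. The following is a minimal sketch, not Prometheus's own logic: the function name and the `wal_bytes_per_sample` parameter are hypothetical, and the real per-sample WAL cost depends on the workload and on whether WAL compression is enabled.

```python
import math

WAL_SEGMENT_BYTES = 128 * 1024 * 1024  # segment size stated above
MIN_WAL_SEGMENTS = 3                   # minimum number of retained WAL files
RAW_WINDOW_SECONDS = 2 * 60 * 60       # at least two hours of raw data


def estimated_wal_segments(samples_per_second, wal_bytes_per_sample):
    """Estimate how many WAL segment files a server might retain.

    wal_bytes_per_sample is an assumed average; measure it on a real
    server rather than trusting any fixed number.
    """
    raw_bytes = samples_per_second * wal_bytes_per_sample * RAW_WINDOW_SECONDS
    needed = math.ceil(raw_bytes / WAL_SEGMENT_BYTES)
    # Never report fewer than the documented minimum of three files.
    return max(MIN_WAL_SEGMENTS, needed)
```

With a low ingestion rate the estimate stays at the minimum of three files; only high-traffic servers exceed it.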
A Prometheus server's data directory looks something like this:
./data
├── 01BKGV7JBM69T2G1BGBGM6KB12
│   └── meta.json
├── 01BKGTZQ1SYQJTR4PB43C8PD98
│   ├── chunks
│   │   └── 000001
│   ├── tombstones
│   ├── index
│   └── meta.json
├── 01BKGTZQ1HHWHV8FBJXW1Y3W0K
│   └── meta.json
├── 01BKGV7JC0RY8A6MACW02A2PJD
│   ├── chunks
│   │   └── 000001
│   ├── tombstones
│   ├── index
│   └── meta.json
├── chunks_head
│   └── 000001
└── wal
    ├── 000000002
    └── checkpoint.00000001
        └── 00000000
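To make the layout concrete, here is a small sketch that walks a data directory and summarizes each block it finds. It is an illustration only: the function name is hypothetical, and the `minTime` and `maxTime` keys are read on the assumption that meta.json follows the published TSDB format (missing keys simply yield `None`).

```python
import json
from pathlib import Path


def summarize_data_dir(data_dir):
    """Return one summary dict per block directory found under data_dir."""
    summaries = []
    for entry in sorted(Path(data_dir).iterdir()):
        meta_path = entry / "meta.json"
        # Block directories are recognized by their meta.json file;
        # wal/ and chunks_head/ have none, so they are skipped.
        if not entry.is_dir() or not meta_path.is_file():
            continue
        meta = json.loads(meta_path.read_text())
        chunks_dir = entry / "chunks"
        segments = []
        if chunks_dir.is_dir():
            segments = sorted(p.name for p in chunks_dir.iterdir())
        summaries.append({
            "block": entry.name,
            "min_time_ms": meta.get("minTime"),  # assumed key name
            "max_time_ms": meta.get("maxTime"),  # assumed key name
            "segments": segments,
            "has_index": (entry / "index").is_file(),
            "has_tombstones": (entry / "tombstones").is_file(),
        })
    return summaries
```

Calling `summarize_data_dir("./data")` on a directory shaped like the listing above would return one entry per block and ignore the chunks_head and wal directories.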
Note that a limitation of local storage is that it is not clustered or replicated. Thus, it is not arbitrarily scalable or durable in the face of drive or node outages and should be managed like any other single node database.
Snapshots are recommended for backups. Backups made without snapshots run the risk of losing data that was recorded since the last WAL sync, which typically happens every two hours. With proper architecture, it is possible to retain years of data in local storage.
Alternatively, external storage may be used via the remote read/write APIs. Careful evaluation is required for these systems as they vary greatly in durability, performance, and efficiency.
For further details on file format, see TSDB format.
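The snapshot-based backup recommended above could be triggered from a script along these lines. This is a sketch under assumptions: it relies on the TSDB admin API snapshot endpoint, which is only available when the server is started with the admin API enabled, and the helper names, base URL, and timeout are illustrative.

```python
import json
import urllib.parse
import urllib.request


def build_snapshot_request(base_url, skip_head=False):
    """Build the POST request for the TSDB snapshot admin endpoint."""
    query = urllib.parse.urlencode({"skip_head": str(skip_head).lower()})
    url = f"{base_url.rstrip('/')}/api/v1/admin/tsdb/snapshot?{query}"
    return urllib.request.Request(url, method="POST")


def take_snapshot(base_url, skip_head=False, timeout=60):
    """Ask the server for a snapshot and return the snapshot name.

    Assumes a JSON response of the form {"data": {"name": ...}}.
    """
    request = build_snapshot_request(base_url, skip_head)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["data"]["name"]
```

The returned name identifies a directory under the snapshots subdirectory of the data directory, which can then be copied to backup storage.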
Compaction
The initial two-hour blocks are eventually compacted into longer blocks in the background.
Compaction will create larger blocks containing data spanning up to 10% of the retention time, or 31 days, whichever is smaller.
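The upper bound on block size can be written as a one-line calculation. The sketch below only expresses the stated rule (10% of the retention time, capped at 31 days); the function name is hypothetical, and the blocks Prometheus actually produces may be smaller than this bound.

```python
from datetime import timedelta

MAX_BLOCK_SPAN = timedelta(days=31)


def max_compacted_block_span(retention):
    """Upper bound on the time span of a compacted block."""
    # 10% of the retention time, or 31 days, whichever is smaller.
    return min(retention / 10, MAX_BLOCK_SPAN)
```

For example, a 15-day retention gives an upper bound of 36 hours, while a retention of one year is capped at 31 days.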
Operational aspects
Prometheus has several flags that configure local storage. The most important are:
--storage.tsdb.path: Where Prometheus writes its database. Defaults to data/.
--storage.tsdb.retention.time: How long to retain samples in storage. When this flag is set, it overrides storage.tsdb.retention. If neither this flag nor storage.tsdb.retention nor storage.tsdb.retention.size is set, the retention time defaults to 15d. Supported units: y, w, d, h, m, s, ms.
--storage.tsdb.retention.size: The maximum number of bytes of storage blocks to retain. The oldest data will be removed first. Defaults to 0 or disabled. Units supported: B, KB, MB, GB, TB, PB, EB. Ex: "512MB". Based on powers-of-2, so 1KB is 1024B. Only the persistent blocks are deleted to honor this retention although WAL and m-mapped chunks are counted in the total size. So the minimum requirement for the disk is the peak space taken by the wal (the WAL and Checkpoint) and chunks_head (m-mapped Head chunks) directory combined
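The powers-of-2 convention for size units is easy to get wrong when budgeting disk space, so a small conversion helper may be useful. This is a sketch, not Prometheus's own parser: the function name is hypothetical, and it assumes whole-number values only.

```python
import re

# Units in ascending order; each step multiplies by 1024 (powers of 2).
_UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
_SIZE_RE = re.compile(r"^(\d+)(B|KB|MB|GB|TB|PB|EB)$")


def retention_size_to_bytes(value):
    """Convert a size string such as "512MB" to a number of bytes."""
    text = value.strip()
    if text == "0":
        return 0  # 0 means size-based retention is disabled
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unsupported size value: {value!r}")
    number, unit = match.groups()
    return int(number) * 1024 ** _UNITS.index(unit)
```

Under this convention "512MB" is 536870912 bytes, not 512000000.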