
Datadog launched Monocle, a Rust-based real-time timeseries storage engine that unifies metrics storage, increases ingestion throughput, and lowers query latency while simplifying operations. Earlier designs split responsibilities across the Real-Time Database (RTDB), an index database, a storage router, and per-node subsystems for ingestion, storage, snapshots, queries, and throttling, an arrangement that created concurrency and scaling challenges. Over time, RTDB's storage backends included Cassandra, Redis, MDBM, a Go-based B+ tree engine, and RocksDB with DDSketch, each with its own tradeoffs. Monocle instead uses a shard-per-core, worker-per-shard model in which each worker owns its own LSM tree and uses it to ingest data, serve queries, and run background tasks such as compaction.
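To make the shard-per-core, worker-per-shard idea more concrete, here is a toy sketch in Python. It is not Monocle's code: Monocle is written in Rust and its internals are not described here. The class names `MiniLSM`, `ShardWorker`, and `Engine`, the modulo routing on an integer series ID, and the four-entry memtable limit are all hypothetical. Python threads do not map one-to-one onto CPU cores under the GIL, so the sketch illustrates ownership and routing (each series lives on exactly one shard, and only that shard's worker touches its LSM tree), not performance.

```python
import queue
import threading
from dataclasses import dataclass, field

# Hypothetical toy model of a shard-per-core, worker-per-shard engine.
# Not Monocle's implementation: names, routing, and limits are illustrative.


@dataclass
class MiniLSM:
    """Toy LSM tree: a mutable memtable flushed into sorted, immutable runs."""
    memtable_limit: int = 4
    memtable: dict = field(default_factory=dict)
    runs: list = field(default_factory=list)  # oldest run first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: freeze the memtable into a sorted run and start a new one.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def compact(self):
        # Background-style task: merge all runs into one; newer runs win.
        merged = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []

    def scan(self, series_id):
        # Read path: older runs, then newer runs, then the memtable,
        # so the most recent write for each timestamp wins.
        points = {}
        for run in self.runs:
            for (sid, ts), value in run:
                if sid == series_id:
                    points[ts] = value
        for (sid, ts), value in self.memtable.items():
            if sid == series_id:
                points[ts] = value
        return sorted(points.items())


class ShardWorker(threading.Thread):
    """One worker thread that exclusively owns one shard's store."""

    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self.store = MiniLSM()

    def run(self):
        # Only this thread touches self.store, so the store needs no locks.
        while True:
            op, args, reply = self.inbox.get()
            if op == "write":
                reply.put(self.store.put(*args))
            elif op == "query":
                reply.put(self.store.scan(*args))
            elif op == "compact":
                reply.put(self.store.compact())
            elif op == "stop":
                reply.put(None)
                return

    def call(self, op, *args):
        # Send a request to the owning worker and block until it replies.
        reply = queue.Queue(maxsize=1)
        self.inbox.put((op, args, reply))
        return reply.get()


class Engine:
    """Routes each series to the single worker that owns its shard."""

    def __init__(self, num_shards):
        self.workers = [ShardWorker() for _ in range(num_shards)]
        for worker in self.workers:
            worker.start()

    def _owner(self, series_id):
        # Hypothetical routing: modulo on an integer series ID.
        return self.workers[series_id % len(self.workers)]

    def write(self, series_id, ts, value):
        self._owner(series_id).call("write", (series_id, ts), value)

    def query(self, series_id):
        return self._owner(series_id).call("query", series_id)

    def compact_all(self):
        for worker in self.workers:
            worker.call("compact")

    def close(self):
        for worker in self.workers:
            worker.call("stop")
            worker.join()


if __name__ == "__main__":
    engine = Engine(num_shards=4)
    for ts in range(6):
        engine.write(7, ts, float(ts))
    engine.write(7, 2, 20.0)  # a later write for the same timestamp wins
    engine.compact_all()
    print(engine.query(7))  # prints [(0, 0.0), (1, 1.0), (2, 20.0), (3, 3.0), (4, 4.0), (5, 5.0)]
    engine.close()
```

Because every request for a given series goes to the same worker, that worker can flush and compact its own tree without coordinating with other shards; the tradeoff is that a hot series is bounded by the throughput of one worker.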
"Earlier designs of Datadog's storage infrastructure separated responsibilities across multiple systems. Metrics data was written into the Real-Time Database (RTDB), which stored <timeseries_id, timestamp, value> tuples, while an index database maintained identifiers and tags. A storage router directed metrics to RTDB nodes, and queries fanned out across RTDB and index nodes. Each node contained subsystems for ingestion, storage, snapshots, queries, and throttling, all coordinated through a shared control plane."
"This architecture of RTDB went through several iterations. The first generation relied on Cassandra for high write throughput but limited query flexibility. A Redis-based design followed, improving responsiveness but encountering durability and single-thread execution issues. MDBM, a memory-mapped key-value store, offered better use of operating system caching but ran into scalability bottlenecks. A subsequent Go-based B+ tree engine adopted a thread-per-core model, which added concurrency but also complexity. Later, RocksDB provided persistence and support for distribution metrics through DDSketch, though challenges in scaling remained."
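The earlier split design quoted above can be modeled in a few lines: points are <timeseries_id, timestamp, value> tuples, an index maps tags to timeseries IDs, a router sends each series to one RTDB node, and a query first resolves tags through the index and then fans out to the nodes that own the matching series. This is a hypothetical simplification for illustration only; `IndexDB`, `RTDBNode`, `StorageRouter`, `query`, the modulo routing, and the "series must carry every requested tag" semantics are assumptions, not Datadog's APIs.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical model of the earlier split design: RTDB nodes hold points,
# an index database maps tags to series, and a router picks the node.


@dataclass(frozen=True)
class Point:
    timeseries_id: int
    timestamp: int
    value: float


class IndexDB:
    """Maps each tag, e.g. "service:web", to the IDs of series carrying it."""

    def __init__(self):
        self.by_tag = defaultdict(set)

    def register(self, timeseries_id, tags):
        for tag in tags:
            self.by_tag[tag].add(timeseries_id)

    def lookup(self, tags):
        # Assumed semantics: a series matches only if it carries every tag.
        sets = [self.by_tag.get(tag, set()) for tag in tags]
        return set.intersection(*sets) if sets else set()


class RTDBNode:
    """Stores <timeseries_id, timestamp, value> points for the series it owns."""

    def __init__(self):
        self.points = defaultdict(list)

    def write(self, point):
        self.points[point.timeseries_id].append(point)

    def read(self, timeseries_id, start, end):
        # Half-open time range [start, end).
        return [p for p in self.points.get(timeseries_id, [])
                if start <= p.timestamp < end]


class StorageRouter:
    """Sends every point of a series to the same RTDB node (modulo routing)."""

    def __init__(self, nodes):
        self.nodes = nodes

    def node_for(self, timeseries_id):
        return self.nodes[timeseries_id % len(self.nodes)]

    def write(self, point):
        self.node_for(point.timeseries_id).write(point)


def query(index, router, tags, start, end):
    """Resolve tags through the index, then fan out to the owning RTDB nodes."""
    results = {}
    for ts_id in sorted(index.lookup(tags)):
        points = router.node_for(ts_id).read(ts_id, start, end)
        results[ts_id] = [(p.timestamp, p.value)
                          for p in sorted(points, key=lambda p: p.timestamp)]
    return results
```

In this model every query touches two separate systems, the index and one or more RTDB nodes, which is the kind of cross-system split that Monocle's unified storage is described as consolidating.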
Read at InfoQ