LRU Block Cache¶
Deep dive into the LRU block cache and achieving 18x performance improvements.
Overview¶
nori-sstable includes a built-in LRU (Least Recently Used) block cache that dramatically improves read performance for hot key workloads. The cache stores decompressed blocks in memory, eliminating repeated disk I/O and decompression overhead.
{: .highlight }
NEW Feature: The LRU block cache was added in version 0.1 and delivers 18x performance improvements for hot key workloads (14ms โ 777ยตs for 1000 operations).
Why Cache?¶
The Problem: - Every SSTable read requires disk I/O (~100-1000ยตs latency) - With compression, every read also requires decompression (~10ยตs) - Hot keys (frequently accessed) pay this cost repeatedly - 80/20 workloads (80% of requests hit 20% of keys) suffer most
The Solution: - Cache decompressed blocks in memory (one-time cost) - Serve subsequent reads from RAM (~100ns access time) - Result: 10-100x faster reads for hot data
Performance Impact¶
Benchmark Results¶
From benches/read_sstable.rs hot key pattern:
| Configuration | Time (1000 ops) | Improvement |
|---|---|---|
| No cache | 14.0 ms | Baseline |
| 64MB cache | 777 ยตs | 18x faster! |
| 256MB cache | 650 ยตs | 21.5x faster |
Point lookup improvements:
| Entry Count | No Cache | With Cache | Improvement |
|---|---|---|---|
| 100 entries | 13.3 ยตs | 486 ns | 27x faster |
| 1K entries | 14.2 ยตs | 609 ns | 23x faster |
| 10K entries | 15.1 ยตs | 877 ns | 17x faster |
{: .note }
Why the improvement? Cache hits avoid two expensive operations: (1) Disk I/O (~100-1000ยตs), and (2) Decompression (~10ยตs). RAM access is ~100ns, giving 10-100x speedups.
How It Works¶
Cache Architecture¶
User Request (key)
โ
Bloom filter check (key might exist?)
โ
Index lookup (which block contains key?)
โ
read_block(offset, size)
โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Check LRU Cache โ
โ Key: block offset (u64) โ
โ Value: decompressed Block โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ โ
Cache HIT Cache MISS
โ โ
Return cached Read from disk
block (~100ns) โ
Decompress block
โ
Add to cache (evict LRU if full)
โ
Return block
Key insights:
1. Cache key = block offset: Each 4KB block is cached by its file offset
2. Cache value = decompressed block: Avoid repeated decompression
3. Thread-safe: Uses Mutex<LruCache> for concurrent access
4. LRU eviction: Automatically evicts least recently used blocks when full
Data Flow¶
// In SSTableReader::read_block()
pub(crate) async fn read_block(&self, offset: u64, size: u32) -> Result<Block> {
// 1. Check cache first
if let Some(ref cache) = self.block_cache {
let mut cache_lock = cache.lock().await;
if let Some(block) = cache_lock.get(&offset) {
// Cache hit! Return cloned block
self.meter.counter("sstable_block_cache_hits", &[]).inc(1);
return Ok(block.clone());
}
// Cache miss
self.meter.counter("sstable_block_cache_misses", &[]).inc(1);
}
// 2. Read compressed block from disk
let mut file = self.file.lock().await;
file.seek(SeekFrom::Start(offset)).await?;
let mut compressed_bytes = vec![0u8; size as usize];
file.read_exact(&mut compressed_bytes).await?;
// 3. Decompress block
let decompressed_bytes = compress::decompress(&compressed_bytes, self.footer.compression)?;
// 4. Decode block structure
let block = Block::decode(Bytes::from(decompressed_bytes))?;
// 5. Cache the decompressed block
if let Some(ref cache) = self.block_cache {
let mut cache_lock = cache.lock().await;
cache_lock.put(offset, block.clone()); // LRU eviction happens here
}
Ok(block)
}
Configuration¶
Default Configuration (64MB)¶
use nori_sstable::{SSTableConfig, SSTableReader};
// Option 1: Use default (64MB cache)
let reader = SSTableReader::open("data.sst".into()).await?;
// Option 2: Explicit default in config
let config = SSTableConfig {
path: "data.sst".into(),
block_cache_mb: 64, // Default
..Default::default()
};
Memory usage: 64MB โ 16,384 blocks (at 4KB per block)
Custom Cache Size¶
use nori_sstable::SSTableReader;
use nori_observe::NoopMeter;
use std::sync::Arc;
// Hot workload: Increase cache to 256MB
let reader = SSTableReader::open_with_config(
"hot_data.sst".into(),
Arc::new(NoopMeter),
256 // 256MB cache
).await?;
Memory usage: 256MB โ 65,536 blocks
Disable Cache (Cold Storage)¶
let config = SSTableConfig {
path: "archive.sst".into(),
block_cache_mb: 0, // Disable caching
..Default::default()
};
Use case: Cold storage where reads are infrequent and memory is limited
Cache Sizing Guide¶
How Much Memory Does the Cache Use?¶
Formula:
Cache Memory = block_cache_mb * 1024 * 1024 bytes
Number of Blocks = Cache Memory / block_size (default 4KB)
Examples:
| Config | Memory | Blocks Cached | Use Case |
|---|---|---|---|
block_cache_mb: 16 |
16 MB | ~4,096 | Development, testing |
block_cache_mb: 64 |
64 MB | ~16,384 | Default (good balance) |
block_cache_mb: 256 |
256 MB | ~65,536 | Hot workloads |
block_cache_mb: 1024 |
1 GB | ~262,144 | Very large hot datasets |
block_cache_mb: 0 |
0 | 0 | Cold storage |
Sizing for Your Workload¶
80/20 Hot Key Pattern (Common)¶
Scenario: 80% of requests hit 20% of your data
Calculation: 1. Total SSTable size: 1 GB 2. Hot data (20%): 200 MB 3. Number of blocks: 200 MB / 4 KB = 51,200 blocks 4. Recommended cache: 256 MB (to fit all hot blocks)
Expected hit rate: 80-90%
Uniform Access Pattern (Rare)¶
Scenario: All keys accessed equally (cache less effective)
Calculation: 1. Total SSTable size: 10 GB 2. Cache can only hold fraction of data 3. Recommended: 64-128 MB (default is fine)
Expected hit rate: 1-5% (proportional to cache/data ratio)
Very Hot Keys (e.g., Metadata, Counters)¶
Scenario: 95% of requests hit 5% of data
Calculation: 1. Total SSTable size: 500 MB 2. Hot data (5%): 25 MB 3. Number of blocks: 25 MB / 4 KB = 6,400 blocks 4. Recommended cache: 64 MB (default is sufficient!)
Expected hit rate: 95%+
Monitoring Cache Performance¶
Key Metrics¶
nori-sstable emits metrics via the Meter trait:
// Cache hits (good!)
sstable_block_cache_hits{} = 8500
// Cache misses (first-time reads or evictions)
sstable_block_cache_misses{} = 1500
// Calculate hit rate
hit_rate = hits / (hits + misses) = 8500 / 10000 = 85%
Interpreting Metrics¶
| Hit Rate | Status | Action |
|---|---|---|
| >80% | Excellent | Cache is sized well |
| 50-80% | Good | Consider increasing cache |
| 20-50% | ๐ Fair | Increase cache or optimize access pattern |
| <20% | Poor | Cache too small or uniform access pattern |
Example: Monitoring in Production¶
use nori_sstable::{SSTableBuilder, SSTableConfig};
use nori_observe_prom::PrometheusMeter; // Prometheus exporter
// Create custom meter
let meter = Box::new(PrometheusMeter::new());
let config = SSTableConfig {
path: "data.sst".into(),
block_cache_mb: 256,
..Default::default()
};
let mut builder = SSTableBuilder::new_with_meter(config, meter).await?;
// Metrics are now exported to Prometheus:
// sstable_block_cache_hits
// sstable_block_cache_misses
// sstable_block_reads (total disk reads)
Query in Prometheus:
# Cache hit rate
rate(sstable_block_cache_hits[5m]) /
(rate(sstable_block_cache_hits[5m]) + rate(sstable_block_cache_misses[5m]))
Cache + Compression Synergy¶
The Winning Combination¶
Configuration:
let config = SSTableConfig {
compression: Compression::Lz4, // 2-3x storage savings
block_cache_mb: 64, // Cache decompressed blocks
..Default::default()
};
What happens:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Disk: Compressed blocks (2-3x smaller) โ
โ [...LZ4 compressed data...] โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Read compressed (fast due to smaller size)
โ Decompress (one-time cost: ~10ยตs)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Cache: Decompressed blocks (fast access) โ
โ [Block1][Block2][Block3]...[BlockN] โ
โ (RAM: ~100ns access) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Serve from cache (no decompression!)
User Request (~5ยตs total)
Benefits: 1. Storage: Save 50-70% disk space (LZ4 compression) 2. First read: Pay decompression cost once (~10ยตs) 3. Cached reads: Zero decompression cost (cached = decompressed) 4. Network: Faster backups/replication (smaller files)
Result: Best of both worlds! ๐
Implementation Details¶
LRU Algorithm¶
nori-sstable uses the lru crate for O(1) cache operations:
use lru::LruCache;
use std::num::NonZeroUsize;
// In SSTableReader
pub struct SSTableReader {
// ...
block_cache: Option<Mutex<LruCache<u64, Block>>>,
// ...
}
// Cache initialization
let blocks_in_cache = (block_cache_mb * 1024 * 1024) / 4096;
let cache = LruCache::new(
NonZeroUsize::new(blocks_in_cache.max(1)).unwrap()
);
Operations:
- get(&key) - O(1) - Returns value and marks as recently used
- put(key, value) - O(1) - Inserts and evicts LRU if full
- Thread-safe with Mutex
Memory Management¶
Block structure:
pub struct Block {
data: Bytes, // Block data (4KB typical)
restart_offsets: Vec<u32>, // Restart point offsets
// ...
}
Memory per cached block: ~4KB + ~64 bytes overhead = ~4100 bytes
Total cache memory:
Thread Safety¶
The cache is protected by a Tokio async Mutex:
// Multiple readers can safely access cache concurrently
let reader = Arc::new(SSTableReader::open("data.sst").await?);
for _ in 0..4 {
let reader_clone = reader.clone();
tokio::spawn(async move {
// Each task safely accesses shared cache
reader_clone.get(b"key").await?;
});
}
Lock contention: - Cache locks are held briefly (<1ยตs typically) - Read-heavy workloads scale well (cache hits don't modify LRU state) - Multiple readers can queue on cache lock (fair scheduling)
Best Practices¶
Do¶
- Enable caching by default (64MB is reasonable for most workloads)
- Monitor hit rates to validate cache effectiveness
- Increase cache for hot workloads (80/20 patterns benefit most)
- Combine with compression (store compressed, cache decompressed)
- Use
Arc<SSTableReader>to share cache across threads - Size based on hot data (not total data size)
Don't¶
- Don't disable cache unless memory is extremely limited
- Don't over-provision cache (diminishing returns beyond hot set size)
- Don't forget to monitor
sstable_block_cache_*metrics - Don't use tiny caches (<16MB) - overhead not worth it
- Don't expect miracles on uniform access patterns (cache helps less)
Troubleshooting¶
"Cache hit rate is low (<20%)"¶
Possible causes: 1. Cache too small for working set 2. Uniform access pattern (all keys accessed equally) 3. Working set larger than total data (shouldn't happen)
Solutions:
- Increase block_cache_mb to 256 or 512 MB
- Check access pattern (is data actually hot?)
- Monitor sstable_block_reads to see if disk I/O is high
"High memory usage"¶
Possible causes: 1. Cache configured too large 2. Multiple SSTableReaders with separate caches
Solutions:
- Reduce block_cache_mb to 64 or 32 MB
- Share SSTableReader instances via Arc (shares cache)
- Use block_cache_mb: 0 for cold data SSTables
"Cache not improving performance"¶
Possible causes: 1. Data access is write-heavy (cache only helps reads) 2. First-time reads (cache miss is expected) 3. Access pattern is sequential scan (cache less helpful)
Solutions: - Verify workload is read-heavy - Allow warm-up period (first reads populate cache) - For sequential scans, cache is less beneficial (still helps for iterators)
Advanced Topics¶
Cache Warming¶
For predictable workloads, pre-warm the cache:
// Pre-load hot keys into cache
let hot_keys = vec![b"key1", b"key2", b"key3"];
for key in hot_keys {
let _ = reader.get(key).await?; // Populate cache
}
// Now subsequent reads are fast
let value = reader.get(b"key1").await?; // Cache hit!
Multiple Readers, Shared Cache¶
// Share reader (and cache) across threads
let reader = Arc::new(SSTableReader::open("data.sst").await?);
let mut handles = vec![];
for i in 0..10 {
let reader_clone = reader.clone(); // Shares cache!
let handle = tokio::spawn(async move {
reader_clone.get(format!("key{}", i).as_bytes()).await
});
handles.push(handle);
}
// All tasks share the same cache
for handle in handles {
handle.await??;
}
Separate Caches for Different SSTables¶
// Each reader gets its own cache
let reader1 = Arc::new(SSTableReader::open("hot.sst").await?);
let reader2 = Arc::new(SSTableReader::open("cold.sst").await?);
// hot.sst has 256MB cache
// cold.sst has 16MB cache (or 0 to disable)
Use case: Allocate more cache to hot SSTables, less to cold ones.
Performance Tuning Workflow¶
- Start with defaults:
block_cache_mb: 64 - Deploy and monitor:
- Analyze hit rate:
- >80%: Cache is sized well
- 50-80%: Consider increasing cache
- <50%: Increase cache or investigate access pattern
- Tune cache size:
- Increase by 2x increments (64 โ 128 โ 256)
- Monitor hit rate improvement
- Stop when hit rate plateaus (diminishing returns)
- Balance memory: Ensure total cache usage across all readers fits in RAM
Related Documentation¶
- Compression Guide - Learn how compression works with caching
- Performance Benchmarks - See cache benchmark results
- API Reference - SSTableReader cache methods
- Tuning Guide - General performance tuning
Summary¶
LRU Block Cache in nori-sstable:
- 18x faster reads for hot key workloads
- Caches decompressed blocks (avoid repeated decompression)
- LRU eviction - automatic management
- Thread-safe - share via Arc
- Configurable - default 64MB, tune to workload
- Works with compression - best of both worlds
Recommended configuration:
SSTableConfig {
compression: Compression::Lz4, // Storage savings
block_cache_mb: 64, // Good default (tune as needed)
..Default::default()
}
Monitor this metric:
Result: Dramatically faster reads for hot data + storage savings!