Skip to content

LRU Block Cache

Deep dive into the LRU block cache and achieving 18x performance improvements.



Overview

nori-sstable includes a built-in LRU (Least Recently Used) block cache that dramatically improves read performance for hot key workloads. The cache stores decompressed blocks in memory, eliminating repeated disk I/O and decompression overhead.

{: .highlight }

NEW Feature: The LRU block cache was added in version 0.1 and delivers 18x performance improvements for hot key workloads (14ms โ†’ 777ยตs for 1000 operations).

Why Cache?

The Problem: - Every SSTable read requires disk I/O (~100-1000ยตs latency) - With compression, every read also requires decompression (~10ยตs) - Hot keys (frequently accessed) pay this cost repeatedly - 80/20 workloads (80% of requests hit 20% of keys) suffer most

The Solution: - Cache decompressed blocks in memory (one-time cost) - Serve subsequent reads from RAM (~100ns access time) - Result: 10-100x faster reads for hot data


Performance Impact

Benchmark Results

From benches/read_sstable.rs hot key pattern:

Configuration Time (1000 ops) Improvement
No cache 14.0 ms Baseline
64MB cache 777 ยตs 18x faster!
256MB cache 650 ยตs 21.5x faster

Point lookup improvements:

Entry Count No Cache With Cache Improvement
100 entries 13.3 ยตs 486 ns 27x faster
1K entries 14.2 ยตs 609 ns 23x faster
10K entries 15.1 ยตs 877 ns 17x faster

{: .note }

Why the improvement? Cache hits avoid two expensive operations: (1) Disk I/O (~100-1000ยตs), and (2) Decompression (~10ยตs). RAM access is ~100ns, giving 10-100x speedups.


How It Works

Cache Architecture

User Request (key)
    โ†“
Bloom filter check (key might exist?)
    โ†“
Index lookup (which block contains key?)
    โ†“
read_block(offset, size)
    โ†“
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  Check LRU Cache                โ”‚
โ”‚  Key: block offset (u64)        โ”‚
โ”‚  Value: decompressed Block      โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
    โ†“                      โ†“
Cache HIT           Cache MISS
    โ†“                      โ†“
Return cached       Read from disk
block (~100ns)          โ†“
                   Decompress block
                        โ†“
                   Add to cache (evict LRU if full)
                        โ†“
                   Return block

Key insights: 1. Cache key = block offset: Each 4KB block is cached by its file offset 2. Cache value = decompressed block: Avoid repeated decompression 3. Thread-safe: Uses Mutex<LruCache> for concurrent access 4. LRU eviction: Automatically evicts least recently used blocks when full

Data Flow

// In SSTableReader::read_block()
pub(crate) async fn read_block(&self, offset: u64, size: u32) -> Result<Block> {
    // 1. Check cache first
    if let Some(ref cache) = self.block_cache {
        let mut cache_lock = cache.lock().await;
        if let Some(block) = cache_lock.get(&offset) {
            // Cache hit! Return cloned block
            self.meter.counter("sstable_block_cache_hits", &[]).inc(1);
            return Ok(block.clone());
        }
        // Cache miss
        self.meter.counter("sstable_block_cache_misses", &[]).inc(1);
    }

    // 2. Read compressed block from disk
    let mut file = self.file.lock().await;
    file.seek(SeekFrom::Start(offset)).await?;
    let mut compressed_bytes = vec![0u8; size as usize];
    file.read_exact(&mut compressed_bytes).await?;

    // 3. Decompress block
    let decompressed_bytes = compress::decompress(&compressed_bytes, self.footer.compression)?;

    // 4. Decode block structure
    let block = Block::decode(Bytes::from(decompressed_bytes))?;

    // 5. Cache the decompressed block
    if let Some(ref cache) = self.block_cache {
        let mut cache_lock = cache.lock().await;
        cache_lock.put(offset, block.clone());  // LRU eviction happens here
    }

    Ok(block)
}

Configuration

Default Configuration (64MB)

use nori_sstable::{SSTableConfig, SSTableReader};

// Option 1: Use default (64MB cache)
let reader = SSTableReader::open("data.sst".into()).await?;

// Option 2: Explicit default in config
let config = SSTableConfig {
    path: "data.sst".into(),
    block_cache_mb: 64,  // Default
    ..Default::default()
};

Memory usage: 64MB โ‰ˆ 16,384 blocks (at 4KB per block)


Custom Cache Size

use nori_sstable::SSTableReader;
use nori_observe::NoopMeter;
use std::sync::Arc;

// Hot workload: Increase cache to 256MB
let reader = SSTableReader::open_with_config(
    "hot_data.sst".into(),
    Arc::new(NoopMeter),
    256  // 256MB cache
).await?;

Memory usage: 256MB โ‰ˆ 65,536 blocks


Disable Cache (Cold Storage)

let config = SSTableConfig {
    path: "archive.sst".into(),
    block_cache_mb: 0,  // Disable caching
    ..Default::default()
};

Use case: Cold storage where reads are infrequent and memory is limited


Cache Sizing Guide

How Much Memory Does the Cache Use?

Formula:

Cache Memory = block_cache_mb * 1024 * 1024 bytes
Number of Blocks = Cache Memory / block_size (default 4KB)

Examples:

Config Memory Blocks Cached Use Case
block_cache_mb: 16 16 MB ~4,096 Development, testing
block_cache_mb: 64 64 MB ~16,384 Default (good balance)
block_cache_mb: 256 256 MB ~65,536 Hot workloads
block_cache_mb: 1024 1 GB ~262,144 Very large hot datasets
block_cache_mb: 0 0 0 Cold storage

Sizing for Your Workload

80/20 Hot Key Pattern (Common)

Scenario: 80% of requests hit 20% of your data

Calculation: 1. Total SSTable size: 1 GB 2. Hot data (20%): 200 MB 3. Number of blocks: 200 MB / 4 KB = 51,200 blocks 4. Recommended cache: 256 MB (to fit all hot blocks)

Expected hit rate: 80-90%


Uniform Access Pattern (Rare)

Scenario: All keys accessed equally (cache less effective)

Calculation: 1. Total SSTable size: 10 GB 2. Cache can only hold fraction of data 3. Recommended: 64-128 MB (default is fine)

Expected hit rate: 1-5% (proportional to cache/data ratio)


Very Hot Keys (e.g., Metadata, Counters)

Scenario: 95% of requests hit 5% of data

Calculation: 1. Total SSTable size: 500 MB 2. Hot data (5%): 25 MB 3. Number of blocks: 25 MB / 4 KB = 6,400 blocks 4. Recommended cache: 64 MB (default is sufficient!)

Expected hit rate: 95%+


Monitoring Cache Performance

Key Metrics

nori-sstable emits metrics via the Meter trait:

// Cache hits (good!)
sstable_block_cache_hits{} = 8500

// Cache misses (first-time reads or evictions)
sstable_block_cache_misses{} = 1500

// Calculate hit rate
hit_rate = hits / (hits + misses) = 8500 / 10000 = 85%

Interpreting Metrics

Hit Rate Status Action
>80% Excellent Cache is sized well
50-80% Good Consider increasing cache
20-50% ๐ŸŸ  Fair Increase cache or optimize access pattern
<20% Poor Cache too small or uniform access pattern

Example: Monitoring in Production

use nori_sstable::{SSTableBuilder, SSTableConfig};
use nori_observe_prom::PrometheusMeter;  // Prometheus exporter

// Create custom meter
let meter = Box::new(PrometheusMeter::new());

let config = SSTableConfig {
    path: "data.sst".into(),
    block_cache_mb: 256,
    ..Default::default()
};

let mut builder = SSTableBuilder::new_with_meter(config, meter).await?;

// Metrics are now exported to Prometheus:
// sstable_block_cache_hits
// sstable_block_cache_misses
// sstable_block_reads (total disk reads)

Query in Prometheus:

# Cache hit rate
rate(sstable_block_cache_hits[5m]) /
(rate(sstable_block_cache_hits[5m]) + rate(sstable_block_cache_misses[5m]))


Cache + Compression Synergy

The Winning Combination

Configuration:

let config = SSTableConfig {
    compression: Compression::Lz4,  // 2-3x storage savings
    block_cache_mb: 64,              // Cache decompressed blocks
    ..Default::default()
};

What happens:

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Disk: Compressed blocks (2-3x smaller)     โ”‚
โ”‚ [...LZ4 compressed data...]                โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
            โ†“ Read compressed (fast due to smaller size)
            โ†“ Decompress (one-time cost: ~10ยตs)
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Cache: Decompressed blocks (fast access)   โ”‚
โ”‚ [Block1][Block2][Block3]...[BlockN]       โ”‚
โ”‚ (RAM: ~100ns access)                       โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
            โ†“ Serve from cache (no decompression!)
         User Request (~5ยตs total)

Benefits: 1. Storage: Save 50-70% disk space (LZ4 compression) 2. First read: Pay decompression cost once (~10ยตs) 3. Cached reads: Zero decompression cost (cached = decompressed) 4. Network: Faster backups/replication (smaller files)

Result: Best of both worlds! ๐ŸŽ‰


Implementation Details

LRU Algorithm

nori-sstable uses the lru crate for O(1) cache operations:

use lru::LruCache;
use std::num::NonZeroUsize;

// In SSTableReader
pub struct SSTableReader {
    // ...
    block_cache: Option<Mutex<LruCache<u64, Block>>>,
    // ...
}

// Cache initialization
let blocks_in_cache = (block_cache_mb * 1024 * 1024) / 4096;
let cache = LruCache::new(
    NonZeroUsize::new(blocks_in_cache.max(1)).unwrap()
);

Operations: - get(&key) - O(1) - Returns value and marks as recently used - put(key, value) - O(1) - Inserts and evicts LRU if full - Thread-safe with Mutex

Memory Management

Block structure:

pub struct Block {
    data: Bytes,              // Block data (4KB typical)
    restart_offsets: Vec<u32>, // Restart point offsets
    // ...
}

Memory per cached block: ~4KB + ~64 bytes overhead = ~4100 bytes

Total cache memory:

Memory = (blocks_in_cache * 4100 bytes) โ‰ˆ block_cache_mb * 1024 * 1024

Thread Safety

The cache is protected by a Tokio async Mutex:

// Multiple readers can safely access cache concurrently
let reader = Arc::new(SSTableReader::open("data.sst").await?);

for _ in 0..4 {
    let reader_clone = reader.clone();
    tokio::spawn(async move {
        // Each task safely accesses shared cache
        reader_clone.get(b"key").await?;
    });
}

Lock contention: - Cache locks are held briefly (<1ยตs typically) - Read-heavy workloads scale well (cache hits don't modify LRU state) - Multiple readers can queue on cache lock (fair scheduling)


Best Practices

Do

  • Enable caching by default (64MB is reasonable for most workloads)
  • Monitor hit rates to validate cache effectiveness
  • Increase cache for hot workloads (80/20 patterns benefit most)
  • Combine with compression (store compressed, cache decompressed)
  • Use Arc<SSTableReader> to share cache across threads
  • Size based on hot data (not total data size)

Don't

  • Don't disable cache unless memory is extremely limited
  • Don't over-provision cache (diminishing returns beyond hot set size)
  • Don't forget to monitor sstable_block_cache_* metrics
  • Don't use tiny caches (<16MB) - overhead not worth it
  • Don't expect miracles on uniform access patterns (cache helps less)

Troubleshooting

"Cache hit rate is low (<20%)"

Possible causes: 1. Cache too small for working set 2. Uniform access pattern (all keys accessed equally) 3. Working set larger than total data (shouldn't happen)

Solutions: - Increase block_cache_mb to 256 or 512 MB - Check access pattern (is data actually hot?) - Monitor sstable_block_reads to see if disk I/O is high


"High memory usage"

Possible causes: 1. Cache configured too large 2. Multiple SSTableReaders with separate caches

Solutions: - Reduce block_cache_mb to 64 or 32 MB - Share SSTableReader instances via Arc (shares cache) - Use block_cache_mb: 0 for cold data SSTables


"Cache not improving performance"

Possible causes: 1. Data access is write-heavy (cache only helps reads) 2. First-time reads (cache miss is expected) 3. Access pattern is sequential scan (cache less helpful)

Solutions: - Verify workload is read-heavy - Allow warm-up period (first reads populate cache) - For sequential scans, cache is less beneficial (still helps for iterators)


Advanced Topics

Cache Warming

For predictable workloads, pre-warm the cache:

// Pre-load hot keys into cache
let hot_keys = vec![b"key1", b"key2", b"key3"];

for key in hot_keys {
    let _ = reader.get(key).await?;  // Populate cache
}

// Now subsequent reads are fast
let value = reader.get(b"key1").await?;  // Cache hit!

Multiple Readers, Shared Cache

// Share reader (and cache) across threads
let reader = Arc::new(SSTableReader::open("data.sst").await?);

let mut handles = vec![];
for i in 0..10 {
    let reader_clone = reader.clone();  // Shares cache!
    let handle = tokio::spawn(async move {
        reader_clone.get(format!("key{}", i).as_bytes()).await
    });
    handles.push(handle);
}

// All tasks share the same cache
for handle in handles {
    handle.await??;
}

Separate Caches for Different SSTables

// Each reader gets its own cache
let reader1 = Arc::new(SSTableReader::open("hot.sst").await?);
let reader2 = Arc::new(SSTableReader::open("cold.sst").await?);

// hot.sst has 256MB cache
// cold.sst has 16MB cache (or 0 to disable)

Use case: Allocate more cache to hot SSTables, less to cold ones.


Performance Tuning Workflow

  1. Start with defaults: block_cache_mb: 64
  2. Deploy and monitor:
    sstable_block_cache_hits / (sstable_block_cache_hits + sstable_block_cache_misses)
    
  3. Analyze hit rate:
  4. >80%: Cache is sized well
  5. 50-80%: Consider increasing cache
  6. <50%: Increase cache or investigate access pattern
  7. Tune cache size:
  8. Increase by 2x increments (64 โ†’ 128 โ†’ 256)
  9. Monitor hit rate improvement
  10. Stop when hit rate plateaus (diminishing returns)
  11. Balance memory: Ensure total cache usage across all readers fits in RAM


Summary

LRU Block Cache in nori-sstable: - 18x faster reads for hot key workloads - Caches decompressed blocks (avoid repeated decompression) - LRU eviction - automatic management - Thread-safe - share via Arc - Configurable - default 64MB, tune to workload - Works with compression - best of both worlds

Recommended configuration:

SSTableConfig {
    compression: Compression::Lz4,  // Storage savings
    block_cache_mb: 64,              // Good default (tune as needed)
    ..Default::default()
}

Monitor this metric:

cache_hit_rate = sstable_block_cache_hits / (hits + misses)

Result: Dramatically faster reads for hot data + storage savings!