Bloom Filters¶

How probabilistic data structures prevent unnecessary disk reads in SSTables.

The Problem¶

User: get("user:99999")  // Key doesn't exist

Without bloom filter:
1. Check index → might be in Block 42
2. Read Block 42 from disk (100µs)
3. Decompress block (~1µs)
4. Binary search → not found
Total: ~101µs wasted on disk I/O

With 1000 non-existent keys: 1000 * 100µs = 100ms wasted!

The Solution: Bloom Filters¶

A bloom filter is a probabilistic data structure that answers: "Is key X definitely not in the set?"

User: get("user:99999")

With bloom filter:
1. Check bloom filter (67ns): NOT PRESENT → return None
Total: 67ns (1500x faster!)

Key properties: - No false negatives: If bloom says "not present", key is definitely absent - Possible false positives: If bloom says "maybe present", key might not be there - Space-efficient: 10 bits per key for ~0.9% false positive rate

How Bloom Filters Work¶

Basic Concept¶

A bloom filter is a bit array with hash functions:

Bit array (128 bits):
[0][0][0][0]...[0][0][0]

Add key "alice":
  hash1("alice") = 5  → set bit 5
  hash2("alice") = 89 → set bit 89
  hash3("alice") = 120 → set bit 120

Bit array after adding "alice":
[0][0][0][0][0][1]...[1]...[1][0][0]
      bit5      bit89    bit120

Query "alice":
  hash1("alice") = 5   → check bit 5: SET Yes
  hash2("alice") = 89  → check bit 89: SET Yes
  hash3("alice") = 120 → check bit 120: SET Yes
  Result: MAYBE PRESENT (all bits set)

Query "bob":
  hash1("bob") = 12 → check bit 12: NOT SET No
  Result: DEFINITELY NOT PRESENT (stop here!)

Bloom Filter in SSTables¶

File Layout¶

SSTable:
┌─────────────┐
│  Blocks     │
├─────────────┤
│  Index      │
├─────────────┤
│  Bloom      │ ← Bloom filter here
│  Filter     │
├─────────────┤
│  Footer     │
└─────────────┘

Read Path with Bloom¶

impl SSTableReader {
    pub async fn get(&self, key: &[u8]) -> Result<Option<Entry>> {
        // 1. Check bloom filter (67ns)
        if !self.bloom.contains(key) {
            return Ok(None);  // Definitely not present
        }

        // 2. Might be present, check index
        let block_id = self.index.find_block(key)?;

        // 3. Read block
        let block = self.read_block(block_id).await?;

        // 4. Search in block
        block.get(key)
    }
}

False Positive Rate¶

What is a False Positive?¶

Key "charlie" NOT in SSTable

Bloom filter checks:
  hash1("charlie") = 5   → bit 5 SET (from "alice")
  hash2("charlie") = 89  → bit 89 SET (from "alice")
  hash3("charlie") = 120 → bit 120 SET (from "alice")

Result: MAYBE PRESENT (false positive!)

Then:
  1. Check index
  2. Read block from disk (wasted!)
  3. Binary search → not found

False positive = wasted disk read.

Controlling False Positive Rate¶

Formula:

FP rate ≈ (1 - e^(-kn/m))^k

where:
  k = number of hash functions
  n = number of keys
  m = number of bits

Simplified:

m/n = bits per key

For k=3 hash functions:
  10 bits/key → ~0.9% FP rate
  12 bits/key → ~0.3% FP rate
  16 bits/key → ~0.05% FP rate

nori-sstable default:

pub const DEFAULT_BLOOM_BITS_PER_KEY: usize = 10;

Result: 0.9% false positive rate

Trade-Off Analysis¶

Bits/Key	FP Rate	Bloom Size (100K keys)	Wasted Reads (per 100K queries)
5	~10%	62KB	10,000
10	~0.9%	125KB	900
12	~0.3%	150KB	300
16	~0.05%	200KB	50

Optimal: 10 bits/key balances bloom size vs false positives.

Hash Functions¶

Why Multiple Hashes?¶

Single hash function:

Only 1 bit set per key → collision likely
  "alice" → bit 42
  "bob"   → bit 42 (collision!)

Multiple hash functions:

k bits set per key → lower collision probability
  "alice" → bits {5, 89, 120}
  "bob"   → bits {12, 67, 104}

Optimal k:

k = (m/n) * ln(2) ≈ 0.693 * (m/n)

For m/n = 10:
  k = 0.693 * 10 ≈ 7 hash functions

Double Hashing Trick¶

Problem: Computing k independent hash functions is expensive.

Solution: Use double hashing with only 2 hash functions:

fn bloom_hash(key: &[u8], i: usize, m: usize) -> usize {
    let h1 = xxhash64(key, seed=0);
    let h2 = xxhash64(key, seed=1);
    ((h1 + i * h2) % m) as usize
}

Generate k hashes from 2:

hash_0 = h1
hash_1 = h1 + h2
hash_2 = h1 + 2*h2
...
hash_k = h1 + k*h2

Benefit: Compute 2 hashes, derive k positions (fast!).

Why xxHash64?¶

use xxhash_rust::xxh64;

let hash = xxh64::xxh64(key, seed);

Properties: - Fast: ~10 GB/s throughput - Good distribution: Low collision rate - Deterministic: Same input → same output - Non-cryptographic: Don't need security, just speed

Alternatives considered: - CRC32: Faster but worse distribution - SipHash: Cryptographic (overkill, slower) - MurmurHash3: Similar speed, less Rust support

Building a Bloom Filter¶

During SSTable Creation¶

impl SSTableBuilder {
    pub async fn finish(self) -> Result<SSTableMetadata> {
        // 1. Write data blocks
        for block in self.blocks {
            write_block(&block).await?;
        }

        // 2. Write index
        write_index(&self.index).await?;

        // 3. Build bloom filter
        let bloom = BloomFilter::new(
            self.entry_count,
            BLOOM_BITS_PER_KEY
        );
        for key in self.all_keys {
            bloom.insert(&key);
        }

        // 4. Write bloom filter
        write_bloom(&bloom).await?;

        // 5. Write footer
        write_footer(...).await?;
    }
}

Memory Usage¶

100,000 keys * 10 bits/key = 1,000,000 bits = 125 KB

Bloom filter in memory during build: ~125KB
Bloom filter in file: ~125KB
Bloom filter loaded on read: ~125KB

Small overhead for huge performance gain.

Bloom Filter Performance¶

Benchmark Results¶

Bloom filter operations on M1 Pro:

Insert: ~20ns per key
Query:  ~67ns per key

For 100K key SSTable:
  Build time: 100K * 20ns = 2ms
  Query time: 67ns per lookup

Savings Calculation¶

Scenario: 100K queries on SSTable with 100K keys
  50% keys present, 50% absent

Without bloom filter:
  50K absent keys * 100µs disk read = 5 seconds wasted

With bloom filter (0.9% FP rate):
  50K absent * 67ns = 3.35ms (bloom checks)
  450 false positives * 100µs = 45ms (wasted disk)
  Total: ~48ms

Speedup: 5000ms / 48ms ≈ 100x faster!

Advanced Topics¶

Partitioned Bloom Filters¶

Idea: Split bloom into multiple small filters (one per block).

Pros: - Only load bloom for needed blocks - Better cache locality

Cons: - Higher false positive rate (smaller filters) - More complex implementation

Decision: nori-sstable uses single whole-file bloom for simplicity.

Blocked Bloom Filters¶

Idea: Organize bits into cache-line-sized blocks (64 bytes).

Traditional: bits[0..1000000]
Blocked: blocks[0..15625] where each block = 64 bytes

Benefit: All k hash probes hit same cache line (faster).

Implementation complexity: Medium

Future work: Consider for v2.

Counting Bloom Filters¶

Idea: Store counts instead of bits (supports deletion).

Benefit: Can remove keys from bloom filter.

Cost: 4x memory (32-bit counters vs 1-bit flags).

Use case: Compaction could update bloom (remove deleted keys).

Decision: Not needed (bloom rebuilt during compaction anyway).

Bloom Filter vs Alternatives¶

Cuckoo Filters¶

Idea: Hash table with relocation for collisions.

Pros: - Supports deletion - Better FP rate for same space

Cons: - More complex - Slightly slower queries

Decision: Bloom simpler, fast enough for SSTables.

XOR Filters¶

Idea: Use XOR operations for compact representation.

Pros: - 10-20% smaller than bloom - Faster construction

Cons: - Immutable (can't add keys incrementally) - More complex implementation

Future: Consider for v2 if bloom filter size becomes bottleneck.

Perfect Hashing¶

Idea: Build minimal perfect hash function.

Pros: - No false positives - Compact

Cons: - Expensive to build - Read-only (can't add keys)

Decision: Overkill for SSTables (false positives acceptable).

Practical Guidelines¶

1. Use Default Settings¶

// Recommended for most workloads
SSTableConfig {
    bloom_bits_per_key: 10,  // ~0.9% FP rate
    ..Default::default()
}

2. Increase for Large SSTables¶

// For >1GB SSTables, reduce FP rate
SSTableConfig {
    bloom_bits_per_key: 12,  // ~0.3% FP rate
    // Cost: +20% bloom size (acceptable for large files)
    ..Default::default()
}

3. Decrease for Tiny SSTables¶

// For <1MB SSTables, bloom overhead not worth it
SSTableConfig {
    bloom_bits_per_key: 0,  // Disable bloom
    // Rationale: Index lookup is already fast
    ..Default::default()
}

4. Monitor False Positive Rate¶

let fp_rate = false_positives / total_queries;

// If FP rate > 2%:
//   - Increase bloom_bits_per_key
//   - Check for hash collisions
//   - Verify bloom filter wasn't corrupted

Bloom Filter in Compaction¶

Problem: Stale Bloom Filters¶

Original SSTable: {alice, bob, charlie}
After compaction: {alice, bob}  // charlie deleted

Old bloom filter still contains "charlie" hash
  → False positive on "charlie" queries

Solution: Rebuild Bloom¶

async fn compact(inputs: Vec<SSTable>) -> SSTable {
    let mut builder = SSTableBuilder::new(config).await?;

    // Merge all entries
    for entry in merge_iterator(inputs) {
        if !entry.tombstone {  // Skip deleted
            builder.add(&entry).await?;
        }
    }

    // Bloom automatically rebuilt with only live keys
    builder.finish().await?
}

Result: Compacted SSTable has fresh, accurate bloom filter.

Real-World Impact¶

Case Study: 1GB SSTable with 10M Keys¶

Configuration:
  Keys: 10M
  Bloom bits/key: 10
  False positive rate: 0.9%

Bloom filter size: 10M * 10 bits = 12.5 MB (1.25% of SSTable size)

Query workload: 1M queries (50% hits, 50% misses)

Without bloom:
  500K misses * 100µs disk = 50 seconds

With bloom:
  500K misses * 67ns bloom = 33.5ms
  4.5K false positives * 100µs = 450ms
  Total: ~483ms

Speedup: 50s / 0.48s ≈ 100x faster!

Cost: 12.5 MB memory (negligible)

Summary¶

Bloom filters are essential for SSTables:

Fast negative lookups - 67ns to confirm absence Massive I/O savings - Skip 99%+ of unnecessary disk reads Space-efficient - 10 bits per key for 0.9% FP rate Simple - Single global bloom, no complex logic Battle-tested - Used in LevelDB, RocksDB, Cassandra

Key insight: Trading ~1% false positives for 100x speedup on absent keys is an excellent trade-off.

Next Steps¶

Understand the design choices: See Design Decisions: Bloom Strategy for bloom filter design rationale.

Optimize reads: Check Performance: Tuning for bloom filter configuration guidance.

See the full read path: Explore What is an SSTable? for how bloom fits into lookups.