Memory Pressure System
Why ATLL uses 4-zone adaptive backpressure (Green/Yellow/Orange/Red) with composite scoring instead of hard stalls or single thresholds.
Decision
Use a 4-zone adaptive backpressure system with composite scoring to progressively throttle writes before hitting hard limits.
Each zone applies different strategies:
pub enum PressureZone {
Green, // 0-50%: No backpressure
Yellow, // 50-75%: Soft throttling (warn, delay small)
Orange, // 75-90%: Heavy throttling (delay exponential)
Red, // 90%+: Hard stall (reject writes)
}
// Composite pressure score
pressure_score = 0.4 × L0_ratio + 0.4 × memory_ratio + 0.2 × memtable_ratio
Where:
L0_ratio = L0_count / L0_max
memory_ratio = total_memory / memory_limit
memtable_ratio = memtable_size / memtable_max
Result:
- Progressive degradation: Gradual slowdown instead of cliff-edge failures
- Multi-dimensional awareness: Responds to L0 count, memory usage, and memtable size
- Observable behavior: Clear pressure zones with metrics and VizEvent emissions
Alternatives Considered
1. Hard Stall Only (No Backpressure)
Approach: Accept writes until limit, then reject
async fn put(&self, key: Vec<u8>, value: Vec<u8>) -> Result<()> {
if self.l0_count >= L0_MAX {
return Err(Error::MemoryLimitExceeded);
}
// Write normally
self.write(key, value).await
}
Rejected because:
- Cliff-edge failures: 0ms latency → rejection with no warning
- No client adaptation: Clients can't detect pressure and slow down
- Burst amplification: All clients retry simultaneously (thundering herd)
- Poor UX: Sudden errors instead of gradual slowdown
Example:
L0 count: 5 (of 6 max)
100 concurrent writes arrive
→ All succeed (L0 = 6)
L0 count: 6 (at max)
Next write arrives
→ Rejected (Error::MemoryLimitExceeded)
→ Client retries immediately (backoff logic needed)
→ Same error (compaction hasn't finished yet)
Problem: No warning, no graceful degradation
2. Single Threshold (Binary Backpressure)
Approach: Normal speed until threshold, then fixed delay
async fn put(&self, key: Vec<u8>, value: Vec<u8>) -> Result<()> {
if self.l0_count > 4 {
// Above threshold: delay all writes
sleep(Duration::from_millis(100)).await;
}
self.write(key, value).await
}
Rejected because:
- Cliff edge (smaller): 0ms → 100ms with no gradation
- Inefficient: Same delay at 50% and 99% pressure
- No severity signal: Clients can't distinguish "mildly busy" from "critically overloaded"
- Fixed delay: Doesn't adapt to pressure severity
Example:
L0 count: 4 → 0ms write latency
L0 count: 5 → 100ms write latency (sudden jump)
L0 count: 6 → 100ms write latency (same as 5, but more critical)
Problem: Binary signal, no proportional response
3. Per-Dimension Limits (Independent Throttling)
Approach: Separate thresholds for L0, memory, memtable
async fn put(&self, key: Vec<u8>, value: Vec<u8>) -> Result<()> {
let mut delay_ms = 0;
if self.l0_count > L0_THRESHOLD {
delay_ms += 50;
}
if self.total_memory > MEMORY_THRESHOLD {
delay_ms += 50;
}
if self.memtable_size > MEMTABLE_THRESHOLD {
delay_ms += 50;
}
sleep(Duration::from_millis(delay_ms)).await;
self.write(key, value).await
}
Rejected because:
- Additive delays: Multiple dimensions → 150ms delay (too harsh)
- No global view: Doesn't balance dimensions (e.g., high L0 + low memory = less critical)
- Tuning difficulty: 3 independent thresholds to configure
- Uneven response: One dimension can dominate
Example:
Scenario A: L0=high, memory=low, memtable=low
→ 50ms delay (reasonable)
Scenario B: L0=high, memory=high, memtable=high
→ 150ms delay (too harsh, might trigger timeout)
Problem: Doesn't consider overall system health
4. Fixed Delay Progression
Approach: Static delays per pressure level
fn calculate_delay(&self) -> Duration {
match self.pressure_level() {
1 => Duration::from_millis(10),
2 => Duration::from_millis(20),
3 => Duration::from_millis(40),
4 => Duration::from_millis(80),
_ => Duration::ZERO,
}
}
Rejected because:
- Fixed progression: Delays double per level but don't adapt to the rate of change
- No continuous backoff: Delay can't scale within a level, so transient spikes are handled poorly
- Static levels: Requires manual tuning per workload
- No smooth transition: Jumps between fixed delays
Example:
Pressure: 25% → 0ms
Pressure: 50% → 10ms (jump)
Pressure: 75% → 40ms (jump)
Pressure: 90% → 80ms (jump)
Problem: Discrete jumps, no smooth curve
5. Token Bucket Rate Limiting
Approach: Allow X writes/sec, delay when bucket empty
pub struct TokenBucket {
tokens: f64,
rate: f64, // Tokens per second
max: f64, // Bucket capacity
}
async fn put(&self, key: Vec<u8>, value: Vec<u8>) -> Result<()> {
// Wait until token available
self.bucket.acquire(1).await;
self.write(key, value).await
}
Rejected because:
- Fixed rate: Doesn't adapt to compaction progress
- Ignores system state: Tokens refill regardless of L0/memory pressure
- No pressure signal: Clients don't know why they're being throttled
- Complex tuning: Rate must match write throughput and compaction rate
Example:
Bucket: 100 tokens/sec
Scenario A: Low pressure, 50 writes/sec
→ Tokens accumulate (no backpressure)
Scenario B: High pressure, 50 writes/sec
→ Tokens accumulate (still no backpressure!)
Problem: Doesn't respond to system state
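To make the objection concrete, here is a simplified, synchronous token bucket. It is a sketch only: the `refill`, `try_acquire`, and `available` methods are hypothetical stand-ins for the async `acquire` shown above. Note that nothing in the refill path reads L0 count or memory usage.

```rust
/// Simplified synchronous token bucket (illustrative, not the ATLL code).
pub struct TokenBucket {
    tokens: f64,
    rate: f64, // Tokens per second
    max: f64,  // Bucket capacity
}

impl TokenBucket {
    /// Start with a full bucket.
    pub fn new(rate: f64, max: f64) -> Self {
        Self { tokens: max, rate, max }
    }

    /// Refill depends only on elapsed time, never on L0 or memory pressure.
    pub fn refill(&mut self, elapsed_secs: f64) {
        self.tokens = (self.tokens + self.rate * elapsed_secs).min(self.max);
    }

    /// Take `n` tokens if available; returns false when the caller must wait.
    pub fn try_acquire(&mut self, n: f64) -> bool {
        if self.tokens >= n {
            self.tokens -= n;
            true
        } else {
            false
        }
    }

    /// Tokens currently in the bucket.
    pub fn available(&self) -> f64 {
        self.tokens
    }
}
```

With a rate of 100 tokens/sec and 50 writes/sec, every acquire succeeds and the bucket is full again after each second, whatever the pressure score happens to be.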
Rationale
1. Progressive Degradation
Composite scoring provides smooth transition:
fn pressure_score(&self) -> f64 {
let l0_ratio = self.l0_count as f64 / self.config.l0_max as f64;
let memory_ratio = self.total_memory as f64 / self.config.memory_limit as f64;
let memtable_ratio = self.memtable_size as f64 / self.config.memtable_max as f64;
// Weighted average (L0 and memory matter most)
0.4 * l0_ratio + 0.4 * memory_ratio + 0.2 * memtable_ratio
}
Benefits:
- Smooth curve: Pressure increases gradually, not in jumps
- Balanced view: No single dimension dominates
- Tunable weights: Can adjust for workload (e.g., 0.6×L0 for write-heavy)
Example:
State A: L0=50%, memory=30%, memtable=20%
pressure_score = 0.4×0.5 + 0.4×0.3 + 0.2×0.2 = 0.36 (Green)
State B: L0=70%, memory=60%, memtable=40%
pressure_score = 0.4×0.7 + 0.4×0.6 + 0.2×0.4 = 0.60 (Yellow)
State C: L0=90%, memory=85%, memtable=80%
pressure_score = 0.4×0.9 + 0.4×0.85 + 0.2×0.8 = 0.86 (Orange)
Result: Smooth progression from Green → Yellow → Orange
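The zone lookup is used throughout this page but never shown. Below is a minimal sketch, assuming inclusive lower bounds at the default thresholds (0.50, 0.75, 0.90) and free functions in place of methods on the engine type. `WEIGHTS` and `ZONE_THRESHOLDS` are illustrative names, not part of the documented API.

```rust
/// The four pressure zones.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PressureZone {
    Green,
    Yellow,
    Orange,
    Red,
}

/// Default weights as [L0, memory, memtable].
pub const WEIGHTS: [f64; 3] = [0.4, 0.4, 0.2];
/// Default zone lower bounds as [Yellow, Orange, Red].
pub const ZONE_THRESHOLDS: [f64; 3] = [0.50, 0.75, 0.90];

/// Weighted composite of the three utilisation ratios.
pub fn pressure_score(l0_ratio: f64, memory_ratio: f64, memtable_ratio: f64) -> f64 {
    WEIGHTS[0] * l0_ratio + WEIGHTS[1] * memory_ratio + WEIGHTS[2] * memtable_ratio
}

/// Map a score to a zone; each lower bound is inclusive.
pub fn zone_for_score(score: f64) -> PressureZone {
    let [yellow, orange, red] = ZONE_THRESHOLDS;
    if score >= red {
        PressureZone::Red
    } else if score >= orange {
        PressureZone::Orange
    } else if score >= yellow {
        PressureZone::Yellow
    } else {
        PressureZone::Green
    }
}
```

Scores that land exactly on a boundary are sensitive to floating-point rounding, so boundary behavior is best tested with literal scores such as `zone_for_score(0.50)` and not with scores computed from ratios.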
2. Zone-Based Strategies
Each zone applies different backpressure:
pub async fn apply_backpressure(&self, score: f64) -> Result<()> {
match self.zone_for_score(score) {
PressureZone::Green => {
// 0-50%: No delay, emit metrics only
Ok(())
}
PressureZone::Yellow => {
// 50-75%: Soft throttling (linear delay)
let delay_ms = (score - 0.5) / 0.25 * 50.0; // 0-50ms
sleep(Duration::from_millis(delay_ms as u64)).await;
Ok(())
}
PressureZone::Orange => {
// 75-90%: Heavy throttling (exponential delay)
let factor = (score - 0.75) / 0.15; // 0-1
let delay_ms = 50.0 * (2.0_f64.powf(factor * 3.0)); // 50-400ms (doubles every 5 points)
sleep(Duration::from_millis(delay_ms as u64)).await;
Ok(())
}
PressureZone::Red => {
// 90%+: Hard stall (reject writes)
Err(Error::MemoryLimitExceeded)
}
}
}
Delay curves:
Green (0-50%):
delay = 0ms (always)
Yellow (50-75%):
50% → 0ms
60% → 20ms
70% → 40ms
75% → 50ms
Linear growth
Orange (75-90%):
75% → 50ms
80% → 100ms
85% → 200ms
90% → 400ms
Exponential growth (2^x curve)
Red (90%+):
Reject writes immediately
Rationale:
- Green: No overhead, normal throughput
- Yellow: Gentle slowdown, clients notice but are not alarmed
- Orange: Aggressive backoff, clear "slow down" signal
- Red: Last resort, system critically overloaded
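The delay curves can be reproduced with one pure function. This is a sketch, not the engine's code. It assumes the Orange exponent that matches the delay-curve table (the delay doubles every 5 points, from 50 ms at 75% to 400 ms at 90%) and rounds to the nearest millisecond. `delay_for_score` is a hypothetical name.

```rust
use std::time::Duration;

/// Delay for a composite score, or `None` when the write should be rejected (Red).
/// Assumption: Orange uses 2^(3 * factor), so 0.75 -> 50 ms and 0.90 -> 400 ms.
pub fn delay_for_score(score: f64) -> Option<Duration> {
    let ms = if score < 0.50 {
        0.0 // Green: no delay
    } else if score < 0.75 {
        (score - 0.50) / 0.25 * 50.0 // Yellow: linear, 0..50 ms
    } else if score < 0.90 {
        let factor = (score - 0.75) / 0.15; // 0..1 across the Orange zone
        50.0 * 2.0_f64.powf(factor * 3.0) // Orange: exponential, 50..400 ms
    } else {
        return None; // Red: reject the write
    };
    // Round, not truncate, so values like 19.999... become 20 ms.
    Some(Duration::from_millis(ms.round() as u64))
}
```

Rounding is a deliberate choice here: in `f64`, `(0.60 - 0.50) / 0.25 * 50.0` can come out slightly under 20.0, and a truncating `as u64` cast would then give 19 ms.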
3. Multi-Dimensional Awareness
Why composite scoring beats single metrics:
Scenario 1: High L0, low memory
L0: 90% (6 files pending compaction)
Memory: 20% (plenty of RAM available)
Memtable: 30%
Composite: 0.4×0.9 + 0.4×0.2 + 0.2×0.3 = 0.50 (Yellow, not Orange)
Interpretation: High L0 but plenty of memory, compaction will catch up soon
Strategy: Soft throttling (at the Yellow boundary the delay is still near 0ms), not aggressive backoff
Scenario 2: Low L0, high memory
L0: 30% (few files)
Memory: 95% (nearly full)
Memtable: 80%
Composite: 0.4×0.3 + 0.4×0.95 + 0.2×0.8 = 0.66 (Yellow, approaching Orange)
Interpretation: L0 is fine but memory critical, need flush
Strategy: Moderate throttling, trigger memtable flush
Scenario 3: All dimensions high
L0: 85%
Memory: 90%
Memtable: 95%
Composite: 0.4×0.85 + 0.4×0.9 + 0.2×0.95 = 0.89 (Orange, near Red)
Interpretation: System critically overloaded
Strategy: Heavy throttling (300ms+ delay), aggressive compaction
Benefit: Responds to overall system health, not single bottleneck
4. Observable and Debuggable
VizEvent emissions for monitoring:
pub fn emit_pressure_event(&self, zone: PressureZone, score: f64) {
self.meter.emit(VizEvent::PressureZone {
zone: zone.as_str(),
score,
l0_count: self.l0_count,
memory_bytes: self.total_memory,
memtable_bytes: self.memtable_size,
timestamp_ns: self.clock.now().as_nanos(),
});
}
Dashboard visualization:
Pressure Timeline:
100% │ ████ Red
90% │ ████░░░░░░
75% │ ████░░░░░░░░░░░ Orange
50% │ ████░░░░░░░░░░░░░░░░ Yellow
0% │████░░░░░░░░░░░░░░░░░░░░░ Green
└────────────────────────────────
Time →
Zone transitions visible in real-time
Operators can see pressure buildup before failures
Metrics exposed:
// Gauge metrics
pressure_score (0.0-1.0)
pressure_zone (0=Green, 1=Yellow, 2=Orange, 3=Red)
l0_ratio (0.0-1.0)
memory_ratio (0.0-1.0)
// Counter metrics
writes_delayed_total (by zone)
writes_rejected_total
backpressure_time_ms_total
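The counter metrics could be backed by plain atomics. The sketch below is an assumption about shape only: `PressureCounters` and its methods are hypothetical, and the zone index follows the `pressure_zone` gauge encoding (0=Green, 1=Yellow, 2=Orange, 3=Red).

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counter set mirroring the counter metrics listed above.
#[derive(Default)]
pub struct PressureCounters {
    /// writes_delayed_total, indexed by zone (0=Green .. 3=Red).
    pub writes_delayed_total: [AtomicU64; 4],
    pub writes_rejected_total: AtomicU64,
    pub backpressure_time_ms_total: AtomicU64,
}

impl PressureCounters {
    /// Record one write delayed by `delay_ms` while in `zone` (must be 0..=3).
    pub fn record_delay(&self, zone: usize, delay_ms: u64) {
        self.writes_delayed_total[zone].fetch_add(1, Ordering::Relaxed);
        self.backpressure_time_ms_total
            .fetch_add(delay_ms, Ordering::Relaxed);
    }

    /// Record one write rejected in the Red zone.
    pub fn record_rejection(&self) {
        self.writes_rejected_total.fetch_add(1, Ordering::Relaxed);
    }
}
```

Relaxed ordering is enough here because each counter is read independently for reporting and no other memory access is synchronized through it.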
Trade-offs
What We Gained
1. Graceful Degradation
- Smooth slowdown from 0ms → 50ms → 400ms → rejection
- Clients experience latency increase, not sudden errors
- Time to adapt (reduce write rate, batch, defer)
2. Multi-Dimensional Health
- Composite score balances L0, memory, memtable
- Avoids overreacting to single metric spike
- Responds to true system stress
3. Clear Severity Signals
- 4 zones with distinct behaviors
- Operators know severity at a glance
- Clients can implement zone-aware retry logic
4. Configurable Weights
- Adjust for workload (write-heavy, read-heavy, balanced)
- Tune L0 vs memory importance
- Override via config without code changes
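As an illustration of zone-aware retry logic on the client side, the sketch below picks a wait time from the last zone the client observed. Everything in it is hypothetical: the page does not specify how clients learn the zone, and the concrete durations are placeholders, not recommended values.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PressureZone {
    Green,
    Yellow,
    Orange,
    Red,
}

/// Hypothetical client policy: how long to wait before the next write,
/// given the last observed zone and the number of consecutive rejections.
pub fn client_backoff(zone: PressureZone, consecutive_rejections: u32) -> Duration {
    match zone {
        // No pressure: send immediately.
        PressureZone::Green => Duration::ZERO,
        // Server is already delaying writes; add a small client-side pause.
        PressureZone::Yellow => Duration::from_millis(10),
        PressureZone::Orange => Duration::from_millis(100),
        // Rejected: exponential backoff, capped at 2^5 times the base.
        PressureZone::Red => {
            let exp = consecutive_rejections.min(5);
            Duration::from_millis(200 * 2u64.pow(exp))
        }
    }
}
```

A production client would normally add random jitter to the Red-zone delay so that rejected clients do not retry in lockstep.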
What We Gave Up
1. Simplicity
- More complex than hard stall or single threshold
- Composite formula requires tuning (default works for most)
- 4 zones vs binary on/off
2. Maximum Throughput
- Yellow/Orange zones add latency (by design)
- Reduces peak write rate before hitting limits
- Trade throughput for stability
3. Predictability
- Zone transitions depend on multiple metrics
- Not a simple "L0 > 6 → reject" rule
- Requires monitoring to understand behavior
Configuration
Default configuration:
pub struct PressureConfig {
/// L0 file count limit
/// Default: 6
pub l0_max: usize,
/// Total memory limit (bytes)
/// Default: 512 MB
pub memory_limit: usize,
/// Memtable size limit (bytes)
/// Default: 64 MB
pub memtable_max: usize,
/// Composite score weights (must sum to 1.0)
/// Default: [0.4, 0.4, 0.2]
pub weights: [f64; 3], // [L0, memory, memtable]
/// Zone boundaries (pressure score thresholds)
/// Default: [0.50, 0.75, 0.90]
pub zone_thresholds: [f64; 3], // [Yellow, Orange, Red]
}
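The struct documents two invariants (weights sum to 1.0, thresholds in ascending order) that are easy to break when tuning. The sketch below checks them; `validate` is a hypothetical helper, not part of the documented API, and the `Default` values are copied from the doc comments above.

```rust
#[derive(Debug, Clone)]
pub struct PressureConfig {
    pub l0_max: usize,
    pub memory_limit: usize,
    pub memtable_max: usize,
    pub weights: [f64; 3],         // [L0, memory, memtable]
    pub zone_thresholds: [f64; 3], // [Yellow, Orange, Red]
}

impl Default for PressureConfig {
    fn default() -> Self {
        Self {
            l0_max: 6,
            memory_limit: 512 * 1024 * 1024,
            memtable_max: 64 * 1024 * 1024,
            weights: [0.4, 0.4, 0.2],
            zone_thresholds: [0.50, 0.75, 0.90],
        }
    }
}

impl PressureConfig {
    /// Hypothetical sanity check for a tuned configuration.
    pub fn validate(&self) -> Result<(), String> {
        let sum: f64 = self.weights.iter().sum();
        if (sum - 1.0).abs() > 1e-9 {
            return Err(format!("weights must sum to 1.0, got {sum}"));
        }
        if self.weights.iter().any(|w| *w < 0.0) {
            return Err("weights must be non-negative".to_string());
        }
        let [yellow, orange, red] = self.zone_thresholds;
        if !(0.0 < yellow && yellow < orange && orange < red && red <= 1.0) {
            return Err("thresholds must satisfy 0 < yellow < orange < red <= 1".to_string());
        }
        // The limits are divisors in the ratio computation.
        if self.l0_max == 0 || self.memory_limit == 0 || self.memtable_max == 0 {
            return Err("limits must be non-zero".to_string());
        }
        Ok(())
    }
}
```

The weight check uses a small tolerance because sums such as 0.6 + 0.3 + 0.1 are not guaranteed to equal 1.0 exactly in `f64`.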
Tuning guidelines:
Write-Heavy Workloads
PressureConfig {
l0_max: 8, // Allow more L0 accumulation
memory_limit: 1024 * 1024 * 1024, // 1 GB
weights: [0.6, 0.3, 0.1], // Prioritize L0 over memory
..Default::default()
}
Effect:
- Higher L0 tolerance (8 files vs 6)
- L0 ratio weighted more heavily (60% vs 40%)
- Less backpressure from memory spikes
Read-Heavy Workloads
PressureConfig {
l0_max: 4, // Keep L0 low (faster reads)
memory_limit: 2048 * 1024 * 1024, // 2 GB (more cache)
weights: [0.5, 0.3, 0.2], // Favor L0 (default is [0.4, 0.4, 0.2])
..Default::default()
}
Effect:
- Lower L0 limit (better read performance)
- More memory for block cache
- Earlier backpressure to maintain low L0
High-Pressure Tolerance
Effect:
- Yellow zone: 60-85% (starts later, less sensitive)
- Orange zone: 85-95% (narrower, more tolerance)
- Red zone: 95%+ (only truly critical)
- Less aggressive backpressure overall
Low-Latency SLOs
Effect:
- Yellow zone: 40-65% (earlier warning)
- Orange zone: 65-85% (wider, more aggressive)
- Red zone: 85%+ (earlier hard stall)
- Preemptive backpressure to maintain low latency
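The High-Pressure Tolerance and Low-Latency SLOs profiles list only their effects. The threshold arrays below are inferred from those zone ranges and are an assumption, as are the constant names. In a `PressureConfig` they would presumably be set through `zone_thresholds` with `..Default::default()` for the remaining fields. The helper shows how the same score falls into different zones under each profile.

```rust
/// Zone lower bounds as [Yellow, Orange, Red], matching `zone_thresholds`.
pub const DEFAULT: [f64; 3] = [0.50, 0.75, 0.90];
/// Inferred from the High-Pressure Tolerance effect list (60-85%, 85-95%, 95%+).
pub const HIGH_TOLERANCE: [f64; 3] = [0.60, 0.85, 0.95];
/// Inferred from the Low-Latency SLOs effect list (40-65%, 65-85%, 85%+).
pub const LOW_LATENCY: [f64; 3] = [0.40, 0.65, 0.85];

/// Zone index under the given thresholds: 0=Green, 1=Yellow, 2=Orange, 3=Red.
/// Counts how many lower bounds the score has reached.
pub fn zone_index(score: f64, thresholds: [f64; 3]) -> u8 {
    thresholds.iter().filter(|&&t| score >= t).count() as u8
}
```

For example, a score of 0.80 is Orange under the defaults, Yellow under the high-tolerance profile, and Orange under the low-latency profile, while 0.87 is already Red under the low-latency profile.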
Zone Evolution Examples
Example 1: Gradual Write Burst
Workload: Steady increase from 1K writes/sec → 10K writes/sec
Time │ L0 │ Mem │ MT │ Score │ Zone │ Delay
──────┼─────┼──────┼─────┼───────┼────────┼──────
00:00 │ 20% │ 30% │ 10% │ 0.22 │ Green │ 0ms
00:10 │ 40% │ 50% │ 30% │ 0.42 │ Green │ 0ms
00:20 │ 60% │ 65% │ 50% │ 0.60 │ Yellow │ 20ms
00:30 │ 75% │ 75% │ 70% │ 0.74 │ Yellow │ 48ms
00:40 │ 85% │ 85% │ 85% │ 0.85 │ Orange │ 200ms
00:50 │ 90% │ 90% │ 90% │ 0.90 │ Red │ Reject
Compaction kicks in aggressively at 00:40
Pressure stabilizes at Yellow zone by 01:00
Example 2: Transient Memory Spike
Workload: Sudden 200 MB write batch
Time │ L0 │ Mem │ MT │ Score │ Zone │ Delay
──────┼─────┼──────┼─────┼───────┼────────┼──────
00:00 │ 30% │ 40% │ 20% │ 0.32 │ Green │ 0ms
00:01 │ 35% │ 85% │ 70% │ 0.62 │ Yellow │ 24ms ← Spike
00:02 │ 40% │ 50% │ 30% │ 0.42 │ Green │ 0ms ← Recovered
Memtable flushed quickly (200 MB → SSTable)
Composite score smooths out spike (85% memory but 35% L0)
No Orange zone entered (avoided overreaction)
Example 3: Sustained High Load
Workload: 8K writes/sec sustained for 10 minutes
Time │ L0 │ Mem │ MT │ Score │ Zone │ Delay
──────┼─────┼──────┼─────┼───────┼────────┼──────
00:00 │ 50% │ 55% │ 40% │ 0.50 │ Yellow │ 0ms ← Enter Yellow
00:30 │ 70% │ 70% │ 60% │ 0.68 │ Yellow │ 36ms
01:00 │ 80% │ 78% │ 75% │ 0.78 │ Orange │ 76ms ← Enter Orange
02:00 │ 82% │ 80% │ 78% │ 0.80 │ Orange │ 100ms
05:00 │ 78% │ 76% │ 74% │ 0.76 │ Orange │ 57ms ← Compaction catching up
10:00 │ 65% │ 60% │ 55% │ 0.61 │ Yellow │ 22ms ← Stabilized
System never hits Red zone
Backpressure + compaction reach equilibrium in Orange zone
Gradual recovery to Yellow after workload completes
Validation
Property test: Pressure monotonicity
#[test]
fn test_pressure_increases_with_load() {
let mut lsm = LSM::new(PressureConfig::default());
let mut prev_score = 0.0;
// Incrementally add L0 files
for _ in 0..6 {
lsm.add_l0_file(1024 * 1024); // 1 MB each
let score = lsm.pressure_score();
assert!(score >= prev_score); // Monotonic increase
prev_score = score;
}
// Check zone transitions
assert_eq!(lsm.zone_for_score(0.30), PressureZone::Green);
assert_eq!(lsm.zone_for_score(0.60), PressureZone::Yellow);
assert_eq!(lsm.zone_for_score(0.80), PressureZone::Orange);
assert_eq!(lsm.zone_for_score(0.95), PressureZone::Red);
}
Benchmark: Backpressure overhead
Baseline (no pressure):
PUT latency: 1.2ms (p50), 2.1ms (p95)
Yellow zone (score=0.60):
PUT latency: 21.5ms (p50), 32.8ms (p95)
Overhead: +20ms delay (expected)
Orange zone (score=0.85):
PUT latency: 201.3ms (p50), 285.6ms (p95)
Overhead: ~200ms delay (expected)
Red zone (score=0.95):
PUT latency: N/A (writes rejected)
Error rate: 100% (expected)
Integration test: Recovery from Red zone
#[tokio::test]
async fn test_recovery_from_red_zone() {
let lsm = LSM::new(PressureConfig::default());
// Fill L0 to Red zone
for _ in 0..10 {
let _ = lsm.flush_memtable().await;
}
assert_eq!(lsm.pressure_zone(), PressureZone::Red);
// Trigger aggressive compaction
lsm.compact_all_slots().await.unwrap();
// After compaction, should return to Green
assert_eq!(lsm.pressure_zone(), PressureZone::Green);
}
Summary
4-zone adaptive backpressure is the right choice for ATLL because:
- Progressive degradation - Smooth transition from 0ms → 50ms → 400ms → rejection
- Multi-dimensional health - Composite score balances L0, memory, memtable
- Observable behavior - Clear zones with VizEvent emissions and metrics
- Configurable response - Tune weights and thresholds per workload
We accept these trade-offs:
- Complexity (vs hard stall only)
- Reduced peak throughput (by design, for stability)
- Multiple metrics to monitor (vs single threshold)
For heterogeneous workloads with bursty writes, 4-zone backpressure prevents cliff-edge failures while maintaining observable, predictable behavior.
Last Updated: 2025-10-31
See Also: ATLL Architecture, Write Path