Adaptive K-Way Fanout¶
Why each slot chooses its own K-max (maximum sorted runs) based on heat score instead of using a global strategy.
Decision¶
Each slot independently chooses its K-max (maximum sorted runs allowed) based on its heat score using this formula:
k_max[slot] = 1 + floor((1 - heat_score) × (K_global - 1))
Where:
heat_score ∈ [0.0, 1.0] // 0 = cold, 1 = hot
K_global = 4 // Maximum runs allowed (configurable)
Result: - Hot slots (heat_score ≥ 0.8) → k_max = 1 (leveled, no overlaps) - Warm slots (heat_score ≈ 0.5) → k_max = 2 (hybrid) - Cold slots (heat_score ≤ 0.2) → k_max = 3-4 (tiered, bounded overlaps)
Alternatives Considered¶
1. Pure Leveled (Global K=1)¶
Approach: All slots maintain K=1 (single sorted run, no overlaps)
Rejected because: - High write amplification: 40-100x WA for all data (even cold) - Constant I/O: Compaction never stops (even for rarely accessed data) - No adaptation: Hot/cold treated identically - Wasted resources: Compact cold data for no benefit
Example (cold slot):
Cold slot compacted 100 times/day:
Each compaction rewrites 10 MB
Total I/O: 1 GB/day
Benefit: Fast reads (but rarely read!)
Cost: 1 GB/day wasted I/O
2. Pure Tiered (Global K>1)¶
Approach: All slots allow K>1 (multiple runs, bounded overlaps)
Rejected because: - High read amplification: 10-15 RA for all data (even hot) - Slow point queries: Must check 4 runs per slot - Poor SLO compliance: p95 latency >20ms (unacceptable for hot data) - No adaptation: Hot/cold treated identically
Example (hot slot):
Hot slot (1000 queries/sec):
Each query checks 4 runs
Total I/O: 4000 checks/sec
Benefit: Low WA (fast writes)
Cost: 4x slower reads (SLO violation)
3. Manual Per-Slot Configuration¶
Approach: Operator configures K per slot
ATLLConfig {
slot_k_max: vec![
1, // Slot 0: Known hot range
4, // Slot 1: Known cold range
2, // Slot 2: Warm range
...
],
}
Rejected because: - Requires workload knowledge: Operator must analyze access patterns - Static configuration: No adaptation to changes - Brittle: Wrong assumptions → poor performance - Complex: 64 slots × manual tuning = 64 knobs
Example:
Initial config: Slot 0 = K=1 (thought hot)
Workload shift:
Users stop accessing Slot 0
Slot 0 becomes cold
Problem: Still compacting to K=1 (wasted I/O)
Solution: Operator must notice and reconfigure
4. Time-Based Strategy (ScyllaDB ICS)¶
Approach: Recent data = leveled, old data = tiered
Slot 0 (last hour): K=1 (leveled)
Slot 1 (last 24h): K=2 (hybrid)
Slot 2 (last 30d): K=4 (tiered)
Slot 3 (older): K=8 (heavily tiered)
Rejected because: - Assumes time-series: Not general-purpose - Recent ≠ hot: Many workloads access old data frequently - Manual window config: Requires tuning - Key format dependency: Must extract timestamp from key
Example (counterexample):
E-commerce database:
Recent orders (last week): 10% of queries (not hot)
Historical orders (>1 year): 70% of queries (hot for analytics)
Time-based strategy:
Recent → K=1 (wasted I/O, rarely queried)
Old → K=8 (slow queries, frequently queried)
Result: Backwards from optimal
5. Fixed Thresholds (No Continuous Mapping)¶
Approach: Discrete heat buckets
Rejected because: - Cliff edges: Small heat change → large K change - Oscillation: Heat fluctuates around 0.8 → constant K changes - Inefficient: No gradations (only 3 strategies)
Example:
Slot heat: 0.79 → K=2
Write burst → heat: 0.81 → K=1 (trigger compaction)
Heat decays → heat: 0.79 → K=2 (allow 2nd run)
Write burst → heat: 0.81 → K=1 (trigger compaction again)
Problem: Thrashing between K=1 and K=2
Rationale¶
1. RUM Optimization Per Slot¶
RUM Conjecture: For any data structure, R × U × M ≥ Constant
ATLL's insight: Optimize per slot on the RUM Pareto frontier
Hot slot (heat_score = 0.9):
k_max = 1 + floor((1 - 0.9) × 3) = 1 + floor(0.3) = 1
RUM trade-off:
R (read cost): Low (RA = L0 + 1 = 7)
U (update cost): High (WA = 20x, frequent compaction)
M (memory cost): Low (no overlaps, compact storage)
Justification: Frequent reads benefit from low RA
Cold slot (heat_score = 0.1):
k_max = 1 + floor((1 - 0.1) × 3) = 1 + floor(2.7) = 3
RUM trade-off:
R (read cost): High (RA = L0 + 3 = 9)
U (update cost): Low (WA = 6x, rare compaction)
M (memory cost): Medium (some overlaps, 1.2x SA)
Justification: Rare reads tolerate higher RA, save WA
Continuous mapping: Smooth transition from leveled → tiered
2. Smooth Adaptation (No Cliff Edges)¶
Formula properties:
Gradient:
heat_score: 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0
k_max: 1 1 1 2 2 2 3 3 3 3 4
└───leveled───┘ └──hybrid──┘ └────tiered────┘
Benefit: Gradual transition, no oscillation
Example:
Slot heat evolves: 0.85 → 0.80 → 0.75 → 0.70
K-max evolution: 1 → 1 → 1 → 2
Result: K stays stable until significant heat drop
No thrashing
3. Mathematical Justification¶
Write Amplification (slot-level):
WA_slot = T × compaction_frequency
Where:
T = fanout (default 10)
compaction_frequency = f(k_max)
For k_max=1 (leveled):
Compact every L0 flush → high frequency
WA ≈ 10 × 2 = 20x
For k_max=4 (tiered):
Compact every 4 L0 flushes → low frequency
WA ≈ 10 × 0.5 = 5x
Read Amplification (slot-level):
RA_slot = L0_files + k_max
For k_max=1:
RA = 6 + 1 = 7 (fast)
For k_max=4:
RA = 6 + 4 = 10 (slower, but acceptable for cold data)
Space Amplification:
SA_slot = 1 + (k_max - 1) / T
For k_max=1:
SA = 1 + 0/10 = 1.0 (no overhead)
For k_max=4:
SA = 1 + 3/10 = 1.3 (30% overhead)
Trade-off: Hot slots pay WA for low RA, cold slots save WA, accept higher RA
4. Workload-Driven (Not Assumption-Driven)¶
Heat-based K adapts to actual access patterns:
Scenario 1: Recent orders hot
Orders (last 30d) → heat=0.9 → K=1 (fast reads)
Orders (>1 year) → heat=0.1 → K=4 (low WA)
Scenario 2: Historical analytics hot
Orders (last 30d) → heat=0.2 → K=3 (low WA, few writes)
Orders (>1 year) → heat=0.8 → K=1 (fast analytics queries)
Result: Adapts to both scenarios without manual tuning
Contrast with time-based:
Time-based assumes recent=hot, old=cold
→ Works for Scenario 1
→ Fails for Scenario 2
Heat-based measures actual access
→ Works for both scenarios
Trade-offs¶
What We Gained¶
1. Per-Range Optimization - Hot ranges: Low RA, fast queries - Cold ranges: Low WA, efficient writes - No global compromise
2. Automatic Adaptation - Heat increases → K decreases (toward leveled) - Heat decreases → K increases (toward tiered) - No manual intervention
3. Smooth Transitions - Continuous mapping (no cliff edges) - Stable K values (no thrashing) - Predictable behavior
4. Workload Agnostic - Works for time-series, key-value, analytics - No assumptions about access patterns - Self-tuning
What We Gave Up¶
1. Simplicity - More complex than global K=1 or K=4 - Requires heat tracking subsystem - Per-slot state management
2. Predictability - K changes over time (not static) - Harder to reason about worst-case - Variable compaction schedules
3. Control - Operator can't override K per slot (without code changes) - No manual "pin this slot to K=1" mode - Learning period before convergence
K-Max Evolution Examples¶
Example 1: Hot Slot (User Sessions)¶
Workload: 1000 queries/sec to active user sessions
Week 1 (new slot, no history):
heat_score = 0.0 (default)
k_max = 1 + floor((1 - 0.0) × 3) = 4 (tiered initially)
Day 2 (after 100K queries):
heat_score → 0.85 (EWMA converged)
k_max = 1 + floor((1 - 0.85) × 3) = 1 (leveled)
Week 2 (stable workload):
heat_score = 0.9 (stable)
k_max = 1 (stable)
RA = 7 (fast queries)
WA = 20x (acceptable for hot data)
Example 2: Cold Slot (Archived Data)¶
Workload: 10 queries/day to archived orders
Week 1 (new slot):
heat_score = 0.0
k_max = 4 (tiered)
Week 2 (10 queries over 7 days):
heat_score → 0.05 (very low)
k_max = 1 + floor((1 - 0.05) × 3) = 3 (tiered)
Month 1 (stable low access):
heat_score = 0.1 (stable)
k_max = 3 (stable)
RA = 9 (acceptable for rare queries)
WA = 6x (low I/O overhead)
Example 3: Workload Shift¶
Scenario: Cold slot becomes hot due to analytics job
Initial state (cold):
heat_score = 0.1
k_max = 3 (tiered)
Analytics job starts (1000 queries/sec):
Day 1: heat_score → 0.3, k_max = 3
Day 2: heat_score → 0.5, k_max = 2
Day 3: heat_score → 0.7, k_max = 2
Day 5: heat_score → 0.85, k_max = 1
Analytics job completes:
Week 2: heat_score → 0.6 (decay)
Week 3: heat_score → 0.3, k_max = 3
Week 4: heat_score → 0.1 (back to cold)
Result: Automatically adapted to temporary workload shift
Configuration¶
Default:
pub struct ATLLConfig {
/// Maximum K allowed for cold slots
/// Default: 4
/// Recommendation: 4-8 for most workloads
pub k_global: usize,
}
Tuning guidelines:
Write-Heavy Workloads¶
Effect: - Cold slots: k_max up to 8 (very tiered) - Lower WA: 4-6x (vs 6-8x with K=4) - Higher RA: 14 (vs 10, acceptable for cold data)
Read-Heavy Workloads¶
Effect: - Cold slots: k_max up to 2 (mild tiering) - Higher WA: 10-12x (vs 6-8x with K=4) - Lower RA: 8 (vs 10, faster cold reads)
Balanced (Default)¶
Effect: - Hot slots: k_max=1 (leveled) - Cold slots: k_max=3-4 (tiered) - WA: 8-20x, RA: 5-12
Validation¶
Benchmark results (Zipfian workload, 80/20 hot/cold):
Hot slots (20% of data, 80% of queries):
k_max converged to: 1
RA: 7
WA: 18x
p95 latency: 8.2ms
Cold slots (80% of data, 20% of queries):
k_max converged to: 3-4
RA: 9-10
WA: 6x
p95 latency: 15.3ms (acceptable for rare queries)
Overall:
Weighted WA: 0.8×18 + 0.2×6 = 15.6x (vs 40-100x for pure leveled)
Weighted RA: 0.8×7 + 0.2×9.5 = 7.5 (vs 10-15 for pure tiered)
Property test: K-max convergence
#[test]
fn test_k_max_convergence() {
let mut slot = Slot::new(0, vec![0x00], vec![0x40]);
// Simulate hot workload (1000 accesses)
for _ in 0..1000 {
slot.update_heat(1.0); // Access
}
// After convergence
assert!(slot.heat_score > 0.85);
assert_eq!(slot.k_max(), 1); // Leveled
// Simulate workload shift (no accesses for 1000 iterations)
for _ in 0..1000 {
slot.update_heat(0.0); // No access
}
// After decay
assert!(slot.heat_score < 0.15);
assert!(slot.k_max() >= 3); // Tiered
}
Summary¶
Adaptive K-way fanout per slot is the right choice because:
- RUM optimization per range - Hot slots minimize R, cold slots minimize U
- Automatic adaptation - Heat-driven, not manual configuration
- Smooth transitions - Continuous formula, no cliff edges
- Workload agnostic - Measures actual access, no assumptions
We accept these trade-offs: - Complexity (vs global K) - Variable K over time (vs static) - Learning period before convergence
For heterogeneous workloads (Zipfian, multi-tenant), adaptive K achieves near-Pareto-optimal performance without manual tuning.
Last Updated: 2025-10-31 See Also: Heat Tracking, ATLL Architecture