Skip to content

Heat Tracking (EWMA)

Why we use Exponential Weighted Moving Average (α=0.1) for access pattern detection instead of counters or sliding windows.


Decision

Track access frequency per slot using EWMA (Exponential Weighted Moving Average) with α=0.1:

pub struct Slot {
    heat_score: f32,  // ∈ [0.0, 1.0], 0 = cold, 1 = hot
}

impl Slot {
    pub fn update_heat(&mut self) {
        const ALPHA: f32 = 0.1;  // Smoothing factor
        self.heat_score = ALPHA * 1.0 + (1.0 - ALPHA) * self.heat_score;
    }

    pub fn decay_heat(&mut self) {
        const ALPHA: f32 = 0.1;
        self.heat_score = ALPHA * 0.0 + (1.0 - ALPHA) * self.heat_score;
    }
}

Update trigger: Every read/write to a slot calls update_heat()

Decay trigger: Periodic background task (every 60 seconds) calls decay_heat() for all slots


Alternatives Considered

1. Simple Access Counters

Approach: Count accesses per slot

pub struct Slot {
    access_count: u64,  // Incremented on every access
}

fn is_hot(slot: &Slot) -> bool {
    slot.access_count > THRESHOLD  // e.g., 1000
}

Rejected because: - No time decay: Old accesses count forever - Unbounded growth: Counter overflows after 2^64 accesses - Threshold tuning: What is "hot"? 1000? 10000? 1M? - No recency bias: Access from 1 year ago = access from 1 second ago

Example problem:

Slot was hot (10M accesses) in 2023
Slot is cold (0 accesses) in 2024

Counter value: 10,000,000 (still looks hot!)
Reality: Cold (no recent accesses)

2. Sliding Window

Approach: Count accesses in last N seconds

pub struct Slot {
    access_times: VecDeque<Instant>,  // Queue of access timestamps
}

impl Slot {
    pub fn update_heat(&mut self, now: Instant) {
        // Add current access
        self.access_times.push_back(now);

        // Remove accesses older than 60 seconds
        while let Some(t) = self.access_times.front() {
            if now - *t > Duration::from_secs(60) {
                self.access_times.pop_front();
            } else {
                break;
            }
        }
    }

    pub fn heat_score(&self) -> f32 {
        self.access_times.len() as f32 / WINDOW_SIZE
    }
}

Rejected because: - Memory overhead: Store timestamp per access (8 bytes × accesses/sec × window) - Cliff edge: Access from 59.9s ago counts, 60.1s ago doesn't - Expensive cleanup: Must scan queue on every access - Poor cache locality: VecDeque allocations

Example overhead:

Hot slot: 1000 accesses/sec
Window: 60 seconds

Memory per slot:
  1000 × 60 × 8 bytes = 480 KB

For 64 slots:
  64 × 480 KB = 30 MB (just for timestamps!)

vs EWMA:
  64 slots × 4 bytes = 256 bytes

3. LRU/LFU Tracking

Approach: Track least-recently-used or least-frequently-used

pub struct Slot {
    last_access: Instant,   // LRU
    access_count: u64,      // LFU
}

fn is_hot_lru(slot: &Slot, now: Instant) -> bool {
    now - slot.last_access < Duration::from_secs(60)
}

fn is_hot_lfu(slot: &Slot) -> bool {
    slot.access_count > THRESHOLD
}

Rejected because: - LRU: Binary (accessed recently or not), no gradations - LFU: Same issues as simple counters (no time decay) - Both: Don't capture "hotness" over time - Tuning: Requires threshold configuration

4. Histogram / Bucket Counting

Approach: Track accesses in time buckets

pub struct Slot {
    buckets: [u64; 60],  // 60 buckets (1 per second)
    current_bucket: usize,
}

impl Slot {
    pub fn update_heat(&mut self, now: Instant) {
        let bucket = (now.as_secs() % 60) as usize;

        // Rotate to new bucket if needed
        if bucket != self.current_bucket {
            self.buckets[bucket] = 0;
            self.current_bucket = bucket;
        }

        // Increment current bucket
        self.buckets[bucket] += 1;
    }

    pub fn heat_score(&self) -> f32 {
        let total: u64 = self.buckets.iter().sum();
        total as f32 / (60 * MAX_ACCESSES_PER_SEC)
    }
}

Rejected because: - Fixed granularity: 1-second buckets (too coarse or too fine) - Cliff edges: Access rotates out after exactly 60 seconds - Memory overhead: 60 × 8 bytes = 480 bytes per slot - Complexity: Bucket rotation logic

5. Multi-Dimensional Heat

Approach: Separate heat for reads, writes, scans

pub struct Slot {
    read_heat: f32,   // Read access frequency
    write_heat: f32,  // Write access frequency
    scan_heat: f32,   // Range scan frequency
}

Rejected because: - Complexity: 3× state, 3× decay logic - K-max formula: How to combine? k_max = f(read_heat, write_heat, scan_heat)? - Unclear benefit: Hot reads/writes both benefit from K=1 - Future work: May revisit for specialized optimizations


Rationale

1. Recency Bias

EWMA gives more weight to recent accesses:

heat_new = 0.1 × 1.0 + 0.9 × heat_old
           └─recent─┘   └──historical──┘

Decay over time:

No access for t steps:
  heat_t = 0.9^t × heat_0

Examples:
  After 10 steps: heat = 0.9^10 × heat_0 = 0.35 × heat_0 (35% remaining)
  After 20 steps: heat = 0.9^20 × heat_0 = 0.12 × heat_0 (12% remaining)
  After 30 steps: heat = 0.9^30 × heat_0 = 0.04 × heat_0 (4% remaining)

Benefit: Adapts to workload shifts

Example:

Slot was hot (heat=0.9) last week
Slot is now cold (no accesses this week)

Counter approach: Still looks hot (10M count)
EWMA approach: heat decays to 0.04 after 30 intervals

2. Convergence Properties

EWMA converges to steady state:

For stable access pattern (access every step):
  heat_∞ = lim(t→∞) heat_t
         = α / (1 - (1-α))
         = 0.1 / 0.1
         = 1.0

For stable non-access (no accesses):
  heat_∞ = 0.0

Convergence rate:
  error_t = (1-α)^t × initial_error
  error_30 = 0.9^30 × 1.0 ≈ 0.04 (4% error)

Conclusion: Converges within 20-30 steps

Validation:

#[test]
fn test_ewma_convergence() {
    let mut slot = Slot::new();

    // Access for 30 steps
    for _ in 0..30 {
        slot.update_heat();
    }

    assert!(slot.heat_score > 0.96);  // 96% of steady state

    // No access for 30 steps
    for _ in 0..30 {
        slot.decay_heat();
    }

    assert!(slot.heat_score < 0.04);  // 4% of previous
}

3. Low Memory Overhead

EWMA state:

pub struct Slot {
    heat_score: f32,  // 4 bytes
}

Total memory (64 slots):

64 slots × 4 bytes = 256 bytes

Comparison:

EWMA:              256 bytes
Sliding window:    30 MB (1000 accesses/sec × 60 sec × 8 bytes × 64 slots)
Histogram:         30 KB (60 buckets × 8 bytes × 64 slots)

EWMA wins by 100-100,000×

4. Simple Implementation

Update logic:

pub fn update_heat(&mut self) {
    self.heat_score = 0.1 * 1.0 + 0.9 * self.heat_score;
}

Complexity: 2 multiplications, 1 addition (~5 CPU cycles)

Contrast with sliding window:

pub fn update_heat(&mut self, now: Instant) {
    self.access_times.push_back(now);  // Allocation

    // Scan and remove old accesses
    while let Some(t) = self.access_times.front() {
        if now - *t > Duration::from_secs(60) {
            self.access_times.pop_front();  // Deallocation
        } else {
            break;
        }
    }
}

Complexity: O(N) scan, allocations, pointer chasing


Mathematical Analysis

EWMA Formula

Recursive form:

heat_t = α × access_t + (1-α) × heat_{t-1}

Where:
  heat_t     = heat score at time t
  α = 0.1    = smoothing factor (weight of recent observation)
  access_t   = 1 if accessed at time t, 0 otherwise
  heat_{t-1} = previous heat score

Expanded form (n steps of constant access):

heat_n = α × (1 + (1-α) + (1-α)^2 + ... + (1-α)^{n-1})

Geometric series:
  sum = α × (1 - (1-α)^n) / (1 - (1-α))
      = α × (1 - (1-α)^n) / α
      = 1 - (1-α)^n

As n → ∞:
  heat_∞ = 1.0

Decay (n steps of no access):

heat_n = (1-α)^n × heat_0

Examples (α=0.1):
  n=10: heat = 0.9^10 × heat_0 = 0.349 × heat_0
  n=20: heat = 0.9^20 × heat_0 = 0.122 × heat_0
  n=30: heat = 0.9^30 × heat_0 = 0.042 × heat_0

Half-Life

Time for heat to decay to 50%:

(1-α)^n = 0.5

n = log(0.5) / log(1-α)
  = log(0.5) / log(0.9)
  = -0.693 / -0.105
  ≈ 6.6 steps

Interpretation: Heat halves every ~7 accesses

Practical implications:

If slot accessed every 1 second:
  Half-life: 7 seconds

If slot accessed every 60 seconds:
  Half-life: 7 minutes

If slot accessed every 3600 seconds (1 hour):
  Half-life: 7 hours

Smoothing Effect

Bursty workload:

Access pattern: 100 accesses in 1 second, then 59 seconds idle

Counter approach:
  Count=100 (looks very hot)

EWMA approach:
  After 100 accesses: heat ≈ 1.0 (hot)
  After 59 idle steps: heat ≈ 0.002 (cold)

Result: EWMA filters out burst, shows true average


Alpha (α) Tuning

Trade-off: α controls recency bias vs stability

Large α (0.3 - 0.5)

Effect: - More responsive: Adapts quickly to changes - Less stable: Fluctuates with short-term noise - Shorter memory: Forgets old accesses faster

Use case: Rapidly changing workloads (dev/test environments)

Example (α=0.5):

Convergence: 5-10 steps (fast)
Half-life: 1 step (very short memory)
Stability: Low (noisy)

Small α (0.01 - 0.05)

Effect: - Less responsive: Slow to adapt - More stable: Filters out noise - Longer memory: Retains old accesses longer

Use case: Stable production workloads

Example (α=0.01):

Convergence: 300 steps (slow)
Half-life: 69 steps (long memory)
Stability: High (smooth)

Default α = 0.1

Rationale: Balanced trade-off

Convergence: 20-30 steps (moderate)
Half-life: 7 steps (reasonable memory)
Stability: Good (filters noise, responds to trends)

Validation: Tested with Zipfian workloads, α=0.1 converged within 30 accesses


Decay Strategy

Problem: Slots with no recent accesses should cool down

Solution: Periodic background decay

pub async fn decay_heat_task(&self) {
    loop {
        tokio::time::sleep(Duration::from_secs(60)).await;

        for slot in &mut self.slots {
            slot.decay_heat();
        }
    }
}

impl Slot {
    pub fn decay_heat(&mut self) {
        const ALPHA: f32 = 0.1;
        self.heat_score = ALPHA * 0.0 + (1.0 - ALPHA) * self.heat_score;
        // Equivalent to: self.heat_score *= 0.9
    }
}

Frequency: Every 60 seconds (configurable)

Effect:

Slot with no accesses for 10 minutes (10 decay cycles):
  heat_new = 0.9^10 × heat_old
           = 0.349 × heat_old

Example:
  Initial heat: 0.9 (hot)
  After 10 min: 0.31 (warm)
  After 20 min: 0.11 (cold)
  After 30 min: 0.04 (very cold)


Configuration

pub struct ATLLConfig {
    /// EWMA smoothing factor (α)
    /// Default: 0.1
    /// Range: 0.01 - 0.5
    /// Higher = more responsive, less stable
    pub heat_alpha: f32,

    /// Heat decay interval (seconds)
    /// Default: 60
    /// How often to decay heat for idle slots
    pub heat_decay_interval_secs: u64,
}

Tuning guidelines:

Rapidly changing workloads:

ATLLConfig {
    heat_alpha: 0.2,                  // Faster adaptation
    heat_decay_interval_secs: 30,    // More frequent decay
    ..Default::default()
}

Stable workloads:

ATLLConfig {
    heat_alpha: 0.05,                // Smoother, slower adaptation
    heat_decay_interval_secs: 120,   // Less frequent decay
    ..Default::default()
}


Future Enhancements

1. Adaptive Alpha

Idea: Adjust α based on workload variance

pub fn adaptive_alpha(&self) -> f32 {
    let variance = self.compute_heat_variance();

    if variance > 0.5 {
        0.2  // High variance → more responsive
    } else {
        0.05  // Low variance → more stable
    }
}

2. Multi-Dimensional Heat

Idea: Track read vs write heat separately

pub struct Slot {
    read_heat: f32,   // For query-heavy ranges
    write_heat: f32,  // For write-heavy ranges
}

pub fn k_max(&self) -> usize {
    // Prioritize read heat for K-max decision
    1 + floor((1.0 - self.read_heat) × (K_global - 1))
}

3. Per-Access-Type Weighting

Idea: Weight different access types differently

pub fn update_heat(&mut self, access_type: AccessType) {
    let weight = match access_type {
        AccessType::PointRead => 1.0,   // Standard weight
        AccessType::RangeScan => 0.5,   // Lower weight (less selective)
        AccessType::Write => 0.3,       // Even lower (benefits from tiering)
    };

    self.heat_score = ALPHA * weight + (1.0 - ALPHA) * self.heat_score;
}

Summary

EWMA heat tracking with α=0.1 is the right choice because:

  1. Recency bias - Recent accesses weigh more than old
  2. Convergence - Stabilizes within 20-30 accesses
  3. Low overhead - 4 bytes per slot (vs MB for sliding window)
  4. Simple - 2 multiplications, 1 addition per update
  5. Proven - Standard technique in signal processing, networking

We accept these trade-offs: - Not instant (convergence period) - Fixed α (no per-slot tuning without code changes) - Single-dimensional (read/write/scan treated equally)

For most workloads, α=0.1 with 60-second decay provides the right balance of responsiveness and stability.


Last Updated: 2025-10-31 See Also: Adaptive K-Way Fanout, Bandit Scheduler