
Caches and Their Issues at Scale

Jan 4, 2026 | 11 min read

tl;dr: Multi-layer caching architectures, invalidation strategies, cache stampede mitigation, distributed consistency challenges, and production patterns for scaling cache systems.

Caching is one of the most effective performance optimizations—until it isn’t. At scale, caches introduce complexity, consistency challenges, and failure modes that can bring down your system.

Here’s what you need to know beyond “add Redis.”

The Multi-Layer Cache Architecture

Effective caching uses multiple layers, each with different characteristics:

Multi-Layer Cache Architecture

graph TD
    Request[Request] --> L1C(L1 In-Memory<br/>Latency under 1ms<br/>Hit Rate 60-80%)
    L1C -->|Hit| ReturnA[Return]
    L1C -->|Miss| L2C(L2 Distributed<br/>Redis, Memcached<br/>Latency 1-5ms<br/>Hit Rate 40-60%)
    L2C -->|Hit| L1C
    L2C -->|Miss| DB[(Database<br/>Latency 10-100ms<br/>Hit Rate 5-20%)]
    DB --> L2C
    L2C --> L1C
    L1C --> ReturnB[Return]

Layer 1: In-Memory (L1)

  • Location: Process memory
  • Latency: <1ms
  • Size: Limited (MBs)
  • Hit Rate: 60-80%
  • Use Case: Frequently accessed, small data

Layer 2: Distributed (L2)

  • Location: Redis, Memcached
  • Latency: 1-5ms
  • Size: Large (GBs)
  • Hit Rate: 40-60%
  • Use Case: Shared across instances, medium-term storage

Layer 3: Database

  • Location: Persistent storage
  • Latency: 10-100ms
  • Size: Unlimited
  • Hit Rate: 5-20% (cache misses)
  • Use Case: Source of truth, long-term storage

Implementation Pattern

class MultiLayerCache {
  constructor(l1Cache, l2Cache, database) {
    this.l1Cache = l1Cache;   // in-process cache (e.g. an LRU map)
    this.l2Cache = l2Cache;   // distributed cache client (e.g. Redis)
    this.database = database; // source of truth
  }
  
  async get(key) {
    // L1: Check in-memory first
    const l1Value = this.l1Cache.get(key);
    if (l1Value) {
      return l1Value;
    }
    
    // L2: Check distributed cache
    const l2Value = await this.l2Cache.get(key);
    if (l2Value) {
      // Populate L1
      this.l1Cache.set(key, l2Value, { ttl: 60 });
      return l2Value;
    }
    
    // L3: Hit database
    const dbValue = await this.database.get(key);
    
    // Populate both caches
    await this.l2Cache.set(key, dbValue, { ttl: 300 });
    this.l1Cache.set(key, dbValue, { ttl: 60 });
    
    return dbValue;
  }
}

Key insight: Each layer reduces load on the layer below. A 90% L1 hit rate means only 10% of requests check L2. A 90% L2 hit rate means only 1% hit the database.

Cache Invalidation Strategies

Cache invalidation is one of the hardest problems in computer science. Here are the main strategies:

1. TTL (Time To Live): Simple, automatic, but stale data possible
2. Write-Through: Always consistent, but slower writes
3. Write-Behind: Fast writes, but risk of data loss
4. Invalidation: Always fresh, but complex to track

1. TTL (Time To Live)

The simplest approach: expire cache entries after a fixed time.

await cache.set('user:123', userData, { ttl: 300 }); // 5 minutes

Pros:

  • Simple to implement
  • Automatic cleanup
  • No invalidation logic needed

Cons:

  • Stale data possible
  • Wasted cache space (expired but not yet evicted)
  • No control over when data becomes stale

Use when: Data changes infrequently, staleness is acceptable.

2. Write-Through

Update both cache and database synchronously.

async function updateUser(userId, data) {
  // Write to database first
  await db.update('users', userId, data);
  
  // Then update cache
  await cache.set(`user:${userId}`, data);
  
  return data;
}

Pros:

  • Always consistent
  • Cache always has latest data
  • Simple to reason about

Cons:

  • Slower writes (two operations)
  • Cache write can fail independently
  • More complex error handling

Use when: Consistency is critical, write performance is less important.

3. Write-Behind (Write-Back)

Update cache immediately, write to database asynchronously.

// writeQueue is a shared queue drained by a background worker (sketched below)
async function updateUser(userId, data) {
  // Update cache immediately
  await cache.set(`user:${userId}`, data);
  
  // Write to database asynchronously (the worker persists it later)
  writeQueue.enqueue({
    type: 'update',
    table: 'users',
    id: userId,
    data
  });
  
  return data;
}

Pros:

  • Fast writes (cache update only)
  • Better user experience
  • Can batch database writes

Cons:

  • Risk of data loss (cache lost before DB write)
  • Eventual consistency
  • Complex failure handling

Use when: Write performance is critical, some data loss is acceptable.
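
The write queue above is left abstract. A minimal sketch of a worker that drains it and batches writes to the database (the queue class, batch size, and flush interval are assumptions for illustration, not a production-ready design):

// Illustrative in-memory write queue with periodic batch flush; a real system
// would persist queued writes so they survive a crash before the DB write
class WriteQueue {
  constructor(db, { batchSize = 100, flushIntervalMs = 1000 } = {}) {
    this.db = db;
    this.items = [];
    this.batchSize = batchSize;
    setInterval(() => this.flush(), flushIntervalMs);
  }
  
  enqueue(item) {
    this.items.push(item);
    if (this.items.length >= this.batchSize) {
      this.flush();
    }
  }
  
  async flush() {
    if (this.items.length === 0) return;
    const batch = this.items.splice(0, this.items.length);
    for (const item of batch) {
      try {
        await this.db.update(item.table, item.id, item.data);
      } catch (error) {
        // Failed writes go back on the queue for the next flush
        this.items.push(item);
      }
    }
  }
}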

4. Invalidation

Delete cache entries when data changes.

async function updateUser(userId, data) {
  // Update database
  await db.update('users', userId, data);
  
  // Invalidate cache
  await cache.delete(`user:${userId}`);
  
  // Next read will fetch from DB and repopulate cache
}

Pros:

  • Always fresh data on next read
  • No stale data risk
  • Simple concept

Cons:

  • Complex to track what to invalidate
  • Cache misses after invalidation
  • Need to know all cache keys

Use when: You can track dependencies, want fresh data.

Choosing a Strategy

The right strategy depends on your requirements:

Strategy      | Consistency | Performance | Complexity | Best For
TTL           | Low         | High        | Low        | Read-heavy, acceptable staleness
Write-Through | High        | Medium      | Medium     | Critical data, consistency required
Write-Behind  | Low         | High        | High       | Write-heavy, performance critical
Invalidation  | High        | Medium      | High       | Need fresh data, can track keys

Hybrid approach: Use different strategies for different data types:

const strategies = {
  userProfile: 'write-through',  // Critical, must be consistent
  analytics: 'ttl',              // Can be stale, high read volume
  session: 'write-behind',      // Performance critical, can lose some
  productCatalog: 'invalidation' // Need fresh, can track changes
};

Cache Stampede (Thundering Herd)

When a cache entry expires, multiple requests can simultaneously try to refresh it, overwhelming the database.

Cache Stampede Problem

Scenario:

  1. Cache expires at T0
  2. 1000 requests arrive at T1
  3. All 1000 requests see cache miss
  4. All 1000 requests query database simultaneously
  5. Database is overwhelmed

Solution: Probabilistic early expiration + mutex/lock

  • Refresh cache before it expires (last 20% of TTL)
  • Only one request fetches on cache miss (mutex)
  • Other requests wait and retry

The Problem

// BAD: All requests hit DB when cache expires
async function getUser(userId) {
  let user = await cache.get(`user:${userId}`);
  
  if (!user) {
    // Cache expired - all requests hit DB simultaneously!
    user = await db.getUser(userId);
    await cache.set(`user:${userId}`, user, { ttl: 300 });
  }
  
  return user;
}

What happens:

  1. Cache expires at T0
  2. 1000 requests arrive at T1
  3. All 1000 requests see cache miss
  4. All 1000 requests query database
  5. Database is overwhelmed

Solution: Probabilistic Early Expiration + Mutex

Refresh the cache before it expires (but only probabilistically), and use a lock so only one request fetches on a true miss:

class CacheWithStampedeProtection {
  async get(key) {
    const cached = await this.cache.get(key);
    
    if (cached) {
      // Check if we should refresh early
      const age = Date.now() - cached.timestamp;
      const ttl = cached.ttl;
      const refreshWindow = ttl * 0.2; // Last 20% of TTL
      
      if (age > ttl - refreshWindow) {
        // Probabilistic refresh: 10% chance
        if (Math.random() < 0.1) {
          // Refresh in background (don't block)
          this.refreshInBackground(key);
        }
      }
      
      return cached.value;
    }
    
    // Cache miss - use mutex to prevent stampede
    return await this.getWithMutex(key);
  }
  
  async getWithMutex(key) {
    // Try to acquire lock
    const lock = await this.lock.acquire(`lock:${key}`, 1000);
    
    if (lock) {
      try {
        // Double-check cache (another request might have populated it)
        let value = await this.cache.get(key);
        if (value) {
          return value.value;
        }
        
        // Fetch from database
        value = await this.database.get(key);
        
        // Populate cache
        await this.cache.set(key, {
          value,
          timestamp: Date.now(),
          ttl: 300000
        }, { ttl: 300 });
        
        return value;
      } finally {
        await lock.release();
      }
    } else {
      // Another request is fetching - wait and retry
      await this.sleep(50);
      return await this.get(key);
    }
  }
  
  async refreshInBackground(key) {
    // Don't block - refresh asynchronously
    setImmediate(async () => {
      try {
        const value = await this.database.get(key);
        await this.cache.set(key, {
          value,
          timestamp: Date.now(),
          ttl: 300000
        }, { ttl: 300 });
      } catch (error) {
        // Ignore errors - cache still valid
      }
    });
  }
  
  sleep(ms) {
    // Pause briefly before retrying the cache lookup
    return new Promise((resolve) => setTimeout(resolve, ms));
  }
}

Key techniques:

  1. Early refresh: Refresh before expiration (last 20% of TTL)
  2. Probabilistic: Only some requests refresh (10% chance)
  3. Mutex/lock: Only one request fetches on cache miss
  4. Double-check: Verify cache after acquiring lock
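
The lock in the sketch above is left abstract. One common way to back it is a single Redis key set with NX and a short expiry. A rough sketch assuming the ioredis client (key names and timeouts are illustrative, and this is the simple single-node variant, not Redlock):

const Redis = require('ioredis');
const crypto = require('crypto');

class RedisLock {
  constructor(redis) {
    this.redis = redis;
  }
  
  // Try to acquire the lock for ttlMs; returns a release handle or null
  async acquire(key, ttlMs) {
    const token = crypto.randomUUID();
    // SET key token PX ttlMs NX -- only succeeds if the key doesn't exist yet
    const ok = await this.redis.set(key, token, 'PX', ttlMs, 'NX');
    if (ok !== 'OK') return null;
    
    return {
      release: async () => {
        // Only delete the lock if we still own it (token must match)
        const script = `
          if redis.call('get', KEYS[1]) == ARGV[1] then
            return redis.call('del', KEYS[1])
          else
            return 0
          end`;
        await this.redis.eval(script, 1, key, token);
      }
    };
  }
}

// Usage in the class above: this.lock = new RedisLock(new Redis());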

Distributed Cache Consistency

In a distributed system with multiple cache nodes, keeping them consistent is challenging.

Problem: Multiple cache nodes can have different versions of the same data.

Solutions:

  1. Write-Through to All Nodes: Update all nodes synchronously (simple, but slow)
  2. Event Bus for Invalidation: Publish invalidation events (decoupled, but eventual consistency)
  3. Versioning: Compare versions, reject stale reads (handles stale data gracefully)
  4. Accept Eventual Consistency: Use where acceptable, strong consistency where required

The Problem

When you have multiple cache instances:

// Node 1 has: user:123 = "Alice" (version 1)
// Node 2 has: user:123 = "Bob" (version 0) - stale!
// Node 3 has: user:123 = "Alice" (version 1)

// User reads from Node 2 → gets stale data

Solutions

1. Write-Through to All Nodes

Update all cache nodes synchronously:

async function updateUser(userId, data) {
  await db.update('users', userId, data);
  
  // Update all cache nodes
  await Promise.all([
    cacheNode1.set(`user:${userId}`, data),
    cacheNode2.set(`user:${userId}`, data),
    cacheNode3.set(`user:${userId}`, data)
  ]);
}

Pros: Simple, consistent
Cons: Slow (all nodes), one failure affects all

2. Event Bus for Invalidation

Publish invalidation events:

async function updateUser(userId, data) {
  await db.update('users', userId, data);
  
  // Publish invalidation event
  await eventBus.publish('user.updated', {
    userId,
    timestamp: Date.now()
  });
}

// Each cache node subscribes
eventBus.subscribe('user.updated', async (event) => {
  await cache.delete(`user:${event.userId}`);
});

Pros: Decoupled, scalable
Cons: Eventual consistency, event delivery not guaranteed
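
The eventBus here is abstract; Redis pub/sub is one lightweight way to implement it. A minimal sketch assuming ioredis, with a dedicated connection for subscribing (the channel name and payload shape are illustrative):

const Redis = require('ioredis');

const pub = new Redis();
const sub = new Redis(); // a subscribed connection can't issue regular commands

// Publisher side: announce that a user changed
async function publishUserUpdated(userId) {
  await pub.publish('user.updated', JSON.stringify({ userId, timestamp: Date.now() }));
}

// Subscriber side: each cache node drops its local copy
async function subscribeToInvalidations(localCache) {
  await sub.subscribe('user.updated');
  sub.on('message', (channel, message) => {
    const event = JSON.parse(message);
    localCache.delete(`user:${event.userId}`);
  });
}

Redis pub/sub is fire-and-forget, which is exactly where the "event delivery not guaranteed" caveat comes from.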

3. Versioning

Include version numbers, reject stale reads:

async function get(key) {
  // Illustrative: read both copies so the versions can be compared; in practice
  // you would fetch only a lightweight version number from the database
  const cached = await cache.get(key);
  const dbValue = await database.get(key);
  
  if (cached && cached.version >= dbValue.version) {
    return cached.value; // Cache is fresh or newer
  }
  
  // Cache is stale, use DB value
  await cache.set(key, {
    value: dbValue.value,
    version: dbValue.version
  });
  
  return dbValue.value;
}

Pros: Handles stale data gracefully
Cons: More complex, version tracking needed

4. Accept Eventual Consistency

Sometimes, eventual consistency is acceptable:

// User profile can be slightly stale
// Product catalog can be slightly stale
// Analytics can be stale

// But payment data must be consistent

Key insight: Not all data needs strong consistency. Use eventual consistency where acceptable, strong consistency where required.

Cache Warming

Cold starts are a problem: when a cache is empty (after deployment, restart, or eviction), the first requests are slow.

Cache Warming Strategies

1. Pre-Deployment Warming: Load data before deployment (most effective, requires planning)
2. Lazy Loading: Load on first request, refresh in background (simple, but first user slow)
3. Predictive Warming: ML-based prediction of likely requests (optimal, but complex)
4. Hybrid Approach: Pre-warm critical data, lazy load rest (balanced, practical)

Strategies

1. Pre-Deployment Warming

Load cache before deployment:

class CacheWarmer {
  async warmCache() {
    // Load popular items
    const popularUsers = await db.query(`
      SELECT * FROM users 
      ORDER BY last_login DESC 
      LIMIT 1000
    `);
    
    for (const user of popularUsers) {
      await cache.set(`user:${user.id}`, user, { ttl: 3600 });
    }
    
    // Load frequently accessed data
    const popularProducts = await db.query(`
      SELECT * FROM products 
      WHERE views > 1000
    `);
    
    for (const product of popularProducts) {
      await cache.set(`product:${product.id}`, product, { ttl: 1800 });
    }
  }
}

2. Lazy Loading with Background Refresh

Load on first request, refresh in background:

async function get(key) {
  let value = await cache.get(key);
  
  if (!value) {
    // Cache miss - fetch from DB
    value = await database.get(key);
    await cache.set(key, value, { ttl: 300 });
    
    // Schedule background refresh before expiry
    scheduleRefresh(key, 240); // Refresh at 80% of TTL
  }
  
  return value;
}
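
scheduleRefresh isn't defined above; a minimal sketch using a timer (in a real system you would dedupe timers per key and cap how many are outstanding):

// Refresh a key shortly before it expires so the next reader finds a warm cache
function scheduleRefresh(key, delaySeconds) {
  setTimeout(async () => {
    try {
      const fresh = await database.get(key);
      await cache.set(key, fresh, { ttl: 300 });
    } catch (error) {
      // If the refresh fails, the entry simply expires and is lazily reloaded
    }
  }, delaySeconds * 1000);
}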

3. Predictive Warming

Use ML or patterns to predict what will be needed:

class PredictiveWarmer {
  async warmBasedOnTime() {
    const hour = new Date().getHours();
    
    // Morning: warm up daily reports
    if (hour >= 8 && hour < 10) {
      await this.warmDailyReports();
    }
    
    // Evening: warm up analytics
    if (hour >= 18 && hour < 20) {
      await this.warmAnalytics();
    }
  }
  
  async warmBasedOnPatterns() {
    // Analyze access patterns
    const patterns = await this.analyzeAccessPatterns();
    
    // Warm likely-to-be-accessed items
    for (const pattern of patterns) {
      await this.warmPattern(pattern);
    }
  }
}

4. Hybrid Approach

Combine strategies:

class HybridWarmer {
  async warm() {
    // 1. Pre-warm critical data
    await this.warmCriticalData();
    
    // 2. Lazy load everything else
    // (handled in get() method)
    
    // 3. Predictive warm based on time/patterns
    await this.warmPredictive();
  }
}

Memory Pressure and Eviction

When cache is full, you need eviction policies:

LRU (Least Recently Used):

  • Evict items not accessed recently
  • Good for: Temporal locality

LFU (Least Frequently Used):

  • Evict items accessed least often
  • Good for: Long-term popularity

TTL-based:

  • Evict expired items
  • Good for: Time-sensitive data

Random:

  • Evict random items
  • Good for: Simple, no overhead

A simple LRU cache built on a Map's insertion order:

class LRUCache {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.cache = new Map(); // Ordered by insertion
  }
  
  get(key) {
    if (this.cache.has(key)) {
      // Move to end (most recently used)
      const value = this.cache.get(key);
      this.cache.delete(key);
      this.cache.set(key, value);
      return value;
    }
    return null;
  }
  
  set(key, value) {
    if (this.cache.has(key)) {
      // Update existing
      this.cache.delete(key);
    } else if (this.cache.size >= this.maxSize) {
      // Evict least recently used (first item)
      const firstKey = this.cache.keys().next().value;
      this.cache.delete(firstKey);
    }
    
    this.cache.set(key, value);
  }
}
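
LFU, described in the list above, can be sketched the same way by tracking access counts (an unoptimized illustration; real implementations such as Redis's LFU mode use approximated counters):

class LFUCache {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.cache = new Map();  // key -> value
    this.counts = new Map(); // key -> access count
  }
  
  get(key) {
    if (!this.cache.has(key)) return null;
    this.counts.set(key, this.counts.get(key) + 1);
    return this.cache.get(key);
  }
  
  set(key, value) {
    if (!this.cache.has(key) && this.cache.size >= this.maxSize) {
      // Evict the least frequently used key (linear scan is fine for a sketch)
      let coldestKey = null;
      let coldestCount = Infinity;
      for (const [k, count] of this.counts) {
        if (count < coldestCount) {
          coldestKey = k;
          coldestCount = count;
        }
      }
      this.cache.delete(coldestKey);
      this.counts.delete(coldestKey);
    }
    this.cache.set(key, value);
    this.counts.set(key, this.counts.get(key) || 1);
  }
}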

Real-World Scaling Patterns

Pattern 1: Cache-Aside (Lazy Loading)

Application manages cache:

async function getUser(userId) {
  // Check cache
  let user = await cache.get(`user:${userId}`);
  
  if (!user) {
    // Cache miss - fetch from DB
    user = await db.getUser(userId);
    
    // Populate cache
    await cache.set(`user:${userId}`, user, { ttl: 300 });
  }
  
  return user;
}

Pattern 2: Read-Through

Cache handles DB access:

// Cache automatically fetches from DB on miss
const user = await cache.get(`user:${userId}`);
// Cache handles DB lookup internally
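
A minimal sketch of what a read-through wrapper might look like: callers only talk to the cache, and the cache calls a loader function (an assumed async key-to-value lookup) on a miss:

class ReadThroughCache {
  constructor(store, loader, ttl = 300) {
    this.store = store;   // underlying cache (Redis client, in-memory map, ...)
    this.loader = loader; // async (key) => value, e.g. a database query
    this.ttl = ttl;
  }
  
  async get(key) {
    const cached = await this.store.get(key);
    if (cached !== undefined && cached !== null) return cached;
    
    // Miss: the cache itself fetches and stores the value
    const value = await this.loader(key);
    await this.store.set(key, value, { ttl: this.ttl });
    return value;
  }
}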

Pattern 3: Write-Through

Cache writes to both cache and DB:

await cache.set(`user:${userId}`, userData);
// Cache automatically writes to DB
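
Similarly, a write-through wrapper can hide the double write behind a single set() call (the writer function is an assumption standing in for your database upsert):

class WriteThroughCache {
  constructor(store, writer, ttl = 300) {
    this.store = store;   // underlying cache
    this.writer = writer; // async (key, value) => void, e.g. a DB upsert
    this.ttl = ttl;
  }
  
  async set(key, value) {
    // Database first, then cache, so the cache never holds unpersisted data
    await this.writer(key, value);
    await this.store.set(key, value, { ttl: this.ttl });
    return value;
  }
}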

Key Takeaways

  1. Use multiple layers: L1 (memory) → L2 (distributed) → L3 (database)
  2. Choose invalidation strategy: Based on consistency vs. performance needs
  3. Prevent cache stampede: Probabilistic early expiration, mutexes
  4. Handle distributed consistency: Event bus, versioning, or accept eventual consistency
  5. Warm your cache: Pre-deployment, lazy loading, or predictive
  6. Monitor cache metrics: Hit rate, latency, memory usage, eviction rate

Caching is powerful, but it adds complexity. The best cache implementations are invisible—they just make everything faster without introducing bugs.

Monitor your cache hit rates. If they’re below 80%, you’re not caching effectively. If they’re above 95%, you might be over-caching.
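
A lightweight way to get that signal is to count hits and misses where the lookups happen and export the ratio (the counter names are illustrative; in practice you would feed these into your metrics system):

const stats = { hits: 0, misses: 0 };

async function getWithStats(key) {
  const value = await cache.get(key);
  if (value !== undefined && value !== null) {
    stats.hits++;
    return value;
  }
  stats.misses++;
  return null;
}

function hitRate() {
  const total = stats.hits + stats.misses;
  return total === 0 ? 0 : stats.hits / total; // aim for roughly 0.80-0.95
}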

Scaling your cache architecture? I provide performance reviews, cache strategy design, and production-ready patterns for high-scale systems. Let's discuss your setup.

P.S. Follow me on Twitter where I share engineering insights, system design patterns, and technical leadership perspectives.
