Caching and Its Issues at Scale
11 min read
tl;dr: Multi-layer caching architectures, invalidation strategies, cache stampede mitigation, distributed consistency challenges, and production patterns for scaling cache systems.
Caching is one of the most effective performance optimizations—until it isn’t. At scale, caches introduce complexity, consistency challenges, and failure modes that can bring down your system.
Here’s what you need to know beyond “add Redis.”
The Multi-Layer Cache Architecture
Effective caching uses multiple layers, each with different characteristics:
Multi-Layer Cache Architecture
graph TD
Request[Request] --> L1C(L1 In-Memory<br/>Latency under 1ms<br/>Hit Rate 60-80%)
L1C -->|Hit| ReturnA[Return]
L1C -->|Miss| L2C(L2 Distributed<br/>Redis, Memcached<br/>Latency 1-5ms<br/>Hit Rate 40-60%)
L2C -->|Hit| L1C
L2C -->|Miss| DB[(Database<br/>Latency 10-100ms<br/>Serves 5-20% of requests)]
DB --> L2C
L2C --> L1C
L1C --> ReturnB[Return]
Layer 1: In-Memory (L1)
- Location: Process memory
- Latency: <1ms
- Size: Limited (MBs)
- Hit Rate: 60-80%
- Use Case: Frequently accessed, small data
Layer 2: Distributed (L2)
- Location: Redis, Memcached
- Latency: 1-5ms
- Size: Large (GBs)
- Hit Rate: 40-60%
- Use Case: Shared across instances, medium-term storage
Layer 3: Database
- Location: Persistent storage
- Latency: 10-100ms
- Size: Unlimited
- Request share: 5-20% (the cache misses that fall through to the database)
- Use Case: Source of truth, long-term storage
Implementation Pattern
class MultiLayerCache {
  constructor({ l1Cache, l2Cache, database }) {
    this.l1Cache = l1Cache;   // in-process cache (e.g. an LRU map)
    this.l2Cache = l2Cache;   // distributed cache client (e.g. Redis)
    this.database = database; // source of truth
  }

  async get(key) {
    // L1: Check in-memory first
    const l1Value = this.l1Cache.get(key);
    if (l1Value != null) {
      return l1Value;
    }

    // L2: Check distributed cache
    const l2Value = await this.l2Cache.get(key);
    if (l2Value != null) {
      // Populate L1
      this.l1Cache.set(key, l2Value, { ttl: 60 });
      return l2Value;
    }

    // L3: Hit the database (source of truth)
    const dbValue = await this.database.get(key);
    if (dbValue != null) {
      // Populate both caches
      await this.l2Cache.set(key, dbValue, { ttl: 300 });
      this.l1Cache.set(key, dbValue, { ttl: 60 });
    }
    return dbValue;
  }
}
Key insight: Each layer reduces load on the layer below. A 90% L1 hit rate means only 10% of requests check L2. A 90% L2 hit rate means only 1% hit the database.
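To put numbers on that, here is a quick back-of-the-envelope calculation using the latency ranges above and illustrative hit rates (the figures are examples, not benchmarks):

// Expected request latency for the three-layer setup, given per-layer
// hit rates and latencies. Figures below are illustrative.
function expectedLatencyMs({ l1Hit, l2Hit, l1Ms, l2Ms, dbMs }) {
  return (
    l1Hit * l1Ms +
    (1 - l1Hit) * (l2Hit * l2Ms + (1 - l2Hit) * dbMs)
  );
}

// 70% L1 hits, 50% L2 hits on the remainder:
// 0.7 * 1 + 0.3 * (0.5 * 3 + 0.5 * 50) = 8.65ms on average
console.log(expectedLatencyMs({ l1Hit: 0.7, l2Hit: 0.5, l1Ms: 1, l2Ms: 3, dbMs: 50 }));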
Cache Invalidation Strategies
Cache invalidation is one of the hardest problems in computer science. Here are the main strategies:
Cache Invalidation Strategies
1. TTL (Time To Live): Simple, automatic, but stale data possible
2. Write-Through: Always consistent, but slower writes
3. Write-Behind: Fast writes, but risk of data loss
4. Invalidation: Always fresh, but complex to track
1. TTL (Time To Live)
The simplest approach: expire cache entries after a fixed time.
await cache.set('user:123', userData, { ttl: 300 }); // 5 minutes
Pros:
- Simple to implement
- Automatic cleanup
- No invalidation logic needed
Cons:
- Stale data possible
- Wasted cache space (expired but not yet evicted)
- No control over when data becomes stale
Use when: Data changes infrequently, staleness is acceptable.
2. Write-Through
Update both cache and database synchronously.
async function updateUser(userId, data) {
// Write to database first
await db.update('users', userId, data);
// Then update cache
await cache.set(`user:${userId}`, data);
return data;
}
Pros:
- Always consistent
- Cache always has latest data
- Simple to reason about
Cons:
- Slower writes (two operations)
- Cache write can fail independently
- More complex error handling
Use when: Consistency is critical, write performance is less important.
3. Write-Behind (Write-Back)
Update cache immediately, write to database asynchronously.
// Pending writes, drained by a background worker (see the flusher sketch below)
const writeQueue = [];

async function updateUser(userId, data) {
  // Update cache immediately
  await cache.set(`user:${userId}`, data);

  // Queue the database write to happen asynchronously
  writeQueue.push({
    type: 'update',
    table: 'users',
    id: userId,
    data
  });

  return data;
}
Pros:
- Fast writes (cache update only)
- Better user experience
- Can batch database writes
Cons:
- Risk of data loss (cache lost before DB write)
- Eventual consistency
- Complex failure handling
Use when: Write performance is critical, some data loss is acceptable.
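The pros above mention batching: the queued writes need a consumer that drains them to the database. A minimal flusher sketch, assuming the writeQueue array from the example and a hypothetical db.batchUpdate(table, rows) helper:

// Drain queued writes on a timer and batch them per table.
// db.batchUpdate is a hypothetical helper, not a specific library API.
const FLUSH_INTERVAL_MS = 500;

setInterval(async () => {
  if (writeQueue.length === 0) return;

  // Take everything currently queued and group it by table
  const pending = writeQueue.splice(0, writeQueue.length);
  const byTable = new Map();
  for (const write of pending) {
    if (!byTable.has(write.table)) byTable.set(write.table, []);
    byTable.get(write.table).push({ id: write.id, data: write.data });
  }

  for (const [table, rows] of byTable) {
    try {
      await db.batchUpdate(table, rows);
    } catch (error) {
      // Re-queue on failure so the next tick retries. Anything still only
      // in memory is lost if the process dies - the core write-behind risk.
      writeQueue.push(...pending.filter((w) => w.table === table));
    }
  }
}, FLUSH_INTERVAL_MS);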
4. Invalidation
Delete cache entries when data changes.
async function updateUser(userId, data) {
// Update database
await db.update('users', userId, data);
// Invalidate cache
await cache.delete(`user:${userId}`);
// Next read will fetch from DB and repopulate cache
}
Pros:
- Always fresh data on next read
- No stale data risk
- Simple concept
Cons:
- Complex to track what to invalidate
- Cache misses after invalidation
- Need to know all cache keys
Use when: You can track dependencies, want fresh data.
Choosing a Strategy
The right strategy depends on your requirements:
| Strategy | Consistency | Performance | Complexity | Best For |
|---|---|---|---|---|
| TTL | Low | High | Low | Read-heavy, acceptable staleness |
| Write-Through | High | Medium | Medium | Critical data, consistency required |
| Write-Behind | Low | High | High | Write-heavy, performance critical |
| Invalidation | High | Medium | High | Need fresh data, can track keys |
Hybrid approach: Use different strategies for different data types:
const strategies = {
userProfile: 'write-through', // Critical, must be consistent
analytics: 'ttl', // Can be stale, high read volume
session: 'write-behind', // Performance critical, can lose some
productCatalog: 'invalidation' // Need fresh, can track changes
};
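One way to act on a map like this is a small dispatcher on the write path. The sketch below reuses the db, cache, and writeQueue handles from the earlier examples; it is one possible wiring, not a prescribed pattern:

// Route each write through the strategy configured for its data type.
async function write(dataType, id, value) {
  const key = `${dataType}:${id}`;

  switch (strategies[dataType]) {
    case 'write-through':
      await db.update(dataType, id, value);
      await cache.set(key, value);
      return value;

    case 'write-behind':
      await cache.set(key, value);
      writeQueue.push({ type: 'update', table: dataType, id, data: value });
      return value;

    case 'invalidation':
      await db.update(dataType, id, value);
      await cache.delete(key);
      return value;

    case 'ttl':
    default:
      // TTL-managed data: write to the DB and let the cached copy expire
      await db.update(dataType, id, value);
      return value;
  }
}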
Cache Stampede (Thundering Herd)
When a cache entry expires, multiple requests can simultaneously try to refresh it, overwhelming the database.
Cache Stampede Problem
Scenario:
- Cache expires at T0
- 1000 requests arrive at T1
- All 1000 requests see cache miss
- All 1000 requests query database simultaneously
- Database is overwhelmed
Solution: Probabilistic early expiration + mutex/lock
- Refresh cache before it expires (last 20% of TTL)
- Only one request fetches on cache miss (mutex)
- Other requests wait and retry
The Problem
// BAD: All requests hit DB when cache expires
async function getUser(userId) {
let user = await cache.get(`user:${userId}`);
if (!user) {
// Cache expired - all requests hit DB simultaneously!
user = await db.getUser(userId);
await cache.set(`user:${userId}`, user, { ttl: 300 });
}
return user;
}
What happens:
- Cache expires at T0
- 1000 requests arrive at T1
- All 1000 requests see cache miss
- All 1000 requests query database
- Database is overwhelmed
Solution: Probabilistic Early Expiration
Refresh cache before it expires, but only probabilistically:
class CacheWithStampedeProtection {
  constructor({ cache, database, lock }) {
    this.cache = cache;       // distributed cache client
    this.database = database; // source of truth
    this.lock = lock;         // distributed lock client
  }

  async get(key) {
    const cached = await this.cache.get(key);
    if (cached) {
      // Check if we should refresh early
      const age = Date.now() - cached.timestamp;
      const ttl = cached.ttl;
      const refreshWindow = ttl * 0.2; // Last 20% of TTL

      if (age > ttl - refreshWindow) {
        // Probabilistic refresh: 10% chance
        if (Math.random() < 0.1) {
          // Refresh in background (don't block)
          this.refreshInBackground(key);
        }
      }
      return cached.value;
    }

    // Cache miss - use mutex to prevent stampede
    return await this.getWithMutex(key);
  }

  async getWithMutex(key) {
    // Try to acquire lock (1 second timeout)
    const lock = await this.lock.acquire(`lock:${key}`, 1000);

    if (lock) {
      try {
        // Double-check cache (another request might have populated it)
        let value = await this.cache.get(key);
        if (value) {
          return value.value;
        }

        // Fetch from database
        value = await this.database.get(key);

        // Populate cache (timestamp and ttl stored in ms for the age check above)
        await this.cache.set(key, {
          value,
          timestamp: Date.now(),
          ttl: 300000
        }, { ttl: 300 });

        return value;
      } finally {
        await lock.release();
      }
    } else {
      // Another request is fetching - wait and retry
      await this.sleep(50);
      return await this.get(key);
    }
  }

  async refreshInBackground(key) {
    // Don't block - refresh asynchronously
    setImmediate(async () => {
      try {
        const value = await this.database.get(key);
        await this.cache.set(key, {
          value,
          timestamp: Date.now(),
          ttl: 300000
        }, { ttl: 300 });
      } catch (error) {
        // Ignore errors - the existing cache entry is still valid
      }
    });
  }

  sleep(ms) {
    return new Promise((resolve) => setTimeout(resolve, ms));
  }
}
Key techniques:
- Early refresh: Refresh before expiration (last 20% of TTL)
- Probabilistic: Only some requests refresh (10% chance)
- Mutex/lock: Only one request fetches on cache miss
- Double-check: Verify cache after acquiring lock
Distributed Cache Consistency
In a distributed system with multiple cache nodes, keeping them consistent is challenging.
Distributed Cache Consistency
Problem: Multiple cache nodes can have different versions of the same data.
Solutions:
- Write-Through to All Nodes: Update all nodes synchronously (simple, but slow)
- Event Bus for Invalidation: Publish invalidation events (decoupled, but eventual consistency)
- Versioning: Compare versions, reject stale reads (handles stale data gracefully)
- Accept Eventual Consistency: Use where acceptable, strong consistency where required
The Problem
When you have multiple cache instances:
// Node 1 has: user:123 = "Alice" (version 1)
// Node 2 has: user:123 = "Bob" (version 0) - stale!
// Node 3 has: user:123 = "Alice" (version 1)
// User reads from Node 2 → gets stale data
Solutions
1. Write-Through to All Nodes
Update all cache nodes synchronously:
async function updateUser(userId, data) {
await db.update('users', userId, data);
// Update all cache nodes
await Promise.all([
cacheNode1.set(`user:${userId}`, data),
cacheNode2.set(`user:${userId}`, data),
cacheNode3.set(`user:${userId}`, data)
]);
}
Pros: Simple, consistent. Cons: Slow (every write hits all nodes); one node failure affects every write.
2. Event Bus for Invalidation
Publish invalidation events:
async function updateUser(userId, data) {
await db.update('users', userId, data);
// Publish invalidation event
await eventBus.publish('user.updated', {
userId,
timestamp: Date.now()
});
}
// Each cache node subscribes
eventBus.subscribe('user.updated', async (event) => {
await cache.delete(`user:${event.userId}`);
});
Pros: Decoupled, scalable. Cons: Eventual consistency; event delivery is not guaranteed.
3. Versioning
Include version numbers, reject stale reads:
async function get(key) {
  const cached = await cache.get(key);
  // In practice you would look up only the current version number here
  // (a cheap read), not the full record; shown inline for clarity.
  const dbValue = await database.get(key);

  if (cached && cached.version >= dbValue.version) {
    return cached.value; // Cache is fresh or newer
  }

  // Cache is stale, use the DB value and repopulate
  await cache.set(key, {
    value: dbValue.value,
    version: dbValue.version
  });
  return dbValue.value;
}
Pros: Handles stale data gracefully. Cons: More complex; version tracking is needed.
4. Accept Eventual Consistency
Sometimes, eventual consistency is acceptable:
// User profile can be slightly stale
// Product catalog can be slightly stale
// Analytics can be stale
// But payment data must be consistent
Key insight: Not all data needs strong consistency. Use eventual consistency where acceptable, strong consistency where required.
Cache Warming
Cold starts are a problem: when a cache is empty (after deployment, restart, or eviction), the first requests are slow.
Cache Warming Strategies
1. Pre-Deployment Warming: Load data before deployment (most effective, requires planning)
2. Lazy Loading: Load on first request, refresh in background (simple, but first user slow)
3. Predictive Warming: ML-based prediction of likely requests (optimal, but complex)
4. Hybrid Approach: Pre-warm critical data, lazy load rest (balanced, practical)
Strategies
1. Pre-Deployment Warming
Load cache before deployment:
class CacheWarmer {
async warmCache() {
// Load popular items
const popularUsers = await db.query(`
SELECT * FROM users
ORDER BY last_login DESC
LIMIT 1000
`);
for (const user of popularUsers) {
await cache.set(`user:${user.id}`, user, { ttl: 3600 });
}
// Load frequently accessed data
const popularProducts = await db.query(`
SELECT * FROM products
WHERE views > 1000
`);
for (const product of popularProducts) {
await cache.set(`product:${product.id}`, product, { ttl: 1800 });
}
}
}
2. Lazy Loading with Background Refresh
Load on first request, refresh in background:
async function get(key) {
let value = await cache.get(key);
if (!value) {
// Cache miss - fetch from DB
value = await database.get(key);
await cache.set(key, value, { ttl: 300 });
// Schedule background refresh before expiry
scheduleRefresh(key, 240); // Refresh at 80% of TTL
}
return value;
}
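The scheduleRefresh call above is deliberately left undefined in the example; a minimal single-process sketch, assuming the same cache and database handles, could look like this:

// Re-fetch a key shortly before its cached copy expires so subsequent
// readers never see a miss. Single-process sketch with illustrative names.
const refreshTimers = new Map();

function scheduleRefresh(key, delaySeconds) {
  // Avoid stacking multiple timers for the same key
  if (refreshTimers.has(key)) return;

  const timer = setTimeout(async () => {
    refreshTimers.delete(key);
    try {
      const value = await database.get(key);
      await cache.set(key, value, { ttl: 300 });
      scheduleRefresh(key, delaySeconds); // keep the entry warm
    } catch (error) {
      // If the refresh fails, the entry simply expires and the next
      // request falls back to lazy loading
    }
  }, delaySeconds * 1000);

  refreshTimers.set(key, timer);
}

In practice you would also stop refreshing keys that have not been read recently, otherwise the cache keeps warming data nobody is asking for.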
3. Predictive Warming
Use ML or patterns to predict what will be needed:
class PredictiveWarmer {
async warmBasedOnTime() {
const hour = new Date().getHours();
// Morning: warm up daily reports
if (hour >= 8 && hour < 10) {
await this.warmDailyReports();
}
// Evening: warm up analytics
if (hour >= 18 && hour < 20) {
await this.warmAnalytics();
}
}
async warmBasedOnPatterns() {
// Analyze access patterns
const patterns = await this.analyzeAccessPatterns();
// Warm likely-to-be-accessed items
for (const pattern of patterns) {
await this.warmPattern(pattern);
}
}
}
4. Hybrid Approach
Combine strategies:
class HybridWarmer {
async warm() {
// 1. Pre-warm critical data
await this.warmCriticalData();
// 2. Lazy load everything else
// (handled in get() method)
// 3. Predictive warm based on time/patterns
await this.warmPredictive();
}
}
Memory Pressure and Eviction
When the cache is full, you need an eviction policy:
LRU (Least Recently Used):
- Evict items not accessed recently
- Good for: Temporal locality
LFU (Least Frequently Used):
- Evict items accessed least often
- Good for: Long-term popularity
TTL-based:
- Evict expired items
- Good for: Time-sensitive data
Random:
- Evict random items
- Good for: Simple, no overhead
class LRUCache {
constructor(maxSize) {
this.maxSize = maxSize;
this.cache = new Map(); // Ordered by insertion
}
get(key) {
if (this.cache.has(key)) {
// Move to end (most recently used)
const value = this.cache.get(key);
this.cache.delete(key);
this.cache.set(key, value);
return value;
}
return null;
}
set(key, value) {
if (this.cache.has(key)) {
// Update existing
this.cache.delete(key);
} else if (this.cache.size >= this.maxSize) {
// Evict least recently used (first item)
const firstKey = this.cache.keys().next().value;
this.cache.delete(firstKey);
}
this.cache.set(key, value);
}
}
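A quick usage check of the sketch above shows the eviction order in action:

const lru = new LRUCache(2);
lru.set('a', 1);
lru.set('b', 2);
lru.get('a');      // 'a' becomes most recently used
lru.set('c', 3);   // evicts 'b', the least recently used entry
console.log(lru.get('b')); // null
console.log(lru.get('a')); // 1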
Real-World Scaling Patterns
Pattern 1: Cache-Aside (Lazy Loading)
Application manages cache:
async function getUser(userId) {
// Check cache
let user = await cache.get(`user:${userId}`);
if (!user) {
// Cache miss - fetch from DB
user = await db.getUser(userId);
// Populate cache
await cache.set(`user:${userId}`, user, { ttl: 300 });
}
return user;
}
Pattern 2: Read-Through
Cache handles DB access:
// Cache automatically fetches from DB on miss
const user = await cache.get(`user:${userId}`);
// Cache handles DB lookup internally
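Most read-through setups are configured with a loader function the cache calls on a miss. A minimal sketch of such a wrapper (the ReadThroughCache class and its options are illustrative, not a specific library's API):

// Callers only talk to the cache; the cache owns the database lookup.
class ReadThroughCache {
  constructor({ store, loader, ttl }) {
    this.store = store;   // underlying cache client (e.g. Redis)
    this.loader = loader; // async (key) => value, called on a miss
    this.ttl = ttl;
  }

  async get(key) {
    const cached = await this.store.get(key);
    if (cached != null) return cached;

    const value = await this.loader(key);
    await this.store.set(key, value, { ttl: this.ttl });
    return value;
  }
}

// Usage: the application never queries the database directly for reads
const users = new ReadThroughCache({
  store: cache,
  loader: (key) => db.getUser(key.replace('user:', '')),
  ttl: 300
});
const user = await users.get(`user:${userId}`);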
Pattern 3: Write-Through
Cache writes to both cache and DB:
await cache.set(`user:${userId}`, userData);
// Cache automatically writes to DB
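The same wrapper idea works for write-through: the cache's set persists to the source of truth first, then updates the cached copy. A minimal sketch with illustrative names:

// set() writes to the database before updating the cache, so readers
// never see data that was not durably written.
class WriteThroughCache {
  constructor({ store, writer, ttl }) {
    this.store = store;   // cache client
    this.writer = writer; // async (key, value) => void, persists to the DB
    this.ttl = ttl;
  }

  async set(key, value) {
    await this.writer(key, value);                        // 1. durable write
    await this.store.set(key, value, { ttl: this.ttl });  // 2. cache update
    return value;
  }
}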
Key Takeaways
- Use multiple layers: L1 (memory) → L2 (distributed) → L3 (database)
- Choose invalidation strategy: Based on consistency vs. performance needs
- Prevent cache stampede: Probabilistic early expiration, mutexes
- Handle distributed consistency: Event bus, versioning, or accept eventual consistency
- Warm your cache: Pre-deployment, lazy loading, or predictive
- Monitor cache metrics: Hit rate, latency, memory usage, eviction rate
Caching is powerful, but it adds complexity. The best cache implementations are invisible—they just make everything faster without introducing bugs.
Monitor your cache hit rates. If they’re below 80%, you’re not caching effectively. If they’re above 95%, you might be over-caching.
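Those thresholds are only useful if you actually measure them. A minimal instrumentation sketch that wraps whatever cache client you use (names are illustrative):

// Count hits and misses around the cache client and expose the ratio
// to your metrics system.
class InstrumentedCache {
  constructor(cache) {
    this.cache = cache;
    this.hits = 0;
    this.misses = 0;
  }

  async get(key) {
    const value = await this.cache.get(key);
    if (value != null) {
      this.hits += 1;
    } else {
      this.misses += 1;
    }
    return value;
  }

  hitRate() {
    const total = this.hits + this.misses;
    return total === 0 ? 0 : this.hits / total;
  }
}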
Scaling your cache architecture? I provide performance reviews, cache strategy design, and production-ready patterns for high-scale systems. Let's discuss your setup.
P.S. Follow me on Twitter where I share engineering insights, system design patterns, and technical leadership perspectives.