Async Processing and Kafka Essentials
10 min read
tl;dr: Event-driven architecture fundamentals, Kafka topics and partitions, producer-consumer patterns, consumer groups, exactly-once semantics, and production-ready async processing.
Synchronous request-response works for many applications, but at scale, you need asynchronous processing. Kafka has become the de facto standard for event streaming, but understanding it deeply requires grasping its core concepts.
Here’s what you need to know beyond the basics.
When to Use Async Processing
Not everything should be async. Use it when:
1. Decoupling Services
- Services don’t need immediate responses
- You want to add consumers without changing producers
- Services have different processing speeds
2. High Throughput
- Need to handle bursts of traffic
- Processing can be done later
- Batch processing is acceptable
3. Resilience
- Don’t want one slow service to block others
- Need fault tolerance
- Want replay capability
4. Event Sourcing
- Need audit trail
- Want to rebuild state from events
- Need time-travel debugging
Kafka Architecture Fundamentals
Kafka is a distributed event streaming platform. Understanding its architecture is key to using it effectively.
Kafka Architecture
graph TB
P1[Producer 1] --> Topic[Topic user-events<br/>Partition 0 Offset 0-1000<br/>Partition 1 Offset 0-1500<br/>Partition 2 Offset 0-800]
P2[Producer 2] --> Topic
Topic --> C1[Consumer 1<br/>Group analytics<br/>Partition 0]
Topic --> C2[Consumer 2<br/>Group analytics<br/>Partition 1]
Topic --> C3[Consumer 3<br/>Group analytics<br/>Partition 2]
Topic --> Broker[Kafka Broker<br/>Replication 3<br/>ISR 3]
Core Concepts
Topics: Categories or streams of messages (e.g., user-events, orders)
Partitions: Ordered sequences within a topic. Partitions enable:
- Parallelism (multiple consumers can process different partitions)
- Ordering (messages within a partition are ordered)
- Scalability (add partitions to scale)
Offsets: Unique position of a message in a partition. Immutable and sequential.
Consumer Groups: Set of consumers that work together to consume a topic. Each partition is consumed by only one consumer in a group.
Replication: Copies of partitions across brokers for fault tolerance.
ISR (In-Sync Replicas): Replicas that are up-to-date with the leader.
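To make these terms concrete, here's a minimal sketch of creating a topic with the kafkajs admin client (the broker address, topic name, and counts are illustrative):
import { Kafka } from 'kafkajs';
const kafka = new Kafka({ clientId: 'topic-setup', brokers: ['localhost:9092'] });
const admin = kafka.admin();
await admin.connect();
await admin.createTopics({
  topics: [{
    topic: 'user-events',
    numPartitions: 3,     // parallelism: up to 3 consumers in one group
    replicationFactor: 3  // fault tolerance: 3 copies across brokers
  }]
});
await admin.disconnect();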
Why Partitions Matter
Partitions are Kafka’s secret sauce:
// Topic with 3 partitions
Topic: user-events
Partition 0: [msg1, msg2, msg3, ...]
Partition 1: [msg1, msg2, msg3, ...]
Partition 2: [msg1, msg2, msg3, ...]
// 3 consumers in a group can process in parallel
Consumer 1 → Partition 0
Consumer 2 → Partition 1
Consumer 3 → Partition 2
Key insight: more partitions mean more parallelism, but also more overhead. Start with at least as many partitions as the consumers you expect to run in parallel (a group can never use more consumers than there are partitions), and scale up as needed.
Producer Patterns
Producers send messages to Kafka topics.
Producer-Consumer Flow
sequenceDiagram
participant P as Producer
participant B as Kafka Broker
participant C as Consumer
P->>B: Create Message + Choose Partition
P->>B: Send to Broker
B->>B: Append to Partition + Replicate
B->>P: ACK Received
C->>B: Poll for Messages
B->>C: Messages
C->>C: Process Messages
C->>B: Commit Offset
Idempotent Producers
Enable idempotence to prevent duplicates:
const producer = kafka.producer({
idempotent: true, // Prevents duplicates
maxInFlightRequests: 5,
acks: 'all', // Wait for all replicas
retries: 3
});
await producer.send({
topic: 'user-events',
messages: [{
key: userId, // Ensures same partition for same user
value: JSON.stringify(event)
}]
});
How it works:
- Producer gets a Producer ID (PID)
- Each message gets a sequence number per partition
- Broker deduplicates based on PID + sequence number
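A conceptual sketch of that deduplication step (this is just the idea, not actual broker code):
// The broker tracks the highest sequence number seen per (PID, partition).
// A retry that resends an already-seen sequence number is acknowledged
// but not appended again, so retries never create duplicates.
const lastSequence = new Map(); // key: `${pid}:${partition}`
function accept(pid, partition, sequence) {
  const key = `${pid}:${partition}`;
  const previous = lastSequence.get(key) ?? -1;
  if (sequence <= previous) {
    return false; // duplicate retry → drop
  }
  lastSequence.set(key, sequence);
  return true; // new message → append to the log
}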
Batching
Batch messages for better throughput:
const producer = kafka.producer({
batch: {
size: 16384, // 16KB (batch.size in the Java client)
maxWait: 100 // 100ms (linger.ms in the Java client)
}
});
// Multiple sends are batched automatically
await producer.send({ topic: 'events', messages: [msg1] });
await producer.send({ topic: 'events', messages: [msg2] });
// Both sent in one batch
Trade-off: smaller batches mean lower latency; larger batches mean higher throughput. Tune the size and wait time to your workload.
Partitioning Strategy
Choose partition based on key:
// Round-robin (no key)
await producer.send({
topic: 'events',
messages: [{ value: 'event1' }] // No key
});
// Key-based (same key → same partition)
await producer.send({
topic: 'user-events',
messages: [{
key: userId, // All events for user go to same partition
value: JSON.stringify(event)
}]
});
When to use keys:
- Need ordering per entity (e.g., all user events in order)
- Need to group related messages
When not to use keys:
- Want maximum parallelism
- Ordering doesn’t matter
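Under the hood, key-based routing is just hashing. A rough sketch of the idea (the real default partitioner uses murmur2, not md5; this only illustrates the key → partition mapping):
import { createHash } from 'node:crypto';
function choosePartition(key, partitionCount) {
  if (key == null) {
    // No key: the client spreads messages across partitions (round-robin or sticky)
    return Math.floor(Math.random() * partitionCount);
  }
  // Same key → same hash → same partition, which is what gives per-key ordering
  const digest = createHash('md5').update(String(key)).digest();
  return digest.readUInt32BE(0) % partitionCount;
}
choosePartition('user-42', 3); // deterministic: always the same partition for 'user-42'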
Consumer Patterns
Consumers read messages from Kafka topics.
Consumer Groups
Consumers work in groups to share partitions:
Consumer Group Coordination
graph TB
Topic[Topic: orders<br/>Partition 0, 1, 2, 3] --> Group[Consumer Group: analytics]
Group --> C1[Consumer 1<br/>Partition 0<br/>Offset: 750]
Group --> C2[Consumer 2<br/>Partition 1<br/>Offset: 1200]
Group --> C3[Consumer 3<br/>Partition 2<br/>Offset: 600]
Group --> C4[Consumer 4<br/>Partition 3<br/>Offset: 950]
Note[Rebalancing:<br/>If Consumer 2 crashes,<br/>partitions reassigned to others]
const consumer = kafka.consumer({
groupId: 'analytics-service', // Consumer group
sessionTimeout: 30000,
heartbeatInterval: 3000
});
await consumer.subscribe({ topic: 'user-events' });
await consumer.run({
eachMessage: async ({ topic, partition, message }) => {
// Process message
await processEvent(message.value);
// Offset committed automatically (auto-commit)
}
});
How it works:
- Group coordinator assigns partitions to consumers
- Each partition consumed by one consumer in group
- If consumer fails, partitions reassigned to others
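A toy sketch of the assignment step (real assignors such as range and round-robin are more sophisticated, but the idea is the same):
// Spread a topic's partitions across the members of a group
function assignPartitions(partitions, consumers) {
  const assignment = Object.fromEntries(consumers.map(c => [c, []]));
  partitions.forEach((p, i) => {
    assignment[consumers[i % consumers.length]].push(p);
  });
  return assignment;
}
assignPartitions([0, 1, 2, 3], ['consumer-1', 'consumer-2', 'consumer-3']);
// → { 'consumer-1': [0, 3], 'consumer-2': [1], 'consumer-3': [2] }
// If consumer-3 leaves, rerunning the assignment hands its partition to the others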
Rebalancing
When consumers join or leave, partitions are reassigned:
What happens:
- Coordinator detects change (consumer joins/leaves)
- Triggers rebalance (all consumers stop)
- Partitions reassigned (round-robin or range)
- Consumers resume processing
Problem: Processing pauses during rebalance.
Solution: Minimize rebalances:
- Keep consumers healthy (heartbeat)
- Avoid frequent restarts
- Use static membership (Kafka 2.3+)
const consumer = kafka.consumer({
groupId: 'analytics-service',
sessionTimeout: 30000,
heartbeatInterval: 3000,
// Static membership (Kafka 2.3+)
groupInstanceId: 'consumer-1' // Prevents unnecessary rebalances
});
Manual Offset Management
To control exactly when offsets advance (for example, to guarantee at-least-once delivery by committing only after processing succeeds), manage offsets manually:
await consumer.run({
autoCommit: false, // disable auto-commit; we commit explicitly below
eachMessage: async ({ topic, partition, message }) => {
try {
// Process message
await processEvent(message.value);
// Commit offset only after successful processing
await consumer.commitOffsets([{
topic,
partition,
offset: (parseInt(message.offset) + 1).toString()
}]);
} catch (error) {
// Don't commit on error - will retry
throw error;
}
}
});
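If you process in batches, kafkajs also exposes a batch-level handler where each offset is resolved only after its message is processed. A sketch based on my reading of the kafkajs docs (processEvent as in the examples above):
await consumer.run({
  eachBatchAutoResolve: false, // we resolve offsets ourselves, message by message
  eachBatch: async ({ batch, resolveOffset, heartbeat, isRunning, isStale }) => {
    for (const message of batch.messages) {
      if (!isRunning() || isStale()) break; // stop cleanly on shutdown or rebalance
      await processEvent(message.value);
      resolveOffset(message.offset); // only processed offsets get committed
      await heartbeat();             // keep the session alive during long batches
    }
  }
});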
Delivery Guarantees
Kafka provides three delivery guarantees:
At-Least-Once
Messages may be delivered multiple times:
// Producer retries on failure
const producer = kafka.producer({
  retries: 3,
  acks: 'all'
});
// Consumer commits only after processing
await consumer.run({
  autoCommit: false,
  eachMessage: async ({ message }) => {
    await processEvent(message.value);
    await commitOffset();
    // If the consumer crashes after processing but before the commit,
    // the message is redelivered → duplicate processing
  }
});
Use when: Duplicates are acceptable, data loss is not.
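Because at-least-once means the occasional duplicate, its usual companion is idempotent processing. A minimal sketch, assuming each event carries a unique id (the in-memory set stands in for a database table or Redis set):
const processedIds = new Set(); // in production: a durable store, not process memory
async function processEventIdempotently(rawValue) {
  const event = JSON.parse(rawValue);
  if (processedIds.has(event.id)) {
    return; // duplicate delivery → safely ignored
  }
  await processEvent(event);
  processedIds.add(event.id);
}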
At-Most-Once
Messages delivered at most once:
// Producer: no retries
const producer = kafka.producer({
  retries: 0
});
// Consumer commits before processing
await consumer.run({
  autoCommit: false,
  eachMessage: async ({ message }) => {
    await commitOffset();
    // The offset is already committed, so if processing fails
    // the message is never retried → lost
    await processEvent(message.value);
  }
});
Use when: Duplicates are unacceptable, some data loss is acceptable. Rarely used.
Exactly-Once
Messages delivered exactly once:
Exactly-Once Semantics
Producer Requirements:
- Enable idempotence: enable.idempotence=true
- Producer ID (PID): unique per producer instance
- Sequence number: per partition, monotonically increasing
- Broker deduplication: rejects duplicate sequence numbers
Consumer Requirements:
- Read only committed messages: isolation.level=read_committed
- Transactional ID: set on the producer that writes the output; unique and stable across restarts
- Atomic commits: consumed offsets and produced output committed in one transaction
- Transaction coordinator: a broker-side coordinator manages transaction state
Putting it into code:
Producer:
const producer = kafka.producer({
idempotent: true, // Enable idempotence
transactionalId: 'my-producer',
acks: 'all'
});
// Begin transaction
await producer.beginTransaction();
try {
await producer.send({
topic: 'events',
messages: [/* ... */]
});
// Commit transaction
await producer.commitTransaction();
} catch (error) {
// Abort on error
await producer.abortTransaction();
}
Consumer:
const consumer = kafka.consumer({
groupId: 'my-group',
isolationLevel: 'read_committed' // Only read committed messages
});
await consumer.run({
eachMessage: async ({ message }) => {
// Process inside a transaction; the consumed offsets must be sent
// through that same transaction so they commit atomically with the output
await transactionalProcess(message.value);
}
});
Use when: you need strict guarantees and can accept the performance cost. Note that the guarantee covers the Kafka read-process-write path; side effects in external systems (databases, emails) still need their own idempotence.
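kafkajs expresses that read-process-write transaction through a transaction object rather than beginTransaction/commitTransaction calls. A hedged sketch (API names per my reading of the kafkajs docs; the topics, group ID, and enrich() transform are illustrative, and one transaction per message is only to keep the sketch short):
const producer = kafka.producer({
  transactionalId: 'order-enricher', // must be unique and stable across restarts
  idempotent: true,
  maxInFlightRequests: 1
});
await consumer.run({
  autoCommit: false,
  eachMessage: async ({ topic, partition, message }) => {
    const transaction = await producer.transaction();
    try {
      // The derived event and the consumed offset commit (or abort) together
      await transaction.send({
        topic: 'enriched-orders',
        messages: [{ value: enrich(message.value) }] // enrich() is a hypothetical transform
      });
      await transaction.sendOffsets({
        consumerGroupId: 'my-group',
        topics: [{ topic, partitions: [{ partition, offset: (Number(message.offset) + 1).toString() }] }]
      });
      await transaction.commit();
    } catch (error) {
      await transaction.abort();
    }
  }
});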
Event-Driven Architecture
Kafka enables event-driven architectures:
Event-Driven Architecture
graph LR
Source[Event Source<br/>User Action, System Event] --> Kafka[Kafka<br/>Event Stream<br/>Topics & Partitions]
Kafka --> ServiceA[Service A<br/>Analytics<br/>Consumer Group: analytics]
Kafka --> ServiceB[Service B<br/>Notifications<br/>Consumer Group: notifications]
Kafka --> ServiceC[Service C<br/>Search Index<br/>Consumer Group: search]
Kafka --> ServiceD[Service D<br/>Audit Log<br/>Consumer Group: audit]
Benefits:
- Decoupling: Services don’t know about each other
- Scalability: Scale consumers independently
- Resilience: Durable message storage
- Flexibility: Multiple consumers, different processing
Pattern:
Event Source → Kafka → Multiple Consumers
(User Action) (Topics) (Analytics, Notifications, Search, etc.)
Example:
// Producer: User service
await producer.send({
topic: 'user-events',
messages: [{
key: userId,
value: JSON.stringify({
type: 'user.created',
userId,
timestamp: Date.now()
})
}]
});
// Consumer 1: Analytics service
await analyticsConsumer.run({
eachMessage: async ({ message }) => {
await trackUserCreated(JSON.parse(message.value));
}
});
// Consumer 2: Notification service
await notificationConsumer.run({
eachMessage: async ({ message }) => {
await sendWelcomeEmail(JSON.parse(message.value));
}
});
// Consumer 3: Search service
await searchConsumer.run({
eachMessage: async ({ message }) => {
await indexUser(JSON.parse(message.value));
}
});
Performance Tuning
Producer:
- Batch size: Larger = better throughput, higher latency
- Compression: snappy or lz4 for a good balance
- Acks: all for durability, 1 for speed
Consumer:
- Fetch size: Larger = fewer requests, more memory
- Max poll records: Balance throughput and processing time
- Session timeout: Longer = fewer rebalances, slower failure detection
// Tuned producer
const producer = kafka.producer({
batch: { size: 32768, maxWait: 50 },
compression: 'snappy',
acks: 'all',
retries: 3
});
// Tuned consumer
const consumer = kafka.consumer({
groupId: 'my-group',
fetchMinBytes: 1024,
fetchMaxWaitMs: 500,
maxPollRecords: 500
});
Common Pitfalls
1. Too Many Partitions
More partitions ≠ always better:
- Each partition has overhead
- More partitions = more open files
- Rebalancing takes longer
Rule of thumb: Start with number of consumers, scale up gradually.
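A common sizing heuristic is to divide your target throughput by what a single partition can sustain on the produce and consume side (all numbers below are illustrative; measure your own):
// partitions ≈ max(target / per-partition produce rate, target / per-partition consume rate)
const targetMBps = 100;             // throughput the topic must sustain
const producePerPartitionMBps = 10; // measured producer throughput per partition
const consumePerPartitionMBps = 20; // measured consumer throughput per partition
const partitions = Math.ceil(Math.max(
  targetMBps / producePerPartitionMBps,
  targetMBps / consumePerPartitionMBps
));
// → 10 partitions in this example; round up and leave headroom for growth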
2. Consumer Lag
When consumers can’t keep up:
// Monitor lag: latest offset minus committed offset, summed across partitions
const admin = kafka.admin();
await admin.connect();
const latest = await admin.fetchTopicOffsets('events');
const committed = await admin.fetchOffsets({
  groupId: 'my-group',
  topic: 'events'
});
const lag = latest.reduce((total, { partition, offset }) => {
  const c = committed.find(p => p.partition === partition);
  return total + Number(offset) - Number(c ? c.offset : 0);
}, 0);
// Alert if lag > threshold
if (lag > 10000) {
  alert('Consumer lag high!');
}
Solutions:
- Scale consumers
- Optimize processing
- Increase partitions (if needed)
3. Message Ordering
Kafka only guarantees order within a partition:
// BAD: Messages for same user might go to different partitions
await producer.send({
topic: 'user-events',
messages: [{ value: 'event1' }] // No key
});
// GOOD: All messages for user go to same partition
await producer.send({
topic: 'user-events',
messages: [{
key: userId, // Ensures same partition
value: 'event1'
}]
});
4. Offset Management
When you commit determines whether a failure costs you lost messages or duplicates:
// BAD: Commit before processing
await commitOffset();
await processEvent(); // If this fails, the message is lost
// GOOD: Commit only after successful processing
try {
  await processEvent();
  await commitOffset();
} catch (error) {
  // Don't commit - the message will be redelivered and retried.
  // Make processing idempotent so the resulting duplicate is harmless.
}
Key Takeaways
- Use async when: Decoupling, high throughput, resilience needed
- Partitions enable: Parallelism, ordering, scalability
- Consumer groups: Share partitions, enable scaling
- Delivery guarantees: Choose based on duplicate vs. loss tolerance
- Exactly-once: Requires idempotent producer + transactional consumer
- Monitor: Lag, throughput, error rates
Kafka is powerful but complex. Start simple, monitor closely, tune based on metrics.
The best Kafka implementations are invisible—they just work, reliably, at scale.
Building event-driven systems? I provide architecture reviews, Kafka strategy design, and production-ready patterns for async processing. Let's discuss your implementation.
P.S. Follow me on Twitter where I share engineering insights, system design patterns, and technical leadership perspectives.