Async Processing and Kafka Essentials
10 min read
tl;dr: Event-driven architecture fundamentals, Kafka topics and partitions, producer-consumer patterns, consumer groups, exactly-once semantics, and production-ready async processing.
Synchronous request-response works for many applications, but at scale, you need asynchronous processing. Kafka has become the de facto standard for event streaming, but understanding it deeply requires grasping its core concepts.
Here’s what you need to know beyond the basics.
When to Use Async Processing
Not everything should be async. Use it when:
1. Decoupling Services
- Services don’t need immediate responses
- You want to add consumers without changing producers
- Services have different processing speeds
2. High Throughput
- Need to handle bursts of traffic
- Processing can be done later
- Batch processing is acceptable
3. Resilience
- Don’t want one slow service to block others
- Need fault tolerance
- Want replay capability
4. Event Sourcing
- Need audit trail
- Want to rebuild state from events
- Need time-travel debugging
Kafka Architecture Fundamentals
Kafka is a distributed event streaming platform. Understanding its architecture is key to using it effectively.
Kafka Architecture
graph TB
P1[Producer 1] --> Topic[Topic user-events<br/>Partition 0 Offset 0-1000<br/>Partition 1 Offset 0-1500<br/>Partition 2 Offset 0-800]
P2[Producer 2] --> Topic
Topic --> C1[Consumer 1<br/>Group analytics<br/>Partition 0]
Topic --> C2[Consumer 2<br/>Group analytics<br/>Partition 1]
Topic --> C3[Consumer 3<br/>Group analytics<br/>Partition 2]
Topic --> Broker[Kafka Broker<br/>Replication 3<br/>ISR 3]
Core Concepts
Topics: Categories or streams of messages (e.g., user-events, orders)
Partitions: Ordered sequences within a topic. Partitions enable:
- Parallelism (multiple consumers can process different partitions)
- Ordering (messages within a partition are ordered)
- Scalability (add partitions to scale)
Offsets: Unique position of a message in a partition. Immutable and sequential.
Consumer Groups: Set of consumers that work together to consume a topic. Each partition is consumed by only one consumer in a group.
Replication: Copies of partitions across brokers for fault tolerance.
ISR (In-Sync Replicas): Replicas that are up-to-date with the leader.
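To make these terms concrete, here's a minimal sketch of creating a topic with the kafkajs admin client (the broker address, topic name, and counts are illustrative):
import { Kafka } from 'kafkajs';
const kafka = new Kafka({ clientId: 'topic-setup', brokers: ['localhost:9092'] });
const admin = kafka.admin();
await admin.connect();
await admin.createTopics({
  topics: [{
    topic: 'user-events',
    numPartitions: 3,     // parallelism: up to 3 consumers in one group
    replicationFactor: 3  // fault tolerance: 3 copies across brokers
  }]
});
await admin.disconnect();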
Why Partitions Matter
Partitions are Kafka’s secret sauce:
// Topic with 3 partitions
Topic: user-events
Partition 0: [msg1, msg2, msg3, ...]
Partition 1: [msg1, msg2, msg3, ...]
Partition 2: [msg1, msg2, msg3, ...]
// 3 consumers in a group can process in parallel
Consumer 1 → Partition 0
Consumer 2 → Partition 1
Consumer 3 → Partition 2
Key insight: more partitions mean more parallelism, but also more overhead. Start with at least as many partitions as the consumers you expect to run in parallel (a group can never use more consumers than there are partitions), and scale up as needed.
Producer Patterns
Producers send messages to Kafka topics.
Producer-Consumer Flow
sequenceDiagram
participant P as Producer
participant B as Kafka Broker
participant C as Consumer
P->>B: Create Message + Choose Partition
P->>B: Send to Broker
B->>B: Append to Partition + Replicate
B->>P: ACK Received
C->>B: Poll for Messages
B->>C: Messages
C->>C: Process Messages
C->>B: Commit Offset
Idempotent Producers
Enable idempotence to prevent duplicates:
const producer = kafka.producer({
idempotent: true, // Prevents duplicates
maxInFlightRequests: 5,
acks: 'all', // Wait for all replicas
retries: 3
});
await producer.send({
topic: 'user-events',
messages: [{
key: userId, // Ensures same partition for same user
value: JSON.stringify(event)
}]
});
How it works:
- Producer gets a Producer ID (PID)
- Each message gets a sequence number per partition
- Broker deduplicates based on PID + sequence number
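A conceptual sketch of that deduplication step (this is just the idea, not actual broker code):
// The broker tracks the highest sequence number seen per (PID, partition).
// A retry that resends an already-seen sequence number is acknowledged
// but not appended again, so retries never create duplicates.
const lastSequence = new Map(); // key: `${pid}:${partition}`
function accept(pid, partition, sequence) {
  const key = `${pid}:${partition}`;
  const previous = lastSequence.get(key) ?? -1;
  if (sequence <= previous) {
    return false; // duplicate retry → drop
  }
  lastSequence.set(key, sequence);
  return true; // new message → append to the log
}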
Batching
Batch messages for better throughput:
const producer = kafka.producer({
batch: {
size: 16384, // 16KB (batch.size in the Java client)
maxWait: 100 // 100ms (linger.ms in the Java client)
}
});
// Multiple sends are batched automatically
await producer.send({ topic: 'events', messages: [msg1] });
await producer.send({ topic: 'events', messages: [msg2] });
// Both sent in one batch
Trade-off: smaller batches mean lower latency; larger batches mean higher throughput. Tune the size and wait time to your workload.
Partitioning Strategy
Choose partition based on key:
// Round-robin (no key)
await producer.send({
topic: 'events',
messages: [{ value: 'event1' }] // No key
});
// Key-based (same key → same partition)
await producer.send({
topic: 'user-events',
messages: [{
key: userId, // All events for user go to same partition
value: JSON.stringify(event)
}]
});
When to use keys:
- Need ordering per entity (e.g., all user events in order)
- Need to group related messages
When not to use keys:
- Want maximum parallelism
- Ordering doesn’t matter
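Under the hood, key-based routing is just hashing. A rough sketch of the idea (the real default partitioner uses murmur2, not md5; this only illustrates the key → partition mapping):
import { createHash } from 'node:crypto';
function choosePartition(key, partitionCount) {
  if (key == null) {
    // No key: the client spreads messages across partitions (round-robin or sticky)
    return Math.floor(Math.random() * partitionCount);
  }
  // Same key → same hash → same partition, which is what gives per-key ordering
  const digest = createHash('md5').update(String(key)).digest();
  return digest.readUInt32BE(0) % partitionCount;
}
choosePartition('user-42', 3); // deterministic: always the same partition for 'user-42'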
Consumer Patterns
Consumers read messages from Kafka topics.
Consumer Groups
Consumers work in groups to share partitions:
Consumer Group Coordination
graph TB
Topic[Topic: orders<br/>Partition 0, 1, 2, 3] --> Group[Consumer Group: analytics]
Group --> C1[Consumer 1<br/>Partition 0<br/>Offset: 750]
Group --> C2[Consumer 2<br/>Partition 1<br/>Offset: 1200]
Group --> C3[Consumer 3<br/>Partition 2<br/>Offset: 600]
Group --> C4[Consumer 4<br/>Partition 3<br/>Offset: 950]
Note[Rebalancing:<br/>If Consumer 2 crashes,<br/>partitions reassigned to others]
const consumer = kafka.consumer({
groupId: 'analytics-service', // Consumer group
sessionTimeout: 30000,
heartbeatInterval: 3000
});
await consumer.subscribe({ topic: 'user-events' });
await consumer.run({
eachMessage: async ({ topic, partition, message }) => {
// Process message
await processEvent(message.value);
// Offset committed automatically (auto-commit)
}
});
How it works:
- Group coordinator assigns partitions to consumers
- Each partition consumed by one consumer in group
- If consumer fails, partitions reassigned to others
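A toy sketch of the assignment step (real assignors such as range and round-robin are more sophisticated, but the idea is the same):
// Spread a topic's partitions across the members of a group
function assignPartitions(partitions, consumers) {
  const assignment = Object.fromEntries(consumers.map(c => [c, []]));
  partitions.forEach((p, i) => {
    assignment[consumers[i % consumers.length]].push(p);
  });
  return assignment;
}
assignPartitions([0, 1, 2, 3], ['consumer-1', 'consumer-2', 'consumer-3']);
// → { 'consumer-1': [0, 3], 'consumer-2': [1], 'consumer-3': [2] }
// If consumer-3 leaves, rerunning the assignment hands its partition to the others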
Rebalancing
When consumers join or leave, partitions are reassigned:
What happens:
- Coordinator detects change (consumer joins/leaves)
- Triggers rebalance (all consumers stop)
- Partitions reassigned (round-robin or range)
- Consumers resume processing
Problem: Processing pauses during rebalance.
Solution: Minimize rebalances:
- Keep consumers healthy (heartbeat)
- Avoid frequent restarts
- Use static membership (Kafka 2.3+)
const consumer = kafka.consumer({
groupId: 'analytics-service',
sessionTimeout: 30000,
heartbeatInterval: 3000,
// Static membership (Kafka 2.3+)
groupInstanceId: 'consumer-1' // Prevents unnecessary rebalances
});
Manual Offset Management
To control exactly when offsets advance (for example, to guarantee at-least-once delivery by committing only after processing succeeds), manage offsets manually:
await consumer.run({
autoCommit: false, // disable auto-commit; we commit explicitly below
eachMessage: async ({ topic, partition, message }) => {
try {
// Process message
await processEvent(message.value);
// Commit offset only after successful processing
await consumer.commitOffsets([{
topic,
partition,
offset: (parseInt(message.offset) + 1).toString()
}]);
} catch (error) {
// Don't commit on error - will retry
throw error;
}
}
});
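If you process in batches, kafkajs also exposes a batch-level handler where each offset is resolved only after its message is processed. A sketch based on my reading of the kafkajs docs (processEvent as in the examples above):
await consumer.run({
  eachBatchAutoResolve: false, // we resolve offsets ourselves, message by message
  eachBatch: async ({ batch, resolveOffset, heartbeat, isRunning, isStale }) => {
    for (const message of batch.messages) {
      if (!isRunning() || isStale()) break; // stop cleanly on shutdown or rebalance
      await processEvent(message.value);
      resolveOffset(message.offset); // only processed offsets get committed
      await heartbeat();             // keep the session alive during long batches
    }
  }
});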
Delivery Guarantees
Kafka provides three delivery guarantees:
At-Least-Once
Messages may be delivered multiple times:
// Producer retries on failure
const producer = kafka.producer({
  retries: 3,
  acks: 'all'
});
// Consumer commits only after processing
await consumer.run({
  autoCommit: false,
  eachMessage: async ({ message }) => {
    await processEvent(message.value);
    await commitOffset();
    // If the consumer crashes after processing but before the commit,
    // the message is redelivered → duplicate processing
  }
});
Use when: Duplicates are acceptable, data loss is not.
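Because at-least-once means the occasional duplicate, its usual companion is idempotent processing. A minimal sketch, assuming each event carries a unique id (the in-memory set stands in for a database table or Redis set):
const processedIds = new Set(); // in production: a durable store, not process memory
async function processEventIdempotently(rawValue) {
  const event = JSON.parse(rawValue);
  if (processedIds.has(event.id)) {
    return; // duplicate delivery → safely ignored
  }
  await processEvent(event);
  processedIds.add(event.id);
}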
At-Most-Once
Messages delivered at most once:
// Producer: no retries
const producer = kafka.producer({
  retries: 0
});
// Consumer commits before processing
await consumer.run({
  autoCommit: false,
  eachMessage: async ({ message }) => {
    await commitOffset();
    // The offset is already committed, so if processing fails
    // the message is never retried → lost
    await processEvent(message.value);
  }
});
Use when: Duplicates are unacceptable, some data loss is acceptable. Rarely used.
Exactly-Once
Messages delivered exactly once:
Exactly-Once Semantics
Producer Requirements:
- Enable idempotence: enable.idempotence=true
- Producer ID (PID): unique per producer instance
- Sequence number: per partition, monotonically increasing
- Broker deduplication: rejects duplicate sequence numbers
Consumer Requirements:
- Read only committed messages: isolation.level=read_committed
- Transactional ID: set on the producer that writes the output; unique and stable across restarts
- Atomic commits: consumed offsets and produced output committed in one transaction
- Transaction coordinator: a broker-side coordinator manages transaction state
Putting it into code:
Producer:
const producer = kafka.producer({
idempotent: true, // Enable idempotence
transactionalId: 'my-producer',
acks: 'all'
});
// Begin transaction
await producer.beginTransaction();
try {
await producer.send({
topic: 'events',
messages: [/* ... */]
});
// Commit transaction
await producer.commitTransaction();
} catch (error) {
// Abort on error
await producer.abortTransaction();
}
Consumer:
const consumer = kafka.consumer({
groupId: 'my-group',
isolationLevel: 'read_committed' // Only read committed messages
});
await consumer.run({
eachMessage: async ({ message }) => {
// Process inside a transaction; the consumed offsets must be sent
// through that same transaction so they commit atomically with the output
await transactionalProcess(message.value);
}
});
Use when: you need strict guarantees and can accept the performance cost. Note that the guarantee covers the Kafka read-process-write path; side effects in external systems (databases, emails) still need their own idempotence.
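kafkajs expresses that read-process-write transaction through a transaction object rather than beginTransaction/commitTransaction calls. A hedged sketch (API names per my reading of the kafkajs docs; the topics, group ID, and enrich() transform are illustrative, and one transaction per message is only to keep the sketch short):
const producer = kafka.producer({
  transactionalId: 'order-enricher', // must be unique and stable across restarts
  idempotent: true,
  maxInFlightRequests: 1
});
await consumer.run({
  autoCommit: false,
  eachMessage: async ({ topic, partition, message }) => {
    const transaction = await producer.transaction();
    try {
      // The derived event and the consumed offset commit (or abort) together
      await transaction.send({
        topic: 'enriched-orders',
        messages: [{ value: enrich(message.value) }] // enrich() is a hypothetical transform
      });
      await transaction.sendOffsets({
        consumerGroupId: 'my-group',
        topics: [{ topic, partitions: [{ partition, offset: (Number(message.offset) + 1).toString() }] }]
      });
      await transaction.commit();
    } catch (error) {
      await transaction.abort();
    }
  }
});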
Event-Driven Architecture
Kafka enables event-driven architectures:
Event-Driven Architecture
graph LR
Source[Event Source<br/>User Action, System Event] --> Kafka[Kafka<br/>Event Stream<br/>Topics & Partitions]
Kafka --> ServiceA[Service A<br/>Analytics<br/>Consumer Group: analytics]
Kafka --> ServiceB[Service B<br/>Notifications<br/>Consumer Group: notifications]
Kafka --> ServiceC[Service C<br/>Search Index<br/>Consumer Group: search]
Kafka --> ServiceD[Service D<br/>Audit Log<br/>Consumer Group: audit]
Benefits:
- Decoupling: Services don’t know about each other
- Scalability: Scale consumers independently
- Resilience: Durable message storage
- Flexibility: Multiple consumers, different processing
Pattern:
Event Source → Kafka → Multiple Consumers
(User Action) (Topics) (Analytics, Notifications, Search, etc.)
Example:
// Producer: User service
await producer.send({
topic: 'user-events',
messages: [{
key: userId,
value: JSON.stringify({
type: 'user.created',
userId,
timestamp: Date.now()
})
}]
});
// Consumer 1: Analytics service
await analyticsConsumer.run({
eachMessage: async ({ message }) => {
await trackUserCreated(JSON.parse(message.value));
}
});
// Consumer 2: Notification service
await notificationConsumer.run({
eachMessage: async ({ message }) => {
await sendWelcomeEmail(JSON.parse(message.value));
}
});
// Consumer 3: Search service
await searchConsumer.run({
eachMessage: async ({ message }) => {
await indexUser(JSON.parse(message.value));
}
});
Performance Tuning
Producer:
- Batch size: Larger = better throughput, higher latency
- Compression: snappy or lz4 for a good balance
- Acks: all for durability, 1 for speed
Consumer:
- Fetch size: Larger = fewer requests, more memory
- Max poll records: Balance throughput and processing time
- Session timeout: Longer = fewer rebalances, slower failure detection
// Tuned producer
const producer = kafka.producer({
batch: { size: 32768, maxWait: 50 },
compression: 'snappy',
acks: 'all',
retries: 3
});
// Tuned consumer
const consumer = kafka.consumer({
groupId: 'my-group',
fetchMinBytes: 1024,
fetchMaxWaitMs: 500,
maxPollRecords: 500
});
Common Pitfalls
1. Too Many Partitions
More partitions ≠ always better:
- Each partition has overhead
- More partitions = more open files
- Rebalancing takes longer
Rule of thumb: Start with number of consumers, scale up gradually.
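A common sizing heuristic is to divide your target throughput by what a single partition can sustain on the produce and consume side (all numbers below are illustrative; measure your own):
// partitions ≈ max(target / per-partition produce rate, target / per-partition consume rate)
const targetMBps = 100;             // throughput the topic must sustain
const producePerPartitionMBps = 10; // measured producer throughput per partition
const consumePerPartitionMBps = 20; // measured consumer throughput per partition
const partitions = Math.ceil(Math.max(
  targetMBps / producePerPartitionMBps,
  targetMBps / consumePerPartitionMBps
));
// → 10 partitions in this example; round up and leave headroom for growth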
2. Consumer Lag
When consumers can’t keep up:
// Monitor lag: latest offset minus committed offset, summed across partitions
const admin = kafka.admin();
await admin.connect();
const latest = await admin.fetchTopicOffsets('events');
const committed = await admin.fetchOffsets({
  groupId: 'my-group',
  topic: 'events'
});
const lag = latest.reduce((total, { partition, offset }) => {
  const c = committed.find(p => p.partition === partition);
  return total + Number(offset) - Number(c ? c.offset : 0);
}, 0);
// Alert if lag > threshold
if (lag > 10000) {
  alert('Consumer lag high!');
}
Solutions:
- Scale consumers
- Optimize processing
- Increase partitions (if needed)
3. Message Ordering
Kafka only guarantees order within a partition:
// BAD: Messages for same user might go to different partitions
await producer.send({
topic: 'user-events',
messages: [{ value: 'event1' }] // No key
});
// GOOD: All messages for user go to same partition
await producer.send({
topic: 'user-events',
messages: [{
key: userId, // Ensures same partition
value: 'event1'
}]
});
4. Offset Management
When you commit determines whether a failure costs you lost messages or duplicates:
// BAD: Commit before processing
await commitOffset();
await processEvent(); // If this fails, the message is lost
// GOOD: Commit only after successful processing
try {
  await processEvent();
  await commitOffset();
} catch (error) {
  // Don't commit - the message will be redelivered and retried.
  // Make processing idempotent so the resulting duplicate is harmless.
}
Key Takeaways
- Use async when: Decoupling, high throughput, resilience needed
- Partitions enable: Parallelism, ordering, scalability
- Consumer groups: Share partitions, enable scaling
- Delivery guarantees: Choose based on duplicate vs. loss tolerance
- Exactly-once: Requires idempotent producer + transactional consumer
- Monitor: Lag, throughput, error rates
Kafka is powerful but complex. Start simple, monitor closely, tune based on metrics.
The best Kafka implementations are invisible—they just work, reliably, at scale.
Building event-driven systems? I provide architecture reviews, Kafka strategy design, and production-ready patterns for async processing. Let's discuss your implementation.
P.S. Follow me on Twitter where I share engineering insights, system design patterns, and technical leadership perspectives.