// just use a queue

Your services call
each other. That's the bug.

If you've built a service that calls another service directly over HTTP, this page is for you. It works great, and your services are communicating, right up until they aren't.

order-service.log, 03:47:22 UTC
// the problem

Two services. One of them just died.

You're working on the backend for a pizza restaurant. Your order service calls your kitchen service over HTTP. Works great... until the kitchen deploys a breaking change at 3am on a Friday. With HTTP, one failure becomes two.

💥

Outages

Kitchen goes down? Order service starts 500-ing too. Two independent services fail as one.

🌊

Traffic Bursts

Friday rush sends 10× traffic. HTTP calls hammer the kitchen until it collapses, taking orders down too.

☠️

Poison Pills

One malformed order (pineapple on a pizza, unacceptable) retries forever, blocking every valid order behind it.

"The kitchen service is down. But why is the order service also returning 500s?"
— your on-call engineer, 3:47am
// so what is a queue?

A waiting room
for messages.

Instead of your order service calling the kitchen service directly, it drops a message into a queue. The kitchen picks it up when it’s ready. They never need to talk to each other directly — or even be running at the same time.

Try it. Take the kitchen offline, then send some orders. Watch what happens.

// queue demo — toggle the kitchen on/off
Order Service
producer
Queue
Kitchen
consumer — online

The order service doesn’t know or care whether the kitchen is running. It just drops the message and moves on. The kitchen processes it when ready — even if that’s after a restart, a deploy, or a three-hour outage.

That’s it.

// the solution

Put a queue between them.

A point-to-point message channel decouples your services at runtime. The order service writes to a queue. The kitchen reads from it when ready. They never need to be online at the same time.
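The mechanics are simple enough to sketch in a few lines. Here is a minimal in-memory model of the pattern (a real system would use SQS, Kafka, or similar; the `MessageQueue` class and the order shape are illustrative):

```typescript
// A minimal in-memory queue: the producer enqueues and moves on;
// the consumer drains whenever it happens to be online.
class MessageQueue<T> {
  private messages: T[] = [];

  send(message: T): void {
    this.messages.push(message); // producer never waits on the consumer
  }

  receive(): T | undefined {
    return this.messages.shift(); // consumer pulls at its own pace
  }

  get depth(): number {
    return this.messages.length;
  }
}

type Order = { orderId: string; pizza: string };

const queue = new MessageQueue<Order>();

// Producer: the order service drops messages and returns immediately,
// even while the kitchen is offline.
queue.send({ orderId: 'o-1', pizza: 'margherita' });
queue.send({ orderId: 'o-2', pizza: 'quattro formaggi' });

// Later, the kitchen comes back online and drains the backlog in order.
const processed: string[] = [];
for (let order = queue.receive(); order; order = queue.receive()) {
  processed.push(order.orderId);
}
```

The producer's `send` returns before the consumer has done anything at all, which is the entire point: a kitchen outage leaves messages sitting safely in the queue instead of turning into order-service 500s.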

// architecture, pizza restaurant
1
User submits order
API Gateway · load balancer
Producer · order service
Data Store · persist order
2
Producer publishes to the message queue
Message Queue · point-to-point channel
↳ after N failures:
Dead Letter Queue · unprocessable messages
3
Consumer processes the message
Consumer · kitchen service
4
Consumer sends acknowledgement back via replyTo
Reply Queue · replyTo channel
Reply Processor · ack service
Data Store · mark ack'd

One message travels through 3 services, 2 queues, and 1 data store, fully asynchronously.

AWS implementation: API Gateway · Lambda · SQS · DynamoDB

// problem #3, poison pills

One bad pizza.
A hundred blocked orders.

Without a Dead Letter Queue, one invalid message gets retried forever, blocking every valid message behind it. This is the poison pill problem. The fix is pure configuration.

// dead letter queue simulator
Orders Queue
Dead Letter Queue
⚠ Poison pill has been retried 0 times. Queue is blocked. Every message behind it is stuck.

After N failed attempts, the message moves to the Dead Letter Queue automatically. The rest of the queue? Unaffected. On most cloud messaging platforms this is pure configuration: no application code required.

// AWS CDK, SQS + Lambda
const ordersQueue = new Queue(this, 'OrdersQueue', {
  deadLetterQueue: {
    queue: dlq,
    maxReceiveCount: 3, // 1 = fail fast · 3 = handle transients · higher = flaky downstream
  },
});
// partial batch failures

5 messages in.
4 succeed. 1 fails.

Most queue consumers receive messages in batches. If one fails, you don’t want all five retried. Your messaging platform needs to know exactly which message failed, so only that one goes back to the queue.

// batch processor simulator
Batch size5
// AWS CDK, SQS + Lambda
One flag enables partial failure reporting. No application code needed.
new SqsEventSource(ordersQueue, {
  batchSize: 5,
  reportBatchItemFailures: true, // only failed messages are retried
  maxBatchingWindow: Duration.seconds(30),
});

The pattern is universal: wrap each message in its own try/catch. Report real errors as item failures (the message returns to the queue); swallow validation failures (the message is deleted). On Kafka you'd manually send to a DLQ topic; on SQS + Lambda, reportBatchItemFailures handles this with configuration alone.
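To make that concrete, here is a sketch of the per-message try/catch inside an SQS Lambda handler. The `batchItemFailures` response shape is the real contract SQS expects when reportBatchItemFailures is enabled; the `processOrder` business logic and `ValidationError` class are hypothetical, and a real handler would be async:

```typescript
// Shape of the partial-batch-failure response SQS + Lambda expects
// when reportBatchItemFailures is enabled.
type SQSBatchResponse = { batchItemFailures: { itemIdentifier: string }[] };
type SQSRecord = { messageId: string; body: string };

class ValidationError extends Error {}

// Hypothetical business logic: throws ValidationError for bad payloads,
// any other Error for transient failures (e.g. a downstream timeout).
function processOrder(body: string): void {
  const order = JSON.parse(body);
  if (!order.pizza) throw new ValidationError('no pizza specified');
  // ...cook the pizza...
}

function handleBatch(records: SQSRecord[]): SQSBatchResponse {
  const batchItemFailures: { itemIdentifier: string }[] = [];

  for (const record of records) {
    try {
      processOrder(record.body);
    } catch (err) {
      if (err instanceof ValidationError) {
        // Swallow: the message is deleted and never retried.
        continue;
      }
      // Report: only this message returns to the queue.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }

  return { batchItemFailures };
}
```

Messages absent from `batchItemFailures` are deleted; messages present in it are redelivered, and after maxReceiveCount redeliveries the DLQ takes over.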

// ordering

Do you actually need ordering?

Most engineers think they need guaranteed ordering. Most don't. Ordered queues add cost and reduce throughput on every platform: Kafka, RabbitMQ, and SQS alike. Unordered queues are "good enough" for more use cases than you'd expect.

Unordered Queue

Maximum throughput
Lower cost
Best-effort ordering
No ordering guarantee
Best for: pizza orders, notifications, async jobs
AWS: SQS Standard · Kafka default partitions

Ordered Queue

Exactly-once processing
Strict ordering
Lower throughput
Higher cost
Best for: financial ledgers, stock trades, chat messages
AWS: SQS FIFO · Kafka ordered partitions
“I’ve spoken to hundreds of engineers building serverless apps. Most think they need ordering. Almost none actually do.”
// message structure

Wrap your message.
Every single time.

Don’t put raw JSON on a queue. Wrap it in a CloudEvents envelope. Click any highlighted field to see exactly why it earns its place.

// message explorer, click any field
← click a field
Select any highlighted field in the CloudEvents view to learn what it does and why it matters.
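For reference, a minimal envelope looks like this. In CloudEvents 1.0, `specversion`, `id`, `source`, and `type` are required attributes; `replyto` and `traceparent` are extension attributes (extension names are lowercase). The pizza payload, source path, and queue URL are illustrative:

```typescript
type CloudEvent = {
  specversion: '1.0';
  id: string;            // unique per event: enables deduplication
  source: string;        // who produced it: enables routing and debugging
  type: string;          // what happened: enables consumer dispatch
  time?: string;         // when it happened, RFC 3339
  datacontenttype?: string;
  replyto?: string;      // extension: where the consumer should respond
  traceparent?: string;  // extension: W3C trace context for observability
  data?: unknown;        // your actual payload, wrapped, never naked
};

function wrapOrder(order: { orderId: string; pizza: string }): CloudEvent {
  return {
    specversion: '1.0',
    id: order.orderId,
    source: '/order-service',
    type: 'com.pizzeria.order.created',
    time: new Date().toISOString(),
    datacontenttype: 'application/json',
    replyto: 'https://sqs.us-east-1.amazonaws.com/123456789012/ack-queue',
    data: order,
  };
}
```

Every downstream pattern on this page (deduplication, async reply, tracing) reads one of these fields out of the envelope rather than guessing at the raw payload.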
// async request / reply

The kitchen needs
to talk back.

Asynchronous doesn’t mean fire-and-forget. The replyTo URL in your CloudEvents envelope lets the consumer send a response back, on a queue the producer controls.

// replyTo flow, step through a message
Order
Service
Orders
Queue
Kitchen
Service
Ack
Queue
Ack
Processor
Waiting to start
// click "Next Step" to trace a message through the system
Watch an order travel from submission to kitchen acknowledgement, one hop at a time.
The replyTo pattern keeps the producer in control. The kitchen doesn't hardcode where to reply; it's told at runtime, in the message itself.
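On the consumer side, the kitchen reads replyto out of the envelope and sends its acknowledgement there, knowing nothing about the producer's topology ahead of time. A sketch, with a hypothetical `sendTo` standing in for the real SQS SendMessage or Kafka produce call:

```typescript
type Envelope = { id: string; replyto?: string; data: unknown };

// Hypothetical transport: in production this would publish to the
// queue named in `replyto` via your messaging SDK.
type SendFn = (destination: string, message: object) => void;

function acknowledge(event: Envelope, sendTo: SendFn): boolean {
  if (!event.replyto) return false; // producer didn't ask for a reply

  sendTo(event.replyto, {
    type: 'com.pizzeria.order.acknowledged',
    correlationId: event.id, // ties the ack back to the original order
    acceptedAt: new Date().toISOString(),
  });
  return true;
}
```

The correlation ID travels with the ack so the reply processor can match it against the original request in the data store.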
// observability

Three services.
One trace.

How do you tell whether a slow order is stuck in the producer, the queue, or the consumer? With OpenTelemetry parent-child span relationships, you see the entire message journey in one waterfall, across all three services, regardless of which queue technology you use.

// distributed trace, order #8472 (4.4s end-to-end)
order-receiver / send-pizza-order
order-receiver / store-to-dynamo
order-receiver / publish-to-sqs
kitchen-processor / process-pizza-order
kitchen-processor / validate-order
kitchen-processor / send-acknowledgement
ack-processor / process-acknowledgement
order-receiver kitchen-processor ack-processor

This works by injecting trace context into the message when publishing, then extracting it in the consumer to start a child span. There are two approaches:

Option A, transport attributes
Inject into SQS message attributes or Kafka headers. Works transparently, but it's transport-specific: switching queues means updating your propagation code.
Option B, CloudEvents traceparent extension ✓ recommended
Include traceparent as a CloudEvents extension attribute in the message body itself. Defined in the CloudEvents distributed tracing spec. The consumer reads it from the envelope and creates a child span, completely transport-agnostic. Switch from SQS to Kafka to RabbitMQ and your tracing just works.
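Propagation through the envelope is just string handling. The producer stamps the current W3C traceparent (`version-traceid-parentid-flags`, all lowercase hex) onto the event; the consumer parses it and starts a child span in the same trace. In a real service the OpenTelemetry SDK generates and consumes these values; the hand-rolled `parseTraceparent` below is only an illustration of what's inside the string:

```typescript
// W3C trace context: version(2 hex)-trace-id(32 hex)-parent-id(16 hex)-flags(2 hex)
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

type TraceContext = { traceId: string; parentId: string; sampled: boolean };

function parseTraceparent(header: string): TraceContext | null {
  const match = TRACEPARENT.exec(header);
  if (!match) return null;
  const [, , traceId, parentId, flags] = match;
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Producer side: stamp the envelope with the current span's context.
const event = {
  id: 'o-1',
  type: 'com.pizzeria.order.created',
  traceparent: '00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01',
  data: { pizza: 'margherita' },
};

// Consumer side: extract the context and continue the same trace,
// regardless of which queue carried the message.
const ctx = parseTraceparent(event.traceparent);
```

Because the traceparent rides in the message body, the consumer's extraction code is identical whether the message arrived via SQS, Kafka, or RabbitMQ.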
// key takeaways

The patterns that matter.

Everything on this page applies whether you’re using SQS, Kafka, RabbitMQ, or Azure Service Bus. The platform changes. The patterns don’t.

01
Use a queue to decouple producer from consumer. An outage in the consumer doesn’t take down the producer. Traffic spikes buffer in the queue. They never need to be online at the same time.
02
Always have a Dead Letter Queue. Poison pill messages are inevitable. Move them out of the main queue after N retries so one bad message never blocks everything else.
03
Report partial batch failures. When processing a batch, only retry the message that failed, not the entire batch. Wrap each message in a try/catch and signal failures individually.
04
Always wrap your payload in an envelope. CloudEvents gives you id, source, type, replyTo, and traceparent for free. Routing, deduplication, tracing, and async reply all depend on it.
05
Async reply needs a replyTo URL and a correlation ID. The producer embeds where to reply. A natural entity ID (orderId) or a generated UUID stored in your data store ties the reply back to the original request.
06
Put traceparent in the message body, not transport headers. The CloudEvents distributed tracing extension keeps your trace context transport-agnostic. Switch queues without touching your observability code.
Just Use A Queue

Stop treating queues
as an afterthought.

Learn to build resilient, production-grade serverless integrations, with the patterns, the code, and the observability to know when things go wrong.

Get the Course →

TypeScript · AWS CDK · Lambda · SQS · OpenTelemetry