// just use a queue

Your services call
each other. That's the bug.

If you've built a service that calls another service directly over HTTP, this page is for you. It works great, and your services are communicating, right up until they aren't.

order-service.log, 03:47:22 UTC
// the problem

Two services. One of them just died.

You're working on the backend for a pizza restaurant. Your order service calls your kitchen service over HTTP. Works great... until the kitchen deploys a breaking change at 3am on a Friday. With HTTP, one failure becomes two.

💥

Outages

Kitchen goes down? Order service starts 500-ing too. Two independent services fail as one.

🌊

Traffic Bursts

Friday rush sends 10× traffic. HTTP calls hammer the kitchen until it collapses, taking orders down too.

☠️

Poison Pills

One malformed order (pineapple on a pizza, unacceptable) retries forever, blocking every valid order behind it.

"The kitchen service is down. But why is the order service also returning 500s?"
— your on-call engineer, 3:47am
// so what is a queue?

A waiting room
for messages.

Instead of your order service calling the kitchen service directly, it drops a message into a queue. The kitchen picks it up when it’s ready. They never need to talk to each other directly — or even be running at the same time.

Try it. Take the kitchen offline, then send some orders. Watch what happens.

// queue demo — toggle the kitchen on/off
Order Service
producer
Queue
Kitchen
consumer — online

The order service doesn’t know or care whether the kitchen is running. It just drops the message and moves on. The kitchen processes it when ready — even if that’s after a restart, a deploy, or a three-hour outage.

That’s it.

// the solution

Put a queue between them.

A point-to-point message channel decouples your services at runtime. The order service writes to a queue. The kitchen reads from it when ready. They never need to be online at the same time.
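The mechanics are simple enough to sketch in a few lines. Here is a minimal in-memory model of the pattern (a real system would use SQS, Kafka, or similar; the `MessageQueue` class and the order shape are illustrative):

```typescript
// A minimal in-memory queue: the producer enqueues and moves on;
// the consumer drains whenever it happens to be online.
class MessageQueue<T> {
  private messages: T[] = [];

  send(message: T): void {
    this.messages.push(message); // producer never waits on the consumer
  }

  receive(): T | undefined {
    return this.messages.shift(); // consumer pulls at its own pace
  }

  get depth(): number {
    return this.messages.length;
  }
}

type Order = { orderId: string; pizza: string };

const queue = new MessageQueue<Order>();

// Producer: the order service drops messages and returns immediately,
// even while the kitchen is offline.
queue.send({ orderId: 'o-1', pizza: 'margherita' });
queue.send({ orderId: 'o-2', pizza: 'quattro formaggi' });

// Later, the kitchen comes back online and drains the backlog in order.
const processed: string[] = [];
for (let order = queue.receive(); order; order = queue.receive()) {
  processed.push(order.orderId);
}
```

The producer's `send` returns before the consumer has done anything at all, which is the entire point: a kitchen outage leaves messages sitting safely in the queue instead of turning into order-service 500s.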

// architecture, pizza restaurant
1
User submits order
API Gateway · load balancer
Producer · order service
Data Store · persist order
2
Producer publishes to the message queue
Message Queue · point-to-point channel
↳ after N failures:
Dead Letter Queue · unprocessable messages
3
Consumer processes the message
Consumer · kitchen service
4
Consumer sends acknowledgement back via replyTo
Reply Queue · replyTo channel
Reply Processor · ack service
Data Store · mark ack'd

One message travels through 3 services, 2 queues, and 1 data store, fully asynchronously.

AWS implementation: API Gateway · Lambda · SQS · DynamoDB

// problem #3, poison pills

One bad pizza.
A hundred blocked orders.

Without a Dead Letter Queue, one invalid message gets retried forever, blocking every valid message behind it. This is the poison pill problem. The fix is pure configuration.

// dead letter queue simulator
Orders Queue
Dead Letter Queue
⚠ Poison pill has been retried 0 times. Queue is blocked. Every message behind it is stuck.

After N failed attempts, the message moves to the Dead Letter Queue automatically. The rest of the queue? Unaffected. On most cloud messaging platforms this is pure configuration: no application code required.

// AWS CDK, SQS + Lambda
const ordersQueue = new Queue(this, 'OrdersQueue', {
  deadLetterQueue: {
    queue: dlq,
    maxReceiveCount: 3, // 1 = fail fast · 3 = handle transients · higher = flaky downstream
  },
});
// partial batch failures

5 messages in.
4 succeed. 1 fails.

Most queue consumers receive messages in batches. If one fails, you don’t want all five retried. Your messaging platform needs to know exactly which message failed, so only that one goes back to the queue.

// batch processor simulator
Batch size5
// AWS CDK, SQS + Lambda
One flag enables partial failure reporting. No application code needed.
new SqsEventSource(ordersQueue, {
  batchSize: 5,
  reportBatchItemFailures: true, // only failed messages are retried
  maxBatchingWindow: Duration.seconds(30),
});

The pattern is universal: wrap each message in its own try/catch. Report real errors as item failures (the message returns to the queue); swallow validation failures (the message is deleted). On Kafka you'd manually send to a DLQ topic; on SQS + Lambda, reportBatchItemFailures handles this with configuration alone.
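To make that concrete, here is a sketch of the per-message try/catch inside an SQS Lambda handler. The `batchItemFailures` response shape is the real contract SQS expects when reportBatchItemFailures is enabled; the `processOrder` business logic and `ValidationError` class are hypothetical, and a real handler would be async:

```typescript
// Shape of the partial-batch-failure response SQS + Lambda expects
// when reportBatchItemFailures is enabled.
type SQSBatchResponse = { batchItemFailures: { itemIdentifier: string }[] };
type SQSRecord = { messageId: string; body: string };

class ValidationError extends Error {}

// Hypothetical business logic: throws ValidationError for bad payloads,
// any other Error for transient failures (e.g. a downstream timeout).
function processOrder(body: string): void {
  const order = JSON.parse(body);
  if (!order.pizza) throw new ValidationError('no pizza specified');
  // ...cook the pizza...
}

function handleBatch(records: SQSRecord[]): SQSBatchResponse {
  const batchItemFailures: { itemIdentifier: string }[] = [];

  for (const record of records) {
    try {
      processOrder(record.body);
    } catch (err) {
      if (err instanceof ValidationError) {
        // Swallow: the message is deleted and never retried.
        continue;
      }
      // Report: only this message returns to the queue.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }

  return { batchItemFailures };
}
```

Messages absent from `batchItemFailures` are deleted; messages present in it are redelivered, and after maxReceiveCount redeliveries the DLQ takes over.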

// ordering

Do you actually need ordering?

Most engineers think they need guaranteed ordering. Most don't. Ordered queues add cost and reduce throughput on every platform: Kafka, RabbitMQ, and SQS alike. Unordered queues are "good enough" for more use cases than you'd expect.

Unordered Queue

Maximum throughput
Lower cost
Best-effort ordering
No ordering guarantee
Best for: pizza orders, notifications, async jobs
AWS: SQS Standard · Kafka default partitions

Ordered Queue

Exactly-once processing
Strict ordering
Lower throughput
Higher cost
Best for: financial ledgers, stock trades, chat messages
AWS: SQS FIFO · Kafka ordered partitions
“I’ve spoken to hundreds of engineers building serverless apps. Most think they need ordering. Almost none actually do.”
// message structure

Wrap your message.
Every single time.

Don’t put raw JSON on a queue. Wrap it in a CloudEvents envelope. Click any highlighted field to see exactly why it earns its place.

// message explorer, click any field
← click a field
Select any highlighted field in the CloudEvents view to learn what it does and why it matters.
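For reference, a minimal envelope looks like this. In CloudEvents 1.0, `specversion`, `id`, `source`, and `type` are required attributes; `replyto` and `traceparent` are extension attributes (extension names are lowercase). The pizza payload, source path, and queue URL are illustrative:

```typescript
type CloudEvent = {
  specversion: '1.0';
  id: string;            // unique per event: enables deduplication
  source: string;        // who produced it: enables routing and debugging
  type: string;          // what happened: enables consumer dispatch
  time?: string;         // when it happened, RFC 3339
  datacontenttype?: string;
  replyto?: string;      // extension: where the consumer should respond
  traceparent?: string;  // extension: W3C trace context for observability
  data?: unknown;        // your actual payload, wrapped, never naked
};

function wrapOrder(order: { orderId: string; pizza: string }): CloudEvent {
  return {
    specversion: '1.0',
    id: order.orderId,
    source: '/order-service',
    type: 'com.pizzeria.order.created',
    time: new Date().toISOString(),
    datacontenttype: 'application/json',
    replyto: 'https://sqs.us-east-1.amazonaws.com/123456789012/ack-queue',
    data: order,
  };
}
```

Every downstream pattern on this page (deduplication, async reply, tracing) reads one of these fields out of the envelope rather than guessing at the raw payload.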
// async request / reply

The kitchen needs
to talk back.

Asynchronous doesn’t mean fire-and-forget. The replyTo URL in your CloudEvents envelope lets the consumer send a response back, on a queue the producer controls.

// replyTo flow, step through a message
Order
Service
Orders
Queue
Kitchen
Service
Ack
Queue
Ack
Processor
Waiting to start
// click "Next Step" to trace a message through the system
Watch an order travel from submission to kitchen acknowledgement, one hop at a time.
The replyTo pattern keeps the producer in control. The kitchen doesn't hardcode where to reply; it's told at runtime, in the message itself.
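On the consumer side, the kitchen reads replyto out of the envelope and sends its acknowledgement there, knowing nothing about the producer's topology ahead of time. A sketch, with a hypothetical `sendTo` standing in for the real SQS SendMessage or Kafka produce call:

```typescript
type Envelope = { id: string; replyto?: string; data: unknown };

// Hypothetical transport: in production this would publish to the
// queue named in `replyto` via your messaging SDK.
type SendFn = (destination: string, message: object) => void;

function acknowledge(event: Envelope, sendTo: SendFn): boolean {
  if (!event.replyto) return false; // producer didn't ask for a reply

  sendTo(event.replyto, {
    type: 'com.pizzeria.order.acknowledged',
    correlationId: event.id, // ties the ack back to the original order
    acceptedAt: new Date().toISOString(),
  });
  return true;
}
```

The correlation ID travels with the ack so the reply processor can match it against the original request in the data store.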
// observability

Three services.
One trace.

How do you tell whether a slow order is stuck in the producer, the queue, or the consumer? With OpenTelemetry parent-child span relationships, you see the entire message journey in one waterfall, across all three services, regardless of which queue technology you use.

// distributed trace, order #8472 (4.4s end-to-end)
order-receiver / send-pizza-order
order-receiver / store-to-dynamo
order-receiver / publish-to-sqs
kitchen-processor / process-pizza-order
kitchen-processor / validate-order
kitchen-processor / send-acknowledgement
ack-processor / process-acknowledgement
order-receiver kitchen-processor ack-processor

This works by injecting trace context into the message when publishing, then extracting it in the consumer to start a child span. There are two approaches:

Option A, transport attributes
Inject into SQS message attributes or Kafka headers. Works transparently, but it's transport-specific: switching queues means updating your propagation code.
Option B, CloudEvents traceparent extension ✓ recommended
Include traceparent as a CloudEvents extension attribute in the message body itself. Defined in the CloudEvents distributed tracing spec. The consumer reads it from the envelope and creates a child span, completely transport-agnostic. Switch from SQS to Kafka to RabbitMQ and your tracing just works.
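Propagation through the envelope is just string handling. The producer stamps the current W3C traceparent (`version-traceid-parentid-flags`, all lowercase hex) onto the event; the consumer parses it and starts a child span in the same trace. In a real service the OpenTelemetry SDK generates and consumes these values; the hand-rolled `parseTraceparent` below is only an illustration of what's inside the string:

```typescript
// W3C trace context: version(2 hex)-trace-id(32 hex)-parent-id(16 hex)-flags(2 hex)
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

type TraceContext = { traceId: string; parentId: string; sampled: boolean };

function parseTraceparent(header: string): TraceContext | null {
  const match = TRACEPARENT.exec(header);
  if (!match) return null;
  const [, , traceId, parentId, flags] = match;
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Producer side: stamp the envelope with the current span's context.
const event = {
  id: 'o-1',
  type: 'com.pizzeria.order.created',
  traceparent: '00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01',
  data: { pizza: 'margherita' },
};

// Consumer side: extract the context and continue the same trace,
// regardless of which queue carried the message.
const ctx = parseTraceparent(event.traceparent);
```

Because the traceparent rides in the message body, the consumer's extraction code is identical whether the message arrived via SQS, Kafka, or RabbitMQ.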
// key takeaways

The patterns that matter.

Everything on this page applies whether you’re using SQS, Kafka, RabbitMQ, or Azure Service Bus. The platform changes. The patterns don’t.

01
Use a queue to decouple producer from consumer. An outage in the consumer doesn’t take down the producer. Traffic spikes buffer in the queue. They never need to be online at the same time.
02
Always have a Dead Letter Queue. Poison pill messages are inevitable. Move them out of the main queue after N retries so one bad message never blocks everything else.
03
Report partial batch failures. When processing a batch, only retry the message that failed, not the entire batch. Wrap each message in a try/catch and signal failures individually.
04
Always wrap your payload in an envelope. CloudEvents gives you id, source, type, replyTo, and traceparent for free. Routing, deduplication, tracing, and async reply all depend on it.
05
Async reply needs a replyTo URL and a correlation ID. The producer embeds where to reply. A natural entity ID (orderId) or a generated UUID stored in your data store ties the reply back to the original request.
06
Put traceparent in the message body, not transport headers. The CloudEvents distributed tracing extension keeps your trace context transport-agnostic. Switch queues without touching your observability code.
Just Use A Queue

Stop treating queues
as an afterthought.

Learn to build resilient, production-grade serverless integrations, with the patterns, the code, and the observability to know when things go wrong.

Get the Course →

TypeScript · AWS CDK · Lambda · SQS · OpenTelemetry