If you've built a service that calls another service directly over HTTP, this page is for you. Your services are communicating and everything works great, right up until it doesn't.
You're working on the backend for a pizza restaurant. Your order service calls your kitchen service over HTTP. It works great, until the kitchen deploys a breaking change at 3am on a Friday. With HTTP, one failure becomes two.
Kitchen goes down? Order service starts 500-ing too. Two independent services fail as one.
Friday rush sends 10× traffic. HTTP calls hammer the kitchen until it collapses, taking orders down too.
One malformed order (pineapple on a pizza, unacceptable) retries forever, blocking every valid order behind it.
Instead of your order service calling the kitchen service directly, it drops a message into a queue. The kitchen picks it up when it’s ready. They never need to talk to each other directly — or even be running at the same time.
The order service doesn’t know or care whether the kitchen is running. It just drops the message and moves on. The kitchen processes it when ready — even if that’s after a restart, a deploy, or a three-hour outage.
That’s it.
A point-to-point message channel decouples your services at runtime. The order service writes to a queue. The kitchen reads from it when ready. They never need to be online at the same time.
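Here is a minimal sketch of that decoupling, assuming an in-memory array as a stand-in for a durable queue. The names `PointToPointChannel`, `placeOrder`, and `runKitchen` are made up for illustration; a real deployment would use your platform's queue client instead.

```typescript
// In-memory stand-in for a durable queue (SQS, RabbitMQ, ...). Illustrative only.
type Order = { orderId: string; pizza: string };

class PointToPointChannel<T> {
  private readonly messages: T[] = [];

  // Producer side: store the message and return immediately.
  send(message: T): void {
    this.messages.push(message);
  }

  // Consumer side: take the oldest message, or undefined if the queue is empty.
  receive(): T | undefined {
    return this.messages.shift();
  }

  get depth(): number {
    return this.messages.length;
  }
}

// Hypothetical order service: it only knows about the channel, not the kitchen.
function placeOrder(channel: PointToPointChannel<Order>, order: Order): void {
  channel.send(order);
}

// Hypothetical kitchen service: drains whatever accumulated while it was offline.
function runKitchen(channel: PointToPointChannel<Order>): string[] {
  const cooked: string[] = [];
  let next = channel.receive();
  while (next !== undefined) {
    cooked.push(next.orderId);
    next = channel.receive();
  }
  return cooked;
}

// The kitchen is "offline": nothing is consuming, yet orders are still accepted.
const orders = new PointToPointChannel<Order>();
placeOrder(orders, { orderId: "o-1", pizza: "margherita" });
placeOrder(orders, { orderId: "o-2", pizza: "pepperoni" });
const backlogWhileOffline = orders.depth;

// The kitchen comes back and processes the backlog in arrival order.
const cookedAfterRestart = runKitchen(orders);
```

Notice that `placeOrder` never references the kitchen. The only thing the two sides share is the channel.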
One message travels through 3 services, 2 queues, 1 data store, fully asynchronously.
AWS implementation: API Gateway · Lambda · SQS · DynamoDB
Without a Dead Letter Queue, one invalid message gets retried forever, blocking every valid message behind it. This is the poison pill problem. The fix is pure configuration.
After N failed attempts, the message moves to the Dead Letter Queue automatically. The rest of the queue? Unaffected. With most cloud messaging platforms, this is pure configuration, no application code required.
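To see why the rest of the queue is unaffected, here is a small simulation of the redrive behaviour. It is a sketch under assumptions: `QueueWithDlq` and its `maxReceiveCount` field are hypothetical in-memory stand-ins that mimic what a managed platform does for you through configuration, not code you would write in production.

```typescript
// A queued message plus the number of times it has been delivered.
type Queued<T> = { body: T; receiveCount: number };

// Hypothetical in-memory queue that mimics a platform's redrive policy.
class QueueWithDlq<T> {
  readonly main: Queued<T>[] = [];
  readonly deadLetters: T[] = [];
  private readonly maxReceiveCount: number;

  constructor(maxReceiveCount: number) {
    this.maxReceiveCount = maxReceiveCount;
  }

  send(body: T): void {
    this.main.push({ body, receiveCount: 0 });
  }

  // Deliver every message currently on the queue once. A failed message goes
  // back to the queue until it has been received maxReceiveCount times, then
  // it moves to the dead letter queue instead.
  poll(handler: (body: T) => void): void {
    const batch = this.main.splice(0, this.main.length);
    for (const message of batch) {
      message.receiveCount += 1;
      try {
        handler(message.body);
      } catch {
        if (message.receiveCount >= this.maxReceiveCount) {
          this.deadLetters.push(message.body);
        } else {
          this.main.push(message);
        }
      }
    }
  }
}

// Hypothetical kitchen that can never process one particular order.
const processed: string[] = [];
const kitchen = (pizza: string): void => {
  if (pizza === "pineapple") throw new Error("cannot process order");
  processed.push(pizza);
};

const queue = new QueueWithDlq<string>(3);
queue.send("margherita");
queue.send("pineapple");
queue.send("pepperoni");
for (let attempt = 0; attempt < 3; attempt++) {
  queue.poll(kitchen);
}
```

After three polls the poison pill sits in `queue.deadLetters`, and both valid orders were processed on the first pass.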
Most queue consumers receive messages in batches. If one fails, you don’t want all five retried. Your messaging platform needs to know exactly which message failed, so only that one goes back to the queue.
The pattern is universal: wrap each message in a try/catch. Throw for real errors (the message returns to the queue). Return silently for validation failures (the message is deleted). On Kafka you’d manually send to a DLQ topic; on SQS + Lambda, reportBatchItemFailures is a configuration switch: the handler reports which messages failed, and only those return to the queue.
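A sketch of that handler for SQS + Lambda follows. The event and response types are trimmed local stand-ins for the Lambda shapes (the real response field is `batchItemFailures`, with one `itemIdentifier` per failed message ID). `processRecord` and the `ovenDown` flag are hypothetical, there only to simulate a transient failure.

```typescript
// Trimmed local stand-ins for the Lambda SQS event and partial-batch response.
type SqsRecord = { messageId: string; body: string };
type SqsEvent = { Records: SqsRecord[] };
type BatchResponse = { batchItemFailures: { itemIdentifier: string }[] };

// Hypothetical per-message logic: return silently for invalid messages,
// throw for failures that are worth retrying.
function processRecord(record: SqsRecord): void {
  let order: unknown;
  try {
    order = JSON.parse(record.body);
  } catch {
    return; // unparseable body: a validation failure, so return silently
  }
  if (typeof order !== "object" || order === null) return;
  const { pizza, ovenDown } = order as { pizza?: unknown; ovenDown?: unknown };
  if (typeof pizza !== "string") return; // validation failure: message is deleted
  if (ovenDown === true) throw new Error("oven unavailable"); // simulated transient error
}

function handler(event: SqsEvent): BatchResponse {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of event.Records) {
    try {
      processRecord(record);
    } catch {
      // Real error: report only this message so only it returns to the queue.
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}

const response = handler({
  Records: [
    { messageId: "m-1", body: JSON.stringify({ pizza: "margherita" }) },
    { messageId: "m-2", body: JSON.stringify({ quantity: 2 }) },
    { messageId: "m-3", body: JSON.stringify({ pizza: "pepperoni", ovenDown: true }) },
  ],
});
```

Only `m-3` ends up in `response.batchItemFailures`. The valid order and the invalid one are both treated as done, so neither is retried.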
Most engineers think they need guaranteed ordering. Most don’t. Ordered queues add cost and reduce throughput on every platform: Kafka, RabbitMQ, and SQS alike. Unordered queues are “good enough” for more use cases than you’d expect.
Don’t put raw JSON on a queue. Wrap it in a CloudEvents envelope.
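As a rough illustration, this is what wrapping an order could look like. The attribute names (`specversion`, `id`, `source`, `type`, `time`, `datacontenttype`, `data`) come from the CloudEvents 1.0 spec; the helper `toCloudEvent`, the event type string, and the source path are assumptions made up for this sketch.

```typescript
// Subset of CloudEvents 1.0 attributes used in this sketch.
type CloudEvent<T> = {
  specversion: "1.0";
  id: string;
  source: string;
  type: string;
  time: string;
  datacontenttype: "application/json";
  data: T;
};

// Hypothetical helper: wraps a payload in an envelope. The caller supplies the
// id so the example stays deterministic; in practice you'd generate a UUID.
function toCloudEvent<T>(
  id: string,
  type: string,
  source: string,
  data: T,
  now: Date = new Date(),
): CloudEvent<T> {
  return {
    specversion: "1.0",
    id,
    source,
    type,
    time: now.toISOString(),
    datacontenttype: "application/json",
    data,
  };
}

const orderPlaced = toCloudEvent(
  "order-1001",
  "com.example.pizza.order.placed", // assumed naming scheme
  "/order-service", // assumed source identifier
  { pizza: "margherita", quantity: 2 },
  new Date("2024-01-05T19:30:00Z"),
);

// This string is what goes on the queue instead of the raw order JSON.
const messageBody = JSON.stringify(orderPlaced);
```

The consumer can now route on `type`, deduplicate on `id` plus `source`, and parse `data` knowing its content type, all without inspecting the payload first.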
Asynchronous doesn’t mean fire-and-forget. The replyTo URL in your CloudEvents envelope lets the consumer send a response back, on a queue the producer controls.
The replyTo pattern keeps the producer in control. The kitchen doesn’t hardcode where to reply; it’s told at runtime, in the message itself.
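A minimal sketch of the reply flow, assuming an in-memory `Broker` keyed by queue URL. The broker, the queue URLs, and the message type names are all hypothetical. The point is that the kitchen only knows its own input queue and reads the reply destination from each message.

```typescript
type Message = { id: string; type: string; replyTo?: string; data: unknown };

// Hypothetical in-memory broker: one FIFO list of messages per queue URL.
class Broker {
  private readonly queues = new Map<string, Message[]>();

  send(queueUrl: string, message: Message): void {
    const queue = this.queues.get(queueUrl) ?? [];
    queue.push(message);
    this.queues.set(queueUrl, queue);
  }

  receive(queueUrl: string): Message | undefined {
    return this.queues.get(queueUrl)?.shift();
  }
}

// Assumed URLs for illustration. Only the order service knows the reply queue.
const ORDERS_QUEUE = "https://queues.example/orders";
const ORDER_REPLIES_QUEUE = "https://queues.example/order-replies";

// Kitchen: consumes from its own queue and replies wherever the message says.
function kitchenConsume(broker: Broker): void {
  const order = broker.receive(ORDERS_QUEUE);
  if (order === undefined || order.replyTo === undefined) return;
  broker.send(order.replyTo, {
    id: `${order.id}-reply`,
    type: "order.accepted",
    data: { orderId: order.id },
  });
}

const broker = new Broker();

// Order service: sends the order and names the queue it wants the answer on.
broker.send(ORDERS_QUEUE, {
  id: "o-1",
  type: "order.placed",
  replyTo: ORDER_REPLIES_QUEUE,
  data: { pizza: "margherita" },
});

kitchenConsume(broker);
const reply = broker.receive(ORDER_REPLIES_QUEUE);
```

If the order service later moves its reply queue, it changes the `replyTo` value it sends. The kitchen needs no redeploy.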
How do you know a slow consumer is caused by the producer? With OpenTelemetry parent-child relationships, you see the entire message journey in one waterfall, across all three services, regardless of which queue technology you use.
This works by injecting trace context into the message when publishing, then extracting it in the consumer to start a child span. There are two approaches:
traceparent extension (✓ recommended): carry traceparent as a CloudEvents extension attribute in the message body itself. It is defined in the CloudEvents distributed tracing spec. The consumer reads it from the envelope and creates a child span, completely transport-agnostic. Switch from SQS to Kafka to RabbitMQ and your tracing just works.
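To show the mechanics of inject and extract, here is a hand-rolled sketch. In real code you would most likely let the OpenTelemetry propagation API do this. The helpers `inject` and `startConsumerSpan` and the `Span` type are hypothetical. The only real convention used is the W3C `traceparent` layout: version, trace ID, parent span ID, and flags, separated by hyphens.

```typescript
type TraceContext = { traceId: string; spanId: string; flags: string };
type Envelope = { id: string; type: string; traceparent?: string; data: unknown };
type Span = { traceId: string; spanId: string; parentSpanId?: string };

// W3C traceparent layout: 00-<32 hex trace id>-<16 hex span id>-<2 hex flags>
function formatTraceparent(ctx: TraceContext): string {
  return `00-${ctx.traceId}-${ctx.spanId}-${ctx.flags}`;
}

function parseTraceparent(value: string): TraceContext | undefined {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(value);
  if (match === null) return undefined;
  return { traceId: match[1], spanId: match[2], flags: match[3] };
}

// Producer side (hypothetical helper): copy the active context into the envelope.
function inject(envelope: Envelope, current: TraceContext): Envelope {
  return { ...envelope, traceparent: formatTraceparent(current) };
}

// Consumer side (hypothetical helper): continue the producer's trace if the
// envelope carries a valid traceparent, otherwise start a fresh trace.
function startConsumerSpan(
  envelope: Envelope,
  fresh: { traceId: string; spanId: string },
): Span {
  const parent =
    envelope.traceparent === undefined ? undefined : parseTraceparent(envelope.traceparent);
  if (parent === undefined) {
    return { traceId: fresh.traceId, spanId: fresh.spanId };
  }
  return { traceId: parent.traceId, spanId: fresh.spanId, parentSpanId: parent.spanId };
}

// Example IDs only; a tracing SDK would generate these.
const producerContext: TraceContext = {
  traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
  spanId: "00f067aa0ba902b7",
  flags: "01",
};

const published = inject(
  { id: "o-1", type: "order.placed", data: { pizza: "margherita" } },
  producerContext,
);

const consumerSpan = startConsumerSpan(published, {
  traceId: "0af7651916cd43dd8448eb211c80319c",
  spanId: "b7ad6b7169203331",
});
```

Because the context lives in the envelope rather than in transport headers, nothing in this sketch knows which queue technology carried the message.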
Everything on this page applies whether you’re using SQS, Kafka, RabbitMQ, or Azure Service Bus. The platform changes. The patterns don’t.