Multi-Cloud Without the Chaos
Building a single pipeline abstraction across AWS, GCP, and Azure, and why the queue lives in Postgres.
Running workloads across AWS, GCP, and Azure sounds like an architecture astronaut's dream and an operator's nightmare. Here's how to do it without losing your mind.
The Abstraction Layer Problem
Every cloud provider has its own queue (SQS, Pub/Sub, Service Bus), its own object store (S3, GCS, Blob Storage), and its own function runtime. If you code against these directly, you're locked in, and every migration becomes a rewrite.
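One common way to avoid that lock-in is to make pipeline code depend on a small provider-neutral interface and hide each provider behind an adapter. The Python sketch below illustrates the idea under stated assumptions: `Job`, `JobQueue`, `InMemoryQueue`, and `run_once` are hypothetical names, not the post's actual code, and the in-memory class stands in for a Postgres, SQS, Pub/Sub, or Service Bus adapter that would implement the same three methods.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


@dataclass
class Job:
    id: int
    payload: dict


class JobQueue(Protocol):
    """Hypothetical provider-neutral queue interface; adapters implement it."""

    def enqueue(self, payload: dict) -> int: ...
    def claim(self, worker_id: str) -> Optional[Job]: ...
    def complete(self, job_id: int) -> None: ...


class InMemoryQueue:
    """Local/test adapter; a cloud or Postgres adapter would expose the same methods."""

    def __init__(self) -> None:
        self._next_id = 1
        self._pending: dict[int, dict] = {}  # job id -> payload
        self._claimed: dict[int, str] = {}   # job id -> worker that claimed it

    def enqueue(self, payload: dict) -> int:
        job_id = self._next_id
        self._next_id += 1
        self._pending[job_id] = payload
        return job_id

    def claim(self, worker_id: str) -> Optional[Job]:
        # Hand out the oldest job that no other worker holds.
        for job_id in sorted(self._pending):
            if job_id not in self._claimed:
                self._claimed[job_id] = worker_id
                return Job(job_id, self._pending[job_id])
        return None

    def complete(self, job_id: int) -> None:
        self._pending.pop(job_id, None)
        self._claimed.pop(job_id, None)


def run_once(queue: JobQueue, worker_id: str, handler: Callable[[dict], object]) -> bool:
    """Process at most one job; depends only on JobQueue, never on a cloud SDK."""
    job = queue.claim(worker_id)
    if job is None:
        return False
    handler(job.payload)
    queue.complete(job.id)
    return True
```

Because `run_once` only sees the interface, swapping providers means writing a new adapter rather than rewriting the pipeline.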
Why the Queue Lives in Postgres
For our use case (relatively low volume, high reliability requirements), using Postgres as a job queue was the right call:
- No new infrastructure: we already had Postgres
- ACID guarantees: a job is either claimed or it isn't, so two workers never claim the same job
- Queryable: debugging a distributed queue without SQL is miserable
- Portable: works the same on every cloud
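The pattern is simple: a jobs table with a claimed_at column. A worker claims a job by selecting an unclaimed row with SELECT ... FOR UPDATE SKIP LOCKED and setting its claimed_at in the same transaction. The row lock makes the claim atomic, and SKIP LOCKED lets concurrent workers pass over rows another worker is already claiming instead of blocking on them.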
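A minimal Python sketch of the claim, assuming a PEP 249 (DB-API 2.0) driver that uses the `%s` placeholder style, such as psycopg. Only `claimed_at` comes from the post; the other columns, the delete-on-completion choice, and the `claim_next_job`/`complete_job` helpers are hypothetical. It folds the SELECT ... FOR UPDATE SKIP LOCKED into the subquery of an UPDATE, so locking and claiming happen in one statement.

```python
from typing import Any, Optional

# Hypothetical schema: only claimed_at is taken from the post.
CREATE_JOBS_SQL = """
CREATE TABLE IF NOT EXISTS jobs (
    id          BIGSERIAL PRIMARY KEY,
    payload     JSONB NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    claimed_at  TIMESTAMPTZ,
    claimed_by  TEXT
)
"""

# Lock the oldest unclaimed row, skipping rows other workers have locked,
# and mark it claimed in the same statement.
CLAIM_SQL = """
UPDATE jobs
SET claimed_at = now(), claimed_by = %s
WHERE id = (
    SELECT id FROM jobs
    WHERE claimed_at IS NULL
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""

# Assumption: finished jobs are simply deleted.
COMPLETE_SQL = "DELETE FROM jobs WHERE id = %s"


def claim_next_job(conn: Any, worker_id: str) -> Optional[tuple]:
    """Return (id, payload) for a newly claimed job, or None if nothing is free."""
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL, (worker_id,))
        row = cur.fetchone()  # None when no unclaimed, unlocked row exists
        conn.commit()         # release the row lock; the claim is now durable
        return row
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()


def complete_job(conn: Any, job_id: int) -> None:
    """Remove a finished job from the queue."""
    cur = conn.cursor()
    try:
        cur.execute(COMPLETE_SQL, (job_id,))
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Note that this sketch does not handle a worker that dies after committing its claim: that job keeps claimed_at set until something resets it.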
What We Gave Up
Throughput. Postgres queues work well up to tens of thousands of jobs per hour. If you need millions per hour, you need Kafka or SQS. Know your scale before you choose.
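Converting those hourly figures to per-second rates makes the gap concrete. This is plain arithmetic on the numbers above, not a benchmark; `per_second` is a hypothetical helper.

```python
SECONDS_PER_HOUR = 3600


def per_second(jobs_per_hour: float) -> float:
    """Convert an hourly job rate to jobs per second."""
    return jobs_per_hour / SECONDS_PER_HOUR


for label, rate in [("10k/hour", 10_000), ("50k/hour", 50_000), ("1M/hour", 1_000_000)]:
    print(f"{label:>9}: {per_second(rate):7.1f} jobs/s")
```

Tens of thousands of jobs per hour is roughly 3 to 28 jobs per second, while a million per hour is nearly 280 per second, sustained.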
— Amisha
Filed under: AI Systems