Multi-Cloud Without the Chaos

Building a single pipeline abstraction across AWS, GCP, and Azure, and why the queue lives in Postgres.

Running workloads across AWS, GCP, and Azure sounds like an architecture astronaut's dream and an operator's nightmare. Here's how to do it without losing your mind.

The Abstraction Layer Problem

Every cloud provider has its own queue (SQS, Pub/Sub, Service Bus), its own object store (S3, GCS, Blob Storage), and its own function runtime. If you use these directly, you're locked in, and every migration is a rewrite.
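To make the idea concrete, here is a minimal sketch of what a provider-neutral queue interface could look like. The names (JobQueue, enqueue, claim, InMemoryQueue, run_once) are hypothetical and chosen for illustration; they are not taken from our actual codebase.

```python
from typing import Dict, Optional, Protocol, Set, Tuple


class JobQueue(Protocol):
    """Hypothetical provider-neutral queue interface (illustrative names)."""

    def enqueue(self, payload: str) -> int: ...

    def claim(self) -> Optional[Tuple[int, str]]: ...


class InMemoryQueue:
    """Stand-in backend for local runs. A backend for SQS, Pub/Sub,
    Service Bus, or Postgres would expose the same two methods."""

    def __init__(self) -> None:
        self._jobs: Dict[int, str] = {}
        self._claimed: Set[int] = set()
        self._next_id = 1

    def enqueue(self, payload: str) -> int:
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = payload
        return job_id

    def claim(self) -> Optional[Tuple[int, str]]:
        # Hand out the oldest job that nobody has claimed yet.
        for job_id, payload in self._jobs.items():
            if job_id not in self._claimed:
                self._claimed.add(job_id)
                return job_id, payload
        return None


def run_once(queue: JobQueue) -> Optional[str]:
    """Pipeline code depends only on the interface, never on a provider SDK."""
    job = queue.claim()
    if job is None:
        return None
    _job_id, payload = job
    return payload.upper()  # placeholder for real work
```

Because run_once only sees JobQueue, swapping the backend should not touch pipeline code, which is the whole point of the layer.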

Why the Queue Lives in Postgres

For our use case (relatively low volume, high reliability requirements), using Postgres as a job queue was the right call:

No new infrastructure: we already had Postgres
ACID guarantees: claiming a job is atomic, so a job is either claimed or it isn't, and no two workers claim the same one
Queryable: debugging a distributed queue without SQL is miserable
Portable: works the same on every cloud

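The pattern is simple: a jobs table with a claimed_at column. Workers claim jobs with SELECT ... FOR UPDATE SKIP LOCKED (available since Postgres 9.5) inside a transaction that also sets claimed_at. Rows another worker has already locked are skipped rather than waited on, so concurrent workers never claim the same job and never block each other.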
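A minimal sketch of the claim step might look like the following. It assumes a hypothetical jobs(id, payload, claimed_at) table; only claimed_at comes from the description above, and the other columns, the CLAIM_SQL constant, and the claim_next_job helper are illustrative. The function accepts any DB-API 2.0 connection to Postgres (psycopg, for example).

```python
from typing import Any, Optional, Tuple

# Assumed schema: jobs(id, payload, claimed_at). Only claimed_at is from the
# post; id and payload are illustrative.
CLAIM_SQL = """
UPDATE jobs
SET claimed_at = now()
WHERE id = (
    SELECT id
    FROM jobs
    WHERE claimed_at IS NULL
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def claim_next_job(conn: Any) -> Optional[Tuple[Any, ...]]:
    """Claim one unclaimed job and return its row, or None if none is free.

    The row lock and the claimed_at update happen in one statement, so two
    workers running this concurrently cannot receive the same job.
    """
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
        conn.commit()
        return row
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

This sketch marks the job claimed and commits right away. What should happen if a worker dies after claiming (for example, a timeout that clears claimed_at) is a separate design decision that this example does not cover.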

What We Gave Up

Throughput. Postgres queues work well up to tens of thousands of jobs per hour. If you need millions, you need Kafka or SQS. Know your scale before you choose.
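Hourly figures are easier to reason about as a per-second rate. The helper below is just arithmetic, and the two sample volumes are illustrative numbers rather than measurements from our system.

```python
def jobs_per_second(jobs_per_hour: float) -> float:
    """Convert an hourly job volume into an average per-second rate."""
    return jobs_per_hour / 3600


# Illustrative volumes only: "tens of thousands" versus "millions" per hour.
for volume in (36_000, 3_600_000):
    print(f"{volume:>9,} jobs/hour = {jobs_per_second(volume):,.0f} jobs/sec")
```

At 36,000 jobs per hour each claim has, on average, a tenth of a second to itself; at 3.6 million per hour that drops to a millisecond, which is where dedicated queueing systems earn their keep.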

— Amisha

Filed under: AI Systems
