AI Systems · Feb 2026 · 8 min read

Multi-Cloud Without the Chaos

Building a single pipeline abstraction across AWS, GCP, and Azure, and why the queue lives in Postgres.

Running workloads across AWS, GCP, and Azure sounds like an architecture astronaut's dream and an operator's nightmare. Here's how to do it without losing your mind.

The Abstraction Layer Problem

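Every cloud provider has its own queue (SQS, Pub/Sub, Service Bus), its own object store (S3, GCS, Blob), and its own function runtime. Use them directly and you're locked in: every migration becomes a rewrite. The way out is a thin interface of your own that pipeline code depends on, with one adapter per provider behind it.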
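To make the idea of hiding provider services behind one interface concrete, here is a minimal sketch for the object store. The names ObjectStore, InMemoryStore, and archive_result are hypothetical and not taken from our codebase; a real adapter would wrap the S3, GCS, or Blob client behind the same two methods.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical interface; each cloud gets one adapter behind it."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in adapter for tests; real ones would wrap S3, GCS, or Blob."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def archive_result(store: ObjectStore, job_id: str, payload: bytes) -> str:
    """Pipeline code sees only the interface, never a provider SDK."""
    key = f"results/{job_id}"
    store.put(key, payload)
    return key
```

Because archive_result only knows about ObjectStore, moving a workload to another cloud means writing one new adapter rather than touching the pipeline.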

Why the Queue Lives in Postgres

For our use case (relatively low volume, high reliability requirements), using Postgres as a job queue was the right call:

  • No new infrastructure: we already had Postgres
  • ACID guarantees: a job is either claimed or it isn't, and no two workers ever claim the same one
  • Queryable: debugging a distributed queue without SQL is miserable
  • Portable: works the same on every cloud

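The pattern is simple: a jobs table with a claimed_at column. Workers claim jobs with a SELECT ... FOR UPDATE SKIP LOCKED and set claimed_at in the same transaction. Rows already locked by one worker are skipped by the others instead of being waited on, so concurrent workers never block each other and never claim the same job.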
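The claim step described above could look roughly like this. It is a sketch under assumptions, not our production code: only the jobs table and claimed_at come from this post, while the id, payload, claimed_by, and created_at columns, the CLAIM_SQL constant, and the claim_job helper are made up for illustration. The function accepts any DB-API 2.0 connection to Postgres that uses %s placeholders, as psycopg does.

```python
# Assumed schema: jobs(id, payload, claimed_at, claimed_by, created_at).
# Only claimed_at comes from the post; the other columns are hypothetical.
CLAIM_SQL = """
UPDATE jobs
SET claimed_at = now(), claimed_by = %s
WHERE id = (
    SELECT id
    FROM jobs
    WHERE claimed_at IS NULL
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def claim_job(conn, worker_id):
    """Claim one unclaimed job, or return None when none is available."""
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL, (worker_id,))
        row = cur.fetchone()  # None if every job is claimed or locked
        conn.commit()  # makes the claim visible and releases the row lock
        return row
    except Exception:
        conn.rollback()  # leave the job unclaimed for another worker
        raise
    finally:
        cur.close()
```

Selecting and updating in one statement keeps the claim inside a single transaction, so a worker that crashes before the commit leaves the job unclaimed rather than half-claimed.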

What We Gave Up

Throughput. Postgres queues work well up to tens of thousands of jobs per hour, which is roughly 3 to 30 jobs per second. If you need millions, you need Kafka or SQS. Know your scale before you choose.

— Amisha
