Build logs and hot takes on agentic systems, retrieval, and the unglamorous infrastructure that makes AI reliable.
Function calling means the model emits a request and your code runs it. The dispatch table, the loop, the state dict: that is the engineering. Build the loop well and a mediocre model behaves like an agent.
Read the build logHow I built evaluation pipelines for non-deterministic systems, and why vibes-based testing doesn't scale.
7 min read →Notes on queues, idempotency keys, and what it actually means to recover gracefully from failure.
6 min read →What changes when a demo becomes load-bearing. The cuts, the rewrites, and the thing nobody tells you.
5 min read →Building a single pipeline abstraction across AWS, GCP, and Azure, and why the queue lives in Postgres.
8 min read →Clinical and academic users break your assumptions. Notes on the UCSF × Stanford deployment.
4 min read →LLMs are stateless and frozen. Short-term memory is just context-window management: what you keep, what you summarize, what you drop.
Coming soonMy notebooks ran on mock dictionaries. Here is what every mocked piece becomes in a real system: retries, timeouts, persisted state, observability.
Coming soonFixed, semantic, recursive, structure-based, LLM-based. Recursive is the sane default, but the failure modes are where the real decisions live.
Coming soon