
AI Steering Systems.

Prompt and workflow architecture for teams using LLMs in production — structured outputs, retrieval patterns, and the evaluation thinking that keeps the system honest.

Year: 2026
Role: Technical partner · prompt architect
Stack: TypeScript, OpenAI, Anthropic, JSON schema, lightweight eval harness
Status: Engagement-based, ongoing

Overview

Most teams stop at “the prompt works on my laptop.” This work picks up from there: turning a working prompt into a system you can change, measure, and trust under real traffic.

Problem

Production LLM use breaks in subtle ways — drift across model versions, fragile parsing, retrieval that returns the wrong context, prompt changes that nobody can verify. Teams ship and then can’t change anything without anxiety.

Constraints

  • Vendor-neutral by default — assume the model under the system will change
  • Outputs must be machine-parseable and human-reviewable
  • Eval has to be lightweight enough that engineers actually run it
  • The system has to be readable by a non-AI engineer six months from now

Approach

Treat prompts as interfaces between humans and models, not magic strings. Define inputs, outputs, and contracts first. Layer the system: orchestration → prompt → retrieval → evaluation. Make every layer observable.
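To make "prompts as interfaces" concrete, here is a minimal sketch of what one such contract might look like in TypeScript. Every name in it (PromptModule, TicketInput, TicketTriage, triagePrompt) is hypothetical and chosen for illustration, and the hand-written parse step stands in for whatever JSON-schema validator a team actually uses.

```typescript
// Hypothetical names throughout; this is an illustration, not a library API.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

// The contract: typed input in, rendered prompt out, raw model text parsed
// back into a typed output or an explicit error.
interface PromptModule<In, Out> {
  name: string;
  version: string;
  render(input: In): string;
  parse(raw: string): Result<Out>;
}

interface TicketInput {
  subject: string;
  body: string;
}

interface TicketTriage {
  category: "billing" | "bug" | "other";
  urgent: boolean;
}

const CATEGORIES = ["billing", "bug", "other"] as const;

const triagePrompt: PromptModule<TicketInput, TicketTriage> = {
  name: "ticket-triage",
  version: "1",
  render: ({ subject, body }: TicketInput): string =>
    [
      "Classify the support ticket below.",
      'Reply with JSON only: {"category": "billing" | "bug" | "other", "urgent": boolean}',
      `Subject: ${subject}`,
      `Body: ${body}`,
    ].join("\n"),
  parse(raw: string): Result<TicketTriage> {
    let data: unknown;
    try {
      data = JSON.parse(raw);
    } catch {
      return { ok: false, error: "not valid JSON" };
    }
    if (typeof data !== "object" || data === null) {
      return { ok: false, error: "expected a JSON object" };
    }
    const { category, urgent } = data as Record<string, unknown>;
    // Accept only the categories the contract declares.
    const match = CATEGORIES.find((c) => c === category);
    if (match === undefined) return { ok: false, error: "unknown category" };
    if (typeof urgent !== "boolean") return { ok: false, error: "urgent must be boolean" };
    return { ok: true, value: { category: match, urgent } };
  },
};
```

The point of the shape is that callers never see raw model text: they get a typed value or an error they can log and review, and the prompt wording can change behind the same interface.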

Implementation

Typed prompt modules with JSON-schema outputs. Thin orchestration layer abstracting providers. Retrieval pipelines designed around what the prompt actually needs, not generic semantic search. A small eval harness that runs on every change with a curated set of cases.
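As a rough sketch of how a thin provider abstraction and a small eval harness can fit together, the example below runs a curated list of cases against any object that satisfies a Provider interface. Provider, EvalCase, EvalReport, runEvals, and the stub provider are all assumed names for illustration; a real adapter would wrap a vendor SDK call behind the same complete method.

```typescript
// Hypothetical sketch: these names are illustrative, not an existing library's API.
interface Provider {
  name: string;
  complete(prompt: string): Promise<string>;
}

interface EvalCase {
  id: string;
  prompt: string;
  // Returns null on pass, or a short human-readable failure reason.
  check(output: string): string | null;
}

interface EvalReport {
  provider: string;
  passed: number;
  failures: { id: string; reason: string }[];
}

// Run one case; a provider exception is recorded as a failure, not a crash.
async function runCase(provider: Provider, c: EvalCase): Promise<string | null> {
  try {
    return c.check(await provider.complete(c.prompt));
  } catch (err) {
    return `provider error: ${err instanceof Error ? err.message : String(err)}`;
  }
}

async function runEvals(provider: Provider, cases: EvalCase[]): Promise<EvalReport> {
  const report: EvalReport = { provider: provider.name, passed: 0, failures: [] };
  for (const c of cases) {
    const reason = await runCase(provider, c);
    if (reason === null) report.passed += 1;
    else report.failures.push({ id: c.id, reason });
  }
  return report;
}

// Stub provider so the harness runs offline and deterministically.
const stubProvider: Provider = {
  name: "stub",
  complete: async (prompt: string): Promise<string> =>
    prompt.includes("refund") ? '{"category":"billing"}' : "Sorry, I cannot help.",
};

const cases: EvalCase[] = [
  {
    id: "refund-is-billing",
    prompt: "Classify: I want a refund",
    check: (out) => (out.includes('"billing"') ? null : "expected billing category"),
  },
  {
    id: "output-is-json",
    prompt: "Classify: the app crashes on login",
    check: (out) => {
      try {
        JSON.parse(out);
        return null;
      } catch {
        return "output is not JSON";
      }
    },
  },
];
```

Because the harness depends only on the Provider interface, the same cases can be replayed against a second vendor or a new model version, and a report with a non-empty failures list can fail the build.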

Result

Teams stop fearing prompt changes. Regressions get caught before users see them. Provider swaps become days of work, not weeks. And — quietly — the prompts get smaller, not bigger, because the structure does the heavy lifting.

Reflection

The discipline that pays off most isn’t prompt engineering — it’s eval. The teams that take eval seriously can move fast forever. The ones that don’t end up with a prompt they’re afraid to touch.

Need this for your team?

If your team is shipping LLM features and starting to feel the cracks, this is the kind of work I do.