AI Steering Systems.
Prompt and workflow architecture for teams using LLMs in production — structured outputs, retrieval patterns, and the evaluation thinking that keeps the system honest.
- Year: 2026
- Role: Technical partner · prompt architect
- Stack: TypeScript, OpenAI, Anthropic, JSON schema, lightweight eval harness
- Status: Engagement-based, ongoing
Overview
Most teams stop at “the prompt works on my laptop.” This work picks up from there: turning a working prompt into a system you can change, measure, and trust under real traffic.
Problem
Production LLM use breaks in subtle ways — drift across model versions, fragile parsing, retrieval that returns the wrong context, prompt changes that nobody can verify. Teams ship and then can’t change anything without anxiety.
Constraints
- Vendor-neutral by default — assume the model under the system will change
- Outputs must be machine-parseable and human-reviewable
- Eval has to be lightweight enough that engineers actually run it
- The system has to be readable by a non-AI engineer six months from now
Approach
Treat prompts as interfaces between humans and models, not magic strings. Define inputs, outputs, and contracts first. Layer the system: orchestration → prompt → retrieval → evaluation. Make every layer observable.
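To make "prompts as interfaces" concrete, here is a minimal sketch of what such a contract could look like. Everything in it is illustrative rather than taken from a client codebase: the `PromptModule` shape, the `classifyTicket` example, and its field names are hypothetical, and the hand-written check in `parse` stands in for a real JSON Schema validator.

```typescript
// Hypothetical contract: a prompt is a module with typed input and typed output.
interface PromptModule<I, O> {
  name: string;
  version: string;          // bumped on any wording change so eval runs can be compared
  render(input: I): string; // typed input -> prompt text
  parse(raw: string): O;    // raw model text -> validated output, or throw
}

// Illustrative input and output types for a support-ticket classifier.
interface TicketInput { subject: string; body: string }
interface TicketLabel { category: "billing" | "bug" | "other"; urgent: boolean }

const CATEGORIES = ["billing", "bug", "other"] as const;

const classifyTicket: PromptModule<TicketInput, TicketLabel> = {
  name: "classify-ticket",
  version: "1.0.0",
  render: ({ subject, body }) =>
    [
      "Classify the support ticket.",
      'Reply with JSON only: {"category": "billing" | "bug" | "other", "urgent": boolean}',
      `Subject: ${subject}`,
      `Body: ${body}`,
    ].join("\n"),
  parse: (raw) => {
    const data: unknown = JSON.parse(raw); // throws on non-JSON output
    if (typeof data !== "object" || data === null) {
      throw new Error("expected a JSON object");
    }
    const { category, urgent } = data as Record<string, unknown>;
    // Hand-rolled check standing in for a JSON Schema validator.
    const match = CATEGORIES.find((c) => c === category);
    if (match === undefined || typeof urgent !== "boolean") {
      throw new Error(`output violates contract: ${raw}`);
    }
    return { category: match, urgent };
  },
};
```

The caller never sees a raw string: `classifyTicket.parse(...)` either returns a `TicketLabel` or throws, so parsing failures surface at the boundary instead of deep inside application code.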
Implementation
Typed prompt modules with JSON-schema outputs. Thin orchestration layer abstracting providers. Retrieval pipelines designed around what the prompt actually needs, not generic semantic search. A small eval harness that runs on every change with a curated set of cases.
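As a rough illustration of the provider seam and the eval harness described above, the sketch below wraps any model call behind one function type and runs a curated case list against it. The names (`Provider`, `EvalCase`, `runEval`, `stubProvider`) are hypothetical, and the stub replaces a real vendor SDK call so the example runs offline.

```typescript
// Hypothetical provider seam: any vendor SDK call gets wrapped into this one shape.
type Provider = (prompt: string) => Promise<string>;

interface EvalCase {
  id: string;
  prompt: string;
  check: (output: string) => boolean; // cheap, deterministic assertion on the raw output
}

interface EvalReport { passed: number; failed: string[] }

async function runEval(provider: Provider, cases: EvalCase[]): Promise<EvalReport> {
  const report: EvalReport = { passed: 0, failed: [] };
  for (const c of cases) {
    let ok = false;
    try {
      ok = c.check(await provider(c.prompt));
    } catch (_err) {
      ok = false; // a provider error or a throwing check counts as a failure
    }
    if (ok) report.passed += 1;
    else report.failed.push(c.id);
  }
  return report;
}

// Stub provider so the harness runs offline; a real one would call a vendor SDK.
const stubProvider: Provider = async (prompt) =>
  prompt.includes("refund") ? '{"category":"billing"}' : "not json";

// Curated cases: each one pins a behavior that must survive prompt or model changes.
const cases: EvalCase[] = [
  {
    id: "billing-refund",
    prompt: "I want a refund",
    check: (o) => JSON.parse(o).category === "billing",
  },
  {
    id: "always-json",
    prompt: "hello",
    check: (o) => typeof JSON.parse(o) === "object",
  },
];
```

Calling `runEval(stubProvider, cases)` resolves to a report with the IDs of the failing cases. Swapping vendors means writing one new `Provider` function and rerunning the same cases, which is the property the thin orchestration layer is meant to protect.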
Result
Teams stop fearing prompt changes. Regressions get caught before users see them. Provider swaps become days of work, not weeks. And — quietly — the prompts get smaller, not bigger, because the structure does the heavy lifting.
Reflection
The discipline that pays off most isn’t prompt engineering — it’s eval. The teams that take eval seriously can move fast forever. The ones that don’t end up with a prompt they’re afraid to touch.
Need this for your team?
If your team is shipping LLM features and starting to feel the cracks, this is the kind of work I do.