Document intake pipelines
Turn PDFs, emails, and images into typed, validated data.
One AI use case, scoped against your Three-Horizon allocation, built to production in 60-90 days. Your team owns the eval suite, the runbook, and the on-call after handoff. Not an automation shop. Not a freelance build.
Scoped builds are owned by your team after I leave. The build is model-agnostic by default, structured around patterns I ship in Baxie production today. If your use case isn't on the list, ask. The patterns transfer.
Turn PDFs, emails, and images into typed, validated data.
Match how Baxie's orchestrator works.
CRM, ticketing, intake forms that augment a team without replacing the workflow.
Production output becomes training signal for the next run.
Grounded in the shared knowledge base your team and your AI both work from, or built alongside one.
Most "AI build" consultancies optimize for the demo. This engagement optimizes for the steady state.
Build the demo, hand off the code, charge for the deck. Skip eval because production reliability is your problem. Optimize for the showcase, not the steady state. Leave you holding undocumented prompts and a brittle pipeline.
Production-deployed code in your repo, your cloud, your accounts. Eval suites that catch regressions before they hit users. Multi-model routing logic that controls cost without manual tuning. Observability per environment so the next on-call engineer isn't flying blind. Documentation written for the team that takes it over.
Me as the builder. One engineer from your team rides along as the future owner. One product or ops partner owns the use case.
Lock the use case to one specific workflow. Collect 50 to 200 real examples for evals. Pick the model strategy against cost and latency targets.
Stand up the agent graph, routing, retrieval, and observability. Wire production integrations to your existing tools. Eval suite running against real examples.
Token tracking, rate limit handling, retries, stall detection. Multi-model routing logic (Opus for hard, Sonnet for default, Haiku for routine). Regression suite on the prompts that matter.
Train the team on the eval workflow. Document the operating playbook. Schedule 30/60/90 check-ins after engagement closeout.
If you're on a different stack, I'll learn yours in the first week. The eval discipline, the routing logic, and the observability patterns travel.
Anthropic Claude API (Opus, Sonnet, Haiku), OpenAI, Model Context Protocol (MCP).
Python and TypeScript, Next.js Server Actions, Supabase, Postgres.
pgvector, vector databases, RAG pipelines.
Braintrust for evals, observability, and prompt regression. Playwright for E2E coverage on AI-touching UI.
The handoff is the engagement. By week 12, your team can debug and iterate without me.
One AI workflow in production, owned by your team. Eval coverage on the top 5 to 10 failure modes. Token tracking and observability live per environment. A team that can debug and iterate without my involvement.
The workflow stays in production without the original consultant on retainer. New AI features build on top of the spine instead of starting from zero. Cost stays controlled because the multi-model routing is documented.
The first 14 days are a ground-truth phase. We confirm the use case, the data, the integration surface, and the success criteria. If the spec isn't locked by day 14, you walk with no further commitment. The work to that point is yours. That removes the fixed-bid risk for both of us.
By the end of the first 14 days, you have a written spec: the use case, the data, the integration surface, the success criteria, the model strategy, the team rides-along. If we can't lock it, the engagement ends. You keep the eval scaffolding, the ground-truth dataset, and the written read on why it stalled. No fixed-bid math against unknown data.
Most AI builds fail because the spec gets locked before the data gets looked at. Then both sides spend 90 days on the wrong target. The 14-day phase puts ground truth before scope. Either we ship from a spec we both trust, or we don't ship. Both outcomes leave you with useful artifacts.
The build is the deliverable. These come with it. Named. Standalone value listed. Folded into the engagement.
The operating manual your next on-call engineer needs. Stall detection, retry logic, failure modes, escalation paths. Written for the team that takes it over, not for the engagement closeout.
After handoff, 30 days where production incidents come to me before they come to your team. Bug triage, prompt regressions, model behavior shifts. Your team learns the runbook by watching it run.
A live walkthrough of the architecture, the evals, the routing logic, the failure modes. Recorded so the next hire onboards from the tape, not a fresh meeting.
Versioned eval set with labeled examples, regression tests on the prompts that matter, CI wiring so the suite runs on every prompt or model change. Quality bar lives in the repo, not in my head.
A 60 to 90 day engagement that ships one production AI capability inside your stack: multi-agent workflow, document intake pipeline, AI copilot, retrieval-augmented assistant, or closed-loop learning system. Deliverable is production-deployed code in your repo, your cloud, your accounts, plus eval suites, multi-model routing, observability, and documentation. Your team owns the on-call after handoff. Not an automation shop. Not a freelance build.
$30-60k, 60 to 90 days end to end. Includes a 14-day ground-truth phase: if the data doesn't support production reliability, you keep the eval scaffolding and we end the engagement. Quoted in writing within 48 hours of the discovery call.
Automation agencies optimize for the demo. Scoped Build optimizes for the steady state. Production-deployed code in your repo (not theirs), eval suites that catch regressions before users do, observability per environment, multi-model routing that controls cost. The team owns the runbook after handoff, no undocumented prompts or brittle pipelines.
Document intake pipelines (PDFs, emails, images to typed validated data), multi-agent estimating or scoping workflows, AI copilots inside existing CRM or ticketing tools, retrieval-augmented assistants grounded in a shared knowledge base, and closed-loop learning systems where production output becomes training signal. Model-agnostic by default. The patterns transfer to other use cases too.
Yes. The engagement assumes one engineer from your team rides along, not as the builder, but as the future owner. If you don't have anyone yet, we add a hire-or-train phase upfront.
You find out in week 2, not month 6. The first phase is ground truth. If the data doesn't support production reliability, I tell you, you keep the eval scaffolding, and we either reframe the use case or end the engagement.
Yes. The build is model-agnostic by default. Bring your stack. The patterns transfer.
Optional. Most clients move into a fractional retainer if they want me to keep owning AI work post-build. Otherwise the team runs it.
Book the call. 30 minutes to scope. Two weeks to start.