GAT Guzman Applied Technologies

Your AI Pilot Didn't Fail. Your Org Chart Did.

6 min read 2026-05-22

The post-mortem on your stalled AI pilot will blame the model. The vendor. The data quality. The integration layer. The change plan.

All of those are wrong.

Your pilot did not fail because the technology was immature. It failed because you bolted production-grade AI onto an org chart designed for a 2018 SaaS workflow. The pilot worked in the demo. It died the moment it had to live inside a function already coordinating 14 people across 3 layers to ship one decision.

The technology was never the bottleneck. The shape of the team was.

The Bain thesis everyone is misreading

Bain's research on AI operating models keeps surfacing the same finding: companies deploying generative AI in operations see meaningful output gains in the functions where it lands [1]. The CFO reads the headline and approves the next pilot. The COO reads the same headline and quietly notices something the deck did not say out loud.

If output rises and headcount stays flat, you are running a function with excess coordination capacity. That excess does not disappear. It converts into meetings, status updates, and rework cycles that justify the existing shape of the org. The pilot succeeds in the lab. Then it dies in the org, because the org has antibodies against any change that exposes the excess.

This is the part vendor decks skip. AI does not fail at the model layer for the companies I work with. It fails at the layer where a Director has to explain why their function is staffed for one level of throughput while the model can quietly deliver more.

That is not a technology problem. That is an org design problem.

Four failure patterns that look like tech problems

I walk into Pilot Purgatory engagements and see the same four patterns. Every one gets diagnosed as a tooling, data, or model issue. Every one is an org chart issue wearing a tech costume.

Pattern 1: The pilot has no operating owner

The pilot was sponsored by the CTO or a VP of Innovation. Built by a vendor or an internal lab. Handed to a line function to "adopt."

Nobody on the line function has the pilot in their comp plan. Nobody has it in their quarterly objectives. The Director running the team is measured on the metric the pilot is supposed to move, but the pilot itself is somebody else's project. So the team uses it for two weeks, hits one friction point, and quietly stops.

The fix is not better training. The fix is moving the pilot's success metric into the line Director's scorecard before the pilot ships. If you cannot do that, you do not have a pilot. You have a demo.

Pattern 2: The work was never decomposed

The pilot was scoped at the level of a job title. "AI for analysts." "AI for paralegals." "AI for SDRs."

Job titles are not workflows. A senior analyst does dozens of distinct tasks across a week. A handful the model can fully own. Some it can accelerate. Most it should never touch. Nobody decomposed the job. So the pilot got measured against the entire job, the model got a fraction of it right, and the conclusion was "AI is not ready for analyst work."

The model was ready. The scoping was lazy. You do not buy AI for a job title. You buy it for a task, and you have to know which tasks before you start.
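A task inventory makes the decomposition concrete. The sketch below is illustrative only: the task names, hours, and speedup factor are made-up placeholders, not data from any engagement. The three categories mirror the split above (tasks the model can own, tasks it can accelerate, tasks it should never touch). It also assumes an owned task needs zero human time, which ignores review effort.

```python
from dataclasses import dataclass

# Hypothetical category labels mirroring the own / accelerate / never-touch split.
OWN, ACCELERATE, HUMAN_ONLY = "own", "accelerate", "human_only"


@dataclass
class Task:
    name: str
    hours_per_week: float  # current human time spent on the task
    category: str          # OWN, ACCELERATE, or HUMAN_ONLY
    speedup: float = 1.0   # assumed speedup for ACCELERATE tasks (2.0 = half the time)


def hours_after_pilot(task: Task) -> float:
    """Human hours still needed per week once the pilot is in place."""
    if task.category == OWN:
        return 0.0  # simplifying assumption: no human review time
    if task.category == ACCELERATE:
        return task.hours_per_week / task.speedup
    return task.hours_per_week  # HUMAN_ONLY tasks are unchanged


def summarize(tasks):
    """Return (hours before, hours after, hours saved) for a task list."""
    before = sum(t.hours_per_week for t in tasks)
    after = sum(hours_after_pilot(t) for t in tasks)
    return before, after, before - after


# Made-up inventory for one analyst role.
inventory = [
    Task("pull weekly report data", 6.0, OWN),
    Task("draft variance commentary", 8.0, ACCELERATE, speedup=2.0),
    Task("stakeholder negotiation", 10.0, HUMAN_ONLY),
    Task("model review with finance lead", 16.0, HUMAN_ONLY),
]

before, after, saved = summarize(inventory)
print(f"before={before}h after={after}h saved={saved}h")
# prints: before=40.0h after=30.0h saved=10.0h
```

In this invented example the pilot touches two of four tasks and frees 10 of 40 hours. Measured against the whole job it looks like a 25% result. Measured against the two tasks it was scoped for, it did exactly what it should.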

Pattern 3: The manager-to-IC ratio was never updated

Anthropic's public writing on agent architectures points at a quiet operating shift: in an AI-leveraged team, the unit of leverage moves from the manager to the IC running a set of agents, and measurement shifts to what the IC ships rather than how many people a manager supervises [2].

Most companies running pilots still have the traditional manager-to-IC ratio. The ICs now have AI leverage. The manager still has the same span and the same calendar of 1:1s, status reviews, and approval gates. The IC is faster. The approval queue is the same length. Throughput does not move.

The new shape: fewer managers, wider spans, ICs running pods of agents, approvals batched or eliminated. You cannot get there without restructuring. No pilot survives an org where the bottleneck moved from "doing the work" to "getting the work approved."
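The argument in this pattern is a bottleneck argument: a two-stage pipeline ships no faster than its slower stage. Here is a minimal sketch under stated assumptions. The function name and all rates are hypothetical, and it models a single approval gate with a fixed weekly capacity.

```python
def weekly_throughput(ic_items_per_week: float, approvals_per_week: float) -> float:
    """Items shipped per week when every item must pass one approval gate.

    The slower of the two stages sets the pace.
    """
    return min(ic_items_per_week, approvals_per_week)


# Made-up numbers: the manager can approve 20 items a week.
before_ai = weekly_throughput(ic_items_per_week=20, approvals_per_week=20)   # 20
after_ai = weekly_throughput(ic_items_per_week=60, approvals_per_week=20)    # 20
with_batching = weekly_throughput(ic_items_per_week=60, approvals_per_week=100)  # 60

print(before_ai, after_ai, with_batching)
# prints: 20 20 60
```

The IC tripled output and shipped throughput stayed at 20. Only when the approval capacity changed, here by assuming batched approvals, did the number move.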

Pattern 4: The function still owns the handoff

In most ops orgs, work crosses several functional boundaries before it ships. Sales hands to onboarding hands to delivery hands to billing hands to success. Each handoff has a queue. Each queue has a manager defending its SLA.

You drop AI inside one function. That function gets faster. The next function in the chain does not. The queue at the boundary grows. The Director of the next function complains the upstream team is "dumping work." The pilot gets blamed for creating chaos.

The pilot did not create chaos. It exposed the fact that the org was a chain of functional queues pretending to be a workflow. AI does not fix that by being deployed inside one queue. It fixes it by being deployed across the seam, which requires somebody senior enough to redraw the seam.
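The growing queue at the boundary is simple arithmetic. The sketch below assumes a two-function chain with fixed weekly rates. The function name and the numbers are invented for illustration.

```python
def boundary_backlog(upstream_per_week: int, downstream_per_week: int, weeks: int) -> list:
    """Backlog sitting at the handoff at the end of each week."""
    backlog = 0
    history = []
    for _ in range(weeks):
        backlog += upstream_per_week                   # work arriving at the boundary
        backlog -= min(backlog, downstream_per_week)   # work the next function can absorb
        history.append(backlog)
    return history


# Before the pilot: both functions handle 50 items a week.
print(boundary_backlog(50, 50, weeks=4))   # prints: [0, 0, 0, 0]

# After the pilot: upstream ships 80, downstream still absorbs 50.
print(boundary_backlog(80, 50, weeks=4))   # prints: [30, 60, 90, 120]
```

Nothing broke downstream. The queue grows by 30 items every week simply because only one side of the seam got faster.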

What the new shape looks like

The org chart that survives AI deployment has four characteristics. None of them are about technology.

  1. Pods, not functions. Cross-functional pods that own a customer outcome end-to-end, with agents embedded as teammates inside the pod. Functions become capability centers, not delivery centers.
  2. Wider spans, fewer layers. Manager-to-IC ratios widen. Layers compress. The ICs who survive are the ones who can run a fleet.
  3. Approval batching. Approval gates that used to be per-item become per-batch or per-exception. The CFO sees the math, not every line item.
  4. Outcome scorecards. Managers are measured on what their pod shipped, not on how many people report to them. Comp plans follow the outcome, not the headcount.

Notice what is not on the list. No model name. No vendor. No integration pattern. The org chart is the deliverable. The technology is the easy part.

A 5-point self-diagnostic before your next pilot

Before you fund another pilot, score yourself on these five. If you are below 3 out of 5, do not run the pilot. Fix the org first.

  1. Owner test. Can you name the line Director whose next-quarter scorecard includes the pilot's success metric? Not the sponsor. The owner.
  2. Task test. Can you list the specific tasks the pilot will own, with current human time-per-task and target post-pilot time-per-task?
  3. Span test. What is the manager-to-IC ratio in the function receiving the pilot? What will it be a year after the pilot ships? If it is the same, you are not deploying AI. You are buying a productivity gym membership.
  4. Approval test. Map the approval chain for the work the pilot will accelerate. Count the gates. If the model is an order of magnitude faster but the gates are unchanged, throughput will not move.
  5. Seam test. Where does the work the pilot accelerates leave the function it was deployed in? Who owns that downstream queue? Do they know what is coming?

If you cannot score at least 4 out of 5, you do not need another pilot. You need a redesign.
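The diagnostic can be run as a checklist. This is a rough sketch, not a formal instrument, and it rests on two assumptions of mine: each test is a yes/no pass, and the score is the number of passes. The function name and the wording of the middle band (exactly 3 passes) are also my own labels.

```python
TESTS = ("owner", "task", "span", "approval", "seam")


def diagnose(passed: dict) -> str:
    """Score the five tests as pass/fail and return a verdict string."""
    missing = [t for t in TESTS if t not in passed]
    if missing:
        raise ValueError(f"unanswered tests: {missing}")

    score = sum(bool(passed[t]) for t in TESTS)
    if score < 3:
        return f"{score}/5: do not run the pilot; fix the org first"
    if score < 4:
        # Assumed middle band: not blocked outright, but redesign work remains.
        return f"{score}/5: borderline; redesign work remains before funding"
    return f"{score}/5: the org can likely hold a pilot"


# Hypothetical answers for one function.
answers = {"owner": False, "task": True, "span": False, "approval": False, "seam": True}
print(diagnose(answers))
# prints: 2/5: do not run the pilot; fix the org first
```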

What to do next

The companies that get this right do not start with model selection. They start with a structural audit of the function the pilot is supposed to live inside. They redraw the org first, then deploy the technology into the shape that can hold it.

That is the work I do as a Fractional Head of AI Transformation. The first 30 days is an AI-Native Org Audit: span analysis, task decomposition, approval-gate mapping, seam analysis, and a redesigned org chart with a phased transition plan. Not a deck. A working blueprint your COO can execute on.

If you have run two pilots and neither survived contact with the org, the third will not survive either. Stop buying technology. Start redesigning the function it is supposed to live in.


Sources

1. Bain & Company, "Technology Report" series and operating-model research, 2023-2024. Bain's recurring thesis is that generative AI produces meaningful output gains in functions where it is deployed in operations, and that capturing those gains requires operating-model change, not just tooling.

2. Anthropic, "Building Effective Agents," anthropic.com/research/building-effective-agents, December 2024, plus public engineering writing on how Anthropic teams work with Claude. The "leverage moves to the IC running agents" framing is a paraphrase, not a direct quote.