01 · Dimension
Pilot-to-production conversion
What it measures: What share of AI pilots cleared in the last 12 months are running in production today against a live workflow.
Low (1-2): Pilot purgatory. Pilots run for months in an innovation sandbox, never reach the team whose workflow they were supposed to change, and quietly stop being reported on.
Medium (3): Selective handoff. One or two pilots have made it into production, but ownership is unclear and the rest sit waiting on someone to decide.
High (4-5): Default to production. Pilots have a named production owner before they start. The default path is shipping, not piloting.
02 · Dimension
Decision-flow clarity
What it measures: Whether the decisions an AI workflow has to make are written down, or live in someone's head as oral tradition.
Low (1-2): Oral tradition. The senior IC who knows how a case gets triaged or a contract gets redlined cannot describe the rule. The workflow lives in pattern matching, not in a documented decision tree.
Medium (3): Half-mapped. Some decisions are written in policy docs that nobody reads. Others are tribal. AI gets bolted onto the written part and trips over the tribal part.
High (4-5): Explicit decision flow. Each step in the workflow has a named input, a named decision, a named owner, and a documented escalation rule. An agent can read it. A new hire can read it.
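To make the high end concrete, here is a minimal Python sketch. It assumes each workflow step is recorded as structured data. The `DecisionStep` schema, its field names, and the sample triage and refund steps are hypothetical illustrations, not a prescribed format. The audit flags any step whose input, decision, owner, or escalation rule is still blank. That is the gap between a half-mapped flow and an explicit one.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class DecisionStep:
    """One step in a documented decision flow (hypothetical schema)."""
    name: str
    input: str       # what the step reads, e.g. "inbound support ticket"
    decision: str    # the rule applied, in plain language
    owner: str       # the named person or role accountable for the step
    escalation: str  # when, and to whom, the step escalates

def undocumented_fields(step: DecisionStep) -> list[str]:
    """Return the names of any fields left blank on a step."""
    return [f.name for f in fields(step) if not getattr(step, f.name).strip()]

def audit_flow(steps: list[DecisionStep]) -> dict[str, list[str]]:
    """Map each step that has gaps to its blank fields; an empty dict means fully mapped."""
    gaps = {}
    for step in steps:
        missing = undocumented_fields(step)
        if missing:
            gaps[step.name] = missing
    return gaps

flow = [
    DecisionStep(
        name="triage",
        input="inbound support ticket",
        decision="route billing issues to the finance queue; everything else to tier 1",
        owner="support lead",
        escalation="if the customer is on an enterprise plan, page the account manager",
    ),
    DecisionStep(
        name="refund",
        input="triaged billing ticket",
        decision="",  # still lives in someone's head
        owner="finance ops",
        escalation="",
    ),
]

print(audit_flow(flow))  # {'refund': ['decision', 'escalation']}
```

Because the same records can be rendered for a new hire or passed to an agent as context, both readers work from one definition of the flow.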
03 · Dimension
Agent-human co-actor design
What it measures: Whether agents are designed as co-actors with humans in the loop, or bolted onto an unchanged workflow.
Low (1-2): Bolt-on automation. AI replaces a button click in an existing tool. The work, the org, and the headcount math are unchanged. The team treats AI as a chore, not a co-actor.
Medium (3): Augmented IC. One or two roles use AI as a real assist (drafting, summarizing, research), but the workflow upstream and downstream of them hasn't been redesigned around it.
High (4-5): Co-actor by design. Agent and human responsibilities are partitioned explicitly. The agent owns the keystrokes; the human owns the judgment calls and the relationships. The org chart reflects the split.
04 · Dimension
Eval discipline
What it measures: Whether AI bets are measured against real examples and regression-tested, or shipped on vibes.
Low (1-2): Vibes. Quality is judged by a demo and a feeling. There is no eval set. Prompt changes hit production without regression testing. Failure shows up as a customer escalation.
Medium (3): Spot checks. One person on the team runs a handful of test cases when a prompt changes. The eval set isn't versioned. Coverage is unclear.
High (4-5): Versioned eval suite. Real examples, labeled, versioned, run automatically on every prompt or model change. Regression failures block merge. The eval suite is part of the team's deliverable.
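A versioned eval gate can be small. The sketch below assumes a classification-style workflow with a labeled eval set kept under version control. `EVAL_SET`, its version string, and the two keyword "prompts" are hypothetical stand-ins for real examples and a real model call. The gate flags any example that passed on the current baseline but fails with the proposed change. It then returns a non-zero code that a CI job could use as its exit status to block the merge.

```python
from typing import Callable

# Hypothetical versioned eval set: real, labeled examples kept under version control.
EVAL_SET_VERSION = "v3"
EVAL_SET = [
    {"id": "ex-001", "input": "Refund request for order 123", "label": "billing"},
    {"id": "ex-002", "input": "App crashes on login", "label": "technical"},
    {"id": "ex-003", "input": "Please change my invoice address", "label": "billing"},
]

def run_suite(classify: Callable[[str], str]) -> dict[str, bool]:
    """Run every labeled example and record pass/fail per example id."""
    return {ex["id"]: classify(ex["input"]) == ex["label"] for ex in EVAL_SET}

def regressions(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Ids that passed on the baseline but fail (or are missing) after the change."""
    return sorted(i for i, ok in baseline.items() if ok and not current.get(i, False))

def gate(baseline: dict[str, bool], current: dict[str, bool]) -> int:
    """Return 0 when nothing regressed, 1 otherwise (a CI job would exit with this code)."""
    failed = regressions(baseline, current)
    if failed:
        print(f"eval set {EVAL_SET_VERSION}: regressions on {failed}, blocking merge")
        return 1
    print(f"eval set {EVAL_SET_VERSION}: no regressions")
    return 0

# Hypothetical stand-ins for the current prompt and a proposed prompt change.
def current_prompt(text: str) -> str:
    t = text.lower()
    return "billing" if "refund" in t or "invoice" in t else "technical"

def proposed_prompt(text: str) -> str:
    t = text.lower()
    return "billing" if "refund" in t else "technical"

exit_code = gate(run_suite(current_prompt), run_suite(proposed_prompt))
# prints: eval set v3: regressions on ['ex-003'], blocking merge
```

Comparing results per example, rather than only an aggregate pass rate, keeps a fix on one example from masking a regression on another.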
05 · Dimension
Context layer
What it measures: Whether there is one source of truth your agents and your humans both read from, or scattered docs and chat threads.
Low (1-2): Scattered context. Positioning lives in a deck. Policy lives in a wiki page nobody updates. Decisions live in Slack threads. Agents are prompted with whatever the engineer remembered to paste.
Medium (3): Wiki, but not for agents. The humans have a knowledge base. The agents don't read from it. The two surfaces drift.
High (4-5): Shared knowledge base in place. A versioned, searchable knowledge base your team and your AI both read and write to. Drift is a bug, not a steady state.
06 · Dimension
Allocation discipline
What it measures: Whether AI investment is split explicitly across H1 efficiency, H2 new capabilities, and H3 transformation, or all in one bucket.
Low (1-2): One bucket. Every AI bet is sold as transformational. Nothing is funded as a 6-month efficiency play. The CFO has no way to defend the allocation.
Medium (3): Implicit split. The team knows which bets are short-term and which are long-term, but the split isn't documented. Board decks describe everything as strategic.
High (4-5): Explicit Three-Horizon mix. Budget is split across H1, H2, H3 with named owners, named payback windows, and named failure modes per horizon. The CFO can defend it.
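At the high end, the split works like a budget record that cannot be completed unless every required field is named. Here is a minimal Python sketch that assumes one record per horizon. The `Horizon` fields and the example figures are hypothetical. They illustrate the check and are not a recommended mix.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Horizon:
    code: str            # "H1", "H2" or "H3"
    budget: float        # amount allocated, in the plan's currency
    owner: str           # named owner for the horizon
    payback_months: int  # named payback window
    failure_mode: str    # the signal that this horizon's bets are not working

def allocation_shares(horizons: list[Horizon]) -> dict[str, float]:
    """Validate an explicit H1/H2/H3 split and return each horizon's share of the total."""
    codes = sorted(h.code for h in horizons)
    if codes != ["H1", "H2", "H3"]:
        raise ValueError(f"expected exactly one each of H1, H2, H3; got {codes}")
    for h in horizons:
        if not h.owner.strip() or not h.failure_mode.strip() or h.payback_months <= 0:
            raise ValueError(f"{h.code} lacks a named owner, payback window, or failure mode")
        if h.budget < 0:
            raise ValueError(f"{h.code} has a negative budget")
    total = sum(h.budget for h in horizons)
    if total <= 0:
        raise ValueError("total AI budget must be positive")
    return {h.code: round(h.budget / total, 2) for h in horizons}

# Illustrative figures only.
plan = [
    Horizon("H1", 600_000, "VP Operations", 6, "cycle time unchanged after two quarters"),
    Horizon("H2", 300_000, "Head of Product", 18, "no paying customers for the new capability"),
    Horizon("H3", 100_000, "Chief Strategy Officer", 36, "no validated business model by year two"),
]

print(allocation_shares(plan))  # {'H1': 0.6, 'H2': 0.3, 'H3': 0.1}
```

A plan that drops a horizon or leaves an owner blank fails the check instead of producing a share table, which mirrors the gap between an implicit split and an explicit one.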