AI Playbook 4 of 5

How to Optimize AI Spend Through Intelligent Routing

The 30x to 200x cost spread between AI model tiers means small routing mistakes multiply into large budget impacts. Over-provisioning wastes money on tasks that budget models handle well; under-provisioning produces failures that cost far more than the compute savings. This playbook gives you concrete techniques for gaining cost visibility, implementing spending discipline, calculating the true economics of routing decisions, and presenting optimization strategies that leadership can act on.

This playbook covers the how. For the why and what, see the skill definition.

Developing: Start here. Build the foundation.

Audit your current AI spend by listing every AI-powered task, the model it uses, the approximate cost per execution, and roughly how often it runs. If exact costs are not available, estimate them from vendor pricing pages and typical input and output sizes for each task. Sort the list by total cost, highest to lowest. The top 3-5 cost drivers typically account for 60-80% of total spend. This visibility is the prerequisite for every optimization that follows. Share the audit with your team so everyone understands where the money goes.
Implement a basic tiered spending policy by designating which model tier is authorized for which task category. Create three categories: routine operations (email drafting, simple summarization, data formatting) routed to budget models; standard work (analysis, content creation, moderate reasoning) routed to mid-tier models; and complex autonomous work (multi-step agentic tasks, high-stakes decisions, novel problem-solving) authorized for frontier models. Post this policy where your team can reference it. Even a simple policy prevents the default behavior of routing everything to the most expensive model.
Start tracking rework caused by model failures. For the next two weeks, every time you need to redo work because an AI model produced inadequate output, log the task, the model used, the time spent on rework, and whether a higher-tier model would have gotten it right the first time. This rework log is the raw data you need to calculate whether budget model savings are real or illusory for specific task types.
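The audit step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the tier names, per-million-token prices, token counts, and run volumes below are all hypothetical placeholders, and you would substitute figures from your own vendors' pricing pages and usage logs.

```python
from dataclasses import dataclass

# Hypothetical USD prices per million tokens. Replace with real vendor pricing.
PRICE_PER_MTOK = {
    "budget": {"input": 0.25, "output": 1.00},
    "mid": {"input": 3.00, "output": 15.00},
    "frontier": {"input": 15.00, "output": 75.00},
}


@dataclass
class Task:
    name: str
    tier: str             # key into PRICE_PER_MTOK
    input_tokens: int     # typical input size per execution
    output_tokens: int    # typical output size per execution
    runs_per_month: int   # approximate volume


def cost_per_run(task: Task) -> float:
    """Estimate the cost of one execution from typical token counts."""
    price = PRICE_PER_MTOK[task.tier]
    return (
        task.input_tokens * price["input"]
        + task.output_tokens * price["output"]
    ) / 1_000_000


def audit(tasks: list[Task]) -> list[dict]:
    """Rank tasks by monthly cost and add each row's cumulative share of spend."""
    rows = [(t, cost_per_run(t)) for t in tasks]
    rows.sort(key=lambda r: r[1] * r[0].runs_per_month, reverse=True)
    total = sum(per_run * t.runs_per_month for t, per_run in rows)
    report, running = [], 0.0
    for t, per_run in rows:
        monthly = per_run * t.runs_per_month
        running += monthly
        report.append({
            "task": t.name,
            "tier": t.tier,
            "cost_per_run": per_run,
            "monthly_cost": monthly,
            "cumulative_share": running / total if total else 0.0,
        })
    return report


# Illustrative inventory only; these are not measured numbers.
tasks = [
    Task("data formatting", "budget", 2_000, 2_000, 10_000),
    Task("email drafting", "frontier", 1_000, 500, 4_000),
    Task("contract analysis", "frontier", 20_000, 2_000, 300),
]
report = audit(tasks)
for row in report:
    print(f"{row['task']:<20} {row['tier']:<9} "
          f"${row['monthly_cost']:>8.2f}  {row['cumulative_share']:.0%}")
```

The cumulative share column shows how quickly the top few rows add up, which is the quickest way to see whether a handful of tasks dominates your spend.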

Proficient: Build consistency and rhythm.

Calculate the true cost of under-provisioning for your top 3 cost-saving routing decisions. For each task you routed to a cheaper model, document (1) the compute cost saved per execution, (2) the failure rate on the cheaper model, (3) the average rework time per failure, (4) the cost of that rework time at your loaded labor rate, and (5) any downstream impact of failures that were not caught. If the total cost of failures exceeds the compute savings, the routing decision is losing money. This analysis typically reveals 1-2 tasks where budget routing is genuinely cheaper and 1-2 where it is not.
Adjust your spending allocations based on market pricing changes. Set a quarterly reminder to review pricing across your model vendors. When a mid-tier model drops in price or a new model offers a better cost-performance ratio, re-run your domain-specific tests to see if routing adjustments are warranted. Pricing shifts of 30% or more, which happen regularly in this market, can change the optimal routing for entire task categories. Teams that review quarterly capture these opportunities; teams that set their routing once miss them.
Build a cost dashboard that tracks AI spend by workflow category over time. Include four metrics: total spend, spend per task execution, failure rate by model tier, and estimated rework cost. Review this dashboard monthly. Trends matter more than point-in-time numbers. Rising failure rates on a budget tier may signal that task complexity has increased and routing needs adjustment. Declining spend with stable quality confirms optimization is working.
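The five-part under-provisioning calculation above reduces to simple arithmetic per execution. The sketch below is one way to write it down, under stated assumptions: all field names and example figures are hypothetical, downstream impact is modeled as a flat expected cost per failure, and the higher-tier model is treated as failing rarely enough to ignore.

```python
from dataclasses import dataclass


@dataclass
class RoutingDecision:
    task: str
    compute_saved_per_run: float      # (1) USD saved per execution on the cheaper model
    failure_rate: float               # (2) fraction of runs needing rework, 0.0 to 1.0
    rework_hours_per_failure: float   # (3) average rework time per failure
    loaded_labor_rate: float          # (4) USD per hour
    downstream_cost_per_failure: float = 0.0  # (5) simplifying assumption: flat USD


def cost_per_failure(d: RoutingDecision) -> float:
    """Rework labor plus assumed downstream cost for one failure."""
    return d.rework_hours_per_failure * d.loaded_labor_rate + d.downstream_cost_per_failure


def net_saving_per_run(d: RoutingDecision) -> float:
    """Compute savings minus expected failure cost. Negative means losing money."""
    return d.compute_saved_per_run - d.failure_rate * cost_per_failure(d)


def break_even_failure_rate(d: RoutingDecision) -> float:
    """Failure rate at which the cheaper model stops being cheaper."""
    per_failure = cost_per_failure(d)
    return d.compute_saved_per_run / per_failure if per_failure > 0 else float("inf")


# Illustrative figures only; take real ones from your rework log.
decisions = [
    RoutingDecision("simple summarization", 0.40, 0.01, 0.25, 80.0),
    RoutingDecision("report drafting", 0.40, 0.10, 0.50, 80.0),
]
for d in decisions:
    verdict = "keep on budget tier" if net_saving_per_run(d) > 0 else "route up"
    print(f"{d.task}: net ${net_saving_per_run(d):+.2f} per run, "
          f"break-even failure rate {break_even_failure_rate(d):.1%} ({verdict})")
```

The break-even failure rate is a useful number to put on the dashboard next to the observed failure rate by tier, because it turns a rising failure trend into a clear threshold for when to revisit the routing.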

Mastered: Operate at the highest level.

Present an AI cost optimization strategy to leadership using a structured framework. Start with the current state: total AI spend, breakdown by workflow category, and spend per model tier. Then show the opportunity: tasks currently over-provisioned, with estimated savings from routing down, and tasks currently under-provisioned, with estimated rework and failure costs. Propose specific changes with projected ROI and an implementation timeline. Include a risk assessment for each change. Leaders make better decisions when they see both the savings opportunity and the reliability implications.
Develop a cost-optimization playbook for your team that documents proven routing configurations, cost-performance data for each model tier on your specific tasks, and decision rules for when to route up versus down. Include the rework cost calculations so team members understand the total economics, not just the per-token price. Update this playbook quarterly with fresh data. This institutional knowledge prevents each person from rediscovering routing economics independently.
Establish a cost-optimization review as a standard agenda item in your team's monthly operations meeting. Spend 10 minutes reviewing the cost dashboard, highlighting any routing changes made in the past month and their outcomes, and identifying the next optimization opportunity to test. This cadence keeps cost discipline as an ongoing practice rather than a one-time exercise that slowly erodes as people revert to familiar models.
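One way to express a route-up versus route-down decision rule in a team playbook is to pick the tier with the lowest expected total cost per execution, counting rework as well as compute. The sketch below is a hypothetical rule rather than an established standard: the tier names, costs, and failure rates are invented for illustration and would come from your own cost-performance data.

```python
def expected_cost_per_run(compute_cost: float, failure_rate: float,
                          cost_per_failure: float) -> float:
    """Compute cost plus the expected cost of rework for one execution."""
    return compute_cost + failure_rate * cost_per_failure


def choose_tier(tier_stats: dict[str, tuple[float, float]],
                cost_per_failure: float) -> str:
    """Return the tier with the lowest expected total cost per execution.

    tier_stats maps a tier name to (compute cost per run, observed failure rate).
    """
    return min(
        tier_stats,
        key=lambda tier: expected_cost_per_run(*tier_stats[tier], cost_per_failure),
    )


# Hypothetical measurements for a single task type.
stats = {
    "budget": (0.01, 0.10),
    "mid": (0.10, 0.02),
    "frontier": (1.00, 0.01),
}

# The same task routes differently as the cost of a failure changes.
for failure_cost in (0.50, 20.00, 500.00):
    print(f"failure costs ${failure_cost:.2f}: route to {choose_tier(stats, failure_cost)}")
```

A rule like this makes the playbook's logic explicit: cheap failures favor routing down, expensive failures favor routing up, and the crossover points move whenever vendor pricing or observed failure rates change.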
