AI Playbook 2 of 5

How to Map Task Complexity to Appropriate Model Tiers

Budget models match higher-tier output on 90% or more of attempts at narrow tasks for a fraction of the cost; on wide agentic tasks requiring sustained reasoning, they fail outright or loop. This playbook gives you concrete techniques for classifying your workflows by reasoning demands, routing each to the appropriate model tier, spotting hidden complexity in borderline tasks, and keeping your routing current as the model landscape shifts.

Developing: Start here. Build the foundation.
  • List your 10 most common AI-assisted tasks and classify each on a complexity spectrum. For each task, answer: Does it have a clear, unambiguous input? Does it require a single reasoning step or multiple chained steps? Does it need domain-specific judgment or can anyone verify the output? Tasks with clear inputs, single steps, and easy verification are narrow tasks suitable for budget models. Tasks with ambiguous inputs, multi-step reasoning, or outputs that require expert judgment are wide tasks that need frontier models. Post this classification where your team can see it.
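The three screening questions can be sketched as a small scoring function. This is a minimal illustration; the task names and the all-or-nothing scoring rule are assumptions to adapt to your own workflow.

```python
def classify_task(clear_input: bool, single_step: bool, easy_to_verify: bool) -> str:
    """All three favorable -> narrow; none -> wide; anything else -> borderline.
    The all-or-nothing rule is an illustrative assumption, not a standard."""
    score = sum([clear_input, single_step, easy_to_verify])
    if score == 3:
        return "narrow"       # budget-model candidate
    if score == 0:
        return "wide"         # frontier-model candidate
    return "borderline"       # test on multiple tiers before committing

# Hypothetical tasks to seed the posted classification:
classification = {
    "summarize_meeting_notes": classify_task(True, True, True),
    "draft_legal_clause": classify_task(False, False, False),
    "triage_support_ticket": classify_task(True, False, True),
}
```

Borderline results are exactly the tasks the Proficient tier below asks you to probe on multiple tiers.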
  • Pick 3 tasks you currently send to your most capable model and run them on a model one tier lower for one week. Track output quality using a simple pass/fail rating for each result. If the cheaper model passes 90% or more of the time, you have found a safe downgrade that saves money without meaningful quality loss. If it fails frequently, you have confirmed the task genuinely needs the higher tier. Either outcome improves your routing confidence with real data.
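The pass/fail tracking amounts to a few lines of arithmetic. The 90% threshold comes from the text above; the sample week of results is made up.

```python
def pass_rate(results):
    """results: list of booleans, one per tracked output (True = pass)."""
    return sum(results) / len(results) if results else 0.0

def downgrade_verdict(results, threshold=0.9):
    """Apply the 90%-pass rule: at or above threshold, the downgrade is safe."""
    return "safe downgrade" if pass_rate(results) >= threshold else "keep higher tier"

# Hypothetical week of results on the cheaper tier: 18 passes, 2 fails.
week = [True] * 18 + [False] * 2
```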
  • Create a decision tree for your most common routing choice. Start with the question 'Does this task require multi-step reasoning?' If no, route to budget. If yes, ask 'Is the task domain well-defined with clear success criteria?' If yes, try mid-tier. If no, route to frontier. Tape this decision tree next to your monitor for two weeks and use it every time you select a model. After two weeks, evaluate whether the tree needs adjustment based on the outcomes you observed.
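The two-question tree translates directly into code. The tier names follow the text; treat this as a sketch to adjust after your two-week trial, not a prescription.

```python
def route(multi_step: bool, well_defined: bool) -> str:
    """The decision tree above, as a function:
    no multi-step reasoning -> budget; well-defined domain -> mid-tier;
    otherwise -> frontier."""
    if not multi_step:
        return "budget"
    if well_defined:
        return "mid-tier"
    return "frontier"
```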
Proficient: Build consistency and rhythm.
  • Identify 3 borderline tasks in your workflow that appear simple but contain hidden complexity. Common indicators: the task involves disambiguation across many similar options, requires understanding implicit context that is not in the prompt, or produces outputs that look correct but contain subtle errors detectable only by domain experts. Test these tasks on multiple model tiers to find the minimum tier that handles the hidden complexity reliably. These borderline tasks are where most routing mistakes happen.
  • Implement a cascading routing pattern for your highest-volume task type. Configure the workflow to start at a budget model and escalate to a higher tier when the initial model signals low confidence, produces output below a quality threshold, or fails a basic validation check. Track the escalation rate over one month. An escalation rate above 30% suggests the task is too complex for the initial tier. An escalation rate below 10% confirms the cascade is capturing real savings. Adjust the initial tier and escalation thresholds based on the data.
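One way to sketch the cascade: try tiers in order and escalate whenever a validation hook rejects the output. The `run` and `accept` hooks, the toy quality scores, and the 0.75 acceptance bar are illustrative assumptions standing in for your real model calls and checks.

```python
def cascade(task, tiers, run, accept):
    """Try each tier in order, escalating when `accept` rejects the output.
    Returns (tier_used, output, escalation_count)."""
    escalations = 0
    output = None
    for tier in tiers:
        output = run(tier, task)
        if accept(output):
            return tier, output, escalations
        escalations += 1
    # Final tier's output is returned even if it failed the check,
    # so a human can review it.
    return tiers[-1], output, escalations

def fake_run(tier, task):
    # Toy stand-in for a real model call: higher tiers score higher.
    quality = {"budget": 0.5, "mid": 0.8, "frontier": 0.95}[tier]
    return {"text": f"{tier} answer to {task!r}", "quality": quality}

def accept(output):
    return output["quality"] >= 0.75  # illustrative acceptance bar

tier_used, result, escalations = cascade(
    "classify this ticket", ["budget", "mid", "frontier"], fake_run, accept
)
```

Logging `escalations` per request gives you the escalation rate the text asks you to track against the 10%/30% bands.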
  • Build a team routing map: a shared document that lists every recurring AI task, its complexity classification, the assigned model tier, and the evidence supporting that assignment. Review this map in a monthly team meeting. When someone discovers a task that performs better or worse than expected on its assigned tier, update the map. This makes routing knowledge a team asset rather than individual guesswork and prevents different team members from making contradictory routing decisions.
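A routing map can live as structured data rather than free text, which makes it queryable from your tooling. A minimal sketch, with hypothetical task names, tiers, and evidence strings:

```python
# Hypothetical routing map; every entry is an illustrative placeholder.
ROUTING_MAP = {
    "summarize_meeting_notes": {
        "classification": "narrow",
        "tier": "budget",
        "evidence": "18/20 pass in one-week downgrade trial",
    },
    "draft_legal_clause": {
        "classification": "wide",
        "tier": "frontier",
        "evidence": "mid-tier output missed jurisdiction-specific wording",
    },
}

def assigned_tier(task, routing_map, default="frontier"):
    """Look up a task's assigned tier; route unknown tasks to a safe default."""
    entry = routing_map.get(task)
    return entry["tier"] if entry else default
```

Defaulting unknown tasks upward keeps a missing entry from silently degrading quality; the monthly review then decides whether to add a cheaper assignment.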
Mastered: Operate at the highest level.
  • Set a quarterly routing reassessment cadence. Each quarter, re-run your domain-specific test cases on the current model landscape, checking whether new models or updated versions have shifted the performance boundaries. Update your team routing map with the results. Pay special attention to mid-tier models, as this tier shifts most rapidly and often absorbs capabilities that previously required frontier models. Teams that reassess quarterly typically find 2-3 routing changes per review that improve either cost or quality.
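The quarter-over-quarter comparison can be automated with a small diff over per-(task, tier) pass rates. The 0.05 margin and the sample rates are assumptions; tune the margin to your tolerance for churn.

```python
def reassessment_diff(previous, current, margin=0.05):
    """Flag (task, tier) pairs whose pass rate moved more than `margin`
    since last quarter, plus pairs with no prior measurement."""
    changes = []
    for key, new_rate in sorted(current.items()):
        old_rate = previous.get(key)
        if old_rate is None or abs(new_rate - old_rate) > margin:
            changes.append((key, old_rate, new_rate))
    return changes

# Hypothetical quarterly measurements:
last_quarter = {("summarize_notes", "budget"): 0.90}
this_quarter = {("summarize_notes", "budget"): 0.97,
                ("extract_entities", "mid"): 0.88}
candidates = reassessment_diff(last_quarter, this_quarter)
```

Each flagged pair is a candidate routing change to confirm manually before updating the team routing map.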
  • Train your team on task complexity classification by running a 30-minute workshop. Present 10 real tasks and have each participant independently classify them as narrow, borderline, or wide. Compare classifications and discuss disagreements. The disagreements reveal where your team's routing judgment needs calibration. Run this exercise semi-annually to maintain shared understanding as both your workflows and available models evolve.
  • Document and share your cascading routing configurations, including the specific confidence thresholds, quality checks, and escalation rules you use. When a pattern works well, write it up as a reusable template. When a pattern fails, document what went wrong and what you changed. This institutional knowledge prevents each team member from reinventing routing patterns and ensures new team members inherit proven configurations rather than starting from scratch.
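A reusable template might look like the following sketch, paired with a cheap sanity check before it is shared. Every key name and threshold here is a hypothetical example, not a standard schema.

```python
# Hypothetical cascade template; keys and values are illustrative.
CASCADE_TEMPLATE = {
    "task": "triage_support_ticket",
    "initial_tier": "budget",
    "tiers": ["budget", "mid", "frontier"],
    "escalation": {
        "confidence_threshold": 0.7,   # escalate below this reported confidence
        "quality_check": "passes_schema_validation",
        "max_escalations": 2,
    },
    "evidence": "12% escalation rate over 30 days",
}

REQUIRED_KEYS = {"task", "initial_tier", "tiers", "escalation"}

def validate_template(template):
    """Cheap sanity check before a template is shared with the team."""
    missing = REQUIRED_KEYS - template.keys()
    if missing:
        raise ValueError(f"template missing keys: {sorted(missing)}")
    if template["initial_tier"] not in template["tiers"]:
        raise ValueError("initial_tier must appear in tiers")
    return True
```

Validating templates at share time keeps the documented failure cases ("what went wrong and what you changed") attached to configurations that are at least structurally complete.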
