The landscape of AI models is not a monolith. While it is tempting to find one model and use it for everything, doing so limits the quality of your output. Different models are trained with different priorities: some excel at nuanced, creative writing, while others are built for rigorous analytical reasoning or for processing massive amounts of text at once.
Understanding these distinct profiles is your starting point. However, because AI capabilities are unpredictable (a pattern often called the jagged frontier), these profiles are not absolute rules. The true skill lies in forming a hypothesis about which model fits your specific task and then actively testing it. By comparing outputs side by side, you move from passive consumption to deliberate selection, so that you choose the right tool for each job based on evidence rather than habit.
Assignment
Select a task you frequently perform (e.g., drafting a specific type of email, summarizing meeting notes). Write a robust prompt for this task.
Using the Lab's A/B comparison panel, run this prompt through two different models (e.g., a Claude model and an OpenAI model, or DeepSeek and Qwen).
Document the differences in the outputs. Which model followed your instructions better? Which tone was more appropriate? Write a brief summary of your findings and state which model you will use for this task going forward.
Learning Objectives
- Understand the distinct strengths of major AI model families such as Claude, OpenAI's models, DeepSeek, Qwen, and Kimi.
- Learn to form hypotheses about which model best fits a specific task.
- Use A/B testing to determine empirically which model is most effective for your workflow.
The Jagged Frontier
The uneven and unpredictable capabilities of AI models. A model might excel at a highly complex task while failing at a seemingly simple one. This means model selection guidelines are starting points, not absolute rules.
Hypothesis-Driven Testing
The practice of predicting which model will perform best based on its known strengths, and then actively testing that assumption rather than relying on default choices.
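For readers who want to see the logic of hypothesis-driven testing outside the Lab's panel, here is a minimal Python sketch. It is an illustration under stated assumptions, not how the Lab works: `model_a` and `model_b` are hypothetical stand-ins that return canned text instead of calling any real model, and `score_output` is a deliberately simple rubric (required phrases plus a length limit) that you would replace with your own criteria, such as tone or instruction-following.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Comparison:
    """Record of one A/B test: the prediction and the observed scores."""
    task: str
    hypothesis: str          # name of the model you predicted would win
    scores: Dict[str, int]   # model name -> rubric score

    @property
    def winner(self) -> str:
        # On a tie, the model listed first wins; review ties by hand.
        return max(self.scores, key=lambda name: self.scores[name])

    @property
    def hypothesis_held(self) -> bool:
        return self.winner == self.hypothesis


def score_output(output: str, required_phrases: List[str], max_words: int) -> int:
    """Toy rubric: one point per required phrase found, one for respecting the length limit."""
    text = output.lower()
    points = sum(1 for phrase in required_phrases if phrase.lower() in text)
    if len(output.split()) <= max_words:
        points += 1
    return points


def run_ab_test(
    task: str,
    prompt: str,
    models: Dict[str, Callable[[str], str]],
    hypothesis: str,
    required_phrases: List[str],
    max_words: int,
) -> Comparison:
    """Send the same prompt to every model and score each output with the same rubric."""
    scores = {
        name: score_output(generate(prompt), required_phrases, max_words)
        for name, generate in models.items()
    }
    return Comparison(task=task, hypothesis=hypothesis, scores=scores)


# Hypothetical stand-ins for real models (assumption: canned replies, no API calls).
def model_a(prompt: str) -> str:
    return "Action items: Dana sends the budget by Friday. Next meeting: Tuesday."


def model_b(prompt: str) -> str:
    return "The team had a long and productive discussion about many topics."


result = run_ab_test(
    task="summarize meeting notes",
    prompt="Summarize these notes; list action items and the next meeting date.",
    models={"model_a": model_a, "model_b": model_b},
    hypothesis="model_b",  # the prediction made before looking at any output
    required_phrases=["action items", "next meeting"],
    max_words=50,
)
print(result.scores, result.winner, result.hypothesis_held)
```

The important design choice is that the hypothesis is written down before the outputs are scored, so a wrong prediction is visible rather than quietly rationalized. In this toy run the prediction fails, which is exactly the kind of result the jagged frontier makes likely.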