Running and Comparing

Testing AI outputs shouldn't require a background in data science. The most effective way to understand how different models interpret instructions, or how slight tweaks to a prompt alter the result, is to look at the outputs side by side. This is the essence of A/B testing applied to generative AI.

In this lesson, we move from writing single prompts to running structured experiments. You will learn how to use the Lab's A/B comparison panel to isolate variables, document your changes, and evaluate the results across specific dimensions like tone, accuracy, and usability. By treating prompt engineering as a systematic process rather than a guessing game, you build workflows whose results are more consistent and easier to explain.

Assignment

Open the Lab's A/B comparison panel.
Select a single model and write a baseline prompt for a task relevant to your work (e.g., drafting a project update).
Run the prompt and review the output.
Create a variation of your prompt by changing exactly one element (e.g., adding a specific constraint about tone).
Run both prompts side by side and document the differences in the output across three dimensions: accuracy, tone, and completeness.

Learning Objectives

Understand the principles of A/B testing in the context of AI models and prompts.
Learn to control variables by changing only one element at a time during experiments.
Develop a systematic approach to documenting changes and comparing outputs across specific dimensions.

Controlling Variables

Controlling variables is the foundation of any valid experiment. When comparing outputs, change only one variable at a time: the model, the prompt structure, or the parameters. If you change several variables at once, you cannot determine which change caused the difference in output.
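
For readers who keep their experiments in a script or notebook, the one-variable rule can be checked mechanically. The sketch below is independent of the Lab's comparison panel; the `Variant` class, its field names, and the model name "model-a" are illustrative assumptions, not part of any real tool.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Variant:
    """One side of an A/B comparison. Field names are illustrative assumptions."""
    model: str
    prompt: str
    temperature: float


def changed_variables(a: Variant, b: Variant) -> list[str]:
    """Return the names of the fields whose values differ between a and b."""
    a_fields, b_fields = asdict(a), asdict(b)
    return [name for name in a_fields if a_fields[name] != b_fields[name]]


def is_controlled(a: Variant, b: Variant) -> bool:
    """True when exactly one variable differs between the two variants."""
    return len(changed_variables(a, b)) == 1


# Hypothetical baseline and variation: only the prompt wording changes.
baseline = Variant(model="model-a", prompt="Draft a project update.", temperature=0.7)
variation = Variant(
    model="model-a",
    prompt="Draft a project update in a formal tone.",
    temperature=0.7,
)

print(changed_variables(baseline, variation))  # ['prompt']
```

If `is_controlled` returns False, either nothing changed or more than one thing did, and any difference in output cannot be attributed to a single cause.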

Dimensional Comparison

Dimensional comparison means evaluating outputs not on a binary 'good' or 'bad' scale but across specific dimensions you can rate, such as accuracy, tone, completeness, and usability. This structured approach reduces subjective bias and makes it easier to check that the output meets the needs of the task.
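
One simple way to document a dimensional comparison is to rate each output on every dimension and record the difference. The sketch below assumes a 1-to-5 rating scale assigned by a human reviewer; the scale, the function name `compare_scores`, and the example ratings are all hypothetical and are not produced by the Lab.

```python
DIMENSIONS = ("accuracy", "tone", "completeness", "usability")


def compare_scores(baseline: dict[str, int], variation: dict[str, int]) -> dict[str, int]:
    """Return variation minus baseline per dimension (positive means the variation scored higher)."""
    missing = [d for d in DIMENSIONS if d not in baseline or d not in variation]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return {d: variation[d] - baseline[d] for d in DIMENSIONS}


# Hypothetical reviewer ratings on an assumed 1-5 scale; not real results.
baseline_scores = {"accuracy": 4, "tone": 2, "completeness": 4, "usability": 3}
variation_scores = {"accuracy": 4, "tone": 4, "completeness": 3, "usability": 3}

for dimension, delta in compare_scores(baseline_scores, variation_scores).items():
    print(f"{dimension}: {delta:+d}")
```

Keeping the per-dimension differences, rather than a single overall verdict, shows trade-offs directly: in this made-up example the tone constraint improves tone but costs a point of completeness.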