Evaluating AI outputs is not just about deciding whether something is "good" or "bad." It is a rigorous process of measuring utility against specific constraints. When you invest time in crafting a prompt, the sunk-cost fallacy often creeps in, tempting you to accept mediocre results simply because you worked hard to get them. This lesson introduces a structured approach to evaluation that helps you rate outputs honestly and objectively.
We will explore the Lab's 1-5 evaluation scale, which runs from 1 (completely unusable) to 5 (production-ready). More importantly, we will discuss why the written notes accompanying your scores, not the scores themselves, are what actually drive improvement. By applying multidimensional evaluation criteria and building a "rejection library," you will turn failed outputs from frustrations into reusable learning assets.
Assignment
Take three recent AI outputs you generated. Rate each using the 1-5 scale. For each output, write a brief note explaining the score, specifically addressing goal alignment and constraint adherence. Add any outputs scoring 2 or below to your rejection library with a note on why they failed.
Learning Objectives
- Apply the 1-5 evaluation scale to AI outputs objectively.
- Evaluate outputs across multiple dimensions: goal alignment, constraint adherence, creative value, and ecosystem fit.
- Document rejections to build a valuable reference library.
The 1-5 Evaluation Scale
A structured rating system where 1 means completely unusable and 5 means production-ready. It forces you to make a concrete decision about an output's utility.
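If you log ratings in code, the scale can be enforced at the moment you record a score. The sketch below is a minimal Python illustration, not part of the Lab's tooling: the `Evaluation` class, its field names, and the rule that scores must be whole numbers (so a "3.5" cannot dodge the decision) are assumptions made for this example. It rejects scores outside 1-5 and refuses any score that has no written note.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    """One rating on the 1-5 scale plus the written note that explains it.

    Hypothetical structure for illustration; field names are assumptions.
    """
    output_id: str
    score: int  # 1 = completely unusable, 5 = production-ready
    note: str   # why this score; e.g. goal alignment, constraint adherence

    def __post_init__(self) -> None:
        # Assumption: whole numbers only, so the rater must commit to a level.
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(self.score, int) or isinstance(self.score, bool):
            raise TypeError("score must be an integer from 1 to 5")
        if not 1 <= self.score <= 5:
            raise ValueError("score must be between 1 and 5")
        # A bare number teaches nothing; require an explanatory note.
        if not self.note.strip():
            raise ValueError("every score needs a written note")

    @property
    def production_ready(self) -> bool:
        # Only the top of the scale counts as production-ready.
        return self.score == 5
```

For example, `Evaluation("summary-01", 4, "Meets the goal but exceeds the 200-word limit.")` is accepted, while a score of 6 or a blank note raises `ValueError`.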
The Rejection Library
A documented archive of failed outputs and the specific reasons they were rejected. This library becomes a critical resource for refining future prompts.
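A plain file is enough to start a rejection library. The following sketch assumes a JSON Lines file (one JSON object per line) and uses hypothetical function and field names (`add_to_rejection_library`, `load_rejections`, `prompt`, `reason`); adapt them to whatever storage you already use. It archives only outputs scoring 2 or below, matching the assignment, and requires a reason for every entry.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Per the assignment: outputs scoring 2 or below belong in the library.
REJECTION_THRESHOLD = 2


def add_to_rejection_library(path: Path, prompt: str, output: str,
                             score: int, reason: str) -> bool:
    """Append a failed output to a JSON Lines archive.

    Returns True if the output was archived, False if it scored too high.
    """
    if score > REJECTION_THRESHOLD:
        return False  # not a rejection; nothing to archive
    if not reason.strip():
        raise ValueError("a rejection needs a reason to be useful later")
    entry = {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,   # hypothetical field: the prompt that produced it
        "output": output,
        "score": score,
        "reason": reason,   # why it failed: goal, constraints, etc.
    }
    # Append mode keeps earlier rejections intact.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return True


def load_rejections(path: Path) -> list[dict]:
    """Read all archived rejections, e.g. to review before writing a new prompt."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Appending one line per rejection keeps the archive easy to search and diff, and storing the prompt next to the output shows which instructions produced each failure.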