Evaluating Outputs

Evaluating AI outputs is not just a matter of deciding whether something is "good" or "bad." It is a rigorous process of measuring utility against specific constraints. Once you have invested time in crafting a prompt, the sunk-cost fallacy often creeps in, tempting you to accept mediocre results simply because you worked hard to get them. This lesson introduces a structured approach to evaluation so that you rate outputs honestly and objectively.

We will explore the Lab's 1-5 evaluation scale, which runs from completely unusable (1) to production-ready (5). More importantly, we will discuss why the written notes accompanying your scores are the true engine of improvement. By evaluating outputs along several dimensions and building a "rejection library," you will turn failed outputs from frustrations into valuable learning assets.

Assignment

Take three recent AI outputs you generated. Rate each using the 1-5 scale. For each output, write a brief note explaining the score, specifically addressing goal alignment and constraint adherence. Add any outputs scoring 2 or below to your rejection library with a note on why they failed.

Learning Objectives

Apply the 1-5 evaluation scale to AI outputs objectively.
Evaluate outputs across multiple dimensions: goal alignment, constraint adherence, creative value, and ecosystem fit.
Document rejections to build a valuable reference library.
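
One way to keep the four dimensions in view is to record a short written note for each. The sketch below is illustrative only: the `DimensionNotes` class and its `missing` helper are hypothetical names, not part of the Lab's tooling, and the sample notes are invented.

```python
from dataclasses import dataclass, fields


@dataclass
class DimensionNotes:
    """One written note per evaluation dimension named in the lesson."""
    goal_alignment: str
    constraint_adherence: str
    creative_value: str
    ecosystem_fit: str

    def missing(self) -> list[str]:
        """Return the dimensions whose note is empty or only whitespace."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical example: one dimension has not been assessed yet.
notes = DimensionNotes(
    goal_alignment="Answers the question that was asked.",
    constraint_adherence="Exceeds the requested word limit.",
    creative_value="",
    ecosystem_fit="Matches the tone of the existing material.",
)
print(notes.missing())  # ['creative_value']
```

Checking for empty notes is a design choice in this sketch: it makes it harder to score an output while silently skipping a dimension.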

The 1-5 Evaluation Scale

A structured rating system in which 1 means completely unusable and 5 means production-ready. Committing to a single number forces a concrete decision about how useful an output is.
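
As a minimal sketch, a rating can be modeled as a score plus the written note that explains it. The `Rating` class is a hypothetical name, and requiring a non-empty note is an assumption of this sketch, based on the lesson's emphasis on notes. Only the endpoints 1 and 5 are labeled because the lesson defines only those.

```python
from dataclasses import dataclass

MIN_SCORE = 1  # completely unusable
MAX_SCORE = 5  # production-ready


@dataclass(frozen=True)
class Rating:
    """A 1-5 score together with the note that justifies it."""
    score: int
    note: str

    def __post_init__(self) -> None:
        # Reject non-integers (bool is excluded because it subclasses int).
        if isinstance(self.score, bool) or not isinstance(self.score, int):
            raise TypeError("score must be an integer")
        if not MIN_SCORE <= self.score <= MAX_SCORE:
            raise ValueError(f"score must be between {MIN_SCORE} and {MAX_SCORE}")
        # Assumption of this sketch: a score without a note is not accepted.
        if not self.note.strip():
            raise ValueError("a written note is required")


# Hypothetical example rating.
rating = Rating(score=4, note="Meets the goal; the tone constraint is slightly off.")
print(rating.score)  # 4
```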

The Rejection Library

A documented archive of failed outputs and the specific reasons they were rejected. This library becomes a critical resource for refining future prompts.
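
A rejection library can be as simple as a list of records. The sketch below is one possible shape, not the Lab's implementation: `Rejection`, `RejectionLibrary`, `record`, and `search` are hypothetical names. The threshold of 2 comes from the assignment, which archives outputs scoring 2 or below.

```python
from dataclasses import dataclass, field

REJECTION_THRESHOLD = 2  # per the assignment: scores of 2 or below are archived


@dataclass
class Rejection:
    prompt: str
    output: str
    score: int
    reason: str  # the specific reason the output was rejected


@dataclass
class RejectionLibrary:
    entries: list[Rejection] = field(default_factory=list)

    def record(self, prompt: str, output: str, score: int, reason: str) -> bool:
        """Archive the output if it scored at or below the threshold.

        Returns True if the output was archived, False otherwise.
        """
        if score > REJECTION_THRESHOLD:
            return False
        if not reason.strip():
            raise ValueError("a rejection needs a specific reason")
        self.entries.append(Rejection(prompt, output, score, reason))
        return True

    def search(self, keyword: str) -> list[Rejection]:
        """Find past rejections whose reason mentions the keyword."""
        keyword = keyword.lower()
        return [e for e in self.entries if keyword in e.reason.lower()]


# Hypothetical usage while refining a prompt.
library = RejectionLibrary()
library.record("Summarize the report", "(output text)", 2, "Ignored the word limit")
library.record("Summarize the report", "(output text)", 4, "Minor tone issue")
print(len(library.entries))  # 1
```

Searching by reason is what makes the archive useful for refining future prompts: recurring reasons point to constraints the prompt is not stating clearly.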