LLM Cody Wiki

Evaluation

Measuring model quality, both on task performance and on safety.

Topics

Task metrics (accuracy, F1, BLEU, ROUGE)
Human eval vs automated eval
Prompt robustness and adversarial testing
Safety, bias, and fairness checks
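
As an illustration of the first topic, the sketch below computes accuracy and binary F1 from lists of gold and predicted labels. It is a minimal example in plain Python, not a reference implementation: the function names `accuracy` and `f1_score`, the `positive` parameter, and the sample labels are all assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the gold labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for `positive`."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so F1 is reported as 0.0 by convention in this sketch.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and model predictions for a binary task.
gold = [1, 0, 1, 1, 0, 1]
pred = [1, 0, 0, 1, 1, 1]

acc = accuracy(gold, pred)  # 4 of 6 predictions match
f1 = f1_score(gold, pred)   # tp=3, fp=1, fn=1
print(f"accuracy={acc:.3f} f1={f1:.3f}")
```

Accuracy treats every example equally, while F1 focuses on the positive class, so the two can diverge sharply on imbalanced data. BLEU and ROUGE follow a similar pattern of comparing model output against references, but operate on n-gram overlap rather than label equality.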