# EvalScope Quickstart
EvalScope provides a simple harness to define test cases, run them against models, and compute metrics. Use it to compare prompts, models, or decoding settings.
## Install
```bash
pip install evalscope
```
## Define Cases (YAML or JSON)
```yaml
# cases.yaml
cases:
  - id: greeting
    input: "Say hello in Spanish"
    expected_contains: ["hola"]
```
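To make the pass/fail semantics concrete, here is a minimal Python sketch of a case-insensitive substring check in the spirit of `expected_contains`. It is illustrative only, not EvalScope's internal implementation, and the function name is hypothetical; check the EvalScope docs for the exact matching rules your version applies.

```python
def passes_expected_contains(output: str, expected_substrings: list[str]) -> bool:
    """Return True if every expected substring appears in the model output.

    Case-insensitive matching is assumed here for illustration.
    """
    lowered = output.lower()
    return all(s.lower() in lowered for s in expected_substrings)


# The greeting case passes because the output contains "hola".
print(passes_expected_contains("¡Hola! ¿Cómo estás?", ["hola"]))  # True
```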
## Run
```bash
evalscope run \
  --cases cases.yaml \
  --model openai/gpt-4o-mini \
  --provider-base-url http://127.0.0.1:4000  # via LiteLLM
```
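Before launching a full run, it can help to confirm that the LiteLLM proxy at `http://127.0.0.1:4000` is reachable and serving the model. Below is a minimal sketch using the `openai` Python client, assuming the proxy exposes an OpenAI-compatible chat completions endpoint; the API key value depends on how your proxy is configured.

```python
from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy instead of api.openai.com.
client = OpenAI(base_url="http://127.0.0.1:4000", api_key="anything")  # key handling depends on proxy config

resp = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in Spanish"}],
)
print(resp.choices[0].message.content)
```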
## Tips
- Add task-specific metrics (exact match, regex, BLEU/ROUGE) per use case; see the sketch after this list.
- Version your cases alongside code to track regressions.
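As a starting point for task-specific metrics, the sketch below shows exact-match and regex scoring as plain Python functions. These are illustrative helpers, not EvalScope's metric plugin API; adapt them to however your EvalScope version registers custom metrics.

```python
import re


def exact_match(output: str, reference: str) -> float:
    """1.0 if the normalized output equals the reference, else 0.0."""
    return float(output.strip().lower() == reference.strip().lower())


def regex_match(output: str, pattern: str) -> float:
    """1.0 if the pattern is found anywhere in the output, else 0.0."""
    return float(re.search(pattern, output) is not None)


# Example scoring for the greeting case above.
print(exact_match("Hola", "hola"))                 # 1.0
print(regex_match("¡Hola, mundo!", r"(?i)hola"))   # 1.0
```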