EvalScope Quickstart

EvalScope provides a simple harness to define test cases, run them against models, and compute metrics. Use it to compare prompts, models, or decoding settings.

Install

pip install evalscope

Define Cases (YAML or JSON)

cases:
  - id: greeting                      # unique identifier for this case
    input: "Say hello in Spanish"     # prompt sent to the model
    expected_contains: ["hola"]       # substrings expected in the model's output
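
The section title also mentions JSON; the same case expressed as JSON mirrors the YAML above field for field (assuming the loader accepts either format for the cases file):

{
  "cases": [
    {
      "id": "greeting",
      "input": "Say hello in Spanish",
      "expected_contains": ["hola"]
    }
  ]
}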

Run

evalscope run \
  --cases cases.yaml \
  --model openai/gpt-4o-mini \
  --provider-base-url http://127.0.0.1:4000  # via LiteLLM
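
The base URL above assumes a LiteLLM proxy is already listening on its default port (4000) and forwarding requests to OpenAI. If you don't have one running, a minimal way to start it is the following, assuming an OpenAI API key is set in your environment:

pip install 'litellm[proxy]'
litellm --model gpt-4o-mini   # serves an OpenAI-compatible endpoint on port 4000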

Tips

  • Add task-specific metrics (exact match, regex, BLEU/ROUGE) per use case; see the sketch after this list.
  • Version your cases alongside code to track regressions.
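
As a sketch of what such metrics can look like, the functions below implement exact match and regex containment over a model output. They are plain Python and independent of EvalScope's own metric plug-in API, which isn't shown here; the names and signatures are illustrative only.

import re

def exact_match(output: str, expected: str) -> float:
    """Return 1.0 when the output equals the expected string, ignoring surrounding whitespace."""
    return float(output.strip() == expected.strip())

def regex_contains(output: str, pattern: str) -> float:
    """Return 1.0 when the pattern matches anywhere in the output, case-insensitively."""
    return float(re.search(pattern, output, flags=re.IGNORECASE) is not None)

# Example: score the 'greeting' case from cases.yaml against a hypothetical model output.
print(exact_match("Hola", "hola"))                    # 1.0
print(regex_contains("¡Hola, amigo!", r"\bhola\b"))   # 1.0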