
Building Datasets with easy-dataset

easy-dataset streamlines collecting, transforming, and exporting datasets in JSONL format (one JSON object per line) for model training and evaluation.

Install

pip install easy-dataset

Example: Convert CSV to JSONL

easy-dataset convert \
  --input data/raw.csv \
  --input-format csv \
  --output data/train.jsonl \
  --map "instruction=question" --map "output=answer"
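The `--map` pairs above appear to follow a `target=source` convention: the JSONL field `instruction` is filled from the CSV column `question`, and `output` is filled from `answer`. That reading is an assumption inferred from the example, not confirmed easy-dataset behavior. The sketch below uses only the Python standard library (`csv`, `json`) to show what such a mapping produces. `parse_mappings` and `convert_csv_to_jsonl` are hypothetical helpers for illustration and are not part of easy-dataset.

```python
import csv
import io
import json


def parse_mappings(specs):
    """Turn ["instruction=question", ...] into {"instruction": "question", ...}.

    Assumes each spec is target_field=source_column (hypothetical semantics).
    """
    mapping = {}
    for spec in specs:
        target, sep, source = spec.partition("=")
        if not sep or not target or not source:
            raise ValueError(f"invalid mapping {spec!r}; expected target=source")
        mapping[target] = source
    return mapping


def convert_csv_to_jsonl(csv_file, jsonl_file, mapping):
    """Write one JSON object per CSV row, keeping only the mapped fields."""
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames or []
    missing = [col for col in mapping.values() if col not in columns]
    if missing:
        raise KeyError(f"CSV is missing columns: {missing}")
    count = 0
    for row in reader:
        record = {target: row[source] for target, source in mapping.items()}
        # ensure_ascii=False keeps non-ASCII text readable in the output file
        jsonl_file.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # In-memory stand-ins for data/raw.csv and data/train.jsonl
    raw = io.StringIO('question,answer\nWhat is 2+2?,4\n"Capital of France?",Paris\n')
    out = io.StringIO()
    mapping = parse_mappings(["instruction=question", "output=answer"])
    print(convert_csv_to_jsonl(raw, out, mapping))  # prints 2
    print(out.getvalue(), end="")
    # {"instruction": "What is 2+2?", "output": "4"}
    # {"instruction": "Capital of France?", "output": "Paris"}
```

To run it on real files, open the input with `open("data/raw.csv", newline="", encoding="utf-8")`, as the `csv` module recommends, and the output with `open("data/train.jsonl", "w", encoding="utf-8")`. Failing early on missing columns prevents silently writing empty or partial records.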

Tips

  • Hold out a small dev/test split that is never used for training, so you can catch regressions and overfitting.
  • Deduplicate records and remove or mask personally identifiable information (PII) before publishing a dataset.
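The tips above can be combined into one hygiene pass over records like those produced by the conversion step. The pass first masks obvious PII, so records that differ only in an email address are recognized as duplicates. It then deduplicates before splitting, so the same example cannot land in both train and dev. Finally, it holds out a seeded dev split. This is a standard-library sketch under stated assumptions, not an easy-dataset feature. Duplicates are detected by hashing whitespace- and case-normalized field values, and only email addresses are masked, using a simplified regex. The function names are hypothetical.

```python
import hashlib
import json
import random
import re

# Simplified email pattern (an illustration only; it misses some valid addresses).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_emails(record, placeholder="[EMAIL]"):
    """Replace email-like substrings in every string field of a record."""
    return {k: EMAIL_RE.sub(placeholder, v) if isinstance(v, str) else v
            for k, v in record.items()}


def record_key(record):
    """Hash a normalized copy so whitespace/case variants of a record collide."""
    normalized = {k: " ".join(str(v).split()).lower() for k, v in record.items()}
    payload = json.dumps(normalized, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keep the first occurrence of each normalized record, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = record_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def train_dev_split(records, dev_fraction=0.1, seed=0):
    """Shuffle with a fixed seed; hold out dev_fraction of records (at least one)."""
    if not 0 < dev_fraction < 1:
        raise ValueError("dev_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, round(len(shuffled) * dev_fraction)) if shuffled else 0
    return shuffled[n_dev:], shuffled[:n_dev]


if __name__ == "__main__":
    data = [
        {"instruction": "Contact support", "output": "Email help@example.com"},
        {"instruction": "contact  support", "output": "email ops@example.com"},
        {"instruction": "What is 2+2?", "output": "4"},
        {"instruction": "Capital of France?", "output": "Paris"},
    ]
    cleaned = deduplicate([mask_emails(r) for r in data])
    train, dev = train_dev_split(cleaned, dev_fraction=0.34, seed=42)
    print(len(cleaned), len(train), len(dev))  # prints 3 2 1
```

The fixed seed makes the split reproducible, so a regression on the dev set reflects a change in the model or data rather than a different random split. Regex masking covers only a narrow class of PII. Names, addresses, and ID numbers need separate handling.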