Building Datasets with easy-dataset
easy-dataset streamlines collecting, transforming, and exporting datasets in JSONL format for training and evaluation.
Install
pip install easy-dataset
Example: Convert CSV to JSONL
easy-dataset convert \
--input data/raw.csv \
--input-format csv \
--output data/train.jsonl \
--map "instruction=question" --map "output=answer"
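For readers who want to see what this kind of conversion amounts to, here is a minimal standard-library sketch in Python. It is not easy-dataset's implementation. It assumes each `--map` pair reads as `output_field=source_column`, so the CSV column `question` becomes the JSONL field `instruction`. The names `FIELD_MAP` and `csv_to_jsonl` are hypothetical and exist only for illustration.

```python
import csv
import io
import json

# Hypothetical mapping mirroring the --map flags above.
# Assumed direction: output field name -> source CSV column name.
FIELD_MAP = {"instruction": "question", "output": "answer"}


def csv_to_jsonl(src, dst, field_map=FIELD_MAP):
    """Read CSV rows from `src` and write one JSON object per line to `dst`.

    `src` and `dst` are text streams. Returns the number of records written.
    """
    reader = csv.DictReader(src)
    count = 0
    for row in reader:
        # Keep only the mapped columns, renamed to the output field names.
        record = {out_key: row[in_key] for out_key, in_key in field_map.items()}
        dst.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # In-memory demo. For real files, open them with newline="" and
    # encoding="utf-8" and pass the file objects instead.
    sample = io.StringIO("question,answer\nWhat is 2+2?,4\n")
    out = io.StringIO()
    csv_to_jsonl(sample, out)
    print(out.getvalue(), end="")
```

Working on text streams rather than paths keeps the function easy to test. A missing source column raises `KeyError`, which surfaces a bad mapping early instead of writing incomplete records.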
Tips
- Keep a small dev/test split to catch regressions and overfitting.
- Deduplicate and sanitize PII before publishing datasets.
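The split and deduplication tips can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a feature of easy-dataset: `dedupe` and `split` are hypothetical helpers, duplicates are treated as exact matches on all fields, and the 10% dev fraction is an arbitrary default. PII sanitization depends on the data and is not covered here.

```python
import json
import random


def dedupe(records):
    """Drop exact duplicate records, keeping the first occurrence."""
    seen = set()
    unique = []
    for rec in records:
        # Serialize with sorted keys so key order does not affect equality.
        key = json.dumps(rec, sort_keys=True, ensure_ascii=False)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def split(records, dev_fraction=0.1, seed=0):
    """Shuffle with a fixed seed and hold out a fraction as a dev set.

    Returns (train, dev). A non-empty input always yields at least one
    dev record.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction)) if shuffled else 0
    return shuffled[n_dev:], shuffled[:n_dev]
```

Deduplicating before splitting keeps identical examples from landing in both the train and dev sets, which would otherwise hide overfitting. The fixed seed makes the split reproducible across runs.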