Datasets

Sources, curation, licensing, and privacy considerations.

Topics

  • Deduplication and quality filtering
  • Mixture design and data recipes
  • Synthetic data: benefits and pitfalls
  • Sensitive data and compliance