Datasets

Sources, curation, licensing, and privacy considerations.

Topics

Deduplication and quality filtering
Mixture design and data recipes
Synthetic data: benefits and pitfalls
Sensitive data and compliance