Glossary
- LLM: Large Language Model; a neural network, typically transformer-based, trained on large text corpora to predict and generate text.
- RAG: Retrieval-Augmented Generation; retrieved documents are added to the prompt so the model's output is grounded in external sources.
- Token: The unit of text a tokenizer produces (often a subword); API cost and context-window limits are measured in tokens.
- Quantization: Reducing numerical precision (e.g., FP16→INT4) to shrink memory footprint and speed up inference.
- Beam Search: A decoding strategy that keeps the k highest-scoring partial sequences at each step rather than a single greedy choice.
- KV Cache: Cached key/value attention tensors from earlier steps, reused to avoid recomputation during autoregressive decoding.
- Guardrails: Controls (e.g., filters, validators, constrained prompts) that steer or restrict model outputs.
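To make the Quantization entry concrete, here is a minimal sketch of symmetric per-tensor quantization to the INT4 range [-8, 7]. The function names and the plain-list representation are illustrative, not from any particular library; real implementations operate on tensors and often use per-channel or grouped scales.

```python
def quantize_int4(weights):
    """Map floats to 4-bit integers [-8, 7] using one shared scale."""
    # Scale so the largest magnitude maps near the max positive value (7).
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.07]
q, scale = quantize_int4(weights)
approx = dequantize(q, scale)
# Each recovered value differs from the original by at most scale/2,
# which is the precision lost in exchange for 4-bit storage.
```

The trade-off the glossary entry alludes to is visible here: storage drops from 16 (or 32) bits per weight to 4, at the cost of a bounded rounding error proportional to the scale.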