Glossary

  • LLM: Large Language Model; a neural network trained on large text corpora to predict and generate text.
  • RAG: Retrieval-Augmented Generation; retrieves relevant documents and supplies them to the model as context before generation.
  • Token: The unit of text a tokenizer produces; model pricing and context-window limits are measured in tokens.
  • Quantization: Reducing the numerical precision of weights or activations (e.g., FP16→INT4) to cut memory use and speed up inference.
  • Beam Search: A decoding strategy that keeps the k highest-scoring partial sequences at each step rather than committing to a single candidate.
  • KV Cache: Cached key/value attention tensors reused across decoding steps to speed up autoregressive generation.
  • Guardrails: Controls to steer or constrain model outputs.
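To make the quantization entry above concrete, here is a minimal sketch of symmetric INT4 quantization: floats are mapped to 4-bit integer codes with one shared scale, then mapped back with some rounding error. The function names and the per-tensor scaling scheme are illustrative, not taken from any particular library.

```python
def quantize_int4(values):
    """Quantize floats to signed INT4 codes (-8..7) with a shared scale."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 7.0  # 7 is the largest positive INT4 value
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale

def dequantize_int4(codes, scale):
    """Recover approximate floats from INT4 codes."""
    return [c * scale for c in codes]

weights = [0.12, -0.9, 0.33, 0.7]
codes, scale = quantize_int4(weights)
approx = dequantize_int4(codes, scale)
# approx tracks weights to within half a quantization step (scale / 2)
```

Real systems typically quantize per channel or per group and calibrate scales on sample data, but the core precision-for-footprint trade is the same.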