Alignment & Safety

Methods for aligning model behavior with human preferences and operational constraints.

Topics

  • RLHF/DPO and preference data
  • Guardrails, policy engines, allow/deny lists
  • Red teaming and jailbreak resilience
  • Privacy, security, and compliance
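As a pointer into the first topic, DPO (Direct Preference Optimization) trains directly on preference pairs instead of fitting a separate reward model. A minimal sketch of its per-pair loss, assuming you already have summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model (function and argument names here are illustrative, not from any particular library):

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair.

    Each argument is the summed token log-probability of a response:
    `logp_*` under the policy being trained, `ref_logp_*` under the
    frozen reference model. `beta` scales the implicit reward.
    """
    # Implicit rewards: log-probability ratio between policy and reference
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(margin)), written as log1p(exp(-margin)) for stability
    margin = chosen_reward - rejected_reward
    return math.log1p(math.exp(-margin))
```

When the policy matches the reference model, the margin is zero and the loss is log 2; pushing probability toward the chosen response drives the loss toward zero.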