RLHF

Reinforcement Learning from Human Feedback