PerplexityAI

04-14-2025

Link: https://lilianweng.github.io/posts/2024-11-28-reward-hacking/

Note:

Reward hacking occurs when a reinforcement learning (RL) agent exploits flaws or ambiguities in the reward function to achieve high rewards, without genuinely learning or completing the intended task. Reward hacking exists because RL environments are often imperfect, and it is fundamentally challenging to accurately specify a reward function.

Reward Hacking in Reinforcement Learning | Lil'Log