[2401.08967] ReFT: Reasoning with Reinforced Fine-Tuning

04-14-2025
Note:
A paper from ByteDance on reinforced fine-tuning.