A Study of Thinking in Rule-Based Visual Reinforcement Fine-Tuning
Ming Li, Jike Zhong, Shitian Zhao, Yuxiang Lai, Haoquan Zhang, Wang Bill Zhu, Kaipeng Zhang
¹ No-Thinking-RFT often matches or exceeds Thinking-based RFT on visual tasks.
² Low-capability models produce poor chain-of-thought (CoT) reasoning, which reduces the effectiveness of Thinking-based RFT.
³ Inconsistencies in Thinking-based RFT responses may impede reward convergence.