STAIR Research Group | Scalable & Trustworthy AI Research
Reward and Guidance through Rubrics: Promoting Exploration to Improve Multi-Domain Reasoning
Baolong Bi, Shenghua Liu, Yiwei Wang, Siqian Tong, Lingrui Mei, Yuyao Ge, Yilong Xu, Jiafeng Guo, Xueqi Cheng
November 2025
Type: Preprint
Publication: CoRR, 2025, vol. abs/2511.12344