Article

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Citation

If you find this project useful in your research, please consider citing:

@article{wang2025visualprm,
  title={VisualPRM: An Effective Process Reward Model for Multimodal Reasoning},
  author={Wang, Weiyun and Gao, Zhangwei and Chen, Lianjie and Chen, Zhe and Zhu, Jinguo and Zhao, Xiangyu and Liu, Yangzhou and Cao, Yue and Ye, Shenglong and Zhu, Xizhou and others},
  journal={arXiv preprint arXiv:2503.10291},
  year={2025}
}

Mar 13, 2025

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Citation

If you find this project useful in your research, please consider citing:

Dec 6, 2024

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Citation

If you find this project useful in your research, please consider citing:

@article{wang2024enhancing,
  title={Enhancing the reasoning ability of multimodal large language models via mixed preference optimization},
  author={Wang, Weiyun and Chen, Zhe and Wang, Wenhai and Cao, Yue and Liu, Yangzhou and Gao, Zhangwei and Zhu, Jinguo and Zhu, Xizhou and Lu, Lewei and Qiao, Yu and others},
  journal={arXiv preprint arXiv:2411.10442},
  year={2024}
}

Nov 15, 2024

MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding

Citation

If you find this project useful in your research, please consider citing:

@article{cao2024mmfuser,
  title={MMFuser: Multimodal multi-layer feature fuser for fine-grained vision-language understanding},
  author={Cao, Yue and Liu, Yangzhou and Chen, Zhe and Shi, Guangchen and Wang, Wenhai and Zhao, Danhuai and Lu, Tong},
  journal={arXiv preprint arXiv:2410.11829},
  year={2024}
}

Oct 15, 2024