Multimodal Large Language Models

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Citation

If you find this project useful in your research, please consider citing:

@article{wang2025visualprm,
  title={VisualPRM: An Effective Process Reward Model for Multimodal Reasoning},
  author={Wang, Weiyun and Gao, Zhangwei and Chen, Lianjie and Chen, Zhe and Zhu, Jinguo and Zhao, Xiangyu and Liu, Yangzhou and Cao, Yue and Ye, Shenglong and Zhu, Xizhou and others},
  journal={arXiv preprint arXiv:2503.10291},
  year={2025}
}

Mar 13, 2025

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Citation

If you find this project useful in your research, please consider citing:

Dec 6, 2024

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Citation

If you find this project useful in your research, please consider citing:

@article{wang2024enhancing,
  title={Enhancing the reasoning ability of multimodal large language models via mixed preference optimization},
  author={Wang, Weiyun and Chen, Zhe and Wang, Wenhai and Cao, Yue and Liu, Yangzhou and Gao, Zhangwei and Zhu, Jinguo and Zhu, Xizhou and Lu, Lewei and Qiao, Yu and others},
  journal={arXiv preprint arXiv:2411.10442},
  year={2024}
}

Nov 15, 2024

MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding

Citation

If you find this project useful in your research, please consider citing:

@article{cao2024mmfuser,
  title={MMFuser: Multimodal multi-layer feature fuser for fine-grained vision-language understanding},
  author={Cao, Yue and Liu, Yangzhou and Chen, Zhe and Shi, Guangchen and Wang, Wenhai and Zhao, Danhuai and Lu, Tong},
  journal={arXiv preprint arXiv:2410.11829},
  year={2024}
}

Oct 15, 2024

MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity

Citation

If you find this project useful in your research, please consider citing:

@article{liu2024mminstruct,
  title={MMInstruct: A high-quality multi-modal instruction tuning dataset with extensive diversity},
  author={Liu, Yangzhou and Cao, Yue and Gao, Zhangwei and Wang, Weiyun and Chen, Zhe and Wang, Wenhai and Tian, Hao and Lu, Lewei and Zhu, Xizhou and Lu, Tong and others},
  journal={Science China Information Sciences},
  volume={67},
  number={12},
  pages={1--16},
  year={2024},
  publisher={Springer}
}

Jul 22, 2024