Yue Cao

Graduate Student

About Me

Cao Yue (曹越) is a graduate student in computer science at Nanjing University, supervised by Prof. Lu Tong. He received his B.Eng. degree from the School of Software at Dalian University of Technology. His research interests lie in deep learning for computer vision, especially multimodal large language models (MLLMs) and vision foundation models (VFMs).

Actively looking for internships!

Interests
  • Multimodal Large Language Models
  • Vision Foundation Models
  • Creative Research
Education
  • M.Eng. in Artificial Intelligence

    Nanjing University

  • B.Eng. in Software Engineering

    Dalian University of Technology

Recent Publications
(2025). VisualPRM: An Effective Process Reward Model for Multimodal Reasoning.
(2024). Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.
(2024). Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization.
(2024). MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding.
(2024). MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity. SCIS.

Experience

  1. Research Intern, supervised by Wenhai Wang

    Shanghai AI Laboratory

    Responsibilities include:

    • Contributed to data construction, model exploration, model evaluation, and paper writing for the InternVL 2.5 series of multimodal models.
    • Contributed to improving the reasoning capability of multimodal large language models in InternVL MPO, including building the dataset pipeline, data construction, and experimental verification.
    • Contributed to extending InternVL's long chain-of-thought multimodal reasoning capabilities, including reinforcement learning design, data construction, and building a one-stop evaluation framework.
  2. Test Development Intern

    ByteDance
    Worked with the team to maintain and improve the VR algorithm automated testing platform, a codebase of nearly 70,000 lines. I personally fixed more than ten bugs and implemented three sub-projects built on the message queue mechanism: batch task creation, emergency task termination, and task reconstruction.
