Hi, nice to meet you!

I am Botao Yu (余博涛), a PhD student at The Ohio State University, advised by Prof. Huan Sun. Previously, I earned my Master’s degree at Nanjing University, advised by Prof. Wei Hu (胡伟).

My research interests include LLMs, language agents, AI for Science (esp. Chemistry), NLP, AI music, and deep learning.

🌟 Featured Projects

  • ChemMCP

    An easy-to-use and extensive MCP-compatible chemistry toolkit for LLMs and AI assistants. ChemMCP provides seamless integration of chemistry tools for LLMs, enabling enhanced chemical reasoning and problem-solving. A minimal client sketch is shown after this list.
  • ChemToolAgent

    A tool-augmented language agent for chemistry problem solving. ChemToolAgent demonstrates the impact of tools on language agents for chemistry tasks, revealing both the benefits and limitations of tool augmentation.
  • LlaSMol

    Large Language Models for chemistry with a comprehensive, high-quality instruction tuning dataset. LlaSMol advances LLMs' chemistry performance through SMolInstruct, a carefully curated instruction tuning dataset.
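
Since ChemMCP speaks the Model Context Protocol, any MCP client should be able to launch it and call its chemistry tools. Below is a minimal sketch using the official MCP Python SDK; the server launch command and the tool name are hypothetical placeholders, so please refer to the ChemMCP repository for the actual entry point and tool list.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Hypothetical: launch a ChemMCP server as a local stdio subprocess.
server_params = StdioServerParameters(command="python", args=["-m", "chemmcp"])


async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover the chemistry tools the server exposes.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Hypothetical tool name and arguments, for illustration only.
            result = await session.call_tool(
                "name_to_smiles", arguments={"name": "aspirin"}
            )
            print(result.content)


asyncio.run(main())
```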

🔥 News

  • 2025.06: Check out our new preprint AutoSDT, an automated pipeline for generating high-quality scientific coding tasks.
  • 2025.06: Check out 🛠️ChemMCP, our newly released, MCP-compatible chemistry toolkit for LLMs and AI assistants. Let’s build it together!
  • 2025.05: Check out our new preprint Topic Association Analysis, where we investigated why LLMs misclassify benign comments as toxic from the topic association bias perspective.
  • 2025.05: Our paper MMMU-Pro is accepted to ACL 2025.
  • 2025.03: Our ChemAgent has been renamed to ChemToolAgent. Check out the new version with more experimental results on arXiv.
  • 2025.01: Our paper ChemAgent is accepted to NAACL 2025 Findings.
  • 2025.01: Our paper ScienceAgentBench is accepted to ICLR 2025.
  • 2024.11: Please check out our new preprint ChemAgent, a tool-augmented chemistry agent, and its performance on various chemistry problems.
  • 2024.10: Please check out our new preprint ScienceAgentBench, a benchmark for rigorously assessing language agents on data-driven scientific discovery tasks.
  • 2024.09: Check out our new preprint MMMU-Pro, an enhanced version of MMMU featuring full-vision evaluation.
  • 2024.07: Our paper LlaSMol is accepted to COLM 2024 🎉!
  • 2024.05: Our paper MMMU is selected as Oral (0.8%) and nominated for best paper (24 in total) at CVPR 2024 🎊!
  • 2024.02: Please check out our preprint LlaSMol, where we propose an awesome chemistry task instruction tuning dataset and a series of chemistry LLMs.
  • 2023.08: Arrived at Columbus. My PhD journey officially starts 😋!
  • 2023.05: Please check out our preprint MuseCoco, a text-to-music generation system.
  • 2022.09: Our paper Museformer is accepted to NeurIPS 2022 🎉!

📝 Publications

  • [Preprint] AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists

    Yifei Li, Hanane Nour Moussa, Ziru Chen, Shijie Chen, Botao Yu, Mingyi Xue, Benjamin Burns, Tzu-Yao Chiu, Vishal Dey, Zitong Lu, Chen Wei, Qianheng Zhang, Tianyu Zhang, Song Gao, Xuhui Huang, Xia Ning, Nesreen K. Ahmed, Ali Payani, Huan Sun
    We introduce AutoSDT, an automated pipeline for generating high-quality coding tasks from real-world data-driven scientific workflows, addressing the data scarcity challenge in building AI co-scientists. Using AutoSDT, we create AutoSDT-5K, the largest open dataset of its kind, enabling significant performance gains in scientific discovery benchmarks.
  • [Preprint] Probing Association Biases in LLM Moderation Over-Sensitivity

    Yuxin Wang, Botao Yu, Ivory Yang, Saeed Hassanpour, Soroush Vosoughi
    This paper investigates why large language models often misclassify benign comments as toxic, revealing that topic-level biases—rather than just offensive keywords—play a significant role. Using a novel Topic Association Analysis inspired by cognitive psychology, we uncover how LLMs' implicit associations influence moderation decisions.
  • [NAACL 2025 Findings] ChemToolAgent: The Impact of Tools on Language Agents for Chemistry Problem Solving

    Botao Yu, Frazier N. Baker*, Ziru Chen*, Garrett Herb, Boyu Gou, Daniel Adu-Ampratwum, Xia Ning, Huan Sun (* equal contribution)
    We propose a tool-augmented language agent for chemistry named ChemToolAgent, and evaluate it on both specialized chemistry tasks and general chemistry questions. The results show that tools do not always help and may introduce more reasoning errors. Previous title: Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving.
  • [ICLR 2025] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

    Ziru Chen, Shijie Chen, Yuting Ning, Qianheng Zhang, Boshi Wang, Botao Yu, Yifei Li, Zeyi Liao, Chen Wei, Zitong Lu, Vishal Dey, Mingyi Xue, Frazier N. Baker, Benjamin Burns, Daniel Adu-Ampratwum, Xuhui Huang, Xia Ning, Song Gao, Yu Su, Huan Sun
    The study introduces a benchmark for evaluating language models in scientific discovery, using 102 tasks from peer-reviewed publications and expert validation. It reveals current limitations in code generation, highlighting the need for rigorous task assessments.
  • [ACL 2025] MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

    Xiang Yue*, Tianyu Zheng*, Yuansheng Ni*, Yubo Wang, Kai Zhang, Shengbang Tong, Yuxuan Sun, Ming Yin, Botao Yu, Ge Zhang, Huan Sun, Yu Su, Wenhu Chen, Graham Neubig (* equal contribution)
    An enhanced version of MMMU featuring full-vision evaluation for multi-discipline multimodal understanding.
  • [COLM 2024] LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset

    Botao Yu, Frazier N. Baker*, Ziqi Chen*, Xia Ning, Huan Sun (* equal contribution)
    We propose a carefully curated chemistry task dataset for instruction tuning and a series of LLMs that significantly outperform GPT-4 and Claude-3-Opus on various chemistry tasks.
  • [CVPR 2024 Oral] MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

    Xiang Yue*, Yuansheng Ni*, Kai Zhang*, Tianyu Zheng*, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun*, Yu Su*, Wenhu Chen* (* core contributors)
    This paper proposes a massive multi-discipline multimodal understanding and reasoning benchmark for expert AGI.
  • [Preprint 2023] MuseCoco: Generating Symbolic Music from Text

    Peiling Lu*, Xin Xu*, Chenfei Kang*, Botao Yu*, Chengyi Xing*, Xu Tan, Jiang Bian (* equal contribution)
    A two-stage text-to-music generation system for creating symbolic music from textual descriptions.
  • [Preprint 2023] EmoGen: Eliminating Subjective Bias in Emotional Music Generation

    Chenfei Kang, Peiling Lu, Botao Yu, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian
    A method for generating emotional music while reducing subjective bias in the process.
  • [NeurIPS 2022] Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation

    Botao Yu, Peiling Lu, Rui Wang, Wei Hu, Xu Tan, Wei Ye, Shikun Zhang, Tao Qin, Tie-Yan Liu
    We propose a fine- and coarse-grained attention mechanism for modeling the structures of music.
  • [ISMIR 2022] MeloForm: Generating Melody with Musical Form Based on Expert Systems and Neural Networks

    Peiling Lu, Xu Tan, Botao Yu, Tao Qin, Sheng Zhao, Tie-Yan Liu
    A system for generating melodies with musical form using a combination of expert systems and neural networks.
  • [EMNLP 2021] Knowing False Negatives: An Adversarial Training Method for Distantly Supervised Relation Extraction

    Kailong Hao, Botao Yu, Wei Hu
    An adversarial training method to improve distantly supervised relation extraction by addressing false negatives.
  • [APWeb-WAIM 2020] Joint Reasoning of Events, Participants and Locations for Plot Relation Recognition

    Shengguang Qiu, Botao Yu, Lei Qian, Qiang Guo, Wei Hu
    A method for recognizing plot relations by jointly reasoning about events, participants, and locations in narratives.

📖 Education

  • PhD student in Computer Science and Engineering @ The Ohio State University

    2023.08 - Now       Columbus, Ohio, USA

  • Master’s student in Computer Science @ Nanjing University (南京大学)

    2019.09 - 2023.06       Nanjing, Jiangsu, China

  • Undergraduate student in Software Engineering @ Dalian University of Technology (大连理工大学)

    2015.09 - 2019.06       Dalian, Liaoning, China

  • High school student @ The High School Attached To Hunan Normal University (湖南师大附中)

    2012.09 - 2015.06       Changsha, Hunan, China

💻 Internship

  • Research intern @ Microsoft Research Asia (微软亚洲研究院)

    2021.04 - 2022.03       Beijing, China


Last updated: June 8, 2025