Hi, nice to meet you!

I am Botao Yu (余博涛), a PhD student at The Ohio State University, advised by Prof. Huan Sun. Previously, I earned my Master’s degree at Nanjing University, advised by Prof. Wei Hu (胡伟).

My research interests include LLMs, language agents, AI for Science (esp. Chemistry), NLP, AI music, and deep learning.

🌟 Featured Projects

  • ChemMCP

    An easy-to-use and extensive MCP-compatible chemistry toolkit for LLMs and AI assistants. ChemMCP provides seamless integration of chemistry tools for LLMs, enabling enhanced chemical reasoning and problem-solving. A minimal client sketch is shown after this list.
  • ChemToolAgent

    A tool-augmented language agent for chemistry problem solving. ChemToolAgent demonstrates the impact of tools on language agents for chemistry tasks, revealing both the benefits and limitations of tool augmentation.
  • LlaSMol

    Large Language Models for chemistry with a comprehensive, high-quality instruction tuning dataset. LlaSMol advances LLMs' chemistry performance through SMolInstruct, a carefully curated instruction tuning dataset.
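
Since ChemMCP speaks the Model Context Protocol, any MCP client should be able to launch it and call its chemistry tools. Below is a minimal sketch using the official MCP Python SDK; the server launch command and the tool name are hypothetical placeholders, so please refer to the ChemMCP repository for the actual entry point and tool list.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Hypothetical: launch a ChemMCP server as a local stdio subprocess.
server_params = StdioServerParameters(command="python", args=["-m", "chemmcp"])


async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover the chemistry tools the server exposes.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Hypothetical tool name and arguments, for illustration only.
            result = await session.call_tool(
                "name_to_smiles", arguments={"name": "aspirin"}
            )
            print(result.content)


asyncio.run(main())
```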

🔥 News

  • 2025.06: Check out our new preprint AutoSDT, an automated pipeline for generating high-quality scientific coding tasks.
  • 2025.06: Check out 🛠️ChemMCP, our newly released, MCP-compatible chemistry toolkit for LLMs and AI assistants. Let’s build it together!
  • 2025.05: Check out our new preprint Topic Association Analysis, where we investigated why LLMs misclassify benign comments as toxic from the topic association bias perspective.
  • 2025.05: Our paper MMMU-Pro is accepted to ACL 2025.
  • 2025.03: Our ChemAgent has been renamed to ChemToolAgent. Check out the new version with more experimental results on arXiv.
  • 2025.01: Our paper ChemAgent is accepted to NAACL 2025 Findings.
  • 2025.01: Our paper ScienceAgentBench is accepted to ICLR 2025.
  • 2024.11: Please check out our new preprint ChemAgent, a tool-augmented chemistry agent, and its performance on various chemistry problems.
  • 2024.10: Please check out our new preprint ScienceAgentBench, a benchmark for rigorously assessing language agents on data-driven scientific discovery tasks.
  • 2024.09: Check out our new preprint MMMU-Pro, an enhanced version of MMMU featuring full-vision evaluation.
  • 2024.07: Our paper LlaSMol is accepted to COLM 2024 🎉!
  • 2024.05: Our paper MMMU is selected as Oral (0.8%) and nominated for best paper (24 in total) at CVPR 2024 🎊!
  • 2024.02: Please check out our preprint LlaSMol, where we propose an awesome chemistry task instruction tuning dataset and a series of chemistry LLMs.
  • 2023.08: Arrived at Columbus. My PhD journey officially starts 😋!
  • 2023.05: Please check out our preprint MuseCoco, a text-to-music generation system.
  • 2022.09: Our paper Museformer is accepted to NeurIPS 2022 🎉!

📝 Publications

  • [Preprint] AutoSDT: Scaling Data-Driven Discovery Tasks Toward Open Co-Scientists

    Yifei Li, Hanane Nour Moussa, Ziru Chen, Shijie Chen, Botao Yu, Mingyi Xue, Benjamin Burns, Tzu-Yao Chiu, Vishal Dey, Zitong Lu, Chen Wei, Qianheng Zhang, Tianyu Zhang, Song Gao, Xuhui Huang, Xia Ning, Nesreen K. Ahmed, Ali Payani, Huan Sun
    We introduce AutoSDT, an automated pipeline for generating high-quality coding tasks from real-world data-driven scientific workflows, addressing the data scarcity challenge in building AI co-scientists. Using AutoSDT, we create AutoSDT-5K, the largest open dataset of its kind, enabling significant performance gains in scientific discovery benchmarks.
  • [Preprint] Probing Association Biases in LLM Moderation Over-Sensitivity

    Yuxin Wang, Botao Yu, Ivory Yang, Saeed Hassanpour, Soroush Vosoughi
    This paper investigates why large language models often misclassify benign comments as toxic, revealing that topic-level biases—rather than just offensive keywords—play a significant role. Using a novel Topic Association Analysis inspired by cognitive psychology, we uncover how LLMs' implicit associations influence moderation decisions.
  • [NAACL 2025 Findings] ChemToolAgent: The Impact of Tools on Language Agents for Chemistry Problem Solving

    Botao Yu, Frazier N. Baker*, Ziru Chen*, Garrett Herb, Boyu Gou, Daniel Adu-Ampratwum, Xia Ning, Huan Sun (* equal contribution)
    We propose a tool-augmented language agent for chemistry named ChemToolAgent, and evaluate it on both specialized chemistry tasks and general chemistry questions. The results show that tools do not always help and may introduce more reasoning errors. Previous title: Tooling or Not Tooling? The Impact of Tools on Language Agents for Chemistry Problem Solving.
  • [ICLR 2025] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

    Ziru Chen, Shijie Chen, Yuting Ning, Qianheng Zhang, Boshi Wang, Botao Yu, Yifei Li, Zeyi Liao, Chen Wei, Zitong Lu, Vishal Dey, Mingyi Xue, Frazier N. Baker, Benjamin Burns, Daniel Adu-Ampratwum, Xuhui Huang, Xia Ning, Song Gao, Yu Su, Huan Sun
    The study introduces a benchmark for evaluating language models in scientific discovery, using 102 tasks from peer-reviewed publications and expert validation. It reveals current limitations in code generation, highlighting the need for rigorous task assessments.
  • [ACL 2025] MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

    Xiang Yue*, Tianyu Zheng*, Yuansheng Ni*, Yubo Wang, Kai Zhang, Shengbang Tong, Yuxuan Sun, Ming Yin, Botao Yu, Ge Zhang, Huan Sun, Yu Su, Wenhu Chen, Graham Neubig (* equal contribution)
    An enhanced version of MMMU featuring full-vision evaluation for multi-discipline multimodal understanding.
  • [COLM 2024] LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset

    Botao Yu, Frazier N. Baker*, Ziqi Chen*, Xia Ning, Huan Sun (* equal contribution)
    We propose a carefully curated chemistry task dataset for instruction tuning and a series of LLMs that significantly outperform GPT-4 and Claude-3-Opus on various chemistry tasks.
  • [CVPR 2024 Oral] MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

    Xiang Yue*, Yuansheng Ni*, Kai Zhang*, Tianyu Zheng*, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun*, Yu Su*, Wenhu Chen* (* core contributors)
    This paper proposes a massive multi-discipline multimodal understanding and reasoning benchmark for expert AGI.
  • [Preprint 2023] MuseCoco: Generating Symbolic Music from Text

    Peiling Lu*, Xin Xu*, Chenfei Kang*, Botao Yu*, Chengyi Xing*, Xu Tan, Jiang Bian (* equal contribution)
    A two-stage text-to-music generation system for creating symbolic music from textual descriptions.
  • [Preprint 2023] EmoGen: Eliminating Subjective Bias in Emotional Music Generation

    Chenfei Kang, Peiling Lu, Botao Yu, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian
    A method for generating emotional music while reducing subjective bias in the process.
  • [NeurIPS 2022] Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation

    Botao Yu, Peiling Lu, Rui Wang, Wei Hu, Xu Tan, Wei Ye, Shikun Zhang, Tao Qin, Tie-Yan Liu
    We propose a fine- and coarse-grained attention mechanism for modeling the structures of music.
  • [ISMIR 2022] MeloForm: Generating Melody with Musical Form Based on Expert Systems and Neural Networks

    Peiling Lu, Xu Tan, Botao Yu, Tao Qin, Sheng Zhao, Tie-Yan Liu
    A system for generating melodies with musical form using a combination of expert systems and neural networks.
  • [EMNLP 2021] Knowing False Negatives: An Adversarial Training Method for Distantly Supervised Relation Extraction

    Kailong Hao, Botao Yu, Wei Hu
    An adversarial training method to improve distantly supervised relation extraction by addressing false negatives.
  • [APWeb-WAIM 2020] Joint Reasoning of Events, Participants and Locations for Plot Relation Recognition

    Shengguang Qiu, Botao Yu, Lei Qian, Qiang Guo, Wei Hu
    A method for recognizing plot relations by jointly reasoning about events, participants, and locations in narratives.

📖 Education

  • PhD student in Computer Science and Engineering @ The Ohio State University

    2023.08 - Now       Columbus, Ohio, USA

  • Master’s student in Computer Science @ Nanjing University (南京大学)

    2019.09 - 2023.06       Nanjing, Jiangsu, China

  • Undergraduate student in Software Engineering @ Dalian University of Technology (大连理工大学)

    2015.09 - 2019.06       Dalian, Liaoning, China

  • High school student @ The High School Attached To Hunan Normal University (湖南师大附中)

    2012.09 - 2015.06       Changsha, Hunan, China

💻 Internship

  • Research intern @ Microsoft Research Asia (微软亚洲研究院)

    2021.04 - 2022.03       Beijing, China


Last updated: June 8, 2025