I’m currently a junior undergraduate student (enrolled in Fall 2023) at IIIS (Yao Class), Tsinghua University, pursuing a Bachelor’s degree in Computer Science and Technology.
I welcome any collaboration or discussion, whether with seniors or peers. Please feel free to reach out!
Some picture options (I'll try to keep these up to date):
Inspired by Pieter Abbeel's homepage. Photos were taken within the past year.
Giving a talk on my recent work (first from the right) 🗣️
Eating 😋
Hanging out with friends (second from left) 🤣
Cuddling my dog at home 🐕
Research Interests
My research goal is to develop fundamental models with an intrinsic understanding of the world and to apply them to achieve general decision intelligence. Currently, my research interests include:
World Models: State-based World Models, Visual World Models, Grounding Foundation Models (e.g., Video Diffusion Models, LLMs) to World Models.
[May 2025] I became a member of the Sparking Program, the most prestigious and selective academic organization for students at Tsinghua University (top 1% of the university).
[Nov. 2024] Honored to receive the Comprehensive Excellence Award of Tsinghua.
[Nov. 2024] Glad to receive the Outstanding Sports Scholarship of Tsinghua.
Education
B.S. in Computer Science, Tsinghua University, 2023-2027 (expected). Institute for Interdisciplinary Information Sciences (Yao Class), Tsinghua University. GPA: 3.93/4.00, Rank: 12/93.
Selected Courses: Natural Language Processing (A+), Algebra and Computation (A+, Top 1), Fundamentals of Programming (A+), Multi-modal Machine Learning (A), Deep Learning (A), Computer Vision (A), Introduction to Computer Systems (A).
More Selected Courses: Basic Principles of Marxism (A+), The History of Western Music (A+), Discrete Mathematics II (A), Fundamentals of Computer Science (A), Advanced Topics in Linear Algebra (A), Calculus-A II (A), Physics I (A).
We investigated the feasibility of using language models as text-based world models. Our empirical study found that performance is greatly hindered by overly long chains of thought (CoTs), and we proposed DreamFactory, a novel architecture that addresses this issue.
We introduce ManiGen, a generative simulation pipeline that uses ManiSkill to automate task creation. It uses LLMs to propose tasks, generate scenes, and produce task-specific code for rewards, parameters, and metrics.
1. Designed and implemented a PostgreSQL-based course-sharing platform, using Scala for the backend and React for the frontend.
2. Used the Stable Diffusion 2 and Llama 2 APIs to enhance the user experience.
A 2D Stickman vs. CAD-themed game, developed in Unity. Players take the form of stick figures and explore a world inside CAD software through movement, skills, and various interactions.
We propose “Watch-and-Learn”, a multimodal framework that efficiently enhances MLLMs' reasoning abilities in counting tasks by integrating function calls.