About me

Hi! I am currently enrolled in a dual-degree program between the University of Illinois Urbana-Champaign and Zhejiang University, and I will graduate with bachelor's degrees from both institutions in June 2025. I spent my junior year as an exchange student at UIUC and attended Stanford University as an undergraduate summer visitor. During my exchange, I made many friends and built valuable connections. I am deeply grateful to everyone who has offered advice and guidance for my future path!

Education

University of Illinois Urbana-Champaign, Sep 2021 - Jun 2025 (expected)

Bachelor of Science in Computer Engineering, GPA: 4.00/4.00 Transcript

  • Core Courses: Artificial Intelligence (A+), Machine Learning (A+), Computer Systems & Programming (A), Computer Systems Engineering (A), Intro to Algorithms & Models of Computation (A+), Data Structures (A+), Game Development (A+), Analog Signal Processing (A+), Digital Signal Processing (A+), Probability with Engineering Applications (A+)
  • Awards: Dean’s List (Fall 2023 & Spring 2024)

Zhejiang University, Sep 2021 - Jun 2025 (expected)

Bachelor of Engineering in Electronic and Computer Engineering, GPA: 3.98/4.00 Transcript

  • Core Courses: Discrete Mathematics (A+), Linear Algebra (A), Calculus (A), Differential Equations (A)
  • Awards: ZJU-UIUC Institute First-Class Academic Excellence Award (Oct 2023), Zhejiang University First-Class Scholarship (Nov 2024)

Stanford University, Jun 2024 - Aug 2024

Undergraduate Summer Visitor, GPA: 4.075/4.30 Transcript

  • Core Courses: Data Mining & Analysis, Convex Optimization

Research Experience

Adapt-Pruner: Adaptive and Structured Pruning for Efficient LLMs (Co-First Author), Preprint
Supervisor: Tong Zhang

  • Proposed Adapt-Pruner, a structured-pruning method for LLMs that adaptively evaluates the importance of each decoder layer by measuring the distance between the layer's input and output tensors, and assigns a corresponding sparsity level (sketched below)
  • Evaluated performance on LLaMA-3.1-8B, Qwen2.5-7B, and Gemma-2-9B, demonstrating a 5% improvement in commonsense benchmark scores over state-of-the-art methods, including SliceGPT, at 50% sparsity
  • Compressed LLaMA-3.2-3B to 1.3B parameters and post-trained it on 0.05B tokens, outperforming TinyLlama-1.1B pretrained on 10B tokens and demonstrating that structured pruning can produce smaller models with far fewer computational resources
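
For intuition, the core mechanism can be sketched in a few lines of PyTorch. This is a minimal illustration under simplifying assumptions (cosine distance as the input-output metric, an inverse-proportional sparsity allocation); the function names are mine, not the paper's API.

```python
import torch

def layer_importance(hidden_in: torch.Tensor, hidden_out: torch.Tensor) -> float:
    """Score a decoder layer by how much it transforms its input: a layer
    whose output stays close to its input contributes little and can
    tolerate heavier pruning."""
    cos = torch.nn.functional.cosine_similarity(
        hidden_in.flatten(1), hidden_out.flatten(1), dim=-1
    )
    return 1.0 - cos.mean().item()  # larger distance => more important

def allocate_sparsity(importances: list[float], target: float) -> list[float]:
    """Distribute a global target sparsity across layers so that less
    important layers are pruned more heavily."""
    scores = torch.tensor(importances)
    weights = (scores.max() - scores) + 1e-6  # low importance => high weight
    weights = weights / weights.sum()
    # Scale so the mean layer sparsity equals the global target.
    return (weights * target * len(importances)).clamp(0.0, 0.95).tolist()
```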

RL-Pruner: Structured Pruning Using Reinforcement Learning for CNNs (First Author), Submitted to CVPR 2025 - Preprint, Source Code
Supervisor: Volodymyr Kindratenko

  • Developed an end-to-end approach to compressing CNNs that combines structured pruning with Q-learning, using the accuracy of the compressed model as the reward so the agent learns the optimal layer-wise pruning distribution that minimizes performance loss (sketched below)
  • Applied response-based Knowledge Distillation to post-train the compressed models, using the original uncompressed model as the teacher to transfer learned representations and efficiently recover accuracy lost during compression
  • Independently implemented the entire framework in about 2,000 lines of PyTorch code and experimented on ResNet, GoogLeNet, and MobileNet, achieving an 81% parameter reduction on VGG-19 (CIFAR-100) with less than a 1% performance drop
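
The reward loop can be summarized with a tabular sketch. This simplifies the actual framework considerably: the state here is just the pruning step, and `prune` and `evaluate` are hypothetical callbacks standing in for the framework's channel-pruning and validation routines (which would also snapshot and restore the model between episodes).

```python
import random

def q_learning_pruning(num_layers, steps, prune, evaluate,
                       episodes=100, alpha=0.1, gamma=0.9, eps=0.2):
    """Tabular Q-learning over 'which layer do we prune next?'.

    state  : pruning step index (0..steps-1)
    action : index of the layer whose channels are pruned this step
    reward : validation accuracy of the compressed model, so the learned
             policy is a layer-wise pruning distribution that minimizes
             performance loss.
    """
    q = [[0.0] * num_layers for _ in range(steps)]
    for _ in range(episodes):
        for s in range(steps):
            if random.random() < eps:                      # explore
                a = random.randrange(num_layers)
            else:                                          # exploit
                a = max(range(num_layers), key=lambda i: q[s][i])
            prune(a)        # remove a chunk of channels from layer a
            r = evaluate()  # compressed-model accuracy as the reward
            nxt = max(q[s + 1]) if s + 1 < steps else 0.0
            q[s][a] += alpha * (r + gamma * nxt - q[s][a])
    return q
```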

Teaching Experience

Teaching Assistant, ECE 374: Intro to Algorithms & Models of Computation (ZJU), Instructor: Pavel Loskot

  • Hosted a weekly 2-hour lab session for 20 students to review and reinforce class content

Course Assistant, CS 415: Game Development (UIUC), Instructor: Eric Shaffer

  • Held office hours and mentored two project teams of 4 students each through their final projects

Projects

ECE 391 POSIX-compliant Unix-like Operating System, UIUC, Source Code

  • Implemented a POSIX-compliant Unix-like operating system featuring a terminal driver, a real-time clock driver, and system calls, along with advanced capabilities such as signal handling and dynamic memory allocation

CS 415 Game Development: The Final Boss, UIUC, Source Code

  • Implemented an advanced action system for the main character, featuring dynamic combo mechanics and environmental interactions, as well as a context-aware NPC dialogue system that evolves with game progression and player choices

STATS 202 URL Relevance Prediction, Stanford, Source Code

  • Applied comprehensive feature engineering for data preprocessing, outlier removal, and feature extraction, then leveraged various classification algorithms, including boosting, to optimize URL relevance prediction
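
To illustrate the modeling half, here is a minimal boosting baseline in scikit-learn. The feature matrix below is random stand-in data so the snippet runs on its own; the actual project used the engineered features described above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))    # one row of engineered features per URL
y = rng.integers(0, 2, size=1000)  # 1 = relevant, 0 = not relevant

clf = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                 learning_rate=0.1)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(f"5-fold CV accuracy: {scores.mean():.3f}")
```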

Research Statement

My research interests center on efficient AI, focusing particularly on neural network compression and distillation. My work aims to compress CNNs and LLMs while preserving the knowledge and representations they have acquired. The goal is to reduce parameter counts and accelerate inference without compromising performance, allowing these models to be deployed on local devices such as personal computers and mobile devices, and integrated into edge scenarios such as vehicles, software, and video games. Local deployment protects user privacy, reduces latency, and allows neural networks to be optimized for specific applications.

Throughout my research, I have found that compressing LLMs presents unique challenges. Despite their large number of parameters and apparent redundancies, LLMs are often required to retain a vast breadth of knowledge. Given this, I believe compressed LLMs are best suited to scenarios where advanced LLMs, already trained on extensive knowledge, are distilled into smaller models tailored to specific domains and deployed on the corresponding equipment. For instance, a large model could be compressed by distilling it on a dataset focused primarily on cooking and kitchen knowledge, making it well-suited for deployment on specialized hardware in a kitchen environment.
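
The distillation step I have in mind is the standard response-based objective. A minimal PyTorch sketch (the temperature and mixing weight are illustrative defaults, not tuned values):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Response-based knowledge distillation (Hinton et al., 2015):
    blend hard-label cross-entropy with a KL term that pulls the
    student's softened distribution toward the teacher's."""
    soft_t = F.softmax(teacher_logits / temperature, dim=-1)
    soft_s = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(soft_s, soft_t, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```

Training the student on a cooking-focused corpus with this loss would specialize it for the target domain, while the teacher's soft targets help preserve general competence.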

From a broader perspective, I see the evolution of large language models as reminiscent of the early days of computing: early computers grew to occupy entire rooms before technological advances compressed them into powerful laptops. I hope that by compressing large language models, we can discover more efficient architectures and more effective representations of knowledge, enabling their deployment in practical and efficient applications that benefit society.

Miscellaneous

Since middle school, I have been fascinated by video games. Many titles, such as Red Dead Redemption 2 and Detroit: Become Human, not only entertained me but also allowed me to briefly experience other people’s lives and explore science fiction concepts. These experiences sparked my interest in science and made me realize that games can serve as a powerful medium for knowledge transfer. We spend a significant amount of time learning the knowledge accumulated by human civilization—could there be a way to accelerate this process and reduce the repetition across generations?

In my spare time, I focus primarily on developing independent games. Since April 2023, I have been working on my own game, and you can check out the early development version here. The project's funding so far comes entirely from the scholarships I received during my freshman and sophomore years, totaling approximately $10,000. If you are interested in my work, feel free to reach out.