I am a Ph.D. Candidate (2022.08-) in Computing and Information Sciences at Rochester Institute of Technology (RIT), advised by Prof. Weijie Zhao. I previously interned at Amazon (Agents & Foundation Models) and ByteDance (Audio & Speech Processing). My research interests span GenAI, Agents, LLMs, AI Security, and Scalable & Trustworthy Machine Learning.

Prior to RIT, I worked as a Research Assistant at the Institute of Computer Vision (ICV), Shenzhen University, supervised by Prof. Linlin Shen (Honorary Professor at the University of Nottingham, UK). I received my B.S. degree in Computer Science and Technology in July 2021. During my undergraduate studies, I led a National Innovation Project funded by the Ministry of Education of China and received several ACM-ICPC Medals.

🔥
I am actively looking for research internship opportunities in Large Language Models, Computer Vision, and Machine Learning for 2026 (Spring/Summer/Fall/Winter). More research experience and papers (under review) can be found in my CV.
News
Dec 2025
I will be joining ByteDance (TikTok) as a Research Intern in San Jose, CA in the spring of 2026, focusing on LLMs and AI Infrastructure. 🎉🎉🎉
Nov 2025
Invited as a reviewer for CVPR 2026.
Oct 2025
I will be attending ICDM 2025 to deliver a tutorial titled "Behavior-Aware Data Valuation for LLMs at Scale" on Nov. 13 in Washington, D.C. I look forward to meeting you there! 🤗🤗🤗 [Slides]
Sep 2025
Invited as a reviewer for ICLR 2026.
Aug 2025
Invited as a reviewer for AAAI 2026.
May 2025
We released VTBench, a comprehensive benchmark evaluating over 20 visual tokenizers. [Paper], [Codebase], [Dataset], [Demo] are available.
May 2025
I will be joining Amazon as an Applied Scientist Intern in Bellevue, WA this summer, focusing on LLM agents. 🎉🎉🎉
Apr 2025
Invited to speak in the tutorial "LLMs and Copyright Risks" at NAACL 2025. [Slides] [Video]
Apr 2025
Invited as a reviewer for NeurIPS 2025.
Apr 2025
One paper was accepted to SIGIR 2025! 🎉🎉🎉
Mar 2025
Invited as a reviewer for ACL ARR 2025.
Mar 2025
Invited as a reviewer for ICML 2025.
Jan 2025
Invited as a reviewer for IEEE Transactions on Dependable and Secure Computing.
Jan 2025
One paper was accepted to NAACL 2025! 🎉🎉🎉
Aug 2024
Invited as a reviewer for ICLR 2025.
Jul 2024
Invited as a reviewer for ACL ARR 2024.
Jul 2024
Invited as a reviewer for KDD 2025.
May 2024
Our paper (Token-wise Influential Training Data Retrieval) was accepted to ACL 2024!
Apr 2024
I will be joining Amazon as an Applied Scientist Intern in Boston this summer, focusing on LLMs and multimodal systems. 🎉🎉🎉
Feb 2024
We released an easy-to-run implementation for fine-tuning LLMs. [GitHub Repo]
Feb 2024
Invited as a reviewer for KDD 2024.
Oct 2023
Invited as a reviewer for NeurIPS 2023 GLFrontiers Workshop.
Aug 2023
Received the KDD’23 Student Travel Award. Thanks to KDD!
May 2023
Our paper (Machine Unlearning in GBDT) was accepted to KDD 2023! [Video] [Poster]
Sep 2022
Our paper (Activation Template Matching Loss) was accepted to FG 2023!
Publications
ArXiv 2025

Agent-Omni: Test-Time Multimodal Reasoning via Model Coordination for Understanding Anything

Huawei Lin, Yunzhi Shi, Tong Geng, Weijie Zhao, Wei Wang, Ravender Pal Singh
ArXiv 2025

VTBench: Evaluating Visual Tokenizers for Autoregressive Image Generation

Huawei Lin, Tong Geng, Zhaozhuo Xu, Weijie Zhao
ArXiv 2025

UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks

Huawei Lin, Yingjie Lao, Tong Geng, Tan Yu, Weijie Zhao
ArXiv 2025

Online Gradient Boosting Decision Tree: In-Place Updates for Efficient Adding/Deleting Data

Huawei Lin, Jun Woo Chung, Yingjie Lao, Weijie Zhao
ArXiv 2025

RapidUn: Influence-Driven Parameter Reweighting for Efficient Large Language Model Unlearning

Guoshenghui Zhao, Huawei Lin, Weijie Zhao
SIGIR 2025

Locality-Sensitive Indexing for Graph-Based Approximate Nearest Neighbor Search

Jun Woo Chung, Huawei Lin, Weijie Zhao
NAACL 2025

ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Party LLM Data Valuation

Yanzhou Pan, Huawei Lin, Yide Ran, Jiamin Chen, Xiaodong Yu, Weijie Zhao, Denghui Zhang, Zhaozhuo Xu
ArXiv 2024

DMin: Scalable Training Data Influence Estimation for Diffusion Models

Huawei Lin, Yingjie Lao, Weijie Zhao
ACL 2024

Token-wise Influential Training Data Retrieval for Large Language Models

Huawei Lin, Jikai Long, Zhaozhuo Xu, Weijie Zhao
KDD 2023

Machine Unlearning in Gradient Boosting Decision Trees

Huawei Lin, Jun Woo Chung, Yingjie Lao, Weijie Zhao
FG 2023

Activation Template Matching Loss for Explainable Face Recognition

Huawei Lin, Haozhe Liu, Qiufu Li, Linlin Shen
Tech Talks
Nov 2025
"Foundation of Data Valuation & Influence Estimation", ICDM, Washington, DC. [Slides]
May 2025
"LLMs and Copyright Risks: An Example of Future Directions", NAACL, Albuquerque, NM. [Slides] [Video]
Apr 2025
"Influence Estimation for Large Language Models and Diffusion Models", RIT, Rochester, NY. [Slides]
Aug 2024
"Token-wise Influential Training Data Retrieval for LLMs", ACL Virtual. [Slides] [Video]
May 2024
"Toward Explainable Large Language Models via Influence Estimation", Boston, MA. [Poster]
Aug 2023
"Machine Unlearning in Gradient Boosting Decision Trees", KDD, Long Beach, CA. [Slides] [Video]
Nov 2022
"Activation Template Matching Loss for Explainable Face Recognition", RIT, Rochester, NY. [Slides] [Video]
Oct 2021
"Toward Explainable Face Recognition", Shenzhen University, Shenzhen, China. [Slides]
Aug 2021
"Trust in Black-Box Models: Interpretability & Explainability for Deep Learning", Shenzhen University, Shenzhen, China. [Slides]
Experience
Amazon
Jun 2025 - Sep 2025
Applied Scientist Intern, Bellevue, WA
Research Topic: Unlocking Multi-modal Reasoning through Agent-Based Coordination
Advisor: Yunzhi Shi
  • Presented a novel omni agent that coordinates existing foundation models to reason over text, images, video, and audio.
  • Designed a flexible agent system that interprets user intent and delegates subtasks.
Amazon
May 2024 - Aug 2024
Applied Scientist Intern, Boston, MA
Research Topic: Auto Prompt: Unsupervised Self-improving Inference for LLMs
Advisors: Rajath Kumar, Raphael Petegrosso
  • Proposed an unsupervised self-improving framework for LLM inference that enhances generation quality.
  • Implemented methods to detect potential hallucinations using certainty scores.
Shenzhen University (Institute of Computer Vision)
Jul 2021 - Jul 2022
Research Assistant Intern, Shenzhen, China
Advisor: Linlin Shen
  • Worked on the interpretability and explainability of deep learning models (biometrics).
  • Proposed Explainable Channel Loss (ECLoss) for explainable face recognition.
ByteDance (AI Lab)
Oct 2020 - Jul 2021
Software Engineer Intern, Beijing, China
  • Proposed the Stream Audio Understanding Chain method for real-time processing.
  • Improved pipeline throughput by 65% using multiprocessing.
Honors
2023
KDD Student Travel Award
2020
Zhou Lian Scholarship (Top 0.005%)
2020
IAPR/IEEE WSB, Student Travel Grant
2019-2020
Technology Innovation Scholarship (x2)
2018-2021
ACM-ICPC Medals (Silver x1, Bronze x5)
Reviewer
2026
AAAI, ICLR, CVPR
2025
NeurIPS, ICML, ICLR, KDD, ACL Rolling
2024
KDD, ACL Rolling
Journals
IEEE TDSC, IEEE Access