Linli Yao

I am a third-year PhD student at the Language Computing and Machine Learning Group (Lanco), Peking University, supervised by Prof. Xu Sun. Prior to this, I received my Master's and Bachelor's degrees from Renmin University of China in 2023 and 2020, respectively, advised by Prof. Qin Jin at the AI·M3 Lab.

I expect to graduate in 2027 and am actively looking for full-time positions in 2026 (both academia and industry).

Research Interests

Multimodal Large Language Models (MLLMs): Vision-Language Understanding
Efficient Video Understanding: Token Compression, Frame Sampling, Streaming Video
Time-aware Video Tasks: Dense Video Captioning, Video Grounding, etc.

Publications (Full List)

TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions

Linli Yao, Yuancheng Wei, Yaojie Zhang, Lei Li, Xinlong Chen, Feifan Song, Ziyue Wang, Kun Ouyang, et al.

ICML 2026.

[Paper] [Project Page] [Code] [Dataset] [Model]

[Survey] Towards Efficient Multimodal Large Language Models: A Survey on Token Compression

Linli Yao*, Long Xing*, Yang Shi*, Sida Li, Yuanxin Liu, Yuhao Dong, Yi-Fan Zhang, Lei Li, Qingxiu Dong, et al. (* indicates equal contribution)

Preprint, TechRxiv, 2026.

[Paper] [GitHub] [Synced (机器之心)]

TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

Linli Yao*, Yicheng Li*, Yuancheng Wei*, Lei Li, Shuhuai Ren, Yuanxin Liu, et al.

ACM MM 2025.

[Paper] [Project Page] [Code]

Generative Frame Sampler for Long Video Understanding

Linli Yao, Haoning Wu, Kun Ouyang, Yuanxing Zhang, Caiming Xiong, Bei Chen, Xu Sun, Junnan Li

ACL 2025 (Findings).

[Paper] [Project Page] [Code]

DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models

Linli Yao, Lei Li, Shuhuai Ren, Lean Wang, Yuanxin Liu, Xu Sun, Lu Hou

Preprint, arXiv:2405.20985, 2024.

[Paper] [Code]

Temporal Reasoning Transfer from Text to Video

Lei Li*, Yuanxin Liu*, Linli Yao, Peiyuan Zhang, Chenxin An, Lean Wang, Xu Sun, Lingpeng Kong, Qi Liu

ICLR 2025.

[Paper] [Project Page] [Code]

Edit As You Wish: Video Caption Editing with Multi-grained User Control

Linli Yao, Yuanmeng Zhang, Ziheng Wang, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Xu Sun, Qin Jin

ACM MM 2024.

[Paper] [Dataset] [Code]

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Shuhuai Ren*, Linli Yao*, Shicheng Li, Xu Sun, Lu Hou

(* indicates equal contribution)

CVPR 2024.

[Paper] [Code]

UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos

Yuting Mei, Linli Yao, Qin Jin

ICMR 2024.

[Paper] [Code]

LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation?

Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu Sun

NAACL 2024.

[Paper] [Code]

CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge

Linli Yao, Weijing Chen, Qin Jin

The Web Conference (WWW) 2023.

[Paper] [Code]

Rethinking Benchmarks for Cross-modal Image-text Retrieval

Weijing Chen, Linli Yao, Qin Jin

SIGIR 2023, long paper.

[Paper] [Code]

Image Difference Captioning with Pre-training and Contrastive Learning

Linli Yao, Weiying Wang, Qin Jin

AAAI 2022.

[Paper] [Code]

Education

2023.09 - Present	PhD Student	School of Computer Science, Peking University
2020.09 - 2023.06	Master	School of Information, Renmin University of China
2016.09 - 2020.06	Bachelor	School of Information, Renmin University of China

Experience

2024.12 - 2025.12

Research Intern

Kling Team, Kuaishou Technology, advised by Yuanxing Zhang.

2024.08 - 2024.11

Research Intern

Multimodal Group @ 01.AI, advised by Bei Chen and Junnan Li.

2022.10 - 2023.07

Research Intern

Alimama CV&NLP Group @ Alibaba, advised by Tiezheng Ge.

2022.04 - 2022.10

Organizer / Workshop Chair

Person in Context (PIC) Workshop @ ACM MM 2022

The MTVG and MDVC tasks attracted 40 teams worldwide, including academic teams from Tsinghua University, Peking University, and the University of Hong Kong, as well as industry teams from Tencent, JD.com, Xiaomi, and Bilibili.

2020.04 - 2020.07

Organizer

YouMakeup Video Challenge @ CVPR LVVU Workshop 2020

Awards

2022	National Scholarship	Ministry of Education of China
2023 & 2020	Outstanding Graduate	Renmin University of China
2022 & 2021	1st Class Grade Scholarship	Renmin University of China
2021 & 2018	Merit Student	Renmin University of China
2019	1st Prize of China Undergraduate Mathematical Contest in Modeling (Beijing)	Beijing
2018	Meritorious Winner of American Mathematical Contest in Modeling	U.S.

Academic Service

Reviewer: AAAI 2023/2024, CVPR 2024/2025, NeurIPS 2024, ACM MM 2024/2025, Transactions on Image Processing.

Teaching assistant: Spoken Language Processing (RUC, 2020), Multimedia Application Technology (RUC, 2020), Academic Criterion and Writing (RUC, 2022), Human Language and Artificial Intelligence (PKU, 2024).