Linli Yao


I am a third-year PhD student at the Language Computing and Machine Learning Group (Lanco), Peking University, supervised by Prof. Xu Sun. Prior to this, I received my Master's and Bachelor's degrees from Renmin University of China in 2023 and 2020, respectively, advised by Prof. Qin Jin at the AI·M3 Lab.

I expect to graduate in 2027 and am actively looking for full-time positions in both academia and industry in 2026.

Research Interests

  • Multimodal Large Language Models (MLLMs): Vision-Language Understanding
  • Efficient Video Understanding: Token Compression, Frame Sampling, Streaming Video
  • Time-aware Video Tasks: Dense Video Captioning, Video Grounding, etc.


Publications (Full List)


    TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
    Linli Yao, Yuancheng Wei, Yaojie Zhang, Lei Li, Xinlong Chen, Feifan Song, Ziyue Wang, Kun Ouyang, et al.
    Preprint, arXiv:2602.08711, 2026.

    [Survey] Towards Efficient Multimodal Large Language Models: A Survey on Token Compression
    Linli Yao*, Long Xing*, Yang Shi*, Sida Li, Yuanxin Liu, Yuhao Dong, Yi-Fan Zhang, Lei Li, Qingxiu Dong, et al. (* indicates equal contribution)
    Preprint, TechRxiv, 2026.

    TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
    Linli Yao*, Yicheng Li*, Yuancheng Wei*, Lei Li, Shuhuai Ren, Yuanxin Liu, et al.
    ACM MM 2025.

    Generative Frame Sampler for Long Video Understanding
    Linli Yao, Haoning Wu, Kun Ouyang, Yuanxing Zhang, Caiming Xiong, Bei Chen, Xu Sun, Junnan Li
    ACL 2025 (Findings).

    DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
    Linli Yao, Lei Li, Shuhuai Ren, Lean Wang, Yuanxin Liu, Xu Sun, Lu Hou
    Preprint, arXiv:2405.20985, 2024.
    [Paper] [Code]

    Temporal Reasoning Transfer from Text to Video
    Lei Li*, Yuanxin Liu*, Linli Yao, Peiyuan Zhang, Chenxin An, Lean Wang, Xu Sun, Lingpeng Kong, Qi Liu
    ICLR 2025.

    Edit As You Wish: Video Caption Editing with Multi-grained User Control
    Linli Yao, Yuanmeng Zhang, Ziheng Wang, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Xu Sun, Qin Jin
    ACM MM 2024.

    TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
    Shuhuai Ren*, Linli Yao*, Shicheng Li, Xu Sun, Lu Hou
    (* indicates equal contribution)
    CVPR 2024.
    [Paper] [Code]

    UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos
    Yuting Mei, Linli Yao, Qin Jin
    ICMR 2024.
    [Paper] [Code]

    LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation?
    Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu Sun
    NAACL 2024.
    [Paper] [Code]

    CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge
    Linli Yao, Weijing Chen, Qin Jin
    The Web Conference (WWW) 2023.
    [Paper] [Code]

    Rethinking Benchmarks for Cross-modal Image-text Retrieval
    Weijing Chen, Linli Yao, Qin Jin
    SIGIR 2023, long paper.
    [Paper] [Code]

    Image Difference Captioning with Pre-training and Contrastive Learning
    Linli Yao, Weiying Wang, Qin Jin
    AAAI 2022.
    [Paper] [Code]

Education

2023.09 - Present  PhD Student, School of Computer Science, Peking University
2020.09 - 2023.06  Master, School of Information, Renmin University of China
2016.09 - 2020.06  Bachelor, School of Information, Renmin University of China

Experience


2024.12 - 2025.12
Research Intern
Kling Team, Kuaishou Technology, advised by Yuanxing Zhang.

2024.08 - 2024.11
Research Intern
Multimodal Group @ 01.AI, advised by Bei Chen and Junnan Li.

2022.10 - 2023.07
Research Intern
Alimama CV&NLP Group @ Alibaba, advised by Tiezheng Ge.

2022.04 - 2022.10
Organizer / Workshop Chair
Person in Context (PIC) Workshop @ ACM MM 2022

The MTVG and MDVC tasks attracted 40 teams worldwide, including academic teams from Tsinghua University, Peking University, and the University of Hong Kong, as well as industry teams from Tencent, JD.com, Xiaomi, and Bilibili.

2020.04 - 2020.07
Organizer
YouMakeup Video Challenge @ CVPR LVVU Workshop 2020


Awards

2022  National Scholarship, Ministry of Education of China
2023 & 2020  Outstanding Graduate, Renmin University of China
2022 & 2021  1st Class Grade Scholarship, Renmin University of China
2021 & 2018  Merit Student, Renmin University of China
2019  1st Prize of China Undergraduate Mathematical Contest in Modeling (Beijing), Beijing
2018  Meritorious Winner of the Mathematical Contest in Modeling, U.S.

Academic Service

  • Reviewer: AAAI 2023/2024, CVPR 2024/2025, NeurIPS 2024, ACM MM 2024/2025, IEEE Transactions on Image Processing.
  • Teaching assistant: Spoken Language Processing (RUC, 2020), Multimedia Application Technology (RUC, 2020), Academic Criterion and Writing (RUC, 2022), Human Language and Artificial Intelligence (PKU, 2024).