Tianyu Yu

M.S. in Computer Technology

Tsinghua University

Biography

Tianyu Yu received his B.Eng. degree from Beihang University in 2021 and is currently an M.S. student at Tsinghua University. His research focuses on natural language processing and multimodal understanding, especially multimodal large language models (MLLMs).

Interests
  • Multimodal Large Language Models
  • Learning from Feedback
  • Information Retrieval
Education
  • M.S. in Computer Technology, 2021 ~ now

    Tsinghua University

  • B.Eng. in Software Engineering, 2017 ~ 2021

    Beihang University

Recent Publications

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
SeqGPT: An Out-of-the-Box Large Language Model for Open Domain Sequence Understanding
Visually Grounded Commonsense Knowledge Acquisition
Cross-Modal Omni Interaction Modeling for Phrase Grounding

Experience

Research Assistant
Oct 2019 – Present, Beijing, China

Project: Mitigating MLLM Hallucination with AI Feedback (2023.12 ~ 2024.06)

Project: Mitigating MLLM Hallucination with Fine-grained Correctional Human Feedback (2023.07 ~ 2023.12)

Project: Reformulating VLMs to construct MLLMs (2023.02 ~ 2023.07)

Project: Extracting Commonsense Knowledge from Multimodal Corpora (2021.04 ~ 2022.12)

Project: Sentence-Level Pretraining for Document-Level Relation Extraction (RE) (2020.07 ~ 2021.03)

Project: Joint Extraction of Evidence and Relation for Document-Level RE (2019.10 ~ 2020.05)

Computer Vision Algorithm Engineer
Sep 2020 – Dec 2020, Beijing, China
  • Designed a re-identification (Re-ID) algorithm that uses sub-graphs to capture local context; the algorithm was subsequently used in downstream real-world applications.
  • Designed a de-noising algorithm to automate data cleaning.
  • Designed a large-scale data generation pipeline to reduce the cost of manual labeling.
 
Research Assistant
Oct 2019 – Jul 2020, Beijing, China

Project: Cross-Modal Omni Interaction Modeling for Phrase Grounding

  • Addressed the problem of phrase grounding accuracy as the primary researcher.
  • Devised a novel model that captures complex spatial and semantic relationships among image regions and phrases through multi-level, multi-modal interaction.
  • The new method improved grounding accuracy by 6.15% on Flickr30K Entities and 21.25% on ReferItGame.