Zongyu (Johnson) Lin「林宗裕」
I am a first-year CS Ph.D. student at UCLA, co-advised by Prof. Yizhou Sun and Prof. Kai-Wei Chang. Before coming to UCLA, I spent a year at Moonshot.AI as a full-time research engineer (one of the earliest core members), working on LLMs and video generation. I was one of the major contributors to training large language models with extremely long context, achieving state-of-the-art performance on many long-context tasks compared with GPT-4 and Claude 2 at that time, and I also participated in a visual generation project. I completed my bachelor's degree in Electronic Engineering at Tsinghua University. I have been fortunate to work with Prof. Zhilin Yang, Prof. Yong Li, Prof. Hanan Samet, and Prof. Cyrus Shahabi.
My research interests lie broadly in natural language processing and general machine learning. I have worked on self-training, instruction finetuning, and zero-shot task generalization of LLMs. Most recently, I am interested in (1) studying the self-evolution paradigm of large language models and (2) exploring scalable architectures and recipes for multi-modal generation. Feel free to contact me for a chat or discussion if you are also interested in these topics.
Email: lzyxx17 [at] gmail.com
Twitter  
Google Scholar  
Linkedin
|
|
News
2024-06 Our new paper on contradiction retrieval, SPARSECL: Sparse Contrastive Learning for Contradiction Retrieval, is now available as a preprint.
2024-06 VideoPhy: Evaluating Physical Commonsense in Video Generation was accepted to DMLR@ICML 2024; please check our preprint.
|
Experience
Research Intern, Apple, 2024
Research Engineer, Moonshot.AI 2023
Quant Researcher Intern, Ubiquant, a top hedge fund in China, 2022
Research Intern, Sensetime, China, 2021
|
Research Topic
My research interests lie broadly in natural language processing and general machine learning. Most recently, I am interested in
(1) exploring scalable architectures and recipes for multi-modal generation;
(2) improving the self-evolution of LLMs / VLMs;
(3) improving the alignment of video generative models and vision-language models with the physical world.
|
Recent Work
|
|
SPARSECL: Sparse Contrastive Learning for Contradiction Retrieval
Haike Xu*,
Zongyu Lin*, Yizhou Sun, Kai-Wei Chang, Piotr Indyk
*Equal Contribution
arXiv, 2024
preprint, website, code
Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query, which is important to many downstream applications like fact checking and data cleaning. To retrieve arguments that contradict a query from large document corpora, existing methods such as similarity search and cross-encoder models exhibit significant limitations. The former struggles to capture the essence of contradiction due to its inherent bias toward similarity, while the latter suffers from computational inefficiency, especially when the corpus is large. To address these challenges, we introduce a novel approach, SPARSECL, that leverages specially trained sentence embeddings designed to preserve subtle, contradictory nuances between sentences. Our method uses a combined metric of cosine similarity and a sparsity function to efficiently identify and retrieve documents that contradict a given query. This approach dramatically speeds up contradiction detection by reducing exhaustive document comparisons to simple vector calculations. We validate our model on the Arguana dataset, a benchmark specifically geared towards contradiction retrieval, as well as on synthetic contradictions generated from the MSMARCO and HotpotQA datasets using GPT-4. Our experiments demonstrate the efficacy of our approach not only in contradiction retrieval, with more than 30% accuracy improvements on MSMARCO and HotpotQA across different model architectures, but also in applications such as cleaning corrupted corpora to restore high-quality QA retrieval. This paper outlines a promising direction for improving the accuracy and efficiency of contradiction retrieval in large-scale text corpora.
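For readers curious how the combined metric might look in practice, here is a minimal sketch of the scoring idea, not the official SPARSECL code: candidates are ranked by cosine similarity plus a sparsity function of the embedding difference. The Hoyer sparsity measure and the weight alpha are illustrative assumptions, and the encoder is assumed to be a contrastively trained sentence-embedding model.

```python
# Minimal sketch (not the official SPARSECL code): score candidates by
# cosine similarity plus a sparsity function of the embedding difference.
# The Hoyer sparsity measure and the weight `alpha` are illustrative choices.
import numpy as np

def hoyer_sparsity(x: np.ndarray, eps: float = 1e-8) -> float:
    """Hoyer sparsity in [0, 1]: 1 for a one-hot vector, 0 for a uniform one."""
    n = x.size
    ratio = np.linalg.norm(x, 1) / (np.linalg.norm(x, 2) + eps)
    return (np.sqrt(n) - ratio) / (np.sqrt(n) - 1)

def contradiction_scores(query_emb: np.ndarray,
                         doc_embs: np.ndarray,
                         alpha: float = 1.0) -> np.ndarray:
    """Combine cosine similarity with sparsity of the embedding difference."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    cos = d @ q
    sparsity = np.array([hoyer_sparsity(di - q) for di in d])
    return cos + alpha * sparsity  # rank candidates by this combined score

# Usage: embed the query and corpus with a contrastively trained encoder,
# then take the top-k documents by `contradiction_scores`.
```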
|
|
VideoPhy: Evaluating Physical Commonsense In Video Generation
Hritik Bansal*,
Zongyu Lin*,
Jing Zhou,
Tianyi Xie, Zeshun Zong, Michal Yarom, Yonatan Bitton, Chenfanfu Jiang, Yizhou Sun, Kai-Wei Chang, Aditya Grover
*Equal Contribution
arXiv, 2024; accepted to DMLR@ICML 2024
preprint, website, code
Recent advances in internet-scale video data pretraining have led to the development of text-to-video generative models that can create high-quality videos across a broad range of visual concepts and styles. Due to their ability to synthesize realistic motions and render complex objects, these generative models have the potential to become general-purpose simulators of the physical world. However, it is unclear how far we are from this goal with the existing text-to-video generative models. To this end, we present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities (e.g., marbles will roll down when placed on a slanted surface). Specifically, we curate a list of 688 captions that involve interactions between various material types in the physical world (e.g., solid-solid, solid-fluid, fluid-fluid). We then generate videos conditioned on these captions from diverse state-of-the-art text-to-video generative models, including open models (e.g., VideoCrafter2) and closed models (e.g., Lumiere from Google, Pika). Further, our human evaluation reveals that the existing models severely lack the ability to generate videos adhering to the given text prompts, while also lacking physical commonsense. Specifically, the best-performing model, Pika, generates videos that adhere to the caption and physical laws for only 19.7% of the instances. VideoPhy thus highlights that video generative models are far from accurately simulating the physical world. Finally, we also supplement the dataset with an auto-evaluator, VideoCon-Physics, to assess semantic adherence and physical commonsense at scale.
|
|
A Universal Discriminator for Zero-Shot Generalization
Haike Xu,
Zongyu Lin,
Jing Zhou,
Yanan Zheng, Zhilin Yang
ACL Long Paper, 2023
Generative modeling has been the dominant approach for large-scale pretraining and zero-shot generalization. In this work, we challenge this convention by showing that discriminative approaches perform substantially better than generative ones on a large number of NLP tasks. Technically, we train a single discriminator to predict whether a text sample comes from the true data distribution, similar to GANs. Since many NLP tasks can be formulated as selecting from a few options, we use this discriminator to predict the option with the highest probability. This simple formulation achieves state-of-the-art zero-shot results on the T0 benchmark, outperforming T0 by 16.0%, 7.8%, and 11.5% respectively on different scales. Meanwhile, our approach requires minimal prompting effort, which largely improves robustness and is essential for real-world applications.
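A rough sketch of the selection procedure described above, assuming an off-the-shelf encoder with a single-logit classification head; the checkpoint name is a placeholder, not the model used in the paper.

```python
# Minimal sketch (illustrative, not the paper's code): score each
# "prompt + option" with a binary discriminator that rates how likely the
# text comes from the true data distribution, then pick the top option.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")  # placeholder encoder
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=1)

def pick_option(prompt: str, options: list[str]) -> str:
    """Zero-shot selection: return the option the discriminator rates most 'real'."""
    texts = [f"{prompt} {opt}" for opt in options]
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = model(**batch).logits.squeeze(-1)  # higher = more likely true data
    return options[int(scores.argmax())]
```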
|
|
Not All Tasks Are Born Equal: Understanding Zero-Shot Generalization
Jing Zhou,
Zongyu Lin,
Yanan Zheng, Zhilin Yang
ICLR Spotlight, 2023
Recent work has achieved remarkable zero-shot performance with multi-task prompted pretraining, but little is understood about what drives it. For the first time, we show that training on a small number of key tasks beats using all the training tasks, while removing these key tasks substantially hurts performance. We also find that these key tasks are mostly question answering (QA) tasks. We design a shuffle experiment to further show that training on these QA tasks leads to better cross-task generalization in multi-task learning under various training/test task splits. Combined, these novel findings deepen our understanding of zero-shot generalization: training on certain tasks such as QA encodes general knowledge transferable to a wide range of tasks, which explains the improved zero-shot performance in recent progress. In addition, to automate this procedure, we devise a method to identify and upsample key training tasks without observing the test tasks, based on cross-validation. Empirically, our approach achieves improved results across various model scales and tasks.
|
|
Learning to Detect Noisy Labels Using Model-Based Features
Zhihao Wang*,
Zongyu Lin*,
Peiqi Liu, Guidong Zheng, Junjie Wen, Xianxin Chen, Yujun Chen, Zhilin Yang
(* First Co-Authors)
Findings of EMNLP, 2023
Label noise is ubiquitous in various machine learning scenarios such as self-labeling with model predictions and erroneous data annotation. Many existing approaches are based on heuristics such as sample losses, which might not be flexible enough to achieve optimal solutions. Meta learning based methods address this issue by learning a data selection function, but can be hard to optimize. In light of these pros and cons, we propose SENT (Selection-Enhanced Noisy label Training) that does not rely on meta learning while having the flexibility of being data-driven. SENT transfers the noise distribution to a clean set and trains a model to distinguish noisy labels from clean ones using model-based features. Empirically, on a wide range of tasks including text classification and speech recognition, SENT improves performance over strong baselines under the settings of self-training and label corruption.
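A minimal sketch of the general recipe, with assumed feature choices (per-example loss, confidence, margin) and a simple logistic-regression detector standing in for the paper's model-based selection function:

```python
# Minimal sketch (illustrative, not the official SENT implementation):
# inject synthetic noise into a clean set, extract model-based features
# (e.g., per-example loss and confidence), and train a detector that
# distinguishes noisy from clean labels. Feature choices are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def model_features(probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-example features from a trained model: loss, confidence, margin."""
    p_label = probs[np.arange(len(labels)), labels]
    loss = -np.log(np.clip(p_label, 1e-12, 1.0))
    margin = probs.max(axis=1) - p_label
    return np.stack([loss, p_label, margin], axis=1)

def fit_noise_detector(clean_probs, clean_labels, noisy_labels):
    """Train on a clean set whose labels were partially corrupted on purpose."""
    X = model_features(clean_probs, noisy_labels)
    y = (noisy_labels != clean_labels).astype(int)  # 1 = corrupted label
    return LogisticRegression().fit(X, y)

# At filtering time: compute the same features on the target (noisy) data
# and drop or down-weight examples the detector flags as likely mislabeled.
```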
|
Publications
|
|
Learning to Detect Noisy Labels Using Model-Based Features
Zongyu Lin*,
Zhihao Wang*,
Peiqi Liu, Guidong Zheng, Junjie Wen, Xianxin Chen, Yujun Chen, Zhilin Yang
(* First Co-Authors)
Findings of EMNLP, 2022 (To Appear)
project page /
arXiv /
Label noise is ubiquitous in various machine learning scenarios such as self-labeling with model predictions and erroneous data annotation. Many existing approaches are based on heuristics such as sample losses, which might not be flexible enough to achieve optimal solutions. Meta learning based methods address this issue by learning a data selection function, but can be hard to optimize. In light of these pros and cons, we propose SENT (Selection-Enhanced Noisy label Training) that does not rely on meta learning while having the flexibility of being data-driven. SENT transfers the noise distribution to a clean set and trains a model to distinguish noisy labels from clean ones using model-based features. Empirically, on a wide range of tasks including text classification and speech recognition, SENT improves performance over strong baselines under the settings of self-training and label corruption.
|
|
HAGEN: Homophily-Aware Graph Convolutional Recurrent Network for Crime Forecasting
Zongyu Lin*,
Chenyu Wang*,
Guozhen Zhang,
Xiaochen Yang,
Jiao Sun, Mingxuan Yue, Cyrus Shahabi
(* First Co-Authors)
Proceedings of the AAAI Conference on Artificial Intelligence, 2022
paper
We propose an end-to-end graph convolutional recurrent network called HAGEN with several novel designs for crime prediction. Specifically, our framework jointly captures the crime correlation between regions and the temporal crime dynamics by combining an adaptive region graph learning module with the Diffusion Convolution Gated Recurrent Unit (DCGRU). Based on the homophily assumption of GNNs (i.e., graph convolution works better where neighboring nodes share the same label), we propose a homophily-aware constraint to regularize the optimization of the region graph so that neighboring region nodes on the learned graph share similar crime patterns.
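As a rough illustration of the homophily-aware constraint, here is a sketch of one plausible form of the penalty; the exact loss used in HAGEN may differ.

```python
# Minimal sketch (illustrative, not the HAGEN code): a homophily-aware
# penalty that encourages regions connected in the learned graph to have
# similar crime patterns. The specific loss form here is an assumption.
import torch

def homophily_loss(adj: torch.Tensor, region_feats: torch.Tensor) -> torch.Tensor:
    """adj: learned (N, N) region graph; region_feats: (N, D) crime patterns."""
    diff = region_feats.unsqueeze(0) - region_feats.unsqueeze(1)  # (N, N, D)
    pairwise = (diff ** 2).sum(-1)                                # squared distances
    return (adj * pairwise).sum() / adj.sum().clamp_min(1e-8)

# Added to the forecasting loss, this term regularizes the adaptive graph so
# that high-weight edges connect regions with similar crime dynamics.
```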
|
|
Vehicle Trajectory Recovery on Road Network Based on Traffic Camera Video Data
Zongyu Lin,
Guozhen Zhang,
Zhiqun He,
Jie Feng,
Wei Wu,
Yong Li
Proceedings of the 29th International Conference on Advances in Geographic Information Systems, 2021
paper
We propose a general system to recover vehicle trajectories at the level of the road intersection, where a novel iterative framework is developed to combine both vehicle clustering and trajectory recovery tasks.
|
|
HealthWalks: Sensing Fine-grained Individual Health Condition via Mobility Data
Zongyu Lin,
Shiqing Lyu,
Hancheng Cao,
Yuqiong Wei,
Pan Hui,
Hanan Samet,
Yong Li
In ACM International Joint Conference on Pervasive and Ubiquitous Computing (UBICOMP), 2020
paper
We propose a DFA-based model which can generate interpretable features automatically from raw mobility data for fine-grained health sensing.
|
|
SUME: Semantic-enhanced Urban Mobility Network Embedding for User Demographic Inference
Fengli Xu*,
Zongyu Lin*,
Tong Xia,
Diansheng Guo,
Yong Li
(* Equal Contributions)
In ACM International Joint Conference on Pervasive and Ubiquitous Computing (UBICOMP), 2020
paper
We propose a semantic-enhanced urban mobility embedding model for user profiling, and reveal meaningful patterns in all spatial, temporal and urban structure domains.
|
|
CrimeForecaster: Crime Prediction by Exploiting the Neighborhoods’ Spatiotemporal Dependencies
Jiao Sun,
Mingxuan Yue,
Zongyu Lin,
Xiaochen Yang,
Gabe Kahn,
Luciano Nocera,
Cyrus Shahabi
The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2020 (To appear)
We introduce a new end-to-end spatiotemporal learning framework dubbed CrimeForecaster that: 1) represents the geographical extents of neighborhoods and their correlations in a graph; 2) uses graph convolution to predict crimes.
|
Selected Awards
Comprehensive Outstanding Scholarship (~10/280), Tsinghua University. 2020
Excellent Technology Innovation Scholarship, Tsinghua University. 2020
First Prize in Software Design Contest, Department of Electronic Engineering, Tsinghua University. 2018
|
Hobbies
Sports! I really enjoy playing ball games like football and tennis. I am a big fan of Lionel Messi, Rafael Nadal, and Stephen Curry! I also love running, swimming, and hiking.
|
Updated in June 2024. Thanks to Jon Barron for this concise and beautiful template.
|