Research Topics

(Updated Jul 2025)

A full list of publications is available on Google Scholar, DBLP, and arXiv.

Research Summary: My lab studies the principles and practice of machine intelligence, often with an emphasis on generalization and on making machine learning more reliable. Our applied research includes applications to healthcare, biomedical imaging, and cognitive neuroscience.

* indicates equal contribution. ** indicates alphabetic author order. ‡ indicates authors working closely with me.

Efficient and Scalable World Foundation Models

We explore how to design general-purpose algorithms that enable world foundation models to align efficiently across tasks, modalities, and data scales. Our focus is on task-agnostic pretraining, scalable adaptation, and architectures that support transfer and compositionality. We aim to unify learning across heterogeneous domains while minimizing supervision and compute overhead. To achieve this, we develop methods that support dynamic task alignment, modular learning, and cross-domain generalization. These algorithmic advances lay the groundwork for building universal models applicable to science, medicine, and beyond.

Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
Tiansheng Wen*, Yifei Wang*, Zequn Zeng, Zhong Peng, Yudi Su, Xinyang Liu, Bo Chen, Hongwei Liu, Stefanie Jegelka, Chenyu You
ICML 2025 / Paper / Code / Hugging Face Blog / X (Formerly Twitter)
Oral Presentation

Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective
Chenyu You, Weicheng Dai, Yifei Min, Fenglin Liu, David A. Clifton, S. Kevin Zhou, Lawrence Staib, James S. Duncan
NeurIPS 2023 / Paper / Code / News

UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers
Dehai Min, Zhiyang Xu, Guilin Qi, Lifu Huang, Chenyu You
NAACL 2025 / Paper / Code
Oral Presentation

(Robust) Machine Learning for Imperfect Data

Developing machine learning models often requires collecting large amounts of annotated data, which is particularly difficult when labels are scarce. Moreover, large datasets often exhibit long-tailed class distributions or subpopulation shifts, which result in notable imbalance issues. As a result, there is growing interest in training machine learning models jointly across imbalanced subpopulation distributions and limited annotations. We are developing novel algorithmic and computational approaches to ensure the efficiency and robustness of large machine learning models. Our applied research includes applications to healthcare, biomedical imaging, and cognitive neuroimaging.

Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations
Chenyu You*, Yifei Min*, Weicheng Dai*, Jasjeet S Sekhon, Lawrence Staib, James S Duncan
CVPR 2024 / Paper / Code

Mine yOur owN Anatomy: Revisiting Medical Image Segmentation with Extremely Limited Labels
Chenyu You*, Weicheng Dai*, Fenglin Liu, Yifei Min, Xiaoxiao Li, David A. Clifton, Lawrence Staib, James S Duncan
IEEE TPAMI 2024 / Paper / Code

Bootstrapping Semi-supervised Medical Image Segmentation with Anatomical-aware Contrastive Distillation
Chenyu You, Weicheng Dai, Yifei Min, Lawrence Staib, James S. Duncan
IPMI 2023 / Paper / Code

SimCVD: Simple Contrastive Voxel-Wise Representation Distillation for Semi-Supervised Medical Image Segmentation
Chenyu You, Weicheng Dai, Yifei Min, Lawrence Staib, James S. Duncan
IEEE TMI 2022 / Paper

Learning with Theoretical Guarantees

As machine learning methods have become ubiquitous in human decision-making, their reliability and interpretability have become increasingly important. This is particularly true in domains where decisions carry significant consequences. There, interpretable models can uncover crucial but unexpected patterns that complex models often obscure. We are currently studying provably interpretable modeling with theoretical guarantees. We are also exploring structured sparsity and attention in deep neural networks to enable interpretability.

ACTION++: Improving Semi-supervised Medical Image Segmentation with Adaptive Anatomical Contrast
Chenyu You, Weicheng Dai, Yifei Min, Lawrence Staib, Jasjeet S. Sekhon, James S. Duncan
MICCAI 2023 / Paper / Code
Early Accept

Class-Aware Adversarial Transformers for Medical Image Segmentation
Chenyu You, Ruihan Zhao, Fenglin Liu, Siyuan Dong, Sandeep Chinchali, Ufuk Topcu, Lawrence Staib, James S. Duncan
NeurIPS 2022 / Paper / News (Chinese)

Learning with Multi-Modality Data

Multi-modality data is ubiquitous in healthcare and science applications. We are pursuing various techniques for modeling such data, primarily using probabilistic graphical models and other statistical analyses. These tools are mainly used to facilitate biomedical research. We are also developing tools to effectively tackle real-world challenges associated with data heterogeneity. Of particular interest are novel methods that address robustness issues, such as confounding, as well as trustworthy computational approaches, with primary applications in healthcare, biomedical imaging, and cognitive neuroscience.

UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers
Dehai Min, Zhiyang Xu, Guilin Qi, Lifu Huang, Chenyu You
NAACL 2025 / Paper / Code
Oral Presentation

End-to-end Spoken Conversational Question Answering: Task, Dataset and Model
Chenyu You*, Nuo Chen‡*, Fenglin Liu, Shen Ge, Xian Wu, Yuexian Zou
NAACL Findings 2022 / Paper / Code

Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation
Fenglin Liu‡, Chenyu You, Xian Wu, Shen Ge, Sheng Wang, Xu Sun
NeurIPS 2021 / Paper

Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering
Chenyu You*, Nuo Chen‡*, Fenglin Liu, Shen Ge, Xian Wu, Yuexian Zou
EMNLP Findings 2021 / Paper / Code

World Foundation Models for Biomedical Data

The development of medical world foundation models often requires massive and diverse biomedical data. To this end, I have developed various world foundation models for biomedical imaging data and explored novel applications of these models. I have also developed novel biomedical AI agents that enable scalable and accurate predictive modeling, particularly for distribution shift problems.

Uncovering Memorization Effect in the Presence of Spurious Correlations
Chenyu You*, Haocheng Dai*, Yifei Min*, Jasjeet S. Sekhon, Sarang Joshi, James S. Duncan
Nature Communications 2025 / Paper / Code

OTSurv: A Novel Multiple Instance Learning Framework for Survival Prediction with Heterogeneity-aware Optimal Transport
Qin Ren, Yifan Wang, Ruogu Fang, Haibin Ling, Chenyu You
MICCAI 2025 / Paper / Code / Hugging Face Collection

Segment Anything in Medical Images
Jun Ma‡, Yuting He‡, Feifei Li‡, Lin Han‡, Chenyu You, Bo Wang
Nature Communications 2024 / Paper / Code / Highlight Page
Highlight / ESI - Top 1% highly cited papers

Implicit Anatomical Rendering for Medical Image Segmentation with Stochastic Experts
Chenyu You, Weicheng Dai, Yifei Min, Lawrence Staib, James S. Duncan
MICCAI 2023 / Paper / Code
Early Accept