Research Topics

(Updated Jun 2026)

Full publications available on Google Scholar, DBLP, arXiv.

Research Summary: My research studies fundamental and applied problems in computer vision and machine learning, with a focus on AI for health. We build intelligent systems that learn from complex data, reason about and interact with the real world, and drive meaningful impact on healthcare. Beyond learning algorithms, we ask: how should AI represent knowledge across imaging, text, and multi-omics data; what principles ensure robustness, trustworthiness, and clinical reliability; and how can algorithmic advances translate into real-world healthcare domains? Our research aims to answer these questions by advancing machine learning methods in close collaboration with biomedical and clinical science.

* indicates equal contribution.   ** indicates alphabetic author order.   ‡ indicates authors working closely with me.

  Efficient and Scalable World Foundation Models

We explore how to design general-purpose algorithms that enable world foundation models to align efficiently across tasks, modalities, and data scales. Our focus is on task-agnostic pretraining, scalable adaptation, and architectures that support transfer and compositionality. We aim to unify learning across heterogeneous domains while minimizing supervision and compute overhead. To achieve this, we develop methods that support dynamic task alignment, modular learning, and cross-domain generalization. These algorithmic advances lay the groundwork for building universal models applicable to science, medicine, and beyond.

Supervise Less, See More: Training-free Nuclear Instance Segmentation with Prototype-Guided Prompting
Wen Zhang*, Qin Ren*, Wenjing Liu, Haibin Ling, Chenyu You
ICML 2026 / Paper / Code / Project Page

CSRv2: Unlocking Ultra-Sparse Embeddings
Lixuan Guo*, Yifei Wang*, Tiansheng Wen*, Yifan Wang, Aosong Feng, Bo Chen, Stefanie Jegelka, Chenyu You
ICLR 2026 / Paper / Code / Project Page / Hugging Face

Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation
Tiansheng Wen*, Yifei Wang*, Zequn Zeng, Zhong Peng, Yudi Su, Xinyang Liu, Bo Chen, Hongwei Liu, Stefanie Jegelka, Chenyu You
ICML 2025 / Paper / Code / Hugging Face Blog / X (Formerly Twitter)
Oral Presentation

Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
Shanlin Sun*, Yifan Wang*, Hanwen Zhang*, Yifeng Xiong, Qin Ren, Ruogu Fang, Xiaohui Xie, Chenyu You
ICCV 2025 / Paper / Code / Project Page

Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective
Chenyu You, Weicheng Dai, Yifei Min, Fenglin Liu, David A. Clifton, S. Kevin Zhou, Lawrence Staib, James S. Duncan
NeurIPS 2023 / Paper / Code / News

  (Robust) Machine Learning for Imperfect Data

Developing machine learning models typically requires substantial annotated data, which is a particular challenge when labels are scarce. Moreover, large datasets often exhibit long-tailed class distributions or subpopulation shifts, which lead to notable imbalance issues. As a result, there is growing interest in training machine learning models jointly across imbalanced subpopulation distributions and limited annotations. We are developing novel algorithmic and computational approaches to ensure the efficiency and robustness of large machine learning models. Our applied research includes applications to healthcare, biomedical imaging, and cognitive neuroimaging.

Uncovering Memorization Effect in the Presence of Spurious Correlations
Chenyu You*, Haocheng Dai*, Yifei Min*, Jasjeet S. Sekhon, Sarang Joshi, James S. Duncan
Nature Communications 2025 / Paper / Code

Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations
Chenyu You*, Yifei Min*, Weicheng Dai*, Jasjeet S. Sekhon, Lawrence Staib, James S. Duncan
CVPR 2024 / Paper / Code

Mine yOur owN Anatomy: Revisiting Medical Image Segmentation with Extremely Limited Labels
Chenyu You*, Weicheng Dai*, Fenglin Liu, Yifei Min, Xiaoxiao Li, David A. Clifton, Lawrence Staib, James S. Duncan
IEEE TPAMI 2024 / Paper / Code
ESI - Top 1% highly cited papers

Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective
Chenyu You, Weicheng Dai, Yifei Min, Fenglin Liu, David A. Clifton, S. Kevin Zhou, Lawrence Staib, James S. Duncan
NeurIPS 2023 / Paper / Code / News

SimCVD: Simple Contrastive Voxel-Wise Representation Distillation for Semi-Supervised Medical Image Segmentation
Chenyu You, Weicheng Dai, Yifei Min, Lawrence Staib, James S. Duncan
IEEE TMI 2022 / Paper
Highlight / ESI - Top 1% highly cited papers

  World Foundation Models for Biomedical Data

The development of medical world foundation models often requires massive and diverse biomedical data. To this end, we have developed various world foundation models for biomedical imaging data and explored novel applications of these models. We have also developed novel biomedical AI agents that enable scalable and accurate predictive modeling, particularly under distribution shift.

Let EEG Models Learn EEG
Yifan Wang, Yijia Ma, Wen Li, Chenyu You
ICML 2026 / Paper / Code / Project Page / Hugging Face Collection

FreeBridge: Variational Schrödinger Bridges for Cellular Transition Dynamics
Xurui Wang, Qin Ren, Jun Ma, Haibin Ling, Chenyu You
MICCAI 2026 / Paper / Project Page / Code
Early Accept 

Imaging Foundation Model for Universal Enhancement of Non-Ideal Measurement CT
Rongjun Ge, Yuxin Liu, Zhan Wu, Shangwen Yang, Chenyu You, Ge Wang, Shuo Li, Yuting He, Yang Chen
Nature Communications 2026 / Paper / Code / Hugging Face Collection

OTSurv: A Novel Multiple Instance Learning Framework for Survival Prediction with Heterogeneity-aware Optimal Transport
Qin Ren, Yifan Wang, Ruogu Fang, Haibin Ling, Chenyu You
MICCAI 2025 / Paper / Code / Hugging Face Collection
MICCAI-NIH Registration Grant Award

Segment Anything in Medical Images
Jun Ma, Yuting He, Feifei Li, Lin Han, Chenyu You, Bo Wang
Nature Communications 2024 / Paper / Code / Highlight Page
Editors' Highlights / ESI - Top 1% highly cited papers

  Machine Learning for Neural Systems and Biological Dynamics

We study machine learning methods for neural signals, brain connectivity, and biological state transitions. Neural and biological data often provide partial observations of high-dimensional systems, and computational models offer a way to infer structure from these measurements. EEG recordings are noisy and subject-specific, connectome data can be missing, and fixed-cell imaging captures endpoint populations rather than individual cell trajectories. Our work develops continuous-time generation, imputation-aware prediction, and geometry-constrained stochastic transport, with brain–computer interface (BCI) applications in data augmentation, subject adaptation, and neural decoding.

Let EEG Models Learn EEG
Yifan Wang, Yijia Ma, Wen Li, Chenyu You
ICML 2026 / Paper / Code / Project Page / Hugging Face Collection

NeuroSonic: Conditional Flow Matching for EEG-to-Speech Reconstruction
Wenhao Gao, Yifan Wang, Yijia Ma, Carl Yang, Wen Li, Chenyu You
MICCAI 2026 / Paper / Project Page / Code

FreeBridge: Variational Schrödinger Bridges for Cellular Transition Dynamics
Xurui Wang, Qin Ren, Jun Ma, Haibin Ling, Chenyu You
MICCAI 2026 / Paper / Project Page / Code
Early Accept 

Rescuing Missing Data in Connectome-Based Predictive Modeling
Qinghao Liang, Rongtao Jiang, Brendan D. Adkinson, Matthew Rosenblatt, Saloni Mehta, Maya L. Foster, Siyuan Dong, Chenyu You, Sahand Negahban, Harrison H. Zhou, Joseph Chang, Dustin Scheinost
Imaging Neuroscience 2024 / Paper

  Learning with Theoretical Guarantees

As machine learning methods have become ubiquitous in human decision-making, their reliability and interpretability have become important. This is particularly crucial in domains where decisions carry significant consequences; there, interpretable models can uncover crucial but unexpected patterns that complex models often obscure. We are currently studying provably interpretable modeling with theoretical guarantees. We are also exploring structured sparsity and attention in deep neural networks to enable interpretability.

Uncovering Memorization Effect in the Presence of Spurious Correlations
Chenyu You*, Haocheng Dai*, Yifei Min*, Jasjeet S. Sekhon, Sarang Joshi, James S. Duncan
Nature Communications 2025 / Paper / Code

Mine yOur owN Anatomy: Revisiting Medical Image Segmentation with Extremely Limited Labels
Chenyu You*, Weicheng Dai*, Fenglin Liu, Yifei Min, Xiaoxiao Li, David A. Clifton, Lawrence Staib, James S. Duncan
IEEE TPAMI 2024 / Paper / Code
ESI - Top 1% highly cited papers

Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective
Chenyu You, Weicheng Dai, Yifei Min, Fenglin Liu, David A. Clifton, S. Kevin Zhou, Lawrence Staib, James S. Duncan
NeurIPS 2023 / Paper / Code / News

ACTION++: Improving Semi-supervised Medical Image Segmentation with Adaptive Anatomical Contrast
Chenyu You, Weicheng Dai, Yifei Min, Lawrence Staib, Jasjeet S. Sekhon, James S. Duncan
MICCAI 2023 / Paper / Code
Early Accept

Class-Aware Adversarial Transformers for Medical Image Segmentation
Chenyu You, Ruihan Zhao, Fenglin Liu, Siyuan Dong, Sandeep Chinchali, Ufuk Topcu, Lawrence Staib, James S. Duncan
NeurIPS 2022 / Paper / News (Chinese)

  Learning with Multi-Modality Data

Multi-modality data is ubiquitous in healthcare and science applications. We are pursuing various techniques for modeling such data, mainly probabilistic graphical models and other statistical analyses, chiefly to facilitate biomedical research. We are also developing tools to tackle real-world challenges associated with data heterogeneity. Of particular interest are novel methods that address robustness issues, such as confounding, as well as trustworthy computational approaches, with primary applications in healthcare, biomedical imaging, and cognitive neuroscience.

Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
Shanlin Sun*, Yifan Wang*, Hanwen Zhang*, Yifeng Xiong, Qin Ren, Ruogu Fang, Xiaohui Xie, Chenyu You
ICCV 2025 / Paper / Code

UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers
Dehai Min, Zhiyang Xu, Guilin Qi, Lifu Huang, Chenyu You
NAACL 2025 / Paper / Code
Oral Presentation

End-to-end Spoken Conversational Question Answering: Task, Dataset and Model
Chenyu You*, Nuo Chen*, Fenglin Liu, Shen Ge, Xian Wu, Yuexian Zou
NAACL Findings 2022 / Paper / Code

Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation
Fenglin Liu, Chenyu You, Xian Wu, Shen Ge, Sheng Wang, Xu Sun
NeurIPS 2021 / Paper

Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering
Chenyu You*, Nuo Chen*, Fenglin Liu, Shen Ge, Xian Wu, Yuexian Zou
EMNLP Findings 2021 / Paper / Code