
MGLL: Multi-Granular Language Learning

This repository is the official implementation of the paper “Boosting Medical Visual Understanding From Multi-Granular Language Learning” (ICLR 2026). arXiv, ResearchGate

Abstract

Recent advances in image-text pretraining have significantly enhanced visual understanding by aligning visual and textual representations. Contrastive Language-Image Pretraining (CLIP) has played a pivotal role in multimodal learning. However, its focus on single-label, single-granularity alignment limits its effectiveness in complex domains such as medical imaging, where images often correspond to multiple labels across different levels of granularity. To address this, we propose Multi-Granular Language Learning (MGLL), a contrastive learning framework designed to improve both multi-label and cross-granularity alignment. MGLL leverages structured multi-label supervision, integrates textual descriptions across granularities, and introduces soft-label supervision with point-wise constraints to enhance alignment. MGLL employs smooth Kullback–Leibler (KL) divergence to ensure cross-granularity consistency while maintaining computational efficiency as a plug-and-play module for vision-language models. Pretrained on our constructed large-scale multi-granular datasets and evaluated across multiple datasets, MGLL outperforms other state-of-the-art methods in downstream tasks.
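The snippet below is a minimal, hypothetical PyTorch sketch of the two ideas named in the abstract: point-wise soft-label supervision and a smoothed KL divergence for cross-granularity consistency. The function names, the smoothing scheme (mixing with a uniform distribution), the temperature, and the loss weight are illustrative assumptions and do not reflect the exact implementation in this repository.

import torch
import torch.nn.functional as F

def soft_label_alignment(logits, soft_targets):
    # Point-wise soft-label supervision: binary cross-entropy between
    # image-text similarity logits and soft multi-label targets in [0, 1].
    return F.binary_cross_entropy_with_logits(logits, soft_targets)

def smooth_kl_consistency(coarse_logits, fine_logits, tau=1.0, eps=0.05):
    # Smoothed KL divergence encouraging the coarse-granularity similarity
    # distribution to stay consistent with the fine-granularity one.
    p_fine = F.softmax(fine_logits / tau, dim=-1)
    # Mixing with a uniform distribution keeps the target strictly
    # positive, so the KL term stays finite ("smooth" KL).
    p_fine = (1.0 - eps) * p_fine + eps / p_fine.size(-1)
    log_q_coarse = F.log_softmax(coarse_logits / tau, dim=-1)
    return F.kl_div(log_q_coarse, p_fine, reduction="batchmean")

# Toy usage: similarity logits for 4 images against 8 candidate texts
# at each granularity, with random soft multi-label targets.
coarse = torch.randn(4, 8)
fine = torch.randn(4, 8)
soft_targets = torch.rand(4, 8)
loss = soft_label_alignment(coarse, soft_targets) + 0.5 * smooth_kl_consistency(coarse, fine)
print(loss.item())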

Requirements

Python == 3.11. Install the dependencies from requirements.txt using:

pip install -r requirements.txt

Dataset and Pretrain Model Weights

MIDRC dataset: link

MIMIC-CXR: link

Chest-Xray14: link

MGLL-Fundus dataset: Image, Text

Pretrained model weights on MGLL-Fundus: link

Usage

1. Pre-training

You can set your parameters in ./exps/pretrain.sh and train your own model by running the following command.

bash ./exps/pretrain.sh

2. Downstream

You can set your parameters in ./exps/downstream.sh and train your own model by running the following command.

bash ./exps/downstream.sh
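As a rough illustration of downstream adaptation (not the actual contents of ./exps/downstream.sh), the sketch below fits a linear probe on frozen image features for a multi-label task; the feature dimension, class count, and hyperparameters are placeholder assumptions rather than the settings used in this repository.

import torch
import torch.nn as nn

feature_dim, num_classes = 512, 14            # e.g. 14 findings, as in Chest-Xray14
probe = nn.Linear(feature_dim, num_classes)   # linear head on top of frozen features
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()            # multi-label objective

# Stand-ins for features extracted by the pretrained image encoder
# and their multi-label annotations.
features = torch.randn(32, feature_dim)
labels = torch.randint(0, 2, (32, num_classes)).float()

for step in range(10):
    optimizer.zero_grad()
    loss = criterion(probe(features), labels)
    loss.backward()
    optimizer.step()
print(f"final training loss: {loss.item():.4f}")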

To obtain additional pre-trained models and the in-house datasets in the future, please contact zhanli@uw.edu. We only handle real-name emails, and your email suffix must match your affiliation. The email should contain the following information:

Name/Homepage/Google Scholar: (Tell us who you are.)
Primary Affiliation: (The name of your institution or university, etc.)
Job Title: (E.g., Professor, Associate Professor, Ph.D., etc.)
Affiliation Email: (The password will be sent to this email; we only reply to emails from addresses ending in "edu".)
How to use: (Academic research only; not for commercial use or secondary development.)
