VLM3D Challenge – Task 2: Multi‑Abnormality Classification

Welcome to Task 2 of the Vision‑Language Modeling in 3D Medical Imaging (VLM3D) Challenge. In this task, participants develop algorithms that label 3D chest CT volumes with 18 clinically significant abnormalities.


Contents

  1. Overview
  2. Dataset
  3. Task Objective
  4. Participation Rules
  5. Evaluation & Ranking
  6. Prizes & Publication
  7. Citation
  8. Contact

Overview

Radiologists often screen chest CT scans for multiple co‑occurring pathologies. Automating this step can

  • Accelerate triage in busy emergency or outpatient workflows
  • Standardize reporting across institutions
  • Enable downstream AI tasks such as localization and report generation

Task 2 uses CT‑RATE, currently the largest public chest‑CT dataset with per‑scan abnormality labels, to benchmark multi‑label classification models for 3D data.


Dataset

Split           Patients   CT Volumes   Labels/Scan   Source
Train           20 000     ≈ 47 k       18            Istanbul Medipol University
Validation      1 304      ≈ 3 k        18            Istanbul Medipol University
Internal Test   2 000      2 000        hidden        Istanbul Medipol University
External Test   1 024      1 024        hidden        Boston University Hospital

Each scan is paired with a binary vector indicating the presence/absence of 18 thoracic findings (e.g., pleural effusion, lung nodule). Raw NIfTI volumes are provided together with full DICOM metadata.
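
For orientation, here is a minimal loading sketch using nibabel and pandas. The file name and the label CSV name/columns below are placeholders, not the official CT‑RATE layout; check the dataset release for the actual paths.

import nibabel as nib
import numpy as np
import pandas as pd

# Load one CT volume (placeholder file name; see the CT-RATE release for real paths).
volume = nib.load("train_1_a_1.nii.gz")
array = volume.get_fdata()                       # 3D intensity array
print(array.shape, volume.header.get_zooms())    # voxel grid and spacing in mm

# Load the per-scan abnormality labels (placeholder CSV name and column layout).
labels = pd.read_csv("multi_abnormality_labels.csv")
label_cols = [c for c in labels.columns if c != "VolumeName"]
y = labels[label_cols].to_numpy(dtype=np.int64)  # shape: (num_scans, 18), values 0/1
print(y.shape)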


Task Objective

Given a 3D chest CT volume, predict a binary label for each of the following 18 abnormalities:

  • Medical material, Arterial wall calcification, Cardiomegaly, Pericardial effusion, Coronary artery wall calcification, Hiatal hernia, Lymphadenopathy, Emphysema, Atelectasis, Lung nodule, Lung opacity, Pulmonary fibrotic sequela, Pleural effusion, Mosaic attenuation pattern, Peribronchial thickening, Consolidation, Bronchiectasis, Interlobular septal thickening

A valid submission must output one 18‑element prediction vector per scan, one value per abnormality.
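
To make the required output concrete, the sketch below writes one 18‑element vector per scan to a CSV file. The "VolumeName" column and the CSV layout are illustrative assumptions, not the official submission specification; follow the format published on the challenge platform.

import numpy as np
import pandas as pd

ABNORMALITIES = [
    "Medical material", "Arterial wall calcification", "Cardiomegaly",
    "Pericardial effusion", "Coronary artery wall calcification", "Hiatal hernia",
    "Lymphadenopathy", "Emphysema", "Atelectasis", "Lung nodule", "Lung opacity",
    "Pulmonary fibrotic sequela", "Pleural effusion", "Mosaic attenuation pattern",
    "Peribronchial thickening", "Consolidation", "Bronchiectasis",
    "Interlobular septal thickening",
]

def write_predictions(scan_ids, scores, path="predictions.csv"):
    # scores: array of shape (num_scans, 18), one prediction per abnormality.
    scores = np.asarray(scores)
    assert scores.shape == (len(scan_ids), len(ABNORMALITIES))
    df = pd.DataFrame(scores, columns=ABNORMALITIES)
    df.insert(0, "VolumeName", list(scan_ids))   # illustrative identifier column
    df.to_csv(path, index=False)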


Participation Rules

  • Method type: Fully automatic – no human input at inference.
  • Training data: Use CT‑RATE plus any publicly available data or models.
  • Team limits: Max 1 submission / day; the last valid run counts.
  • Organizer teams: May appear on the leaderboard but cannot win prizes.

Evaluation & Ranking

Classification Metrics

Metric      What it Measures
AUROC       Threshold‑free separability
F1 Score    Balance of precision and recall
CRG Score   Distribution‑aware, clinically weighted performance

Metrics are aggregated per abnormality, then macro‑averaged over all 18 classes.
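
As a rough sketch of that aggregation (not the official evaluation code), per‑class AUROC and F1 can be computed with scikit‑learn and then macro‑averaged; the 0.5 threshold for F1 is an assumption here, and each class is assumed to have both positive and negative cases in the evaluation set.

import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

def macro_auroc_f1(y_true, y_score, threshold=0.5):
    # y_true, y_score: arrays of shape (num_scans, 18).
    aurocs, f1s = [], []
    for k in range(y_true.shape[1]):                      # metric per abnormality
        aurocs.append(roc_auc_score(y_true[:, k], y_score[:, k]))
        f1s.append(f1_score(y_true[:, k], (y_score[:, k] >= threshold).astype(int)))
    return float(np.mean(aurocs)), float(np.mean(f1s))    # macro average over 18 classes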

Final Ranking

A point‑based scheme (VerSe / BraTS style):

  1. For each metric, run a two‑sided permutation test (10 000 samples) between every pair of teams.
  2. Award 1 point for each significant win.
  3. Rank by total points (higher = better). Ties share the same place.

Missing predictions receive the minimum score for that scan.
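
For intuition, here is a minimal sketch of one pairwise comparison, assuming each team is represented by its per‑scan values of a given metric; the organizers' actual test statistic and implementation may differ.

import numpy as np

def paired_permutation_test(scores_a, scores_b, n_perm=10_000, seed=0):
    # scores_a, scores_b: per-scan metric values of two teams on the same test cases.
    rng = np.random.default_rng(seed)
    diff = np.asarray(scores_a) - np.asarray(scores_b)
    observed = abs(diff.mean())
    hits = 0
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=diff.shape)  # randomly swap team labels per case
        if abs((signs * diff).mean()) >= observed:
            hits += 1
    return hits / n_perm                                  # two-sided p-value

# A team earns 1 point on this metric if the difference is significant (e.g., p < 0.05)
# and its mean score is the higher of the two.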


Prizes & Publication

  • Awards – details TBA.
  • Every team with a valid submission will be invited to co‑author the joint challenge paper (MedIA / IEEE TMI).
  • An overview manuscript describing baseline results will appear on arXiv before the test phase closes.

Citation

If you use CT‑RATE or participate in VLM3D, please cite:

@article{hamamci2024developing,
  title   = {Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography},
  author  = {Hamamci, Ibrahim Ethem and Er, Sezgin and Almas, Furkan and others},
  journal = {arXiv preprint arXiv:2403.17834},
  year    = {2024}
}


@inproceedings{hamamci2025crg,
  title     = {CRG Score: A Distribution-Aware Clinical Metric for Radiology Report Generation},
  author    = {Hamamci, Ibrahim Ethem and Er, Sezgin and Shit, Suprosanna and Reynaud, Hadrien and Kainz, Bernhard and Menze, Bjoern},
  booktitle = {Medical Imaging with Deep Learning - Short Papers},
  year      = {2025}
}

Contact

For technical issues, open an issue or post on the challenge forum. For all other inquiries, use the “Help → Email organizers” link on the challenge site.