Princeton University
*Equal Contribution
The International Classification of Diseases, 10th Revision (ICD-10) provides a standardized framework for categorizing diseases and related health conditions worldwide. Our ICD-Bench evaluation suite comprises 3,675 high-quality medical reasoning questions generated systematically from knowledge graph paths. Questions are organized by ICD-10 category, with a balanced distribution across five progressive difficulty levels.
Level 1: Basic medical terminology and fundamental concepts
Level 2: Straightforward medical knowledge and basic clinical reasoning
Level 3: Intermediate medical concepts requiring clinical correlation
Level 4: Advanced medical reasoning and complex diagnostic scenarios
Level 5: Expert-level challenges requiring comprehensive medical knowledge
Comprehensive coverage of disease topologies and clinical vignettes
Choose one of 15 ICD-10 categories (e.g., Behavioral and neurodevelopmental disorders, Nutritional and metabolic diseases, Neoplasms)
Automatically balanced difficulty distribution across 5 levels with randomized question selection
Multiple choice questions based on realistic medical scenarios
Detailed explanations with Knowledge Graph paths and structured QwQ-Med-3 reasoning responses
Monitor your score across different difficulty levels and medical specialties
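The balanced, randomized question selection described above could be sketched roughly as follows. This is a minimal illustration and not the actual ICD-Bench implementation: the function name `sample_balanced_quiz`, the `level` and `id` fields, and the toy question pool are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_balanced_quiz(questions, per_level=2, levels=(1, 2, 3, 4, 5), seed=None):
    """Draw `per_level` random questions from each difficulty level.

    `questions` is a list of dicts with a hypothetical "level" key (1-5).
    """
    rng = random.Random(seed)

    # Group the pool by difficulty level.
    by_level = defaultdict(list)
    for q in questions:
        by_level[q["level"]].append(q)

    quiz = []
    for level in levels:
        pool = by_level[level]
        if len(pool) < per_level:
            raise ValueError(f"level {level} has only {len(pool)} question(s)")
        # Sample without replacement so no question repeats.
        quiz.extend(rng.sample(pool, per_level))

    # Shuffle so questions are not presented in level order.
    rng.shuffle(quiz)
    return quiz


# Toy pool: 10 placeholder questions per level (made-up data).
pool = [{"id": f"L{lvl}-{i}", "level": lvl} for lvl in range(1, 6) for i in range(10)]
quiz = sample_balanced_quiz(pool, per_level=2, seed=0)
```

Sampling each level separately guarantees an equal number of questions per level in every quiz, which a single uniform draw over the whole pool would not.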