This project implements a framework for echocardiographic view classification that goes beyond simple labeling by providing clinically grounded reasoning for every decision.
Standard deep learning models can classify cardiac views with high accuracy, but they cannot explain why a specific view was chosen. In clinical settings, a prediction without a justification is of limited use: clinicians need to know whether the model identified specific landmarks, such as the left atrium or the mitral valve, before they can trust the final output.
We leverage a Vision-Language Model (VLM) fine-tuned via Reinforcement Learning to act as a reasoning-driven assistant.
Explainable AI (XAI): Generates step-by-step anatomical justifications for each predicted view (e.g., identifying chamber locations and orientations); an illustrative example follows this list.
GRPO Optimization: Utilizes Group Relative Policy Optimization, which scores each sampled completion relative to the others in its group instead of against a learned value function, keeping training stable and memory-efficient (see the advantage sketch below).
Clinically Informed Rewards: A rule-based reward system checks the model's text against official clinical guidelines, rewarding mentions of landmarks expected in the true view and penalizing mentions of landmarks that belong only to other views (see the reward sketch below).
Structural Prompting: Incorporates "anatomical clues" into the model's input to guide its reasoning path without revealing the ground-truth label (a prompt-building sketch closes this section).
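To make the XAI goal concrete, here is the kind of justification the model is trained to emit. This is an illustrative mock-up, not actual model output:

```text
Predicted view: Apical 4-chamber (A4C)
Reasoning: All four chambers are visible with the apex at the top of the
sector. The mitral valve separates the left atrium from the left ventricle,
and the tricuspid valve inserts slightly closer to the apex, identifying
the right-sided chambers.
```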
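The following is a minimal sketch of the group-relative advantage at the core of GRPO, assuming rewards have already been computed for a group of completions sampled from the same prompt; the function and variable names are illustrative, not the project's API:

```python
import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalize each completion's reward against its own sampling group.

    The group mean acts as the baseline, so no learned value function
    (critic) is needed, unlike PPO.
    """
    return (group_rewards - group_rewards.mean()) / (group_rewards.std() + eps)

# Four candidate explanations sampled for the same echo frame,
# scored by the rule-based reward.
rewards = np.array([1.0, 0.2, 0.8, 0.0])
print(grpo_advantages(rewards))  # above-average completions get positive advantage
```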
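A simplified sketch of such a rule-based reward follows; the landmark tables, weights, and substring matching here are hypothetical stand-ins for the guideline-derived rules used in the project:

```python
# Hypothetical landmark tables; the real system derives them from
# official clinical guidelines.
VIEW_LANDMARKS = {
    "A4C":  {"left ventricle", "right ventricle", "left atrium",
             "right atrium", "mitral valve", "tricuspid valve"},
    "PLAX": {"left ventricle", "left atrium", "mitral valve",
             "aortic valve", "interventricular septum"},
}

def landmark_reward(response: str, true_view: str,
                    bonus: float = 0.25, penalty: float = 0.25) -> float:
    """+bonus per expected landmark the model mentions, -penalty per
    landmark that belongs only to other views."""
    text = response.lower()
    expected = VIEW_LANDMARKS[true_view]
    foreign = set().union(*VIEW_LANDMARKS.values()) - expected
    hits = sum(lm in text for lm in expected)
    misses = sum(lm in text for lm in foreign)
    return bonus * hits - penalty * misses

print(landmark_reward("All four chambers are visible; the mitral valve "
                      "separates the left atrium and left ventricle.", "A4C"))
```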
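Finally, a sketch of how anatomical clues might be folded into the input without leaking the label; `build_prompt` and the clue wording are illustrative, not the project's actual prompt template:

```python
def build_prompt(clues: list[str]) -> str:
    """Seed the reasoning path with anatomical clues while withholding
    the ground-truth view label."""
    clue_lines = "\n".join(f"- {c}" for c in clues)
    return (
        "Examine the echocardiogram and classify the cardiac view.\n"
        "Anatomical clues (what to look for, not the answer):\n"
        f"{clue_lines}\n"
        "Reason step by step about the visible chambers and valves, "
        "then state the view."
    )

print(build_prompt([
    "count how many chambers are visible",
    "note whether the apex sits at the top of the sector",
    "check which valve separates the two left-sided chambers",
]))
```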