CXReasonAgent: Evidence-Grounded Diagnostic Reasoning Agent for Chest X-rays
Chest X-ray interpretation is a multi-step diagnostic reasoning process that involves identifying anatomical regions, deriving measurements or spatial observations from the image, and applying diagnostic criteria. For diagnostic assistants to be reliable in clinical practice, their reasoning must therefore be grounded in verifiable diagnostic evidence derived from the image.
However, recent studies show that large vision-language models (LVLMs) often generate plausible but ungrounded responses that are not faithfully supported by diagnostic evidence in the image. In addition, LVLMs typically present reasoning only through textual explanations, making it difficult to verify how conclusions are derived from the image. Moreover, extending LVLMs to support diverse diagnostic tasks often requires costly retraining.
To address these limitations, we introduce CXReasonAgent, a diagnostic agent that integrates a large language model (LLM) with clinically grounded diagnostic tools. Instead of directly generating answers, the agent calls diagnostic tools that extract image-derived diagnostic evidence, including quantitative measurements and spatial observations, along with visual evidence presented on the image. The agent then produces responses grounded in this explicit diagnostic evidence.
To evaluate evidence-grounded diagnostic reasoning, we introduce CXReasonDial, a multi-turn dialogue benchmark containing 1,946 dialogues across 12 diagnostic tasks. The benchmark evaluates whether model responses are correctly grounded in diagnostic evidence across dialogue turns, reflecting the iterative nature of real clinical reasoning.
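One way to picture turn-level grounding evaluation is a check that every value a model states in a response matches the evidence the tool actually returned. The sketch below is illustrative only, assuming a simple numeric-matching criterion; it is not the benchmark's actual scoring procedure.

```python
import re

def extract_numbers(text):
    """Pull numeric values (e.g. a stated CTR of 0.54) from a response."""
    return [float(m) for m in re.findall(r"\d+\.?\d*", text)]

def is_grounded(response, evidence_values, tol=1e-6):
    """Count a response as grounded if every number it states
    matches some value in the tool-returned evidence."""
    stated = extract_numbers(response)
    return all(any(abs(s - e) <= tol for e in evidence_values) for s in stated)

# Turn-level check: the model states a CTR of 0.54; the tool reported 0.54.
print(is_grounded("The cardiothoracic ratio is 0.54, suggesting cardiomegaly.",
                  [0.54]))  # True
print(is_grounded("The CTR is approximately 0.61.", [0.54]))  # False
```

A per-turn check like this can then be aggregated over all turns of a dialogue, reflecting the benchmark's emphasis on grounding across the whole interaction rather than a single answer.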
CXReasonAgent performs evidence-grounded diagnostic reasoning by combining an LLM with clinically grounded diagnostic tools. Given a user query and a chest X-ray, the agent identifies the requested diagnostic task, calls the appropriate tool to obtain image-derived evidence, and generates a response grounded in the returned evidence. This design supports reliable, verifiable, and coherent multi-turn diagnostic interactions.
The agent first interprets the user query to identify the requested diagnostic task and the type of evidence needed. Queries may ask for diagnostic evidence such as measurements or spatial observations, or for visual evidence that presents this information directly on the image. Based on this interpretation, the agent selects the appropriate diagnostic tool.
The selected tool analyzes the chest X-ray and returns image-derived evidence. Depending on the query, the tool may provide quantitative measurements, spatial observations, diagnostic criteria, conclusions, or annotated visual evidence shown directly on the image. These tools are implemented with CheXStruct, a deterministic pipeline built from clinically grounded criteria defined with radiologists.
The agent then generates its response using the evidence returned by the tools, without directly relying on the image itself. This makes the reasoning process more transparent and verifiable, and helps maintain coherent evidence-grounded reasoning across multi-turn interactions.
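The three steps above (interpret the query, call a tool for image-derived evidence, respond from that evidence alone) can be sketched as a minimal loop. The tool registry, function names, and fixed measurements below are illustrative assumptions, not the actual CXReasonAgent or CheXStruct API.

```python
def measure_ctr(image):
    """Stand-in diagnostic tool: the real agent would run the CheXStruct
    pipeline on the image; here we return fixed example evidence."""
    return {"cardiac_width": 14.2, "thoracic_width": 26.3,
            "ctr": round(14.2 / 26.3, 2),
            "criterion": "CTR > 0.50 suggests cardiomegaly"}

TOOLS = {"cardiomegaly": measure_ctr}  # diagnostic task -> tool

def answer(query, image):
    # 1. Interpret the query to identify the diagnostic task
    #    (an LLM in the real agent; keyword matching here for illustration).
    task = next((t for t in TOOLS if t in query.lower()), None)
    if task is None:
        return "No suitable diagnostic tool for this query."
    # 2. Call the selected tool to obtain image-derived evidence.
    evidence = TOOLS[task](image)
    # 3. Generate a response grounded only in the returned evidence,
    #    not in a direct reading of the image.
    verdict = ("suggestive of cardiomegaly" if evidence["ctr"] > 0.50
               else "within normal limits")
    return (f"CTR = {evidence['ctr']} (cardiac {evidence['cardiac_width']} cm / "
            f"thoracic {evidence['thoracic_width']} cm); {verdict} "
            f"({evidence['criterion']}).")

print(answer("Does this patient have cardiomegaly?", image=None))
```

Because the response is assembled only from the tool's returned dictionary, every claim in it can be traced back to a specific piece of evidence, which is what makes the reasoning verifiable.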
The examples illustrate two diagnostic scenarios: assessing inspiration adequacy and evaluating cardiomegaly using the cardiothoracic ratio (CTR). CXReasonAgent grounds its responses in image-derived diagnostic evidence and presents visual overlays for verification. In contrast, conventional LVLMs either generate unsupported estimates or cannot provide visual evidence for verification.
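For the inspiration-adequacy scenario, a commonly taught criterion is that roughly nine or more posterior ribs visible above the diaphragm indicates adequate inspiration. The snippet below applies that rule of thumb as an assumption for illustration; the actual CheXStruct criteria are radiologist-defined and may differ.

```python
def inspiration_adequate(posterior_ribs_visible):
    """Rule-of-thumb criterion (assumed threshold): >= 9 posterior ribs
    visible above the diaphragm indicates adequate inspiration."""
    return posterior_ribs_visible >= 9

print(inspiration_adequate(10))  # True
print(inspiration_adequate(7))   # False
```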
We provide an interactive demo of CXReasonAgent that allows users to explore evidence-grounded diagnostic reasoning through multi-turn interactions with chest X-rays.
When you click the demo link, a page will appear with a “Visit Site” button. Click “Visit Site” to open the demo interface.
Once you enter the demo interface:
After selecting an image:
Note: The first response for a newly uploaded image may take a few seconds while the image is being processed.
You may try questions such as:
- General diagnostic questions
- Diagnostic evidence questions
- Visual evidence requests
The demo currently supports the following image formats:
- .jpg / .jpeg / .png
@article{lee2026cxreasonagent,
  title={CXReasonAgent: Evidence-Grounded Diagnostic Reasoning Agent for Chest X-rays},
  author={Lee, Hyungyung and Yoon, Hangyul and Choi, Edward},
  journal={arXiv preprint arXiv:2602.23276},
  year={2026}
}