CXReasonAgent: Evidence-Grounded
Diagnostic Reasoning Agent for Chest X-rays

Hyungyung Lee, Hangyul Yoon, Edward Choi

KAIST AI Graduate School

Contact: Hyungyung Lee (ttumyche@kaist.ac.kr)

Overview

Chest X-ray interpretation is a multi-step diagnostic reasoning process that involves identifying anatomical regions, deriving measurements or spatial observations from the image, and applying diagnostic criteria. For diagnostic assistants to be reliable in clinical practice, their reasoning must therefore be grounded in verifiable diagnostic evidence derived from the image.

However, recent studies show that large vision-language models (LVLMs) often generate plausible but ungrounded responses that are not faithfully supported by diagnostic evidence in the image. In addition, LVLMs typically present reasoning only through textual explanations, making it difficult to verify how conclusions are derived from the image. Moreover, extending LVLMs to support diverse diagnostic tasks often requires costly retraining.

To address these limitations, we introduce CXReasonAgent, a diagnostic agent that integrates a large language model (LLM) with clinically grounded diagnostic tools. Instead of directly generating answers, the agent calls diagnostic tools that extract image-derived diagnostic evidence, including quantitative measurements and spatial observations, along with visual evidence presented on the image. The agent then produces responses grounded in this explicit diagnostic evidence.

To evaluate evidence-grounded diagnostic reasoning, we introduce CXReasonDial, a multi-turn dialogue benchmark containing 1,946 dialogues across 12 diagnostic tasks. The benchmark evaluates whether model responses are correctly grounded in diagnostic evidence across dialogue turns, reflecting the iterative nature of real clinical reasoning.

Key Contributions

  - CXReasonAgent, a diagnostic agent that integrates an LLM with clinically grounded diagnostic tools, producing responses grounded in verifiable, image-derived diagnostic evidence without costly retraining.
  - CXReasonDial, a multi-turn dialogue benchmark of 1,946 dialogues across 12 diagnostic tasks for evaluating evidence-grounded diagnostic reasoning.

How It Works

CXReasonAgent performs evidence-grounded diagnostic reasoning by combining an LLM with clinically grounded diagnostic tools. Given a user query and a chest X-ray, the agent identifies the requested diagnostic task, calls the appropriate tool to obtain image-derived evidence, and generates a response grounded in the returned evidence. This design supports reliable, verifiable, and coherent multi-turn diagnostic interactions.

Figure: CXReasonAgent framework

Step 1. Interpret the Query and Plan Tool Use

The agent first interprets the user query to identify the requested diagnostic task and the type of evidence needed. Queries may ask for diagnostic evidence such as measurements or spatial observations, or for visual evidence that presents this information directly on the image. Based on this interpretation, the agent selects the appropriate diagnostic tool.

Step 2. Execute Clinically Grounded Diagnostic Tools

The selected tool analyzes the chest X-ray and returns image-derived evidence. Depending on the query, the tool may provide quantitative measurements, spatial observations, diagnostic criteria, conclusions, or annotated visual evidence shown directly on the image. These tools are implemented with CheXStruct, a deterministic pipeline built from clinically grounded criteria defined in collaboration with radiologists.

Step 3. Generate Evidence-Grounded Responses

The agent then generates its response using the evidence returned by the tools, without directly relying on the image itself. This makes the reasoning process more transparent and verifiable, and helps maintain coherent evidence-grounded reasoning across multi-turn interactions.
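The three steps above can be sketched as a minimal tool-calling loop. Everything here is illustrative: the keyword-based `plan` router, the `inspiration_tool` stub, and the mocked rib count are hypothetical stand-ins; the actual agent uses an LLM for planning and runs CheXStruct-based tools on the real image.

```python
# Minimal sketch of the plan -> execute -> respond loop (hypothetical names).
from dataclasses import dataclass

@dataclass
class Evidence:
    measurements: dict   # image-derived quantitative evidence
    criterion: str       # clinically grounded criterion applied
    conclusion: str      # conclusion drawn from the criterion

def inspiration_tool(image) -> Evidence:
    # Stand-in for a deterministic tool that would count posterior ribs
    # visible above the diaphragm; the count is mocked here.
    ribs = 10
    return Evidence(
        measurements={"posterior_ribs_visible": ribs},
        criterion=">= 9 posterior ribs visible suggests adequate inspiration",
        conclusion="adequate" if ribs >= 9 else "inadequate",
    )

TOOLS = {"inspiration_adequacy": inspiration_tool}  # task -> diagnostic tool

def plan(query: str) -> str:
    # Step 1 (stub): identify the requested diagnostic task from the query.
    return "inspiration_adequacy" if "inspiration" in query.lower() else "unknown"

def answer(query: str, image=None) -> str:
    task = plan(query)                          # Step 1: interpret and plan
    if task not in TOOLS:
        return "No diagnostic tool available for this query."
    ev = TOOLS[task](image)                     # Step 2: execute the tool
    # Step 3: respond from the returned evidence, not the raw image.
    return (f"Inspiration is {ev.conclusion}: "
            f"{ev.measurements['posterior_ribs_visible']} posterior ribs visible "
            f"({ev.criterion}).")

print(answer("Is the inspiration adequate on this chest X-ray?"))
```

Because the response is assembled only from the `Evidence` object, every number and criterion it cites can be traced back to a tool call.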

Dialogue Examples

Figure: Dialogue examples comparing CXReasonAgent with conventional LVLMs

The examples illustrate two diagnostic scenarios: assessing inspiration adequacy and evaluating cardiomegaly using the cardiothoracic ratio (CTR). CXReasonAgent grounds its responses in image-derived diagnostic evidence and presents visual overlays for verification. In contrast, conventional LVLMs either generate unsupported estimates or cannot provide visual evidence for verification.
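As a concrete illustration of the CTR criterion above, the ratio can be computed from the widths of the heart and thorax. This is only a sketch: the `(x_min, y_min, x_max, y_max)` box format and the mock coordinates are assumptions, not CheXStruct's actual measurement procedure.

```python
# Hedged sketch: cardiothoracic ratio (CTR) from mock bounding boxes.
def width(box):
    x_min, _, x_max, _ = box
    return x_max - x_min

def cardiothoracic_ratio(heart_box, thorax_box):
    # CTR = maximal cardiac width / maximal thoracic width
    return width(heart_box) / width(thorax_box)

heart = (150, 200, 310, 330)    # hypothetical detections (pixels)
thorax = (60, 120, 420, 400)
ctr = cardiothoracic_ratio(heart, thorax)
# A CTR above 0.5 on a PA radiograph is the conventional cardiomegaly cutoff.
print(f"CTR = {ctr:.2f} -> {'cardiomegaly' if ctr > 0.5 else 'within normal limits'}")
# prints: CTR = 0.44 -> within normal limits
```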

Demo

We provide an interactive demo of CXReasonAgent that allows users to explore evidence-grounded diagnostic reasoning through multi-turn interactions with chest X-rays.

How to Access the Demo

When you click the demo link, a page will appear with a “Visit Site” button. Click “Visit Site” to open the demo interface.

How to Use the Demo

Once you enter the demo interface:

  1. Select one of the provided sample chest X-rays, or upload your own image

After selecting an image:

  1. Type your question in the chat box
  2. Press Enter to start the conversation

Note: The first response for a newly uploaded image may take a few seconds while the image is being processed.

Example Questions

You may try questions such as:

General diagnostic questions

  - "Is the inspiration adequate on this X-ray?"
  - "Does this patient show cardiomegaly?"

Diagnostic evidence questions

  - "What is the cardiothoracic ratio?"

Visual evidence requests

  - "Show the cardiothoracic ratio measurement on the image."

Supported Image Formats

The demo currently supports the following image formats:

.jpg / .jpeg / .png

Usage Limits

Citation

@article{lee2026cxreasonagent,
  title={CXReasonAgent: Evidence-Grounded Diagnostic Reasoning Agent for Chest X-rays},
  author={Lee, Hyungyung and Yoon, Hangyul and Choi, Edward},
  journal={arXiv preprint arXiv:2602.23276},
  year={2026}
}