Reply: challenges of applying large language models to image-based interpretation in abdominal radiology
Artificial Intelligence and Informatics - Letter to the Editor
E-PUB
3 November 2025

Diagn Interv Radiol. Published online 3 November 2025.
1. Ege University Faculty of Medicine, İzmir, Türkiye
2. Ege University Faculty of Medicine, Department of Radiology, İzmir, Türkiye
Received Date: 02.10.2025
Accepted Date: 13.10.2025
E-Pub Date: 03.11.2025

Dear Editor,

We thank our colleagues for their valuable comments [1] on our study [2]. Below, we address the points raised in the context of our study’s aim and design choices, and we outline potential improvements for future research.

The primary aim of our work was to provide an objective baseline assessment of a general-purpose large language model (LLM) in the most straightforward and realistic “out-of-the-box,” browser-based scenario. This approach was intentionally chosen to highlight the current limitations of model architectures and training data, particularly the absence of radiology-specific pretraining. Recent reviews [3-5] have highlighted that LLMs continue to face challenges with data scarcity, coarse visual embeddings, and limited explainability, all of which make it difficult to capture subtle signal or texture patterns.

We agree that radiologists base their decisions on volumetric, multiplanar, and multiphase image series. Our study, however, was designed as a standardized and ethically safe browser-based scenario to establish a “minimum requirement” baseline. Future studies will incorporate sequential and volumetric inputs, as well as multiphase evaluation. Of course, this will require LLMs to become technically capable of ingesting larger and more complex inputs.

Patient history was excluded to isolate image-based signal interpretation, allowing us to measure the model’s performance on images alone. In clinical reality, the integration of imaging and history is essential. Yet, as Bulut et al. [6] recently demonstrated, even when clinical findings were provided, overall accuracy remained low. Although their work focused on pneumothorax detection, these results still indicate that the performance of current models remains questionable, even when clinical context is supplied.

Moving forward, we believe improvements should include: (i) the ingestion of sequential/volumetric and multiphase data; (ii) radiology-specific pretraining and/or adapter fine-tuning; (iii) structured prompt libraries and chain-of-thought reasoning; (iv) the integration of clinical metadata; and (v) blinded multicenter studies comparing different levels of radiologist expertise. Industry collaboration will be crucial to achieving these goals.

In conclusion, although many of the limitations highlighted by our colleagues were already acknowledged in our original manuscript, we view these comments as an opportunity to expand the scope of our subsequent studies and to establish a concrete roadmap for future research.

Conflict of interest disclosure

The authors declared no conflicts of interest.

References

1. Letter to the editor: challenges of applying large language models to image-based interpretation in abdominal radiology. Diagn Interv Radiol. Ahead of Print.
2. Elek A, Ekizalioğlu DD, Güler E. Evaluating Microsoft Bing with ChatGPT-4 for the assessment of abdominal computed tomography and magnetic resonance images. Diagn Interv Radiol. 2025;31(3):196-205.
3. Nam Y, Kim DY, Kyung S, et al. Multimodal large language models in medical imaging: current state and future directions. Korean J Radiol. 2025;26(10):900-923.
4. Zhang A, Zhao E, Wang R, Zhang X, Wang J, Chen E. Multimodal large language models for medical image diagnosis: challenges and opportunities. J Biomed Inform. 2025;169:104895.
5. Lanzafame LRM, Gulli C, Mazziotti S, et al. Chatbots in radiology: current applications, limitations and future directions of ChatGPT in medical imaging. Diagnostics (Basel). 2025;15(13):1635.
6. Bulut B, Öz M, Genç M, et al. New frontiers in radiologic interpretation: evaluating the effectiveness of large language models in pneumothorax diagnosis. PLoS One. 2025;20(9):e0331962.