[Paper published in Lancet Digital Health]: Development and validation of open-source deep neural networks for comprehensive chest x-ray reading
9 February 2024

ARTICLES | Volume 6, Issue 1, e44–e57, January 2024

Development and validation of open-source deep neural networks for comprehensive chest x-ray reading: a retrospective, multicentre study

Yashin Dicente Cid, Matthew Macpherson, Louise Gervais-Andre, et al.

Lancet Digital Health 2024; 6: e44–e57. Published: December 8, 2023

DOI: https://doi.org/10.1016/S2589-7500(23)00218-2

Summary

Background

Artificial intelligence (AI) systems for automated chest x-ray interpretation hold promise for standardising reporting and reducing delays in health systems with shortages of trained radiologists. Yet, there are few freely accessible AI systems trained on large datasets for practitioners to use with their own data with a view to accelerating clinical deployment of AI systems in radiology. We aimed to contribute an AI system for comprehensive chest x-ray abnormality detection.

Methods

In this retrospective cohort study, we developed open-source neural networks, X-Raydar and X-Raydar-NLP, for classifying common chest x-ray findings from images and their free-text reports. Our networks were developed using data from six UK hospitals from three National Health Service (NHS) Trusts (University Hospitals Coventry and Warwickshire NHS Trust, University Hospitals Birmingham NHS Foundation Trust, and University Hospitals Leicester NHS Trust) collectively contributing 2 513 546 chest x-ray studies taken from a 13-year period (2006–19), which yielded 1 940 508 usable free-text radiological reports written by the contemporary assessing radiologist (collectively referred to as the “historic reporters”) and 1 896 034 frontal images. Chest x-rays were labelled using a taxonomy of 37 findings by a custom-trained natural language processing (NLP) algorithm, X-Raydar-NLP, from the original free-text reports. X-Raydar-NLP was trained on 23 230 manually annotated reports and tested on 4551 reports from all hospitals. 1 694 921 labelled images from the training set and 89 238 from the validation set were then used to train a multi-label image classifier. Our algorithms were evaluated on three retrospective datasets: a set of exams sampled randomly from the full NHS dataset reported during clinical practice and annotated using NLP (n=103 328); a consensus set sampled from all six hospitals annotated by three expert radiologists (two independent annotators for each image and a third consultant to facilitate disagreement resolution) under research conditions (n=1427); and an independent dataset, MIMIC-CXR, consisting of NLP-annotated exams (n=252 374).
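The auto-labelling step described above — mapping each free-text report to a binary vector over the 37-finding taxonomy — can be sketched with a toy keyword rule. The real X-Raydar-NLP is a custom-trained neural classifier; the three-finding taxonomy and keyword lists below are purely illustrative assumptions, and a rule like this would mishandle negation ("no pneumothorax"), which is one reason a trained NLP model is used instead.

```python
# Toy sketch of report auto-labelling over a (much reduced) finding taxonomy.
# Assumption: finding names and keywords are invented for illustration; the
# study's X-Raydar-NLP is a trained neural network over 37 findings.
TAXONOMY = ["pneumothorax", "parenchymal_opacification", "mass_or_nodule"]

KEYWORDS = {
    "pneumothorax": ["pneumothorax"],
    "parenchymal_opacification": ["opacification", "consolidation"],
    "mass_or_nodule": ["mass", "nodule"],
}

def label_report(report: str) -> list[int]:
    """Return a 0/1 label vector aligned with TAXONOMY."""
    text = report.lower()
    return [int(any(kw in text for kw in KEYWORDS[f])) for f in TAXONOMY]

print(label_report("Small right pneumothorax with adjacent opacification."))
# -> [1, 1, 0]
```

Vectors of this kind, produced for ~1.9 million reports, are what supervise the multi-label image classifier.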

Findings

X-Raydar achieved a mean AUC of 0·919 (SD 0·039) on the auto-labelled set, 0·864 (0·102) on the consensus set, and 0·842 (0·074) on the MIMIC-CXR test set, demonstrating performance similar to that of the historic clinical radiologist reporters, as assessed on the consensus set, for multiple clinically important findings, including pneumothorax, parenchymal opacification, and parenchymal mass or nodules. On the consensus set, X-Raydar significantly outperformed the historical reporters' balanced accuracy on 27 of 37 findings, was non-inferior on nine, and inferior on one, yielding an average improvement of 13·3% (SD 13·1) to 0·763 (0·110), including a mean 5·6% (13·2) improvement on critical findings to 0·826 (0·119).
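Balanced accuracy, the metric used above to compare X-Raydar against the historical reporters, is the mean of sensitivity and specificity for each finding, which keeps rare findings from being swamped by the many normal exams. A minimal pure-Python sketch (illustrative only, not the study's evaluation code):

```python
def balanced_accuracy(y_true, y_pred):
    """Balanced accuracy for one binary finding: (sensitivity + specificity) / 2."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2

# One positive missed out of two, all negatives correct:
# sensitivity = 0.5, specificity = 1.0 -> balanced accuracy 0.75
print(balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 0]))  # -> 0.75
```

In the study this is computed per finding and then averaged across the 37-finding taxonomy to give the reported means.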

Interpretation

Our study shows that automated classification of chest x-rays under a comprehensive taxonomy can achieve performance levels similar to those of historical reporters and exhibit robust generalisation to external data. The open-sourced neural networks can serve as foundation models for further research and are freely available to the research community.

Funding

Wellcome Trust.
