Articles
Leveraging deep learning applied to chest radiograph images to identify individuals at high risk of chronic obstructive pulmonary disease: a retrospective model validation study
Saman Doroodgar Jorshery, Jay Chandra, Anika S Walia, et al
Lancet Digital Health 2025; 7: 100903
https://doi.org/10.1016/j.landig.2025.100903
Summary
Background
Chronic obstructive pulmonary disease (COPD) is a leading cause of mortality, yet early detection remains challenging. This study assessed whether deep learning applied to routine outpatient chest radiographs (CXRs) can identify individuals at high risk of incident COPD.
Methods
Using cancer screening trial data, we previously developed a convolutional neural network (CXR-Lung-Risk) to predict lung-related mortality from a CXR image. In this retrospective model validation study, we externally validated whether CXR-Lung-Risk was associated with incident COPD from routine CXRs. We identified outpatients without lung cancer, COPD, or emphysema who had a CXR taken from Jan 1, 2013, to Dec 31, 2014, at Massachusetts General or Brigham and Women’s Hospitals in Boston, MA, USA. The primary outcome was 6-year incident COPD. Discrimination was assessed using area under the receiver operating characteristic curve (AUC) compared with the TargetCOPD clinical risk score. All analyses were stratified by smoking status. A secondary analysis was conducted in the Project Baseline Health Study (PBHS) to test associations of CXR-Lung-Risk with pulmonary function and plasma protein abundance. The PBHS is registered with ClinicalTrials.gov, NCT03154346.
Findings
The primary analysis consisted of data from 12 550 ever-smokers (mean age 62·4 years [SD 6·8], 6135 [48·9%] male, 6415 [51·1%] female) and 15 298 never-smokers (mean age 63·0 years [8·1], 6550 [42·8%] male, 8748 [57·2%] female). 1562 (12·4%) of 12 550 ever-smokers and 580 (3·8%) of 15 298 never-smokers developed COPD within 6 years. CXR-Lung-Risk had additive predictive value beyond the TargetCOPD score for 6-year incident COPD in both ever-smokers (CXR-Lung-Risk + TargetCOPD AUC 0·73 [95% CI 0·72–0·74] vs TargetCOPD alone AUC 0·66 [0·65–0·68], p<0·0001) and never-smokers (CXR-Lung-Risk + TargetCOPD AUC 0·70 [0·67–0·72] vs TargetCOPD alone AUC 0·60 [0·57–0·62], p<0·0001). In secondary analyses of 2097 individuals in the PBHS, CXR-Lung-Risk was associated with worse pulmonary function and with abundance of SCGB3A2 (secretoglobin family 3A member 2) and LYZ (lysozyme), proteins involved in pulmonary physiology.



Table 1. Cohort characteristics for Mass General Brigham patients
|  | Ever-smokers (n=12 550) | Never-smokers (n=15 298) |
|---|---|---|
| Age, years | 62·4 (6·8) | 63·0 (8·1) |
| Sex | | |
| Female | 6415/12 550 (51·1%) | 8748/15 298 (57·2%) |
| Male | 6135/12 550 (48·9%) | 6550/15 298 (42·8%) |
| Race | | |
| Asian | 189/11 569 (1·6%) | 707/13 686 (5·2%) |
| Black | 874/11 569 (7·6%) | 1174/13 686 (8·6%) |
| Other | 19/11 569 (0·2%) | 57/13 686 (0·4%) |
| White | 10 487/11 569 (90·6%) | 11 748/13 686 (85·8%) |
| Hispanic ethnicity | 349/12 338 (2·8%) | 547/13 356 (4·0%) |
| Recent dyspnoea | 4519/12 550 (36·0%) | 6503/15 298 (42·5%) |
| Salbutamol use | 3723/12 550 (29·7%) | 3385/15 298 (22·1%) |
| Antibiotic use | 2378/12 550 (18·9%) | 2180/15 298 (14·3%) |
| Current smoker | 2366/12 550 (18·9%) | ·· |
| Smoking pack-years | 15·9 (20·0) | ·· |
| CXR-Lung-Risk, mean | 50·9 (5·7) | 49·3 (5·9) |
| TargetCOPD score | 0·14 (0·1) | 0·06 (0·05) |
| CXR report findings | | |
| Lung opacity | 2631/12 550 (21·0%) | 3024/15 298 (19·8%) |
| Atelectasis | 1374/12 550 (10·9%) | 1565/15 298 (10·2%) |
| Pneumothorax | 279/12 550 (2·2%) | 250/15 298 (1·6%) |
| Pneumonia | 1194/12 550 (9·5%) | 1397/15 298 (9·1%) |
| Oedema | 484/12 550 (3·9%) | 481/15 298 (3·1%) |
| Consolidation | 310/12 550 (2·5%) | 380/15 298 (2·5%) |
| Lung lesion | 1357/12 550 (10·8%) | 1830/15 298 (12·0%) |
| Prevalent asthma | 1338/12 550 (10·7%) | 2465/15 298 (16·1%) |
| 6-year COPD rate | 1562/12 550 (12·4%) | 580/15 298 (3·8%) |
| 6-year all-cause mortality | 944/12 219 (7·7%) | 1978/15 298 (12·9%) |
Data are n/N (%) or mean (SD). COPD=chronic obstructive pulmonary disease. CXR=chest radiograph.
Table 2. Discrimination for 6-year, 3-year, and 1-year incident COPD by model (baseline, TargetCOPD, and CXR-Lung-Risk) in the full cohort, ever-smokers, and never-smokers
| Model | Full cohort (N=27 848): AUC (95% CI) | Ever-smokers (n=12 550): AUC (95% CI) | Ever-smokers: p value∗ | Never-smokers (n=15 298): AUC (95% CI) | Never-smokers: p value∗ |
|---|---|---|---|---|---|
| 6-year incident COPD | | | | | |
| Age, sex | 0·56 (0·54–0·58) | 0·53 (0·52–0·55) | <0·0001 | 0·60 (0·57–0·62) | 0·91 |
| Age, sex, smoking status | 0·64 (0·63–0·66) | 0·64 (0·63–0·66) | 0·01 | NA | NA |
| TargetCOPD | 0·66 (0·64–0·68) | 0·66 (0·65–0·68) | Ref | 0·60 (0·57–0·62) | Ref |
| CXR-Lung-Risk | 0·68 (0·66–0·70) | 0·67 (0·66–0·69) | 0·35 | 0·66 (0·64–0·68) | 0·0004 |
| CXR-Lung-Risk + TargetCOPD | 0·74 (0·72–0·76) | 0·73 (0·72–0·74) | <0·0001 | 0·70 (0·67–0·72) | <0·0001 |
| 3-year incident COPD | | | | | |
| Age, sex | 0·55 (0·54–0·57) | 0·54 (0·52–0·56) | <0·0001 | 0·59 (0·56–0·62) | 0·66 |
| Age, sex, smoking status | 0·64 (0·63–0·66) | 0·65 (0·63–0·67) | 0·003 | NA | NA |
| TargetCOPD | 0·69 (0·67–0·70) | 0·68 (0·66–0·70) | Ref | 0·58 (0·55–0·61) | Ref |
| CXR-Lung-Risk | 0·68 (0·67–0·69) | 0·67 (0·66–0·69) | 0·57 | 0·65 (0·63–0·68) | 0·0007 |
| CXR-Lung-Risk + TargetCOPD | 0·75 (0·74–0·76) | 0·74 (0·73–0·76) | <0·0001 | 0·69 (0·66–0·71) | <0·0001 |
| 1-year incident COPD | | | | | |
| Age, sex | 0·55 (0·54–0·56) | 0·53 (0·50–0·56) | <0·0001 | 0·60 (0·57–0·64) | 0·17 |
| Age, sex, smoking status | 0·64 (0·63–0·66) | 0·63 (0·61–0·66) | 0·004 | NA | NA |
| TargetCOPD | 0·70 (0·69–0·71) | 0·68 (0·65–0·70) | Ref | 0·56 (0·52–0·60) | Ref |
| CXR-Lung-Risk | 0·68 (0·67–0·70) | 0·67 (0·66–0·69) | 0·62 | 0·67 (0·64–0·70) | <0·0001 |
| CXR-Lung-Risk + TargetCOPD | 0·75 (0·74–0·77) | 0·73 (0·72–0·74) | <0·0001 | 0·69 (0·66–0·73) | <0·0001 |
AUC=area under the receiver operating characteristic curve. COPD=chronic obstructive pulmonary disease. CXR=chest radiograph. NA=not applicable. ∗p value for difference from TargetCOPD.
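As a rough illustration of how the AUC values in Table 2 can be read, the sketch below computes an AUC as the probability that a randomly chosen individual who develops COPD receives a higher score than one who does not, together with a percentile bootstrap interval. The function names, the toy scores, and the choice of a percentile bootstrap are assumptions made for this example; the study's own code and its confidence interval method are not described here.

```python
import random


def auc(scores, labels):
    """AUC via pairwise comparison of cases (label 1) and non-cases (label 0).

    Ties between a case and a non-case count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an illustrative choice)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc([scores[i] for i in idx], ys))
    if not stats:
        raise ValueError("no valid bootstrap resamples")
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


# Hypothetical toy data, not taken from the study.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = auc(toy_scores, toy_labels)
toy_lo, toy_hi = bootstrap_ci(toy_scores, toy_labels)
print(f"AUC {toy_auc:.2f} (bootstrap interval {toy_lo:.2f} to {toy_hi:.2f})")
```

The pairwise loop is quadratic in cohort size, so for a cohort the size of the one reported here a rank-based implementation from an established statistics library would be the more practical choice.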
Table 3. Rates of 6-year COPD in ever-smokers and never-smokers by CXR-Lung-Risk and TargetCOPD binary high-risk groups
|  | High risk (TargetCOPD >7·5%): COPD events, n/N | High risk (TargetCOPD >7·5%): COPD event rate, % (95% CI) | Low risk (TargetCOPD ≤7·5%): COPD events, n/N | Low risk (TargetCOPD ≤7·5%): COPD event rate, % (95% CI) | Total: COPD events, n/N | Total: COPD event rate, % (95% CI) |
|---|---|---|---|---|---|---|
| Ever-smokers | | | | | | |
| High CXR-Lung-Risk | 549/1936 | 28·4% (26·4–30·2) | 128/934 | 13·7% (11·3–15·9) | 677/2870 | 23·6% (22·0–25·1) |
| Low CXR-Lung-Risk | 703/5832 | 12·0% (11·3–12·8) | 182/3848 | 4·7% (4·1–5·5) | 885/9680 | 9·1% (8·6–9·7) |
| Total | 1252/7768 | 16·1% (15·3–16·9) | 310/4782 | 6·5% (5·8–7·2) | 1562/12 550 | 12·4% (11·8–13·0) |
| Never-smokers | | | | | | |
| High CXR-Lung-Risk | 64/558 | 11·5% (8·8–14·4) | 136/2017 | 6·7% (5·7–7·9) | 200/2575 | 7·8% (6·8–8·9) |
| Low CXR-Lung-Risk | 165/2990 | 5·5% (4·7–6·4) | 215/9733 | 2·2% (1·9–2·5) | 380/12 723 | 3·0% (2·7–3·3) |
| Total | 229/3548 | 6·4% (5·7–7·3) | 351/11 750 | 3·0% (2·7–3·3) | 580/15 298 | 3·8% (3·5–4·1) |
COPD=chronic obstructive pulmonary disease. CXR=chest radiograph.
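The event rates in Table 3 follow directly from the n/N counts. The sketch below recomputes the ever-smoker rates and the ratio between the cell flagged high risk by both scores and the cell flagged low risk by both. The dictionary keys and function names are hypothetical, and the Wilson interval is an assumption for illustration: the source does not state how its 95% CIs were derived, so the bounds computed here may differ slightly from the published ones.

```python
import math

# COPD events and group sizes for ever-smokers, copied from Table 3.
# Keys are (CXR-Lung-Risk group, TargetCOPD group); the key names are
# hypothetical labels chosen for this example.
EVER_SMOKERS = {
    ("high_cxr", "high_target"): (549, 1936),
    ("high_cxr", "low_target"): (128, 934),
    ("low_cxr", "high_target"): (703, 5832),
    ("low_cxr", "low_target"): (182, 3848),
}


def rate_pct(events, total):
    """Event rate as a percentage."""
    return 100.0 * events / total


def wilson_ci_pct(events, total, z=1.96):
    """Wilson score interval for a proportion, in percent (assumed method)."""
    p = events / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return 100 * (centre - half), 100 * (centre + half)


for group, (events, total) in EVER_SMOKERS.items():
    lo, hi = wilson_ci_pct(events, total)
    print(group, f"{rate_pct(events, total):.1f}% ({lo:.1f} to {hi:.1f})")

# Ratio of the event rate in the doubly high-risk cell to the doubly low-risk cell.
both_high = rate_pct(*EVER_SMOKERS[("high_cxr", "high_target")])
both_low = rate_pct(*EVER_SMOKERS[("low_cxr", "low_target")])
risk_ratio = both_high / both_low
print(f"rate ratio, both high vs both low: {risk_ratio:.1f}")
```

Among ever-smokers, the 6-year COPD rate in the group flagged by both scores (28·4%) is about six times the rate in the group flagged by neither (4·7%), which is simple arithmetic on the table rather than an adjusted effect estimate.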
Interpretation
In this external validation, a deep-learning model applied to routine CXR images identified individuals at high risk of incident COPD, beyond known risk factors. Patients at high risk might benefit from diagnostic spirometry and subsequent preventive care.
Funding
Verily Life Sciences, San Francisco, California.