JOURNAL ARTICLE
Generative Artificial Intelligence–based Surveillance for Avian Influenza Across a Statewide Healthcare System
Katherine E Goodman, Seyed M Shams, Laurence S Magder, et al.
Clinical Infectious Diseases, Volume 81, Issue 5, 15 November 2025, Pages 900–903, https://doi.org/10.1093/cid/ciaf369
Abstract
Among all 2024 emergency department visits for acute respiratory illness or conjunctivitis across a statewide healthcare system (n = 13 494), generative artificial intelligence–based surveillance with adjunctive human review rapidly and cost-effectively identified patients with potential avian influenza exposure. Generative artificial intelligence–based approaches may offer a promising opportunity for expanding public health surveillance in clinical settings.
Keywords: generative AI, LLMs, avian influenza, public health surveillance
Issue Section:
Brief Report > Emerging Infections
Collection: IDSA Journals
As avian influenza A(H5N1) circulates in US poultry, cattle, and other mammals [1], concern is mounting about H5N1's pandemic potential. US Centers for Disease Control and Prevention (CDC) guidance recommends clinicians consider avian influenza in patients showing signs and symptoms of acute respiratory illness or conjunctivitis who have recent high-risk contact with animals or animal products (eg, birds, livestock, unpasteurized milk) [2]. Unlike the mandated reporting of avian influenza infections, however, there is no system for monitoring whether clinicians ask patients about risk factors or perform recommended testing on high-risk patients (eg, subtyping of influenza A–positive results). These surveillance gaps may conceal a reservoir of undetected infections.
Generative artificial intelligence (AI)–based large language models (LLMs) excel on free-text data and offer novel opportunities for large-scale, automated review of clinical notes [3]. The objectives of the current study were to (1) evaluate an LLM's ability to identify avian influenza risk factors from clinical notes and (2) demonstrate a real-world application of generative AI–based surveillance across a statewide healthcare system.
METHODS
We performed a cross-sectional analysis of adult emergency department (ED) visits across the University of Maryland Medical System (UMMS), an 11-hospital statewide system covering urban and rural areas that captures 25% of admissions in Maryland. This study used a 2-stage approach to evaluate LLM performance using gold-standard blinded review (evaluation cohort) and then apply the LLM to a real-world cohort of patients meeting CDC criteria for consideration of avian influenza (application cohort).
First, using 10 000 visits in 2022–2023 selected through stratified-random sampling (evaluation cohort), we evaluated the LLM for identifying CDC-endorsed avian influenza risk factors from ED provider notes. This cohort was enriched with 155 notes positive for agricultural keywords to ensure robust LLM evaluation (full sampling and prompt tuning methods in Supplement). Briefly, notes were extracted from the electronic health record (Epic) and deposited within a secure computing environment. Notes were then passed to the LLM (GPT-4-Turbo, temperature = 0) via Python using the OpenAI application programming interface (API). The LLM was prompted to identify whether an ED note mentioned recent contact with animals or animal products relevant to avian influenza based on CDC guidance [2] and to output its determination, whether contact was affirmed or denied, contact type(s), and reasoning to a data file (full prompt in Supplement). In blinded review, K. G. reviewed 100 randomly sampled notes the LLM identified as mentioning relevant animal or animal product contact (positives) and 100 randomly sampled notes the LLM identified as not mentioning relevant animal or animal product contact (negatives), and positive (PPV) and negative predictive values (NPV) were calculated with Wilson 95% confidence intervals (CIs).
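The note-classification step described above can be sketched in Python. This is a minimal illustration, not the authors' code: the condensed prompt, model handle, and output field names below are hypothetical stand-ins (the study's full prompt appears in its Supplement), and the OpenAI SDK is imported lazily so the parsing helper works without it.

```python
import json

# Hypothetical condensed prompt; the study's full prompt is in its Supplement.
SYSTEM_PROMPT = (
    "You review emergency department notes for avian influenza risk. "
    "Report whether the note mentions recent contact with animals or animal "
    "products relevant to avian influenza per CDC guidance, whether contact "
    "was affirmed or denied, the contact type(s), and brief reasoning. "
    'Respond as JSON: {"mentioned": "Yes"|"No", "affirmed": "...", '
    '"contact_types": [...], "reasoning": "..."}'
)

def classify_note(note_text, client=None):
    """Send one ED note to the LLM (temperature=0 for reproducibility).

    `client` is an openai.OpenAI instance; the SDK is imported here so the
    pure parsing helper below remains usable without it installed.
    """
    from openai import OpenAI  # requires `pip install openai`
    client = client or OpenAI()
    response = client.chat.completions.create(
        model="gpt-4-turbo",   # stand-in for the study's GPT-4-Turbo deployment
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": note_text},
        ],
    )
    return parse_llm_output(response.choices[0].message.content)

def parse_llm_output(raw):
    """Parse the model's JSON reply into a flat record for the output data file."""
    record = json.loads(raw)
    record["mentioned"] = str(record.get("mentioned", "No")).strip().capitalize()
    return record
```

Records returned this way can be appended to a tabular output file and filtered (eg, to `mentioned == "Yes"`) for downstream human review.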
Second, to demonstrate a real-world application, we applied the LLM and prompt to all adult ED visits across UMMS between 1 January 2024 and 30 November 2024 with a chief complaint of acute respiratory illness or conjunctivitis, per CDC guidance (application cohort) [4, 5] (Supplementary eAppendix). For all included visits, influenza testing orders and results were identified from the electronic health record. This study was deemed exempt human subjects research by the University of Maryland School of Medicine institutional review board. To facilitate replication by other healthcare systems, the Supplement includes additional details and practical pointers.
RESULTS
The study included 10 000 ED visits in the 2022–2023 evaluation cohort (2% of 2022–2023 ED visits) and 13 494 ED visits in the 2024 application cohort restricting to visits with chief complaints of acute respiratory illness or conjunctivitis (7% of all January–November 2024 ED visits). Across all 23 494 included visits, patients had a median age of 46 years (interquartile range: 32–63) and 57% were female.
Figure 1A provides a sample of the LLM's output. In the evaluation cohort, the LLM flagged 265 visits (3%) as mentioning relevant contact with animals or animal products (263 as affirmed and 2 as mentioned-but-denied; further discussion of “mentioned-but-denied” results in Supplement). Overall, in the evaluation cohort the LLM achieved a PPV of 90% (95% CI: 83–94) and an NPV of 98% (95% CI: 93–99) for identifying animal or animal product contact (Supplementary eTable 1 describes the identified false negatives). Of the 100 manually reviewed LLM positives, 50 discussed animal or animal product contact that was both recent (≤10 days prior) and relevant to avian influenza risk per the prompt instructions (PPV for both recent and relevant animal or animal product contact: 50% [95% CI: 40–60]). Contact that the LLM had been instructed to treat as not relevant, but that was marked positive, included contact with dogs and ticks. Often, the LLM's reasoning acknowledged this internal conflict (eg, “‘Patient states he spent the night at a friend's house who has dogs.’ This indicates affirmed exposure to animals, but not the type relevant for avian influenza risk.”). In other instances, a flagged contact was consistent with the prompt's instructions but the LLM's justification was overly speculative (eg, flagging poultry or egg consumption because “it was potentially undercooked”).

Figure 1. Provider notes from all 13 494 emergency department (ED) visits across the University of Maryland Medical System (UMMS) between 1 January 2024 and 30 November 2024 with chief complaints of acute respiratory illness or conjunctivitis were processed by the LLM (16 h of processing time). Part (A) provides sample output of the LLM. Columns could be filtered (eg, restrict to “Mentioned = Yes” to facilitate timely human review). In the application cohort, 76 (0.6%) visits were flagged for mention of contact with animals and/or animal products relevant to avian influenza risk, with manual human review of the LLM output for the 76 flagged visits (26 min of manual review) to confirm recency and direct relevancy to avian influenza. This review winnowed the 76 flagged visits to 16 that mentioned recent, relevant animal or animal product contact, based on currently known risk factors (14 patients positive for contact and 2 patients who denied relevant contact, such as with farm animals). Part (B) provides an overview of the 14 symptomatic patients positive for high-risk contact by visit date, animal/animal product type, and influenza testing status. Eleven patients presented with acute respiratory illness and three patients presented with bilateral conjunctivitis. Abbreviation: vet. conf., veterinary conference.
Across the 13 494 ED visits for acute respiratory illness or conjunctivitis in the 2024 application cohort, the LLM flagged 76 (0.6%) as mentioning relevant contact with animals or animal products. After reviewing the LLM's output and cross-referencing to the full note as needed (review time = 26 minutes; mean ≈21 seconds/visit; performed by K. G.; Supplementary eTable 2), 16 (21% of flagged notes and 0.1% of all visits) were deemed to mention recent contact directly relevant to avian influenza risk (Figure 1B): poultry farming/live chicken handling (n = 2); waterfowl (n = 2; duck hunting, wild turkey hunting); possible livestock exposure (n = 3; county fair, pet show, veterinary conference); close or prolonged contact with animal feces (n = 2; cow manure, cleaning a structure “filled with” unspecified animal feces); farms (n = 2); higher-suspicion feline contact (n = 1; companion cat with contemporaneous respiratory symptoms); and high-risk occupation (n = 2; veterinarian, butcher). The remaining 2 of 16 notes, flagged as containing mentioned-but-denied contact, documented denials (eg, “[patient] denies being around animals, exposure to new/uncooked foods”).
Among the 14 patients positive for high-risk contact, 11 presented with acute respiratory illness and 3 with bilateral conjunctivitis. Ten (71%) received influenza testing, of whom 1 tested positive for influenza A (Figure 1B). No specimens were subtyped, and no patients received conjunctival swabbing.
Overall, LLM analysis took a median of 3.2 seconds per note and averaged $0.03/note in computing costs (16.3 hours and $404 for the application cohort's 13 494 visits).
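As a back-of-envelope check on these figures, the per-note cost and mean processing time follow directly from the reported totals (the mean exceeds the 3.2-second median because slower notes pull the total upward):

```python
# Reported totals for the application cohort
notes = 13_494          # ED visits processed
total_hours = 16.3      # wall-clock LLM processing time
total_cost_usd = 404.0  # API computing cost

mean_seconds_per_note = total_hours * 3600 / notes  # ~4.3 s (median was 3.2 s)
cost_per_note_usd = total_cost_usd / notes          # ~$0.03/note, as reported
```

At this throughput, scaling to a larger note volume is roughly linear in both time and cost, which is the basis for the cost-effectiveness claim.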
DISCUSSION
Across a statewide healthcare system, an LLM had high performance for identifying ED provider notes mentioning animal and animal product contact but lower performance for identifying contacts specific to avian influenza risk. Nevertheless, executing the LLM on all ED visits for acute respiratory illness or conjunctivitis across our medical system in 2024, we identified 14 high-risk patients, including patients with conjunctivitis with waterfowl and poultry farm exposure. LLM review was rapid and cost-effective (3 seconds and 3 cents/note) and only required 26 minutes of human manual review across 13 494 visits. Based on these findings, generative AI-based approaches may offer a promising strategy for expanding public health surveillance in clinical settings. With the possibility of reduced federal coordination [6], other healthcare systems could consider trialing these methods.
Due to the LLM's lower PPV for animal/animal product contacts specific to avian influenza, LLM-flagged positives required manual review. Although not as rapid as full automation, given the LLM's high NPV, this hybrid approach still offered the ability to quickly screen thousands of visits to identify high-risk cases. Beyond retrospective surveillance, LLMs could support prospective surveillance to detect upticks in high-risk patients, cross-referenced against existing surveillance (in Maryland during the January–November 2024 study period, H5 influenza was detected in wastewater at levels similar to national trends [7], and H5N1 was presumptively identified in 1 backyard flock [8]). Additionally, LLMs could support large-scale audit-and-feedback programs that remind providers to take exposure histories and document denials, as well as alerts that prompt healthcare epidemiology teams to initiate testing and infection control cascades for high-risk patients [9].
Strengths of this study include its diverse statewide cohort and use of real patient notes. However, it is also subject to several limitations. First, extrapolating our findings to other clinical settings (eg, urgent care) would require further study. Second, given the rarity of high-risk animal/animal product contact, we selected notes for manual review based on the LLM's classification. Thus, we could not estimate sensitivity and specificity; NPV and PPV vary by exposure prevalence and may differ across settings (eg, the differing PPVs in the evaluation and application cohorts). Third, because the LLM could only “see” contact documented in notes and its sensitivity is unknown, LLM-flagged visits represent the lower boundary of high-risk patients. Finally, alternative prompts or newer (but pricier) advanced reasoning LLMs [10] may have improved performance. Increasing PPV may decrease NPV, however, leading to missed high-risk patients, especially as avian influenza risk factors expand [11]. Ultimately, we preferred the LLM to err toward assuming animal and animal product contacts were relevant and letting a human make the final determination, but other systems could strike a different balance.
As H5N1 continues to circulate, monitoring the proportion of symptomatic patients with potential H5N1 exposure is a pressing clinical and epidemiological need. Our findings suggest that LLMs can help address this unmet need and would be practical for real-world implementation. If deployed widely, these approaches could support a network of clinical sentinel surveillance sites.