Research Letter
Artificial Intelligence and Pediatric Care
April 29, 2024
Identification of Human-Generated vs AI-Generated Research Abstracts by Health Care Professionals
Dennis Ren, Andrew James Tagg, Helena Wilcox, et al
JAMA Pediatr. Published online April 29, 2024. doi:10.1001/jamapediatrics.2024.0760
Health care professionals submit and review abstracts for academic conferences. As artificial intelligence (AI) is increasingly deployed in scientific research,1-3 this survey study investigated health care professionals’ ability to distinguish human-generated from AI-generated research abstracts.
Methods
Between August 1 and November 30, 2023, a web-based survey was distributed to health care professionals recruited via snowball sampling through email listservs, social media, and the 2023 Don't Forget the Bubbles medical conference. The Children's National Hospital Institutional Review Board deemed this survey study exempt from review and from the need for written informed consent because no personal or identifiable data were collected and participation posed low risk. We followed the AAPOR reporting guideline.
Participants were presented with 4 research abstracts (the first 2 written by human researchers for the 2020 Pediatric Academic Societies Meeting; the last 2 generated by ChatGPT 3.54 [OpenAI]) and asked to identify each abstract's origin. We also asked how participants made their determination.
An example prompt entered into the chatbot to generate a research abstract was as follows: “Act as a globally renowned researcher preparing an abstract for the Pediatric Academic Societies meeting. Create an abstract of a study that evaluates the effect of IV magnesium sulfate on admissions for asthma exacerbations in the emergency department.” We incorporated fictional numerical data into the abstracts where necessary. Otherwise, no additional edits were made.
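The generation step above could be reproduced programmatically rather than through the chat interface. The sketch below is hypothetical: the letter does not describe API use, and the helper names, model identifier (`gpt-3.5-turbo`, the model family behind ChatGPT 3.5), and parameters are assumptions.

```python
def build_abstract_prompt(topic: str) -> str:
    # Hypothetical helper composing a prompt in the style quoted in the letter.
    return (
        "Act as a globally renowned researcher preparing an abstract for the "
        "Pediatric Academic Societies meeting. Create an abstract of a study "
        f"that evaluates {topic}."
    )

def generate_abstract(topic: str, model: str = "gpt-3.5-turbo") -> str:
    # Assumes the `openai` package is installed and OPENAI_API_KEY is set in
    # the environment; imported here so the prompt helper stays standalone.
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_abstract_prompt(topic)}],
    )
    return resp.choices[0].message.content
```

As in the study, any numerical results in the returned text would be fictional and would need to be replaced or flagged before use.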
The primary outcome was the health care professionals’ accuracy in identifying the origin of the abstracts. We used the chatbot to identify common themes in responses, which were also reviewed by the authors. Additionally, we gathered data about participants’ training level, confidence, prior experience evaluating abstracts, and ethical perspectives on AI use in research. Data analysis was performed with Microsoft Excel 2402 (Microsoft Corp).
Results
A total of 102 health care professionals participated, most of them in attending or consultant roles (59 [57.8%]). Participants correctly identified the abstracts' origin 43.0% of the time overall; accuracy for individual abstracts ranged from 20.0% to 57.0% (Table). Sixty-eight participants (66.7%) reported prior experience reviewing abstracts; this group was less accurate than those without prior experience (39.7% vs 49.3%). Confidence levels did not change significantly before and after completing the activity. Seventy-four participants (72.5%) believed using AI to write research abstracts was ethical.
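Assuming each participant evaluated all 4 abstracts, the subgroup accuracies reported above are arithmetically consistent with the overall figure; a quick check using only the counts and percentages stated in the letter:

```python
# Reported figures: 68 participants with prior abstract-review experience
# (39.7% accurate), 34 without (49.3% accurate), 102 participants in total.
experienced, novice = 68, 34
acc_experienced, acc_novice = 0.397, 0.493

# Pooled accuracy, weighting each subgroup by its size.
pooled = (experienced * acc_experienced + novice * acc_novice) / (experienced + novice)
# pooled is about 0.429, matching the reported 43.0% within rounding.
```

This holds exactly only under the stated assumption that every participant rated the same number of abstracts.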
Comments on human-generated text noted variations in sentence structure and stylistic elements. Some participants suspected AI-generated text of being repetitive or of having clunky structure, poor syntax, and disjointed connections between sentences or phrases. Others associated abstract formatting, such as bullet points, with AI-generated text. Participants believed unusual vocabulary or phrases were potential indicators of AI. Use of medical terminology and a more natural flow were considered characteristic of human-generated text. However, other participants ascribed to AI-generated text qualities that were also attributed to human-generated abstracts, and many admitted to relying on instinct and guesswork to make their determination.
Discussion
Participants’ ability to distinguish human-generated from AI-generated abstracts was limited, regardless of prior experience and training. The ethical use of AI in research and writing remains debated, although over 70% of survey participants believed AI was ethical to use in writing research abstracts.5 We have no reservations about using AI to generate abstracts or even full articles, as long as the final product can be reviewed and edited; all scientific content warrants critical appraisal, regardless of its origin.
Study limitations include a self-selected sample that may not represent the global population of health care professionals; because of the snowball recruitment strategy, we could not determine the denominator of all professionals reached. Results are consistent with previous research findings, but the sample size may limit generalizability.6
This study highlights the difficulty health care professionals face in distinguishing between human-generated and AI-generated research abstracts. It underscores the importance of awareness and critical evaluation of scientific content in the era of advanced AI.