{"id":29960,"date":"2026-03-13T04:22:00","date_gmt":"2026-03-12T20:22:00","guid":{"rendered":"https:\/\/csccm.org.cn\/?p=29960"},"modified":"2026-03-13T05:48:01","modified_gmt":"2026-03-12T21:48:01","slug":"chest%e5%8f%91%e8%a1%a8%e8%bf%b0%e8%af%84%ef%bc%9a%e5%a4%a7%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8b%e4%bd%9c%e4%b8%ba%e5%86%99%e6%89%8b%ef%bc%9a%e5%88%9b%e6%96%b0%e6%8a%91%e6%88%96%e5%b1%80%e9%99%90","status":"publish","type":"post","link":"https:\/\/csccm.org.cn\/?p=29960","title":{"rendered":"[Chest\u53d1\u8868\u8ff0\u8bc4]\uff1a\u5927\u8bed\u8a00\u6a21\u578b\u4f5c\u4e3a\u5199\u624b\uff1a\u521b\u65b0\u6291\u6216\u5c40\u9650\uff1f\u6211\u4eec\u662f\u5426\u5df2\u7ecf\u51c6\u5907\u597d\u4ea4\u51fa\u624b\u4e2d\u7684\u7b14\u4e86\uff1f"},"content":{"rendered":"\n<p>E<strong>DITORIAL<\/strong><\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Large Language Models as Item Writer: Innovation or Limitation? Are We Ready to Hand Over the Pen?<\/h1>\n\n\n\n<h3 class=\"wp-block-heading\">Abdullah\u00a0Alismail<\/h3>\n\n\n\n<h3 class=\"wp-block-heading\">Chest 2025; 168: 1286-1287<\/h3>\n\n\n\n<p>The use of large language models and artificial intelligence (AI) in medical education has recently increased, not only among students, but also among faculty. 
Students use these tools to enhance their learning and comprehension of the subject, whereas faculty cite other reasons, such as assistance with curriculum and course structure.<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>1<\/sup><\/a><sup>,<\/sup><a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>2<\/sup><\/a>\u00a0Studies encourage faculty to stay up to date with these tools, because changes are happening at a rapid pace, while still approaching their use with caution.<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>3<\/sup><\/a><sup>,<\/sup><a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>4<\/sup><\/a>\u00a0In this issue of\u00a0<em>CHEST<\/em>, Safadi et al<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>5<\/sup><\/a>\u00a0conducted a study titled \u201cQuality of Human Expert vs Large Language Model-Generated Multiple-Choice Questions in the Field of Mechanical Ventilation.\u201d<\/p>\n\n\n\n<p>Safadi et al<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>5<\/sup><\/a>&nbsp;conducted a noninferiority cross-sectional study comparing multiple-choice questions (MCQs) written by human experts with AI-generated MCQs. The topic used to generate the MCQs was mechanical ventilation. 
The authors used an interesting approach to answer the following question: Are MCQs written by AI noninferior to those written by human experts?<\/p>\n\n\n\n<p>The authors followed an evidence-based method for writing examination questions, first establishing and identifying the learning objectives and outcomes (Table 1).<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>5<\/sup><\/a>&nbsp;To accomplish this task, they identified the following subtopics of mechanical ventilation: the equation of motion, Ohm\u2019s law, and the time constant (tau) and auto-PEEP.<\/p>\n\n\n\n<p>These objectives were then given to the 2 groups: the human experts and the AI tool. First, the human expert group was asked to write a total of 15 MCQs; the AI was then asked to do the same. For the AI, the authors took specific steps to accomplish this task through prompting.<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>5<\/sup><\/a>&nbsp;Prompting is an important component of working with AI tools and has received recent attention in the literature.<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>6<\/sup><\/a>&nbsp;In this study, the authors ensured that all agreed on a prompt template to guide the AI tool appropriately. 
The prompt template included the learning objectives and identified the audience, fellows (physicians) in critical care medicine; the model was then prompted to generate a question with 5 answer choices and to identify the correct answer.<\/p>\n\n\n\n<p>Safadi et al<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>5<\/sup><\/a>&nbsp;asked 31 expert physician educators in the field to respond to and evaluate the quality of the MCQs produced by the 2 methods.<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>5<\/sup><\/a>&nbsp;The findings showed that both approaches were comparable, with no meaningful difference between the AI-written and the human-written questions (mean score difference, \u20130.23 \u00b1 0.56;&nbsp;<em>P<\/em>&nbsp;= .67). An interesting finding in this study was the time needed to write the questions: the human experts took approximately 10 hours, compared with only 1 hour for the AI, including prompt creation and refinement.<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>5<\/sup><\/a>&nbsp;These findings are similar to those of another study by Cheung et al,<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>7<\/sup><\/a>&nbsp;which also reported a shorter time with AI than with humans (20 minutes vs 211 minutes).<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>7<\/sup><\/a>&nbsp;Finding time to write MCQs for a course is a challenge for clinician educators, in addition to the special training needed to achieve competence in MCQ writing. 
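<\/p>\n\n\n\n<p>The prompt template described above can be sketched in a few lines of code. This is a hypothetical illustration, not the authors\u2019 published supplement code; the function name, default audience string, and objective text are placeholders:<\/p>

```python
# Hypothetical sketch of the prompt template described in the editorial:
# embed the learning objective and the target audience in a prompt that asks
# the model for one MCQ with 5 answer choices and a labeled correct answer.
# Names and wording are illustrative, not taken from the study's supplement.

def build_mcq_prompt(objective: str,
                     audience: str = "fellows (physicians) in critical care medicine",
                     n_choices: int = 5) -> str:
    """Assemble a single MCQ-writing prompt for a large language model."""
    last_letter = chr(ord("A") + n_choices - 1)  # e.g., "E" when n_choices == 5
    return (
        f"You are writing an examination question for {audience}.\n"
        f"Learning objective: {objective}\n"
        f"Write one multiple-choice question with {n_choices} answer choices (A-{last_letter}).\n"
        "Exactly one choice must be correct; the others should be plausible distractors.\n"
        "End with the correct answer on its own line in the form 'Answer: <letter>'."
    )

if __name__ == "__main__":
    print(build_mcq_prompt("Apply the equation of motion to interpret ventilator waveforms"))
```

<p>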
Furthermore, Safadi and colleagues used a modified Delphi process to set the noninferiority margin at 15% of the maximum score.<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>5<\/sup><\/a>&nbsp;Although their findings showed the question sets to be comparable, the literature has shown mixed findings regarding the complexity, distractors, and depth of questions generated by AI compared with those written by humans.<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>8<\/sup><\/a><sup>,<\/sup><a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>9<\/sup><\/a><\/p>\n\n\n\n<p>Another outcome the authors evaluated was the ability to identify whether a question was written by AI or by a human. Responders identified approximately 55% of the AI-generated questions, vs 51.8% of the human-written questions, as AI written. Statistically, there was no significant difference, and responders could not reliably predict which method had been used to write a question.<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>5<\/sup><\/a><\/p>\n\n\n\n<p>Although the use of AI might be intimidating to some educators, it is vital for clinicians and academic educators to catch the train before it leaves and to embrace these new tools, resources, and advancements. 
To that end, combining several key skills is recommended: (1) competence in writing objectives and test questions; (2) choosing the best AI tool and model (eg, ChatGPT 3.5 vs GPT 5.0) for high reasoning (Safadi et al<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>5<\/sup><\/a>&nbsp;used model o1-preview 2024-09-12); and (3) competence in AI prompt writing.<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>6<\/sup><\/a><sup>,<\/sup><a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>10<\/sup><\/a>&nbsp;In addition, a quality assurance step in which an expert validates the AI tool\u2019s output will help ensure the validity and accuracy of the questions. These innovations could free faculty time for other areas, such as innovation, scholarship, pedagogy, and assessment.<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>11<\/sup><\/a>&nbsp;Key concerns raised in the recent AI literature include the inability to form high-level reasoning questions, ethical issues, and AI hallucination. To limit these issues, proper prompting, as described above and similar to the process used by Safadi and colleagues,<a href=\"https:\/\/journal.chestnet.org\/article\/S0012-3692(25)05148-7\/fulltext#\"><sup>5<\/sup><\/a>&nbsp;is an excellent starting point. The authors have provided the Python code used in this study as a supplement for future researchers\u2019 use. The future use of AI and large language models in medical education raises many questions for faculty that need answering. For example, as AI use advances, how will it change faculty workload? Second, should faculty development courses or workshops implement AI prompt training for writing MCQs? 
Third, if workload is indeed reduced, will scholarship and innovation increase, or will clinical time? These are example questions for future researchers to address in answering the larger question: Are we ready to hand over the pen?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>EDITORIAL Large Language Models as Item Writer: Innovat [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[24,23],"tags":[],"_links":{"self":[{"href":"https:\/\/csccm.org.cn\/index.php?rest_route=\/wp\/v2\/posts\/29960"}],"collection":[{"href":"https:\/\/csccm.org.cn\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/csccm.org.cn\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/csccm.org.cn\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/csccm.org.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=29960"}],"version-history":[{"count":1,"href":"https:\/\/csccm.org.cn\/index.php?rest_route=\/wp\/v2\/posts\/29960\/revisions"}],"predecessor-version":[{"id":29961,"href":"https:\/\/csccm.org.cn\/index.php?rest_route=\/wp\/v2\/posts\/29960\/revisions\/29961"}],"wp:attachment":[{"href":"https:\/\/csccm.org.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=29960"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/csccm.org.cn\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=29960"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/csccm.org.cn\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=29960"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}