Invited Commentary
November 15, 2023
Who Looks Like a Surgeon?—Evaluating Reflections From a Digital Mirror
Rebecca Sorber, M. Libby Weaver
JAMA Surg. Published online November 15, 2023. doi:10.1001/jamasurg.2023.5705
As the number of publicly available tools using artificial intelligence (AI) has rapidly expanded, innumerable attempts to find beneficial applications for such tools in medicine have followed close behind.1 Perhaps one of the most promising proposed uses for natural language processing systems was to address systemic bias in academic medicine by eliminating human subjectivity in decision-making. Some hoped that a computer-derived platform would bring objectivity to tasks such as writing letters of recommendation, selecting job candidates, and evaluating performance.2
However, collective experience with AI platforms in practice has quickly shown that objectivity is not inherent in their design. The algorithms these tools use are only as objective as the data on which they are trained, meaning that unless data inputs are meaningfully curated, these tools will merely perpetuate biases that already exist and are well documented.2,3 For example, using a chatbot to automate letters of recommendation for identical candidates with typically male and typically female names pursuing the same faculty position reproduces the gendered, biased phrasing already known to exist in many surgical letters of recommendation.4 Rather than creating a beacon of objectivity, we seem to have created a digital mirror or, worse, an echo chamber with the potential to not just reflect but amplify human subjectivity in medicine.
In their article, Ali et al5 describe a distinct type of AI tool, text-to-image generators, and their propensity to reflect long-standing gender and racial stereotypes in surgeon representation. Two of the 3 AI image generators seemed to amplify societal bias, depicting over 98% of surgeons as white males. The third generated more diverse images but still underrepresented the true gender and racial diversity of trainees in surgery. The authors’ takeaway is that feedback mechanisms and curation of the data sets used to train AI-based models are needed to reduce this phenomenon of bias amplification. This concept of bias mitigation through precise, staged training of AI models is both well documented and highly relevant to medical applications.6
The authors’ present work emphasizes the need for intentional, cautious stewardship of such AI tools as they become increasingly ubiquitous in surgery and medicine. The potential for bias amplification has implications for physicians as well as patients if we fail to address biased data inputs and continue to indiscriminately apply AI constructs to patient care. Algorithmic unfairness resulting from use of open-source, minimally curated AI models has no place in medical training or in medical practice; the stakes are simply too high.