From Bedside to Bot-Side: Artificial Intelligence in Emergency Appendicitis Management

Koray Ersahin, Sebastian Sanduleanu, Sithin Thulasi Seetha, J. Bremm, Cavid Abbasli, Chantal Zimmer, Tim Damer, J. Kottlors, L. Goertz, C. Bruns, D. Maintz, N. Abdullayev

Published 2025 in Life

ABSTRACT

Introduction: Acute appendicitis (AA) is a common cause of abdominal pain that often requires emergency surgery and can lead to complications such as perforation and intra-abdominal abscesses, which increase morbidity and mortality. Appendectomy is performed in up to 95% of uncomplicated cases. The current study compares the accuracy of GPT-4.5, DeepSeek R1, and machine learning in assisting with surgical decision-making for patients presenting with lower abdominal pain at the Emergency Department.

Methods: In this multicenter retrospective study, 63 patients with histopathologically confirmed appendicitis and 50 control patients with right abdominal pain, all presenting at the Emergency Department of two German hospitals between October 2022 and October 2023, were included. Using each patient’s clinical, laboratory, and radiological findings, DeepSeek (with and without Retrieval-Augmented Generation using the 2020 Jerusalem guidelines), GPT-4.5, and a random forest-based machine-learning model were each used to determine the optimal treatment approach (laparoscopic exploration/appendectomy versus conservative antibiotic therapy). Their accuracy was compared against the decision of a board-certified surgeon, which served as the reference standard.

Results: When the models were provided with the World Journal of Emergency Surgery 2020 Jerusalem guidelines on the diagnosis and treatment of acute appendicitis, accuracy of agreement with board-certified surgeons in the decision between appendectomy and conservative therapy increased non-significantly from 80.5% to 83.2% for DeepSeek and from 70.8% to 76.1% for GPT-4.5. The estimated training accuracy of the machine-learning model was 84.3%, and its validation accuracy was 85.0%.

Discussion: GPT-4.5 and DeepSeek R1, as well as the machine-learning model, demonstrate promise in aiding surgical decision-making for appendicitis, particularly in resource-constrained settings. Ongoing training and validation are required to optimize the performance of such models.
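The agreement figures in the abstract are plain proportions over the 113 included patients (63 appendicitis plus 50 controls). The following is a minimal sketch of that arithmetic, not the authors' code: the function names and the toy decision labels are hypothetical, and the match counts it derives are only inferred from the reported percentages, assuming every patient was scored. The abstract itself does not report these counts.

```python
# Sketch only: names and toy data are illustrative assumptions.
N_PATIENTS = 63 + 50  # cohort size stated in the abstract


def agreement_accuracy(model_decisions, surgeon_decisions):
    """Fraction of patients where the model matches the surgeon's decision."""
    if not surgeon_decisions or len(model_decisions) != len(surgeon_decisions):
        raise ValueError("decision lists must be non-empty and of equal length")
    matches = sum(m == s for m, s in zip(model_decisions, surgeon_decisions))
    return matches / len(surgeon_decisions)


def implied_matches(reported_pct, n=N_PATIENTS):
    """Number of agreeing decisions implied by a reported percentage.

    This is an inference under the assumption that all n patients were
    scored; it is not a count taken from the paper.
    """
    return round(reported_pct / 100 * n)


if __name__ == "__main__":
    # Toy example with hypothetical labels, not study data.
    surgeon = ["appendectomy", "conservative", "conservative", "appendectomy"]
    model = ["appendectomy", "conservative", "appendectomy", "appendectomy"]
    print(f"toy agreement: {agreement_accuracy(model, surgeon):.2f}")

    # Percentages reported in the abstract, mapped back to implied counts.
    for label, pct in [
        ("DeepSeek without guidelines", 80.5),
        ("DeepSeek with guidelines", 83.2),
        ("GPT-4.5 without guidelines", 70.8),
        ("GPT-4.5 with guidelines", 76.1),
    ]:
        k = implied_matches(pct)
        print(f"{label}: about {k}/{N_PATIENTS} = {100 * k / N_PATIENTS:.1f}%")
```

Under these assumptions, the gain from adding the guidelines corresponds to roughly three more agreeing decisions for DeepSeek and six more for GPT-4.5 out of 113, which helps explain why the improvement is reported as non-significant.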

PUBLICATION RECORD

Publication year
2025
Venue
Life
Publication date
2025-09-01
Fields of study
Medicine, Computer Science
Identifiers
DOI 10.3390/life15091387 PMID 41010329 PMCID 12470868
Source metadata
Semantic Scholar, PubMed

