Observational Study on AI Accuracy in Diagnosing and Treating Failed or Painful Hip Arthroplasty
PAINGPT
Observational Study on the Accuracy and Completeness of General Artificial Intelligence in the Diagnosis and Therapeutic Recommendations for Failed or Painful Total Hip Arthroplasty
Brief Summary
Primary Goal: This study aims to evaluate the diagnostic and therapeutic accuracy of GPT-4 (an advanced AI language model) compared to three orthopedic surgeons with varying experience levels in cases of failed or painful total hip arthroplasty.
Key Research Questions:
- Diagnostic Accuracy: Does GPT-4 provide correct, partially correct, or incorrect diagnoses compared to human orthopedic surgeons?
- Diagnostic Completeness: Are GPT-4's diagnostic suggestions complete, partially complete, or incomplete compared to those of orthopedic surgeons?
- Treatment Accuracy: Does GPT-4 recommend correct, partially correct, or incorrect treatments for failed hip arthroplasty?
- Treatment Completeness: Are GPT-4's treatment recommendations complete, partially complete, or incomplete compared to those of orthopedic surgeons?
Study Design:
- Participants: 20 anonymized patient cases (ages 18-80) with failed or painful hip arthroplasties, treated at IRCCS Istituto Ortopedico Rizzoli (Bologna, Italy) between 2004 and 2024. Cases were selected based on clear diagnostic and treatment records (no ambiguous or incomplete data).
- Comparison Groups: GPT-4 (via the ChatGPT interface) and three orthopedic doctors with different experience levels (resident, specialist, senior surgeon).
- Method: Each case (clinical summary + X-ray image) is presented to GPT-4 and the three doctors, who must each provide a diagnosis and treatment recommendations. Two independent evaluators (principal investigator + department head) blindly assess the responses for correctness and completeness using a 3-point scale (0 = wrong/incomplete, 1 = imprecise/partially complete, 2 = correct/complete). Statistical analysis compares GPT-4 vs. human performance.
Expected Outcomes:
- Determine whether AI can match or outperform doctors in diagnosing and treating hip arthroplasty failures.
- Assess whether GPT-4 could serve as a supplementary tool in orthopedic decision-making.
Ethical & Privacy Considerations:
- No real-time patient data is used, only anonymized past cases.
- No personal/sensitive data is shared with OpenAI (GPT-4 is used via a standard web interface).
- The study complies with GDPR, HIPAA, and ethical AI guidelines.
Timeline: Study duration is approximately 8 months (from ethics approval to final analysis). Results will be published regardless of outcome.
Why This Study Matters: First study evaluating GPT-4's role in complex orthopedic diagnostics. It could influence future AI-assisted clinical decision-making in joint replacement surgeries.
Study Timeline
Key milestones and dates
First Submitted
Initial submission to the registry
May 31, 2025
Study Start
First participant enrolled
May 31, 2025
First Posted
Study publicly available on registry
June 10, 2025
Primary Completion
Last participant's last visit for primary outcome
June 30, 2025
Study Completion
Last participant's last visit for all outcomes
July 1, 2025
Last Updated
June 18, 2025
Conditions
Keywords
Outcome Measures
Primary Outcomes (4)
Diagnostic correctness
Proportion of fully correct diagnoses (score=2) by each rater. Scale 0 (worst outcome) - 2 (best outcome). 0: incorrect, 1: imprecise, 2: correct
Immediate (post-case evaluation)
Diagnostic completeness
Proportion of fully complete diagnoses (score=2). Scale 0 (worst outcome) - 2 (best outcome). 0: incomplete, 1: partially complete, 2: complete
Immediate (post-case evaluation)
Treatment recommendation correctness
Proportion of fully correct treatments (score=2) by each rater. Scale 0 (worst outcome) - 2 (best outcome). 0: incorrect, 1: imprecise, 2: correct
Immediate (post-case evaluation)
Treatment recommendation completeness
Proportion of fully complete treatments (score=2). Scale 0 (worst outcome) - 2 (best outcome). 0: incomplete, 1: partially complete, 2: complete
Immediate (post-case evaluation)
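The four primary outcomes share one calculation: the share of cases that received the top score (2) on the 0-2 scale. A minimal sketch of that arithmetic follows. The data layout, the respondent labels, and the function name `proportion_fully_correct` are assumptions made for illustration, and the scores are invented, not study data.

```python
from collections import defaultdict

# Hypothetical rows: (respondent, case_id, score), score on the 0-2 scale.
# These values are invented for illustration; they are NOT study results.
scores = [
    ("GPT-4", 1, 2), ("GPT-4", 2, 1), ("GPT-4", 3, 2), ("GPT-4", 4, 0),
    ("Surgeon A", 1, 2), ("Surgeon A", 2, 2), ("Surgeon A", 3, 1), ("Surgeon A", 4, 2),
]


def proportion_fully_correct(rows):
    """Return {respondent: fraction of cases scored 2}."""
    totals = defaultdict(int)
    top_scores = defaultdict(int)
    for respondent, _case_id, score in rows:
        if score not in (0, 1, 2):
            raise ValueError(f"score outside the 0-2 scale: {score}")
        totals[respondent] += 1
        if score == 2:
            top_scores[respondent] += 1
    return {name: top_scores[name] / totals[name] for name in totals}


result = proportion_fully_correct(scores)
print(result)
```

The same function would apply unchanged to each of the four outcomes (diagnostic correctness, diagnostic completeness, treatment correctness, treatment completeness) by passing the corresponding set of scores.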
Study Arms (1)
Failed or Painful Total Hip Arthroplasty Patients
Patients with documented failed/painful THA (aseptic loosening, infection, fracture, etc.) selected from a tertiary center database (2004-2024).
Interventions
Diagnostic/Prognostic evaluation of any single case provided by AI (GPT-4). GPT-4 provides diagnosis/treatment recommendations via standardized prompts
Diagnostic/Prognostic evaluation of any single case provided by a human expert
Diagnostic/Prognostic evaluation of any single case provided by a human expert
Diagnostic/Prognostic evaluation of any single case provided by a human expert
Eligibility Criteria
This study evaluated 20 anonymized cases of failed/painful total hip arthroplasty (2004-2024) from a tertiary center. Cases were selected for diagnostic clarity (e.g., aseptic loosening, periprosthetic fracture) and excluded if records were incomplete or ambiguous. Two senior surgeons verified case eligibility. Each case was assessed by GPT-4, an arthroplasty fellow, a 4th-year resident, and a 3rd-year resident for diagnostic/therapeutic accuracy.
You may qualify if:
- Adults (≥18 and ≤80 years old).
- Documented painful or failed total hip arthroplasty requiring clinical/radiological evaluation (2004-2024).
- Complete pre-operative clinical history, imaging (X-ray/tomography), and surgical reports.
- Clear diagnosis of failure mode (e.g., aseptic loosening, infection, fracture, wear).
- Treatment and outcomes fully documented in the institutional database.
- "Exemplary" cases with minimal diagnostic ambiguity (per Engh/Musculoskeletal Infection Society criteria, etc.).
You may not qualify if:
- Total hip arthroplasty with no documented failure/pain (well-functioning implants).
- Incomplete clinical/radiological records (e.g., missing pre-operative imaging or surgical notes).
- Complex/multifactorial failures (e.g., concurrent infection + loosening + fracture).
- Non-interpretable radiographs/images (poor quality, missing views).
- Cases with conflicting diagnoses/treatments in original records.
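The inclusion and exclusion criteria above can be read as a single screening predicate over case records. The sketch below is one possible encoding, not the study's actual selection procedure: the `CaseRecord` fields and the `is_eligible` function are hypothetical names, and treating "complex/multifactorial" as "more than one documented failure mode" is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CaseRecord:
    """Hypothetical case record; field names are illustrative assumptions."""
    age: int
    evaluation_year: int
    failure_modes: Tuple[str, ...]   # e.g. ("aseptic loosening",)
    records_complete: bool           # history, imaging, surgical reports
    images_interpretable: bool
    conflicting_records: bool


def is_eligible(case: CaseRecord) -> bool:
    """Apply the listed criteria; exactly one failure mode is assumed to mean 'not multifactorial'."""
    return (
        18 <= case.age <= 80
        and 2004 <= case.evaluation_year <= 2024
        and len(case.failure_modes) == 1   # empty = no failure, >1 = multifactorial
        and case.records_complete
        and case.images_interpretable
        and not case.conflicting_records
    )


clear_case = CaseRecord(67, 2015, ("aseptic loosening",), True, True, False)
complex_case = CaseRecord(67, 2015, ("infection", "fracture"), True, True, False)
print(is_eligible(clear_case), is_eligible(complex_case))
```

The "exemplary case" criterion is a clinical judgment made by the verifying surgeons and is deliberately left out of the predicate.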
Sponsors & Collaborators
Study Sites (1)
SC Ortopedia e Traumatologia e Chirurgia Protesica e dei Reimpianti di Anca e Ginocchio, IRCCS Istituto Ortopedico Rizzoli
Bologna, 40136, Italy
Related Publications (4)
Knee CJ, Campbell RJ, Graham DJ, Handford C, Symes M, Sivakumar BS. Examining the role of ChatGPT in the management of distal radius fractures: insights into its accuracy and consistency. ANZ J Surg. 2024 Jul-Aug;94(7-8):1391-1396. doi: 10.1111/ans.19143. Epub 2024 Jul 5.
PMID: 38967407 (BACKGROUND)
Dagher T, Dwyer EP, Baker HP, Kalidoss S, Strelzow JA. "Dr. AI Will See You Now": How Do ChatGPT-4 Treatment Recommendations Align With Orthopaedic Clinical Practice Guidelines? Clin Orthop Relat Res. 2024 Dec 1;482(12):2098-2106. doi: 10.1097/CORR.0000000000003234. Epub 2024 Sep 6.
PMID: 39246048 (BACKGROUND)
Artioli E, Veronesi F, Mazzotti A, Brogini S, Zielli SO, Giavaresi G, Faldini C. Assessing ChatGPT responses to common patient questions regarding total ankle arthroplasty. J Exp Orthop. 2024 Dec 31;12(1):e70138. doi: 10.1002/jeo2.70138. eCollection 2025 Jan.
PMID: 39741912 (BACKGROUND)
Pagano S, Strumolo L, Michalk K, Schiegl J, Pulido LC, Reinhard J, Maderbacher G, Renkawitz T, Schuster M. Evaluating ChatGPT, Gemini and other Large Language Models (LLMs) in orthopaedic diagnostics: A prospective clinical study. Comput Struct Biotechnol J. 2024 Dec 26;28:9-15. doi: 10.1016/j.csbj.2024.12.013. eCollection 2025.
PMID: 39850460 (BACKGROUND)
MeSH Terms
Conditions
Interventions
Condition Hierarchy (Ancestors)
Intervention Hierarchy (Ancestors)
Study Officials
- PRINCIPAL INVESTIGATOR
Francesco Castagnini, MD
IRCCS Istituto Ortopedico Rizzoli
Central Study Contacts
Study Design
- Study Type: observational
- Observational Model: COHORT
- Time Perspective: RETROSPECTIVE
- Sponsor Type: OTHER
- Responsible Party: PRINCIPAL INVESTIGATOR
- PI Title: Principal investigator
Study Record Dates
First Submitted
May 31, 2025
First Posted
June 10, 2025
Study Start
May 31, 2025
Primary Completion
June 30, 2025
Study Completion
July 1, 2025
Last Updated
June 18, 2025
Record last verified: 2025-06
Data Sharing
- IPD Sharing: Will share
- Shared Documents: STUDY PROTOCOL, SAP, CSR
- Time Frame: Immediately upon publication. Access limit: indefinitely
- Access Criteria: With researchers, clinicians, or institutions upon request or directly via public repository.
Rater evaluations on clinical cases; no direct patient data are included; all datasets are anonymized and do not allow identification.