NCT07012577

Brief Summary

Primary Goal: This study evaluates the diagnostic and therapeutic accuracy of GPT-4 (an advanced AI language model) compared with three orthopedic surgeons of varying experience levels in cases of failed or painful total hip arthroplasty.

Key Research Questions:

  • Diagnostic Accuracy: Does GPT-4 provide correct, partially correct, or incorrect diagnoses compared with human orthopedic surgeons?
  • Diagnostic Completeness: Are GPT-4's diagnostic suggestions complete, partially complete, or incomplete compared with those of orthopedic surgeons?
  • Treatment Accuracy: Does GPT-4 recommend correct, partially correct, or incorrect treatments for failed hip arthroplasty?
  • Treatment Completeness: Are GPT-4's treatment recommendations complete, partially complete, or incomplete compared with those of orthopedic surgeons?

Study Design:

  • Participants: 20 anonymized patient cases (ages 18-80) with failed or painful hip arthroplasties, treated at IRCCS Istituto Ortopedico Rizzoli (Bologna, Italy) between 2004 and 2024. Cases were selected for clear diagnostic and treatment records (no ambiguous or incomplete data).
  • Comparison Groups: GPT-4 (via the ChatGPT interface) and three orthopedic doctors with different experience levels (resident, specialist, senior surgeon).
  • Method: Each case (clinical summary + X-ray image) is presented to GPT-4 and to the three doctors, who must each provide a diagnosis and treatment recommendations. Two independent evaluators (the principal investigator and the department head) blindly assess the responses for correctness and completeness on a 3-point scale (0 = wrong/incomplete, 1 = imprecise/partially complete, 2 = correct/complete). Statistical analysis compares GPT-4 with human performance.

Expected Outcomes: Determine whether AI can match or outperform doctors in diagnosing hip arthroplasty failures and recommending treatment. Assess whether GPT-4 could serve as a supplementary tool in orthopedic decision-making.

Ethical & Privacy Considerations: No real-time patient data are used, only anonymized past cases. No personal or sensitive data are shared with OpenAI (GPT-4 is used via a standard web interface). The study complies with GDPR, HIPAA, and ethical AI guidelines.

Timeline: Study duration is ~8 months (from ethics approval to final analysis). Results will be published regardless of outcome.

Why This Study Matters: First study evaluating GPT-4's role in complex orthopedic diagnostics. It could influence future AI-assisted clinical decision-making in joint replacement surgery.

Trial Health

57
Monitor

Trial Health Score

Automated assessment based on enrollment pace, timeline, and geographic reach

Trial has exceeded expected completion date
Enrollment
20

participants targeted

Target is below the 25th percentile (P25) for all trials

Timeline
Completed

Started May 2025

Shorter than P25 for all trials

Geographic Reach
1 country

1 active site

Status
recruiting

Health score is calculated from publicly available data and should be used for screening purposes only.

Study Timeline

Key milestones and dates

First Submitted

Initial submission to the registry

May 31, 2025

Completed
Same day until next milestone

Study Start

First participant enrolled

May 31, 2025

Completed
10 days until next milestone

First Posted

Study publicly available on registry

June 10, 2025

Completed
20 days until next milestone

Primary Completion

Last participant's last visit for primary outcome

June 30, 2025

Completed
1 day until next milestone

Study Completion

Last participant's last visit for all outcomes

July 1, 2025

Completed
Last Updated

June 18, 2025

Status Verified

June 1, 2025

Enrollment Period

1 month

First QC Date

May 31, 2025

Last Update Submit

June 14, 2025

Conditions

Keywords

Artificial intelligence, total hip arthroplasty, diagnosis, treatment

Outcome Measures

Primary Outcomes (4)

  • Diagnostic correctness

    Proportion of fully correct diagnoses (score=2) by each rater. Scale 0 (worst outcome) - 2 (best outcome). 0: incorrect, 1: imprecise, 2: correct

    Immediate (post-case evaluation)

  • Diagnostic completeness

    Proportion of fully complete diagnoses (score=2). Scale 0 (worst outcome) - 2 (best outcome). 0: incomplete, 1: partially complete, 2: complete

    Immediate (post-case evaluation)

  • Treatment recommendation correctness

    Proportion of fully correct treatments (score=2) by each rater. Scale 0 (worst outcome) - 2 (best outcome). 0: incorrect, 1: imprecise, 2: correct

    Immediate (post-case evaluation)

  • Treatment recommendation completeness

    Proportion of fully complete treatments (score=2). Scale 0 (worst outcome) - 2 (best outcome). 0: incomplete, 1: partially complete, 2: complete

    Immediate (post-case evaluation)
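
Each of the four primary outcomes above is a proportion of score-2 results on the 0-2 scale. As a minimal sketch of that calculation, assuming scores are stored as (rater, case, score) tuples, the following may help; the function name, the data layout, and the sample scores are hypothetical and are not study data or the study's analysis code.

```python
from collections import defaultdict

# Hypothetical, illustrative scores only (NOT study data):
# (rater, case_id, score) with score on the 0-2 scale from the outcome measures.
scores = [
    ("GPT-4", 1, 2), ("GPT-4", 2, 1), ("GPT-4", 3, 2), ("GPT-4", 4, 0),
    ("Fellow", 1, 2), ("Fellow", 2, 2), ("Fellow", 3, 2), ("Fellow", 4, 1),
]


def proportion_fully_correct(records):
    """Return {rater: share of cases scored 2} for a single outcome."""
    totals = defaultdict(int)
    full = defaultdict(int)
    for rater, _case_id, score in records:
        if score not in (0, 1, 2):
            raise ValueError(f"score must be 0, 1, or 2, got {score!r}")
        totals[rater] += 1
        if score == 2:
            full[rater] += 1
    return {rater: full[rater] / totals[rater] for rater in totals}


result = proportion_fully_correct(scores)
print(result)
```

The same function would apply unchanged to each of the four outcomes, since all of them share the 0-2 scale and the score=2 threshold.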

Study Arms (1)

Failed or Painful Total Hip Arthroplasty Patients

Patients with documented failed/painful THA (aseptic loosening, infection, fracture, etc.) selected from a tertiary center database (2004-2024).

Other: GPT-4 Assessment
Other: Arthroplasty Fellow Assessment
Other: Specializing Resident (4th year) Assessment
Other: Junior Resident (3rd year) Assessment

Interventions

Diagnostic/Prognostic evaluation of any single case provided by AI (GPT-4). GPT-4 provides diagnosis/treatment recommendations via standardized prompts.

Failed or Painful Total Hip Arthroplasty Patients

Diagnostic/Prognostic evaluation of any single case provided by a human expert

Failed or Painful Total Hip Arthroplasty Patients

Diagnostic/Prognostic evaluation of any single case provided by a human expert

Failed or Painful Total Hip Arthroplasty Patients

Diagnostic/Prognostic evaluation of any single case provided by a human expert

Failed or Painful Total Hip Arthroplasty Patients

Eligibility Criteria

Age: 18 Years - 80 Years
Sex: All
Healthy Volunteers: No
Age Groups: Adult (18-64), Older Adult (65+)
Sampling Method: Probability Sample
Study Population

This study evaluated 20 anonymized cases of failed/painful total hip arthroplasty (2004-2024) from a tertiary center. Cases were selected for diagnostic clarity (e.g., aseptic loosening, periprosthetic fracture) and excluded if records were incomplete or ambiguous. Two senior surgeons verified case eligibility. Each case was assessed by GPT-4, an arthroplasty fellow, a 4th-year resident, and a 3rd-year resident for diagnostic/therapeutic accuracy.

You may qualify if:

  • Adults (≥18 and ≤80 years old).
  • Documented painful or failed total hip arthroplasty requiring clinical/radiological evaluation (2004-2024).
  • Complete pre-operative clinical history, imaging (X-ray/tomography), and surgical reports.
  • Clear diagnosis of failure mode (e.g., aseptic loosening, infection, fracture, wear).
  • Treatment and outcomes fully documented in the institutional database.
  • "Exemplary" cases with minimal diagnostic ambiguity (per Engh/Musculoskeletal Infection Society criteria, etc.).

You may not qualify if:

  • Total hip arthroplasty with no documented failure/pain (well-functioning implants).
  • Incomplete clinical/radiological records (e.g., missing pre-operative imaging or surgical notes).
  • Complex/multifactorial failures (e.g., concurrent infection + loosening + fracture).
  • Radiographs/images non-interpretable (poor quality, missing views).
  • Cases with conflicting diagnoses/treatments in original records.
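
The inclusion and exclusion criteria above can be read as a screening predicate over case records. The sketch below is one hedged way to express that, assuming a simple record layout; the `CaseRecord` class, its field names, and the example values are all hypothetical and do not describe the institutional database. The "exemplary case" criterion is left out because it depends on clinical judgment rather than on a recorded field.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CaseRecord:
    # All field names are hypothetical, chosen only for this illustration.
    age: int
    index_year: int              # year of the evaluated failure/pain episode
    failed_or_painful: bool      # documented failed or painful THA
    records_complete: bool       # history, imaging, and surgical reports present
    images_interpretable: bool   # adequate quality, no missing views
    failure_modes: tuple         # e.g. ("aseptic loosening",)
    conflicting_records: bool    # conflicting diagnoses/treatments on file


def is_eligible(case: CaseRecord) -> bool:
    """Apply the listed criteria; exactly one failure mode excludes multifactorial cases."""
    return (
        18 <= case.age <= 80
        and 2004 <= case.index_year <= 2024
        and case.failed_or_painful
        and case.records_complete
        and case.images_interpretable
        and len(case.failure_modes) == 1
        and not case.conflicting_records
    )


# Illustrative records only (not study data).
clear_case = CaseRecord(67, 2015, True, True, True, ("aseptic loosening",), False)
multifactorial = replace(clear_case, failure_modes=("infection", "loosening"))
print(is_eligible(clear_case), is_eligible(multifactorial))
```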

Contact the study team to confirm eligibility.

Sponsors & Collaborators

Study Sites (1)

SC Ortopedia e Traumatologia e Chirurgia Protesica e dei Reimpianti di Anca e Ginocchio, IRCCS Istituto Ortopedico Rizzoli

Bologna, 40136, Italy

RECRUITING

Related Publications (4)

  • Knee CJ, Campbell RJ, Graham DJ, Handford C, Symes M, Sivakumar BS. Examining the role of ChatGPT in the management of distal radius fractures: insights into its accuracy and consistency. ANZ J Surg. 2024 Jul-Aug;94(7-8):1391-1396. doi: 10.1111/ans.19143. Epub 2024 Jul 5.

    PMID: 38967407 (BACKGROUND)
  • Dagher T, Dwyer EP, Baker HP, Kalidoss S, Strelzow JA. "Dr. AI Will See You Now": How Do ChatGPT-4 Treatment Recommendations Align With Orthopaedic Clinical Practice Guidelines? Clin Orthop Relat Res. 2024 Dec 1;482(12):2098-2106. doi: 10.1097/CORR.0000000000003234. Epub 2024 Sep 6.

    PMID: 39246048 (BACKGROUND)
  • Artioli E, Veronesi F, Mazzotti A, Brogini S, Zielli SO, Giavaresi G, Faldini C. Assessing ChatGPT responses to common patient questions regarding total ankle arthroplasty. J Exp Orthop. 2024 Dec 31;12(1):e70138. doi: 10.1002/jeo2.70138. eCollection 2025 Jan.

    PMID: 39741912 (BACKGROUND)
  • Pagano S, Strumolo L, Michalk K, Schiegl J, Pulido LC, Reinhard J, Maderbacher G, Renkawitz T, Schuster M. Evaluating ChatGPT, Gemini and other Large Language Models (LLMs) in orthopaedic diagnostics: A prospective clinical study. Comput Struct Biotechnol J. 2024 Dec 26;28:9-15. doi: 10.1016/j.csbj.2024.12.013. eCollection 2025.

    PMID: 39850460 (BACKGROUND)

MeSH Terms

Conditions

Disease

Interventions

Restraint, Physical

Condition Hierarchy (Ancestors)

Pathologic Processes; Pathological Conditions, Signs and Symptoms

Intervention Hierarchy (Ancestors)

Behavior Control; Therapeutics; Immobilization; Investigative Techniques

Study Officials

  • Francesco Castagnini, MD

    IRCCS Istituto Ortopedico Rizzoli

    PRINCIPAL INVESTIGATOR

Central Study Contacts

Francesco Castagnini, MD

CONTACT

Study Design

Study Type
observational
Observational Model
COHORT
Time Perspective
RETROSPECTIVE
Sponsor Type
OTHER
Responsible Party
PRINCIPAL INVESTIGATOR
PI Title
Principal investigator

Study Record Dates

First Submitted

May 31, 2025

First Posted

June 10, 2025

Study Start

May 31, 2025

Primary Completion

June 30, 2025

Study Completion

July 1, 2025

Last Updated

June 18, 2025

Record last verified: 2025-06

Data Sharing

IPD Sharing
Will share

Rater evaluations on clinical cases; no direct patient data are included; all datasets are anonymized and do not allow identification.

Shared Documents
STUDY PROTOCOL, SAP, CSR
Time Frame
Immediately upon publication. Access limit: indefinitely
Access Criteria
With researchers, clinicians, or institutions upon request or directly via public repository.