Diagnostic Reasoning With Customized GPT-4 Model
Evaluating the Performance of LLMs and Clinicians in Complex Diagnostic Cases: A Randomized Controlled Trial
Interventional
70 participants
1 country
1 site
Brief Summary
This study will assess how immediate access to a customized version of GPT-4, a large language model, affects performance on case-based diagnostic reasoning tasks. It will compare this approach with a two-step process in which participants first work through each case using conventional diagnostic decision support tools and only then gain access to the customized GPT-4 model.
Study Timeline
Key milestones and dates
Study Start
First participant enrolled
December 16, 2024
Primary Completion
Last participant's last visit for primary outcome
January 24, 2025
Study Completion
Last participant's last visit for all outcomes
January 24, 2025
First Submitted
Initial submission to the registry
February 11, 2025
First Posted
Study publicly available on registry
April 4, 2025
Outcome Measures
Primary Outcomes (1)
Diagnostic reasoning
The primary outcome will be the percentage of correct responses per case (range: 0 to 100). For each case, participants will be asked to provide their top three differential diagnoses, along with supporting and opposing findings for each. They will receive 1 point for each plausible diagnosis. Supporting and opposing findings will be graded based on correctness, with 1 point for a partially correct response and 2 points for a completely correct response. Participants will then select their top diagnosis, earning 1 point for a reasonable choice and 2 points for the most accurate diagnosis. Finally, they will list up to three next steps for further patient evaluation, with 1 point awarded for a partially correct response and 2 points for a completely correct response. The primary outcome will be analyzed at the case level, comparing performance between the randomized study groups.
Through study completion, an average of 6 months
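The scoring rubric described for the primary outcome can be sketched as a small function. This is an illustrative sketch, not the study team's grading code: the class and function names are hypothetical, and two points are assumptions the record does not state. First, each of the up to three next steps is assumed to be graded separately on the 0/1/2 scale. Second, the percentage is assumed to be raw points divided by the maximum attainable with three differentials and three next steps (23 points under these assumptions).

```python
from dataclasses import dataclass
from typing import List

MAX_ITEMS = 3  # the rubric allows up to three differentials and up to three next steps


@dataclass
class Differential:
    plausible: bool   # 1 point if the diagnosis is plausible
    supporting: int   # 0 = incorrect, 1 = partially correct, 2 = completely correct
    opposing: int     # same 0/1/2 scale as supporting findings


@dataclass
class CaseResponse:
    differentials: List[Differential]
    top_diagnosis: int      # 0 = unreasonable, 1 = reasonable, 2 = most accurate
    next_steps: List[int]   # ASSUMPTION: each listed step is graded 0/1/2 on its own


def _check_grade(value: int) -> int:
    """Reject grades outside the 0/1/2 scale used by the rubric."""
    if value not in (0, 1, 2):
        raise ValueError(f"grade must be 0, 1, or 2, got {value!r}")
    return value


def case_score_percent(response: CaseResponse) -> float:
    """Return the case score as a percentage (0 to 100)."""
    if len(response.differentials) > MAX_ITEMS or len(response.next_steps) > MAX_ITEMS:
        raise ValueError("at most three differentials and three next steps are allowed")

    raw = 0
    for d in response.differentials:
        raw += 1 if d.plausible else 0
        raw += _check_grade(d.supporting) + _check_grade(d.opposing)
    raw += _check_grade(response.top_diagnosis)
    raw += sum(_check_grade(step) for step in response.next_steps)

    # ASSUMPTION: the denominator is the maximum attainable score:
    # 3 differentials x (1 + 2 + 2) + 2 for the top diagnosis + 3 next steps x 2 = 23.
    max_points = MAX_ITEMS * (1 + 2 + 2) + 2 + MAX_ITEMS * 2
    return 100.0 * raw / max_points


# Hypothetical graded response, for illustration only.
example = CaseResponse(
    differentials=[Differential(True, 2, 1)] * 3,
    top_diagnosis=2,
    next_steps=[2, 1, 0],
)
print(round(case_score_percent(example), 1))
```

Because the outcome is analyzed at the case level, a function like this would be applied once per participant per case, and the resulting percentages compared between the two randomized groups.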
Secondary Outcomes (5)
Time Spent Per Case
Through study completion, an average of 6 months
Prompt frequency
Through study completion, an average of 6 months
Sentiment
Through study completion, an average of 6 months
Participant Perceptions of AI in Clinical Reasoning
Through study completion, an average of 6 months
Customized GPT-4's diagnostic reasoning
Through study completion, an average of 6 months
Study Arms (2)
Immediate access to customized version of GPT-4
ACTIVE COMPARATOR
Group will be encouraged to immediately use a customized version of GPT-4.
Conventional resources first, then granted access to customized version of GPT-4.
ACTIVE COMPARATOR
Group will be encouraged to first use any resources they wish other than large language models (UpToDate, PubMed, Google, etc.) and will then be granted access to a customized version of GPT-4.
Interventions
Group is given immediate access to a customized version of GPT-4 to support their diagnostic reasoning for each case.
Group is first encouraged to reason through diagnostic cases with the support of conventional resources. After submitting their answers for a case, participants are given access to a customized version of GPT-4 and have the opportunity to change their initial answers.
Eligibility Criteria
You may qualify if:
- Participants must be licensed physicians and have completed at least post-graduate year 1 (PGY1) of medical training.
- Training in internal medicine, family medicine, or emergency medicine.
You may not qualify if:
- Not currently practicing clinically.
- Participated in one of our previous studies that used the same six diagnostic cases.
Contact the study team to confirm eligibility.
Sponsors & Collaborators
- Stanford University (lead)
- Beth Israel Deaconess Medical Center (collaborator)
Study Sites (1)
Stanford University
Palo Alto, California, 94305, United States
Study Officials
- PRINCIPAL INVESTIGATOR
Jonathan H Chen, MD, PhD
Stanford University
Study Design
- Study Type: Interventional
- Phase: Not applicable
- Allocation: RANDOMIZED
- Masking: SINGLE
- Who Masked: OUTCOMES ASSESSOR
- Masking Details: The grading of responses will be performed by assessors blinded to participant identity and treatment assignment.
- Purpose: DIAGNOSTIC
- Intervention Model: PARALLEL
- Sponsor Type: OTHER
- Responsible Party: PRINCIPAL INVESTIGATOR
- PI Title: Assistant Professor of Medicine (Biomedical Informatics) and of Biomedical Data Science
Study Record Dates
First Submitted
February 11, 2025
First Posted
April 4, 2025
Study Start
December 16, 2024
Primary Completion
January 24, 2025
Study Completion
January 24, 2025
Last Updated
April 4, 2025
Record last verified: 2025-03
Data Sharing
- IPD Sharing: Will not share