Brief Summary

This study will evaluate how access to GPT-4, a large language model, affects performance on case-based diagnostic reasoning tasks compared with traditional diagnostic decision support tools.

Trial Health

On Track

Trial Health Score

Automated assessment based on enrollment pace, timeline, and geographic reach

Enrollment

participants targeted

Target at P25-P50 for not_applicable

Timeline

Completed

Started Nov 2023

Shorter than P25 for not_applicable

Geographic Reach

1 country

1 active site

Status

completed

Health score is calculated from publicly available data and should be used for screening purposes only.

Study Timeline

Key milestones and dates

1 month study duration

First Submitted

Initial submission to the registry

November 27, 2023

Completed

2 days until next milestone

Study Start

First participant enrolled

November 29, 2023

Completed

7 days until next milestone

First Posted

Study publicly available on registry

December 6, 2023

Completed

24 days until next milestone

Primary Completion

Last participant's last visit for primary outcome

December 30, 2023

Completed

Same day as next milestone

Study Completion

Last participant's last visit for all outcomes

December 30, 2023

Completed

Last Updated

February 20, 2024

Status Verified

February 1, 2024

Enrollment Period

1 month

First QC Date

November 27, 2023

Last Update Submit

February 15, 2024
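
The intervals between milestones can be checked directly from the dates listed above. The following is a minimal sketch using Python's standard `datetime` module; the names `MILESTONES` and `gaps_in_days` are illustrative and not part of the registry record.

```python
from datetime import date

# Milestone dates copied from the timeline above.
MILESTONES = [
    ("First Submitted", date(2023, 11, 27)),
    ("Study Start", date(2023, 11, 29)),
    ("First Posted", date(2023, 12, 6)),
    ("Primary Completion", date(2023, 12, 30)),
    ("Study Completion", date(2023, 12, 30)),
]


def gaps_in_days(milestones):
    """Return (from_label, to_label, days) for each consecutive pair."""
    return [
        (prev_label, next_label, (next_date - prev_date).days)
        for (prev_label, prev_date), (next_label, next_date)
        in zip(milestones, milestones[1:])
    ]


# Days from Study Start to Study Completion.
study_duration_days = (MILESTONES[4][1] - MILESTONES[1][1]).days

for prev_label, next_label, days in gaps_in_days(MILESTONES):
    print(f"{prev_label} -> {next_label}: {days} days")
print(f"Study Start -> Study Completion: {study_duration_days} days")
```

The computed gaps (2, 7, 24, and 0 days) agree with the "until next milestone" figures shown, and the 31 days between Study Start and Study Completion correspond to the stated 1 month study duration.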

Conditions

Diagnosis

Keywords

Computer-assisted diagnosis, Large language models, clinical reasoning

Outcome Measures

Primary Outcomes (1)

Diagnostic reasoning
The primary outcome will be the percent correct (range: 0 to 100) for each case. For each case, participants will be asked for their three top diagnoses and, for each diagnosis, the findings from the case that support it and the findings that oppose it. Participants will receive 1 point for each plausible diagnosis. Findings supporting the diagnosis and findings opposing the diagnosis will also be graded for correctness, with 1 point for a partially correct response and 2 points for a completely correct response. Participants will then be asked to name their top diagnosis, earning 1 point for a reasonable response and 2 points for the most correct response. Finally, participants will be asked to name up to 3 next steps to further evaluate the patient, with 1 point awarded for a partially correct response and 2 points for a completely correct response. The primary outcome will be compared at the case level between the randomized groups.
During evaluation
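
The rubric above can be sketched as a small scoring function. This is an illustration under stated assumptions, not the study's grading code: it assumes that supporting and opposing findings are graded once per diagnosis, that each of the up to 3 next steps is graded separately, and that percent correct is earned points divided by the maximum implied by that reading. The registry entry does not state a per-case maximum, and all names (`DiagnosisGrade`, `CaseGrade`, `percent_correct`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Assumed caps, read off the rubric text; the registry entry does not state
# a per-case maximum, so these are illustrative only.
MAX_DIAGNOSES = 3
MAX_NEXT_STEPS = 3


@dataclass
class DiagnosisGrade:
    plausible: int   # 0 or 1: was the diagnosis plausible?
    supporting: int  # 0, 1 (partially correct) or 2 (completely correct)
    opposing: int    # 0, 1 (partially correct) or 2 (completely correct)


@dataclass
class CaseGrade:
    diagnoses: List[DiagnosisGrade]  # up to MAX_DIAGNOSES entries
    top_diagnosis: int               # 0, 1 (reasonable) or 2 (most correct)
    next_steps: List[int]            # up to MAX_NEXT_STEPS entries, each 0-2


def _check(value: int, upper: int, label: str) -> int:
    """Validate that a grade lies in [0, upper] and return it."""
    if not 0 <= value <= upper:
        raise ValueError(f"{label} must be between 0 and {upper}, got {value}")
    return value


def max_points() -> int:
    """Maximum points per case under the assumed reading of the rubric."""
    per_diagnosis = 1 + 2 + 2  # plausible + supporting + opposing
    return MAX_DIAGNOSES * per_diagnosis + 2 + MAX_NEXT_STEPS * 2


def earned_points(case: CaseGrade) -> int:
    """Sum the points a participant earned on one case."""
    if len(case.diagnoses) > MAX_DIAGNOSES:
        raise ValueError("too many diagnoses")
    if len(case.next_steps) > MAX_NEXT_STEPS:
        raise ValueError("too many next steps")
    total = 0
    for d in case.diagnoses:
        total += _check(d.plausible, 1, "plausible")
        total += _check(d.supporting, 2, "supporting")
        total += _check(d.opposing, 2, "opposing")
    total += _check(case.top_diagnosis, 2, "top_diagnosis")
    total += sum(_check(s, 2, "next step") for s in case.next_steps)
    return total


def percent_correct(case: CaseGrade) -> float:
    """Percent correct (0 to 100) for one case."""
    return 100.0 * earned_points(case) / max_points()


# Hypothetical graded response, for illustration only.
example = CaseGrade(
    diagnoses=[
        DiagnosisGrade(plausible=1, supporting=2, opposing=1),
        DiagnosisGrade(plausible=1, supporting=1, opposing=0),
        DiagnosisGrade(plausible=0, supporting=0, opposing=0),
    ],
    top_diagnosis=2,
    next_steps=[2, 1],
)
print(f"{earned_points(example)} of {max_points()} points, "
      f"{percent_correct(example):.1f}% correct")
```

Under these assumptions the hypothetical response earns 11 of 23 points. If the study graded findings or next steps differently, only `max_points` and the per-item caps would need to change.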

Secondary Outcomes (1)

Time Spent on Diagnosis
During evaluation

Study Arms (2)

GPT-4

ACTIVE COMPARATOR

Group will be given access to GPT-4.

Other: GPT-4

Usual resources

NO INTERVENTION

Group will not be given access to GPT-4 but will be encouraged to use any resources they wish other than large language models (e.g., UpToDate, DynaMed, Google).

Interventions

GPT-4 (OTHER)

OpenAI's GPT-4 large language model with chat interface.

GPT-4

Eligibility Criteria

Sex: All

Healthy Volunteers: Yes

Age Groups: Child (0-17), Adult (18-64), Older Adult (65+)

You may qualify if:

Participants must be licensed physicians and have completed at least post-graduate year 2 (PGY2) of medical training.
Participants must have training in internal medicine, family medicine, or emergency medicine.

You may not qualify if:

Not currently practicing clinically.

Contact the study team to confirm eligibility.

Sponsors & Collaborators

Stanford University (lead)
Beth Israel Deaconess Medical Center (collaborator)
University of Minnesota (collaborator)

Study Sites (1)

Stanford University

Palo Alto, California, 94304, United States

Related Publications (1)

Goh E, Gallo R, Hom J, Strong E, Weng Y, Kerman H, Cool JA, Kanjee Z, Parsons AS, Ahuja N, Horvitz E, Yang D, Milstein A, Olson APJ, Rodman A, Chen JH. Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial. JAMA Netw Open. 2024 Oct 1;7(10):e2440969. doi: 10.1001/jamanetworkopen.2024.40969.
PMID: 39466245 (DERIVED)

MeSH Terms

Conditions

Disease

Condition Hierarchy (Ancestors)

Pathologic Processes; Pathological Conditions, Signs and Symptoms

Study Officials

Jonathan H Chen, MD, PhD
Stanford University
PRINCIPAL INVESTIGATOR
Adam Rodman, MD
Beth Israel Deaconess Medical Center
PRINCIPAL INVESTIGATOR
Andrew Olson, MD
University of Minnesota
PRINCIPAL INVESTIGATOR

Study Design

Study Type: interventional
Phase: not applicable
Allocation: RANDOMIZED
Masking: SINGLE
Who Masked: OUTCOMES ASSESSOR
Masking Details: The grading of responses will be performed by assessors blinded to participant identity and treatment assignment.
Purpose: DIAGNOSTIC
Intervention Model: PARALLEL
Sponsor Type: OTHER
Responsible Party: PRINCIPAL INVESTIGATOR
PI Title: Assistant Professor of Medicine

Study Record Dates

First Submitted

November 27, 2023

First Posted

December 6, 2023

Study Start

November 29, 2023

Primary Completion

December 30, 2023

Study Completion

December 30, 2023

Last Updated

February 20, 2024

Record last verified: 2024-02

Data Sharing

IPD Sharing: Will not share

Locations

US (1)
