NCT07037940

Brief Summary

Clinical decision support tools powered by artificial intelligence are being rapidly integrated into medical practice. Two leading systems currently available to clinicians are OpenEvidence, which uses retrieval-augmented generation to access medical literature, and GPT-4, a large language model. While both tools show promise, their relative effectiveness in supporting clinical decision-making has not been directly compared. This study aims to evaluate how these tools influence diagnostic reasoning and management decisions among internal medicine physicians.

Trial Health

Trial Health Score: 87 (On Track)

Automated assessment based on enrollment pace, timeline, and geographic reach. The score is calculated from publicly available data and should be used for screening purposes only.

Enrollment: 27 participants targeted (target below the 25th percentile for comparable trials)

Timeline: Completed; started Jul 2025 (shorter than the 25th percentile for comparable trials)

Geographic Reach: 1 country, 2 active sites

Status: Completed


Study Timeline

Key milestones and dates (all completed):

First Submitted (initial submission to the registry): June 17, 2025
First Posted (study publicly available on registry, 9 days later): June 26, 2025
Study Start (first participant enrolled, 7 days later): July 3, 2025
Primary Completion (last participant's last visit for the primary outcome, 6 months later): December 30, 2025
Study Completion (last participant's last visit for all outcomes, same day): December 30, 2025
Last Updated: April 9, 2026
Status Verified: April 1, 2026
Enrollment Period: 6 months
First QC Date: June 17, 2025
Last Update Submitted: April 7, 2026

Conditions

Outcome Measures

Primary Outcomes (1)

  • Clinical Reasoning Performance as determined by Rater Scores

    Clinical reasoning performance will be evaluated from rater scores of participants' responses to the administered case surveys. Six blinded, trained, independent raters will score each participant's response using a validated scoring rubric. Scores range from 0 to 100%, with higher scores indicating better clinical reasoning performance. Results for each assessment will be summarized by study arm using descriptive statistics and analyzed with mixed-effects models to account for within-subject correlation and between-subject factors.

    15 minutes upon completion of cases, up to approximately 90 minutes total
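The planned analysis (descriptive summaries by arm, then a mixed-effects model with a per-participant random intercept to handle within-subject correlation) could be sketched as follows. This is an illustrative sketch only, using synthetic data, statsmodels as one possible fitting library, and hypothetical variable names; it is not the study's actual analysis code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: 27 participants, each scoring 3 cases (all names hypothetical)
rng = np.random.default_rng(42)
n_participants, n_cases = 27, 3
arm_assignment = rng.permutation(["OpenEvidence"] * 14 + ["GPT-4"] * 13)
df = pd.DataFrame({
    "participant": np.repeat(np.arange(n_participants), n_cases),
    "arm": np.repeat(arm_assignment, n_cases),
    # Rubric scores on the 0-100% scale described above
    "score": rng.normal(70, 10, n_participants * n_cases).clip(0, 100),
})

# Descriptive statistics by study arm
print(df.groupby("arm")["score"].agg(["mean", "std", "count"]))

# Mixed-effects model: fixed effect for arm (between-subject factor),
# random intercept per participant (within-subject correlation across cases)
model = smf.mixedlm("score ~ arm", df, groups=df["participant"])
result = model.fit()
print(result.summary())
```

The random intercept lets each physician have their own baseline skill level, so the arm comparison is not confounded by repeated measurements from the same participant.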

Secondary Outcomes (2)

  • Time efficiency

    Up to approximately 75 minutes

  • Decision confidence

    15 minutes upon completion of cases, up to approximately 90 minutes total

Study Arms (2)

OpenEvidence

ACTIVE COMPARATOR

Participants in this arm will use OpenEvidence as their research tool.

Other: OpenEvidence

ChatGPT

ACTIVE COMPARATOR

Participants in this arm will use ChatGPT as their research tool.

Other: GPT-4

Interventions

OpenEvidence

Medical information platform that uses retrieval-augmented generation to access medical literature.

GPT-4 (OTHER)

A chatbot application that uses GPT-4, a large language model, to engage in conversational interactions with users.

Also known as: ChatGPT

Eligibility Criteria

Age: 25 Years+
Sex: All
Healthy Volunteers: Yes
Age Groups: Adult (18-64), Older Adult (65+)

You may qualify if:

  • Internal medicine residents
  • Internal medicine attending physicians

Contact the study team to confirm eligibility.

Sponsors & Collaborators

Study Sites (2)

Harvard Beth Israel Deaconess Medical Center
Boston, Massachusetts, 02215, United States

Montefiore Medical Center
The Bronx, New York, 10467, United States

Related Publications (6)

  • Cabral S, Restrepo D, Kanjee Z, Wilson P, Crowe B, Abdulnour RE, Rodman A. Clinical Reasoning of a Generative Artificial Intelligence Model Compared With Physicians. JAMA Intern Med. 2024 May 1;184(5):581-583. doi: 10.1001/jamainternmed.2024.0295.

    PMID: 38557971 (Background)
  • Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, Scales N, Tanwani A, Cole-Lewis H, Pfohl S, Payne P, Seneviratne M, Gamble P, Kelly C, Babiker A, Schärli N, Chowdhery A, Mansfield P, Demner-Fushman D, Agüera y Arcas B, Webster D, Corrado GS, Matias Y, Chou K, Gottweis J, Tomasev N, Liu Y, Rajkomar A, Barral J, Semturs C, Karthikesalingam A, Natarajan V. Large language models encode clinical knowledge. Nature. 2023 Aug;620(7972):172-180. doi: 10.1038/s41586-023-06291-2. Epub 2023 Jul 12.

    PMID: 37438534 (Background)
  • Strong E, DiGiammarino A, Weng Y, Kumar A, Hosamani P, Hom J, Chen JH. Chatbot vs Medical Student Performance on Free-Response Clinical Reasoning Examinations. JAMA Intern Med. 2023 Sep 1;183(9):1028-1030. doi: 10.1001/jamainternmed.2023.2909.

    PMID: 37459090 (Background)
  • Schaye V, Miller L, Kudlowitz D, Chun J, Burk-Rafel J, Cocks P, Guzman B, Aphinyanaphongs Y, Marin M. Development of a Clinical Reasoning Documentation Assessment Tool for Resident and Fellow Admission Notes: a Shared Mental Model for Feedback. J Gen Intern Med. 2022 Feb;37(3):507-512. doi: 10.1007/s11606-021-06805-6. Epub 2021 May 4.

    PMID: 33945113 (Background)
  • Goh E, Gallo R, Hom J, Strong E, Weng Y, Kerman H, Cool JA, Kanjee Z, Parsons AS, Ahuja N, Horvitz E, Yang D, Milstein A, Olson APJ, Rodman A, Chen JH. Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial. JAMA Netw Open. 2024 Oct 1;7(10):e2440969. doi: 10.1001/jamanetworkopen.2024.40969.

    PMID: 39466245 (Background)
  • Goh E, Gallo RJ, Strong E, Weng Y, Kerman H, Freed JA, Cool JA, Kanjee Z, Lane KP, Parsons AS, Ahuja N, Horvitz E, Yang D, Milstein A, Olson APJ, Hom J, Chen JH, Rodman A. GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. Nat Med. 2025 Apr;31(4):1233-1238. doi: 10.1038/s41591-024-03456-y. Epub 2025 Feb 5.

    PMID: 39910272 (Background)

Study Officials

  • Shitij Arora, MD

    Montefiore Medical Center

    PRINCIPAL INVESTIGATOR

Study Design

Study Type: Interventional
Phase: Not Applicable
Allocation: Randomized
Masking: Single (Outcomes Assessor masked)
Purpose: Other
Intervention Model: Parallel
Sponsor Type: Other
Responsible Party: Sponsor


Data Sharing

IPD Sharing: Will not share
