Physician Response Evaluation With Contextual Insights vs. Standard Engines - Artificial Intelligence RAG vs LLM Clinical Decision Support
PRECISE
Comparing Clinical Key AI and GPT-4 for Diagnostic Reasoning and Management Decisions
- 1 other identifier
- Interventional
- 27 participants targeted
- 1 country
- 2 sites
Brief Summary
Clinical decision support tools powered by artificial intelligence are being rapidly integrated into medical practice. Two leading systems currently available to clinicians are OpenEvidence, which uses retrieval-augmented generation to access medical literature, and GPT-4, a large language model. While both tools show promise, their relative effectiveness in supporting clinical decision-making has not been directly compared. This study aims to evaluate how these tools influence diagnostic reasoning and management decisions among internal medicine physicians.
Trial Health Score
Automated assessment based on enrollment pace, timeline, and geographic reach
- Enrollment target (27 participants): below the 25th percentile for comparable (phase: not applicable) trials
- Started July 2025; planned duration shorter than the 25th percentile for comparable (phase: not applicable) trials
- 2 active sites
Health score is calculated from publicly available data and should be used for screening purposes only.
Study Timeline
Key milestones and dates
- First Submitted (Completed): Initial submission to the registry. June 17, 2025
- First Posted (Completed): Study publicly available on registry. June 26, 2025
- Study Start (Completed): First participant enrolled. July 3, 2025
- Primary Completion (Completed): Last participant's last visit for primary outcome. December 30, 2025
- Study Completion (Completed): Last participant's last visit for all outcomes. December 30, 2025
- Last Updated: April 9, 2026
Conditions
Outcome Measures
Primary Outcomes (1)
Clinical Reasoning Performance as determined by Rater Scores
Clinical reasoning performance will be evaluated from rater scores assigned to participants' responses to the administered case surveys. Six blinded, trained independent raters will score each participant's response using a validated scoring rubric. Scores range from 0-100%, with higher scores indicating better clinical reasoning performance. Results for each assessment will be summarized by study arm using descriptive statistics and analyzed with mixed-effects models to account for within-subject correlation and between-subject factors.
15 minutes upon completion of cases; up to approximately 90 minutes total
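The planned analysis (mixed-effects models with a random intercept per participant, and study arm as the between-subject factor) can be sketched roughly as below. This is a minimal illustration on simulated data, assuming a long-format table with `participant`, `arm`, `case`, and `score` columns; the study's actual dataset schema and model specification are not published.

```python
# Sketch of a mixed-effects analysis of rater scores (simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulate 27 participants, 2 arms, 4 scored cases each (scores 0-100).
rows = []
for pid in range(27):
    arm = "OpenEvidence" if pid % 2 == 0 else "ChatGPT"
    participant_effect = rng.normal(0, 5)  # source of within-subject correlation
    for case in range(4):
        base = 70 if arm == "OpenEvidence" else 68
        score = float(np.clip(base + participant_effect + rng.normal(0, 8), 0, 100))
        rows.append({"participant": pid, "arm": arm, "case": case, "score": score})
df = pd.DataFrame(rows)

# Random intercept per participant handles repeated measures;
# the fixed effect of `arm` is the between-subject comparison of interest.
model = smf.mixedlm("score ~ arm", df, groups=df["participant"])
result = model.fit()
print(result.summary())
```

The arm coefficient (`arm[T.OpenEvidence]`, with ChatGPT as the reference level) estimates the between-arm difference in mean score after accounting for participant-level clustering.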
Secondary Outcomes (2)
Time efficiency
Up to approximately 75 minutes
Decision confidence
15 minutes upon completion of cases; up to approximately 90 minutes total
Study Arms (2)
OpenEvidence
ACTIVE COMPARATOR: Participants in this arm will use OpenEvidence as their research tool
ChatGPT
ACTIVE COMPARATOR: Participants in this arm will use ChatGPT as their research tool
Interventions
OpenEvidence: A medical information platform that uses retrieval-augmented generation (RAG) to access medical literature.
ChatGPT: A chatbot application that uses GPT-4, a large language model, to engage in conversational interactions with users.
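As a rough illustration of the retrieval-augmented generation approach the first intervention refers to: relevant documents are retrieved for a query and prepended to the model's prompt, grounding the answer in source text. The toy corpus, bag-of-words scoring, and prompt template below are invented for illustration; OpenEvidence's actual retrieval pipeline and corpus are proprietary.

```python
# Toy retrieval-augmented generation (RAG) sketch: retrieve, then prompt.
from collections import Counter
import math

corpus = {
    "doc1": "Metformin is first-line therapy for type 2 diabetes mellitus.",
    "doc2": "Community-acquired pneumonia is often treated with amoxicillin.",
    "doc3": "GLP-1 receptor agonists reduce cardiovascular risk in diabetes.",
}

def tokenize(text):
    return [w.strip(".,?").lower() for w in text.split()]

def cosine(a, b):
    # Cosine similarity between two token lists via term counts.
    ca, cb = Counter(a), Counter(b)
    num = sum(ca[w] * cb[w] for w in set(ca) & set(cb))
    den = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return num / den if den else 0.0

def retrieve(query, k=2):
    # Rank documents by similarity to the query; return the top k texts.
    q = tokenize(query)
    ranked = sorted(corpus.values(), key=lambda t: cosine(q, tokenize(t)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # Prepend retrieved context so the model answers from cited sources.
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("What is first-line therapy for type 2 diabetes?"))
```

A production system would pass the assembled prompt to a language model and typically cite the retrieved sources; this sketch stops at prompt assembly.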
Eligibility Criteria
You may qualify if:
- Internal medicine residents
- Internal medicine attending physicians
Contact the study team to confirm eligibility.
Sponsors & Collaborators
Study Sites (2)
Harvard Beth Israel Deaconess Medical Center
Boston, Massachusetts, 02215, United States
Montefiore Medical Center
The Bronx, New York, 10467, United States
Related Publications (6)
- Cabral S, Restrepo D, Kanjee Z, Wilson P, Crowe B, Abdulnour RE, Rodman A. Clinical Reasoning of a Generative Artificial Intelligence Model Compared With Physicians. JAMA Intern Med. 2024 May 1;184(5):581-583. doi: 10.1001/jamainternmed.2024.0295. PMID: 38557971. [Background]
- Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, Scales N, Tanwani A, Cole-Lewis H, Pfohl S, Payne P, Seneviratne M, Gamble P, Kelly C, Babiker A, Scharli N, Chowdhery A, Mansfield P, Demner-Fushman D, Aguera Y Arcas B, Webster D, Corrado GS, Matias Y, Chou K, Gottweis J, Tomasev N, Liu Y, Rajkomar A, Barral J, Semturs C, Karthikesalingam A, Natarajan V. Large language models encode clinical knowledge. Nature. 2023 Aug;620(7972):172-180. doi: 10.1038/s41586-023-06291-2. Epub 2023 Jul 12. PMID: 37438534. [Background]
- Strong E, DiGiammarino A, Weng Y, Kumar A, Hosamani P, Hom J, Chen JH. Chatbot vs Medical Student Performance on Free-Response Clinical Reasoning Examinations. JAMA Intern Med. 2023 Sep 1;183(9):1028-1030. doi: 10.1001/jamainternmed.2023.2909. PMID: 37459090. [Background]
- Schaye V, Miller L, Kudlowitz D, Chun J, Burk-Rafel J, Cocks P, Guzman B, Aphinyanaphongs Y, Marin M. Development of a Clinical Reasoning Documentation Assessment Tool for Resident and Fellow Admission Notes: a Shared Mental Model for Feedback. J Gen Intern Med. 2022 Feb;37(3):507-512. doi: 10.1007/s11606-021-06805-6. Epub 2021 May 4. PMID: 33945113. [Background]
- Goh E, Gallo R, Hom J, Strong E, Weng Y, Kerman H, Cool JA, Kanjee Z, Parsons AS, Ahuja N, Horvitz E, Yang D, Milstein A, Olson APJ, Rodman A, Chen JH. Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial. JAMA Netw Open. 2024 Oct 1;7(10):e2440969. doi: 10.1001/jamanetworkopen.2024.40969. PMID: 39466245. [Background]
- Goh E, Gallo RJ, Strong E, Weng Y, Kerman H, Freed JA, Cool JA, Kanjee Z, Lane KP, Parsons AS, Ahuja N, Horvitz E, Yang D, Milstein A, Olson APJ, Hom J, Chen JH, Rodman A. GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. Nat Med. 2025 Apr;31(4):1233-1238. doi: 10.1038/s41591-024-03456-y. Epub 2025 Feb 5. PMID: 39910272. [Background]
Study Officials
- PRINCIPAL INVESTIGATOR
Shitij Arora, MD
Montefiore Medical Center
Study Design
- Study Type
- interventional
- Phase
- not applicable
- Allocation
- RANDOMIZED
- Masking
- SINGLE
- Who Masked
- OUTCOMES ASSESSOR
- Purpose
- OTHER
- Intervention Model
- PARALLEL
- Sponsor Type
- OTHER
- Responsible Party
- SPONSOR
Study Record Dates
First Submitted
June 17, 2025
First Posted
June 26, 2025
Study Start
July 3, 2025
Primary Completion
December 30, 2025
Study Completion
December 30, 2025
Last Updated
April 9, 2026
Record last verified: 2026-04
Data Sharing
- IPD Sharing
- Will not share