Automation Bias in Physician-LLM Diagnostic Reasoning
Trust or Verify? Automation Bias in Physician-LLM Diagnostic Reasoning
1 other identifier
Interventional
44 participants targeted
1 country
1 site
Brief Summary
This study aims to systematically measure the extent and patterns of automation bias among physicians when utilizing ChatGPT-4o in clinical decision-making.
Study Timeline
Key milestones and dates
First Submitted
Initial submission to the registry
April 23, 2025
Completed
First Posted
Study publicly available on registry
May 9, 2025
Completed
Study Start
First participant enrolled
June 20, 2025
Completed
Primary Completion
Last participant's last visit for primary outcome
August 15, 2025
Completed
Study Completion
Last participant's last visit for all outcomes
August 15, 2025
Completed
Last Updated
August 22, 2025
Outcome Measures
Primary Outcomes (1)
Diagnostic reasoning
The primary outcome will be the percent correct for each case, ranging from 0 to 100%, where higher scores indicate better diagnostic performance. For each case, participants will be asked for their three leading diagnoses, the findings that support each diagnosis, and the findings that oppose each diagnosis. Participants will receive 1 point for each plausible diagnosis. Supporting and opposing findings will also be graded for correctness, with 1 point for each correct response. Participants will then be asked to name the diagnosis they believe is most likely, earning 9 points for a reasonable response and 18 points for the most accurate response. Finally, participants will be asked to name up to 3 next steps to further evaluate the patient, with 0.5 points awarded for a partially correct response and 1 point for a completely correct response. The primary outcome will be compared at the case level between the randomized groups.
Assessed at a single time point for each case, during the scheduled diagnostic reasoning evaluation session, which takes place 0 to 4 days after participant enrollment.
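The point scheme described for the primary outcome can be sketched in code. The following is a minimal illustration only, not the study's grading tool: the class and function names are hypothetical, findings are assumed to be tallied as counts of correct responses per diagnosis, and the maximum attainable points per case (`max_points`) is not stated in the record, so it is left as a caller-supplied parameter.

```python
from dataclasses import dataclass
from typing import List

# Point values taken from the outcome description; the label names are hypothetical.
TOP_DIAGNOSIS_POINTS = {"most_accurate": 18.0, "reasonable": 9.0, "incorrect": 0.0}
NEXT_STEP_POINTS = {"correct": 1.0, "partial": 0.5, "incorrect": 0.0}


@dataclass
class DiagnosisGrade:
    plausible: bool          # 1 point if the listed diagnosis is plausible
    correct_supporting: int  # number of correct supporting findings (1 point each)
    correct_opposing: int    # number of correct opposing findings (1 point each)


@dataclass
class CaseGrade:
    diagnoses: List[DiagnosisGrade]  # up to three leading diagnoses
    top_diagnosis: str               # key of TOP_DIAGNOSIS_POINTS
    next_steps: List[str]            # up to three keys of NEXT_STEP_POINTS


def raw_points(case: CaseGrade) -> float:
    """Sum the points a grader awarded for one case."""
    if len(case.diagnoses) > 3 or len(case.next_steps) > 3:
        raise ValueError("at most three diagnoses and three next steps are graded")
    points = 0.0
    for d in case.diagnoses:
        points += 1.0 if d.plausible else 0.0
        points += d.correct_supporting + d.correct_opposing
    points += TOP_DIAGNOSIS_POINTS[case.top_diagnosis]
    points += sum(NEXT_STEP_POINTS[s] for s in case.next_steps)
    return points


def percent_correct(case: CaseGrade, max_points: float) -> float:
    """Convert raw points to a 0-100 score; max_points is an assumed input."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    return 100.0 * raw_points(case) / max_points
```

Any concrete `max_points` value passed to `percent_correct` is an assumption of the caller; the registry entry gives only the per-item point values.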
Secondary Outcomes (1)
Top choice diagnosis accuracy score
Assessed at a single time point for each case, during the scheduled diagnostic reasoning evaluation session, which takes place 0 to 4 days after participant enrollment.
Study Arms (2)
ChatGPT-4o Recommendations with Hallucinations
ACTIVE COMPARATOR
Participants will evaluate six clinical vignettes. During the trial, they will have access to clinical recommendations from a specific, commercially available LLM (ChatGPT-4o) in addition to conventional diagnostic resources. The LLM recommendations will contain deliberately flawed diagnostic information for three vignettes and accurate recommendations for the other three. The cases will be presented in random order.
ChatGPT-4o Recommendations without Hallucinations
NO INTERVENTION
Participants will evaluate the same six clinical vignettes as in the intervention arm. During the trial, they will have access to clinical recommendations from a specific, commercially available LLM (ChatGPT-4o) in addition to conventional diagnostic resources. However, the LLM-generated recommendations will not contain any deliberately introduced errors. The cases will be presented in random order.
Interventions
ChatGPT-4o's differential diagnoses of six clinical vignettes, three of which will contain deliberately introduced inaccurate information.
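The arm descriptions above imply a per-participant case schedule: six vignettes in random order, with three showing flawed LLM output in one arm and none in the other. A minimal sketch of such a schedule follows. It assumes the three flawed vignettes are fixed in advance, which the record does not state; the function name, arm labels, and vignette identifiers are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def build_case_schedule(
    arm: str,
    vignette_ids: Sequence[str],
    flawed_ids: Sequence[str],
    rng: random.Random,
) -> List[Tuple[str, bool]]:
    """Return (vignette_id, shows_flawed_output) pairs in random order.

    Assumption: the flawed vignettes are chosen ahead of time and passed in
    as flawed_ids; only the presentation order is randomized here.
    """
    if arm not in ("hallucinations", "no_hallucinations"):
        raise ValueError(f"unknown arm: {arm}")
    order = list(vignette_ids)
    rng.shuffle(order)  # random presentation order for this participant
    flawed = set(flawed_ids) if arm == "hallucinations" else set()
    return [(vid, vid in flawed) for vid in order]


# Hypothetical identifiers for the six vignettes and the three flawed ones.
VIGNETTES = ["v1", "v2", "v3", "v4", "v5", "v6"]
FLAWED = ["v2", "v4", "v5"]

schedule = build_case_schedule("hallucinations", VIGNETTES, FLAWED, random.Random(0))
```

Passing an explicit `random.Random` instance keeps each participant's order reproducible from a seed, which can help when auditing the allocation later.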
Eligibility Criteria
You may qualify if:
- Completed the Bachelor of Medicine, Bachelor of Surgery (MBBS) examination. The equivalent degree in the US and Canada is the Doctor of Medicine (MD).
- Fully or provisionally registered medical practitioners with the Pakistan Medical and Dental Council (PMDC).
- Participants must have completed a structured training program on the use of ChatGPT (or a comparable large language model), totaling at least 10 hours of instruction. The program must include hands-on practice with LLMs, specifically prompt engineering and content evaluation.
You may not qualify if:
- Any other medical practitioners registered (fully or provisionally) with the PMDC, e.g., holders of a Bachelor of Dental Surgery (BDS).
Contact the study team to confirm eligibility.
Study Sites (1)
Lahore University of Management Sciences
Lahore, Punjab Province, 54000, Pakistan
Study Officials
- PRINCIPAL INVESTIGATOR
Ihsan Ayyub Qazi, PhD
Lahore University of Management Sciences (LUMS)
- PRINCIPAL INVESTIGATOR
Ayesha Ali, PhD
Lahore University of Management Sciences (LUMS)
- PRINCIPAL INVESTIGATOR
Muhammad Asadullah Khawaja, MBBS
King Edward Medical University
- PRINCIPAL INVESTIGATOR
Ali Zafar Sheikh, MBBS
Lahore General Hospital
- PRINCIPAL INVESTIGATOR
Muhammad Junaid Akhtar, MBBS
Children's Hospital, Lahore
Study Design
- Study Type: interventional
- Phase: not applicable
- Allocation: RANDOMIZED
- Masking: SINGLE
- Who Masked: OUTCOMES ASSESSOR
- Masking Details: Single (Outcomes Assessor)
- Purpose: DIAGNOSTIC
- Intervention Model: PARALLEL
- Sponsor Type: OTHER
- Responsible Party: PRINCIPAL INVESTIGATOR
- PI Title: Full Professor, PhD
Study Record Dates
First Submitted
April 23, 2025
First Posted
May 9, 2025
Study Start
June 20, 2025
Primary Completion
August 15, 2025
Study Completion
August 15, 2025
Last Updated
August 22, 2025
Record last verified: 2025-08
Data Sharing
- IPD Sharing: Will not share