Brief Summary

The goal of this randomized controlled trial is to evaluate whether behavioral nudges can reduce automation bias, the uncritical acceptance of automated output, in physicians using large language models (LLMs) such as ChatGPT-5.1 for clinical decision-making. The main question it aims to answer is: Does a dual-mechanism behavioral nudge intervention (baseline accuracy anchoring plus case-specific color-coded confidence signals) reduce physicians' uncritical acceptance of incorrect LLM recommendations? Researchers will compare physicians who receive LLM recommendations along with a behavioral nudge to those who receive LLM recommendations without the nudge to assess whether the nudge reduces automation bias. Participants will:

Evaluate six clinical vignettes accompanied by LLM-generated recommendations (half containing deliberate, clinically significant errors).
Control group: Be able to view LLM recommendations in standard format without the nudge.
Treatment group: Be able to view ChatGPT's diagnostic accuracy on standard medical datasets as an initial anchor, then receive color-coded confidence signals alongside each recommendation (e.g., red for low confidence).
Have their responses evaluated by blinded reviewers using an expert-developed assessment rubric to detect uncritical acceptance of erroneous information.

Trial Health

On Track

Trial Health Score

Automated assessment based on enrollment pace, timeline, and geographic reach

Enrollment

participants targeted

Target at P50-P75 for trials with phase not applicable

Timeline

Completed

Started Jan 2026

Shorter than P25 for trials with phase not applicable

Geographic Reach

1 country

1 active site

Status

completed

Health score is calculated from publicly available data and should be used for screening purposes only.

Study Timeline

Key milestones and dates

4 months study duration

First Submitted

Initial submission to the registry

December 26, 2025

Completed

14 days until next milestone

First Posted

Study publicly available on registry

January 9, 2026

Completed

8 days until next milestone

Study Start

First participant enrolled

January 17, 2026

Completed

4 months until next milestone

Primary Completion

Last participant's last visit for primary outcome

May 25, 2026

Completed

1 day until next milestone

Study Completion

Last participant's last visit for all outcomes

May 26, 2026

Completed

Last Updated

June 25, 2026

Status Verified

June 1, 2026

Enrollment Period

4 months

First QC Date

December 26, 2025

Last Update Submit

June 24, 2026
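
The intervals quoted between milestones can be checked with simple date arithmetic. The sketch below uses only the dates listed in this record; the names `MILESTONES` and `gaps_in_days` are illustrative and not part of any registry tooling.

```python
from datetime import date

# Milestone dates copied from the Study Timeline above.
MILESTONES = [
    ("First Submitted", date(2025, 12, 26)),
    ("First Posted", date(2026, 1, 9)),
    ("Study Start", date(2026, 1, 17)),
    ("Primary Completion", date(2026, 5, 25)),
    ("Study Completion", date(2026, 5, 26)),
]


def gaps_in_days(milestones):
    """Return (from, to, days) for each pair of consecutive milestones."""
    return [
        (name_a, name_b, (day_b - day_a).days)
        for (name_a, day_a), (name_b, day_b) in zip(milestones, milestones[1:])
    ]


for start, end, days in gaps_in_days(MILESTONES):
    print(f"{start} -> {end}: {days} days")

# Study Start to Study Completion, the span summarized as "4 months".
study_duration_days = (MILESTONES[-1][1] - MILESTONES[2][1]).days
```

The 14-day, 8-day, and 1-day gaps match the figures shown in the timeline, and the 128 days between Study Start and Primary Completion correspond to the "4 months" entry.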

Conditions

Diagnosis

Keywords

clinical reasoning, automation bias, behavioral nudges, computer-assisted diagnosis, large language models

Outcome Measures

Primary Outcomes (1)

Diagnostic reasoning accuracy score
The primary outcome will be the percent correct for each case, ranging from 0 to 100%, where higher scores indicate better diagnostic performance. For each case, participants will be asked for their three leading diagnoses, the findings that support each diagnosis, and the findings that oppose each diagnosis. Participants will receive 1 point for each plausible diagnosis. Supporting and opposing findings will also be graded for correctness, with 1 point for each correct response. Participants will then be asked to name the diagnosis they believe is most likely, earning 9 points for a reasonable response and 18 points for the most accurate response. Finally, participants will be asked to name up to 3 next steps to further evaluate the patient, with 0.5 points awarded for a partially correct response and 1 point for a completely correct response. The primary outcome will be compared at the case level between the randomized groups.
Assessed at a single time point for each case, during the scheduled diagnostic reasoning evaluation session, which takes place between 0-5 days after participant enrollment.
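
The point scheme described for the primary outcome can be sketched as a small scoring function. This is a minimal illustration, not the study's grading tool: the record does not state the maximum achievable points per case, so `max_points` is passed in as an assumption, and the names `GradedCase`, `raw_points`, and `percent_correct` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Point values taken from the outcome description above.
FINAL_DIAGNOSIS_POINTS = {"most_accurate": 18.0, "reasonable": 9.0, "incorrect": 0.0}
NEXT_STEP_POINTS = {"complete": 1.0, "partial": 0.5, "incorrect": 0.0}


@dataclass
class GradedCase:
    plausible_diagnoses: int   # plausible diagnoses among the three listed (0-3)
    correct_supporting: int    # correct supporting findings, 1 point each
    correct_opposing: int      # correct opposing findings, 1 point each
    final_diagnosis: str       # key of FINAL_DIAGNOSIS_POINTS
    next_steps: List[str] = field(default_factory=list)  # up to 3 graded steps


def raw_points(case: GradedCase) -> float:
    """Sum the rubric points for one graded case."""
    if not 0 <= case.plausible_diagnoses <= 3:
        raise ValueError("participants list three leading diagnoses")
    if len(case.next_steps) > 3:
        raise ValueError("participants name up to 3 next steps")
    return (
        case.plausible_diagnoses
        + case.correct_supporting
        + case.correct_opposing
        + FINAL_DIAGNOSIS_POINTS[case.final_diagnosis]
        + sum(NEXT_STEP_POINTS[step] for step in case.next_steps)
    )


def percent_correct(case: GradedCase, max_points: float) -> float:
    """Percent correct for a case; max_points is an assumed per-case maximum."""
    return 100.0 * raw_points(case) / max_points


example = GradedCase(3, 2, 1, "reasonable", ["complete", "partial"])
```

For instance, `example` earns 3 + 2 + 1 + 9 + 1.5 = 16.5 raw points; under an assumed case maximum of 30 points, `percent_correct(example, 30)` gives 55%.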

Secondary Outcomes (1)

Top choice diagnosis accuracy score
Assessed at a single time point for each case, during the scheduled diagnostic reasoning evaluation session, which takes place between 0-5 days after participant enrollment.

Study Arms (2)

ChatGPT Recommendations alongside a Behavioral Nudge

ACTIVE COMPARATOR

Participants will evaluate six clinical vignettes. During the trial, they will have access to clinical recommendations from a specific, commercially available LLM (ChatGPT) in addition to conventional diagnostic resources. The LLM recommendations will contain deliberately flawed diagnostic information for three vignettes and accurate recommendations for the other three. The cases will be presented in random order. Participants in this arm will receive a behavioral nudge embedded in the LLM recommendations interface that presents two synchronized cognitive cues when the LLM panel is expanded: (1) an anchoring cue displaying ChatGPT's baseline diagnostic accuracy on standard medical datasets at the top of the panel to set realistic expectations before viewing the specific recommendation, and (2) a selective attention cue located immediately below, which shows the LLM recommendation alongside a case-specific color-coded confidence signal.

Other: Behavioral Nudge Intervention
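
The case presentation described for this arm (six vignettes, three with flawed recommendations, shown in random order) can be sketched as follows. The vignette identifiers and the choice of which three are flawed are placeholders, and the record does not describe how the randomization was actually implemented.

```python
import random

# Hypothetical vignette IDs mapped to whether the LLM recommendation is
# deliberately flawed (three flawed, three accurate, as described above).
VIGNETTES = {
    "case_1": True,
    "case_2": True,
    "case_3": True,
    "case_4": False,
    "case_5": False,
    "case_6": False,
}


def presentation_order(case_ids, seed=None):
    """Return the case IDs in a random order; seed makes the order reproducible."""
    rng = random.Random(seed)
    order = list(case_ids)
    rng.shuffle(order)
    return order


order = presentation_order(VIGNETTES, seed=1)
```

Each participant would see all six cases exactly once, so the 3:3 split between flawed and accurate recommendations is preserved regardless of the order.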

ChatGPT Recommendations without a Behavioral Nudge

NO INTERVENTION

Interventions

Behavioral Nudge Intervention (OTHER)

Participants in the treatment group will receive a behavioral nudge intervention embedded in the LLM recommendations interface that presents two synchronized cognitive cues when the LLM panel is expanded: (1) an anchoring cue displaying ChatGPT's baseline diagnostic accuracy on standard medical datasets at the top of the panel to set realistic expectations before viewing the specific recommendation, and (2) a selective attention cue located immediately below, which shows the LLM recommendation alongside a case-specific and color-coded confidence signal. This signal is categorized as red when the mean ensemble confidence falls below the established baseline accuracy, flagging high-uncertainty cases that demand critical evaluation; orange when confidence meets or exceeds the baseline but remains below 100%, intended to prevent complacency and maintain active clinical scrutiny; and green for a 100% ensemble consensus, though standard cautionary warnings still apply to guard against.
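
The three-color rule above can be written as a short classification function. This is a sketch under assumptions: confidence and baseline are expressed as fractions between 0 and 1, the way the mean ensemble confidence is computed is not specified in this record, and the name `confidence_signal` is hypothetical.

```python
def confidence_signal(mean_ensemble_confidence: float, baseline_accuracy: float) -> str:
    """Map a case's mean ensemble confidence to the color described above.

    Both arguments are assumed to be fractions in [0, 1].
    """
    for value in (mean_ensemble_confidence, baseline_accuracy):
        if not 0.0 <= value <= 1.0:
            raise ValueError("expected a fraction between 0 and 1")
    if mean_ensemble_confidence < baseline_accuracy:
        return "red"      # below baseline accuracy: high-uncertainty case
    if mean_ensemble_confidence < 1.0:
        return "orange"   # at or above baseline but short of full consensus
    return "green"        # 100% ensemble consensus
```

With an assumed baseline accuracy of 0.8, a confidence of 0.6 maps to red, 0.8 or 0.9 to orange, and 1.0 to green.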

ChatGPT Recommendations alongside a Behavioral Nudge

Eligibility Criteria

Sex: all

Healthy Volunteers: Yes

Age Groups: Child (0-17), Adult (18-64), Older Adult (65+)

You may qualify if:

Full or Provisionally Registered Medical Practitioners with the Pakistan Medical and Dental Council (PMDC).
Completed the Bachelor of Medicine, Bachelor of Surgery (MBBS) examination. The equivalent of the MBBS degree in the US and Canada is the Doctor of Medicine (MD).
Completed a structured training program on the use of ChatGPT (or a comparable large language model) totaling at least 10 hours of instruction. The program must include hands-on practice covering key aspects of LLM use, specifically prompt engineering and content evaluation.

You may not qualify if:

Any other Registered Medical Practitioners (Full or Provisional) with PMDC (e.g., professionals with Bachelor of Dental Surgery or BDS).
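
The inclusion and exclusion criteria above can be summarized as a simple screening check. This is an illustrative sketch only; the field names and the `is_eligible` function are hypothetical, and actual eligibility is confirmed by the study team.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    pmdc_registration: str         # "full", "provisional", or "none"
    degree: str                    # e.g. "MBBS" or "BDS"
    llm_training_hours: float      # structured LLM training completed
    hands_on_practice: bool        # covered prompt engineering and content evaluation


def is_eligible(applicant: Applicant) -> bool:
    """Apply the listed criteria: PMDC registration, MBBS, and LLM training."""
    registered = applicant.pmdc_registration in {"full", "provisional"}
    # Other PMDC-registered practitioners (e.g. BDS holders) are excluded.
    has_mbbs = applicant.degree == "MBBS"
    trained = applicant.llm_training_hours >= 10 and applicant.hands_on_practice
    return registered and has_mbbs and trained
```

Under this sketch, a provisionally registered MBBS graduate with 12 hours of hands-on LLM training passes the check, while a fully registered BDS holder with the same training does not.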

Contact the study team to confirm eligibility.

Sponsors & Collaborators

Lahore University of Management Sciences (lead)

Study Sites (1)

Lahore University of Management Sciences

Lahore, Punjab Province, 54792, Pakistan

MeSH Terms

Conditions

Disease

Condition Hierarchy (Ancestors)

Pathologic Processes; Pathological Conditions, Signs and Symptoms

Study Officials

Ihsan Ayyub Qazi, PhD
Lahore University of Management Sciences (LUMS)
PRINCIPAL INVESTIGATOR
Muhammad Hamad Alizai, PhD
Lahore University of Management Sciences (LUMS)
PRINCIPAL INVESTIGATOR
Muhammad Asadullah Khawaja, MBBS
King Edward Medical University
PRINCIPAL INVESTIGATOR
Ali Zafar Sheikh, MBBS
Lahore General Hospital
PRINCIPAL INVESTIGATOR
Muhammad Junaid Akhtar, MBBS
Children's Hospital, Lahore
PRINCIPAL INVESTIGATOR

Study Design

Study Type: interventional
Phase: not applicable
Allocation: RANDOMIZED
Masking: SINGLE
Who Masked: OUTCOMES ASSESSOR
Masking Details: Single (Outcomes Assessor)
Purpose: DIAGNOSTIC
Intervention Model: PARALLEL
Sponsor Type: OTHER
Responsible Party: PRINCIPAL INVESTIGATOR
PI Title: Full Professor, PhD

Study Record Dates

First Submitted

December 26, 2025

First Posted

January 9, 2026

Study Start

January 17, 2026

Primary Completion

May 25, 2026

Study Completion

May 26, 2026

Last Updated

June 25, 2026

Record last verified: 2026-06

Data Sharing

IPD Sharing: Will not share
