Brief Summary

Artificial intelligence (AI) shows promise in identifying abnormalities in clinical images. However, systematically biased AI models, which make inaccurate predictions for entire subpopulations, can lead to errors and potential harms. When clinicians are shown incorrect predictions from an AI model, their diagnostic accuracy can be harmed. This study evaluates the effectiveness of providing clinicians with image-based AI model explanations alongside AI model predictions, to help them better understand the logic behind a prediction. It will evaluate whether such explanations can improve diagnostic accuracy and help clinicians catch when models are making incorrect decisions. As a test case, the study will focus on the diagnosis of acute respiratory failure, because determining the underlying causes of acute respiratory failure is critically important for guiding treatment decisions but can be clinically challenging.

To determine whether providing AI explanations can improve clinician diagnostic accuracy and alleviate the potential impact of showing clinicians a systematically biased AI model, a randomized clinical vignette survey study will be conducted. During the survey, study participants will be shown clinical vignettes of patients hospitalized with acute respiratory failure, including the patient's presenting symptoms, physical exam, laboratory results, and chest X-ray. Study participants will then be asked to assess the likelihood that heart failure, pneumonia and/or Chronic Obstructive Pulmonary Disease (COPD) is the underlying diagnosis. During specific vignettes in the survey, participants will also be shown predictions from standard or systematically biased AI models that provide an estimate of the likelihood that heart failure, pneumonia and/or COPD is the underlying diagnosis. Clinicians will be randomized to see AI predictions alone or AI predictions with explanations when shown AI models.

This survey design will allow for testing the hypothesis that systematically biased models would harm clinician diagnostic accuracy, but commonly used image-based explanations would help clinicians partially recover their performance.

Trial Health

On Track

Trial Health Score

Automated assessment based on enrollment pace, timeline, and geographic reach

Enrollment

457

participants targeted

Target at P75+ for not_applicable

Timeline

Completed

Started Apr 2022

Shorter than P25 for not_applicable

Geographic Reach

1 country

1 active site

Status

completed

Health score is calculated from publicly available data and should be used for screening purposes only.

Study Timeline

Key milestones and dates

10 months study duration

Study Start

First participant enrolled

April 1, 2022

Completed

10 months until next milestone

Primary Completion

Last participant's last visit for primary outcome

January 31, 2023

Completed

Same day until next milestone

Study Completion

Last participant's last visit for all outcomes

January 31, 2023

Completed

9 months until next milestone

First Submitted

Initial submission to the registry

October 17, 2023

Completed

8 days until next milestone

First Posted

Study publicly available on registry

October 25, 2023

Completed

Last Updated

October 25, 2023

Status Verified

October 1, 2023

Enrollment Period

10 months

First QC Date

October 17, 2023

Last Update Submit

October 17, 2023

Conditions

Acute Respiratory Failure

Keywords

Artificial Intelligence, Diagnostic Accuracy, Computer Assisted Diagnosis, Biased Model

Outcome Measures

Primary Outcomes (1)

Participant diagnostic accuracy across clinical vignette settings
Diagnostic accuracy is defined as the number of correct diagnostic assessments over the total number of diagnostic assessments. After reviewing each patient clinical vignette within the survey, participants will be asked to make three separate diagnostic assessments, one each for heart failure, pneumonia, and COPD. If the participant's assessment agrees with the reference label for the vignette, the diagnostic assessment is considered correct. Diagnostic assessments will be performed while participants are completing the survey (day 0), immediately after the participant reviews the clinical vignette. Participant diagnostic accuracy will be compared across vignette settings (no AI model, standard AI model, standard AI model with explanation, biased AI model, biased AI model with explanation).
Day 0
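
As a minimal sketch of the accuracy definition above, the following Python computes correct assessments over total assessments. The `Assessment` structure and its field names are hypothetical, and the sketch assumes each assessment and reference label has already been reduced to a binary present/absent value; the record does not state how likelihood ratings are converted to a binary assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    vignette_id: int
    diagnosis: str          # "heart failure", "pneumonia", or "COPD"
    participant_says: bool  # hypothetical binary form of the participant's assessment
    reference_label: bool   # reference label for this vignette and diagnosis


def diagnostic_accuracy(assessments):
    """Number of correct assessments divided by total assessments."""
    if not assessments:
        raise ValueError("no assessments to score")
    correct = sum(a.participant_says == a.reference_label for a in assessments)
    return correct / len(assessments)


# One vignette yields three assessments, one per diagnosis.
example = [
    Assessment(1, "heart failure", True, True),   # correct
    Assessment(1, "pneumonia", False, True),      # incorrect
    Assessment(1, "COPD", False, False),          # correct
]
print(diagnostic_accuracy(example))  # 2 of 3 correct
```

Pooling all three assessments per vignette matches the primary outcome; filtering the list by `diagnosis` before scoring would give the diagnosis-specific accuracy listed among the secondary outcomes.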

Secondary Outcomes (2)

Treatment Selection Accuracy across clinical vignette settings
Day 0
Diagnosis specific diagnostic accuracy across clinical vignette settings
Day 0

Study Arms (6)

AI model biased for heart failure, no AI explanation

EXPERIMENTAL

Participants in this arm will be shown standard AI model predictions during 3 patient clinical vignettes within the survey and systematically biased AI model predictions during 3 clinical vignettes. When shown systematically biased AI model predictions, the model will be biased against heart failure, always predicting that heart failure is present with high likelihood in patients with a body mass index (BMI) at or above 30. Standard predictions will be shown for the other 2 diagnoses. Participants in this arm will not be shown an AI explanation when shown AI model predictions.

Other: Artificial Intelligence model predictions without explanation
Other: AI model biased against heart failure

AI model biased for pneumonia, no AI explanation

EXPERIMENTAL

Participants in this arm will be shown standard AI model predictions during 3 patient clinical vignettes within the survey and systematically biased AI model predictions during 3 clinical vignettes. When shown systematically biased AI model predictions, the model will be biased against pneumonia, always predicting that pneumonia is present with high likelihood in patients 80 years or older. Standard predictions will be shown for the other 2 diagnoses. Participants in this arm will not be shown an AI explanation when shown AI model predictions.

Other: Artificial Intelligence model predictions without explanation
Other: AI model biased against pneumonia

AI model biased for COPD, no AI explanation

EXPERIMENTAL

Participants in this arm will be shown standard AI model predictions during 3 patient clinical vignettes within the survey and systematically biased AI model predictions during 3 clinical vignettes. When shown systematically biased AI model predictions, the model will be biased against COPD, always predicting that COPD is present with high likelihood when a pre-processing filter was applied to the patient's X-ray. Standard predictions will be shown for the other 2 diagnoses. Participants in this arm will not be shown an AI explanation when shown AI model predictions.

Other: Artificial Intelligence model predictions without explanation
Other: AI model biased against COPD

AI model biased for heart failure, Image-based AI explanation presented

EXPERIMENTAL

Participants in this arm will be shown standard AI model predictions during 3 patient clinical vignettes within the survey and systematically biased AI model predictions during 3 clinical vignettes. When shown systematically biased AI model predictions, the model will be biased against heart failure, always predicting that heart failure is present with high likelihood in patients with a body mass index (BMI) at or above 30. Standard predictions will be shown for the other 2 diagnoses. Participants in this arm will also be shown an AI explanation when shown AI model predictions.

Other: Artificial intelligence model predictions with explanation
Other: AI model biased against heart failure

AI model biased for pneumonia, Image-based AI explanation presented

EXPERIMENTAL

Participants in this arm will be shown standard AI model predictions during 3 patient clinical vignettes within the survey and systematically biased AI model predictions during 3 clinical vignettes. When shown systematically biased AI model predictions, the model will be biased against pneumonia, always predicting that pneumonia is present with high likelihood in patients 80 years or older. Standard predictions will be shown for the other 2 diagnoses. Participants in this arm will also be shown an AI explanation when shown AI model predictions.

Other: Artificial intelligence model predictions with explanation
Other: AI model biased against pneumonia

AI model biased for COPD, Image-based AI explanation presented

EXPERIMENTAL

Participants in this arm will be shown standard AI model predictions during 3 patient clinical vignettes within the survey and systematically biased AI model predictions during 3 clinical vignettes. When shown systematically biased AI model predictions, the model will be biased against COPD, always predicting that COPD is present with high likelihood when a pre-processing filter was applied to the patient's X-ray. Standard predictions will be shown for the other 2 diagnoses. Participants in this arm will also be shown an AI explanation when shown AI model predictions.

Other: Artificial intelligence model predictions with explanation
Other: AI model biased against COPD
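
The six arms listed above are the combinations of three bias targets and two explanation conditions. The sketch below enumerates them and draws one arm per participant. The uniform random draw is an assumption made for illustration only; the record states that allocation is randomized but does not describe the scheme, and `assign_arm` is a hypothetical helper.

```python
import random
from itertools import product

BIAS_TARGETS = ["heart failure", "pneumonia", "COPD"]
EXPLANATION_SHOWN = [False, True]

# Each arm is a (bias target, explanation shown) pair: 3 x 2 = 6 arms.
ARMS = list(product(BIAS_TARGETS, EXPLANATION_SHOWN))


def assign_arm(rng):
    """Hypothetical simple randomization: pick one of the six arms uniformly."""
    return rng.choice(ARMS)


for target, explained in ARMS:
    label = "image-based AI explanation presented" if explained else "no AI explanation"
    print(f"AI model biased for {target}, {label}")

rng = random.Random(2022)  # seeded only so the illustration is reproducible
print(assign_arm(rng))
```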

Interventions

Artificial Intelligence model predictions without explanation (OTHER)

During 6 clinical vignettes, participants will see AI model predictions without a corresponding AI explanation. The AI model will provide a score for each diagnosis (heart failure, pneumonia, COPD) on a scale of 0-100 estimating how likely the patient's presentation is due to each of these diagnoses. In 3 of the clinical vignettes, participants will be shown standard AI model predictions, and in the other 3 vignettes they will be shown systematically biased AI model predictions, with the model specifically biased against one of the three diagnoses.

AI model biased for COPD, no AI explanation
AI model biased for heart failure, no AI explanation
AI model biased for pneumonia, no AI explanation

Artificial intelligence model predictions with explanation (OTHER)

During 6 clinical vignettes, participants will see AI model predictions with explanation. The AI model will provide a score for each diagnosis on a scale of 0-100. In 3 of the clinical vignettes, participants will be shown standard AI model predictions, and in the other 3 vignettes they will be shown systematically biased AI model predictions, with the model specifically biased against one of the three diagnoses. If the AI model provides a score above 50, an AI model explanation will be shown as gradient-weighted class activation mapping (Grad-CAM) heatmaps overlaid on the chest X-ray that highlight which regions of the image most affect the AI model's prediction.

AI model biased for COPD, Image-based AI explanation presented
AI model biased for heart failure, Image-based AI explanation presented
AI model biased for pneumonia, Image-based AI explanation presented
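
The display rule described above (an explanation is shown only when a score is above 50) can be sketched as a simple filter. This is an illustration under stated assumptions: the function name is hypothetical, and applying the threshold separately to each diagnosis score is an assumption, since the record does not say whether the rule is evaluated per diagnosis. Generating the Grad-CAM heatmap itself is out of scope here.

```python
EXPLANATION_THRESHOLD = 50  # from the record: explanation shown for a score above 50


def diagnoses_with_heatmap(scores, arm_shows_explanations):
    """Return the diagnoses for which a heatmap overlay would be displayed.

    scores: mapping of diagnosis name to the AI model's 0-100 score.
    arm_shows_explanations: False for the "no AI explanation" arms.
    """
    if not arm_shows_explanations:
        return []
    return [dx for dx, score in scores.items() if score > EXPLANATION_THRESHOLD]


scores = {"heart failure": 82, "pneumonia": 50, "COPD": 12}
print(diagnoses_with_heatmap(scores, arm_shows_explanations=True))   # ['heart failure']
print(diagnoses_with_heatmap(scores, arm_shows_explanations=False))  # []
```

A score of exactly 50 does not trigger an explanation in this sketch, because the record says "above 50".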

AI model biased against heart failure (OTHER)

In 3 clinical vignettes, participants will be shown systematically biased AI model predictions with the model specifically biased against heart failure, always predicting that heart failure is present with high likelihood in survey vignette patients with a body mass index (BMI) at or above 30. Standard predictions will be shown for the other 2 diagnoses (pneumonia, COPD).

AI model biased for heart failure, Image-based AI explanation presented
AI model biased for heart failure, no AI explanation

AI model biased against pneumonia (OTHER)

In 3 clinical vignettes, participants will be shown systematically biased AI model predictions with the model specifically biased against pneumonia, always predicting that pneumonia is present with high likelihood in survey vignette patients 80 years or older. Standard predictions will be shown for the other 2 diagnoses (heart failure, COPD).

AI model biased for pneumonia, Image-based AI explanation presented
AI model biased for pneumonia, no AI explanation

AI model biased against COPD (OTHER)

In 3 clinical vignettes, participants will be shown systematically biased AI model predictions with the model specifically biased against COPD, always predicting that COPD is present with high likelihood in survey vignette patients where a pre-processing filter was applied to the patient's X-ray. Standard predictions will be shown for the other 2 diagnoses (heart failure, pneumonia).

AI model biased for COPD, Image-based AI explanation presented
AI model biased for COPD, no AI explanation
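
The three bias rules described in the interventions above can be sketched as trigger conditions that override one diagnosis score while leaving the other two at their standard values. The `VignettePatient` fields, the `displayed_scores` helper, and the `HIGH_SCORE` value of 90 are hypothetical; the record says only that the biased model predicts the diagnosis "with high likelihood" and does not give a number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VignettePatient:
    bmi: float
    age_years: int
    xray_filter_applied: bool  # whether a pre-processing filter was applied to the X-ray


# Trigger conditions taken from the intervention descriptions.
BIAS_TRIGGERS = {
    "heart failure": lambda p: p.bmi >= 30,
    "pneumonia": lambda p: p.age_years >= 80,
    "COPD": lambda p: p.xray_filter_applied,
}

HIGH_SCORE = 90  # hypothetical stand-in for "high likelihood" on the 0-100 scale


def displayed_scores(standard_scores, patient, biased_against):
    """Return the scores shown to the participant in a biased vignette."""
    scores = dict(standard_scores)  # copy so the standard scores stay untouched
    if BIAS_TRIGGERS[biased_against](patient):
        scores[biased_against] = HIGH_SCORE
    return scores


patient = VignettePatient(bmi=34.0, age_years=62, xray_filter_applied=False)
standard = {"heart failure": 15, "pneumonia": 70, "COPD": 20}
print(displayed_scores(standard, patient, "heart failure"))
print(displayed_scores(standard, patient, "pneumonia"))
```

In this example the patient's BMI triggers the heart failure rule, so only that score changes; the pneumonia rule does not fire because the patient is younger than 80.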

Eligibility Criteria

Age: 18 Years+

Sex: All

Healthy Volunteers: No

Age Groups: Adult (18-64), Older Adult (65+)

You may qualify if:

Physicians, nurse practitioners, and physician assistants who care for patients with acute respiratory failure as part of their clinical practice

You may not qualify if:

Physicians, nurse practitioners, and physician assistants who provide patient care only in outpatient settings

Contact the study team to confirm eligibility.

Sponsors & Collaborators

University of Michigan (lead)
National Heart, Lung, and Blood Institute (NHLBI) (collaborator)

Study Sites (1)

University of Michigan

Ann Arbor, Michigan, 48103, United States

Related Publications (1)

Jabbour S, Fouhey D, Shepard S, Valley TS, Kazerooni EA, Banovic N, Wiens J, Sjoding MW. Measuring the Impact of AI in the Diagnosis of Hospitalized Patients: A Randomized Clinical Vignette Survey Study. JAMA. 2023 Dec 19;330(23):2275-2284. doi: 10.1001/jama.2023.22295.
PMID: 38112814 (DERIVED)

Study Officials

Michael Sjoding, MD
University of Michigan
PRINCIPAL INVESTIGATOR

Study Design

Study Type: interventional
Phase: not applicable
Allocation: RANDOMIZED
Masking: SINGLE
Who Masked: PARTICIPANT
Masking Details: Participants are not aware of what type of AI model predictions are shown during the clinical vignettes within the survey.
Purpose: OTHER
Intervention Model: PARALLEL
Sponsor Type: OTHER
Responsible Party: PRINCIPAL INVESTIGATOR
PI Title: Associate Professor of Internal Medicine

Study Record Dates

First Submitted

October 17, 2023

First Posted

October 25, 2023

Study Start

April 1, 2022

Primary Completion

January 31, 2023

Study Completion

January 31, 2023

Last Updated

October 25, 2023

Record last verified: 2023-10

Data Sharing

IPD Sharing: Will share
Shared Documents: STUDY PROTOCOL, SAP
Time Frame: Data will be shared indefinitely once the study is published
Access Criteria: This information will be published as supplements with the study manuscript.

Locations

US(1)
