NCT06457269

Brief Summary

The clinical trial aims to evaluate multiple large language models (LLMs) in respiratory disease consultations by comparing their performance with that of human doctors across three major medical consultation scenarios. The main question it aims to answer is:

  • How do large language models perform in comparison to human doctors in diagnosing and consulting on respiratory diseases across various clinical scenarios?

In three clinical scenarios (the online query section, the disease diagnosis section, and the medical explanation section), research assistants or volunteers will cross-question all LLMs or real doctors using predefined online questions and their own health issues. After each questioning session, a short washout period is implemented to eliminate potential biases.

Trial Health

Trial Health Score: 87 (On Track)

Automated assessment based on enrollment pace, timeline, and geographic reach.

Enrollment: 703 participants targeted (target at P75+ for not_applicable)
Timeline: Completed (started Oct 2023)
Geographic Reach: 1 country, 1 active site
Status: Completed

Health score is calculated from publicly available data and should be used for screening purposes only.

Study Timeline

Key milestones and dates

Study Start: October 1, 2023 (first participant enrolled)
Primary Completion: December 12, 2023 (last participant's last visit for primary outcome)
First Submitted: June 4, 2024 (initial submission to the registry)
First Posted: June 13, 2024 (study publicly available on registry)
Study Completion: October 12, 2024 (last participant's last visit for all outcomes)
Last Updated: November 27, 2024
Status Verified: November 1, 2024
Enrollment Period: 2 months
First QC Date: June 4, 2024
Last Update Submit: November 24, 2024

Outcome Measures

Primary Outcomes (5)

  • Expert indicators-Accuracy

    Based on the doctors' responses to patients' questions, an expert panel will score accuracy on a 5-point scale:

    5 - The responses are completely accurate, addressing all of the patient's questions or diagnosing by identifying the key points of the patient's complaint.
    4 - The responses are mostly accurate, generally addressing the patient's questions or diagnosing by identifying the key points of the patient's complaint.
    3 - The responses are moderately accurate, addressing the patient's questions or diagnosing by identifying the key points of the patient's complaint.
    2 - The responses are rarely accurate, barely addressing the patient's questions or diagnosing by identifying the key points of the patient's complaint.
    1 - The responses are very inaccurate, not addressing the patient's questions or diagnosing by identifying the key points of the patient's complaint at all.

    Each participant is given a maximum participation time of one week, starting from the day of the randomized conversation. Subjective expert indicators will be evaluated within two months.

  • Expert indicators-Comprehensiveness

    Based on the doctors' responses to patients' questions, an expert panel will score comprehensiveness on a 5-point scale:

    5 - The responses are highly comprehensive, addressing various aspects of potential diseases corresponding to the patient's symptoms, providing detailed advice, and offering their own extended interpretations.
    4 - The responses are mostly comprehensive, covering most aspects of potential common diseases related to the patient's symptoms and providing fairly detailed advice.
    3 - The responses are moderately comprehensive, addressing some aspects of potential common diseases related to the patient's symptoms and offering basic advice.
    2 - The responses are rarely comprehensive, failing to consider various aspects of potential common diseases related to the patient's symptoms and providing very limited advice.
    1 - The responses are not comprehensive at all, overlooking most potential diseases related to the patient's symptoms and failing to provide any advice.

    Each participant is given a maximum participation time of one week, starting from the day of the randomized conversation. Subjective expert indicators will be evaluated within two months.

  • Expert indicators-Correctness

    Based on the doctors' responses to patients' questions, an expert panel will score correctness on a 5-point scale:

    5 - The responses are completely correct, with no inappropriate or ambiguous statements.
    4 - The responses are mostly correct, with most statements being appropriate and unambiguous.
    3 - The responses are generally correct; some statements are inappropriate or ambiguous, but they are acceptable.
    2 - The responses are partially correct, with few statements being appropriate or unambiguous.
    1 - The responses are completely incorrect, with nearly all statements being inappropriate and full of ambiguities.

    Each participant is given a maximum participation time of one week, starting from the day of the randomized conversation. Subjective expert indicators will be evaluated within two months.

  • Expert indicators-Ethical compliance

    Based on the doctor's response to the patient's question, an expert panel will review each item against the Declaration of Helsinki and the International Code of Medical Ethics to determine whether any responses or suggestions could potentially harm the patient or violate ethical guidelines. The findings will be recorded as a binary variable:

    True - The responses are completely ethical.
    False - Where uncertainties exist, the response includes suggestions for the use of controlled medications and some inappropriate or even counterproductive advice.

    Each participant is given a maximum participation time of one week, starting from the day of the randomized conversation. Subjective expert indicators will be evaluated within two months.

  • Empathy indicators

    Results from the CARE scale concerning the doctor-patient relationship, completed by patients after each diagnostic session. The CARE scale is not applied in the online query section.

    Each participant is given a maximum participation time of one week, starting from the day of the randomized conversation. Subjective empathy indicators will be evaluated within two months.

Secondary Outcomes (7)

  • Regular indicators-Total number of questions

    Each participant is given a maximum participation time of one week, starting from the day of the randomized conversation. After the dialogues are completed, the system will automatically summarize all objective indicators and dialogue information.

  • Regular indicators-Follow-up words

    Each participant is given a maximum participation time of one week, starting from the day of the randomized conversation. After the dialogues are completed, the system will automatically summarize all objective indicators and dialogue information.

  • Regular indicators-Total number of conversations

    Each participant is given a maximum participation time of one week, starting from the day of the randomized conversation. After the dialogues are completed, the system will automatically summarize all objective indicators and dialogue information.

  • Regular indicators-Total conversation cost ($)

    Each participant is given a maximum participation time of one week, starting from the day of the randomized conversation. After the dialogues are completed, the system will automatically summarize all objective indicators and dialogue information.

  • Regular indicators-Total conversation time (min)

    Each participant is given a maximum participation time of one week, starting from the day of the randomized conversation. After the dialogues are completed, the system will automatically summarize all objective indicators and dialogue information.

  • +2 more secondary outcomes

Study Arms (2)

Cross-comparison group (the disease diagnosis section)

OTHER

Cross-comparison group (including human doctor controls and all LLMs)

  • Diagnostic Test: Diagnosis by three human doctors
  • Diagnostic Test: Diagnosis by ChatGPT-3.5 (with search capabilities)
  • Diagnostic Test: Diagnosis by ChatGPT-3.5 (without search capabilities)
  • Diagnostic Test: Diagnosis by ChatGPT-4.0 (with search capabilities)
  • Diagnostic Test: Diagnosis by ChatGPT-4.0 (without search capabilities)
  • Diagnostic Test: Diagnosis by Claude instant (with search capabilities)
  • Diagnostic Test: Diagnosis by Claude instant (without search capabilities)
  • Diagnostic Test: Diagnosis by Claude 2 (with search capabilities)
  • Diagnostic Test: Diagnosis by Claude 2 (without search capabilities)
  • Diagnostic Test: Diagnosis by Gemini Pro (with search capabilities)
  • Diagnostic Test: Diagnosis by Gemini Pro (without search capabilities)

Cross-comparison group (the medical explanation section)

OTHER

Cross-comparison group (including human doctor controls and all LLMs)

  • Diagnostic Test: Diagnosis by three human doctors
  • Diagnostic Test: Diagnosis by ChatGPT-3.5 (with search capabilities)
  • Diagnostic Test: Diagnosis by ChatGPT-3.5 (without search capabilities)
  • Diagnostic Test: Diagnosis by ChatGPT-4.0 (with search capabilities)
  • Diagnostic Test: Diagnosis by ChatGPT-4.0 (without search capabilities)
  • Diagnostic Test: Diagnosis by Claude instant (with search capabilities)
  • Diagnostic Test: Diagnosis by Claude instant (without search capabilities)
  • Diagnostic Test: Diagnosis by Claude 2 (with search capabilities)
  • Diagnostic Test: Diagnosis by Claude 2 (without search capabilities)
  • Diagnostic Test: Diagnosis by Gemini Pro (with search capabilities)
  • Diagnostic Test: Diagnosis by Gemini Pro (without search capabilities)

Interventions

In this intervention, different human doctors answer patient inquiries. The system randomly assigns each patient to three doctors from different provinces in China, selected from the doctor database. All doctors come from online consultation platforms in China, and their diagnostic qualifications and medical licenses have been strictly verified.

Cross-comparison group (the disease diagnosis section)
Cross-comparison group (the medical explanation section)
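
The random assignment of three doctors from different provinces could be sketched roughly as follows. This is an illustration only, not the study's actual system: the function name `assign_doctors`, the `(doctor_id, province)` record layout, and the example pool are all hypothetical, and the record does not say how the real system samples.

```python
import random
from collections import defaultdict


def assign_doctors(doctors, k=3, rng=None):
    """Pick k doctors so that no two come from the same province.

    doctors: list of (doctor_id, province) tuples (hypothetical schema).
    rng: optional random.Random instance, useful for reproducible draws.
    """
    rng = rng or random.Random()

    # Group the doctor pool by province.
    by_province = defaultdict(list)
    for doctor_id, province in doctors:
        by_province[province].append(doctor_id)

    if len(by_province) < k:
        raise ValueError("not enough distinct provinces in the doctor pool")

    # Draw k distinct provinces, then one doctor within each province.
    provinces = rng.sample(sorted(by_province), k)
    return [(rng.choice(by_province[p]), p) for p in provinces]


if __name__ == "__main__":
    # Hypothetical pool; identifiers are placeholders, not study data.
    pool = [
        ("doc-01", "Sichuan"),
        ("doc-02", "Sichuan"),
        ("doc-03", "Hubei"),
        ("doc-04", "Guangdong"),
        ("doc-05", "Yunnan"),
    ]
    print(assign_doctors(pool, rng=random.Random(2023)))
```

Sampling provinces first guarantees the "different provinces" constraint by construction, at the cost of not weighting every doctor equally when provinces differ in size.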

In this intervention, ChatGPT-3.5 with search capabilities answers patient inquiries. Before any questions are answered, the chat history from the previous patient is cleared and the predetermined initialization statement is entered.

Cross-comparison group (the disease diagnosis section)
Cross-comparison group (the medical explanation section)

In this intervention, ChatGPT-3.5 without search capabilities answers patient inquiries. Before any questions are answered, the chat history from the previous patient is cleared and the predetermined initialization statement is entered.

Cross-comparison group (the disease diagnosis section)
Cross-comparison group (the medical explanation section)

In this intervention, ChatGPT-4.0 with search capabilities answers patient inquiries. Before any questions are answered, the chat history from the previous patient is cleared and the predetermined initialization statement is entered.

Cross-comparison group (the disease diagnosis section)
Cross-comparison group (the medical explanation section)

In this intervention, ChatGPT-4.0 without search capabilities answers patient inquiries. Before any questions are answered, the chat history from the previous patient is cleared and the predetermined initialization statement is entered.

Cross-comparison group (the disease diagnosis section)
Cross-comparison group (the medical explanation section)

In this intervention, Claude instant with search capabilities answers patient inquiries. Before any questions are answered, the chat history from the previous patient is cleared and the predetermined initialization statement is entered.

Cross-comparison group (the disease diagnosis section)
Cross-comparison group (the medical explanation section)

In this intervention, Claude instant without search capabilities answers patient inquiries. Before any questions are answered, the chat history from the previous patient is cleared and the predetermined initialization statement is entered.

Cross-comparison group (the disease diagnosis section)
Cross-comparison group (the medical explanation section)

In this intervention, Claude 2 with search capabilities answers patient inquiries. Before any questions are answered, the chat history from the previous patient is cleared and the predetermined initialization statement is entered.

Cross-comparison group (the disease diagnosis section)
Cross-comparison group (the medical explanation section)

In this intervention, Claude 2 without search capabilities answers patient inquiries. Before any questions are answered, the chat history from the previous patient is cleared and the predetermined initialization statement is entered.

Cross-comparison group (the disease diagnosis section)
Cross-comparison group (the medical explanation section)

In this intervention, Gemini Pro with search capabilities answers patient inquiries. Before any questions are answered, the chat history from the previous patient is cleared and the predetermined initialization statement is entered.

Cross-comparison group (the disease diagnosis section)
Cross-comparison group (the medical explanation section)

In this intervention, Gemini Pro without search capabilities answers patient inquiries. Before any questions are answered, the chat history from the previous patient is cleared and the predetermined initialization statement is entered.

Cross-comparison group (the disease diagnosis section)
Cross-comparison group (the medical explanation section)
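
The per-patient reset described for each LLM intervention (clear the previous patient's chat history, then enter the predetermined initialization statement) might be modeled as in the minimal sketch below. The class `ConsultationSession`, its method names, and the placeholder statement are assumptions made for illustration; the record does not publish the actual initialization statement or the tooling used, and no real model API is called here.

```python
class ConsultationSession:
    """Tracks the message history sent to one LLM (hypothetical helper)."""

    def __init__(self, init_statement):
        # The real initialization statement is not given in the record.
        self.init_statement = init_statement
        self.history = []

    def start_new_patient(self):
        """Clear the previous patient's history, then re-enter the init statement."""
        self.history.clear()
        self.history.append(("init", self.init_statement))

    def ask(self, question):
        """Record a patient question and return the context the model would see."""
        if not self.history:
            raise RuntimeError("call start_new_patient() before asking questions")
        self.history.append(("patient", question))
        return list(self.history)


if __name__ == "__main__":
    session = ConsultationSession("<predetermined initialization statement>")

    session.start_new_patient()
    session.ask("I have had a cough and chest tightness for a week.")

    # Next patient: nothing from the previous dialogue carries over.
    session.start_new_patient()
    context = session.ask("What does wheezing at night suggest?")
    print(len(context))
```

Keeping the reset in a single method makes it harder to start a new patient with stale context, which is the carry-over the protocol is written to prevent.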

Eligibility Criteria

Sex: All
Healthy Volunteers: No
Age Groups: Child (0-17), Adult (18-64), Older Adult (65+)

You may qualify if:

  • Self-reported symptoms of common respiratory diseases, such as cough, chest tightness, fever, and wheezing
  • Ability to engage in LLM dialog operations independently or with minimal peer training
  • A health status deemed suitable for study participation by the pulmonology experts

You may not qualify if:

  • Excessively poor health status

Contact the study team to confirm eligibility.

Study Sites (1)

The Affiliated Hospital of North Sichuan Medical College

Nanchong, Sichuan, 637000, China

Related Publications (1)

  • Mercer SW, Maxwell M, Heaney D, Watt GC. The consultation and relational empathy (CARE) measure: development and preliminary validation and reliability of an empathy-based consultation process measure. Fam Pract. 2004 Dec;21(6):699-705. doi: 10.1093/fampra/cmh621. Epub 2004 Nov 4.

    PMID: 15528286 (BACKGROUND)

MeSH Terms

Conditions

Rhinitis, Allergic, Seasonal; Asthma; Pulmonary Embolism; Pneumonia; Tuberculosis; Bronchitis; Pulmonary Fibrosis; Lung Neoplasms; Bronchiectasis

Condition Hierarchy (Ancestors)

Rhinitis, Allergic; Rhinitis; Nose Diseases; Respiratory Tract Diseases; Respiratory Hypersensitivity; Otorhinolaryngologic Diseases; Hypersensitivity, Immediate; Hypersensitivity; Immune System Diseases; Bronchial Diseases; Lung Diseases, Obstructive; Lung Diseases; Embolism; Embolism and Thrombosis; Vascular Diseases; Cardiovascular Diseases; Respiratory Tract Infections; Infections; Mycobacterium Infections; Actinomycetales Infections; Gram-Positive Bacterial Infections; Bacterial Infections; Bacterial Infections and Mycoses; Lung Diseases, Interstitial; Fibrosis; Pathologic Processes; Pathological Conditions, Signs and Symptoms; Respiratory Tract Neoplasms; Thoracic Neoplasms; Neoplasms by Site; Neoplasms

Study Officials

  • Jiebin Xie, Doctor

    North Sichuan Medical College

    PRINCIPAL INVESTIGATOR

Study Design

Study Type: Interventional
Phase: Not applicable
Allocation: RANDOMIZED
Masking: QUADRUPLE
Who Masked: PARTICIPANT, CARE PROVIDER, INVESTIGATOR, OUTCOMES ASSESSOR
Purpose: DIAGNOSTIC
Intervention Model: CROSSOVER
Sponsor Type: OTHER
Responsible Party: PRINCIPAL INVESTIGATOR
PI Title: Principal Investigator

Study Record Dates

Record last verified: 2024-11

Locations