Evaluation of AI Large Models for Diagnosis and Treatment in Real-World Cases: Multicenter Retrospective Study
1 other identifier
observational
800
1 country
1
Brief Summary
This multicenter retrospective study aims to evaluate the diagnostic and therapeutic performance of three large language models (ChatGPT, Gemini and DeepSeek) using 800 archived inpatient medical records from the urology departments of four tertiary hospitals. It focuses on the accuracy and applicability of these models in disease recognition, preliminary diagnosis and treatment recommendation generation, in order to explore their potential value and limitations in supporting clinical decision-making in real-world settings.
Trial Health
Trial Health Score
Automated assessment based on enrollment pace, timeline, and geographic reach
participants targeted
Target at P75+ for all trials
Started Jan 2026
Shorter than P25 for all trials
1 active site
Health score is calculated from publicly available data and should be used for screening purposes only.
Study Timeline
Key milestones and dates
First Submitted
Initial submission to the registry
December 9, 2025
Completed
Study Start
First participant enrolled
January 1, 2026
Completed
First Posted
Study publicly available on registry
January 30, 2026
Completed
Primary Completion
Last participant's last visit for primary outcome
April 1, 2026
Completed
Study Completion
Last participant's last visit for all outcomes
June 1, 2026
Completed
January 30, 2026
January 1, 2026
3 months
December 9, 2025
January 26, 2026
Conditions
Keywords
Outcome Measures
Primary Outcomes (6)
Diagnostic Accuracy: Assessed by Top-1 accuracy
Top-1: Proportion of cases where the model's first diagnosis matches the true primary diagnosis.
Through study completion, an average of 3 months
Diagnostic Accuracy: Assessed by Top-3 accuracy
Top-3: Proportion of cases where the true diagnosis appears in the model's top 3.
Through study completion, an average of 3 months
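As a rough illustration of the two accuracy definitions above, the following Python sketch computes Top-1 and Top-3 accuracy from ranked diagnosis lists. The registry entry does not say how a "match" is judged, so the case-insensitive string comparison here is an assumption, as are the function names and the example cases (which are invented for illustration, not study data).

```python
from typing import Sequence


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(label.lower().split())


def top_k_accuracy(
    predictions: Sequence[Sequence[str]], truths: Sequence[str], k: int
) -> float:
    """Share of cases whose true primary diagnosis is among the first k predictions."""
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not truths:
        raise ValueError("at least one case is required")
    hits = 0
    for ranked, truth in zip(predictions, truths):
        top_k = {normalize(d) for d in ranked[:k]}
        if normalize(truth) in top_k:
            hits += 1
    return hits / len(truths)


# Hypothetical example cases (assumption: exact-match adjudication).
predictions = [
    ["kidney stone", "urinary tract infection"],
    ["bladder cancer", "prostatitis", "benign prostatic hyperplasia"],
    ["renal cyst", "hydronephrosis", "ureteral stone"],
    ["prostatitis"],
]
truths = [
    "Kidney stone",
    "benign prostatic hyperplasia",
    "ureteral stone",
    "testicular torsion",
]

top1 = top_k_accuracy(predictions, truths, k=1)
top3 = top_k_accuracy(predictions, truths, k=3)
```

In practice, free-text diagnoses rarely match verbatim, so the comparison step would likely be replaced by coded diagnoses or expert adjudication.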
Diagnostic Completeness
Proportion of the model's diagnoses that overlap with all diagnoses (primary and secondary) in the case.
Through study completion, an average of 3 months
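The completeness definition above can be read in two ways: overlap as a share of the model's own list, or overlap as a share of the reference list. The sketch below reports both so the choice stays explicit. The function name, the exact-match comparison and the example diagnoses are all assumptions for illustration.

```python
from typing import Dict, Iterable


def diagnostic_overlap(
    model_dx: Iterable[str], reference_dx: Iterable[str]
) -> Dict[str, float]:
    """Overlap between model and reference diagnoses under two denominators."""
    model = {" ".join(d.lower().split()) for d in model_dx}
    reference = {" ".join(d.lower().split()) for d in reference_dx}
    matched = model & reference
    return {
        # Share of the model's diagnoses that appear in the reference list.
        "share_of_model_list": len(matched) / len(model) if model else 0.0,
        # Share of the reference diagnoses that the model recovered.
        "share_of_reference_list": len(matched) / len(reference) if reference else 0.0,
    }


# Hypothetical case: two of three model diagnoses match a four-item reference list.
result = diagnostic_overlap(
    ["ureteral stone", "hydronephrosis", "diabetes"],
    ["Ureteral stone", "hydronephrosis", "urinary tract infection", "hypertension"],
)
```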
Differential Diagnosis Quality
Evaluated by experts on a 5-point Likert scale, considering factors such as common disease coverage, logical clarity and specificity.
Through study completion, an average of 3 months
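Expert ratings on a 5-point Likert scale are commonly summarized per model with a mean and a median. The sketch below shows one such summary; the record does not state how ratings will be aggregated, so this is an assumption, and the model labels and scores are placeholders rather than study data.

```python
from statistics import fmean, median
from typing import Dict, List


def summarize_likert(scores_by_model: Dict[str, List[int]]) -> Dict[str, Dict[str, float]]:
    """Return mean, median and rating count for each model's 1-5 Likert scores."""
    summary = {}
    for model, scores in scores_by_model.items():
        if not scores:
            raise ValueError(f"no ratings for {model}")
        if any(s not in (1, 2, 3, 4, 5) for s in scores):
            raise ValueError(f"ratings for {model} must be integers from 1 to 5")
        summary[model] = {
            "mean": fmean(scores),
            "median": float(median(scores)),
            "n": len(scores),
        }
    return summary


# Placeholder ratings from hypothetical reviewers.
likert_summary = summarize_likert({
    "model_a": [4, 5, 3, 4],
    "model_b": [2, 3, 3, 2],
})
```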
Treatment Plan Quality
Assesses whether the model's treatment suggestions align with clinical guidelines, scored by experts on completeness, appropriateness, and safety.
Through study completion, an average of 3 months
Analysis Time
Time taken by the AI model to provide diagnoses and treatment suggestions (in seconds), reflecting real-time capability.
Through study completion, an average of 3 months
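One simple way to record analysis time in seconds is to wrap each model call with a monotonic clock. In this sketch, `fake_model` is a hypothetical stand-in for a real model client, and measuring wall-clock time around a single call is an assumption about how the outcome would be captured.

```python
import time
from typing import Callable, Tuple


def timed_call(model_fn: Callable[[str], str], prompt: str) -> Tuple[str, float]:
    """Run model_fn on prompt and return its output with the elapsed seconds."""
    start = time.perf_counter()
    output = model_fn(prompt)
    elapsed = time.perf_counter() - start
    return output, elapsed


def fake_model(prompt: str) -> str:
    """Hypothetical placeholder that simulates a short response delay."""
    time.sleep(0.05)
    return "placeholder output"


output, seconds = timed_call(fake_model, "de-identified case text")
```

For hosted models, the measured time would also include network latency and queueing, not only model computation.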
Interventions
De-identified inpatient medical records were retrospectively collected from the urology departments of four tertiary hospitals (200 cases per site, 800 in total). Each case included standardized clinical information such as demographics, chief complaint, history of present illness, past medical history, physical examination, laboratory and imaging findings, discharge diagnosis and treatment plan. To simulate the role of an AI system in a "first-visit physician" scenario, all diagnostic conclusions, differential diagnoses and treatment plans were removed before being input into the models. Three large language models (ChatGPT, Gemini and DeepSeek) were prompted with a standardized instruction: "Based on the above clinical information, provide your preliminary diagnosis, differential diagnoses and treatment recommendations." Each model generated outputs including (i) primary and secondary diagnoses, (ii) differential diagnosis lists with reasoning and (iii) preliminary treatment suggesti
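The record preparation described above (withholding conclusions, then appending the standardized instruction) could look roughly like the following sketch. The instruction text is quoted from the record; the field names, the flat dictionary layout and the example case are assumptions made for illustration.

```python
from typing import Dict

# Instruction quoted from the registry entry.
INSTRUCTION = (
    "Based on the above clinical information, provide your preliminary diagnosis, "
    "differential diagnoses and treatment recommendations."
)

# Hypothetical field names for the content withheld from the models.
WITHHELD_FIELDS = {"discharge_diagnosis", "differential_diagnoses", "treatment_plan"}


def build_prompt(record: Dict[str, str]) -> str:
    """Drop withheld fields, render the rest as labelled lines, append the instruction."""
    visible = {k: v for k, v in record.items() if k not in WITHHELD_FIELDS}
    lines = [
        f"{key.replace('_', ' ').capitalize()}: {value}"
        for key, value in visible.items()
    ]
    return "\n".join(lines) + "\n\n" + INSTRUCTION


# Invented example case, not study data.
example_record = {
    "demographics": "54-year-old male",
    "chief_complaint": "left flank pain for 2 days",
    "imaging_findings": "CT shows a 6 mm stone in the left ureter",
    "discharge_diagnosis": "left ureteral stone",
    "treatment_plan": "ureteroscopic lithotripsy",
}
prompt = build_prompt(example_record)
```

Filtering by field name only removes structured conclusions; diagnoses mentioned inside free-text fields would need separate handling.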
Eligibility Criteria
The study population was drawn from the following institutions: The First Affiliated Hospital of Fujian Medical University, The Second Affiliated Hospital of Fujian Medical University, Shishi City Hospital and Shaowu City Hospital.
You may qualify if:
- The case data is sourced from the four hospitals involved in the study, with complete and authentic diagnosis and treatment records.
- Patients must be 18 years or older, with no gender restrictions.
- Complete medical records, including the following core information: patient's basic information, present illness history, past medical history, physical examination, and auxiliary examinations (including laboratory and imaging tests).
- A clear discharge diagnosis and treatment plan (including therapeutic measures and follow-up arrangements).
- Medical records have been archived, with objective and accurate information that has not been altered.
- The patient or their legal representative has provided informed consent, agreeing to the use of their anonymized medical data for research analysis.
You may not qualify if:
- Medical records with significant missing information, such as key clinical details (present illness history, diagnostic or treatment records, etc.).
- Cases where the diagnosis or treatment plan is unclear, or where treatment has not been fully completed for an initial diagnosis.
- Cases where the primary diagnosis is not urological.
- Cases with major errors or inconsistencies in the records that could affect further assessment.
- Medical records in special formats or images that are not readable (e.g., handwritten notes, non-standard documentation).
- Patients who have not signed the informed consent form or who refuse to allow their medical data to be used for research.
Contact the study team to confirm eligibility.
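The machine-checkable parts of the criteria above can be expressed as a screening function that returns the reasons a record would be excluded. This is a minimal sketch: the `CaseRecord` fields and section names are hypothetical, and criteria such as record authenticity or "major errors or inconsistencies" would still need manual review.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical names for the core sections a complete record must contain.
REQUIRED_SECTIONS = (
    "basic_information",
    "present_illness_history",
    "past_medical_history",
    "physical_examination",
    "auxiliary_examinations",
)


@dataclass
class CaseRecord:
    age: int
    sections: Dict[str, str]
    discharge_diagnosis: str
    treatment_plan: str
    primary_diagnosis_is_urological: bool
    consent_given: bool
    machine_readable: bool = True


def exclusion_reasons(case: CaseRecord) -> List[str]:
    """Return the reasons a case fails screening; an empty list means it passes."""
    reasons = []
    if case.age < 18:
        reasons.append("under 18")
    missing = [s for s in REQUIRED_SECTIONS if not case.sections.get(s, "").strip()]
    if missing:
        reasons.append("missing sections: " + ", ".join(missing))
    if not case.discharge_diagnosis.strip() or not case.treatment_plan.strip():
        reasons.append("unclear discharge diagnosis or treatment plan")
    if not case.primary_diagnosis_is_urological:
        reasons.append("primary diagnosis not urological")
    if not case.machine_readable:
        reasons.append("record not machine-readable")
    if not case.consent_given:
        reasons.append("no informed consent")
    return reasons


# Invented example cases, not study data.
complete_sections = {name: "documented" for name in REQUIRED_SECTIONS}
eligible_case = CaseRecord(
    54, complete_sections, "ureteral stone", "ureteroscopic lithotripsy", True, True
)
excluded_case = CaseRecord(
    16, {"basic_information": "documented"}, "", "", False, False, machine_readable=False
)
```

Returning reasons rather than a single boolean makes it easier to report how many records were excluded under each criterion.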
Sponsors & Collaborators
Study Sites (1)
The First Affiliated Hospital of Fujian Medical University
Fuzhou, China
MeSH Terms
Conditions
Condition Hierarchy (Ancestors)
Central Study Contacts
Study Design
- Study Type: Observational
- Observational Model: Cohort
- Time Perspective: Retrospective
- Sponsor Type: Other
- Responsible Party: Sponsor
Study Record Dates
First Submitted
December 9, 2025
First Posted
January 30, 2026
Study Start
January 1, 2026
Primary Completion
April 1, 2026
Study Completion
June 1, 2026
Last Updated
January 30, 2026
Record last verified: 2026-01
Data Sharing
- IPD Sharing: Will not share