NCT07626060 | Not Yet Recruiting

Diagnostic Accuracy of GPT-4o and Claude for HEART Score Calculation in Chest Pain

LLM-HEART

Diagnostic Accuracy of Large Language Models (GPT-4o and Claude) in HEART Score Calculation and 30-Day MACE Prediction in Emergency Department Chest Pain Patients: A Prospective Observational Validation Study Against Three-Expert Consensus

Marmara University Pendik Training and Research Hospital ClinicalTrials.gov

1 other identifier

09.2026.26-0150

Study Type

observational

Target

690

Locations

1 country

Sites

Timeline

Registered: Jun 2026

Started: Jun 2026

Completion: Jun 2027

Brief Summary

This prospective observational diagnostic accuracy study evaluates whether two large language models (LLMs), GPT-4o (OpenAI, gpt-4o-2024-11-20) and Claude (Anthropic, claude-sonnet-4-6), can accurately calculate HEART scores from unstructured Turkish clinical notes and predict 30-day major adverse cardiac events (MACE) in emergency department patients presenting with non-traumatic chest pain. The study will enroll 600 consecutive adult patients. For each patient, the same anonymized data (free-text anamnesis, ECG report text, troponin value, and age) will be processed independently by both LLMs via separate API calls with deterministic settings (temperature=0, JSON format). A three-expert consensus HEART score, derived through blinded independent scoring by three emergency medicine physicians with majority-vote adjudication, serves as the reference standard for agreement analysis. Actual 30-day MACE (all-cause death, acute myocardial infarction Type 1/2/4b, or unplanned revascularization), determined via the national health database and telephone follow-up, serves as the outcome for diagnostic accuracy analysis. A secondary documentation-quality sub-study will quantify how often routine Turkish emergency anamnesis notes spontaneously document the HEART score parameters.

Trial Health

Trial Health Score

Automated assessment based on enrollment pace, timeline, and geographic reach

Enrollment

690

participants targeted

Target at P75+ for all trials

Timeline

12mo left

Started Jun 2026

Shorter than P25 for all trials

Geographic Reach

1 country

1 active site

Status

not yet recruiting

Health score is calculated from publicly available data and should be used for screening purposes only.

Study Timeline

Key milestones and dates

1 year study duration

Study Progress: 4%

Jun 2026 to Jun 2027

First Submitted

Initial submission to the registry

May 27, 2026

Completed

5 days until next milestone

Study Start

First participant enrolled

June 1, 2026

Completed

3 days until next milestone

First Posted

Study publicly available on registry

June 4, 2026

Completed

9 months until next milestone

Primary Completion

Last participant's last visit for primary outcome

March 1, 2027

Expected

3 months until next milestone

Study Completion

Last participant's last visit for all outcomes

June 1, 2027

Last Updated

June 4, 2026

Status Verified

June 1, 2026

Enrollment Period

9 months

First QC Date

May 27, 2026

Last Update Submit

June 3, 2026

Conditions

Emergency Medicine Artificial Intelligence (AI), Chest Pain Rule Out Myocardial Infarction Artificial Intelligence (AI) in Diagnosis

Keywords

Large Language Model, GPT-4o, Claude Sonnet, Emergency Department, Diagnostic Accuracy, Medical Informatics, Physician vs AI, HEART score

Outcome Measures

Primary Outcomes (1)

Area Under the ROC Curve (AUC) of GPT-4o and Claude HEART Score for 30-Day MACE Prediction
AUC calculated separately for GPT-4o and Claude using the Hanley-McNeil method. MACE is defined as a composite of all-cause death, acute myocardial infarction (Type 1/2/4b), and unplanned revascularization within 30 days. HEART score range is 0-10; a higher score indicates a higher risk of MACE. Analysis will be performed on complete cases only (0 indeterminate components).
Time frame: 30 days after index emergency department visit
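
As an illustration of the primary analysis, the sketch below computes the AUC as the Mann-Whitney probability that a patient with MACE has a higher HEART score than a patient without MACE, together with the Hanley-McNeil (1982) standard error. It is a minimal sketch and not the study's analytic code; the function names and input layout are assumptions made for this example.

```python
import math


def auc_mann_whitney(scores, outcomes):
    """Return (auc, n_pos, n_neg) for scores and binary outcomes (1 = MACE)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient with and one without MACE")
    # Each (MACE, no-MACE) pair counts 1 if the MACE patient scores higher,
    # 0.5 for a tie (common with an integer 0-10 score), and 0 otherwise.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg)), len(pos), len(neg)


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of the AUC using the Hanley-McNeil approximation."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)
```

A 95% confidence interval can then be approximated as the AUC plus or minus 1.96 times the standard error, truncated to the interval [0, 1].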

Secondary Outcomes (6)

Sensitivity and Specificity of GPT-4o and Claude HEART Score at Prespecified Thresholds
Time frame: 30 days after index emergency department visit
Component-Level and Total-Score Agreement (Cohen's Kappa) Between LLMs and Expert Consensus
Time frame: Baseline (at index emergency department visit)
Comparative AUC Difference Between GPT-4o and Claude (DeLong Test)
Time frame: 30 days after index emergency department visit
Proportion of Indeterminate Results for GPT-4o and Claude
Time frame: Baseline (at index emergency department visit)
HEART Parameter Documentation Rate in Routine Turkish Anamnesis Notes
Time frame: Baseline (at index emergency department visit)
+1 more secondary outcome
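
Two of the secondary measures listed above can be sketched in a few lines: unweighted Cohen's kappa between an LLM and the expert consensus, and sensitivity and specificity at a score threshold. This is an illustrative sketch only. The record does not state whether kappa is weighted or which thresholds are prespecified, so the unweighted form and the "score >= threshold counts as test-positive" rule are assumptions, as are the function names.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over categories.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if expected == 1.0:
        return float("nan")  # both raters used one identical category
    return (observed - expected) / (1.0 - expected)


def sensitivity_specificity(scores, outcomes, threshold):
    """Treat score >= threshold as test-positive; outcomes use 1 = MACE."""
    tp = sum(1 for s, y in zip(scores, outcomes) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, outcomes) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, outcomes) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, outcomes) if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need patients both with and without MACE")
    return tp / (tp + fn), tn / (tn + fp)
```

For ordinal data such as the 0-10 total score, a weighted kappa is a common alternative; the sketch does not implement it.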

Interventions

GPT-4o HEART Score Calculator (OTHER)

OpenAI GPT-4o (model: gpt-4o-2024-11-20, temperature=0, max_tokens=500, response_format=JSON). Each patient's anonymized anamnesis text, ECG report text, troponin value, and age are submitted via a separate API call with no conversation history. Output: HEART score components (0-2 each), total score (0-10), risk group, and indeterminate status.

Also known as: GPT-4o İndeks Testi (GPT-4o Index Test)

Claude HEART Score Calculator (OTHER)

Anthropic Claude (model: claude-sonnet-4-6, temperature=0, max_tokens=500, response_format=JSON). The system prompt and input format are identical to those used for GPT-4o. Each model is run independently, with no cross-contamination between models. Output: same JSON schema as GPT-4o.

Also known as: Claude İndeks Testi (Claude Index Test)
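
Because both models return the same JSON schema, each response can be checked mechanically before analysis. The sketch below shows one possible check. The record does not publish the schema, so every key name (`history`, `ecg`, `age`, `risk_factors`, `troponin`, `total`, `risk_group`, `indeterminate`) is hypothetical, as is the convention that an indeterminate component is encoded as `null`. The risk-group cut-offs (0-3 low, 4-6 moderate, 7-10 high) are the conventional HEART categories and are assumed, not confirmed, for this study.

```python
import json

# Hypothetical key names for the five HEART components.
COMPONENTS = ("history", "ecg", "age", "risk_factors", "troponin")


def risk_group(total):
    """Conventional HEART risk categories (assumed for this sketch)."""
    if total <= 3:
        return "low"
    if total <= 6:
        return "moderate"
    return "high"


def validate_heart_output(raw):
    """Return a list of problems found in one model response (empty if none)."""
    data = json.loads(raw)
    errors = []
    # A component reported as null (or absent) is treated as indeterminate.
    missing = [k for k in COMPONENTS if data.get(k) is None]
    for key in COMPONENTS:
        value = data.get(key)
        if value is not None and not (type(value) is int and 0 <= value <= 2):
            errors.append(f"{key}: expected 0, 1, 2, or null, got {value!r}")
    flag = data.get("indeterminate")
    if not isinstance(flag, bool) or flag != bool(missing):
        errors.append("indeterminate flag does not match missing components")
    # Total and risk group are only checked when all components are valid.
    if not missing and not errors:
        total = sum(data[k] for k in COMPONENTS)
        if data.get("total") != total:
            errors.append(f"total: expected {total}, got {data.get('total')!r}")
        elif data.get("risk_group") != risk_group(total):
            errors.append("risk_group does not match total")
    return errors
```

Running the same validator on both models' outputs keeps the comparison symmetric: a response that fails validation can be counted or re-queried under one rule rather than handled case by case.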

Three-Expert Consensus HEART Score (OTHER)

Three emergency medicine physicians (>=3 years of experience, HEART-score trained) independently score each anonymized record. Majority vote (2/3) determines component scores; a 4th adjudicator resolves ties. Experts are blinded to LLM scores, each other's scores, and MACE outcomes.

Also known as: Referans Standart (Reference Standard)
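
The consensus rule described above can be sketched as follows. With three raters and a 0-2 scale, a tie can only occur when all three scores differ (0, 1, and 2). How the fourth adjudicator resolves such a case is not specified in the record, so this sketch assumes the adjudicator's score is simply taken as final. The function names and the dictionary layout are likewise hypothetical.

```python
from collections import Counter

VALID_SCORES = (0, 1, 2)


def consensus_component(scores, adjudicator=None):
    """Majority vote over three expert scores for one HEART component."""
    if len(scores) != 3 or any(s not in VALID_SCORES for s in scores):
        raise ValueError("expected exactly three scores, each 0, 1, or 2")
    value, count = Counter(scores).most_common(1)[0]
    if count >= 2:
        return value
    # Three-way disagreement: defer to the adjudicator (assumed rule).
    if adjudicator not in VALID_SCORES:
        raise ValueError("three-way disagreement needs an adjudicator score")
    return adjudicator


def consensus_total(expert_scores, adjudicator_scores=None):
    """Sum the consensus component scores into a 0-10 reference total.

    expert_scores maps component name -> tuple of three expert scores;
    adjudicator_scores maps component name -> adjudicator score, if any.
    """
    adjudicator_scores = adjudicator_scores or {}
    return sum(
        consensus_component(scores, adjudicator_scores.get(name))
        for name, scores in expert_scores.items()
    )
```

Voting at the component level, as the record describes, means the reference total is the sum of five consensus components rather than a vote on the three experts' totals.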

Eligibility Criteria

Age: 18 Years+

Sex: All

Healthy Volunteers: No

Age Groups: Adult (18-64), Older Adult (65+)

Sampling Method: Non-Probability Sample

Study Population

The study population consists of consecutive adult patients presenting with a chief complaint of non-traumatic chest pain to the emergency department of Marmara University Pendik Training and Research Hospital, a tertiary care academic medical center in Istanbul, Turkey. This target population comprises real-world emergency medicine admissions that require acute coronary syndrome risk stratification and evaluation with the HEART score. It excludes individuals presenting with traumatic pain etiologies or acute ST-elevation myocardial infarction (STEMI) requiring immediate, time-critical reperfusion pathways.

You may qualify if:

Age >=18 years
Chief complaint of non-traumatic chest pain at the emergency department
Written informed consent obtained from the patient or legally authorized representative
Availability for 30-day follow-up (reachable by telephone and/or actively registered in the e-Nabiz national health database)

You may not qualify if:

Traumatic chest pain etiology
ST-elevation myocardial infarction (STEMI) at presentation requiring immediate reperfusion protocol
Refusal or subsequent withdrawal of informed consent
Inability to complete the mandatory 30-day follow-up period
WITHDRAWAL CRITERIA:
Patient or representative requests data withdrawal after initial consent
Administrative identification of retrospective data entry after enrollment

Contact the study team to confirm eligibility.

Sponsors & Collaborators

Marmara University Pendik Training and Research Hospital (lead)

Study Sites (1)

Marmara University Pendik Training and Research Hospital

Istanbul, İstanbul, 34870, Turkey (Türkiye)

Related Publications (6)

Mahler SA, Riley RF, Hiestand BC, Russell GB, Hoekstra JW, Lefebvre CW, Nicks BA, Cline DM, Askew KL, Elliott SB, Herrington DM, Burke GL, Miller CD. The HEART Pathway randomized trial: identifying emergency department patients with acute chest pain for early discharge. Circ Cardiovasc Qual Outcomes. 2015 Mar;8(2):195-203. doi: 10.1161/CIRCOUTCOMES.114.001384. Epub 2015 Mar 3.
PMID: 25737484 (RESULT)
Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, Scales N, Tanwani A, Cole-Lewis H, Pfohl S, Payne P, Seneviratne M, Gamble P, Kelly C, Babiker A, Scharli N, Chowdhery A, Mansfield P, Demner-Fushman D, Aguera Y Arcas B, Webster D, Corrado GS, Matias Y, Chou K, Gottweis J, Tomasev N, Liu Y, Rajkomar A, Barral J, Semturs C, Karthikesalingam A, Natarajan V. Large language models encode clinical knowledge. Nature. 2023 Aug;620(7972):172-180. doi: 10.1038/s41586-023-06291-2. Epub 2023 Jul 12.
PMID: 37438534 (RESULT)
Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, Ghassemi M, Liu X, Reitsma JB, van Smeden M, Boulesteix AL, Camaradou JC, Celi LA, Denaxas S, Denniston AK, Glocker B, Golub RM, Harvey H, Heinze G, Hoffman MM, Kengne AP, Lam E, Lee N, Loder EW, Maier-Hein L, Mateen BA, McCradden MD, Oakden-Rayner L, Ordish J, Parnell R, Rose S, Singh K, Wynants L, Logullo P. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024 Apr 16;385:e078378. doi: 10.1136/bmj-2023-078378.
PMID: 38626948 (RESULT)
Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, Lijmer JG, Moher D, Rennie D, de Vet HC, Kressel HY, Rifai N, Golub RM, Altman DG, Hooft L, Korevaar DA, Cohen JF; STARD Group. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ. 2015 Oct 28;351:h5527. doi: 10.1136/bmj.h5527.
PMID: 26511519 (RESULT)
Backus BE, Six AJ, Kelder JC, Bosschaert MA, Mast EG, Mosterd A, Veldkamp RF, Wardeh AJ, Tio R, Braam R, Monnink SH, van Tooren R, Mast TP, van den Akker F, Cramer MJ, Poldervaart JM, Hoes AW, Doevendans PA. A prospective validation of the HEART score for chest pain patients at the emergency department. Int J Cardiol. 2013 Oct 3;168(3):2153-8. doi: 10.1016/j.ijcard.2013.01.255. Epub 2013 Mar 7.
PMID: 23465250 (RESULT)
Albrecht M. C4-bound imidazolylidenes: from curiosities to high-impact carbene ligands. Chem Commun (Camb). 2008 Aug 21;(31):3601-10. doi: 10.1039/b806924g. Epub 2008 Jul 8.
PMID: 18665276 (RESULT)

MeSH Terms

Conditions

Emergencies

Condition Hierarchy (Ancestors)

Disease Attributes; Pathologic Processes; Pathological Conditions, Signs and Symptoms

Central Study Contacts

Emir Unal, Assistant Professor

CONTACT

+905327766010 emirunal@gmail.com

Emre Kudu, Associate Professor

CONTACT

dr.emre.kudu@gmail.com

Study Design

Study Type: observational
Observational Model: COHORT
Time Perspective: PROSPECTIVE
Sponsor Type: OTHER
Responsible Party: PRINCIPAL INVESTIGATOR
PI Title: MD, Assistant Professor

Study Record Dates

First Submitted

May 27, 2026

First Posted

June 4, 2026

Study Start

June 1, 2026

Primary Completion (Estimated)

March 1, 2027

Study Completion (Estimated)

June 1, 2027

Last Updated

June 4, 2026

Record last verified: 2026-06

Data Sharing

IPD Sharing: Will share
Shared Documents: STUDY PROTOCOL, SAP, ANALYTIC CODE
Time Frame: The anonymized dataset, protocol documents, and analytic code will be made available immediately upon formal publication of the study results.
Access Criteria: Data and code will be accessible via an open-access repository on the Open Science Framework (OSF) for researchers and clinicians interested in replication or meta-analysis.
