Diagnostic Accuracy of GPT-4o and Claude for HEART Score Calculation in Chest Pain
LLM-HEART
Diagnostic Accuracy of Large Language Models (GPT-4o and Claude) in HEART Score Calculation and 30-Day MACE Prediction in Emergency Department Chest Pain Patients: A Prospective Observational Validation Study Against Three-Expert Consensus
1 other identifier
observational
690
1 country
1
Brief Summary
This prospective observational diagnostic accuracy study evaluates whether two large language models (LLMs), GPT-4o (OpenAI, gpt-4o-2024-11-20) and Claude (Anthropic, claude-sonnet-4-6), can accurately calculate HEART scores from unstructured Turkish clinical notes and predict 30-day major adverse cardiac events (MACE) in emergency department patients presenting with non-traumatic chest pain. The study will enroll 600 consecutive adult patients. For each patient, the same anonymized data (free-text anamnesis, ECG report text, troponin value, and age) will be processed independently by both LLMs via separate API calls with deterministic settings (temperature=0, JSON output format). The reference standard for agreement analysis is a three-expert consensus HEART score, derived through blinded independent scoring by three emergency medicine physicians with majority-vote adjudication. The outcome for diagnostic accuracy analysis is actual 30-day MACE (all-cause death, acute myocardial infarction Type 1/2/4b, or unplanned revascularization), determined via the national health database and telephone follow-up. A secondary documentation-quality sub-study will quantify how often routine Turkish emergency anamnesis notes capture HEART score parameters without prompting.
Study Timeline
Key milestones and dates
First Submitted (initial submission to the registry): May 27, 2026 (completed)
Study Start (first participant enrolled): June 1, 2026 (completed)
First Posted (study publicly available on registry): June 4, 2026 (completed)
Primary Completion (last participant's last visit for primary outcome): March 1, 2027 (expected)
Study Completion (last participant's last visit for all outcomes): June 1, 2027 (expected)
Outcome Measures
Primary Outcomes (1)
Area Under the ROC Curve (AUC) of GPT-4o and Claude HEART Score for 30-Day MACE Prediction
AUC calculated separately for GPT-4o and Claude using the Hanley-McNeil method. MACE is defined as a composite of all-cause death, acute myocardial infarction (Type 1/2/4b), and unplanned revascularization within 30 days. HEART score range is 0-10; a higher score indicates a higher risk of MACE. Analysis will be performed on complete cases only (0 indeterminate components).
30 days after index emergency department visit
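The AUC and its Hanley-McNeil standard error described above can be sketched as follows. This is a minimal illustration, not the study's analytic code: the function names and the `n_indeterminate` field are hypothetical, and the AUC is estimated with the Mann-Whitney statistic (ties counted as one half), which is the estimator the Hanley-McNeil standard error is usually paired with.

```python
import math


def complete_cases(records):
    """Keep only records with zero indeterminate components.

    `n_indeterminate` is a hypothetical field name used for illustration.
    """
    return [r for r in records if r["n_indeterminate"] == 0]


def auc_hanley_mcneil(scores, outcomes):
    """Return (auc, standard_error) for risk scores against 0/1 MACE outcomes."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    n_pos, n_neg = len(pos), len(neg)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one MACE and one non-MACE case")

    # Mann-Whitney estimate: P(score_pos > score_neg) + 0.5 * P(tie)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    auc = wins / (n_pos * n_neg)

    # Hanley-McNeil (1982) approximation of the variance of the AUC
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return auc, math.sqrt(max(variance, 0.0))
```

Because HEART totals are integers from 0 to 10, ties between MACE and non-MACE patients are common, which is why the half-credit for ties matters here.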
Secondary Outcomes (6)
Sensitivity and Specificity of GPT-4o and Claude HEART Score at Prespecified Thresholds
30 days after index emergency department visit
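Sensitivity and specificity at a given cut-off can be sketched as below. The record does not state the prespecified thresholds, so `threshold` is left as a parameter; the function name and the "score at or above threshold counts as test-positive" convention are assumptions for illustration.

```python
def sens_spec(scores, outcomes, threshold):
    """Return (sensitivity, specificity), calling score >= threshold test-positive."""
    tp = fn = tn = fp = 0
    for score, mace in zip(scores, outcomes):
        positive = score >= threshold
        if mace == 1:
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one MACE and one non-MACE case")
    return tp / (tp + fn), tn / (tn + fp)
```

For example, `sens_spec(scores, outcomes, threshold=4)` would treat totals of 4 or more as test-positive; whether that matches the study's cut-offs is not stated in the record.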
Component-Level and Total-Score Agreement (Cohen's Kappa) Between LLMs and Expert Consensus
Baseline (At index emergency department visit)
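Cohen's kappa between an LLM and the expert consensus can be sketched as follows. The record does not say whether the kappa is weighted, so this sketch assumes the unweighted form; the function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: share of records where both raters give the same score
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from each rater's marginal category frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1 - expected)
```

The same function applies to a single component (values 0 to 2) or to the total score treated as a category; for ordinal totals a weighted kappa may be preferable, which this sketch does not cover.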
Comparative AUC Difference Between GPT-4o and Claude (DeLong Test)
30 days after index emergency department visit
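A paired DeLong comparison of the two AUCs, both computed on the same patients, can be sketched as below. This is a plain-Python illustration under stated assumptions (function names are hypothetical, and the p-value is two-sided from a normal approximation); an analysis would normally rely on an established statistical package.

```python
import math


def _psi(x, y):
    """Mann-Whitney kernel: 1 if x > y, 0.5 if tied, 0 otherwise."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _sample_var(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def delong_paired(scores_a, scores_b, outcomes):
    """Return (auc_a, auc_b, z, two_sided_p) for two score sets on the same cases."""
    pos = [i for i, y in enumerate(outcomes) if y == 1]
    neg = [i for i, y in enumerate(outcomes) if y == 0]
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two MACE and two non-MACE cases")

    def components(scores):
        # Structural components: one value per MACE case and per non-MACE case
        v10 = [sum(_psi(scores[i], scores[j]) for j in neg) / n for i in pos]
        v01 = [sum(_psi(scores[i], scores[j]) for i in pos) / m for j in neg]
        return v10, v01

    v10_a, v01_a = components(scores_a)
    v10_b, v01_b = components(scores_b)
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m

    # Variance of the AUC difference from the per-case component differences;
    # this equals S_aa + S_bb - 2 * S_ab in DeLong's covariance formulation.
    d10 = [a - b for a, b in zip(v10_a, v10_b)]
    d01 = [a - b for a, b in zip(v01_a, v01_b)]
    var_diff = _sample_var(d10) / m + _sample_var(d01) / n
    if var_diff <= 0:
        raise ValueError("variance of the AUC difference is zero; test undefined")

    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p
```

Pairing matters because both models score the same patients: the covariance between the two AUCs is absorbed by working with per-case differences rather than treating the two AUCs as independent.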
Proportion of Indeterminate Results for GPT-4o and Claude
Baseline (At index emergency department visit)
HEART Parameter Documentation Rate in Routine Turkish Anamnesis Notes
Baseline (At index emergency department visit)
- +1 more secondary outcomes
Interventions
OpenAI GPT-4o (model: gpt-4o-2024-11-20, temperature=0, max_tokens=500, response_format=JSON). Each patient's anonymized anamnesis text, ECG report text, troponin value, and age are submitted via a separate API call with no conversation history. Output: HEART score components (0-2 each), total score (0-10), risk group, and indeterminate status.
Anthropic Claude (model: claude-sonnet-4-6, temperature=0, max_tokens=500, response_format=JSON). Same system prompt and input format as for GPT-4o. Processed independently with no cross-contamination between models. Output: same JSON schema as GPT-4o.
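A consistency check on the JSON returned by either model can be sketched as follows. The record does not publish the schema, so every field name here is hypothetical, as is the choice to represent an indeterminate component as `null`. The risk-group cut-offs (0-3 low, 4-6 moderate, 7-10 high) are the conventional HEART bands and are assumed, not confirmed, to match the study's.

```python
import json

# The five HEART components; the JSON key names are assumptions.
COMPONENTS = ("history", "ecg", "age", "risk_factors", "troponin")


def risk_group(total):
    """Conventional HEART bands: 0-3 low, 4-6 moderate, 7-10 high."""
    if total <= 3:
        return "low"
    if total <= 6:
        return "moderate"
    return "high"


def validate_heart_output(raw):
    """Parse one model response and return (component_scores, indeterminate_names)."""
    data = json.loads(raw)
    values = {}
    for name in COMPONENTS:
        value = data.get(name)
        # type(...) is int rejects booleans, which json may otherwise let through
        if value is not None and (type(value) is not int or not 0 <= value <= 2):
            raise ValueError(f"{name} must be 0, 1, 2, or null")
        values[name] = value

    indeterminate = [name for name, value in values.items() if value is None]
    if bool(data.get("indeterminate")) != bool(indeterminate):
        raise ValueError("indeterminate flag disagrees with component values")

    if not indeterminate:
        total = sum(values.values())
        if data.get("total") != total:
            raise ValueError("total does not equal the sum of components")
        if data.get("risk_group") != risk_group(total):
            raise ValueError("risk group does not match total")
    return values, indeterminate
```

Checks of this kind catch arithmetic slips in the model's own total before any agreement or accuracy statistics are computed.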
Three emergency medicine physicians (>=3 years of experience, HEART-score trained) independently score each anonymized record. Majority vote (2/3) determines component scores; a fourth adjudicator resolves ties. Experts are blinded to LLM scores, each other's scores, and MACE outcomes.
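The majority-vote rule for a single component can be sketched as below. With three raters and scores of 0 to 2, the only case without a 2/3 majority is a three-way split, which is where the fourth adjudicator's score is used. The function name and the way the adjudicator's score is passed in are assumptions for illustration.

```python
from collections import Counter


def consensus_component(expert_scores, adjudicator_score=None):
    """Return the consensus score for one HEART component from three expert ratings."""
    if len(expert_scores) != 3:
        raise ValueError("exactly three expert scores are required")
    value, count = Counter(expert_scores).most_common(1)[0]
    if count >= 2:
        return value  # at least two of three experts agree
    if adjudicator_score is None:
        raise ValueError("three-way split: a fourth adjudicator score is needed")
    return adjudicator_score
```

For instance, ratings of 1, 1, and 2 yield a consensus of 1, while ratings of 0, 1, and 2 are returned as whatever the adjudicator assigns.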
Eligibility Criteria
The study population consists of consecutive adult patients presenting with a chief complaint of non-traumatic chest pain to the emergency department of Marmara University Pendik Training and Research Hospital, a tertiary care academic medical center in Istanbul, Turkey. This target population comprises real-world emergency medicine admissions that require acute coronary syndrome risk stratification and evaluation with the HEART score. It excludes individuals presenting with traumatic pain etiologies or acute ST-elevation myocardial infarction (STEMI) requiring immediate, time-critical reperfusion pathways.
You may qualify if:
- Age >=18 years
- Chief complaint of non-traumatic chest pain at the emergency department
- Written informed consent obtained from the patient or legally authorized representative
- Availability for 30-day follow-up (reachable by telephone and/or actively registered in the e-Nabiz national health database)
You may not qualify if:
- Traumatic chest pain etiology
- ST-elevation myocardial infarction (STEMI) at presentation requiring immediate reperfusion protocol
- Refusal or subsequent withdrawal of informed consent
- Inability to complete the mandatory 30-day follow-up period
Withdrawal criteria:
- Patient or representative requests data withdrawal after initial consent
- Administrative identification of retrospective data entry after enrollment
Contact the study team to confirm eligibility.
Study Sites (1)
Marmara University Pendik Training and Research Hospital
Istanbul, İstanbul, 34870, Turkey (Türkiye)
Related Publications (6)
Mahler SA, Riley RF, Hiestand BC, Russell GB, Hoekstra JW, Lefebvre CW, Nicks BA, Cline DM, Askew KL, Elliott SB, Herrington DM, Burke GL, Miller CD. The HEART Pathway randomized trial: identifying emergency department patients with acute chest pain for early discharge. Circ Cardiovasc Qual Outcomes. 2015 Mar;8(2):195-203. doi: 10.1161/CIRCOUTCOMES.114.001384. Epub 2015 Mar 3. PMID: 25737484 (RESULT)
Singhal K, Azizi S, Tu T, Mahdavi SS, Wei J, Chung HW, Scales N, Tanwani A, Cole-Lewis H, Pfohl S, Payne P, Seneviratne M, Gamble P, Kelly C, Babiker A, Scharli N, Chowdhery A, Mansfield P, Demner-Fushman D, Aguera Y Arcas B, Webster D, Corrado GS, Matias Y, Chou K, Gottweis J, Tomasev N, Liu Y, Rajkomar A, Barral J, Semturs C, Karthikesalingam A, Natarajan V. Large language models encode clinical knowledge. Nature. 2023 Aug;620(7972):172-180. doi: 10.1038/s41586-023-06291-2. Epub 2023 Jul 12. PMID: 37438534 (RESULT)
Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, Ghassemi M, Liu X, Reitsma JB, van Smeden M, Boulesteix AL, Camaradou JC, Celi LA, Denaxas S, Denniston AK, Glocker B, Golub RM, Harvey H, Heinze G, Hoffman MM, Kengne AP, Lam E, Lee N, Loder EW, Maier-Hein L, Mateen BA, McCradden MD, Oakden-Rayner L, Ordish J, Parnell R, Rose S, Singh K, Wynants L, Logullo P. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024 Apr 16;385:e078378. doi: 10.1136/bmj-2023-078378. PMID: 38626948 (RESULT)
Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, Lijmer JG, Moher D, Rennie D, de Vet HC, Kressel HY, Rifai N, Golub RM, Altman DG, Hooft L, Korevaar DA, Cohen JF; STARD Group. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ. 2015 Oct 28;351:h5527. doi: 10.1136/bmj.h5527. PMID: 26511519 (RESULT)
Backus BE, Six AJ, Kelder JC, Bosschaert MA, Mast EG, Mosterd A, Veldkamp RF, Wardeh AJ, Tio R, Braam R, Monnink SH, van Tooren R, Mast TP, van den Akker F, Cramer MJ, Poldervaart JM, Hoes AW, Doevendans PA. A prospective validation of the HEART score for chest pain patients at the emergency department. Int J Cardiol. 2013 Oct 3;168(3):2153-8. doi: 10.1016/j.ijcard.2013.01.255. Epub 2013 Mar 7. PMID: 23465250 (RESULT)
Albrecht M. C4-bound imidazolylidenes: from curiosities to high-impact carbene ligands. Chem Commun (Camb). 2008 Aug 21;(31):3601-10. doi: 10.1039/b806924g. Epub 2008 Jul 8. PMID: 18665276 (RESULT)
Study Design
- Study Type: observational
- Observational Model: COHORT
- Time Perspective: PROSPECTIVE
- Sponsor Type: OTHER
- Responsible Party: PRINCIPAL INVESTIGATOR
- PI Title: MD, Assistant Professor
Study Record Dates
First Submitted: May 27, 2026
First Posted: June 4, 2026
Study Start: June 1, 2026
Primary Completion (Estimated): March 1, 2027
Study Completion (Estimated): June 1, 2027
Last Updated: June 4, 2026
Record last verified: 2026-06
Data Sharing
- IPD Sharing: Will share
- Shared Documents: STUDY PROTOCOL, SAP, ANALYTIC CODE
- Time Frame: The anonymized dataset, protocol documents, and analytic code will be made available immediately upon formal publication of the study results.
- Access Criteria: Data and code will be accessible via an open-access repository on the Open Science Framework (OSF) for researchers and clinicians interested in replication or meta-analysis.
Anonymized individual participant data (including de-identified baseline demographics, clinical presentation characteristics, index test outputs from GPT-4o and Claude, and the reference standard expert consensus HEART scores) will be made publicly available to support academic transparency and replication. Additionally, the complete deterministic system prompt texts (verified with SHA-256 cryptographic hashes) and the complete statistical analysis code will be included as supplementary material.
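Verifying a published system prompt against its SHA-256 hash can be sketched as follows. The function names are hypothetical, and UTF-8 encoding of the prompt text is an assumption; the record does not say how the hashes were generated.

```python
import hashlib


def prompt_sha256(prompt_text):
    """Return the hex SHA-256 digest of a prompt, assuming UTF-8 encoding."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def prompt_matches(prompt_text, published_hex):
    """True if the prompt text reproduces the published digest exactly."""
    return prompt_sha256(prompt_text) == published_hex.strip().lower()
```

Any change to the text, including trailing whitespace or a different line-ending convention, produces a different digest, so a replication would need the prompt byte for byte as hashed.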