Brief Summary

Background: Electronic health records contain a vast amount of data about diseases and treatments. Researchers could use this data to test their ideas, but they would need records from more than just their own group of patients. However, access to those records is restricted to ensure patient privacy. The U.S. National Library of Medicine (NLM) has created a computer tool called NLM-Scrubber. This program recognizes and deletes personal information from health records. The researchers who developed this program now need access to the original records. This will allow them to see how well the program removes personal information from patient records and how they can make it more accurate.

Objectives: To find ways to improve clinical text de-identification.

Eligibility: No new participants. Researchers will review data that have already been collected.

Design: Researchers will collect a random sample of reports. These will be from different doctors in different fields. Researchers will manually remove personal information from the records. Researchers will also automatically remove personal information from the original records using NLM-Scrubber. Researchers will compare the results of the computer program with the manual changes. They will note when the program has failed to remove personal information. They will also note when the program has incorrectly deleted nonpersonal health information. Researchers will use the results to revise the program. They will keep testing it until the de-identification process is complete. ...

Trial Health

On Track

Trial Health Score

Automated assessment based on enrollment pace, timeline, and geographic reach

Enrollment

50,000

participants targeted

Target is at or above the 75th percentile (P75) across all trials

Timeline

9 months left

Started May 2016

Longer than the 75th percentile (P75) across all trials

Geographic Reach

1 country

1 active site

Status

Enrolling by invitation

Health score is calculated from publicly available data and should be used for screening purposes only.

Study Timeline

Key milestones and dates

10.7 years study duration

Study Progress: 93%

May 2016 to Jan 2027

Study Start

First participant enrolled

May 25, 2016

Completed

15 days until next milestone

First Submitted

Initial submission to the registry

June 9, 2016

Completed

1 day until next milestone

First Posted

Study publicly available on registry

June 10, 2016

Completed

10.6 years until next milestone

Primary Completion

Last participant's last visit for primary outcome

January 31, 2027

Expected

Same day until next milestone

Study Completion

Last participant's last visit for all outcomes

January 31, 2027

Last Updated

April 1, 2026

Status Verified

December 16, 2025

Enrollment Period

10.7 years

First QC Date

June 9, 2016

Last Update Submit

March 31, 2026

Conditions

Personally Identifiable Information

Keywords

Address, Natural History

Outcome Measures

Primary Outcomes (1)

The rate of de-identification of PII
The HIPAA Privacy Rule defines 18 types of personally identifying information (PII) that need to be de-identified, including personal names, addresses, significant dates, and numeric identifiers (such as Social Security numbers). Our annotators label those words and numbers to create a gold standard, and NLM-Scrubber tries to recognize and eliminate all of them. The rate of de-identification of PII measures how successfully it does so.
01/01/2017-01/31/2027

Secondary Outcomes (1)

The rate of erroneously redacted clinical information
01/01/2017-01/31/2027
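
The two outcome measures above can be illustrated with a minimal sketch. The registry record does not give exact formulas, so this example assumes token-level scoring: the de-identification rate is taken as the share of gold-standard PII tokens that the system redacted, and the erroneous redaction rate as the share of non-PII tokens that the system redacted anyway. The function name `evaluate_redaction` and the sample tokens are hypothetical and are not part of NLM-Scrubber.

```python
def evaluate_redaction(gold_is_pii: list[bool], system_redacted: list[bool]) -> dict[str, float]:
    """Compare system redactions against a manually annotated gold standard.

    Both lists are aligned token by token (an assumption of this sketch):
    gold_is_pii[i] is True when annotators marked token i as PII, and
    system_redacted[i] is True when the system removed token i.
    """
    if len(gold_is_pii) != len(system_redacted):
        raise ValueError("gold and system label lists must have the same length")

    pairs = list(zip(gold_is_pii, system_redacted))
    pii_total = sum(gold for gold, _ in pairs)
    non_pii_total = len(pairs) - pii_total

    # PII tokens the system correctly removed.
    pii_removed = sum(gold and redacted for gold, redacted in pairs)
    # Clinical (non-PII) tokens the system removed by mistake.
    non_pii_removed = sum((not gold) and redacted for gold, redacted in pairs)

    return {
        "deid_rate": pii_removed / pii_total if pii_total else float("nan"),
        "erroneous_redaction_rate": (
            non_pii_removed / non_pii_total if non_pii_total else float("nan")
        ),
    }


# Hypothetical eight-token report fragment, invented for illustration only.
tokens = ["Mr.", "Smith", "seen", "on", "05/12/2015", "for", "chest", "pain"]
gold = [False, True, False, False, True, False, False, False]
system = [False, True, False, False, False, False, True, False]

rates = evaluate_redaction(gold, system)
print(rates)
```

In this invented fragment the system removes the name but misses the date, and it wrongly removes the word "chest", so one of two PII tokens and one of six clinical tokens are redacted. A real evaluation might score at the level of whole identifiers rather than single tokens, which would change both denominators.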

Study Arms (1)

1

Everybody for whom a clinical narrative report is created.

Eligibility Criteria

Age: 1 Day+

Sex: All

Healthy Volunteers: No

Age Groups: Child (0-17), Adult (18-64), Older Adult (65+)

Sampling Method: Probability Sample

Study Population

Everybody for whom a clinical narrative report is created.

* No new participant enrollment. Researchers will review data that have already been collected.

Sponsors & Collaborators

National Library of Medicine (NLM), lead
National Cancer Institute (NCI), collaborator
National Institutes of Health Clinical Center (CC), collaborator

Study Sites (1)

National Library of Medicine

Bethesda, Maryland, United States

Related Publications (3)

Kayaalp M. Patient Privacy in the Era of Big Data. Balkan Med J. 2018 Jan 20;35(1):8-17. doi: 10.4274/balkanmedj.2017.0966. Epub 2017 Sep 13.
PMID: 28903886 (background)
Kayaalp M, Browne AC, Dodd ZA, Sagan P, McDonald CJ. De-identification of Address, Date, and Alphanumeric Identifiers in Narrative Clinical Reports. AMIA Annu Symp Proc. 2014 Nov 14;2014:767-76. eCollection 2014.
PMID: 25954383 (background)
Kayaalp M, Browne AC, Callaghan FM, Dodd ZA, Divita G, Ozturk S, McDonald CJ. The pattern of name tokens in narrative clinical text and a comparison of five systems for redacting them. J Am Med Inform Assoc. 2014 May-Jun;21(3):423-31. doi: 10.1136/amiajnl-2013-001689. Epub 2013 Sep 11.
PMID: 24026308 (background)

Study Officials

Mehmet M Kayaalp, Ph.D.
National Library of Medicine (NLM)
PRINCIPAL INVESTIGATOR

Study Design

Study Type: Observational
Observational Model: Other
Time Perspective: Retrospective
Sponsor Type: NIH
Responsible Party: Sponsor

Study Record Dates

First Submitted

June 9, 2016

First Posted

June 10, 2016

Study Start

May 25, 2016

Primary Completion (Estimated)

January 31, 2027

Study Completion (Estimated)

January 31, 2027

Last Updated

April 1, 2026

Record last verified: 2025-12-16

Data Sharing

IPD Sharing: Will not share

Locations

US (1)
