NCT02795806

Brief Summary

Background: Electronic health records contain a vast amount of data about diseases and treatments. Researchers could use these data to test their ideas, but they would need records from more than just their own patients, and access to those records is restricted to protect patient privacy. The U.S. National Library of Medicine (NLM) has created a computer tool called NLM-Scrubber. This program recognizes and deletes personal information from health records. The researchers who developed the program now need access to the original records so they can see how well it removes personal information and how they can make it more accurate.

Objectives: To find ways to improve clinical text de-identification.

Eligibility: No new participants. Researchers will review data that have already been collected.

Design: Researchers will collect a random sample of reports from different doctors in different fields. Researchers will manually remove personal information from the records. They will also remove personal information automatically from the original records using NLM-Scrubber. They will then compare the program's results against the manual changes, noting where the program failed to remove personal information and where it incorrectly deleted nonpersonal health information. Researchers will use the results to revise the program. They will keep testing it until the de-identification process is complete. ...
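
The comparison described under Design can be sketched at the token level. This is a minimal illustration, not NLM-Scrubber's actual code: the function name compare_redactions, the representation of annotations as sets of token indexes, and the sample sentence are all assumptions invented for the example.

```python
def compare_redactions(tokens, gold_pii, scrubbed):
    """Compare manual and automatic redaction of one report.

    tokens   -- the report split into tokens (hypothetical tokenization)
    gold_pii -- indexes of tokens a human annotator marked as personal
    scrubbed -- indexes of tokens the program removed
    Returns (missed, over_redacted) as lists of token strings.
    """
    # Personal information the program left in place.
    missed = [tokens[i] for i in sorted(gold_pii - scrubbed)]
    # Nonpersonal information the program deleted by mistake.
    over_redacted = [tokens[i] for i in sorted(scrubbed - gold_pii)]
    return missed, over_redacted


# Invented sample sentence, for illustration only.
report = "Mr. Smith seen on 03/04/2015 for chest pain".split()
manual = {1, 4}      # annotator marked "Smith" and the date
automatic = {1, 6}   # program removed "Smith" and, wrongly, "chest"

missed, over_redacted = compare_redactions(report, manual, automatic)
print(missed)         # ['03/04/2015']
print(over_redacted)  # ['chest']
```

Both lists would then feed the revision step: missed tokens point to identifiers the program fails to catch, and over-redacted tokens point to clinical terms it should learn to keep.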

Trial Health

75
On Track

Trial Health Score

Automated assessment based on enrollment pace, timeline, and geographic reach

Enrollment
50,000

participants targeted

Target is at or above the 75th percentile (P75) across all trials

Timeline
9mo left

Started May 2016

Longer than the 75th percentile (P75) across all trials

Geographic Reach
1 country

1 active site

Status
enrolling by invitation

Health score is calculated from publicly available data and should be used for screening purposes only.

Study Timeline

Key milestones and dates

Study Progress
93%
May 2016 to Jan 2027

Study Start

First participant enrolled

May 25, 2016

Completed
15 days until next milestone

First Submitted

Initial submission to the registry

June 9, 2016

Completed
1 day until next milestone

First Posted

Study publicly available on registry

June 10, 2016

Completed
10.6 years until next milestone

Primary Completion

Last participant's last visit for primary outcome

January 31, 2027

Expected
Same day until next milestone

Study Completion

Last participant's last visit for all outcomes

January 31, 2027

Last Updated

April 1, 2026

Status Verified

December 16, 2025

Enrollment Period

10.7 years

First QC Date

June 9, 2016

Last Update Submit

March 31, 2026

Conditions

Keywords

Address, Natural History

Outcome Measures

Primary Outcomes (1)

  • The rate of de-identification of PII

    The HIPAA Privacy Rule defines 18 types of personally identifying information (PII) that need to be de-identified, including personal names, addresses, significant dates, and numeric identifiers (such as social security numbers). Our annotators label those words and numbers to create a gold standard, and NLM-Scrubber tries to recognize and eliminate all of them. The rate of de-identification of PII measures how successfully the program removes the identifiers labeled in the gold standard.

    01/01/2017-01/31/2027

Secondary Outcomes (1)

  • The rate of erroneously redacted clinical information

    01/01/2017-01/31/2027
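
The two outcome measures can be illustrated with a short calculation. The record does not state exact formulas, so everything here is an assumption: counting at the token level, dividing removed PII by all gold-standard PII for the primary rate, dividing wrongly removed tokens by all non-PII tokens for the secondary rate, and the function name deidentification_rates itself.

```python
def deidentification_rates(gold_pii, scrubbed, total_tokens):
    """Return (pii_rate, error_rate) for one set of reports.

    gold_pii     -- indexes of tokens labeled as PII in the gold standard
    scrubbed     -- indexes of tokens the program removed
    total_tokens -- number of tokens in the reports
    Denominators are assumptions; the registry record does not define them.
    """
    pii_total = len(gold_pii)
    non_pii_total = total_tokens - pii_total
    removed_pii = len(gold_pii & scrubbed)      # correctly removed identifiers
    wrongly_removed = len(scrubbed - gold_pii)  # clinical tokens lost

    # Guard against empty denominators rather than dividing by zero.
    pii_rate = removed_pii / pii_total if pii_total else None
    error_rate = wrongly_removed / non_pii_total if non_pii_total else None
    return pii_rate, error_rate


# Invented counts: 24 tokens, 4 labeled as PII, program removes 3 of them
# plus 1 token that was not PII.
pii_rate, error_rate = deidentification_rates({0, 1, 5, 9}, {0, 1, 5, 12}, 24)
print(pii_rate)    # 0.75
print(error_rate)  # 0.05
```

A higher first value and a lower second value would both indicate a better program, which is why the two measures are reported together.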

Study Arms (1)

1

Everybody for whom a clinical narrative report is created.

Eligibility Criteria

Age
1 Day+
Sex
All
Healthy Volunteers
No
Age Groups
Child (0-17), Adult (18-64), Older Adult (65+)
Sampling Method
Probability Sample
Study Population

Everybody for whom a clinical narrative report is created.

* No new participant enrollment. Researchers will review data that have already been collected.

Sponsors & Collaborators

Study Sites (1)

National Library of Medicine

Bethesda, Maryland, United States

Related Publications (3)

  • Kayaalp M. Patient Privacy in the Era of Big Data. Balkan Med J. 2018 Jan 20;35(1):8-17. doi: 10.4274/balkanmedj.2017.0966. Epub 2017 Sep 13.

    PMID: 28903886 (Background)
  • Kayaalp M, Browne AC, Dodd ZA, Sagan P, McDonald CJ. De-identification of Address, Date, and Alphanumeric Identifiers in Narrative Clinical Reports. AMIA Annu Symp Proc. 2014 Nov 14;2014:767-76. eCollection 2014.

    PMID: 25954383 (Background)
  • Kayaalp M, Browne AC, Callaghan FM, Dodd ZA, Divita G, Ozturk S, McDonald CJ. The pattern of name tokens in narrative clinical text and a comparison of five systems for redacting them. J Am Med Inform Assoc. 2014 May-Jun;21(3):423-31. doi: 10.1136/amiajnl-2013-001689. Epub 2013 Sep 11.

    PMID: 24026308 (Background)

Study Officials

  • Mehmet M Kayaalp, Ph.D.

    National Library of Medicine (NLM)

    PRINCIPAL INVESTIGATOR

Study Design

Study Type
observational
Observational Model
OTHER
Time Perspective
RETROSPECTIVE
Sponsor Type
NIH
Responsible Party
SPONSOR

Study Record Dates

First Submitted

June 9, 2016

First Posted

June 10, 2016

Study Start

May 25, 2016

Primary Completion (Estimated)

January 31, 2027

Study Completion (Estimated)

January 31, 2027

Last Updated

April 1, 2026

Record last verified: 2025-12-16

Data Sharing

IPD Sharing
Will not share

We receive patient data containing protected health information (PHI) from our collaborating data sources, with the promise that we will protect PHI to the full extent and not share it with third parties.
