NLM Scrubber: NLM's Software Application to De-identify Clinical Text Documents
2 other identifiers
observational
50,000
1 country
1
Brief Summary
Background: Electronic health records contain a vast amount of data about diseases and treatments. Researchers could use these data to test their ideas, but they would need records from more than just their own patients, and access to those records is restricted to protect patient privacy. The U.S. National Library of Medicine (NLM) has created a computer tool called NLM-Scrubber that recognizes and deletes personal information from health records. The researchers who developed the program now need access to the original records so they can see how well it removes personal information and how to make it more accurate.
Objectives: To find ways to improve clinical text de-identification.
Eligibility: No new participants. Researchers will review data that have already been collected.
Design: Researchers will collect a random sample of reports from different doctors in different fields. They will remove personal information from the records manually and, separately, remove it from the original records automatically using NLM-Scrubber. They will then compare the program's output with the manual changes, noting where the program failed to remove personal information and where it deleted nonpersonal health information by mistake. Researchers will use the results to revise the program and will keep testing it until the de-identification process is complete. ...
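The comparison described under Design can be pictured with a minimal sketch. It is an illustration only, not NLM-Scrubber's code: the function name `compare_redactions`, the use of token positions, and the example sentence are all assumptions made here, and the sentence is invented rather than taken from any patient record.

```python
def compare_redactions(tokens, gold_pii, tool_redacted):
    """Return (missed, over_redacted) token lists for one report.

    gold_pii and tool_redacted are sets of token positions. This input
    format is an assumption of the sketch, not the study's data format.
    """
    missed = sorted(gold_pii - tool_redacted)         # personal information left in the text
    over_redacted = sorted(tool_redacted - gold_pii)  # nonpersonal information removed by mistake
    return [tokens[i] for i in missed], [tokens[i] for i in over_redacted]


# Invented example sentence, not real patient data.
tokens = ["Mr.", "Smith", "seen", "on", "03/04/2015", "for", "Parkinson", "disease"]
gold_pii = {1, 4}        # manual annotation: the name and the date
tool_redacted = {1, 6}   # hypothetical tool output: name caught, date missed, disease name removed

missed, over_redacted = compare_redactions(tokens, gold_pii, tool_redacted)
print(missed)         # ['03/04/2015']
print(over_redacted)  # ['Parkinson']
```

The two lists correspond to the two kinds of error the researchers say they will note: personal information that survived, and clinical information that was deleted incorrectly.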
Trial Health
Trial Health Score
Automated assessment based on enrollment pace, timeline, and geographic reach
50,000 participants targeted
Target at P75+ for all trials
Started May 2016
Longer than P75 for all trials
1 active site
Health score is calculated from publicly available data and should be used for screening purposes only.
Study Timeline
Key milestones and dates
Study Start
First participant enrolled
May 25, 2016
Completed
First Submitted
Initial submission to the registry
June 9, 2016
Completed
First Posted
Study publicly available on registry
June 10, 2016
Completed
Primary Completion
Last participant's last visit for primary outcome
January 31, 2027
Expected
Study Completion
Last participant's last visit for all outcomes
January 31, 2027
Conditions
Keywords
Outcome Measures
Primary Outcomes (1)
The rate of de-identification of PII
The HIPAA Privacy Rule defines 18 types of personally identifying information (PII) that must be de-identified, including personal names, addresses, significant dates, and numeric identifiers (such as Social Security numbers). Our annotators label those words and numbers to create a gold standard, and NLM-Scrubber tries to recognize and eliminate all of them. The rate of de-identification of PII measures how successfully NLM-Scrubber removes the identifiers in that gold standard.
Time frame: 01/01/2017-01/31/2027
Secondary Outcomes (1)
The rate of erroneously redacted clinical information
Time frame: 01/01/2017-01/31/2027
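The record does not give formulas for the two outcome measures. One plausible reading, sketched below as an assumption, treats the primary outcome as the fraction of gold-standard PII tokens that the tool redacted and the secondary outcome as the fraction of non-PII tokens that the tool redacted anyway. The function name `deidentification_rates` and the counts are hypothetical.

```python
def deidentification_rates(n_tokens, gold_pii, tool_redacted):
    """Return (deid_rate, erroneous_rate) at token level.

    Assumed definitions (not stated in the record):
      deid_rate      = redacted PII tokens / all gold-standard PII tokens
      erroneous_rate = redacted non-PII tokens / all non-PII tokens
    """
    true_pos = len(gold_pii & tool_redacted)
    false_pos = len(tool_redacted - gold_pii)
    n_non_pii = n_tokens - len(gold_pii)
    deid_rate = true_pos / len(gold_pii) if gold_pii else 1.0
    erroneous_rate = false_pos / n_non_pii if n_non_pii else 0.0
    return deid_rate, erroneous_rate


# Invented counts: a 200-token report with 10 gold-standard PII tokens.
gold = set(range(10))
tool = set(range(9)) | {50, 51}   # 9 PII tokens caught, 2 non-PII tokens removed

deid, err = deidentification_rates(200, gold, tool)
print(f"{deid:.3f}")  # 0.900
print(f"{err:.4f}")   # 0.0105
```

Reporting the two rates separately keeps the trade-off visible: a tool that redacts everything would score perfectly on the first measure and very badly on the second.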
Study Arms (1)
1
Everybody for whom a clinical narrative report is created.
Eligibility Criteria
Everybody for whom a clinical narrative report is created.
Sponsors & Collaborators
Study Sites (1)
National Library of Medicine
Bethesda, Maryland, United States
Related Publications (3)
Kayaalp M. Patient Privacy in the Era of Big Data. Balkan Med J. 2018 Jan 20;35(1):8-17. doi: 10.4274/balkanmedj.2017.0966. Epub 2017 Sep 13. PMID: 28903886. (Background)
Kayaalp M, Browne AC, Dodd ZA, Sagan P, McDonald CJ. De-identification of Address, Date, and Alphanumeric Identifiers in Narrative Clinical Reports. AMIA Annu Symp Proc. 2014 Nov 14;2014:767-76. eCollection 2014. PMID: 25954383. (Background)
Kayaalp M, Browne AC, Callaghan FM, Dodd ZA, Divita G, Ozturk S, McDonald CJ. The pattern of name tokens in narrative clinical text and a comparison of five systems for redacting them. J Am Med Inform Assoc. 2014 May-Jun;21(3):423-31. doi: 10.1136/amiajnl-2013-001689. Epub 2013 Sep 11. PMID: 24026308. (Background)
Study Officials
- PRINCIPAL INVESTIGATOR
Mehmet M Kayaalp, Ph.D.
National Library of Medicine (NLM)
Study Design
- Study Type: Observational
- Observational Model: Other
- Time Perspective: Retrospective
- Sponsor Type: NIH
- Responsible Party: Sponsor
Study Record Dates
First Submitted
June 9, 2016
First Posted
June 10, 2016
Study Start
May 25, 2016
Primary Completion (Estimated)
January 31, 2027
Study Completion (Estimated)
January 31, 2027
Last Updated
April 1, 2026
Record last verified: 2025-12-16
Data Sharing
- IPD Sharing: Will not share
We receive patient data containing protected health information (PHI) from our collaborating data sources on the promise that we will protect PHI to the fullest extent and not share it with third parties.