Katherine Tucker, Janice Branson, Maria Dilleen, Sally Hollis, Paul Loughlin, Mark J Nixon, Zoë Williams.
Abstract
BACKGROUND: Greater transparency, and in particular the sharing of patient-level data for further scientific research, is an increasingly important topic for the pharmaceutical industry and for other organisations that sponsor and conduct clinical trials. It is also in the general interest of patients participating in studies. A concern remains, however, over how to prepare and share clinical trial data with third-party researchers appropriately whilst maintaining patient confidentiality. Clinical trial datasets contain very detailed information on each participant. Risk to patient privacy can be mitigated by data reduction techniques. However, data utility must be retained to allow meaningful scientific research. In addition, for clinical trial data, excessive application of such techniques may pose a public health risk if misleading results are produced. After considering existing guidance, this article makes recommendations with the aim of promoting an approach that balances data utility and privacy risk and is applicable across clinical trial data holders.
DISCUSSION: Our key recommendations are as follows:
1. Data anonymisation/de-identification: Data holders are responsible for generating de-identified datasets, which are intended to offer increased protection for patient privacy through masking or generalisation of direct and some indirect identifiers.
2. Controlled access to data, including use of a data sharing agreement: A legally binding data sharing agreement should be in place, including agreements not to download or further share data and not to attempt to identify patients. Appropriate levels of security should be used for transferring data or providing access; one solution is a secure 'locked box' system, which provides additional safeguards.
This article provides recommendations on best practices to de-identify/anonymise clinical trial data for sharing with third-party researchers, as well as controlled access to data and data sharing agreements. The recommendations are applicable to all clinical trial data holders. Further work will be needed to identify and evaluate competing possibilities as regulations, attitudes to risk and technologies evolve.
Keywords: Anonymisation; Clinical trial; Data sharing; De-identification; Pharmaceutical research; Transparency
Year: 2016 PMID: 27410040 PMCID: PMC4943495 DOI: 10.1186/s12874-016-0169-4
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
HIPAA eighteen direct identifiers
| A. Name |
| B. Geographic subdivisions smaller than a state. The initial three digits of a ZIP code can be retained if certain criteria are met. |
| C. With the exception of year, all elements of dates directly related to an individual (such as birth date, admission date, discharge date, date of death). For ages over 89 and elements of dates (including year) indicating such an age, ages and elements may be aggregated into a single category of age 90 or older. |
| D. Telephone numbers |
| E. Fax numbers |
| F. Email addresses |
| G. Social security numbers |
| H. Medical record numbers |
| I. Health plan beneficiary numbers |
| J. Account numbers |
| K. Certificate/licence numbers |
| L. Vehicle identifiers and serial numbers, including license plate numbers |
| M. Device identifiers and serial numbers |
| N. Web Universal Resource Locators (URLs) |
| O. Internet Protocol (IP) addresses |
| P. Biometric identifiers, including finger and voice prints |
| Q. Full-face photographs and any comparable images |
| R. Any other unique identifying number, characteristic, or code, except as permitted by paragraph (c) of HIPAA Safe Harbor section; and |
| S. The covered entity does not have actual knowledge that the information could be used alone or in combination with other information to identify an individual who is a subject of the information |
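Items B and C in the list above describe generalisation rules that can be applied mechanically. The following is a minimal Python sketch of those two rules only, not a compliance tool. The function names and the `restricted_prefixes` parameter are hypothetical; the criteria that decide whether a three-digit ZIP prefix may be retained are left to the caller, who passes in the prefixes that must be suppressed. The "000" placeholder for a suppressed prefix is likewise an assumption of this sketch.

```python
from datetime import date


def generalise_zip(zip_code, restricted_prefixes):
    """Keep only the first three ZIP digits (item B).

    `restricted_prefixes` is a caller-supplied set of three-digit prefixes
    that do not meet the retention criteria; those are replaced by "000"
    (placeholder chosen for this sketch).
    """
    prefix = zip_code.strip()[:3]
    return "000" if prefix in restricted_prefixes else prefix


def year_only(d, age_at_date=None):
    """Reduce a date to its year (item C).

    If the age at that date is known and is over 89, the year is also
    withheld, because the year itself would indicate such an age.
    """
    if age_at_date is not None and age_at_date > 89:
        return None
    return d.year


def top_code_age(age):
    """Aggregate ages over 89 into a single '90 or older' category (item C)."""
    return "90+" if age > 89 else str(age)


if __name__ == "__main__":
    print(generalise_zip("02139", {"036"}))
    print(year_only(date(2014, 6, 30)))
    print(top_code_age(93))
```

Passing the restricted prefixes in as data keeps the retention decision outside the code, so the sketch does not encode any particular population threshold.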
Specific recommendations for indirect identifiers
| Site Code Number/Investigator Identifier |
| • In clinical trial data, place of treatment is usually collected as a site code number/investigator identifier. These site codes should be re-coded to a new random site code (similar to patient code number/identifier). Sites which include few patients may be aggregated to a single site code number/identifier. Countries which include few patients could also be pooled. |
| Demographics and anthropometry measures |
| • Date of birth is a direct identifier and should be replaced with age. As a general rule, ages above 89 should be set to a category ‘>89’; however, depending on the disease and the population under consideration, further grouping of age categories should be considered. Consideration should also be given to recoding/grouping other ages at the lower or upper limits. Another option, provided it does not adversely affect data utility, is to group ages (for example into five-year age categories). All other patient-related dates, including date of death, should be removed and either replaced with a derived study day relative to a baseline or reference date, or offset by some random interval. |
| Verbatim text |
| • Verbatim (free) text may include information that identifies a patient e.g. names, dates or other personal information. Examples of variables containing verbatim text are adverse events, medications, medical history and general comments. Preferably, variables containing free text are either removed from the dataset or set to blank. Alternatively, the data could be reviewed to assess the risk of patient identification, especially if the data add scientific value to the dataset, and any identifiers at the observational level removed. |
| Small populations, low frequency and rare events, rare diseases, sensitive data |
| • Hrynaszkiewicz notes that indirect identifiers with small denominators (population size of <100) or very small numerators (event counts of <3) may present a risk if present in combination with other indirect identifiers. However, excluding such data in all cases may limit the ability of a researcher to perform meaningful analyses, particularly in the case of small numerators in adverse event reporting, where exclusion may remove rare events of interest. |
| Other |
| • Potential indirect identifiers which are important for data utility may be retained, and could be recoded/grouped; otherwise they should be removed. |
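Several of the recommendations for indirect identifiers above (site re-coding with pooling of small sites, age grouping, derived study days and small-cell checks) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the `min_patients` threshold of 5 for pooling sites is an arbitrary example value that the source does not specify, and the study-day convention (day 1 is the reference date, with no day 0) is assumed here. Only the small-cell thresholds (<100 and <3) and the five-year age bands with a ‘>89’ category come from the table.

```python
import random
from collections import Counter
from datetime import date


def recode_sites(site_ids, min_patients=5, seed=None):
    """Map each original site code to a new random code.

    `site_ids` holds one entry per patient. Sites with fewer than
    `min_patients` patients (example threshold) share the code "POOLED".
    """
    counts = Counter(site_ids)
    rng = random.Random(seed)
    large_sites = sorted(s for s, n in counts.items() if n >= min_patients)
    new_numbers = list(range(1, len(large_sites) + 1))
    rng.shuffle(new_numbers)  # break any link to the original ordering
    mapping = {s: f"S{n:03d}" for s, n in zip(large_sites, new_numbers)}
    for site, n in counts.items():
        if n < min_patients:
            mapping[site] = "POOLED"
    return mapping


def age_band(age, width=5, top=89):
    """Group an age into a band of `width` years; ages above `top` become '>89'."""
    if age > top:
        return f">{top}"
    low = (age // width) * width
    high = min(low + width - 1, top)
    return f"{low}-{high}"


def study_day(event_date, reference_date):
    """Replace a calendar date with a study day relative to a reference date.

    Assumed convention: the reference date is day 1, the day before is
    day -1, and there is no day 0.
    """
    delta = (event_date - reference_date).days
    return delta + 1 if delta >= 0 else delta


def small_cell_risk(numerator, denominator):
    """Flag the small denominators (<100) or numerators (<3) noted in the table."""
    return denominator < 100 or numerator < 3


if __name__ == "__main__":
    patients = ["A"] * 6 + ["B"] * 2 + ["C"] * 5
    print(recode_sites(patients, seed=1))
    print(age_band(42), age_band(93))
    print(study_day(date(2015, 3, 10), date(2015, 3, 1)))
    print(small_cell_risk(numerator=2, denominator=250))
```

A flag from `small_cell_risk` is meant as a prompt for review rather than automatic removal, in line with the caution above that excluding all such data may remove rare events of interest.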