OBJECTIVE: To propose a centralized method for generating global unique identifiers to link collections of research data and specimens.
DESIGN: The work is a collaboration between the Simons Foundation Autism Research Initiative and the National Database for Autism Research. The system is implemented as a web service: an investigator enters identifying information about a participant into a client application, which sends encrypted information to a server application; the server returns a generated global unique identifier. The authors evaluated the system using a volume test of one million simulated individuals and a field test on 2000 families (over 8000 individual participants) in an autism study.
MEASUREMENTS: Inverse probability of hash codes; rate of false identity of two individuals; rate of false splits of a single individual; percentage of subjects for whom identifying information could be collected; percentage of hash codes generated successfully.
RESULTS: The large-volume simulation generated no false splits and no false identities. Field testing in the Simons Foundation Autism Research Initiative Simplex Collection produced identifiers for 96% of children in the study and 77% of parents. On average, four out of five hash codes per subject were generated perfectly (only one perfect hash is required for subsequent matching).
DISCUSSION: The system must balance the competing goals of distinguishing individuals, collecting accurate information for matching, and protecting confidentiality. Considerable effort is required to obtain approval from institutional review boards, obtain consent from participants, and achieve compliance from sites during a multicenter study.
CONCLUSION: Generic unique identifiers have the potential to link collections of research data, augment the amount and types of data available for individuals, support detection of overlap between collections, and facilitate replication of research findings.
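The client/server flow in DESIGN, together with the note in RESULTS that one perfect hash is enough for matching, can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration, not the authors' implementation: the field names, the three field combinations in `HASH_RECIPES`, the use of SHA-256 as the one-way code, and the `GuidServer` class are all hypothetical. The abstract states only that several hash codes are derived per subject and that identifying information leaves the client in protected form.

```python
import hashlib
import uuid

# Hypothetical field combinations; the abstract does not list the real ones.
HASH_RECIPES = [
    ("first_name", "last_name", "birth_date"),
    ("first_name", "birth_date", "birth_city"),
    ("first_name", "last_name", "birth_city", "sex"),
]


def normalize(value):
    """Remove all whitespace and upper-case so trivial typing differences do not matter."""
    return "".join(value.split()).upper()


def make_hash_codes(person):
    """Client side: derive one one-way code per recipe; None if a field is missing."""
    codes = []
    for recipe in HASH_RECIPES:
        if any(not person.get(field) for field in recipe):
            codes.append(None)  # this code cannot be generated for this subject
            continue
        payload = "|".join(normalize(person[field]) for field in recipe)
        codes.append(hashlib.sha256(payload.encode("utf-8")).hexdigest())
    return codes


class GuidServer:
    """Server side: sees only hash codes, never the identifying fields."""

    def __init__(self):
        self._by_hash = {}  # (recipe index, hash code) -> identifier

    def resolve(self, codes):
        usable = [(i, code) for i, code in enumerate(codes) if code is not None]
        if not usable:
            raise ValueError("at least one hash code is required")
        guid = None
        for key in usable:
            if key in self._by_hash:  # a single matching code is sufficient
                guid = self._by_hash[key]
                break
        if guid is None:
            guid = uuid.uuid4().hex  # first time this subject is seen
        for key in usable:
            self._by_hash.setdefault(key, guid)
        return guid
```

Under these assumptions, two sites that enter the same participant receive the same identifier even if one site is missing a field, because at least one code still matches. The choice of recipes drives the trade-off named in DISCUSSION: combinations that are too loose risk false identity, while combinations that are too strict risk false splits.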