BACKGROUND: Systematic study of clinical phenotypes is important for a better understanding of the genetic basis of human diseases and for more effective gene-based disease management. A key requirement for such studies is standardized representation of phenotype data using common data elements (CDEs) and controlled biomedical vocabularies. In this study, the authors analyzed how a limited subset of phenotypic data is amenable to common definition and standardized collection, and how adopting such common definitions in large-scale epidemiological and genome-wide studies can facilitate cross-study analysis.
METHODS: The authors mapped phenotype data dictionaries from five eMERGE (Electronic Medical Records and Genomics) Network sites studying multiple diseases, such as peripheral arterial disease and type 2 diabetes. Standardized terminology and metadata repository resources, such as the caDSR (Cancer Data Standards Registry and Repository) and SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms), were used for the mapping. The mapping process comprised both lexical techniques (searching for relevant pre-coordinated concepts and data elements) and semantic techniques (post-coordination). Where feasible, new data elements were curated to improve coverage during mapping. A web-based application was also developed to uniformly represent and query the mapped data elements from different eMERGE studies.
RESULTS: Approximately 60% of the target data elements (95 out of 157) could be mapped using simple lexical analysis of pre-coordinated terms and concepts, before eMERGE investigators initiated any additional curation of terminology and metadata resources. After curating 54 new caDSR CDEs and nine new NCI Thesaurus concepts, and using post-coordination, the authors were able to map the remaining 40% of data elements to caDSR and SNOMED CT. A web-based tool was also implemented to assist in semi-automatic mapping of data elements.
CONCLUSION: This study underscores the need for standardized representation of clinical research data using existing metadata and terminology resources, and it provides simple techniques and software for data element mapping, drawing on experience from the eMERGE Network.
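The two-stage mapping described in the abstract (lexical matching against pre-coordinated concepts first, post-coordination as a fallback) can be illustrated with a minimal Python sketch. This is not the authors' software. The vocabulary, its identifiers (`X001` and so on), and the function names are hypothetical, and the post-coordination step is reduced to a naive token-by-token composition rather than a real SNOMED CT compositional expression.

```python
import re

# Hypothetical toy vocabulary. The IDs and labels are invented for
# illustration and are NOT real caDSR or SNOMED CT identifiers.
VOCABULARY = {
    "X001": "body mass index",
    "X002": "systolic blood pressure",
    "X003": "pain",
    "X004": "left",
    "X005": "leg",
}


def normalize(text):
    """Lowercase, turn punctuation and underscores into spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


# Index from normalized label to concept ID for exact lexical lookup.
LABEL_INDEX = {normalize(label): cid for cid, label in VOCABULARY.items()}


def lexical_match(element_name):
    """Return the ID of a pre-coordinated concept whose label matches exactly, else None."""
    return LABEL_INDEX.get(normalize(element_name))


def post_coordinate(element_name):
    """Naive fallback: compose one concept per token; None if any token is uncovered."""
    tokens = normalize(element_name).split()
    ids = [LABEL_INDEX.get(token) for token in tokens]
    return ids if ids and all(ids) else None


def map_elements(names):
    """Map each data element name to (method, concept IDs)."""
    result = {}
    for name in names:
        cid = lexical_match(name)
        if cid is not None:
            result[name] = ("lexical", [cid])
            continue
        expression = post_coordinate(name)
        if expression is not None:
            result[name] = ("post-coordinated", expression)
        else:
            # Candidates for curation of new data elements or concepts.
            result[name] = ("unmapped", [])
    return result


def coverage(mapped, total):
    """Percentage of data elements that were mapped."""
    return 100.0 * mapped / total


mapping = map_elements(["Body_Mass_Index", "Left leg pain", "Ankle brachial index"])
lexical_coverage = round(coverage(95, 157), 1)  # figures reported in RESULTS
```

In this sketch, elements that remain unmapped after both stages correspond to the cases where the abstract reports that new CDEs or concepts had to be curated. The `coverage(95, 157)` call reproduces the "approximately 60%" figure from the reported counts.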