Zahra A Almowil, Shang-Ming Zhou, Sinead Brophy.
Abstract
INTRODUCTION: Electronic health records (EHRs) are linked together to examine disease history and to undertake research into the causes and outcomes of disease. However, constructing algorithms for phenotyping diseases (e.g., identifying disease characteristics) or health characteristics (e.g., being a smoker) is very time-consuming and resource-intensive. In addition, results can vary greatly between researchers. Reusing or building on algorithms that others have created is a compelling solution to these problems. However, sharing algorithms is not common practice, and many published studies do not detail the clinical code lists used in the disease or characteristic definition. To address these challenges, a number of centres across the world have developed health data portals containing concept libraries (e.g., algorithms for defining concepts such as diseases and characteristics) to facilitate disease phenotyping and health studies.
Keywords: concept libraries; linked electronic health records; phenotype algorithms; review
Year: 2021 PMID: 34189274 PMCID: PMC8210840 DOI: 10.23889/ijpds.v5i1.1362
Source DB: PubMed Journal: Int J Popul Data Sci ISSN: 2399-4908
Figure 1: Overview of the steps taken in the priority-setting process.

| Concept Libraries | Definitions/Purposes | Developers/Leaders | References of the Manuscripts/URL Access of the Concept Libraries | Electronic data sources/Coding systems | Examples of phenotypes |
|---|---|---|---|---|---|
| ClinicalCodes.org | An online repository of code lists from published studies: for each study, one code list or a group of code lists has been uploaded to the site. Code lists are publicly accessible to improve the validity and reproducibility of electronic medical record studies. | The University of Manchester, Institute of Population Health, UK | 9. Springate DA, Kontopantelis E, Ashcroft DM, Olier I, Parisi R, Chamapiwa E, et al. ClinicalCodes: an online clinical codes repository to improve the validity and reproducibility of research using electronic medical records. PLoS One. 2014; 9(6):e99825. | Primary and secondary care, using Read, OXMIS, SNOMED, CPRD product/medical codes, BNF codes, ICD-9, ICD-10 | Research article: Are symptoms of insomnia in primary care associated with subsequent onset of dementia? A matched retrospective case-control study. Link to the shared phenotypic descriptions at: |
| CALIBER data portal | An open online repository of phenotyping algorithms containing the definitions of all research variables that use CALIBER data sources, intended to encourage research and promote transparency. | Led by the University College London (UCL) Institute of Health Informatics, UK | 21. Denaxas S, Gonzalez-Izquierdo A, Direk K, Fitzpatrick NK, Fatemifar G, Banerjee A, et al. UK phenomics platform for developing and validating electronic health record phenotypes: CALIBER. J Am Med Inform Assoc. 2019; 26(12):1545–59. | Primary care, hospital records, social deprivation information and cause-specific mortality data, using Read codes (a subset of SNOMED-CT), ICD-9, ICD-10, OPCS-4 (analogous to Current Procedural Terminology terms) and Gemscript. | Abdominal Hernia: “At the specified date, a patient is defined as having had Abdominal Hernia If they meet the criteria for any of the following on or before the specified date. The earliest date on which the individual meets any of the following criteria on or before the specified date is defined as the first event date: Primary care 1. Abdominal Hernia diagnosis or history of diagnosis or procedure during a consultation OR Secondary care 1. ALL diagnoses of Abdominal Hernia or history of diagnosis during a hospitalization OR Secondary care (OPCS4) 1. ALL procedures for Abdominal Hernia during a hospitalization” Link to the shared phenotypic descriptions at: https://www.caliberresearch.org/portal/phenotypes/chronological-map |
| The MCHP Concept Dictionary and Glossary | The Concept Dictionary contains comprehensive operational definitions and programming code for measures used in MCHP research, including a description of the problem(s) involved, the methods used, and programming tips and cautions; the Glossary records terms that are widely used in population-based research. The Concept Dictionary was developed to help researchers use reliable, validated algorithms to conduct methodologically sound research. | The Manitoba Centre for Health Policy (MCHP), Canada | 13. Soapy T. Manitoba Centre for Health Policy Data Repository. In: Michalos AC (ed.) Encyclopaedia of Quality of Life and Well-Being Research. Springer, Dordrecht; 2014. | The MCHP databases (Health, Education, Social, Justice, Registries, Support Files); operational definitions and SAS program code for variables or measures developed from administrative data; International Classification of Diseases (ICD) diagnoses and ICD/CCI (Canadian Classification of Health Interventions) procedures/interventions | Manitoba asthma algorithms: the following is an example of an asthma algorithm developed by a research project. “Raymond et al. (2011) use a broader scope in their definition for asthma, defining it as one physician claim OR one hospital claim with a corresponding diagnosis of: ICD-9-CM: 464, 466, 490, 491, 493 or ICD-10-CA: J04, J05, J20, J21, J40, J41, J42, J45, J441, J448 OR one prescription for an asthma medication in a three-year period.” Link to the shared phenotypic descriptions at: |
| Phenotype Knowledgebase (PheKB) | An online environment supporting the workflow of building, sharing and validating electronic phenotype algorithms. PheKB was designed to make algorithms transportable across research applications, organizations, health care systems and clinical data repositories. | Led by Vanderbilt University (the eMERGE Network Coordinating Center), USA | 19. Kirby JC, Speltz P, Rasmussen LV, Basford M, Gottesman O, Peissig PL, et al. PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability. J Am Med Inform Assoc. 2016; 23(6):1046–52. | Clinical and genomic data from electronic health records: HCPT codes, ICD-9 codes, ICD-10 codes, laboratories, medications, natural language processing | Urinary Incontinence: the cohort is defined by the following criteria: a. EHRs of all male patients aged 35 years or older, AND b. an ICD-9-CM/ICD-10-CM diagnosis of prostate cancer, AND c. at least two encounters before first treatment, AND d. at least one clinical note before first treatment, AND e. either prostatectomy surgery or a radiation procedure, as identified by CPT codes. Link to the shared phenotypic descriptions at: |
| 2. Specialized libraries | |||||
| Genome-Phenome Analysis Platform (GPAP) | An online data platform where data from sequencing experiments contributed by collaborating research projects are processed using a standard pipeline and made accessible to registered users for online analysis through a user-friendly interface. | Developed by RD-Connect and led by Aix-Marseille University Medical School (AMU), France | 27. Thompson R, Johnston L, Taruscio D, Monaco L, Béroud C, Gut IG, et al. RD-Connect: an integrated platform connecting databases, registries, biobanks and clinical bioinformatics for rare disease research. J Gen Intern Med. 2014; 29(Suppl 3):780–7. | Genomic and clinical data from the rare disease research projects of RD-Connect's partners. The PhenoTips database stores phenotypic profiles for individual cases coded with the Human Phenotype Ontology (HPO). A directory of biobanks and patient registries, and a biosample catalogue. | Case 1 description: RD-Connect identifier: Case1C; Gender: Male; Age: 5 years; Referral: Congenital myasthenic syndrome; Onset: Congenital; Global pace of progression: Progressive (slow); Main clinical features: Neonatal hypotonia, Distal arthrogryposis, Inability to walk, Recurrent lower respiratory tract infections. Link to the shared phenotypic descriptions at: |
| The PhenoScanner V2 | A database of publicly available results from large-scale genetic association studies. It was developed to facilitate the cross-referencing of genetic variants with a wide variety of phenotypes, to improve understanding of the biology and pathways of disease. | The Cardiovascular Epidemiology Unit, University of Cambridge, UK | 24. Kamat MA, Blackshaw JA, Young R, Surendran P, Burgess S, Danesh J, et al. PhenoScanner V2: an expanded tool for searching human genotype-phenotype associations. Bioinformatics. 2019; 35(22):4851–3. | 137 genotype–phenotype association datasets, including results for anthropometric traits, blood pressure, lipids, cardiometabolic diseases, renal function measures, glycemic traits, inflammatory diseases, psychiatric diseases and smoking phenotypes. It also includes the NHGRI-EBI GWAS Catalog and dbGaP catalogues of associations. | Trait: Crohn's disease “A gastrointestinal disorder characterized by chronic inflammation involving all layers of the intestinal wall, noncaseating granulomas affecting the intestinal wall and regional lymph nodes, and transmural fibrosis. Crohn disease most commonly involves the terminal ileum; the colon is the second most common site of involvement. A chronic transmural inflammation that may involve any part of the DIGESTIVE TRACT from MOUTH to ANUS, mostly found in the ILEUM, the CECUM, and the COLON. In Crohn disease, the inflammation, extending through the intestinal wall from the MUCOSA to the serosa, is characteristically asymmetric and segmental. Epithelioid GRANULOMAS may be seen in some patients.” Link to the shared phenotypic descriptions at: |
| Genotypes and Phenotypes Database (dbGaP) | A National Institutes of Health (NIH)-sponsored repository tasked with archiving, curating and distributing information produced by studies of the interaction between genotype and phenotype. It was developed with standardized identifiers that allow published studies to reference or cite the primary data in a clear and uniform way. | National Center for Biotechnology Information, USA | 15. Tryka KA, Hao L, Sturcke A, Jin Y, Wang ZY, Ziyabari L, et al. NCBI’s database of genotypes and phenotypes: dbGaP. Nucleic Acids Res. 2014; 42(D1):D975–9. | Genetic and phenotypic databases sponsored by the NIH and other agencies around the world, including genotype, phenotype, exposure, expression array, epigenomic and pedigree data from genome-wide association studies (GWAS), sequencing studies and other large-scale genomic studies. | “Autism_Genome_Project_Subject_Phenotypes: The subject phenotype table includes data collected on sociodemography (n=2 variables; sex and European ancestry) and psychological and psychiatric observations (n=8 variables; spectrum and strict definition of autism, whether the subject is non-verbal and/or verbal, has low or high IQ, and the age of their first word and phrase). This table now also includes the stage of the study in which the individual was present and whether the individual is a member of a multiplex or simplex family.” Link to the shared phenotypic descriptions at: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/variable.cgi?study_id=phs000267.v5.p2&phv=161303&phd=3659&pha=3690&pht=2305&phvf=&phdf=&phaf=&phtf=&dssp=1&consent=&temp=1 |
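
The MCHP asthma example quoted in the table is a disjunctive rule: within a three-year period, one physician or hospital claim carrying a listed diagnosis code, or one asthma prescription, is enough. A minimal Python sketch of that logic follows. It assumes a hypothetical event layout (`Event` with `kind`, `day`, `code`, `asthma_drug`), assumes codes are matched by prefix with the dot removed (so J441 stands for J44.1), and approximates the three-year period as 3 × 365 days from a caller-chosen start date. It is an illustration, not MCHP's SAS code, and deciding which drugs count as asthma medications is left to the caller.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable

# Code prefixes quoted in the MCHP example, written without dots (J441 = J44.1).
ICD9_CM = ("464", "466", "490", "491", "493")
ICD10_CA = ("J04", "J05", "J20", "J21", "J40", "J41", "J42", "J45", "J441", "J448")


@dataclass
class Event:
    kind: str                  # hypothetical labels: "physician", "hospital" or "rx"
    day: date                  # service or dispensing date
    code: str = ""             # diagnosis code on a claim; empty for prescriptions
    asthma_drug: bool = False  # True if a prescription is for an asthma medication


def is_asthma_code(code: str) -> bool:
    """Prefix match against the listed ICD-9-CM and ICD-10-CA codes."""
    return code.replace(".", "").upper().startswith(ICD9_CM + ICD10_CA)


def meets_asthma_definition(events: Iterable[Event], start: date, years: int = 3) -> bool:
    """True if, within [start, start + years * 365 days), there is one physician
    or hospital claim with a listed code OR one asthma prescription."""
    end = start + timedelta(days=365 * years)  # approximation: ignores leap days
    for e in events:
        if not (start <= e.day < end):
            continue  # outside the observation window
        if e.kind in ("physician", "hospital") and is_asthma_code(e.code):
            return True
        if e.kind == "rx" and e.asthma_drug:
            return True
    return False
```

For example, `meets_asthma_definition([Event("physician", date(2010, 5, 1), "J45.9")], date(2009, 1, 1))` returns `True`. Prefix matching keeps the sketch short; an exact, validated code list is the safer choice in real studies.
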
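
The CALIBER Abdominal Hernia definition in the table reduces to one operation: across several sources, each with its own code list, take the earliest matching record on or before an index date. The Python sketch below assumes a hypothetical record layout (`Record` with `source`, `code`, `day`), hypothetical source labels (`primary`, `hes_diag`, `hes_proc`) and placeholder code lists. It does not reproduce CALIBER's actual Read, ICD-10 or OPCS-4 lists and ignores diagnostic position within a hospital episode.

```python
from datetime import date
from typing import Dict, Iterable, NamedTuple, Optional, Set


class Record(NamedTuple):
    source: str  # hypothetical source label, e.g. "primary", "hes_diag", "hes_proc"
    code: str    # clinical code as recorded in that source
    day: date    # date of the consultation or hospital episode


def first_event_date(records: Iterable[Record],
                     code_lists: Dict[str, Set[str]],
                     index_date: date) -> Optional[date]:
    """Earliest date on or before index_date at which a record matches the
    code list for its own source; None if the patient never qualifies."""
    matches = [
        r.day
        for r in records
        if r.day <= index_date and r.code in code_lists.get(r.source, set())
    ]
    return min(matches, default=None)


# Placeholder code lists, one per source (not real CALIBER codes).
HERNIA_LISTS = {"primary": {"P1"}, "hes_diag": {"D1"}, "hes_proc": {"S1"}}
```

Keeping one code list per source mirrors the "primary care OR secondary care OR secondary care (OPCS4)" structure of the definition: a code only counts when it appears in the source its list was written for.
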
| Concept Libraries | Access to the underlying data sources | Sharing/Uploading of Concepts | Reusing/Downloading of Concepts |
|---|---|---|---|
| 1. General libraries | | | |
| ClinicalCodes.org | From the top menu tab 'Browse published studies', the user can choose a published study from the list; the site then shows all code lists associated with that study. | Users must register with ClinicalCodes.org (login/signup in the menu bar) and choose 'Upload codes'. They first add metadata on the published study and can then upload several code lists into that study as delimited text files. Metadata and links to study code lists can be shared in machine-readable form using the open-source R package rClinicalCodes. | Code lists on ClinicalCodes.org are released under a Creative Commons Attribution 3.0 Unported License (CC BY 3.0), and a file containing all codes associated with a study can be downloaded and used freely by any user. Downloading an individual code list is a single-click process that does not require logging in or supplying user information. Users can explore and download some or all of the code lists as CSV files. |
| CALIBER data portal | Access to CPRD-linked data on the CALIBER portal complies with the CPRD's governance policies for data access. Researchers must first sign agreements with UCL to access CPRD data; non-UCL partners must apply to CPRD to become a CPRD-approved partner and sign a UCL-approved sub-license agreement. | Once a researcher's project proposal has been accepted, registration with the UCL Identifiable Data Handling Service (IDHS) is arranged to create a new share for the project on the safe haven. The CALIBER data facilitator guides researchers through the entire process. | All definitions of research variables that use CALIBER data sources are publicly available in human- and machine-readable formats. The CALIBER codelists package enables users to search for code lists by synonym or code stub, combine search terms using Boolean operators, and download the list of codes with some basic metadata as a CSV file. |
| The MCHP Concept Dictionary and Glossary | The Data Access Process (DAP) of the Manitoba Centre for Health Policy (MCHP) comprises the steps a researcher must complete to access the data and conduct research using the Manitoba Population Research Data Repository. | Researchers can share their work, such as creating new concepts or updating existing ones. Guidelines for concept development are set out in the Concept Development Template. | |
| Phenotype Knowledgebase (PheKB) | Private phenotypes with "In Development" status, or in the "Testing" or "Validation" phase, are not publicly accessible; they can be accessed only if the user is logged in and the phenotype has been shared with the user via one of two collaborative groups: Owner Group Phenotypes or View Group Phenotypes. | Researchers can upload related documents and their phenotypes along with multidimensional metadata labels, including documents with detailed descriptions of the computable algorithms, such as the types of data used, the execution logic, data definitions and flow charts. | Algorithms and multiple implementation results can be viewed publicly on the PheKB website once the author has designated them as “final”. Using metadata, users can search for algorithms based on the inclusion or exclusion of data element classes, such as diagnosis, or by author or keyword. |
| 2. Specialized libraries | | | |
| Genome-Phenome Analysis Platform (GPAP) | Only approved users who have completed the registration and verification process can access the data stored on the GPAP. Users must be accredited clinicians or researchers affiliated with a recognized academic institution and must confirm their adherence to the RD-Connect Code of Conduct by signing the Adherence Agreement. | Data submission is open to all users, not only RD-Connect partners, but users must first register on the GPAP website. The GPAP enables clinicians and researchers who upload patient datasets to analyse their own data. | Registered users can access and search datasets provided by other researchers on similar patients. They can also perform matchmaking, find second families, and find patient populations for validation studies. |
| The PhenoScanner V2 | Some of the datasets are available for download, including dbSNP 147 with variant annotation from VEP, linkage disequilibrium statistics from 1000 Genomes and a subset of the processed GWAS datasets, but users must first contact phenoscanner@gmail.com for approval. | Users can enter a single genetic variant, gene, genomic region or trait in the text box on the home page (www.phenoscanner.medschl.cam.ac.uk), or upload a tab-delimited text file of up to 100 genetic variants, 10 genes or 10 genomic regions. | Users can draw on the archived, publicly accessible findings from large-scale genetic association studies. Information provided by project members is accessible without restriction. |
| Genotypes and Phenotypes Database (dbGaP) | Information on completed studies is freely accessible to the public. Individual-level data are accessible to scientists around the world through a controlled-access application process. | NIH-funded researchers can share the data they produce. For studies not sponsored by the NIH, individual NIH Institutes and Centers (ICs) decide whether the data should be accepted. | Open-access data can be browsed online or downloaded from dbGaP without prior authorization. Requests to download individual-level data are handled through the dbGaP Authorized Access System (dbGaPAA), a web portal that manages request submissions and enables secure, high-speed download of large datasets for authorized users. |
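
The code-list search described for CALIBER (synonyms or code stubs combined with Boolean operators, followed by a CSV export) can be illustrated with a short Python sketch. The package's own functions are not reproduced here: `search_codes`, `to_csv` and the toy ICD-10 dictionary are hypothetical. The sketch keeps a code when its term matches any include pattern OR its code starts with any stub, AND its term matches no exclude pattern.

```python
import csv
import io
import re
from typing import Iterable, List, Sequence, Tuple

Row = Tuple[str, str]  # (code, term)


def search_codes(dictionary: Iterable[Row],
                 include_terms: Sequence[str] = (),
                 stubs: Sequence[str] = (),
                 exclude_terms: Sequence[str] = ()) -> List[Row]:
    """(any include pattern OR any code stub) AND NOT (any exclude pattern)."""
    include = [re.compile(t, re.IGNORECASE) for t in include_terms]
    exclude = [re.compile(t, re.IGNORECASE) for t in exclude_terms]
    hits = []
    for code, term in dictionary:
        wanted = any(p.search(term) for p in include) or code.startswith(tuple(stubs))
        if wanted and not any(p.search(term) for p in exclude):
            hits.append((code, term))
    return hits


def to_csv(rows: Iterable[Row], list_name: str) -> str:
    """Serialise a code list with minimal metadata (its name) as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["codelist", "code", "term"])
    for code, term in rows:
        writer.writerow([list_name, code, term])
    return buffer.getvalue()


# Toy excerpt of an ICD-10 dictionary, for illustration only.
ICD10_TOY = [
    ("K40.9", "Inguinal hernia without obstruction or gangrene"),
    ("K42.9", "Umbilical hernia without obstruction or gangrene"),
    ("K44.9", "Diaphragmatic hernia without obstruction or gangrene"),
    ("Q79.0", "Congenital diaphragmatic hernia"),
    ("I10", "Essential (primary) hypertension"),
]
```

For instance, searching `ICD10_TOY` for the term `\bhernia\b` while excluding `congenital` keeps K40.9, K42.9 and K44.9; the stub `K4` alone selects the same three codes.
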
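
The standardized dbGaP identifiers are visible in the example link given for the Autism Genome Project phenotype table, whose query string carries a study accession (`phs000267.v5.p2`) alongside numeric variable (`phv`) and table (`pht`) identifiers. Assuming a study accession has the shape `phs` + six digits + `.v<version>.p<participant set>`, a minimal Python sketch for pulling these identifiers out of such a link might look like this. The function names are hypothetical and this is not an NCBI client.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumed shape of a dbGaP study accession, e.g. "phs000267.v5.p2".
STUDY_ACCESSION = re.compile(r"^(phs\d{6})\.v(\d+)\.p(\d+)$")


def parse_study_accession(accession: str) -> dict:
    """Split a study accession into study id, version and participant set."""
    m = STUDY_ACCESSION.match(accession)
    if m is None:
        raise ValueError(f"not a dbGaP study accession: {accession!r}")
    return {"study": m.group(1), "version": int(m.group(2)),
            "participant_set": int(m.group(3))}


def link_identifiers(url: str) -> dict:
    """Return the study_id, phv and pht query parameters present in a variable link."""
    query = parse_qs(urlparse(url).query)  # blank parameters such as 'phvf=' are dropped
    return {key: query[key][0] for key in ("study_id", "phv", "pht") if key in query}
```

Parsing the version and participant-set numbers separately makes it easy to check that a cited table refers to the same release of a study as the data actually downloaded.
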