GPCards: An integrated database of genotype-phenotype correlations in human genetic diseases.

Bin Li1,2,3, Zheng Wang1, Qian Chen1, Kuokuo Li4, Xiaomeng Wang4, Yijing Wang4, Qian Zeng2, Ying Han4, Bin Lu5, Yuwen Zhao2, Rui Zhang2, Li Jiang2, Hongxu Pan2, Tengfei Luo4, Yi Zhang1, Zhenghuan Fang4, Xuewen Xiao2, Xun Zhou2, Rui Wang4, Lu Zhou2, Yige Wang2, Zhenhua Yuan2, Lu Xia4, Jifeng Guo2, Beisha Tang1,2, Kun Xia4, Guihu Zhao1,2, Jinchen Li1,4,2.   

Abstract

Genotype–phenotype correlations are the basis of precision medicine for human genetic diseases. However, it remains challenging for clinicians and researchers to access detailed individual-level clinical phenotypic features of patients carrying various genetic variants. To address this urgent need, we manually searched PubMed for genetic studies and catalogued 8,309 genetic variants in 1,288 genes from 17,738 patients with detailed clinical phenotypic features, drawn from 1,855 publications. Based on the genotype–phenotype correlations in this dataset, we developed a user-friendly online database called GPCards (http://genemed.tech/gpcards/), which provides not only the associations between genetic diseases and disease genes, but also the prevalence of the clinical phenotypes related to each disease gene and the patient-level mapping between these clinical phenotypes and genetic variants. To accelerate the interpretation of genetic variants, we integrated 62 well-known variant-level and gene-level genomic data sources, including functional predictions, allele frequencies in different populations, and disease-related information. Furthermore, GPCards enables automatic analysis of users' own genetic data, including comprehensive annotation, prioritisation of candidate functional variants, and identification of genotype–phenotype correlations using custom parameters. In conclusion, GPCards is expected to accelerate the interpretation of genotype–phenotype correlations, subtype classification, and candidate gene prioritisation in human genetic diseases.
© 2021 The Author(s).

Keywords:  GPCards; Genotype; Phenotype; Variant

Year:  2021        PMID: 33868597      PMCID: PMC8042245          DOI: 10.1016/j.csbj.2021.03.011

Source DB:  PubMed          Journal:  Comput Struct Biotechnol J        ISSN: 2001-0370            Impact factor:   7.271


Introduction

Extraordinary advances in sequencing technology have resulted in major scientific breakthroughs in human genetics [1], [2]. In particular, next-generation sequencing (NGS) technologies, especially whole-exome sequencing and whole-genome sequencing, have accelerated the identification of pathogenic variants and disease-causing genes in human genetic diseases [3]. NGS technologies have been effectively applied to biomedical and clinical genetics [1], [3], and have revolutionised the way researchers and clinicians prioritise disease-causing genes in Mendelian disorders and other complex human diseases [4]. Moreover, medical genetics plays a major role in the diagnosis of rare diseases and promotes personalised diagnosis and treatment. Experienced clinicians now combine clinical phenotypic features with molecular genetics in disease diagnosis and treatment [5]. Since the correlation between genotype and phenotype in genetic diseases was first reported decades ago [6], increasing evidence has shown that patients carrying pathogenic variants in some disease-causing genes present clinically recognisable phenotypes and accompanying syndromes [7]. Meanwhile, researchers and clinicians have turned their attention to classifying diseases into molecular subtypes based on patients' genotypes [7]. Although large numbers of variants have been discovered, their interpretation lags far behind, and scientists are not yet able to decipher the correlations between most variants and diseases. Many phenotypic features caused by genetic variants cannot be used for accurate clinical diagnosis and treatment. A better understanding of the correlations between genotypes and phenotypes will revolutionise clinical diagnosis and treatment in patients with genetic diseases [8]. However, data on genotype–phenotype correlations are distributed across a massive number of published studies and are therefore difficult to access and use.
To address this problem, these distributed data need to be integrated appropriately, and a key goal is a database that aggregates genotype–phenotype correlations together with detailed individual-level clinical phenotypes and genetic variants [9]. Several databases, such as Online Mendelian Inheritance in Man (OMIM) [10], Human Phenotype Ontology (HPO) [11], ClinVar [12], MalaCards [13], DisGeNET (Pinero, 2020), Monarch (Shefchek, 2020), and CentoMD [4], have been developed to catalogue disease-associated genes. However, no open-access database provides detailed individual-level clinical phenotypic features linked to genetic variants. Accordingly, we developed a comprehensive, global, open-access database of genotype–phenotype correlations, named GPCards (http://www.genemed.tech/gpcards). In GPCards, detailed genotype–phenotype information for individual patients with genetic variants is presented in a user-friendly interface that does not require registration. Moreover, well-known gene- and variant-level annotations are provided through easy-to-use links. GPCards provides an important resource for genetic counselling and for disease diagnosis and treatment.

Material and methods

Data collection and quality control

Genotype–phenotype correlations were retrieved by manually searching PubMed for each human gene using the search strategy “gene symbol [Title/Abstract] AND (mutation [Title/Abstract] OR variant [Title/Abstract])” (Fig. 1). Although all human genes were searched, only eligible genetic studies were retained, according to the following inclusion criteria: (i) no fewer than three patients with detailed genotype–phenotype data and (ii) if a gene had more than five eligible studies, being among the five studies with the most detailed genotype–phenotype data. Exclusion criteria were as follows: (i) studies that focused on molecular mechanisms rather than genetics; (ii) studies reporting fewer than three patients; and (iii) studies without original genotype–phenotype data or original phenotypic details of patients, which may instead cite data from other published studies. We first removed unsuitable studies by reading the abstracts of the retrieved publications and applying the exclusion criteria. We then selected eligible genetic studies from the remaining literature according to the inclusion criteria. Data collectors extracted the genotype and phenotype information from each study. Finally, a geneticist reviewed and curated the genetic and phenotypic data to confirm the accuracy of the collected data. All data collectors were researchers or PhD students with a strong background in clinical genetics and were rigorously trained to ensure the consistency of the collected data.
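The per-gene search and the screening rules above can be sketched as a small filter. This is a minimal illustration, not the curation pipeline actually used: screening was done manually, and the `Study` fields (in particular the numeric `detail_score` used to rank studies by level of detail) are hypothetical stand-ins for curator judgement.

```python
from dataclasses import dataclass


def pubmed_query(gene_symbol: str) -> str:
    """Per-gene PubMed search string, following the strategy quoted above."""
    return (f"{gene_symbol} [Title/Abstract] AND "
            "(mutation [Title/Abstract] OR variant [Title/Abstract])")


@dataclass
class Study:
    pmid: str
    n_patients: int                 # patients with original genotype-phenotype data
    detail_score: float             # hypothetical curator rating of level of detail
    mechanistic_only: bool = False  # exclusion (i): molecular-mechanism study
    original_data: bool = True      # exclusion (iii): False if data are only cited


MIN_PATIENTS = 3          # inclusion (i) / exclusion (ii)
MAX_STUDIES_PER_GENE = 5  # inclusion (ii)


def select_studies(studies: list[Study]) -> list[Study]:
    """Apply the exclusion criteria, then keep at most the five most detailed studies."""
    eligible = [s for s in studies
                if not s.mechanistic_only
                and s.original_data
                and s.n_patients >= MIN_PATIENTS]
    eligible.sort(key=lambda s: s.detail_score, reverse=True)
    return eligible[:MAX_STUDIES_PER_GENE]
```

Filtering before ranking mirrors the order described in the text: exclusion on abstracts first, then selection among the remaining eligible studies.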
Fig. 1

A general workflow of GPCards. Data collection and quality control steps are shown in the green box, the variant annotation and integration workflow in the yellow box, and database construction and interfaces in the red box. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

For each genetic study meeting the quality control criteria described above, two types of information were collected. First, we catalogued the phenotypes associated with each disease-causing gene, including the PubMed ID, gene symbol, diagnosed disease, total number of patients with genetic variants in the gene, and number of patients with each clinical phenotype or symptom (Fig. 1). Second, we catalogued the detailed phenotypic features and genotype of each patient, including the PubMed ID, sample ID, Mendelian inheritance (recessive or dominant), genomic position of each variant, nucleotide change, amino acid change, origin of the variant (de novo or inherited), zygosity (homozygous or heterozygous), gender, and status of each phenotypic feature or symptom (Fig. 1). LiftOver was used to convert genomic positions from other genome assemblies (hg18 or hg38) to hg19. If the genomic positions of genetic variants were not available in the original studies, VarCards [14] was used to map them based on RefSeq transcript definitions.
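The patient-level fields listed above map naturally onto a typed record. The sketch below is an assumed representation for illustration only; the field names are hypothetical and do not describe the actual GPCards schema. Phenotypes are stored as present/absent flags, and features a study did not report are simply left out.

```python
from dataclasses import dataclass, field
from typing import Literal, Optional, get_args

# Allowed values as listed in the text; Literal documents them, __post_init__ enforces them.
Inheritance = Literal["recessive", "dominant"]
Origin = Literal["de novo", "inherited"]
Zygosity = Literal["homozygous", "heterozygous"]


@dataclass
class PatientRecord:
    """One row of a hypothetical patient-level table (field names are illustrative)."""
    pmid: str
    sample_id: str
    inheritance: Inheritance
    chrom: str
    pos_hg19: int            # position after conversion to hg19 (e.g. with LiftOver)
    nucleotide_change: str   # HGVS c. notation, e.g. "c.550C>T"
    amino_acid_change: str   # HGVS p. notation, e.g. "p.R184C"
    origin: Origin
    zygosity: Zygosity
    gender: Optional[str] = None
    # phenotype -> True (present) / False (absent); unreported features are omitted
    phenotypes: dict[str, bool] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject values outside the controlled vocabularies above.
        for name, allowed in (("inheritance", Inheritance),
                              ("origin", Origin),
                              ("zygosity", Zygosity)):
            value = getattr(self, name)
            if value not in get_args(allowed):
                raise ValueError(f"{name}={value!r} not in {get_args(allowed)}")
```

Validating the controlled vocabularies at construction time is one way to support the consistency that the trained data collectors and reviewing geneticist aimed for.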

Variant annotation and integration

ANNOVAR was used for comprehensive annotation of the genetic variants in each study (Fig. 1). Allele frequencies in different populations were extracted from human genetic variation databases, including gnomAD (release 2.1.1) [15], [16], ExAC (release 1.0) [15], [17], ESP6500 (release ESP6500SI-V2) [18], the 1000 Genomes Project (final phase) [19], the Kaviar genomic variant database (version 160204-Public) [20], and the Haplotype Reference Consortium (HRC) (15). The pathogenicity of missense variants was predicted using 24 widely accepted algorithms: ReVe [21], REVEL [22], SIFT [23], [24], PolyPhen2 HDIV [25], PolyPhen2 HVAR [25], LRT [26], MutationTaster [27], MutationAssessor [28], FATHMM [29], PROVEAN [30], VEST 3.0 [31], MetaSVM [32], MetaLR [32], M-CAP [33], CADD [34], DANN [35], FATHMM MKL [36], Eigen [37], GenoCanyon [37], fitCons [38], GERP++ [39], PhyloP [40], PhastCons [41], and SiPhy [42]. Disease-related information for variants was also annotated, including InterVar [43], COSMIC [44], ICGC [45], nci60, InterPro [46], dbSNP v150 [47], and ClinVar [12]. Comprehensive annotations were also performed at the gene level, as described in our previous studies [14], [48], covering six data types: basic information, gene function, phenotype and disease, gene expression, variants in different populations, and drug–gene interactions (Fig. 1). In the basic information panel, core gene-level information was extracted from NCBI Gene [49], Gene Ontology (GO; V1.4) [50], and InBio Map (release 20160912) [51]. Data were also obtained for the Residual Variation Intolerance Score (RVIS) [52], the novel gene intolerance ranking system LoFtool [53], the heptanucleotide context intolerance score [54], the gene damage index (GDI) [55], Episcore [56], and the probability of loss-of-function intolerance score [15].
In the gene function panel, core information from UniProt (release 201902) [57], InterPro [46], InBio Map (release 20160912) [51], and NCBI BioSystems (release 20170421) [58] was integrated. In the phenotype and disease panel, gene-level information from OMIM [10], ClinVar [12], Gene4Denovo [48], MGI [59], and HPO [11] was catalogued. In the gene expression panel, data were sourced from BrainSpan [60], GTEx [61], and the Human Protein Atlas [62]. The final panel included drug–gene interaction and gene druggability data from DGIdb [63].
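Once variants are annotated, population allele frequencies are typically used to keep only rare variants. The sketch below assumes a tab-delimited, ANNOVAR-style output in which a missing annotation is written as "."; the column names `gnomAD_AF`, `ExAC_AF` and `1000G_AF` and the 0.001 cut-off are hypothetical choices for illustration, since the real column names depend on the databases requested and the text does not specify a threshold.

```python
import csv
import io

# Hypothetical column names; real ANNOVAR output columns depend on the databases used.
AF_COLUMNS = ("gnomAD_AF", "ExAC_AF", "1000G_AF")


def max_population_af(row: dict[str, str], columns: tuple[str, ...] = AF_COLUMNS) -> float:
    """Highest allele frequency across the chosen population columns.
    A '.' (or an absent column) means the variant was not found, treated as 0."""
    values = []
    for col in columns:
        raw = row.get(col, ".")
        values.append(0.0 if raw in (".", "") else float(raw))
    return max(values, default=0.0)


def rare_variants(tsv_text: str, max_af: float = 0.001) -> list[dict[str, str]]:
    """Keep rows whose maximum population allele frequency is at most max_af
    (0.001 is an assumed, illustrative cut-off)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if max_population_af(row) <= max_af]
```

Taking the maximum across populations is a conservative choice: a variant common in any one population is dropped.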

Database construction and interfaces

Integrating all of the genotype–phenotype correlations and comprehensive annotations described above, we developed GPCards (http://genemed.tech/gpcards/) by combining Vue with Laravel, a PHP-based web framework, to build a user-friendly web interface (Fig. 1). The front end and back end were developed separately. The front end uses the Element UI toolkit, which supports most modern browsers across platforms, such as Microsoft Edge, Google Chrome, and Safari. The back end was developed using Laravel, a widely used PHP web framework. GPCards runs smoothly on multiple operating systems, including Windows, macOS, and Linux. All genotype–phenotype data and annotation data are stored in a MySQL database.
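To make the storage layer concrete, the following is a simplified relational layout for the two kinds of curated information (per-study phenotype summaries and per-patient variants), plus the per-study overview query behind the results tables. It is a hypothetical schema, not the one GPCards uses, and it runs on Python's built-in SQLite in place of MySQL so the example is self-contained.

```python
import sqlite3

# Hypothetical, simplified schema; GPCards itself stores its data in MySQL.
SCHEMA = """
CREATE TABLE study (
    study_id    INTEGER PRIMARY KEY,
    pmid        TEXT NOT NULL,
    gene_symbol TEXT NOT NULL,
    disorder    TEXT NOT NULL
);
CREATE TABLE phenotype_summary (
    study_id   INTEGER NOT NULL REFERENCES study(study_id),
    phenotype  TEXT NOT NULL,
    n_examined INTEGER NOT NULL,
    n_present  INTEGER NOT NULL,
    PRIMARY KEY (study_id, phenotype)
);
CREATE TABLE patient_variant (
    study_id  INTEGER NOT NULL REFERENCES study(study_id),
    sample_id TEXT NOT NULL,
    chrom     TEXT NOT NULL,
    pos_hg19  INTEGER NOT NULL,
    ref       TEXT NOT NULL,
    alt       TEXT NOT NULL,
    origin    TEXT,
    zygosity  TEXT
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def study_overview(conn: sqlite3.Connection, gene: str) -> list[tuple]:
    """One row per study of `gene`: PubMed ID, disorder, distinct variants, patients."""
    return conn.execute(
        """
        SELECT s.pmid, s.disorder,
               COUNT(DISTINCT p.chrom || ':' || p.pos_hg19 || p.ref || '>' || p.alt),
               COUNT(DISTINCT p.sample_id)
        FROM study AS s
        LEFT JOIN patient_variant AS p ON p.study_id = s.study_id
        WHERE s.gene_symbol = ?
        GROUP BY s.study_id
        ORDER BY s.study_id
        """,
        (gene,),
    ).fetchall()
```

Keeping one row per study, rather than merging studies, matches the design described in the Results, where correlations are not aggregated across publications.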

Database update

The data in GPCards will be updated semi-annually by manually searching PubMed for genotype–phenotype correlations with the search strategy “(phenotype [Title/Abstract] OR clinical feature [Title/Abstract]) AND (mutation [Title/Abstract] OR variant [Title/Abstract])”, restricted to studies published since the previous update. We also encourage users to upload anonymized original genotype–phenotype data in the Upload section of GPCards.
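Restricting the update search to new publications amounts to appending a publication-date range to the query. This sketch uses PubMed's `[dp]` (publication date) field tag with a `start:end` range; the `next_update` helper, including clamping the day to 28 so every month is valid, is a hypothetical scheduling convenience rather than part of the described procedure.

```python
from datetime import date

BASE_QUERY = ("(phenotype [Title/Abstract] OR clinical feature [Title/Abstract]) "
              "AND (mutation [Title/Abstract] OR variant [Title/Abstract])")


def update_query(last_update: date, today: date) -> str:
    """Limit the update search to records published since the previous update."""
    window = f"{last_update:%Y/%m/%d}:{today:%Y/%m/%d}[dp]"
    return f"({BASE_QUERY}) AND {window}"


def next_update(last_update: date, months: int = 6) -> date:
    """Semi-annual schedule (hypothetical helper): same day six months later,
    with the day clamped to 28 so the date exists in every month."""
    month_index = last_update.month - 1 + months
    year = last_update.year + month_index // 12
    return date(year, month_index % 12 + 1, min(last_update.day, 28))
```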

Results and web interface

Summary of catalogued data for genotype–phenotype correlations

We reviewed more than 20,000 studies from PubMed, of which 1,855 genetic studies with detailed phenotype information satisfied the quality control requirements and were integrated. In total, 8,309 nonredundant genetic variants in 1,288 genes from 17,738 patients with formatted, detailed clinical phenotypic features were integrated into GPCards. Of the 1,288 disease-associated genes with individual-level phenotypic features, 119 (10.9%), 92 (8.4%), 59 (5.4%), and 436 (39.9%) were reported to carry three, four, five, and no fewer than six genetic variants, respectively (Fig. 2A). Among the 1,855 studies, 129 (7.9%), 175 (10.8%), 131 (8.1%), and 1,070 (65.9%) described two, three, four, and no fewer than five phenotypic features for each patient, respectively (Fig. 2B). Moreover, 220 (13.4%), 185 (11.3%), 190 (11.6%), and 1,049 (63.8%) studies described three, four, five, and no fewer than six patients, respectively (Fig. 2C). Finally, of the 17,738 patients with clinical information, 1,200 (8.0%), 1,795 (12.0%), 1,253 (8.4%), and 9,619 (64.3%) had two, three, four, and no fewer than five clinical phenotypic features, respectively (Fig. 2D).
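Each panel of Fig. 2 bins items (genes, studies or patients) by a count, with a few exact bins and one open-ended bin. A minimal sketch of that binning on toy data follows; it reports each bin's count and its percentage of all items passed in, so the denominator is explicit.

```python
def binned_distribution(values: list[int], exact: tuple[int, ...] = (3, 4, 5),
                        at_least: int = 6) -> dict[str, tuple[int, float]]:
    """Count items per bin ('3', '4', '5', '>=6' by default) and give each count's
    share (%) of all items; items below the smallest bin count only in the total."""
    total = len(values)
    open_label = f">={at_least}"
    counts = dict.fromkeys([str(k) for k in exact] + [open_label], 0)
    for v in values:
        if v in exact:
            counts[str(v)] += 1
        elif v >= at_least:
            counts[open_label] += 1
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}
```

For panels B and D, the bins would be `exact=(2, 3, 4)` with `at_least=5`.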
Fig. 2

Summary of catalogued genotype–phenotype correlation data. (A) Distribution of disease-associated genes by number of genetic variants. (B) Distribution of studies by number of clinical phenotypes. (C) Distribution of studies by number of patients. (D) Distribution of patients by number of clinical phenotypes.

Search modules of GPCards

To facilitate the mining and application of genotype–phenotype correlation data, GPCards provides a user-friendly query interface that gives an overview of individual-level genotype–phenotype correlations together with comprehensive annotation information. A quick search bar, with example search types shown as prompts, is placed prominently on the home page of GPCards (Fig. 3). The quick search automatically recognises a variety of query terms, including gene symbols, genomic regions, cytobands, genetic variants, gene transcripts, genomic coordinates, disease names, phenotype keywords, and GPCards identifiers (GP_ID). In addition, GPCards provides an advanced search function with which users can search the catalogued genotype–phenotype correlation data in batches (http://www.genemed.tech/gpcards/search) (Fig. 3). Examples of each type of search item are presented in this panel. Another key feature of the advanced search is that users can choose which annotation information is shown in the search results, including pathogenicity predictions from 24 tools, population-specific allele frequencies, and data from established disease- and phenotype-related databases (Fig. 3). To avoid an excess of data, users can select only the data sources they need in the advanced search panel. For example, in the allele frequency section users could select only the gnomAD datasets [15], [16], which are considered the most comprehensive and ethnically diverse allele frequency database, and only these frequencies would then be shown on the search results page.
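One way a single search box can recognise several kinds of query is an ordered list of patterns with a fallback to free text. The rules below are hypothetical: the text does not describe how GPCards recognises query types, and the `GP_` identifier format, the `chrom-pos-ref-alt` variant format and the toy gene set are assumptions for illustration.

```python
import re

# Hypothetical recognition rules, checked in order; the first full match wins.
_RULES = [
    ("gp_id", re.compile(r"GP_\d+", re.I)),  # assumed identifier format
    ("variant", re.compile(r"(chr)?(\d{1,2}|X|Y|MT?)-\d+-[ACGT]+-[ACGT]+", re.I)),
    ("genomic_region", re.compile(r"(chr)?(\d{1,2}|X|Y|MT?):\d+-\d+", re.I)),
    ("genomic_coordinate", re.compile(r"(chr)?(\d{1,2}|X|Y|MT?):\d+", re.I)),
    ("cytoband", re.compile(r"(\d{1,2}|X|Y)[pq]\d+(\.\d+)?", re.I)),
    ("transcript", re.compile(r"(NM|NR|XM|XR)_\d+(\.\d+)?|ENST\d+(\.\d+)?", re.I)),
]


def classify_query(term: str, known_genes: set[str]) -> str:
    """Return the query type of `term`; gene symbols are looked up in `known_genes`
    (a stand-in for a full gene list), and anything else is treated as a
    disease name or phenotype keyword."""
    term = term.strip()
    for label, pattern in _RULES:
        if pattern.fullmatch(term):
            return label
    if term.upper() in known_genes:
        return "gene_symbol"
    return "disease_or_phenotype_keyword"
```

Checking gene symbols against a list, rather than with a regular expression, avoids misreading short disease abbreviations as genes.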
Fig. 3

Snapshot of the search modules in GPCards. The quick search bar offers 11 types of search prompts, illustrated with JAG1. The advanced search can be used to search in batches with nine types of search prompts. The search results show the PubMed ID, gene symbol, disorder name, and numbers of variants, patients, and phenotypes. “Specify annotation datasets” is a selectable panel covering 24 predictive tools, population-specific allele frequencies, and data from established disease- and phenotype-related databases, allowing users to choose the annotation information shown in the search results.

Genotype–phenotype correlations in GPCards

The results of the quick search and advanced search are presented as tables that summarise basic information for disease-associated genes, including the PubMed ID, gene symbol, disorder name, and numbers of variants, patients, and phenotypes in each study (Fig. 4). Notably, the genotype–phenotype correlations in GPCards are specific to each original study and are not aggregated across studies, because different studies report phenotypic features in different ways and with different vocabularies. When users click “phenotype summary and genotype–phenotype correlation”, a new page is presented with two sections: phenotype summary and genotype–phenotype correlation. The phenotype summary section shows the frequencies of the clinical phenotypic features or symptoms associated with a disease-causing gene in a given study (Fig. 4). For each clinical phenotypic feature, users can see the total number of patients examined, the number of patients presenting the feature, and the prevalence of the feature in the study. For example, by searching JAG1, users can learn that one study reported Alagille syndrome associated with JAG1 in 70 patients, of whom 17 (24.29%) showed interlobular bile duct paucity, 64 (91.43%) had a cardiac murmur, and 57 (81.43%) had characteristic facial features, in addition to other summarised phenotypes.
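The phenotype summary is a per-feature count of patients examined and patients presenting the feature, divided to give a percentage. A minimal sketch, assuming each patient is a dict of reported features mapped to present/absent (unreported features are omitted and so do not count as examined). The test reproduces only the three JAG1 counts quoted above with a synthetic patient list, not the study's real patient-level data.

```python
from collections import Counter


def phenotype_summary(patients: list[dict[str, bool]]) -> dict[str, tuple[int, int, float]]:
    """Per phenotype: (patients presenting it, patients examined, prevalence in %).
    A patient counts as examined for a feature only if its status was reported."""
    examined, present = Counter(), Counter()
    for phenos in patients:
        for name, status in phenos.items():
            examined[name] += 1
            present[name] += int(status)
    return {
        name: (present[name], n, round(100 * present[name] / n, 2))
        for name, n in examined.items()
    }
```

With 70 patients examined, 17 presenting gives 100 × 17 / 70 ≈ 24.29%, matching the figure in the text.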
Fig. 4

Snapshot of genotype–phenotype correlations in GPCards. The “Phenotype Summary and Genotype-Phenotype Correlation” panel presents basic information on the searched genes. The frequencies of the clinical phenotypes or symptoms associated with disease-causing genes are shown in the “Phenotype Summary” panel. Detailed individual-level phenotypes and genotypes, together with comprehensive variant-level annotations of each genetic variant, are presented in the “Genotype-Phenotype Correlation” panel.

In the genotype–phenotype correlation section, users can browse detailed clinical phenotypic features and genotype information as well as comprehensive annotations for genetic variants (Fig. 4). For genotype information, users can obtain the genomic position, reference allele, alternative allele, Mendelian inheritance (recessive or dominant), origin of the variant (de novo or inherited), zygosity (homozygous or heterozygous), functional effect (stop-gain, frameshift, nonsynonymous, or splicing), and functional consequences predicted by several tools. For phenotype information, users can learn whether a patient with a specific genotype presents specific phenotypic features. For annotation information, users can evaluate pathogenicity based on 24 predictive tools, allele frequencies in different populations, and whether the variant has been catalogued in other well-known disease- and phenotype-related databases. For example, a patient with Alagille syndrome carries a heterozygous de novo nonsynonymous variant (c.550C>T, p.R184C) in JAG1 [64], with clinical phenotypic features of interlobular bile duct paucity, cholestasis, cardiac murmur, skeletal abnormalities, characteristic facial features, and posterior embryotoxon, but without interlobular bile duct paucity and kidney abnormalities, phenotypic features that may be present in other patients with different JAG1 variants.
By clicking “Detailed Annotation”, users can learn that this variant is predicted to be deleterious or conserved by all 24 predictive tools, has not been reported in any population in gnomAD, ExAC, or other population databases, is classified as likely pathogenic by InterVar, and is reported as pathogenic in ClinVar (Fig. 4). Notably, by clicking the JAG1 gene symbol, users can also obtain comprehensive gene-level information (http://genemed.tech/gpcards/geneDetail/main?gene_symbol=JAG1), as described in the Material and methods section, similar to the Gene4Denovo database [48] previously developed by our group.
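A statement such as "deleterious or conserved by all 24 predictive tools" is a consensus count over per-tool calls. The sketch below assumes each tool's raw score has already been converted into a "damaging", "conserved" or "tolerated" label using that tool's own recommended threshold (a step not shown here), with `None` where a tool gives no prediction; the labels themselves are an assumed convention.

```python
from typing import Optional

# Assumed label convention: function predictors give "damaging"/"tolerated",
# conservation scores give "conserved"/"tolerated"; None = no prediction.
DAMAGING_LABELS = {"damaging", "conserved"}


def prediction_consensus(calls: dict[str, Optional[str]]) -> tuple[int, int]:
    """Return (tools calling the variant damaging or conserved, tools that made a call)."""
    made = [c for c in calls.values() if c is not None]
    damaging = sum(c in DAMAGING_LABELS for c in made)
    return damaging, len(made)


def unanimous_damaging(calls: dict[str, Optional[str]]) -> bool:
    """True only if at least one tool made a call and every call is damaging/conserved."""
    damaging, made = prediction_consensus(calls)
    return made > 0 and damaging == made
```

Reporting both numbers, rather than only a yes/no, keeps missing predictions visible.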

Other functions in GPCards

GPCards supports an analysis service that is freely available to all users on the Analysis page (http://www.genemed.tech/gpcards/analysis). Users can analyse genetic data by uploading anonymized patient data files in VCF (version 4) format and entering their e-mail address. If users choose the “Trio” option when uploading a VCF file, they select the sample IDs of the proband and the unaffected father and mother, and GPCards automatically identifies de novo mutations, homozygous variants, compound heterozygous variants, and inherited hemizygous variants. If users choose the “Non-Trio” option and set the expected genotype (heterozygous, homozygous, or wild type) of each sample, GPCards automatically identifies the co-segregating genetic variants that meet these requirements. With informed patient consent, GPCards links the anonymized genetic variants to the catalogued genotype–phenotype correlations. If GPCards identifies a variant that has already been catalogued, it provides the detailed phenotypes of patients carrying the same variant. If GPCards prioritises a gene that has been catalogued, it provides gene-level summary information on genotype–phenotype correlations. In addition, GPCards provides several parameters for quality control and for detecting co-segregating rare damaging variants. There are also several additional useful sections in GPCards. In the download section, users can freely download all of the genotype–phenotype correlation data, which were compiled by about 20 professionals over several months (http://genemed.tech/gpcards/download). In the upload section, users can upload anonymized genotype–phenotype data to help enrich the database (http://genemed.tech/gpcards/upload). After receiving uploaded data, we will contact the users to confirm informed consent and will de-identify the data before public release in GPCards.
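The trio and non-trio modes rest on simple genotype comparisons between family members. Below is a simplified sketch of those rules using biallelic VCF `GT` strings; it is not GPCards' implementation, which also applies quality-control and rarity filters that are omitted here. Assumptions: missing calls (".") never match a rule, male chrX genotypes may be haploid (e.g. "1"), and de novo detection only considers heterozygous proband calls.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TrioVariant:
    gene: str
    chrom: str
    pos: int
    proband: str  # VCF GT strings, e.g. "0/1", "1|1", "0"
    father: str
    mother: str


def _alleles(gt: str) -> list[str]:
    return gt.replace("|", "/").split("/")

def is_ref(gt: str) -> bool:       # no ALT allele called
    return all(a == "0" for a in _alleles(gt))

def is_het(gt: str) -> bool:       # one REF and at least one ALT allele
    a = _alleles(gt)
    return "0" in a and any(x not in ("0", ".") for x in a)

def is_alt_only(gt: str) -> bool:  # homozygous ALT, or hemizygous ALT ("1")
    return all(x not in ("0", ".") for x in _alleles(gt))

def _is_x(chrom: str) -> bool:
    c = chrom.upper()
    return (c[3:] if c.startswith("CHR") else c) == "X"


def classify_trio(variants: list[TrioVariant], proband_is_male: bool) -> dict[str, list[TrioVariant]]:
    hits = defaultdict(list)
    inherited_het = defaultdict(lambda: {"father": [], "mother": []})
    for v in variants:
        x = _is_x(v.chrom)
        if is_het(v.proband) and is_ref(v.father) and is_ref(v.mother):
            hits["de_novo"].append(v)
        elif x and proband_is_male and is_alt_only(v.proband) and is_ref(v.father) and is_het(v.mother):
            hits["hemizygous"].append(v)
        elif not x and is_alt_only(v.proband) and is_het(v.father) and is_het(v.mother):
            hits["homozygous"].append(v)
        elif is_het(v.proband) and is_het(v.father) and is_ref(v.mother):
            inherited_het[v.gene]["father"].append(v)
        elif is_het(v.proband) and is_ref(v.father) and is_het(v.mother):
            inherited_het[v.gene]["mother"].append(v)
    # Compound heterozygous: het variants in one gene inherited from both parents.
    for origin in inherited_het.values():
        if origin["father"] and origin["mother"]:
            hits["compound_het"].extend(origin["father"] + origin["mother"])
    return dict(hits)


def cosegregates(genotypes: dict[str, str], expected: dict[str, str]) -> bool:
    """Non-trio mode: every sample must match the genotype the user set for it."""
    check = {"heterozygous": is_het, "homozygous": is_alt_only, "wild": is_ref}
    return all(check[expected[s]](genotypes[s]) for s in expected)
```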
Users can also access genotype–phenotype correlations through the browse function, which lists all catalogued genes and the total numbers of variants, patients, and phenotypes in each study (http://genemed.tech/gpcards/browse). Moreover, in the browse section, users can efficiently access phenotypic data by choosing the first letter of the gene symbol. In the data source section, all integrated databases and algorithms are listed with summary information (http://genemed.tech/gpcards/source). We also provide an instruction manual on the Tutorial page, with detailed information on how to get started (http://genemed.tech/gpcards/tutorial).

Discussion

With the exponential growth of genetic data, especially given the extensive application of NGS technologies in the past decade, increasing numbers of disease-associated genetic variants have been discovered and implemented in diagnostic settings in medical genetics [1], [2], [5]. However, the overall diagnostic yield still lags behind the discovery of disease-associated genes [1], [5]. Given the volume of data, it is increasingly difficult for clinical investigators and geneticists to extract relevant genotype–phenotype information from the literature. To resolve this issue, we developed the GPCards database, which enables users to access genotype–phenotype correlations without registration or payment. Using GPCards, clinicians could classify complex diseases and syndromes into “molecular subtypes”, which would improve diagnostic accuracy and therapeutic efficacy. Clinicians could also identify genes or variants related to a phenotype by using the disease name or phenotypic feature as a search term. This database is expected to substantially improve the application of genetic data to clinical diagnosis and treatment. Archiving genotype–phenotype data from a large number of published studies to construct a useful database is a complex, laborious, and expensive task [4], [65]. Owing to the substantial input of expertise, resources, and time, the newly developed GPCards database is practical and highly integrative. It includes patients from a wide range of ethnic groups and geographical locations worldwide. All data were screened by professionals following a strict quality control procedure. Furthermore, we annotated all variants and genes using 62 well-established genetic or clinical data sources, providing a convenient one-stop resource for interpreting the pathogenicity of genetic variants.
The quick search and advanced search modules provide easy-to-use interfaces with simple, clear prompts for users with a wide range of expertise, from beginners to scientists. GPCards is the first freely available database combining detailed individual-level information on genotype–phenotype correlations in human genetic diseases. Users can efficiently explore genotype–phenotype correlation data by using the functions of GPCards that suit their needs, such as quick search, advanced search, browse, analysis, download, and upload. GPCards is a practical and highly integrative database aimed at aiding geneticists and clinicians. It can be used, for example, to prioritise novel candidate genes. Different categories of human disease may share extensive phenotypic features and may therefore be caused by mutations in the same genes; for example, we previously reported that de novo mutations (DNMs) in SCN2A are associated with different neuropsychiatric disorders [66]. A single gene may therefore be associated with two correlated diseases. If the phenotype information indicates that a gene is associated with a given disease, we can infer, based on the genotype–phenotype association, that the gene may also be associated with another disease sharing similar clinical features. For example, previous studies demonstrated that DNMs in CHD8 were associated with autism spectrum disorder [67], a disease usually accompanied by intellectual disability, suggesting that CHD8 is a candidate gene for intellectual disability. In past decades, many disease-related databases have been developed, such as OMIM [10], CentoMD [4], HGMD [68], HPO [11], ClinVar [12], DECIPHER [69], and MalaCards [13], as well as PhenoTips [70], Phenopolis [71], RD-Connect [72], and Patient Archive [73]. OMIM and HPO both describe human genes and their associated diseases or phenotypes but lack detailed variant-level information.
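The inference from shared clinical features to candidate genes can be illustrated with a phenotype-overlap score. This sketch swaps in Jaccard similarity between disease phenotype sets; it is an illustrative technique, not the method used by GPCards, and the phenotype sets, disease-gene links and the 0.2 similarity cut-off are toy assumptions chosen to mirror the CHD8 example.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Shared features divided by all features across the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_genes(target: str,
                    phenotypes: dict[str, set[str]],
                    genes: dict[str, set[str]],
                    min_similarity: float = 0.2) -> list[tuple[str, float]]:
    """Genes linked to phenotypically similar diseases but not yet to `target`,
    ranked by the best similarity (min_similarity is an assumed cut-off)."""
    known = genes.get(target, set())
    scores: dict[str, float] = {}
    for disease, feats in phenotypes.items():
        if disease == target:
            continue
        sim = jaccard(phenotypes[target], feats)
        if sim < min_similarity:
            continue
        for gene in genes.get(disease, set()) - known:
            scores[gene] = max(scores.get(gene, 0.0), sim)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

With toy sets in which autism spectrum disorder and intellectual disability share one of four distinct features, CHD8 is proposed for intellectual disability with a score of 0.25.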
In addition, CentoMD, PhenoTips, and HGMD are pay-per-use databases with genetic and clinical information from HPO and OMIM, so users have to pay for query services. Meanwhile, ClinVar is well known for its variant-level information and associated diseases but, like the other databases listed above, lacks detailed individual-level phenotypic information. DECIPHER is used by the clinical community to share and compare phenotypic and genotypic data. MalaCards lists known disease aliases and inter-disease connections consolidated from 74 sources. Some workflows that can be adapted to any set of patients with available phenomic and genomic data, such as PhenCo (Diaz-Santiago, 2020), have also been reported recently. Furthermore, GWASkb [74], GWAS Central [75], GWAS Catalogue [76], [77], PhenoScanner [78], and GRASP [79] focus on the relationships between human traits and common SNPs rather than pathogenic variants. Compared with these databases, GPCards is an open-access database that integrates peer-reviewed patient-level genotype–phenotype associations in genetic diseases and provides a one-stop service for researchers and clinicians to interpret the pathogenicity of genetic variants. Moreover, most existing genotype and phenotype databases do not offer an analysis service, let alone a free one. In contrast, GPCards features a free analysis service that allows users to easily complete a preliminary analysis of their genotype data, with genes annotated and prioritised using gene-level information associated with phenotypes. This free analysis service is expected to provide considerable convenience to users and to advance genotype and phenotype data analysis. The download and upload sections are further highlights of GPCards.
Based on the principle of maximum openness, users can upload anonymized genotype–phenotype information (anonymization being necessary to protect patients' data) and can also download the data collected by GPCards for re-analysis and re-mining. GPCards thus provides a platform for researchers to jointly advance research on genotype–phenotype correlations. As medical genetics develops, GPCards will provide more accurate and comprehensive information on genotype–phenotype correlations in more patients. The current version of GPCards is a first attempt to decipher genotype–phenotype correlations and is expected to attract wide attention from researchers and clinicians. However, the present study has some limitations. First, a large amount of genotype–phenotype correlation data has been reported in thousands of publications, but the formats and standards of this information differ widely. We did our best to search genetic studies of each human gene, but clinical phenotypic features were not available for most studies, so some genetic variants are missing from GPCards. We suggest that phenotypic data and the corresponding variant data be recorded in as much detail as possible in future publications. Although continuously updating the database is costly in terms of both money and time, we firmly believe in its potential utility, so we will continue to update it semi-annually. We also encourage users to upload anonymized genotype–phenotype data to GPCards.
Second, genotype–phenotype correlations in GPCards can be used to assist diagnosis but not as diagnostic criteria, for three reasons: (i) variants reported as pathogenic may later be identified as non-pathogenic, as previously reported [80]; (ii) some reported pathogenic variants have not been functionally validated in cell or animal experiments; and (iii) many genes and variants may show incomplete penetrance. In conclusion, GPCards offers extensive information about patient-level genotype–phenotype correlations through a user-friendly, open-access interface that requires no registration. GPCards also provides comprehensive gene- and variant-level annotations to facilitate interpretation of the pathogenicity of genetic variants. We expect GPCards to help prioritise novel candidate genes, support genetic counselling and diagnosis, and inform the development of appropriate treatment strategies.
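As an illustration of the re-analysis that downloadable patient-level data permit, the sketch below computes, for each gene, the fraction of catalogued patients who show each phenotype. It is a minimal sketch assuming a hypothetical record layout of (patient ID, gene, set of phenotype labels); the actual GPCards download format may differ, and `phenotype_prevalence` is an illustrative name.

```python
from collections import defaultdict


def phenotype_prevalence(records):
    """Per-gene phenotype prevalence from patient-level records.

    records: iterable of (patient_id, gene, phenotypes) tuples, where
    phenotypes is a set of phenotype labels (hypothetical layout).
    Returns {gene: {phenotype: fraction of that gene's patients with it}}.
    """
    patients = defaultdict(set)                       # gene -> patient ids
    carriers = defaultdict(lambda: defaultdict(set))  # gene -> phenotype -> ids
    for pid, gene, phenotypes in records:
        patients[gene].add(pid)  # sets ensure repeated records count once
        for ph in phenotypes:
            carriers[gene][ph].add(pid)
    # Every gene with at least one patient appears, even with no phenotypes.
    return {
        gene: {ph: len(ids) / len(pids) for ph, ids in carriers[gene].items()}
        for gene, pids in patients.items()
    }


if __name__ == "__main__":
    toy = [  # toy records, not GPCards data
        ("P1", "GENE_A", {"seizures"}),
        ("P2", "GENE_A", {"seizures", "developmental delay"}),
        ("P3", "GENE_B", set()),
    ]
    print(phenotype_prevalence(toy))
```

Counting distinct patients rather than records keeps a patient described in several publications from inflating the prevalence.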

Availability of data and materials

The datasets analyzed during the current study are available in the Gene4PD repository (http://genemed.tech/gene4pd/).

Contribution

BL, GZ, and JL were involved in study conception and design; WZ, QC, KL, XW, YW, QZ, YH, BLu, YZ, RZ, LJ, HP, TL, YZ, ZF, XX, XZ, RW, LZ, YW, ZY, LX, JG, BT, and KX collected the data. GZ and QZ built the online platform. BL, GZ, and JL wrote the manuscript. All authors contributed to the preparation of the manuscript and read and approved the final manuscript.

Funding

This work was supported by grants 81801133 (to JCL) and 82001362 (to BL); the Young Elite Scientist Sponsorship Program by CAST (2018QNRC001 to JCL); grant 20180033040004 (to JCL); the Natural Science Foundation for Young Scientists of Hunan Province, China (2019JJ50974 to GHZ); the Hunan Natural Science Foundation Outstanding Youth Fund (2020JJ3059 to JCL); and the Changsha Municipal Natural Science Foundation (kq2014278 to BL).

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (79 in total)

1.  Predicting the functional effect of amino acid substitutions and indels.

Authors:  Yongwook Choi; Gregory E Sims; Sean Murphy; Jason R Miller; Agnes P Chan
Journal:  PLoS One       Date:  2012-10-08       Impact factor: 3.240

2.  An integrative approach to predicting the functional effects of non-coding and coding sequence variation.

Authors:  Hashem A Shihab; Mark F Rogers; Julian Gough; Matthew Mort; David N Cooper; Ian N M Day; Tom R Gaunt; Colin Campbell
Journal:  Bioinformatics       Date:  2015-01-11       Impact factor: 6.937

3.  DECIPHER: database for the interpretation of phenotype-linked plausibly pathogenic sequence and copy-number variation.

Authors:  Eugene Bragin; Eleni A Chatzimichali; Caroline F Wright; Matthew E Hurles; Helen V Firth; A Paul Bevan; G Jawahar Swaminathan
Journal:  Nucleic Acids Res       Date:  2013-10-22       Impact factor: 16.971

4.  Analysis of protein-coding genetic variation in 60,706 humans.

Authors:  Monkol Lek; Konrad J Karczewski; Eric V Minikel; Kaitlin E Samocha; Eric Banks; Timothy Fennell; Anne H O'Donnell-Luria; James S Ware; Andrew J Hill; Beryl B Cummings; Taru Tukiainen; Daniel P Birnbaum; Jack A Kosmicki; Laramie E Duncan; Karol Estrada; Fengmei Zhao; James Zou; Emma Pierce-Hoffman; Joanne Berghout; David N Cooper; Nicole Deflaux; Mark DePristo; Ron Do; Jason Flannick; Menachem Fromer; Laura Gauthier; Jackie Goldstein; Namrata Gupta; Daniel Howrigan; Adam Kiezun; Mitja I Kurki; Ami Levy Moonshine; Pradeep Natarajan; Lorena Orozco; Gina M Peloso; Ryan Poplin; Manuel A Rivas; Valentin Ruano-Rubio; Samuel A Rose; Douglas M Ruderfer; Khalid Shakir; Peter D Stenson; Christine Stevens; Brett P Thomas; Grace Tiao; Maria T Tusie-Luna; Ben Weisburd; Hong-Hee Won; Dongmei Yu; David M Altshuler; Diego Ardissino; Michael Boehnke; John Danesh; Stacey Donnelly; Roberto Elosua; Jose C Florez; Stacey B Gabriel; Gad Getz; Stephen J Glatt; Christina M Hultman; Sekar Kathiresan; Markku Laakso; Steven McCarroll; Mark I McCarthy; Dermot McGovern; Ruth McPherson; Benjamin M Neale; Aarno Palotie; Shaun M Purcell; Danish Saleheen; Jeremiah M Scharf; Pamela Sklar; Patrick F Sullivan; Jaakko Tuomilehto; Ming T Tsuang; Hugh C Watkins; James G Wilson; Mark J Daly; Daniel G MacArthur
Journal:  Nature       Date:  2016-08-18       Impact factor: 49.962

5.  PhenoScanner V2: an expanded tool for searching human genotype-phenotype associations.

Authors:  Mihir A Kamat; James A Blackshaw; Robin Young; Praveen Surendran; Stephen Burgess; John Danesh; Adam S Butterworth; James R Staley
Journal:  Bioinformatics       Date:  2019-11-01       Impact factor: 6.937

6.  Disruptive CHD8 mutations define a subtype of autism early in development.

Authors:  Raphael Bernier; Christelle Golzio; Bo Xiong; Holly A Stessman; Bradley P Coe; Osnat Penn; Kali Witherspoon; Jennifer Gerdts; Carl Baker; Anneke T Vulto-van Silfhout; Janneke H Schuurs-Hoeijmakers; Marco Fichera; Paolo Bosco; Serafino Buono; Antonino Alberti; Pinella Failla; Hilde Peeters; Jean Steyaert; Lisenka E L M Vissers; Ludmila Francescatto; Heather C Mefford; Jill A Rosenfeld; Trygve Bakken; Brian J O'Roak; Matthew Pawlus; Randall Moon; Jay Shendure; David G Amaral; Ed Lein; Julia Rankin; Corrado Romano; Bert B A de Vries; Nicholas Katsanis; Evan E Eichler
Journal:  Cell       Date:  2014-07-03       Impact factor: 41.582

7.  Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models.

Authors:  Hashem A Shihab; Julian Gough; David N Cooper; Peter D Stenson; Gary L A Barker; Keith J Edwards; Ian N M Day; Tom R Gaunt
Journal:  Hum Mutat       Date:  2012-11-02       Impact factor: 4.878

8.  Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants.

Authors:  Wenqing Fu; Timothy D O'Connor; Goo Jun; Hyun Min Kang; Goncalo Abecasis; Suzanne M Leal; Stacey Gabriel; Mark J Rieder; David Altshuler; Jay Shendure; Deborah A Nickerson; Michael J Bamshad; Joshua M Akey
Journal:  Nature       Date:  2012-11-28       Impact factor: 49.962

9.  Transcriptional landscape of the prenatal human brain.

Authors:  Jeremy A Miller; Song-Lin Ding; Susan M Sunkin; Kimberly A Smith; Lydia Ng; Aaron Szafer; Amanda Ebbert; Zackery L Riley; Joshua J Royall; Kaylynn Aiona; James M Arnold; Crissa Bennet; Darren Bertagnolli; Krissy Brouner; Stephanie Butler; Shiella Caldejon; Anita Carey; Christine Cuhaciyan; Rachel A Dalley; Nick Dee; Tim A Dolbeare; Benjamin A C Facer; David Feng; Tim P Fliss; Garrett Gee; Jeff Goldy; Lindsey Gourley; Benjamin W Gregor; Guangyu Gu; Robert E Howard; Jayson M Jochim; Chihchau L Kuan; Christopher Lau; Chang-Kyu Lee; Felix Lee; Tracy A Lemon; Phil Lesnar; Bergen McMurray; Naveed Mastan; Nerick Mosqueda; Theresa Naluai-Cecchini; Nhan-Kiet Ngo; Julie Nyhus; Aaron Oldre; Eric Olson; Jody Parente; Patrick D Parker; Sheana E Parry; Allison Stevens; Mihovil Pletikos; Melissa Reding; Kate Roll; David Sandman; Melaine Sarreal; Sheila Shapouri; Nadiya V Shapovalova; Elaine H Shen; Nathan Sjoquist; Clifford R Slaughterbeck; Michael Smith; Andy J Sodt; Derric Williams; Lilla Zöllei; Bruce Fischl; Mark B Gerstein; Daniel H Geschwind; Ian A Glass; Michael J Hawrylycz; Robert F Hevner; Hao Huang; Allan R Jones; James A Knowles; Pat Levitt; John W Phillips; Nenad Sestan; Paul Wohnoutka; Chinh Dang; Amy Bernard; John G Hohmann; Ed S Lein
Journal:  Nature       Date:  2014-04-02       Impact factor: 49.962

10.  The RD-Connect Registry & Biobank Finder: a tool for sharing aggregated data and metadata among rare disease researchers.

Authors:  Sabina Gainotti; Paola Torreri; Chiuhui Mary Wang; Robert Reihs; Heimo Mueller; Emma Heslop; Marco Roos; Dorota Mazena Badowska; Federico de Paulis; Yllka Kodra; Claudio Carta; Estrella Lopez Martìn; Vanessa Rangel Miller; Mirella Filocamo; Marina Mora; Mark Thompson; Yaffa Rubinstein; Manuel Posada de la Paz; Lucia Monaco; Hanns Lochmüller; Domenica Taruscio
Journal:  Eur J Hum Genet       Date:  2018-02-02       Impact factor: 4.246
