Michal Krassowski, Diogo Pellegrina, Miles W Mee, Amelie Fradet-Turcotte, Mamatha Bhat, Jüri Reimand.
Abstract
Deciphering the functional impact of genetic variation is required to understand phenotypic diversity and the molecular mechanisms of inherited disease and cancer. While millions of genetic variants are now mapped in genome sequencing projects, distinguishing functional variants remains a major challenge. Protein-coding variation can be interpreted using post-translational modification (PTM) sites that are core components of cellular signaling networks controlling molecular processes and pathways. ActiveDriverDB is an interactive proteo-genomics database that uses more than 260,000 experimentally detected PTM sites to predict the functional impact of genetic variation in disease, cancer and the human population. Using machine learning tools, we prioritize proteins and pathways with enriched PTM-specific amino acid substitutions that potentially rewire signaling networks via induced or disrupted short linear motifs of kinase binding. We then map these effects to site-specific protein interaction networks and drug targets. In the 2021 update, we increased the PTM datasets by nearly 50%, included glycosylation, sumoylation and succinylation as new types of PTMs, and updated the workflows to interpret inherited disease mutations. We added a recent phosphoproteomics dataset reflecting the cellular response to SARS-CoV-2 to predict the impact of human genetic variation on COVID-19 infection and disease course. Overall, we estimate that 16-21% of known amino acid substitutions affect PTM sites among pathogenic disease mutations, somatic mutations in cancer genomes and germline variants in the human population. These data underline the potential of interpreting genetic variation through the lens of PTMs and signaling networks. The open-source database is freely available at www.ActiveDriverDB.org.
Keywords: cancer drivers; cell signaling; databases; disease genes; genome variation; post-translational modifications (PTM); protein interaction networks
Year: 2021 PMID: 33834021 PMCID: PMC8021862 DOI: 10.3389/fcell.2021.626821
Source DB: PubMed Journal: Front Cell Dev Biol ISSN: 2296-634X
FIGURE 1. Outline of ActiveDriverDB. ActiveDriverDB is an interactive proteo-genomics database for interpreting human genetic variation using post-translational modification (PTM) sites. (A,B) The database integrates PTM sites from experimental studies collected from proteomics databases with amino acid substitutions from genome sequencing projects and curated databases of disease mutations. (C) In the Sequence View, substitutions in PTM sites are classified based on their functional impact as direct (at a PTM residue), proximal or distal (within 1–2 or 3–7 positions of a PTM residue), or network-rewiring. (D) Network-rewiring substitutions at PTM sites are predicted to disrupt short linear motifs or create new motifs bound by kinases and other enzymes. (E) In the Network View, proteins and PTM sites are visualized with their interactions with PTM enzymes (e.g., kinases) and the known drugs targeting the enzymes. (F) The database also provides prioritized lists of genes and pathways, comprehensive data visualizations and an application programming interface (API) for analysing custom variant datasets using computational pipelines.
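The distance-based classes in panel C can be illustrated with a minimal Python sketch. It assumes 1-based residue positions within a single protein isoform and uses only the thresholds stated in the caption (0 for direct, 1–2 for proximal, 3–7 for distal). The function name `classify_substitution` is hypothetical and is not part of the ActiveDriverDB code base. The network-rewiring class depends on motif predictions and is not covered here.

```python
from typing import Iterable, Optional


def classify_substitution(position: int, ptm_positions: Iterable[int]) -> Optional[str]:
    """Classify a substitution by its distance to the closest PTM residue.

    Hypothetical helper: thresholds follow the figure caption only
    (direct = 0, proximal = 1-2, distal = 3-7 positions away).
    Returns None when no PTM residue lies within 7 positions.
    """
    distances = [abs(position - p) for p in ptm_positions]
    if not distances:
        return None  # protein has no annotated PTM residues
    nearest = min(distances)
    if nearest == 0:
        return "direct"
    if nearest <= 2:
        return "proximal"
    if nearest <= 7:
        return "distal"
    return None  # outside every PTM site window


# Toy example with made-up positions: PTM residues at 15 and 40.
for pos in (15, 17, 22, 30):
    print(pos, classify_substitution(pos, [15, 40]))
```

Taking the minimum distance mirrors the "closest modified residue" convention used in Figure 3A, so a substitution lying within the flanks of two PTM sites is assigned the class of the nearer one.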
FIGURE 2. PTM sites and mutations in ActiveDriverDB. (A) Summary of genetic variants (i.e., amino acid substitutions) affecting PTM sites in the database. Eight types of PTM sites are shown as horizontal stacked bar plots (left to right) with five genome variation datasets (top to bottom): inherited disease mutations (*ClinVar: only pathogenic and likely pathogenic variants), somatic cancer mutations (TCGA, PCAWG) and human population variation (1000 Genomes, ESP6500). Colors indicate the predicted impact of substitutions on PTM sites. Total numbers of unique PTM-associated substitutions in consensus protein isoforms are shown. (B) Bar plot shows counts of PTM sites and related substitutions in ActiveDriverDB. The current and previous versions of the database are compared. (C) Allele frequency of substitutions in the human population (1000 Genomes) affecting the phosphosites modulated by SARS-CoV-2 infection in Vero E6 cells. Population cohorts are shown in colors (AFR, African; AMR, Admixed American; EAS, East Asian; EUR, European; SAS, South Asian). (D) Top genes with PTM-related substitutions in all PTM sites in inherited disease and cancer, genes with glycosylation- and sumoylation-associated substitutions, and top genes in the human population with SARS-CoV-2-specific phosphosites affected by substitutions. Colors indicate the predicted impact of substitutions on PTM sites. Genes were prioritized using ActiveDriver (FDR < 0.05), except for the rightmost group where unique substitution counts were used.
FIGURE 3. Putative impact of adjacent and distal PTM-flanking residues on kinase binding motifs. (A) Histogram of substitutions in PTM sites relative to the distance to the closest modified residues. (B) Enrichments of amino acids in the 125 kinase binding site models of position weight matrices (PWMs). Each point represents a position in the consensus binding sequence (short linear motif) of a specific kinase. For each flanking position in the motif (X-axis), the amino acid with the highest enrichment relative to its proteome-wide distribution is shown on the Y-axis, indicating the potential impact of substitutions at these positions. Kinases with amino acids showing at least eight-fold enrichment at the furthest flanking positions (6th, 7th) are labelled. (C) Examples of kinases with enrichments at the 6th and 7th flanking positions of PTM sites. PWM logos show the prevalence of specific amino acids (Y-axis) at the flanking positions (X-axis). Asterisks show the furthest flanking positions from panel A.
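The enrichment calculation described for panel B can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline. A PWM is assumed to be a mapping from flanking offset (negative for N-terminal, positive for C-terminal) to per-amino-acid frequencies, and enrichment is assumed to be the simple ratio of PWM frequency to proteome-wide background frequency. The function names and all numbers in the toy example are invented for illustration.

```python
from typing import Dict, Tuple

Column = Dict[str, float]  # amino acid -> frequency at one motif position


def top_enrichment(column: Column, background: Column) -> Tuple[str, float]:
    """Return the amino acid with the highest frequency ratio vs. background.

    Assumes every amino acid in `column` has a positive background frequency.
    """
    return max(
        ((aa, freq / background[aa]) for aa, freq in column.items()),
        key=lambda pair: pair[1],
    )


def far_flank_hits(
    pwm: Dict[int, Column],
    background: Column,
    threshold: float = 8.0,        # "at least eight-fold" from the caption
    far_offsets: Tuple[int, ...] = (6, 7),  # furthest flanking positions
) -> Dict[int, Tuple[str, float]]:
    """Collect far-flank positions whose top amino acid meets the threshold."""
    hits = {}
    for offset, column in pwm.items():
        if abs(offset) in far_offsets:
            aa, ratio = top_enrichment(column, background)
            if ratio >= threshold:
                hits[offset] = (aa, ratio)
    return hits


# Toy data (made up): three amino acids, three motif positions.
background = {"R": 0.05, "L": 0.10, "S": 0.08}
toy_pwm = {
    -7: {"R": 0.50, "L": 0.10, "S": 0.08},  # R is about ten-fold enriched
    -3: {"R": 0.90, "L": 0.05, "S": 0.05},  # strong, but not a far flank
    6: {"R": 0.05, "L": 0.20, "S": 0.08},   # L is only about two-fold enriched
}
print(far_flank_hits(toy_pwm, background))
```

In this toy motif only offset -7 is reported: offset -3 is excluded because it is not one of the furthest flanking positions, and offset 6 falls below the eight-fold threshold.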