Daniele Raimondi1, Francesco Codicè2, Gabriele Orlando3,4, Joost Schymkowitz3,4, Frederic Rousseau3,4, Yves Moreau1.
Abstract
Current human Single Amino acid Variant (SAV) databases link SAVs to their effect on the phenotype of the carrier individual, often dividing them into Deleterious and Neutral variants. This is a very coarse-grained description of the genotype-to-phenotype relationship because it relies on unrealistic assumptions, such as perfectly Mendelian behavior of each SAV, and considers only dichotomous phenotypes. Moreover, the link between the effect of a SAV on a protein (its molecular phenotype) and the individual phenotype is often very complex, because multiple levels of biological abstraction connect the protein-level and individual-level phenotypes. Here we present HPMPdb, a manually curated database of human SAVs, each associated with a detailed description of the molecular phenotype it causes on the affected protein. With particular regard to machine learning (ML), this database lets researchers go beyond the existing Deleterious/Neutral prediction paradigm and build molecular phenotype predictors instead. Our class labels succinctly describe the effect that each SAV has on 15 protein molecular phenotypes, such as protein-protein interaction, small-molecule binding, function, post-translational modifications (PTMs), sub-cellular localization, mimetic PTM, folding and protein expression. Moreover, we provide researchers with all the necessary means to reproducibly train and test their models on our database. The webserver and the data described in this paper are available at hpmp.esat.kuleuven.be.
Keywords: Bioinformatics; Database; Molecular phenotype; Single aminoacid variants; Variant-effect predictor
Year: 2022 PMID: 35669450 PMCID: PMC9166469 DOI: 10.1016/j.crstbi.2022.04.004
Source DB: PubMed Journal: Curr Res Struct Biol ISSN: 2665-928X
Fig. 1. The “Browse” page of the HPMPdb database, where the user can interactively search for specific SAVs, proteins, keywords and class labels, and use various filters to select subsets of the dataset (e.g. by collapsing columns).
Fig. 2. Number of occurrences of each class label in the dataset.
Fig. 3. For each class label, the proportion of SAVs annotated as reducing or eliminating the corresponding phenotype (label −1), leaving it unchanged (label 0), or causing a gain of function (label 1).
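The three-valued labelling described in the Fig. 3 caption can be illustrated with a short sketch that computes per-phenotype label proportions. It is only an illustration under stated assumptions: the record layout, the field names ("sav", "folding", "function"), the use of None for an unannotated phenotype, and the toy records themselves are hypothetical and are not taken from HPMPdb or its download format.

```python
from collections import Counter

# Label meanings as stated in the Fig. 3 caption.
LABEL_MEANING = {-1: "reduced or eliminated", 0: "unchanged", 1: "gain of function"}


def label_proportions(records, phenotype):
    """Return {label: fraction} for one phenotype field.

    Records where the phenotype is missing or None are treated as
    unannotated and skipped (an assumption of this sketch).
    """
    labels = [r.get(phenotype) for r in records if r.get(phenotype) is not None]
    total = len(labels)
    if total == 0:
        return {label: 0.0 for label in LABEL_MEANING}
    counts = Counter(labels)
    return {label: counts.get(label, 0) / total for label in LABEL_MEANING}


# Toy records, invented for illustration only (hypothetical field names).
toy_records = [
    {"sav": "P1:A10V", "folding": -1, "function": -1},
    {"sav": "P1:G25D", "folding": 0, "function": None},
    {"sav": "P2:R7K", "folding": -1, "function": 1},
    {"sav": "P3:L99P", "folding": 1, "function": 0},
]

folding = label_proportions(toy_records, "folding")
for label, fraction in folding.items():
    print(f"folding {label:+d} ({LABEL_MEANING[label]}): {fraction:.2f}")
```

The same function could be applied to each of the 15 phenotype columns in turn to produce a summary of the kind plotted in Fig. 3, once the real column names are substituted for the hypothetical ones used here.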