Cigdem Sevim Bayrak, David Stein, Aayushee Jain, Kumardeep Chaudhary, Girish N Nadkarni, Tielman T Van Vleck, Anne Puel, Stephanie Boisson-Dupuis, Satoshi Okada, Peter D Stenson, David N Cooper, Avner Schlessinger, Yuval Itan.
Abstract
Identifying whether a given genetic mutation results in a gene product with increased (gain-of-function; GOF) or diminished (loss-of-function; LOF) activity is an important step toward understanding disease mechanisms because GOF and LOF variants may result in markedly different clinical phenotypes. Here, we generated an extensive database of documented germline GOF and LOF pathogenic variants by employing natural language processing (NLP) on the available abstracts in the Human Gene Mutation Database. We then investigated various gene- and protein-level features of GOF and LOF variants and applied machine learning and statistical analyses to identify discriminative features. We found that GOF variants were enriched in essential genes, for autosomal-dominant inheritance, and in protein binding and interaction domains, whereas LOF variants were enriched in singleton genes, for protein-truncating variants, and in protein core regions. We developed a user-friendly web-based interface that enables the extraction of selected subsets from the GOF/LOF database by a broad set of annotated features and downloading of up-to-date versions. These results improve our understanding of how variants affect gene/protein function and may ultimately guide future treatment options.
Keywords: database; feature importance; functional consequence; gain-of-function; genetic variants; loss-of-function; machine learning; natural language processing; online server
Year: 2021 PMID: 34762822 PMCID: PMC8715146 DOI: 10.1016/j.ajhg.2021.10.007
Source DB: PubMed Journal: Am J Hum Genet ISSN: 0002-9297 Impact factor: 11.043