Oriol Fornes¹, Marius Gheorghe², Phillip A Richmond¹, David J Arenillas¹, Wyeth W Wasserman¹, Anthony Mathelier²,³
Abstract
Interpreting the functional impact of noncoding variants is an ongoing challenge in genome analysis. Because most noncoding variants associated with complex traits and disease reside in regulatory regions, altered transcription factor (TF) binding has been proposed as a mechanism of action. It is therefore imperative to develop methods that predict the impact of noncoding variants at TF binding sites (TFBSs). Here, we describe the update of our MANTA database, which stores: 1) TFBS predictions in the human genome, and 2) the potential impact on TF binding of all possible single nucleotide variants (SNVs) at these TFBSs. TFBSs were predicted by combining experimental ChIP-seq data from ReMap with computational position weight matrices (PWMs) derived from JASPAR. The impact of SNVs at these TFBSs was assessed by means of PWM scores computed on the alternate alleles. The updated database, MANTA2, provides the scientific community with a critical map of TFBSs and SNV impact scores to improve the interpretation of noncoding variants in the human genome.
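As a rough illustration of the first step (combining ReMap ChIP-seq regions with JASPAR PWM predictions), the sketch below keeps a PWM-predicted site only if it lies entirely within a ChIP-seq region profiled for the same TF. The `Interval` type, the half-open BED-style coordinates, the full-containment rule and the helper name `supported_tfbss` are assumptions for illustration, not MANTA2's published pipeline.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass
from itertools import accumulate


@dataclass(frozen=True)
class Interval:
    """BED-style interval: 0-based start, exclusive end (assumed convention)."""
    chrom: str
    start: int
    end: int
    tf: str  # TF profiled by ChIP-seq, or TF whose PWM predicted the site


def supported_tfbss(chip_regions, pwm_hits):
    """Hypothetical helper: keep PWM hits fully inside a same-TF ChIP-seq region."""
    # Group ChIP-seq regions by (chromosome, TF).
    index = defaultdict(list)
    for region in chip_regions:
        index[(region.chrom, region.tf)].append(region)

    lookup = {}
    for key, regions in index.items():
        regions.sort(key=lambda r: r.start)
        starts = [r.start for r in regions]
        # Running maximum of end coordinates over regions sorted by start,
        # so overlapping or nested regions are handled correctly.
        max_ends = list(accumulate((r.end for r in regions), max))
        lookup[key] = (starts, max_ends)

    kept = []
    for hit in pwm_hits:
        entry = lookup.get((hit.chrom, hit.tf))
        if entry is None:
            continue
        starts, max_ends = entry
        # Number of regions starting at or before the hit's start.
        i = bisect_right(starts, hit.start)
        # Contained if one of those regions extends to the hit's end.
        if i > 0 and max_ends[i - 1] >= hit.end:
            kept.append(hit)
    return kept
```

Indexing by TF as well as chromosome means a JUN prediction is not rescued by, say, a CTCF peak at the same locus; a real pipeline might instead use a tool such as BEDTools for the overlap step.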
Year: 2018 PMID: 30040077 PMCID: PMC6057437 DOI: 10.1038/sdata.2018.141
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
List of published tools with the capacity to evaluate the impact of noncoding variants.
For each “Method”, we describe its “Intended use”, “Algorithmic approach”, underlying “Genomic features” and PubMed ID (“PMID”) of the corresponding publication.

| Method | Intended use | Algorithmic approach | Genomic features | PMID |
|---|---|---|---|---|
| CADD | pathogenicity | support vector machine | conservation, epigenomic annotations | |
| CpGenie | impact on methylation | deep neural network | conservation, epigenomic annotations, TFBS alterations | |
| DANN | pathogenicity | deep neural network | conservation, epigenomic annotations | |
| DeepSEA | regulatory potential | deep neural network, logistic regression classifier | conservation, epigenomic annotations, TFBS alterations | |
| deltaSVM | regulatory potential | support vector machine | epigenomic annotations, TFBS alterations | |
| Eigen | pathogenicity | spectral clustering | conservation, epigenomic annotations | |
| FATHMM | pathogenicity | hidden Markov model | conservation, epigenomic annotations | |
| fitCons | fitness consequence | generative probability, genome partitioning | conservation, epigenomic annotations | |
| FunSeq2 | cancer pathogenicity | feature-based scoring, PWM scoring, somatic hotspots | conservation, epigenomic annotations, TFBS alterations | |
| GWAVA | pathogenicity | random forest | conservation, epigenomic annotations, TFBS alterations | |
| LINSIGHT | regulatory potential | linear regression, generative probability | conservation, epigenomic annotations, TFBS alterations | |
| MANTA | regulatory potential | PWM scoring | TFBS alterations | |
| RegulomeDB | regulatory potential | feature-based scoring, PWM scoring | conservation, epigenomic annotations, TFBS alterations | |
| ReMM | pathogenicity | random forest | conservation, epigenomic annotations | |
| RVSP | regulatory potential | random forest | conservation, epigenomic annotations | |
| SNP2TFBS | regulatory potential | PWM scoring | TFBS alterations | |
Figure 1. Overview of MANTA2.
a) Intersection of the ReMap ChIP-seq regions with JASPAR TFBS predictions, producing a set of TFBSs supported by both experimental and computational evidence of TF binding. A mock example for JUN is shown for a region on chromosome 1. b) Matrix of the difference in PWM score between each possible SNV and the reference sequence at that TFBS, showing negative impact (−), positive impact (+), or no change (0) in score. Black boxes mark the nucleotides of the reference TFBS sequence, which are not stored in the database. The sequence logo for JUN is shown below the matrix; the size of each nucleotide letter is proportional to its information content. c) Mock distribution of TFBS SNV impact scores when considering all possible SNVs in the TFBS, annotated with examples of decreased (red), unchanged (yellow), and increased (green) TF binding capacity.
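The impact matrix of panel b and the three classes of panel c can be sketched as follows, assuming a log2-odds PWM built from a position frequency matrix with a pseudocount and a uniform background, and taking the impact score as the alternate-allele PWM score minus the reference score. The toy counts, the pseudocount of 0.25, the zero tolerance and the helper names (`log_odds_pwm`, `snv_impact_matrix`, `classify`) are hypothetical; MANTA2's exact score scaling may differ.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.25, background=0.25):
    """Convert per-position base counts (list of {base: count}) to log2-odds weights."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def pwm_score(pwm, seq):
    """Sum the per-position weights of a sequence as long as the PWM."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match PWM length")
    return sum(column[base] for column, base in zip(pwm, seq))


def snv_impact_matrix(pwm, ref_seq):
    """Score change (alt minus ref) for every possible SNV in the site.

    Returns {(position, alt_base): delta}. Reference bases are skipped,
    like the black (unstored) cells of Figure 1b.
    """
    ref_score = pwm_score(pwm, ref_seq)
    impacts = {}
    for pos, ref_base in enumerate(ref_seq):
        for alt in BASES:
            if alt == ref_base:
                continue
            alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
            impacts[(pos, alt)] = pwm_score(pwm, alt_seq) - ref_score
    return impacts


def classify(delta, tol=1e-9):
    """Label an impact score with the classes of Figure 1c (tolerance assumed)."""
    if delta < -tol:
        return "decreased"
    if delta > tol:
        return "increased"
    return "no change"


if __name__ == "__main__":
    # Hypothetical 3-bp count matrix, not a real JASPAR profile.
    toy_counts = [
        {"A": 10, "C": 0, "G": 0, "T": 0},
        {"A": 5, "C": 5, "G": 0, "T": 0},
        {"A": 0, "C": 0, "G": 0, "T": 10},
    ]
    pwm = log_odds_pwm(toy_counts)
    for (pos, alt), delta in sorted(snv_impact_matrix(pwm, "AAT").items()):
        print(pos, alt, round(delta, 2), classify(delta))
```

A site of length L yields 3L entries, one per possible SNV, which is why storing only the non-reference cells suffices.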
Figure 2. Assessing MANTA2 impact scores with heterozygous TF-binding events.
a) Allelic imbalance is calculated as the number of ChIP-seq reads mapped to the alternate allele divided by the total number of reads mapped at the heterozygous site. b) MANTA2 impact scores correlate with the allelic imbalance of ChIP-seq reads. Each event (blue dot) is plotted by its allelic imbalance of ChIP-seq reads (x-axis) and its MANTA2 impact score (y-axis). The Pearson correlation coefficient (R) and P-value (p) between allelic imbalance and impact score are shown in the plot.
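The two quantities in Figure 2 can be computed as in the sketch below. It assumes each heterozygous site reports only reference and alternate read counts, so the total is their sum; the helper names `allelic_imbalance` and `pearson_r` are hypothetical, and the per-event inputs would come from a real allele-specific ChIP-seq analysis.

```python
import math


def allelic_imbalance(alt_reads, ref_reads):
    """Fraction of ChIP-seq reads at a heterozygous site that carry the alternate allele."""
    total = alt_reads + ref_reads  # assumes only two alleles are observed
    if total == 0:
        raise ValueError("no reads mapped at this site")
    return alt_reads / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Given per-event read counts and MANTA2 impact scores, one would compute `pearson_r([allelic_imbalance(a, r) for a, r in read_counts], impact_scores)`. For the P-value as well, `scipy.stats.pearsonr(xs, ys)` returns the coefficient together with a two-sided P-value.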