Ivan Merelli, Andrea Calabria, Paolo Cozzi, Federica Viti, Ettore Mosca, Luciano Milanesi.
Abstract
BACKGROUND: Correlating specific genotypes with human diseases remains a complex problem, despite the advantages brought by high-throughput technologies such as Genome Wide Association Studies (GWAS). New tools for interpreting genetic variants and for prioritizing Single Nucleotide Polymorphisms (SNPs) are needed. Given a list of the most relevant SNPs statistically associated with a specific pathology as the result of a genotype study, a critical issue is identifying the genes that are effectively related to the disease by re-scoring the importance of the identified genetic variations. Conversely, given a list of genes, it can be of great importance to predict which SNPs may be involved in the onset of a particular disease, in order to focus research on their effects.
Year: 2013 PMID: 23369106 PMCID: PMC3548692 DOI: 10.1186/1471-2105-14-S1-S9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. General schema of the SNPranker 2.0 pipeline. The SNPranker 2.0 reference database collects data from various public data sources using a gene-centric design. This infrastructure represents the core of the SNPranker 2.0 pipeline, which starts with a list of genes (or a biological process or a list of SNPs) as input. If required, the ontological expansion retrieves all genes related to the input ones according to a user-defined similarity measure and threshold. SNPranker 2.0 performs the score computation using the selected features and their corresponding weights, and a final table of SNPs is returned to users with a prioritization score for each SNP.
Figure 2. Fitness optimization trend of the genetic algorithm. The figure shows that the optimization algorithm is able to minimize the fitness value within the limit of 100 generations.
Figure 3. SNPranker 2.0 home page screenshot. A screenshot of the SNPranker 2.0 home page, with feature sections collapsed and expanded: the protein section gives an example of the available features with their relative weights. At the bottom, in the scoring function section, an information balloon is opened.
Default feature weights resulting from the optimization process.
| Section | Feature Name | Feature Weight |
|---|---|---|
| SNPs and Genes | MAF | 0.3133 |
| | Localization | 0.7052 |
| | Essential Genes | 0.2665 |
| | Phylo | 0.5797 |
| | Lamina associated domains | 0.2444 |
| Epigenetics and transcription regulations | Open Chromatin | 0.1596 |
| | Chromatin Structure | 0.7525 |
| | Methylation (seq regions) | 0.4009 |
| | Methylation | 0.3743 |
| | CpG Island | 0.8992 |
| | DNase clusters | 0.9558 |
| | TSS (eponine) | 0.3705 |
| | CpG islands, promoters, first exons | 0.9665 |
| | FOX2 CLIP-seq | 0.5608 |
| | TAF1 binding sites | 0.2468 |
| | Intergenic regulatory elements | 0.4006 |
| | TSS (SwitchGear) | 0.6773 |
| | Regulatory regions (OregAnno) | 0.8818 |
| | TFBS (TRANSFAC) | 0.8243 |
| | TXN factor ChIP-Seq | 0.4477 |
| | Enhancers (VISTA) | 0.9571 |
| Translation regulations | Alternative Splicing | 0.7032 |
| | miRNA binding regions | 0.8358 |
| Proteins | Hub protein | 0.5796 |
| | Protein Domain | 0.6316 |
| | PolyPhen | 0.5678 |
| | SNPs 3D | 0.5977 |
| | LS-SNP | 0.3158 |
| | Protein Interactions | 0.3728 |
| | PTM | 0.5399 |
| Disease | Pathologies OMIM | 0.2904 |
The table presents the SNP features used by SNPranker 2.0 with their default weights, according to the optimization performed using the genetic algorithm (sensitivity = 0.814, specificity = 0.761 and accuracy = 0.761).
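The weighted scoring idea can be illustrated with a minimal sketch. The exact scoring function of SNPranker 2.0 is not given here, so the form below (a weighted average of feature values in the range 0 to 1) is an assumption, as are the function names `prioritization_score` and `rank_snps` and the SNP identifiers `snp_a` and `snp_b`. Only the weights are taken from the table above, and only a subset of them is used.

```python
from typing import Dict, List, Tuple

# Subset of the default weights, copied from the table above.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "MAF": 0.3133,
    "Localization": 0.7052,
    "CpG Island": 0.8992,
    "TFBS (TRANSFAC)": 0.8243,
    "PolyPhen": 0.5678,
}


def prioritization_score(
    features: Dict[str, float],
    weights: Dict[str, float] = DEFAULT_WEIGHTS,
) -> float:
    """Weighted average of feature values (each assumed to lie in [0, 1]).

    ASSUMPTION: this normalized weighted sum is an illustrative stand-in,
    not the published SNPranker 2.0 scoring function.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    # Features missing from the input contribute zero to the score.
    weighted = sum(w * features.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


def rank_snps(
    snps: Dict[str, Dict[str, float]],
    weights: Dict[str, float] = DEFAULT_WEIGHTS,
) -> List[Tuple[str, float]]:
    """Return (SNP id, score) pairs sorted from highest to lowest score."""
    scored = [(snp_id, prioritization_score(f, weights)) for snp_id, f in snps.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical SNPs with hypothetical feature values.
example = {
    "snp_a": {"CpG Island": 1.0, "PolyPhen": 1.0},
    "snp_b": {"MAF": 1.0},
}
ranking = rank_snps(example)
```

In this sketch a SNP that hits two highly weighted features (`snp_a`) ranks above one that hits only a low-weight feature (`snp_b`), which is the behaviour the weights are meant to produce.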
Semantic similarity analysis of tested genes.
| OMIM Disorder | Input Gene | Known Associated Gene | Similarity score |
|---|---|---|---|
| B-Cell CLL/Lymphoma 2 | BCL2 | CDKN2A | 0.389 |
| | | MYC | 0.305 |
| | | TP53 | 0.434 |
| | | BRCA1 | 0.382 |
| | | BRCA2 | 0.329 |
| | | CCND1 | 0.247 |
| | | ATM | 0.370 |
| Bipolar disorder | KLF12 | RORA | 0.636 |
| | | RORB | 0.759 |
| | | ARNTL | 0.636 |
| | | HTR2A | 0.301 |
For two disorder scenarios, the table shows the similarities, computed with the Wang semantic similarity measure, between genes that are known to be associated with each pathology and the ontologically enriched output gene lists.
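The threshold step of the ontological expansion can be sketched as a simple filter over precomputed similarity scores. The scores below are the KLF12 values from the table above. The function name `expand_gene_list` and the threshold of 0.6 are hypothetical choices for illustration, and the Wang similarity computation itself is not implemented here.

```python
from typing import Dict, List

# Similarity scores to the input gene KLF12, copied from the table above.
KLF12_SIMILARITY: Dict[str, float] = {
    "RORA": 0.636,
    "RORB": 0.759,
    "ARNTL": 0.636,
    "HTR2A": 0.301,
}


def expand_gene_list(similarity: Dict[str, float], threshold: float) -> List[str]:
    """Return the genes whose similarity meets the threshold, best first.

    Ties are broken alphabetically so the output order is deterministic.
    """
    kept = [gene for gene, score in similarity.items() if score >= threshold]
    return sorted(kept, key=lambda gene: (-similarity[gene], gene))


# The threshold value 0.6 is an arbitrary example, not a documented default.
expanded = expand_gene_list(KLF12_SIMILARITY, threshold=0.6)
print(expanded)  # ['RORB', 'ARNTL', 'RORA']
```

With this example threshold, HTR2A (0.301) is excluded from the expansion, while a lower threshold such as 0.3 would retain all four genes.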
SNPranker results comparison with a GWAS for Bipolar Disorder.
| Gene | SNP ID | Chr | Position (start) | Position (end) | Strand | Alleles | Function |
|---|---|---|---|---|---|---|---|
| ARNTL | rs900145 | 11 | 13250480 | 13250481 | - | A/G | unknown (intergenic) |
| HTR2A | rs1575891 | 13 | 47096716 | 47096717 | + | C/T | unknown (intergenic) |
| KLF12 | rs9543325 | 13 | 72814628 | 72814629 | + | C/T | unknown (intergenic) |
| KLF12 | rs1886512 | 13 | 73418186 | 73418187 | + | A/T | intron |
| RORA | rs3743266 | 15 | 58568804 | 58568805 | - | A/G | unknown (UTR-3) |
| RORA | rs340029 | 15 | 58682256 | 58682257 | + | C/T | intron |
| RORA | rs3784609 | 15 | 58697841 | 58697842 | - | A/G | intron |
| RORA | rs11071559 | 15 | 58857279 | 58857280 | + | C/T | intron |
| RORA | rs12912233 | 15 | 59054387 | 59054388 | + | C/T | intron |
| RORA | rs809736 | 15 | 59117079 | 59117080 | + | A/G | intron |
Starting from the best results of the GWAS on bipolar disorder [53], the table lists the SNPs that were correctly predicted in our final SNP table.