Graham RS Ritchie, Paul Flicek.
Abstract
Identifying sequence variants that play a mechanistic role in human disease and other phenotypes is a fundamental goal in human genetics and will be important in translating the results of variation studies. Experimental validation to confirm that a variant causes the biochemical changes responsible for a given disease or phenotype is considered the gold standard, but this cannot currently be applied to the 3 million or so variants expected in an individual genome. This has prompted the development of a wide variety of computational approaches that use several different sources of information to identify functional variation. Here, we review and assess the limitations of computational techniques for categorizing variants according to functional classes, prioritizing variants for experimental follow-up and generating hypotheses about the possible molecular mechanisms to inform downstream experiments. We discuss the main current bioinformatics approaches to identifying functional variation, including widely used algorithms for coding variation such as SIFT and PolyPhen and also novel techniques for interpreting variation across the genome.
Year: 2014 PMID: 25473426 PMCID: PMC4254438 DOI: 10.1186/s13073-014-0087-1
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
A summary of selected computational tools and their applications
| Tool | Description | Availability | URL |
|---|---|---|---|
| Ensembl Genome Browser | Manual variant annotation and genomic context | Web server, data also available via Perl and REST APIs | |
| UCSC Genome Browser | Manual variant annotation and genomic context | Web server, data also available for download using the UCSC table browser | http://www.genome.ucsc.edu |
| Bedtools | Automatic high performance feature overlap and proximity | Command line tool and Python interface | |
| Bedops | Automatic high performance feature overlap and proximity | Command line tool | |
| HaploReg | Web server identifying non-coding annotations for variants and haplotypes | Web server with pre-computed results for several GWAS | |
| Ensembl Variant Effect Predictor (VEP) | Wide support for variant annotation, emphasis on genic variants, but also incorporates regulatory elements and TF motifs from JASPAR | Downloadable software, web server, Perl and REST APIs, plugin system to add functionality | |
| ANNOVAR | Annotation of genic variants, can also identify overlaps with other annotated elements | Downloadable software | |
| VAT | Annotation of genic variants | Downloadable software | |
| SnpEff | Annotation of genic variants, companion tool SnpSift can filter results by annotations | Downloadable software | |
| RegulomeDB | Identifies overlaps with non-coding elements and applies heuristic rules to predict consequences | Web server | |
| JASPAR | Open access database of TF binding PWMs | Queryable interface and database downloads | |
| MEME suite | Several tools for handling PWMs | Web services and downloadable tools | |
| MOODS | Tool for aligning PWMs to sequences | Command line tool | http:// |
| Human Splicing Finder | Tool for computing the effects of mutations on splicing | Web server | http:// |
| GERP | Nucleotide resolution conservation scores | Downloadable software, pre-computed scores and elements for human and mouse genomes | |
| PHAST package | Suite of tools for phylogenetic analyses, including phastCons and phyloP | Downloadable software and R package | |
| SCONE | Position-specific conservation scores | Downloadable software | |
| SIFT | Predicts deleterious AASs based on conservation and physico-chemical principles | Downloadable software and web server | |
| FATHMM | Uses a hidden Markov model to identify AASs likely to be deleterious | Downloadable software and web server, VEP plugin | |
| PolyPhen | Predicts deleterious AASs based on several sequence and structural features | Downloadable software and web server, pre-computed predictions for all possible substitutions | |
| MutationTaster | Classifier which can predict deleterious variants in genic regions, including coding regions and splice sites | Web server | http:// |
| MutationAssessor | Predicts deleterious AASs based on evolutionary conservation | Web server, pre-computed scores for all possible substitutions | http:// |
| SNAP | Predicts deleterious AASs based on a range of protein level information | Downloadable software and web server | |
| PhD-SNP | Predicts deleterious AASs based on protein sequence information | Downloadable software and web server | |
| Condel | Tool that integrates predictions from multiple AAS prediction tools | Downloadable software and web server, VEP plugin | |
| CAROL | Tool that integrates scores from SIFT and PolyPhen using a weighted Z method | Downloadable R script, VEP plugin | http:// |
| GWAVA | Classifier identifying likely functional regulatory variants | Downloadable software and database of pre-computed scores and annotations for known variants, VEP plugin | http:// |
| CADD | Integrated classifier that can score all classes of variants | Web server, pre-computed scores for all possible SNVs, VEP plugin | |
| fgwas | Command line tool for incorporating functional information into a GWAS | Downloadable software | |
| SKAT | A test for association between a set of variants and dichotomous or quantitative phenotypes | Downloadable software | |
| VT | Tests for pooled association of multiple rare variants and phenotypes | Downloadable software | |
| VAAST | Probabilistic tool to identify causal genes and variants in disease | Downloadable software, free for academic use, license required for commercial usage | |
Abbreviations: AAS amino acid substitution, API application programming interface, GWAS genome-wide association studies, PWM position weight matrix, REST representational state transfer (an architecture style for designing networked applications), TF transcription factor, UCSC University of California Santa Cruz, VEP Variant Effect Predictor.
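The "feature overlap" task that the table attributes to Bedtools and Bedops can be illustrated with a minimal pure-Python sketch. It assumes 0-based, half-open coordinates (the BED convention); the `Feature` type, the `overlapping_features` helper and all coordinates are hypothetical and chosen only for illustration. It does not reflect how either tool is implemented.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def overlapping_features(chrom: str, pos: int, features: list[Feature]) -> list[str]:
    """Return the names of all features that cover a single 0-based position."""
    return [f.name for f in features if f.chrom == chrom and f.start <= pos < f.end]


# Hypothetical annotations, for illustration only.
FEATURES = [
    Feature("chr22", 100, 200, "exon_1"),
    Feature("chr22", 150, 400, "enhancer_A"),
    Feature("chr1", 100, 200, "exon_X"),
]

print(overlapping_features("chr22", 160, FEATURES))
```

A linear scan like this is only practical for small annotation sets; dedicated tools are typically used for genome-scale overlap queries.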
Figure 1. A set of annotation terms used to describe the potential effects of sequence variants according to the genic regions they fall in and their allele sequences. The terms are drawn from the Sequence Ontology and are depicted on the molecules they are predicted to affect. Variants categorized as any of the terms 2, 4, 9 and 10 are often collectively referred to as ‘loss-of-function’ variants, and are typically expected to severely affect gene function [25].
Figure 2. A sequence logo for the transcription factor CTCF derived from binding site predictions from Ensembl on human chromosome 22. The height of each letter represents the information content at that position. For example, if a particular nucleotide is always found at a given position, it has the maximal height and information content, whereas if all four nucleotides occur at equal frequencies, the position has minimal height and no information content. One instance of a motif alignment is shown, which contains a variant at a high-information position (boxed). The alternative allele at this position, A, produces a sequence that matches the motif represented by the PWM less well, as measured by the motif score.
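The two quantities in this caption, per-position information content and a motif score, can be sketched as follows. This is a minimal illustration that assumes a uniform background (0.25 per base), a log-odds score summed over positions, and a small pseudocount; the three-position matrix is hypothetical and is not the CTCF motif, and the scoring scheme is one common choice rather than the one used for the figure.

```python
import math


def information_content(freqs: dict[str, float]) -> float:
    """Information content in bits of one motif position (uniform background)."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - entropy  # log2(4) = 2 bits is the maximum for DNA


def log_odds_score(seq: str, pfm: list[dict[str, float]], pseudo: float = 0.01) -> float:
    """Sum of log2(p / 0.25) over positions; the pseudocount avoids log(0)."""
    score = 0.0
    for base, freqs in zip(seq, pfm):
        p = (freqs.get(base, 0.0) + pseudo) / (1.0 + 4 * pseudo)
        score += math.log2(p / 0.25)
    return score


# Hypothetical 3-position frequency matrix, for illustration only.
PFM = [
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},      # invariant position
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uninformative position
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

# Information content per position, then the score change for a C>A variant
# at the invariant first position.
print([round(information_content(col), 2) for col in PFM])
print(log_odds_score("CAG", PFM) - log_odds_score("AAG", PFM))
```

A positive difference between the reference and alternative scores indicates that the alternative allele fits the motif less well, which is the situation the boxed variant illustrates.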
Figure 3. A protein multiple alignment for the human gene built from the SIFT alignment pipeline. Color intensity corresponds to conservation in each column. Two variants that are predicted to alter the amino acid sequence (A/V and Y/H) are indicated by arrows, and their SIFT scores are shown. Variants with SIFT scores ≤0.05 are predicted to be deleterious; all other scores are predicted to be tolerated.
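The threshold rule stated in this caption can be written as a small helper. This is a sketch only: the function name `classify_sift` and the example scores are hypothetical, and the 0.05 cutoff is the one given in the caption.

```python
SIFT_DELETERIOUS_THRESHOLD = 0.05  # cutoff stated in the Figure 3 caption


def classify_sift(score: float) -> str:
    """Label a SIFT score as 'deleterious' (<= 0.05) or 'tolerated'."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT scores lie between 0 and 1, got {score}")
    return "deleterious" if score <= SIFT_DELETERIOUS_THRESHOLD else "tolerated"


# Hypothetical scores, for illustration only.
for s in (0.0, 0.05, 0.3):
    print(s, classify_sift(s))
```

Note that the boundary value 0.05 itself falls in the deleterious class, matching the "≤" in the caption.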