Andrew Jaffe¹, Genevieve Wojcik¹, Audrey Chu¹, Asieh Golozar¹˒², Ankit Maroo¹, Priya Duggal¹, Alison P Klein¹˒³˒⁴.
Abstract
Recent technological advances have allowed us to study individual genomes at base-pair resolution and have demonstrated that the average exome harbors more than 15,000 genetic variants. However, our ability to understand the biological significance of the identified variants and to connect these observed variants with phenotypes is limited. The first step in this process is to identify genetic variation that is likely to change protein structure and function, because detailed population-based or functional studies of every identified variant are not practicable. Therefore, algorithms that yield valid predictions of a variant's functional significance are needed. Over the past decade, several programs have been developed to predict the probability that an observed sequence variant will have a deleterious effect on protein function. These algorithms range from empirical programs that classify variants using known biochemical properties to statistical algorithms trained on a variety of data sources, including sequence conservation data, biochemical properties, and functional data. Using data from the pilot 3 study of the 1000 Genomes Project, available through Genetic Analysis Workshop 17, we compared the results of four programs (SIFT, PolyPhen, MAPP, and VarioWatch) used to predict the functional relevance of variants in 101 genes. Analysis was conducted without knowledge of the simulation model. Pairwise agreement between programs was modest, ranging from 59.4% to 71.4%, and only 3.5% of variants were classified as deleterious, and 10.9% as tolerated, by all four programs.
Year: 2011 PMID: 22373437 PMCID: PMC3287847 DOI: 10.1186/1753-6561-5-S9-S13
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
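The abstract's consensus figures (variants classified as deleterious, or as tolerated, by all four programs) amount to counting unanimous calls. A minimal Python sketch follows; the input layout (one list of calls per program, aligned by variant, with `None` for a variant a program could not classify), the function name `unanimous_percentages`, and the choice of denominator are all assumptions for illustration, not details taken from the study.

```python
from typing import Dict, List, Optional

# Assumed layout: one list of calls per program, aligned by variant.
# "D" = deleterious, "T" = tolerated, None = not classified by that program.
Calls = Dict[str, List[Optional[str]]]


def unanimous_percentages(calls: Calls) -> Dict[str, float]:
    """Percentage of variants labeled "D" by every program, and "T" by every program.

    Assumption: the denominator is every variant in the input, so a variant
    that any program left unclassified can never count as unanimous.
    """
    programs = list(calls)
    n_variants = len(calls[programs[0]])
    counts = {"D": 0, "T": 0}
    for i in range(n_variants):
        labels = {calls[p][i] for p in programs}
        if len(labels) == 1:
            (label,) = labels
            if label in counts:  # skips a variant no program classified
                counts[label] += 1
    return {label: 100.0 * count / n_variants for label, count in counts.items()}


# Toy input (hypothetical, not study data): four variants, four programs.
toy = {
    "SIFT": ["D", "T", "D", None],
    "PolyPhen-2": ["D", "T", "T", "D"],
    "MAPP": ["D", "T", "D", "D"],
    "VarioWatch": ["D", "T", "D", "D"],
}
print(unanimous_percentages(toy))  # {'D': 25.0, 'T': 25.0}
```

Whether unclassified variants belong in the denominator changes the reported percentages, so any reimplementation should fix that choice explicitly.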
Agreement between prediction programs
| Program | MAPP (%) | SIFT (%) | VarioWatch (%) | PolyPhen-2 (%) |
|---|---|---|---|---|
| MAPP | 100 | | | |
| SIFT | 64.6 | 100 | | |
| VarioWatch | 59.4 | 62.9 | 100 | |
| PolyPhen-2 | 71.4 | 64.2 | 62.9 | 100 |
Off-diagonal values give the percentage of variants on which each pair of programs agreed in classifying a variant as tolerated or deleterious. For example, MAPP and SIFT agreed on 64.6% of variants, whereas VarioWatch and SIFT agreed on only 62.9%. Agreement was highest between MAPP and PolyPhen-2 (71.4%) and lowest between MAPP and VarioWatch (59.4%).
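Pairwise percentage agreement can be sketched as follows. The function names are hypothetical, and restricting the denominator to variants classified by both programs is an assumption; the source does not state how unclassified variants were handled.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# "D" = deleterious, "T" = tolerated, None = not classified (assumed encoding).
Calls = Dict[str, List[Optional[str]]]


def percent_agreement(a: List[Optional[str]], b: List[Optional[str]]) -> float:
    """Percentage of variants on which two programs give the same call.

    Assumption: only variants classified by both programs are counted.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no variant was classified by both programs")
    agreed = sum(x == y for x, y in pairs)
    return 100.0 * agreed / len(pairs)


def agreement_table(calls: Calls) -> Dict[Tuple[str, str], float]:
    """Off-diagonal entries of a pairwise agreement table, one per program pair."""
    return {(p, q): percent_agreement(calls[p], calls[q]) for p, q in combinations(calls, 2)}


# Toy calls (hypothetical, not study data).
toy = {
    "MAPP": ["D", "D", "T", "T", "D"],
    "SIFT": ["D", "T", "T", "D", None],
    "PolyPhen-2": ["D", "D", "T", "T", "T"],
}
table = agreement_table(toy)
print(table[("MAPP", "SIFT")])        # 50.0
print(table[("MAPP", "PolyPhen-2")])  # 80.0
```

Because agreement is symmetric, only one triangle of the table needs computing, which is why the table above leaves its upper half blank.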
Comparison of deleterious SNPs across programs
| Program | Variants classified (n) | Deleterious (n) | Deleterious (%) | Also deleterious by MAPP (%) | Also deleterious by SIFT (%) | Also deleterious by VarioWatch (%) | Also deleterious by PolyPhen-2 (%) |
|---|---|---|---|---|---|---|---|
| MAPP | 3,199 | 1,472 | 46.0 | 100 | 58 | 65 | 78 |
| SIFT | 3,562 | 1,603 | 45.0 | 62 | 100 | 71 | 69 |
| VarioWatch | 3,429 | 1,882 | 54.9 | 55 | 58 | 100 | 64 |
| PolyPhen-2 | 3,333 | 1,757 | 52.7 | 65 | 58 | 67 | 100 |
Each program classifies a different number of variants as deleterious. Each conditional-probability entry is the percentage of variants classified as deleterious by the row program that are also classified as deleterious by the column program. For example, in the first row, only 58% of the 1,472 variants that MAPP classifies as deleterious are also classified as deleterious by SIFT.
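Because each entry conditions on a different program, the table is not symmetric: 58% of MAPP's deleterious calls are shared by SIFT, but 62% of SIFT's deleterious calls are shared by MAPP. A minimal sketch of the conditional percentage, plus a check of the "Deleterious (%)" column against the counts in the table, is below. The function name is hypothetical, and treating a variant the comparison program left unclassified as "not deleterious" is an assumption.

```python
from typing import List, Optional


def pct_also_deleterious(given: List[Optional[str]], other: List[Optional[str]]) -> float:
    """P(other calls "D" | given calls "D"), as a percentage.

    Assumption: the denominator is every variant that `given` calls "D";
    a variant that `other` left unclassified (None) counts as not deleterious.
    """
    other_calls = [y for x, y in zip(given, other) if x == "D"]
    if not other_calls:
        raise ValueError("the conditioning program called no variant deleterious")
    return 100.0 * sum(y == "D" for y in other_calls) / len(other_calls)


# Toy calls (hypothetical): swapping the conditioning program changes the answer.
mapp = ["D", "D", "D", "D", "T"]
sift = ["D", "D", "T", None, "D"]
print(pct_also_deleterious(mapp, sift))            # 50.0
print(round(pct_also_deleterious(sift, mapp), 1))  # 66.7

# "Deleterious (%)" = deleterious / classified, using the counts in the table above.
TABLE = {
    "MAPP": (3199, 1472),
    "SIFT": (3562, 1603),
    "VarioWatch": (3429, 1882),
    "PolyPhen-2": (3333, 1757),
}
for program, (classified, deleterious) in TABLE.items():
    print(program, round(100 * deleterious / classified, 1))
```

The last loop reproduces the table's 46.0, 45.0, 54.9, and 52.7.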
Comparison of prediction programs with 15 functional variants
| Chromosome | Nucleotide position | Reference nucleotide | Variant nucleotide | Gene | rs ID number | Amino acid changeᵃ | PolyPhen-2 | SIFT | MAPP | VarioWatch | Loss of function |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 138476119 | C | T | | rs11558538 | T105I | D | D | D | D | Yes |
| 4 | 26092552 | C | T | | rs52795588 | V365I | D | T | T | D | Yes |
| 4 | 165337896 | A | G | | NA | Y140H | D | D | D | D | Yes |
| 5 | 7923973 | A | G | | rs1801394 | I49M | D | D | D | D | Yes |
| 7 | 5993056 | C | T | | rs1805324 | M622I | T | T | D | T | Yes |
| 7 | 5993133 | T | A | | rs1805318 | T597S | T | T | D | T | Yes |
| 7 | 127041823 | G | A | | NA | R121W | D | D | D | D | Yes |
| 7 | 127042702 | G | A | | rs35155575 | R37W | D | D | D | D | Yes |
| 10 | 42930125 | G | A | | rs1799939 | G691S | D | T | D | D | No |
| 10 | 72030393 | G | A | | rs35947132 | A91V | D | D | D | D | Yes |
| 12 | 38989178 | G | A | | rs7133914 | R1398H | D | T | T | T | Yes |
| 12 | 39000112 | G | C | | rs33949390 | R1628P | D | D | T | D | Yes |
| 12 | 39043595 | G | A | | rs34778348 | G2385R | T | T | D | D | No |
| 13 | 32911463 | A | G | | | N991D | T | T | D | D | Yes |
| 19 | 15851431 | C | T | | rs2108622 | V433M | D | D | T | D | Yes |
| Correct predictions | | | | | | | 11/15 | 10/15 | 9/15 | 10/15 | |
This table lists the 15 variants whose effects were established in a functional study and which serve here as a gold standard. D, predicted deleterious; T, predicted tolerated; NA, not available. A prediction is counted as correct when it is D for a loss-of-function variant or T otherwise.
ᵃ Amino acid change, with its position in the protein, resulting from the nucleotide change.
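The "Correct predictions" row can be reproduced by scoring each call against the loss-of-function column. The sketch below transcribes the four prediction columns and the loss-of-function column from the table; the scoring rule (D is correct for a loss-of-function variant, T otherwise) is inferred from the table rather than stated by the source, but it reproduces all four published totals. Names such as `ROWS` and `correct_predictions` are hypothetical.

```python
# Calls transcribed from the table above, in row order:
# (PolyPhen-2, SIFT, MAPP, VarioWatch, loss of function)
ROWS = [
    ("D", "D", "D", "D", True),
    ("D", "T", "T", "D", True),
    ("D", "D", "D", "D", True),
    ("D", "D", "D", "D", True),
    ("T", "T", "D", "T", True),
    ("T", "T", "D", "T", True),
    ("D", "D", "D", "D", True),
    ("D", "D", "D", "D", True),
    ("D", "T", "D", "D", False),
    ("D", "D", "D", "D", True),
    ("D", "T", "T", "T", True),
    ("D", "D", "T", "D", True),
    ("T", "T", "D", "D", False),
    ("T", "T", "D", "D", True),
    ("D", "D", "T", "D", True),
]
PROGRAMS = ("PolyPhen-2", "SIFT", "MAPP", "VarioWatch")


def correct_predictions(rows):
    """Count calls that match the functional gold standard.

    Inferred scoring rule: "D" is correct for a loss-of-function variant,
    "T" is correct otherwise.
    """
    counts = dict.fromkeys(PROGRAMS, 0)
    for *calls, loss_of_function in rows:
        expected = "D" if loss_of_function else "T"
        for program, call in zip(PROGRAMS, calls):
            counts[program] += call == expected  # True adds 1, False adds 0
    return counts


for program, n in correct_predictions(ROWS).items():
    print(f"{program}: {n}/{len(ROWS)}")
```

Running it prints 11/15, 10/15, 9/15, and 10/15 for PolyPhen-2, SIFT, MAPP, and VarioWatch, matching the table's bottom row.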