AIMS: Next-generation sequencing has opened the possibility of large-scale, sequence-based disease association studies. A major challenge in interpreting whole-exome data is predicting which of the discovered variants are deleterious and which are neutral. To address this question in silico, we have developed a score called Combined Annotation scoRing toOL (CAROL), which combines information from 2 bioinformatics tools, PolyPhen-2 and SIFT, to improve prediction of the effect of non-synonymous coding variants. METHODS: We used a weighted Z method that combines the probabilistic scores of PolyPhen-2 and SIFT. Using information from dbSNP, we defined 2 dataset pairs to train and test CAROL, drawn from the 'HGMD-PUBLIC' and 1000 Genomes Project databases. The training pair comprises a total of 980 positive control (disease-causing) and 4,845 negative control (non-disease-causing) variants. The test pair consists of 1,959 positive and 9,691 negative controls. RESULTS: CAROL has higher predictive power and accuracy for the effect of non-synonymous variants than either individual annotation tool (PolyPhen-2 or SIFT) and benefits from higher coverage, i.e., it scores a larger share of variants than either tool alone. CONCLUSION: Combining annotation tools can help improve automated prediction of the functional consequences of non-synonymous variants identified by whole-genome or whole-exome sequencing.
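A weighted Z method is a form of Stouffer's method: each probability-like score is mapped to a standard-normal quantile Z_i, and the quantiles are combined as Z = sum(w_i * Z_i) / sqrt(sum(w_i^2)). The Python sketch below illustrates that idea for one variant. It is not CAROL's published implementation: the equal default weights, the clipping constant, the flipping of SIFT to 1 - score (because low SIFT scores indicate damage while high PolyPhen-2 probabilities do), and the decision to drop a missing score rather than discard the variant are all illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist
from typing import Optional

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1
_EPS = 1e-6  # hypothetical clipping bound so inv_cdf never receives 0 or 1


def _to_z(p: float) -> float:
    """Map a probability-like score in [0, 1] to a standard-normal quantile."""
    p = min(max(p, _EPS), 1.0 - _EPS)
    return _STD_NORMAL.inv_cdf(p)


def combined_z(
    polyphen2_prob: Optional[float],
    sift_score: Optional[float],
    w_polyphen2: float = 1.0,  # hypothetical weight, not CAROL's trained value
    w_sift: float = 1.0,       # hypothetical weight, not CAROL's trained value
) -> Optional[float]:
    """Weighted Z (Stouffer) combination of a PolyPhen-2 and a SIFT score.

    Larger output means "more likely deleterious". A score passed as None is
    dropped, so a variant annotated by only one tool still gets a value
    (an assumed way of reflecting the coverage benefit of combining tools).
    """
    terms = []
    if polyphen2_prob is not None:
        # PolyPhen-2 probability already increases with predicted damage.
        terms.append((w_polyphen2, _to_z(polyphen2_prob)))
    if sift_score is not None:
        # SIFT score decreases with predicted damage, so flip it first.
        terms.append((w_sift, _to_z(1.0 - sift_score)))
    if not terms:
        return None  # neither tool produced a score for this variant
    numerator = sum(w * z for w, z in terms)
    denominator = sqrt(sum(w * w for w, _ in terms))
    return numerator / denominator
```

For example, combined_z(0.5, 0.5) returns 0.0 because both scores sit at the midpoint, while combined_z(0.9, None) falls back to the PolyPhen-2 quantile alone. A decision threshold on the combined value would then have to be chosen on training data, such as the positive and negative control sets described above.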