Luciana R Montenegro (1), Antônio M Lerário (1,2), Miriam Y Nishi (1), Alexander A L Jorge (3), Berenice B Mendonca (1).
1. Unidade de Endocrinologia do Desenvolvimento / LIM42 / SELA, Disciplina de Endocrinologia, Hospital das Clinicas (HCFMUSP), Faculdade de Medicina, Universidade de Sao Paulo, Sao Paulo, SP, BR.
2. Division of Metabolism, Department of Internal Medicine, Endocrinology and Diabetes, University of Michigan, Ann Arbor, United States of America.
3. Unidade de Endocrinologia Genetica (LIM25), Disciplina de Endocrinologia, Faculdade de Medicina (FMUSP), Universidade de Sao Paulo, Sao Paulo, SP, BR.
Abstract
OBJECTIVES: Single nucleotide variants (SNVs) are the most common type of genetic variation among humans. High-throughput sequencing methods have recently characterized millions of SNVs in several thousand individuals from various populations, and most of these variants are benign polymorphisms. Identifying rare disease-causing SNVs remains challenging and often requires functional in vitro studies. Prioritizing the SNVs most likely to be pathogenic is therefore essential, and several computational methods have been developed for this purpose. However, these methods are based on different assumptions and often produce discordant results. The aim of the present study was to evaluate the performance of 11 widely used, freely available pathogenicity prediction tools in identifying known pathogenic SNVs: FATHMM, Mutation Assessor, Protein Analysis Through Evolutionary Relationships (PANTHER), Sorting Intolerant From Tolerant (SIFT), MutationTaster, Polymorphism Phenotyping v2 (PolyPhen-2), Align Grantham Variation Grantham Deviation (Align-GVGD), CADD, PROVEAN, SNPs&GO, and MutPred. METHODS: We analyzed 40 functionally proven pathogenic SNVs in four different genes associated with differences in sex development (DSD): 17β-hydroxysteroid dehydrogenase 3 (HSD17B3), steroidogenic factor 1 (NR5A1), androgen receptor (AR), and luteinizing hormone/chorionic gonadotropin receptor (LHCGR). To evaluate the false discovery rate of each tool, we analyzed 36 frequent (MAF>0.01) benign SNVs found in the same four DSD genes. The quality of the predictions was analyzed using six parameters: accuracy, precision, negative predictive value (NPV), sensitivity, specificity, and Matthews correlation coefficient (MCC). Overall performance was assessed using a receiver operating characteristic (ROC) curve. RESULTS: None of the tools was 100% precise in identifying pathogenic SNVs. The highest specificity, precision, and accuracy were observed for Mutation Assessor, MutPred, and SNPs&GO. These three tools also performed best in the ROC curve analysis. Of the 11 tools evaluated, 6 (Mutation Assessor, PANTHER, SIFT, MutationTaster, PolyPhen-2, and CADD) exhibited sensitivity >0.90, but with lower specificity (0.42-0.67). Performance, based on MCC, ranged from poor (FATHMM=0.04) to reasonably good (MutPred=0.66). CONCLUSION: Computational algorithms are important tools for SNV analysis, but their correlation with functional studies is not consistent. In the present analysis, the best performing tools (based on accuracy, precision, and specificity) were Mutation Assessor, MutPred, and SNPs&GO, which presented the best concordance with functional studies.
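For readers less familiar with the six parameters named in the METHODS, the following is a minimal sketch of how they are conventionally derived from a 2x2 confusion matrix, with pathogenic taken as the positive class. It is not the authors' analysis code. The function name `confusion_metrics` and the counts in the example are hypothetical and chosen only to match the study's totals of 40 pathogenic and 36 benign SNVs; they are not results for any of the 11 tools.

```python
from math import sqrt


def confusion_metrics(tp, fp, tn, fn):
    """Return the six standard metrics for a binary classifier.

    tp: pathogenic SNVs predicted pathogenic (true positives)
    fp: benign SNVs predicted pathogenic (false positives)
    tn: benign SNVs predicted benign (true negatives)
    fn: pathogenic SNVs predicted benign (false negatives)
    """
    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts (an assumption, not data from the study):
# 40 pathogenic SNVs (38 detected) and 36 benign SNVs (21 called benign).
example = confusion_metrics(tp=38, fp=15, tn=21, fn=2)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

A tool with counts like these has high sensitivity and modest specificity, the pattern the RESULTS describe for six of the tools. MCC uses all four cells of the matrix, which is why it can rank tools differently from sensitivity alone.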