Gongyi Huang1, Shaoli Wang2, Xueqin Wang3, Na You4.
1. School of Mathematics, Sun Yat-sen University, Guangzhou 510275, China.
2. School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China; Shanghai Key Laboratory of Financial Information Technology, Shanghai University of Finance and Economics, Shanghai 200433, China.
3. School of Mathematics, Sun Yat-sen University, Guangzhou 510275, China; South China Center for Statistical Science, Sun Yat-sen University, Guangzhou 510275, China; Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510080, China.
4. School of Mathematics, Sun Yat-sen University, Guangzhou 510275, China; South China Center for Statistical Science, Sun Yat-sen University, Guangzhou 510275, China.
Abstract
MOTIVATION: The development of next-generation sequencing technology provides an efficient and powerful approach to rare variant detection. To identify genetic variations, the essential question is how to quantify the sequencing error rate in the data. Because it is easy to implement and can integrate data from different sources, the empirical Bayes method is widely used to estimate the sequencing error rate for SNP detection.
RESULTS: We propose a novel statistical model to fit the observed non-reference allele frequency data, and use the empirical Bayes method for both genotyping and SNP detection, with an ECM algorithm implemented to estimate the model parameters. The performance of the proposed method is investigated via simulations and real data analysis. The results show that our method makes fewer genotype-call errors and that, with the parameter estimates from the ECM algorithm, it attains high detection power while keeping the false discovery rate (FDR) well controlled.
AVAILABILITY AND IMPLEMENTATION: The proposed algorithm is implemented in the R package ebGenotyping, which can be downloaded from http://cran.r-project.org/web/packages/ebGenotyping/
CONTACT: youn@mail.sysu.edu.cn
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.