Yaoyao Li (1), Xiguo Yuan (2), Junying Zhang (3), Liying Yang (1), Jun Bai (4), Shan Jiang (5).
(1) School of Computer Science and Technology, Xidian University, No. 2 South Taibai Road, Xi'an, Shaanxi, People's Republic of China.
(2) School of Computer Science and Technology, Xidian University, No. 2 South Taibai Road, Xi'an, Shaanxi, People's Republic of China. xiguoyuan@mail.xidian.edu.cn.
(3) School of Computer Science and Technology, Xidian University, No. 2 South Taibai Road, Xi'an, Shaanxi, People's Republic of China. jyzhang@mail.xidian.edu.cn.
(4) Department of Medical Oncology, Shaanxi Provincial People's Hospital, Xi'an, People's Republic of China.
(5) The Inquire Life Diagnostics, Inc., Xi'an, People's Republic of China.
Abstract
BACKGROUND: Copy number variation (CNV) is an important form of genomic structural variation and is linked to dozens of human diseases. Characterizing such structural variants computationally from next-generation sequencing (NGS) data is important for understanding the mechanisms of diseases. OBJECTIVE: The objective of this study is to develop a new statistical method for detecting recurrent CNVs across multiple samples from genomic sequences. METHODS: A statistical method, referred to as SM-RCNV, is proposed to detect recurrent CNVs. The method associates a statistic with each genomic location by combining the frequency of variation at that location across all samples with the correlation among consecutive locations. The weights of the frequency and correlation terms are trained using real datasets with known CNVs. A p-value is assessed for each location on the genome by permutation testing. RESULTS: SM-RCNV outperforms six peer methods as measured by receiver operating characteristic curves. SM-RCNV successfully identifies many consistent recurrent CNVs, most of which are known to be of biological significance and associated with disease genes. The validation rates of the SM-RCNV CEU and YRI call sets against the Database of Genomic Variants are 258/328 (79%) and 157/309 (51%), respectively. CONCLUSION: SM-RCNV is a well-grounded statistical framework for detecting recurrent CNVs from multiple genomic sequences, providing valuable information for studying genomes in human diseases. The source code is freely available at https://sourceforge.net/projects/sm-rcnv/ .
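The idea described under METHODS (a per-location statistic that mixes cross-sample frequency with correlation between consecutive locations, followed by permutation testing) can be illustrated with a minimal sketch. This is not the SM-RCNV implementation: the abstract does not give the exact statistic, so the weighted sum below, the binary `calls` matrix, the fixed weights `w_freq` and `w_corr` (the paper trains its weights on real data), and the max-statistic null are all assumptions made for illustration.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def location_scores(calls, w_freq=0.5, w_corr=0.5):
    """calls[s][i] is 1 if sample s is variant at location i, else 0.

    Hypothetical statistic: weighted sum of the variation frequency at a
    location and its mean correlation with the adjacent locations.
    """
    n_samples, n_loc = len(calls), len(calls[0])
    cols = [[calls[s][i] for s in range(n_samples)] for i in range(n_loc)]
    scores = []
    for i in range(n_loc):
        freq = sum(cols[i]) / n_samples
        neighbours = [cols[j] for j in (i - 1, i + 1) if 0 <= j < n_loc]
        corr = mean(pearson(cols[i], nb) for nb in neighbours) if neighbours else 0.0
        scores.append(w_freq * freq + w_corr * corr)
    return scores


def permutation_pvalues(calls, n_perm=200, seed=0, **weights):
    """Per-location p-values from a max-statistic permutation null (an assumption).

    Each permutation shuffles locations independently within every sample,
    which destroys recurrence across samples while keeping per-sample burden.
    """
    rng = random.Random(seed)
    observed = location_scores(calls, **weights)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        shuffled = [rng.sample(row, len(row)) for row in calls]
        null_max = max(location_scores(shuffled, **weights))
        for i, obs in enumerate(observed):
            if null_max >= obs:
                exceed[i] += 1
    # Add-one correction keeps every p-value strictly positive.
    return [(1 + e) / (1 + n_perm) for e in exceed]
```

Calling `permutation_pvalues(calls)` on a samples-by-locations 0/1 matrix returns one p-value per location; locations inside a block shared by most samples receive higher scores than isolated ones, so their p-values are never larger. Comparing each observed score against the maximum null score is one common way to account for testing many locations at once; the abstract does not state which correction SM-RCNV uses.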
Keywords:
Correlation; Permutation test; Read depth; Recurrent copy number variations