MOTIVATION: Analysing next-generation sequencing (NGS) data for copy number variation (CNV) detection is a relatively new and challenging field, with no accepted standard protocols or quality control measures so far. Several algorithms have by now been developed for each of the four broad methods of CNV detection using NGS, namely the depth of coverage (DOC), read-pair, split-read and assembly-based methods. However, because of the complexity of the genome and the short read lengths of NGS technology, many challenges remain in analysing NGS data for CNVs, no matter which method or algorithm is used. RESULTS: In this review, we describe and discuss potential sources of bias in CNV detection for each of the four methods. In particular, we focus on issues pertaining to (i) mappability, (ii) GC-content bias, (iii) quality control measures of reads and (iv) difficulty in identifying duplications. To gain insight into some of the issues discussed, we also downloaded real data from the 1000 Genomes Project and analysed the DOC data. We show examples of how reads in repeated regions can affect CNV detection, demonstrate current GC-correction algorithms, investigate the sensitivity of a DOC algorithm before and after quality control of reads and discuss why duplications are harder to detect than deletions.
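To make the GC-correction step for DOC data more concrete, the following is a minimal Python sketch of one simple median-based scheme, offered as an illustration rather than as the method used in this review. Each window's raw read count r_i is rescaled to r_i · m / m_GC, where m is the median count over all windows and m_GC is the median count over windows sharing the same GC percentage. The function name `gc_correct`, the binning to whole GC percentages and the toy inputs are hypothetical choices made for this example.

```python
from collections import defaultdict
from statistics import median


def gc_correct(read_counts, gc_fractions):
    """Median-based GC correction of per-window read depth (illustrative sketch).

    read_counts:  raw read counts, one per genomic window (hypothetical input)
    gc_fractions: GC fraction (0..1) of the reference sequence in each window
    Returns r_i * m / m_gc for each window, where m is the overall median count
    and m_gc is the median count of windows in the same GC-percentage bin.
    """
    if len(read_counts) != len(gc_fractions):
        raise ValueError("read_counts and gc_fractions must have equal length")

    overall_median = median(read_counts)

    # Assumption: windows are binned by GC rounded to the nearest whole percent.
    bins = defaultdict(list)
    keys = []
    for count, gc in zip(read_counts, gc_fractions):
        key = round(gc * 100)
        bins[key].append(count)
        keys.append(key)

    bin_medians = {key: median(counts) for key, counts in bins.items()}

    corrected = []
    for count, key in zip(read_counts, keys):
        m_gc = bin_medians[key]
        # Leave a window unscaled if its GC bin has a zero median (avoids division by zero).
        corrected.append(count * overall_median / m_gc if m_gc > 0 else float(count))
    return corrected


# Toy example: windows at 40% GC are over-covered relative to windows at 60% GC.
print(gc_correct([100, 100, 50, 60, 60], [0.40, 0.40, 0.40, 0.60, 0.60]))
```

In the toy call, the overall median is 60 and the 40%-GC bin has median 100, so the output is `[60.0, 60.0, 30.0, 60.0, 60.0]`: after correction the two GC bins sit at the same baseline, and the third window stands out at half the expected depth, the pattern a DOC caller would read as a candidate heterozygous deletion. Real tools differ in bin width, in whether they use medians or fitted curves (e.g. loess) and in how they handle sparse GC bins.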
Authors: A J García-Chequer; A Méndez-Tenorio; G Olguín-Ruiz; C Sánchez-Vallejo; P Isa; C F Arias; J Torres; A Hernández-Angeles; M A Ramírez-Ortiz; C Lara; M L Cabrera-Muñoz; S Sadowinski-Pine; J C Bravo-Ortiz; G Ramón-García; J Diegopérez-Ramírez; G Ramírez-Reyes; R Casarrubias-Islas; J Ramírez; M A Orjuela; M V Ponce-Castañeda Journal: Cancer Genet Date: 2015-12-15
Authors: Yong-hui Jiang; Ryan K C Yuen; Xin Jin; Mingbang Wang; Nong Chen; Xueli Wu; Jia Ju; Junpu Mei; Yujian Shi; Mingze He; Guangbiao Wang; Jieqin Liang; Zhe Wang; Dandan Cao; Melissa T Carter; Christina Chrysler; Irene E Drmic; Jennifer L Howe; Lynette Lau; Christian R Marshall; Daniele Merico; Thomas Nalpathamkalam; Bhooma Thiruvahindrapuram; Ann Thompson; Mohammed Uddin; Susan Walker; Jun Luo; Evdokia Anagnostou; Lonnie Zwaigenbaum; Robert H Ring; Jian Wang; Clara Lajonchere; Jun Wang; Andy Shih; Peter Szatmari; Huanming Yang; Geraldine Dawson; Yingrui Li; Stephen W Scherer Journal: Am J Hum Genet Date: 2013-07-11 Impact factor: 11.025