Johannes Smolander (1), Sofia Khan (1), Kalaimathy Singaravelu (1), Leni Kauko (1), Riikka J Lund (1), Asta Laiho (1), Laura L Elo (2, 3)
1. Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520, Turku, Finland.
2. Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520, Turku, Finland. laura.elo@utu.fi.
3. Institute of Biomedicine, University of Turku, 20520, Turku, Finland. laura.elo@utu.fi.
Abstract
BACKGROUND: Detection of copy number variations (CNVs) from high-throughput next-generation whole-genome sequencing (WGS) data has become a widely used research method in recent years. However, little is known about how well the available algorithms perform on ultra-low-coverage (0.0005-0.8×) data, which is used in various research and clinical applications such as digital karyotyping and single-cell CNV detection.
RESULTS: We studied the performance of six popular read-depth-based CNV detection algorithms (BIC-seq2, Canvas, CNVnator, FREEC, HMMcopy, and QDNAseq) on ultra-low-coverage WGS data. Real-world array- and karyotyping kit-based validation was used as the benchmark in the evaluation. Additionally, ultra-low-coverage WGS data was simulated to investigate the ability of the algorithms to identify CNVs in the sex chromosomes and the theoretical minimum coverage at which these tools can function accurately. Our results suggest that while all the methods were able to detect large CNVs, many were prone to producing false positives when detecting smaller CNVs (< 2 Mbp). There was also significant variability in their ability to identify CNVs in the sex chromosomes. Overall, BIC-seq2 was found to be the best method in terms of statistical performance. However, its significant drawback was that it had by far the slowest runtime among the methods (> 3 h), compared with ~ 3 min for FREEC, which we considered the second-best method.
CONCLUSIONS: Our comparative analysis demonstrates that CNV detection from ultra-low-coverage WGS data can be highly accurate for large copy number variations whose length is in the millions of base pairs. These findings facilitate applications that utilize ultra-low-coverage CNV detection.
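To give a feel for why read-depth methods struggle with small CNVs at these coverages, the following minimal Python sketch estimates how many reads fall in a genomic bin at the two ends of the coverage range quoted above. It is an illustration only and is not taken from the study: the 100 bp read length, the function names, and the simple Poisson noise model are assumptions, and none of the six benchmarked tools is claimed to work this way.

```python
import math


def expected_reads_per_bin(coverage, bin_size_bp, read_length_bp):
    """Expected number of reads in a bin, assuming reads are spread uniformly.

    Coverage is defined as (reads * read length) / genome length, so the
    expected read count in a bin is coverage * bin size / read length.
    """
    return coverage * bin_size_bp / read_length_bp


def relative_poisson_noise(expected_reads):
    """Relative standard deviation of a Poisson count with the given mean."""
    return 1.0 / math.sqrt(expected_reads)


READ_LENGTH_BP = 100      # assumption: not stated in the abstract
BIN_SIZE_BP = 2_000_000   # the 2 Mbp size threshold mentioned in the abstract

for coverage in (0.0005, 0.8):
    reads = expected_reads_per_bin(coverage, BIN_SIZE_BP, READ_LENGTH_BP)
    noise = relative_poisson_noise(reads)
    print(f"coverage {coverage}x: ~{reads:.0f} reads per bin, "
          f"relative noise ~{noise:.1%}")
```

Under these assumptions, a 2 Mbp bin holds on the order of ten reads at 0.0005× and thousands at 0.8×. A single-copy gain or loss in a diploid region shifts the expected depth by 50%, so at the lowest coverage the sampling noise in one bin is of a similar magnitude to the signal, which is consistent with the abstract's observation that smaller CNVs are harder to call reliably.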
Keywords:
Copy number variation; Human embryonic stem cell; Ultra-low-coverage; Whole-genome sequencing