MOTIVATION: Copy number abnormalities (CNAs) represent an important type of genetic mutation that can lead to abnormal cell growth and proliferation. New high-throughput sequencing technologies promise comprehensive characterization of CNAs. In contrast to microarrays, where probe design follows a carefully developed protocol, reads represent a random sample from a library and may be prone to representation biases due to GC content and other factors. The discrimination between true and false positive CNAs becomes an important issue.

RESULTS: We present a novel approach, called CNAseg, to identify CNAs from second-generation sequencing data. It uses depth of coverage to estimate copy number states and flowcell-to-flowcell variability in cancer and normal samples to control the false positive rate. We tested the method using the COLO-829 melanoma cell line sequenced to 40-fold coverage. An extensive simulation scheme was developed to recreate different scenarios of copy number changes and depth of coverage by altering a real dataset with spiked-in CNAs. Comparison to alternative approaches using both real and simulated datasets showed that CNAseg achieves superior precision and improved sensitivity estimates.

AVAILABILITY: The CNAseg package and test data are available at http://www.compbio.group.cam.ac.uk/software.html.
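
The general idea in RESULTS, depth of coverage as a copy number signal with between-flowcell variability as a noise reference, can be illustrated with a small Python sketch. This is not the CNAseg implementation, whose details are not given here. It assumes reads have already been counted in fixed genomic windows, that total read count is an adequate normalizer, and that a cutoff of k standard deviations of the flowcell-to-flowcell log2 ratio is a reasonable stand-in for false positive control. All function names, the pseudocount, and the value of k are hypothetical. GC correction and segmentation are omitted.

```python
import math
from statistics import stdev


def log2_ratios(sample, reference, pseudo=0.5):
    """Per-window log2 ratio of sample to reference read counts.

    Counts are scaled so both inputs have the same total depth.
    The pseudocount (an assumed value) avoids division by zero.
    """
    scale = sum(reference) / sum(sample)
    return [
        math.log2((s * scale + pseudo) / (r + pseudo))
        for s, r in zip(sample, reference)
    ]


def noise_threshold(flowcell_a, flowcell_b, k=3.0):
    """Cutoff derived from two flowcells of the SAME sample.

    Any difference between them is technical noise, so k standard
    deviations of their log2 ratio serves as a simple noise bound.
    """
    return k * stdev(log2_ratios(flowcell_a, flowcell_b))


def call_windows(tumor, normal, threshold):
    """Label each window as gain, loss, or neutral against the cutoff."""
    calls = []
    for ratio in log2_ratios(tumor, normal):
        if ratio > threshold:
            calls.append("gain")
        elif ratio < -threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


# Toy window counts (made up for illustration, not real data).
normal_fc1 = [100, 102, 98, 101, 99, 100, 103, 97]
normal_fc2 = [101, 99, 100, 98, 102, 100, 98, 102]
tumor = [100, 100, 150, 150, 100, 100, 50, 50]

cutoff = noise_threshold(normal_fc1, normal_fc2)
calls = call_windows(tumor, normal_fc1, cutoff)
```

In this toy input the third and fourth windows carry a spiked-in gain and the last two a spiked-in loss, loosely mirroring the spike-in simulation idea described above. A real analysis would also need to merge adjacent windows into segments.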