Le Zhang, Wanyu Bai, Na Yuan, Zhenglin Du.
Abstract
MOTIVATION: Copy number variation (CNV) has recently gained considerable interest as a type of genomic variation that plays an important role in complex phenotypes and disease susceptibility. Because many CNV detection methods have been developed in recent years, investigators need guidance in choosing a method that suits their objectives. This study therefore compared ten commonly used CNV detection applications, including CNVnator, ReadDepth, RDXplorer, LUMPY and Control-FREEC, benchmarking them by sensitivity, specificity and computational demands. Taking the DGV gold standard variants as the reference dataset, we evaluated the ten applications on real sequencing data at sequencing depths from 5X to 50X. Among the ten methods benchmarked, LUMPY performed best, combining high sensitivity and high specificity at every sequencing depth. When high specificity is the priority, Canvas is also a good choice. When high sensitivity is preferred, CNVnator and RDXplorer are better choices. Additionally, CNVnator and GROM-RD perform well on low-depth sequencing data. Our results provide a comprehensive performance evaluation of the selected CNV detection methods and can facilitate the future development and improvement of CNV prediction methods.
Year: 2019 PMID: 31136576 PMCID: PMC6555534 DOI: 10.1371/journal.pcbi.1007069
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
CNV detection methods on WGS data.
| Software | Methods | Algorithm detail | Input data | Published | Latest update | Accessibility | URL | Programming language | Citations |
|---|---|---|---|---|---|---|---|---|---|
| | RD | Expectation-maximization (EM) clustering | BAM | 2011 | 2018/3 | Y | | C# | 29 |
| | RD | Mixture Poisson model | BAM | 2012 | 2018/10 | Y | | R | 226 |
| CNVeM | RD | Expectation-maximization (EM) algorithm | CSV | 2013 | NA | Y | | C | 14 |
| CNVer | RP | Maximum-likelihood, Graphic flow | BAM | 2010 | 2011/5 | N | NA | C | 158 |
| | RD | Mean shift algorithm | BAM | 2011 | 2016/11 | Y | | C++ | 640 |
| CNVrd2 | RD | Expectation-maximization (EM) algorithm | BAM/SAM | 2014 | 2015/11 | Y | | R | 13 |
| | RD | LASSO regression | BAM/SAM | 2011 | 2018/8 | Y | | C++ | 190 |
| | RD | Quantile normalization | BAM | 2015 | 2017/5 | Y | | C | 7 |
| | RD | DoC approaches | BAM | 2018 | 2018/3 | Y | | R, C++ | 1 |
| JointSLM | RD | Population-based approach | SAM/BAM | 2011 | NA | N | NA | R | 49 |
| | RD, PEM | A probabilistic framework | BAM/CRAM | 2014 | 2016/3 | Y | | C++ | 157 |
| mrCaNaVAR | RD | mrFAST | SAM | 2009 | 2013/9 | Y | | C | 685 |
| | RD | Event-wise testing algorithm | BAM | 2009 | 2013/4 | Y | | Python | 496 |
| | RD | Circular binary segmentation algorithm | BED | 2011 | 2014/8 | Y | | R | 150 |
| | RD | Negative binomial transformations | BAM | 2017 | 2017/7 | Y | | C++ | 2 |
Note:
# indicates the software used in this study.
Formula to calculate TPR and FDR.
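| Measure | Formula | Illustration |
|---|---|---|
| TPR | $\mathrm{TPR} = \dfrac{TP}{TP + FN}$ | TP: the number of true positives; FN: the number of false negatives |
| FDR | $\mathrm{FDR} = \dfrac{FP}{TP + FP}$ | TP: the number of true positives; FP: the number of false positives |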
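As an illustration of how the two measures could be computed from a call set and a gold standard set, here is a minimal Python sketch. It is not the authors' evaluation code. The matching rule (at least 50% reciprocal overlap on the same chromosome), the `Interval` layout, and the function names `reciprocal_overlap` and `tpr_fdr` are all assumptions made for this example; the record above does not state how calls were matched to DGV variants.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open coordinates.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def tpr_fdr(
    calls: List[Interval], truth: List[Interval], min_overlap: float = 0.5
) -> Tuple[float, float]:
    """Compute TPR = TP / (TP + FN) and FDR = FP / (TP + FP).

    Assumption: a call and a gold standard variant match when their
    reciprocal overlap is at least `min_overlap`.
    """
    # TP + FN is the size of the gold standard set; a gold standard
    # variant is a true positive if at least one call matches it.
    detected = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for c in calls) for t in truth
    )
    # TP + FP is the size of the call set; a call is a false positive
    # if it matches no gold standard variant.
    confirmed = sum(
        any(reciprocal_overlap(c, t) >= min_overlap for t in truth) for c in calls
    )
    tpr = detected / len(truth) if truth else 0.0
    fdr = (len(calls) - confirmed) / len(calls) if calls else 0.0
    return tpr, fdr


# Toy data, invented for this example only.
truth = [("chr1", 100, 200), ("chr1", 500, 700), ("chr2", 0, 1000)]
calls = [("chr1", 110, 210), ("chr1", 900, 950), ("chr2", 100, 900)]
tpr, fdr = tpr_fdr(calls, truth)
print(f"TPR = {tpr:.3f}, FDR = {fdr:.3f}")  # TPR = 0.667, FDR = 0.333
```

In this sketch true positives are counted from the gold standard side for TPR and from the call side for FDR, so one call spanning several gold standard variants is handled consistently in each measure. A different matching rule would change both numbers.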
Fig 1. Statistics of the detected CNVs.
(a) Detected CNV number. (b) Distribution of CNV size. (c) The Venn diagram of CNV detection methods.
Fig 2. Evaluation of the sensitivity and specificity of the CNV detection methods.
(a) TPR curves of the ten applications at sequencing depths from 5X to 50X. (b) FDR curves of the ten applications at sequencing depths from 5X to 50X.
Fig 3. The computational demands of the ten methods.
(a) Computation time as a function of sequencing depth from 5X to 50X. (b) Memory usage as a function of sequencing depth from 5X to 50X.