Leandro de Araújo Lima, Kai Wang
Abstract
BACKGROUND: High-throughput sequencing data have improved the results of genomic analyses, owing to the resolution of mapping algorithms. Although several tools for calling copy-number variations (CNVs) from whole-genome sequencing data have been published, the noisy nature of sequencing data still limits their accuracy and the concordance among them. To assess how the original PennCNV algorithm, developed for array data, performs on whole-genome sequencing data, we processed alignment (BAM) files to extract coverage, representing the log R ratio (LRR) of signal intensity, and the B allele frequency (BAF).
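The two signals described in the abstract can be sketched for a single genomic window or site as follows. This is a minimal illustration, not the PennCNV-Seq implementation. It assumes that LRR is the log2 ratio of observed to expected read depth, that BAF is the fraction of reads supporting the alternate allele (treated here as the "B" allele), and that read counts have already been extracted from the BAM file. The function names and the depth floor are hypothetical.

```python
import math
from typing import Optional


def log_r_ratio(observed_depth: float, expected_depth: float, floor: float = 1e-3) -> float:
    """Hypothetical LRR: log2(observed / expected) read depth for one window.

    The floor (an arbitrary assumption) avoids log2(0) in windows with no
    coverage, such as homozygous deletions, and yields a large negative value.
    """
    return math.log2(max(observed_depth, floor) / expected_depth)


def b_allele_frequency(ref_count: int, alt_count: int) -> Optional[float]:
    """Hypothetical BAF: fraction of reads carrying the alternate ("B") allele.

    Returns None when no reads cover the site, because BAF is undefined there.
    """
    total = ref_count + alt_count
    if total == 0:
        return None
    return alt_count / total


# Example: half the expected depth (LRR = -1.0) and a roughly balanced het site.
print(log_r_ratio(15, 30), b_allele_frequency(12, 13))
```

A heterozygous deletion would be expected to halve the depth (LRR near -1) and push BAF at that region toward 0 or 1, which is the pattern PennCNV looks for in array data.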
Keywords: Copy-number variation; PennCNV; Whole-genome sequencing
Year: 2017 PMID: 28984186 PMCID: PMC5629549 DOI: 10.1186/s12859-017-1802-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Log R ratio (LRR) for simulated data across different types of CNVs. These values, estimated from sequencing data, are used as input to the PennCNV-Seq algorithm. We generated 10 samples with 240 CNVs each, with copy number (cn) 0, 1, 2, 3 and 4, and then computed the mean LRR for each region.
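As a rough reference point for the simulated values, consider an idealized model (an assumption, not taken from the paper) in which read depth scales linearly with copy number and a diploid (cn = 2) region defines the expected depth. The theoretical LRR for each copy number is then:

```latex
\mathrm{LRR}(cn) = \log_2\!\left(\frac{cn}{2}\right), \qquad
\mathrm{LRR}(1) = -1,\quad
\mathrm{LRR}(2) = 0,\quad
\mathrm{LRR}(3) = \log_2 1.5 \approx 0.585,\quad
\mathrm{LRR}(4) = 1
```

For cn = 0 the ratio is zero and the logarithm diverges to $-\infty$, so in practice such regions are represented by a large negative floor value. Real sequencing data are noisy, so observed mean LRR values need not match these ideal levels exactly.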
Fig. 2 Comparison of precision and recall for PennCNV, Lumpy and CNVnator. a-b Real data: deletions in sample NA12878 (30X coverage), downloaded from the NIST project database; no duplications were reported for this sample. c-f Simulated data for 10 samples at 20X coverage: c-d show deletions and e-f show duplications. A prediction was considered to match a true CNV when the two overlapped by at least 50%.
Performance of PennCNV-Seq for CNVs with different copy numbers: deletions with 0 or 1 copy, duplications with 3 or 4 copies, and loss of heterozygosity (LOH)
| No. of copies | Precision | Recall |
|---|---|---|
| 0 copies (hom. deletion) | 0.814 | 0.399 |
| 1 copy (het. deletion) | 0.711 | 0.665 |
| 3 copies (het. duplication) | 0.962 | 0.528 |
| 4 copies (hom. duplication) | 0.732 | 0.416 |
| Loss of heterozygosity (LOH) | 1.000 | 0.650 |
Precision is TP/(TP+FP) and recall is TP/(TP+FN), where TP = true positives, FP = false positives and FN = false negatives.
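The 50% overlap criterion from Fig. 2 and the precision/recall definitions above can be combined into a small evaluation sketch. It assumes the overlap is reciprocal (overlap length divided by the longer of the two intervals), which the caption does not specify. It also counts true positives separately on the predicted side (for precision) and on the truth side (for recall). The function names and the interval representation are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical interval representation: (chromosome, start, end), half-open.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Overlap length divided by the longer interval; 0.0 if disjoint."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[2] - a[1], b[2] - b[1])


def precision_recall(predicted: Sequence[Interval], truth: Sequence[Interval],
                     min_overlap: float = 0.5) -> Tuple[float, float]:
    # A predicted call is a true positive if it matches any true CNV.
    tp_pred = sum(any(reciprocal_overlap(p, t) >= min_overlap for t in truth)
                  for p in predicted)
    # A true CNV counts as recovered if any predicted call matches it.
    tp_truth = sum(any(reciprocal_overlap(p, t) >= min_overlap for p in predicted)
                   for t in truth)
    fp = len(predicted) - tp_pred
    fn = len(truth) - tp_truth
    precision = tp_pred / (tp_pred + fp) if predicted else 0.0
    recall = tp_truth / (tp_truth + fn) if truth else 0.0
    return precision, recall


truth = [("chr1", 1000, 2000), ("chr1", 5000, 6000)]
predicted = [("chr1", 1100, 2100), ("chr2", 100, 200)]
print(precision_recall(predicted, truth))  # one match out of two on each side
```

Requiring reciprocal overlap prevents a very large predicted call from "matching" a small true CNV that it merely contains. A one-sided overlap rule would make that comparison more permissive.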
Fig. 3 PennCNV plot of log R ratio (coverage) and B allele frequency for a zero-copy (CN=0) deletion in simulated data. The coverage is much lower than average, and allele-frequency data are almost absent because very few reads map to the region.