Yong Chen, Li Zhao, Yi Wang, Ming Cao, Violet Gelowani, Mingchu Xu, Smriti A Agrawal, Yumei Li, Stephen P Daiger, Richard Gibbs, Fei Wang, Rui Chen.
Abstract
BACKGROUND: Targeted next-generation sequencing (NGS) has been widely used as a cost-effective way to identify the genetic basis of human disorders. Copy number variations (CNVs) contribute significantly to human genomic variability, and some can lead to disease. However, effective detection of CNVs from targeted capture sequencing data remains challenging.
Keywords: Copy number variation; Maximum penalized likelihood estimation; Next-generation sequencing
Year: 2017 PMID: 28253855 PMCID: PMC5335817 DOI: 10.1186/s12859-017-1566-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Workflow of SeqCNV. The step “Recursively find candidate regions” includes a dynamic programming procedure. It breaks out of the iteration quickly once the candidate regions most likely to be CNV events are found, which substantially reduces the running time of the whole algorithm
Fig. 2 An example of simulated CNV data on chromosome 1. The data set, computationally simulated, includes two deletions and two duplications at each of four lengths. Black dots represent read density over 500 bp fixed windows along the entire chromosome. The red bands indicate the results of SeqCNV analysis. Horizontal lines mark significant points of deletion or gain
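The read density plotted in Fig. 2 can be illustrated with a minimal sketch that counts reads per fixed 500 bp window. This is not SeqCNV's code: the function name `window_read_counts`, the use of 0-based read start positions, and the toy input are assumptions made for illustration only.

```python
def window_read_counts(read_starts, chrom_length, window=500):
    """Count reads whose start position falls in each fixed-size window.

    Assumption: a read is assigned to a window by its 0-based start position.
    """
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < chrom_length:  # ignore positions outside the chromosome
            counts[pos // window] += 1
    return counts


# Hypothetical read start positions on a 2,000 bp toy chromosome.
toy_starts = [10, 480, 499, 500, 1200, 1999]
densities = window_read_counts(toy_starts, chrom_length=2000)
```

A drop or rise in these per-window counts relative to a reference is the kind of signal that the dots in the figure visualize.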
Summary of 100 runs of SeqCNV on simulation data. Boundary (Start/End) is the average distance to the nearest starting (ending) point of the detected variants
| Type | One copy gain | | | | One copy loss | | | |
|---|---|---|---|---|---|---|---|---|
| Resolution | 1 MB | 100 KB | 10 KB | 1 KB | 1 MB | 100 KB | 10 KB | 1 KB |
| Sensitivity | 99.50% | 99.00% | 75.00% | 67.80% | 99.00% | 96.50% | 91.00% | 66.80% |
| Boundary (Start) | 1.69 KB | 1.37 KB | 0.73 KB | 0.29 KB | 0.71 KB | 0.64 KB | 0.49 KB | 0.18 KB |
| Boundary (End) | 1.28 KB | 1.01 KB | 0.91 KB | 0.32 KB | 0.91 KB | 0.71 KB | 0.72 KB | 0.21 KB |
| False Positive Rate | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3.74% |
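The sensitivity and boundary metrics in the table above can be sketched as follows. The caption defines the boundary value as the average distance to the nearest start (or end) point of the detected variants; the helper names `boundary_error` and `sensitivity` and all coordinates below are hypothetical, and the rule for deciding that a simulated event counts as detected is not given here, so it is left to the caller.

```python
def boundary_error(true_points, detected_points):
    """Mean distance from each true breakpoint to the nearest detected breakpoint."""
    if not true_points or not detected_points:
        raise ValueError("both breakpoint lists must be non-empty")
    nearest = [min(abs(t - d) for d in detected_points) for t in true_points]
    return sum(nearest) / len(nearest)


def sensitivity(n_true_detected, n_true):
    """Fraction of simulated (true) events that were detected."""
    return n_true_detected / n_true


# Hypothetical start coordinates (bp) of simulated and detected events.
true_starts = [10_000, 50_000]
detected_starts = [10_300, 49_900]
mean_start_error = boundary_error(true_starts, detected_starts)  # in bp
toy_sensitivity = sensitivity(3, 4)
```

Dividing `mean_start_error` by 1,000 gives a value in KB, the unit used in the table.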
Number of predicted CNV events and correctly detected events for each method on BAC spike-in data
| | SeqCNV | CoNIFER | CNVnator | CNVer | XHMM |
|---|---|---|---|---|---|
| Number of Predicted CNV events | 53 | 2 | 8032 | 4487 | 2 |
| Number of Correctly Detected events | 7 | 2 | 9 | 8 | 0 |
Fig. 3 Precision-Recall contours for five CNV methods on spike-in data. Light grey contours represent F-measure levels (harmonic mean of precision and recall). SeqCNV achieved the highest F-measure value
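As a rough illustration of how the quantities in Fig. 3 relate to the spike-in table, the sketch below computes precision, recall, and the F-measure as their harmonic mean. The predicted and correct counts are copied from the table; the total number of true spike-in events is not stated in this excerpt, so it is left as the parameter `n_true`, and the function name `precision_recall_f` is an assumption.

```python
def precision_recall_f(n_predicted, n_correct, n_true):
    """Return (precision, recall, F-measure) for one method.

    n_true is the total number of true events; it must be supplied by the caller.
    """
    precision = n_correct / n_predicted if n_predicted else 0.0
    recall = n_correct / n_true if n_true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_measure


# (predicted events, correctly detected events) from the spike-in table.
spike_in = {
    "SeqCNV": (53, 7),
    "CoNIFER": (2, 2),
    "CNVnator": (8032, 9),
    "CNVer": (4487, 8),
    "XHMM": (2, 0),
}

# Precision depends only on the table; recall and F also need n_true.
precisions = {name: correct / predicted
              for name, (predicted, correct) in spike_in.items()}
```

Precision alone already shows why thousands of predictions with fewer than ten correct calls score poorly, whatever the true event count is.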
Fig. 4 aCGH validation of the copy number deletion in the PRPF31 gene for sample UTAD082. The shaded area indicates the CNV region, where one copy of the genomic segment is lost. Validation results for the other 4 samples can be found in Additional file . The five samples carry deletions of different sizes, ranging from several exons to the entire genomic region of PRPF31
Number of predicted CNV events and correctly detected events for each method on adRP patient data
| Sample ID | CNV | SeqCNV | CoNIFER | CNVer | CNVnator | XHMM |
|---|---|---|---|---|---|---|
| UTAD034 | | Y | Y | N | N | Na |
| UTAD069 | | Y | Y | N | N | Na |
| UTAD082 | | Y | Na | N | N | Na |
| UTAD411 | | Y | Y | N | N | Na |
| UTAD611 | | Y | Y | N | N | Na |
Each entry indicates whether the copy number deletion in the genomic region of the PRPF31 gene was identified in that sample by the given CNV method. ‘Na’ indicates that the method did not report any CNV for the sample. SeqCNV identified the copy number deletion in all 5 samples
Fig. 5 CNV results for five methods on adRP patient data. a SeqCNV b CoNIFER c CNVnator d CNVer e XHMM. The X-axis represents the genomic position on chromosome 19. The PRPF31 gene is located at chr19:54,618,790–54,635,150. Both SeqCNV and CoNIFER identified the PRPF31 copy number deletion