Li Fang, Jiang Hu, Depeng Wang, Kai Wang.
Abstract
BACKGROUND: Structural variants (SVs) in human genomes are implicated in a variety of human diseases. Long-read sequencing delivers much longer read lengths than short-read sequencing and may greatly improve SV detection. However, due to the relatively high cost of long-read sequencing, it is unclear what coverage is needed and how to optimally use the aligners and SV callers.
Keywords: Long-read sequencing; Low coverage; PacBio; Structural variants
Year: 2018 PMID: 29792160 PMCID: PMC5966861 DOI: 10.1186/s12859-018-2207-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Description of PacBio data sets used for this study
| Data Source | Genome | Original Coverage | Down-sampled Coverage | Mean Read Length | Reference |
|---|---|---|---|---|---|
| NCBI SRA | NA12878 | 22X | 2-15X | 4.9 kb | [ |
| NCBI SRA | HX1 | 103X | 6-15X | 7.0 kb | [ |
| NIST | AJ son | 69X | 10X | 8.0 kb | [ |
| NIST | AJ father | 32X | 10X | 7.3 kb | [ |
| NIST | AJ mother | 30X | 10X | 7.8 kb | [ |
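The down-sampled coverages in the table imply keeping only a fraction of the original reads. The sketch below shows one plausible way to do that: compute the fraction as target coverage divided by original coverage, then keep each read independently with that probability. The function names, the per-read random sampling, and the use of a seed per replicate are assumptions for illustration; the exact down-sampling procedure used in the study is not described in this section.

```python
import random


def downsample_fraction(original_cov: float, target_cov: float) -> float:
    """Fraction of reads to keep to go from original_cov to target_cov."""
    if original_cov <= 0 or target_cov <= 0:
        raise ValueError("coverages must be positive")
    if target_cov > original_cov:
        raise ValueError("target coverage exceeds original coverage")
    return target_cov / original_cov


def downsample_reads(read_ids, fraction: float, seed: int):
    """Keep each read independently with probability `fraction`.

    Hypothetical helper: a different seed gives a different replicate.
    """
    rng = random.Random(seed)
    return [r for r in read_ids if rng.random() < fraction]


if __name__ == "__main__":
    # NA12878: 22X original coverage, down-sampled here to 10X.
    frac = downsample_fraction(22, 10)
    reads = [f"read_{i}" for i in range(1000)]  # placeholder read names
    # Five seeds stand in for five down-sampling replicates.
    replicates = [downsample_reads(reads, frac, seed) for seed in range(5)]
    print(round(frac, 3), [len(rep) for rep in replicates])
```

Because reads are sampled independently, the achieved coverage of each replicate varies slightly around the target, which is one reason to report results over several replicates.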
Fig. 1 Scheme of the NextSV workflow
Number of calls in the high-confidence SV sets
| Genome | Platform | Number of Deletions | Number of Insertions | Reference |
|---|---|---|---|---|
| NA12878 | Illumina | 2094 | 1114 | [ |
| HX1 | PacBio | 2387 | 2937 | [ |
Fig. 2 Evaluation of recall rates under different coverages on the NA12878 genome. Five down-sampling replicates were performed at each coverage. (a) Recall rates of deletion calls. (b) Recall rates of insertion calls. Data shown represent mean ± SD
Fig. 3 Evaluation of precisions and F1 scores under different coverages on the NA12878 genome. Five down-sampling replicates were performed. (a) Precisions of deletion calls. (b) F1 scores of deletion calls. (c) Precisions of insertion calls. (d) F1 scores of insertion calls. Data shown represent mean ± SD
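For readers unfamiliar with the metrics in these figures, the following minimal sketch computes recall, precision, and F1 from call counts using their standard definitions, and summarizes replicates as mean ± SD. The function names and the example counts are hypothetical, and the rule for deciding when a call matches a high-confidence SV is not given in this section, so the number of matched calls is simply taken as an input. The sample standard deviation is used here as an assumption.

```python
from statistics import mean, stdev


def precision_recall_f1(true_positives: int, num_calls: int, num_truth: int):
    """Standard definitions; returns (precision, recall, f1).

    true_positives: calls that match the high-confidence set
    num_calls:      all calls made by the SV caller
    num_truth:      size of the high-confidence set
    """
    precision = true_positives / num_calls if num_calls else 0.0
    recall = true_positives / num_truth if num_truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def mean_sd(values):
    """Mean and sample standard deviation across replicates."""
    return mean(values), stdev(values)


if __name__ == "__main__":
    # 2094 is the size of the NA12878 high-confidence deletion set;
    # the (true_positives, num_calls) pairs below are made up.
    replicate_counts = [(1800, 2400), (1790, 2380), (1815, 2410),
                        (1805, 2395), (1798, 2402)]
    f1_scores = [precision_recall_f1(tp, calls, 2094)[2]
                 for tp, calls in replicate_counts]
    m, sd = mean_sd(f1_scores)
    print(f"F1 = {m:.3f} ± {sd:.3f}")
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high.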
Fig. 4 SV calling performance on the HX1 genome. Five down-sampling replicates were performed. (a-c) Recall rates, precisions and F1 scores of deletion calls. (d-f) Recall rates, precisions and F1 scores of insertion calls. Data shown represent mean ± SD
Fig. 5 Comparison of allele drop-in (ADI) rates. For evaluation of ADI rates at 10X coverage, five down-sampling replicates were performed. (a) ADI rates of deletion calls. (b) ADI rates of insertion calls. Data shown represent mean ± SD.
Time consumption for each step in the NextSV pipeline on a 10X PacBio data set
| SV caller | Aligner | CPU (number of threads) | Alignment time (hour) | SV calling time (hour) | Total Time (hour) |
|---|---|---|---|---|---|
| PBHoney | BLASR | 12 | 79.6 | 0.27 (Tails) | 80.8 |
| Sniffles | BWA-MEM | 12 | 27.0 | 1.1 | 28.1 |
| Sniffles | NGMLR | 12 | 11.2 | 1.3 | 12.5 |