Kun-Tze Chen, Cheih-Jung Chen, Hsin-Ting Shen, Chia-Liang Liu, Shang-Hao Huang, Chin Lung Lu.
Abstract
BACKGROUND: A draft genome assembled from short reads by current next-generation sequencing techniques is just a collection of contigs whose relative positions and orientations along the sequenced genome are unknown. To obtain the complete sequence, a contig scaffolding process is usually applied to order and orient the contigs in the draft genome. Although several single reference-based scaffolding tools have been proposed, they may produce erroneous scaffolds if there are rearrangements between the target and reference genomes or if the two genomes are phylogenetically distant. This suggests that a single reference genome may not be sufficient to produce correct scaffolds of a draft genome.
Keywords: Bioinformatics; Contigs; Multiple references; Next-generation sequencing; Scaffolding
Year: 2016 PMID: 28155633 PMCID: PMC5260120 DOI: 10.1186/s12859-016-1328-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The procedure flowchart of Multi-CAR
Fig. 2 A contig adjacency graph constructed from four scaffolds S1 = (+1, +2, +3), S2 = (+2, +3, +4), S3 = (−1, −4, −3, −2) and S4 = (+1, −4, +2, −3), where the dummy edges with zero weight are omitted
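The construction in Fig. 2 can be illustrated with a minimal sketch. It rests on assumptions that the record above does not spell out: each contig is modelled with two extremities, a tail `t` and a head `h`; a contig written `+i` runs from tail to head, and `-i` from head to tail; and each edge weight is simply the number of scaffolds supporting that adjacency. The weighting scheme actually used by Multi-CAR may differ, and the function names `extremities` and `adjacency_graph` are hypothetical.

```python
from collections import Counter


def extremities(signed_contig):
    """Return (left_end, right_end) of a signed contig as (id, 'h' or 't') pairs.

    Assumption: +i is read tail -> head, -i is read head -> tail.
    """
    cid = abs(signed_contig)
    if signed_contig > 0:
        return (cid, "t"), (cid, "h")
    return (cid, "h"), (cid, "t")


def adjacency_graph(scaffolds):
    """Count, for every pair of contig extremities, how many scaffolds join them.

    Edges are undirected, so each one is stored as a frozenset of two extremities.
    Zero-weight (dummy) edges are simply never created.
    """
    weights = Counter()
    for scaffold in scaffolds:
        for a, b in zip(scaffold, scaffold[1:]):
            _, right_of_a = extremities(a)
            left_of_b, _ = extremities(b)
            weights[frozenset((right_of_a, left_of_b))] += 1
    return weights


# The four scaffolds listed in the caption of Fig. 2.
scaffolds = [
    (+1, +2, +3),
    (+2, +3, +4),
    (-1, -4, -3, -2),
    (+1, -4, +2, -3),
]
graph = adjacency_graph(scaffolds)

for edge, weight in sorted(graph.items(), key=lambda item: -item[1]):
    print(sorted(edge), weight)
```

Under these assumptions, the adjacency between the head of contig 2 and the tail of contig 3 is supported by S1, S2 and S3 (the reversed scaffold S3 yields the same undirected edge), so it receives the largest weight in this toy graph.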
Draft chromosomes used in the testing dataset
| Organism | Accession No. | Size (bp) | #CON | COV (%) |
|---|---|---|---|---|
|  | NC_013926 | 1,486,778 | 35 | 98.63 |
|  | NC_000964 | 4,215,606 | 5 | 99.97 |
|  | NC_010816 | 2,375,792 | 58 | 85.47 |
|  | NC_003317 | 2,117,144 | 41 | 90.83 |
|  | NC_003318 | 1,177,787 | 12 | 99.77 |
|  | NC_015857 | 2,138,342 | 55 | 87.47 |
|  | NC_015858 | 1,260,926 | 34 | 84.38 |
|  | NC_007650 | 2,914,771 | 15 | 70.34 |
|  | NC_007651 | 3,809,201 | 28 | 89.90 |
|  | NC_002620 | 1,072,950 | 4 | 99.09 |
|  | NC_014393 | 5,262,222 | 297 | 96.54 |
|  | NC_012590 | 2,790,189 | 90 | 92.94 |
|  | NC_004369 | 3,147,090 | 118 | 95.09 |
|  | NC_012803 | 2,501,097 | 126 | 86.25 |
|  | NC_009525 | 4,419,977 | 220 | 76.84 |
|  | NC_000908 | 580,076 | 24 | 78.54 |
|  | NC_009142 | 8,212,805 | 238 | 97.10 |
|  | NC_015437 | 2,568,361 | 53 | 94.01 |
|  | NC_014623 | 10,260,756 | 470 | 99.05 |
|  | NC_003028 | 2,160,842 | 209 | 90.31 |
|  | NC_013456 | 3,259,580 | 176 | 91.43 |
|  | NC_013457 | 1,829,445 | 33 | 95.31 |
|  | NC_008149 | 4,534,590 | 17 | 83.86 |
Column “#CON” gives the number of contigs selected for the contig scaffolding experiments after excluding, for example, contigs that could not be mapped to the reference chromosome. Column “COV” gives the percentage of each chromosome covered by the selected contigs
Fig. 3 Performance variation of (a) average sensitivity and (b) average precision with respect to the number of reference genomes
Fig. 4 Performance variation of (a) average genome coverage and (b) average scaffold number with respect to the number of reference genomes
Fig. 5 Performance variation of (a) average scaffold N50 size and (b) average running time with respect to the number of reference genomes
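The scaffold N50 size reported in Fig. 5 is not defined in this record. The sketch below assumes the commonly used definition: the length of the scaffold at which the running total, taken from the longest scaffold downward, first reaches half of the total scaffold length. The function name `n50` and the example lengths are hypothetical and are not taken from the dataset above.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Assumed definition: sort lengths in decreasing order and return the length
    at which the cumulative sum first reaches at least half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must be non-empty")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Made-up scaffold lengths, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

With this definition, a larger N50 indicates that more of the assembly lies in long scaffolds, which is why it is usually read alongside the scaffold count.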