Benjamin Istace, Caroline Belser, Jean-Marc Aury.
Abstract
MOTIVATION: Long-read sequencing and Bionano Genomics optical maps are two techniques that, when used together, make it possible to reconstruct the structure of entire chromosomes or chromosome arms. However, existing tools are often too conservative, and the organization of contigs into scaffolds is not always optimal.
Keywords: Bioinformatics; Bionano; Genome assembly; Long reads; Nanopore; Optical maps; PacBio; Scaffolding; Tool
Year: 2020 PMID: 33194395 PMCID: PMC7649008 DOI: 10.7717/peerj.10150
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. The Bionano scaffolding tool does not merge contigs even if they share labels.
Instead, it inserts a gap of 13 Ns between the contigs, thus artificially duplicating the shared region. (A) BiSCoT merges contigs that share enzymatic labelling sites. (B) If contigs do not share labels but share a genomic region, BiSCoT attempts to merge them by aligning the borders of the contigs. (C) The Bionano scaffolding tool does not handle cases where contigs can be inserted into others. BiSCoT attempts to merge the inserted map with the one containing it if they share labels.
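The difference between joining two contigs with a 13-N gap and merging them at a shared border (case B) can be illustrated with a minimal sketch. It is not BiSCoT's implementation: an exact suffix/prefix match stands in for a real border alignment, and the function names and the `min_overlap` parameter are hypothetical.

```python
GAP = "N" * 13  # gap of 13 Ns placed between adjacent contigs, as in the caption


def longest_border_overlap(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Exact matching is an assumption made for brevity; real contig borders
    would need an alignment that tolerates sequencing errors.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def join_with_gap(left, right):
    """Keep both contigs whole and separate them with the 13-N gap."""
    return left + GAP + right


def merge_or_gap(left, right, min_overlap=3):
    """Merge on a shared border if one is found, otherwise fall back to the gap."""
    size = longest_border_overlap(left, right, min_overlap)
    if size == 0:
        return join_with_gap(left, right)
    return left + right[size:]  # the shared region is written only once


left, right = "ACGTTGCA", "TGCAGGTC"  # toy contigs sharing the border "TGCA"
print(len(join_with_gap(left, right)))  # shared region appears twice
print(merge_or_gap(left, right))        # shared region appears once
```

On this toy input the gapped join keeps the shared `TGCA` on both sides of the gap, which is the duplication the caption describes, while the merged sequence contains it once.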
Metrics of the NA12878 scaffolds and contigs before or after BiSCoT treatment.
Bold formatting indicates the best scoring assembly among contigs.
| Metric | Nanopore contigs | Bionano contigs | Bionano scaffolds | BiSCoT contigs | BiSCoT scaffolds |
|---|---|---|---|---|---|
| Cumulative size | 2,818,937,673 | 2,818,997,568 | 2,878,230,106 | 2,810,480,725 | 2,868,077,379 |
| N50 | 11,821,944 | 10,566,783 | 86,858,024 | 86,833,728 | |
| L50 | 67 | 71 | 14 | 14 | |
| N90 | 2,143,851 | 1,863,173 | 26,054,782 | 26,037,000 | |
| L90 | 280 | 301 | 36 | 36 | |
| auN | 15,164,719 | 14,547,428 | 82,760,251 | 82,474,548 | |
| # Ns | 0 | 0 | 59,232,538 | 0 | 57,596,654 |
| NGA50 | 5,794,944 | 5,729,014 | 10,816,842 | 11,713,900 | |
| NGA75 | 1,511,206 | 1,495,174 | 2,701,541 | 2,938,187 | |
| # misassemblies | 1,356 | 1,299 | 1,602 | 1,515 | |
| Complete BUSCOs | 234 (91.8%) | 231 (90.6%) | 231 (90.6%) | ||
| Duplicated BUSCOs | 5 (2.0%) | 4 (1.6%) | 4 (1.6%) | ||
| Missing BUSCOs | 11 (4.3%) | 13 (5.1%) | 13 (5.1%) | ||
Notes.
auN is a new metric to measure assembly contiguity (Li, 2020).
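The contiguity metrics in the table can be computed from a list of sequence lengths. The sketch below assumes the usual definitions: Nx is the length of the sequence at which the cumulative size, taken from longest to shortest, first reaches x% of the total; Lx is the number of sequences needed to get there; and auN is the sum of squared lengths divided by the total length. The function names are illustrative and the lengths are toy values, not data from the table.

```python
def nx_lx(lengths, fraction):
    """Return (Nx, Lx) for a fraction such as 0.5 (N50/L50) or 0.9 (N90/L90)."""
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths must be a non-empty list of positive integers")


def aun(lengths):
    """Area under the Nx curve: sum of squared lengths over total length."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("lengths must have a positive total")
    return sum(length * length for length in lengths) / total


toy_lengths = [10, 8, 6, 4, 2]   # toy contig lengths, total 30
print(nx_lx(toy_lengths, 0.5))   # (8, 2)
print(nx_lx(toy_lengths, 0.9))   # (4, 4)
print(round(aun(toy_lengths), 3))  # 7.333
```

Unlike N50, which changes only when a merge shifts the sequence sitting at the 50% mark, auN moves with every change in the length distribution.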
Figure 2. (A) Distribution of the sizes of overlapping regions in the raw assemblies. Detection was done using either Bionano labels (Case 1) or a BLAT alignment (Case 2). (B) Contig N50 of the raw assemblies and of the assemblies before or after BiSCoT treatment.