Li Tang, Min Li, Fang-Xiang Wu, Yi Pan, Jianxin Wang.
Abstract
With the generation of large amounts of sequencing data, different assemblers have emerged to perform de novo genome assembly. Because a single strategy can hardly accommodate the various biases of datasets, none of these tools outperforms the others on all species. Assembly reconciliation merges multiple assemblies to generate a high-quality consensus assembly. Several assembly reconciliation tools have been proposed. However, the existing tools cannot produce a merged assembly that simultaneously has better contiguity and fewer errors, and their results usually depend on the ranking of the input assemblies. In this study, we propose a novel assembly reconciliation tool, MAC, which merges assemblies by using the adjacency algebraic model and classification. To address uneven sequencing depth and sequencing errors, MAC identifies consensus blocks between contig sets to construct an adjacency graph. To address repetitive regions, MAC employs classification to optimize the adjacency algebraic model. In addition, MAC uses an overall scoring function to handle the unknown ranking of the input assembly sets. Experimental results on four species from GAGE-B demonstrate that MAC outperforms other assembly reconciliation tools.
Keywords: adjacency algebraic model; contig classification; contig reconciliation; de novo assembly; next-generation sequencing
Year: 2020 PMID: 32082361 PMCID: PMC7005248 DOI: 10.3389/fgene.2019.01396
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Flowchart of the MAC algorithm.
Figure 2. An example of constructing the adjacency graph.
Nine types of paths in the adjacency graph.
| No. | Length of path | In the same set | In the same adjacency | Type |
|---|---|---|---|---|
| 1 | Odd | Y | Y | – |
| 2 | Odd | Y | N | – |
| 3 | Odd | N | Y | – |
| 4 | Odd | N | N | Poor-1 |
| 5 | Even | Y | Y | Poor-2 |
| 6 | Even | Y | N | Good |
| 7 | Even | N | Y | – |
| 8 | Even | N | N | – |
| 9 | Even | N | N | Circle |
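
The table above can be read as a lookup from three path properties to a path type. The following is a minimal sketch of that lookup, not MAC's actual implementation. The function name `classify_path`, the dictionary `PATH_TYPES`, and the `is_cycle` flag are hypothetical; in particular, the table lists rows 8 and 9 with identical properties, so the sketch assumes that a separate cycle indicator distinguishes them.

```python
from typing import Optional

# (length parity, in the same set, in the same adjacency) -> type,
# transcribed from rows 1-8 of the table. Rows marked "–" map to None.
PATH_TYPES = {
    ("odd", True, True): None,
    ("odd", True, False): None,
    ("odd", False, True): None,
    ("odd", False, False): "Poor-1",
    ("even", True, True): "Poor-2",
    ("even", True, False): "Good",
    ("even", False, True): None,
    ("even", False, False): None,
}


def classify_path(length: int, same_set: bool, same_adjacency: bool,
                  is_cycle: bool = False) -> Optional[str]:
    """Return the path type from the table, or None for rows marked "–".

    Assumption: row 9 ("Circle") is told apart from row 8 by an explicit
    is_cycle flag, because both rows share the same three properties.
    """
    parity = "even" if length % 2 == 0 else "odd"
    if is_cycle and parity == "even" and not same_set and not same_adjacency:
        return "Circle"
    return PATH_TYPES[(parity, same_set, same_adjacency)]


print(classify_path(3, False, False))        # row 4
print(classify_path(2, True, False))         # row 6
print(classify_path(4, False, False, True))  # row 9
```

Keeping the table as data, separate from the function, makes it easy to compare the sketch against the published rows.
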
Figure 3. Detail of the optimization process.
The experimental results of M. abscessus.
| Assembler | Contigs num | Largest contig (bp) | Size (bp) | Genome fraction (%) | N50 (bp) | NGA50 (bp) | MA |
|---|---|---|---|---|---|---|---|
| Velvet | 203 | 226,629 | 5,136,825 | 98.965 | 48,155 | 41,485 | 54 |
| ABySS | 149 | 245,660 | 5,116,522 | 98.926 | 70,424 | 68,549 | 2 |
| SOAPdenovo | 91 | 286,460 | 5,133,667 | 99.139 | 131,561 | 113,272 | 19 |
| GAA | 339 | 129,152 | 5,152,501 | 99.094 | 39,271 | 37,715 | 61 |
| MIX | 118 | 245,660 | 5,376,417 | 98.891 | 108,584 | 70,302 | |
| Metassembler | 200 | 226,629 | 5,130,215 | 98.944 | 48,155 | 41,485 | 54 |
| MAC | 190 | 317,945 | 9,856,881 | 99.304 | | | 58 |
| GAA | 211 | 210,497 | 5,146,833 | 99.129 | 54,850 | 50,904 | 55 |
| MIX | 91 | 286,460 | 5,133,667 | 99.139 | 131,561 | 113,272 | 17 |
| Metassembler | 191 | 226,629 | 4,934,916 | 95.03 | 47,284 | 39,706 | 64 |
| MAC | 80 | 287,168 | 5,146,285 | 99.249 | | | |
The bolded data indicates the highest value of N50 or NGA50 within each comparison.
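
For readers unfamiliar with the contiguity metric in these tables, here is a minimal sketch of N50, assuming its standard definition: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size. The function name `n50` and the toy contig lengths are illustrative and are not taken from the paper. NGA50 is not computed here, since it also requires aligning the contigs to a reference genome.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half of the
        # total assembly size defines N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy example: total = 290, half = 145; 80 + 70 = 150 >= 145.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

A higher N50 indicates that more of the assembly sits in long contigs, which is why the tables treat the largest value in each comparison as the best.
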
The experimental results of V. cholerae.
| Assembler | Contigs num | Largest contig (bp) | Size (bp) | Genome fraction (%) | N50 (bp) | NGA50 (bp) | MA |
|---|---|---|---|---|---|---|---|
| Velvet | 156 | 246,346 | 3,944,260 | 97.563 | 92,036 | 63,574 | 14 |
| ABySS | 196 | 178,118 | 3,904,784 | 96.699 | 61,965 | 60,272 | 2 |
| SOAPdenovo | 186 | 246,179 | 3,924,635 | 96.94 | 71,357 | 65,464 | 16 |
| GAA | 271 | 170,890 | 3,958,224 | 97.207 | 73,177 | 56,472 | 14 |
| MIX | 147 | 310,702 | 4,038,894 | 96.915 | 124,754 | 91,942 | 19 |
| Metassembler | 150 | 246,346 | 3,935,482 | 97.48 | 92,036 | 63,574 | |
| MAC | 232 | 312,914 | 7,221,147 | 97.322 | | | 21 |
| GAA | 160 | 243,299 | 3,981,614 | 97.713 | 110,446 | 110,446 | 16 |
| MIX | 118 | 310,703 | 4,338,139 | 97.496 | 112,745 | 86,841 | 32 |
| Metassembler | 145 | 246,346 | 3,914,378 | 96.972 | 93,191 | 63,574 | |
| MAC | 87 | 358,265 | 3,997,554 | 97.709 | | | |
The bolded data indicates the highest value of N50 or NGA50 within each comparison.
The experimental results of B. fragilis.
| Assembler | Contigs num | Largest contig (bp) | Size (bp) | Genome fraction (%) | N50 (bp) | NGA50 (bp) | MA |
|---|---|---|---|---|---|---|---|
| Velvet | 373 | 91,844 | 5,310,336 | 97.661 | 24,465 | 24,465 | 3 |
| ABySS | 87 | 430,487 | 5,380,960 | 98.451 | 130,570 | 130,570 | 2 |
| SOAPdenovo | 79 | 606,530 | 5,341,631 | 98.226 | 246,346 | 246,346 | 0 |
| GAA | 2053 | 16,951 | 10,676,299 | 98.811 | 4,999 | 4,999 | 4 |
| MIX | 87 | 430,487 | 5,380,960 | 98.451 | 130,570 | 130,570 | |
| Metassembler | 256 | 127,644 | 5,317,077 | 97.819 | 40,339 | 39,580 | 3 |
| MAC | 136 | 568,455 | 10,618,547 | 98.812 | | | 9 |
| GAA | 2933 | 429,861 | 15,592,962 | 98.896 | 6,079 | 6,075 | 4 |
| MIX | 55 | 700,546 | 6,089,165 | 98.554 | 353,741 | 380,728 | 9 |
| Metassembler | 194 | 215,440 | 5,317,760 | 97.819 | 57,802 | 57,596 | 3 |
| MAC | 42 | 1,195,331 | 5,355,147 | 98.306 | | | |
The bolded data indicates the highest value of N50 or NGA50 within each comparison.
The experimental results of R. sphaeroides.
| Assembler | Contigs num | Largest contig (bp) | Size (bp) | Genome fraction (%) | N50 (bp) | NGA50 (bp) | MA |
|---|---|---|---|---|---|---|---|
| Velvet | 332 | 71,713 | 4,485,514 | 97.419 | 23,979 | 24,300 | 2 |
| ABySS | 382 | 71,578 | 4,503,182 | 97.76 | 21,441 | 21,441 | 1 |
| SOAPdenovo | 354 | 115,051 | 4,527,360 | 97.98 | 33,491 | 33,491 | 1 |
| GAA | 1745 | 9,976 | 8,988,696 | 98.651 | 6,650 | 6,650 | 3 |
| MIX | 274 | 113,766 | 4,728,490 | 97.493 | 35,067 | 28,685 | 35 |
| Metassembler | 325 | 71,713 | 4,480,778 | 97.337 | 23,979 | 23,979 | |
| MAC | 434 | 126,603 | 8,043,496 | 98.718 | | | 17 |
| GAA | 2683 | 13,133 | 13,487,438 | 99.281 | 7,589 | 7,571 | 4 |
| MIX | 237 | 171,915 | 4,982,251 | 98.446 | 51,508 | 41,915 | 22 |
| Metassembler | 323 | 71,713 | 4,477,669 | 97.269 | 23,979 | 23,979 | |
| MAC | 122 | 173,958 | 4,574,809 | 98.282 | | | 7 |
The bolded data indicates the highest value of N50 or NGA50 within each comparison.