Lei Liu, Yulin Wang, You Che, Yiqiang Chen, Yu Xia, Ruibang Luo, Suk Hang Cheng, Chunmiao Zheng, Tong Zhang.
Abstract
BACKGROUND: Genome-centric approaches are widely used to investigate microbial compositions, dynamics, ecology, and interactions within various environmental systems. Thanks to cost-effective short-read sequencing and mature assembly/binning pipelines, hundreds or even thousands of genomes can be retrieved in a single study. However, conventional binning methods usually yield highly fragmented draft genomes, which limits our ability to understand these microbial communities comprehensively. Leveraging the advantages of both long and short reads to retrieve more complete genomes from environmental samples is therefore essential for advancing the field.
Keywords: High contiguity; Iterative hybrid assembly; One-stage partial-nitritation anammox
Year: 2020 PMID: 33158461 PMCID: PMC7648391 DOI: 10.1186/s40168-020-00937-3
Source DB: PubMed Journal: Microbiome ISSN: 2049-2618 Impact factor: 14.650
Fig. 1 Iterative hybrid assembly workflow for enrichment ecosystems. Subsampled short reads (SRs) and long reads (LRs) are first prepared from the total sequenced dataset or from the simplified dataset that excludes reads already used to assemble qualified MAGs (step 1). SRs and LRs are then combined for hybrid assembly with Unicycler, followed by binning with MetaWRAP (step 2). Qualified MAGs with high quality and high contiguity are selected (step 3). SR and LR sets for the residual microbes in the community are obtained by removing the reads of qualified MAGs and are then subsampled for the next run (step 4).
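The four steps in the Fig. 1 caption form a loop, which can be sketched in a few lines of Python. This is only an illustrative outline, not the authors' pipeline: reads are modeled as sets of read IDs, `subsample` and `assemble_and_bin` are hypothetical callables standing in for the subsampling step and for the Unicycler plus MetaWRAP stage, and the `qualified` flag stands in for the quality and contiguity check. Stopping when a cycle yields no qualified MAG is an assumption; the default of five cycles follows the Fig. 3 caption.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Set


@dataclass(frozen=True)
class Mag:
    name: str
    read_ids: FrozenSet[str]  # IDs of the short and long reads belonging to this MAG
    qualified: bool           # stand-in for the high-quality/high-contiguity check


def iterative_hybrid_assembly(
    short_reads: Set[str],
    long_reads: Set[str],
    subsample: Callable[[Set[str]], Set[str]],               # hypothetical helper
    assemble_and_bin: Callable[[Set[str], Set[str]], List[Mag]],  # hypothetical helper
    max_cycles: int = 5,
) -> List[Mag]:
    kept: List[Mag] = []
    sr, lr = set(short_reads), set(long_reads)
    for _ in range(max_cycles):
        if not sr or not lr:
            break
        sr_sub, lr_sub = subsample(sr), subsample(lr)   # step 1: subsample reads
        mags = assemble_and_bin(sr_sub, lr_sub)         # step 2: assemble and bin
        qualified = [m for m in mags if m.qualified]    # step 3: select qualified MAGs
        if not qualified:
            break  # assumption: stop once a cycle recovers nothing new
        kept.extend(qualified)
        for m in qualified:                             # step 4: remove their reads
            sr -= m.read_ids
            lr -= m.read_ids
    return kept


# Toy community: genome "A" has more reads than genome "B".
TOY_GENOMES = {"A": {"s1", "s2", "l1"}, "B": {"s3", "l2"}}


def toy_assemble_and_bin(sr: Set[str], lr: Set[str]) -> List[Mag]:
    # Toy rule: only the genome with the most remaining reads is "qualified".
    present = sr | lr
    covered = {n: ids & present for n, ids in TOY_GENOMES.items() if ids & present}
    if not covered:
        return []
    top = max(covered, key=lambda n: len(covered[n]))
    return [Mag(n, frozenset(ids), n == top) for n, ids in covered.items()]


result = iterative_hybrid_assembly(
    {"s1", "s2", "s3"}, {"l1", "l2"}, lambda reads: set(reads), toy_assemble_and_bin
)
print([m.name for m in result])  # ['A', 'B']
```

In the toy run the dominant genome is recovered in the first cycle, and the less abundant one only becomes recoverable after the dominant genome's reads are removed, which is the motivation for iterating.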
Fig. 2 Assembler performance evaluation using the PNA-with-Spiked-Mock dataset. a Genome recoveries by the four assemblers, MEGAHIT (n = 7), metaSPAdes (n = 7), OPERA-MS (n = 7), and Unicycler (n = 8). b Reconstructed genome purity and the aligned genome fraction against the reference genome. c Genome contiguity (NGA50) for the four assemblers as a function of genome coverage in the short-read dataset. d Misassembly events per assembled genome (one circle per genome), with the orange triangle and red dashed line indicating the mean and median number of misassembly events, respectively. e Gene recovery ratio and percentages of genes assembled by the four assemblers.
Genome features for 27 high-quality, high-contiguity MAGs recovered from the PNA ecosystem using the iterative hybrid assembly (IHA) approach. Taxonomic assignments were made using GTDB-Tk.
| MAGs ID | Status | Genome size (Mbp) | N50 (Mbp) | No. of contigs | GC | C/C (%) | Features (CDS/rRNA/tRNA) | Taxonomy |
|---|---|---|---|---|---|---|---|---|
| H1_BAC1 | | 3.379 | 3.379 | 1 | 0.460 | 98.5/0.3 | 3138/3/40 | |
| H1_PLA1 | | 3.610 | 3.610 | 1 | 0.700 | 98.9/0.0 | 3039/3/49 | |
| H1_PLA2 | Circular | 3.877 | 3.877 | 1 | 0.633 | 96.6/0.0 | 3140/3/51 | |
| H1_AMX1 | Circular | 3.495 | 3.495 | 1 | 0.425 | 100.0/1.6 | 3099/3/47 | |
| H1_AOB3 | | 2.842 | 2.842 | 1 | 0.512 | 100.0/0.3 | 2674/3/43 | |
| H1_ARM1 | Circular | 2.897 | 2.897 | 1 | 0.609 | 94.9/0.9 | 2665/3/48 | |
| H1_PLA3 | Circular | 3.887 | 3.887 | 1 | 0.673 | 96.6/0.0 | 3032/3/53 | |
| H1_PLA4 | Circular | 3.841 | 3.841 | 1 | 0.672 | 97.7/0.0 | 3229/3/54 | |
| H1_AOB1 | Circular | 3.279 | 3.279 | 1 | 0.499 | 99.8/0.0 | 3122/3/38 | |
| H1_CFX2 | | 4.791 | 4.742 | 4 | 0.531 | 93.6/3.6 | 4575/3/50 | |
| H1_AOB2 | Circular | 3.217 | 3.217 | 1 | 0.495 | 99.9/0.0 | 3074/3/39 | |
| H1_BAC2 | | 2.842 | 2.842 | 1 | 0.495 | 97.0/1.1 | 2326/6/44 | |
| H1_PLA5 | | 5.011 | 3.408 | 4 | 0.610 | 97.7/1.1 | 4106/3/52 | |
| H1_PLA6 | | 5.026 | 3.822 | 2 | 0.650 | 97.7/1.7 | 4074/3/51 | |
| H2_CFX3 | | 4.684 | 1.286 | 5 | 0.566 | 100.0/2.7 | 3904/6/49 | |
| H2_MYX1 | | 9.475 | 3.220 | 5 | 0.698 | 93.9/2.6 | 8465/3/79 | |
| H2_BAC3 | Circular | 4.213 | 4.213 | 1 | 0.441 | 98.9/0.0 | 3318/3/43 | |
| H2_PRO1 | | 3.105 | 3.105 | 1 | 0.686 | 91.0/1.5 | 2810/3/51 | |
| H3_ACI1 | | 2.971 | 1.982 | 2 | 0.538 | 96.6/2.6 | 2688/3/47 | |
| H3_CFX4 | | 3.148 | 1.455 | 4 | 0.667 | 93.1/1.0 | 3207/3/48 | |
| H3_PLA7 | | 3.869 | 0.987 | 5 | 0.653 | 94.3/3.4 | 3180/3/67 | |
| H5_SPI1 | | 4.129 | 2.456 | 4 | 0.531 | 96.5/0.0 | 4087/3/40 | |
| H5_BAC4 | | 3.186 | 2.101 | 3 | 0.388 | 99.5/1.7 | 2670/3/42 | |
| H5_UNCL1 | | 4.583 | 2.293 | 3 | 0.545 | 96.6/2.7 | 3627/3/49 | |
| H5_CFX6 | | 2.882 | 1.591 | 2 | 0.525 | 92.7/0.9 | 2714/3/48 | |
| H5_SPI2 | | 3.423 | 2.007 | 4 | 0.346 | 100.0/0.0 | 3317/6/37 | |
| H5_BAC6 | | 3.824 | 2.408 | 4 | 0.348 | 99.4/0.0 | 3310/3/44 | |
Note: Each MAG ID abbreviates the phylum-level taxonomy assigned using GTDB-Tk. A red asterisk means the genome was classified using the identified 16S rRNA gene, which significantly improved the classification. The column “C/C” gives the completeness and contamination of each reconstructed MAG as evaluated with CheckM. The “Taxonomy” column contains the phylum-level assignment, with the most specific clade-level annotation currently available given in brackets.
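The N50 column can be read with the standard definition in mind: N50 is the length of the contig at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly size. The sketch below is a minimal illustration of that definition, not the tool used in the study. The contig lengths for H1_PLA6 are inferred rather than reported: they assume the genome size (5.026 Mbp) is the sum of its two contigs, so the second contig is taken as 5.026 − 3.822 = 1.204 Mbp.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of an assembly given its contig lengths."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# H1_PLA6 in kbp; the 1204 kbp contig is inferred (assumption), not reported.
h1_pla6_contigs_kbp = [3822, 1204]
print(n50(h1_pla6_contigs_kbp))  # 3822, matching the tabulated N50 of 3.822 Mbp

# A single-contig MAG such as H1_BAC1 has N50 equal to its genome size.
print(n50([3379]))  # 3379
```

This also explains why every single-contig row in the table shows identical values for genome size and N50.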
Fig. 3 Assembly of a deep-sequenced PNA microbiome using the iterative hybrid assembly (IHA) approach. a Coverage distribution of assembled MAGs after five cycles in the short-read and long-read datasets, which were normalized to 10 Gbp. b, c Contig numbers and N50 distributions for 47 paired MAGs assembled using the hybrid and SR-only approaches. Two MAGs assembled only by the hybrid approach were excluded from the comparison.
Fig. 5 Community composition of the 49 MAGs assembled using the iterative hybrid assembly (IHA) approach. a Taxonomic composition of the assembled MAGs, with only the four most frequently observed lineages in the PNA system displayed and the remaining taxa grouped as other classified taxa. b Contig counts for each assembled MAG and their relative abundance in the community. The blue and brown circles indicate MAGs with relative abundance < 0.5% and > 0.5% in the community, respectively.
Fig. 4 Comparison of genomes retrieved using two strategies. a Unweighted functional gene set distributions for 22 paired MAGs. The red asterisk indicates that genes in sets II and III accounted for more than 3% of the total functional genes from the representative genome. b Circular closed genome of Ca. Brocadia. The outermost ring represents the closed genome reconstructed by the hybrid assembly approach. The middle ring represents the MAG reconstructed by the short-read-only assembly method, mapped to the hybrid-assembled genome. The inner ring indicates the key nitrogen metabolism genes associated with the anammox process and the transposases predicted by Prokka.