Valentina Galata, Susheel Bhanu Busi, Benoît Josef Kunath, Laura de Nies, Magdalena Calusinska, Rashi Halder, Patrick May, Paul Wilmes, Cédric Christian Laczny.
Abstract
Real-world evaluations of metagenomic reconstructions are challenged by distinguishing reconstruction artifacts from genes and proteins present in situ. Here, we evaluate short-read-only, long-read-only and hybrid assembly approaches on four different metagenomic samples of varying complexity. We demonstrate how different assembly approaches affect gene and protein inference, which is particularly relevant for downstream functional analyses. For a human gut microbiome sample, we use complementary metatranscriptomic and metaproteomic data to assess the metagenomic data-based protein predictions. Our findings pave the way for critical assessments of metagenomic reconstructions. We propose a reference-independent solution, which exploits the synergistic effects of multi-omic data integration for the in situ study of microbiomes using long-read sequencing data.
Keywords: Oxford Nanopore Technologies; functional omics; hybrid assembly; long reads; meta-omics; third-generation sequencing
Year: 2021 PMID: 34453168 PMCID: PMC8575027 DOI: 10.1093/bib/bbab330
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. Discrepancy and uniqueness of predicted proteins in assemblies. (i) Number of proteins (total and partial) predicted by Prodigal in each assembly and sample. The color corresponds to the metagenomic assembly approach. (ii) Number of shared predicted proteins, which were clustered using MMseqs2, per sample. Each protein cluster was labeled by the combination of assembly tools represented by the clustered proteins (i.e. the assemblies from which these proteins originated). The depicted number of shared proteins per assembly tool combination is the total protein count over all associated clusters. The top 20 combinations are shown. The number of proteins found in clusters representing all assembly tools is highlighted in red; the number of proteins exclusive to an assembly is highlighted in orange.
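The counting described for panel (ii) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes cluster membership is available as (representative, member) pairs, which is the two-column layout MMseqs2 writes to its cluster TSV, and that each protein ID can be mapped to the assembly it came from. The function name, the toy protein IDs and the tool labels are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def count_by_tool_combination(cluster_pairs, protein_to_tool, top_n=20):
    """Sum protein counts per assembly-tool combination.

    cluster_pairs: iterable of (representative_id, member_id) pairs.
    protein_to_tool: dict mapping protein ID -> assembly tool label
                     (hypothetical mapping, assumed to be known).
    Returns the top_n (combination, protein_count) pairs, largest first.
    """
    # Group member proteins by their cluster representative.
    members = defaultdict(set)
    for representative, member in cluster_pairs:
        members[representative].add(member)

    counts = Counter()
    for proteins in members.values():
        # Label the cluster by the sorted set of tools it contains.
        combination = tuple(sorted({protein_to_tool[p] for p in proteins}))
        # Add the cluster's protein count to that combination's total.
        counts[combination] += len(proteins)
    return counts.most_common(top_n)


# Toy input: IDs are prefixed with an illustrative tool label.
pairs = [
    ("flye_1", "flye_1"),
    ("flye_1", "megahit_7"),
    ("flye_1", "metaspades_3"),
    ("megahit_9", "megahit_9"),
    ("flye_4", "flye_4"),
    ("flye_4", "flye_5"),
]
tools = {member: member.split("_")[0] for _, member in pairs}
result = count_by_tool_combination(pairs, tools)
```

In this toy input, the combination containing every tool label corresponds to the bar highlighted in red in the figure, and single-tool combinations correspond to the bars highlighted in orange.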
Figure 2. Assembly effects on antimicrobial resistance gene identification. (i) Number of hits (‘all’, ‘strict’ and ‘nudged’) for each assembly and sample when searching the assembly proteins in the CARD database using RGI. The NWC sample is not shown because no hits were found in any of its assemblies. ‘Nudged’ hits are loose hits (distant/incomplete homolog) flagged as such by RGI; the remaining hits are ‘strict’ hits. (ii) Number of Antibiotic Resistance Ontologies (AROs) covered by ‘strict’ RGI hits in the different assemblies per sample. The bar plot shows the number of shared AROs per assembly tool combination. (iii) Metatranscriptomic (metaT) coverage of the two coding sequences (CDSs) from the long-read (LR) assembly constructed with Flye that have a ‘nudged’ RGI hit to ARO 3004454 (a chloramphenicol acetyltransferase) in the GDB sample. The x-axis represents the contig coordinates and the y-axis the metaT coverage. The amino acid sequences of the two CDSs and the ARO are included in the plot.
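Per-position coverage of the kind plotted in panel (iii) can be derived from read alignment intervals, as in the following Python sketch. The caption does not state how the coverage was computed, so this shows a generic difference-array approach rather than the authors' method. The function name, the half-open 0-based coordinate convention and the toy intervals are assumptions made for illustration.

```python
def per_base_coverage(alignments, region_start, region_end):
    """Return read depth for each position in [region_start, region_end).

    alignments: iterable of half-open (start, end) intervals on the contig
                (0-based coordinates; an assumed convention).
    """
    length = region_end - region_start
    # Difference array: +1 where an alignment begins, -1 where it ends.
    diff = [0] * (length + 1)
    for start, end in alignments:
        # Clip the alignment to the region and shift to region coordinates.
        clipped_start = max(start, region_start) - region_start
        clipped_end = min(end, region_end) - region_start
        if clipped_start < clipped_end:
            diff[clipped_start] += 1
            diff[clipped_end] -= 1

    # The running sum of the difference array gives the depth per position.
    coverage = []
    depth = 0
    for delta in diff[:length]:
        depth += delta
        coverage.append(depth)
    return coverage


# Toy alignments on a contig; the region stands in for one CDS.
reads = [(0, 5), (3, 8), (4, 6), (20, 25)]
cov = per_base_coverage(reads, 2, 9)
```

Plotting `cov` against the positions from `region_start` to `region_end - 1` would give a coverage track with contig coordinates on the x-axis and depth on the y-axis.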