Lyam Baudry, Théo Foutel-Rodier, Agnès Thierry, Romain Koszul, Martial Marbouty.
Abstract
Characterizing the complete genomic structure of complex microbial communities would represent a key step toward understanding their diversity, dynamics, and evolution. Current metagenomics approaches aiming at this goal typically analyze millions of short DNA sequences extracted directly from the environment. New experimental and computational approaches are constantly sought to improve the analysis and interpretation of such data. We developed MetaTOR, an open-source computational solution that bins DNA contigs into individual genomes according to their 3D contact frequencies. Those contacts are quantified by chromosome conformation capture experiments (3C, Hi-C), also known as proximity-ligation approaches, applied to metagenomic samples (meta3C). MetaTOR was applied to 20 meta3C libraries of mouse gut microbiota. We quantified the program's ability to recover high-quality metagenome-assembled genomes (MAGs) from metagenomic assemblies generated directly from the meta3C libraries. Whereas nine high-quality MAGs are identified in the 148-Mb assembly generated using a single meta3C library, MetaTOR identifies 82 high-quality MAGs in the 763-Mb assembly generated from the merged 20 meta3C libraries, corresponding to nearly a third of the total assembly. Compared to the hybrid binning software MetaBAT or CONCOCT, MetaTOR recovered three times more high-quality MAGs. These results underline the potential of 3C-/Hi-C-based approaches in metagenomic projects.
Keywords: Hi-C; binning algorithm; gut microbiome; metagenome-assembled genomes; metagenomic analysis; metagenomics Hi-C; metagenomics binning
Year: 2019 PMID: 31481973 PMCID: PMC6710406 DOI: 10.3389/fgene.2019.00753
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. MetaTOR pipeline. Schematic representation of the MetaTOR pipeline. (A) MetaTOR is initialized with an assembly and a set of 3C/Hi-C PE reads. (B) [Align] aligns, sorts, and merges reads to deliver a network of contig interactions. (C) [Partition] deconvolves the previously defined network using a Louvain iterative procedure, and (D) [Binning] retrieves CCs (FASTA file and corresponding sub-network) from the selected partition to evaluate them using CheckM. At this step, it is possible to perform a recursive procedure on selected CCs to split them further into sub-CCs. (F) [Annotation] is an optional step that uses HMM models to provide final annotations. (E) The final output of the pipeline is a set of annotated bins.
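
Step (B) can be pictured as counting, for every pair of contigs, how many read pairs have one mate on each contig. The following is a minimal sketch under that assumption; the function name and the input format (already-aligned mates reduced to contig names) are hypothetical and do not reflect MetaTOR's actual interface.

```python
from collections import Counter

def build_contig_network(read_pairs):
    """Count read pairs that link two different contigs.

    read_pairs: iterable of (contig_of_mate1, contig_of_mate2) tuples.
    This input format is an assumption made for illustration; a real
    pipeline would extract it from alignment files.
    Returns a Counter mapping a sorted contig pair to its contact count.
    """
    network = Counter()
    for contig_a, contig_b in read_pairs:
        if contig_a == contig_b:
            continue  # intra-contig pairs do not link two contigs
        edge = tuple(sorted((contig_a, contig_b)))  # undirected edge
        network[edge] += 1
    return network

# Toy example with three contigs.
pairs = [("c1", "c2"), ("c2", "c1"), ("c1", "c1"), ("c2", "c3")]
network = build_contig_network(pairs)
```

The resulting weighted edge list is the kind of object a community-detection method such as Louvain can then partition.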
Meta3C libraries constructed and sequenced.
| Sample | Raw paired-end reads |
|---|---|
| Cage1-day1 | 79 868 626 |
| Cage1-day2 | 38 728 350 |
| Cage1-day3 | 33 173 429 |
| Cage2-day1 | 40 380 356 |
| Cage2-day2 | 62 424 123 |
| Cage2-day3 | 31 436 086 |
| Cage2-day4 | 34 124 320 |
| Cage2-day5 | 48 472 570 |
| Cage2-day6 | 36 129 310 |
| Cage2-day7 | 32 608 370 |
| Cage2-day8 | 43 473 731 |
| Cage2-day9 | 67 768 796 |
| Cage3-day1 | 108 114 353 |
| Cage3-day2 | 39 719 377 |
| Cage3-day3 | 37 792 067 |
| Cage3-day4 | 36 805 550 |
| Cage3-day5 | 34 529 306 |
| Cage3-day6 | 59 092 136 |
| Cage3-day7 | 28 833 461 |
| Cage3-day8 | 30 521 091 |
Assembly metrics. Only the metrics concerning assemblies filtered for the contigs above 500 bp are shown.
| Assembly | PE reads (filtered) | Total size (contigs > 500 bp) | Contigs > 500 bp | N50 (contigs > 500 bp) |
|---|---|---|---|---|
| Assembly #1 (cage 3—day 2) | 100,258,683 | 146,319,508 bp | 61,666 | 6,176 bp |
| Assembly #2 (cage 3—samples x 8) | 330,324,521 | 475,681,220 bp | 167,810 | 7,578 bp |
| Assembly #3 (samples x 20) | 813,376,239 | 763,455,888 bp | 237,868 | 12,339 bp |
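
For readers unfamiliar with the N50 column, the sketch below computes it with the standard definition: the length of the contig at which the cumulated size of contigs, taken from longest to shortest, first reaches half of the total size. The contig lengths in the example are invented for illustration; the 500 bp cutoff mirrors the filter stated in the table caption.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")

# Invented lengths; contigs of 500 bp or less are discarded first.
all_lengths = [300, 600, 800, 1000, 1200, 1400]
kept = [length for length in all_lengths if length > 500]
assembly_n50 = n50(kept)
```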
Network features.
| Assembly | PE reads (filtered) | Mapped PE reads | Intercontig interactions | Weighted interactions |
|---|---|---|---|---|
| Assembly #1 | 100,258,683 | 67,994,798 | 6,457,842 | 1,322,003 |
| Assembly #2 | 330,324,521 | 215,768,714 | 30,206,795 | 8,505,609 |
| Assembly #3 | 813,376,239 | 541,384,131 | 96,546,376 | 77,577,924 |
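
As a quick reading aid, the fraction of filtered read pairs that were mapped can be derived from the first two columns of the table. The snippet below is only arithmetic on the reported values; the variable names are illustrative.

```python
# (filtered PE reads, mapped PE reads), copied from the table above.
network_features = {
    "Assembly #1": (100_258_683, 67_994_798),
    "Assembly #2": (330_324_521, 215_768_714),
    "Assembly #3": (813_376_239, 541_384_131),
}

# Fraction of filtered read pairs that were mapped, per assembly.
mapped_fraction = {
    name: mapped / filtered
    for name, (filtered, mapped) in network_features.items()
}

for name, fraction in mapped_fraction.items():
    print(f"{name}: {fraction:.1%} of filtered PE reads mapped")
```

With these values, roughly two thirds of the filtered read pairs are mapped in each assembly.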
Figure 2. MetaTOR partitioning of a complex microbial community. (A) Evolution of the number of CCs, ordered by size categories, during 400 Louvain iterations for assembly #3 (20 samples). Color represents the amount of DNA in a given CC. Blue: 10 to 100 kb. Red: 100 to 500 kb. Green: > 500 kb. (B) Contact matrix encompassing the 224 largest CCs ordered by size, after 100 Louvain iterations (1 pixel = 200 kb). Y-axis: cumulated DNA size. (C) Completion (red) and contamination (blue) of the 129 CCs containing more than 500 kb after 100 Louvain iterations. Dashed lines: thresholds used to process CCs through a recursive procedure (completion threshold: upper 70%; contamination threshold: upper 10%). (D) Contact map of a highly contaminated CC (CC #3; 100% complete; 1,400% contaminated) before (left) and after (right) the recursive procedure (10 iterations; 1 pixel = 20 kb). Left map: contigs are ordered by size. Right map: sub-CCs are ordered by size. (E) Completion and contamination of the 269 CCs and sub-CCs larger than 500 kb defined after the whole procedure. Red: completion. Blue: contamination. (F) Completion (red) and contamination (blue) levels of the sub-CCs retrieved from the original CC #3 after the recursive procedure (10 iterations).
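
The way CCs emerge from repeated Louvain runs can be sketched as follows. This excerpt does not expand the abbreviation, so the code assumes that a CC is a group of contigs assigned to the same community in every iteration. The function name and the input format (one contig-to-label dictionary per run) are hypothetical, and the Louvain runs themselves are taken as given.

```python
from collections import defaultdict

def core_communities(partitions):
    """Group contigs that share a community in every partition.

    partitions: list of dicts mapping contig -> community label, one dict
    per Louvain run (assumed input format, for illustration only).
    Returns a list of contig groups, largest first.
    """
    groups = defaultdict(list)
    for contig in partitions[0]:
        # Two contigs end up together only if their labels match in all runs.
        signature = tuple(partition[contig] for partition in partitions)
        groups[signature].append(contig)
    return sorted((sorted(g) for g in groups.values()), key=len, reverse=True)

# Three toy runs: c1 and c2 always co-cluster, c3 and c4 split in run 2.
runs = [
    {"c1": 0, "c2": 0, "c3": 1, "c4": 1},
    {"c1": 5, "c2": 5, "c3": 5, "c4": 7},
    {"c1": 2, "c2": 2, "c3": 3, "c4": 3},
]
ccs = core_communities(runs)
```

Under this assumption, adding iterations can only split groups further or leave them unchanged, which is one way to read the evolution of CC counts in panel (A).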
Comparison of MetaTOR, CONCOCT, and MetaBAT results.
| Tool | Bin category | Assembly #1 (148 Mb): Nb | Assembly #1 (148 Mb): Size (bp) | Assembly #2 (483 Mb): Nb | Assembly #2 (483 Mb): Size (bp) | Assembly #3 (763 Mb): Nb | Assembly #3 (763 Mb): Size (bp) |
|---|---|---|---|---|---|---|---|
| MetaTOR | 10 kb < bins < 100 kb | 284 | 7,537,821 | 807 | 21,139,528 | 617 | 15,175,457 |
| MetaTOR | 100 kb < bins < 500 kb | 43 | 11,319,827 | 144 | 30,749,287 | 106 | 22,963,515 |
| MetaTOR | Bins > 500 kb | 56 | 119,111,306 | 183 | 399,972,204 | 271 | 685,955,810 |
| MetaTOR | Low-quality MAGs | 31 | 36,042,593 | 97 | 107,071,523 | 96 | 128,486,895 |
| MetaTOR | Medium-quality MAGs | 16 | 47,397,754 | 39 | 131,055,387 | 87 | 285,670,443 |
| MetaTOR | High-quality MAGs | 9 | 35,670,959 | 41 | 140,967,746 | 82 | 259,541,396 |
| MetaBAT | 10 kb < bins < 100 kb | 0 | 0 | 0 | 0 | 0 | 0 |
| MetaBAT | 100 kb < bins < 500 kb | 18 | 5,703,905 | 55 | 17,583,986 | 65 | 24,087,225 |
| MetaBAT | Bins > 500 kb | 36 | 82,290,484 | 126 | 284,973,235 | 172 | 420,081,339 |
| MetaBAT | Low-quality MAGs | 14 | 12,478,196 | 44 | 52,797,176 | 95 | 36,277,628 |
| MetaBAT | Medium-quality MAGs | 21 | 61,439,633 | 73 | 202,719,703 | 143 | 322,230,178 |
| MetaBAT | High-quality MAGs | 0 | 0 | 3 | 5,488,345 | 22 | 58,276,800 |
| CONCOCT | 10 kb < bins < 100 kb | 11 | 432,808 | 25 | 1,040,872 | 24 | 1,122,733 |
| CONCOCT | 100 kb < bins < 500 kb | 7 | 1,351,308 | 23 | 6,275,583 | 6 | 5,193,580 |
| CONCOCT | Bins > 500 kb | 29 | 120,778,514 | 126 | 412,598,588 | 195 | 673,338,423 |
| CONCOCT | Low-quality MAGs | 8 | 17,152,380 | 41 | 76,579,222 | 42 | 70,748,222 |
| CONCOCT | Medium-quality MAGs | 11 | 25,303,368 | 49 | 134,612,509 | 114 | 358,231,099 |
| CONCOCT | High-quality MAGs | 0 | 0 | 11 | 49,146,272 | 12 | 47,807,957 |
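
The headline figures of the abstract can be checked against the assembly #3 columns of this table. The snippet below is plain arithmetic on the reported values (the total assembly size is taken from the assembly metrics table); the variable names are illustrative.

```python
# Total size of assembly #3 (contigs > 500 bp), from the assembly metrics table.
assembly3_size_bp = 763_455_888

# High-quality MAGs in assembly #3: tool -> (number, cumulated size in bp).
high_quality = {
    "MetaTOR": (82, 259_541_396),
    "MetaBAT": (22, 58_276_800),
    "CONCOCT": (12, 47_807_957),
}

# Share of the assembly covered by MetaTOR's high-quality MAGs.
metator_fraction = high_quality["MetaTOR"][1] / assembly3_size_bp

# How many times more high-quality MAGs MetaTOR recovers than each other tool.
fold_vs_metator = {
    tool: high_quality["MetaTOR"][0] / count
    for tool, (count, _) in high_quality.items()
    if tool != "MetaTOR"
}
```

The fraction comes out at about 0.34, in line with the "nearly a third" quoted in the abstract, and both fold values exceed three.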
Figure 3. Comparison of MetaTOR, MetaBAT, and CONCOCT. CheckM output comparison for the three binning methods applied to the three assemblies tested in this work. (A) Assembly 1 (one meta3C library). (B) Assembly 2 (eight libraries). (C) Assembly 3 (20 libraries). Box plots of completion (left) and contamination (middle) and a histogram of retrieved MAGs (right) are presented for the three binning methods. Only MAGs over 500 kb and harboring less than 10% contamination are analyzed.
Figure 4. Statistics of low-contamination reconstructed bins. (A–B) Correlation between completion rate and N50 (A) or mean coverage (B) for bins with a contamination rate below 10%. Blue circles: MetaTOR bins. Purple diamonds: MetaBAT bins. (C–D) Box plots of N50 (C) and mean coverage (D) for retrieved bins with a contamination rate below 10%, presented for MetaTOR (blue circles) and MetaBAT (purple diamonds). A t-test shows a clear difference between the distributions of bins' N50 for the two programs (C: p-value = 3.9 × 10⁻⁷).