Senying Lai1, Longhao Jia1, Balakrishnan Subramanian2, Shaojun Pan1, Jinglong Zhang1, Yanqi Dong1, Wei-Hua Chen2,3, Xing-Ming Zhao1,4,5.
Abstract
Extrachromosomal mobile genetic elements (eMGEs), including phages and plasmids, can move between microbes and play important roles in genome evolution and in shaping the structure of microbial communities. However, we still know very little about eMGEs, especially their abundances, distributions and putative functions in microbiomes. Thus, a comprehensive description of eMGEs is of great utility. Here we present mMGE, a comprehensive catalog of 517 251 non-redundant eMGEs, including 92 492 plasmids and 424 759 phages, derived from diverse body sites of 66 425 human metagenomic samples. About half of the eMGEs could be further grouped into 70 074 clusters using relaxed criteria (referred to as eMGE clusters below). We provide extensive annotations of the identified eMGEs, including sequence characteristics, taxonomic affiliation, gene content and prokaryotic hosts. We also calculate the prevalence, both within and across samples, of each eMGE and eMGE cluster, enabling users to see putative associations of eMGEs with human phenotypes or their distribution preferences. All eMGE records can be browsed or queried in multiple ways, such as by eMGE cluster, metagenomic sample or associated host. mMGE is equipped with a user-friendly interface and a BLAST server, facilitating access to and queries of all its contents. mMGE is freely available for academic use at: https://mgedb.comp-sysbio.org.
Year: 2021 PMID: 33074335 PMCID: PMC7778953 DOI: 10.1093/nar/gkaa869
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The overall workflow of mMGE. (A) Data pre-processing. A total of 66 425 human metagenomic samples and associated metadata were collected, followed by pre-processing and assembly of raw sequencing reads. (B) eMGE identification. State-of-the-art tools were used to identify eMGEs. (C) eMGE annotation. Comprehensive annotations were provided for the eMGEs, including putative protein functions and host information. (D) Abundance calculation. The abundance and prevalence of the eMGEs across samples were also determined. See the ‘Materials and Methods’ section for more details.
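The prevalence calculation mentioned in the abstract and in panel (D) can be illustrated with a minimal sketch. The exact definitions used by mMGE are not given in this record, so the code below assumes that prevalence across samples is the fraction of samples in which an eMGE is detected above a threshold, and that prevalence within a sample is the eMGE's share of that sample's total eMGE abundance. The input layout, function names, threshold and identifiers are all hypothetical.

```python
from typing import Dict

# Hypothetical input layout: abundance[sample_id][emge_id] = abundance value.
Abundance = Dict[str, Dict[str, float]]


def prevalence_across_samples(abundance: Abundance,
                              min_abundance: float = 0.0) -> Dict[str, float]:
    """Fraction of samples in which each eMGE exceeds min_abundance (assumed definition)."""
    n_samples = len(abundance)
    if n_samples == 0:
        return {}
    detected: Dict[str, int] = {}
    for profile in abundance.values():
        for emge_id, value in profile.items():
            # Register the eMGE even if it is below the threshold in this sample.
            detected.setdefault(emge_id, 0)
            if value > min_abundance:
                detected[emge_id] += 1
    return {emge_id: count / n_samples for emge_id, count in detected.items()}


def relative_abundance_within_sample(profile: Dict[str, float]) -> Dict[str, float]:
    """Share of one sample's total eMGE abundance contributed by each eMGE (assumed definition)."""
    total = sum(profile.values())
    if total == 0:
        return {emge_id: 0.0 for emge_id in profile}
    return {emge_id: value / total for emge_id, value in profile.items()}


# Toy data with made-up identifiers and values, for illustration only.
toy: Abundance = {
    "sample_1": {"phage_a": 30.0, "plasmid_b": 10.0},
    "sample_2": {"phage_a": 5.0},
    "sample_3": {"plasmid_b": 0.0},
    "sample_4": {},
}
across = prevalence_across_samples(toy)
within_s1 = relative_abundance_within_sample(toy["sample_1"])
```

In this toy example `phage_a` is detected in two of four samples and `plasmid_b` in one, so a real pipeline would mainly differ in how abundance is estimated from mapped reads and in how the detection threshold is chosen.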
The distribution of metagenomic samples included in mMGE across diverse human body sites
| Body site | #samples | #projects | #associated phenotypes | #associated countries |
|---|---|---|---|---|
| Gut | 41 841 | 233 | 63 | 42 |
| Oral cavity | 11 313 | 41 | 9 | 9 |
| Skin | 5384 | 30 | 7 | 7 |
| Blood | 2976 | 20 | 26 | 8 |
| Nasopharyngeal | 1930 | 16 | 9 | 6 |
| Vagina | 1028 | 6 | 1 | 3 |
| Sputum | 379 | 4 | 1 | 6 |
| Eye | 229 | 10 | 1 | 2 |
| Urethra | 123 | 5 | 3 | 3 |
| Tooth | 106 | 3 | 4 | 3 |
| Reproductive system | 76 | 1 | 0 | 1 |
| Milk | 60 | 2 | 0 | 2 |
| Trachea | 33 | 1 | 2 | 1 |
| Lung | 25 | 2 | 2 | 1 |
| Liver | 20 | 2 | 2 | 2 |
| Circulatory system | 12 | 1 | 1 | 0 |
| Lymphatic system | 11 | 3 | 3 | 2 |
| Excretory system | 1 | 1 | 1 | 1 |
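To make the skew toward gut samples concrete, the sample counts can be converted into shares. The sketch below transcribes the #samples column from the table; shares are computed relative to the sum of the listed rows only, and the variable names are illustrative.

```python
# Sample counts per body site, transcribed from the table above.
samples_per_site = {
    "Gut": 41841, "Oral cavity": 11313, "Skin": 5384, "Blood": 2976,
    "Nasopharyngeal": 1930, "Vagina": 1028, "Sputum": 379, "Eye": 229,
    "Urethra": 123, "Tooth": 106, "Reproductive system": 76, "Milk": 60,
    "Trachea": 33, "Lung": 25, "Liver": 20, "Circulatory system": 12,
    "Lymphatic system": 11, "Excretory system": 1,
}

# Total over the rows listed in the table.
listed_total = sum(samples_per_site.values())

# Share of the listed samples contributed by each body site.
share = {site: n / listed_total for site, n in samples_per_site.items()}

# Print sites from largest to smallest share.
for site, fraction in sorted(share.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{site:<20} {fraction:6.1%}")
```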
Figure 2. The distribution of identified eMGE hosts. (A) The number of eMGE populations associated with each bacterial and archaeal host phylum. The inset with the blue background provides resolution for the low-frequency bacterial host phyla; each letter on the y-axis corresponds to the first letter of a host phylum's name. (B) The number of eMGE clusters distributed across different host range levels.
Figure 3. Contents of mMGE and comparisons with public databases. (A) The completeness and quality of phage contigs estimated by CheckV for mMGE, IMG/VR and GVD, where ‘high’, ‘medium’ and ‘low’ denote the corresponding quality tiers. (B and C) Venn diagrams of plasmids and phages from different sources, where all contigs were dereplicated at the population level and decontaminated with CheckV, and only phage populations from human samples were considered. (D–F) The percentage of mapped reads for phages or plasmids from the HMP dataset (D and E) and the Virome dataset (F), where the HMP dataset includes 20 samples from PRJNA48479 and the Virome dataset contains virus-enriched samples from PRJNA588313.
Figure 4. The user-friendly web interface of mMGE. (A) The ‘eMGE’ page shows the basic information of eMGEs. (B) The ‘eMGE cluster’ page shows information about eMGE clusters. (C) The ‘Interaction’ page presents the interactions between eMGEs and their hosts. (D) The ‘Data’ page shows information about each sample and project. (E) The ‘Proteins’ page presents the protein content of each eMGE. These pages can be cross-searched to provide more detailed information on eMGEs or eMGE clusters.
Figure 5. An example matrix view of protein clusters within the eMGE cluster ‘MC_62’. Columns correspond to protein clusters and rows to the eMGE members of this cluster. The protein clusters are colored according to their functional annotations.
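The matrix view described for Figure 5 can be thought of as a presence/absence table. The following is a minimal sketch under that assumption, with rows as eMGE members and columns as protein clusters; it is not the code behind the mMGE web page, and all identifiers in the toy input are made up.

```python
from typing import Dict, List, Set, Tuple


def presence_matrix(members: Dict[str, Set[str]]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a 0/1 matrix: rows are eMGE members, columns are protein clusters."""
    rows = sorted(members)
    # Collect every protein cluster seen in any member of the eMGE cluster.
    columns = sorted({pc for pcs in members.values() for pc in pcs})
    matrix = [[1 if pc in members[row] else 0 for pc in columns] for row in rows]
    return rows, columns, matrix


# Hypothetical members of one eMGE cluster and their protein clusters.
toy_members = {
    "member_2": {"PC_1", "PC_3"},
    "member_1": {"PC_1", "PC_2"},
}
rows, columns, matrix = presence_matrix(toy_members)
```

Coloring by functional annotation, as in the figure, would then only need a lookup from each column's protein cluster to its annotation.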