Na L Gao1,2, Chengwei Zhang3,4, Zhanbing Zhang1, Songnian Hu3, Martin J Lercher2, Xing-Ming Zhao5, Peer Bork6,7,8,9, Zhi Liu1, Wei-Hua Chen1.
Abstract
Phages invade microbes, accomplish host lysis and are of vital importance in shaping the community structure of environmental microbiota. More importantly, most phages have very specific hosts; they are thus ideal tools to manipulate environmental microbiota at species resolution. The main purpose of MVP (Microbe Versus Phage) is to provide a comprehensive catalog of phage-microbe interactions and to help users select phage(s) that can target (and potentially manipulate) specific microbes of interest. We first collected 50 782 viral sequences from various sources and clustered them into 33 097 unique viral clusters based on sequence similarity. We then identified 26 572 interactions between 18 608 viral clusters and 9245 prokaryotes (i.e. bacteria and archaea); we established these interactions based on 30 321 evidence entries that we collected from published datasets, public databases and re-analysis of genomic and metagenomic sequences. Based on these interactions, we calculated the host range for each of the phage clusters and accordingly grouped them into subgroups such as 'species-', 'genus-' and 'family-' specific phage clusters. MVP is equipped with a modern, responsive and intuitive interface, and is freely available at: http://mvp.medgenius.info.
Year: 2018 PMID: 29177508 PMCID: PMC5753265 DOI: 10.1093/nar/gkx1124
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Overlaps in phages within data-sources
| Data source | # clusters | % overlap * | Notes |
|---|---|---|---|
| ‘Earth's virome’ project | 5412 | 57.4% | Over 3000 samples were sequenced; most are environmental samples |
| Predicted prophages in human gut | 1505 | 18.67% | ∼1700 fecal samples from two gut metagenomic studies |
| Predicted viral and prophage sequences from complete and draft genomes | 7117 | 18.07% | |
| Predicted prophages from NCBI complete genomes | 6964 | 15.4% | All available complete prokaryotic genomes (as of May 2017) |
| NCBI reference viral genome database | 776 | 0.64% | |
| Predicted prophages from EMBL proGenomes database | 3275 | 0.61% | Representative complete prokaryotic genomes (as of May 2017) |
| ICTV | 668 | 0 | Data obtained from the International Committee on Taxonomy of Viruses |
* Within each data source, the overlap ratio is defined as the proportion of phage clusters containing multiple sequences from that data source, out of all phage clusters containing at least one sequence from the same data source.
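The within-source overlap ratio defined in the footnote can be illustrated with a minimal Python sketch. The function name `within_source_overlap`, the `(sequence_id, source, cluster_id)` tuple layout and the toy records are assumptions made for illustration only; they do not reflect MVP's actual data model or pipeline.

```python
from collections import Counter

def within_source_overlap(assignments, source):
    """Fraction of clusters holding >1 sequence from `source`, out of all
    clusters holding at least one sequence from `source`.

    `assignments` is an iterable of (sequence_id, source, cluster_id)
    tuples; this layout is a hypothetical one chosen for the sketch.
    """
    # Count how many sequences from this source fall into each cluster.
    per_cluster = Counter(
        cluster for _, src, cluster in assignments if src == source
    )
    if not per_cluster:
        return 0.0
    multi = sum(1 for n in per_cluster.values() if n > 1)
    return multi / len(per_cluster)

# Toy records (invented): source "A" contributes to clusters c1 (twice),
# c2 and c3; source "B" contributes to c2 only.
toy_assignments = [
    ("s1", "A", "c1"),
    ("s2", "A", "c1"),
    ("s3", "A", "c2"),
    ("s4", "B", "c2"),
    ("s5", "A", "c3"),
]
```

In the toy data, one of the three clusters touched by source "A" contains multiple "A" sequences, so its ratio is one third; sequences from other sources in the same cluster do not count toward the ratio.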
Figure 1. Distribution of the 9245 prokaryotic hosts across the bacterial and archaeal phylogeny at the genus level according to NCBI taxonomy and their associated phage clusters. For each bacterial and archaeal genus-level group, the number of daughter species collected in MVP and the corresponding number of associated viral clusters (unique count) are indicated with light-green and red bars, respectively. Bacterial and archaeal species that are not collected in MVP are not shown. Bar heights are log-transformed. The tree and the datasets were visualized using Evolview, an online visualization and management tool for customized and annotated phylogenetic trees (55). An interactive version of the tree can be found at: http://www.evolgenius.info/evolview/#shared/mvp2017_stats/462.
Overlaps in host prokaryotes
| Data source | # hosts | % overlap with other data sources* |
|---|---|---|
| ICTV | 11 | 100% |
| ‘Earth's virome’ project | 1247 | 79.4% |
| Predicted prophages from EMBL proGenomes database | 2549 | 78.6% |
| Predicted prophages from NCBI complete genomes | 4398 | 68.18% |
| Predicted prophages in human gut | 210 | 67.61% |
| NCBI reference viral genome database | 282 | 56.73% |
| Predicted viral and prophage sequences from complete and draft genomes | 6388 | 56.6% |
* The overlap ratio is defined as the proportion of hosts in a data source that could also be found in any of the other data sources.
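The host overlap ratio from the footnote amounts to a set intersection, sketched below in Python. The function name `host_overlap_with_others`, the dictionary layout and the toy host identifiers are hypothetical and serve only to make the definition concrete.

```python
def host_overlap_with_others(hosts_by_source, source):
    """Fraction of hosts in `source` that also occur in any other source.

    `hosts_by_source` maps a data-source name to a collection of host
    identifiers (a hypothetical layout chosen for this sketch).
    """
    own = set(hosts_by_source[source])
    if not own:
        return 0.0
    # Pool the hosts reported by every other data source.
    others = set()
    for name, hosts in hosts_by_source.items():
        if name != source:
            others.update(hosts)
    return len(own & others) / len(own)

# Toy host sets (invented identifiers).
toy_hosts = {
    "A": {"h1", "h2", "h3", "h4"},
    "B": {"h1", "h2"},
    "C": {"h3", "h9"},
}
```

With these toy sets, three of the four hosts in "A" reappear elsewhere, while every host in "B" is also present in "A". This mirrors how a small source such as ICTV can reach 100% overlap in the table.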
Figure 2. Most phage clusters have rather narrow host ranges. For phage clusters with at least two hosts, their host ranges were calculated as the lowest common ancestors (LCAs) of their hosts in the NCBI taxonomic database (see ‘Data Generation’ for more details). (A) X-axis: host range of phage clusters; Y-axis: percentage of phage clusters (out of total) with their LCAs in the taxonomic groups. The Y-axis has been log-transformed. (B) X-axis: number of hosts (i.e. phage clusters were grouped into bins according to the numbers of hosts they have); ‘(5,10)’ specifies a subgroup in which phage clusters have >5 and ≤10 hosts. Y-axis: percentage of phage clusters (in each bin) that have host ranges at the ‘species’ or ‘genus’ levels in each subgroup.
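The idea of a host range as the LCA of a cluster's hosts can be sketched as follows. This is a minimal illustration, not MVP's implementation: the function name `lca_host_range` is hypothetical, and the lineages are simplified, hand-written examples rather than records pulled from the NCBI taxonomy database.

```python
def lca_host_range(lineages):
    """Return the deepest (rank, name) node shared by all host lineages.

    Each lineage is a list of (rank, name) pairs ordered from the root
    down to the species. Returns None if the hosts share no node.
    """
    if not lineages:
        raise ValueError("at least one host lineage is required")
    lca = None
    # Walk the lineages in parallel from the root; stop at the first
    # level where the hosts disagree.
    for level in zip(*lineages):
        if all(node == level[0] for node in level):
            lca = level[0]
        else:
            break
    return lca

# Simplified, hand-written lineages used only as toy input.
_shared = [
    ("superkingdom", "Bacteria"),
    ("class", "Gammaproteobacteria"),
    ("order", "Enterobacterales"),
    ("family", "Enterobacteriaceae"),
]
e_coli = _shared + [("genus", "Escherichia"), ("species", "Escherichia coli")]
e_albertii = _shared + [("genus", "Escherichia"), ("species", "Escherichia albertii")]
s_enterica = _shared + [("genus", "Salmonella"), ("species", "Salmonella enterica")]
```

A cluster infecting only the two Escherichia species would be classed as genus-specific, whereas adding the Salmonella host widens its range to the family level.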
Figure 3. A screenshot of the ‘Phages’ page; highlighted are built-in widgets (i.e. functional elements of a web page that serve specific purposes) that enable users to easily find what they are interested in. (1) A navigation toolbar that floats on top of the page, allowing users to access our data in pre-organized categories (i.e. ‘microbes’, ‘phages’, ‘interactions’, etc.); (2) a global search widget that enables users to search for microbes and viral clusters with any information, including the taxonomy IDs, scientific names and taxonomic ranks, and then redirects them to the corresponding page that they choose; (3) a set of widgets allowing users to search for (or filter out, when the ‘Except for…’ checkbox is selected) the contents of the table below (a list of phages in MVP in this case) with any keywords; (4) a widget allowing users to filter for phage clusters according to the values in the ‘Host range’ column.
Figure 4. A screenshot of the interaction network (only partial) visualized with our built-in visualization tool. Microbes and phage clusters are visualized as light green and pink/reddish circles, respectively, with their sizes (diameters) being proportional to the numbers of their interacting partners (including those that may not be shown in the visualization). Two colors, namely pink and reddish, are used for phages in order to distinguish those that infect only one host (pink) from those that infect multiple hosts (reddish). By clicking the text labels next to the circles, users are redirected to the page for the corresponding microbe or phage cluster. In addition to the canvas, two additional widgets are provided. The first is the selector at the top of the canvas, from which users can browse or search for a node of interest, select it from the drop-down menu, highlight it and bring it into the middle of the canvas. The other includes two buttons that can be used to export the visualization to an external file in either SVG or PNG format. For more information please consult the Interactions page (http://mvp.medgenius.info/interactions).
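The sizing and coloring rules described in the caption can be expressed as a short Python sketch. The function name `node_styles`, the `base_diameter` scale factor, the color labels and the toy interaction list are all assumptions for illustration; the actual MVP visualization tool may derive its styles differently.

```python
from collections import defaultdict

def node_styles(interactions, base_diameter=4.0):
    """Derive a display style for each node from (phage, host) pairs.

    Diameters scale with the number of interacting partners, counted
    over the full interaction list rather than only the displayed part.
    Phage and host identifiers are assumed not to collide.
    """
    phage_hosts = defaultdict(set)
    host_phages = defaultdict(set)
    for phage, host in interactions:
        phage_hosts[phage].add(host)
        host_phages[host].add(phage)

    styles = {}
    for host, phages in host_phages.items():
        styles[host] = {
            "kind": "microbe",
            "color": "lightgreen",
            "diameter": base_diameter * len(phages),
        }
    for phage, hosts in phage_hosts.items():
        styles[phage] = {
            "kind": "phage",
            # Single-host phages are pink; multi-host phages are reddish.
            "color": "pink" if len(hosts) == 1 else "reddish",
            "diameter": base_diameter * len(hosts),
        }
    return styles

# Toy interactions (invented cluster identifiers).
toy_interactions = [
    ("vc1", "Escherichia coli"),
    ("vc1", "Salmonella enterica"),
    ("vc2", "Escherichia coli"),
]
toy_styles = node_styles(toy_interactions)
```

Here `vc1` infects two hosts and would be drawn reddish at twice the base diameter, while `vc2` infects a single host and would be drawn pink.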