Tomoko Mihara, Yosuke Nishimura, Yugo Shimizu, Hiroki Nishiyama, Genki Yoshikawa, Hideya Uehara, Pascal Hingamp, Susumu Goto, Hiroyuki Ogata.
Abstract
Environmental genomics can describe all forms of organisms (cellular and viral) present in a community. The analysis of such eco-systems biology data relies heavily on reference databases, e.g., taxonomy or gene function databases. Reference databases of symbiosis sensu lato, although essential for the analysis of organism interaction networks, are lacking. By mining existing databases and literature, we here provide a comprehensive and manually curated database of taxonomic links between viruses and their cellular hosts.
Keywords: GenomeNet; KEGG; database; genomes; taxonomy; virus-host interactions
Year: 2016 PMID: 26938550 PMCID: PMC4810256 DOI: 10.3390/v8030066
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Figure 1. GenomeNet Virus-Host Database. (a) Comparison of the number of viral genomes with host information in different databases; (b) Number of viral genomes in the Virus-Host Database across different groups of viruses, with information on host taxonomic domain; (c) Number of viruses in the Virus-Host Database with or without links to host genomic sequence data.
Figure 2. Viral and host genomic G+C content. Genomic G+C% for 746 virus-host genome pairs for Caudovirales (a) and 51 other prokaryotic viruses (b) are plotted. Pearson’s correlation coefficients are as follows: Myoviridae: r = 0.755, p = 2.73 × 10^−39, n = 206; Myoviridae without tRNA genes: r = 0.945, p = 2.12 × 10^−32, n = 65; Myoviridae with tRNA genes: r = 0.703, p = 2.67 × 10^−22, n = 141; Podoviridae: r = 0.892, p = 1.63 × 10^−40, n = 114; Siphoviridae: r = 0.969, p = 9.94 × 10^−261, n = 426; Other bacteriophages: r = 0.864, p = 2.09 × 10^−14, n = 45; Archaeal viruses: r = 0.931, p = 6.99 × 10^−3, n = 6. Lines in the plot areas indicate linear regressions by the least squares method.
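The two statistics named in this caption, Pearson's correlation coefficient and a least-squares regression line, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `gc_percent` and `pearson_and_fit` are hypothetical, and counting only unambiguous A/C/G/T bases is an assumption made for the illustration.

```python
from math import sqrt


def gc_percent(seq):
    """Return G+C% of a nucleotide sequence.

    Assumption: only unambiguous A/C/G/T bases are counted;
    ambiguity codes such as N are ignored.
    """
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


def pearson_and_fit(xs, ys):
    """Return (r, slope, intercept) for paired values.

    r is Pearson's correlation coefficient; slope and intercept
    describe the ordinary least-squares line y = slope * x + intercept.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept
```

In such a sketch, `xs` would hold host G+C% values and `ys` the G+C% of the corresponding viruses, one entry per virus-host genome pair; which variable goes on which axis is a choice made here, not something the caption specifies.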
Figure 3. Assessment of the host range predictability based on viral genomic similarities. Dot plot of virus genomic similarity estimated by two measures: tetramer similarity (y axis) and protein alignment scores (x axis). Each dot represents a pair of virus genomes. The vertical (x = 3.75) and horizontal (y = 93) lines are the thresholds delineating the top right sector, which corresponds to a same-host-genus prediction with a false discovery rate of 4.58%. The colors of the dots indicate whether the two viruses have the same host (green) or not (red).
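The thresholding described in this caption can be sketched as follows. The two cut-off values come from the caption; everything else is an assumption for illustration, including the names `predicts_same_host` and `false_discovery_rate`, the inclusive comparison at the thresholds, and the input layout. The caption does not state how either similarity measure is scaled, so the scores are taken as given.

```python
# Thresholds quoted in the figure caption.
PROTEIN_SCORE_THRESHOLD = 3.75   # vertical line, x axis
TETRAMER_SIM_THRESHOLD = 93.0    # horizontal line, y axis


def predicts_same_host(protein_score, tetramer_similarity):
    """True if a virus pair falls in the top right sector.

    Assumption: points lying exactly on a threshold count as inside.
    """
    return (protein_score >= PROTEIN_SCORE_THRESHOLD
            and tetramer_similarity >= TETRAMER_SIM_THRESHOLD)


def false_discovery_rate(pairs):
    """Fraction of predicted same-host pairs that do not share a host.

    `pairs` is an iterable of (protein_score, tetramer_similarity,
    same_host) tuples, where same_host is the known label.
    Returns None when no pair is predicted.
    """
    predicted = [same_host for x, y, same_host in pairs
                 if predicts_same_host(x, y)]
    if not predicted:
        return None
    false_positives = sum(1 for same_host in predicted if not same_host)
    return false_positives / len(predicted)
```

For example, with three virus pairs in the top right sector of which one has different hosts, `false_discovery_rate` returns 1/3; pairs outside the sector do not affect the result.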