Ramy K Aziz, Bhakti Dwivedi, Sajia Akhter, Mya Breitbart, Robert A Edwards.
Abstract
Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution.
Keywords: bacteriophage; ecology; genomics; metagenomics; virus
Year: 2015 PMID: 26005436 PMCID: PMC4424905 DOI: 10.3389/fmicb.2015.00381
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Metrics used to describe and compare different metagenomes based on their phage content (metagenome-level metrics).
| Metric | Definition | Range | Description |
| --- | --- | --- | --- |
| Abundance index (AI) of phage X | nHits of phage X/size of metagenome Y (Mbp) | 0–1.244 | This value describes the fraction of a metagenome library that matches a given phage genome. Dividing the number of sequence hits by the metagenome size (in millions of basepairs) permits comparison of different metagenomic samples. |
| Total AI | Σ nHits of a set of phages/size of metagenome Y (Mbp) | 4.067–28.859 | This value reflects the abundance of all sequences with similarity to phages in a metagenomic library. |
| Median AI (AI50) | AI of the 50th percentile phage genome | 0–3.061 | This value gives an indication of the abundance of sequences with similarity to phages within a metagenomic library and is less sensitive to outliers than Total AI; however, it may underestimate real differences between samples (e.g., if more than half of the phage genomes have no sequence similarities to a metagenomic library, AI50 will be zero regardless of whether the total abundance of the remaining phage genomes is high or low). |
| nPhages (richness) | Number of phage genomes which match at least one sequence read in metagenome Y | 8–487 | This value is a proxy for |
| Shannon Diversity Index | H = -Σ p_i ln(p_i) | 2.061–5.813 | This value (Shannon, |
| Shannon E (evenness) | E = H/ln nPhages | 0.008–0.258 | This value describes the |
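The metagenome-level metrics in this table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name, the input dictionary, and the phage names are hypothetical, and the Shannon index assumes that p_i is each phage's share of all phage hits in the metagenome, which the table as extracted does not state.

```python
import math
from statistics import median


def metagenome_metrics(hits, metagenome_mbp):
    """Summarize one metagenome from per-phage hit counts.

    hits: dict mapping phage name -> number of matching reads (hypothetical input).
    metagenome_mbp: metagenome size in millions of base pairs.
    """
    # Abundance index (AI) per phage: hits normalized to metagenome size.
    ai = {phage: n / metagenome_mbp for phage, n in hits.items()}
    total_ai = sum(ai.values())
    ai50 = median(ai.values())
    # Richness: phages with at least one matching read.
    n_phages = sum(1 for n in hits.values() if n > 0)

    # Shannon index; p_i is taken as each phage's share of all hits (assumption).
    total_hits = sum(hits.values())
    shannon_h = 0.0
    for n in hits.values():
        if n > 0:
            p = n / total_hits
            shannon_h -= p * math.log(p)

    # Evenness as written in the table: E = H / ln(nPhages).
    evenness = shannon_h / math.log(n_phages) if n_phages > 1 else 0.0
    return {
        "ai": ai,
        "total_ai": total_ai,
        "ai50": ai50,
        "n_phages": n_phages,
        "shannon_h": shannon_h,
        "evenness": evenness,
    }


# Toy data: two phages with hits, three without, in a 10 Mbp metagenome.
example = metagenome_metrics(
    {"phageA": 50, "phageB": 50, "phageC": 0, "phageD": 0, "phageE": 0},
    metagenome_mbp=10.0,
)
```

In this toy case more than half of the phages have no hits, so AI50 is zero even though Total AI is 10 hits per Mbp, which is the caveat the table describes for the median.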
Metrics used to describe phage ecological features at the genome level.
| Metric | Definition | Range | Description |
| --- | --- | --- | --- |
| Phage abundance index (PAI) | Σ AI of phage X (hits per Mbp)/length of phage X (in Kbp) | 0–194.84 | This value describes the abundance of a phage in a set of environments. Normalizing the AI of each phage genome to the genome length allows the comparison of different phages. This normalization is useful for most phages; however, it might artificially inflate the PAI value if the phage genome is significantly smaller than the median genome size, which is ~41 Kbp (e.g., microviruses, with 4 Kbp genomes). |
| nMG | Number of metagenomes with hits to phage X | 0–293 | This value reflects the ubiquity of a particular phage genome. A high nMG suggests that a phage genome (or part of it) is universally distributed or cosmopolitan; a low nMG suggests that the phage is localized or ecologically limited (i.e., specific to one or a few habitats). |
| PAI50 | Median AI of phage X in all tested metagenomes/length of phage X | 0–0.13 | This value is another indication of the abundance of a phage genome in different metagenomic samples and is less sensitive to outliers. It also depends on the ubiquity of a phage genome: the PAI50 of a phage genome present in fewer than half of the samples, for example, will be zero even if that genome has a high PAI. |
| Abund. CV (Coefficient of variation) | StDev/Mean AI of phage X | 0.86–17.20 | This value reflects the spread or variation of AIs of a given phage among metagenomes. A large CV suggests that a phage genome has extreme AIs while a small CV suggests uniform AI values (but doesn't give information on their magnitude). |
Representative examples of each value are given in Figure .
Figure 1. Phage distribution metrics. Inter-phage metrics and statistics quantifying different aspects of phage abundance and distribution in 296 metagenomic samples. Graphical examples show the phage genomes at the high and low ends of each parameter. X-axes represent the metagenomes (MG) listed in the same order as in Table S1 (i.e., grouped by environment). Y-axes are in logarithmic scales.
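The genome-level metrics (PAI, nMG, PAI50, and the abundance CV) follow directly from one phage's AI values across metagenomes. The sketch below is illustrative only: the function name and inputs are hypothetical, and the use of the population standard deviation is an assumption, since the table says only "StDev".

```python
from statistics import mean, median, pstdev


def genome_level_metrics(ai_per_metagenome, genome_kbp):
    """Summarize one phage across metagenomes.

    ai_per_metagenome: AI of the phage in each metagenome (hits per Mbp).
    genome_kbp: phage genome length in Kbp.
    """
    avg = mean(ai_per_metagenome)
    return {
        # PAI: summed AI normalized to genome length.
        "pai": sum(ai_per_metagenome) / genome_kbp,
        # nMG: number of metagenomes with at least one hit.
        "n_mg": sum(1 for ai in ai_per_metagenome if ai > 0),
        # PAI50: median AI normalized to genome length.
        "pai50": median(ai_per_metagenome) / genome_kbp,
        # CV: StDev / mean (population StDev is an assumption here).
        "cv": pstdev(ai_per_metagenome) / avg if avg > 0 else float("nan"),
    }


# Toy data: a 40 Kbp phage detected in 2 of 5 metagenomes.
example = genome_level_metrics([0.0, 0.0, 0.0, 4.0, 8.0], genome_kbp=40.0)
```

Because this toy phage is found in fewer than half of the metagenomes, its PAI50 is zero while its PAI is not, matching the dependence on ubiquity noted in the table.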
Metrics used to describe phage ecological features at the nucleotide level.
| Metric | Definition | Range | Description |
| --- | --- | --- | --- |
| Coverage density (AUC/nNuc) | Area of a genome coverage plot (area-under-the curve) normalized to the total number of nucleotides in the phage genome. | 0–2.920 | This value is similar to the total abundance of a phage in all metagenomes; however, it also considers each nucleotide covered in the phage genome and not just the number of sequence reads that match that genome. |
| Density per metagenome (cumulative AUC/nMG) | Average overall phage density divided by the number of metagenomes. | 126–1.71 × 10^6 | This value normalizes the coverage density to the number of metagenomes in which the phage genome is found. It differentiates between the densities of ubiquitous phages (high nMG) and that of habitat-specific phages (low nMG). |
| %genome covered | Fraction of the phage genome that matches at least one metagenomic sequence. | 0–100% | This value reflects the homogeneity of overall phage coverage in metagenomes as well as the gaps in coverage. It marks areas within a phage genome that have not been matched in any metagenomic sample, but is magnitude-independent—thus does not show which areas of the genome are overrepresented. A %genome coverage of 40% means that combined uncovered gaps are 60%. |
| Gene coverage evenness | Adapted Shannon Evenness Index (Shannon E) of the coverage of phage genes. | 0–0.92 | This value reflects whether protein-encoding genes within a phage genome are equally represented relative to each other. A gene evenness of one means that all phage genes are equally represented (regardless of the magnitude of their coverage), while low evenness values suggest possible non-specific or cross-matching genes (i.e., parts or all of the phage genome is absent). |
| Coverage coefficient of variation (CV) | Standard deviation of coverage density/Mean coverage density (Coverage density = AUC/nNuc) | 0.76–12.58 | This value reflects the variation or spread of coverage along a phage genome. Typically a phage genome coverage plot with high CV has higher coverage values for certain parts of the genome and zero values for other parts. |
| Median coverage density | Median number of hits per nucleotide per phage | 0–686 | Less sensitive to extreme values, the median coverage density provides another indicator of the homogeneity of phage genome coverage in metagenomic samples. |
| Coverage kurtosis | Kurtosis equation: | 0.02–423.12 | Kurtosis is a statistical measure of uniformity or lack thereof within a frequency distribution curve. It is often used as a measure of skewness, bimodality, or “peakiness” of a distribution plot. It has been adopted here to reflect the irregularity of a phage coverage density plot. If a phage genome coverage plot has high kurtosis, this means that some areas of this genome have sharp coverage peaks while others have low or no coverage values. Negative kurtosis values reflect flatter coverage plots but do not provide information about the coverage magnitude. |
Representative examples of each value are given in Figure .
Figure 2. Phage coverage metrics, including (A) density and (B) uniformity estimates. Graphical examples show high and low ends of each parameter used. X-axes represent the genome coordinates while Y-axes represent number of hits to each nucleotide. Graphs are scaled differently. The coverage plots are for the following phages: (A) Staphylococcus phage 44AHJD compared to Cyanophage P-SSM2; Salterprovirus His2 virus compared to Mycobacteriophage TM4; Lactococcus phage asccphi28 compared to Cyanophage P-SSM2. (B) Mycoplasma virus P1 compared to Bacteriophage VWB; Mycobacteriophage Cooper compared to Burkholderia cenocepacia phage BcepB1A; Chlamydia phage phiCPAR39 compared to Enterobacteria phage P1.
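Several of the nucleotide-level metrics can be computed from a per-nucleotide coverage vector, as in the sketch below. The names are hypothetical and two choices are assumptions rather than the paper's stated method: the population standard deviation is used for the CV, and kurtosis is computed as population excess kurtosis (fourth central moment over squared variance, minus 3), because the kurtosis equation did not survive in the table.

```python
from statistics import mean, median, pstdev


def coverage_metrics(coverage):
    """Summarize a coverage plot.

    coverage: number of hits at each nucleotide of one phage genome.
    """
    n_nuc = len(coverage)
    avg = mean(coverage)
    sd = pstdev(coverage)  # population StDev (assumption)
    if sd > 0:
        # Population excess kurtosis (assumed form; the source equation is missing).
        m4 = sum((c - avg) ** 4 for c in coverage) / n_nuc
        kurtosis = m4 / sd ** 4 - 3.0
    else:
        kurtosis = float("nan")
    return {
        # Coverage density: area under the coverage plot per nucleotide.
        "coverage_density": sum(coverage) / n_nuc,
        # Percentage of positions matched by at least one read.
        "pct_genome_covered": 100.0 * sum(1 for c in coverage if c > 0) / n_nuc,
        "coverage_cv": sd / avg if avg > 0 else float("nan"),
        "median_density": median(coverage),
        "excess_kurtosis": kurtosis,
    }


# Toy genome of 10 nucleotides: 6 uncovered, 4 covered by 10 hits each.
example = coverage_metrics([0] * 6 + [10] * 4)
```

The toy genome is 40% covered, so its combined gaps are 60%, and its median density is zero even though its mean coverage density is 4 hits per nucleotide.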
Figure 3. Scatter plots showing correlation between (A) abundance and ubiquity or (B) gene evenness and % genome coverage of 588 viruses in 296 metagenomes. Data points are labeled according to phage family (different colors), and nucleic acid content (circles: dsDNA phages; crosses: other phages, i.e., ssRNA, dsRNA, and ssDNA phages). Correlation coefficients (r) are shown for all phages and for dsDNA phages alone.
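A correlation coefficient between two per-phage metrics, such as abundance and ubiquity, could be computed as follows. This sketch assumes that r is the Pearson coefficient, which the caption does not state explicitly; the function name and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-phage values: abundance (PAI) and ubiquity (nMG).
abundance = [0.1, 0.4, 0.9, 2.0]
ubiquity = [12, 50, 110, 280]
r_example = pearson_r(abundance, ubiquity)
```

Restricting both lists to dsDNA phages before calling the function would give the second coefficient reported in the figure.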
Examples of the lowest and highest scoring metagenomes or phages according to different metrics.
| Metric | Lowest | Highest |
| --- | --- | --- |
| Total AI | Lung samples (Table | Hydrostation S, Sargasso Sea, Bermuda (open ocean) (Value = 28.859) |
| Median AI (AI50) | Lung samples (Table | Chesapeake Bay, MD (estuary): Chesapeake Bay Virioplankton–Station 834 (Value = 3.061) |
| nPhages | Viral data from the human lung (Sample 109) Value = 8 phages | AntarcticaAquatic_5–Marine-derived lake (Value = 487 phages) |
| Shannon Diversity Index | Viral data from the human lung (Sample 109) Value = 2.061 | Stool metagenome (sample 179) Value = 5.813 |
| Shannon evenness E | GS051 Shotgun–Coral Reef Atoll–Polynesia Archipelagos–Rangirora Atoll–Fr. Polynesia (Value = 0.008) | Viral data from the human lung (sample 109) Value = 0.258 |
| Phage abundance index (PAI) | Eleven out of 17 RNA phages have zero values | |
| PAI50 | | T4-like cyanophage P-SSM2 (Value = 0.13) |
| nMG | Eleven RNA viruses have zero values; | T4-like cyanophage P-SSM2 (Value = 293) |
| Abund. CV | Myoviridae Bacillus phage 0305phi8-36 (Value = 0.86) | Ralstonia phage P12 J (dsDNA, Value = 14.4), |
| Coverage density | Levivirus Enterobacteria phage MS2 (ssRNA, Value = 0.04); | Coliphage phiX174 (ssDNA, Value = 2.920); T4-like cyanophage P-SSM2 (1.989) |
| Density per metagenome | Enterobacteria phage MS2 (ssRNA, Value = 126); | T4-like cyanophage P-SSM2 (1.71 × 10^6) |
| %genome covered | Salterprovirus His 2 (Value = 10%; lowest non-zero value for a dsDNA virus) | Mycobacteriophages Rosebush and Cooper (Value = 100%) |
| Gene coverage evenness | | Bacteriophage VWB (Value = 0.918) and |
| Spread (CV) | Actinoplanes phage phiAsp (Value = 0.757) | |
| Coverage kurtosis | | Enterobacteria phage P1 (Value = 423.12) |
| Median density | | T4-like cyanophage P-SSM2 (Value = 686) |
If the high end is not a dsDNA phage, the next highest/lowest dsDNA phage is also shown.
Figure 4. Principal component analysis of phage genomes according to their ecological properties. All phages were compared based on 11 metrics, then the 11 dimensions were reduced into two principal components that explain most of the variance. Circles represent dsDNA phages and x signs represent other types of phage genomes; colors represent different phage classes. Examples of phages and groups of phage discussed in the text are labeled.
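The dimensionality reduction described in this caption can be illustrated with a small, dependency-free PCA. This is a sketch under stated assumptions, not the authors' pipeline: z-score standardization of each metric and power iteration with deflation are choices made here for illustration, and the three-metric toy matrix stands in for the 11 metrics per phage.

```python
import math
import random


def standardize(rows):
    """Z-score each column so metrics on different scales are comparable."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    # "or 1.0" guards against constant columns (zero standard deviation).
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def covariance(rows):
    """Covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iterations=500, seed=0):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, mv)), v


def pca_2d(rows):
    """Project rows (one per phage) onto the first two principal components."""
    z = standardize(rows)
    cov = covariance(z)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    components = []
    for _ in range(2):
        eigenvalue, vector = top_eigenpair(cov)
        components.append((eigenvalue, vector))
        # Deflation: remove the component just found before finding the next.
        cov = [[cov[i][j] - eigenvalue * vector[i] * vector[j]
                for j in range(d)] for i in range(d)]
    scores = [[sum(a * b for a, b in zip(row, vec)) for _, vec in components]
              for row in z]
    explained = [val / total_variance for val, _ in components]
    return scores, explained


# Toy matrix: 5 phages x 3 metrics (the second metric is twice the first).
toy_metrics = [[1, 2, 5], [2, 4, 3], [3, 6, 6], [4, 8, 2], [5, 10, 4]]
scores, explained = pca_2d(toy_metrics)
```

Each row of `scores` gives the two plotting coordinates for one phage, and `explained` gives the fraction of total variance captured by each component. In the toy matrix two metrics are perfectly correlated, so two components capture essentially all of the variance.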