Xianglilan Zhang1, Ruohan Wang2, Xiangcheng Xie3, Yunjia Hu4, Jianping Wang2, Qiang Sun5, Xikang Feng6, Wei Lin4, Shanwei Tong7, Wei Yan8, Huiqi Wen1, Mengyao Wang2, Shixiang Zhai9, Cheng Sun10, Fangyi Wang11, Qi Niu10, Andrew M Kropinski12, Yujun Cui1, Xiaofang Jiang8, Shaoliang Peng10, Shuaicheng Li2, Yigang Tong4.
Abstract
Temperate phages (active prophages induced from bacteria) help control pathogenicity, modulate community structure, and maintain gut homeostasis. Complete phage genome sequences are indispensable for understanding phage biology. Traditional plaque techniques are inapplicable to temperate phages due to their lysogenicity, curbing their identification and characterization. Existing bioinformatics tools for prophage prediction usually fail to detect accurate and complete temperate phage genomes. This study proposes a novel computational temperate phage detection method (TemPhD) mining both the integrated active prophages and their spontaneously induced forms (temperate phages) from next-generation sequencing raw data. Applying the method to the available dataset resulted in 192 326 complete temperate phage genomes with different host species, expanding the existing number of complete temperate phage genomes by more than 100-fold. The wet-lab experiments demonstrated that TemPhD can accurately determine the complete genome sequences of the temperate phages, with exact flanking sites, outperforming other state-of-the-art prophage prediction methods. Our analysis indicates that temperate phages are likely to function in the microbial evolution by (i) cross-infecting different bacterial host species; (ii) transferring antibiotic resistance and virulence genes and (iii) interacting with hosts through restriction-modification and CRISPR/anti-CRISPR systems. This work provides a comprehensively complete temperate phage genome database and relevant information, which can serve as a valuable resource for phage research.
Year: 2022 PMID: 35937545 PMCID: PMC9346568 DOI: 10.1093/nargab/lqac057
Source DB: PubMed Journal: NAR Genom Bioinform ISSN: 2631-9268
General characteristics of our method TemPhD and eight commonly used prophage prediction tools. Under 'Latest Update', we list the year in which each method was last updated, not the year in which it was first made public. The last-updated years can be found on the tools' websites. Under 'Boundary Identification', 'repeated sequence' means that repeated sequences are used to identify the flanking sites (attP and attB sites) of the temperate phage and its integrated host genome, and 'NGS data' means that paired-end reads are used to check the completeness of the temperate phage genome sequence.
| Feature | TemPhD | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Latest Update | 2021 | 2020 | 2019 | 2016 | 2021 | 2015 | 2021 | 2012 | 2006 |
| | Temperate phage (active prophage) | Virus | Prophage | Prophage | Virus | Prophage | Prophage | Prophage | Prophage |
| | Capture replication process of temperate phages | Computational prediction | Computational prediction | Computational prediction | Computational prediction | Computational prediction | Computational prediction | Computational prediction | Computational prediction |
| | Bacterial NGS data | Assembled bacterial genome sequence | Assembled bacterial genome sequence | Assembled bacterial genome sequence | Assembled bacterial genome sequence | Bacterial genome sequence in GenBank format | Assembled bacterial genome sequence | Assembled bacterial genome sequence | Assembled bacterial genome sequence |
| | Alignment-based | Annotation-based | Alignment-based | Alignment-based | Alignment-based | Statistic-based | Machine learning models | Alignment-based | Alignment-based |
| Boundary Identification | Repeated sequence and NGS data | — | Repeated sequence | Repeated sequence | — | — | — | Repeated sequence | — |
| | Detecting with NGS data | Annotation-based | Machine learning models | Scoring | — | — | — | — | — |
| | Temperate phages (real active prophage) | Prophage/negative | Active/ambiguous/inactive | Intact/questionable/incomplete | Full/partial | Prophage/negative | Prophage/negative | Prophage/negative | Prophage/negative |
Figure 1. Workflow of our temperate phage detection method and illustration of the temperate phage induction and integration processes. The main detection step is based on the induction and integration processes illustrated at the bottom of the figure. attP is the attachment site of the phage, attB is the attachment site of the host strain, and attL and attR are the left and right attachment sites after integration. O stands for the core region of phage and bacterium, B represents the host, and P represents the phage. A temperate phage is also called a prophage once it has integrated into a lysogenic host strain.
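The attachment-site notation in this caption can be made concrete with a toy string model. The sketch below is only an illustration of the integration and induction scheme described above, not the TemPhD algorithm: the function names `integrate` and `excise`, the example sequences, and the assumption that the core region O occurs exactly once in each input are all hypothetical simplifications.

```python
def integrate(host, phage, core):
    """Insert a circular phage genome into a host at the shared core region O.

    Assumes (for this toy model) that `core` occurs exactly once in `host`
    (inside attB) and exactly once in the circular `phage` (inside attP).
    """
    h = host.find(core)
    p = phage.find(core)
    if h < 0 or p < 0:
        raise ValueError("core region must be present in both sequences")
    # Rotate the circular phage genome so that it starts at the core region.
    linear_phage = phage[p:] + phage[:p]
    # Result: B - O - P' ... P - O - B', i.e. attL on the left, attR on the right.
    return host[:h] + linear_phage + host[h:]


def excise(lysogen, core):
    """Reverse of integrate: recover the phage genome and the restored host.

    The prophage is taken as the segment between the two direct repeats of
    the core region, keeping one copy of the core in each product.
    """
    left = lysogen.find(core)
    right = lysogen.find(core, left + 1)
    if left < 0 or right < 0:
        raise ValueError("expected two copies of the core region")
    phage = lysogen[left:right]                 # circular genome, one core copy
    host = lysogen[:left] + lysogen[right:]     # host with attB restored
    return phage, host


# Made-up sequences for illustration only.
host = "AAAAGTCTTTT"      # B - O - B' with O = "GTC"
phage = "CCGTCGG"         # circular: P - O - P'
lysogen = integrate(host, phage, "GTC")
recovered_phage, recovered_host = excise(lysogen, "GTC")
```

In this model the two copies of the core region that flank the prophage are the direct repeats that repeat-based boundary identification relies on. Real genomes would additionally require handling of mismatches, multiple candidate repeats, and read-level evidence.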
Figure 2. Network connecting temperate phages (edges) with their hosts (nodes). Data labeled as metagenome or unclassified by GenBank are not included. The more temperate phages are shared, the thicker the line (the node) is. Gray lines represent the same phages identified within the same host genus, while yellow lines represent the same phages identified across host genera.
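One way to picture how such a host-sharing network could be assembled is sketched below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input mapping, the placeholder host names, and the helper names `host_sharing_edges` and `edge_kind` are hypothetical, and the genus is simply taken as the first word of the host name.

```python
from collections import Counter
from itertools import combinations


def host_sharing_edges(phage_hosts):
    """Count, for every pair of hosts, how many phages were found in both."""
    weights = Counter()
    for hosts in phage_hosts.values():
        # Sort so that each unordered host pair maps to a single key.
        for a, b in combinations(sorted(set(hosts)), 2):
            weights[(a, b)] += 1
    return weights


def edge_kind(host_a, host_b):
    """Label an edge by whether both hosts share a genus (first name token)."""
    same = host_a.split()[0] == host_b.split()[0]
    return "within-genus" if same else "cross-genus"


# Placeholder data for illustration only; not taken from the study.
toy = {
    "phage_1": ["GenusA sp1", "GenusB sp1"],
    "phage_2": ["GenusA sp1", "GenusB sp1"],
    "phage_3": ["GenusA sp1", "GenusA sp2"],
}
edges = host_sharing_edges(toy)
kinds = {pair: edge_kind(*pair) for pair in edges}
```

Here the edge weight plays the role of line thickness, and the `within-genus` and `cross-genus` labels correspond to the gray and yellow lines in the figure.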
Figure 3. Genome size and GC content distribution of temperate phages in the top 20 host species listed on NCBI. The GC content distribution of the top 20 host species is also displayed. We keep the original NCBI host species names, including 'bacterium' and 'human gut metagenome'. The name 'bacterium' corresponds to NCBI taxonomy ID 1869227, which includes all unclassified bacteria that have not been placed in the NCBI taxonomic hierarchy.
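For reference, the two quantities plotted here can be computed per genome as in the short sketch below. The function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions made for illustration; the source does not state how ambiguous bases were handled.

```python
def gc_content(seq):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0  # no unambiguous bases to count
    return sum(base in "GC" for base in counted) / len(counted)


def genome_size(seq):
    """Genome size is the sequence length in bases."""
    return len(seq)
```

For example, `gc_content("ATGC")` evaluates to 0.5 and `genome_size("ATGC")` to 4.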
Figure 4. Phylogenetic relationships of the host species shown in Figure 2. (A) Phylogenetic relationships of the host species that have identical temperate phages with Klebsiella pneumoniae. (B) Phylogenetic relationships of the host species that have identical temperate phages with Escherichia coli. (C) Phylogenetic relationships of the phage entries that constitute the phage clusters in Figure 3. The numbers at the tips of the branches represent the phage entries. In our study, 'phage entry' is short for a nonredundant complete temperate phage genome sequence.
Figure 5. Gene-sharing networks built using all 147 phage entries that formed host-sharing clusters in Figure 2 and bacterial virus genomes retrieved from Viral RefSeq v.94. Viral clusters (VCs) were obtained with vConTACT2.