Kaihao Tang1,2, Weiquan Wang1,2,3, Yamin Sun4, Yiqing Zhou1,2,3, Pengxia Wang1,2,3, Yunxue Guo1,2,3, Xiaoxue Wang1,2,3.
Abstract
The life cycle of temperate phages includes a lysogenic cycle stage when the phage integrates into the host genome and becomes a prophage. However, the identification of prophages that are highly divergent from known phages remains challenging. In this study, by taking advantage of the lysis-lysogeny switch of temperate phages, we designed Prophage Tracer, a tool for recognizing active prophages in prokaryotic genomes using short-read sequencing data, independent of phage gene similarity searching. Prophage Tracer uses the criterion of overlapping split-read alignment to recognize discriminative reads that contain bacterial (attB) and phage (attP) att sites representing prophage excision signals. Performance testing showed that Prophage Tracer could predict known prophages with precise boundaries, as well as novel prophages. Two novel prophages, dsDNA and ssDNA, encoding highly divergent major capsid proteins, were identified in coral-associated bacteria. Prophage Tracer is a reliable data mining tool for the identification of novel temperate phages and mobile genetic elements. The code for Prophage Tracer is publicly available at https://github.com/WangLab-SCSIO/Prophage_Tracer.
Year: 2021 PMID: 34551431 PMCID: PMC8682789 DOI: 10.1093/nar/gkab824
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Prophage Tracer workflow using overlapping split-read alignment to detect prophages. (A) Schematic of the Prophage Tracer workflow, including the extracting, clustering and filtering steps. (B) Reads containing attB or attP caused by prophage excision can generate overlapping alignments (the overlap length is approximately equal to the length of the att site), which can be a discriminative signal for prophage detection. SRs: split reads; DRPs: discordant read pairs.
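The overlap signal described in panel (B) can be illustrated with a minimal sketch. It assumes each half of a split read is represented as a half-open interval in read coordinates; the function name `overlap_length` and the example intervals are hypothetical and are not taken from the Prophage Tracer code.

```python
from typing import Tuple

# Half-open interval [start, end) in read coordinates (assumed representation).
Interval = Tuple[int, int]


def overlap_length(a: Interval, b: Interval) -> int:
    """Return the number of read bases covered by both alignment segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# Hypothetical 150-bp read spanning an excision junction with a 20-bp att site:
# both segments include the att core, so they overlap by about its length.
left_segment = (0, 85)      # one flank plus the att core
right_segment = (65, 150)   # the att core plus the other flank
print(overlap_length(left_segment, right_segment))  # 20
```

An overlap of zero would mean the two segments share no read bases, which is not the pattern the caption describes for att-containing reads.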
Figure 2. Performance comparison of Prophage Tracer and LUMPY using simulated data. (A) Comparison of sensitivity for prophage detection. Sensitivity is defined as the average ratio of positive hits over three rounds of simulated data (each round with 20 genomes). The ratio of the host genome, the host genome with the prophage excised and the circular prophage genome (WT: attB: attP) is shown at the top of each panel. (B) The average relative ratio of recovered split reads between LUMPY and Prophage Tracer. (C) The split reads recovered by Prophage Tracer and LUMPY from simulated data with att sites ranging from 2 to 160 bp. Expected split reads in the SAM file from simulated data were extracted according to CIGAR strings of the form aMbS or aSbM (integer values of a and b from 1–149) mapping at the expected prophage positions. Detailed information on the simulated data is listed in Supplementary Table S3.
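The CIGAR filter mentioned in panel (C) can be sketched as follows. This is an illustration only: the function name `classify_split_cigar` is hypothetical, and the 150-bp read length is an assumption inferred from the stated range of a and b (1–149), not a value given in the caption.

```python
import re
from typing import Optional, Tuple

# Patterns for the two CIGAR shapes named in the caption: aMbS and aSbM.
_MATCH_THEN_CLIP = re.compile(r"^(\d+)M(\d+)S$")
_CLIP_THEN_MATCH = re.compile(r"^(\d+)S(\d+)M$")


def classify_split_cigar(
    cigar: str, read_length: int = 150  # read length is an assumption
) -> Optional[Tuple[str, int, int]]:
    """Return (shape, a, b) for an aMbS or aSbM CIGAR string, else None."""
    for shape, pattern in (("MS", _MATCH_THEN_CLIP), ("SM", _CLIP_THEN_MATCH)):
        m = pattern.match(cigar)
        if m:
            a, b = int(m.group(1)), int(m.group(2))
            # Both parts must be non-empty and together span the whole read.
            if a >= 1 and b >= 1 and a + b == read_length:
                return (shape, a, b)
    return None


print(classify_split_cigar("100M50S"))  # ('MS', 100, 50)
print(classify_split_cigar("150M"))     # None
```

Reads with insertions, deletions or clipping at both ends are rejected by this sketch, which matches the two simple shapes named in the caption and nothing more.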
Prediction of known prophages in four representative strains
| Strains | Prophage | Contig | attL start | attL end | attR start | attR end | Size (bp) | Length of att (bp) | References |
|---|---|---|---|---|---|---|---|---|---|
| | Pf4 | NC_002516.2 | 785288 | 785336 | 797699 | 797747 | 12411 | 49 | ( |
| | CP4So | NC_004347.2 | 1501853 | 1501946 | 1538064 | 1538157 | 36211 | 94 | ( |
| | LambdaSo | NC_004347.2 | 3074594 | 3074605 | 3126435 | 3126446 | 51841 | 12 | ( |
| | rac | NZ_CP009273.1 | 1406156 | 1406198 | 1429216 | 1429258 | 23060 | 43 | ( |
| | Φ10403S | NC_017544.1 | 2319845 | 2319847 | 2357456 | 2357458 | 37611 | 3 | ( |
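The numbers in this table are internally consistent in a way that can be checked with a short sketch. The two formulas below are an observation from the tabulated values, not a definition stated in the source: the att length equals the inclusive span of the left att copy, and the size equals the distance between the start positions of the left and right att copies. The helper names are hypothetical.

```python
# Coordinates copied from the table: (left start, left end, right start, right end).
KNOWN_PROPHAGES = {
    "Pf4": (785288, 785336, 797699, 797747),
    "CP4So": (1501853, 1501946, 1538064, 1538157),
    "LambdaSo": (3074594, 3074605, 3126435, 3126446),
    "rac": (1406156, 1406198, 1429216, 1429258),
    "Φ10403S": (2319845, 2319847, 2357456, 2357458),
}


def att_length(start: int, end: int) -> int:
    """Inclusive length of one att copy (assumed 1-based, closed coordinates)."""
    return end - start + 1


def prophage_size(left_start: int, right_start: int) -> int:
    """Distance between the starts of the two att copies (observed relation)."""
    return right_start - left_start


for name, (ls, le, rs, re_) in KNOWN_PROPHAGES.items():
    print(name, prophage_size(ls, rs), att_length(ls, le))
```

For Pf4 this prints a size of 12411 and an att length of 49, matching the Size and Length columns; the other rows agree in the same way.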
Comparison of outputs of the predicted active prophages by Prophage Tracer with PHASTER or Prophage Hunter in seven coral-associated bacterial strains^a

The Prophage, att coordinate and Size columns are Prophage Tracer outputs.

| Strain name | Prophage | attL start | attL end | attR start | attR end | Size (bp) | PHASTER^b | Prophage Hunter^c |
|---|---|---|---|---|---|---|---|---|
| | Pea1 | 1888722 | 1888741 | 1936851 | 1936870 | 48129 | Questionable ( 1895962–1915899 | Active (0.9): 1885416–1903092; Active (0.97): 1888722–1936870; Active (0.91): 1917844–1948048 |
| | Prc1 | 1373446 | 1373460 | 1379997 | 1380011 | 6551 | – | Inactive (0.12): 1362384–1392851 |
| | Phm1 | 292609 | 292683 | 333351 | 333425 | 40742 | Intact (150): 293153–331437 | Inactive (0.14): 274307–310741; Active (0.9): 292613–333425 |
| | Phm2 | 1064123 | 1064145 | 1100156 | 1100178 | 36033 | Intact (150): 1075299–1101737 | Ambiguous (0.73): 1064268–1100128 |
| | Phm3 | 2090511 | 2090576 | 2139945 | 2140010 | 49434 | Incomplete ( 2090437–2116834 | Active (0.93): 2077440–2104589; Active (0.97): 2090511–2140010; Ambiguous (0.76): 2124591–2138678 |
| | Pvn1 | 353280 | 353303 | 367745 | 367768 | 14465 | – | Ambiguous (0.77): 350448–374105 |
| | Pmo1 | 2643352 | 2643371 | 2676198 | 2676217 | 32846 | – | – |
| | Pms1 | 2668021 | 2668042 | 2679241 | 2679262 | 11220 | – | Inactive (0.34): 2648530–2674443 |
| | Pzm1 | 1472262 | 1472314 | 1512357 | 1512409 | 40095 | Incomplete ( 1486303–1511274 | Ambiguous (0.72): 1461300–1484415; Active (0.92): 1469187–1485662; Active (0.95): 1487346–1517819; Inactive (0.26): 1510309–1532089 |

^a Full outputs of these three tools and LUMPY are shown in Supplementary Table S6.
^b Outputs of prophage regions predicted by PHASTER (the scores are in parentheses and the predicted ends are shown). ‘–’ indicates ‘not detected’.
^c Outputs of prophage regions predicted by Prophage Hunter (the scores are in parentheses and the predicted ends are shown). ‘–’ indicates ‘not detected’.
Figure 3. Gene maps and phylogenetic analysis of major capsid proteins of representative prophages. Gene maps of Pcr1 (A), Pvn1 (B) and Pms1 (C). Gene orientation of circular genomes was adjusted to align the major capsid proteins. All the genomes are on the same scale as indicated. Genes are represented by block arrows and are colored according to gene function. Homologs of hypothetical proteins in (B) are indicated in black. Unrooted maximum likelihood trees of MCP homologs of Pcr1 (D), Pvn1 (E) and Pms1 (F). MCPs from isolated or uncultured viruses are highlighted in the trees, and MCPs from prophages are indicated as branches. Branch lengths are proportional to the number of amino acid substitutions.
Figure 4. Prophage Tracer combined with qPCR to estimate the fold-change of prophage excision rate with or without mitomycin C. (A) Read counts in the outputs of Prophage Tracer of seven coral-associated bacterial strains with or without mitomycin C. SR, split read; DRP, discordant read pair. ‘–’ indicates ‘not detected’ or ‘unable to calculate’. The fold-change of excision rate was calculated using read counts in the outputs of Prophage Tracer (if a zero is in the dividend, use one instead of zero). Prophage Tracer outputs using contig-level genomes are shown at the bottom left. ‘::’ indicates a potential junction of two contigs. ‘ = contig’ indicates the left junction and ‘contig = ’ indicates the right junction. Full outputs including positions of att sites on each contig are shown in Supplementary Table S8. (B) Excision rates of Phm1, Phm2 and Phm3 prophages in SCSIO 43005 quantified by qPCR. Fold-changes are indicated for Phm1 and Phm3, and significant changes are marked with one asterisk for P < 0.05. (C) Alignments of prophages to contig-level genomes.