I S Rusinov, A S Ershova, A S Karyagina, S A Spirin, A V Alexeevski.
Abstract
BACKGROUND: Restriction-modification (R-M) systems protect bacteria and archaea from attacks by bacteriophages and archaeal viruses. An R-M system specifically recognizes short sites in foreign DNA and cleaves it, while such sites in the host DNA are protected by methylation. Prokaryotic viruses have developed a number of strategies to overcome this host defense. The simplest anti-restriction strategy is the elimination of recognition sites in the viral genome: no sites, no DNA cleavage. Even a decrease in the number of recognition sites can help a virus to overcome this type of host defense. Recognition site avoidance has been a known anti-restriction strategy of prokaryotic viruses for decades. However, it has not been systematically studied with the currently available sequence data. We analyzed the complete genomes of almost 4000 prokaryotic viruses with known host species and more than 17,000 restriction endonucleases with known specificities in terms of recognition site avoidance.
Keywords: Anti-restriction; Archaeal viruses; Bacteriophages; Compositional bias; Restriction-modification systems; Site avoidance
Year: 2018 PMID: 30526500 PMCID: PMC6286503 DOI: 10.1186/s12864-018-5324-3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
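The site-avoidance idea from the abstract can be illustrated by counting a recognition site on both strands of a sequence and comparing the count with a naive expectation. This is only a minimal sketch: the helper names and the toy sequence are hypothetical, and the expectation assumes independent base frequencies, which is not necessarily how the compositional bias (CB) reported in this record was computed. EcoRI's site GAATTC is used as a familiar palindromic example.

```python
# Minimal sketch (assumptions: toy sequence, hypothetical helper names,
# independent-base expectation; NOT the CB definition used in the paper).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_occurrences(genome, site):
    """Count overlapping occurrences of `site` on the given strand."""
    count = 0
    start = genome.find(site)
    while start != -1:
        count += 1
        start = genome.find(site, start + 1)
    return count


def count_site_both_strands(genome, site):
    """Count a site on both strands; a palindromic site is counted once."""
    rc = reverse_complement(site)
    total = count_occurrences(genome, site)
    if rc != site:
        total += count_occurrences(genome, rc)
    return total


def naive_expected_count(genome, site):
    """Expected one-strand count if bases were independent (an assumption)."""
    freqs = {base: genome.count(base) / len(genome) for base in "ACGT"}
    probability = 1.0
    for base in site:
        probability *= freqs[base]
    return probability * (len(genome) - len(site) + 1)


toy_genome = "GAATTCAAGAATTCTTGGCC"  # invented sequence, not viral data
site = "GAATTC"                      # EcoRI recognition site (palindromic)

observed = count_site_both_strands(toy_genome, site)
expected = naive_expected_count(toy_genome, site)
naive_ratio = observed / expected    # below 1 would suggest avoidance
print(observed, naive_ratio)
```

On a real genome, a ratio well below 1 for a site recognized by the host's R-M system would be the kind of signal the study looks for; the toy sequence here is deliberately enriched in the site, so its ratio is above 1.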
Composition of the Experimental dataset by Type of R-M system recognizing the site. The same site can be recognized by R-M systems of different Types, as shown in the last three rows
| Type of REase | Number of (site, genome) pairs | Number of different sites | Number of different genomes |
|---|---|---|---|
| I | 23,223 | 202 | 1571 |
| II | 34,912 | 186 | 2767 |
| IIG | 2584 | 65 | 1081 |
| IIM | 975 | 6 | 628 |
| III | 3795 | 43 | 1050 |
| IV | 597 | 2 | 597 |
| II and IIG | 425 | 5 | 399 |
| II and IIM | 167 | 3 | 167 |
| II and III | 26 | 1 | 26 |
Composition of the datasets used in this work
| Dataset | Number of different sites | Number of different genomes | Number of (site, genome) pairs |
|---|---|---|---|
| Experimental dataset | 494 [a] | 2861 [b] | 66,704 |
| Control dataset 1 | 899 | 3407 | 3,062,893 |
| Control dataset 2 | 899 | 4021 | 3,614,879 |
[a] R-M systems encoded in the genomes of the known phage hosts recognize 494 of the 899 known RS.
[b] Only 2861 of the 3407 phages have known host species with available data on the encoded R-M systems.
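As a quick consistency check, the (site, genome) pair counts listed per REase Type can be summed and compared with the Experimental dataset total of 66,704. The sketch below simply copies the counts from the tables; the variable names are illustrative.

```python
# Pair counts copied from the per-Type table; names are illustrative.
pairs_by_type = {
    "I": 23223,
    "II": 34912,
    "IIG": 2584,
    "IIM": 975,
    "III": 3795,
    "IV": 597,
    "II and IIG": 425,
    "II and IIM": 167,
    "II and III": 26,
}

# Total reported for the Experimental dataset in the datasets table.
experimental_total = 66704

total_pairs = sum(pairs_by_type.values())
print(total_pairs, total_pairs == experimental_total)
```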
Percentages of RS with compositional bias (CB) values less than or greater than 1
| Genome type | Experimental dataset, CB < 1 | Experimental dataset, CB > 1 | Control dataset 1, CB < 1 | Control dataset 1, CB > 1 | Control dataset 2, CB < 1 | Control dataset 2, CB > 1 |
|---|---|---|---|---|---|---|
| dsDNA [a,b] | 85.3% (22678) | 14.6% (3883) | 69.3% (335436) | 30.6% (148037) | 53.5% (70490) | 46.2% (52069) |
| ssDNA [a,b] | 68.6% (1294) | 31.1% (586) | 60.8% (11868) | 38.8% (7577) | 51.8% (51409) | 47.7% (40962) |
| ssRNA | 55.9% (208) | 43.5% (162) | 53.1% (1072) | 46.6% (941) | 51.3% (160372) | 48.3% (151175) |
| dsRNA | 50.9% (27) | 49.1% (26) | 46.5% (532) | 53.1% (608) | 52.3% (6402) | 47.3% (5790) |
[a] The experimental set differs significantly from Control set 1 (p-value < 0.01, Fisher’s exact test)
[b] The experimental set differs significantly from Control set 2 (p-value < 0.01, Fisher’s exact test)
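The Fisher's exact test comparison mentioned in the footnotes can be sketched on the counts in the table. The following is a minimal standard-library implementation of a two-sided test on a 2x2 table (CB < 1 vs CB > 1, Experimental vs Control dataset 1). It assumes the test was two-sided and applied to exactly these counts, which the record does not state; the function names are hypothetical.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    log_denominator = log_comb(row1 + row2, col1)

    def log_prob(x):
        # Log-probability of a table whose top-left cell equals x.
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    log_observed = log_prob(a)

    p_value = 0.0
    for x in range(lowest, highest + 1):
        lp = log_prob(x)
        if lp <= log_observed + 1e-7:  # small tolerance for rounding error
            p_value += exp(lp)
    return min(1.0, p_value)


# Counts copied from the table: (CB < 1, CB > 1) per dataset.
p_dsdna = fisher_exact_two_sided(22678, 3883, 335436, 148037)  # dsDNA
p_dsrna = fisher_exact_two_sided(27, 26, 532, 608)             # dsRNA

print("dsDNA significant at 0.01:", p_dsdna < 0.01)
print("dsRNA significant at 0.01:", p_dsrna < 0.01)
```

Working in log space with `lgamma` avoids overflow in the binomial coefficients for counts of this size. Under these assumptions the dsDNA comparison falls below the 0.01 threshold and the dsRNA comparison does not, in line with the footnote markers in the table.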
Fig. 1. Percentages of sites with reduced numbers of occurrences in the viral genomes of different types. Blue circles are for the Experimental dataset, red squares are for Control dataset 1 (prokaryotic viral control), and gray diamonds are for Control dataset 2 (eukaryotic viral control)
Fig. 2. Percentages of RS with reduced numbers of occurrences calculated for the different types of R-M systems. Designations are the same as in Fig. 1
Fig. 3. Histograms of compositional bias values for different types of R-M systems. Dotted blue lines correspond to the subsets of the Experimental dataset, solid red lines are for Control dataset 1, and gray solid lines are for Control dataset 2
Fig. 4. Fractions of Type II sites with different CB values. Bar height corresponds to the fraction of sites with a reduced number of occurrences (CB < 1), the colored portion is for sites with CB < 0.8, and the hatched portion indicates the fraction with CB < 0.1. “Control” stands for the subset of Control dataset 1 with Type II sites and dsDNA genomes. (a) Comparison of temperate and non-temperate dsDNA viruses. (b) Comparison of the coliphages with or without the hydroxymethylase (HM) gene