Abstract
Diversity-generating retroelements (DGRs) are a unique family of retroelements that confer selective advantages on their hosts by accelerating the evolution of target genes through a specialized, error-prone reverse transcription process. DGRs were first identified in Bordetella phage BPP-1, where a DGR mediates phage tropism specificity by generating variability in the gene involved. They have been predicted to be present in a larger collection of viral and bacterial species. A minimal DGR system comprises a reverse transcriptase (RTase) gene, a template sequence (TR) and a variable region (VR) within a target gene. We developed a computational tool, DGRscan, that supports both de novo identification of DGR systems (based on the prediction of potential template–variable region pairs) and similarity-based searches that use known template sequences as references. Applying DGRscan to the Human Microbiome Project (HMP) datasets identified 271 non-redundant DGR systems, doubling the size of the collection of known DGR systems. We further identified a large number of putative target genes (651, sharing no more than 90% sequence identity at the amino acid level) that are potentially diversified by these DGR systems. Our study provides the first survey of DGR systems in the human microbiome, showing that they are frequently found in human-associated bacterial communities, although their incidence in individual genomes is low. Our study also provides functional clues for a large number of genes (reverse transcriptases and target genes) that were previously annotated as proteins of unknown or nonspecific function.
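The abstract reports 651 target genes that "share no more than 90% sequence identity at the amino acid level". The tool used to build that non-redundant set is not stated here, so the following is only a hypothetical sketch of one common approach: a greedy, longest-first filter in which each protein is kept only if its identity to every already kept protein is at most 90%. Identity is computed from a plain Needleman–Wunsch global alignment (identical columns divided by alignment length); the scoring parameters, function names and toy sequences are illustrative assumptions, not DGRscan's procedure.

```python
# Hypothetical sketch: greedy, longest-first removal of proteins that share
# more than a given identity with an already kept representative.


def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Identity of a Needleman-Wunsch global alignment with linear gap costs:
    identical aligned columns divided by the number of alignment columns."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back one optimal path, counting identical columns.
    i, j, same, cols = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            same += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        cols += 1
    return same / cols if cols else 1.0


def nonredundant(seqs: dict[str, str], max_identity: float = 0.90) -> list[str]:
    """Visit sequences longest first and keep one only if its identity to
    every sequence kept so far is at most max_identity."""
    kept: list[str] = []
    for name in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        if all(global_identity(seqs[name], seqs[k]) <= max_identity for k in kept):
            kept.append(name)
    return kept


proteins = {
    "p1": "MKTAYIAKQRQISFVKSHFSRQ",
    "p2": "MKTAYIAKQRQISFVKSHFSRE",  # one substitution relative to p1
    "p3": "MSLEQKKGADIISKILQIQNS",
}
print(nonredundant(proteins))  # p2 (about 95% identical to p1) is dropped
```

Processing the longest sequences first means each retained protein acts as the representative for shorter near-duplicates; other identity definitions (for example, dividing by the shorter sequence length) would give somewhat different sets.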
Year: 2014 PMID: 25196521 PMCID: PMC4159848 DOI: 10.3390/ijms150814234
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. DGRscan provides both de novo and homology-based predictions of DGR systems. DGRscan scans a genomic sequence for potential TR–VR pairs: repeats whose substitutions mostly involve adenines in the TR. Predicted RTase genes can also be used to constrain the search, so that the de novo search is applied only in the neighborhood of predicted RTase genes, which speeds it up. If TR sequences are given, DGRscan can be guided to look for DGR systems whose TRs (and VRs) are homologous to the reference TRs.
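Two ideas in this caption, an adenine-biased substitution pattern between TR and VR and a search restricted to the neighborhood of predicted RTase genes, can be illustrated with a minimal sketch. It is not DGRscan's implementation: it assumes the TR and VR are already aligned without gaps, and the function names and thresholds (`min_subs`, `min_a_fraction`, `flank`) are hypothetical.

```python
# Hypothetical sketch of two ideas from Figure 1; not DGRscan's code.


def substitution_profile(tr: str, vr: str) -> tuple[int, int]:
    """Count substitutions in a gapless TR/VR alignment and how many of
    them fall on an adenine in the TR."""
    if len(tr) != len(vr):
        raise ValueError("this sketch expects a gapless, equal-length alignment")
    subs = [t for t, v in zip(tr.upper(), vr.upper()) if t != v]
    return len(subs), subs.count("A")


def looks_like_tr_vr(tr: str, vr: str, min_subs: int = 3, min_a_fraction: float = 0.8) -> bool:
    """Hypothetical filter: enough substitutions, mostly at TR adenines."""
    total, at_a = substitution_profile(tr, vr)
    return total > 0 and total >= min_subs and at_a / total >= min_a_fraction


def search_windows(genome_length: int, rtase_spans: list[tuple[int, int]],
                   flank: int = 5000) -> list[tuple[int, int]]:
    """Merge [start - flank, end + flank) around each predicted RTase gene
    (0-based, half-open), clipped to the genome; only these windows are scanned."""
    windows = sorted((max(0, s - flank), min(genome_length, e + flank)) for s, e in rtase_spans)
    merged: list[tuple[int, int]] = []
    for s, e in windows:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))  # overlapping: extend
        else:
            merged.append((s, e))
    return merged


tr = "GCAACTAACGGT"
print(substitution_profile(tr, "GCTACTGCCGGT"), looks_like_tr_vr(tr, "GCTACTGCCGGT"))
print(looks_like_tr_vr(tr, "GAAACAAACGGA"))  # substitutions at C/T, not A
print(search_windows(50_000, [(10_000, 11_000), (14_000, 15_000), (40_000, 41_000)]))
```

Merging overlapping windows avoids scanning the same stretch twice when RTase genes lie close together; a real tool would also need to handle gapped alignments and the homology-guided mode described above.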
Identification of template regions (TRs) by DGRscan for the human virome dataset.
| Sequence (gi) | Previous study […]: length (bp) | Previous study: start–end | DGRscan: length (bp) | DGRscan: start–end |
|---|---|---|---|---|
| gi\|377805826\|gb\|JQ680351.1\| | 101 | 22,583–22,683 | 112 | 22,869–22,980 |
| gi\|377805855\|gb\|JQ680352.1\| | 139 | 7998–8136 | 131 | 7998–8128 |
| gi\|377805862\|gb\|JQ680353.1\| | 89 | 14,041–14,129 | 128 | 14,041–14,168 |
| gi\|377805758\|gb\|JQ680349.1\| | 97 | 39,459–39,555 | 126 | 39,449–39,574 |
| gi\|377805877\|gb\|JQ680354.1\| | 128 | 46,739–46,866 | 128 | 46,739–46,866 |
| gi\|377805785\|gb\|JQ680350.1\| | 101 | 17,450–17,550 | 128 | 17,450–17,577 |
| gi\|377805936\|gb\|JQ680355.1\| | 101 | 3782–3882 | 94 | 3782–3875 |
| gi\|377806015\|gb\|JQ680359.1\| | 116 | 4435–4550 | 126 | 4435–4560 |
| gi\|377805967\|gb\|JQ680357.1\| | 101 | 3148–3248 | 158 | 3091–3248 |
| gi\|377806003\|gb\|JQ680358.1\| | 128 | 5598–5725 | 128 | 5598–5725 |
| gi\|377805941\|gb\|JQ680356.1\| | 128 | 38,849–38,976 | 128 | 38,849–38,976 |
| gi\|377806060\|gb\|JQ680361.1\| | 78 | 21,703–21,780 | 116 | 21,692–21,807 |
| gi\|377806090\|gb\|JQ680362.1\| | 121 | 4920–5040 | 122 | 4918–5039 |
| gi\|377806097\|gb\|JQ680363.1\| | 101 | 4306–4406 | 110 | 4306–4415 |
| gi\|377806107\|gb\|JQ680364.1\| | 121 | 11,625–11,745 | 122 | 11,620–11,741 |
| gi\|377806133\|gb\|JQ680365.1\| | 101 | 32,701–32,801 | 94 | 32,685–32,778 |
| gi\|377806170\|gb\|JQ680366.1\| | 101 | 8872–8972 | 113 | 8818–8930 |
| gi\|377806186\|gb\|JQ680367.1\| | 120 | 41,651–41,770 | 123 | 42,437–42,559 |
| gi\|377806226\|gb\|JQ680368.1\| | 121 | 27,017–27,137 | 126 | 27,012–27,137 |
| gi\|377806251\|gb\|JQ680369.1\| | 85 | 3464–3548 | 115 | 3464–3578 |
| gi\|377806260\|gb\|JQ680370.1\| | 115 | 25,676–25,790 | 115 | 25,676–25,790 |
| gi\|377806297\|gb\|JQ680372.1\| | 151 | 2108–2258 | 121 | 2108–2228 |
| gi\|377806399\|gb\|JQ680376.1\| | 44 | 3993–4036 | – | – |
| gi\|377806374\|gb\|JQ680375.1\| | 101 | 23,576–23,676 | 75 | 23,569–23,643 |
| gi\|377806345\|gb\|JQ680374.1\| | 101 | 4412–4512 | 95 | 4420–4514 |
| gi\|377806301\|gb\|JQ680373.1\| | 101 | 20,259–20,359 | 133 | 20,252–20,384 |
| gi\|377806422\|gb\|JQ680377.1\| | 121 | 13,118–13,238 | 105 | 13,134–13,238 |
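In this table every length equals End − Start + 1, so the coordinates are inclusive. The sketch below compares the previous-study and DGRscan spans for three rows copied from the table, using base-pair overlap and a Jaccard index; choosing Jaccard as the agreement measure is our assumption for illustration, not a metric reported in the paper.

```python
Span = tuple[int, int]  # (start, end), 1-based and inclusive, as in the table


def span_length(span: Span) -> int:
    start, end = span
    return end - start + 1


def overlap_bp(a: Span, b: Span) -> int:
    """Number of bases shared by two inclusive spans (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def jaccard(a: Span, b: Span) -> float:
    """Shared bases divided by bases covered by either span."""
    inter = overlap_bp(a, b)
    return inter / (span_length(a) + span_length(b) - inter)


rows = {  # accession: (previous study, DGRscan)
    "JQ680351.1": ((22_583, 22_683), (22_869, 22_980)),
    "JQ680352.1": ((7998, 8136), (7998, 8128)),
    "JQ680357.1": ((3148, 3248), (3091, 3248)),
}
for acc, (prev, dgr) in rows.items():
    print(acc, span_length(prev), span_length(dgr), round(jaccard(prev, dgr), 3))
```

A Jaccard index of 0 (as for JQ680351.1) flags rows where the two predictions do not overlap at all, which is a different situation from a boundary shift of a few bases.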
Figure 2. A phylogenetic tree of the reverse transcriptases identified from the reference genomes (green) [12], the human virome (blue) [16] and the human microbiomes (this study; red). The RT from Bordetella phage BPP-1 is highlighted in the tree. The multiple alignment was built with MUSCLE [17], the FastTree program [18] with default parameters was used to reconstruct the neighbor-joining tree, and the tree was visualized with the Archaeopteryx tree viewer [19].
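The alignment and tree-building steps in this caption could be scripted roughly as follows. This is a hypothetical sketch, not the authors' script: it assumes MUSCLE 3.x command-line syntax (`-in`/`-out`; MUSCLE 5 uses `-align`/`-output` instead), a FastTree binary named `FastTree` on the PATH (some installations name it differently), and made-up file names.

```python
# Hypothetical sketch: MUSCLE 3.x alignment followed by FastTree.
import shutil
import subprocess


def build_commands(fasta: str, alignment: str) -> tuple[list[str], list[str]]:
    """Command lines for aligning RT protein sequences and building a tree."""
    muscle = ["muscle", "-in", fasta, "-out", alignment]  # MUSCLE 3.x syntax
    fasttree = ["FastTree", alignment]  # default settings; Newick tree on stdout
    return muscle, fasttree


def run_pipeline(fasta: str, alignment: str, tree: str) -> None:
    muscle, fasttree = build_commands(fasta, alignment)
    for tool in (muscle[0], fasttree[0]):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    subprocess.run(muscle, check=True)
    with open(tree, "w") as out:  # capture FastTree's stdout as the tree file
        subprocess.run(fasttree, check=True, stdout=out)


# Example (file names are hypothetical):
# run_pipeline("rt_proteins.fasta", "rt_proteins.aln", "rt_tree.nwk")
```

The resulting Newick file is the kind of input a tree viewer such as Archaeopteryx can open for coloring clades by data source.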
Figure 3. Diagrams of two new DGR systems identified from the Human Microbiome Project (HMP) datasets. Note that the RT genes encoded in these contigs were annotated as proteins of unknown function in the HMP annotations.
Breakdown of the functions (PFAM domains) assigned to the putative variable proteins identified from the HMP datasets.
| Pfam Domain | Description | Number of Proteins | Example |
|---|---|---|---|
| DUF1566 | Protein of unknown function; similar to Fib_succ_major | 47 | SRS015190.44356 |
| FGE-sulfatase | Sulfatase-modifying factor enzyme 1 | 42 | SRS052027.8709 |
| Fib_succ_major | | 9 | SRS014459.31157 |
| DUF3988 | Found by clustering human gut metagenomic sequences | 9 | SRS077730.37228 |
| Big_2 | Bacterial Ig-like domain (group 2) | 9 | SRS018351.18527 |
| DUF3751 | Phage tail-collar fiber protein | 8 | SRS013687.79027 |
| CotH | Members of this family include the spore coat protein H | 6 | SRS016989.185 |
| Big_3 | Bacterial Ig-like domain (group 3) | 6 | SRS018351.18527 |
| CarboxypepD_reg | Carboxypeptidase regulatory-like domain | 5 | SRS015663.100342 |
Figure 4. Representative variable proteins identified from the HMP datasets. (A) SRS015190.44356, with a DUF1566 domain; (B) structural superposition of 2Y3C (TvpA; red) and 4EPS (green); (C) SRS014459.31157, a predicted lipoprotein with Mfa2 (bisque) and Fib_succ_major (blue) domains; (D) the domain architecture of SRS018351.18527, which contains bacterial immunoglobulin-like domains; and (E) the alignment between the VR of this gene and the matching TR, showing 15 substitutions (highlighted in red), all at adenines in the template region.