
Highly similar structural frames link the template tunnel and NTP entry tunnel to the exterior surface in RNA-dependent RNA polymerases.

Dorothy M Lang, A T Zemla, C L Ecale Zhou.

Abstract

RNA-dependent RNA polymerase (RdRp) is essential to the replication of RNA viruses and is therefore one of the primary targets of countermeasures against these dangerous infectious agents. Development of broad-spectrum therapeutics targeting polymerases has been hampered by the extreme sequence variability of these proteins. RdRps range in length from 400 to 800 residues, yet contain only ∼20 residues that are conserved in most species. In this study, we made structure-based comparisons that are independent of sequence composition using a recently developed algorithm. We identified residue-to-residue correspondences among multiple protein structures and created two-dimensional, structure-based alignment maps of 37 polymerase structures that provide both sequence and structure details. Using these maps, we determined that ∼75% of each polymerase consists of seven protein segments, each of which is structurally highly similar to segments in other species, although the segments are widely divergent in sequence composition and order. We define each of these segments as a ‘homomorph’; each includes the corresponding well-known conserved polymerase motif, and most are much larger than that motif. All homomorphs contact the template tunnel or the nucleoside triphosphate (NTP) entry tunnel as well as the exterior of the protein, suggesting that they constitute a structural and functional skeleton common to the polymerases.



Substances:
Nucleotides
RNA Replicase

Year: 2012; PMID: 23275546; PMCID: PMC3561941; DOI: 10.1093/nar/gks1251

Source DB: PubMed; Journal: Nucleic Acids Res; ISSN: 0305-1048; Impact factor: 16.971

INTRODUCTION

The polymerase protein family has been studied extensively for more than 40 years. This interest has been motivated by the family's fundamental role in replicating the genetic material of all forms of life, and complicated by its sequence diversity. As more polymerase tertiary structures were solved, it became apparent that widely diverse sequences form highly similar structures. Until recently, however, there has been no time-effective computational method for comparing these structures in detail. The objective of this study was to clarify the relationship between structure and sequence in a group of RNA-dependent RNA polymerases (RdRps) that replicate many of the viruses that represent significant threats to life throughout the world. We selected well-studied species to maximize the amount of experimental data available for evaluating the association between functional residues and structure (Table 1). We used the StralSV algorithm (1) to perform structure comparisons between all of the selected species. We created maps of residue-to-residue (R2R) correspondence, from which we determined the boundaries of structurally similar segments, which we named ‘homomorphs’. In contrast to the relatively short motifs described previously, most homomorphs are long, and each provides a structural connection between the template tunnel or NTP entry tunnel and the exterior of the protein.

Table 1.

Viral species and PDB structures used as queries in this study

Queries	Abbreviations	PDB (reference)	Length of structure (residues)
Picornaviridae, RdRp, +ssRNA
Poliovirus	PV	1RA6 (3)	461
Poliovirus (open)	PV	3OL6 (9)	471
Poliovirus (closed)	PV	3OL7 (9)	471
Coxsackie virus	COXV	3DDK (11)	462
Human rhinovirus	HRV	1XR6 (12)	460
Human rhinovirus	HRV	1XR7 (12)	460
Foot and mouth disease virus	FMDV	1U09 (2)	476
Caliciviridae, RdRp, +ssRNA
Norwalk virus	NV	1SH0 (13)	510
Rabbit hemorrhagic disease virus	RHDV	1KHV (14)	516
Rabbit hemorrhagic disease virus	RHDV	1KHW (14)	516
Sapporo virus	SAPV	2CKW (15)	515
Flaviviridae, RdRp, +ssRNA
Hepatitis C virus	HCV	1NB4 (16)	570
Hepatitis C virus	HCV	1NB7 (16)	570
Bovine viral diarrhea virus	BVDV	1S48 (17)	609
West Nile (Kunjin) virus	WNV	2HCS (18)	595
Dengue virus	DENV	2J7U (19)	635
Cystoviridae, RdRp, dsRNA
Bacteriophage Phi6	PHI6	1HHS (20)	664
Bacteriophage Phi6	PHI6	1HI0 (20)	664
Reoviridae, RdRp, dsRNA
Reovirus lambda 3	REOV	1MUK (21)	1267
Reovirus lambda 3	REOV	1N35 (21)	1267
Rotavirus	ROTAV	2R7X (22)	1095
Birnaviridae, RdRp, dsRNA
Infectious bursal disease virus	IBDV	2PGG (5)	774
Infectious pancreatic necrosis virus	IPNV	2YI8 (23)	799
Infectious pancreatic necrosis virus	IPNV	2YI9 (23)	799
Infectious pancreatic necrosis virus	IPNV	2YIA (23)	799
RdDp
HIV1	HIV1	1RT1 (24)	560
HIV1 (closed)	HIV1	1RTD (25)	554
HIV1 (open)	HIV1	2HMI (26)	558
Tribolium castaneum	TERT	3DU6 (27)	596
DdRp
Bacteriophage T7 (closed)	T7 RNAP	1S77 (28)	883
Bacteriophage T7 (closed)	T7 RNAP	2AJQ (29)	704
Bacteriophage T7 (open)	T7 RNAP	1MSW (30)	883
Bacteriophage N4	N4	3C2P (31)	1117
DdDp
Thermus aquaticus (open)	TAQ	2KTQ (32)	538
Thermus aquaticus (closed)	TAQ	3KTQ (32)	540
Thermus aquaticus (open)	TAQ	4KTQ (32)	539
Bacteriophage T7	T7 DNAP	1T7P (33)	698

Queries of both open and closed structures were used to assess the effect of conformation on R2R correspondence. Enterobacteria bacteriophage T7 [NCBI:NC_001604], which infects Escherichia coli and has a 39 937-bp genome encoding 60 proteins, encodes both an RNA polymerase (T7 RNAP) and a DNA polymerase (T7 DNAP). T7 RNAP, typified by PDB:1S77, transcribes RNA from multiple initiation sites throughout the genome. T7 DNAP, typified by PDB:1T7P, is a Family A polymerase that ‘ … fills DNA gaps that arise during DNA repair, recombination and replication…’ (NCBI:NC_001604).

The tertiary structure of the replicative unit of most RdRps is highly conserved (2). It resembles a right hand, with finger-like folds curved inward to form a tunnel that encircles the template being processed (3). Most single-unit polymerases are 400–800 residues in length. Early polymerase studies found that ∼22 residues are highly conserved in all polymerases and that, in most species, they occur in the same sequential order (4). Some are clustered, with two to four highly conserved residues within a segment of ∼10 residues. The sequence segment that includes each highly conserved residue or cluster has been described as a motif. In most species, the motifs are arranged in the order G-F1-F2-F3-A-B-C-D-E. The birnaviruses (IBDV and IPNV) depart from this scheme because of a permutation that places Motif C before Motifs A and B (C-A-B) (5). The references for each motif and species that have been studied are listed in Table 2; these references were selected because they included either alignments of one or more motifs across several species, or alignments of a particular motif not found elsewhere. Apart from these conserved motifs, RdRp sequences are highly variable. An extensive study of picornaviruses by Koonin et al. (6) illustrates this variability.
Koonin et al. based their study on an alignment of 64 species generated with MUSCLE (multiple sequence comparison by log-expectation) (7) and adjusted manually. The alignment included four species that are also in our sample group; across these sequences of ∼550 residues, 32 residues were conserved.

Table 2.

Summary of reference publications that identify multiple motifs in RdRps included in the query set

Conserved residues	SxG	KxE; KxR; Q,K	DYxxxD	SGxxxTxxxN	GDD	GLTxxxxDK	LKR, E
Motifs	Motif G	Motif F	Motif A	Motif B	Motif C	Motif D	Motif E
PV	Xu et al. (34)	Ferrer-Orta et al. (2)	Poch et al. (4)	Poch et al. (4)	Poch et al. (4)	Poch et al. (4)	Ferrer-Orta et al. (2)
HRV	Pan et al. (5)	Pan et al. (5)	Pan et al. (5)	Love et al. (12)	Love et al. (12)	Love et al. (12)	Love et al. (12)
FMDV	Gorbalenya et al. (35)	Ferrer-Orta et al. (2)	Poch et al. (4)	Ferrer-Orta et al. (2)	Poch et al. (4)	Ferrer-Orta et al. (2)	Ferrer-Orta et al. (2)
NV	Pan et al. (5)	Pan et al. (5)	Ferrer-Orta et al. (2)	Ferrer-Orta et al. (2)	Ferrer-Orta et al. (2)	Pan et al. (5)	Ferrer-Orta et al. (2)
RHDV	Pan et al. (5)	Pan et al. (5)	Ferrer-Orta et al. (2)	Love et al. (12)	Love et al. (12)	Love et al. (12)	Ferrer-Orta et al. (2)
HCV	Pan et al. (5)	Choi et al. (17)	Bressanelli et al. (36)	Pan et al. (5)	Bressanelli et al. (36)	Bressanelli et al. (36)	Bressanelli et al. (36)
BVDV	Pan et al. (5)	Choi et al. (17)	Pan et al. (5)	Choi et al. (17)	Choi et al. (17)	Choi et al. (17)	Choi et al. (17)
HIV	Pan et al. (5)	Pan et al. (5)	Pan et al. (5)	Pan et al. (5)	Pan et al. (5)	Pan et al. (5)
PHI6	Pan et al. (5)	Butcher et al. (20)	Butcher et al. (20)	Pan et al. (5)	Pan et al. (5)	Pan et al. (5)	Butcher et al. (20)
REOV	Pan et al. (5)	Pan et al. (5)	Pan et al. (5)	Pan et al. (5)	Pan et al. (5)	Pan et al. (5)

The analysis that we present in the following pages demonstrates that highly similar structures can be formed by very different sequences. Structure comparison, rather than sequence comparison, enabled us to readily recognize functionally significant segments of similarity and difference between sequences.

MATERIALS AND METHODS

Sample selection and processing

In total, 18 well-studied viral species with solved RNA polymerase structures and four species with solved DNA polymerase structures were selected for analysis. We used the StralSV algorithm (http://proteinmodel.org/), described in detail previously (1), to perform the analyses. StralSV evaluates the R2R (structural) correspondence between a specified query structure and each structure in a set of reference structures, scanning the query from start to end in successive overlapping segments of a user-selected length. Each of the selected structures was used as a query against all structures available in the Protein Data Bank (PDB release 2011_01_25; 176 365 chains) (8). The results were filtered to retain structural segments with at least 55% LGA_S structure similarity (1) to at least one 90-residue query segment (the size cutoff for the structural context); R2R correspondences were then extracted from locally tightly superimposed spans, that is, continuous segments at least five residues long. Together, these parameters identified common regions of structure similarity, which were used to distinguish regions of conservation (structure matches) from regions in which structure deviates (non-matches). Comparing each initial query with all structures in the PDB yielded a final set of 37 PDB structures, which served as the reference polymerase structure set in this study. One representative of each species within this set was used in an all-against-all structure comparison with StralSV. The species and PDB identities of this reference set are summarized in Table 1.
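The two filters described above (a per-window similarity threshold, then extraction of continuous tightly superimposed spans) can be sketched as follows. This is a minimal illustration, not StralSV: it assumes that an LGA_S score has already been computed for each 90-residue query window and that each residue has already been flagged as tightly superimposed or not, both of which StralSV/LGA do internally. The function names `passing_windows` and `tight_spans` are hypothetical.

```python
from typing import List, Optional, Tuple

LGA_S_MIN = 55.0  # minimum LGA_S similarity (%) for a 90-residue query window to pass
MIN_SPAN = 5      # minimum length of a continuous, tightly superimposed span


def passing_windows(lga_s_by_start: List[float], threshold: float = LGA_S_MIN) -> List[int]:
    """Start indices of query windows whose precomputed LGA_S score meets the threshold."""
    return [start for start, score in enumerate(lga_s_by_start) if score >= threshold]


def tight_spans(tight: List[bool], min_len: int = MIN_SPAN) -> List[Tuple[int, int]]:
    """(start, end) pairs, end exclusive, of runs of tightly superimposed residues."""
    spans: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i, flag in enumerate(tight + [False]):  # trailing sentinel closes a final run
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start >= min_len:
                spans.append((run_start, i))
            run_start = None
    return spans


# Hypothetical inputs, for illustration only.
print(passing_windows([40.2, 56.0, 71.3, 54.9]))  # [1, 2]
print(tight_spans([True] * 6 + [False] + [True] * 3 + [False] + [True] * 5))  # [(0, 6), (11, 16)]
```

The three-residue run in the example is discarded because it is shorter than the five-residue minimum.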

Creation of structure maps

The output from the all-against-all structure comparisons was parsed to extract R2R correspondences for each query/template pair. In each comparison, the full query sequence was represented. At some positions in the template sequences, gaps occurred either because the structure deviation exceeded the alignment cutoff (5 Å) or because the template contained additional residues (e.g. a loop) with no correspondence in the query structure. For each query, a structure map was created by combining the pairwise (query/template) R2R correspondences from the StralSV comparison in an Excel spreadsheet. In this article, we report structure maps with poliovirus RdRp as the primary query (Figures 2–8); in most cases, we also include maps obtained with other species as queries (Figures 2 and 4–8). Alternative query species were used when structure similarity of some of the templates was not detected with poliovirus as the query (e.g. Motif G in WNV and DENV was identified with DENV as the query). The query responsible for each non-poliovirus match is indicated on the alignment by ‘q’ following the species abbreviation.
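The layout of one structure map (one row per template, one column per query position, ‘x’ where there is no correspondence) can be sketched in code. This assumes each template's R2R output has been reduced to a dictionary from query position to (template residue, deviation in Å); that representation and the names `map_row` and `structure_map` are hypothetical, not StralSV's output format. Because the map is indexed by query position, extra template residues such as loops simply do not appear.

```python
from typing import Dict, List, Optional, Tuple

CUTOFF_A = 5.0  # alignment cutoff (angstroms) quoted in the text

# One template's correspondences: query position -> (template residue, deviation in angstroms).
Correspondence = Dict[int, Tuple[str, float]]


def map_row(query_len: int, corr: Correspondence, cutoff: float = CUTOFF_A) -> str:
    """Render one template row over the full query length; 'x' marks no correspondence."""
    cells = []
    for pos in range(query_len):
        hit: Optional[Tuple[str, float]] = corr.get(pos)
        if hit is not None and hit[1] <= cutoff:
            cells.append(hit[0])
        else:  # missing pair, or deviation above the cutoff
            cells.append("x")
    return "".join(cells)


def structure_map(query_seq: str, templates: Dict[str, Correspondence]) -> List[str]:
    """Query row followed by one labelled row per template, like the spreadsheet layout."""
    rows = ["query".ljust(8) + query_seq]
    for name, corr in templates.items():
        rows.append(name.ljust(8) + map_row(len(query_seq), corr))
    return rows


# Hypothetical correspondences for a six-residue query segment.
t1 = {0: ("L", 1.0), 1: ("E", 0.8), 2: ("K", 6.2), 4: ("G", 2.0), 5: ("F", 1.1)}
for row in structure_map("LDKSGY", {"T1": t1}):
    print(row)
```

Position 2 becomes ‘x’ because its deviation exceeds 5 Å, and position 3 because no pair was reported.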

Figure 2.

(A) The residues comprising Motif G as described by Pan et al. (5) and Gorbalenya et al. (35) are indicated by a gold background, and the highly conserved residues by a yellow background. The homomorph relative to PV is 60 residues in length and includes the motif and segments on both sides of it (light blue background). Conserved residues outside of Motif G have a pink background. A small ‘x’ indicates no R2R structural correspondence at that position. The upper part of the alignment is based on a PV query. The middle section of the alignment, which includes WN and DENV, is based on a DENV query. Motif G as described by previous researchers is shown in the lower section of the figure. Both Pan et al. (5) and Gorbalenya et al. (35) identify Motif G in IBDV and IPNV, although Pan et al.’s description includes more residues. In PHI6, IBDV and IPNV, only a short segment of Motif G was found to have R2R correspondence with the RdRps, and only with an NV query; these segments are similar to those described by Pan et al. (5), but slightly shorter. (B) All the structural diagrams in this article are illustrated using poliovirus polymerase (PDB:1RA6). The N-terminal segment of hmG is shown in blue, Motif G in gold and the C-terminal segment of hmG in brown. (C) This view, rotated ∼90° from Figure 2B, shows the surface exposure of hmG and the residues within the motif that line the wall of the tunnel.

The structure maps were used as the basis for the structure alignments described in this article. On all structure maps, we marked Motifs A–F as described by Gong and Peersen (9) by coloring the background of the columns matching the motif residues orange and that of the columns matching the highly conserved residues yellow. A similar coloring scheme was used for Motif G, except that it depicts the residues identified by Pan et al. (5), because Motif G was not specified in the Gong and Peersen study (9). On all structure maps, we colored the residues of the picornaviruses blue; the caliciviruses green; HCV and BVDV (flaviviruses) black; WN and DENV (flaviviruses) red; PHI6 black; REOV brown; ROTAV, IBDV and IPNV black; HIV purple; TERT, T7 RNAP and N4 black; TAQ turquoise; and T7 DNAP black. The segment of conserved structure adjacent to each motif was determined from the StralSV maps. For each query and each motif, we noted the position at which structure conservation became discontinuous in most species. We defined the boundaries of a homomorph as the positions at which the structural segment shared by all representatives in a set became discontinuous in more than two species. In all structure maps, the conserved segments identified from StralSV R2R correspondences are colored light blue. We defined the homomorph of each motif as the segment consisting of the conserved motif plus the adjacent structurally conserved segments. The length of each homomorph varied somewhat depending on the query. For each query species, the start and end of the homomorphic segment of each motif were recorded, and a 20 × 20 matrix was generated for each motif to compile the data from all queries (data not shown). These matrices were used to identify the minimum start and maximum end of each homomorph, and these values are summarized in Supplementary Table S1.
These values are plotted in Figure 1, which illustrates the maximal expanse of the homomorphic segments that include each of the polymerase motifs. All of the tertiary structures were illustrated using the Cn3D program (10).
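The boundary rule and the minimum-start/maximum-end step can be sketched as below. This is a hypothetical reconstruction of a step performed by inspecting spreadsheets: it assumes each species' map row is a string over query positions with ‘x’ for missing correspondence, that extension starts from the motif itself, and that per-query bounds have already been converted to one shared numbering. All names are illustrative.

```python
from typing import Dict, List, Tuple

MAX_BREAKS = 2  # a column may lack correspondence in at most this many species


def breaks_at(rows: List[str], pos: int) -> int:
    """Number of species rows with no correspondence ('x') at a query position."""
    return sum(1 for row in rows if row[pos] == "x")


def homomorph_bounds(rows: List[str], motif: Tuple[int, int],
                     max_breaks: int = MAX_BREAKS) -> Tuple[int, int]:
    """Extend an inclusive motif interval outward until a column breaks in > max_breaks species."""
    start, end = motif
    while start > 0 and breaks_at(rows, start - 1) <= max_breaks:
        start -= 1
    while end + 1 < len(rows[0]) and breaks_at(rows, end + 1) <= max_breaks:
        end += 1
    return start, end


def combine_queries(bounds_by_query: Dict[str, Tuple[int, int]]) -> Tuple[int, int]:
    """Minimum start and maximum end over queries, assuming one shared numbering."""
    starts, ends = zip(*bounds_by_query.values())
    return min(starts), max(ends)


# Hypothetical map rows for four species over eight query positions.
rows = ["xxAAAAAx", "xAAAAAAx", "xAAAAAxx", "AAAAAAAx"]
print(homomorph_bounds(rows, (3, 4)))                   # (1, 6)
print(combine_queries({"PV": (1, 6), "DENV": (0, 5)}))  # (0, 6)
```

In the example, position 0 stops the leftward extension (three species break there) and position 7 stops the rightward one (all four break).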

Figure 1.

The distributions and lengths of the homomorphs and conserved motifs of the viral polymerases in this study. The homomorphs of each species are shown from the start of Motif G (sequence position 0) through the end of Motif E. The homomorphs and motifs are colored as follows: G (green), F1–F2–F3 (maroon–gray–black), A (blue), B (gold), C (red), D (purple) and E (aqua). The darker bars show the sequence positions of the homomorphs, and the lighter bars show the sequence positions of the motifs as described by Gong and Peersen (9) or Pan et al. (5). The number of residues from the start of the polymerase structure to the start of the first homomorph is given for each species at the left of the chart. For KUNJ, DENV and TAQ, the PDB structures, and consequently the sequence position numbers used throughout this article, do not begin at the start of the polymerase; for these species, the distance from the start of the polymerase is shown after the slash. For species lacking Motif G, the first identified homomorph is indicated at the left of the start position. The length of the polymerase of each species is listed at the right of the chart.

RESULTS

Overview

Structural examination of the sequence motif regions revealed extended regions of structural conservation. We named each of these regions a ‘homomorph’, defined as a sequence segment that shares a highly similar tertiary structure with other species, independent of sequence composition. Most homomorphs were at least twice as long as the corresponding sequence motif; the extent of this expansion is illustrated in Figure 1. The length of each homomorph was determined separately for each species, using a single structure per species as the query in a StralSV analysis. Because the start and end of each homomorph depend on structural similarity to a given query, the locations of homomorph ends vary slightly between queries, as can be seen in Figure 1. Within the homomorphs, non-matching residues can be used to identify minor differences between species. In most single-stranded RdRp (ss-RdRp) species (PV, COXS, HRV, FMDV, NV, RHDV, SAPV, HCV, BVDV, WN and DENV), the homomorphs of Motif G are the largest (median of 53 residues), followed by A (49), B (46), E (37), F3 (28), D (23), C (17), F1 (10) and F2 (8). In double-stranded RdRps (ds-RdRps) (PHI6, REOV, ROTAV, IBDV and IPNV), the homomorphs of Motif G are relatively short (median of 12 residues). The median lengths of the other ds-RdRp homomorphs are B (48), A (41), E (31), F3 (28), D (16), C (12), F1 (11) and F2 (2); most of these are shorter than in the ss-RdRps. The lengths and occurrences of homomorphs in polymerases associated with DNA (HIV, TERT, TAQ, T7 DNAP, T7 RNAP and N4) are variable and are discussed in the sections describing each motif. The homomorphs of all species are similarly distributed over the length of the polymerase (Figure 1). The ss-RdRps (PV, COXS, HRV, FMDV, NV, RHDV, SAPV, HCV, BVDV, KUNJ and DENV) are most similar to each other.
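Per-motif medians such as those quoted above could be computed from per-species homomorph lengths (the data behind Supplementary Table S1) along the lines of this sketch. The input layout and the name `group_medians` are assumptions, and the lengths in the example are placeholders, not values from the study.

```python
from statistics import median
from typing import Dict, Iterable

# species -> {motif: homomorph length in residues}
Lengths = Dict[str, Dict[str, int]]


def group_medians(lengths: Lengths, group: Iterable[str]) -> Dict[str, float]:
    """Median homomorph length per motif over a group; species lacking a motif are skipped."""
    members = list(group)
    motifs = sorted({m for sp in members for m in lengths.get(sp, {})})
    return {
        motif: median(lengths[sp][motif] for sp in members if motif in lengths.get(sp, {}))
        for motif in motifs
    }


# Placeholder lengths, not values from the study.
demo = {"S1": {"G": 50, "A": 49}, "S2": {"G": 56, "A": 47}, "S3": {"G": 53}}
meds = group_medians(demo, ["S1", "S2", "S3"])
print(meds)                                      # {'A': 48.0, 'G': 53}
print(sorted(meds, key=meds.get, reverse=True))  # ['G', 'A']
```

Skipping species that lack a motif (rather than counting them as zero) mirrors the text, where, for example, hmG is absent from several ds-RdRps.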
The spacing between homomorphs is more variable in the ds-RdRps (PHI6, REOV, ROTAV, IBDV and IPNV) and generally larger than in the ss-RdRps. The homomorph of Motif C (hmC) is identified in the birnaviruses despite a sequence rearrangement that places it before Motif A (5). Relatively large segments between homomorphs occur in PHI6 between F1 and F3, and in KUNJ, DENV and PHI6 between B and C. The spacing between motifs is notably reduced in HIV and TERT (RdDps). In the birnaviruses (IBDV and IPNV), the homomorphs of C and A are only three residues apart, and the distance between the homomorphs of Motifs F3 and C is greater than the typical F3-B distance. Most homomorphs are separated from each other by a segment that contains a turn (secondary structure), or have a turn at their beginning or end. In every RdRp, all motifs lie within a span of 375 residues. In the DdRps (T7 RNAP and N4), the motifs are spread over approximately 600 residues. For most RdRps, R2R correspondence, determined from the minimum and maximum values of all homomorphs, covers ∼75% of the span from Motif G through Motif E (Supplementary Table S1).
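One way to read the ∼75% coverage figure is as the fraction of positions, from the start of the first homomorph to the end of the last, that fall inside at least one homomorph. The sketch below implements that interpretation, which is our assumption rather than a formula given by the authors; intervals are inclusive and in one species' own numbering, and the example intervals are hypothetical.

```python
from typing import List, Tuple


def coverage(homomorphs: List[Tuple[int, int]]) -> float:
    """Fraction of the first-start..last-end span covered by the union of homomorphs."""
    ordered = sorted(homomorphs)
    span_start = ordered[0][0]
    span_end = max(end for _, end in ordered)
    covered = 0
    cur_start, cur_end = ordered[0]
    for start, end in ordered[1:]:
        if start <= cur_end + 1:  # overlapping or adjacent: extend the current block
            cur_end = max(cur_end, end)
        else:                     # gap between homomorphs: close the current block
            covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    covered += cur_end - cur_start + 1
    return covered / (span_end - span_start + 1)


# Hypothetical intervals: 25 of 30 positions are covered.
print(round(coverage([(0, 9), (5, 14), (20, 29)]), 3))  # 0.833
```

Merging overlaps before counting matters because homomorph ends obtained with different queries can overlap.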

Homomorph of Motif G results

The structurally aligned sequences that comprise the homomorph of Motif G (hmG) are summarized in Figure 2A. The R2R correspondences of WN and DENV could not be evaluated for the Motif G region (approximately PV 101–121), because the structures of the segments of these viruses expected to match hmG had not been determined. Within the homomorph, most of the ss-RdRps were highly similar (Figure 2A, top). BVDV is similar to the other ss-RdRps in the N-terminal segment, but does not match them in the C-terminal segment.
Only Motif G, and not a homomorph, was identified in PHI6, IBDV and IPNV. In the region of Motif G, StralSV did not identify R2R correspondences between any of the RNA polymerases and REOV, ROTAV, HIV, TERT or the DNA-dependent polymerases (TAQ, T7 DNAP, T7 RNAP and N4). There were structural discontinuities within the homomorph (marked by x in Figure 2A) and similar discontinuities within the motif. These minor discontinuities identify species-specific differences within a segment that is otherwise highly continuous in several species. For example, FMDV has three residues between S87 and T90 (PV numbering) and therefore does not match the structure of PV 88-LD-89, which has only two. In contrast, WN and DENV have the same number of residues in the gaps from A388–R392 and G385–R389, respectively, but these segments were not structurally aligned by StralSV within the parameters used in this study. The segment PV Y102–A109 is a β-hairpin unique to picornavirus RdRps (11). The residue numbers shown for NV and HCV confirm that these regions are continuous in these species (and in the other caliciviruses, RHDV and SAPV, which are not numbered). Figure 2B and C illustrate the tertiary structure of the homomorph using a poliovirus structure (PDB:1RA6). Most of the N-terminal segment is a single helix that extends over nearly half of the surface of the protein. Both ends of the homomorph terminate at the exterior surface of the protein. The distance between the homomorphs of Motifs G and F1 was 20–37 residues in the ss-RdRps (in all species where both were present) and longer in the ds-RdRps (median 47 residues) (Figure 1).
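The residue counting behind the FMDV and WN/DENV examples is simple arithmetic on the anchor residue numbers given above. The sketch assumes, as a simplification, that a gapless residue-to-residue match of a segment requires the same number of intervening residues in both proteins; `intervening` is a hypothetical helper, and the FMDV count is taken directly from the text.

```python
def intervening(left: int, right: int) -> int:
    """Residues strictly between two anchor residue numbers."""
    return right - left - 1


pv = intervening(87, 90)      # PV S87 ... T90 encloses 88-LD-89
fmdv = 3                      # FMDV count reported in the text (PV numbering)
wn = intervening(388, 392)    # WN A388 ... R392
denv = intervening(385, 389)  # DENV G385 ... R389

# Unequal counts rule out a gapless residue-to-residue match of the segment.
print(pv, fmdv, pv == fmdv)   # 2 3 False
# Equal counts are necessary but not sufficient: StralSV still did not align WN and DENV here.
print(wn, denv, wn == denv)   # 3 3 True
```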

Homomorph of Motif F results

Three components of Motif F have been recognized: F1, F2 and F3 (2,5). In some species, there are sequence segments between these motifs. In every species in our sample set that has R2R correspondence within Motif F, except PHI6, the three F motifs are continuous; we have therefore combined them, together with the adjacent structurally aligned segments, into a single homomorph. The structurally aligned sequences that comprise the homomorph of Motif F (hmF) for the RdRps and HIV are summarized in Figure 3A. HmF extends five residues upstream from the N-terminal edge of Motif F1 and ∼20 residues downstream from Motif F3 [both motifs as defined by Gong and Peersen (9)]. HmF1 was found in all RdRp species except WN and DENV; R2R correspondence could not be evaluated for this segment of WN and DENV because its structure has not been resolved. Motifs F1 and F2 are always continuous when F2 is present, and Motif F2 is present in most species. Motif F2 is represented by a single residue in PHI6, REOV and ROTAV (ds-RdRps), two residues in HIV (RdDp), 15 residues in BVDV and 10 residues in HCV (two of which are structurally aligned to the other RdRps). Where Motif F2 is longer than two residues, it varied in length from 6 to 15 residues. In PHI6, there was a 61-residue segment between F1 and F3. HmF3 was present in all RdRp species.
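Combining continuous F1, F2 and F3 motifs into one homomorph amounts to an interval merge followed by adding the flanks. In this sketch the flank defaults (5 residues upstream, 20 downstream) follow the PV-relative values quoted above, the second value being approximate; motif positions in the example are hypothetical, and the failing case mimics a 61-residue gap such as the one reported for PHI6. The names `merge_intervals` and `homomorph_f` are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) residue positions


def merge_intervals(intervals: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Merge intervals separated by at most max_gap residues (0 = touching or overlapping)."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] - 1 <= max_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def homomorph_f(f1: Interval, f2: Interval, f3: Interval,
                upstream: int = 5, downstream: int = 20) -> Interval:
    """Fuse continuous F1, F2 and F3 motifs and add flanks (defaults follow the PV-relative text)."""
    blocks = merge_intervals([f1, f2, f3])
    if len(blocks) != 1:
        raise ValueError(f"F motifs are not continuous: {blocks}")
    start, end = blocks[0]
    return start - upstream, end + downstream


# Hypothetical motif positions for a species whose F1-F3 are continuous.
print(homomorph_f((150, 160), (161, 168), (169, 180)))  # (145, 200)
```

Raising an error for discontinuous motifs, rather than silently merging across the gap, keeps cases like PHI6 visible for separate treatment.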

Figure 3.

(A) The residues of Motifs F1 and F2, according to Gong and Peersen (9), are indicated by an orange background and the highly conserved residues by a yellow background. HmF includes Motifs F1, F2 and F3 and the residues with a light blue background. Lower-case residues are present but do not match the query structure. (B) HmF (light blue), illustrated using the poliovirus structure 1RA6, begins about five residues upstream from Motif F1 (dark blue) and ends at hmF3 (dark brown), about 20 residues downstream from Motif F3 (orange). Highly conserved residues from Motifs F1, F2 and F3 line the template tunnel (the empty space within the crystal structure near the intersection of F1–F2–F3). Motif F2, which varies in length and composition between species, is at the exterior of the protein. (C) Most of hmF is folded back on itself, and Motif F2 (gray) is at the apex of this fold at the exterior surface of the protein. (D) The terminal residues of hmF (blue) are located at the surface of the protein. In addition to the N-terminal (blue) and C-terminal (brown) residues of hmF, several residues of the C-terminal segment (brown), 186-AMRMA-190, are also at the surface of the protein. (E) The length of Motif F2 (black) is highly variable in different species. Motif F3 (gold) is the start of a highly conserved segment of the homomorph that transects the protein and terminates at a nearly opposite exterior surface.

Figure 3B and C illustrate the tertiary position of hmF. Most of the structure is hairpin-like, with some residues of Motif F2 at the apex, which lies at the exterior surface of the protein. HmF1 and hmF3 are approximately parallel for several residues. HmF3 then extends independently to the surface of the protein, approximately opposite the Motif F2 site. Figure 3D shows the N- and C-terminal residues and some residues of the C-terminal segment of hmF3 at the surface of the protein. Figure 3E shows the position of Motif F2 relative to the template tunnel.
The segments between hmF and hmA are 8–17 residues long in the ss-RdRps and PHI6, 28–30 residues in the ds-RdRps (REOV, ROTAV, IBDV and IPNV), 30–40 residues in the RdDps (HIV and TERT) and DdRps (T7 RNAP and N4), and 102–131 residues in the DdDps (TAQ and T7 DNAP) (Figure 1).
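Inter-homomorph segment lengths like these follow directly from homomorph start and end positions. The sketch sorts homomorphs by start position so that permuted orders (such as C-A-B in the birnaviruses) are handled without special cases; the positions in the example are hypothetical and the function name `spacings` is illustrative.

```python
from typing import Dict, Tuple


def spacings(homomorphs: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Residues between consecutive homomorphs, taken in sequence order.

    Intervals are inclusive (start, end). Sorting by start keeps permuted orders such as
    C-A-B in the birnaviruses correct; a negative value would mean overlapping homomorphs.
    """
    ordered = sorted(homomorphs.items(), key=lambda item: item[1][0])
    gaps: Dict[str, int] = {}
    for (name_a, (_, end_a)), (name_b, (start_b, _)) in zip(ordered, ordered[1:]):
        gaps[f"{name_a}-{name_b}"] = start_b - end_a - 1
    return gaps


# Hypothetical homomorph positions, for illustration only.
print(spacings({"G": (0, 59), "F": (90, 150), "A": (160, 220)}))     # {'G-F': 30, 'F-A': 9}
print(spacings({"A": (300, 340), "C": (280, 296), "B": (350, 395)}))  # {'C-A': 3, 'A-B': 9}
```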

Homomorph of Motif A results

The structurally aligned sequences that comprise the homomorph of Motif A (hmA) are summarized in Figure 4A. The homomorph (relative to PV) extends 15 residues beyond each end of Motif A, plus the length of a species-specific loop in the N-terminal segment. All ss-RdRps (PV, COXS, HRV, FMDV, NV, RHDV, SAPV, HCV, BVDV, WN and DENV) are well aligned in the N-terminal segment. PHI6, ROTAV, IBDV and IPNV (ds-RdRps) have fewer aligned residues. REOV (ds-RdRp) and HIV and TERT (RdDps) do not have R2R correspondence with the ss-RdRps. The DdRps (T7 RNAP and N4) and DdDps (TAQ and T7 DNAP) share a homomorphic structure within the N-terminal segment, but it differs substantially from the RdRp structure and is therefore not included in the homomorph or in Figure 4A. Within the motif, HIV corresponds only to NV and SAPV (found only with an HIV query), indicating a significant structural difference from the other species; HIV also lacks R2R correspondence beyond the motif and is therefore not included in the homomorph. In the C-terminal segment of the homomorph, most species in the sample set other than HIV have a homologous structure. At some positions within hmA, a particular residue is conserved throughout one viral family (e.g. the picornaviruses), and a different residue is conserved in another viral family at the same position. This within-family sequence conservation (≥75%) occurs at the following positions (shown in Figure 4A, PV numbering): 214, 234, 237–240, 245 and 249.
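The within-family conservation test can be sketched for a single alignment column as follows. Two choices here are our assumptions: the 75% fraction is taken over family members that have a residue at that column (‘x’ positions are ignored), and a column counts as family-specific when at least two families are conserved and they conserve different residues. The residues in the example are hypothetical, not read from Figure 4A.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

THRESHOLD = 0.75  # within-family conservation threshold quoted in the text
Call = Optional[Tuple[str, float]]  # (dominant residue, fraction) or None


def family_consensus(column: Dict[str, str], families: Dict[str, List[str]],
                     threshold: float = THRESHOLD) -> Dict[str, Call]:
    """Per family, the dominant residue in one alignment column if it reaches the threshold."""
    calls: Dict[str, Call] = {}
    for family, members in families.items():
        residues = [column[sp] for sp in members if column.get(sp, "x") != "x"]
        if not residues:
            calls[family] = None
            continue
        residue, count = Counter(residues).most_common(1)[0]
        fraction = count / len(residues)
        calls[family] = (residue, fraction) if fraction >= threshold else None
    return calls


def is_family_specific(calls: Dict[str, Call]) -> bool:
    """True if at least two families are conserved and they conserve different residues."""
    conserved = [call[0] for call in calls.values() if call is not None]
    return len(conserved) >= 2 and len(set(conserved)) > 1


# Hypothetical residues at one column (not taken from Figure 4A).
families = {"picorna": ["PV", "COXV", "HRV", "FMDV"], "calici": ["NV", "RHDV", "SAPV"]}
column = {"PV": "K", "COXV": "K", "HRV": "K", "FMDV": "R", "NV": "E", "RHDV": "E", "SAPV": "E"}
calls = family_consensus(column, families)
print(calls)                      # {'picorna': ('K', 0.75), 'calici': ('E', 1.0)}
print(is_family_specific(calls))  # True
```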

Figure 4.

(A) HmA (light blue background) extends about 15 residues from each side of Motif A as defined by Gong and Peersen (9) (orange background), and includes highly conserved residues (yellow background). Residues that do not align with the query are indicated by an ‘x’. In light blue cells containing a number, the number is the sequence position of the adjacent match for each species; the numbers in the white column between them give the length of the unmatched sequence in each species. In this segment, each species has more residues than PV has between the corresponding positions, indicating that this region is a loop that is absent in PV and whose length varies by species. At the left of the alignment (209–214, uncolored), there is a structure common to several species, but it occurs in too few species to qualify the region as part of the homomorph. (B) In this view of poliovirus (PDB:1RA6), the N-terminal segment of the homomorph is blue and the C-terminal segment is brown. The terminal residues of hmA are at the exterior surface of the protein. Motif A is centered within the homomorph at the wall of the template tunnel. (C) The terminal residues of the homomorph and the helix adjacent to each form part of the protein surface. (D) In PV, an insertion (red) at the C-terminal edge of the motif, L241-i-S242, is lethal (42). A species-specific loop (green) affects the catalytic rate in PV (37).

Within the N-terminal side of the homomorph, at the edge of the motif (PV 226–227), there is a minor discontinuity in structural similarity (Figure 4A). A white column within the figure gives, for each species, the entire span over which this discontinuity extends. However, the loop represented by this discontinuity varies in length by only one to four residues. Figure 4B illustrates the tertiary structure of hmA.
Each end of the homomorph is at the exterior surface of the protein (Figure 4C), and its center, the conserved Motif A, is at the surface of the template tunnel. The overall configuration of the homomorph is spring-like (Figure 4D). The species-specific loop within the homomorph is located at the exterior of the protein. The sequence segment between the homomorphs of Motif A and Motif B (hmB) is ∼4–20 residues long in the RdRps and mostly longer than 20 residues in the DNA-dependent polymerases. It is relatively long in REOV (41 residues), T7 RNAP (81) and N4 (98). In the birnaviruses (IBDV and IPNV), Motif C precedes Motif A in sequence; this rearrangement is described later, in the section on Motif C.

HmB results

Motif B is a component of the largest homomorph identified in the RdRps. The homomorph begins 21 residues upstream from Motif B and extends 10 residues downstream. The motif is 15 residues long. The size of the homomorph is consistent in most species. The structurally aligned sequences that comprise the homomorph are shown in the top section of Figure 5A. They include all the RdRps in the sample set plus TERT (RdDp). Each of these species matched a poliovirus query, indicating there is greater structural similarity than in other homomorphs and motifs. The N-terminal segment of the homomorph contains some discontinuities that are resolved by using R2R matches for alternative queries (Figure 5A, lower section). The C-terminal segment of the homomorph is well represented in all RNA polymerases and TERT. No R2R correspondence was found between the residues comprising hmB in the RNA polymerases and residues in the DNA-dependent polymerases (T7 RNAP, N4, TAQ and T7 DNAP).

Figure 5.

(A) A StralSV alignment based on the PV query is located at the top section of this figure. The middle section compares some of the R2R matches found using other queries with those found using the poliovirus query. Residues common to both alignments indicate extremely close matches, and those represented differently are closer to the respective query. The species TAQ, N4, T7 DNAP and T7 RNAP align with each other over about 20 residues, but do not align structurally with the other species. (B) HmB [illustrated using poliovirus (PDB:1RA6)] is the largest homomorph in the RdRps. The N-terminal segment of HmB (blue) begins at the exterior surface of the protein, folds back on itself to form a classical β-hairpin (at the apex, PV 275-YKN-278), and then continues to Motif B (gold), which is located at the surface of the template tunnel. The C-terminal segment of the homomorph (brown) continues as a single chain to the exterior surface of the protein. (C) The N-terminal segment of hmB is at the surface of the protein. (D) Each of the terminal residues of hmB, and the apex of the N-terminal loop, are on the exterior surface of the protein.

The lower section of Figure 5A illustrates the dependence of the R2R correspondence on the query sequence. These differences make it possible to detect fine structural differences between structures. Our definition of each of the homomorphs, however, is based on the inclusion of all R2R alignments using all queries in the sample set. The position of hmB within the tertiary structure of PV is illustrated in Figure 5B. The N-terminal residue is at the exterior surface of the protein. The N-terminal segment is a classical β-hairpin that is folded back on itself and is almost entirely exposed on a surface nearly perpendicular to the face of the protein that contains the N-terminal residue (Figure 5C). The base of the loop transitions to Motif B at the template tunnel.
The C-terminal side of the homomorph extends from the tunnel to the exterior surface of the protein (Figure 5D). The distance between the homomorphs of Motifs B and C (hmC) (Figure 1) is <6–17 residues in all RdRps except KUNJ and DENV, where it is 36 and 35 residues, respectively. In the DNA-dependent polymerases, this distance is between 98 (TAQ) and 258 (N4) residues. In IBDV and IPNV, the segments between the homomorphs of Motifs B and D are 16 and 11 residues, respectively.
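A homomorph is defined from all R2R alignments across all queries, not from the poliovirus query alone. This amounts to a set union per species. The sketch below assumes, hypothetically, that every query's hits have already been translated into PV residue numbering. The data layout is illustrative, and the species names and positions in the test are placeholders.

```python
from collections import defaultdict
from typing import Dict, Set

# Hypothetical input: query name -> species -> set of reference (PV)
# residue numbers at which that species has an R2R match for that query.
QueryHits = Dict[str, Dict[str, Set[int]]]


def merge_queries(hits: QueryHits) -> Dict[str, Set[int]]:
    """Union the matched reference positions for each species over all queries."""
    merged: Dict[str, Set[int]] = defaultdict(set)
    for per_species in hits.values():
        for species, positions in per_species.items():
            merged[species] |= positions
    return dict(merged)


def gained_by_alternatives(hits: QueryHits, primary: str) -> Dict[str, Set[int]]:
    """Positions contributed only by queries other than `primary`."""
    merged = merge_queries(hits)
    base = hits.get(primary, {})
    gained: Dict[str, Set[int]] = {}
    for species, positions in merged.items():
        extra = positions - base.get(species, set())
        if extra:
            gained[species] = extra
    return gained
```

The second function mirrors the comparison drawn in the lower part of Figure 5A. It reports which matches appear only when a query other than PV is used.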

HmC results

The structurally aligned sequences that comprise hmC are shown in Figure 6A. Motif C is the only RdRp motif that is not a component of a larger homomorphic structure. The segments immediately adjacent to both flanks of Motif C do not even cluster into subgroups. Motif C is short (12 residues in most RdRps) and folds sharply back on itself (Figure 6B). The highly conserved residues (labeled Motif C) are at the surface of the template tunnel, and both the N-terminal and C-terminal residues are at the exterior surface of the protein (Figure 6C).

Figure 6.

(A) The high number of species that align to PV indicates that the structure of Motif C is highly conserved. Matches for N4, TAQ and T7 RNAP could be identified, although only with a T7 DNAP query. HmC is the only homomorph for which there is R2R correspondence in all species of the study group. (B) Motif C (gold) is the only motif in the RdRps that is not a component of a larger structure. Motif C [illustrated using poliovirus (PDB:1RA6)] is tightly folded upon itself in a manner that places the highly conserved residues (yellow) at the tunnel wall, whereas the N-terminal segment of the motif (blue) and C-terminal segment of the motif (brown) are parallel to each other and penetrate the protein. (C) The terminal residues of both the N- and C-terminal segments are at the surface of the protein.

In the birnaviruses IBDV and IPNV, there is a sequence inversion that relocates Motif C to a position immediately preceding Motif A. Figure 7A shows an alignment that documents this inversion. The top and bottom segments of Figure 7A show that all species are well aligned upstream of Motif C (IPNV positions 365–372) and within Motif A (IPNV positions 399–409). RHDV, SAPV and BVDV are not well aligned within Motif C using the IPNV query, and are therefore missing from the middle section of Figure 7A (IPNV positions 382–393). The numbering of IPNV and IBDV is sequential, indicating that Motif C precedes Motif A in these species. The numberings of NV and HCV indicate that there are R2R matches with IPNV at Motif C, but that over this segment the matches are not in sequential order. Using a PV query, however, all of these species have R2R matches over this segment (shown in Figure 6A). The IPNV query indicates that the structure of Motif C in the birnaviruses more closely matches NV and HCV than the other species in the sample set.
The difference in linear order that results from the sequence inversion is compensated by a modified structure that maintains the motifs within a tertiary position that is similar to all other RdRps (Figure 7B and C).
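Two questions underlie the reading of Figure 7A. Do matched residues run in the same order in both chains, and in what order do the motifs occur in a given species? Each can be expressed as a small check. The function names and residue numbers below are hypothetical. Only the underlying idea comes from the text: when two chains share sequence order, matched residues increase along the query.

```python
from typing import Dict, List, Optional

# Hypothetical R2R map from one query: query residue -> species residue
# (None where there is no structural match).
R2RMap = Dict[int, Optional[int]]


def is_sequential(r2r: R2RMap) -> bool:
    """True if the matched species residues increase along the query."""
    matched = [s for _, s in sorted(r2r.items()) if s is not None]
    return all(a < b for a, b in zip(matched, matched[1:]))


def motif_order(starts: Dict[str, int]) -> List[str]:
    """Order motif labels by their first residue number in one species."""
    return sorted(starts, key=lambda label: starts[label])
```

Consider a species whose Motif C residues map to higher numbers than its Motif A residues while the query runs C then A. It fails `is_sequential` over the combined segment. `motif_order` then returns the C, A, B order reported for IBDV when given start positions of that shape.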

Figure 7.

(A) The sequence of the IPNV query is listed vertically in this table, with the sequence position number of each IPNV residue at the left in the first column. The R2R corresponding residues for each species are listed in the columns to the right. In birnaviruses (IBDV and IPNV), unlike other RdRps, Motif C precedes Motif A. The top segment of this figure (IPNV positions 365–372) shows that all species that align to IPNV are well aligned prior to Motif C. IPNV, NV, HCV and IBDV are aligned in Motif C (IPNV positions 382–393), but there are no R2R matches for RHDV, SAPV and BVDV. In Motif A, at the bottom segment of the figure (IPNV positions 399–409), all species are again well aligned (though fewer residues are aligned in RHDV). (B) In poliovirus (illustrated here with PDB:1RA6), the sequence order of homomorphs is: A (blue), B (gold) and C (red). (C) In IBDV, the difference in sequence order is compensated by a modified structure that maintains the motifs within a tertiary position that is similar to all other RdRps. The sequence order of the IBDV homomorphs is: C (red), A (blue) and B (gold).

HmD results

The structurally aligned sequences that comprise the hmD are shown in Figure 8A. The homomorph is 21 residues long and consists of a 10-residue extension from the N-terminal edge of the motif plus the motif itself. The structure of the N-terminal segment is more highly conserved (i.e. has more R2R matches) than the motif. Various query sequences were tested with the expectation that they would capture additional alignments. The middle section of Figure 8A illustrates that this produced some improvement. For example, using an HCV query, there are R2R matches to TERT, TAQ and T7 DNAP. The C-terminal edge of the motif has some R2R correspondence, suggesting that the structure of the motif is moderately conserved. Using T7 DNAP as a query (lowest segment of the figure), only a small portion of the C-terminal edge of Motif D and a few species have similar structures. There is no alignment of PHI6 within the N-terminal segment of the homomorph, because in this region PHI6 consists of a 24-residue loop between the end of Motif C and the start of Motif D.

Figure 8.

(A) The structure of the N-terminal segment of hmD (blue background) is more conserved than that of Motif D (orange background). There is a high amount of conservation of some residues (pink background) within the motif. Residues consistent with the motif described by Gong and Peersen (9) are bold. In the middle section of the figure, different queries were used to identify more of the R2R correspondences within the motif. In the lowest section of the figure, the alignment is based on a T7 DNAP query. There is an off-by-one alignment of ROTAV (compare lines labeled ‘ROTAV, HCV q’ and ‘ROTA, T7 DNAP q’, both italicized), suggesting an unusual structural conformation. (B) The homomorph of Motif D [illustrated using poliovirus (PDB:1RA6)] includes the motif itself (gold) and an adjacent upstream segment (blue). The motif lines the wall of the template tunnel. (C) Most HmD residues are located on the surface of the protein. (D) The terminal residues of hmD are located at the exterior surface of the protein, on a different face than the main section of the homomorph.

HmE results

The structurally aligned sequences that comprise hmE are summarized in Figure 9A. HmE is large and in most of the ss-RdRps (PV, COXS, HRV, FMDV, NV, RHDV, SAPV, HCV, BVDV, WN and DENV) it is highly conserved. The motif is near the N-terminal edge and a loop region is located near the middle of the homomorph. The sequences vary in length due to the loop region. The length of hmE in the caliciviruses (30–34 residues) is shorter than in the picornaviruses (36–37 residues); HCV and BVDV loops are 37 and 35 residues, respectively, and the loops of WN and DENV are the longest at 38 and 39 residues, respectively. There is strain-specific amino acid variability in this segment of HRV. HmE is well represented by all RdRps. No R2R correspondence was found with HIV or TERT. These species, however, are structurally matched to each other (Figure 9A, middle section). There is considerable sequence similarity between PV and DENV within this homomorph; this is illustrated in the bottom section of Figure 9A by the shaded conserved residues. DdRps and DdDps are not included in the analysis of this region because the region is missing from the structures in our sample group.

Figure 9.

(A) Motif E is a component of a large, well-defined homomorph. There is considerable sequence similarity between PV and DENV in this homomorph, illustrated at the bottom of the figure. TERT and HIV have R2R correspondence over the motif segment and three amino acids on each side of it. HIV does not have any R2R correspondence with any species in the sample set in this region. (B) Motif E (gold) [illustrated using poliovirus (PDB:1RA6)] forms part of the NTP entry tunnel. Both the N-terminal segment of the homomorph (blue) and the C-terminal segment (brown) extend to the surface of the protein. (C) From the C-terminal edge of the motif, the homomorph extends to the surface of the protein to expose a species-specific loop of variable length (green), then turns, transects the protein and extends to an opposite surface.

The tertiary structure of hmE is illustrated in Figure 9B and C. Most of the homomorph is at the exterior of the protein near the NTP entry tunnel. Although it has extensive surface exposure, each terminus of the homomorph appears to be anchored by residues that are not part of the homomorph; as a result, the terminal residue at each end of the homomorph is exposed as a single residue at the exterior surface of the protein. Motif E is located near the N-terminal edge of the homomorph and contacts the surface of the NTP entry tunnel (2). The C-terminal segment of the homomorph is folded back on itself in a manner that places the species-specific loop at the surface of the protein (Figure 9C). The homomorph forms a double strand through PV-M392, beyond which the remainder of the homomorph is a single-stranded helix that emerges at the exterior surface of the protein. In PV, the C-terminal residue of hmE (R402) is exposed at the surface of the protein and surrounded by the segment 28-SAFHYVFEG-36.

DISCUSSION

Structure-based sequence alignment using the StralSV algorithm (1) enabled us to identify seven distinct homologous structures in most of the polymerases in our collection of 22 species. In the RdRps, the combined regions of structural homology represent ∼75% of the sequence from the start of homomorph of Motif G (hmG) through the end of hmE in each species (∼375 residues). There is <10% conservation of sequence composition among these species. Each of the homomorphs includes a sequence motif consisting of characteristic highly conserved functional residues that are essential to replication. The tertiary position of each of the homomorphs includes at least one residue (and sometimes more) in contact with the exterior surface of the protein and one or more highly conserved functional residues located within or at the wall of the template tunnel. We defined the boundaries of a homomorph as the position where the structural segment shared by all representatives in a set became discontinuous in more than two species. For many queries, this position could be confidently identified. However, these positions sometimes varied by one or two residues, depending on the query sequence. Query-dependent differences in R2R matches were also observed within the motifs themselves, where minor differences in structure resulted in a lack of R2R matches for short segments of some queries. Our approach was to set the boundary at the position where most queries were in agreement, but to keep in mind that these edges might vary by one or two residues. Poliovirus had R2R correspondence with other species in the sample set more often than did any other structure. In almost all instances, we were able to map functional features of other proteins to a structurally similar segment of poliovirus. This property of centrality makes it a useful template for polymerase structure properties.
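The boundary rule stated above has two steps. First, extend the shared segment until it becomes discontinuous in more than two species. Then keep the position on which most queries agree. The sketch below assumes a hypothetical data layout (species mapped to the set of reference positions it matches) and hypothetical function names, and it assumes the motif itself is shared. It is not StralSV code. One boundary pair per query would come from `extend_boundaries`, and those pairs are then passed to `consensus_boundary`.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical layout: species name -> set of reference residue numbers at
# which that species has an R2R match for one query.
Coverage = Dict[str, Set[int]]


def extend_boundaries(coverage: Coverage, motif: Tuple[int, int],
                      max_missing: int = 2) -> Tuple[int, int]:
    """Grow a segment outward from a motif, one residue at a time.

    Extension stops at the first position where more than `max_missing`
    species lack a match (the segment 'becomes discontinuous'), or where
    no species has a match at all.
    """
    def shared(pos: int) -> bool:
        present = sum(1 for hits in coverage.values() if pos in hits)
        missing = len(coverage) - present
        return present > 0 and missing <= max_missing

    start, end = motif
    while shared(start - 1):
        start -= 1
    while shared(end + 1):
        end += 1
    return start, end


def consensus_boundary(per_query: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Take the start and end positions reported by the most queries."""
    starts = Counter(start for start, _ in per_query)
    ends = Counter(end for _, end in per_query)
    return starts.most_common(1)[0][0], ends.most_common(1)[0][0]
```

Taking a majority over queries, rather than the widest or narrowest boundary, follows the text's remark that edges may shift by one or two residues depending on the query.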

HmG discussion

HmG is shared by picornaviruses, caliciviruses and flaviviruses, although the structures of each of these groups begin to diverge within the C-terminal segment of Motif G. Motif G is characterized by the conserved motif [T/S]x(1–2)G (T or S, followed by one or two residues of any type, then G), which is located near the outer edge of the template tunnel. The motif may enforce the correct orientation of essential residues and a primer (35). Each flank of the homomorph contains amino acid residues that significantly affect the life cycle of the species. In PV, mutations at the N-terminal residue of the homomorph (D71A/E72A) are lethal (37). Mutations located outside the N-terminal edge of the motif (PV D105A/E108A) result in small plaques (37). Downstream from the C-terminal edge of the motif, there is a nuclear localization signal (NLS) in the picornaviruses and caliciviruses. The NLS is located two residues from the C-terminus of the homomorph, and mutations in the NLS (K125A/K126A/K127A and K127A/R128A/D129A) are lethal to PV (37).
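A sequence-only search for the Motif G consensus is easy to write. Writing it also shows why structural context matters, because a pattern this short will match often by chance in a protein several hundred residues long. The sketch assumes our reading of the consensus: T or S, then one or two residues of any kind, then G. The regular expression and the test sequence are illustrative and were not taken from the study.

```python
import re
from typing import List, Tuple

# Assumed reading of the Motif G consensus: T or S, one or two residues of
# any kind, then G. The lookahead lets hits that share residues be reported.
MOTIF_G = re.compile(r"(?=([ST].{1,2}G))")


def find_motif_g(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for each candidate site.

    Only one match is reported per start position; when both spacer
    lengths fit, the greedy quantifier prefers the two-residue spacer.
    """
    return [(m.start() + 1, m.group(1)) for m in MOTIF_G.finditer(seq)]
```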

HmF discussion

Previous research found that Motif F occurs in all RdRps (38), that it recognizes the incoming NTP (39), and that it serves as the primary fidelity checkpoint for RdRp and reorients the triphosphate into the proper position for efficient catalysis (40). HmF is an extensive structure with surface exposure at both ends and near its mid-section at Motif F2 (Figure 3A–D). Motif F2 (Figure 3E) is analogous to the loop in hmG that varies in composition and length; it is upstream of a highly conserved motif and is species-specific. The large size of this homomorph and its positioning, which transects the protein while maintaining contact with the template tunnel, are consistent with its established role in transcription, which requires both fine-scale stability and large-scale mobility. Motif F3 consists mostly of basic amino acid residues and forms the roof of the NTP entry tunnel (41); the characteristic conserved arg residue is essential to nucleotide binding (38). The required orientation of the F motifs would be stabilized by the loop formed by hmF and the double-stranded segment formed by the extension of the homomorph beyond the motifs. Both the N-terminal and C-terminal residues of the homomorph are exposed at the exterior surface of the protein. In PV, mutations of residues adjacent to the N-terminal residue of the homomorph are lethal: G149-i-I150 (42) and H149A/K150A (37).

HmA discussion

The conserved residues of Motif A (in PV, D233 and D238) control the function of the metal ions at the active site (41,43), which perform the phosphotransfer essential to polymerase activity (41). D233 is ligand to the metal (44). D238 is essential to NTP binding (3). Similar functions for the residues of Motif A have been identified for HCV (45), HRV (45) and FMDV (2). Motif A is centered within a spring-like homomorph (Figure 4B–D). Each end terminates at the exterior surface of the protein, and the beginning and end of the homomorph terminate nearly opposite each other. Mutations in the N-terminal segment of the homomorph (in PV at E226A/E227A) result in small plaques (37), suggesting that these residues influence the rate of catalysis. This is the region where species-specific structures protrude from the homomorph (PV L224–L229). This position, relative to the conserved motif, is analogous to a similar structure in hmG and Motif F2. All these structures contain a segment that varies in length and composition by species and is located upstream from a highly conserved motif, essential to replication. An insertion at the C-terminal edge of Motif A (L241-i-S242) is lethal (Figure 4A) (42). The structure of the homomorph is highly conserved in this region, suggesting that the structural consequences of an insertion are not tolerated. The position of this lethal insertion is similar to the position of lethal mutations in Motif G, although the major effect in hmG may be the loss of the nuclear localization signal. Mutations near the C-terminal residue of the Motif A homomorph (PV G257) affect function: E254A/K255A is lethal (37), and the insertion I256-ile-G257 results in temperature sensitivity (46). HmA provides a structural connection between the functional residues at the template tunnel and the exterior surface of the protein. Sequence residues immediately adjacent to the motif have a high degree of functionality. 
It is possible that the orientation of the conserved segments that comprise the homomorph would be affected by changes in the orientation of residues at the edges of the motif. N-terminal and C-terminal segments of the homomorph are helices, which are likely to be relatively rigid.
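The mutation labels in these sections follow a few compact conventions. The parser below is a hypothetical sketch of our reading of them, which is an assumption and not a published specification:

- ‘D71A/E72A’ is one mutant carrying both substitutions.
- ‘Y326[CHIMS]’ is a set of alternative single mutants.
- ‘L241-i-S242’ and ‘I256-ile-G257’ each denote the insertion of one residue, given as a one- or three-letter code, between two adjacent positions.

All class and function names here are illustrative.

```python
import itertools
import re
from dataclasses import dataclass
from typing import List, Union

THREE_TO_ONE = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "cys": "C",
    "gln": "Q", "glu": "E", "gly": "G", "his": "H", "ile": "I",
    "leu": "L", "lys": "K", "met": "M", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
}


@dataclass(frozen=True)
class Substitution:
    wild_type: str  # one-letter code of the original residue
    position: int
    mutant: str     # one-letter code of the replacement


@dataclass(frozen=True)
class Insertion:
    after: int      # residue number immediately preceding the insertion
    residue: str    # one-letter code of the inserted residue


Change = Union[Substitution, Insertion]

# Assumed forms: "D71A" or "Y326[CHIMS]"
SUBSTITUTION = re.compile(r"([A-Z])(\d+)(?:([A-Z])|\[([A-Z]+)\])")
# Assumed forms: "L241-i-S242" or "I256-ile-G257"
INSERTION = re.compile(r"([A-Z])(\d+)-([a-z]|[a-z]{3})-([A-Z])(\d+)")


def _alternatives(token: str) -> List[Change]:
    """Parse one token into the alternative changes it allows."""
    m = INSERTION.fullmatch(token)
    if m:
        left, right = int(m.group(2)), int(m.group(5))
        code = m.group(3)
        if right != left + 1:
            raise ValueError(f"insertion flanks are not adjacent: {token}")
        if len(code) == 3 and code not in THREE_TO_ONE:
            raise ValueError(f"unknown residue code: {code}")
        residue = code.upper() if len(code) == 1 else THREE_TO_ONE[code]
        return [Insertion(after=left, residue=residue)]
    m = SUBSTITUTION.fullmatch(token)
    if m:
        targets = m.group(3) or m.group(4)
        return [Substitution(m.group(1), int(m.group(2)), t) for t in targets]
    raise ValueError(f"unrecognized mutation label: {token}")


def expand(label: str) -> List[List[Change]]:
    """Expand a label into the list of mutants it denotes.

    '/' joins changes carried by the same mutant; '[...]' lists
    alternatives, each of which gives a separate mutant.
    """
    options = [_alternatives(token) for token in label.split("/")]
    return [list(combo) for combo in itertools.product(*options)]
```

Under this reading, `expand("E254A/K255A")` yields a single mutant with two substitutions. `expand("Y326[CHIMS]")` yields five separate single mutants.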

HmB discussion

Motif B is near the center of a very large homomorph that contacts the exterior surface at nearly opposite positions. As stated by Bruenn (38), Motif B forms the base of the template-entry channel and may function in guiding the template entry into the active site. Choi et al. (17) observed that the highly conserved asn (N414 in BVDV) is conserved in all picornaviruses. Hansen et al. (47) found that in HRV, N297 is involved in positioning NTP for recognition. Ferrer-Orta et al. (2) determined that the equivalent FMDV-N307 and D245 (Motif A) together are involved in ribonucleoside triphosphate (rNTP) selection. Tao et al. (21) and Butcher et al. (20) proposed that Motif B interacts with the 2′-OH group on the incoming nucleotide. Korneeva and Cameron (48) determined that FMDV-N307 interacts with the C-terminal-OH in the uridylylation complex, but with the 2′-OH in the elongation complex. The role of Motif B in the mechanisms of active site closure has recently been described in detail by Gong and Peersen (9). These experiments document the role of the highly conserved asn in the motif in multiple species and suggest that structural alignment may be useful for identifying potentially functionally equivalent residues in structures that have R2R correspondences. StralSV structure analysis indicates that the structure of Motif B is highly conserved in all RdRps, unlike some of the other motifs that have unmatched R2R correspondences. This highly conserved structure is consistent with its role in NTP recognition. An insertion in Motif B at PV C290-s-S291 is lethal (42). Within the N-terminal segment of hmB, the mutation PV-K276L results in small plaques (49). The structural position of this mutation (within the homomorph and upstream from the motif) is similar to that of rate-affecting mutations in the homomorphs of Motifs G and A.
At the C-terminal end of the homomorph, in BVDV (BVDV-F426), mutation of residues C427, S428 and R447 to ala reduces primer-dependent RNA elongation and abolishes de novo synthesis (17).

HmC discussion

Motif C is not a component of a larger structurally conserved segment, but has the same key features as the other homomorphs. It is folded in a manner that places the apex of the fold at the wall of the template tunnel, and both the N-terminus and C-terminus at the exterior surface of the protein (Figure 6B and C). Therefore, Motif C as defined in the literature constitutes the homomorph. The absence of R2R correspondence adjacent to hmC indicates that the structures of the adjacent sequence segments are highly specific to each species. HmC is highly conserved in the RdRps and highly similar to the DNA-dependent polymerases. Although there is a sequence inversion in the birnaviruses (Motif C precedes Motif A), Figure 7B and C illustrate that, despite the difference in sequence order, the homomorphs occupy a similar tertiary position. StralSV analysis indicates that the structure of Motif C is highly conserved in the RNA-dependent polymerases, though slightly different in the DNA-dependent polymerases (Figure 6A). Motif C is part of the classic ‘RRM-fold’ that forms the core of the palm domain of all these polymerases (together with that part of Motif A that forms a β-sheet with Motif C). Experimental studies have demonstrated that several residues within Motif C are sensitive to the position and composition of mutations. The highly conserved residues, GDD, occur near the center of the motif. The primary function of these residues is to coordinate the metal ions associated with the incoming rNTP (45,43). In PV, mutation of D to E in either or both positions (D328 or D329) is lethal (50). In HCV, the mutation G317A is also lethal (51). However, in birnaviruses the highly conserved residues are ADN rather than GDD, and mutation to GDD increases RNA synthesis activity (5). Certain mutations immediately upstream from the GDD motif are lethal in PV: Y326[CHIMS] (50). This is similar to the effect of the L241-i-S242 insertion at the downstream edge of Motif A in PV.
Near the N-terminal end of Motif C in HCV (HCV T312), the mutation D311A characterizes chronic hepatitis (52). Mutation at the edge of a highly conserved structure seems to have a substantial effect on the viral life cycle. The R2R comparisons summarized in these structure maps identify the types of sequence variability that can occur while the same spatial structure is maintained (Figure 6A), and show which variations in composition can be tolerated even within a key functional motif. The R2R correspondence of other RdRps with the birnaviruses in Motif C (Figure 7A), despite the sequence inversion in the birnaviruses (order C, A, B), supports the premise that conservation of structure is a significant, if not dominant, factor in evolution.
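The residue variability that an R2R map tolerates at each position can be tabulated column by column. This sketch assumes that each species row has already been laid out against the query, with None where there is no match. The GDD/ADN example follows the text, on the assumption that the birnavirus ADN triplet aligns to the GDD triplet. The `hypothetical_sp` row is invented to show a missing match.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def column_profile(
    query_seq: str, aligned: Dict[str, List[Optional[str]]]
) -> List[Tuple[str, Dict[str, int], Optional[float]]]:
    """Summarize residue variability at each query position.

    `aligned` maps each species to the residue it places at every query
    position according to the R2R map (None where there is no match).
    Returns (query residue, observed residue counts, fraction identical
    to the query residue) for each position; the fraction is None when
    no species matches that position.
    """
    profile = []
    for i, q_res in enumerate(query_seq):
        observed = [row[i] for row in aligned.values() if row[i] is not None]
        counts = Counter(observed)
        identity = counts[q_res] / len(observed) if observed else None
        profile.append((q_res, dict(counts), identity))
    return profile
```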

HmD discussion

HmD is different from the other homomorphs in that it lies mostly on the surface of the protein (Figure 8C). Like the others, however, its terminal residues are located at a distinctive surface (Figure 8D). In the case of hmD, they come from an opposite surface rather than from the interior of the protein. The N-terminal segment of hmD is more conserved than the motif itself, which forms the C-terminal segment. The motion of Motif D in the active state has not been captured by the existing structures in the PDB (40). Therefore, the lack of R2R correspondence in the motif may reflect the limitations of the available structures. Residues within hmD perform varied functions. In PV, for example, polymerases form an extensive lattice through polymerase–polymerase interactions; L342 and D349, located within hmD, contribute to interface I of this lattice (53). The most highly conserved residue within the homomorph is a gly (PV G351) at the N-terminal edge of the motif and central to the homomorph; a gly in this position would facilitate the folding of the homomorph, and is consistent with Cameron et al.’s (40) hypothesis that Motif D may be the most dynamic structural element of RdRps and RTs. Another conserved residue is a lys near the C-terminal edge (PV-K359). Residues equivalent to PV-K359 supply a proton to the nucleotidyl transfer reaction, which increases the rate constant for nucleotide addition by 50- to 1000-fold (40). In PV, within the motif, the insertion T353-t-M354 results in small plaques, likely due to delayed RNA synthesis (54). In other homomorphs, mutations that affect the rate of synthesis occur more commonly outside of the motifs. Immediately downstream from the homomorph, PV-T362I is an attenuating mutation for the Sabin vaccine (40).

HmE discussion

HmE is ∼36 amino acids in length (Figure 9A), and well represented by all RNA-dependent members of the sample set, except that no correspondence was found with HIV or TERT. These two species, however, are structurally matched to each other. There is considerable sequence similarity between PV and DENV within this homomorph, shown in the bottom segment of Figure 9A. Appleby et al. (55) determined that Motif E is unique to RNA polymerases. HmE forms part of the NTP entry tunnel and has a considerable amount of exposure on the surface of the protein. The species-specific loop within hmE is at the outermost edge of the protein, a feature found in other homomorphs (G, F2 and A; Figures 2A, 3A and 4A, respectively). Huang et al. (25) found that the Motif E loop region acts as a pivot point for thumb subdomain movement upon template–primer binding. Motif E may also function in the proper positioning of the thumb relative to the palm (5). The turn of the loop projects into the active site cavity, where it has been implicated in helping to position the 3′ end of the primer strand for attachment to the α-phosphate of the NTP during phosphoryl transfer (56). Motif E in HCV plays a role in binding the priming nucleotide (not the incoming nucleotides) (38); HCV has a longer loop (Figure 9A), possibly related to this function. In PV, the C-terminal residue of hmE (R402) emerges from the protein into the segment 28-SAFHYVFEG-36. This segment contains residues F30 and F34, which interact with W403 to maintain the polymerase structure (39).

SUMMARY

Comparisons of the tertiary structures of the RdRps of 18 viral species indicated that most of the highly conserved residues essential to polymerase function are embedded in large sequence segments that are highly conserved structurally, yet disparate in composition. We have named these conserved segments ‘homomorphs’ and have identified the composition and length of each homomorph that includes previously recognized polymerase motifs (Table 2). We have demonstrated that the RNA polymerases have structural skeletons (frames) that are highly conserved, with flexible segments between them, and that extensive segments of structural similarity can be identified by the methods we have described. These methods are applicable to studies of other groups of proteins, and we anticipate that by assessing structural similarity independent of sequence composition, skeletal frameworks will be found in other groups of proteins. Additionally, after structural similarity is identified, differences between members of the group become readily apparent. All of the homomorphs included residues that connect the template tunnel or the NTP entry tunnel with the outer surface of the protein. Although some of the surface residues within these homomorphs have specific functional roles, as reported in the literature (see citations in previous paragraphs), we anticipate that they may all be important for polymerase function; the consistent occurrence of homomorphs embedding motifs, even when a defined sequence motif is small in size, suggests a structure–function relationship between the motif and its structurally conserved flanking regions. It would be interesting to explore the possibility that interactions at the surface of the protein (e.g. protein–protein contact at surface homomorph residues) may subtly affect function buried deep beneath, within the tunnel. Furthermore, each homomorph is either divided by or separated from another homomorph by a flexible secondary structure.
Identification of the span of each homomorph and of its terminal residues enables us to identify specific surface residues that would not, in many cases, otherwise be noticed. By comparing experimental data with the surface locations of the ends of the homomorphs, we have found that these are often the sites of key functional interactions of the protein. A paper describing these sites is in preparation. We have compared the effects of currently recognized mutations within the motifs and within the homomorphs. Most mutations within the motifs are function-specific, related to a change in either charge or size, and in most cases the mutations are lethal [Motif A (42), Motif B (42), Motif C (50,51)]. Mutations outside of the motif (but within the homomorph) are more often rate-related and located in a segment that bulges from the homomorph by an amount that varies by species (Figures 2A, 3A and 4A). These differences support the hypothesis that residues actively involved in template processing are essential to viability, and that most of them are components of a consistent, stable structure that places and/or maintains them in their appropriate functional position. However, the practice of mutating residues to ala has resulted in a somewhat ‘all or nothing’ perspective on mutations. StralSV analysis can facilitate informed selection of alternative residues of various compositions, which could affect replication rates to different extents. Experiments involving this type of testing would enhance predictive models and may provide new insights for the design and development of medical countermeasures. The extension of all homomorphs from the template tunnel to the exterior of the protein was an unexpected finding. Its universality in the polymerase family suggests functional significance. Residues within the homomorphs that were localized to the surface often had species-specific loops.
The most likely reason these features have not been identified previously is the limitations of existing sequence and structure comparison tools, in particular their limited ability to perform multi-species comparisons of structures using overlapping windows of a user-determined size, and to select the criteria for R2R matches. The homomorphs as defined in this work add structural clarity and context to sequence-based functional motifs previously observed by numerous authors performing comparative studies among polymerases. The structure maps created from the R2R correspondences identified by the StralSV algorithm provided a unique and informative perspective on structure and function in RdRps. They readily identify unique regions of each species and those shared by proteins within a family. These features would be useful for studies of any protein family. Based on the results of this study, it may be possible to define characteristic homomorphs for many other protein families, despite considerable sequence variation. It may be feasible to classify homomorphs in a manner analogous to the SCOP database, and in doing so provide new insight into protein evolution. The StralSV algorithm simultaneously, rapidly and quantitatively identifies the similarities and differences of the structural components of multiple species and provides an output that facilitates the comparison of three-dimensional structure information. StralSV enabled us to cluster protein segments that have the same tertiary structure, independent of sequence variability. In a sense, it is an analog of BLAST, although based on structure rather than sequence. The precision of StralSV makes it easy to identify small differences between and within species. The ability to process multiple species at the same time can greatly accelerate our understanding of the differences between them.
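StralSV's own scoring is not reproduced here. As a rough stand-in, the sketch below compares fixed-length windows of Cα coordinates using a superposition-free distance-matrix RMSD (dRMS). We substitute this measure purely to illustrate the idea of scanning overlapping, user-sized windows of one structure against another, independently of sequence. The window size, the cutoff, the coordinate format (a list of (x, y, z) tuples) and the function names are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def drms(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Distance-matrix RMS deviation between two equal-length Cα traces.

    Compares all intra-window pairwise distances, so no superposition is
    needed; note that it cannot distinguish a trace from its mirror image.
    """
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two traces of equal length >= 2")
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            diff = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
            total += diff * diff
            pairs += 1
    return math.sqrt(total / pairs)


def window_hits(query: Sequence[Coord], target: Sequence[Coord],
                window: int = 8, cutoff: float = 1.0
                ) -> List[Tuple[int, int, float]]:
    """Slide a fixed-size window along the query and report, for each
    query window, the best-matching target window if it is within cutoff.

    Returns (query start, target start, dRMS) with 0-based starts.
    """
    hits = []
    for qi in range(len(query) - window + 1):
        q = query[qi:qi + window]
        best = min(
            ((drms(q, target[ti:ti + window]), ti)
             for ti in range(len(target) - window + 1)),
            default=None,
        )
        if best is not None and best[0] <= cutoff:
            hits.append((qi, best[1], best[0]))
    return hits
```

Because dRMS depends only on internal distances, a target that contains a rotated and translated copy of the query is recovered at the correct offset, whatever the residue identities.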
The identified structural associations may also facilitate the transfer of structure-related functional information among proteins. The traditional view of the relationship between the amino acid sequence of a protein and its tertiary structure has been that sequence determines structure. Under this premise, sequence-based evolutionary studies and phylogenies would inherently incorporate structure. In this study of RdRps, we demonstrated that structure accommodates substantial sequence variability and that highly diverse sequences can generate highly similar tertiary structures. Structure-based phylogeny may therefore provide new perspectives on protein evolution.
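
One simple way to quantify the point that highly diverse sequences can form highly similar structures is to compute sequence identity only over the residue pairs that a structural comparison has already placed in correspondence: low identity over pairs that superpose closely is the pattern described above. This is a minimal sketch; the `pairs` list is assumed to come from some structure comparison (for example, R2R output), and the function name and toy sequences are hypothetical.

```python
from typing import Iterable, Tuple


def identity_over_pairs(seq_a: str, seq_b: str,
                        pairs: Iterable[Tuple[int, int]]) -> float:
    """Percent of structurally corresponding residue pairs (0-based indices
    into seq_a and seq_b) whose amino acids are identical."""
    pairs = list(pairs)
    if not pairs:
        return 0.0  # no correspondences: identity is undefined, report 0
    same = sum(1 for i, j in pairs if seq_a[i] == seq_b[j])
    return 100.0 * same / len(pairs)


# Toy example: four structurally matched positions, two identical residues.
print(identity_over_pairs("GDDCVL", "SDDLLA", [(0, 0), (1, 1), (2, 2), (3, 3)]))  # 50.0
```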

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Tables 1 and 2.

FUNDING

Lawrence Livermore National Laboratory (DOE Contract DE-AC52-07NA27344); UC-LLNS Fees grant (PI: CEZ). Funding for open access charge: University of California and Lawrence Livermore National Security Fees Grant (PI: CEZ).

Conflict of interest statement. None declared.

REFERENCES

55 in total; 10 are listed here.

1. Y Wang, L Y Geer, C Chappey, J A Kans, S H Bryant. Cn3D: sequence and structure views for Entrez. Trends Biochem Sci, 2000-06.

2. S D Hobson, E S Rosenblum, O C Richards, K Richmond, K Kirkegaard, S C Schultz. Oligomeric structures of poliovirus polymerase are important for function. EMBO J, 2001-03-01.

3. Kenneth K S Ng, Maia M Cherney, Ana Lopez Vazquez, Angeles Machin, Jose M Martin Alonso, Francisco Parra, Michael N G James. Crystal structures of active and inactive conformations of a caliciviral RNA-dependent RNA polymerase. J Biol Chem, 2001-10-24.

4. Helen M Berman, Tammy Battistuz, T N Bhat, Wolfgang F Bluhm, Philip E Bourne, Kyle Burkhardt, Zukang Feng, Gary L Gilliland, Lisa Iype, Shri Jain, Phoebe Fagan, Jessica Marvin, David Padilla, Veerasamy Ravichandran, Bohdan Schneider, Narmada Thanki, Helge Weissig, John D Westbrook, Christine Zardecki. The Protein Data Bank. Acta Crystallogr D Biol Crystallogr, 2002-05-29.

5. Yizhi Tao, Diane L Farsetta, Max L Nibert, Stephen C Harrison. RNA synthesis in a cage--structural studies of reovirus polymerase lambda3. Cell, 2002-11-27.

6. Y Whitney Yin, Thomas A Steitz. Structural basis for the transition from initiation to elongation transcription in T7 RNA polymerase. Science, 2002-09-19.

7. S J Butcher, J M Grimes, E V Makeyev, D H Bamford, D I Stuart. A mechanism for initiating RNA-dependent RNA polymerization. Nature, 2001-03-08.

8. Stephen C Graham, L Peter Sarin, Mohammad W Bahar, Reg A Myers, David I Stuart, Dennis H Bamford, Jonathan M Grimes. The N-terminus of the RNA polymerase from infectious pancreatic necrosis virus is the determinant of genome attachment. PLoS Pathog, 2011-06-23.

9. Alexander E Gorbalenya, Fiona M Pringle, Jean-Louis Zeddam, Brian T Luke, Craig E Cameron, James Kalmakoff, Terry N Hanzlik, Karl H J Gordon, Vernon K Ward. The palm subdomain-based active site is internally permuted in viral RNA-dependent RNA polymerases of an ancient lineage. J Mol Biol, 2002-11-15.

10. Izumi Okura, Norio Horiike, Kojiro Michitaka, Morikazu Onji. Effect of mutation in the hepatitis C virus nonstructural 5B region on HCV replication. J Gastroenterol, 2004.
