Adam T Zemla, Dorothy M Lang, Tanya Kostova, Raul Andino, Carol L Ecale Zhou.
Abstract
BACKGROUND: Most of the currently used methods for protein function prediction rely on sequence-based comparisons between a query protein and those for which a functional annotation is provided. A serious limitation of sequence similarity-based approaches for identifying residue conservation among proteins is the low confidence in assigning residue-residue correspondences among proteins when the level of sequence identity between the compared proteins is poor. Multiple sequence alignment methods are more satisfactory; still, they cannot provide reliable results at low levels of sequence identity. Our goal in the current work was to develop an algorithm that could help overcome these difficulties by facilitating the identification of structurally (and possibly functionally) relevant residue-residue correspondences between compared protein structures.
Year: 2011 PMID: 21635786 PMCID: PMC3121648 DOI: 10.1186/1471-2105-12-226
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Overview of the process and data types used in the StralSV algorithm.
Excerpt of sample StralSV output "matrix" file for positions 229-269 from poliovirus RdRp analysis
| AA | Rname | A | V | L | I | P | M | F | W | G | S | T | C | Y | N | Q | D | E | K | R | H | X |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| L | 229 | 4 | 0 | 21 | 10 | 0 | 0 | 1 | 0 | 103 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 |
| F | 230 | 0 | 4 | 4 | 3 | 0 | 8 | 127 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 |
| A | 231 | 30 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 116 | 0 | 2 | 5 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 |
| F | 232 | 8 | 10 | 7 | 4 | 0 | 0 | 34 | 0 | 0 | 0 | 49 | 0 | 140 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 |
| D | 233 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 272 | 0 | 0 | 0 | 0 | 0 |
| Y | 234 | 0 | 49 | 4 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 152 | 0 | 60 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| T | 235 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 91 | 25 | 0 | 0 | 0 | 0 | 0 | 4 | 7 | 144 | 0 | 0 |
| G | 236 | 23 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 23 | 0 | 0 | 139 | 0 | 13 | 5 | 48 | 0 | 11 | 13 | 0 | 1 |
| Y | 237 | 4 | 0 | 0 | 0 | 0 | 0 | 152 | 37 | 4 | 0 | 0 | 7 | 28 | 0 | 0 | 0 | 0 | 0 | 0 | 44 | 0 |
| D | 238 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 268 | 4 | 0 | 0 | 0 | 0 |
| A | 239 | 41 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 165 | 56 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| S | 240 | 0 | 0 | 0 | 0 | 0 | 0 | 44 | 0 | 0 | 37 | 160 | 0 | 0 | 18 | 7 | 0 | 0 | 0 | 5 | 0 | 0 |
| L | 241 | 0 | 151 | 20 | 22 | 0 | 4 | 0 | 44 | 0 | 0 | 0 | 4 | 0 | 0 | 16 | 0 | 0 | 0 | 0 | 10 | 0 |
| S | 242 | 0 | 0 | 0 | 0 | 44 | 0 | 0 | 0 | 0 | 24 | 165 | 10 | 0 | 3 | 13 | 0 | 0 | 4 | 0 | 10 | 0 |
| P | 243 | 0 | 0 | 0 | 0 | 37 | 0 | 0 | 7 | 44 | 17 | 0 | 0 | 0 | 0 | 10 | 0 | 142 | 0 | 20 | 0 | 0 |
| A | 244 | 34 | 7 | 0 | 8 | 10 | 0 | 0 | 44 | 0 | 14 | 0 | 4 | 0 | 114 | 4 | 12 | 0 | 6 | 13 | 0 | 0 |
| W | 245 | 10 | 20 | 48 | 0 | 0 | 0 | 10 | 30 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 152 | 0 | 0 | 0 | 4 | 0 |
| F | 246 | 0 | 4 | 32 | 140 | 0 | 14 | 30 | 0 | 0 | 0 | 3 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 54 | 0 | 0 |
| E | 247 | 19 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 0 | 0 | 0 | 10 | 12 | 44 | 18 | 22 | 145 | 2 | 0 |
| A | 248 | 44 | 135 | 59 | 10 | 0 | 0 | 0 | 0 | 10 | 0 | 12 | 5 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| L | 249 | 20 | 0 | 30 | 72 | 0 | 11 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 145 | 0 | 0 | 0 | 0 |
| K | 250 | 3 | 0 | 16 | 14 | 0 | 10 | 10 | 0 | 15 | 0 | 0 | 29 | 0 | 0 | 5 | 0 | 144 | 20 | 0 | 0 | 0 |
| M | 251 | 11 | 0 | 0 | 0 | 0 | 31 | 0 | 0 | 0 | 144 | 0 | 0 | 3 | 0 | 0 | 22 | 20 | 4 | 0 | 0 | 4 |
| V | 252 | 8 | 27 | 2 | 168 | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 |
| L | 253 | 0 | 0 | 52 | 10 | 0 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 140 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 |
| E | 254 | 3 | 13 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 | 140 | 16 | 23 | 7 | 0 | 1 | 0 |
| K | 255 | 12 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 128 | 7 | 10 | 0 | 2 | 1 | 33 | 3 | 0 | 0 |
| I | 256 | 0 | 0 | 17 | 14 | 0 | 0 | 16 | 0 | 1 | 0 | 0 | 85 | 7 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| G | 257 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 23 | 0 | 0 | 7 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| F | 258 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| G | 259 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| D | 260 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 14 | 0 | 0 | 0 | 0 | 0 |
| R | 261 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 | 0 | 0 |
| V | 262 | 0 | 46 | 0 | 2 | 5 | 0 | 0 | 0 | 0 | 7 | 3 | 0 | 0 | 0 | 13 | 0 | 0 | 4 | 2 | 0 | 0 |
| D | 263 | 2 | 13 | 0 | 0 | 0 | 0 | 0 | 10 | 0 | 2 | 5 | 3 | 0 | 7 | 12 | 14 | 0 | 32 | 0 | 0 | 0 |
| Y | 264 | 4 | 13 | 44 | 12 | 0 | 0 | 3 | 0 | 0 | 0 | 10 | 0 | 17 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
| I | 265 | 13 | 0 | 12 | 37 | 2 | 0 | 32 | 0 | 0 | 0 | 7 | 0 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 0 |
| D | 266 | 2 | 3 | 0 | 10 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 8 | 2 | 26 | 50 | 10 | 0 | 0 | 0 |
| Y | 267 | 3 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 42 | 0 | 17 | 10 | 0 | 16 | 0 | 0 | 10 | 7 | 0 |
| L | 268 | 0 | 2 | 63 | 2 | 0 | 5 | 0 | 0 | 0 | 31 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 2 |
| N | 269 | 0 | 13 | 38 | 3 | 0 | 10 | 0 | 0 | 0 | 3 | 6 | 15 | 0 | 13 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
The first column contains the amino acid of the query sequence at the position indicated in the second column. Each row in the table comprises a tally of the amino acids of each type (including non-standard indicated by 'X') from the corresponding positions in all qualified template fragments (positional hits). The following heading from the output provides the total number of templates searched plus other run parameters. Number of all PDB chains (SEQRES.all_list.11_03_09): 147,634; Structure: Polio RNApolym; Length: 461; MIN span: 5; Structure similarity: 55; LGA distance: 4.0; Window: 90; Legend: 's' AA reference sequence, 'd' dominant amino acid, '+' number of hits per position (max '+': n = 398).
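As a rough illustration of how the "matrix" rows can be read, the following Python sketch parses three rows copied from the excerpt above and reports positions where one residue dominates. The pipe-delimited layout is the one rendered on this page and may differ from the raw StralSV file; the function names and the 0.80 cutoff are assumptions made for this example, not part of StralSV.

```python
# Column order copied from the matrix header shown above.
AMINO_ACIDS = "A V L I P M F W G S T C Y N Q D E K R H X".split()


def parse_matrix_row(line):
    """Split one pipe-delimited row into (query_aa, position, counts)."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    query_aa, position = cells[0], int(cells[1])
    counts = dict(zip(AMINO_ACIDS, (int(c) for c in cells[2:])))
    return query_aa, position, counts


def dominant(counts):
    """Return (amino_acid, fraction) for the most frequent residue."""
    total = sum(counts.values())
    aa = max(counts, key=counts.get)
    return aa, (counts[aa] / total if total else 0.0)


# Rows 229, 230 and 233 from the excerpt.
ROWS = [
    "| L | 229 | 4 | 0 | 21 | 10 | 0 | 0 | 1 | 0 | 103 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 |",
    "| F | 230 | 0 | 4 | 4 | 3 | 0 | 8 | 127 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 |",
    "| D | 233 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 272 | 0 | 0 | 0 | 0 | 0 |",
]

THRESHOLD = 0.80  # hypothetical cutoff chosen for this example

conserved = []
for line in ROWS:
    _, position, counts = parse_matrix_row(line)
    aa, fraction = dominant(counts)
    if fraction >= THRESHOLD:
        conserved.append((position, aa))

print(conserved)  # [(230, 'F'), (233, 'D')]
```

Position 229 is excluded because its dominant residue (G, 103 of 143 hits) falls below the cutoff.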
Excerpt of sample StralSV output "profile" file for positions 229-269 from poliovirus RdRp analysis
| # | AA | ord | Rname | rnk | s/+ | d/+ | 2nd/+ | 3rd/+ | +/n | + | nv | variety |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LSV_P: | L | 229 | 229 | 2 | 14.7 | 72 | 14.7 | 7 | 35.9 | 143 | 6 | GLIAHF |
| LSV_P: | F | 230 | 230 | 1 | 81.9 | 81.9 | 5.2 | 3.2 | 38.9 | 155 | 7 | FMYVLXI |
| LSV_P: | A | 231 | 231 | 2 | 19.1 | 73.9 | 19.1 | 3.2 | 39.4 | 157 | 5 | SAYDC |
| LSV_P: | F | 232 | 232 | 3 | 13.2 | 54.5 | 19.1 | 13.2 | 64.6 | 257 | 8 | YTFVALDI |
| LSV_P: | D | 233 | 233 | 1 | 100 | 100 | 0 | 0 | 68.3 | 272 | 1 | D |
| LSV_P: | Y | 234 | 234 | 2 | 21.7 | 55.1 | 21.7 | 17.8 | 69.3 | 276 | 5 | TYVIL |
| LSV_P: | T | 235 | 235 | 3 | 9.1 | 52.2 | 33 | 9.1 | 69.3 | 276 | 6 | RSTKAE |
| LSV_P: | G | 236 | 236 | 4 | 8.3 | 50.4 | 17.4 | 8.3 | 69.3 | 276 | 9 | CDAGNRKQX |
| LSV_P: | Y | 237 | 237 | 4 | 10.1 | 55.1 | 15.9 | 13.4 | 69.3 | 276 | 7 | FHWYCAG |
| LSV_P: | D | 238 | 238 | 1 | 97.1 | 97.1 | 1.4 | 1.4 | 69.3 | 276 | 3 | DYE |
| LSV_P: | A | 239 | 239 | 3 | 14.9 | 59.8 | 20.3 | 14.9 | 69.3 | 276 | 4 | STAG |
| LSV_P: | S | 240 | 240 | 3 | 13.7 | 59 | 16.2 | 13.7 | 68.1 | 271 | 6 | TFSNQR |
| LSV_P: | L | 241 | 241 | 4 | 7.4 | 55.7 | 16.2 | 8.1 | 68.1 | 271 | 8 | VWILQHMC |
| LSV_P: | S | 242 | 242 | 3 | 8.8 | 60.4 | 16.1 | 8.8 | 68.6 | 273 | 8 | TPSQCHKN |
| LSV_P: | P | 243 | 243 | 3 | 13.4 | 51.3 | 15.9 | 13.4 | 69.6 | 277 | 7 | EGPRSQW |
| LSV_P: | A | 244 | 244 | 3 | 12.6 | 42.2 | 16.3 | 12.6 | 67.8 | 270 | 12 | NWASRDPIVKCQ |
| LSV_P: | W | 245 | 245 | 3 | 10.9 | 55.5 | 17.5 | 10.9 | 68.8 | 274 | 7 | DLWVAFH |
| LSV_P: | F | 246 | 246 | 4 | 10.7 | 49.8 | 19.2 | 11.4 | 70.6 | 281 | 8 | IRLFMVCT |
| LSV_P: | E | 247 | 247 | 5 | 6.4 | 51.6 | 15.7 | 7.8 | 70.6 | 281 | 10 | RDKAEQNSVH |
| LSV_P: | A | 248 | 248 | 3 | 15.7 | 48 | 21 | 15.7 | 70.6 | 281 | 9 | VLATIGCNH |
| LSV_P: | L | 249 | 249 | 3 | 10.7 | 51.6 | 25.6 | 10.7 | 70.6 | 281 | 6 | EILAMS |
| LSV_P: | K | 250 | 250 | 3 | 7.5 | 54.1 | 10.9 | 7.5 | 66.8 | 266 | 10 | ECKLGIMFQA |
| LSV_P: | M | 251 | 251 | 2 | 13 | 60.3 | 13 | 9.2 | 60.1 | 239 | 8 | SMDEAKXY |
| LSV_P: | V | 252 | 252 | 2 | 12.1 | 75.3 | 12.1 | 4 | 56 | 223 | 7 | IVGAEYL |
| LSV_P: | L | 253 | 253 | 2 | 23.3 | 62.8 | 23.3 | 6.3 | 56 | 223 | 5 | YLMIQ |
| LSV_P: | E | 254 | 254 | 2 | 10.7 | 65.1 | 10.7 | 7.4 | 54 | 215 | 10 | QEDVLKATYH |
| LSV_P: | K | 255 | 255 | 2 | 16.3 | 63.4 | 16.3 | 5.9 | 50.8 | 202 | 10 | CKANYTRIDE |
| LSV_P: | I | 256 | 256 | 4 | 9.9 | 60.3 | 12.1 | 11.3 | 35.4 | 141 | 7 | CLFIYGR |
| LSV_P: | G | 257 | 257 | 2 | 39.2 | 45.1 | 39.2 | 13.7 | 12.8 | 51 | 4 | SGYD |
| LSV_P: | F | 258 | 258 | 1 | 85.7 | 85.7 | 14.3 | 0 | 5.3 | 21 | 2 | FY |
| LSV_P: | G | 259 | 259 | 1 | 100 | 100 | 0 | 0 | 3.5 | 14 | 1 | G |
| LSV_P: | D | 260 | 260 | 1 | 77.8 | 77.8 | 11.1 | 11.1 | 4.5 | 18 | 3 | DAN |
| LSV_P: | R | 261 | 261 | 1 | 77.8 | 77.8 | 11.1 | 11.1 | 4.5 | 18 | 3 | RMG |
| LSV_P: | V | 262 | 262 | 1 | 56.1 | 56.1 | 15.9 | 8.5 | 20.6 | 82 | 8 | VQSPKTIR |
| LSV_P: | D | 263 | 263 | 2 | 14 | 32 | 14 | 13 | 25.1 | 100 | 10 | KDVQWNTCAS |
| LSV_P: | Y | 264 | 264 | 2 | 16.2 | 41.9 | 16.2 | 12.4 | 26.4 | 105 | 8 | LYVITAFH |
| LSV_P: | I | 265 | 265 | 1 | 32.7 | 32.7 | 28.3 | 11.5 | 28.4 | 113 | 7 | IFALQTP |
| LSV_P: | D | 266 | 266 | 2 | 23 | 44.2 | 23 | 8.8 | 28.4 | 113 | 9 | EDIKNVASQ |
| LSV_P: | Y | 267 | 267 | 2 | 15.3 | 37.8 | 15.3 | 14.4 | 27.9 | 111 | 9 | TYDNRHLAS |
| LSV_P: | L | 268 | 268 | 1 | 57.3 | 57.3 | 28.2 | 4.5 | 27.6 | 110 | 8 | LSMHVIYX |
| LSV_P: | N | 269 | 269 | 4 | 12.6 | 36.9 | 14.6 | 12.6 | 25.9 | 103 | 9 | LCVNMTISQ |
#: file name; AA: query amino acid; ord: sequence position number; Rname: PDB residue name; rnk: rank (position of the query amino acid in the "variety" list); s/+: percent of all residues corresponding to query; d/+: percent of dominant residue; 2nd/+: percent of second most dominant residue; 3rd/+: percent of third most dominant residue; +/n: percent of total templates selected in given experiment contributing to data at a given position; +: total number of templates contributing to data at a given position; nv: total number of amino acid variants; variety: list of variants in order of frequency.
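The column definitions above can be checked against the matrix tallies. The sketch below recomputes the profile fields for position 229 from its matrix counts, taking n = 398 from the run header. It is a reconstruction for illustration, not the StralSV implementation: `profile_row` is a hypothetical helper, and breaking ties by matrix column order is an assumption inferred from the excerpt (for example "GLIAHF", where A and H both have 4 hits).

```python
# Matrix column order; used to break ties between equally frequent residues.
AMINO_ACIDS = list("AVLIPMFWGSTCYNQDEKRHX")


def pct(value, total):
    """Percentage rounded to one decimal place, 0.0 when total is zero."""
    return round(100.0 * value / total, 1) if total else 0.0


def profile_row(query_aa, counts, n_templates):
    """Derive the profile columns for one position from its matrix tally."""
    hits = sum(counts.values())  # the '+' column
    observed = [aa for aa in AMINO_ACIDS if counts.get(aa, 0) > 0]
    # sorted() is stable, so ties keep the matrix column order.
    variety = sorted(observed, key=lambda aa: -counts[aa])
    # Pad with zeros so positions with fewer than three variants still work.
    ranked = [pct(counts[aa], hits) for aa in variety] + [0.0, 0.0, 0.0]
    return {
        "rnk": variety.index(query_aa) + 1 if query_aa in variety else None,
        "s/+": pct(counts.get(query_aa, 0), hits),
        "d/+": ranked[0],
        "2nd/+": ranked[1],
        "3rd/+": ranked[2],
        "+/n": pct(hits, n_templates),
        "+": hits,
        "nv": len(variety),
        "variety": "".join(variety),
    }


# Non-zero counts for position 229, copied from the matrix excerpt.
counts_229 = {"A": 4, "L": 21, "I": 10, "F": 1, "G": 103, "H": 4}
row_229 = profile_row("L", counts_229, n_templates=398)
print(row_229)
```

For position 229 this gives rank 2, s/+ 14.7, d/+ 72.0, +/n 35.9, 143 hits, and the variety string "GLIAHF", in agreement with the first row of the profile excerpt.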
Figure 2. Effect of window size and distance cutoff parameter combinations on capture of structure fragments. A: sequence variability profile around position G64 generated by running StralSV against the PDB database; red type: data for position G64. B: Numbers of structure fragments selected from a custom library comprising 38 polymerase chains plus randomly selected structures from PDB; red type: numbers corresponding to distance cutoff and window_size values selected as defaults for StralSV web service. C: Structure fragment of poliovirus polymerase (1ra6) corresponding to the N-terminal region containing position G64; color variations dark blue to bright turquoise indicate N-to-C-terminal direction of chain.
Figure 3. Effect of window size on numbers of positional hits per position. Red: window size 70; blue: 80; green: 90. Vertical lines with labels: positions at which SCOP identifiers were quantified for templates contributing to positional variability data (see Table 3 and additional files 3, 4, 5, 6, 7, and 8: StralSV-RdRp_Suppl_Table2, StralSV-RdRp_Suppl_Table3, StralSV-RdRp_Suppl_Table4, StralSV-RdRp_Suppl_Table5, StralSV-RdRp_Suppl_Table6, StralSV-RdRp_Suppl_Table7). Along x-axis: secondary structure assignments (see Methods). Inset: structure model of poliovirus polymerase upon which have been labeled the six positions highlighted in the main figure.
Figure 4. Amino-acid variability (y-axis) versus number of positional hits (x-axis) for three window sizes. Red circles: window size 70; blue diamonds: 80; green triangles: 90. Inset: positional hits at which the dominant residue occurs at frequency ≥ 80%. Circles: positional hit, variability coordinate pairs corresponding to the 11 positions shown in Fig. 5.
Figure 5. Frequencies of dominant residues and correspondence of positional hit frequency with polymerase sequence motifs A-G. Lavender plot: frequencies (quantified along right y-axis) of dominant residue per sequence position along poliovirus polymerase chain; blue plot: positional hit frequency at window size 80; labeled dots: high-frequency residues that have been functionally annotated (see additional file 2: StralSV-RdRp_Suppl_Table1).
SCOP categorization of template fragments selected at each of six positions along poliovirus RdRp
| Position | Window | RdRp (e.8.1.4) | DsPhage (e.8.1.6) | RT (e.8.1.2) | DNA pol1 (e.8.1.1) | Other (not e.8) | Templates with SCOP ID |
|---|---|---|---|---|---|---|---|
| D238 | w90 | 85 | 32 | 0 | 0 | 3 | 120 |
| | w80 | 85 | 32 | 0 | 0 | 0 | 119 |
| | w70 | 79 | 32 | 0 | 0 | 0 | 111 |
| | w50 | 79 | 0 | 0 | 0 | 0 | 79 |
| N297 | w90 | 85 | 32 | 78 | 0 | 9 | 204 |
| | w80 | 85 | 32 | 41 | 2 | 160 | 320 |
| | w70 | 85 | 32 | 71 | 3 | 289 | 480 |
| | w50 | 78 | 32 | 0 | 4 | 197 | 311 |
| D328 | w90 | 84 | 32 | 3 | 0 | 6 | 125 |
| | w80 | 84 | 32 | 8 | 0 | 17 | 141 |
| | w70 | 84 | 32 | 4 | 0 | 34 | 154 |
| | w50 | 71 | 2 | 0 | 0 | 3 | 76 |
| L374 | w90 | 77 | 0 | 0 | 0 | 2 | 79 |
| | w80 | 80 | 0 | 0 | 0 | 3 | 83 |
| | w70 | 82 | 0 | 1 | 0 | 3 | 86 |
| | w50 | 59 | 0 | 0 | 0 | 2 | 61 |
| H398 | w90 | 79 | 0 | 0 | 0 | 3 | 82 |
| | w80 | 80 | 0 | 0 | 0 | 48 | 128 |
| | w70 | 68 | 0 | 0 | | 376 | 446 |
| | w50 | 36 | 0 | 0 | 0 | 182 | 218 |
| H413 | w90 | 37 | 0 | 0 | 0 | 1 | 38 |
| | w80 | 45 | 0 | 0 | 0 | 54 | 99 |
| | w70 | 40 | 0 | 0 | 2 | 392 | 434 |
| | w50 | 37 | 0 | 0 | 13 | 695 | 745 |
Position: residue and position number in poliovirus polymerase; Window: window_size parameter settings (w: window); RdRp e.8.1.4: RNA-dependent RNA polymerase; DsPhage e.8.1.6: dsRNA phage RNA-dependent RNA-polymerase; RT e.8.1.2: Reverse transcriptase; DNA pol1 e.8.1.1: DNA polymerase I; Other not e.8: all other SCOP families.
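One way to read the SCOP table is to ask what share of the SCOP-annotated templates at a position falls outside the e.8 fold for each window size. The Python sketch below does this for position N297 using the counts in the table above. The "fraction outside e.8" statistic and all names are assumptions introduced for this example; the source does not report this figure.

```python
# Template counts at position N297, copied from the SCOP table above.
# Keys are window sizes; values are counts per SCOP category.
N297 = {
    90: {"e.8.1.4": 85, "e.8.1.6": 32, "e.8.1.2": 78, "e.8.1.1": 0, "not e.8": 9},
    80: {"e.8.1.4": 85, "e.8.1.6": 32, "e.8.1.2": 41, "e.8.1.1": 2, "not e.8": 160},
    70: {"e.8.1.4": 85, "e.8.1.6": 32, "e.8.1.2": 71, "e.8.1.1": 3, "not e.8": 289},
    50: {"e.8.1.4": 78, "e.8.1.6": 32, "e.8.1.2": 0, "e.8.1.1": 4, "not e.8": 197},
}


def percent_outside_fold(categories):
    """Percent of SCOP-annotated templates that are not in the e.8 fold."""
    total = sum(categories.values())
    return round(100.0 * categories["not e.8"] / total, 1) if total else 0.0


outside = {window: percent_outside_fold(cats) for window, cats in N297.items()}

for window in sorted(outside, reverse=True):
    total = sum(N297[window].values())
    print(f"w{window}: {total} templates, {outside[window]}% outside e.8")
```

With these counts the share of templates outside e.8 at N297 rises from 4.4% at window 90 to 63.3% at window 50.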
KEY:
e.8.1.4 RdRp RNA-dependent RNA-polymerase
e.8.1.6 DsPhage dsRNA phage RNA-dependent RNA-polymerase
e.8.1.2 RT Reverse transcriptase
e.8.1.1 DNApol1 DNA polymerase I
Effect of window_size on sequence variability in SCOP-categorized structure fragments at position N297.
| Position: 297, Window: 80, Categories: 33, PDB templates: 399, SCOP_IDs: 320 | |||||||
|---|---|---|---|---|---|---|---|
| ScopID | e.8 | e.8.1.4 | e.8.1.6 | e.8.1.1 | e.8.1.2 | ||
| max LGA | 100.0 | 100.0 | 76.9 | 57.0 | 59.2 | 76.8 | |
| ave LGA | 75.4 | 85.2 | 75.1 | 57.0 | 56.4 | 57.8 | |
| ave ID | 27.3 | 37.3 | 13.4 | 3.7 | 18.6 | 6.2 | |
| matches | 51 | 61 | 40 | 31 | 39 | 35 | |
| AA | 572 | 160 | 85 | 32 | 2 | 41 | 160 |
| A | 1 | 1 | 1 | ||||
| C | 8 | 3 | |||||
| D | 1 | ||||||
| E | 3 | 1 | |||||
| F | 67 | 42 | 1 | 41¹ | 5 | ||
| G | 44 | 32 | 32² | ||||
| H | 14 | 5 | 5³ | 8 | |||
| I | 19 | 9 | |||||
| K | 2 | 2 | |||||
| L | 11 | 5 | |||||
| M | 2 | 2 | |||||
| N | 224 | 80 | 80⁴ | ||||
| P | |||||||
| Q | 15 | 14 | |||||
| R | 82 | 49 | |||||
| S | |||||||
| T | 15 | 4 | |||||
| V | 54 | 50 | |||||
| W | |||||||
| Y | 10 | 8 | |||||
| X | |||||||
¹ F297 (N = 41): reverse transcriptase of HIV
² G297 (N = 32): dsRNA phage RNA-dependent RNA polymerase of phi6
³ H297 (N = 5): RdRp of lambda 3
⁴ N297 (N = 80): RdRp proteins from HCV (21), FM (8), lambda3 (5), polio (6), BVDV (3), Norwalk virus (3), rhinovirus (3), rabbit hemorrhagic fever virus (2), IBDV (1)
Taxonomic diversity represented by positional hits detected at position N297.
| window | 50 | 70 | 80 | 90 |
|---|---|---|---|---|
| No. of PDBs | 189 | 198 | 199 | 141 |
| ARCHAEA | + | + | + | |
| BACTERIA | + | + | + | + |
| BIRD | + | |||
| FUNGUS | + | |||
| INSECT | + | + | + | |
| MAMMAL | + | + | + | + |
| PHAGE | + | + | + | |
| VIRUS | + | + | + | + |
| WORM | + | + | ||
| YEAST | + | + | + |