Lisa Yan, Mikhail Velikanov, Paul Flook, Wenjin Zheng, Sándor Szalma, Scott Kahn.
Abstract
The ability to rapidly and reliably develop hypotheses on the function of newly discovered protein sequences requires systematic and comprehensive analysis. Such an analysis, embodied within the DS GeneAtlas pipeline, has been used to critically evaluate the severe acute respiratory syndrome (SARS) genome with the goal of identifying new potential targets for viral therapeutic intervention. This paper discusses several new functional hypotheses on the roles played by the constituent gene products of SARS, and will serve as an example of how such assignments can be developed or extended on other systems of interest.
Year: 2003 PMID: 14623076 PMCID: PMC7159027 DOI: 10.1016/s0014-5793(03)01115-3
Source DB: PubMed Journal: FEBS Lett ISSN: 0014-5793 Impact factor: 4.124
Protein transcripts of SARS isolates and their genomic coordinates
| BJ01 | Tor2 | CDCP | Start | Stop | Frame | Actual start | Actual stop | AA length |
| 1a | 1a | 1a | 265 | 13 398 | 1 | 265 | 13 413 | 4 382 |
| 1b | 1b | 1b | 13 398 | 21 485 | 3 | 13 398 | 21 485 | 2 695 |
| S | S | S | 21 492 | 25 259 | 3 | 21 492 | 25 259 | 1 255 |
| 1 | 3 | X1 | 25 268 | 26 092 | 2 | 25 268 | 26 092 | 274 |
| 2 | 4 | X2 | 25 689 | 26 153 | 3 | 25 689 | 26 153 | 154 |
| E | E | E | 26 117 | 26 347 | 2 | 26 117 | 26 347 | 76 |
| M | M | M | 26 398 | 27 063 | 1 | 26 398 | 27 063 | 221 |
| 3 | 7 | X3 | 27 074 | 27 265 | 2 | 27 074 | 27 265 | 63 |
| 4 | 8 | X4 | 27 273 | 27 641 | 3 | 27 273 | 27 641 | 122 |
| N/A | 9 | N/A | 27 638 | 27 772 | 2 | 27 638 | 27 772 | 44 |
| N/A | 10 | N/A | 27 779 | 27 898 | 2 | 27 779 | 27 898 | 39 |
| N/A | 11 | X5 | 27 864 | 28 118 | 3 | 27 864 | 28 118 | 84 |
| N | N | N | 28 120 | 29 388 | 1 | 28 120 | 29 388 | 422 |
| 5 | 13 | N/A | 28 130 | 28 426 | 2 | 28 130 | 28 426 | 98 |
| N/A | 14 | N/A | 28 583 | 28 795 | 2 | 28 583 | 28 795 | 70 |
| N/A | s2m motif | N/A | 29 590 | 29 621 | N/A | 29 590 | 29 621 | N/A |
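The Frame and AA length columns can be recomputed from the coordinates. The following is a minimal sketch, not the authors' method. It assumes 1-based inclusive nucleotide positions, that the frame is the offset of the start position modulo 3, and that each stop coordinate includes the terminating stop codon. These conventions are inferred from the tabulated values rather than stated in the source, and the names `Orf`, `reading_frame`, and `aa_length` are illustrative.

```python
from typing import NamedTuple


class Orf(NamedTuple):
    """One open reading frame, using the 'Actual start/stop' columns."""
    name: str
    start: int  # 1-based, inclusive (assumed convention)
    stop: int   # 1-based, inclusive, assumed to include the stop codon


def reading_frame(start: int) -> int:
    """Return frame 1, 2 or 3 from a 1-based start coordinate."""
    return (start - 1) % 3 + 1


def aa_length(start: int, stop: int) -> int:
    """Return the number of encoded residues, excluding the stop codon."""
    n_nt = stop - start + 1
    if n_nt % 3 != 0:
        raise ValueError(f"span of {n_nt} nt is not a whole number of codons")
    return n_nt // 3 - 1


# A few rows copied from the table (Tor2 naming).
ORFS = [
    Orf("1a", 265, 13413),
    Orf("S", 21492, 25259),
    Orf("E", 26117, 26347),
    Orf("M", 26398, 27063),
    Orf("N", 28120, 29388),
]

if __name__ == "__main__":
    for orf in ORFS:
        print(orf.name, reading_frame(orf.start), aa_length(orf.start, orf.stop))
```

Under these assumptions the computed frames and lengths agree with the tabulated values for the rows listed, which also serves as a check that a coordinate was not mistyped.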
Figure 1. DS GeneAtlas structural and functional annotations (in green), putative protein transcripts (blue arrows), non‐synonymous SNPs (red dots) and synonymous SNPs (yellow dots) mapped to the Tor2 genome sequence. The figure was created using DS Gene software from Accelrys, Inc.
Structural and functional annotations using DS GeneAtlas
| Domain | Methods | Template and function | Scores | Confidence |
| pp1a 3241–3543 | Structure, PSI‐BLAST | 1lvo_A, cysteine‐like protease (TGEV) | Consensus score=1.76; Model score=0.96; Seq‐ID=43.9% | high |
| pp1a 1026–1154 | HMMer/Pfam | Appr‐1″‐p processing enzyme family | Bit score=78.3; Noise cutoff=−21.5 | high |
| pp1a 1598–1893 | HMMer/Pfam | Peptidase C16 family | Bit score=−87.5; Noise cutoff=−95.7 | medium |
| pp1b 4780–5334 | Structure, PSI‐BLAST | 1gx5, RNA dependent RNA polymerase (HCV) | Consensus score=0.5; Model score=−0.10; Seq‐ID=10.3% | medium |
| pp1b 4770–5249 | Structure, PSI‐BLAST | 1gx5, RNA dependent RNA polymerase (HCV) | Consensus score=−0.41; Model score=−0.21; Seq‐ID=10.4% | low |
| pp1b 5512–6066 | Structure, PSI‐BLAST | 1pjr_A, DNA helicase | Consensus score=−0.23; Model score=−0.23; Seq‐ID=8.3% | low |
| pp1b 4747–5219 | HMMer/Pfam | RNA dependent RNA polymerase | Bit score=−194.6; Noise cutoff=−130.1 | low |
| pp1b 5569–5887 | HMMer/Pfam | Viral (Superfamily 1) RNA helicase | Bit score=−49.6; Noise cutoff=−30.1 | low |
| pp1b 6815–6998 | HMMer/Pfam | Fts‐J‐like methyltransferase | Bit score=−51.6; Noise cutoff=−53.1 | medium |
| S protein 910–949 | Structure, PSI‐BLAST | 1svf_A, viral fusion protein core | Consensus score=−0.29; Model score=−0.29; Seq‐ID=17.5% | low |
| S protein 631–1255 | HMMer/Pfam | Coronavirus S1 glycoprotein | Bit score=−283.4; Noise cutoff=−273.1 | low |
| S protein 631–1255 | HMMer/Pfam | Coronavirus S2 glycoprotein | Bit score=450.2; Noise cutoff=−469.8 | high |
| S protein 777–1231 | HMMer/Pfam | Fusion glycoprotein F0 | Bit score=−276.3; Noise cutoff=−236.1 | low |
The PDB files of the models listed in this table are provided at the following link with full DS GeneAtlas output of the SARS genome: http://www.accelrys.com/references/supplemental/.
See Section 2 for the definition of the confidence.
Annotation using full sequence of pp1ab as input.
Annotation using cleaved sequence from residues 4231 to 5301 of pp1ab as input.
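For the HMMer/Pfam rows, one quick consistency check is the margin between the bit score and the noise cutoff. The sketch below is illustrative only: the confidence definition is given in Section 2 and is not reproduced here, so the code does not attempt to derive confidence levels. It reports the margin for each hit and checks a pattern visible in the table, namely that every hit scoring below its noise cutoff is labeled low. The names `PfamHit`, `margin`, and `below_cutoff_all_low` are hypothetical helpers, not part of DS GeneAtlas.

```python
from typing import NamedTuple


class PfamHit(NamedTuple):
    """One HMMer/Pfam row, values copied from the annotation table."""
    domain: str
    bit_score: float
    noise_cutoff: float
    confidence: str  # label as reported in the table


def margin(bit_score: float, noise_cutoff: float) -> float:
    """Bit score minus noise cutoff, rounded to the table's precision."""
    return round(bit_score - noise_cutoff, 1)


def below_cutoff_all_low(hits: list[PfamHit]) -> bool:
    """True if every hit with a negative margin carries the label 'low'."""
    return all(
        h.confidence == "low"
        for h in hits
        if margin(h.bit_score, h.noise_cutoff) < 0
    )


HITS = [
    PfamHit("pp1a 1026-1154", 78.3, -21.5, "high"),
    PfamHit("pp1a 1598-1893", -87.5, -95.7, "medium"),
    PfamHit("pp1b 4747-5219", -194.6, -130.1, "low"),
    PfamHit("pp1b 5569-5887", -49.6, -30.1, "low"),
    PfamHit("pp1b 6815-6998", -51.6, -53.1, "medium"),
    PfamHit("S 631-1255 (S1)", -283.4, -273.1, "low"),
    PfamHit("S 631-1255 (S2)", 450.2, -469.8, "high"),
    PfamHit("S 777-1231", -276.3, -236.1, "low"),
]

if __name__ == "__main__":
    for h in HITS:
        print(h.domain, margin(h.bit_score, h.noise_cutoff), h.confidence)
    print("below-cutoff hits all low:", below_cutoff_all_low(HITS))
```

The margin alone does not separate medium from high in these rows, which is consistent with the confidence label depending on criteria beyond this single difference.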
Figure 2. A: Superimposed crystal structures of Mpro (1lvo_A in red) and 3Cpro (1cqq_A in cyan) with ligand AG7088 (in yellow). B: Model structure of SARS Mpro (residues 3241–3543) predicted using DS GeneAtlas. The figure was created using DS Modeling 1.1 software from Accelrys, Inc.
Figure 3. A: The sequence alignment between the template (1gx5) and the model (Model) of RNA dependent RNA polymerase. The catalytic residues are marked with carets (in red) and the residues in the surface pocket are underlined (in blue). The predicted secondary structure of the model sequence, obtained with the DSC method, and the secondary structure of the template are also displayed in the alignment (helix in red and strand in blue). The long insertion in the model sequence where the structure is uncertain is shown with a yellow background. B: Model structure (in green) of the RNA dependent RNA polymerase domain superimposed with the template structure 1gx5 (in red). The rGTP ligands from the template are shown in green and the Mn2+ ions in purple. The catalytic triad is in red and the other ligand binding site is in cyan. The long insertion in the model where the structure is uncertain is colored blue.
Figure 4. Non‐synonymous SNPs mapped to helicase structure.