Alibek Kruglikov1, Mohan Rakesh1, Yulong Wei1, Xuhua Xia1,2.
Abstract
Since its outset, the COVID-19 pandemic has prompted immediate global efforts to sequence SARS-CoV-2, and over 450 000 complete genomes have been publicly deposited over the course of 12 months. Despite this, comparative nucleotide and amino acid sequence analyses often fall short in answering key questions in vaccine design. For example, the binding affinity between different ACE2 receptors and the SARS-CoV-2 spike protein cannot be fully explained by amino acid similarity at ACE2 contact sites because protein structure similarities are not fully reflected by amino acid sequence similarities. To comprehensively compare protein homology, secondary structure (SS) analysis is required. While protein structure is slow and difficult to obtain, SS predictions can be made rapidly, and a well-predicted SS may serve as a viable proxy to gain biological insight. Here we review algorithms and information used in predicting protein SS to highlight its potential application in pandemic research. We also show examples of how SS predictions can be used to compare ACE2 proteins and to evaluate the zoonotic origins of viruses. As computational tools are much faster than wet-lab experiments, these applications can be important for research, especially in times when quickly obtained biological insights can help speed up the response to pandemics.
Keywords: COVID-19; SARS-CoV-2; protein similarity; secondary structure; spike protein
Year: 2021 PMID: 33617253 PMCID: PMC7927282 DOI: 10.1021/acs.jproteome.0c00734
Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466
Figure 1. An overview of protein secondary structure prediction (PSSP) programs and implemented computational algorithms[18−31] developed over the past 50 years.
A Comparison of PSSP Programs by Q3 Accuracy Assessments
| program | TS115 (%) | CASP10 (%) | CASP11 (%) | CASP12 (%) | TS2019 (%) | CB513 (%) |
|---|---|---|---|---|---|---|
| JPRED4 | 77.1 | 81.6 | 80.4 | 78.8 | 76.6 | 81.7 |
| PSIPRED v4.0 | 80.2 | 81.2 | 80.7 | 80.5 | 82.3 | 79.2 |
| CNF | – | 78.9 | 79.1 | – | – | 78.3 |
| RAPTORX (DeepCNF) | 82.3 | 84.4 | 84.7 | 82.1 | – | 82.3 |
| SPIDER3 | 83.9 | 82.6 | 81.5 | 79.9 | 84.4 | – |
| PORTER5 | – | – | – | – | 84.5 | – |
| MUFOLD-SS | – | 86.5 | 85.2 | 83.4 | 85.9 | 82.7 |
| CRRNN | – | 86.1 | 84.2 | 82.6 | – | 87.3 |
| eCRRNN | – | 87.8 | 85.9 | 83.7 | – | 87.8 |
Accuracy scores (in percentage) are obtained from the programs' original publications and from Yang et al.[32] and Smolarczuk et al.[33]
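The Q3 scores in the table above are per-residue accuracies over three SS states (helix, strand, coil). As a minimal sketch of how such a score is computed, assuming the predicted and observed SS are given as equal-length strings over the letters H, E, and C, the following Python function returns the percentage of matching positions. The function name and the 10-residue example are hypothetical and are not taken from the benchmarks cited here.

```python
def q_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose predicted SS state
    matches the observed state (Q3 when three states are used)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed SS must have equal length")
    if not observed:
        raise ValueError("SS strings must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical 10-residue example: one helix residue is mispredicted as coil.
observed = "CHHHHEEECC"
predicted = "CHHHCEEECC"
print(q_accuracy(predicted, observed))  # 9 of 10 positions match -> 90.0
```

The same function yields a Q8 score when both strings use an eight-state alphabet, since it only counts matching positions.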
A Comparison of PSSP Programs by Q8 Accuracy Assessments
| program | CASP10 (%) | CASP11 (%) | CASP12 (%) | TS2019 (%) | CB513 (%) |
|---|---|---|---|---|---|
| CNF | 64.8 | 65.1 | – | – | 64.9 |
| RAPTORX (DeepCNF) | 71.8 | 72.3 | 69.8 | – | 68.3 |
| PORTER5 | – | – | – | 73.6 | – |
| MUFOLD-SS | 76.5 | 74.5 | 72.1 | 74.9 | 70.6 |
| CRRNN | 73.8 | 71.6 | 68.7 | – | 71.4 |
| eCRRNN | 76.3 | 73.9 | 70.7 | – | 74.0 |
Accuracy scores (in percentage) are obtained from the programs' original publications and from Yang et al.[32] and Smolarczuk et al.[33]
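Q8 scores are lower than Q3 scores for the same program because a prediction must match one of eight states rather than one of three. The sketch below illustrates this with one commonly used reduction of the eight DSSP states to three (H, G, I to helix; E, B to strand; everything else to coil). This mapping is an assumption for illustration: other reductions exist, and the programs compared here may not all use the same one. All names and sequences are hypothetical.

```python
# One common 8-state to 3-state reduction (an assumption; conventions vary).
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",             # alpha, 3-10, and pi helices
    "E": "E", "B": "E",                       # extended strand, isolated bridge
    "T": "C", "S": "C", "C": "C", "-": "C",   # turn, bend, coil/unassigned
}


def reduce_to_three_states(ss8: str) -> str:
    """Map an eight-state SS string onto the three states H, E, and C."""
    return "".join(DSSP8_TO_3[state] for state in ss8)


def percent_match(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length strings agree."""
    if len(a) != len(b) or not a:
        raise ValueError("strings must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical 13-residue example.
observed8 = "GGGHHHTTEEBS-"
predicted8 = "HHHHHHTSEEEC-"

q8 = percent_match(predicted8, observed8)
q3 = percent_match(reduce_to_three_states(predicted8),
                   reduce_to_three_states(observed8))
print(f"Q8 = {q8:.1f}, Q3 = {q3:.1f}")
```

In this example every eight-state error falls within the same three-state class, so the Q3 score is perfect while the Q8 score is not.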
Average PSSP Program Accuracies as Measured Using ACE2 and Spike Protein Data from PDB
| protein set | metric | PORTER5 | MUFOLD-SS | PSIPRED | JPRED4 |
|---|---|---|---|---|---|
| totals (other 2 sets combined) | Q3 | 75.2 | 77.1 | 77.7 | 76.5 |
| | Q8 | 62.8 | 64.0 | 61.0 | 60.9 |
| | SOV | 57.6 | 57.8 | 60.3 | 58.3 |
| hACE2 (1r42:A, 6m0j:A, 6m18:B, 6m1d:B, 6m17:B) | Q3 | 81.2 | 82.0 | 82.0 | 80.5 |
| | Q8 | 69.9 | 70.8 | 65.2 | 65.1 |
| | SOV | 71.2 | 67.5 | 72.3 | 69.7 |
| SARS-2-S S1 (6vxx:A, 6vyb:A, 6m0j:E, 6m17:E) | Q3 | 67.8 | 71.0 | 72.4 | 71.4 |
| | Q8 | 54.0 | 55.5 | 55.7 | 55.6 |
| | SOV | 40.6 | 45.8 | 45.4 | 44.0 |
PDB IDs are shown in parentheses after the set names.
P_distances between hACE2 SS and Mammalian ACE2 SS
| SS sequence | P_distance |
|---|---|
| NM_001135696_Macaca_mulatta (Macaque) | 0.0286 |
| XM_008988993_Callithrix_jacchus (Marmoset) | 0.0298 |
| GQ999936_Rhinolophus_sinicus (Chinese horseshoe bat) | 0.0335 |
| EF569964_Rhinolophus_pearsonii (Pearson’s horseshoe bat) | 0.0410 |
| AY996037_Cercopithecus_aethiops (African green monkey) | 0.0435 |
| NM_001130513_Mus_musculus (Mouse) | 0.0472 |
| AY881174_Paguma_larvata (Civet) | 0.0472 |
| XM_005074209_Mesocricetus_auratus (Hamster) | 0.0497 |
| NM_001012006_Rattus_norvegicus (Rat) | 0.0509 |
| AB211998_Procyon_lotor (Raccoon) | 0.0547 |
| NM_001310190_Mustela_putorius_furo (Ferret) | 0.0584 |
| EU024940_Nyctereutes_procyonoides (Raccoon dog) | 0.0622 |
| NM_001039456_Felis_catus (Cat) | 0.0634 |
ACE2 SS are predicted by MUFOLD-SS.[30]
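A p-distance is the proportion of aligned sites at which two sequences differ. As a minimal sketch, assuming two already-aligned SS strings in which alignment gaps are written as "-" and coil as "C", the function below skips gapped sites and returns the fraction of remaining sites that differ. How gaps were handled in producing the values in the table above is not stated here, so the gap treatment, the function name, and the example strings are all assumptions.

```python
def p_distance(ss_a: str, ss_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned SS strings.

    Sites where either string has a gap are excluded (an assumption).
    """
    if len(ss_a) != len(ss_b):
        raise ValueError("aligned SS strings must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(ss_a, ss_b):
        if a == gap or b == gap:
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differing / compared


# Hypothetical aligned SS strings: 9 ungapped sites, 1 of which differs.
reference_ss = "CHHHHEEECC"
other_ss = "CHHHCEEE-C"
print(round(p_distance(reference_ss, other_ss), 4))
```

A value of 0 means the two predicted SS strings are identical at every compared site; larger values indicate greater structural divergence from the reference.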
Figure 2. Lake94 distances measured on ACE2 aa sequences correlate poorly with P_distances measured on ACE2 SS. Sequence distances in mammalian ACE2 are calculated with respect to hACE2, and the 13 species considered are those listed in the P_distance table above.
Figure 3. SS and aa alignments between Rhinolophus sinicus ACE2 and hACE2. Match and mismatch sites are respectively indicated by green and red for aa alignment and by blue and yellow for SS alignment. Notable regions where conservation levels differ between aa and SS alignments are boxed in light red and yellow. Hotspot positions boxed in light blue represent SARS-2-S contacting sites at hACE2.[53,54]