Srinivasulu Yerukala Sathipati¹, Ming-Ju Tsai², Tonia Carter³, Sanjay K Shukla³, Shinn-Ying Ho⁴.
Abstract
We describe a protocol to identify physicochemical properties from the amino acid sequences of SARS-CoV-2 spike (S) proteins. We present an S protein prediction technique named SPIKES, which incorporates an inheritable bi-objective combinatorial genetic algorithm to determine host species specificity. This protocol covers S protein amino acid sequence data collection, preprocessing, methodology, and analysis. For complete details on the use and execution of this protocol, please refer to Yerukala Sathipati et al. (2022).
Keywords: Bioinformatics; Microbiology; Proteomics; Systems biology
Year: 2022 PMID: 35726315 PMCID: PMC9127179 DOI: 10.1016/j.xpro.2022.101460
Source DB: PubMed Journal: STAR Protoc ISSN: 2666-1667
The human and animal host spike protein sequences and their available sources
| Spike protein sequence | Example sequence ID (FASTA header) | Availability |
|---|---|---|
| Human host | `Spike\|hCoV-19/Wuhan/WIV04/2019\|2019-12-30\|EPI_ISL_402124\|Original\|hCoV-19^^Hubei\|Human\|Wuhan` | GISAID |
| Animal host | AAY88866.1 spike glycoprotein [Bat SARS coronavirus HKU3-1] | NCBI |
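The two example headers use different conventions. The GISAID header packs its fields between `|` separators, while the NCBI header is an accession followed by a description and the organism in square brackets. The following is a minimal sketch of splitting both into fields. `parse_header` and the field names (`name`, `date`, `accession`, `host`) are hypothetical labels inferred from the single example header above, and other GISAID export settings may order the fields differently.

```python
import re
from typing import Dict, Optional


def parse_header(header: str) -> Dict[str, Optional[str]]:
    """Split a FASTA header into named fields (hypothetical helper and field names)."""
    header = header.lstrip(">").strip()
    if "|" in header:
        # GISAID-style header: fields separated by '|'.
        # Positions are assumed from the example header in the table above.
        fields = header.split("|")

        def field(i: int) -> Optional[str]:
            return fields[i] if len(fields) > i else None

        return {
            "source": "GISAID",
            "name": field(1),
            "date": field(2),
            "accession": field(3),
            "host": field(6),
        }
    # NCBI-style header: "ACCESSION description [organism]".
    match = re.match(r"(\S+)\s+(.*?)\s*\[(.+)\]\s*$", header)
    if match is None:
        raise ValueError(f"unrecognized header: {header!r}")
    accession, description, organism = match.groups()
    return {"source": "NCBI", "accession": accession,
            "description": description, "organism": organism}
```

With this layout, the GISAID `host` field yields the host label directly. For NCBI records, the host has to be inferred from the organism name or from other record annotations.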
Figure 1. Screenshots showing spike protein data acquisition from databases
(A) SARS-CoV-2 data displayed in the GISAID database and (B) in the NCBI database, with an example amino acid sequence in FASTA format.
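Both databases deliver sequences in FASTA format. The sketch below shows one possible preprocessing step; it is an assumption, not a step taken from the protocol. It reads FASTA records into a dictionary keyed by header and keeps only sequences made entirely of the 20 standard amino acids, first stripping a trailing `*` stop symbol if one is present. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

# The 20 standard amino acids; anything else (X, B, Z, ...) marks a sequence for removal.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Collect FASTA records into {header: sequence} (hypothetical helper).

    A later record with a duplicate header overwrites an earlier one.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def keep_standard(records: Dict[str, str]) -> Dict[str, str]:
    """Keep sequences made only of standard residues (assumed filtering rule)."""
    cleaned: Dict[str, str] = {}
    for header, seq in records.items():
        seq = seq.rstrip("*")  # some exports end protein sequences with a stop symbol
        if seq and set(seq) <= STANDARD:
            cleaned[header] = seq
    return cleaned
```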
Figure 2. The steps involved in the feature selection algorithm
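Figure 2 summarizes the feature selection steps. What follows is a loose, simplified sketch of the idea behind an inheritable bi-objective combinatorial GA. It is not the published IBCGA, whose operators differ. The sketch works in three stages:

- For each subset size m, a basic elitist GA searches subsets of exactly m features.
- The best (m-1)-feature solution is inherited as a seed for size m.
- The final choice trades accuracy against subset size by taking the smallest m whose score is within a tolerance of the best.

The operators (truncation selection, uniform-style crossover, random gene replacement), the population size, the generation count, and all function names are assumptions. In SPIKES the fitness would presumably be a classifier's cross-validated accuracy, since the resources table lists a support vector machine. Here, `fitness` is any caller-supplied function.

```python
import random
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional

Subset = FrozenSet[int]
Fitness = Callable[[Subset], float]


def repair(genes: Iterable[int], m: int, n: int, rng: random.Random) -> Subset:
    """Force a candidate to hold exactly m distinct feature indices out of n."""
    chosen = set(genes)
    while len(chosen) > m:
        chosen.remove(rng.choice(sorted(chosen)))
    while len(chosen) < m:
        chosen.add(rng.randrange(n))
    return frozenset(chosen)


def evolve_fixed_size(n: int, m: int, fitness: Fitness, seeds: List[Subset],
                      rng: random.Random, pop_size: int = 20,
                      generations: int = 30) -> Subset:
    """Hypothetical elitist GA over subsets of exactly m features."""
    pop = [repair(s, m, n, rng) for s in seeds]
    while len(pop) < pop_size:
        pop.append(frozenset(rng.sample(range(n), m)))
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform-style crossover: each gene of either parent survives with p = 0.5.
            child = {g for g in a | b if rng.random() < 0.5}
            if rng.random() < 0.3:
                # Mutation: drop one of parent a's genes; repair() refills at random.
                child.discard(rng.choice(sorted(a)))
            children.append(repair(child, m, n, rng))
        pop = parents + children
    return max(pop, key=fitness)


def inheritable_search(n: int, m_min: int, m_max: int, fitness: Fitness,
                       seed: int = 0) -> Dict[int, Subset]:
    """Best subset for each size m; each run is seeded with the best (m-1)-subset."""
    if not 1 <= m_min <= m_max <= n:
        raise ValueError("need 1 <= m_min <= m_max <= n")
    rng = random.Random(seed)
    best: Dict[int, Subset] = {}
    previous: Optional[Subset] = None
    for m in range(m_min, m_max + 1):
        seeds = [previous] if previous is not None else []
        best[m] = evolve_fixed_size(n, m, fitness, seeds, rng)
        previous = best[m]
    return best


def pick_compact(best: Dict[int, Subset], fitness: Fitness, tol: float = 0.0) -> int:
    """Bi-objective trade-off: smallest m whose fitness is within tol of the top score."""
    top = max(fitness(s) for s in best.values())
    return min(m for m, s in best.items() if fitness(s) >= top - tol)
```

Inheritance makes the runs cumulative: each size starts from the previous size's best solution plus one feature, so for a fitness that never decreases when features are added, the best score cannot drop as m grows.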
Informative physicochemical properties obtained using SPIKES
| No. | AAindex ID | Feature description |
|---|---|---|
| 1 | RACS820104 | Average relative fractional occurrence in EL(i) |
| 2 | ROBB760101 | Information measure for alpha-helix |
| 3 | RACS820109 | Average relative fractional occurrence in AL(i-1) |
| 4 | GEIM800105 | Beta-strand indices |
| 5 | QIAN880137 | Weights for coil at the window position of 4 |
| 6 | PRAM820103 | Correlation coefficient in regression analysis |
| 7 | JOND920102 | Relative mutability |
| 8 | NAKH920103 | Amino acid composition of EXT of single-spanning proteins |
| 9 | OOBM850101 | Optimized beta-structure-coil equilibrium constant |
| 10 | CHAM830104 | The number of atoms in the side chain labeled 2+1 |
| 11 | ROBB760103 | Information measure for middle helix |
Figure 3. Comparison of the physicochemical property RACS820104 (AAindex ID) between spike proteins of human-host and animal-host coronaviruses
The ID RACS820104 represents the average relative fractional occurrence in EL(i).
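The figure does not state which statistical test, if any, underlies this comparison. As a hedged illustration, suppose each spike sequence has already been reduced to one RACS820104 value. A Welch's t statistic for the human-host versus animal-host groups can then be computed with the standard library alone. `welch_t` is a hypothetical helper. For a p-value, SciPy's `scipy.stats.ttest_ind(x, y, equal_var=False)` performs the same test.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (hypothetical helper).

    Each group needs at least two values and a nonzero combined variance.
    """
    vx = variance(x) / len(x)  # squared standard error of the mean, group x
    vy = variance(y) / len(y)  # squared standard error of the mean, group y
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df
```

Welch's variant is used here because it does not assume equal variances in the two host groups.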
Figure 4. Comparison of amino acid compositions between spike proteins of human-host and animal-host coronaviruses
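Amino acid composition is the percentage of each of the 20 standard residues in a sequence. The minimal sketch below computes it per sequence and then averages it over a group, such as all human-host or all animal-host spike sequences. This is one plausible way to produce per-group values like those compared in Figure 4; the exact procedure in the protocol is not stated here, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> Dict[str, float]:
    """Percentage of each standard residue; non-standard letters are ignored."""
    counts = Counter(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no standard residues")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def mean_composition(seqs: List[str]) -> Dict[str, float]:
    """Average composition across a group of sequences (hypothetical helper)."""
    comps = [aa_composition(s) for s in seqs]
    return {aa: sum(c[aa] for c in comps) / len(comps) for aa in AMINO_ACIDS}
```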
Figure 5. Spike glycoprotein complex
Spike glycoprotein (PDB: 6ACJ; electron microscopy, 4.2 Å) in complex with ACE2 (green ribbon), showing the amino acid changes between Rousettus bat coronavirus (GenBank: AOG30822.1) and hCoV/Wuhan/WIV05/2019. Mutated residues in the different strains are shown as colored balls.
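Listing amino acid changes between two strains requires the sequences to be aligned first, for example with a multiple sequence alignment tool such as MAFFT or Clustal Omega. The protocol's own alignment step is not described in this record. Assuming an alignment is already available, with `-` marking gaps, the sketch reports substitutions as 1-based reference positions. Insertions and deletions are skipped. The helper names are hypothetical.

```python
from typing import List, Tuple


def substitutions(ref: str, query: str) -> List[Tuple[int, str, str]]:
    """(position, ref_residue, query_residue) wherever two aligned sequences differ.

    Positions count reference residues only, so gap columns in the reference
    do not advance the position; columns with a gap in either sequence are skipped.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    changes: List[Tuple[int, str, str]] = []
    pos = 0
    for r, q in zip(ref, query):
        if r != "-":
            pos += 1
        if r != q and r != "-" and q != "-":
            changes.append((pos, r, q))
    return changes


def as_labels(changes: List[Tuple[int, str, str]]) -> List[str]:
    """Format substitutions in the usual ref-position-alt notation, e.g. 'N501Y'."""
    return [f"{r}{p}{q}" for p, r, q in changes]
```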
Figure 6. Secondary structure and surface hydrophobicity of the spike protein (PDB: 6VXX)
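Figure 6 maps hydrophobicity onto the three-dimensional structure. A related sequence-level view is a sliding-window hydropathy profile. The sketch below assumes the Kyte-Doolittle scale (AAindex KYTJ820101) and a window of 9 residues. Both are illustrative choices, not settings taken from the protocol, and the function name is hypothetical.

```python
from typing import List

# Kyte-Doolittle hydropathy index (AAindex KYTJ820101), an assumed choice of scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> List[float]:
    """Mean hydropathy of each full window; element i covers residues i..i+window-1 (0-based).

    Raises KeyError for non-standard residues, so clean the sequence first.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and the sequence length")
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide: add the new residue, drop the oldest
        profile.append(total / window)
    return profile
```

The running sum keeps the profile linear in sequence length, which matters little for a single spike protein (about 1,270 residues) but helps when profiling many sequences.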
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Spike protein sequences | Global Initiative on Sharing All Influenza Data (GISAID) | |
| Spike protein sequences | National Center for Biotechnology Information (NCBI) | |
| Support vector machine | | |
| Sequence redundancy | | |
| Perl | Perl program | |
| RStudio | The R project | |
| Protein structure visualization | UCSF Chimera | |
| Inheritable bi-objective combinatorial genetic algorithm | | |
| SPIKES methods | This paper | |
| This paper | ||
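The resources table lists a sequence redundancy step but does not name the tool. Dedicated tools such as CD-HIT use greedy incremental clustering by alignment identity. As a crude, hypothetical stand-in, the sketch keeps sequences longest-first and drops any sequence whose `difflib` similarity ratio to an already kept sequence reaches a threshold. The ratio is not true alignment identity, and the 0.9 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """difflib ratio 2*M/T, used here as a rough proxy for sequence identity."""
    # autojunk=False: with only 20 residue letters, the default heuristic would treat
    # nearly every residue as "junk" in sequences of 200 or more characters.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def remove_redundant(seqs: List[str], threshold: float = 0.9) -> List[str]:
    """Greedy longest-first filter (hypothetical helper).

    A sequence is kept only if it is below `threshold` similarity to every kept one.
    """
    kept: List[str] = []
    for s in sorted(seqs, key=len, reverse=True):
        if all(similarity(s, k) < threshold for k in kept):
            kept.append(s)
    return kept
```

Every candidate is compared against all kept sequences with an O(n²)-style matcher, so this only suits small illustrative sets; full spike datasets call for a dedicated clustering tool.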