David D Holcomb, Katarzyna I Jankowska, Nancy Hernandez, Kyle Laurie, Jacob Kames, Nobuko Hamasaki-Katagiri, Anton A Komar, Michael DiCuccio, Chava Kimchi-Sarfaty.
Abstract
Here, we describe a bioinformatics pipeline that evaluates interactions between SARS-CoV-2 proteins and coagulation-related host proteins, including their genetic variants. The pipeline searches for host proteins that may bind to viral proteins, then identifies and scores genetic variants in those proteins to help predict disease pathogenesis in specific subpopulations. It can also find structurally similar motifs and identify potential binding sites within host-viral protein complexes, revealing how viral proteins may affect regulated biological processes and how host proteins may affect viral invasion or reproduction. For complete details on the use and execution of this protocol, please refer to Holcomb et al. (2021).
Keywords: Bioinformatics; Computer sciences; Genetics; Health sciences; Molecular biology
Year: 2022 PMID: 36052345 PMCID: PMC9345850 DOI: 10.1016/j.xpro.2022.101648
Source DB: PubMed Journal: STAR Protoc ISSN: 2666-1667
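The first stage described in the abstract, finding host proteins in a chosen functional class that may bind viral proteins, can be pictured as a set intersection. The sketch below is a minimal illustration, not the published pipeline: the function name `candidate_host_proteins`, the placeholder gene names, and the interaction pairs are all hypothetical, standing in for a gene ontology export and an interaction table.

```python
from collections import defaultdict


def candidate_host_proteins(go_genes, interactions):
    """Map each GO-annotated host gene to the viral proteins it interacts with.

    go_genes: set of host gene symbols annotated to the term of interest.
    interactions: iterable of (viral_protein, host_gene) pairs.
    """
    hits = defaultdict(set)
    for viral_protein, host_gene in interactions:
        if host_gene in go_genes:
            hits[host_gene].add(viral_protein)
    # Sort for stable, readable output.
    return {gene: sorted(viral) for gene, viral in sorted(hits.items())}


# Placeholder data for illustration only; these are not real findings.
coagulation_genes = {"HOST_GENE_1", "HOST_GENE_2", "HOST_GENE_3"}
viral_host_pairs = [
    ("ORF7a", "HOST_GENE_1"),
    ("ORF3a", "HOST_GENE_2"),
    ("NSP1", "HOST_GENE_1"),
    ("ORF7a", "HOST_GENE_9"),  # not in the coagulation set, so it is dropped
]

candidates = candidate_host_proteins(coagulation_genes, viral_host_pairs)
print(candidates)
```

Keeping the result keyed by host gene makes it straightforward to pass each candidate on to later variant-scoring steps.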
Figure 1. A view of the AmiGO interface
(A) A typical search bar, where a user can search for a gene ontology term or a gene.
(B) The results for the term “coagulation”. A user may query any desired protein function or biological process.
Figure 2. The first section of the analyze_all_variants.ipynb notebook
This region contains the primary variables you should need to change to run this code. This code is available at the FDA GitHub repository listed in the “Data and code availability” section.
Figure 3. View of the menus of an analyze_all_variants.ipynb notebook
Assuming everything is set correctly, only the circled yellow button needs to be clicked to run the entire pipeline.
Figure 4. A view of the NCBI Protein database interface
This is the result of a query of “SARS-CoV-2 ORF7a”. Several resulting proteins’ sequences are shown, as well as multiple possible filters (circled in red) to apply to results. Link: https://www.ncbi.nlm.nih.gov/protein/.
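The same query can also be issued programmatically through the NCBI E-utilities `esearch` endpoint. The sketch below only builds the request URL with the standard library and does not contact NCBI; the helper name `build_protein_search_url` and the `retmax` default are our own choices, and it is not necessarily how the pipeline itself issues queries (the key resources table lists Biopython and Entrez for that purpose).

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_protein_search_url(term, retmax=20):
    """Return an ESearch URL that queries the NCBI Protein database."""
    params = {
        "db": "protein",      # search the Protein database
        "term": term,         # free-text query, as typed in the web interface
        "retmax": retmax,     # maximum number of record IDs to return
        "retmode": "json",    # ask for a JSON response
    }
    return EUTILS_ESEARCH + "?" + urlencode(params)


url = build_protein_search_url("SARS-CoV-2 ORF7a")
print(url)
```

Fetching this URL returns a list of record identifiers; filters such as those circled in the figure would need to be expressed as additional terms in the query string.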
Figure 5. A view of the I-TASSER webserver interface
An example sequence and protein ID have been entered. This website requires a user email and password and optionally accepts additional constraints, which we do not use.
Figure 6. View of structurally aligned proteins in Pymol
The proteins include SARS-CoV-2 ORF7a protein, and two human proteins with significant structural similarity. Additionally, the Pymol console is shown (at top), and the list of proteins is shown at the right along with additional Pymol options.
Figure 7. View of the menus of Pymol
Yellow indicates the visualization options, red indicates standard menus, and purple shows the protein sequences. A docked structure consisting of two proteins is shown. The two chains are colored differently as described in step 9, part i.
Figure 8. Visualization of the scored synonymous variants in coagulation-involved human proteins that interact with SARS-CoV-2 proteins (the output of step 4)
Each row is named by the identifier of the variant (with the gene location). The columns shown are primarily minor allele frequencies for different populations from gnomAD. Not all columns are shown here.
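One way to use a table like this is to look for variants whose minor allele frequency differs markedly between populations. The sketch below is illustrative only: the column names (`variant_id`, `maf_pop_a`, ...), the variant identifiers, the frequencies, and the 0.1 threshold are all made up, and the real output file has more columns and its own naming.

```python
import csv
import io

# Made-up example rows; real column names and values will differ.
TABLE = """variant_id,maf_pop_a,maf_pop_b,maf_pop_c
GENE1:c.100C>T,0.010,0.200,0.015
GENE2:c.250G>A,0.050,0.055,0.049
"""


def population_maf_spread(csv_text, prefix="maf_"):
    """Return {variant_id: max MAF minus min MAF} across population columns."""
    spreads = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        mafs = [
            float(value)
            for column, value in row.items()
            if column.startswith(prefix) and value != ""
        ]
        if mafs:
            spreads[row["variant_id"]] = max(mafs) - min(mafs)
    return spreads


def population_specific_variants(csv_text, min_spread=0.1):
    """List variants whose MAF spread is at least min_spread (assumed cutoff)."""
    spreads = population_maf_spread(csv_text)
    return sorted(v for v, spread in spreads.items() if spread >= min_spread)


flagged = population_specific_variants(TABLE)
print(flagged)
```

A difference-based spread is the simplest choice; a ratio or a statistical test may be more appropriate for rare variants.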
Figure 9. Visualization of the filtered and processed GWAS data of coagulation-involved human proteins that interact with SARS-CoV-2 proteins (the output of step 5)
This shows the location of the variant and the effect size of the variant on severe COVID-19 status. Additional columns are not shown.
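Filtering GWAS summary statistics down to the genes of interest can be sketched as a positional lookup. Everything named here is hypothetical: the `GwasHit` fields, the gene coordinates, the example values, and the `5e-8` p-value cutoff (a common genome-wide significance convention, not a value taken from the protocol).

```python
from typing import NamedTuple


class GwasHit(NamedTuple):
    chrom: str      # chromosome name
    pos: int        # position of the variant
    beta: float     # effect size on the trait
    p_value: float  # association p-value


def filter_hits(hits, regions, max_p=5e-8):
    """Keep hits that fall inside a gene region and pass the p-value cutoff.

    regions: {gene_name: (chrom, start, end)} with inclusive coordinates.
    Returns a list of (gene_name, hit) pairs.
    """
    kept = []
    for hit in hits:
        for gene, (chrom, start, end) in regions.items():
            in_region = hit.chrom == chrom and start <= hit.pos <= end
            if in_region and hit.p_value <= max_p:
                kept.append((gene, hit))
    return kept


# Made-up coordinates and statistics for illustration only.
gene_regions = {"HOST_GENE_1": ("1", 100, 200), "HOST_GENE_2": ("2", 400, 600)}
gwas_hits = [
    GwasHit("1", 150, 0.3, 1e-9),      # inside HOST_GENE_1, significant
    GwasHit("1", 150000, 0.5, 1e-12),  # significant but outside any region
    GwasHit("2", 500, -0.2, 1e-3),     # inside HOST_GENE_2, not significant
]

filtered = filter_hits(gwas_hits, gene_regions)
print(filtered)
```

The nested loop is adequate for a handful of genes; for genome-scale region sets an interval index would be a better fit.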
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| BIOGRID 4.4.209 | | RRID: SCR_007393 |
| AmiGO | | RRID: SCR_002143 |
| COVID-19 HGI Release 6 | | RRID: SCR_022272 |
| dbSNP | NCBI | RRID: SCR_002338 |
| ClinVar | NCBI | RRID: SCR_006169 |
| Google Scholar | | RRID: SCR_008878 |
| Entrez | NCBI | RRID: SCR_016640 |
| Protein Data Bank | Research Collaboratory for Structural Bioinformatics | RRID: SCR_012820 |
| Codon and Codon-Pair Usage Tables (September 2021) (CoCoPUTs) | | RRID: SCR_018504 |
| ESEfinder | | RRID: SCR_007088 |
| FAS ESS | | RRID: SCR_022517 |
| ExonScan | | RRID: SCR_022516 |
| I-TASSER | | RRID: SCR_014627 |
| Dali | | RRID: SCR_013433 |
| ZDock | | RRID: SCR_022518 |
| Python 3.7.12 | | RRID: SCR_008394 |
| Biopython 1.69 | | RRID: SCR_007173 |
| BLAST, specifically through BioPython and Entrez | | RRID: SCR_004870 |
| Clustal Omega 1.2.4 | | RRID: SCR_001591 |
| NetNGlyc 1.0 | | RRID: SCR_001570 |
| NetOGlyc 3.1 | | RRID: SCR_009026 |
| NetPhos 3.1b | | RRID: SCR_017975 |
| NetSurfP 2.0 | | RRID: SCR_018781 |
| Coarse-grained Co-translational Folding Analysis and Rare Codon Enrichment | | RRID: SCR_022271 |
| %MinMax | | RRID: SCR_022268 |
| Ensembl Variant Effect Predictor | | RRID: SCR_007931 |
| AL2CO conservation scores | | RRID: SCR_022267 |
| RNAfold 2.4.10 | | RRID: SCR_008550 |
| NUPACK 3.0.6 | | RRID: SCR_022274 |
| KineFold | | RRID: SCR_022273 |
| ESRseq hexamer score | | RRID: SCR_022270 |
| HEXplorer score (ZEI and ZWS) | | RRID: SCR_022269 |
| Rosetta 3.10 | | RRID: SCR_015701 |
| Pymol 1.8.4.0 | | RRID: SCR_000305 |
| Dell Precision 7730 laptop with Intel Core i7-8850H, 32 GB memory, 500 GB solid state drive | Dell | |