Predicting the receptor-binding domain usage of the coronavirus based on kmer frequency on spike protein.

Zhaozhong Zhu1, Zheng Zhang1, Wenjun Chen1, Zena Cai1, Xingyi Ge1, Haizhen Zhu2, Taijiao Jiang3, Wenjie Tan4, Yousong Peng5.   

Year:  2018        PMID: 29625240      PMCID: PMC7129160          DOI: 10.1016/j.meegid.2018.03.028

Source DB:  PubMed          Journal:  Infect Genet Evol        ISSN: 1567-1348            Impact factor:   3.342


To the Editor,

The coronavirus is an enveloped, positive-sense, single-stranded RNA virus. It can be classified into four major genera, Alphacoronavirus, Betacoronavirus, Gammacoronavirus and Deltacoronavirus, based on serological and genetic studies (Li, 2016). Alphacoronavirus and Betacoronavirus mainly infect mammals, whereas Gammacoronavirus and Deltacoronavirus mainly infect birds (Tang et al., 2015). The coronavirus poses a serious threat to human health and global security because several coronaviruses can cross the species barrier to infect humans, such as the Severe Acute Respiratory Syndrome Coronavirus (SARS-CoV) and the Middle East Respiratory Syndrome Coronavirus (MERS-CoV) (Lu et al., 2015; Peck et al., 2015; Smith, 2006). SARS-CoV was reported to cause 774 human deaths in 37 countries from 2002 to 2003 (Smith, 2006), while MERS-CoV continues to infect humans in many countries and has already caused more than 700 deaths around the world (World Health Organization, 2017). How to prevent and control the coronavirus has become a global concern.

The genome of the coronavirus generally encodes more than ten proteins (Peck et al., 2015; Yang et al., 2013). Among them, the spike surface envelope glycoprotein is responsible for binding to host receptors and determines, to a large extent, the tissue tropism and host range of the virus (Li, 2015, Li, 2016; Lu et al., 2015). The spike protein contains an ectodomain, a transmembrane anchor and a short intracellular tail. The ectodomain can be cleaved into a receptor-binding S1 subunit and a membrane-fusion S2 subunit during molecular maturation. The S1 subunit binds to a host receptor for entry into the host cell (Li, 2015, Li, 2016; Qian et al., 2015). Depending on the coronavirus species, the spike protein can bind to either protein receptors or glycans (Li, 2016). Multiple receptors have been reported for the coronavirus.
This is largely attributed to the two receptor-binding domains (RBDs) on the S1 subunit: one RBD is located at the N-terminus (denoted as NTD), while the other is located at the C-terminus (denoted as CTD) (Li, 2016). One coronavirus species generally uses one RBD. Some coronaviruses use NTD, for example the mouse hepatitis virus (MHV) (Peng et al., 2011), while others use CTD, such as SARS-CoV (Lu et al., 2015) and MERS-CoV (Lu et al., 2015). Previous studies suggest that having two RBDs could facilitate expansion of the host range of the virus (Li, 2015, Li, 2016). However, the mechanism underlying RBD usage is still obscure, and the RBD usage of most coronavirus species is still unknown.

Here, we attempted to develop a computational method for determining the RBD usage of the coronavirus based on the protein sequence of S1. We first manually compiled twelve coronavirus species with RBD usage reported in the literature (Table S1). Four coronavirus species used NTD: the bovine coronavirus (BCoV), MHV, the infectious bronchitis virus (IBV) and the human coronavirus OC43 (HCoV-OC43). The other eight coronavirus species used CTD: the human coronavirus 229E (HCoV-229E), feline coronavirus (FCoV), bat coronavirus HKU4 (BatCoV-HKU4), human coronavirus HKU1 (HCoV-HKU1), human coronavirus NL63 (HCoV-NL63), MERS-CoV, SARS-CoV and transmissible gastroenteritis virus (TGEV). The protein sequences of the spike protein S1 subunit of these viruses were collected from the NCBI protein database. For convenience, only the 800 N-terminal amino acids of each spike protein sequence, which covered the S1 subunit of all coronavirus species, were kept for further analysis (Supplementary Methods). Then, the frequency of each kmer (one or two amino acids) was used individually to predict whether a coronavirus used NTD or CTD for binding to the receptor (see Supplementary Methods and Table S2). Most kmers achieved a predictive accuracy ranging from 0.6 to 0.8.
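As an illustration of the feature described above, the following is a minimal Python sketch of computing kmer frequencies (k = 1 or 2) over the first 800 residues of a spike protein sequence. The exact normalization and the classifier are given only in the Supplementary Methods, so the relative-frequency definition, the function name `kmer_frequencies` and the toy sequence are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter

S1_WINDOW = 800  # number of N-terminal residues kept, as stated in the text


def kmer_frequencies(sequence: str, k: int, window: int = S1_WINDOW) -> dict:
    """Relative frequency of each k-mer in the first `window` residues.

    Assumption: frequency = count / number of k-mer positions in the window.
    """
    region = sequence[:window].upper()
    total = len(region) - k + 1  # number of k-mer positions
    if total <= 0:
        return {}
    counts = Counter(region[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


# Toy sequence (hypothetical, not a real spike protein).
toy = "MFSAFSLLKFS"
single = kmer_frequencies(toy, 1)  # single amino acids
pairs = kmer_frequencies(toy, 2)   # amino-acid pairs, e.g. "FS"
print(pairs["FS"])
```

Each of the resulting frequencies could then be fed, one at a time, to whatever classifier is used to separate NTD from CTD viruses.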
Surprisingly, we found that one pair of amino acids, "FS", could discriminate the RBD usage of these 12 coronavirus species with an average predictive accuracy of 97% (Fig. 1A). More specifically, it achieved an accuracy of 100% for BCoV, MHV, HCoV-OC43, BatCoV-HKU4, HCoV-HKU1, HCoV-NL63 and TGEV, and an accuracy of 94%, 87%, 99%, 99% and 92% for IBV, HCoV-229E, FCoV, MERS-CoV and SARS-CoV, respectively. Analyzing the number of "FS" pairs in the protein sequence of the S1 subunit of these viruses, we found that the viruses using NTD generally had fewer than 3 "FS" pairs in S1, except for IBV, while the viruses using CTD generally had 6 or more "FS" pairs in S1 (Fig. 1A).
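The counting rule described above can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' classifier: the two cut-offs are taken directly from the reported observation (fewer than 3 for NTD, 6 or more for CTD), counts in between are reported as "ambiguous" because the text gives no decision rule for them, and the function names are hypothetical.

```python
def count_fs(s1_sequence: str, window: int = 800) -> int:
    """Count occurrences of the pair "FS" in the first `window` residues."""
    region = s1_sequence[:window].upper()
    # "FS" cannot overlap with itself, so str.count gives the exact number.
    return region.count("FS")


def predict_rbd(s1_sequence: str, ntd_below: int = 3, ctd_from: int = 6) -> str:
    """Guess RBD usage from the "FS" count (cut-offs are assumptions)."""
    n_fs = count_fs(s1_sequence)
    if n_fs < ntd_below:
        return "NTD"
    if n_fs >= ctd_from:
        return "CTD"
    return "ambiguous"


# Toy sequences (hypothetical, not real S1 subunits).
print(predict_rbd("MAFSALK"))          # one "FS"
print(predict_rbd("MK" + "AFSL" * 7))  # seven "FS"
```

Note that IBV is reported as an exception among the NTD viruses, so a rule of this kind would be expected to misclassify it at least some of the time.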
Fig. 1

Predicting the RBD usage of the coronavirus based on the number of “FS” in the protein sequence of the spike protein S1 subunit. (A) The distribution of the number of “FS” in S1 and the predictive accuracy based on the number of “FS” in 12 coronavirus species. The coronavirus species using NTD and CTD were colored in blue and red, respectively. The genus each virus belongs to was labeled in the top right of the virus name. (B) and (C) refer to the 3D structure of S1 subunit for MHV and HCoV-NL63, respectively. The receptor-binding interface was inferred manually from the spike-receptor complex (PDB id: 3r4d for MHV and 3kbh for HCoV-NL63). NTD and CTD were colored in cyan and yellow respectively. The “FS”s were colored in blue. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Further analysis of the ratio between the observed and expected number of "FS" pairs in the S1 protein of these viruses showed that "FS" was under-represented in the viruses using NTD (Fig. S1), i.e., the observed number of "FS" pairs in S1 was lower than the expected number, while for the viruses using CTD, "FS" was generally over-represented in S1. We next analyzed the location of the "FS" pairs on the 3D structure of the S1 protein of the coronavirus. Fig. 1B & C show the 3D structures of the S1 protein of MHV and HCoV-NL63, respectively. For most coronavirus species, the "FS" pairs (colored in blue) were generally scattered around the S1 protein (Fig. 1B & C and Fig. S2). Few of them were located in or near the receptor-binding interface (colored in red), suggesting that "FS" may not contribute directly to the virus-receptor interaction. One exception is SARS-CoV, for which there was one "FS" in the interface (Fig. S2A). More effort is needed to clarify how "FS" influences the RBD usage of the coronavirus.
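The observed-to-expected comparison mentioned above can be illustrated with a small Python sketch. The text does not state how the expected number was derived, so this example assumes the simplest null model, in which neighbouring residues are independent: expected = f(F) × f(S) × (L − 1), where f is the single-residue frequency and L the length of the analyzed region. The function name and toy sequences are hypothetical.

```python
from typing import Optional


def fs_obs_exp_ratio(sequence: str, window: int = 800) -> Optional[float]:
    """Observed / expected count of "FS" under an independence assumption."""
    region = sequence[:window].upper()
    length = len(region)
    if length < 2:
        return None
    observed = region.count("FS")
    freq_f = region.count("F") / length
    freq_s = region.count("S") / length
    # Assumed null model: each of the (length - 1) adjacent pairs is "FS"
    # with probability freq_f * freq_s.
    expected = freq_f * freq_s * (length - 1)
    if expected == 0:
        return None  # ratio undefined when F or S is absent
    return observed / expected


# Toy sequences (hypothetical): same composition, different arrangement.
print(fs_obs_exp_ratio("FSFSAAAA"))  # ratio above 1: over-represented
print(fs_obs_exp_ratio("FASAFASA"))  # ratio 0: under-represented
```

A ratio below 1 corresponds to under-representation and a ratio above 1 to over-representation in the sense used in the text.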
Finally, in addition to the 12 coronavirus species mentioned above, we inferred the RBD usage of all other coronavirus species with an S1 protein sequence available in the NCBI protein database (Table S3), based on the number of "FS" pairs in the S1 protein. A total of 31 coronavirus species covering all four major genera were used in the prediction. Within Alphacoronavirus, all species except Mink coronavirus 1 were predicted to use CTD, while in the other genera most coronavirus species were predicted to use NTD. Overall, this work provides a simple and effective method for inferring the RBD usage of the coronavirus based on the protein sequence of the spike protein. It may help not only to understand the mechanisms behind the RBD usage of the coronavirus, but also to identify host receptors for the virus.
  9 in total

1.  Crystal structure of mouse coronavirus receptor-binding domain complexed with its murine receptor.

Authors:  Guiqing Peng; Dawei Sun; Kanagalaghatta R Rajashankar; Zhaohui Qian; Kathryn V Holmes; Fang Li
Journal:  Proc Natl Acad Sci U S A       Date:  2011-06-13       Impact factor: 11.205

Review 2.  Receptor recognition mechanisms of coronaviruses: a decade of structural studies.

Authors:  Fang Li
Journal:  J Virol       Date:  2014-11-26       Impact factor: 5.103

3.  Identification of the Receptor-Binding Domain of the Spike Glycoprotein of Human Betacoronavirus HKU1.

Authors:  Zhaohui Qian; Xiuyuan Ou; Luiz Gustavo Bentim Góes; Christina Osborne; Anna Castano; Kathryn V Holmes; Samuel R Dominguez
Journal:  J Virol       Date:  2015-06-17       Impact factor: 5.103

Review 4.  Coronavirus Host Range Expansion and Middle East Respiratory Syndrome Coronavirus Emergence: Biochemical Mechanisms and Evolutionary Perspectives.

Authors:  Kayla M Peck; Christina L Burch; Mark T Heise; Ralph S Baric
Journal:  Annu Rev Virol       Date:  2015-08-07       Impact factor: 10.431

Review 5.  Structure, Function, and Evolution of Coronavirus Spike Proteins.

Authors:  Fang Li
Journal:  Annu Rev Virol       Date:  2016-08-25       Impact factor: 10.431

6.  The structural and accessory proteins M, ORF 4a, ORF 4b, and ORF 5 of Middle East respiratory syndrome coronavirus (MERS-CoV) are potent interferon antagonists.

Authors:  Yang Yang; Ling Zhang; Heyuan Geng; Yao Deng; Baoying Huang; Yin Guo; Zhengdong Zhao; Wenjie Tan
Journal:  Protein Cell       Date:  2013-12-08       Impact factor: 14.870

7.  Responding to global infectious disease outbreaks: lessons from SARS on the role of risk perception, communication and management.

Authors:  Richard D Smith
Journal:  Soc Sci Med       Date:  2006-09-15       Impact factor: 4.634

Review 8.  Bat-to-human: spike features determining 'host jump' of coronaviruses SARS-CoV, MERS-CoV, and beyond.

Authors:  Guangwen Lu; Qihui Wang; George F Gao
Journal:  Trends Microbiol       Date:  2015-07-21       Impact factor: 17.079

9.  Inferring the hosts of coronavirus using dual statistical models based on nucleotide composition.

Authors:  Qin Tang; Yulong Song; Mijuan Shi; Yingyin Cheng; Wanting Zhang; Xiao-Qin Xia
Journal:  Sci Rep       Date:  2015-11-26       Impact factor: 4.379
