
Pandemic strategies with computational and structural biology against COVID-19: A retrospective.

Ching-Hsuan Liu^1,2, Cheng-Hua Lu^1, Liang-Tzung Lin^1,3.

Abstract

The emergence of the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), which is the etiologic agent of the coronavirus disease 2019 (COVID-19) pandemic, has dominated all aspects of life since 2020. Research on the virus and the exploration of therapeutic and preventive strategies have been moving at a rapid pace to control the pandemic. In the field of bioinformatics, or computational and structural biology, recent research strategies have used multiple disciplines to compile large datasets to uncover statistical correlations and significance, visualize and model proteins, perform molecular dynamics simulations, and employ artificial intelligence and machine learning to harness computational processing power to further the research on COVID-19, including drug screening, drug design, vaccine development, prognosis prediction, and outbreak prediction. These recent developments should help us better understand the viral disease and develop the much-needed therapies and strategies for the management of COVID-19.

Entities: Chemical

Keywords: Artificial intelligence; COVID-19; Disease prediction; Drug design; Drug screening; Machine learning; SARS-CoV-2; Vaccine development

Year: 2021 PMID: 34900126 PMCID: PMC8650801 DOI: 10.1016/j.csbj.2021.11.040

Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155

Introduction

The coronavirus disease 2019 (COVID-19) is caused by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), which is thought to have a zoonotic origin and started as an outbreak in Wuhan, China, that later turned into a global pandemic. As of July 2021, the virus has infected more than 184 million people and caused over 3.9 million deaths globally, despite periodic controls [1]. This death toll far exceeds the combined toll of fewer than 2000 from the previous coronavirus outbreaks of SARS in 2002 and MERS (Middle East Respiratory Syndrome) in 2012 [2], [3]. SARS-CoV-2 is a single-stranded, enveloped virus that possesses a positive-sense RNA genome of roughly 29.9 kb in length, encoding the spike (S), envelope (E), membrane (M), and nucleocapsid (N) structural proteins and multiple other nonstructural proteins [4], [5], [6]. The viral life cycle begins when the S protein binds to host angiotensin-converting enzyme 2 (ACE2), followed by release of the viral genome into the cytoplasm. The precursor polyprotein is then auto-cleaved into various structural and non-structural proteins by the papain-like protease (nsp3) and the main protease (Mpro or 3CLpro; nsp5). Viral assembly takes place in the ER-Golgi intermediate compartment, and nascent virions are released after S protein glycosylation in the Golgi apparatus [7]. Patients with COVID-19 usually experience fever, cough, and dyspnea; however, some patients may be asymptomatic, while others may develop fulminant disease and require intensive care [8]. Significant efforts have been invested in COVID-19 research, which has generated several vaccine and drug candidates [9], [10]. However, full immunization coverage and the evaluation of therapeutic efficacy in real-world settings remain issues, which necessitates the continuous development of optimal therapeutic and prophylactic strategies for better management of COVID-19.
Such development can be accelerated using bioinformatics, which has been rapidly evolving in recent years and is capable of tackling issues at a scale that previously would not have been feasible. This includes computational and structural biology, which is a relatively new frontier but has the ability to ‘decode’ pathogens and hosts based on their genomic sequences, thus allowing researchers to accelerate their understanding of a pathogen, make predictions about it, and explore various strategies to help curb its spread. This capability is critical to public health and has gained importance over the last decades with the increasing number of emerging and re-emerging viral infections (such as influenza, Ebola, and Zika). The technology is made more powerful by the fast-paced development of computational technology, particularly artificial intelligence (AI) and machine learning, which has begun to see increasing applications in biology, medicine, and public health and is revolutionizing the way we approach disease. Recently, it has been widely used for drug screening, vaccine/drug design, and disease prediction to tackle the COVID-19 pandemic. In this review, we summarize how computational and structural biology and AI platforms have been applied in the current pandemic.

Virtual drug screening

Identification of novel drugs

When the COVID-19 pandemic hit, one of the biggest concerns was finding an active antiviral. Structure-based in silico screening allows libraries of pharmacologically active compounds with documented activities to be screened, providing insight into how they may interact with host or viral proteins [11], [12]. Recently, computational models using molecular docking screening followed by absorption, distribution, metabolism, excretion, and toxicity (ADMET) analysis and molecular dynamics simulations have been widely utilized to identify compounds that potentially target SARS-CoV-2 proteins. Compounds identified include the potential SARS-CoV-2 S receptor-binding domain (RBD)-specific terpenes NPACT01552, NPACT01557, and NPACT00631 [13]; the Mpro inhibitors tinosponone [14], ChEMBL275592, montelukast, and ChEMBL288347 [15], quercetin-3-O-rhamnoside [16], and the biflavone amentoflavone [17]; and the RNA-dependent RNA polymerase (RdRp) inhibitors Galidesivir and the two drug-like compounds CID123624208 and CID11687749 [18]. Such methods can also be utilized for high-throughput screening. A study screening plant secondary metabolites suggested flavonoid glycosides, biflavonoids, ellagitannins, anthocyanidins, and triterpenes to be potential TMPRSS2, SARS-CoV-2 S, Mpro, and RdRp inhibitors [19]. Of note, one of the top-ranked triterpenoid saponins, glycyrrhizic acid (glycyrrhizin), has demonstrated antiviral activities against SARS-CoV [20] and SARS-CoV-2 [21] in vitro and is being evaluated in clinical trials [22]. Another study combined molecular docking with machine learning to further expedite the screening procedure and identified six potential Mpro inhibitors from over 2000 natural compounds [23].
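
The docking-then-filter funnel described above can be illustrated with a short ranking step. This is a minimal sketch only: the compound names, docking scores, and descriptor values are made-up placeholders, the score cutoff is arbitrary, and Lipinski's rule of five is used as a simple stand-in for a full ADMET analysis. It does not reproduce the procedure of any cited study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    docking_score: float  # kcal/mol; more negative = stronger predicted binding
    mol_weight: float     # Da
    logp: float           # octanol-water partition coefficient
    h_donors: int         # hydrogen-bond donors
    h_acceptors: int      # hydrogen-bond acceptors


def lipinski_violations(c: Compound) -> int:
    """Count how many of Lipinski's four rule-of-five criteria are violated."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def screen(library, score_cutoff=-7.0, max_violations=1):
    """Keep compounds that dock at or below the cutoff and pass the
    drug-likeness filter, sorted with the best (most negative) score first."""
    hits = [
        c for c in library
        if c.docking_score <= score_cutoff
        and lipinski_violations(c) <= max_violations
    ]
    return sorted(hits, key=lambda c: c.docking_score)


# Hypothetical library; every value below is invented for illustration.
LIBRARY = [
    Compound("cmpd_A", -9.1, 420.5, 3.2, 2, 6),
    Compound("cmpd_B", -8.4, 710.9, 6.1, 7, 12),  # docks well, fails the filter
    Compound("cmpd_C", -5.2, 300.1, 1.0, 1, 4),   # passes the filter, docks weakly
    Compound("cmpd_D", -7.6, 510.0, 4.0, 3, 8),   # one violation, tolerated
]

for hit in screen(LIBRARY):
    print(hit.name, hit.docking_score)
```

In practice the surviving hits would go on to molecular dynamics simulation and in vitro testing; the filter only narrows the list.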

Drug repurposing

Computational approaches such as network-based or expression-based algorithms and docking simulations have also been widely applied during the pandemic to identify candidates for drug repurposing [24], [25]. Combining these methods with AI platforms may facilitate more efficient large-scale screening, and in vitro validation may further improve the platforms’ accuracy. For instance, Ke et al. constructed a deep neural network (DNN) platform to screen thousands of previously identified antivirals against SARS-CoV, influenza virus, and human immunodeficiency virus (HIV) or known 3CLpro inhibitors. The predicted drugs were then verified in vitro with a similar feline coronavirus, feline infectious peritonitis (FIP) virus, and the results were fed back into the AI algorithm to refine future predictions [26]. Aside from antivirals, because COVID-19 induces an inflammatory response, databases were also screened to locate clinical drugs with anti-inflammatory capabilities. For example, the Janus kinase (JAK) inhibitor baricitinib was predicted to be useful by BenevolentAI, a platform that combines Monte Carlo tree search (MCTS), neural networks, and symbolic AI [27], and was further verified for its anti-inflammatory and antiviral activities in vitro and in a small group of COVID-19 patients [28], with larger clinical trials underway. AI can also be used to analyze how combining certain approved drugs affects their efficacy. IDentif.AI, a platform based on orthogonal array composite design (OACD), was utilized to identify a triple-drug combination of remdesivir, ritonavir, and lopinavir that increased antiviral efficacy by 6.5-fold compared to remdesivir alone in vitro [29]. While validation needs to be done in vivo, the application of AI to predict synergistic effects can provide new platforms for developing treatment modalities.

Identification of druggable targets

Interestingly, computational analyses can be further adapted to identify novel drug targets, such as host factors, for curbing the viral infection. For example, Gordon et al. established a high-throughput method to analyze protein–protein interactions (PPIs) between 26 SARS-CoV-2 viral proteins and the host proteins that physically interact with them. Host factors extracted from the PPIs of viral and human proteins can serve as druggable targets for identifying candidates among approved, clinical, and preclinical drugs [30]. On the other hand, Riva et al. performed an in vitro high-throughput antiviral screening of more than 11000 compounds from the ReFRAME drug-repurposing library and evaluated the results with gene set enrichment analysis (GSEA) to determine drug targets and select compounds for further antiviral verification [31].
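
The idea of turning a virus-host interaction map into a list of repurposing candidates can be sketched as a set intersection. This is a minimal sketch under stated assumptions: the protein and drug names are hypothetical placeholders rather than data from the cited studies, and a real analysis would also weigh interaction confidence and drug pharmacology.

```python
# Hypothetical virus-host interaction map: viral protein -> host interactors.
PPI = {
    "viral_protein_1": {"HOST_A", "HOST_B"},
    "viral_protein_2": {"HOST_B", "HOST_C"},
    "viral_protein_3": {"HOST_D"},
}

# Hypothetical drug-target annotations: drug -> known human targets.
DRUG_TARGETS = {
    "drug_x": {"HOST_B"},
    "drug_y": {"HOST_C", "HOST_Z"},
    "drug_z": {"HOST_Q"},
}


def repurposing_candidates(ppi, drug_targets):
    """Return {drug: sorted list of host factors it targets} for every drug
    whose known targets overlap the host proteins bound by viral proteins."""
    host_factors = set().union(*ppi.values())
    candidates = {}
    for drug, targets in drug_targets.items():
        shared = targets & host_factors
        if shared:
            candidates[drug] = sorted(shared)
    return candidates


print(repurposing_candidates(PPI, DRUG_TARGETS))
```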

Drug design

Small molecules

Besides drug screening, computational analysis is also a powerful tool for designing small molecules or peptides targeting viral proteins. For example, Zhang et al. optimized α-ketoamide class Mpro inhibitors with additional functional groups by applying X-ray crystallography and molecular docking, and validated them with an in vitro inhibition assay to determine the best candidates [32]. A similar approach was applied by Dai et al. to design novel Mpro inhibitors with a specific backbone [33]. Apart from small molecules, peptide-based inhibitors were developed to target viral proteins as well. A common strategy is to use the structure of the human ACE2 and SARS-CoV-2 S RBD complex to design peptide inhibitors that contain critical ACE2 residues and are able to bind to the RBD, thereby blocking its interaction with ACE2 on host cells [34], [35], [36], [37].

Neutralizing antibodies

Another strategy often considered to tackle the disease is the identification and characterization of neutralizing antibodies. Incorporating in vitro neutralization assays and cryo-EM, several studies were able to identify neutralizing antibodies from convalescent plasma and reconstitute their antibody-S complexes for structural analyses [38], [39], [40], [41], [42], [43]. Knowing the structure of antibody-S complexes and critical residues for effective neutralization, Luan et al. were able to establish an automated workflow, using molecular docking simulation and free energy perturbation (FEP) method, to perform in silico mutagenesis and identify potential mutations that enhanced binding of neutralizing antibody to SARS-CoV-2 S [44]. Similarly, in a preprint, Boorla et al. were able to analyze solvent-exposed residues on the RBD and design potential antibody variable regions (Fv) with neutralizing properties [45].

Vaccine development strategies

Although more than a few hundred vaccine candidates entered development, only a handful have emerged as frontrunners [46], [47]. For this reason, reverse vaccinology can be utilized to identify epitopes or immunogenic regions on SARS-CoV-2 proteins that can be targeted for vaccine design, reducing costs and providing another layer of verification to speed up vaccine research. To this end, immunoinformatics approaches were applied to identify immunogenic T cell and B cell epitopes from SARS-CoV-2 viral proteins [48], [49]. Selected epitopes were reconstructed in silico and analyzed for their antigenicity, allergenicity, toxicity, physicochemical properties, and binding stability to toll-like receptors (TLRs) to identify the best vaccine constructs. An immune simulation was further carried out to predict the humoral and cellular immune responses after administering the vaccine candidates. AI and machine learning algorithms have also been developed to expedite reverse vaccinology. Software such as NEC Immune Profiler [50], the newly developed neural network-based ArdImmune Rank model [51], and the eXtreme Gradient Boosting (XGBoost)-based Vaxign-ML model [52], [53] have been used to identify immunogenic epitopes from the SARS-CoV-2 proteome. These approaches may provide significant help in designing multi-epitope chimeric vaccines with theoretically higher immunogenicity and assist the design of further biological experiments to examine the candidates.
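
The first step of such an epitope search, enumerating candidate peptides from a protein sequence and ranking them with a scoring function, can be sketched as follows. This is an illustration under stated assumptions: the sequence is an arbitrary toy string (not a SARS-CoV-2 protein), the 9-residue window reflects a typical MHC class I peptide length, and `toy_score` is a hypothetical placeholder, not a real immunogenicity predictor such as the tools named above.

```python
def candidate_peptides(sequence, length=9):
    """Yield (start_position, peptide) for every overlapping window.
    Positions are 1-based, as is conventional for protein residues."""
    for i in range(len(sequence) - length + 1):
        yield i + 1, sequence[i:i + length]


def rank_peptides(sequence, score, length=9, top=3):
    """Score every window and return the `top` best as
    (score, start_position, peptide), highest score first.
    Ties are broken by the earlier start position."""
    scored = [(score(p), pos, p) for pos, p in candidate_peptides(sequence, length)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:top]


def toy_score(peptide):
    """Placeholder score: fraction of hydrophobic residues.
    A real pipeline would call a trained epitope predictor here."""
    return sum(aa in "AILMFVWY" for aa in peptide) / len(peptide)


TOY_SEQUENCE = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence

for score, pos, peptide in rank_peptides(TOY_SEQUENCE, toy_score):
    print(pos, peptide, round(score, 2))
```

Downstream filters for antigenicity, allergenicity, and toxicity would then be applied to the ranked peptides before any construct is assembled.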

Disease prediction

Aside from therapeutic development, machine learning has been widely explored to predict the severity or mortality of COVID-19. Proteomic and biochemical profiles of blood and urine samples from patients with or without COVID-19, and with different severity and outcomes, were analyzed to determine prognostic biomarker combinations. Various machine learning models, such as regression analyses, XGBoost, random forest, Bayesian network, and support-vector machines (SVMs), have been used to select parameters that may predict mortality [54], [55], in-hospital mortality [56], [57], [58], [59], [60], and disease severity [61], [62]. A summary of the aforementioned examples and their evaluation metrics is listed in Table 1. Furthermore, lung lesions characterized by chest computed tomography (CT) scans were also proposed to predict disease progression [63], [64], [65]. An algorithm combining the imaging, clinical, and biological attributes has been further constructed based on deep convolutional neural networks to generate a holistic forecast model, which has an area under the curve (AUC) of 0.86 and 0.76 for predicting short-term and long-term mortality, respectively [66]. In addition, several variants of concern are reported to cause higher fatality rates [67], [68], [69], and it has been observed that the addition of viral clade or genetic information to demographic parameters (e.g., age and sex) could improve prediction model performance for severe outcomes [70], [71]. Nonetheless, increasing evidence suggests that, like other prediction models, these machine learning-based models require external validation, which should be performed prior to adopting them in clinical practice [72], [73].

Table 1

Summary of machine learning models developed for disease prediction.

Readout	Parameters	Algorithm	Sensitivity (Recall)	Specificity	Precision (PPV)	F1-score	Accuracy	AUROC	Test Cohort	Ref.
Mortality	33 clinical parameters	Random forest	85.71%	92.45%	–	–	89.47%	0.921	No	[54]
Mortality	45 proteins	Bayesian network	92.68%	86%	–	–	89.01%	0.953	No	[54]
Mortality	CRP, BUN, serum calcium, serum albumin, lactic acid	SVM	91%	91%	62.5%	–	–	0.93	No	[55]
In-hospital mortality	Age, lymphocyte, D-dimer, CRP, creatinine (ALDCC)	Logistic regression	0.91 ± 0.03	0.78 ± 0.04	0.92 ± 0.03	0.92 ± 0.03	0.91 ± 0.03	0.992	Yes	[56]
In-hospital mortality	Age, hs-CRP, lymphocyte, d-dimer	Logistic regression	0.839	0.794	–	–	–	0.881	Yes	[57]
In-hospital mortality	LDH, neutrophils, lymphocyte, hs-CRP, age (LNLCA)	Logistic regression	92 ± 2.6%	92 ± 3%	–	–	–	0.991	Yes	[58]
In-hospital mortality	PTA, urea, WBC, IL-2r, indirect bilirubin, myoglobin, FgDP	LASSO logistic regression	98%	91%	–	–	–	0.997	No	[59]
In-hospital mortality	Disease severity, age, hs-CRP, LDH, ferritin, IL-10	Simple-tree XGBoost	>85%	–	>90%	>0.90	>0.90	1.000	Yes	[60]
Disease severity	28 blood and urine parameters	SVM	–	–	–	–	0.8148	–	Yes	[61]
Disease severity	Different biomarker combinations	Penalized logistic regression	>82%	>71%	>87%	–	>85%	–	Yes	[62]

BUN, blood urea nitrogen; CRP, C-reactive protein; FgDP, fibrinogen degradation products; hs-CRP, high-sensitivity C-reactive protein; IL-2r, interleukin-2 receptor; IL-10, interleukin-10; LASSO, least absolute shrinkage and selection operator; LDH, lactate dehydrogenase; MCHC, mean corpuscular hemoglobin concentration; PPV, positive predictive value; PTA, prothrombin; SVM, support vector machine; WBC, white blood cell activity; XGBoost, eXtreme Gradient Boosting.
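
For readers comparing the columns of Table 1, the evaluation metrics can be computed from a confusion matrix and a set of predicted scores as sketched below. The counts and scores in the usage lines are invented for illustration and do not come from any study in the table. The AUROC here uses the pairwise-ranking definition, which is equivalent to the area under the ROC curve but is quadratic in the number of samples.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall, true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)     # positive predictive value (PPV)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


def auroc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive case receives a higher score than a randomly chosen negative
    case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 10 deaths and 20 survivors in a hypothetical cohort.
print(confusion_metrics(tp=8, fp=2, tn=18, fn=2))
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Because sensitivity, specificity, and precision depend on the chosen decision threshold while AUROC does not, two rows of the table can report similar AUROC values yet quite different threshold-based metrics.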

At present, it remains challenging to computationally predict the emergence of future clinically significant SARS-CoV-2 variants, but a couple of approaches have been developed to model the interaction between newly identified SARS-CoV-2 variants and their host and predict their infectivity. A computational pipeline “SpikePro”, consisting of three-step in silico mutagenesis experiments, calculates the stability of mutant spike protein, the binding affinity between mutant spike and human ACE2, and the binding affinity between mutant spike and neutralizing antibodies to predict viral fitness [74]. Another recently published work also established a neural network model that could predict binding affinity changes of spike mutations to human ACE2 [75]. Such tools may be helpful in screening emerging mutants/variants that are better adapted to humans and are potentially more infective.

Outbreak prediction

Finally, to better control the pandemic, machine learning has been investigated as a tool to predict the epidemic curve of COVID-19. Various algorithms, such as the long short-term memory (LSTM) network [76], [77], [78], [79], the Grey Wolf Optimizer (GWO)-LSTM hybrid model [80], autoregressive integrated moving average (ARIMA) [81], [82], [83], XGBoost [84], support vector regression (SVR) [85], [86], and genetic programming [87], were explored for their ability to forecast confirmed cases, recovered cases, and deaths in some of the most affected countries. With publicly available statistics, these models may be helpful in predicting COVID-19 transmission and may facilitate policy-making to prevent new outbreaks.
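
To make the forecasting idea concrete, the sketch below fits a hand-rolled stand-in for an ARIMA(1,1,0) model with a constant term: the cumulative case series is differenced once, a first-order autoregression is fit to the daily increments by ordinary least squares, and the forecast is integrated back to cumulative counts. The case numbers are invented, and real studies would typically use a dedicated time-series library, maximum-likelihood fitting, and model-order selection rather than this simplified version.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_cumulative(cumulative, steps):
    """Forecast `steps` future cumulative counts from past cumulative counts."""
    # Differencing (the "I" in ARIMA): daily new cases.
    diffs = [b - a for a, b in zip(cumulative, cumulative[1:])]
    c, phi = fit_ar1(diffs)
    last_total, last_diff = cumulative[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff  # autoregressive step on increments
        last_total += last_diff          # integrate back to cumulative counts
        forecasts.append(last_total)
    return forecasts


# Invented cumulative confirmed-case counts for six consecutive days.
CASES = [100, 104, 116, 132, 150, 169]
print(forecast_cumulative(CASES, steps=2))
```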

Discussion and perspectives

In summary, computational and structural biology with AI assistance has emerged as a new tool to tackle COVID-19 in the prognosis and management of the disease (Fig. 1). However, because these simulation models serve to provide candidates for preliminary selection, it is crucial that predictions generated from computational approaches be verified with biological confirmation and take into account the complex biological reactions [88]. Consequently, the accuracy of the computational models cannot be asserted based on simulation alone. The representativeness of the datasets utilized should also be carefully examined, since most prediction models rely on established databases or cohorts, and selection bias may be magnified across different studies. Hence, it is vital to incorporate multiple datasets with diverse backgrounds to minimize the impact of selection bias. Nonetheless, results obtained from validation tests could be further used to optimize the initial prediction models, thereby showing how AI can be utilized at multiple steps in various aspects. Indeed, computational research applied in the biology domain has emerged as a powerful technology that can provide potentially robust and efficient solutions for tackling challenging diseases, including the COVID-19 pandemic. The rapid processing with AI may be especially beneficial when different variants are emerging worldwide, resulting in more cases, higher fatality, and decreased vaccine protection [89]. Disease prediction models may also become useful for identifying patients who are prone to post-acute symptoms or complications [90]. With accumulated experience, AI-assisted computational and structural biology will likely continue to be refined and become a norm and a critical component of future preparedness and rapid management of viral outbreaks and pandemic diseases.

Fig. 1

Applications of computational and structural biology and artificial intelligence (AI) in the COVID-19 pandemic. Created with BioRender.com

CRediT authorship contribution statement

Ching-Hsuan Liu: Conceptualization, Investigation, Writing – original draft, Writing – review & editing. Cheng-Hua Lu: Investigation, Writing – original draft. Liang-Tzung Lin: Conceptualization, Writing – original draft, Writing – review & editing, Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
