
Computational drug screening against the SARS-CoV-2 Saudi Arabia isolates through a multiple-sequence alignment approach.

Pooi Ling Mok^1,2,3, Avin Ee-Hwan Koh^2, Aisha Farhana^1, Abdullah Alsrhani^1, Mohammad Khursheed Alam^4, Subbiah Suresh Kumar^3,5,6,7.

Abstract

COVID-19 is a rapidly emerging infectious disease caused by the SARS-CoV-2 virus, which is currently spreading throughout the world. To date, there are no specific drugs formulated for it, and researchers around the globe are racing against the clock to investigate potential drug candidates. The repurposing of existing drugs on the market represents an effective and economical strategy commonly utilized in such investigations. In this study, we used a multiple-sequence alignment approach for preliminary screening of commercially available drugs against SARS-CoV-2 sequences from Kingdom of Saudi Arabia (KSA) isolates. The viral genomic sequences from KSA isolates were obtained from GISAID, an open access repository housing a wide variety of epidemic and pandemic virus data. A phylogenetic analysis of the 164 sequences available from the KSA provinces was carried out using the MEGA X software, which displayed high similarity (around 98%). One isolate sequence was then analyzed using the VIGOR4 genome annotator to construct its genomic structure. Screening of existing drugs was carried out by mining annotated viral gene targets from the ZINC database. A total of 73 hits were generated. The viral target orthologs were mapped to the SARS-CoV-2 KSA isolate sequence by multiple sequence alignment using CLUSTAL OMEGA, and a list of 29 orthologs with purchasable drug information was generated. The results showed that the SARS CoV replicase polyprotein 1a had the highest sequence similarity at 79.91%. Other matches ranged between 27 and 52%. Through ZINC data mining, tanshinones were found to have high binding affinities to the replicase polyprotein 1a target. These compounds could be promising candidates against SARS-CoV-2. The results of this study contribute to drug discovery efforts that may increase our chances of finding an effective treatment or preventive measure against COVID-19.


Keywords: COVID-19; Coronavirus; Multiple sequence alignment; Saudi Arabia; Tanshinones

Year: 2021 PMID: 33551661 PMCID: PMC7845492 DOI: 10.1016/j.sjbs.2021.01.051

Source DB: PubMed Journal: Saudi J Biol Sci ISSN: 2213-7106 Impact factor: 4.219

Introduction

COVID-19 began as a small outbreak that was first reported in Wuhan, Hubei, China, towards the end of 2019. Although the source of the first possible outbreak could not be pinpointed, the disease began spreading throughout the world. By January 2020, COVID-19 was declared a global health emergency by the World Health Organization (WHO), a status shared by only a few other outbreaks, namely H1N1 swine flu (2009), polio (2014), and Ebola (2014). Countries in the Middle East have repeatedly reported infectious disease outbreaks over the last decade. In 2012, the first presumed case of MERS-CoV, a relative of the current SARS-CoV-2 virus, was reported in Saudi Arabia (Chan et al., 2012). The patient developed pneumonia followed by renal failure. The virus, known as HCoV-EMC/2012 at the time, was categorized as a betacoronavirus. Later on, this infection resulted in the second coronavirus (CoV) epidemic (Kandeel et al., 2020, van Boheemen et al., 2012). SARS-CoV-2 is the newest member of this genus and was found to share about 50% similarity with MERS-CoV and about 79% with SARS CoV (Pal et al., 2020). During the early phases of the COVID-19 epidemic, three initial spreading or seeding patterns were observed based on population movements into and within Saudi Arabia (Memish et al., 2020). Transmission among international pilgrims heading towards the holy sites of Mecca and Medina accounted for the first infection pattern (Ebrahim and Memish, 2020, Memish et al., 2020). The second pattern involved Saudi Shiite pilgrims returning from the eastern province. Finally, the third pattern involved general travelers moving in and out of Saudi Arabia (Ebrahim and Memish, 2020, Memish et al., 2020). As of September 2020, Saudi Arabia had reported more than 300,000 cases and almost 4000 deaths (Worldometer, 2020).
Proactive and intensive responses, especially during the early stages of COVID-19, as well as previous experience in handling MERS-CoV, have helped Saudi Arabia to manage local outbreaks (Algaissi et al., 2020, Barry et al., 2020, Obied et al., 2020). In addition, there have been ongoing efforts in the country to curb viral transmission, such as studies of hydroxychloroquine and ivermectin as candidate drug treatments (Kelleni, 2020, Meo et al., 2020). At present, pharmaceutical companies are constantly devising new drugs in the laboratory before proceeding with clinical trials. However, such drug studies are costly and lengthy before new compounds can be brought from bench to bedside. Moreover, more than half of these investigational studies fail phase 3 clinical trials, resulting in huge losses (Fogel, 2018, Hwang et al., 2016). This process can be expedited economically through computational drug discovery, which applies bioinformatics processes and data mining to large, already available datasets to identify new drug targets and screen existing drugs for pharmaceutical research (Bharath et al., 2020, Ou-Yang et al., 2012). This facilitates the repurposing of de-risked drugs and shortens development timelines immensely, which makes it a highly attractive approach, especially during a sudden viral outbreak (Pushpakom et al., 2018). Hence, many paid and open-access software tools have been developed over the years for this purpose. A comprehensive list of databases for computer-aided drug design and screening can be found on click2drug.org, a directory under the ExPASy bioinformatics resource portal (Artimo et al., 2012). In this study, we employed a multiple-sequence alignment approach to identify potential drug candidates for COVID-19 (March-Vila et al., 2017), with an emphasis on isolates obtained from KSA.
The genomic sequences of SARS-CoV-2 isolates from KSA (SARS-CoV-2-SA) were obtained from the GISAID database (gisaid.org) (Shu and McCauley, 2017). At the time of this study, 164 sequences had been uploaded from KSA, including from Jeddah, Madinah, Makkah, and Riyadh. Phylogenetic analysis was carried out on the isolates using MEGA X. The VIGOR4 annotator was then used to construct a genomic structure from a SARS-CoV-2-SA isolate sequence. The ZINC database was mined to obtain a list of potential compounds against the therapeutic targets of SARS-CoV-2. ZINC is a free resource available at zinc.docking.org that gives researchers access to ligand discovery, annotated compounds, purchasable drug information, and more (Irwin et al., 2012). A total of 73 hits were obtained from the database for specific viral proteins, including the ORF1ab region of SARS CoV, which encodes some of the essential proteins for coronaviral replication (e.g. 3C-like protease 3CLPRO, and papain-like protease PLPRO). The targets with annotated drug compounds were then mapped to the SARS-CoV-2 KSA isolate sequence by multiple sequence alignment using CLUSTAL OMEGA. A list of 29 orthologs, including their similarity indexes, was tabulated. Through data mining in ZINC, drugs with high binding affinity to the viral targets were identified.

Methods and materials

Genomic data information and collection

The complete genomic sequences of SARS-CoV-2 isolates from Saudi Arabia (SARS-CoV-2-SA) were downloaded from the GISAID database at gisaid.org. All 164 downloaded sequences were obtained from highly affected regions in Jeddah, Madinah, Makkah, and Riyadh. The download also included a list of accession IDs, submission dates, originating labs, and other relevant information (supplementary file gisaid_hcov-19).

Nucleotide sequence alignment and phylogenetic tree analysis

The sequences were first aligned using the alignment explorer feature of MEGA X (Khan, 2017, Kumar et al., 2018). The ClustalW (codons) feature was used for the alignment. The gap opening and gap extension penalties were set to 10.00 and 0.20 respectively, as recommended by Newman et al. (2016). The aligned sequences were exported and then analyzed to construct the phylogenetic tree. The statistical method employed was maximum likelihood, and the substitution model used was the Tamura-Nei model along with the option to delete partial gaps (Kumar et al., 2018, Tamura and Nei, 1993).

Construction of SARS-CoV-2-SA isolate genomic structure

The hCoV-19/Saudi Arabia/KAIMRC-Alghoribi/2020 isolate, one of the first known sequenced SARS-CoV-2 isolates in the region, from a 68-year-old local male, was used to construct the viral genome structure. The genomic sequence of the SARS-CoV-2-SA isolate was annotated using the Viral Genome ORF Reader, VIGOR4 annotator. The open-source code for this tool is available at github.com/VirusBRC/VIGOR4, and the tool can also be used online at viprbrc.org (Wang et al., 2010). In brief, the sequence (in FASTA format) was loaded into the tool. The reference genome used was the SARS-CoV-2 Wuhan-Hu-1 isolate (GenBank sequence accession NC_045512.2). A total of 28 annotations in the sequence were analyzed and the results were then tabulated.

Database screening and multiple sequence alignment

A list of candidate viral drug targets was generated by screening through annotated entries from the ZINC database. The targets with purchasable drug information were then mapped to the SARS-CoV-2-SA isolate sequence by multiple sequence alignment using CLUSTAL OMEGA. The similarity index was generated, and the list of drugs for high-similarity orthologs was then screened and tabulated.
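As an illustration of what a similarity index between two aligned sequences measures, the following minimal sketch computes percent identity over the aligned columns where neither sequence has a gap. This is one common definition and an assumption here; the exact calculation used by CLUSTAL OMEGA for its percent identity matrix may differ. The function name and the toy sequences are hypothetical and are not taken from the study.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: both inputs are rows of the same alignment, so they have
    equal length and column i of one corresponds to column i of the other.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical toy alignment rows (not real viral sequences).
toy_target = "MKT-AYIAKQR"
toy_query = "MKTLAYLAK-R"
print(round(percent_identity(toy_target, toy_query), 2))
```

Ranking targets by such a value only indicates sequence relatedness; it does not by itself show that a compound binds the mapped region.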

Statistical method

The Maximum Likelihood method, based on the Tamura-Nei model, was used to infer the phylogenetic tree, and the tree with the highest log likelihood (-1396374.75) is shown. The initial tree was obtained by applying the Neighbor-Joining algorithm to a matrix of pairwise distances and selecting the resulting topology. The proportion of sites where at least one unambiguous base is present in at least one sequence for each descendent clade is shown in the tree beside each internal node. The analysis involved 164 nucleotide sequences. There were a total of 30,643 positions in the final dataset.
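The Tamura-Nei model treats transitions between purines (A/G), transitions between pyrimidines (C/T), and transversions as separate classes of substitution. The sketch below only counts these three classes between two aligned nucleotide sequences, to make the distinction concrete. It is not the MEGA X implementation and does not estimate distances or likelihoods; the function name and the example strings are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_differences(aligned_a: str, aligned_b: str) -> dict:
    """Count substitution classes between two aligned nucleotide sequences.

    Columns containing gaps or ambiguous bases (anything other than
    A, C, G, T) are ignored in this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    counts = {
        "purine_transition": 0,      # A <-> G
        "pyrimidine_transition": 0,  # C <-> T
        "transversion": 0,           # purine <-> pyrimidine
    }
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if a in PURINES and b in PURINES:
            counts["purine_transition"] += 1
        elif a in PYRIMIDINES and b in PYRIMIDINES:
            counts["pyrimidine_transition"] += 1
        else:
            counts["transversion"] += 1
    return counts


# Hypothetical toy alignment rows (not real viral sequences).
print(classify_differences("AACCGT-N", "GACTTTAC"))
```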

Results

High homology between all SARS-CoV-2 isolates from Saudi Arabia

The purpose of constructing a phylogenetic tree of the SARS-CoV-2 isolates from Saudi Arabia (SARS-CoV-2-SA) was to determine the homology and evolutionary relationships between these sequences. The data was obtained from GISAID.org and aligned using MEGA X. The phylogenetic tree was then generated using the whole genome sequences compiled in the software. The results suggest that all isolates from Saudi Arabia demonstrated 97–98% similarity in the genome sequences (Fig. 1).

Fig. 1

The phylogenetic tree of SARS-CoV-2 isolates from Saudi Arabia. The tree shows the evolutionary relationship of all 164 SARS-CoV-2-SA isolates obtained from the GISAID database. The relationship shows a high similarity between the tested samples (97–98%). The figure was generated using the Maximum Likelihood method and Tamura-Nei model in MEGA X.

High similarity of open reading frame (ORF) sequences in the SARS-CoV-2-SA isolates to the Wuhan isolate

Based on the phylogenetic analysis, the genomic sequences were found to be highly similar in all SARS-CoV-2-SA isolates. Thus, in order to determine the sequence structure and function of the viral ORFs in relation to the Wuhan isolate, the first isolate uploaded by King Abdullah International Medical Research Center (KAIMRC) to GISAID.org was used to build its genomic structure using VIGOR4. The data show that ORFs from the input KAIMRC genome have 99%-100% similarity to the Wuhan reference (Table 1). This includes the ORF1ab, ORF3a, ORF6, ORF7a, ORF7b, ORF8, and ORF10 genes that encode non-structural and accessory proteins essential for viral replication. In addition, highly similar structural ORFs for the S, M, E, and N proteins are also present.

Table 1

SARS-CoV-2-SA genomic construction using the Viral Genome ORF Reader, VIGOR4 annotator. The isolate sequence was previously submitted to the GISAID database by King Abdullah International Medical Research Center (KAIMRC). The genome consists of ORF1a, ORF1b, Spike, ORF3a, Envelope, Membrane, ORF6, ORF7a, ORF7b, ORF8, Nucleocapsid, and ORF10, respectively. The ORF regions code for the accessory proteins and non-structural proteins, which are involved in virus replication and assembly.

Start..Stop	Gene	Gene Product Name	Reference	Peptide Length	Ref. Length	% Identity	% Similarity	% Coverage
272..13489	ORF1a	ORF1a polyprotein	YP_009725295.1	4405	4405	99.95	99.95	100.00
13448..13486	ORF1a	nsp11	YP_009725312.1	13	13	100.00	100.00	100.00
272..13474, 13474..21561	ORF1ab	ORF1ab polyprotein	YP_009724389.1	7096	7096	99.96	99.96	100.00
272..811	ORF1ab	leader protein	YP_009725297.1	180	180	100.00	100.00	100.00
812..2725	ORF1ab	nsp2	YP_009725298.1	638	638	100.00	100.00	100.00
2726..8560	ORF1ab	nsp3	YP_009725299.1	1945	1945	99.95	99.95	100.00
8561..10060	ORF1ab	nsp4	YP_009725300.1	500	500	100.00	100.00	100.00
10061..10978	ORF1ab	3C-like proteinase	YP_009725301.1	306	306	100.00	100.00	100.00
10979..11848	ORF1ab	nsp6	YP_009725302.1	290	290	99.66	100.00	100.00
11849..12097	ORF1ab	nsp7	YP_009725303.1	83	83	100.00	100.00	100.00
12098..12691	ORF1ab	nsp8	YP_009725304.1	198	198	100.00	100.00	100.00
12692..13030	ORF1ab	nsp9	YP_009725305.1	113	113	100.00	100.00	100.00
13031..13447	ORF1ab	nsp10	YP_009725306.1	139	139	100.00	100.00	100.00
13448..16243	ORF1ab	RNA-dependent RNA polymerase	YP_009725307.1	932	932	99.89	99.89	100.00
16244..18046	ORF1ab	helicase	YP_009725308.1	601	601	100.00	100.00	100.00
18047..19627	ORF1ab	3′-to-5′ exonuclease	YP_009725309.1	527	527	100.00	100.00	100.00
19628..20665	ORF1ab	endoRNAse	YP_009725310.1	346	346	100.00	100.00	100.00
20666..21559	ORF1ab	2′-O-ribose methyltransferase	YP_009725311.1	298	298	100.00	100.00	100.00
21569..25390	S	surface glycoprotein	YP_009724390.1	1273	1273	100.00	100.00	100.00
25399..26226	ORF3a	ORF3a protein	YP_009724391.1	275	275	99.64	100.00	100.00
26251..26478	E	envelope protein	YP_009724392.1	75	75	100.00	100.00	100.00
26529..27197	M	membrane glycoprotein	YP_009724393.1	222	222	100.00	100.00	100.00
27208..27393	ORF6	ORF6 protein	YP_009724394.1	61	61	100.00	100.00	100.00
27400..27765	ORF7a	ORF7a protein	YP_009724395.1	121	121	100.00	100.00	100.00
27762..27893	ORF7b	ORF7b protein	YP_009725296.1	43	43	100.00	100.00	100.00
27900..28265	ORF8	ORF8 protein	YP_009724396.1	121	121	100.00	100.00	100.00
28280..29539	N	nucleocapsid phosphoprotein	YP_009724397.2	419	419	99.76	99.76	100.00
29564..29680	ORF10	ORF10 protein	YP_009725255.1	38	38	100.00	100.00	100.00
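The coordinates and peptide lengths in Table 1 can be cross-checked arithmetically. The sketch below converts a "Start..Stop" entry into a peptide length, under the assumption (inferred from the table values, not stated in the source) that full ORF ranges include the stop codon while mature peptide ranges such as the nsp entries do not. The function name is hypothetical.

```python
def peptide_length(coords: str, includes_stop: bool) -> int:
    """Peptide length implied by a 'Start..Stop' coordinate string.

    Comma-separated segments (as in the ORF1ab row) are summed.
    Assumption: coordinates are 1-based and inclusive.
    """
    total_nt = 0
    for segment in coords.split(","):
        start, stop = (int(x) for x in segment.strip().split(".."))
        total_nt += stop - start + 1
    if total_nt % 3 != 0:
        raise ValueError("coding length is not a multiple of three")
    codons = total_nt // 3
    # A terminal stop codon does not contribute an amino acid.
    return codons - 1 if includes_stop else codons


print(peptide_length("272..13489", includes_stop=True))                 # ORF1a polyprotein: 4405
print(peptide_length("272..13474, 13474..21561", includes_stop=True))   # ORF1ab polyprotein: 7096
print(peptide_length("272..811", includes_stop=False))                  # leader protein: 180
```

In the ORF1ab row, position 13474 appears in both segments, which is consistent with a ribosomal frameshift in which one nucleotide is read twice.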

Tanshinones show highest binding affinity to replicase 1a in SARS-CoV-2

Drug repurposing remains the most practical method for rapidly developing new treatments using existing drugs. In this study, data mining was performed by screening through the ZINC database for known anti-viral targets. A list of targeted viral genes along with the associated acting compounds was generated (Table 2). According to the database, there are currently 73 viral therapeutic targets, about half of which contain information on known purchasable drugs. A multiple sequence alignment approach was then adopted to align these viral target sequences with that of the SARS-CoV-2-SA isolate using CLUSTAL OMEGA. This generated a list of 29 possible viral target groups with associated drugs that may be effective against SARS-CoV-2 (Table 3). The replicase polyprotein 1a (REP) of SARS CoV had the highest similarity (79.91%) to SARS-CoV-2. Based on ChEMBL 20, tanshinones had among the highest binding affinities to REP. The E6 protein (52%) of human papilloma virus had the second highest similarity. The flavonoids myricetin and morin were among the listed drugs that targeted the E6 protein. The other alignments were found to be below 50%, although drugs against these targets might still be potent against SARS-CoV-2.
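The screening logic described above (keep targets that have purchasable compounds, then rank them by similarity to the SARS-CoV-2-SA sequence) can be sketched as follows. The rows are a small subset copied from Tables 2 and 3; the data structure and function name are hypothetical and are not the authors' pipeline.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    name: str
    purchasable: int                    # 'Purchasable' column of Table 2
    percent_identity: Optional[float]   # '%' column of Table 3, if mapped


# Subset of rows taken from Tables 2 and 3 for illustration.
TARGETS = [
    Target("TK", 35, 28.11),
    Target("NS4A", 0, None),
    Target("R1A_CVHSA", 13, 79.91),
    Target("NA", 20, 47.76),
    Target("HA", 0, None),
    Target("E6", 2, 51.99),
]


def rank_purchasable(targets):
    """Keep targets with purchasable compounds and sort by percent identity."""
    kept = [
        t for t in targets
        if t.purchasable > 0 and t.percent_identity is not None
    ]
    return sorted(kept, key=lambda t: t.percent_identity, reverse=True)


for target in rank_purchasable(TARGETS):
    print(target.name, target.percent_identity)
```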

Table 2

Name	Description	Sub Class	Orthologs	Observations	Substances	Purchasable	Predicted
NA	Neuraminidase	hydrolase	14	977	480	20	23,041
M	Matrix protein 2	IC-other	1	32	32	12	1025
DPOL_HHV11	DNA polymerase catalytic subunit	transferase	1	6	6	4	31,646
TK	Thymidine kinase	enzyme-other	5	302	147	35	20,453
TAT_HV112	Protein Tat	TF-other	1	58	49	3	3374
NS4A	Non-structural protein 4A	protease	1	1	1	0	4879
UL80	Capsid scaffolding protein	protease	1	208	182	4	176,053
RIR1_HHV11	Ribonucleoside-diphosphate reductase large subunit	enzyme-other	1	30	26	0	88,152
E2	Regulatory protein E2	TF-other	3	1	1	1	2429
UL54	DNA polymerase catalytic subunit	enzyme-other	1	87	80	4	148,906
KITH_VZVD	Thymidine kinase	enzyme-other	1	9	9	2	697
NS5B	NS5B protein	transferase	2	1186	925	56	213,055
NS3	Genome polyprotein	protease	1	980	927	47	64,711
UL26	Capsid scaffolding protein	protease	2	55	55	1	82,085
POLG_HCV1	Genome polyprotein	protease	1	96	95	2	75,703
PROTEASE	Protease	protease	2	953	818	51	158,148
TAT	Protein Tat	TF-other	2	12	12	0	7728
REV	Protein Rev	cytosolic-other	1	19	15	1	3001
ENV	Envelope glycoprotein gp160	surface-antigen	5	55	51	21	1,590,740
ORF_36	Thymidine kinase	enzyme-other	1	0	0	0	0
U38	DNA polymerase catalytic subunit	enzyme-other	1	12	12	2	243,208
U53	Capsid scaffolding protein	protease	1	33	33	0	28,153
SCAF_EBVB9	Capsid scaffolding protein	protease	1	28	28	0	72,426
POL	Pol polyprotein	enzyme-other	6	12,880	9001	876	799,012
GAG-PRO-POL	Gag-Pro-Pol polyprotein	enzyme-other	1	3	3	2	494
POLG_POL1M	Genome polyprotein	protease	1	1	1	0	3037
R1A_CVHSA	Replicase polyprotein 1a	enzyme-other	1	40	35	13	11,327
GAG-PRO	Gag-Pro polyprotein	protease	1	33	33	0	62,841
HA	Hemagglutinin	surface-antigen	3	8	6	0	1051
ABL	Tyrosine-protein kinase transforming protein Abl	kinase	1	1	1	0	10,324
E	Endolysin	enzyme-other	1	0	0	0	0
E1	Replication protein E1	enzyme-other	1	32	31	27	233,548
PRIM_HHV11	DNA primase	nuclear-other	1	7	7	3	1929
VP16_HHV11	Tegument protein VP16	nuclear-other	1	2	2	1	55
US28	G-protein coupled receptor homolog US28	GPCR-A	2	46	14	3	4752
DPOL_VZVD	DNA polymerase catalytic subunit	enzyme-other	1	8	7	5	64,879
MC087R	MC087R	enzyme-other	1	0	0	0	0
POLG_BVDVC	Genome polyprotein	enzyme-other	1	0	0	0	0
Q86831_AVIMB	Polyprotein II	enzyme-other	1	0	0	0	0
POLG_GBVB	Genome polyprotein	enzyme-other	1	0	0	0	0
POLG_DEN26	Genome polyprotein	enzyme-other	1	11	10	0	13,876
PSET	Polynucleotide kinase	enzyme-other	1	0	0	0	0
POLG_WNV	Genome polyprotein	enzyme-other	1	71	52	5	191,900
POLG_HCVCO	Genome polyprotein	enzyme-other	1	34	34	2	707
REP	Replicase polyprotein 1ab	enzyme-other	1	9	9	3	9189
POLG_HRV16	Genome polyprotein	enzyme-other	1	5	5	1	4352
Q82323_9DELA	Protease	unclassified	1	20	20	0	76,070
POLG_HCVBK	Genome polyprotein	enzyme-other	1	7	7	0	2591
V-FPS	Tyrosine-protein kinase transforming protein Fps	kinase	1	1	1	1	0
30	DNA ligase	enzyme-other	1	0	0	0	0
GAG	Gag polyprotein	unclassified	2	15	15	3	1050
DPOL_HHV1K	DNA polymerase catalytic subunit	enzyme-other	1	4	4	2	869
43	DNA polymerase	enzyme-other	1	0	0	0	0
G	Glycoprotein G	unclassified	1	4	4	0	17,091
UL23	Thymidine kinase	unclassified	1	8	8	6	4926
RPOL_BPT7	T7 RNA polymerase	enzyme-other	1	0	0	0	0
HBCAG	External core antigen	unclassified	1	0	0	0	0
THYX_PBCV1	Probable thymidylate synthase	enzyme-other	1	10	10	0	4316
N1L	Virokine/NFkB inhibitor	unclassified	1	22	12	8	31,528
PA	Polymerase acidic protein	unclassified	2	94	51	9	36,557
Q3ZDS5_9HEPC	NS5B	unclassified	1	6	2	0	2097
UL97	Phosphotransferase pUL97	unclassified	1	1	1	0	557
HIVRT	Reverse transcriptase	unclassified	2	294	159	23	24,831
M2	Matrix protein 2	unclassified	1	13	13	3	533
UNG	Uracil-DNA glycosylase	enzyme-other	2	4	4	1	1259
E6	Protein E6	unclassified	1	2	2	2	0
Q76353_9HIV1	Integrase	unclassified	1	153	146	2	63,112
POLG_CXB3N	Genome polyprotein	enzyme-other	1	1	1	0	171
R1A_CVHNL	Replicase polyprotein 1a	enzyme-other	1	0	0	0	0
PB2	Polymerase basic protein 2	unclassified	1	31	31	0	1789
A0A0K1CY61_9HEPC	Nonstructural protein NS3-4A	unclassified	1	59	47	6	996
HIV1_ENV	GP41	unclassified	1	1	1	1	35,089
Q91H74_9FLAV	Genome polyprotein	unclassified	1	30	25	5	24,580

Table 3

Percent identity matrix of therapeutic targets mapped to the SARS-CoV-2-SA isolate sequence. Of the 73 genes, the orthologs with known purchasable drugs were mapped to the SARS-CoV-2-SA isolate sequence by multiple alignment using CLUSTAL OMEGA. A list of 29 orthologs, their respective similarity index, and purchasable drugs information was then generated. Compounds binding to targets with higher similarity to SARS-CoV-2 could be potential drug candidates for further study.

Name	Target description	Orthologs	Ortholog ID	Virus	%	Purchasable	Examples
REP	Replicase polyprotein 1a	1	R1A_CVHSA	SARS-CoV	79.91	13	Tanshinone
E6	Protein E6	1	VE6_HPV16	Human papilloma virus	51.99	2	Myricetin, Morin
NA	Neuraminidase	14	NRAM_I34A1	Influenza A virus	47.76	20	Oseltamivir, Zanamivir, Rapivab
E2	Regulatory protein E2	3	VE2_HPV16	Human papilloma virus	47.61	1	Podofilox
M	Matrix protein 2	1	M2_I72A2	Influenza A virus	45.09	12	Amantadine, Rimantadine
NS3	Genome polyprotein	1	A3EZI9_9HEPC	Hepatitis C virus	40.55	47	Ciluprevir, Victrelis
POLG_WNV	Genome polyprotein	1	POLG_WNV	West Nile virus	40.09	5	ZINC3249673
Q76353_9HIV1	Integrase	1	Q76353_9HIV1	Human immunodeficiency virus 1	39.19	2	Raltegravir
UL26	DNA polymerase catalytic subunit	2	SCAF_HHV11	Human herpesvirus 1	38.26	1	ZINC3625576
TAT_HV112	Tat protein	1	TAT_HV112	Human immunodeficiency virus 1	37.9	1	ZINC5155
ENV	Envelope polyprotein GP160	5	ENV_HV1H2	Human immunodeficiency virus 1	37.7	21	ZINC1780082
UL54	DNA polymerase	1	DPOL_HCMVA	Human cytomegalovirus	35.79	4	Foscarnet
UL80	Capsid scaffolding protein	1	SCAF_HCMVA	Human cytomegalovirus	35.69	4	ZINC901466
POL	Pol polyprotein	6	P88142_9HIV2	Human immunodeficiency virus 2	35.67	876	Elvucitabine, Trovirdine
HIVRT	Reverse transcriptase	2	Q06347_9HIV2	Human immunodeficiency virus 2	35.48	23	Efavirenz, Intelence
PROTEASE	Protease	2	Q4U254_9ENTO	Human rhinovirus	35.28	51	Pleconaril, Pirodavir
PRIM_HHV11	DNA primase	1	PRIM_HHV11	Human herpesvirus 1	35.23	3	ZINC1675992
UL26	Capsid scaffolding protein	2	SCAF_HHV11	Human herpesvirus 1	35.05	1	ZINC3625576
DPOL_VZVD	DNA polymerase catalytic subunit	1	DPOL_VZVD	Varicella-zoster virus	34.45	5	Aphidicolin, Foscarnet
E1	Replication protein E1	1	VE1_HPV11	Human papilloma virus	34.39	6	ZINC3600349
DPOL_HHV11	DNA polymerase catalytic subunit	1	DPOL_HHV11	Human herpesvirus 1	34.33	4	Aphidicolin
GAG	Gag polyprotein	2	GAG_AVIER	Avian erythroblastosis virus	34.33	3	ZINC6584476
PA	Polymerase acidic protein	2	PA_I34A1	Influenza A virus	34.02	9	ZINC3626195
POLG_HRV16	Genome polyprotein	1	POLG_HRV16	Human rhinovirus	33.87	1	ZINC40975895
UNG	Uracil-DNA glycosylase	2	UNG_VACCW	Vaccinia virus	33.82	1	ZINC359756
US28	G-protein coupled receptor homolog US28	2	US28_HCMVA	Human cytomegalovirus	30.89	3	Metitepine
VP16_HHV11	Tegument protein VP16	1	VP16_HHV11	Human herpesvirus 1	30.84	1	ZINC3831128
TK	Thymidine kinase	5	KITH_HHV1S	Human herpesvirus 1	28.11	35	Brivudine, Sorivudine
N1L	Virokine/NFkB inhibitor	1	Q49PX0_9POXV	Vaccinia virus	27.78	8	ZINC1557545

Table 2: List of targeted viral genes and the total number of their acting compounds. Using the ZINC database, a total of 73 viral therapeutic targets were found. Each target includes its protein function, subclass, orthologs, as well as other information. ‘Observations’ refers to the number of individual reports on compounds that were tested on their respective target. ‘Substances’ refers to the number of compounds for the target. ‘Purchasable’ refers to commercially available compounds, and finally ‘Predicted’ shows the candidate compounds predicted by the similarity ensemble approach (SEA) based on ChEMBL 20.

Discussion

COVID-19 has resulted in an ongoing pandemic with no specific treatment to date; only supportive care has been endorsed by the World Health Organization (WHO) (Song et al., 2020). Hence, finding a functional cure or even an effective therapy is paramount in stopping the SARS-CoV-2 global outbreaks. At present, based on the analysis of over 48,000 SARS-CoV-2 genomes worldwide from the GISAID database, there are 7 clades (G, GH, GR, L, O, S, V) (Mercatelli and Giorgi, 2020). Each possesses characteristic variants, such as the spike protein variant S-D614G (clade G) and the ORF3a variant NS3-G251 (clade V). Although sequence variation can give rise to contrasting phenotypes and hence possible differences in treatment regimes, its implications for SARS-CoV-2 are still unclear and widely debated (Young et al., 2020). This is further complicated by its relationship with host factors in different countries, which have given rise to varied infection and mortality rates (Toyoshima et al., 2020). Since April 2020, the variants of SARS-CoV-2 genomes have been found to be unevenly distributed and highly diversified across the continents, which raised the importance of adopting measures to contain its outbreak (Jones and Manrique, 2020, Mercatelli and Giorgi, 2020). Interestingly, it has also been shown that the mutation rate of SARS-CoV-2 is low, at least in comparison to SARS CoV (Jia et al., 2020, Rausch et al., 2020). In our study of the phylogeny of SARS-CoV-2 isolates in Saudi Arabia (Fig. 1), the genome sequences show very high similarity (between 97 and 98%). This may be beneficial because a uniform drug treatment may be applicable to the current isolates. The genomic structure reflects that of the reference genome isolated from Wuhan (GenBank sequence accession NC_045512.2) (Table 1).
Currently, more genomic sequences are being uploaded to GISAID, and this may increase the number of diversified sequences, and hence of mutations that could render many available drugs ineffective (Naqvi et al., 2020, Pachetti et al., 2020). Despite this, a recent study by De Vries et al. (2020) has shown that clade differences do not have an impact on drugs that target highly conserved SARS-CoV-2 regions, such as 3CLPRO. Developing drugs to neutralize the infectivity of a novel pathogen during an outbreak is itself a lengthy endeavor. During a rapid outbreak that shows a high rate of infectivity and fatality, the treatment window becomes dangerously short. Hence, drug repurposing is an effective and economical strategy to rapidly deliver the desired treatment outcomes in affected individuals with fewer side effects. This is because the use of de-risked drugs bypasses most of the drug development hurdles. There are several approaches to this method. Apart from the traditional experimental approaches through in vitro and in vivo studies, an in silico approach is commonly performed due to its high-throughput nature (Kumar et al., 2019). With continuous advancements in big data and computing technology, investigators now have an arsenal of analytical and machine learning tools to sift through large numbers of genomes, chemical structures, phenomes, and more to discover ideal drugs for novel targets (Jarada et al., 2020). In our study, we used multiple-sequence alignment, which is a rapid and simple structure-based approach for identifying potential drug candidates (March-Vila et al., 2017). Based on the data mined from ZINC, we have tabulated a list of 73 known viral therapeutic targets (Table 2). This includes targets from viruses that share some similarities with SARS-CoV-2, for instance, the rhinovirus, which is a positive-sense, single-stranded RNA virus (+ssRNA) much like SARS-CoV-2 (Pal et al., 2020).
A large number of independent and clinical studies are testing antiviral drugs for COVID-19 (e.g. remdesivir, favipiravir, and lopinavir) that are already used commercially for their specificity to these known viral therapeutic targets (Gordon et al., 2020, Kang et al., 2020) (NCT04401579, NCT04358549, NCT04386876). Of these targets, we have generated a list of 29 viral orthologs that share sequence similarities with the SARS-CoV-2-SA isolates (Table 3). Each of these targets comes with information on purchasable drugs. Evidently, the replicase polyprotein 1a of SARS CoV, which also contains the conserved 3CLPRO and PLPRO sequences, had the highest similarity with SARS-CoV-2-SA (79.91%). A ZINC search revealed that tanshinones were among the substances with the highest binding affinity based on ChEMBL 20. Tanshinones are a class of lipophilic phenanthrene compounds and are the main terpenoid bioactive components of Salvia miltiorrhiza, whose dried root is heralded for its therapeutic properties in traditional Chinese medicine (Jiang et al., 2019). It has been shown to activate the AMPK/mTOR signaling pathway, which led to apoptotic inhibition and autophagy induction in heart cells (X. Zhang et al., 2019). This involved the modulation of Bcl-2, Bax, caspase-3 and caspase-7. A study by Park et al. (2012) has shown that these compounds are selective inhibitors of the SARS CoV 3CLPRO and PLPRO cysteine proteases, and this has been a recent subject of interest for its application in treating COVID-19 (Shahrajabian et al., 2020). These important viral proteins, along with RNA-dependent RNA polymerase (RdRp) and several others, are known to be highly conserved between the two human coronaviruses, including functional regions (C. Wu et al., 2020). Because of that, finding available drugs that target these regions has garnered significant attention (Báez-Santos et al., 2015).
Indeed, several studies have already been done to identify such potential drug candidates that can bind to these viral proteins (Jo et al., 2020a, Jo et al., 2020b, Virdi et al., 2020). Tanshinones are currently being investigated in clinical trials as a potential treatment for acute myocardial infarction (NCT02524964), pulmonary hypertension (NCT01637675), and polycystic ovary syndrome (NCT01452477). Their therapeutic role in vascular diseases is partly due to their modulatory effect on vascular smooth muscle cell proliferation, which is a key contributor to arterial remodeling and hypertension (Wu et al., 2019). To date, there are no related clinical studies involving their use in coronavirus infections. However, a study by Zhang et al. (2020) has shown that cryptotanshinone, a tanshinone derivative, elicited a dose-dependent anti-viral effect on cells infected with SARS-CoV-2 in vitro. This suggests that the compound holds considerable potential. Meanwhile, myricetin and morin are flavonoids, which have been shown by several studies to be chemical inhibitors of the SARS CoV 3CLPRO and nsp3 proteins (Jo et al., 2020a, Jo et al., 2020b, Yu et al., 2012). Currently, flavonoids are being recommended as potential phytochemical-based medicines for COVID-19 treatment (Ngwa et al., 2020, Russo et al., 2020). Podofilox (podophyllotoxin), which is a human papilloma virus drug, may also be a potential candidate. However, a separate virtual screening done by Jordaan et al. (2020) showed a low binding affinity of this drug to the SARS-CoV-2 protease. The molecular scaffold of podofilox is similar to that of efavirenz, which is an HIV drug (Jordaan et al., 2020). Several of these anti-HIV proteinase drugs (e.g. atazanavir, ritonavir, darunavir, and dolutegravir) have been predicted to possess inhibitory potency against SARS-CoV-2 3CLPRO in silico. An example of these can also be found in Table 3.
Other notable mentions from this study are drugs targeting the neuraminidase and matrix protein of the influenza A virus (a negative-sense RNA virus). Drugs against this virus (e.g. favipiravir and oseltamivir) are currently in clinical trials and in experimental use for COVID-19 patients (R. Wu et al., 2020) (NCT04464408, NCT04516915). Other influenza drugs, like amantadine, are also being recommended as plausible treatments (Abreu et al., 2020, Araújo et al., 2020). Despite the lower similarity indexes of the other viral targets in our study, further studies should be done to investigate the potential effects of the listed drugs against these viruses, perhaps through other in silico methods such as molecular docking.
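The similarity values discussed here (for example, 79.91% for the replicase polyprotein 1a) are pairwise percent identities read from a multiple sequence alignment. As a minimal sketch of how such a value can be computed from two rows of an alignment, the Python function below counts matching residues over the columns where neither row has a gap. The function name `percent_identity`, the gap-exclusion convention, and the toy sequences are assumptions made for illustration; the denominator used by the alignment tool in this study is not stated here, and other conventions (full alignment length, shorter sequence length) give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of the same alignment.

    Assumption: columns where either row has a gap are left out of the
    denominator. This is one common convention, not necessarily the one
    used by any particular alignment program.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        return 0.0  # no ungapped columns to compare
    return 100.0 * matches / compared


# Toy aligned rows for illustration only; these are not real viral sequences.
query = "MESLVPGF-NEK"
target = "MESLVLGFANEK"
print(f"{percent_identity(query, target):.2f}")  # 10 matches over 11 ungapped columns
```

A value computed this way depends on the gap convention, so percent identities are only comparable across targets when the same convention is applied throughout.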

Conclusion

The COVID-19 outbreak will not disappear in a short time and will add to mortality and morbidity rates if appropriate interventions are not correctly applied. Due to the rapid human-to-human transmission of infection worldwide, finding an effective treatment or a functional cure is paramount to ensure our survival. Here, we employed multiple sequence alignment, which is a fast and simple approach in drug repurposing, to identify candidate drugs for COVID-19, with an emphasis on the isolates obtained from Saudi Arabia. Our study showed that these isolates have high sequence similarity (around 98%), and that their genomic structure reflects that of the Wuhan reference isolate genome. By using multiple sequence alignment, we showed that among the list of viral target orthologs, the SARS CoV replicase polyprotein 1a sequence had the highest similarity (79.91%). Based on the ZINC database, tanshinones were among the compounds with the highest binding affinity to this target, and they have been reported in the literature to selectively inhibit the SARS CoV 3CLPRO and PLPRO cysteine proteases. These proteins have garnered significant attention in the discovery of compounds against SARS-CoV-2. However, tanshinones have yet to be investigated in clinical trials for COVID-19, and thus these compounds are potential drug candidates.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

12 in total

1. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms.

Authors: Sudhir Kumar; Glen Stecher; Michael Li; Christina Knyaz; Koichiro Tamura
Journal: Mol Biol Evol Date: 2018-06-01 Impact factor: 16.240

2. Tanshinone I inhibits vascular smooth muscle cell proliferation by targeting insulin-like growth factor-1 receptor/phosphatidylinositol-3-kinase signaling pathway.

Authors: Yu-Ting Wu; Yi-Ming Bi; Zhang-Bin Tan; Ling-Peng Xie; Hong-Lin Xu; Hui-Jie Fan; Hong-Mei Chen; Jun Li; Bin Liu; Ying-Chun Zhou
Journal: Eur J Pharmacol Date: 2019-03-13 Impact factor: 4.432

3. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees.

Authors: K Tamura; M Nei
Journal: Mol Biol Evol Date: 1993-05 Impact factor: 16.240

4. On the Integration of In Silico Drug Design Methods for Drug Repurposing.

Authors: Eric March-Vila; Luca Pinzi; Noé Sturm; Annachiara Tinivella; Ola Engkvist; Hongming Chen; Giulio Rastelli
Journal: Front Pharmacol Date: 2017-05-23 Impact factor: 5.810

5. In vitro activity of lopinavir/ritonavir and hydroxychloroquine against severe acute respiratory syndrome coronavirus 2 at concentrations achievable by usual doses.

Authors: Chang Kyung Kang; Moon-Woo Seong; Su-Jin Choi; Taek Soo Kim; Pyoeng Gyun Choe; Sang Hoon Song; Nam-Joong Kim; Wan Beom Park; Myoung-Don Oh
Journal: Korean J Intern Med Date: 2020-05-29 Impact factor: 2.884

6. Identification of myricetin and scutellarein as novel chemical inhibitors of the SARS coronavirus helicase, nsP13.

Authors: Mi-Sun Yu; June Lee; Jin Moo Lee; Younggyu Kim; Young-Won Chin; Jun-Goo Jee; Young-Sam Keum; Yong-Joo Jeong
Journal: Bioorg Med Chem Lett Date: 2012-04-25 Impact factor: 2.823

7. Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods.

Authors: Canrong Wu; Yang Liu; Yueying Yang; Peng Zhang; Wu Zhong; Yali Wang; Qiqi Wang; Yang Xu; Mingxue Li; Xingzhou Li; Mengzhu Zheng; Lixia Chen; Hua Li
Journal: Acta Pharm Sin B Date: 2020-02-27 Impact factor: 11.413

8. From SARS and MERS CoVs to SARS-CoV-2: Moving toward more biased codon usage in viral structural and nonstructural genes.

Authors: Mahmoud Kandeel; Abdelazim Ibrahim; Mahmoud Fayez; Mohammed Al-Nazawi
Journal: J Med Virol Date: 2020-03-16 Impact factor: 2.327

9. Quantitative phylogenomic evidence reveals a spatially structured SARS-CoV-2 diversity.

Authors: Leandro R Jones; Julieta M Manrique
Journal: Virology Date: 2020-08-26 Impact factor: 3.616