CoronaPep: An Anti-Coronavirus Peptide Generation Tool.

Aman Chandra Kaushik, Aamir Mehmood, Gurudeeban Selvaraj, Xiaofeng Dai, Yi Pan, Dong-Qing Wei.

Abstract

The novel coronavirus disease (COVID-19) has become a global pandemic, creating an urgent need for vaccine design. This work reports an anti-coronavirus peptide scanner tool that identifies anti-coronavirus targets in the form of peptides. The proposed CoronaPep tool provides fast fingerprinting of anti-coronavirus targets, a task of high importance in current bioinformatics research. Anti-coronavirus target protein sequences reported from the current outbreak are scanned against anti-coronavirus target datasets via CORONAPEP, which returns precision-based anti-coronavirus peptides. The tool is designed specifically for coronavirus data and can predict peptides from a whole genome or from a list of genes or proteins. It is also relatively fast, accurate, and user-friendly, and it can generate maximum output from limited information. The availability of tools like CORONAPEP will greatly benefit researchers in the disciplines of oncology and structure-based drug design.

Year: 2021 PMID: 33687847 PMCID: PMC8769015 DOI: 10.1109/TCBB.2021.3064630

Source DB: PubMed Journal: IEEE/ACM Trans Comput Biol Bioinform ISSN: 1545-5963 Impact factor: 3.702

Introduction

A series of unexplained pneumonia cases has been reported since mid-December 2019 in Wuhan, Hubei Province, China. Research teams and the Chinese government moved rapidly to control the epidemic, and etiological investigations were carried out. On January 12, 2020, the World Health Organization (WHO) provisionally named the pathogen the 2019 novel coronavirus (2019-nCoV). On January 30, 2020, the WHO declared the 2019-nCoV epidemic a Public Health Emergency of International Concern (PHEIC). The illness was formally named Coronavirus Disease 2019 (COVID-19), which is caused by 2019-nCoV. The taxonomic committee's Coronavirus Study Group (CSG) named the virus Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Its infection rate is considerably higher than that of the 2002 outbreak of Severe Acute Respiratory Syndrome (SARS) in China [1], [2]. SARS-CoV-2 is a coronavirus belonging to the β-coronavirus cluster. COVID-19 is the third recognized zoonotic coronavirus disease after SARS and Middle East Respiratory Syndrome (MERS). The two causative viruses (SARS-CoV and MERS-CoV) also belong to the β-coronavirus cluster [3]. The WHO reported the detection of SARS-CoV-2 in environmental samples obtained from the seafood market in Wuhan. However, it is still unconfirmed whether a particular animal transmits this virus. Ji et al. [4] reported that SARS-CoV-2 was a chimera of a bat coronavirus and a coronavirus of unidentified origin, and their comparative analysis suggested that snakes are the most probable animal source of SARS-CoV-2 [4]. Benvenuto et al. [5] found that SARS-CoV-2 was closely related only to the coronavirus isolated in 2015 from Chinese chrysanthemum-headed (horseshoe) bats, supporting the concept of a bat-to-human transmission chain. Zhao et al. [6] reported that one of the SARS-CoV-2 receptors is angiotensin-converting enzyme 2 (ACE2).
ACE2 is expressed on type I and type II alveolar epithelial cells in healthy lungs, with 83 percent of its expression occurring on type II alveolar cells, and its level in alveolar cells is higher in men than in women. The attachment of SARS-CoV-2 increases ACE2 expression, damaging the alveolar cells. As of February 23, 2020, no vaccine for COVID-19 has been developed; thus, patients can currently only be treated symptomatically. The most common complication observed in 2019-nCoV patients is acute respiratory distress syndrome (ARDS), followed by anemia, acute cardiac injury, and secondary infections [7]. Patients with refractory hypoxemia were given invasive mechanical ventilation. It has also been suggested that, besides general antiviral inhibitors, neuraminidase inhibitors, RNA synthesis inhibitors, and traditional Chinese medicine could be beneficial for COVID-19 therapy [8]. Nevertheless, the efficacy of these medicines still needs to be established in clinical trials [9], [10], [11]. Even though anti-coronavirus peptides have considerable therapeutic significance, no tool has been developed until now that can take clinical information, e.g., VCF or SNPFF files, and identify peptides from wild-type and mutated coronavirus protein sequences for use as anti-coronavirus peptides [12], [13], [14]. No efforts have been made to analyze this big data and retrieve important patterns that could be clinically significant for the treatment of coronavirus infection.
Therefore, to support the scientific effort to develop anti-coronavirus peptides, we have developed an anti-coronavirus peptide scanner (CORONAPEP) tool. It accepts coronavirus-associated clinical data, including information from the above sources, analyzes the data, retrieves the important evidence, and predicts anti-coronavirus vaccine candidates based on the target sequence information. We believe that CORONAPEP will be helpful both for bioinformatics researchers and for experimental researchers working in the field of anti-coronavirus peptide-based therapeutics. This is our contribution to tackling the current COVID-19 outbreak.

Methodology

The procedure and the techniques employed in designing CORONAPEP are explained here step by step. The overall workflow of CORONAPEP is illustrated in Fig. 1.

Fig. 1.

This figure depicts the overall background operations of the CORONAPEP tool during the generation of anti-coronavirus peptides. The steps from the input data requirements to the final output details are shown graphically. The data are provided manually or in one of the required formats and are scanned for additional information by accessing the gene information in the NCBI database. The retrieved information is stored locally and later fetched to incorporate the mutations and check various properties in order to predict the final peptide that might have a better affinity for the target.

Information Retrieval for Anti-Coronavirus Targets

Coronavirus repositories are specialized information systems that gather diverse records of anti-coronavirus targets. To obtain a large quantity of potentially relevant information, several databases were taken into consideration. These repositories were selected for the information search because of their freely available, clinically confirmed data samples and the volume of information they hold.

Background Operations for the CORONAPEP Tool

The functional stability of a tool depends on the model running in the background. Accuracy, extensibility, and efficiency make an algorithm stand out among related developments with a similar purpose; therefore, algorithm development is a crucial step in the development of a tool. Between job submission and the output display page, a series of operations takes place in a fixed order within only several seconds, although the time may differ depending on the input data load, the server's condition, and the speed of the internet connection. The background processes are briefly overviewed here. Input files are used to supply the query data. These files contain a great deal of information but not the complete gene sequences. Once the job is submitted, the files are used to extract information such as the relevant genes and the mutations along with their exact positions. Once this information is collected, the software connects to the NCBI portal to retrieve the complete gene sequences, because they are absent from the input files as mentioned earlier. The sequence information is retrieved and stored in a local directory of the user (typically “C” on Windows or /home on Linux) [15], [16]. Subsequently, CORONAPEP reads the sequences one by one from the user's home directory and mutates them based on the clinical data. At this stage, the predicted peptides are processed further to test their validity, for example their interactions with the MHC-I molecule, the hydrogen-bond energy, the van der Waals (vdW) energy, and the global energy. Lastly, the top-ranked peptides are delivered in the results. The output data from CORONAPEP include the wild-type, mutant, and best-ranked peptides along with their various energy values, which can be used for rechecking. This whole procedure is graphically represented in Fig. 2.
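The mutation step described above can be illustrated with a minimal Python sketch. It is not the CORONAPEP implementation (which is reported to be scripted in Perl); the function names, the `D4G`-style substitution notation, and the toy sequence are all assumptions made for illustration. The sketch applies one amino-acid substitution to a wild-type sequence and lists the fixed-length peptides that cover the mutated position.

```python
import re
from typing import List

# Assumed notation: reference residue, 1-based position, alternate residue (e.g. "V3A").
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(wild_type: str, mutation: str) -> str:
    """Return the mutant sequence for a single substitution such as 'V3A'."""
    match = MUTATION_PATTERN.match(mutation)
    if match is None:
        raise ValueError(f"Unrecognized mutation format: {mutation!r}")
    ref, pos_text, alt = match.groups()
    index = int(pos_text) - 1  # convert 1-based position to 0-based index
    if not 0 <= index < len(wild_type):
        raise ValueError(f"Position {pos_text} is outside the sequence")
    if wild_type[index] != ref:
        raise ValueError(
            f"Expected {ref} at position {pos_text}, found {wild_type[index]}"
        )
    return wild_type[:index] + alt + wild_type[index + 1:]


def peptides_covering(sequence: str, position: int, length: int) -> List[str]:
    """Return every window of the given length that contains the 1-based position."""
    index = position - 1
    first_start = max(0, index - length + 1)
    last_start = min(index, len(sequence) - length)
    return [sequence[s:s + length] for s in range(first_start, last_start + 1)]


# Toy example with a hypothetical 10-residue sequence.
wild = "MFVFLVLLPL"
mutant = apply_substitution(wild, "V3A")
candidates = peptides_covering(mutant, position=3, length=8)
```

Checking the reference residue before substituting guards against a position that refers to a different sequence version than the one retrieved.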

Fig. 2.

Overview of the key steps taken during peptide prediction for the coronavirus via the CORONAPEP tool. Input files are used to supply the query. Once the job is submitted, these files are used to extract information such as the relevant genes and the mutations along with their exact positions. Subsequently, CORONAPEP reads the sequences one by one from the user's home directory and mutates them based on the clinical data.

CORONAPEP Tool Impressions

We aimed to make the proposed CORONAPEP tool competent, reliable, and highly user-friendly for the analysis of anti-coronavirus targets, so that it helps researchers determine anti-coronavirus peptides. It accepts coronavirus-associated data, including data from different resources containing coronavirus patterns. Next, the input is analyzed, important information is retrieved from the coronavirus targets/proteins, and finally the anti-coronavirus peptide is predicted. The peptides identified by the proposed CORONAPEP tool can be categorized as personalized medicine because they are derived from a target coronavirus protein. In this way, they are more specific to the target, have a greater affinity, and are highly interactive.

CORONAPEP User Interface and Productivity

A command-line interface provides more flexibility and extends the scope of software; however, most users have no programming skills, so a graphical user interface (GUI) plays an important role in allowing users to examine their information and perform numerous jobs without deep knowledge of programming. Great effort is therefore put into designing GUIs for software, as they can be the reason for high usability and demand. Accordingly, our proposed tool has an attractive and meaningful GUI (Fig. 3), scripted in Perl, in which users can easily import the list of coronavirus targets with mutation information, and the tool predicts the optimized anti-coronavirus peptides from the imported targets. It also retrieves the information for the anti-coronavirus targets dataset and processes it according to a defined sequence of characteristics.

Fig. 3.

The GUI screenshots of the CORONAPEP tool, showing the fetched gene sequence.

Technique Overview

Input/Output Directions

For smooth operation of the CORONAPEP tool, the key input and output steps are listed below in sequential order.

1. Enter the sequence accession number. Users can choose any of the given formats for the query input that is to be analyzed further.
2. Submit your job. A “SUBMIT” button is provided, which has to be clicked after the data input.
3. Calculation of scores. After the query is successfully submitted, a set of values is estimated based on the chosen parameters, and the peptides are ranked on these values.

An ancillary package has been built into the CORONAPEP tool that permits the operator to locate and rank 8-mer, 9-mer, or 10-mer peptides that contain peptide-binding motifs. It is highly recommended for those who are looking for a more specific peptide based on their target. Thus, peptides of varying lengths can be generated using this package. The input instructions for the peptide motif search are as follows.

1. Choose a gene sequence with its accession number from the given list through the menu button. The selected ID determines which coefficient table the scoring program uses for the chosen sequence.
2. Choose the length of the subsequences. The program then extracts the information for the submitted input sequence and calculates the scores and ranks.
3. Choose whether to display the submitted input sequence on the returned page using the given option.
4. Finally, submit your job.
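The motif search over 8-mer, 9-mer, or 10-mer subsequences can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the anchor motif used in the example (allowed residues at position 2 and at the C-terminal position) is invented for demonstration and is not taken from CORONAPEP.

```python
from typing import Dict, List, Tuple

ALLOWED_LENGTHS = (8, 9, 10)  # subsequence lengths named in the text


def enumerate_subsequences(sequence: str, length: int) -> List[Tuple[int, str]]:
    """Return (1-based start position, subsequence) for every window of the given length."""
    if length not in ALLOWED_LENGTHS:
        raise ValueError(f"Length must be one of {ALLOWED_LENGTHS}")
    return [
        (start + 1, sequence[start:start + length])
        for start in range(len(sequence) - length + 1)
    ]


def matches_motif(peptide: str, anchors: Dict[int, str]) -> bool:
    """Check anchor residues; keys are 1-based positions, negative keys count from the C-terminus."""
    for position, allowed in anchors.items():
        if position == 0:
            raise ValueError("Anchor positions are 1-based; 0 is not valid")
        index = position - 1 if position > 0 else position
        if peptide[index] not in allowed:
            return False
    return True


# Hypothetical motif: L or M at position 2, V or L at the last position.
example_anchors = {2: "LM", -1: "VL"}
hits = [
    (start, peptide)
    for start, peptide in enumerate_subsequences("AKLAAAAAAVA", 9)
    if matches_motif(peptide, example_anchors)
]
```

Keeping the enumeration separate from the motif check makes it straightforward to swap in a different motif definition for each selected target.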

Display Page Returned to the Operator

Following the query submission, a set of values is calculated for all 8-mer, 9-mer, or 10-mer subsequences in the input sequence, depending on the options preselected by the user. The subsequences are ranked based on these scores. This task is expected to be completed within a short time. After the background operations are completed, a display page containing the following information is returned to the operator.

1. A listing of the operator's input settings (for verification and future reference), along with necessary record information and additional useful data, for instance, the estimated and requested scores, the scores in the output data, and the length of the queried sequence.
2. The score output table, where the results of the calculations on the subsequences are displayed.
3. The listing of obtained scores, which displays the estimates for the subsequences.
4. The queried peptide sequence (if the user has requested that the sequence be returned).

Every row of the output scores is listed in four columns. The values for each item indicate the ranking of the subsequence and the position in the query, counted from the start, of the first amino acid of the subsequence.
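The scoring and ranking of subsequences can be sketched in Python as follows. The text does not specify the scoring formula or name the four output columns, so several things here are assumptions: the score is taken as the product of per-position coefficients looked up in a coefficient table, missing entries default to 1.0, and the four columns are assumed to be rank, start position, subsequence, and score. All names and coefficient values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed table layout: (1-based position in the peptide, residue) -> coefficient.
CoefficientTable = Dict[Tuple[int, str], float]
Row = Tuple[int, int, str, float]  # rank, start position, subsequence, score


def score_peptide(peptide: str, coefficients: CoefficientTable,
                  default: float = 1.0) -> float:
    """Multiply the coefficients of all residues; unknown entries count as `default`."""
    score = 1.0
    for position, residue in enumerate(peptide, start=1):
        score *= coefficients.get((position, residue), default)
    return score


def rank_subsequences(sequence: str, length: int, coefficients: CoefficientTable,
                      top_n: Optional[int] = None) -> List[Row]:
    """Score every window of the given length and return rows sorted by descending score."""
    scored = []
    for start in range(len(sequence) - length + 1):
        peptide = sequence[start:start + length]
        scored.append((start + 1, peptide, score_peptide(peptide, coefficients)))
    # Ties are broken by start position so that the ordering is deterministic.
    scored.sort(key=lambda item: (-item[2], item[0]))
    ranked = [
        (rank, start, peptide, score)
        for rank, (start, peptide, score) in enumerate(scored, start=1)
    ]
    return ranked[:top_n]


# Illustrative coefficient table; the values are invented.
table = {(2, "L"): 10.0, (9, "V"): 5.0}
rows = rank_subsequences("AKLAAAAAAVA", 9, table)
for rank, start, peptide, score in rows:
    print(f"{rank}\t{start}\t{peptide}\t{score:.1f}")
```

A multiplicative score is only one possible choice; an additive or energy-based score would fit the same row layout.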

CORONAPEP Tool Used in the Study

Validation is an important step after the development of any tool, because the accuracy of a newly developed tool has to be examined and verified to establish whether it can perform the intended job. Only after a successful validity check is the software proposed and disseminated. Different techniques are followed for validating a model. Among them, a common and dependable one is to reproduce the same or very close results for an already confirmed case. As an illustration, consider a pharmacophore model. One validation technique for the pharmacophore method is to screen experimentally confirmed compounds with the developed model; only if it successfully returns the correct hits is it applied to a large database to virtually screen drug-like compounds. The same approach was used to corroborate our proposed model: the peptides predicted by CORONAPEP were checked manually, and the output scoring was observed to be correct. Additionally, it is good practice to test the model several times to avoid personal or systematic errors. Therefore, CORONAPEP was validated with ten randomly selected compounds, and the output observed was extremely precise. Furthermore, no error was encountered while running the program, and the operations ran smoothly.
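A manual check of this kind can be automated as a simple black-box comparison: the tool's output scores are compared with independently verified reference scores, without inspecting the tool's internals. The Python sketch below is an assumption about how such a check might look, not the authors' procedure; the function name, the tolerance, and the example values are hypothetical.

```python
import math
from typing import Dict, List


def compare_with_reference(predicted: Dict[str, float],
                           reference: Dict[str, float],
                           tolerance: float = 1e-6) -> List[str]:
    """Return a list of discrepancies between tool output and manually verified scores."""
    problems = []
    for peptide, expected in reference.items():
        if peptide not in predicted:
            problems.append(f"{peptide}: missing from output")
        elif not math.isclose(predicted[peptide], expected,
                              rel_tol=tolerance, abs_tol=tolerance):
            problems.append(
                f"{peptide}: expected {expected}, got {predicted[peptide]}"
            )
    return problems


# Invented example values; an empty result means the output matches the reference.
tool_output = {"KLAAAAAAV": 50.0, "AKLAAAAAA": 1.0}
manual_scores = {"KLAAAAAAV": 50.0, "AKLAAAAAA": 1.5, "LAAAAAAVA": 1.0}
issues = compare_with_reference(tool_output, manual_scores)
```

Repeating the comparison over several independently prepared inputs helps separate systematic errors from one-off mistakes in the manual reference.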

CORONAPEP Novelty and Application

A huge amount of coronavirus data is produced every day and made publicly accessible via related repositories so that the data can be analyzed and used for the betterment of humanity. Tools are required that can investigate this information and assist in the interpretation of the outcomes. So far, no tool is available that can use such clinical information and identify potentially potent peptides. Therefore, to keep pace with the current challenges, we present CORONAPEP as an effective tool. It can evaluate anti-coronavirus targets based on their binding affinity and offers multiple options for the output information. The availability of options should lead to increased usage of this software, as it widens the workspace by leaving additional choices to the operators. Some of the major points regarding the novelty and specificity of our newly designed CORONAPEP tool are listed below.

1. The tool can be used for peptide prediction against the coronavirus only, because only coronavirus data can be used as a query.
2. The tool can also predict peptides from the whole coronavirus genome.
3. Whether a gene list or a protein list is available, peptides can be generated.
4. Although not enough data are available on the new coronavirus, the tool can generate maximum output from minimum input data.
5. If the raw information contains VCF/SNFF data, peptides can still be generated.
6. The CORONAPEP tool has a friendly user interface and requires no programming skills to operate. Thus, a user can process the data with a single click.
7. CORONAPEP's model is extremely fast and processes the data quickly, causing no unnecessary lag or delay.
8. The peptides generated are specific and can be validated by any third-party technique, such as molecular docking or simulation.
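To illustrate how variant information can be read from VCF input, the following Python sketch parses the fixed leading columns of VCF data lines (CHROM, POS, ID, REF, ALT). It is a simplified reader written for illustration and makes no claim about how CORONAPEP handles its input; the function and type names are hypothetical, and the example records are illustrative. Translating nucleotide variants into amino-acid changes is outside the scope of the sketch.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position, as in the VCF format
    ref: str
    alt: str


def parse_vcf_lines(lines: Iterable[str]) -> List[Variant]:
    """Extract variants from VCF text, skipping meta-information and header lines."""
    variants = []
    for raw_line in lines:
        line = raw_line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # '##' meta lines and the '#CHROM' header carry no variant data
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError(f"Malformed VCF data line: {line!r}")
        chrom, pos, _variant_id, ref, alt_field = fields[:5]
        # A record may list several comma-separated alternate alleles; '.' means none.
        for alt in alt_field.split(","):
            if alt != ".":
                variants.append(Variant(chrom, int(pos), ref, alt))
    return variants


# Illustrative records only.
example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "NC_045512.2\t23403\t.\tA\tG\t.\tPASS\t.",
    "NC_045512.2\t100\t.\tC\tT,G\t.\tPASS\t.",
]
parsed = parse_vcf_lines(example_lines)
```

Splitting multi-allelic records into one `Variant` per alternate allele keeps the downstream mutation step simple, since each entry then describes exactly one change.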

Discussion

The motivation for developing CORONAPEP arose from the abundance of coronavirus data and their importance. We therefore aimed to provide an open-source tool that can use such data in the form of anti-coronavirus targets and identify potentially valid peptides. Knowledge of anti-coronavirus targets is highly pertinent to designing strategies to counter the coronavirus. CORONAPEP has proven to be very helpful in this situation, as it uses the information on these targets to identify candidate peptides that have a high affinity for the target. The workflow of CORONAPEP is simple but efficient: it takes input files in the form of clinical data, which are used to extract the coronavirus-associated genes and the variations in these genes. It then retrieves the related protein sequence from NCBI and alters it based on the clinical information. As a result, the top-ranked peptides are presented on the output display. CORONAPEP's user interface is designed to be highly responsive. It only requires the coronavirus data in any of the given formats as input. The run time of the software is short, and the output page is returned to the user within a few seconds because of its robust coding architecture. However, the speed of the tool may vary with the data volume, the server's condition, and internet connectivity. The peptides predicted by CORONAPEP for the ten validation targets have very good scores. This indicates that the tool correctly analyzes the data and predicts potential peptides with great precision, which supports the validity of the tool. To our knowledge, CORONAPEP is the first attempt to let users predict peptides for the coronavirus from such clinical data together with supplementary useful information, which includes wild-type and mutant sequences and peptides.
The shortlisted protein is used for the peptide prediction, and from the 2D data of the predicted peptide and its parent protein, the global energy, vdW energy, and H-bond energy are also calculated. These data can be used to further confirm the validity of the predicted peptides by molecular docking or molecular dynamics simulation, depending on the user's choice and resources, though we believe this is not strictly needed. Experimental validation of these outcomes is highly encouraged. As the peptides originate from clinical data, in vivo activity is strongly expected. The tool is advantageous for finding information about anti-coronavirus targets that can be further employed for personalized medicine. It may also be very useful for novel target identification and for proposing a potent drug for such a target, with an accuracy of up to 95 percent. To test the workability of CORONAPEP's Perl script, a black-box technique was employed. In particular, the prediction and ranking of peptides is an attractive functionality of CORONAPEP, because it allows experimental confirmation tests to be focused on the top-ranked peptides, which should in turn yield stronger candidates [17], [18].

Conclusion

COVID-19 is an issue of international concern and a threat to public health, and there is an urgent need for vaccine design. The present study offers a novel tool that can predict peptides directly from the sequences of infected patients; such peptides might therefore have a better affinity for the target and contribute to vaccine design against COVID-19. CORONAPEP identifies peptides from the targeted coronavirus protein, which means that the predicted peptides are highly customized for coronavirus patients.

Future Directions

CORONAPEP provides a novel and vital step towards assisting clinicians and researchers in improving their investigations by formulating new coronavirus-based targets and proposing effective peptides against such lethal conditions. In the near future, we plan to implement drug-target interactions and small molecule-miRNA associations in a revised version of the CORONAPEP tool.

Data Availability Statement

The data generated and/or analyzed during the current study are available from the corresponding author on request.

18 in total

1. A-CaMP: a tool for anti-cancer and antimicrobial peptide generation.

Authors: Aman Chandra Kaushik; Aamir Mehmood; Shaoliang Peng; Yu-Juan Zhang; Xiaofeng Dai; Dong-Qing Wei
Journal: J Biomol Struct Dyn Date: 2020-01-06

2. The Cancer Genomics Hub (CGHub): overcoming cancer through the power of torrential data.

Authors: Christopher Wilks; Melissa S Cline; Erich Weiler; Mark Diehkans; Brian Craft; Christy Martin; Daniel Murphy; Howdy Pierce; John Black; Donavan Nelson; Brian Litzinger; Thomas Hatton; Lori Maltbie; Michael Ainsworth; Patrick Allen; Linda Rosewood; Elizabeth Mitchell; Bradley Smith; Jim Warner; John Groboske; Haifang Telc; Daniel Wilson; Brian Sanford; Hannes Schmidt; David Haussler; Daniel Maltbie
Journal: Database (Oxford) Date: 2014-09-29 Impact factor: 3.451

3. CytoMegaloVirus Infection Database: A Public Omics Database for Systematic and Comparable Information of CMV.

Authors: Aman Chandra Kaushik; Aamir Mehmood; Arnav Kumar Upadhyay; Shalinee Paul; Shubham Srivastava; Prayuv Mali; Yi Xiong; Xiaofeng Dai; Dong-Qing Wei; Shakti Sahi
Journal: Interdiscip Sci Date: 2019-12-07 Impact factor: 2.233

4. Drug treatment options for the 2019-new coronavirus (2019-nCoV).

Authors: Hongzhou Lu
Journal: Biosci Trends Date: 2020-01-28 Impact factor: 2.400

5. COSMIC: exploring the world's knowledge of somatic mutations in human cancer.

Authors: Simon A Forbes; David Beare; Prasad Gunasekaran; Kenric Leung; Nidhi Bindal; Harry Boutselakis; Minjie Ding; Sally Bamford; Charlotte Cole; Sari Ward; Chai Yin Kok; Mingming Jia; Tisham De; Jon W Teague; Michael R Stratton; Ultan McDermott; Peter J Campbell
Journal: Nucleic Acids Res Date: 2014-10-29 Impact factor: 16.971

6. The UCSC Cancer Genomics Browser: update 2015.

Authors: Mary Goldman; Brian Craft; Teresa Swatloski; Melissa Cline; Olena Morozova; Mark Diekhans; David Haussler; Jingchun Zhu
Journal: Nucleic Acids Res Date: 2014-11-11 Impact factor: 19.160

7. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China.

Authors: Chaolin Huang; Yeming Wang; Xingwang Li; Lili Ren; Jianping Zhao; Yi Hu; Li Zhang; Guohui Fan; Jiuyang Xu; Xiaoying Gu; Zhenshun Cheng; Ting Yu; Jiaan Xia; Yuan Wei; Wenjuan Wu; Xuelei Xie; Wen Yin; Hui Li; Min Liu; Yan Xiao; Hong Gao; Li Guo; Jungang Xie; Guangfa Wang; Rongmeng Jiang; Zhancheng Gao; Qi Jin; Jianwei Wang; Bin Cao
Journal: Lancet Date: 2020-01-24 Impact factor: 79.321

8. The 2019-new coronavirus epidemic: Evidence for virus evolution.

Authors: Domenico Benvenuto; Marta Giovanetti; Alessandra Ciccozzi; Silvia Spoto; Silvia Angeletti; Massimo Ciccozzi
Journal: J Med Virol Date: 2020-02-07 Impact factor: 2.327

9. Single-Cell RNA Expression Profiling of ACE2, the Receptor of SARS-CoV-2.

Authors: Yu Zhao; Zixian Zhao; Yujia Wang; Yueqing Zhou; Yu Ma; Wei Zuo
Journal: Am J Respir Crit Care Med Date: 2020-09-01 Impact factor: 21.405

10. Cross-species transmission of the newly identified coronavirus 2019-nCoV.

Authors: Wei Ji; Wei Wang; Xiaofang Zhao; Junjie Zai; Xingguang Li
Journal: J Med Virol Date: 2020-04 Impact factor: 2.327
