
Evolution of Sequence-based Bioinformatics Tools for Protein-protein Interaction Prediction.

Mst Shamima Khatun¹, Watshara Shoombuatong¹, Md Mehedi Hasan¹, Hiroyuki Kurata¹.

Abstract

Protein-protein interactions (PPIs) are the physical connections between two or more proteins via electrostatic forces or hydrophobic effects. Identifying PPIs is pivotal because they contribute to many biological processes, including protein function, disease incidence, and therapy design. The experimental identification of PPIs via high-throughput technology is time-consuming and expensive, and bioinformatics approaches are expected to overcome these limitations. In this review, our main goal is to provide an inclusive view of the existing sequence-based computational methods for PPI prediction. We first briefly introduce the currently available PPI databases and then review the state-of-the-art bioinformatics approaches, their working principles, and their performance. Finally, we discuss the caveats of current methods and the future perspectives of next-generation algorithms for PPI prediction.


Keywords: PPIs database; Protein-protein interactions; bioinformatics; feature selection; machine learning; sequence features

Year: 2020 PMID: 33093807 PMCID: PMC7536797 DOI: 10.2174/1389202921999200625103936

Source DB: PubMed Journal: Curr Genomics ISSN: 1389-2029 Impact factor: 2.236

Introduction

Protein-protein interactions (PPIs) are the physical connections between two or more proteins via electrostatic forces or hydrophobic effects [1]. PPIs play a vital role in diverse biological processes, including immune response, DNA transcription and replication, metabolic cycles, and signal transduction pathways [2-4]. Identifying the PPIs responsible for such concerted functions is therefore needed [4-6]. Different studies have suggested that PPIs also occur between two species, such as human-bacteria, human-virus, and plant-pathogen pairs [7-12]. As a result, an understanding of the molecular mechanisms involved in PPIs is critical for the design of new medicines and therapeutic targets. Proteins often form complexes with other proteins to perform certain tasks [13-15]. PPIs occur at almost every level of cellular function and provide a global picture of biological processes [12, 15, 16]. In particular, a protein complex with multiple subunits [4, 17-20] acts as an efficient subnetwork inside the whole PPI network [3, 21]. Owing to the development of high-throughput sequencing technologies, the number of experimentally confirmed PPIs within specific species (intraspecies PPIs) has grown [3, 22-25]. On the other hand, the identification of PPIs between different species (interspecies PPIs) is limited. While the identification of both intra- and interspecies PPIs is required for understanding biological functions, the mechanisms by which PPIs affect the functions of a cell remain to be revealed [26-32]. Many large-scale experiments have been carried out to identify PPIs based on molecular signature proteins [18, 32-39]. Such experimental investigations are often laborious and time-consuming, making it difficult to detect all potential PPIs. These limitations could be addressed by bioinformatics approaches in the era of artificial intelligence.
Traditional computational algorithms for intraspecies PPIs are often used to infer possible associations between query protein pairs [40-44]. These approaches are usually referred to as interolog mapping [43, 45], the domain-domain interaction (DDI)-based method [41, 42], and the domain-motif interaction (DMI)-based method [40]. Meanwhile, in recent decades, machine learning (ML)-based approaches have been booming [10, 46-50]; they use the amino acid sequence [51, 52], evolutionary profiles [53, 54], physicochemical properties [47, 55], and structure information [56] of the protein pairs. Interspecies PPI prediction is a younger research topic and a more challenging task than intraspecies PPI prediction. Recently, some interspecies prediction models have been developed as experimentally verified data have increased [10, 21, 57]. In this review, we provide an inclusive assessment of the state-of-the-art ML approaches for sequence-based PPI prediction, as shown in Fig. (1), and discuss their benefits and shortcomings to help readers select the best PPI predictor for their purpose. Moreover, we present the future perspectives of ML-based PPI prediction.

Fig. (1)

A general framework of ML-based PPI prediction. (A higher resolution / colour version of this figure is available in the electronic copy of the article).

Databases of PPIs

Many PPI databases are available, e.g., APID [58], TAIR [59], HIPPIE [60], PPIM [61], BioGRID [62, 63], and DIP [64] (Table 1). APID is the most recently updated of these databases and delivers an inclusive, curated collection of PPIs for over 1100 organisms. It includes more than 500 experimentally identified PPIs for each of 30 species.

Dataset preprocessing

Building a high-quality dataset is a crucial step for sequence-based PPI prediction via ML algorithms. The datasets are normally collected from Swiss-Prot/UniProtKB. In particular, experimentally identified PPI pairs are considered as positive samples. Subsequently, the proteins of the positive pairs are randomly crossed to make negative samples, assuming that randomly paired proteins are very unlikely to interact. The optimal number of negative samples in the training data has been examined through several statistical investigations [65]. The redundancy of the curated sequence datasets is then considered: if two sets of PPIs contain similar sequences, one of them is deleted. Recently, Sun et al. used different subcellular locations for generating the negative samples, while considering the experimentally verified PPIs as positive samples [49]. They built non-interacting pairs as negative samples by pairing proteins located in different subcellular locations. First, the Swiss-Prot database (version 57.3) was used. Second, human protein sequences annotated with uncertain or indeterminate subcellular location terms, such as “possible”, “maybe”, “potential”, or “by similarity”, were excluded. Finally, sequences annotated with two or more locations were excluded. Because homologous sequences may remain, a 50% homology reduction was performed. This approach had a slight advantage over the random generation of negative samples. One of the major challenges in establishing a computational tool for accurately predicting PPIs is handling imbalanced positive and negative samples [46]. To mitigate this imbalance, negative PPI samples are randomly drawn from the entire set of negative samples to keep a fixed ratio of positive to negative samples [61]. However, a rigorous solution to the dataset imbalance problem remains an open issue. Overfitting and underfitting problems may also exist in the datasets.
When the sequences in a dataset are highly homologous, the performance of the prediction model can be overestimated. Generally, scientists cluster the collected protein sequences at an identity threshold of 60%, 50%, 40%, or 30% by using CD-HIT [66] or BLASTClust (http://nebc.nox.ac.uk/bioinformatics/docs/blastclust.html) to reduce this bias. However, the curated datasets may still contain some correlated sequences that CD-HIT and BLASTClust miss, so rigorous homology reduction remains an important issue. Another problem is underfitting, which occurs when the prediction model is trained on a very small dataset. Therefore, the dataset should not be too small.
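The random-pairing strategy for negative samples can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions, not the procedure of any cited tool: the function name `make_negative_pairs`, the toy protein identifiers, and the choice to treat pairs as unordered are all hypothetical.

```python
import random


def make_negative_pairs(positive_pairs, n_negatives, seed=0):
    """Randomly pair proteins, skipping known positive pairs and self-pairs.

    Assumption: a pair (A, B) is the same as (B, A), so pairs are unordered.
    """
    rng = random.Random(seed)
    positives = {frozenset(pair) for pair in positive_pairs}
    proteins = sorted({prot for pair in positive_pairs for prot in pair})

    # Number of distinct unordered pairs that are not known positives.
    n_known = sum(1 for pair in positives if len(pair) == 2)
    max_possible = len(proteins) * (len(proteins) - 1) // 2 - n_known
    if n_negatives > max_possible:
        raise ValueError("not enough distinct non-interacting pairs")

    negatives = set()
    while len(negatives) < n_negatives:
        a, b = rng.sample(proteins, 2)  # two different proteins
        pair = frozenset((a, b))
        if pair not in positives:
            negatives.add(pair)
    return [tuple(sorted(pair)) for pair in sorted(negatives, key=sorted)]


# Toy example with hypothetical identifiers and a 1:1 positive-to-negative ratio.
positives = [("A", "B"), ("C", "D"), ("A", "C")]
ratio = 1
negatives = make_negative_pairs(positives, ratio * len(positives), seed=1)
print(negatives)
```

Changing `ratio` is one simple way to keep a fixed positive-to-negative ratio; randomly generated negatives may still include true but undiscovered interactions, which is the assumption the text notes.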

Sequence encoding methods

Generally, PPI prediction methods create the input data by concatenating the feature vectors of the two proteins of a pair into a single row [67-69]. Feature encoding, which represents protein pairs as numeric feature vectors, is one of the major phases of PPI prediction. Appropriate feature descriptors enable accurate prediction of PPIs. Recently, a number of computational approaches have been developed as an alternative to experimental methods for identifying potential PPIs (Table 2). Owing to sequence diversity, the sequence signal of some PPIs may be weak [70-77]. To address this issue, PSI-BLAST [78, 79] can be used to produce a sequence profile in the form of a position-specific scoring matrix (PSSM). Such profiles reflect the variation and conservation among protein sequences through evolutionary information [80, 81]. These features may be suitable for a particular PPI classification problem. Khatun et al. used autocorrelation and amino acid composition features for analyzing Zea mays PPI sequences [46]. Recently, DLPred used diverse sequence information including the PSSM, hydropathy index (HI), AAindex, conservation scores, and 3D-1D scores. Chou’s pseudo amino acid composition (PseAAC) is used to encode the position-dependent composition of PPIs [82]. Several physicochemical features are available for PPI prediction, including hydropathy indexes, physical properties, physicochemical characteristics, pKa, conservation scores, and 3D-1D scores. The physicochemical features employed include the pKa values of amino acid residues, hydrophobicity/hydrophilicity, negatively charged, positively charged, and uncharged residues, the volume of amino acid side chains, and the presence of functional groups such as methyl, benzyl, and thioether groups. In the amino acid index (AAindex) database, 544 physicochemical properties are stored as numerical indexes [83, 84]. The sequence-based algorithm proposed by Khatun et al. for PPI prediction in Zea mays relies on autocorrelation (AC) [46].
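To make the idea of encoding a protein pair as one numeric row concrete, the following sketch concatenates amino acid composition and an auto covariance descriptor for both proteins. It is an illustration under assumptions, not the encoding of any specific predictor: the Kyte-Doolittle scale is used here as one possible hydropathy index, the auto covariance formula is one common definition, and the function names, the number of lags, and the toy sequences are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index (assumed here as the physicochemical scale).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aac(seq):
    """Amino acid composition: frequency of each of the 20 residues."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def auto_covariance(seq, max_lag):
    """Auto covariance of the hydropathy profile for lags 1..max_lag.

    Requires len(seq) > max_lag and only the 20 standard residues.
    """
    values = [HYDROPATHY[aa] for aa in seq]
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def encode_pair(seq_a, seq_b, max_lag=3):
    """Concatenate the descriptors of both proteins into a single row."""
    return (aac(seq_a) + auto_covariance(seq_a, max_lag)
            + aac(seq_b) + auto_covariance(seq_b, max_lag))


# Two toy sequences of different length give a vector of fixed length 2 * (20 + 3).
vector = encode_pair("ACDEFGHIK", "LMNPQRSTVWY")
print(len(vector))
```

Because both descriptors have a length that does not depend on the sequence length, proteins of different sizes map to vectors of the same dimension.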
One advantage of this method is that AC captures long-range interaction features of amino acids, which are informative for PPI identification. Recently, several structure-based prediction methods have employed domain information, secondary structure states, polar surface locations, solvent accessibility, and hydrophobicity [25]. Generally, features are extracted by two different kinds of methods: structure-based and sequence-based. In most PPI studies, protein sequence data have been used more often than structure data. A notable limitation is that proteins are sequences of variable length. For PPI identification, it is necessary to fix the sequence size s (s is a fixed number of positions, each described by D dimensions of amino acid information). Thus, every PPI sequence is represented as a feature vector of size D×s. When a sequence is shorter than s, the remaining elements of the feature vector are filled with zeros. Generating such feature vectors requires a long computational time. Furthermore, many ML algorithms classify high-dimensional datasets well. Therefore, precise sequence encoding schemes are necessary for reliable prediction. In addition, several domain-based approaches have been developed [85-90] that use domain-domain interaction scores evaluated by diverse ML algorithms, including the relevance vector machine and SVM. These approaches consider the proportion of an important domain or domain co-occurrence relationships, but they do not employ the entire domain evidence [84], which is crucial for understanding a global view of PPIs.
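The fixed-size representation with zero padding can be sketched as follows. This is a minimal example assuming a one-hot encoding with D = 20; the function name `one_hot_padded` is hypothetical, and truncating sequences longer than s is an assumption of this sketch that the text does not specify.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_padded(seq, s):
    """Encode a sequence as a flat vector of size D x s with D = 20.

    Positions beyond the end of a short sequence stay zero (zero padding).
    Sequences longer than s are truncated (an assumption of this sketch).
    Non-standard residues are left as all-zero positions.
    """
    d = len(AMINO_ACIDS)
    vector = [0.0] * (d * s)
    for position, residue in enumerate(seq[:s]):
        index = AMINO_ACIDS.find(residue)
        if index >= 0:
            vector[position * d + index] = 1.0
    return vector


short = one_hot_padded("AC", 4)      # 2 residues, padded to 4 positions
long_ = one_hot_padded("ACDEF", 3)   # 5 residues, truncated to 3 positions
print(len(short), sum(short), len(long_), sum(long_))
```

The vector length grows as D×s, which is why the choice of s affects both the computational cost and the dimensionality seen by the classifier.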

Machine learning algorithm

To detect potential PPIs with sequence-based prediction models, several ML algorithms are employed. Most of the existing predictors use three types of ML algorithms: deep learning (DL), support vector machines (SVM), and random forest (RF). These algorithms are described below.

Deep Learning

Deep learning (DL) comprises several approaches, including Recurrent Neural Networks (RNNs), Deep Belief Networks (DBNs), and Deep Neural Networks (DNNs). Different DL algorithms are suitable for different applications. For instance, RNNs are appropriate for the analysis of sequential information, and DBNs are good at examining internal associations in high-dimensional data. To predict PPIs, the DNN is one of the most suitable ML algorithms [49]. The DNN input should be vectors with a fixed dimension. The main roles of the DNN components are to remove highly homologous samples, eliminate noise, and reduce the data dimensions. DNN architectures are assembled layer by layer with a greedy algorithm. DNNs help to uncover hidden features that improve performance.

Support Vector Machine

To classify PPI datasets, SVMs or kernel machines are used [89]. The SVM maximizes the margin between the two classes, which is related to the confidence of its classification. The classifier is trained on a labeled training dataset to find the separating hyperplane with the largest margin [90]. The SVM is powerful and can handle classification problems with arbitrarily distributed data, although it requires large memory and a complex input format. With a radial basis function kernel, the SVM is somewhat slow to train and evaluate on high-dimensional features. Another disadvantage is that the parameters significantly alter the results. Further details can be found in [90-92].

Random Forest

The RF algorithm is an ensemble of numerous decision trees that solves the two-class prediction problem [93-97]. During training, each decision tree is built from random feature vectors that are sampled from the dataset independently at every node of the tree. Each classification tree is then fully grown using randomly selected variables. To classify a new sample, its feature vector is passed to each of the trees in the forest, and one class is assigned to the sample by majority voting. The RF is an effective algorithm when there are a large number of features and large datasets, and it can rank important features for accurate classification [98, 99]. The RF is widely used in computational biology research [46, 90, 99-103].
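The bootstrap sampling, random feature selection, and majority voting described for the RF can be sketched in a few lines. This is a deliberately simplified illustration, not a full RF: each "tree" is a depth-one decision stump rather than a fully grown tree, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def train_stump(X, y, feature_ids):
    """Pick the (feature, threshold, sign) with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                preds = [1 if sign * (row[f] - t) > 0 else 0 for row in X]
                errors = sum(p != label for p, label in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, sign)
    return best[1:]


def train_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train stumps on bootstrap samples with randomly selected features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]            # bootstrap sample
        feats = rng.sample(range(len(X[0])), n_features)    # random features
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Assign the class that receives the majority of the votes."""
    votes = Counter(1 if sign * (row[f] - t) > 0 else 0 for f, t, sign in forest)
    return votes.most_common(1)[0][0]


# Toy data: two features, class 1 (PPI) has high values, class 0 has low values.
X = [[0.0, 0.0]] * 3 + [[1.0, 1.0]] * 3
y = [0, 0, 0, 1, 1, 1]
forest = train_forest(X, y)
print(predict(forest, [0.0, 0.0]), predict(forest, [1.0, 1.0]))
```

In practice an established implementation with fully grown trees and feature-importance ranking would be used; the sketch only shows how independent, randomized trees are combined by voting.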

Combined Model

For a real-world prediction task, the feature sets are combined to enhance the prediction performance [104-110]. The usefulness of different feature sets is evaluated by diverse statistical learning algorithms. The resulting evaluation scores are then integrated by using various statistical strategies such as logistic regression [111], weighted scores [112], and multiple linear regression [113]. Moreover, meta-classifiers (i.e., combinations of different ML algorithms) have recently been widely used in bioinformatics research to enhance the prediction performance [100, 114].

Evaluation

Measure

To examine the performance of different ML classifiers, many statistical measures are used, including accuracy, specificity, sensitivity, and the Matthews correlation coefficient (MCC). These assume a binary classification problem, in which the outputs are categorized either as PPI (+) or non-PPI (-). Four outcomes are possible (Table 3). True positive (TP) signifies that the real value is ‘+’ and the predicted class is ‘+’; false positive (FP) signifies that the real value is ‘-’ and the predicted class is ‘+’. False negative (FN) occurs when the real value is ‘+’ and the predicted class is ‘-’; true negative (TN) occurs when both the real and predicted classes are ‘-’. The four measures are defined by:

\[ \mathrm{Sensitivity} = \frac{n(TP)}{n(TP) + n(FN)} \]

\[ \mathrm{Specificity} = \frac{n(TN)}{n(TN) + n(FP)} \]

\[ \mathrm{Accuracy} = \frac{n(TP) + n(TN)}{n(TP) + n(FP) + n(TN) + n(FN)} \]

\[ \mathrm{MCC} = \frac{n(TP)\,n(TN) - n(FP)\,n(FN)}{\sqrt{\bigl(n(TP) + n(FP)\bigr)\bigl(n(TP) + n(FN)\bigr)\bigl(n(TN) + n(FP)\bigr)\bigl(n(TN) + n(FN)\bigr)}} \]

The values of sensitivity, specificity, and accuracy lie between 0 and 1 and those of MCC between -1 and 1; a higher value signifies better prediction.
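As a worked example, the four measures can be computed directly from the counts of the contingency table using their standard definitions. The function name `binary_metrics` and the example counts are hypothetical, and the sketch assumes that both classes are present (so that no denominator of sensitivity or specificity is zero).

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, accuracy, and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts: 60 true PPIs and 40 true non-PPIs.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

For these counts, accuracy is 0.7 and specificity is 0.75, while sensitivity is 40/60 and MCC is 1000/sqrt(6,000,000), roughly 0.41, which illustrates that MCC can be noticeably lower than accuracy on the same predictions.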

Parameter Optimization

After applying ML algorithms, selecting the threshold value is an important step for the precise prediction of PPIs and non-PPIs. The performance of the prediction model on the training samples is assessed with a stepwise change in specificity [46, 115, 116]. Typically, raising specificity decreases sensitivity. Users need to set different threshold values to understand the exact level of performance. However, existing methods did not vary the threshold; they used a fixed threshold value chosen so that the specificity or sensitivity fell within a certain range. In this case, ordinary users cannot judge the exact performance. Therefore, developers should allow specificity or sensitivity to be controlled by changing the threshold of the ML scores via a cross-validation test.
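The trade-off between specificity and sensitivity can be examined by sweeping the decision threshold over the prediction scores. The following is a minimal sketch; the function names, the rule "predict PPI when score >= threshold", and the toy scores are assumptions for illustration only.

```python
def sweep_thresholds(scores, labels):
    """Return (threshold, sensitivity, specificity) for each candidate threshold.

    A sample is predicted as PPI (1) when its score is >= the threshold.
    Both classes must be present in labels.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    rows = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        rows.append((t, tp / n_pos, tn / n_neg))
    return rows


def pick_threshold(rows, min_specificity):
    """Lowest threshold reaching the requested specificity, or None.

    Thresholds are visited in ascending order, so specificity never decreases.
    """
    for t, sensitivity, specificity in rows:
        if specificity >= min_specificity:
            return t, sensitivity, specificity
    return None


# Hypothetical classifier scores for 4 PPIs (1) and 4 non-PPIs (0).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
rows = sweep_thresholds(scores, labels)
print(pick_threshold(rows, min_specificity=0.75))
```

Reporting the whole table of rows, rather than a single fixed threshold, lets users see how sensitivity changes as the required specificity is raised.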

Training and Independent Datasets

Generally, the independent test dataset consists of 10-30% of the samples, randomly selected from the whole set of PPI samples, and the remaining samples are used as the training dataset. To evaluate model performance, a cross-validation (CV) test is first executed on the training data [117, 118]. In this process, the samples are separated into n subgroups, and each group is evaluated in turn after training on the other groups, n times in total. For example, the training dataset is commonly divided into 10 groups; one group is selected for testing and the other 9 groups are used for training. Predicted pairs with high scores are assigned to the positive class and those with low scores to the negative class. In particular, a jackknife or a 10-fold CV test has been used to evaluate existing PPI predictors (Table 2) [119, 120].
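The split into an independent test set followed by n-fold CV on the training set can be sketched with sample indices alone. The function names, the 20% test fraction, and the sample count are hypothetical choices for illustration; the sketch does not stratify by class, which real pipelines often do.

```python
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and set aside a fraction as independent test set."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=10):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


# 100 hypothetical samples: 20 held out, 80 used for 10-fold CV.
train_idx, test_idx = holdout_split(100, test_fraction=0.2)
splits = list(k_fold(train_idx, k=10))
print(len(train_idx), len(test_idx), len(splits))
```

Setting k equal to the number of training samples gives the jackknife (leave-one-out) test mentioned in the text.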

Caveats of the existing bioinformatics algorithms

Even though much progress has been made in the development of PPI prediction algorithms [121-129], some challenges and limitations need to be addressed. Firstly, the accuracy reported by CV tests is hard to reproduce unless the source codes and the ML parameters related to the sequence encoding methods are provided. If developers provide a standalone program or web application, however, the performance can be evaluated on independent datasets. Unfortunately, few reported methods provide their source codes or datasets (Table 2). Therefore, it is highly recommended to provide the datasets and source codes when publishing a new methodology [119]. Secondly, most existing algorithms removed identical sequences and used the remaining proteins as a dataset. A few studies have used datasets that include proteins with higher sequence identity (>30%). Using datasets with such high sequence similarity might cause overfitting and overestimate the prediction accuracy. Hence, to develop a reliable prediction model, it is highly recommended to use a low sequence identity cut-off (<30%), which has been extensively used in various sequence-based predictions. Thirdly, most of the publicly available methods use their own independent dataset to assess prediction performance. To conduct a fair comparison, it is essential to build a common independent dataset and to check whether each prediction model identifies unseen PPIs. Finally, half of the existing PPI tools are not publicly available. Online services are particularly valuable because they let users obtain reliable predictions without any knowledge of mathematics and statistics. Therefore, state-of-the-art services or software should be freely accessible to users.

Future perspectives and conclusion

Owing to advances in sequencing technology, it is essential to develop computational methods that enable fast and precise prediction of unseen PPIs from a large number of candidate proteins. Several ML-based methods have been proposed (Table 2). Future studies require the construction of larger, unbiased datasets, independent datasets for validating the proposed models, and the development of new encoding schemes. Of note, it is arguable that adding structure-based information, the side-chain orientation of amino acids, or evolutionary information can improve the prediction performance. It is also important to integrate different feature encodings [129-133], such as chemical properties, multivariate mutual information, K-nearest neighbors, and pseudo amino acid composition, and to explore ML algorithms [134-138], including light gradient boosting, extreme gradient boosting, and deep learning.

Table 1

Currently available databases for PPIs.

Database	Description	Year	Database URL
DIP	Several species PPIs that are manually curated	2002	https://dip.doe-mbi.ucla.edu/dip/Main.cgi
TAIR	PPI annotations for Arabidopsis thaliana	2007	https://www.arabidopsis.org/portals/proteome/proteinInteract.jsp
PPIM	PPI database for Maize	2016	comp-sysbio.org/ppim/
PPIM	2,762,560 interactions among 14,000 proteins	2016	https://dbaasp.org/home
HIPPIE	Human PPI references	2017	http://cbdm.uni-mainz.de/hippie/
BioGRID	400,000 PPIs collected from experiments and the primary literature	2018	https://openwetware.org/wiki/Protein-protein_interaction_databases#BioGRID
APID	Agile protein intercoms database for bacterial PPIs	2019	http://compsysbio.org/bacteriome/
APID	It integrates the existing public resources and provides PPI information of more than 1100 organisms	2019	http://apid.dep.usal.es

Table 2

Currently available tools for PPI prediction.

Predictor	ML Algorithms	Encoding Methods	Testing Methods	Accuracy	Year	Predictor URL	References
Pred_PPI	SVM	Auto covariance	Jackknife	90.67% (human), 88.99% (yeast), 90.09% (Drosophila), 92.73% (E. coli), 97.51% (C. elegans)	2010	http://cic.scu.edu.cn/bioinfor-matics/predict_ppi/default.html	[72]
Hotpoint	SVM	PseAAC and local alignment kernel	5-fold CV	70%	2010	http://prism.ccbb.ku.edu.tr/hotpoint/	[89]
PSOPIA	Domain-based	Sequence similarity	10-fold CV	70-85%	2014	http://mizuguchilab.org/PSOPIA	[80]
NIP	SVM	G-gap dipeptide compositions	Jackknife	92.67%	2016	http://mlda.swu.edu.cn/codes.php?name=NIP	[70]
SPRINT	SVM	k-mer	10-fold	N/A	2017	https://github.com/lucian-ilie/SPRINT/	[71]
SIPMA	RF	Autocorrelation, AAC, PseAAC	10-fold CV	89.9%	2018	http://kurata14.bio.kyutech.ac.jp/SIPMA/	[46]
DPPI	Deep learning	Sequence features	10-fold CV	96%	2018	https://github.com/hashemifar/DPPI/	[77]
PPI-Detect	SVM	BPF and sequence features	10-fold CV	91.40%	2018	https://ppi-detect.zmb.uni-due.de/	[47]
DLPred	Deep learning	PSSM, HI, AAindex, sequence conservation score, and 3D-1D scores.	10-fold CV	73.68%	2019	http://qianglab.scst.suda.edu.cn/dlp/	[75]
GWORVMBIG	Optimizer-Based Relevance Vector Machine	PSSM and evolutionary encoding	5-fold CV	N/A	2019	http://219.219.62.123:8888/GWORVMBIG	[76]
DAMpred	Neural-Network	Protein structure encoding	10-fold	86%	2019	https://zhanglab.ccmb.med.umich.edu/DAMpred	[73]
FCTP-WSRC	SVM and weighted sparse learning	Auto covariance and KNN	5-fold CV	96.67%, 99.82%, and 98.09% for H. pylori, Human and Yeast	2020	https://github.com/wowkiekong/PPI-prediction	[74]

Table 3

Contingency table.

Confusion matrix (2×2 contingency table)
Predicted result	True condition: Positive (+)	True condition: Negative (-)
Positive (+)	n(TP)	n(FP)
Negative (-)	n(FN)	n(TN)

n(TP) and n(FP) represent the numbers of correctly and incorrectly predicted positive samples, respectively. n(TN) and n(FN) represent the numbers of the correctly and incorrectly predicted negative samples, respectively.

129 in total

1. Improved Prediction of Protein-Protein Interaction Mapping on Homo Sapiens by Using Amino Acid Sequence Features in a Supervised Learning Framework.

Authors: Md Merajul Islam; Md Jahangir Alam; Fee Faysal Ahmed; Md Mehedi Hasan; Md Nurul Haque Mollah
Journal: Protein Pept Lett Date: 2021 Impact factor: 1.890

Review 2. Structure, dynamics, assembly, and evolution of protein complexes.

Authors: Joseph A Marsh; Sarah A Teichmann
Journal: Annu Rev Biochem Date: 2014-12-08 Impact factor: 23.643

3. Detection of membrane protein-protein interaction in planta based on dual-intein-coupled tripartite split-GFP association.

Authors: Tzu-Yin Liu; Wen-Chun Chou; Wei-Yuan Chen; Ching-Yi Chu; Chen-Yi Dai; Pei-Yu Wu
Journal: Plant J Date: 2018-03-23 Impact factor: 6.417

4. Critical assessment and performance improvement of plant-pathogen protein-protein interaction prediction methods.

Authors: Shiping Yang; Hong Li; Huaqin He; Yuan Zhou; Ziding Zhang
Journal: Brief Bioinform Date: 2019-01-18 Impact factor: 11.622

5. DeepCleave: a deep learning predictor for caspase and matrix metalloprotease substrates and cleavage sites.

Authors: Fuyi Li; Jinxiang Chen; André Leier; Tatiana Marquez-Lago; Quanzhong Liu; Yanze Wang; Jerico Revote; A Ian Smith; Tatsuya Akutsu; Geoffrey I Webb; Lukasz Kurgan; Jiangning Song
Journal: Bioinformatics Date: 2020-02-15 Impact factor: 6.937

6. iDNAProt-ES: Identification of DNA-binding Proteins Using Evolutionary and Structural Features.

Authors: Shahana Yasmin Chowdhury; Swakkhar Shatabda; Abdollah Dehzangi
Journal: Sci Rep Date: 2017-11-02 Impact factor: 4.379

7. APID database: redefining protein-protein interaction experimental evidences and binary interactomes.

Authors: Diego Alonso-López; Francisco J Campos-Laborie; Miguel A Gutiérrez; Luke Lambourne; Michael A Calderwood; Marc Vidal; Javier De Las Rivas
Journal: Database (Oxford) Date: 2019-01-01 Impact factor: 3.451

8. CD-HIT: accelerated for clustering the next-generation sequencing data.

Authors: Limin Fu; Beifang Niu; Zhengwei Zhu; Sitao Wu; Weizhong Li
Journal: Bioinformatics Date: 2012-10-11 Impact factor: 6.937

9. PIP-EL: A New Ensemble Learning Method for Improved Proinflammatory Peptide Predictions.

Authors: Balachandran Manavalan; Tae Hwan Shin; Myeong Ok Kim; Gwang Lee
Journal: Front Immunol Date: 2018-07-31 Impact factor: 7.561

10. CORUM: the comprehensive resource of mammalian protein complexes-2019.

Authors: Madalina Giurgiu; Julian Reinhard; Barbara Brauner; Irmtraud Dunger-Kaltenbach; Gisela Fobo; Goar Frishman; Corinna Montrone; Andreas Ruepp
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971
