Esmaeil Nourani, Farshad Khunjush, Saliha Durmuş.
Abstract
Infectious diseases remain among the major and most prevalent health problems, mostly because of the drug resistance of novel pathogen variants. Molecular interactions between pathogens and their hosts are key parts of the infection mechanisms. Developing novel antimicrobial therapeutics to fight drug resistance is only possible with a thorough understanding of pathogen-host interaction (PHI) systems. Existing databases of experimentally verified PHI data contain few reported interactions because the experiments are technically challenging and time consuming. This scarcity has motivated many researchers to propose computational approaches for the analysis and prediction of PHIs. The computational methods primarily utilize sequence information, protein structure and known interactions. Classic machine learning techniques are used when there are sufficient known interactions to serve as training data. Otherwise, transfer and multitask learning methods are preferred. Here, we present an overview of these computational approaches for predicting PHI systems, discussing their strengths and weaknesses, along with future directions.
Keywords: computational PHI prediction; data mining; machine learning; pathogen-host interaction (PHI); protein-protein interaction
Year: 2015 PMID: 25759684 PMCID: PMC4338785 DOI: 10.3389/fmicb.2015.00094
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Computational studies for prediction of PHIs.
| Krishnadev and Srinivasan, | |
| Lee et al., | |
| Wuchty, | |
| Dyer et al., | |
| Kim et al., | |
| Hepatitis C virus (HCV)-Human | Cui et al., |
| Phage T4- | Krishnadev and Srinivasan, |
| Phage lambda- | Krishnadev and Srinivasan, |
| Wang et al., | |
| Krishnadev and Srinivasan, | |
| Reid and Berriman, | |
| Reid and Berriman, | |
| Oral microbial-Human | Coelho et al., |
| Krishnadev and Srinivasan, | |
| Arnold et al., | |
| Kshirsagar et al., | |
| Kshirsagar et al., | |
| Schleker et al., | |
| Mei and Zhu, | |
| Schleker et al., | |
| Zhou et al., | |
| Krishnadev and Srinivasan, | |
| Kshirsagar et al., | |
| Kshirsagar et al., | |
| Davis et al., | |
| Kim et al., | |
| Mei, | |
| HIV1-Human | Evans et al., |
| Tastan et al., | |
| Mei, | |
| Qi et al., | |
| Dyer et al., | |
| Ray et al., | |
| Doolittle and Gomez, | |
| Nouretdinov et al., | |
| Mukhopadhyay et al., | |
| Mondal et al., | |
| 36 viral species-Human | Franzosa and Xia, |
| Influenza A NS1–Human | De Chassey et al., |
| HPV16–Human | Dong et al., |
| Kshirsagar et al., | |
| Kshirsagar et al., | |
| Dengue virus-Human | Doolittle and Gomez, |
| Segura-Cabrera et al., | |
| Insect vector | Doolittle and Gomez, |
| Schleker et al., | |
| Schleker et al., | |
| Human papilloma viruses (HPV)-Human | Cui et al., |
| Li et al., | |
| Barh et al., |
Figure 1. Machine learning and data mining based approaches for prediction of PHIs.
Summary of the exploited features for prediction of PHIs.
| Domain and motif information | Each PPI is encoded as a binary feature vector over all possible domain pairs, with an entry set to 1 for every domain pair present in the PPI | Dyer et al., |
| Count possible interacting domains between pathogen and host proteins using domain interactions database (3DID) | Kshirsagar et al., | |
| Functional sequence motifs from ELM database checked in HIV-1 sequence | Tastan et al., | |
| Assume a protein pair is interacting when it has one or more interacting domains | Coelho et al., | |
| Protein sequence n-mers (n-gram) | For each pathogen-host protein pair, concatenate the two protein vectors. Each protein vector counts the number of times each distinct n-mer occurs in the sequence | Dyer et al., |
| Similar to Dyer et al. ( | Kshirsagar et al., | |
| Variant of the spectrum kernel based on sequence n-mers | Kshirsagar et al., | |
| Represent proteins by relative count of amino acid 3-mers | Cui et al., | |
| Forming 7 amino acid classes and computing frequency differences through a 343-dimensional vector | Wuchty, | |
| Forming 4 amino acid classes and computing standardized frequency differences through the 64 possible combinations | Dong et al., | |
| Observing each of the 20 different amino acids within the protein sequence | Coelho et al., | |
| Network topology | Two features for each pathogen-host protein pair including human protein's degree and its betweenness centrality | Dyer et al., |
| Three features of human protein: degree, clustering coefficient, centrality | Tastan et al., | |
| Similar to Tastan et al. ( | Kshirsagar et al., | |
| Degree and betweenness centrality in human PPI | Dong et al., | |
| Gene ontology | Pairwise similarity between GO terms of host and pathogen and Neighbor similarity for GO terms of pathogen and binding partners of human proteins | Kshirsagar et al., |
| Pairwise and neighbor GO similarity | Tastan et al., | |
| The three aspects of Gene Ontology are the only feature values used, and homolog GO features are used for missing data | Mei, | |
| Biological process similarity is computed for protein pairs | Coelho et al., | |
| For every human protein within extracted biclusters find important GO terms | Ray et al., | |
| Using GO functional data to conduct two functional analyses | Reid and Berriman, | |
| Gene expression | Differential human gene expression infected by pathogen in seven control conditions | Kshirsagar et al., |
| Differential human gene expression across HIV-1 infected and uninfected samples | Tastan et al., | |
| Conserved pathways | Find other known PHIs in which the pathogen proteins are homologous and the host proteins share a pathway | Kshirsagar et al., |
| RNAi expression | Utilizing human genes reported as “hits” by the RNAi screens | |
| Homology information | For each PHI count the number of interologs from other species | |
| Forming orthologous groups by clustering host and pathogen proteins around central orthologous pairs | Wuchty, | |
| Use STRING to get clusters of orthologous groups and their scores | Coelho et al., | |
| Pfam interactions | Counts the possible interactions between Pfam families of host and pathogen reported in the iPfam | Kshirsagar et al., |
| Use interacting pair of domains to predict gene interaction between malaria and its hosts (mouse and mosquito) | Reid and Berriman, | |
| Protein sequence | Sequence alignment between pathogen and host proteins computed using PSI-BLAST | Kshirsagar et al., |
| Tissue feature | Check infection susceptibility of tissues | Tastan et al., |
| Virus protein type | One feature for each HIV-1 protein to compute probability of interacting with human protein | |
| A feature vector formed by 11 types of HCV proteins and 9 types of HPV | Cui et al., | |
| Pathways | Pathway participation coefficient is calculated for each protein | Wuchty, |
| Use similarity of pathway memberships of human proteins to propose commonality hypothesis across organisms | Kshirsagar et al., | |
| For each human protein within extracted biclusters find important KEGG pathways | Ray et al., | |
| Find other known PHIs in which the pathogen proteins are homologous and the host proteins share a pathway | Kshirsagar et al., |
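The n-mer feature described in the table above (count each distinct n-mer per protein, then concatenate the pathogen and host vectors) can be sketched as follows. This is a minimal illustration, not the implementation of any cited study: the function names, the 20-letter alphabet string, and the toy sequences are assumptions introduced here.

```python
from collections import Counter
from itertools import product

# Assumption: the standard 20-letter amino acid alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def nmer_counts(sequence, n=2):
    """Return a fixed-length vector with the count of every possible n-mer."""
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    # Slide a window of width n over the sequence and tally each n-mer.
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    return [counts[kmer] for kmer in vocab]

def pair_features(pathogen_seq, host_seq, n=2):
    """Concatenate the pathogen and host n-mer count vectors."""
    return nmer_counts(pathogen_seq, n) + nmer_counts(host_seq, n)

# Toy sequences, invented for illustration only.
features = pair_features("MKVLAAGV", "MAAGKVLL", n=2)
```

With n = 2 each protein contributes 20^2 = 400 entries, so the pair vector has 800 entries. The resulting vectors could be passed to any standard classifier.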
Homology based approaches for prediction of PHIs.
| Homology detection method using template PPI databases, DIP, and iPfam | Krishnadev and Srinivasan, |
| Interologs were inferred from ortholog information obtained from high confidence databases | Lee et al., |
| Homology detection method using template PPI databases, DIP, and iPfam | Tyagi et al., |
| Homology detection method using template PPI databases, DIP, and iPfam | Krishnadev and Srinivasan, |
| Introduce stringent homology which uses inter species template PPI | Zhou et al., |
| Conserved PHI network is generated using interacting proteins of the common conserved inter-species bacterial PPI | Barh et al., |
| Obtain host-pathogen interactome using sequence and interacting domain similarity to known PPIs | Schleker et al., |
| Interolog and Domain based approaches are used to predict PHIs | Li et al., |
| Ortholog information for the four species is integrated from different databases and an interspecies PPI network is constructed; dynamic modeling of regulatory responses then identifies the interactions | Wang et al., |
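The interolog idea behind several of the homology based approaches above is that an interaction known between two template proteins is transferred to a pathogen-host pair whose members are orthologs of those templates. A minimal sketch, assuming ortholog mappings are already available as dictionaries; the function name, data layout, and identifiers are hypothetical and do not reproduce any cited pipeline:

```python
def predict_interologs(template_ppis, pathogen_orthologs, host_orthologs):
    """Transfer template interactions to pathogen-host pairs via orthology.

    template_ppis: iterable of (protein_a, protein_b) known interactions.
    pathogen_orthologs: template protein -> list of pathogen orthologs.
    host_orthologs: template protein -> list of host orthologs.
    """
    predicted = set()
    for a, b in template_ppis:
        # Template pairs are unordered, so try both orientations.
        for x, y in ((a, b), (b, a)):
            for p in pathogen_orthologs.get(x, ()):
                for h in host_orthologs.get(y, ()):
                    predicted.add((p, h))
    return predicted

# Toy data, invented for illustration only.
template_ppis = [("T1", "T2"), ("T3", "T4")]
pathogen_orthologs = {"T1": ["P1"], "T4": ["P2"]}
host_orthologs = {"T2": ["H1", "H2"], "T3": ["H3"]}
interologs = predict_interologs(template_ppis, pathogen_orthologs, host_orthologs)
```

In practice the ortholog mappings would be filtered by a similarity threshold before transfer; that step is omitted here.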
Structure based approaches for prediction of PHIs.
| Comparative modeling of 3D structures | Davis et al., |
| Sharing interacting partners of structurally similar human proteins to HIV proteins | Doolittle and Gomez, |
| Structural similarity of Denv proteins to human proteins having known interactions | Doolittle and Gomez, |
| 3D structural interaction network of host-pathogen and within-host PPI networks | Franzosa and Xia, |
| Assumes that structurally homologous proteins probably have interactors in common | De Chassey et al., |
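Several structure based approaches above rest on the same assumption: a pathogen protein that is structurally similar to a human protein may share that human protein's interaction partners. A rough sketch of this partner transfer, assuming the structural matches have already been computed by some external tool; the function name and all identifiers are hypothetical:

```python
def transfer_partners(structural_matches, human_ppis):
    """Predict host partners for each pathogen protein.

    structural_matches: pathogen protein -> set of structurally similar
        human proteins (assumed precomputed).
    human_ppis: human protein -> set of its known human partners.
    """
    predictions = {}
    for pathogen_protein, similar_humans in structural_matches.items():
        partners = set()
        for human in similar_humans:
            # Inherit every known partner of the similar human protein.
            partners |= human_ppis.get(human, set())
        predictions[pathogen_protein] = partners
    return predictions

# Toy data, invented for illustration only.
structural_matches = {"V1": {"H1", "H2"}, "V2": {"H9"}}
human_ppis = {"H1": {"H3", "H4"}, "H2": {"H4", "H5"}}
predicted_partners = transfer_partners(structural_matches, human_ppis)
```

Predictions obtained this way are usually filtered further, for example by cellular context, before being reported.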
Domain and motif based approaches for prediction of PHIs.
| PreDIN and PreSPI algorithms based on domain information | Kim et al., |
| Estimating PPI probability by combining the interaction probabilities of domains | Dyer et al., |
| XooNET uses Structural Interactome MAP (PSIMAP), Protein Experimental Interactome MAP (PEIMAP) and Domain-Domain interactions from iPfam | Kim et al., |
| Based on ELMs on HIV-1 proteins interacting with human protein counter domains (CDs) | Evans et al., |
| Predict and rank bacteria-human PPIs based on domain-domain interaction | Arnold et al., |
| Build the virus-host interactomes by identifying domain interactions between virus and host PPIs followed by topological and functional analysis of the network | Zheng et al., |
| The viral-human interaction network is modeled based on motif-domain interactions | Segura-Cabrera et al., |
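One simple way to combine domain-level interaction probabilities into a protein-level score, as the domain based approaches above do in various forms, is a noisy-OR rule: the proteins interact unless every domain pair fails to interact. This rule is an assumption chosen for illustration and is not necessarily the formula used in any of the cited studies; the function name and the probability table are hypothetical.

```python
def pair_interaction_probability(pathogen_domains, host_domains, ddi_prob):
    """Noisy-OR combination of domain-domain interaction probabilities.

    ddi_prob: (pathogen_domain, host_domain) -> interaction probability.
    Domain pairs missing from ddi_prob are treated as non-interacting.
    """
    no_interaction = 1.0
    for dp in pathogen_domains:
        for dh in host_domains:
            # Probability that this particular domain pair does not interact.
            no_interaction *= 1.0 - ddi_prob.get((dp, dh), 0.0)
    return 1.0 - no_interaction

# Toy probabilities, invented for illustration only.
ddi_prob = {("D1", "D3"): 0.5, ("D2", "D3"): 0.2}
score = pair_interaction_probability(["D1", "D2"], ["D3"], ddi_prob)
```

For the toy input the score is 1 - (0.5 × 0.8) = 0.6. The rule treats domain pairs as independent, which is a simplification.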
Popular evaluation metrics used for PHI prediction.
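| Accuracy | (TP + TN) / (TP + TN + FP + FN) | Cui et al., |
| Specificity | TN / (TN + FP) | Cui et al., |
| Sensitivity (Recall) | TP / (TP + FN) | Dyer et al., |
| Precision | TP / (TP + FP) | Dyer et al., |
| F1 score | 2 × Precision × Recall / (Precision + Recall) | Kshirsagar et al., |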
| AUC | The area under the ROC curve | Davis et al., |
TP, True Positive; TN, True Negative; FP, False Positive; FN, False Negative.
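The count-based metrics listed above follow directly from the four confusion-matrix counts. The sketch below uses their standard definitions; the function name and the example counts are illustrative assumptions, and AUC is omitted because it requires ranked prediction scores rather than counts.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute standard binary classification metrics from raw counts."""
    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "specificity": ratio(tn, tn + fp),
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }

# Example counts, invented for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Because known non-interacting pairs are rare in PHI data, negative sets are often sampled, so accuracy and specificity depend heavily on how the negatives were chosen.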