Ernesto Iacucci, Léon-Charles Tranchevent, Dusan Popovic, Georgios A Pavlopoulos, Bart De Moor, Reinhard Schneider, Yves Moreau.
Abstract
MOTIVATION: The prediction of receptor-ligand pairings is an important area of research, because intercellular communication is mediated by the successful interaction of these key proteins. Exhaustively assaying receptor-ligand pairs is impractical, so a computational approach to predicting pairings is necessary. We propose a workflow for this interaction prediction task that combines a text mining approach with a state-of-the-art prediction method and a widely accessible, comprehensive dataset. Among several modern classifiers, random forests performed best at this prediction task. The classifier was trained on an experimentally validated dataset of receptor-ligand pairs from the Database of Ligand-Receptor Partners (DLRP). New examples, co-cited with the training receptors and ligands, are then classified using the trained classifier. Applying our method, we successfully predict receptor-ligand pairs within the GPCR family with a balanced accuracy of 0.96. On further inspection, we find several supported interactions that are not present in the Database of Interacting Proteins (DIP). The measured balanced accuracy indicates high-quality predictions, which are stored in the ReLiance database.
AVAILABILITY: http://homes.esat.kuleuven.be/~bioiuser/ReLianceDB/index.php
CONTACT: yves.moreau@esat.kuleuven.be; ernesto.iacucci@gmail.com
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
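The abstract describes new examples as genes co-cited with the training receptors and ligands. The paper's text-mining details are not given here, so the following is only a minimal Python sketch under stated assumptions. It uses a hypothetical mapping from PubMed IDs to the gene symbols found in each abstract, and it counts co-citation as the number of abstracts that mention both genes. The function names and the `min_cocitations` threshold are illustrative and are not part of ReLiance.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(doc_genes):
    """Count, for each unordered gene pair, how many documents mention both genes."""
    counts = Counter()
    for genes in doc_genes.values():
        # sorted() gives every unordered pair one canonical (a, b) key
        for a, b in combinations(sorted(set(genes)), 2):
            counts[(a, b)] += 1
    return counts

def candidate_pairs(doc_genes, training_genes, min_cocitations=1):
    """Co-cited pairs involving at least one training gene (hypothetical filter)."""
    counts = cocitation_counts(doc_genes)
    return {
        pair: n
        for pair, n in counts.items()
        if n >= min_cocitations and (pair[0] in training_genes or pair[1] in training_genes)
    }

if __name__ == "__main__":
    # Assumption: toy corpus mapping PMID -> genes tagged in that abstract
    corpus = {
        "pmid1": {"CCL22", "CCR1", "CX3CR1"},
        "pmid2": {"CCL22", "CCR1"},
        "pmid3": {"TCF7", "CTNNB1"},
    }
    training = {"CCR1", "CTNNB1"}  # stand-in for DLRP receptors and ligands
    for pair, n in sorted(candidate_pairs(corpus, training).items()):
        print(pair, n)
```

In this toy corpus, the pair (CCL22, CX3CR1) is co-cited but dropped, because neither gene belongs to the hypothetical training set.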
Year: 2012 PMID: 22962483 PMCID: PMC3436818 DOI: 10.1093/bioinformatics/bts391
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Family analysis. Candidates from the new examples were mapped to Gene Ontology, and a search was performed for classifications that contain the term ‘receptor’ and have more than five members. This search returned three classifications: ‘Peptide Receptor Activity, G-protein Coupled’, ‘Transmembrane Receptor Tyrosine Kinase Activity’ and ‘Cytokine Receptor Activity’. We then used the DIP database as a baseline for calculating sensitivity, specificity and balanced accuracy for each classification.
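The per-family evaluation in Fig. 1 comes down to counting agreements with the baseline. The following is a minimal sketch. It assumes that pairs found in DIP are treated as true interactions and that every other candidate pair in the family is treated as a non-interaction. The caption does not say how negatives were defined, and `family_metrics` and `pair` are hypothetical helpers.

```python
def pair(a, b):
    """Unordered gene pair, so (a, b) and (b, a) compare equal."""
    return frozenset((a, b))

def family_metrics(universe, predicted, baseline):
    """Sensitivity, specificity and balanced accuracy of predicted positives.

    Assumption: baseline pairs (e.g. those present in DIP) are the true
    positives; every other pair in `universe` counts as a true negative.
    """
    predicted = predicted & universe
    truth = baseline & universe
    tp = len(predicted & truth)
    fn = len(truth - predicted)
    fp = len(predicted - truth)
    tn = len(universe) - tp - fn - fp
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity, (sensitivity + specificity) / 2

if __name__ == "__main__":
    # Hypothetical family: 10 candidate pairs, 4 in the baseline, 4 called positive
    universe = {pair(f"L{i}", "R") for i in range(10)}
    baseline = {pair(f"L{i}", "R") for i in range(4)}
    predicted = {pair(f"L{i}", "R") for i in (0, 1, 2, 5)}
    sens, spec, bacc = family_metrics(universe, predicted, baseline)
    # prints: sensitivity=0.750 specificity=0.833 balanced accuracy=0.792
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} balanced accuracy={bacc:.3f}")
```

Balanced accuracy averages sensitivity and specificity. As a result, it is not inflated when non-interacting pairs vastly outnumber interacting ones.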
Fig. 2. Histogram. The prioritized list resulting from our workflow is binned into 100 ranked bins of size 1164. On the left, each ranked bin has a length corresponding to its normalized co-citation score. The normalized co-citation score is the total number of co-citations for the members of a bin divided by the connectivity score of that bin's members, then normalized to the maximum value across all bins. Bins colored red contain pairs that our classifier calls positive. On the right, each ranked bin has a length corresponding to its connectivity score. The connectivity score is the total edge degree (number of predicted interactors in the genome) of the members of the bin. Dark blue bins contain members with higher-than-average connectivity. The green dashed lines mark the average value across all bins.
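The two bar lengths in Fig. 2 can be computed as below. This sketch assumes that a bin member is a ranked pair and that its connectivity is the summed predicted-interactor degree of its two genes, a detail the caption leaves open. The function names, input formats and toy numbers are hypothetical.

```python
from collections import Counter

def node_degrees(predicted_pairs):
    """Edge degree of each gene in the predicted interaction network."""
    degree = Counter()
    for a, b in predicted_pairs:
        degree[a] += 1
        degree[b] += 1
    return degree

def histogram_bins(ranked_pairs, cocitations, degree, bin_size):
    """Return (normalized co-citation, connectivity, above-average flag) per bin.

    Assumption: a bin member is a pair whose connectivity is the sum of the
    degrees of its two genes. Pair keys must use the same ordering in
    ranked_pairs and cocitations.
    """
    bins = [ranked_pairs[i:i + bin_size] for i in range(0, len(ranked_pairs), bin_size)]
    connectivity = [sum(degree[a] + degree[b] for a, b in members) for members in bins]
    raw = [
        sum(cocitations.get(p, 0) for p in members) / conn if conn else 0.0
        for members, conn in zip(bins, connectivity)
    ]
    peak = max(raw) or 1.0  # avoid dividing by zero if every bin scores 0
    normalized = [r / peak for r in raw]
    mean_conn = sum(connectivity) / len(connectivity)  # the dashed "average" line
    above_average = [c > mean_conn for c in connectivity]
    return normalized, connectivity, above_average

if __name__ == "__main__":
    # Hypothetical toy data; the paper uses 100 bins of 1164 pairs each
    ranked = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
    degree = node_degrees([("A", "B"), ("A", "C"), ("B", "C")])
    cocit = {("A", "B"): 4, ("A", "C"): 2, ("B", "C"): 1, ("C", "D"): 3}
    print(histogram_bins(ranked, cocit, degree, bin_size=2))
```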
Fig. 3. Workflow. The trained classifier is given new examples (genes that are co-cited with the receptors and ligands from the DLRP database) and makes predictions based on its ability to distinguish between interacting and noninteracting pairs. The predictions are ranked by the random forest score provided by the classifier.
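The sketch below makes the Fig. 3 scoring step concrete with a deliberately small random forest in pure Python. Each tree is a single decision stump fit on a bootstrap sample with a random feature subset. A candidate's score is the mean positive fraction of the leaves it lands in. This is a simplified stand-in, not the paper's classifier: standard random forests grow deep trees and sample features at every split. The feature vectors, names and parameters are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def fit_stump(X, y, feature_ids):
    """Best single split by weighted Gini; leaves keep their positive fraction."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            cost = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or cost < best[0]:
                best = (cost, f, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # no usable split: predict the base rate everywhere
        p = sum(y) / len(y)
        return (0, float("inf"), p, p)
    return best[1:]  # (feature, threshold, p_left, p_right)

def fit_forest(X, y, n_trees=50, max_features=1, seed=0):
    """Simplified forest: one stump per bootstrap sample and random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(len(X[0])), max_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def forest_score(forest, x):
    """Mean positive fraction of the leaves that x falls into."""
    return sum(pl if x[f] <= t else pr for f, t, pl, pr in forest) / len(forest)

def rank_candidates(forest, candidates):
    """Sort hypothetical {name: feature vector} candidates by descending score."""
    return sorted(((forest_score(forest, x), name) for name, x in candidates.items()), reverse=True)

if __name__ == "__main__":
    # Hypothetical 2-feature training pairs: 1 = interacting (DLRP-like), 0 = not
    X = [[0.9, 0.1], [0.8, 0.3], [0.85, 0.5], [0.1, 0.2], [0.2, 0.9], [0.15, 0.4]]
    y = [1, 1, 1, 0, 0, 0]
    forest = fit_forest(X, y)
    for score, name in rank_candidates(forest, {"P": [0.95, 0.5], "N": [0.05, 0.5]}):
        print(f"{name}\t{score:.3f}")
```

In practice, a library implementation such as scikit-learn's `RandomForestClassifier` would be used instead, with its `predict_proba` output playing the role of the score. The sketch only shows where a ranking score like the one in the tables can come from.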
In silico GPCR predictions: top ten predictions made in the GPCR family of receptors and ligands
| Rank | Gene A | Gene B | Score | Supporting evidence |
|---|---|---|---|---|
| 1 | CD27 | CX3CR1 | 0.789 | |
| 2 | TCF7 | CTNNB1 | 0.780 | INTNETDB, MIPS, INTACT |
| 3 | CD27 | CCR1 | 0.723 | |
| 4 | CCL22 | CCR1 | 0.721 | High STRING prediction: 0.964 |
| 5 | LEF1 | CTNNB1 | 0.720 | BIND, BIOGRID, HPRD, MIPS |
| 6 | CCR1 | CSF1 | 0.716 | |
| 7 | CXCL13 | CCR1 | 0.694 | High STRING prediction: 0.983 Experimental ( |
| 8 | EDAR | CX3CR1 | 0.680 | |
| 9 | ANGPT2 | F2R | 0.667 | INTACT |
| 10 | CCL22 | CX3CR1 | 0.638 | High STRING prediction: 0.945 experimental ( |

Top 10 co-cited predictions: the top 10 co-cited predictions with co-citation
| Rank | Gene A | Gene B | Score | Supporting evidence |
|---|---|---|---|---|
| 1 | CD3G | CD3D | 0.903 | BIOGRID, HPRD, INTACT, MIPS |
| 2 | CRK | ALK | 0.894 | |
| 3 | AC003958.6.1 | TNFSF4 | 0.890 | |
| 4 | B2M | CALR | 0.879 | BIOGRID, HPRD, MIPS |
| 5 | CDC14A | CDC7 | 0.876 | INTACT |
| 6 | DCN | SMAD7 | 0.870 | |
| 7 | PDGFRB | GRB7 | 0.867 | HPRD, MIPS |
| 8 | SMAD7 | ACVRL1 | 0.866 | INTACT |
| 9 | TNFSF4 | IL18 | 0.866 | |
| 10 | WDR48 | ERBB2 | 0.865 | |