Amira Al-Aamri, Kamal Taha, Maher Maalouf, Andrzej Kudlicki, Dirar Homouz.
Abstract
Computational prediction of gene-gene associations is a productive direction in bioinformatics. Many tools have been developed to infer relations between genes from different biological data sources. An association between a pair of genes deduced from biological data becomes meaningful when it reflects the directionality and the type of reaction between the genes. In this work, we take a different approach: we construct a causal gene co-expression network from microarray expression data while identifying transcription factors in each pair of genes. We adopt a machine learning technique based on a logistic regression model to handle the sparsity of the network and to improve prediction accuracy. The proposed system classifies each pair of genes as either connected or nonconnected using correlation data for these genes across the whole Saccharomyces cerevisiae genome. The accuracy of the classification model in predicting related genes was evaluated using several data sets for the yeast regulatory network. Our system achieves high performance on several statistical measures.
Keywords: Bioinformatics; gene co-expression network; predictive model; transcription factor
Year: 2020 PMID: 35173404 PMCID: PMC8842452 DOI: 10.1177/1176934320920310
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 2.031
Figure 1. The training accuracy increases as the number of moments representing each pair of probes increases.
Class “0” indicates a nonconnected pair, and class “1” indicates a connected pair.
| Probe pairs | Moments vector (MV) | Class |
|---|---|---|
| p00 | <m1, m2, . . ., m36> | 0 |
| p01 | <m1, m2, . . ., m36> | 1 |
| p02 | <m1, m2, . . ., m36> | 1 |
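As a hedged illustration of how a pair-level classifier of this kind could work, the sketch below trains a logistic regression on 36-element moments vectors labeled 0 (nonconnected) or 1 (connected). It is not the authors' implementation: the data are synthetic and produced by the hypothetical helper `make_synthetic_pairs`, and the plain gradient-descent trainer, learning rate, and epoch count are assumptions chosen for brevity.

```python
import math
import random

N_MOMENTS = 36  # length of each moments vector, as in the table above


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, mv):
    # Probability that a probe pair belongs to class 1 (connected).
    return sigmoid(bias + sum(w * m for w, m in zip(weights, mv)))


def train(vectors, labels, lr=0.1, epochs=100):
    # Batch gradient descent on the mean logistic loss (assumed settings).
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    n = len(vectors)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for mv, y in zip(vectors, labels):
            err = predict_proba(weights, bias, mv) - y
            grad_b += err
            for j, m in enumerate(mv):
                grad_w[j] += err * m
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def make_synthetic_pairs(n_pairs, seed=0):
    # Hypothetical stand-in for real moments vectors: Gaussian features
    # whose mean depends on the class label. Not data from the paper.
    rng = random.Random(seed)
    vectors, labels = [], []
    for i in range(n_pairs):
        label = i % 2
        center = 0.5 if label else -0.5
        vectors.append([rng.gauss(center, 1.0) for _ in range(N_MOMENTS)])
        labels.append(label)
    return vectors, labels


vectors, labels = make_synthetic_pairs(200)
weights, bias = train(vectors, labels)
predicted = [int(predict_proba(weights, bias, mv) >= 0.5) for mv in vectors]
train_accuracy = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
print(f"training accuracy on synthetic pairs: {train_accuracy:.3f}")
```

With real data, each row would be the moments vector of one probe pair, and the 0/1 labels would have to come from a trusted set of known connections; the 0.5 decision threshold is likewise an assumption.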
Figure 2. As the number of training examples increases, the percentage of correctly predicted instances increases.
Confusion matrix underlying the accuracy measures for the testing data.
| n = 10,000 | Predicted P | Predicted N |
|---|---|---|
| Actual P | TP = 4998 | FN = 2 |
| Actual N | FP = 750 | TN = 4250 |
Abbreviations: FN, false negatives; FP, false positives; TN, true negatives; TP, true positives.
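The standard measures that follow from these four counts can be worked out directly. The short sketch below does that arithmetic. The formulas are the usual textbook definitions, and the values are derived here from the confusion matrix rather than quoted from the paper, so which measures the authors actually report is not assumed.

```python
# Counts taken from the confusion matrix above (testing data, n = 10,000).
TP, FN, FP, TN = 4998, 2, 750, 4250

total = TP + FN + FP + TN
accuracy = (TP + TN) / total          # fraction of all pairs classified correctly
precision = TP / (TP + FP)            # predicted-connected pairs that are truly connected
recall = TP / (TP + FN)               # truly connected pairs that were recovered
specificity = TN / (TN + FP)          # truly nonconnected pairs that were rejected
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy    = {accuracy:.4f}")     # 0.9248
print(f"precision   = {precision:.4f}")    # 0.8695
print(f"recall      = {recall:.4f}")       # 0.9996
print(f"specificity = {specificity:.4f}")  # 0.8500
print(f"F1          = {f1:.4f}")           # 0.9300
```

The counts show an asymmetry: almost every connected pair is recovered (only 2 false negatives), while 750 nonconnected pairs are labeled connected, which is why recall is higher than precision and specificity.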
Figure 3. The recall accuracy calculated for different regulatory network databases and using a transcription factors repository (YeTFaSCo). YeTFaSCo indicates Yeast Transcription Factor Specificity Compendium.