Maxime Folschette1,2,3,4,5, Vincent Legagneux2, Arnaud Poret4, Lokmane Chebouba4,6,7, Carito Guziolowski8,9, Nathalie Théret10,11.
Abstract
BACKGROUND: Integrating genome-wide gene expression patient profiles with regulatory knowledge is a challenging task because of the inherent heterogeneity, noise and incompleteness of biological data. On the computational side, several solvers for logic programs perform extremely well on decision problems in combinatorial search domains. The challenge, then, is how to process biological knowledge so that it can be fed to these solvers to gain insight into a biological study. This requires formalizing the biological knowledge so that the information has a precise interpretation; currently, very few pathway databases offer this possibility.
Keywords: Data and network integration; Discrete modeling; Hepatocellular carcinoma; KEGG; Signaling and regulatory knowledge
Year: 2020 PMID: 31937236 PMCID: PMC6958715 DOI: 10.1186/s12859-019-3316-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Schema describing the pipeline for building networks and predicting regulatory nodes. (1) Using a list of differentially expressed genes, construct the set of gene names and the corresponding set of observations (a sign is attributed to each gene: + when fold change > 2 or − when fold change < 0.5, and adjusted p-value < 10⁻⁵); (2) Extract the upstream/downstream signaling pathways for the set of genes from the signed interaction graph using Pathrider, a tool developed in our team for this purpose. Given a list of excluded genes (such as invariant genes), Pathrider filters out these genes to reduce the graph size; (3) Check the sign consistency of our datasets and produce signed predictions for unmeasured biomolecules using the iggy tool; (4) Validate the predictions made by iggy by computing sub-predictions (prediction 1, 2...n) from subsets of the observations (by default, sampling goes from 10% to 95% of the observations in steps of 5%, with 100 runs per step), then compare them first with the differentially expressed genes, and second with the predictions obtained with the full set of observations; and (5) Plot the precision scores for each subset of the observations, and the stability of the predictions compared to the predictions obtained with the entire set of observations
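Steps (1) and (4) of the caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names `observation_sign`, `sampling_percentages` and `subsample` are hypothetical, and it assumes that the adjusted p-value threshold applies to both the + and the − sign, which the caption does not state unambiguously.

```python
import random
from typing import Dict, List, Optional


def observation_sign(fold_change: float, adj_p: float,
                     up: float = 2.0, down: float = 0.5,
                     alpha: float = 1e-5) -> Optional[str]:
    """Return '+' or '-' for a differentially expressed gene, else None.

    Assumption: the adjusted p-value cutoff applies to both signs.
    """
    if adj_p >= alpha:
        return None
    if fold_change > up:
        return "+"
    if fold_change < down:
        return "-"
    return None


def sampling_percentages(start: int = 10, stop: int = 95,
                         step: int = 5) -> List[int]:
    """Default sampling schedule from the caption: 10% to 95%, step 5%."""
    return list(range(start, stop + 1, step))


def subsample(observations: Dict[str, str], percent: int,
              rng: random.Random) -> Dict[str, str]:
    """Randomly keep `percent`% of the observations (hypothetical helper)."""
    k = round(len(observations) * percent / 100)
    kept = rng.sample(sorted(observations), k)
    return {gene: observations[gene] for gene in kept}
```

With the defaults, the schedule has 18 sampling levels; repeating each level 100 times, as the caption describes, would give 1800 sub-prediction runs.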
List of all predictions returned by Iggy
| List of positive (up-regulated) predictions |
| ADRA2A_prot, BDKRB2_prot, BMP4_gen, CCL11_prot, CCL13_gen, CCL13_prot, CCL17_gen, CCL17_prot, CCL19_prot, CCL21_prot, CCL22_prot, CCL26_prot, COL1A1_prot, COL1A2_prot, COL3A1_prot, COL4A2_prot, COL4A3_prot, COL6A1_prot, COL6A2_prot, COL6A3_prot, COMP_prot, CTBP2_prot, CTSK_prot, CXCL12_prot, CXCL14_prot, CXCL5_prot, CXCL6_prot, DCN_prot, DKK2_prot, DUSP4_prot, EFNB3_prot, EIF4EBP2_prot, EPHA3_prot, FGF18_prot, FGF1_prot, FHL2_prot, FPR1_prot, GLI3_prot, HGF_prot, HHIP_prot, HIF1A_prot, HTR2B_prot, ICAM1_gen, IL34_prot, IL6_prot, JAG1_prot, KRAS_gen, LAMA1_prot, LAMA2_prot, LAMC2_prot, LAMC3_prot, LIF_prot, NFATC1_prot, NFKB1::BCL3, NFKB2::RELB, NOTCH1_gen, NOTCH2_gen, NOTCH4_gen, NR0B2_gen, NR0B2_prot, NR1H4_gen, NR1H4_prot, NR3C2_gen, NR3C2_prot, NRG3_prot, NTF3_prot, NTRK3_prot, PMAIP1_prot, PPP2R2C_prot, PRKG1_prot, PTGER1_prot, PTGIR_prot, PTH1R_prot, PTHLH_prot, PTPRR_prot, RASAL1_prot, SCTR_prot, SEMA3C_prot, SFRP1_prot, SFRP2_prot, SFRP4_prot, SFRP5_prot, SGK1_gen, SLIT2_prot, TGFA_prot, THBS2_prot, THRA_prot, TNC_prot, TNXB_prot, VDR_gen, VDR_prot, WTIP_prot |
| List of negative (down-regulated) predictions |
| APAF1_gen, APAF1_prot, BAK1_gen, BAX_gen, BID_gen, CCL15_prot, CCL16_prot, CHAD_prot, CREB1_prot, CSNK2B_prot, DKK4_prot, EIF2B4_prot, EIF2B5_prot, ELMO1_prot, FOXO3_prot, IGFBP3_gen, IGFBP3_prot, JUND::NACA, LRP5_gen, LRP6_gen, MDM2_gen, PHLPP1_prot, PIDD1_gen, PIDD1_prot, PPP2R5A_prot, PPP2R5D_prot, PTEN_gen, RAD9A_prot, RFNG_prot, RXRB_prot, SENP2_prot, SESN1_gen, SESN1_prot, SESN2_gen, SESN2_prot, SESN3_gen, SESN3_prot, SFN_gen, SFN_prot, SIVA1_gen, SIVA1_prot, SLC38A9_prot, SPDYC_prot, SREBF1_gen, SREBF1_prot, THBS1_gen, THBS4_prot, THEM4_prot, THPO_prot, TNFRSF10A_gen, TNFRSF10B_gen, TP53_prot, TP73_prot, TSC2_gen |
Fig. 2 Precision scores of predictions obtained on samplings of the observations. Boxplots of the precision scores (ordinate) of the predictions obtained with 100 randomly picked samplings (abscissa) of observations. Each box plot at abscissa x represents the distribution of the precision scores of the predictions obtained when using only x% of the observations. The point at 100% represents the precision score of the predictions when using the complete set of observations
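The caption does not give the formula for the precision score. A minimal sketch, assuming precision is the fraction of predicted signs that agree with the observed signs of the genes left out of the sample, could look as follows; the function name `precision_score` and the dictionary layout are hypothetical.

```python
from typing import Dict, Optional


def precision_score(predictions: Dict[str, str],
                    held_out: Dict[str, str]) -> Optional[float]:
    """Fraction of comparable predictions whose sign matches the observation.

    Assumption: only nodes that are both predicted and present in the
    held-out observations are comparable. Returns None if there are none.
    """
    comparable = [node for node in predictions if node in held_out]
    if not comparable:
        return None
    correct = sum(predictions[node] == held_out[node] for node in comparable)
    return correct / len(comparable)
```

For example, with predictions `{"A": "+", "B": "-", "C": "+"}` and held-out observations `{"A": "+", "B": "+"}`, only A and B are comparable and one of them matches, so the score is 0.5.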
Fig. 3 Stability of the predictions for subsets of observations. This figure summarizes the stability of the predictions for all samplings of the observations, compared to the final predictions obtained with 100% of the observations. “Good” predictions (matching the 100% predictions) are depicted in green, “Bad” predictions (predicted differently than the 100% predictions) in red and “Missing” predictions (not predicted) in blue. For each category, four curves are plotted representing, from top to bottom, the maximum, median, mean and minimum number of predictions of this type. Curves are normalized to the number of predictions obtained for each set of sampled data
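The three categories of the caption can be illustrated with a short sketch. The function name `stability` is hypothetical, and dividing every count by the number of predictions of the sub-sample is one literal reading of the normalization described in the caption, not a confirmed detail of the authors' implementation. Nodes predicted only in the sub-sample fall into none of the three categories here.

```python
from typing import Dict


def stability(sub: Dict[str, str], full: Dict[str, str]) -> Dict[str, float]:
    """Classify sub-sample predictions against the 100% predictions.

    good:    same sign as in the full prediction
    bad:     predicted in both, but with a different sign
    missing: predicted with all observations, absent from the sub-sample
    Assumption: counts are normalized by the number of sub-sample predictions.
    """
    good = sum(1 for node, sign in sub.items() if full.get(node) == sign)
    bad = sum(1 for node, sign in sub.items()
              if node in full and full[node] != sign)
    missing = sum(1 for node in full if node not in sub)
    total = len(sub) or 1  # avoid division by zero for empty sub-predictions
    return {"good": good / total, "bad": bad / total,
            "missing": missing / total}
```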
Fig. 4 Consistency constraints. Given a signed graph, where green edges depict activations and red edges depict inhibitions, two partial labelings for the nodes A and B are proposed in the first and second rows. In both cases there are 3² = 9 different possible labelings for C and D, each taking one of 3 signs {+,−,0} corresponding to the colors green, red and blue, respectively. We depict here only one consistent and one inconsistent scenario according to the sign-consistency constraints
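The enumeration described in this caption can be sketched as follows. The edges of the figure are not recoverable from the caption, so the graph below (A activates C, B inhibits C, C activates D) is purely hypothetical. The consistency rule is a simplified assumption as well: a + or − label must be explained by at least one predecessor influence of the same sign, and a 0 label requires either no non-zero influence or both a + and a − influence. It should not be read as the exact rule set implemented in iggy.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical signed graph: node -> list of (predecessor, edge sign).
# Edge sign +1 is an activation, -1 an inhibition.
PREDECESSORS: Dict[str, List[Tuple[str, int]]] = {
    "C": [("A", 1), ("B", -1)],
    "D": [("C", 1)],
}


def is_consistent(labels: Dict[str, int]) -> bool:
    """Check the simplified sign-consistency rule for every non-input node."""
    for node, preds in PREDECESSORS.items():
        influences = [labels[pred] * edge for pred, edge in preds]
        if labels[node] != 0:
            # A variation must be explained by at least one influence.
            if labels[node] not in influences:
                return False
        elif any(influences) and not (1 in influences and -1 in influences):
            # A 0 label with only one-sided non-zero influences is rejected.
            return False
    return True


def enumerate_labelings(fixed: Dict[str, int]):
    """Try all 3^2 = 9 labelings of C and D for a fixed labeling of A and B."""
    results = []
    for c, d in product((1, -1, 0), repeat=2):
        labels = {**fixed, "C": c, "D": d}
        results.append((labels, is_consistent(labels)))
    return results
```

Under these assumptions, fixing A = + and B = + leaves three consistent labelings out of nine, whereas fixing A = + and B = − leaves only one (C = +, D = +).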