| Literature DB >> 21685069 |
Sara Berthoumieux1, Matteo Brilli, Hidde de Jong, Daniel Kahn, Eugenio Cinquemani.
Abstract
MOTIVATION: High-throughput measurement techniques for metabolism and gene expression provide a wealth of information for the identification of metabolic network models. Yet, missing observations scattered over the dataset restrict the number of effectively available datapoints and make classical regression techniques inaccurate or inapplicable. Thorough exploitation of the data by identification techniques that explicitly cope with missing observations is therefore of major importance.Entities:
Mesh:
Year: 2011 PMID: 21685069 PMCID: PMC3117355 DOI: 10.1093/bioinformatics/btr225
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1.Statistics of estimated parameter values for datasets with 40% of missing data and 10% noise. The results are shown as boxplots of the ratio of the estimated parameter values c and reference parameter values c. Statistics have been computed for each of the 5 methods from 100 datasets. For each method, the red line displays the median and the lower and upper blue lines represent the lower and upper quartile values, respectively. Whiskers extend from each end of the box to the most extreme values within 1.5 times the interquartile range from the ends of the box and outliers are shown with red crosses. The tested algorithms are Expectation Maximization (EM), direct optimization of loglikelihood (MaxLL), multiple imputation (MI), regression on incomplete datasets (Rg) and regression on complete datasets (RgF). (A–D) Boxplots for reactions 3, 4, 11 and 18 of the network, respectively.
Fig. 2.Statistics of estimated parameter values for datasets with 75% of missing data and 20% noise. The graphical notations are the same as for Figure 1. (A–F) Boxplots for reactions 3, 13, 17, 22, 19 and 25 of the network, respectively.
Fig. 3.Scheme of E.coli central carbon metabolism. This map, showing metabolites (bold fonts) and genes (italic) is adapted from (Ishii ). Abbreviations of metabolites are glucose (Glc), glucose 6-phosphate (G6P), fructose 6-phosphate (F6P), fructose 1-6-biphosphate (FBP), dihydroxyacetone phosphate (DHAP), glyceraldehyde 3-phosphate (G3P), 3-phosphoglycerate (3PG), phosphoenolpyruvate (PEP), pyruvate (Pyr), 6-phosphogluconate (6PG), 2-keto-3-deoxy-6-phospho-gluconate (2KDPG), ribulose 5-phosphate (Ru5P), ribose 5-phosphate (R5P), xylulose 5-phosphate (X5P), sedoheptulose 7-phosphate (S7P), erythrose 4-phosphate (E4P), oxaloacetate (OAA), citrate (Cit), isocitrate (IsoCit), 2-keto-glutarate (2KG), succinate-CoA (SuccoA), succinate (Suc), fumarate (Fum), malate (Mal), glyoxylate (Glyox), acetyl-CoA (AcoA), acetylphosphate (Acp) and acetate (Ace). Cofactors impacting the reactions are not shown. The gene names are separated by a comma in the case of isoenzymes, by a colon for enzyme complexes, and by a semicolon when the enzymes catalyze reactions that have been lumped together in the model.
Elasticity matrix [B0 B0] estimated by EM from the data of (Ishii ) for the linlog model of E.coli central carbon metabolism (the columns of the matrix have been permuted for readability)
Unidentifiable elasticities are shown in grey, uncertain elasticities (i.e. having a sign that is not significant with 95% confidence) in yellow, and correctly/incorrectly identified elasticities (i.e. having a sign that is significant with 95% confidence) in green/red. Abbreviations are as in Figure 3. Some of the cofactors are modeled as ratios of metabolite concentrations, e.g. ATP/ADP. Reaction 27, labeled μ, is a phenomenological reaction for biomass production. The last row indicates the percentage of missing data per metabolite and the right-most column displays the amount of complete datapoints available for each reaction. aRegression was not able to produce any result.