Francesco Reggiani1,2, Marco Carraro1, Anna Belligoli3, Marta Sanna3, Chiara Dal Prà3, Francesca Favaretto3, Carlo Ferrari2, Roberto Vettor3, Silvio C E Tosatto1,4.
Abstract
In this work we present a framework for predicting blood cholesterol levels from genotype data. The predictor is based on a cholesterol metabolism simulation algorithm available in the literature, which our group implemented and optimized in the R language. The main weakness of the original simulation algorithm was its need for experimental data to simulate mutations in genes that alter cholesterol metabolism. This requirement strongly limited the application of the model in clinical practice. Here we show how this limitation can be bypassed by optimizing the model parameters on patient cholesterol levels retrieved from the literature. Prediction performance was assessed using several scoring indices commonly used to evaluate machine learning methods. Our assessment shows that the optimization phase improved model performance compared to the original version available in the literature.
Year: 2020 PMID: 32040480 PMCID: PMC7010235 DOI: 10.1371/journal.pone.0227191
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Conceptual model for pathways and genes determining cholesterol plasma levels, as used by van de Pas and colleagues [9], [10].
Process numbers stand for: 1, hepatic cholesterol synthesis (DHCR7); 2, peripheral cholesterol synthesis (DHCR7); 3, intestinal cholesterol synthesis (DHCR7); 4, dietary cholesterol intake (NPC1L1); 5, hepatic uptake of cholesterol from LDL (LDLR, APOB, APOE); 6, VLDL-C secretion (MTTP); 7, peripheral uptake of cholesterol from LDL (LDLR, APOB, APOE); 8, peripheral cholesterol transport to HDL (ABCA1); 9, HDL-associated cholesterol esterification (LCAT); 10, hepatic HDL-CE uptake (SCARB1); 11, intestinal chylomicron cholesterol secretion (MTTP); 12, peripheral cholesterol loss; 13, hepatic HDL-FC uptake (MTTP); 14, biliary cholesterol excretion (ABCG8, NPC1L1); 15, fecal cholesterol excretion; 16, intestinal cholesterol transport to HDL (ABCA1); 17, hepatic cholesterol transport to HDL (ABCA1); 18, hepatic cholesterol catabolism (CYP7A1); 19, hepatic cholesterol esterification (SOAT2); 20, intestinal cholesterol esterification (SOAT2); and 21, CE transfer from HDL to LDL (CETP).
Biological process and genes associated with each rate of the model.
| rate | Biological process | gene |
|---|---|---|
| 1 | hepatic cholesterol synthesis | DHCR7 |
| 2 | peripheral cholesterol synthesis | DHCR7 |
| 3 | intestinal cholesterol synthesis | DHCR7 |
| 4 | dietary cholesterol intake | NPC1L1 |
| 5 | hepatic uptake of cholesterol from LDL | LDLR, APOB, APOE |
| 6 | VLDL-C secretion | MTTP |
| 7 | peripheral uptake of cholesterol from LDL | LDLR, APOB, APOE |
| 8 | peripheral cholesterol transport to HDL | ABCA1 |
| 9 | HDL-associated cholesterol esterification | LCAT |
| 10 | hepatic HDL-CE uptake | SCARB1 |
| 11 | intestinal chylomicron cholesterol secretion | MTTP |
| 12 | peripheral cholesterol loss | |
| 13 | hepatic HDL-FC uptake | MTTP |
| 14 | biliary cholesterol excretion | ABCG8, NPC1L1 |
| 15 | fecal cholesterol excretion | |
| 16 | intestinal cholesterol transport to HDL | ABCA1 |
| 17 | hepatic cholesterol transport to HDL | ABCA1 |
| 18 | hepatic cholesterol catabolism | CYP7A1 |
| 19 | hepatic cholesterol esterification | SOAT2 |
| 20 | intestinal cholesterol esterification | SOAT2 |
| 21 | CE transfer from HDL to LDL | CETP |
Reaction rates present in the model, the biological process each rate represents, and the main genes involved in that process [9], [10].
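The rate table above can be read as a lookup from a mutated gene to the model rates it affects. The following is a minimal sketch of that lookup, not the authors' R implementation. The names `RATE_GENES`, `rates_for_gene` and `scale_rates` are hypothetical, and applying f as a simple multiplier on the affected rates is an assumption made for illustration. Rate 8 is mapped to ABCA1, as given in the Fig 1 legend.

```python
# Rate number -> genes, transcribed from the rate table.
# Rates 12 and 15 have no associated gene in the table.
RATE_GENES = {
    1: ["DHCR7"], 2: ["DHCR7"], 3: ["DHCR7"], 4: ["NPC1L1"],
    5: ["LDLR", "APOB", "APOE"], 6: ["MTTP"],
    7: ["LDLR", "APOB", "APOE"], 8: ["ABCA1"], 9: ["LCAT"],
    10: ["SCARB1"], 11: ["MTTP"], 12: [], 13: ["MTTP"],
    14: ["ABCG8", "NPC1L1"], 15: [], 16: ["ABCA1"], 17: ["ABCA1"],
    18: ["CYP7A1"], 19: ["SOAT2"], 20: ["SOAT2"], 21: ["CETP"],
}


def rates_for_gene(gene):
    """Return the sorted rate numbers associated with a gene."""
    return sorted(r for r, genes in RATE_GENES.items() if gene in genes)


def scale_rates(rates, gene, f):
    """Return a copy of `rates` with the gene's rates multiplied by f.

    ASSUMPTION: f acts as a plain multiplier on each affected rate.
    `rates` is a dict {rate number: value}; the values are placeholders.
    """
    affected = set(rates_for_gene(gene))
    return {r: (v * f if r in affected else v) for r, v in rates.items()}


if __name__ == "__main__":
    print(rates_for_gene("ABCA1"))  # [8, 16, 17]
    print(scale_rates({5: 2.0, 7: 4.0, 9: 1.0}, "LDLR", 0.5))
```

The ABCA1 lookup returns rates 8, 16 and 17, which agrees with the rates listed for that group in the training set table.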
Optimized f parameters and related genes.
| Gene | Reggiani et al | van de Pas et al 2012 |
|---|---|---|
| | 0.58 | 0.38 |
| | 0.9 | 0.31 |
| | 0.55 | 0.32 |
| | 0.53 | 0.41 |
| | 0.72 | 0.45 |
| | 0.43 | 0.65 |
| | 0.48 | 0.62 |
| | 1 | 0 |
| | 0 | 0 |
| | 0.81 | 0.05 |
Genes represented in the training and test sets, with the related f as computed by the optimization procedure or derived from experimental variables, as reported by van de Pas and colleagues [9].
Fig 2. Cholesterol levels of the training set patients.
Boxplot of HDL and LDL cholesterol levels of the patients composing the training set. From left to right: cholesterol levels of the model at the steady state, patients affected by Autosomal Dominant Hypercholesterolemia (with high levels of LDL and low HDL) and patients affected by other diseases altering lipoprotein metabolism.
Training set composition.
| Dataset | Gene | Patients | Mutations | Type | Rates |
|---|---|---|---|---|---|
| Autosomal Dominant Hypercholesterolemia | | 13 | 9 | heterozygous | 5, 7 |
| | | 7 | 1 | heterozygous | 5, 7 |
| | | 1 | 1 | homozygous | 5, 7 |
| | | 12 | 2 | heterozygous | 5, 7 |
| Other disease altering lipoprotein metabolism | | 7 | 3 | 6 heterozygous, 1 compound heterozygous | 8, 16, 17 |
| | | 1 | 1 | heterozygous | 21 |
| | | 17 | 2 | heterozygous | 9 |
| | | 7 | 4 | homozygous | 9 |
| | | 2 | 1 | heterozygous | 18 |
Disease, gene, number of patients with a mutation in that gene, number of different mutations, type of mutation (heterozygous, homozygous or compound heterozygous), and rates representing that gene in the model.
Percentage error of model predictions on the elements of the test set.
| Mutation | Gene | HDL (van de Pas et al) | LDL (van de Pas et al) | TC (van de Pas et al) | HDL (Reggiani et al) | LDL (Reggiani et al) | TC (Reggiani et al) |
|---|---|---|---|---|---|---|---|
| 1 | LDLR | -30.83 | 35.05 | 29.65 | -13.52 | -14.82 | -13.69 |
| 2 | APOB | -36.7 | 139.71 | 115.91 | 11.53 | -26.23 | -20.45 |
| 3 | APOB | -51.96 | 62.66 | 49.05 | -35.34 | -12.89 | -15.14 |
| 4 | ABCA1 | 147.99 | -57.44 | -44.77 | 200.05 | -51.25 | -35.99 |
| 5 | APOE | NA | NA | -27.4 | NA | NA | -53.15 |
| 6 | CETP | 12.7 | -4.82 | -0.72 | 34.04 | -11.53 | -0.46 |
| 7 | LCAT | 34.3 | -0.66 | 21.7 | 39.18 | -3.08 | 20.55 |
| 8 | LCAT | 679.37 | -19.37 | 10.11 | 426.32 | 21.95 | 29.87 |
| 9 | DHCR7 | NA | NA | 171.51 | NA | NA | 171.51 |
| 10 | CYP7A1 | -4.42 | -42.09 | -34.15 | 1.37 | -50.07 | -40.81 |
Mutation numeric ID, gene, and HDL, LDL and total cholesterol errors (as a percentage of the experimental value) for predictions based on f as reported by van de Pas and colleagues [9] or on the trained f.
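The caption defines the error as a percentage of the experimental value. A minimal sketch of that calculation follows, assuming the signed form 100 × (predicted − experimental) / experimental, which matches the signs in the table. The function name `percentage_error` is hypothetical.

```python
def percentage_error(predicted, experimental):
    """Signed prediction error as a percentage of the experimental value."""
    return 100.0 * (predicted - experimental) / experimental


if __name__ == "__main__":
    # Mutation 8 (LCAT), trained f: experimental HDL 0.19, LDL 0.82, TC 0.77;
    # predicted value 1 for all three.
    print(round(percentage_error(1.0, 0.19), 2))  # 426.32
    print(round(percentage_error(1.0, 0.82), 2))  # 21.95
    print(round(percentage_error(1.0, 0.77), 2))  # 29.87
```

These three results reproduce the Reggiani et al entries for mutation 8. For other rows, recomputing from the two-decimal values reported for the test set gives slightly different figures, presumably because the published errors were computed from unrounded values.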
Experimental and predicted cholesterol levels of the test set.
| Mutation | Gene | HDL (experimental) | LDL (experimental) | TC (experimental) | HDL (van de Pas et al) | LDL (van de Pas et al) | TC (van de Pas et al) | HDL (Reggiani et al) | LDL (Reggiani et al) | TC (Reggiani et al) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | LDLR | 0.86 | 2.17 | 1.85 | 0.59 | 2.93 | 2.4 | 0.74±0.17 | 1.85±1.31 | 1.6±0.97 |
| 2 | APOB | 0.85 | 1.52 | 1.36 | 0.54 | 3.64 | 2.94 | 0.95±0.07 | 1.12±0.21 | 1.08±0.14 |
| 3 | APOB | 1.12 | 2.24 | 1.97 | 0.54 | 3.64 | 2.94 | 0.72 | 1.95 | 1.67 |
| 4 | ABCA1 | 0.22 | 1.42 | 1.07 | 0.55 | 0.6 | 0.59 | 0.66±0.19 | 0.69±0.15 | 0.68±0.16 |
| 5 | APOE | NA | NA | 2.8 | 0.65 | 2.44 | 2.03 | 0.84±0.13 | 1.45±0.57 | 1.31±0.41 |
| 6 | CETP | 1.1 | 0.98 | 1.01 | 1.24 | 0.93 | 1 | 1.47 | 0.87 | 1.01 |
| 7 | LCAT | 0.79 | 0.97 | 0.81 | 1.06 | 0.96 | 0.99 | 1.1±0.16 | 0.94±0.11 | 0.98±0.05 |
| 8 | LCAT | 0.19 | 0.82 | 0.77 | 1.48 | 0.66 | 0.85 | 1 | 1 | 1 |
| 9 | DHCR7 | NA | NA | 0.2 | 1.13 | 0.37 | 0.54 | 1.13±0.01 | 0.37±0.04 | 0.54±0.03 |
| 10 | CYP7A1 | 0.97 | 2.09 | 1.74 | 0.93 | 1.21 | 1.15 | 0.98±0.02 | 1.04±0.06 | 1.03±0.04 |
Mutation numeric ID, gene, and HDL, LDL and total cholesterol levels from wet lab experiments [9], from predictions based on f as reported by van de Pas and colleagues [9], and from predictions based on the trained f, with standard deviation.
Model performance on the whole test set.
| Prediction | PCC | KCC | MAE | RMSD | R2 |
|---|---|---|---|---|---|
| van de Pas et al predicted HDL ratio | -0.22 | -0.18 | 0.4 | 0.54 | 0.05 |
| Reggiani et al predicted HDL ratio | 0.32 | 0 | 0.32 | 0.4 | 0.11 |
| van de Pas et al predicted LDL ratio | 0.43 | ||||
| Reggiani et al predicted LDL ratio | 0.5 | ||||
| van de Pas et al predicted TC ratio | |||||
| Reggiani et al predicted TC ratio |
For each cholesterol level and predictor: Pearson Correlation Coefficient (PCC), Kendall rank Correlation Coefficient (KCC), Mean Absolute Error (MAE), Root Mean Squared Error (RMSD) and R-squared index (R2), computed on the test set. Values in bold have a p-value lower than 0.05, computed as the probability of obtaining an index better than the original one in a distribution of 10000 random scores generated by a bootstrap procedure.
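The indices in this table can be recomputed from the experimental and predicted values reported for the test set. The sketch below uses only the Python standard library and is not the authors' R code. It rests on three assumptions: mutations with NA experimental HDL (5 and 9) are dropped, the Kendall coefficient is the tau-b variant, and R2 is the squared Pearson coefficient. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_b(x, y):
    """Kendall rank correlation, tau-b variant (corrects for ties)."""
    conc = disc = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            sx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the x difference
            sy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the y difference
            if sx == 0 and sy == 0:
                continue  # tied in both variables: counted nowhere
            if sx == 0:
                only_x_tied += 1
            elif sy == 0:
                only_y_tied += 1
            elif sx == sy:
                conc += 1
            else:
                disc += 1
    denom = math.sqrt((conc + disc + only_y_tied) * (conc + disc + only_x_tied))
    return (conc - disc) / denom


def mae(x, y):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def rmsd(x, y):
    """Root mean squared deviation."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# HDL values for mutations 1, 2, 3, 4, 6, 7, 8, 10 (NA rows dropped).
hdl_experimental = [0.86, 0.85, 1.12, 0.22, 1.10, 0.79, 0.19, 0.97]
hdl_van_de_pas = [0.59, 0.54, 0.54, 0.55, 1.24, 1.06, 1.48, 0.93]

if __name__ == "__main__":
    r = pearson(hdl_experimental, hdl_van_de_pas)
    print(round(r, 2))                                               # -0.22
    print(round(kendall_tau_b(hdl_experimental, hdl_van_de_pas), 2)) # -0.18
    print(round(mae(hdl_experimental, hdl_van_de_pas), 2))           # 0.4
    print(round(rmsd(hdl_experimental, hdl_van_de_pas), 2))          # 0.54
    print(round(r ** 2, 2))                                          # 0.05
```

Under these assumptions the output matches the row for the van de Pas et al predicted HDL ratio (PCC -0.22, KCC -0.18, MAE 0.4, RMSD 0.54, R2 0.05), which suggests the assumptions are consistent with how that row was computed.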
Fig 3. Model response in terms of HDL, LDL and total blood cholesterol at different values of f.
Effect of reducing the model rates involved in the test procedure on HDL, LDL and total cholesterol levels.