Wei-Ru Lu, Wen-Tse Yang, Justin Chu, Tung-Han Hsieh, Fu-Liang Yang.
Abstract
Personalized modeling has long been expected to enable precise noninvasive blood glucose measurement, but it is challenged by the limited data available for training a personal model and by unavoidable outlier predictions. To overcome these long-standing problems, we greatly enhanced training efficiency on limited personal data by using an innovative Deduction Learning (DL) approach instead of conventional Induction Learning (IL). The domain theory of our deductive method, DL, uses the accumulated comparison of paired inputs, which leads to corrections to the previously measured blood glucose, to construct our deep neural network architecture. The DL method takes paired adjacent rounds of finger-pulsation photoplethysmography (PPG) signal recordings as the input to a convolutional-neural-network (CNN) based deep learning model. Our study reveals that the CNN filters of the DL model generated extra and non-uniform feature patterns compared with those of IL models, which suggests that DL is superior to IL in learning efficiency under limited training data. Among the 30 diabetic patients recruited as volunteers, the DL model achieved 80% of test predictions in zone A of the Clarke Error Grid (CEG) when trained with 12 rounds of data, a 20% improvement over the IL method. Furthermore, we developed an automatic screening algorithm to remove low-confidence outlier predictions. With only a dozen rounds of training data, DL with automatic screening achieved a correlation coefficient ([Formula: see text]) of 0.81, an accuracy score ([Formula: see text]) of 93.5, a root mean squared error of 13.93 mg/dl, a mean absolute error of 12.07 mg/dl, and 100% of predictions in zone A of the CEG. The nonparametric Wilcoxon paired test on [Formula: see text] for DL versus IL revealed a near-significant difference, with a p-value of 0.06. These significant improvements indicate that a very simple and precise noninvasive measurement of blood glucose concentration is achievable.
Year: 2022 PMID: 35444228 PMCID: PMC9021306 DOI: 10.1038/s41598-022-10360-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Comparison of PPG-based noninvasive blood glucose (NIBG) measurement with personalized models.
| References | Recruited subjects | Accuracy (zone A ratio of CEG) | Training rounds for modeling | Time span between training and testing | Input data | Method | Age of population |
|---|---|---|---|---|---|---|---|
| V.P. Rachim et al. | 12 healthy subjects | 100% | ~ 20 | 1 day | 24 features from PPG | Linear partial least squares regression | Not reported |
| Al-dhaheri et al. | 10 healthy subjects | > 90% | > 30 | Not reported | PPG signal voltage | Linear regression | 20–36 |
| Shu-jen Yeh et al. | 2 diabetic subjects and 1 healthy subject | 90% or less, subject dependent | 3–4 days, with 15 min interval | 1–13 days | Temperature-modulated reflectance signal | Linear least squares regression, retrieving training data for best model fitting | 50–58 |
| This work | 30 diabetic subjects | 100% (with auto-screening); 80% (w/o screening) | 12 | 20–85 days | PPG signal | Deduction Learning | 42–76 |
Performance of models with 1st to 12th rounds as training and rounds 13, 14, 15 as testing (each paired with round 12).
| Methods | DL + S | DL | IL | RF | RF |
|---|---|---|---|---|---|
| Accuracy score | 93.50 | 88.21 | 81.47 | 80.46 | 82.13 |
| Mean absolute error (MAE) [mg/dl] | 12.07 | 25.96 | 38.38 | 40.46 | 37.19 |
| Root mean squared error (RMSE) [mg/dl] | 13.93 | 36.25 | 46.51 | 53.18 | 48.88 |
| Pearson correlation coefficient | 0.81 | 0.44 | 0.2 | −0.08 | −0.018 |
| A-Zone ratio | 100% | 80% | 55% | 60% | 60% |
Our preliminary tests of a personalized Random Forest (RF) model are also presented here. The A-Zone ratio is the proportion of data points located in zone A of the CEG plot; the accuracy score, MAE, RMSE, and Pearson correlation coefficient are defined in Eqs. 4, 5, 6, and 7, respectively. Because each sample has its own value, the average and standard deviation for each model were presented.
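Eqs. 4–7 are not reproduced in this record, so the following is only a minimal sketch that assumes the textbook definitions of MAE, RMSE, and the Pearson correlation coefficient match Eqs. 5–7. The accuracy score (Eq. 4) is left out because its definition is not given here. The function names and the sample glucose values are hypothetical.

```python
import math

def mean_absolute_error(reference, predicted):
    """Average absolute difference between predicted and reference values."""
    return sum(abs(p - r) for r, p in zip(reference, predicted)) / len(reference)

def root_mean_squared_error(reference, predicted):
    """Square root of the average squared difference."""
    return math.sqrt(
        sum((p - r) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    )

def pearson_correlation(reference, predicted):
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_r * var_p)

# Hypothetical reference and predicted glucose levels in mg/dl (not study data).
reference = [100.0, 150.0, 200.0]
predicted = [110.0, 140.0, 210.0]

mae = mean_absolute_error(reference, predicted)
rmse = root_mean_squared_error(reference, predicted)
r_value = pearson_correlation(reference, predicted)
```

Every error in this toy example has magnitude 10 mg/dl, so MAE and RMSE coincide; with real predictions RMSE exceeds MAE whenever the errors vary in size, as in the table above.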
Figure 1. Schematic diagrams of the IL and DL models. (a) Training of the IL model. (b) Training of the DL model. Each diamond block in (a) and each triangular block in (b) represents a single personal model (NN) and a differential cell (DC), respectively; both take PPG signals as input and output a predicted blood glucose level (BGL). The NN and the DC share a similar CNN architecture (see Supplementary Data Fig. 5). The PPG signals are recorded in chronological order, each with a corresponding reference glucose level. For (b), when a pair of inputs is connected to the DC (see Supplementary Data Fig. 2b), the loss between the reference and the output prediction is minimized by backpropagation. (c) A 1-channel input signal segment convolved with a specific filter generates a simpler pattern of features (e.g., the reverse of the input signal). (e) A 2-channel input generates extra and non-uniform features beyond the original signals. (d), (f) One window of the overlapped output of 256 filters from the first CNN layer of IL and DL, respectively.
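The contrast between panels (c) and (e) can be illustrated with a toy multichannel convolution. This is a sketch under stated assumptions, not the authors' network: the helper `conv1d_multichannel`, the hand-picked filter weights, and the signal values are all hypothetical. It only shows that a filter spanning two channels can produce a feature (here, the difference between two recordings) that no filter applied to a single channel can.

```python
def conv1d_multichannel(channels, kernels):
    """Valid-mode 1D cross-correlation summed over channels (one CNN filter, no bias).

    channels: list of equal-length signals, one per input channel.
    kernels:  list of equal-length weight lists, one per input channel.
    """
    n = len(channels[0])
    k = len(kernels[0])
    return [
        sum(
            kernels[c][j] * channels[c][i + j]
            for c in range(len(channels))
            for j in range(k)
        )
        for i in range(n - k + 1)
    ]

# Hypothetical signal segments from two recording rounds (not study data).
previous_round = [1.0, 2.0, 3.0, 4.0]
current_round = [1.0, 3.0, 2.0, 5.0]

# 1-channel input: a single weight of -1 simply negates the input signal.
negated = conv1d_multichannel([current_round], [[-1.0]])

# 2-channel input: opposite-sign weights yield the round-to-round difference,
# a feature that is not a scaled copy of either input signal alone.
difference = conv1d_multichannel([previous_round, current_round], [[-1.0], [1.0]])
```

In a trained model the weights are learned rather than chosen by hand, so the filters need not correspond to a plain difference.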
Figure 2. Comparison of accuracy scores (R) of each method on all 30 subjects in each test round. (a) The accuracy score distribution in a box plot. (b) The mean accuracy scores in a line chart.
Performance Summary of models in groups of rounds.
| Model | Ratio in zone A of CEG (rounds 8–11) | Ratio in zone A of CEG (rounds 12–15) | Accuracy score (rounds 4–15) | Accuracy score (rounds 4–7) | Accuracy score (rounds 8–11) | Accuracy score (rounds 12–15) | Pearson correlation coefficient (rounds 4–15) | Pearson correlation coefficient (rounds 4–7) | Pearson correlation coefficient (rounds 8–11) | Pearson correlation coefficient (rounds 12–15) |
|---|---|---|---|---|---|---|---|---|---|---|
| IL | 58.2% | 61.1% | 76.29 | 76.11 | 76.54 | 76.50 | 0.554 | 0.578 | 0.540 | 0.417 |
| DL | 76.9% | 80.6% | 80.62 | 77.68 | 83.90 | 87.70 | 0.640 | 0.606 | 0.674 | 0.782 |
| DL + S | 85.1% | 100% | 84.70 | 81.90 | 86.84 | 93.87 | 0.725 | 0.646 | 0.812 | 0.960 |
Figure 3. Clarke Error Grid (CEG) plots of model predictions: (a) IL. (b) DL. (c) DL + S. The data points are grouped into three categories: green symbols are results of the 4th to 7th rounds, blue symbols are results of the 8th to 11th rounds, and red symbols are results of the 12th to 15th rounds. The table below each figure lists the proportion of data points found in each zone of the CEG. For DL and DL + S, all data points of rounds 12 to 15 are found in zones A and B.
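The zone A ratio reported for these plots can be sketched with the commonly used zone A criterion of the Clarke Error Grid: a prediction counts as zone A when it lies within 20% of the reference, or when both values fall in the hypoglycemic range below 70 mg/dl. The record does not state how the authors handled boundary values, so the inequalities below are an assumption, and the function names and sample values are hypothetical.

```python
def in_zone_a(reference, predicted):
    """Return True if a (reference, predicted) pair in mg/dl falls in CEG zone A."""
    # Both values in the hypoglycemic range count as clinically accurate.
    if reference < 70 and predicted < 70:
        return True
    # Otherwise the prediction must be within 20% of the reference.
    return abs(predicted - reference) <= 0.2 * reference

def zone_a_ratio(references, predictions):
    """Fraction of pairs that fall in zone A."""
    hits = sum(in_zone_a(r, p) for r, p in zip(references, predictions))
    return hits / len(references)

# Hypothetical glucose values in mg/dl (not study data).
references = [100.0, 100.0, 60.0, 200.0]
predictions = [115.0, 130.0, 65.0, 150.0]
ratio = zone_a_ratio(references, predictions)
```

Zones B through E need additional boundary rules and are not covered by this sketch.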
Figure 4. Comparison of DL + S predictions for patients with and without insulin treatment at rounds 12–15.
Figure 5. Predictions of rounds 13–15 (each paired with round 12) by models built with a dozen rounds of training data (rounds 1–12) for (a) IL, (b) DL, and (c) DL + S.
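The round pairing described in the abstract and in this caption can be sketched as index bookkeeping. This is an assumption-laden illustration: the abstract says DL takes paired adjacent rounds as input, and the caption says each test round is paired with round 12, but the record does not specify how the pairs are fed to the model. The function names are hypothetical.

```python
def adjacent_training_pairs(n_training_rounds):
    """Pair each training round with the next one: (1, 2), (2, 3), and so on."""
    return [(k, k + 1) for k in range(1, n_training_rounds)]

def anchored_test_pairs(anchor_round, test_rounds):
    """Pair every test round with a fixed, already-measured anchor round."""
    return [(anchor_round, t) for t in test_rounds]

# Twelve training rounds, then rounds 13-15 tested against round 12.
training_pairs = adjacent_training_pairs(12)
test_pairs = anchored_test_pairs(12, [13, 14, 15])
```

Under this reading, twelve rounds give eleven adjacent training pairs, and each test prediction relies on the last round with a known reference value.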