Hyun Hwang, Rui Liu, Joshua T Maxwell, Jingjing Yang, Chunhui Xu.
Abstract
Human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) provide an excellent platform for potential clinical and research applications. Identifying abnormal Ca2+ transients is crucial for evaluating cardiomyocyte function, but it requires labor-intensive manual effort. Therefore, we develop an analytical pipeline for automatic assessment of Ca2+ transient abnormality by combining machine learning methods with an Analytical Algorithm. First, we adapt an existing Analytical Algorithm to identify Ca2+ transient peaks and determine peak abnormality based on quantified peak characteristics. Second, we train a peak-level Support Vector Machine (SVM) classifier, using human-expert assessment of peak abnormality as the outcome and profiled peak variables as predictive features. Third, we train a separate cell-level SVM classifier, using human-expert assessment of cell abnormality as the outcome and quantified cell-level variables as predictive features. This cell-level SVM classifier can then be used to assess additional Ca2+ transient signals. Applying this pipeline to our Ca2+ transient data, we trained a cell-level SVM classifier on 200 cells and tested its accuracy on an independent dataset of 54 cells, obtaining 88% training accuracy and 87% test accuracy. Further, we provide a free R package that implements our pipeline for high-throughput CM Ca2+ analysis.
Year: 2020 PMID: 33046816 PMCID: PMC7550597 DOI: 10.1038/s41598-020-73801-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Overall workflow of the machine learning method in this study. (a) Ca2+ transient data of 200 cells and 1893 peaks were collected and analyzed to train the peak- and cell-level SVM models, which were validated via LOOCV. (b) Test data of 54 cells and 454 peaks were used to implement the machine learning tool to yield the final cell status prediction.
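To make the leave-one-out cross-validation (LOOCV) step in the workflow concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation: a simple nearest-centroid classifier stands in for the SVM so that the example needs only the Python standard library, and the feature values and function names are hypothetical.

```python
def nearest_centroid_fit(X, y):
    """Fit a stand-in classifier: one mean feature vector per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def loocv_accuracy(X, y):
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = nearest_centroid_fit(X_train, y_train)
        correct += nearest_centroid_predict(model, X[i]) == y[i]
    return correct / len(X)


# Hypothetical, well-separated toy features (not data from the study).
X = [[-0.8, -0.5], [-0.7, -0.4], [-0.9, -0.6],
     [0.5, 0.3], [0.4, 1.2], [1.1, 0.2]]
y = ["normal"] * 3 + ["abnormal"] * 3
acc = loocv_accuracy(X, y)
print(f"LOOCV accuracy: {acc:.2f}")
```

With N samples, LOOCV trains N models, each on N − 1 samples, so every sample is scored by a model that never saw it during training.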
Peak variable averages and their standard deviations of the test data.
| Peak-level variables | Expert normal | Expert abnormal | SVM normal | SVM abnormal |
|---|---|---|---|---|
| A_l | 0.04 ± (0.95) | −0.17 ± (1.18) | 0.05 ± (0.98) | −0.13 ± (1.04) |
| A_r | 0.24 ± (0.90) | −1.08 ± (0.68) | 0.36 ± (0.08) | −0.91 ± (0.87) |
| A_d | 0.26 ± (0.50) | −1.19 ± (1.64) | 0.40 ± (0.19) | −1.01 ± (1.42) |
| D_l | 0.09 ± (1.01) | −0.39 ± (0.83) | 0.14 ± (0.99) | −0.35 ± (0.94) |
| D_r | 0.23 ± (0.94) | −1.05 ± (0.44) | 0.33 ± (0.82) | −0.84 ± (0.92) |
| Dy_max | 0.03 ± (0.97) | −0.14 ± (1.14) | 0.02 ± (1.02) | −0.04 ± (0.94) |
| Dy_min | 0.00 ± (0.93) | −0.01 ± (1.28) | −0.10 ± (0.88) | 0.26 ± (1.22) |
| D2y_max | −0.06 ± (0.95) | 0.26 ± (1.18) | −0.05 ± (0.88) | 0.12 ± (1.25) |
| D2y_min | −0.23 ± (0.68) | 1.04 ± (1.46) | −0.36 ± (0.39) | 0.90 ± (1.42) |
| R | 0.20 ± (0.98) | −0.90 ± (0.40) | 0.27 ± (0.94) | −0.68 ± (0.79) |
| delta | 0.11 ± (1.02) | −0.50 ± (0.73) | 0.10 ± (0.74) | −0.25 ± (1.44) |
| delta_l2Dymax | 0.06 ± (0.98) | −0.26 ± (1.05) | 0.14 ± (0.99) | −0.34 ± (0.96) |
| delta_m2Dymin | 0.08 ± (1.04) | −0.37 ± (0.68) | 0.08 ± (1.04) | −0.20 ± (0.86) |
| Peak_distance_median | 0.12 ± (0.94) | −0.54 ± (1.09) | 0.15 ± (0.71) | −0.39 ± (1.43) |
A 2 × 2 comparison of the peak classification by expert and SVM is shown in Supplementary Table S1 online.
n, number of peaks.
Cell variable averages and their standard deviations of the test data.
| Assessment | Number of cells | prop_abnormal | var_A | var_delta | var_R |
|---|---|---|---|---|---|
| Expert normal | 18 | −0.71 ± (0.24) | −0.48 ± (0.08) | −0.33 ± (0.01) | −0.38 ± (0.11) |
| Expert abnormal | 36 | 0.35 ± (1.05) | 0.24 ± (1.15) | 0.16 ± (1.20) | 0.19 ± (1.18) |
| SVM normal | 19 | −0.81 ± (0.00) | −0.51 ± (0.04) | −0.26 ± (0.16) | −0.43 ± (0.02) |
| SVM abnormal | 35 | 0.44 ± (1.00) | 0.28 ± (1.15) | 0.14 ± (1.22) | 0.23 ± (1.18) |
A 2 × 2 comparison of the cell classification by expert and SVM is shown in Supplementary Table S1 online.
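The cell-level variables named in the table (prop_abnormal, var_A, var_delta, var_R) are not defined in this record. As a hedged illustration only, the sketch below assumes that prop_abnormal is the fraction of a cell's peaks flagged abnormal and that each var_* is the population variance of the corresponding peak-level quantity across the cell's peaks. The dictionary keys, the function name, and the choice of population variance are assumptions, and any standardization applied in the study is omitted.

```python
from statistics import pvariance


def cell_features(peaks):
    """Aggregate per-peak records into cell-level features.

    `peaks` is a list of dicts with hypothetical keys:
    'abnormal' (bool), 'A' (amplitude), 'delta', and 'R'.
    """
    n = len(peaks)
    if n == 0:
        raise ValueError("cell has no detected peaks")
    return {
        # Fraction of peaks labeled abnormal (assumed meaning of prop_abnormal).
        "prop_abnormal": sum(p["abnormal"] for p in peaks) / n,
        # Peak-to-peak variability of each quantity (assumed meaning of var_*).
        "var_A": pvariance([p["A"] for p in peaks]),
        "var_delta": pvariance([p["delta"] for p in peaks]),
        "var_R": pvariance([p["R"] for p in peaks]),
    }


# Hypothetical cell with four peaks, one of them abnormal.
example_peaks = [
    {"abnormal": False, "A": 1.0, "delta": 10.0, "R": 0.5},
    {"abnormal": False, "A": 1.0, "delta": 10.0, "R": 0.5},
    {"abnormal": False, "A": 1.0, "delta": 10.0, "R": 0.5},
    {"abnormal": True, "A": 0.2, "delta": 14.0, "R": 0.5},
]
features = cell_features(example_peaks)
print(features)
```

Under these assumptions, a cell with uniform, normal peaks yields values near zero for all four features, while irregular beating raises both the abnormal proportion and the variances.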
Figure 2. Cells under fluorescence imaging and their peak signals. (a) An example of hiPSC-CMs stained with Fluo-4 fluorescing under 488 nm light. (b) An example of a Ca2+ transient signal visualized with detected peaks marked. Number of frames on the x-axis and fluorescence intensity on the y-axis. (c) Examples of Ca2+ transient signals visualized by a human expert. Red arrows denote abnormal peaks and green arrows denote inconsistent periods.
Figure 3. An example of Ca2+ transient signal, peak, its first derivative, second derivative, and peak-level variables. Out of 14 peak-level variables, 10 are indicated: (a) delta, (b) peak left amplitude (A_l), peak right amplitude (A_r), left peak duration (D_l), right peak duration (D_r), (c) maximum value of left side first derivative (Dy_max), absolute minimum of right side first derivative (Dy_min), (d) maximum of right side second derivative (D2y_max), and absolute minimum of right side second derivative (D2y_min). The peak variables extracted were used for peak status prediction via SVM modeling.
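As a rough illustration of how several of these peak-level variables could be computed from a sampled signal, the sketch below uses finite differences for the first and second derivatives. It is one reading of the caption, not the authors' algorithm: the function name, the use of frame indices for the left base, peak, and right base, and the interpretation of "absolute minimum" as the absolute value of the most negative derivative are all assumptions.

```python
def diff(values):
    """Forward finite difference: out[i] = values[i + 1] - values[i]."""
    return [b - a for a, b in zip(values, values[1:])]


def peak_variables(y, left, peak, right):
    """Compute a subset of peak-level variables for one peak.

    y     : fluorescence intensity per frame
    left  : frame index where the peak starts rising (assumed given)
    peak  : frame index of the peak maximum
    right : frame index where the decay ends (assumed given)
    """
    if not (0 <= left < peak and peak + 1 < right < len(y)):
        raise ValueError("expected 0 <= left < peak, and peak + 1 < right < len(y)")
    dy = diff(y)     # first derivative, per frame
    d2y = diff(dy)   # second derivative, per frame
    rise = dy[left:peak]            # slopes on the left (rising) side
    decay = dy[peak:right]          # slopes on the right (decaying) side
    decay2 = d2y[peak:right - 1]    # curvature on the right side
    return {
        "A_l": y[peak] - y[left],   # left amplitude
        "A_r": y[peak] - y[right],  # right amplitude
        "D_l": peak - left,         # left duration, in frames
        "D_r": right - peak,        # right duration, in frames
        "Dy_max": max(rise),
        "Dy_min": abs(min(decay)),
        "D2y_max": max(decay2),
        "D2y_min": abs(min(decay2)),
    }


# Hypothetical single-peak trace (arbitrary intensity units).
trace = [0, 0, 1, 4, 6, 5, 3, 2, 1, 1]
pv = peak_variables(trace, left=1, peak=4, right=8)
print(pv)
```

Durations here are in frames; converting to seconds would require the imaging frame rate, which this record does not state.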
Peak abnormality assessment accuracy.
| Method | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| Analytical algorithm | 93.3 | 91.1 | 95.8 |
| SVM-LOOCV | 92.2 | 91.8 | 95.3 |
All accuracy metrics were generated by taking expert peak assessments as the truth and were computed over the 1893 peaks in the training dataset.
Cell abnormality assessment accuracy.
| Dataset | Method | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| Training data | Analytical algorithm | 87.5 | 90.4 | 83.5 |
| Training data | SVM-LOOCV | 89.9 | 94.7 | 83.3 |
| Test data | Analytical algorithm | 83.3 | 83.3 | 83.3 |
| Test data | SVM | 87.0 | 88.9 | 83.3 |
All accuracy metrics were generated by taking expert cell assessments as the truth and were computed over the 200 cells in the training dataset and the 54 cells in the test dataset.
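For readers who want to check the arithmetic, the sketch below shows how accuracy, sensitivity, and specificity follow from a 2 × 2 confusion table. The counts are not taken from Supplementary Table S1. They are inferred here from the reported test-data SVM percentages together with the 36 expert-abnormal and 18 expert-normal test cells, assuming "abnormal" is the positive class. That assumption reproduces the reported 87.0% accuracy, whereas treating "normal" as positive does not.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity as percentages.

    tp/fn: positive (abnormal) cells classified correctly / incorrectly
    tn/fp: negative (normal) cells classified correctly / incorrectly
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": 100 * (tp + tn) / total,
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
    }


# Inferred counts (an assumption, see the note above):
# 32 of 36 abnormal cells and 15 of 18 normal cells classified correctly.
metrics = classification_metrics(tp=32, fn=4, tn=15, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

These inferred counts also agree with the cell-variable table, since 15 + 4 = 19 cells would be called normal by the SVM and 32 + 3 = 35 abnormal.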
Figure 4. ROC curve plot. (a) Training data ROC curve plot. (b) Test data ROC curve plot.
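A ROC curve of this kind can be traced by sweeping a decision threshold over classifier scores and recording the true- and false-positive rates at each step. The sketch below is a generic standard-library illustration with hypothetical scores and labels. It does not reproduce the curves in Figure 4, and it assumes that higher scores indicate the abnormal (positive) class.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    labels use 1 for the positive class and 0 for the negative class.
    Samples with tied scores are admitted together at one threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier scores and expert labels (1 = abnormal).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
print(curve)
print(f"AUC = {area:.3f}")
```

The area under the curve equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, which here is 8 of the 9 positive-negative pairs.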