Martti Juhola1, Henry Joutsijoki2, Kirsi Penttinen3, Katriina Aalto-Setälä3,4.
Abstract
Human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) have revolutionized cardiovascular research. Abnormalities in Ca2+ transients have been evident in many cardiac disease models. We have shown earlier that, by exploiting computational machine learning methods, normal Ca2+ transients corresponding to healthy CMs can be distinguished from diseased CMs with abnormal transients. Here our aim was to study whether it is possible to separate different genetic cardiac diseases (CPVT, LQT, HCM) on the basis of Ca2+ transients using machine learning methods. Classification accuracies of up to 87% were obtained for these three diseases, indicating that Ca2+ transients are disease-specific. When healthy controls were included in the classifications, the best classification accuracy obtained was still high: approximately 79%. In conclusion, we demonstrate as a proof of principle that the computational machine learning methodology appears to be a powerful means to accurately categorize iPSC-CMs and could provide effective methods for diagnostic purposes in the future.
Year: 2018 PMID: 29921843 PMCID: PMC6008430 DOI: 10.1038/s41598-018-27695-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Example signals of Ca2+ transients. (a) 10 s from a normal LQT1 signal; all peaks were detected as normal by the peak detection algorithm. (b) From an abnormal LQT1 signal; four peaks were detected as normal, but the four small peaks marked with purple asterisks at the tops were detected as abnormal. (c) From a normal HCM signal; peaks detected as normal. (d) From an abnormal HCM signal; the four peaks marked with purple were detected as abnormal. (e) From a normal CPVT signal; peaks detected as normal. (f) From an abnormal CPVT signal; the four peaks marked with purple were detected as abnormal. (g) From a normal control signal. (h) From an abnormal control signal; abnormal peaks are marked with purple. The maxima (tops) of normal peaks are marked with green vertical bars; the beginnings and endings of all peaks are marked with blue vertical bars.
Figure 2. The first peak classified as normal from Fig. 1e. In the left panel, the peak curve is s. Variables: Left amplitude A is the difference between the curve values at the peak beginning s(a) and at the maximum s(c). Right amplitude A is the difference between the end s(g) and s(c). Duration D of the peak left side is the time difference from a to c along the horizontal axis. Duration D of the peak right side is the time difference from c to g. Peak-to-peak time difference Δ is normally computed from the current peak maximum to that of the preceding peak. Exceptionally, for the first peak of the signal, it is the time difference from the first peak maximum to the signal beginning. The surface area R is formed by curve s and the line from s(a) to s(g). In the middle panel, the first derivative s′ of the peak contains the maximum at location s′(b) and the minimum at s′(e). In the rightmost panel, the second derivative s′′ is obtained by extracting the right segment from c to g based on the peak of the left panel; its minimum is at location s′′(d) and its maximum at s′′(f). (The horizontal axis is scaled in seconds to express time clearly, but computation was performed according to the formulas given in the section entitled "Methods". The symbols a, b, c, d, e, f and g are index values of signals. The directly time-related variables are computed by dividing them by the sampling frequency F.)
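The variable definitions in the caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `peak_variables`, the forward-difference derivative, the trapezoidal area and the scaling of the area by 1/F are all assumptions made for illustration, and the second-derivative variables are left out for brevity.

```python
def peak_variables(s, a, c, g, F, prev_c=None):
    """Illustrative peak variables for one peak of a sampled signal.

    s      : sequence of signal samples
    a, c, g: sample indices of peak beginning, maximum and end
    F      : sampling frequency in Hz
    prev_c : index of the preceding peak maximum (None for the first peak)
    """
    if not (0 <= a < c < g < len(s)):
        raise ValueError("expected 0 <= a < c < g < len(s)")

    # Amplitudes: differences between the maximum and the peak borders.
    amp_left = s[c] - s[a]
    amp_right = s[c] - s[g]

    # Durations: index differences converted to seconds.
    dur_left = (c - a) / F
    dur_right = (g - c) / F

    # First derivative by forward differences (an assumed discretisation),
    # scaled to signal units per second. d1[k] belongs to sample a + k.
    d1 = [(s[i + 1] - s[i]) * F for i in range(a, g)]
    max_d1_left = max(d1[: c - a])
    abs_min_d1_right = abs(min(d1[c - a:]))

    # Area between the curve and the straight line from s(a) to s(g),
    # integrated with the trapezoidal rule (assumed) over time in seconds.
    n = g - a
    above = [s[a + k] - (s[a] + (s[g] - s[a]) * k / n) for k in range(n + 1)]
    area = sum((above[k] + above[k + 1]) / 2 for k in range(n)) / F

    # Peak-to-peak time; for the first peak, time from the signal beginning.
    delta = (c - prev_c) / F if prev_c is not None else c / F

    return {
        "amp_left": amp_left,
        "amp_right": amp_right,
        "dur_left": dur_left,
        "dur_right": dur_right,
        "max_d1_left": max_d1_left,
        "abs_min_d1_right": abs_min_d1_right,
        "area": area,
        "delta": delta,
    }


# Toy triangular peak sampled at 2 Hz (made-up data, not from the study).
example = peak_variables([0, 1, 2, 3, 4, 3, 2, 1, 0], a=0, c=4, g=8, F=2.0)
```

For the toy triangle, both amplitudes are 4 signal units and both durations are 2 s, which is easy to check by hand against the definitions above.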
Means and standard deviations for the ten peak variables of Ca2+ transient signals.
| Disease or control | A (left) | A (right) | D (left) | D (right) | max s′ | \|min s′\| | max s′′ | \|min s′′\| | R | Δ [s] |
|---|---|---|---|---|---|---|---|---|---|---|
| LQT1 | 170 ± 79 | 172 ± 80 | 0.33 ± 0.18 | 0.68 ± 0.40 | 817 ± 472 | 508 ± 259 | 1,615 ± 1,324 | 1,208 ± 1,432 | 58 ± 42 | 1.17 ± 0.92 |
| HCM | 191 ± 89 | 193 ± 91 | 0.23 ± 0.12 | 0.43 ± 0.24 | 1,990 ± 920 | 1,052 ± 469 | 6,420 ± 3,382 | 3,235 ± 2,905 | 43 ± 36 | 0.72 ± 0.47 |
| CPVT | 229 ± 176 | 232 ± 176 | 0.34 ± 0.20 | 0.63 ± 0.43 | 1,349 ± 1,064 | 812 ± 541 | 2,895 ± 2,535 | 2,106 ± 2,709 | 85 ± 103 | 1.13 ± 0.94 |
| WT | 320 ± 189 | 323 ± 190 | 0.46 ± 0.21 | 0.79 ± 0.36 | 2,189 ± 1,203 | 1,161 ± 667 | 5,122 ± 3,432 | 4,048 ± 4,151 | 130 ± 104 | 1.48 ± 0.75 |
Results for 1635 peaks of LQT1, 1344 peaks of HCM, 2311 peaks of CPVT diseases and 1216 peaks of controls (WT): amplitude of peak left side A, amplitude of peak right side A, duration of peak left side D, duration of peak right side D, maximum of the first derivative s′ on the left side of a peak, absolute minimum s′ of the first derivative on the right side of a peak, maximum of the second derivative s′′ on the right side of a peak, absolute minimum s′′ of the second derivative of the right side of a peak, area R of a peak, and time difference Δ from peak to peak.
Both normal and abnormal signals of three diseases.
| Classification method | TPR of LQT1 | TPR of HCM | TPR of CPVT | Accuracy |
|---|---|---|---|---|
| | 86.7 | 87.3 | 80.7 | 83.2 |
| | 94.4 | 84.5 | 79.0 | 83.5 |
| | 91.1 | 85.9 | 81.5 | 84.5 |
| Random forests, 35 trees | 88.9 | 84.5 | 88.0 | |
| LS-SVM cubic kernel, parameter | 87.8 | 78.9 | 85.4 | 84.8 |
| LS-SVM RBF kernel, parameters | 90.0 | 78.9 | 88.4 | |
True positive rates (TPR, %) for LQT1, HCM and CPVT diseases, with 90, 71 and 233 signals respectively, and accuracy (%) over all signals (kNN is the k-nearest-neighbor search method and LS-SVM RBF is a least-squares support vector machine with a radial basis function kernel). The best accuracies are bolded.
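Because accuracy over all signals is the share of correctly classified signals, it can be cross-checked from the per-class TPRs and the class sizes given in the caption. The sketch below is a hedged illustration, not part of the original analysis: the helper name `accuracy_from_tpr` is hypothetical, and rounding each class to a whole number of correctly classified signals is an assumption.

```python
def accuracy_from_tpr(tprs_percent, class_sizes):
    """Overall accuracy (%) implied by per-class TPRs (%) and class sizes."""
    if len(tprs_percent) != len(class_sizes):
        raise ValueError("one TPR per class is required")
    # Recover the (integer) number of correctly classified signals per class.
    correct = [round(t / 100 * n) for t, n in zip(tprs_percent, class_sizes)]
    return 100 * sum(correct) / sum(class_sizes)


# Class sizes from the caption: 90 LQT1, 71 HCM and 233 CPVT signals.
sizes = [90, 71, 233]

# First table row: TPRs of 86.7, 87.3 and 80.7 with a reported accuracy of 83.2.
first_row = accuracy_from_tpr([86.7, 87.3, 80.7], sizes)
print(f"{first_row:.1f}")  # ≈ 83.2, matching the table
```

The same call can be used to estimate the accuracy of rows whose accuracy cell is blank, provided all per-class TPRs of that row are present.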
Normal and abnormal signals together of three diseases and controls.
| Classification method | TPR of LQT1 | TPR of HCM | TPR of CPVT | TPR of WT | Accuracy |
|---|---|---|---|---|---|
| | 93.3 | 76.1 | 70.4 | 68.4 | 74.6 |
| | 93.3 | 74.6 | 71.7 | 68.4 | 75.0 |
| | 91.1 | 76.1 | 71.7 | 69.2 | 75.0 |
| | 87.8 | 80.3 | 71.7 | 69.2 | 75.0 |
| | 87.8 | 80.3 | 71.7 | 69.2 | 75.0 |
| | 94.4 | 78.9 | 71.2 | 66.9 | 75.1 |
| Random forests, 54 trees | 88.9 | 81.7 | 76.8 | 72.9 | |
| LS-SVM RBF kernel, parameters | 85.6 | 71.8 | 70.8 | 78.2 |
True positive rates (TPR, %) of LQT1, HCM and CPVT diseases and controls (WT), with 90, 71, 233 and 133 signals respectively, and accuracy (%) over all signals (kNN is the k-nearest-neighbor search method and LS-SVM is a least-squares support vector machine). The best accuracy is bolded.