| Literature DB >> 35890781 |
Heejin Lee1, Junghwan Lee2, Yujin Kwon1, Jiyoon Kwon1, Sungmin Park3, Ryanghee Sohn4, Cheolsoo Park1.
Abstract
Heart and respiration rates are important vital signs for assessing a person's health condition. To estimate these vital signs accurately, we propose a multitask Siamese network model (MTS) that combines the advantages of the Siamese network and the multitask learning architecture. The MTS model was trained on images of the cheek (including the nose and mouth) and forehead areas, with the Siamese networks sharing the same parameters, to extract features carrying heart and respiratory information. The proposed model has a small number of parameters yet yields a high vital-sign-prediction accuracy, comparable to that of the single-task learning model; furthermore, it outperforms the conventional multitask learning model. As a result, the MTS model predicts the heart and respiratory signals simultaneously while reducing the number of parameters 16-fold, with mean average errors of 2.84 for heart rate and 4.21 for respiration rate. Owing to its light weight, the model is well suited to vital-sign monitoring on edge devices such as mobile phones or small portable devices.
Keywords: Siamese network; contactless technique; deep learning; heart rate; multitasking; respiration rate
Year: 2022 PMID: 35890781 PMCID: PMC9321619 DOI: 10.3390/s22145101
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Multitask Siamese (MTS) network architecture. The forehead and cheeks are extracted as regions of interest (ROIs) from the facial image and enter the weight-sharing networks as the forehead and cheek streams, respectively. The cardiac (photoplethysmography, PPG) signal is obtained from the value of the last layer of the Siamese network, and the respiratory signal is estimated after an additional dense layer at the end of the Siamese network.
Figure 2. Structure of the multitask Siamese network (MTS). The input data are 140 (w) × 40 (h) forehead and cheek images with three RGB channels over 600 frames; the input dimension is ‘frame × width × height × channel’. The output is the PPG and respiratory signals over 600 frames.
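The weight-sharing design in Figures 1 and 2 can be illustrated with a minimal numpy sketch. This is a toy stand-in, not the paper's implementation: the real MTS uses convolutional blocks rather than the single shared dense matrix assumed here, and the layer sizes below are invented for illustration. The key property it demonstrates is that both ROI streams pass through one set of shared parameters, with an extra dense head producing the respiratory output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 600 frames, each ROI flattened from a 140x40 RGB patch.
frames, feat = 600, 140 * 40 * 3

# A single shared weight matrix serves both streams (the Siamese property).
W_shared = rng.standard_normal((feat, 32)) * 0.01
W_resp = rng.standard_normal((32, 1)) * 0.01  # extra dense head for respiration

def leaky_relu(x, alpha=0.01):
    return np.where(x >= 0, x, alpha * x)

def mts_forward(forehead, cheek):
    """Both ROI streams reuse W_shared; the PPG output is read from the
    shared features, respiration from an additional dense layer on top."""
    h = leaky_relu(forehead @ W_shared) + leaky_relu(cheek @ W_shared)
    ppg = h.mean(axis=1)              # one PPG value per frame
    resp = (h @ W_resp).squeeze(-1)   # one respiratory value per frame
    return ppg, resp

forehead = rng.standard_normal((frames, feat))
cheek = rng.standard_normal((frames, feat))
ppg, resp = mts_forward(forehead, cheek)
print(ppg.shape, resp.shape)  # (600,) (600,)
```

Because `W_shared` is used for both streams, it is stored (and trained) only once, which is where the parameter savings of the Siamese arrangement come from.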
Figure 3. Leaky ReLU function.
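As a reminder of the activation shown in Figure 3, a Leaky ReLU passes positive inputs unchanged and scales negative inputs by a small slope instead of zeroing them. The slope value 0.01 below is a common default and an assumption here; the paper's exact setting is not stated in this excerpt.

```python
def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: identity for x >= 0, small linear slope alpha for x < 0."""
    return x if x >= 0 else alpha * x

print(leaky_relu(2.0))   # 2.0
print(leaky_relu(-2.0))  # -0.02
```

Unlike a plain ReLU, the nonzero negative slope keeps a small gradient flowing for negative inputs, avoiding "dead" units during training.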
Figure 4. Preprocessing of the COHFACE dataset. The green dots are facial landmarks detected with the “dlib” face-recognition library. The subject’s forehead and cheeks are marked by the green and red areas shown above.
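The ROI step in Figure 4 can be sketched without dlib itself: given the 68 (x, y) landmarks that dlib's shape predictor returns, a forehead box can be placed above the eyebrow landmarks and a cheek box around the nose and mouth landmarks. The landmark index ranges follow dlib's 68-point convention (17–26 eyebrows, 31–35 lower nose, 48–67 mouth), but the margins and exact box construction below are illustrative assumptions, not the paper's recipe.

```python
def roi_boxes(landmarks):
    """Derive forehead and cheek bounding boxes (x0, y0, x1, y1) from
    68-point facial landmarks given as a list of (x, y) tuples."""
    xs = [p[0] for p in landmarks]
    # Eyebrow landmarks (indices 17-26): their top edge bounds the forehead.
    brow_top = min(landmarks[i][1] for i in range(17, 27))
    face_h = max(p[1] for p in landmarks) - min(p[1] for p in landmarks)
    # Forehead: full face width, a band above the eyebrows (margin assumed).
    forehead = (min(xs), brow_top - face_h // 3, max(xs), brow_top)
    # Cheek region: box spanning the lower-nose and mouth landmarks.
    cheek_pts = [landmarks[i] for i in range(31, 36)] + \
                [landmarks[i] for i in range(48, 68)]
    cheek = (min(p[0] for p in cheek_pts), min(p[1] for p in cheek_pts),
             max(p[0] for p in cheek_pts), max(p[1] for p in cheek_pts))
    return forehead, cheek
```

In the actual pipeline these boxes would then be cropped from each video frame and resized to the 140 × 40 network input described in Figure 2.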
Benchmark test results comparing our single-task model with results published within the last five years. Heart rates (HR) are given in beats per minute (BPM), and model performance is evaluated with the Pearson correlation coefficient (R), mean average error (MAE), and root-mean-squared error (RMSE).
| Model | R | MAE (BPM) | RMSE (BPM) |
|---|---|---|---|
| Siamese rPPG network [ | 0.73 | 0.70 | 1.29 |
| Model by Z.-K. Wang et al. [ | 0.40 | 8.09 | 9.96 |
| ETA-rPPGNet [ | 0.77 | 4.67 | 6.65 |
| Model by Y.-Y. Tsou et al. [ | 0.72 | 0.68 | 1.65 |
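The three metrics used throughout these tables (Pearson R, MAE, RMSE) can be computed as follows. This is a generic implementation of the standard formulas, not code from the paper, and the heart-rate values in the usage lines are made up for illustration.

```python
import math

def mae(y_true, y_pred):
    """Mean average (absolute) error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root-mean-squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between predictions and labels."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)

hr_true = [60.0, 70.0, 80.0]   # hypothetical label HRs in BPM
hr_pred = [62.0, 69.0, 83.0]   # hypothetical predicted HRs in BPM
print(mae(hr_true, hr_pred))   # 2.0
```

RMSE penalizes large individual errors more heavily than MAE, which is why papers typically report both alongside R.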
Benchmark test results comparing the multitask Siamese network model (MTS) with other multitask models. Heart rates (HR) are given in beats per minute (BPM) and respiration rates (RR) in respirations per minute (RPM); model performance is evaluated with the Pearson correlation coefficient (R), mean average error (MAE), and root-mean-squared error (RMSE).
| Model | HR R | HR MAE (BPM) | HR RMSE (BPM) | RR MAE (RPM) | RR RMSE (RPM) |
|---|---|---|---|---|---|
| Multitask temporal shift convolutional attention network (MTTS-CAN) [ | 0.20 | 7.97 | 10.38 | 9.0 | 9.50 |
Number of parameters of the multitask Siamese network model (MTS) compared with the other heart-rate- and respiration-rate-prediction models.
| Model | # of Parameters |
|---|---|
| Siamese rPPG network [ | 11.80 M |
| Multitask temporal shift convolutional attention network (MTTS-CAN) [ | 0.93 M |
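Parameter counts like those in the table above follow directly from the layer shapes, and they show why weight sharing helps: a Siamese pair stores one copy of each layer instead of two. The layer stack below is hypothetical, chosen only to make the arithmetic concrete; it is not the MTS architecture.

```python
def conv2d_params(in_ch, out_ch, k, bias=True):
    """Trainable parameters of a 2-D convolution: out*in*k*k weights + biases."""
    return out_ch * in_ch * k * k + (out_ch if bias else 0)

# Hypothetical stack of three 3x3 conv layers per stream.
layers = [(3, 16, 3), (16, 32, 3), (32, 64, 3)]
per_stream = sum(conv2d_params(i, o, k) for i, o, k in layers)

two_streams = 2 * per_stream   # separate weights for forehead and cheek
siamese = per_stream           # Siamese sharing: one copy serves both streams
print(per_stream)              # 23584
```

Further savings in a multitask setup come from sharing the backbone across tasks, so only the small task-specific heads are duplicated.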
Figure 5. HR correlation results. (a) HR correlations for predictions from the proposed MTS model. (b) HR correlations for predictions from the MTTS-CAN model. The scatter plots show the relationship between the HR predictions and their true labels. MTS yields comparable or higher correlations than the other models.
Figure 6. Examples of the predicted PPG and respiratory signals. (a) An example of the predicted PPG signal. (b) An example of the predicted respiratory signal. Predicted signals are plotted as red lines and their true labels as blue lines. Note the high similarity of the MTS model’s predictions to the true labels.
Figure 7. Learning curves of the rPPG-prediction models. (a) Learning curve of the Siamese network with the convolutional block attention module (CBAM) for PPG-signal prediction. (b) Learning curve of the Siamese network with CBAM for respiratory-signal prediction. (c) Learning curve of MTS for joint prediction of the PPG and respiratory signals. The curves (plotted as loss) are shown over 250 epochs.
Benchmark performance of various rPPG-prediction models on the COHFACE dataset [25]. 2SR, CHROME, and LiCVPR are traditional signal-processing-based methods, while HR-CNN and Two-stream are data-driven, machine-learning-based algorithms.
| Method | MAE (BPM) | RMSE (BPM) |
|---|---|---|
| 2SR [ | 20.98 | 25.84 |
| CHROME [ | 7.80 | 12.45 |
| LiCVPR [ | 19.98 | 25.59 |
| HR-CNN [ | 8.10 | 10.78 |
| Two-stream [ | 8.09 | 9.96 |