Alexander Trumpp1, Johannes Lohr2, Daniel Wedekind3, Martin Schmidt3, Matthias Burghardt2, Axel R Heller2, Hagen Malberg3, Sebastian Zaunseder3. 1. Institute of Biomedical Engineering, TU Dresden, Fetscherstraße 29, 01307, Dresden, Germany. alexander.trumpp@tu-dresden.de. 2. Department of Anesthesiology and Intensive Care Medicine, University Hospital, TU Dresden, Fetscherstraße 74, 01307, Dresden, Germany. 3. Institute of Biomedical Engineering, TU Dresden, Fetscherstraße 29, 01307, Dresden, Germany.
Abstract
BACKGROUND: Camera-based photoplethysmography (cbPPG) is a measurement technique which enables remote vital sign monitoring by using cameras. To obtain valid plethysmograms, proper regions of interest (ROIs) have to be selected in the video data. Most automated selection methods rely on specific spatial or temporal features limiting a broader application. In this work, we present a new method which overcomes those drawbacks and, therefore, allows cbPPG to be applied in an intraoperative environment. METHODS: We recorded 41 patients during surgery using an RGB and a near-infrared (NIR) camera. A Bayesian skin classifier was employed to detect suitable regions, and a level set segmentation approach to define and track ROIs based on spatial homogeneity. RESULTS: The results show stable and homogeneously illuminated ROIs. We further evaluated their quality with regards to extracted cbPPG signals. The green channel provided the best results where heart rates could be correctly estimated in 95.6% of cases. The NIR channel yielded the highest contribution in compensating false estimations. CONCLUSIONS: The proposed method proved that cbPPG is applicable in intraoperative environments. It can be easily transferred to other settings regardless of which body site is considered.
In the last decade, a novel optical measuring technique called camera-based photoplethysmography (cbPPG) has gained a lot of attention. The technique permits the remote extraction of cardio-respiratory signals using conventional video cameras [1, 2]. Similar to classical photoplethysmography (PPG), the signals are mainly modulated by blood volume changes in the cutaneous microvasculature [3]. However, cbPPG has the benefit of allowing a spatial assessment of the microcirculatory perfusion, which provides a new diagnostic value [4].
For a broad and convenient application of cbPPG, a region of interest (ROI) has to be detected and tracked automatically at suitable skin regions in the video recordings. The efficiency of ROI selection ultimately determines the quality and validity of the extracted plethysmograms and is, therefore, a crucial step. Facial regions are a good candidate since they are most often accessible and because the cutaneous perfusion is relatively high there [5]. In the past, the vast majority of works used face or facial landmark detection combined with subsequent redetection or tracking of selected features to (pre-)define ROIs in the context of cbPPG (e.g. [6–10]). However, such approaches rely on the visibility of certain anatomical areas and might fail if the face is partly occluded or rotated. Even if they succeed, a selected ROI could still be blocked, for example by hair. These problems may not be relevant in controlled environments, like the laboratory, but have to be considered in clinical or public settings [11, 12].
One way to reduce the dependence on facial features is to include the time component in the selection process (e.g. [8, 13–16]). For that purpose, the image or a predefined ROI is blurred or divided into small sub-ROIs. The signals extracted from those pixels/sub-ROIs are then assessed for further use in terms of variations related to the cardiac cycle. Many of those approaches nevertheless involve an initial ROI definition. Furthermore, they all rely on a distinct manifestation of the cardiac pulsation, which is most likely dominant in young and healthy subjects but certainly diminished in older subjects and subjects with vascular disease and, consequently, hard to determine when using small image regions [2, 3]. Another way to select facial ROIs is to utilize skin classifiers which detect proper areas based on the skin’s appearance in various color spaces. Most of those works, however, still combine the classifiers with face or facial landmark detection (e.g. [17–19]). There are only a few exceptions that either do not exploit the found skin regions or focus, again, on the time component (signal processing) to obtain valid cbPPG signals and vital parameters [20–22].
Recently, Moço et al. [23, 24] revealed how ballistocardiographic (BCG) effects degrade the desired blood volume signal in cbPPG. The group showed that, for the face, these effects are mainly present if the light source is not orthogonally directed towards the skin surface and the ROI is not homogeneously illuminated. For this reason, the selection of spatially homogeneous ROIs is essential to achieve pure cbPPG signals. Previous approaches that in some way considered the ROI’s homogeneity employed intensity thresholds, exploited regional means and standard deviations, or clustered areas based on the lightness component [9, 25, 26]. For the eventual application, all those methods depend on an initial face detection.
In this paper, we propose a novel and fully automated ROI selection method that utilizes level set segmentation to minimize the influence of BCG artifacts. The method (i) does not rely on the detection of anatomical features, (ii) chooses and tracks visible skin regions which are homogeneously illuminated, and (iii) solely operates on the image plane without being reliant on the presence of temporal variations related to the cardiac cycle. We demonstrate the applicability of our method for the face area of 41 patients who were recorded during surgery using a multi-camera setup. The performance was evaluated with respect to the quality of extracted cbPPG signals and correctly detected heart rates (HRs). To the best of our knowledge, only Rubīns et al. [27, 28] have so far applied cbPPG in an intraoperative environment, analyzing the inner hand area.
Methods
Data and setup
Our study was conducted at the Department of Anesthesiology and Intensive Care Medicine (University Hospital Carl Gustav Carus) in Dresden. It was authorized by the Institutional Review Board at TU Dresden (IRB00001473, EK168052013) and was in accordance with the Helsinki Declaration. We included 41 elderly patients in the cbPPG analyses, each of whom gave written consent. All clinically relevant information about the patients, such as their medical history, was logged. We recorded the patients for approximately 30 min while they underwent surgery on the torso or extremities. Important events during the surgical procedure and interventions by anesthetists were also tracked. Table 1 summarizes the most important characteristics of the patient group. As shown there, almost half of the participants had a relevant degree of vascular disease (e.g. stenosis, varicosis, thrombosis, hypovolemia, artery occlusive disease). Consequently, the strength of the blood volume pulse in the microvasculature might have been affected, limiting the extraction of valid cbPPG signals.
Table 1
Important characteristics of the patient group
Characteristic                          Value
Age (in years)                          65.2 ± 12.0
Female/male (number)                    17/24
Body mass index (in kg/m²)              26.1 ± 4.6
NYHA (number)ᵃ
  0: not examined                       4
  1: no problems                        35
  2: irrelevant problems                0
  3: relevant problems                  2
Vascular system (number)ᵃ
  0: not examined                       0
  1: no problems                        20
  2: irrelevant problems                2
  3: relevant problems                  19
Duration surgery (in min)               157.3 ± 99.9
Duration video recording (in min)       32.0 ± 7.2
ᵃ The categories stem from the ANDOK protocol. For the NYHA (New York Heart Association) classification, they describe the relevance of assistance based on the degree of heart failure
For video recording, we used a mobile measuring system that had already been applied successfully in another clinical study [11, 12]. The system consists of a medical PC (ACL OR-PC 19) and a sensing component, both mounted on a movable constructional framework (see Fig. 1). The sensing component encompasses two cameras (IDS Imaging Development Systems GmbH), a monochrome camera (UI-3370CP-NIR-GL) and an RGB camera (UI-3370CP-C-HQ), and a near-infrared (NIR) light source with four LED spots (Kingbright BL106-15-29). In combination with an additional NIR bandpass filter (MidOpt BP850) at the monochrome camera, the light source permitted a controlled measurement in the non-visible range (880 nm). We equipped both cameras with lenses by Schneider-Kreuznach (Cinegon 16/1.8) and set them up to a color depth of 12 bit, a frame rate of 100 fps, and a resolution of pixels. Before each recording, the sensing component was aligned at a distance between 0.5 and 1 m over the head of the patient, who was in a supine position (see Fig. 1). Due to general anesthesia, the subject was unconscious during the measurement. The illumination for the RGB video was defined by the surgical light above the table and by the room’s fluorescent lamps. For reference purposes, we also synchronously stored physiological signals from the patient monitor (e.g. photoplethysmogram) on our medical PC.
Fig. 1
CbPPG setup during surgery. (1) Construction with adjustable arm for the sensing system. (2) Sensing system (enlarged on the right) including NIR illumination, NIR camera, and RGB camera. (3) Recording PC. (4) Patient (face directed towards the cameras). (5) Surgeons and clinical staff
For our analyses, we aimed at using facial areas as ROI. However, the following obstacles in the intraoperative setting challenged the ROI selection process:
- Face was often partly occluded by surgical drape
- Patient was moved by clinical staff
- Measuring stand was relocated
- Staff reached into recording area
- Operation table was readjusted in height
- Illumination varied due to moving staff
- Patient moved due to surgical procedure
In the next section, we describe the developed method that is able to tackle those problems.
Image processing
The ROI selection algorithm presented here is an enhanced and more complex version of an approach that we successfully applied to single-camera recordings of patients in an intensive care unit [12]. The new algorithm allows the RGB and NIR video streams to be processed simultaneously. For that purpose, the two streams were synchronized, leading to a frame-wise assignment in which the time component of both streams can be expressed by the same frame number . An image pair at a certain instant is then representable by the four channels
(red, green, blue, NIR) with being the spatial component.
Skin classification
In our setup, common face detection algorithms, as used in [6, 7, 9], ultimately failed due to the limited visibility of required features. To detect suitable regions that potentially provide physiological information, we employed a skin classifier by Jones and Rehg [29] on the (first) RGB image. The classifier has to be built once and is then generally applicable. First, two RGB histograms, one for the class and one for were constructed using over 13,000 labeled skin and non-skin color pictures that were made available by the authors. Second, the conditional probability density functions and were calculated by normalizing the histograms by the total number of counts. Finally, the classifier could be derived from the Bayesian decision rule [30]. A pixel was classified as skin if [29] where is the pixel’s RGB value and a threshold which determines the ratio between the true positive and false positive classification rate. We found to be a good trade-off. Before skin detection, we adjusted the image intensity because we found this step to boost the classifier’s performance.
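The histogram-based decision rule described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the bin count `BINS`, the function names, the toy training pixels, and the threshold values are all assumptions, and the rule is written in its usual likelihood-ratio form, P(c|skin) ≥ θ · P(c|non-skin).

```python
from collections import Counter

BINS = 32  # hypothetical number of histogram bins per color channel


def quantize(rgb, bins=BINS):
    """Map an 8-bit (r, g, b) triple to its histogram bin index."""
    return tuple(v * bins // 256 for v in rgb)


def build_likelihood(pixels, bins=BINS):
    """Normalize a color histogram by its total count to obtain P(c | class)."""
    counts = Counter(quantize(p, bins) for p in pixels)
    total = sum(counts.values())
    return {bin_index: n / total for bin_index, n in counts.items()}


def is_skin(rgb, p_skin, p_nonskin, theta, bins=BINS):
    """Likelihood-ratio test, written as a product to avoid division by zero."""
    b = quantize(rgb, bins)
    ps = p_skin.get(b, 0.0)
    pn = p_nonskin.get(b, 0.0)
    return ps > 0.0 and ps >= theta * pn


# Toy training data (hypothetical): three skin-like pixels plus one ambiguous color.
skin_train = [(220, 170, 150)] * 3 + [(90, 60, 50)]
nonskin_train = [(30, 30, 30)] * 3 + [(90, 60, 50)]
p_skin = build_likelihood(skin_train)
p_nonskin = build_likelihood(nonskin_train)
```

A larger threshold lowers the false positive rate at the cost of the true positive rate: in this toy example, the ambiguous color passes the test with a threshold of 1 but is rejected with a threshold of 2.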
Segmentation
Since the classifier operates on a pixel level and does not take any local distributions into account, the outcome is usually insufficient and may not yield homogeneously illuminated skin regions (see Fig. 2a). To deal with this problem, we applied a segmentation approach by Brox et al. [31] which utilizes level set methods.
Fig. 2
Example for a segmentation process using level set methods. a Initialization point. b Point during segmentation. c Point when process has converged. The inside region and the outside region are implicitly described and changed by The contour is depicted separately in the images below the graphs. Please note that t represents the segmentation time for an image and does not refer to the time component in the videos. The eye section was blurred if it was visible
Level set methods for segmentation
Level set methods allow to describe an evolving segmentation contour C in an implicit manner using a function [32]. For a two-phase segmentation, there is an inside region and an outside region Let be an optimal ROI and non-suitable skin areas and the background (whole image region ). As might consist of numerous subregions that are not connected, an explicit description is challenging. This task is much easier when is employed to implicitly describe the image plane (see Fig. 2):
(’’ denotes ’implies’). The actual segmentation process is an optimization problem in which a selected energy functional is minimized. The minimization can be realized by a gradient descent and represents the propagation of the contour from an initialization point to an optimum In our case, the gradient descent reads [31] where H is the Heaviside function ( for , for and for ), the feature vector with M elements, and the conditional probability density functions for the regions (). The first term in the equation serves to separate and based on the distribution of the feature values in those regions. The second term is the curvature term, which controls the contour’s smoothness, with being the weighting factor [33].
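The display equations of this subsection did not survive in the text. For orientation, the implicit region description and the two-phase evolution equation of Brox et al. [31] are commonly written as below. All symbols are assumptions chosen to match the surrounding description and need not coincide with the authors' notation.

```latex
% Assumed notation (not recoverable from the text): \phi is the level set
% function, \Omega_1 / \Omega_2 the inside / outside regions, C the contour,
% F_j the j-th of M features, p_{ij} the density of F_j in \Omega_i,
% \nu the curvature weight, H' the derivative of the Heaviside function.
\[
  \phi(\mathbf{x}) > 0 \Rightarrow \mathbf{x} \in \Omega_1, \qquad
  \phi(\mathbf{x}) < 0 \Rightarrow \mathbf{x} \in \Omega_2, \qquad
  \phi(\mathbf{x}) = 0 \Rightarrow \mathbf{x} \in C
\]
\[
  \frac{\partial \phi}{\partial t}
  = H'(\phi) \left( \sum_{j=1}^{M} \log \frac{p_{1j}(F_j)}{p_{2j}(F_j)}
  + \nu \, \operatorname{div}\!\left( \frac{\nabla \phi}{|\nabla \phi|} \right) \right)
\]
```

In this form, the sum is the region-separating term described in the text (positive where the features fit the inside region better), and the divergence is the curvature term weighted by the smoothness factor.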
Adaption and contribution
Level set methods are powerful techniques that go beyond the scope of basic image processing [32]. Previous works often performed ROI selection by applying conventional image processing ideas, i.e. face detection and feature point tracking. Here, we exploit the benefits of level set segmentation to additionally consider novel findings regarding the cbPPG signal’s origin. We therefore defined homogeneity as the essential selection criterion since the respective regions are less impacted by BCG effects [23, 24]. To achieve homogeneously illuminated ROIs, we included the image intensity values in the vector F. Furthermore, a texture measure was chosen to also avoid inhomogeneities in the skin’s surface topology, which cause artifacts in case of motion [34]. We determined J by calculating the local standard deviations for each color channel in neighborhoods of pixels. The vector could then be formulated as where is the mean of the single texture images
and During the segmentation process, pixels are assigned to based on the probability that the pixel’s intensity and texture values are similar enough to belong there. This probability was obtained using a Gaussian function [33] in which and are the mean and standard deviation of the values in given that . One of the most crucial steps in our ROI selection algorithm is the initialization of the segmentation. In order to obtain homogeneous skin regions, we set the outcome of the skin classification to The result represents our final ROI for the RGB image. Figure 2 depicts an example of a respective segmentation process.
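To make the roles of the texture measure and the Gaussian region model concrete, the following sketch computes a local standard deviation image and the per-pixel log-likelihood ratio between the inside and outside region. It is an illustration under stated assumptions: the window radius, the variance floor `eps`, the use of a single channel instead of the mean over color channels, and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def local_std(img, radius=1):
    """Local standard deviation in a (2*radius+1)^2 window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = pstdev(window)
    return out


def region_stats(feature, mask, inside, eps=1e-3):
    """Mean and (floored) standard deviation of a feature image within one region."""
    vals = [feature[y][x]
            for y in range(len(mask)) for x in range(len(mask[0]))
            if mask[y][x] == inside]
    return mean(vals), max(pstdev(vals), eps)


def log_gaussian(value, mu, sigma):
    """Logarithm of the Gaussian density, evaluated directly to avoid underflow."""
    return -0.5 * ((value - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def log_ratio(features, mask, y, x):
    """Sum over all features of log p_inside - log p_outside at pixel (y, x)."""
    total = 0.0
    for f in features:
        mu_in, s_in = region_stats(f, mask, True)
        mu_out, s_out = region_stats(f, mask, False)
        total += log_gaussian(f[y][x], mu_in, s_in) - log_gaussian(f[y][x], mu_out, s_out)
    return total
```

A positive ratio means the pixel's intensity and texture fit the current inside region better than the outside region, so the contour would tend to expand there; a negative ratio pushes the contour back.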
Registration
A skin region that appears homogeneous in the RGB image might appear differently in the NIR image, where LED spot lights were used. Therefore, we attempted to employ level set segmentation separately for the NIR image to find its most homogeneous skin regions. However, with being monochrome, the skin classifier was not applicable for initialization. The result from the RGB image could also not simply be assigned to the corresponding NIR image since the respective cameras had a different viewing angle in our setup (see Fig. 1). We decided to apply an intensity-based block-matching method to transfer Briefly, the green channel (less noisy than the R and B channels) was divided into overlapping blocks of pixels at the ROI. For each block the best matching block in was then determined within a search area around the block location of The mean squared error (MSE) was chosen as the matching criterion [35]. Due to the different lighting conditions in the RGB and NIR video (see “Data and setup” section), we always mean-adjusted the blocks that were compared. Therefore, the MSE reads where and are the block means. A priori knowledge about the cameras’ positioning allowed us to limit the search area to and pixels. The outcome of the registration was set as the initialization state for the eventual segmentation process in which the feature vector read The final ROI was then defined by
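The block-matching step with a mean-adjusted MSE can be sketched as below. The block size, the search limits, and the function names are assumptions for illustration only, since the original values are not recoverable from the text.

```python
def block(img, top, left, size):
    """Extract a size x size block whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def mean_adjusted_mse(a, b):
    """MSE between two equally sized blocks after subtracting each block's mean."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma = sum(fa) / len(fa)
    mb = sum(fb) / len(fb)
    return sum(((x - ma) - (y - mb)) ** 2 for x, y in zip(fa, fb)) / len(fa)


def best_match(src, dst, top, left, size, max_dy, max_dx):
    """Search dst around (top, left) for the displacement with the lowest error.

    Returns (error, dy, dx); candidates that leave the image are skipped.
    """
    ref = block(src, top, left, size)
    h, w = len(dst), len(dst[0])
    best = None
    for dy in range(-max_dy, max_dy + 1):
        for dx in range(-max_dx, max_dx + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > h or l + size > w:
                continue
            err = mean_adjusted_mse(ref, block(dst, t, l, size))
            if best is None or err < best[0]:
                best = (err, dy, dx)
    return best
```

Subtracting the block means makes the criterion insensitive to a constant brightness offset between the two images, which is the motivation given in the text for comparing an ambient-lit RGB channel with an LED-lit NIR image.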
Implementation and framework
The presented method was implemented in MATLAB R2016a. For the level set approach, we followed the suggestions by Osher and Fedkiw [32]. We briefly mention important aspects in that context but refer the reader to their book for a detailed description. The partial differential equation in (2) was solved numerically (forward Euler method) by an iterative procedure. The level set function was initialized employing a signed distance function (see Fig. 2a) and reinitialized after each iteration step. The derivative of the Heaviside function was replaced by a smooth delta function.
Figure 3b depicts the basic flow chart of our ROI selection method. An essential part is the ROI detector, whose program structure is shown in Fig. 3a. The detector’s principal components were already explained in the previous sections, yielding two ROIs for a given image pair (e.g. for ). For the segmentation components, we used 300 (RGB image) and 100 iteration steps (NIR image) to obtain and , respectively. These counts were determined empirically by selecting a broad variety of images and examining how many steps are at least necessary to reach a stabilized segmentation contour. The largest occurring step counts were rounded up and chosen for the whole data set. After detection, the ROIs were tracked separately in the RGB and NIR video streams. For that purpose, we also applied level set segmentation where the process for a frame was initialized by the ROI of the preceding frame: and . Since possible changes between two consecutive frames are generally minor, only 50 iteration steps were necessary for convergence. In fact, when the contour remained nearly unchanged between two steps (regional size difference pixels), the segmentation was stopped early. The key idea behind the tracking approach was to track the intensity and texture, and thus their homogeneity, inside the skin region rather than anatomical features. In this way, abrupt changes in the light intensity could be avoided within the ROI. However, certain artifacts, such as the temporary occlusion of the recording area by the staff, caused problems during tracking. Either the ROI was quickly assigned to non-suitable areas, or it disappeared completely because skin was no longer visible. The latter problem could be easily detected and was treated by executing the ROI detector repeatedly until skin regions were found again. To tackle the first problem, we continuously checked the mean intensity in the ROI over the last 10 s. If its standard deviation exceeded 50 units, our requirement of having stable ROI conditions was considered to be violated and the ROI detector was executed. As redetection might also lead to major intensity variations over time, we paused the artifact monitoring for 10 s after reselection (see Fig. 3b).
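The artifact monitoring described above can be sketched as a small state machine. This is an illustration only: the class name, the per-frame update, and the choice to evaluate the criterion only once a full 10 s window is available are assumptions; the 10 s window, the threshold of 50, the 10 s pause, and the 100 fps frame rate are taken from the text.

```python
from collections import deque
from statistics import pstdev


class RoiStabilityMonitor:
    """Decides per frame whether the ROI detector should be re-executed."""

    def __init__(self, fps=100, window_s=10, max_std=50.0, pause_s=10):
        self.window = deque(maxlen=fps * window_s)  # mean ROI intensities of the last window_s seconds
        self.max_std = max_std
        self.pause_frames = fps * pause_s
        self.pause_left = 0

    def update(self, mean_intensity):
        """Feed the mean ROI intensity of one frame; None means the ROI vanished."""
        if mean_intensity is None:
            # No skin visible: request redetection until a ROI is found again.
            self.window.clear()
            return True
        self.window.append(mean_intensity)
        if self.pause_left > 0:
            # Monitoring is paused after a reselection.
            self.pause_left -= 1
            return False
        window_full = len(self.window) == self.window.maxlen
        if window_full and pstdev(self.window) > self.max_std:
            self.pause_left = self.pause_frames
            return True
        return False
```

Recomputing the standard deviation over the whole window at every frame keeps the sketch short; a running-variance update would be the natural choice if this check had to run at the full frame rate.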
Fig. 3
Program structure of the presented ROI detection and tracking algorithm. a ROI detector which (initially) detects the skin, finds the ROI and registers and adapts the result for the NIR image. b Simplified flowchart of the whole program (detection and tracking) which runs separately for the RGB and NIR video. For some transitions between the program blocks, the data types are given (I: image, : adjusted image, : image region, : frame number). * pause after ROI reselection
Signal processing
After image processing, the cbPPG signals were extracted by averaging the ROIs’ pixel values for each frame and color channel. As a result, we obtained four signals (R, G, B, NIR) for each patient throughout the recording. The signals were divided into consecutive 10 s segments amounting to an average of segments per subject and channel. Since ROIs could not always be selected (see previous section), the cbPPG signals occasionally held empty entries. Any segment that contained such entries was disregarded in the following steps. Each signal segment was linearly detrended and further filtered using an FIR highpass (order: 250, cutoff frequency: 0.5 Hz). Next, the signals were zero-padded to points, and the fast Fourier transform was performed. Hence, we were able to determine a segment-wise HR by detecting the maximum peak in the related amplitude spectrum between 30 and 200 bpm. The same procedure was applied to calculate the reference HRs from the corresponding 10 s segments of the PPG monitor signal. In order to assess the quality of the cbPPG signals, we computed the signal-to-noise ratio (SNR) by adapting a formula of de Haan and Jeanne [36] where is defined as The SNR considers the signal amplitudes around the true HR and its first harmonic in a ± 5 bpm band as the wanted component and the remaining amplitudes between 30 and 200 bpm as the noise component.
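The segment-wise HR and SNR computation can be sketched as follows. Several simplifications are assumptions of this sketch: the FIR highpass is omitted, the zero-padded FFT is replaced by a direct evaluation of the Fourier transform on a 0.5 bpm grid (which samples the same spectrum), and the SNR uses squared amplitudes as in the original formula of de Haan and Jeanne, whereas the exact adapted formula is not recoverable from the text.

```python
import math


def detrend(x):
    """Remove the least-squares linear trend from a sequence."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = sum(x) / n
    num = sum((i - t_mean) * (v - x_mean) for i, v in enumerate(x))
    den = sum((i - t_mean) ** 2 for i in range(n))
    slope = num / den
    return [v - (x_mean + slope * (i - t_mean)) for i, v in enumerate(x)]


def amplitude(x, fs, f_hz):
    """Magnitude of the discrete-time Fourier transform of x at f_hz."""
    w = 2 * math.pi * f_hz / fs
    re = sum(v * math.cos(w * i) for i, v in enumerate(x))
    im = sum(v * math.sin(w * i) for i, v in enumerate(x))
    return math.hypot(re, im)


def spectrum_bpm(x, fs, lo=30.0, hi=200.0, step=0.5):
    """Amplitude spectrum as (bpm, amplitude) pairs between lo and hi bpm."""
    n = int(round((hi - lo) / step)) + 1
    return [(lo + k * step, amplitude(x, fs, (lo + k * step) / 60.0)) for k in range(n)]


def estimate_hr(spec):
    """HR estimate: location of the maximum peak in the amplitude spectrum."""
    return max(spec, key=lambda pair: pair[1])[0]


def snr_db(spec, hr_ref, half_band=5.0):
    """Ratio of spectral energy near hr_ref and its first harmonic to the rest, in dB."""
    signal = noise = 0.0
    for bpm, amp in spec:
        in_band = abs(bpm - hr_ref) <= half_band or abs(bpm - 2 * hr_ref) <= half_band
        if in_band:
            signal += amp ** 2
        else:
            noise += amp ** 2
    return 10 * math.log10(signal / noise)
```

For example, a 10 s segment of a 1.2 Hz sine with a slow drift, sampled at a hypothetical 20 Hz, yields an HR estimate close to 72 bpm and a positive SNR when the reference HR is 72 bpm.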
Evaluation and statistics
For each patient and color channel, signal processing provided between 103 and 368 HR and SNR values (dependent on recording time and artifacts) which were taken into account for evaluation. To analyze the two measures across all subjects, we built an individual HR detection rate (HDR) and a median SNR from those segment-related values. The HDR was determined as the relative number (in %) of HRs that deviated less than 5 bpm from the reference HRs. The segments which were excluded beforehand due to missing ROIs were treated as inputs where the HR was falsely detected.
Our overall goal was to show how well the proposed ROI selection method performs in an intraoperative environment. We did not focus on further transformation techniques (e.g. source separation) to achieve the best possible HDR. Therefore, we assessed the results separately for each color channel. However, we regarded the NIR channel to be of special interest since a dedicated illumination setup was applied. For this reason, we tested whether the combination of the NIR channel with the best performing channel (here green) yields a significantly better HDR outcome than only using the green channel. We also evaluated the combinations G&B and G&R for reference purposes. The HDR values of a combination resulted from the assumption that, for a segment, the correct HR (if available) can always be selected between the two considered channels. The significance of the improvements was analyzed by employing a one-tailed Wilcoxon signed-rank test as follows: G to G&B, G to G&R, and G to G&NIR.
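The HDR and the idealized channel combination can be sketched as below. The function names and the use of `None` for segments without a ROI are assumptions of this sketch; the 5 bpm tolerance and the treatment of missing segments as false detections follow the text.

```python
def is_correct(estimate, reference, tol=5.0):
    """A segment counts as correct if its HR deviates less than tol bpm from the reference."""
    return estimate is not None and abs(estimate - reference) < tol


def hdr(estimates, references, tol=5.0):
    """HR detection rate in percent; None estimates (missing ROI) count as failures."""
    hits = sum(is_correct(e, r, tol) for e, r in zip(estimates, references))
    return 100.0 * hits / len(references)


def combined_hdr(est_a, est_b, references, tol=5.0):
    """HDR under the assumption that the correct channel can always be picked per segment."""
    hits = sum(is_correct(a, r, tol) or is_correct(b, r, tol)
               for a, b, r in zip(est_a, est_b, references))
    return 100.0 * hits / len(references)
```

Because the combination assumes perfect per-segment channel selection, it gives an upper bound on what a real fusion scheme could achieve and can never fall below the HDR of either channel alone.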
Results
ROI selection
For all 41 patients, appropriate ROIs were automatically detected and tracked in both the RGB and NIR videos. As mentioned before, in some rare cases, the ROI was not determinable. For the RGB and the NIR videos, the average numbers of segments affected by the absence of single ROIs were generally low, reaching a maximum of 8 and 31, respectively (see Fig. 4a). A further quality attribute of our method is how often the ROI detector had to be re-executed. Regarding the median value, the ROI was redetected in only 6 segments of the RGB videos and 2 segments of the NIR videos over the duration of the recording (see Fig. 4b). In the “Implementation and framework” section, it was described that the ROI stability was considered compromised if the standard deviation of the mean ROI intensity exceeded 50. Figure 4c visualizes the respective segment counts, indicating an overall low ROI fluctuation.
Fig. 4
Reliability metrics of the ROI selection process. a Number of segments (NoS) per patient in which single ROIs were absent. b NoS in which the ROI detector had to be re-executed. c NoS in which the standard deviation of the mean ROI intensity exceeded 50 (see “Implementation and framework” section). Each boxplot depicts 41 patient-related values
Figure 5 shows the selected ROIs of six patients at different states in the videos. The examples illustrate the strength of our approach, which is robust against illumination changes, limitations in the face’s visibility, and variations in scale and rotation. All ROIs contain homogeneously illuminated skin regions, which demonstrates that our method rejects relatively darker regions and regions that were not orthogonally aligned towards the camera (see Fig. 5a, c). Moreover, an ROI can consist of several unconnected regions and may have holes serving the purpose of homogeneity (see Fig. 5a, c, d). In Additional file 1 of this article, a video is linked which visualizes the described performance for an example. The advantage of using a separate segmentation step for the NIR image in the ROI detector becomes apparent when looking at Fig. 5a–c. The lighting situation in the NIR video was considerably different from the one in the RGB video. Therefore, a simple ROI registration based on the head’s pose would not have been sufficient since homogeneous areas were required.
Fig. 5
Selected ROIs for six different patients. The first two columns show the ROIs (only contour) for the RGB and NIR image at the beginning of the recording, the last two columns at a later point. If there was minor or no movement, the results in column 1 and 2 are similar to those in 3 and 4. Please note that in case the patient was identifiable, the eye section in the depicted images was blurred
We also tested the real-time capability of our method. Only the ROI detector needed longer processing times of about 10 s (MATLAB, i5-4590 @ 3.3 GHz on a single core). The tracking could be performed in real-time ( ms). In this study, we did not focus on creating an online method. Nevertheless, prospective works could speed up the algorithm to that end by implementing it in C++ and taking advantage of parallel computing.
HR detection and SNR
Figure 6a depicts the results of the HDR for the four color channels. Across all patients, the green channel provided the best outcome when applying our method (median of 95.6%). The NIR channel yielded a moderate detection rate (median of 76.2%) while the red and the blue channel are rather poor candidates to correctly detect the HR (median of 62.3 and 39.9%). The variation among the patients was the lowest for the green channel leaving only a small number of subjects with lower HDR values. Figure 6b shows the results of the SNR. As can be derived from the plot, the HDR is related to the quality of the cbPPG signals where the green channel also generates the best outcome (median of 3.9 dB) followed by the NIR, red, and blue channel in order of performance (median of −2.5, −4.1, and −6.4 dB). However, in contrast to the HDR, the variation among the individual SNR values proved to be higher for the better performing channels.
Fig. 6
Results of cbPPG measures when using the proposed method. a Heart rate detection rate for the red, green, blue and near-infrared channel. b Signal-to-noise ratio (SNR)
In the previous section, we explained our attempt to explore what contribution the blue, the red, and particularly the NIR channel might make within our method. The results reveal that all considered channel combinations yield significantly higher HR detection rates than the green channel alone (see Fig. 7). As presumed, the combination with the NIR channel involved the largest improvement in the median HDR (95.6 versus 97.3%). Furthermore, except for a few outliers, all patients showed rates above 88% in the G&NIR group while in the other groups, a relatively large number of subjects lay under 80%. In 29 of the 41 patients, the NIR channel was able to provide a correct HR at least once and up to 22 times (average of 4.6 segments) when all the other channels failed.
Fig. 7
Heart rate detection rate for the green channel in comparison to channel combinations. The combinations are determined assuming that always the correct heart rate (if available) can be selected between the two channels. Each boxplot depicts 41 patient-related values. The outcome of the statistical tests is shown above the boxes (***)
Discussion
Skin classifiers are an easy way to locate potential ROIs. For classification, most works in cbPPG applied absolute thresholds in the components of various color spaces, most often of the YCbCr space [17–21, 37]. We tested this classifier in our framework. The given thresholds led to a general overrepresentation of the skin areas, and we found them hard to adjust to changing conditions across a large amount of data. The Bayesian classifier we used was trained with pictures that comprised numerous skin tones captured in different environments and illumination conditions. Although it has rarely been employed for cbPPG so far [12, 38], we found it to be robust and its outcome to be well-controllable ( adjustment). We found that higher values (low false positive but also low true positive rate) lead to better ROIs since the classifier is only used to initialize the segmentation method, which is able to compensate for an underrepresentation of the skin (see Fig. 2). Level set segmentation is an iterative process in which the evolving contour has to reach a stable state. For the RGB images, stabilization was usually not an issue because the information of three color channels allowed a clearer separation. For the NIR images, more problems occurred. In rare cases, the contour increased or decreased uncontrollably. Additional knowledge about potential skin areas, e.g. by using in F, could solve those problems. However, it would require a reliable mapping of the RGB data on the NIR images.
Homogeneity is an important criterion in ROI selection. Rodríguez and Castro [25] applied a simple intensity threshold to exclude darker areas like the eyebrows. Yang et al. [9] built a roughness measure in sub-ROIs which was employed to select the smoothest regions. Bousefsaf et al. [26] used the lightness component of the CIE L*u*v space to create five regional clusters of which the best were eventually combined. Yet none of these methods allowed a pixel-wise selection that is continuous in time and space, as can be accomplished by level set segmentation.
Besides homogeneity, another advantage of our approach is that it depends neither on anatomical features nor on the manifestation of the cardiac pulse. There are only a few works which fall into this category. Wang et al. [20, 21] exclusively applied a skin classifier (see above) for ROI detection. Potential insufficiencies in the outcome, however, were disregarded as the group focused on signal processing. Similar to our procedure, Stricker et al. [39] employed skin classification in combination with a segmentation method, namely GrabCut [40]. Due to the resemblance, we decided to test the method for a number of images in our setting (see Fig. 8). We followed the description of the authors in which the result of the skin detector was first morphologically closed and then used for initialization in GrabCut. In comparison to our method, the GrabCut-based approach showed a systematic lack of performance as high-contrast non-skin and more heterogeneous skin areas were selected.
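The trade-off behind the adjustable Bayesian skin classifier discussed earlier in this section can be illustrated with a small sketch. It assumes the common formulation in which a pixel is labeled as skin when the likelihood ratio P(c | skin) / P(c | non-skin) exceeds a threshold; the exact formulation and parameter used in this work are not restated here. The four color bins, the histogram values, and all names are invented for illustration.

```python
def classify(color_bins, p_skin, p_nonskin, threshold):
    """Label a pixel as skin if its likelihood ratio exceeds the threshold."""
    eps = 1e-9  # guards against division by zero for empty histogram bins
    return [p_skin[c] / (p_nonskin[c] + eps) > threshold for c in color_bins]


def true_and_false_positive_rate(labels, truth):
    """Return (TPR, FPR) of predicted labels against ground-truth skin flags."""
    positives = sum(truth)
    negatives = len(truth) - positives
    tp = sum(1 for l, t in zip(labels, truth) if l and t)
    fp = sum(1 for l, t in zip(labels, truth) if l and not t)
    return tp / positives, fp / negatives


# Hypothetical class-conditional color histograms over four color bins.
p_skin = {0: 0.05, 1: 0.15, 2: 0.30, 3: 0.50}
p_nonskin = {0: 0.60, 1: 0.25, 2: 0.10, 3: 0.05}

# Ten toy pixels: their color bin and whether they truly show skin.
pixels = [3, 3, 2, 2, 1, 0, 0, 1, 1, 2]
is_skin = [True] * 5 + [False] * 5

for threshold in (0.5, 2.0, 5.0):
    labels = classify(pixels, p_skin, p_nonskin, threshold)
    print(threshold, true_and_false_positive_rate(labels, is_skin))
```

In this toy setting, raising the threshold from 0.5 to 5.0 removes all false positives but also lowers the true positive rate, which mirrors the behavior described for higher threshold values: the classifier then only seeds the segmentation with reliable skin pixels.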
Fig. 8
Comparison of the proposed method to a GrabCut-based approach. Three examples (RGB video) are depicted in the state of the initial ROI detection. The first column shows the result of the skin classifier. As in our method, it was used as initialization for GrabCut, although morphological closing was performed beforehand (see [39]). The last two columns show the final ROIs (only contour) in which the red arrows highlight the lack of performance of GrabCut. Please note that in case the patient was identifiable, the eye section in the depicted images was blurred. Due to eyebrows, eyelashes, and shadowing effects, the region around the eyes usually appears darker than the surrounding area
The SNR assesses the cbPPG signals’ quality based on the HR. The response characteristic of the different wavelengths coincides with the outcome of prior investigations regarding the quality of photoplethysmograms [41]. As a higher quality involves a stronger manifestation of the cardiac pulse, the chances of correctly detecting the HR also increase (see similarities in Fig. 6a, b). Nevertheless, the SNR measure has limitations since the stated relation does not always hold and a high HDR can be associated with a low SNR (see high variance in SNR plots). In general, the proposed method is able to select ROIs which provide cbPPG signals (green channel) that largely show a distinct pulsation and are scarcely degraded by artifacts. To a small degree, false HR detections are attributed to cases where no ROIs were found. The majority of false detections can be explained by situations in which the ROI detector was re-executed. Our tracking idea was to retain the regions’ homogeneity and avoid abrupt light changes. However, the reselection of the ROI does not consider prior intensity values and may lead to an edge in the cbPPG signal, hindering a valid HR extraction.
The NIR channel played a special role in our investigation since a separate camera and light source were used. Estepp et al. [42] already demonstrated that a multi-camera setting can enhance the HDR. In our setting, the dedicated NIR illumination yielded stable conditions in moments where the ambient light was low or strongly altered (see Figs. 5b and 9). Therefore, the NIR channel also made the highest contribution to maximizing the HDR (see Fig. 7). However, the problem of accurately mapping the ROI from the RGB to the NIR image remains. The application of cameras with a native alignment between the RGB and NIR channels (e.g. [43]) resolves this drawback.
Fig. 9
Signal examples where artifacts occurred. Related signal segments for the R, G, B and NIR channel where the HR was detected correctly solely in the NIR signal. The ROIs were well-defined in both videos. Light variations in the ambient light caused artifacts to occur in the RGB video while the NIR video remained unaffected (cardiac pulse is visible). Please note that the strength of the pulsatile component usually does not exceed units for the set color depth
Moço et al. [23, 24] revealed how homogeneously illuminated regions provide purer cbPPG signals that are less corrupted by BCG artifacts. Our method is able to select such regions. Furthermore, it is an alternative to the group’s methods, which also dealt with those artifacts but had to be calibrated beforehand.
We would like to emphasize again that we aimed at demonstrating the high performance of our ROI selection approach and not necessarily at reaching a maximum HDR. However, if certain applications require a reliable HR detector, appropriate signal processing steps can be subsequently executed. In our tests, even a simple principal component analysis of the R, G, and B channel signals led to detection rates over 99%.
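To make the idea of a principal component analysis on the R, G, and B channel signals more concrete, the following sketch extracts the dominant principal component of three synthetic channel signals using power iteration on their covariance matrix. It is not the processing chain used in this work: the frame rate, signal amplitudes, interference term, and function name are assumptions chosen for illustration, and a real pipeline would still need to decide which component carries the pulse.

```python
import math


def first_principal_component(channels, iterations=200):
    """Return (weights, component) of the dominant principal component."""
    n = len(channels[0])
    k = len(channels)
    # Remove each channel's mean so the covariance reflects variations only.
    centered = []
    for ch in channels:
        mean = sum(ch) / n
        centered.append([v - mean for v in ch])
    # Sample covariance matrix between the channels (k x k).
    cov = [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]
    # Power iteration approaches the eigenvector with the largest eigenvalue
    # (assuming the start vector is not orthogonal to it).
    w = [1.0] * k
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    # Project the centered channels onto the dominant direction.
    component = [sum(w[i] * centered[i][t] for i in range(k)) for t in range(n)]
    return w, component


# Synthetic 10 s recording at an assumed 30 frames per second.
fs = 30.0
t = [i / fs for i in range(300)]
pulse = [math.sin(2 * math.pi * 1.2 * x) for x in t]   # 72 bpm pulsation
drift = [math.sin(2 * math.pi * 0.3 * x) for x in t]   # slow interference
red = [100 + 0.2 * p + 0.05 * d for p, d in zip(pulse, drift)]
green = [120 + 1.0 * p for p in pulse]
blue = [90 + 0.1 * p for p in pulse]

weights, component = first_principal_component([red, green, blue])
print([round(w, 3) for w in weights])
```

In this synthetic example the pulsation is strongest in the green signal, so the green channel receives the largest weight in the dominant component.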
Intraoperative setting
To the best of our knowledge, we are the first to apply cbPPG during surgery with the patients being under general anesthesia. Rubīns et al. [27, 28] investigated the effect of vasodilation in the course of regional anesthesia using cbPPG, once in the NIR light range and once in the green range. Both times, they considered the inner region of a fixed hand (no movement) and built amplitude maps, which did not demand a prior ROI selection but presumed the presence of cardiac pulsations in signals from spatial subregions.
Conclusions
In this paper, we presented a fully automated ROI selection method for cbPPG. It overcomes the drawbacks of past approaches and, therefore, allowed us to employ cbPPG in patients with vascular disease in an intraoperative environment. The method relies neither on the visibility of anatomical features nor on the manifestation of the cardiac pulsation. Homogeneity in intensity and texture are the determining criteria for choosing and tracking ROIs. As a result, distinct and mostly undistorted photoplethysmograms could be obtained. Our method is easily transferable to other applications where other body sites are involved. Moreover, it can be run for multi-camera systems as long as one RGB camera is part of the setting. Eventually, the method enables prospective studies to focus on the benefit of using cbPPG during surgery. The spatial assessment of the cutaneous microcirculation might help anesthetists to better react to cardiovascular events and adjust the respective medication.
Additional file 1. A video showing the application of the proposed method. The video shows a moving face for which the proposed method was applied in order to select an ROI. For comparison purposes, the Viola-Jones face detector combined with the KLT feature tracker was employed [1, 2]. In contrast to this standard approach, our method only chooses homogeneously illuminated skin regions that are most suitable for cbPPG.
1. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. I-511–I-518 (2001).
2. Tomasi, C., Kanade, T.: Detection and Tracking of Point Features. Technical Report CMU-CS-91-132, Carnegie Mellon University (1991).