
LSTM-based real-time signal quality assessment for blood volume pulse analysis


Abstract

Remote photoplethysmogram (rPPG) is a low-cost method to extract the blood volume pulse (BVP). Crucial vital signs, such as heart rate (HR) and respiratory rate (RR), can be derived from the BVP for clinical medicine and healthcare applications. Compared to conventional PPG methods, rPPG is more promising because of its non-contact measurement. However, both BVP detection methods, especially rPPG, are susceptible to motion and illumination artifacts, which lead to inaccurate estimation of vital signs. Signal quality assessment (SQA) measures the quality of BVP signals and ensures the credibility of estimated physiological parameters, but existing SQA methods are not suitable for real-time processing. In this paper, we propose an end-to-end BVP signal quality evaluation method based on a long short-term memory network (LSTM-SQA). Two LSTM-SQA models were trained using the BVP signals obtained with PPG and rPPG techniques, so that the quality of BVP signals derived from each of these two methods can be evaluated respectively. As there is no publicly available rPPG dataset with quality annotations, we designed a training sample generation method based on blind source separation, with which two training datasets, one for PPG and one for rPPG, were built. Each dataset consists of 38,400 high- and low-quality BVP segments. The trained models were verified on three public datasets (the IIP-HCI dataset, the UBFC-Phys dataset, and the LGI-PPGI dataset). The experimental results show that the proposed LSTM-SQA models can effectively predict the quality of the BVP signal in real time.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Photoplethysmogram (PPG) [1,2] is a technology using pulse oximeters to detect blood volume pulse (BVP) signals. It is a simple and low-cost technique to monitor vital signs, including heart rate (HR), blood pressure [3], peripheral oxygen saturation [4], respiratory rate (RR) [5], HR variability (HRV) [6] and other signals indicating physiological processes.

Recently, an extended PPG technology named remote PPG (rPPG) has been attracting increasing attention [7,8]. With rPPG, video captured by cameras at a certain distance from the skin surface is used to estimate vital signs. It is more convenient and comfortable, since the data collection can be realized by a smartphone, a personal computer camera, or other high-definition webcams. Since rPPG overcomes the disadvantages of contact HR sensors, such a noncontact monitoring technique has numerous potential applications in public health. For example, a camera using rPPG technology can monitor multiple patients simultaneously in medical institutions, and paramedic glasses using rPPG technology can help medical staff monitor the physiological parameters of patients rapidly.

Unlike PPG devices, which normally use a built-in infrared light source with an emission wavelength of 800-940 nm, rPPG mostly uses ambient light and a consumer-level digital camera. Most digital cameras use the Bayer filter (a color filter array) to record the intensity of light in the red, green, and blue wavelength regions. With ambient light and a non-contact environment, rPPG is susceptible to external light sources [9] and motion artifacts, so the detected BVP signal contains both high-quality and low-quality segments. The high-quality segments improve the reliability of estimated parameters, whereas the low-quality segments corrupted by intense artifacts render rPPG-based algorithms useless. Therefore, identifying the quality of BVP signals is significant for the development of rPPG.

In the past few years, several methods for BVP signal quality assessment (SQA) have been proposed [10,11]. Waveform morphology analysis is a common method to identify good-quality pulses and concurrently reject artifacts in BVP signals. Sukor et al. [12] developed a waveform SQA based on pulse amplitude, pulse width, trough depth differences between successive pulse troughs, Euclidean distance, and amplitude ratios. Li and Clifford [13] took the correlation coefficient between the beat and a template from dynamic time warping (DTW) as an SQA feature. Furthermore, Sun et al. [14] improved the DTW method for a three-category task (good-quality segment, valid segment, invalid segment) and named the new method derivative dynamic time warping (DDTW). Orphanidou et al. [15] assessed whether reliable HRs can be obtained from electrocardiogram (ECG) and PPG signals with feasibility rules and adaptive template matching. Papini et al. [16] compared each pulse with a dynamic template via DTW barycenter averaging for quality index estimation. Seok et al. [17] developed three pulse quality indices based on Euclidean distance and DTW. Li et al. [18] used a Bayesian hypothesis testing method to analyze SQA features including baseline variation count, rising time, falling time, and saturation index. Karlen et al. [19] used Gaussian filters to select the correct slopes and the cross-correlation of consecutive pulse segments to assess signal quality. Alam et al. [20] proposed a set of novel SQA features including the ratio of cardiac diastole duration, the ratio of the summation of every complete PPG cycle length, and the variance of the absolute height ratio.

Not only do morphological features have the potential to identify the quality of BVP signals, but time-frequency and statistical characteristics can also be used to assess BVP signal quality in machine learning based methods. Elgendi [21] compared eight features (perfusion, kurtosis, skewness, relative power, non-stationarity, zero crossing, entropy, and the matching of systolic wave detectors) and four different classifiers (Mahalanobis distance, linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and SVM). Zhang and Pan [22] used an SVM classifier to determine signal quality through power spectrum characteristics, the correlation dimension, box-counting dimension, fuzzy entropy, and Lempel-Ziv complexity. Pereira et al. [23] also used the SVM classifier with 40 time-domain, frequency-domain, and non-linear features. Zhang et al. [24] described a method combining SVM and multi-feature fusion for assessing the signal quality of pulsatile waveforms. Sabeti et al. [25] proposed an approach using morphological features and machine learning that does not require single-beat segments. Pradhan et al. [26] evaluated five classifiers including k-nearest neighbor, multi-class SVM, Naïve Bayes, decision tree (DT), and random forest (RF). Liu et al. [27] presented a five-layer fuzzy neural network to predict the quality of the signal after feature extraction. Zaman and Morshed [28] used seventeen features extracted from the signals and investigated RF, artificial neural network (ANN), and SVM classifiers to assess signal quality. Pereira et al. [29] compared various supervised machine learning techniques (SVM, KNN, RF) and selected SVM as the main experimental classifier because of its better accuracy. Mohagheghian et al. [30] proposed a feature selection approach that yielded the best-suited feature subset, optimized to distinguish between clean and corrupted PPG segments.

Figures 1(a) and 1(b) illustrate rPPG samples of high-quality and low-quality signals at different analysis scales. In [16,17,21,24], the algorithms assessed the quality of single-beat segments, which requires the signal to be accurately segmented into heartbeat segments during preprocessing. However, for BVP signals from rPPG, peak detection and pulse detection algorithms often fail under motion noise [31]. Quality assessment methods for single-beat signals are therefore not applicable to BVP signals from rPPG.

Fig. 1. Different analysis scales in SQA research and BVP waveforms of high and low quality extracted by rPPG. (a) Single-beat-segment based quality assessment. (b) Fixed-length-segment based quality assessment. (c) Sampling-point based quality assessment for high- and low-quality hybrid signals; the signal in black is the quality score (high quality as “1” and low quality as “0”), which is the target output (label) of the proposed LSTM-SQA network.

There are few SQA methods for BVP signals extracted from rPPG (rPPG-SQA). Wang et al. [32] proposed a quality metric, comprised of a front-end metric and a back-end metric, to indicate the monitoring conditions and assess the reliability of pulse rate measurement. Fallet et al. [33] used the frame-to-frame average absolute difference between pairs of corresponding pixels for signal quality computation. Benezeth et al. [34] presented a probabilistic formulation of a cardiac signal quality index based on the Bayesian information criterion that encapsulates the characteristic shape of the rPPG signal. Ernst et al. [35] investigated six signal quality indexes (SQIs) from the literature in terms of their effect size and combined them into a novel SQI filter. The existing rPPG-SQA methods are unsupervised algorithms. Due to the lack of rPPG datasets with quality labels, the rPPG-SQA methods indirectly used the heart rate estimation error as an evaluation metric.

Supervised deep learning has shown amazing performance in various tasks, such as natural language processing, speech processing, and computer vision. Before the training process, deep learning requires ground-truth labels to construct loss functions for optimizing the parameters of the network. This makes supervised deep learning research on rPPG-SQA extremely difficult.

In this paper, we propose an LSTM-based signal quality assessment method (LSTM-SQA). It is an end-to-end algorithm that predicts the quality score of each sampling point. Due to the lack of rPPG datasets with quality annotations, we design an annotated-sample generation method based on blind source separation (BSS). This study contributes to the extant literature with:

1) A novel deep learning model based on LSTM is presented to identify the quality of BVP signals from PPG and rPPG in real time.

2) A supervised learning task for BVP signal quality assessment is proposed: on a high- and low-quality hybrid signal, as shown in Fig. 1(c), the quality scores of the signal are assessed continuously at each sampling point.

3) A BSS based sample generation method is presented to build training datasets with quality annotations of PPG and rPPG.

It should be noted that a preliminary version of this work was published in a peer-reviewed conference [36]. The original network was trained with quality scores annotated by traditional methods, so the trained network was an approximate function of traditional signal quality methods. In order to obtain a higher-performance supervised network without quality labels, inspired by pseudo-labeling [37] and data generation, we first propose a method to generate an approximate rPPG dataset with quality labels. We also extend our work by (1) simplifying the proposed LSTM-SQA for the real-time task and (2) performing additional experiments, including experiments on public datasets, the sample generation method, cross-domain tasks, cross-device tasks, and real-time performance.

The rest of the article is organized as follows. In Section 2, we describe the public datasets, propose the method of generating annotated data, present the details of the LSTM-SQA method, and introduce the evaluation metrics of the experiments. In Section 3, we present the evaluation results of LSTM-SQA, a hyper-parameter analysis, and a performance comparison with other common SQA features. In Sections 4 and 5, we present the discussion and conclusion, respectively.

2. Materials and methods

2.1 Databases

Three facial video datasets with annotated PPG signals for rPPG studies were used in our experiments. The first database was created by our laboratory [38] and consists of facial videos collected by two webcams (a Microsoft LifeCam Studio, M-cam, and an Aoni A36 webcam, A-cam) and PPG signals collected synchronously by a pulse oximeter (CMS50E). During data collection, the M-cam was fixed on the table to record facial video. The A-cam was fixed to a wearable hat through a connecting rod, so that frontal-face video could be collected even when the subjects rotated their heads. The two cameras recorded the same subject at the same time, and the A-cam suppressed motion noise by moving in sync with the subject's face. This provides two sets of rPPG signals of different quality, which is helpful for research on removing motion noise and obtaining better rPPG signals. The pulse oximeter was worn on the subject's finger, and the finger remained stationary during data collection. We set the heart rates extracted from the PPG signals as ground truth. Overall, our dataset contains a total of 156 minutes of facial videos of 16 males and 10 females performing a resting task, a talking task, and a facial rotation task. The average frame rate of the video collected by the M-cam is 30 FPS, the frame rate of the video collected by the A-cam is 25 FPS, and the average sampling rate of the signal collected by the CMS50E is 60 Hz.

The second dataset, UBFC-Phys [39], is a public dataset for psychophysiological studies. It contains 56 participants following a rest task (T1), a speech task (T2), and an arithmetic task (T3). During the experiment, the participants were filmed while wearing a wristband that measured their PPG and electrodermal activity (EDA) signals.

The third dataset, LGI-PPGI [40], records facial videos with a Logitech HD C270 webcam (L-cam) of 20 males and 5 females in the age range of 25-42. It also provides standard PPG signals collected by a CMS50E as ground truth. It consists of a resting session, a facial rotation session, a gym session, and a talking session.

2.2 Blind source separation (BSS)

Blind source separation (BSS) [41] is a class of algorithms that separate a set of source signals from a set of mixed signals. Independent component analysis (ICA) is a special case of BSS, and ICA-based denoising algorithms are widely used in rPPG studies to separate the BVP signal from the raw signals [42,43]. In this research, BVP signals and noise signals were separated by the ICA method, and all signals were used for the generation of training data and quality labels.

Ideally, the independent source signals, ${\boldsymbol {s}}(t)$, are mixed by a matrix, ${\boldsymbol {A}} = [{a_{ij}}] \in {\mathbb {R}^{m \times n}}$, to produce the set of mixed signals, ${\boldsymbol {x}}(t)$, as in Eq. (1). Usually, $n$ is equal to $m$. It is assumed that only the mixed signals ${\boldsymbol {x}}(t)$ can be observed; the ICA aims to invert Eq. (1). The BSS recovers an approximation of the original signals, ${\boldsymbol {y}}(t)$, by an unmixing matrix, ${\boldsymbol {W}} = [{w_{ij}}] \in {\mathbb {R}^{n \times m}}$, and the equation ${\boldsymbol {y}}(t) = {\boldsymbol {Wx}}(t)$. ${\boldsymbol {s}}(t)$, ${\boldsymbol {x}}(t)$ and ${\boldsymbol {y}}(t)$ can be further written in terms of their component signals as Eqs. (2)–(4), respectively.

$${\boldsymbol{x}}(t) = {\boldsymbol{As}}(t)$$
$${\boldsymbol{s}}(t) = {[{s_1}(t),{s_2}(t),\ldots,{s_n}(t)]^T}$$
$${\boldsymbol{x}}(t) = {[{x_1}(t),{x_2}(t),\ldots,{x_m}(t)]^T}$$
$${\boldsymbol{y}}(t) = {[{y_1}(t),{y_2}(t),\ldots,{y_n}(t)]^T}$$

There are multiple methods to obtain the matrix $\boldsymbol {W}$ and the approximate original signals $\boldsymbol {y}(t)$. For example, FastICA seeks the original components by maximizing the non-Gaussianity measured by kurtosis or negentropy. Whereas FastICA extracts the signals one by one, infomax ICA extracts multiple signals in parallel. Second-order blind identification (SOBI), based on second-order statistics, uses the time correlation and spectral differences of the observed signals to decorrelate the sources by jointly diagonalizing a group of cross-correlation matrices. We extracted the BVP signals by SOBI because of its better robustness.
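To illustrate the BSS workflow concretely, the sketch below separates two synthetically mixed sources with scikit-learn's FastICA. This is only a stand-in: our experiments used SOBI, which has no scikit-learn implementation, and the synthetic sources and mixing matrix here are invented for the example.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Synthetic example: a 1 Hz "pulse" source and a noise source, mixed linearly.
t = np.arange(0, 30, 1 / 30)                  # 30 s at 30 Hz
s_pulse = np.sin(2 * np.pi * 1.0 * t)         # stand-in for the BVP source
s_noise = np.random.default_rng(0).standard_normal(t.size)
S = np.c_[s_pulse, s_noise]                   # sources s(t), one per column
A = np.array([[1.0, 0.5], [0.7, 1.2]])        # mixing matrix A
X = S @ A.T                                   # observations x(t) = A s(t)

# Recover y(t) = W x(t). The order and scale of the outputs are undetermined,
# which is why the heartbeat component must be identified afterwards.
Y = FastICA(n_components=2, random_state=0).fit_transform(X)
```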

2.3 BVP signals and noise signals extraction

The basic flowchart of BVP and noise signal extraction is summarized in Fig. 2. We manually annotated the cheek and whole-face areas in the first frame of the video, and these areas are the regions of interest (ROIs) for the whole video. Each frame in the video was loaded as an RGB matrix, and the spatial means of the pixel values in the three color channels constructed the R (red), G (green), and B (blue) signals, respectively. Two ROIs over different skin areas provide more information, which makes the BSS method more stable and enriches the signals from different sources. The RGB signals were then decomposed by SOBI to obtain signals including the heartbeat source signal. Since ICA methods including SOBI have the fundamental limitation that the order of the outputs is undetermined, we manually selected the signals with a distinct heartbeat waveform as the high-quality signals and treated the remaining signals as noise signals.
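A minimal sketch of this extraction step is shown below, assuming OpenCV, fixed rectangular ROIs (the coordinates and file name are hypothetical), and one R, G, B triple per ROI per frame; the actual implementation may track the ROIs differently.

```python
import cv2
import numpy as np

def extract_rgb_traces(video_path, rois):
    """Spatial means of each color channel inside each ROI, frame by frame.
    `rois` is a list of (x, y, w, h) boxes, here assumed fixed over the video."""
    cap = cv2.VideoCapture(video_path)
    traces = []
    while True:
        ok, frame = cap.read()                # OpenCV frames are BGR
        if not ok:
            break
        row = []
        for (x, y, w, h) in rois:
            patch = frame[y:y + h, x:x + w]
            b, g, r = patch.reshape(-1, 3).mean(axis=0)
            row.extend([r, g, b])             # one R, G, B triple per ROI
        traces.append(row)
    cap.release()
    return np.asarray(traces)                 # shape: (n_frames, 3 * n_rois)

# Hypothetical fixed boxes for the cheek and whole-face ROIs.
signals = extract_rgb_traces("subject01.avi",
                             [(220, 260, 60, 40), (180, 120, 160, 200)])
```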

Fig. 2. A flowchart of BVP signal and noise signal extraction.

2.4 Development of LSTM-SQA

2.4.1 Training sample generation method (TSGM)

The training sample generation method (TSGM) generated 38,400 partially corrupted 30-second signals $\boldsymbol {s}_{gen}$ with a sampling rate of 30 Hz and the corresponding quality labels $\boldsymbol {label}_{gen}$. Most signals contained both high-quality and low-quality segments; therefore, the task of LSTM-SQA is to predict the signal quality continuously. For each corrupted signal generated by TSGM, we randomly selected a 30-second high-quality segment $\boldsymbol {s}_{high}$, which has distinguishable systolic and diastolic waves, from the BVP signals. The $\boldsymbol {s}_{gen}(\boldsymbol {t})$ and $\boldsymbol {label}_{gen}(\boldsymbol {t})$ were initialized by Eqs. (5) and (6). The $\boldsymbol {t}_{low}$ is a vector of consecutive integers that marks the corrupted segment in $\boldsymbol {s}_{gen}(\boldsymbol {t})$, as in Eq. (7), where $n_l$ is the start point of the corrupted segment in $\boldsymbol {s}_{gen}(\boldsymbol {t})$ and $l_n$ is the length of the corrupted segment. A segment $\boldsymbol {s}_{noise}$ of length $l_n$ was selected from the noise signals extracted by the SOBI method. To simulate more complex environments, the generated signal was updated $k$ times. At each iteration, the values of the corrupted segment were linearly combined with the values of the noise segment using a weight $r^k_{inten}$, as in Eq. (8); the sampling points of the corrupted segment were labeled “0”, as in Eq. (9); and all values of $\boldsymbol {s}^k_{gen}(\boldsymbol {t})$ and $\boldsymbol {label}^k_{gen}(\boldsymbol {t})$ outside the corrupted segment kept the values of $\boldsymbol {s}^{k-1}_{gen}(\boldsymbol {t})$ and $\boldsymbol {label}^{k-1}_{gen}(\boldsymbol {t})$ from the previous iteration.

Since the BVP signal may be corrupted by noise of any intensity at any time, inspired by data augmentation [44], the BVP signals $\boldsymbol {s}_{high}$, the start points $n_l$, the lengths $l_n$ and the noise signals $\boldsymbol {s}_{noise}$ were randomly selected. The noise segment was multiplied by a random weight $r_{inten}$ to control the intensity of the noise before being added. The TSGM reduces the risk of network overfitting and makes the model robust to noise at different positions, of different intensities, and with different lengths.

$$\boldsymbol{s}^0_{gen}(\boldsymbol{t}) = \boldsymbol{s}_{high}(\boldsymbol{t}), \boldsymbol{t}=[1,2,3,\ldots,900]$$
$$\boldsymbol{label}^0_{gen}(\boldsymbol{t}) = \boldsymbol{1}, \boldsymbol{t}=[1,2,3,\ldots,900]$$
$$\boldsymbol{t}^k_{low}=[n^k_l,n^k_{l}+1,n^k_{l}+2,\ldots,n^k_{l}+l^k_n], n^k_{l}+l^k_n\leq900$$
$$\boldsymbol{s}^k_{gen}(\boldsymbol{t}^k_{low}) = \boldsymbol{s}^{k-1}_{gen}(\boldsymbol{t}^{k}_{low}) + r^{k}_{inten}\boldsymbol{s}^{k}_{noise}$$
$$\boldsymbol{label}^k_{gen}({\boldsymbol{t}}^k_{low}) = \boldsymbol{0}$$
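A minimal NumPy sketch of one TSGM run following Eqs. (5)–(9) might look as follows; the number of iterations $k$ and the range of the noise weight $r_{inten}$ are illustrative assumptions, as they are not fixed in the equations above.

```python
import numpy as np

rng = np.random.default_rng(0)

def tsgm(s_high, noise_pool, k=3, fs=30, duration=30):
    """Generate one partially corrupted signal and its labels (Eqs. (5)-(9)).
    s_high: clean 30 s BVP segment; noise_pool: long noise signal from SOBI.
    k and the noise-weight range below are illustrative assumptions."""
    n = fs * duration                               # 900 sampling points
    s_gen = s_high[:n].astype(float)                # Eq. (5)
    label = np.ones(n)                              # Eq. (6)
    for _ in range(k):
        l_n = rng.integers(fs, n // 2)              # corrupted-segment length
        n_l = rng.integers(0, n - l_n)              # start point, n_l + l_n <= 900
        t_low = np.arange(n_l, n_l + l_n)           # Eq. (7)
        start = rng.integers(0, noise_pool.size - l_n)
        s_noise = noise_pool[start:start + l_n]
        r_inten = rng.uniform(0.5, 2.0)             # random noise intensity
        s_gen[t_low] += r_inten * s_noise           # Eq. (8)
        label[t_low] = 0.0                          # Eq. (9)
    return s_gen, label
```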

In addition to high-quality signals manually selected from rPPG, there are standard pulse signals collected by a PPG device. The PPG signal has obvious pulsation waveforms and is more easily distinguished from noise by eye. So, with the noise signals extracted from rPPG, we can generate high- and low-quality hybrid PPG signals by the same method. The tasks are thus divided into rPPG-noise classification and PPG-noise classification.

2.4.2 Long short-term memory network (LSTM)

LSTM has solved a variety of problems that require long-range contextual information. It is widely applied in music generation, speech recognition, and real-time translation systems, and in the image domain in tasks such as activity recognition and video description. Recently, LSTM-based networks were employed to estimate HR from wearable PPG [45,46]. It is a low-cost algorithm, and its computational complexity is $O{(n)}$ for a signal with $n$ sampling points.

The LSTM block contains three gates (an input gate ${{\boldsymbol {i}}_t}$, a forget gate ${{\boldsymbol {f}}_t}$ and an output gate ${{\boldsymbol {o}}_t}$) and the memory cell ${{\boldsymbol {c}}_t}$. The input gate controls which information flows into the memory cell from the new input data, the forget gate controls which information remains in the cell, and the output gate controls which information in the cell is used as the output of the LSTM block. The vector formulas for an LSTM block forward pass can be written as:

$${{\boldsymbol{f}}_t} = \sigma ({{\boldsymbol{W}}_{\boldsymbol{f}}}{{\boldsymbol{e}}_t} + {{\boldsymbol{R}}_{\boldsymbol{f}}}{{\boldsymbol{h}}_{t - 1}} + {{\boldsymbol{b}}_{\boldsymbol{f}}})$$
$${{\boldsymbol{i}}_t} = \sigma ({{\boldsymbol{W}}_{\boldsymbol{i}}}{{\boldsymbol{e}}_t} + {{\boldsymbol{R}}_{\boldsymbol{i}}}{{\boldsymbol{h}}_{t - 1}} + {{\boldsymbol{b}}_{\boldsymbol{i}}})$$
$${{\boldsymbol{o}}_t} = \sigma ({{\boldsymbol{W}}_{\boldsymbol{o}}}{{\boldsymbol{e}}_t} + {{\boldsymbol{R}}_{\boldsymbol{o}}}{{\boldsymbol{h}}_{t - 1}} + {{\boldsymbol{b}}_{\boldsymbol{o}}})$$

At time $t$, the input feature $\boldsymbol {e}_t \in {\mathbb {R}^{d \times 1}}$ is fed into the LSTM block. The model updates the forget gate ${{\boldsymbol {f}}_t}$, the input gate ${{\boldsymbol {i}}_t}$ and the output gate ${{\boldsymbol {o}}_t}$ by Eqs. (10)–(12), respectively. The matrices $\boldsymbol {W}_f \in {\mathbb {R}^{h \times d}}$, $\boldsymbol {W}_i \in {\mathbb {R}^{h \times d}}$ and $\boldsymbol {W}_o \in {\mathbb {R}^{h \times d}}$ are the weight matrices for the input feature. The matrices $\boldsymbol {R}_f \in {\mathbb {R}^{h \times h}}$, $\boldsymbol {R}_i \in {\mathbb {R}^{h \times h}}$ and $\boldsymbol {R}_o \in {\mathbb {R}^{h \times h}}$ are the weight matrices for the hidden state vector $\boldsymbol {h}_{t-1} \in {\mathbb {R}^{h \times 1}}$ from the previous time step $t-1$. The vectors $\boldsymbol {b}_{f} \in {\mathbb {R}^{h \times 1}}$, $\boldsymbol {b}_{i} \in {\mathbb {R}^{h \times 1}}$ and $\boldsymbol {b}_{o} \in {\mathbb {R}^{h \times 1}}$ are bias vectors.

$${{\boldsymbol{\tilde c}}_t} = \tanh ({{\boldsymbol{W}}_{\boldsymbol{c}}}{{\boldsymbol{e}}_t} + {{\boldsymbol{R}}_{\boldsymbol{c}}}{{\boldsymbol{h}}_{t - 1}} + {{\boldsymbol{b}}_{\boldsymbol{c}}})$$
$${{\boldsymbol{c}}_t} = {{\boldsymbol{f}}_t} \odot {{\boldsymbol{c}}_{t - 1}} + {{\boldsymbol{i}}_t} \odot {{\boldsymbol{\tilde c}}_t}$$
$${{\boldsymbol{h}}_t} = {{\boldsymbol{o}}_t} \odot \tanh ({{\boldsymbol{c}}_t})$$

The model preprocesses the input feature and the hidden state vector $\boldsymbol {h}_{t-1}$ from the previous time step $t-1$ by Eq. (13). The matrix $\boldsymbol {W}_c \in {\mathbb {R}^{h \times d}}$ is the weight matrix of the input vector, the matrix $\boldsymbol {R}_c \in {\mathbb {R}^{h \times h}}$ is the weight matrix of the hidden state vector, and the vector $\boldsymbol {b}_{c} \in {\mathbb {R}^{h \times 1}}$ is the bias vector. The memory cell $\boldsymbol {c}_t$ at the current time $t$ is updated by Eq. (14): the forget gate controls the information retained from the last memory cell $\boldsymbol {c}_{t-1}$, and the input gate controls the information taken from the cell input activation vector ${\boldsymbol {\tilde c}}_t$. The output vector of the LSTM, $\boldsymbol {h}_t$, is computed by Eq. (15). The operator $\odot$ denotes the Hadamard product.

The LSTM block is mainly composed of matrix operations. When the input vector is in $d$-dimensional space and the hidden state vector is in $h$-dimensional space, and disregarding the nonlinear operations in the activation functions, the computational complexity of forward propagation is $4(hd + h^2) + 3h$ floating-point multiplications. The dimensions of the input vector and the hidden state vector are the main factors that affect the computational complexity.
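For concreteness, a single forward step of Eqs. (10)–(15) can be written directly in NumPy; the weights below are random placeholders, not trained parameters.

```python
import numpy as np

d, h = 8, 8                                   # input and hidden dimensions
rng = np.random.default_rng(0)
W = {g: rng.standard_normal((h, d)) for g in "fioc"}  # input weights
R = {g: rng.standard_normal((h, h)) for g in "fioc"}  # recurrent weights
b = {g: np.zeros((h, 1)) for g in "fioc"}             # bias vectors

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(e_t, h_prev, c_prev):
    """One forward pass of Eqs. (10)-(15): 4(hd + h^2) + 3h multiplications."""
    f = sigmoid(W["f"] @ e_t + R["f"] @ h_prev + b["f"])        # Eq. (10)
    i = sigmoid(W["i"] @ e_t + R["i"] @ h_prev + b["i"])        # Eq. (11)
    o = sigmoid(W["o"] @ e_t + R["o"] @ h_prev + b["o"])        # Eq. (12)
    c_tilde = np.tanh(W["c"] @ e_t + R["c"] @ h_prev + b["c"])  # Eq. (13)
    c_t = f * c_prev + i * c_tilde                              # Eq. (14)
    h_t = o * np.tanh(c_t)                                      # Eq. (15)
    return h_t, c_t
```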

2.4.3 LSTM-SQA

In this paper, we propose a signal quality assessment method based on a long short-term memory network (LSTM-SQA). It predicts the quality score of each sampling point, one by one, in real time. The details of the algorithm are shown in Fig. 3. At the input stage, the value of the sampling point is fed into the model. First, the input value is encoded by a fully connected layer. Then the memory cell is updated using the memory cell vector $\boldsymbol {c}_{t-1}$ and the hidden vector $\boldsymbol {h}_{t-1}$ from the previous time step, and the LSTM module produces an output vector $\boldsymbol {h}_{t}$. Finally, the output passes through two fully connected layers, and the model gives the final score of the current sampling point. When a new sampling point at time $t+1$ is input into the model, the above steps are repeated to predict the quality score at time ${t+1}$.
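A minimal Keras sketch of this structure (the ${F_8},{L_8},{F_8},{F_1}$ variant evaluated in Section 3.2) is given below; the activation functions and the use of a stateful LSTM with batch size 1 are our assumptions for point-by-point scoring, as Fig. 3 does not fix them.

```python
import numpy as np
import tensorflow as tf

# Stateful LSTM with batch size 1, so the memory cell persists between calls
# and one sampling point can be scored at a time. Activations are assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="tanh",
                          batch_input_shape=(1, 1, 1)),      # input encoder
    tf.keras.layers.LSTM(8, stateful=True, return_sequences=True),
    tf.keras.layers.Dense(8, activation="tanh"),
    tf.keras.layers.Dense(1, activation="sigmoid"),          # score in [0, 1]
])

def score_point(x):
    """Predict the quality score of a single incoming sampling point."""
    return float(model(np.array([[[x]]], dtype="float32"))[0, 0, 0])
```

Between independent signals, `model.reset_states()` would clear the memory cell so that quality scores do not leak across recordings.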

Fig. 3. Internal structure of LSTM-SQA. The net contains three units: the first unit is a fully connected layer for the input; the second unit is a standard LSTM block; the third unit is two fully connected layers for the output. LSTM-SQA receives sampling points one by one and, based on the value of each sampling point and the information retained in the memory cell, predicts the corresponding quality score $\widehat {label_{gen}}$.

3. Result

3.1 Evaluation of the proposed approach

This section presents three evaluation experiments to test the performance of LSTM-SQA. In the first experiment, we focused on the computational complexity of LSTM-SQA. In order to verify whether LSTM-SQA can provide real-time quality scores, we calculated the number of floating-point multiplications required for forward propagation and measured the time overhead of LSTM-SQA using the Tensorflow and Numpy libraries in Python.

In the second experiment, we focused on the classification accuracy of LSTM-SQA. Since each sampling point is labeled “1” in high-quality segments and “0” in low-quality segments, we assigned “1” if the output of LSTM-SQA was greater than 0.5 and “0” otherwise. With sufficient data generated by TSGM, the first 90% of the data was used as the training set and the remaining 10% as the validation set. The experiment consists of two parts, both measured by classification accuracy: (1) hyper-parameter tuning, in which the performance of LSTM-SQA is tested with different numbers of neurons in different layers (the number of neurons directly affects the computational complexity and real-time performance of the network); and (2) experiments on public datasets, for which we chose 10 subjects (s10 to s20) from the UBFC-Phys dataset and 6 subjects (Alex, Angelo, Cpi, David, Felix, Harun) from the LGI-PPGI dataset.

In the final experiment, the real data (the G signal described in Fig. 2) was used as the test set. As there are no quality labels, we evaluated the output of LSTM-SQA through the estimation error of heart rate (EE-HR). Each sampling point in the G signal was annotated with the EE-HR as in Eq. (16). The EE-HR at time $t$ was calculated from $HR(t)$, obtained from an 8-second sliding window (from $t-8$ s to $t$) of the PPG signal, and $HR(t)'$, obtained from the G signal over the same window. In order to avoid the influence of outliers on model training, we capped the EE-HR at 50 bpm. This experiment aims to verify whether LSTM-SQA can guarantee the reliability of the parameters extracted from the signal and reduce the estimation error. We also analyzed the importance of each part of TSGM: we removed different parts of TSGM and used the weighted average error as in Eq. (17) to check for performance declines.

$$EE{-}HR(t) = \begin{cases} \left| {HR(t)' - HR(t)} \right|, & \left| {HR(t)' - HR(t)} \right| < 50\\ 50, & \left| {HR(t)' - HR(t)} \right| \ge 50 \end{cases}$$
$$WEE{-}HR = \frac{\sum{\left(\widehat{label_{gen}}(t)\times EE{-}HR(t)\right)}}{\sum{\widehat{label_{gen}}(t)}}$$
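As a worked sketch, Eqs. (16) and (17) reduce to a few lines of NumPy; the HR series and quality scores below are made-up values for illustration only.

```python
import numpy as np

def ee_hr(hr_g, hr_ppg, cap=50.0):
    """Point-wise HR estimation error, capped at 50 bpm (Eq. (16))."""
    return np.minimum(np.abs(hr_g - hr_ppg), cap)

def wee_hr(scores, errors):
    """Quality-score-weighted average of the EE-HR (Eq. (17))."""
    return np.sum(scores * errors) / np.sum(scores)

# Made-up per-point HR estimates and LSTM-SQA quality scores.
hr_ppg = np.array([72.0, 73.0, 71.0, 70.0])   # reference HR from PPG
hr_g   = np.array([72.5, 74.0, 40.0, 150.0])  # HR from the G signal
scores = np.array([0.9, 0.8, 0.2, 0.1])       # predicted quality scores
print(wee_hr(scores, ee_hr(hr_g, hr_ppg)))    # low-score points weigh less
```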

3.2 Hyper-parameter tuning

Deep learning models usually have a large number of parameters and high computational complexity, which makes them inapplicable to real-time systems. We first tested different numbers of neurons in each layer for hyper-parameter tuning. Table 1 shows the number of parameters and the number of multiplications required for forward propagation of LSTM-SQA under different hyper-parameters. The number of parameters determines the memory required to load the model, and the number of multiplications determines the computational complexity of the model. The symbol ${F_n}$ denotes a fully connected layer with $n$ neurons, and ${L_m}$ denotes an LSTM layer with hidden state and output vectors in $m$-dimensional space. Thus ${F_8},{L_8},{F_8},{F_1}$ denotes a four-layer model whose input layer is a fully connected layer, second layer is an LSTM layer, third layer is a fully connected layer, and output layer is a fully connected layer. Clearly, the proposed LSTM-SQA with 616 parameters is a lightweight model compared to typical deep learning models (VGG16 with about 138 million parameters, AlexNet with about 60 million parameters).

Table 1. Number of parameters in LSTM-SQA under different hyper-parameters.

Figure 4 shows boxplots of the accuracy obtained with different numbers of neurons in each layer. To explore how the number of neurons in each layer affects accuracy, we fixed the other layers and observed the performance and stability of the model with different numbers of neurons in the tested layer; each experiment was repeated 10 times.

Fig. 4. Ranges of accuracy for different numbers of neurons in each layer. (a) Ranges of accuracy for different numbers of neurons in a fully connected layer before the LSTM layer. (b) Ranges of accuracy for different dimensions of the hidden state vector in a single LSTM layer. (c) Ranges of accuracy for different numbers of neurons in a fully connected layer following the LSTM layer.

Figure 4(a) shows the accuracy obtained with different numbers of neurons in the first fully connected layer. As the number of nodes in the fully connected layer increases, the distribution of accuracy becomes more concentrated and the performance of the model becomes more stable.

Figure 4(b) shows the results for different dimensions of the hidden state vector in the LSTM. The dimension of the hidden vector directly affects the performance and stability of the model. It can be observed that the average and maximum accuracy of the model increase with the dimension of the hidden vector, so the LSTM structure is the most important part of the model.

Figure 4(c) shows that the fully connected layer following the LSTM layer behaves similarly to the first layer under different numbers of neurons.

3.3 Run-time complexity

Although a larger number of parameters in LSTM-SQA yields better performance, the ${F_8},{L_{8}},{F_8},{F_1}$ model, with its lightweight parameterization and sufficient performance, is more suitable for real-time systems. Therefore, we focus on the ${F_8},{L_{8}},{F_8},{F_1}$ model in the following experiments.

In the ${F_8},{L_8},{F_8},{F_1}$ model, the first fully connected layer has 8 neurons and requires $(1 \times 8)$ floating-point multiplications, the LSTM layer requires $(4 \times 8 \times 8 + 4 \times 8 \times 8 + 3 \times 8)$ floating-point multiplications, the third layer with 8 neurons requires $(8 \times 8)$ floating-point multiplications, and the output fully connected layer with 1 neuron requires $(8 \times 1)$ floating-point multiplications. Every forward propagation of LSTM-SQA therefore requires $616$ floating-point multiplications. Omitting the non-linear operations in the activation functions, the hardware running this model needs to sustain about 36000 floating-point operations per second (36 kFLOPS), given a BVP sampling rate of 30 Hz and counting a single multiply-accumulate operation as 2 floating-point operations. Considering that the mobile processor Nvidia Tegra K1, shipped in the first half of 2014, has a peak performance of 365 GFLOPS, LSTM-SQA has minimal requirements for computing performance.
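The multiplication count above can be reproduced with a few lines of arithmetic:

```python
d = 1                                    # raw input: one sampling point
f1 = h = f3 = 8                          # layer widths of F8, L8, F8, F1
mults = (d * f1                          # first fully connected layer: 8
         + 4 * (h * f1 + h * h) + 3 * h  # LSTM layer, 4(hd + h^2) + 3h: 536
         + h * f3                        # fully connected layer: 64
         + f3 * 1)                       # output layer: 8
fs = 30                                  # BVP sampling rate (Hz)
print(mults)                             # 616 multiplications per point
print(mults * fs * 2)                    # 36960, i.e. about 36 kFLOPS
```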

When the sampling rate of the BVP signal is 30 Hz, the model receives an input value every 33 milliseconds, so the model needs to complete its internal matrix operations within 33 milliseconds. We conducted simulation experiments on Tensorflow-cpu 1.14.0 and Numpy 1.92.0 to verify the time consumption of forward propagation. The CPU platform is an Intel Core i7-8700 CPU @ 3.20 GHz with 8 GB of memory. Over 1000 runs, the average time of a forward propagation on Tensorflow-cpu is 0.92 milliseconds with a maximum of 2 milliseconds, while on Numpy the average is 0.04 milliseconds with a maximum of 1.03 milliseconds. The time consumed by forward propagation is much less than the sampling interval, hence the requirements for real-time quality score calculation can be met.
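A self-contained microbenchmark in the spirit of this test, timing one hand-written ${F_8},{L_8},{F_8},{F_1}$ forward pass with placeholder weights (biases omitted for brevity), might look as follows; absolute timings will of course differ across machines:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
# Placeholder weights for an F8, L8, F8, F1 forward pass (biases omitted).
W_in  = rng.standard_normal((8, 1))    # input fully connected layer
W_l   = rng.standard_normal((32, 8))   # LSTM input weights, 4 gates stacked
R_l   = rng.standard_normal((32, 8))   # LSTM recurrent weights, 4 gates stacked
W_fc  = rng.standard_normal((8, 8))    # fully connected layer after the LSTM
W_out = rng.standard_normal((1, 8))    # output layer
h_prev, c_prev = np.zeros((8, 1)), np.zeros((8, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    global h_prev, c_prev
    e = np.tanh(W_in @ x)
    z = W_l @ e + R_l @ h_prev                  # all four gates in one matmul
    f, i, o = sigmoid(z[:8]), sigmoid(z[8:16]), sigmoid(z[16:24])
    c_prev = f * c_prev + i * np.tanh(z[24:])   # Eq. (14)
    h_prev = o * np.tanh(c_prev)                # Eq. (15)
    return sigmoid(W_out @ np.tanh(W_fc @ h_prev))

x = np.ones((1, 1))
t0 = time.perf_counter()
for _ in range(1000):
    forward(x)
print(1e3 * (time.perf_counter() - t0) / 1000, "ms per forward pass")
```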

We verified the real-time performance of the algorithm with a laptop (HP ProBook 455 G7 15.6" notebook with a Ryzen 7 4700U @ 2.0 GHz and 32 GB RAM) and an open-source program for real-time remote heart rate estimation from rPPG [47]. As described in Fig. 2, we used the built-in webcam to record facial video and compute the G signal. Then, the quality score of the current sampling point was calculated. We modified the GUI code to display the G signal and the quality scores in real time. Figure 5 illustrates the Python interface of the program. In the first half of the experiment, the subjects were asked to sit still, while in the latter part they were asked to shake their heads. LSTM-SQA accurately detected the motion noise, and the quality score changed from high to low.

Fig. 5. A Python interface developed for visualizing the real-time performance and the results of proposed LSTM-SQA. The signal in green is the G signal of rPPG and the signal in white is the quality score predicted by LSTM-SQA.

Results of the proposed LSTM-SQA on signals generated by TSGM are shown in Fig. 6. When the signal quality changes, the LSTM-SQA prediction follows within a short period. The prediction results almost coincide with the ground truth.

Fig. 6. Results of the proposed LSTM-SQA. The black line represents the ground truth; label “1” is high quality and label “0” is low quality. The black dotted line is the output of LSTM-SQA. (a) Low-quality segment first, high-quality segment in the middle, and low-quality segment at the end. (b) High-quality segment first, low-quality segment in the middle, and high-quality segment last. (c) A signal labeled entirely high quality in the ground truth, but a sudden change of amplitude can be observed around 16 seconds.

3.4 SQAs comparison

The experiment covers multiple traditional SQA features, including skewness $S_{SQA}$ [21], kurtosis $K_{SQA}$ [21], entropy $E_{SQA}$ [28], signal-to-noise ratio $N_{SQA}$ [28], zero crossing rate $Z_{SQA}$ [21], template matching $TM_{SQA}$ [17], the standard deviation of the peak-to-peak time interval $ST_{SQA}$ [23], an MLP [13], a 1D-CNN, and a 2D-CNN [48,49]. Considering that the traditional features are unsuitable for a real-time environment, the aim of this experiment is to assess the quality of a whole signal of fixed length.

In this section, two tasks were designed to discriminate noise from the high-quality BVP signals obtained by PPG and rPPG. The classification task C$_{rPPG}$ was to distinguish the BVP signals from the noise source signals extracted by rPPG in Section 2.3, and the task C$_{PPG}$ was to distinguish the BVP signals extracted by PPG from the noise source signals. The signals were sliced into 8-second segments; segments of BVP signals were labeled “1” and segments of noise source signals were labeled “0”.

For LSTM-SQA, we selected the ${F_8},{L_8},{F_8},{F_1}$ models trained in Section 3.2 for comparison. The models directly give the quality score of each sampling point, as shown in Fig. 6, and the mean of the quality scores over the entire signal represents the score of the signal to be assessed. If the mean value is greater than 0.5, the input signal is classified as a BVP signal; otherwise it is classified as a noise signal. For the traditional features, we selected SVM as the classifier and randomly selected 100 positive and 100 negative samples as the test set. The results are shown in Table 2.
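This segment-level decision rule reduces to a threshold on the mean score; a minimal sketch:

```python
import numpy as np

def classify_segment(point_scores, threshold=0.5):
    """Label a fixed-length segment as BVP ("1") or noise ("0")
    by averaging the per-point LSTM-SQA scores."""
    return int(np.mean(point_scores) > threshold)

# Example: a segment whose points are mostly scored as high quality.
print(classify_segment([0.9, 0.8, 0.7, 0.4, 0.6]))  # -> 1
```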

Table 2. Performance results of traditional features and LSTM-SQA (unit: %)

Among the single features, $Z_{SQA}$ achieves the best accuracy. Reaching accuracies close to 90%, $K_{SQA}$, $E_{SQA}$, $N_{SQA}$ and $TM_{SQA}$ are also useful features for signal quality assessment. When all the above-mentioned features are used as the input of the SVM, the accuracies reach 97.5% and 99.5% on the C$_{rPPG}$ and C$_{PPG}$ tasks, respectively. For LSTM-SQA, the average accuracy reaches higher than 98%. LSTM-SQA can thus be generalized to the cross-task setting (quality assessment of a fixed-length segment) by averaging the scores of the sampling points, and its performance is better than that of the traditional methods. Most supervised deep learning based methods reach accuracies of 100%; however, they cannot continuously assess the signal quality as shown in Fig. 6.

3.5 Performance analysis on public rPPG datasets

In order to verify LSTM-SQA, we generated high- and low-quality hybrid signals by BSS on the IIP-HCI, UBFC-Phys, and LGI-PPGI datasets. With corrupted high-quality BVP signals obtained by PPG and rPPG, the tasks can be divided into PPG-based and rPPG-based tasks. We designed six classification tasks: a PPG-based task (HCI$_{PPG}$) and an rPPG-based task (HCI$_{rPPG}$) on the IIP-HCI dataset, a PPG-based task (UBFC$_{PPG}$) and an rPPG-based task (UBFC$_{rPPG}$) on the UBFC-Phys dataset, and a PPG-based task (LGI$_{PPG}$) and an rPPG-based task (LGI$_{rPPG}$) on the LGI-PPGI dataset. The average accuracies are shown in Table 3.

Table 3. Performance results on public datasets. (unit: %)

In this experiment, the performance of LSTM-SQA on UBFC$_{PPG}$ was poor. After an in-depth analysis of the data, we found that not all the PPG signals were truly “high quality”: the heartbeat waveform in some PPG signals collected by the wristband, such as s10_T2, s11_T2 and s11_T3, cannot easily be recognized even by the human eye. This contradicts our hypothesis that the PPG signals are entirely high-quality signals and eventually leads to an average accuracy of 67.91%. For the LGI$_{PPG}$ task, where most of the PPG signals are genuinely high quality, an accuracy higher than 82.50% is obtained. When the oximeter was kept stationary during data collection, the accuracy reached 89.36% on the HCI$_{PPG}$ task. For the rPPG tasks, where we manually selected high-quality signals, the real-time performance reached 87.31%, 82.40% and 85.23% on IIP-HCI, UBFC-Phys and LGI-PPGI, respectively. LSTM-SQA thus performs well on multiple datasets for signal quality prediction.

3.6 TSGM ablation study

In Table 4, we compare the weighted average of the HR estimation error, where the HR errors were calculated from the PPG and G signals and the weights were given by different LSTM-SQA models trained on datasets generated with variants of TSGM. When the LSTM-SQA model and the TSGM algorithm are effective, the predicted score is high for signals with a small heart rate error and low for signals with a large error, so the weighted average error is smaller than the unweighted average error. The table shows that both the two-ROI design and ICA are important parts of TSGM. Obtaining high-quality BVP signals through manual selection is also important. However, PPG signals, which do not need manual selection, also yield high performance and can be used as a substitute for the rPPG signal.

Table 4. Ablation study. We compared TSGM to its variants by weighted average error of HR estimation, including: (i) removing one ROI, (ii) removing the BSS (using the G signal as the high-quality signal), (iii) removing manual selection (using the result of BSS as the high-quality BVP signal directly), and (iv) removing the rPPG signal (using the PPG signal as the high-quality signal).

3.7 HR estimation analysis

In this section, we analyzed the correlation between the EE-HR and the quality score predicted by LSTM-SQA on the HCI$_{rPPG}$ task mentioned in the previous section. Unlike the training dataset generated by TSGM, the test set consists of filtered and normalized G signals, so this is a cross-dataset verification experiment. The G signals extracted from facial videos collected by three different webcams were evaluated separately. Among the webcams, the A-cam and M-cam were the video collection devices in the HCI$_{rPPG}$ task, while the L-cam was the video collection device used in the LGI$_{rPPG}$ task; therefore, verifying the model on the database collected by the L-cam is a cross-device verification experiment. In each video, the subjects perform different behavioral tasks (resting, talking, and facial rotation), resulting in different artifacts in the BVP. The EE-HR values for the different cameras under the different tasks are shown in Table 5, and the quality scores predicted by LSTM-SQA are shown in Table 6.

Table 5. Estimation error of HR under different webcams and tasks. (unit: bpm)

Table 6. The output of LSTM-SQA under different webcams and tasks.

Table 5 and Table 6 show that, for every camera, the resting state has the lowest EE-HR and the highest quality score. The A-cam and the face remain relatively stationary, thereby suppressing the noise caused by facial rotation: the heart rate error of the facial rotation task is 7.41 bpm, and the quality score is 0.64. For the M-cam and L-cam, the noise caused by talking increases the EE-HR to 7.22 bpm and 11.10 bpm, with quality scores of 0.34 and 0.05, respectively. Facial rotation creates even more noise, which leads to the largest EE-HR and the lowest quality score in the data collected by the M-cam and L-cam. Generally speaking, with the same webcam, when the EE-HR increases, the quality score predicted by LSTM-SQA decreases.

Figure 7 shows the proportion of samples under different EE-HRs when the quality score is greater than 0.5 or less than 0.5. For all webcams, when the quality score is greater than 0.5, the EE-HR is concentrated in the 0-2 bpm and 2-5 bpm ranges. Since FFT-based HR estimation is robust to noise, some low-quality signals still have accurate heart rate estimation results. The average EE-HR is 2.77 bpm when the quality score is greater than 0.5 and 12.58 bpm when the quality score is lower than 0.5. LSTM-SQA can effectively reduce the error of heart rate estimation and ensure the reliability of the estimated parameters.

Fig. 7. The proportion of samples with different estimation errors on the dataset collected by three different webcams. The black box is the proportion of samples with a quality score greater than 0.5 predicted by LSTM-SQA, and the white box is the proportion with a quality score less than 0.5. (a) A-cam. (b) M-cam. (c) L-cam.

4. Discussion

This work demonstrates that the proposed LSTM-SQA can predict the quality of the BVP signal in real time with a lightweight framework. In Fig. 4, the dimension of the hidden state vector in the LSTM is related to the classification accuracy of the model. This finding supports the importance of the LSTM structure and confirms that the model needs enough memory space to retain context information. The memory cell, with the information it contains, empowers the model to predict the signal quality in real time.

Morphological-feature-based SQA methods assume that the BVP signal can easily be segmented beat by beat, that most pulse peaks are easily detected, and that the dicrotic notch and the systolic and diastolic waves are easily observed. However, the quality of the signal from rPPG is lower than that from PPG, which makes morphological features inappropriate. SQA methods based on statistical or frequency-domain characteristics require adequate sampling points, which makes real-time processing difficult to achieve. Our approach attempts to address this new problem in rPPG and real-time systems.

LSTM-SQA has achieved high classification accuracy in the task of identifying signal quality. In Section 3.5, we verified its performance on multiple databases. Examples of quality scores predicted by LSTM-SQA are shown in Fig. 6. Furthermore, the output of LSTM-SQA is related to the HR estimation error and to the motion and noise artifacts in the facial video. In Table 5 and Table 6, we report the mean quality score and the mean absolute error of HR estimation extracted from the videos under different tasks. The more noise, the greater the HR estimation error and the lower the quality score: the heart rate estimation error and the quality score predicted by LSTM-SQA are negatively correlated.

Moreover, we analyzed the distribution of heart rate errors with scores greater than 0.5 and less than 0.5, respectively. As shown in Fig. 7, when the score is greater than 0.5, the 0-2 bpm range accounts for the largest proportion and the 2-5 bpm range for the second largest. When the EE-HR is greater than 5 bpm, the “bad quality” part identified by LSTM-SQA accounts for a much larger proportion than the “good quality” part. LSTM-SQA thus enhances the reliability of heart rate estimation.

In addition to high classification performance, we have also addressed two common problems in supervised deep learning. One is that LSTM-SQA is a supervised learning method, but there is no dataset in which each sampling point is labeled with a quality score. The other is that the large number of parameters in typical deep learning models makes them inappropriate for real-time operation.

For the former problem, we proposed a BSS-based method to generate high- and low-quality mixed data in which every sampling point is labeled. As in the real-time sampling process, the sampling points are input into LSTM-SQA one by one, and LSTM-SQA can accurately predict whether each sampling point is good or not. By corrupting random positions in high-quality signals with randomly selected noise, the model trained on data generated by TSGM is robust and performs well on cross-domain tasks.

For the latter problem, we reduced the parameters of the network and verified the stability and accuracy of the network under different numbers of neurons. Figure 4 and Fig. 6 show that the network retains high accuracy even when the number of network parameters is less than 1000. In terms of real-time performance, we analyzed the computational cost of the multiplications and the time required for forward propagation with the Tensorflow and Numpy libraries in Python. A lightweight model [50] using frequency-domain analysis is more resource-hungry than our algorithm: the complexity of the fast Fourier transform (FFT) is $O(n\log(n))$, so for an 8-second signal with a sampling rate of 30 Hz, the FFT requires about 1000 complex multiplications, whereas the proposed LSTM-SQA requires 616 floating-point multiplications. On ordinary hardware, it is easy to assess the quality score in real time. A small number of parameters also helps prevent the network from over-fitting. What is more surprising is that LSTM-SQA identified an inaccurate manual label in Fig. 6(c), which suggests that the algorithm may have the ability to verify manual labels.

Real-time quality assessment can help rPPG better predict physiological characteristics. For example, when the camera loses the ROI (such as the face), the algorithm will automatically discard the current signal segment. When there is too much motion noise in the video to extract useful information, the segment can also be discarded. LSTM-SQA can help medical equipment provide more accurate and stable monitoring performance and reduce false alarms.

There are still limitations to the proposed model. LSTM-SQA needs supervised data that have been annotated correctly; when the labels are inaccurate, LSTM-SQA will fail. For example, in UBFC-Phys we took the PPG signals to be high-quality signals by default, which was not always true. For rPPG, the BSS cannot determine which output is the heartbeat source signal, so the data needed to be manually labeled.

5. Conclusion

The aim of this work is to develop a method to predict the quality of the BVP signal in real time. An SQA model based on LSTM is proposed to predict the quality of the sampling points one by one. The proposed LSTM-SQA is robust to BVP signals extracted from both PPG and rPPG. The lightweight structure, with fewer than 1000 parameters and fewer than 1000 multiplications per time step, can accurately predict the signal quality in real time. To address the lack of rPPG datasets with quality annotations, this work also provides an effective training sample generation method for signal quality assessment research. The experimental results show that an LSTM-SQA trained on the generated samples can ensure the reliability of the physiological parameters estimated from real signals.

Funding

Natural Science Foundation of Anhui Province (1908085MF203); University Natural Science Research Project of Anhui Province (KJ2020A0034).

Acknowledgments

The authors are grateful to the volunteers at Anhui University laboratory of intelligent information and human-computer interaction (IIP-HCI) for generously sharing their video data on persons interacting with computers.

Disclosures

The authors declare no conflicts of interest.

Data availability

The original video datasets are described in the relevant papers [38–40], and the datasets are available from the authors of those papers. The BVP data annotated with the quality scores generated by our algorithm are available in [51].

References

1. K. Shelley and S. Shelley, “Pulse oximeter waveform: photoelectric plethysmography,” Clinical Monitoring: Practical Applications for Anesthesia and Critical Care (Saunders2001), pp. 420–428.

2. J. Allen, “Photoplethysmography and its application in clinical physiological measurement,” Physiol. Meas. 28(3), R1–R39 (2007). [CrossRef]  

3. D. B. McCombie, A. T. Reisner, and H. H. Asada, “Adaptive blood pressure estimation from wearable PPG sensors using peripheral artery pulse wave velocity measurements and multi-channel blind identification of local arterial dynamics,” in 2006 International Conference of the IEEE Engineering in Medicine and Biology Society, (IEEE, 2006), pp. 3521–3524.

4. B. Wei, X. Wu, C. Zhang, and Z. Lv, “Analysis and improvement of non-contact SpO2 extraction using an RGB webcam,” Biomed. Opt. Express 12(8), 5227–5245 (2021). [CrossRef]  

5. M. Hassan, A. Malik, D. Fofi, N. Saad, and F. Meriaudeau, “Novel health monitoring method using an RGB camera,” Biomed. Opt. Express 8(11), 4838–4854 (2017). [CrossRef]  

6. G. Lu, F. Yang, J. A. Taylor, and J. F. Stein, “A comparison of photoplethysmography and ECG recording to analyse heart rate variability in healthy subjects,” J. Med. Eng. Technol. 33(8), 634–641 (2009). [CrossRef]  

7. X. Chen, J. Cheng, R. Song, Y. Liu, R. Ward, and Z. J. Wang, “Video-based heart rate measurement: recent advances and future prospects,” IEEE Trans. Instrum. Meas. 68(10), 3600–3615 (2019). [CrossRef]  

8. Y. Sun and N. Thakor, “Photoplethysmography revisited: from contact to noncontact, from point to imaging,” IEEE Trans. Biomed. Eng. 63(3), 463–477 (2016). [CrossRef]  

9. A. Lam and Y. Kuno, “Robust heart rate measurement from video using select random patches,” Proceedings of the IEEE International Conference on Computer Vision2015 Inter, 3640–3648 (2015) [CrossRef]  .

10. N. Gambarotta, F. Aletti, G. Baselli, and M. Ferrario, “A review of methods for the signal quality assessment to improve reliability of heart rate and blood pressures derived parameters,” Med. Biol. Eng. Comput. 54(7), 1025–1035 (2016). [CrossRef]  

11. U. Satija, B. Ramkumar, and M. Sabarimalai Manikandan, “A Review of Signal Processing Techniques for Electrocardiogram Signal Quality Assessment,” IEEE Rev. Biomed. Eng. 11, 36–52 (2018). [CrossRef]  

12. J. A. Sukor, S. J. Redmond, and N. H. Lovell, “Signal quality measures for pulse oximetry through waveform morphology analysis,” Physiol. Meas. 32(3), 369–384 (2011). [CrossRef]  

13. Q. Li and G. D. Clifford, “Dynamic time warping and machine learning for signal quality assessment of pulsatile signals,” Physiol. Meas. 33(9), 1491–1501 (2012). [CrossRef]  

14. X. Sun, P. Yang, and Y. T. Zhang, “Assessment of photoplethysmogram signal quality using morphology integrated with temporal information approach,” Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS pp. 3456–3459 (2012) [CrossRef]  .

15. C. Orphanidou, T. Bonnici, P. Charlton, D. Clifton, D. Vallance, and L. Tarassenko, “Signal-quality indices for the electrocardiogram and photoplethysmogram: derivation and applications to wireless monitoring,” IEEE J. Biomed. Health Inform. 19(3), 832–838 (2014). [CrossRef]  

16. G. B. Papini, P. Fonseca, X. L. Aubert, S. Overeem, J. W. Bergmans, and R. Vullings, “Photoplethysmography beat detection and pulse morphology quality assessment for signal reliability estimation,” Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS pp. 117–120 (2017). [CrossRef]  

17. H. S. Seok, S. Han, J. Park, D. Roh, and H. Shin, “Photoplethysmographic pulse quality assessment methods based on similarity analysis,” 2018 Joint 10th International Conference on Soft Computing and Intelligent Systems (SCIS) and 19th International Symposium on Advanced Intelligent Systems (ISIS) (2018), pp. 350–353. [CrossRef]  

18. K. Li, S. Warren, and B. Natarajan, “Onboard tagging for real-time quality assessment of photoplethysmograms acquired by a wireless reflectance pulse oximeter,” IEEE Trans. Biomed. Circuits Syst. 6(1), 54–63 (2012). [CrossRef]  

19. W. Karlen, K. Kobayashi, J. M. Ansermino, and G. A. Dumont, “Photoplethysmogram signal quality estimation using repeated Gaussian filters and cross-correlation,” Physiol. Meas. 33(10), 1617–1629 (2012). [CrossRef]  

20. S. Alam, S. Datta, A. D. Choudhury, and A. Pal, “Sensor agnostic photoplethysmogram signal quality assessment using morphological analysis,” ACM International Conference Proceeding Series (2017), pp. 176–185. [CrossRef]  

21. M. Elgendi, “Optimal signal quality index for photoplethysmogram signals,” Bioengineering 3(4), 21 (2016). [CrossRef]  

22. Y. Zhang and J. Pan, “Assessment of photoplethysmogram signal quality based on frequency domain and time series parameters,” 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) (2018), 1–5. [CrossRef]  

23. T. Pereira, K. Gadhoumi, M. Ma, R. Colorado, and X. Hu, “Robust assessment of photoplethysmogram signal quality in the presence of atrial fibrillation,” 2018 Computing in Cardiology Conference (2018), 1–4 [CrossRef]  

24. J. Zhang, L. Yang, Z. Su, X. Mao, K. Luo, and C. Liu, “Photoplethysmogram signal quality assessment using support vector machine and multi-feature fusion,” J. Med. Imaging Health Inform. 8(9), 1757–1762 (2018). [CrossRef]  

25. E. Sabeti, N. Reamaroon, M. Mathis, J. Gryak, M. Sjoding, and K. Najarian, “Signal quality measure for pulsatile physiological signals using morphological features: Applications in reliability measure for pulse oximetry,” Informatics Med. Unlocked 16, 100222 (2019). [CrossRef]  

26. N. Pradhan, S. Rajan, and A. Adler, “Evaluation of the signal quality of wrist-based photoplethysmography,” Physiol. Meas. 40(6), 065008 (2019). [CrossRef]  

27. S. H. Liu, J. J. Wang, W. Chen, K. L. Pan, and C. H. Su, “Classification of photoplethysmographic signal quality with fuzzy neural network for improvement of stroke volume measurement,” Appl. Sci. 10(4), 1476 (2020). [CrossRef]  

28. M. S. Zaman and B. I. Morshed, “Estimating reliability of signal quality of physiological data from data statistics itself for real-time wearables,” 2020 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (2020), pp. 5967–5970.

29. T. Pereira, K. Gadhoumi, M. Ma, X. Liu, R. Xiao, R. A. Colorado, K. J. Keenan, K. Meisel, and X. Hu, “A supervised approach to robust photoplethysmography quality assessment,” IEEE J. Biomed. Health Inform. 24(3), 649–657 (2020). [CrossRef]  

30. F. Mohagheghian, D. Han, A. Peitzsch, N. Nishita, E. Ding, E. L. Dickson, D. DiMezza, E. M. Otabil, K. Noorishirazi, J. Scott, D. Lessard, Z. Wang, C. Whitcomb, K.-V. Tran, T. P. Fitzgibbons, D. D. McManus, and K. H. Chon, “Optimized signal quality assessment for photoplethysmogram signals using feature selection,” IEEE Trans. Biomed. Eng. 69(9), 2982–2993 (2022). [CrossRef]  

31. T. Blöcher, J. Schneider, M. Schinle, and W. Stork, “An online PPGI approach for camera based heart rate monitoring using beat-to-beat detection,” 2017 IEEE Sensors Applications Symposium (SAS) (2017), pp. 1–6. [CrossRef]  

32. W. Wang, B. Balmaekers, and G. De Haan, “Quality metric for camera-based pulse rate monitoring in fitness exercise,” 2016 IEEE International Conference on Image Processing (ICIP), (2016), pp. 2430–2434. [CrossRef]  

33. S. Fallet, Y. Schoenenberger, L. Martin, F. Braun, V. Moser, and J.-M. Vesin, “Imaging photoplethysmography: A real-time signal quality index,” 2017 Computing in Cardiology (CinC), (2017), pp. 1–4. [CrossRef]  

34. Y. Benezeth, S. Bobbia, K. Nakamura, R. Gomez, and J. Dubois, “Probabilistic signal quality metric for reduced complexity unsupervised remote photoplethysmography,” 2019 13th International Symposium on Medical Information and Communication Technology (ISMICT), (2019), pp. 1–5. [CrossRef]  

35. H. Ernst, H. Malberg, and M. Schmidt, “More reliable remote heart rate measurement by signal quality indexes,” 2020 Computing in Cardiology (2020), pp. 1–4. [CrossRef]

36. H. Gao, X. Wu, C. Shi, Q. Gao, and J. Geng, “A LSTM-based realtime signal quality assessment for photoplethysmogram and remote photoplethysmogram,” 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2021), pp. 3826–3835. [CrossRef]  

37. N. Simmler, P. Sager, P. Andermatt, R. Chavarriaga, F.-P. Schilling, M. Rosenthal, and T. Stadelmann, “A survey of un-, weakly-, and semi-supervised learning methods for noisy, missing and partial labels in industrial vision applications,” 2021 8th Swiss Conference on Data Science (SDS), (2021), pp. 26–31. [CrossRef]  

38. J. Geng, C. Zhang, H. Gao, Y. Lv, and X. Wu, “Motion resistant facial video based heart rate estimation method using head-mounted camera,” 2021 IEEE International Conferences on Internet of Things (iThings) and IEEE Green Computing & Communications (GreenCom) and IEEE Cyber, Physical & Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics), (IEEE, 2021), pp. 229–237. [CrossRef]  

39. R. Meziatisabour, Y. Benezeth, P. De Oliveira, J. Chappe, and F. Yang, “UBFC-Phys: a multimodal database for psychophysiological studies of social stress,” IEEE Transactions on Affective Computing (2021), pp. 1–16. [CrossRef]

40. C. S. Pilz, S. Zaunseder, J. Krajewski, and V. Blazek, “Local group invariance for heart rate estimation from face videos in the wild,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (2018), pp. 1335–1343. [CrossRef]

41. P. Comon, “Contrasts, independent component analysis, and blind deconvolution,” Int. J. Adapt. Control Signal Process. 18(3), 225–243 (2004). [CrossRef]  

42. Y.-P. Yu, P. Raveendran, C.-L. Lim, and B.-H. Kwan, “Dynamic heart rate estimation using principal component analysis,” Biomed. Opt. Express 6(11), 4610–4618 (2015). [CrossRef]  

43. L. Kong, Y. Zhao, L. Dong, Y. Jian, X. Jin, B. Li, Y. Feng, M. Liu, X. Liu, and H. Wu, “Non-contact detection of oxygen saturation based on visible light imaging device using ambient light,” Opt. Express 21(15), 17464–17471 (2013). [CrossRef]  

44. Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang, “Random erasing data augmentation,” Proc. AAAI Conf. on Artif. Intell. 34(7), 13001–13008 (2020). [CrossRef]  

45. L. G. Rocha, G. Paim, D. Biswas, S. Bampi, F. Catthoor, C. Van Hoof, and N. Van Helleputte, “LSTM-only model for low-complexity HR estimation from wrist PPG,” 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), (2021), pp. 1068–1071. [CrossRef]  

46. M. Wójcikowski, “Real-time ppg signal conditioning with long short-term memory (lstm) network for wearable devices,” Sensors 22(1), 164 (2021). [CrossRef]  

47. M. Christiaan, “Pythonvideopulserate,” GitHub, 2020, https://github.com/MartinChristiaan/-PythonVideoPulserate.

48. D. Roh and H. Shin, “Recurrence plot and machine learning for signal quality assessment of photoplethysmogram in mobile environment,” Sensors 21(6), 2188 (2021). [CrossRef]  

49. S.-H. Liu, R.-X. Li, J.-J. Wang, W. Chen, and C.-H. Su, “Classification of photoplethysmographic signal quality with deep convolution neural networks for accurate measurement of cardiac stroke volume,” Appl. Sci. 10(13), 4612 (2020). [CrossRef]  

50. A. Mahmoudzadeh, I. Azimi, A. M. Rahmani, and P. Liljeberg, “Lightweight photoplethysmography quality assessment for real-time IoT-based health monitoring using unsupervised anomaly detection,” Procedia Comput. Sci. 184, 140–147 (2021). [CrossRef]

51. H. Gao, “The BVP data annotated by the quality score generated by TSGM,” GitHub, 2023, https://github.com/ghy2718437/rPPG_SQA.

Data availability

The original video datasets are described in the corresponding papers [38–40] and are available from the authors of those papers. The BVP data annotated with the quality scores generated by our algorithm are available in [51].



Figures (7)

Fig. 1. Different analysis scales in SQA research and high- and low-quality BVP waveforms extracted by rPPG. (a) Single-beat-segment-based quality assessment. (b) Fixed-length-segment-based quality assessment. (c) Sampling-point-based quality assessment for signals mixing high- and low-quality parts; the black signal is the quality score (“1” for high quality, “0” for low quality), which is the target output (label) of the proposed LSTM-SQA network.

Fig. 2. A flowchart of BVP signal and noise signal extraction.

Fig. 3. Internal structure of LSTM-SQA. The network contains three units: a fully connected input layer, a standard LSTM block, and two fully connected output layers. The model receives sampling points one by one and, from the current sample value and the information retained in the memory cell, predicts the corresponding quality score $\widehat{label_{gen}}$ for each sampling point.

Fig. 4. Ranges of accuracy for different numbers of neurons in each layer. (a) Numbers of neurons in the fully connected layer before the LSTM layer. (b) Dimensions of the hidden state vector in a single LSTM layer. (c) Numbers of neurons in the fully connected layer following the LSTM layer.

Fig. 5. A Python interface developed to visualize the real-time performance and results of the proposed LSTM-SQA. The green signal is the G channel of rPPG; the white signal is the quality score predicted by LSTM-SQA.

Fig. 6. Results of the proposed LSTM-SQA. The solid black line is the ground truth (label “1” for high quality, “0” for low quality); the dotted black line is the output of LSTM-SQA. (a) Low-quality segment first, high-quality segment in the middle, low-quality segment at the end. (b) High-quality segment first, low-quality segment in the middle, high-quality segment last. (c) A signal labeled high quality in the ground truth, with a sudden amplitude change around 16 seconds.

Fig. 7. The proportion of samples with different estimation errors on the datasets collected by three different webcams. The black boxes show the proportion of samples with an LSTM-SQA quality score greater than 0.5; the white boxes show the proportion with a score less than 0.5. (a) A-cam. (b) M-cam. (c) L-cam.

Tables (6)

Table 1. Number of parameters in LSTM-SQA under different hyper-parameters.

Table 2. Performance of traditional features and LSTM-SQA (unit: %).

Table 3. Performance on public datasets (unit: %).

Table 4. Ablation study. TSGM is compared to its variants by the weighted average error of HR estimation: (i) removing one ROI, (ii) removing the BSS (using the G signal as the high-quality signal), (iii) removing manual selection (using the BSS result directly as the high-quality BVP signal), and (iv) removing the rPPG signal (using the PPG signal as the high-quality signal).

Table 5. Estimation error of HR under different webcams and tasks (unit: bpm).

Table 6. Output of LSTM-SQA under different webcams and tasks.

Equations (17)

$$x(t) = A\,s(t)$$

$$s(t) = [s_1(t), s_2(t), \ldots, s_n(t)]^T$$

$$x(t) = [x_1(t), x_2(t), \ldots, x_m(t)]^T$$

$$y(t) = [y_1(t), y_2(t), \ldots, y_n(t)]^T$$
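Eqs. (1)–(4) state the linear mixing model behind the blind source separation step: the observed traces $x(t)$ are an unknown mixture $A$ of latent sources $s(t)$, and BSS recovers estimated sources $y(t)$. As a minimal illustration only, the sketch below uses FastICA from scikit-learn on a toy two-source mixture; the toy signals, the random mixing matrix, and the choice of FastICA as the separation algorithm are assumptions for demonstration, not the paper's exact pipeline.

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 30, 900)                     # 30 s at 30 Hz (toy example)
s = np.stack([np.sin(2 * np.pi * 1.2 * t),      # pulse-like source
              rng.standard_normal(t.size)])     # noise source
A = rng.standard_normal((3, 2))                 # unknown mixing matrix
x = A @ s                                       # x(t) = A s(t): 3 observed traces

# Recover y(t), the estimated sources, up to permutation and scale.
ica = FastICA(n_components=2, random_state=0)
y = ica.fit_transform(x.T).T                    # shape: (n_sources, n_samples)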
$$s_{gen}^{0}(t) = s_{high}(t), \quad t = [1, 2, 3, \ldots, 900]$$

$$label_{gen}^{0}(t) = 1, \quad t = [1, 2, 3, \ldots, 900]$$

$$t_{low}^{k} = [\,t_l^k,\; t_l^k+1,\; t_l^k+2,\; \ldots,\; t_l^k+ln^k\,], \quad t_l^k + ln^k \le 900$$

$$s_{gen}^{k}(t_{low}^{k}) = s_{gen}^{k-1}(t_{low}^{k}) + r_{inten}^{k}\, s_{noise}^{k}$$

$$label_{gen}^{k}(t_{low}^{k}) = 0$$
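Eqs. (5)–(9) define the training-sample generation: a 900-sample high-quality segment starts with label 1 everywhere, and each corruption round overlays intensity-scaled noise on a random interval and sets its label to 0. A minimal sketch follows; the function name generate_sample, the number of corrupted intervals n_corrupt, and the intensity range are illustrative assumptions, not the paper's settings.

import numpy as np

def generate_sample(s_high, noise_pool, n_corrupt=3, rng=None):
    """Apply Eqs. (5)-(9) to one 900-sample high-quality segment.
    noise_pool is assumed to hold 900-sample noise traces."""
    if rng is None:
        rng = np.random.default_rng()
    s_gen = s_high.astype(float).copy()        # s_gen^0(t) = s_high(t), Eq. (5)
    label = np.ones(s_gen.size)                # label_gen^0(t) = 1, Eq. (6)
    for _ in range(n_corrupt):
        ln = int(rng.integers(30, 300))        # interval length ln^k
        tl = int(rng.integers(0, 900 - ln))    # start t_l^k, so t_l^k + ln^k <= 900
        r_inten = rng.uniform(0.5, 3.0)        # noise intensity r_inten^k
        noise = noise_pool[rng.integers(len(noise_pool))]
        s_gen[tl:tl + ln] += r_inten * noise[tl:tl + ln]   # Eq. (8)
        label[tl:tl + ln] = 0.0                            # Eq. (9)
    return s_gen, label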
$$f_t = \sigma(W_f e_t + R_f h_{t-1} + b_f)$$

$$i_t = \sigma(W_i e_t + R_i h_{t-1} + b_i)$$

$$o_t = \sigma(W_o e_t + R_o h_{t-1} + b_o)$$

$$\tilde{c}_t = \tanh(W_c e_t + R_c h_{t-1} + b_c)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

$$h_t = o_t \odot \tanh(c_t)$$
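Eqs. (10)–(15) are the standard LSTM cell update used inside LSTM-SQA. Below is a NumPy transcription of one step; grouping the input weights W, recurrent weights R, and biases b in dicts keyed by gate name is an illustrative convention, not the paper's implementation.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(e_t, h_prev, c_prev, W, R, b):
    """One LSTM step, Eqs. (10)-(15)."""
    f_t = sigmoid(W['f'] @ e_t + R['f'] @ h_prev + b['f'])     # forget gate
    i_t = sigmoid(W['i'] @ e_t + R['i'] @ h_prev + b['i'])     # input gate
    o_t = sigmoid(W['o'] @ e_t + R['o'] @ h_prev + b['o'])     # output gate
    c_tilde = np.tanh(W['c'] @ e_t + R['c'] @ h_prev + b['c']) # candidate cell
    c_t = f_t * c_prev + i_t * c_tilde                         # Eq. (14)
    h_t = o_t * np.tanh(c_t)                                   # Eq. (15)
    return h_t, c_t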
$$EE_{HR}(t) = \begin{cases} |\widehat{HR}(t) - HR(t)|, & \text{if } |\widehat{HR}(t) - HR(t)| < 50 \\ 50, & \text{if } |\widehat{HR}(t) - HR(t)| \ge 50 \end{cases}$$

$$WEE_{HR} = \frac{\sum_t \widehat{label_{gen}}(t) \times EE_{HR}(t)}{\sum_t \widehat{label_{gen}}(t)}$$
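Eqs. (16)–(17) define the HR estimation error, clipped at 50 bpm, and its average weighted by the predicted quality score. A direct NumPy transcription follows; the names hr_est, hr_ref, and quality_score are hypothetical and assume per-window arrays of equal length.

import numpy as np

def weighted_hr_error(hr_est, hr_ref, quality_score):
    """Eqs. (16)-(17): clip the absolute HR error at 50 bpm, then
    average it using the predicted quality score as the weight."""
    ee = np.minimum(np.abs(hr_est - hr_ref), 50.0)             # EE_HR(t), Eq. (16)
    return np.sum(quality_score * ee) / np.sum(quality_score)  # WEE_HR, Eq. (17)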