Preserving shape details of pulse signals for video-based blood pressure estimation

Xuesong Han; Xuezhi Yang; Shuai Fang; Yawei Chen; Qin Chen; Longwei Li; RenCheng Song

doi:10.1364/BOE.516388

1. Introduction

Blood pressure (BP) is a crucial physiological indicator that mirrors the health status of the human body. Hypertension can significantly increase the risk of cardiovascular diseases such as cerebral hemorrhage and coronary heart disease [1]. Presently, the global count of hypertensive patients has surpassed 1.2 billion [2]. Daily BP monitoring proves effective in controlling BP by facilitating the early detection of abnormal changes. The current method of measuring BP using traditional cuff sphygmomanometers is not portable and necessitates professional assistance. Therefore, there is a pronounced demand for a more convenient and comfortable cuffless method of measuring BP.

Contact photoplethysmography (cPPG) is a technique capable of capturing variations in the intensity of reflected light using a photoelectric sensor to extract changes in blood vessel volume within the human body. The cPPG signal is closely related to the pulse and enables the extraction of numerous features associated with human physiological parameters. cPPG devices, requiring only skin contact, have emerged as a prominent avenue of research for cuffless BP measurement [3,4]. There are two methods for BP estimation using cPPG pulse signals, one based on pulse transit time (PTT) and the other based on pulse features. PTT can be measured by the transit time of the pulse signal between two points in the artery and has been confirmed to be related to the BP values [5,6]. Certain researchers [7–9] have effectively implemented BP estimation using PTT. The feature-based BP estimation is achieved by constructing a model through pulse features extracted from the pulse signal. Compared to the PTT-based method, this method requires only one sensor and has a lower requirement for sensor accuracy.

In recent years, a contactless pulse extraction technique called imaging photoplethysmography (iPPG) has received extensive attention from researchers. This technique extracts pulse signals from video, so it can be implemented with an ordinary camera. As a non-contact technology, iPPG eliminates the discomfort caused by cPPG devices, which makes it especially suitable for burn patients, newborns, and other people for whom contact devices are inconvenient. Non-contact measurement also avoids the risk of spreading infectious diseases. In addition, iPPG can leverage the high penetration of cell phones to achieve universal health screening without extra cost. It can also be combined with drones for emergency diagnosis of the injured, or provide services for smart devices such as smart cars to increase product value. Some researchers have already studied the calculation of PTT using iPPG. Jeong et al. [10] measured the imaging PTT between the head and hand of subjects using a high-speed camera, and the experiment confirmed a relatively strong correlation between the imaging PTT and BP. Other researchers have also verified the feasibility of using imaging PTT for BP estimation [11–13]. Unfortunately, extracting imaging PTT requires a high camera frame rate, which significantly restricts the practical application of this technique. BP estimation methods based on pulse features extracted from iPPG pulse signals have gained significant attention in the field of non-contact BP measurement. Ding et al. [14] proposed that there is a strong correlation between the pulse width at half amplitude and BP values. Djeldjli et al. [15] validated the strong correlation between the pulse features extracted from iPPG and cPPG in an ideal acquisition environment. Rong et al. [16] demonstrated that an auxiliary light source is not necessary for BP estimation based on iPPG pulse features. Padilla et al. [17] used a neural network to extract features directly from pulse signals for BP estimation. Other researchers have also validated the feasibility of estimating BP based on iPPG pulse features [18–21].

The performance of feature-based BP estimation is directly influenced by the quality of the pulse shape. However, the extraction of iPPG pulse signals is susceptible to interference from factors such as illumination variations and motion artifacts in real-world scenes. To address this issue, numerous anti-interference algorithms have been developed for extracting reliable iPPG pulse signals [22–26]. Most of these algorithms, however, focus solely on the accuracy of heart rate and do not study the shape information of iPPG pulse signals in depth. There is a critical need for a method that can effectively eliminate interference while preserving valuable shape information, but it is difficult to determine which information should be retained. Wang et al. [27] and Tong et al. [28] attempted to preserve the pulse details of iPPG using self-adaptive singular spectrum analysis (SA-SSA) and biorthogonal wavelet (Bior-Wavelet), respectively. These two methods are effective in removing noise, but they lose valuable shape details because they cannot recognize which shape information is valuable. Song et al. [29] verified that it is feasible to transform facial iPPG pulse signals into the corresponding down-sampled fingertip cPPG pulse signals. Under a stable measurement environment, fingertip cPPG pulse signals can be considered interference-free reference signals, so the transformation of pulse signals from iPPG to cPPG provides a new perspective on this issue. Nevertheless, the improvement in BP estimation obtained with the directly transformed pulse signals is unsatisfactory. The main reason is that, within a segment of the pulse signal, the shape distortion of any cardiac cycle may affect the performance of BP estimation, and it is hard to meet the shape-error requirement for every cardiac cycle.

We find that the difficulty of signal transformation decreases as the length of the reference signal is reduced. Typically, in the resting state, human BP and the pulse signal remain stable over a short period of time. In this case, the shape information of the entire pulse signal can be approximated by an average cardiac cycle (ACC) signal. Based on these facts, we propose a new transformation method for iPPG pulse signals that uses the cPPG ACC as the reference signal. Because the reference signal is much shorter, it is easier to achieve a lower signal shape error at a higher sampling rate. In this paper, a neural network using multi-scale convolution and a self-attention mechanism is developed to realize this transformation. The transformed signal reaches a sampling rate of 200 Hz and is capable of capturing valuable shape information under noise interference. Our method is evaluated on a dataset collected from 491 patients in a hospital without auxiliary equipment. Most of the subjects are elderly people (mean age 60 years), who are more susceptible to the effects of hypertension. A total of 24 pulse features are chosen for the experiment. The results show that the proposed method effectively increases the maximal information coefficient between pulse features and BP values, which indicates a stronger correlation. We have also tested the effect of the proposed method on different BP estimation models. The BP estimation errors are lower than those obtained with other iPPG pulse extraction methods.

The main contributions of this article are as follows.

• A method that can preserve shape details of iPPG pulse signals is proposed for BP estimation. iPPG pulse signals (30Hz) are transformed into the ACC of cPPG pulse signals (200Hz) which can effectively reduce the shape error in the transformation process.
• A neural network using multi-scale convolution and self-attention mechanism is developed to implement the transformation from iPPG pulse signals to the ACC of cPPG pulse signals.
• The research in this paper is conducted on a database collected from elderly people in real-world scenes, which has a positive significance to promote the clinical application of iPPG technology.

2. Method

This section begins with an overview of the proposed method. The experimental dataset and the corresponding data preprocessing are subsequently introduced. Lastly, a detailed description of the multi-scale self-attention (MSSA) network is provided.

2.1 System overview

The main processing flow of our method is presented in Fig. 1, which consists of two core components: data preprocessing and model construction. The data preprocessing step is divided into the preprocessing of video data and cPPG data. These two parts are responsible for generating the input and label data required by the MSSA network, respectively.

Fig. 1. Flowchart of the proposed method. The MSSA network developed in this study transforms the iPPG pulse signal (30Hz) into an ACC signal of cPPG (200Hz). Pulse features extracted from the transformed ACC signal exhibit a stronger correlation with BP values. Errors of BP estimation are also reduced using the transformed pulse signal.

The MSSA network utilizes multi-scale convolution to extract features from the iPPG signal (30Hz) to generate the corresponding cPPG average cycle (200Hz). The features of different scales can, on the one hand, extract more comprehensive pulse shape information, and on the other hand, reduce the interference of local noise. To further mitigate the impact of noise, we choose to employ a self-attention mechanism to adjust feature weights, assigning higher weights to periods with better signal quality. The length of the cPPG individual cycle signal is variable, and we address this issue by using a padding method. In order to ensure that the network learns useful information from the labels, we adjust the loss function from mean squared error (MSE) to weighted MSE, assigning higher weights to the effective data in the labels.

2.2 Experimental dataset

The data used in this study are collected from a total of 491 patients who receive treatment in the Department of Cardiovascular Medicine and Cardiac Rehabilitation Center at the First Affiliated Hospital of the University of Science and Technology of China. The process of data collection is ethical. The population of our dataset comprises 63.5% males and 36.5% females, with a mean age of 60 years. The age, BP, and gender distributions of the subjects are shown in Fig. 2.

Fig. 2. Data distribution of age (a), SBP (b), DBP (c), and gender (d) in the experiment dataset

Data collection is conducted under standard hospital lighting conditions (with an illumination intensity of at least 300 lux) without specialized auxiliary light sources (shown in Fig. 3). The collection period spans two years, with data collected between 9:30 am and 5:30 pm. During data collection, the distance between the face of the subject and the camera is about 0.5 m. Facial videos and cPPG signals are collected simultaneously for about a minute. Facial videos are captured using the front camera of a Surface Pro7 laptop with a resolution of 1920$\times$1080 and a frame rate of 30 fps. A fingertip cPPG sensor (Model: ZJE PWS-20D) is used for acquiring the cPPG signals at a sampling rate of 200 Hz. The true BP values are obtained through an Omron medical BP monitor. During data acquisition, the BP of the subject is measured first, and then the facial video and fingertip cPPG data are collected immediately. The BP value is measured on the left upper arm of the subject and is recorded once for each set of data. Video data is saved in MP4 format with the MPEG-4 compression standard. Subjects are instructed to maintain a natural state and gaze directly at the camera. Slight head movements are permitted, while the fingers of all subjects are kept stationary.

Fig. 3. Environments of data collection

2.3 Data preprocessing

2.3.1 Video data preprocessing

The workflow of video data preprocessing is presented in Fig. 4. First, a suitable region of interest (ROI) on the face is selected from the video. Then, the mean of the G-channel pixel values within the selected ROI is calculated. In this study, the selected ROI is the region below the eyes and above the mouth, which has a rich distribution of capillaries and yields a high signal-to-noise ratio for facial iPPG pulse signals. Moreover, this region avoids interference from eye blinks and mouth movements. Dlib is used for facial landmark detection, and the position of the ROI is determined based on the detected facial landmarks (shown in Fig. 4(a)).

Fig. 4. Processing steps of video data. (a) Select iPPG ROI using face landmarks. (b) Extract raw iPPG signals using G-channel pixel averaging. (c) Detrending and normalization.

As there is no control of ambient light, the change of illumination is a common phenomenon during data collection which can lead to a baseline drift for iPPG pulse signals (shown in Fig. 4(b)). To address this issue, the smoothness priors approach (SPA) [30] is employed to perform baseline removal. Following this step, the iPPG pulse signal needs to be normalized, and the maximum-minimum normalization method is chosen for this purpose (shown in Fig. 4(c)). To avoid the loss of shape information, the extracted iPPG pulse signal is not filtered in this paper.
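
The detrending and normalization steps can be illustrated with a minimal NumPy sketch. It assumes the standard smoothness priors formulation, in which the trend is estimated with a second-order difference regularizer; the regularization strength `lam`, the function names, and the synthetic test signal are illustrative assumptions and are not taken from this paper.

```python
import numpy as np

def spa_detrend(signal, lam=10.0):
    """Smoothness-priors detrending (sketch). lam is an assumed value."""
    z = np.asarray(signal, dtype=float)
    n = z.size
    # Second-order difference matrix D2 with shape (n - 2, n).
    d2 = np.zeros((n - 2, n))
    rows = np.arange(n - 2)
    d2[rows, rows] = 1.0
    d2[rows, rows + 1] = -2.0
    d2[rows, rows + 2] = 1.0
    # Trend = (I + lam^2 * D2^T D2)^(-1) z; the detrended signal is z - trend.
    trend = np.linalg.solve(np.eye(n) + lam ** 2 * (d2.T @ d2), z)
    return z - trend

def min_max_normalize(signal):
    """Scale a signal to the range [0, 1]."""
    z = np.asarray(signal, dtype=float)
    span = z.max() - z.min()
    if span == 0:
        return np.zeros_like(z)
    return (z - z.min()) / span

# Synthetic example: a 1.2 Hz pulse-like component on a drifting baseline,
# sampled at 30 Hz for 20 s (600 samples).
fs = 30.0
t = np.arange(600) / fs
raw = 0.05 * np.sin(2 * np.pi * 1.2 * t) + 0.01 * t + 120.0
ippg = min_max_normalize(spa_detrend(raw, lam=10.0))
```

A purely linear baseline is removed exactly by this formulation, because its second-order differences are zero. A dense solve is adequate for 600-sample windows; longer signals would call for a sparse solver.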

2.3.2 cPPG data preprocessing

The procedure for preprocessing cPPG data is illustrated in Fig. 5. Initially, the cPPG signal is segmented based on the cardiac cycle. Every two adjacent valley positions delimit one cardiac cycle. Based on the detected valley positions, the cPPG signal is divided into individual cardiac cycle signals (Fig. 5(a)), denoted as cPPG-SCC. Subsequently, the signals in cPPG-SCC are combined to generate a cPPG ACC signal (Fig. 5(b)). The length of the ACC signal is the mean of the maximum and minimum lengths of the signals in cPPG-SCC, and the $\mathrm {n}$th value of the ACC is the mean of the corresponding $\mathrm {n}$th values of the signals in cPPG-SCC. The synthesized signal is then normalized using max-min normalization. Because the length of cPPG ACC signals varies across subjects, the signal is padded to a fixed length, denoted as $\mathrm {L}_\mathrm {fix}$, so that the loss function of the MSSA network can be computed.
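
The construction of the ACC signal described above can be sketched as follows. Two details are assumptions of this sketch rather than statements from the paper: the ACC length is rounded down to an integer, and for sample positions beyond the shortest cycle the mean is taken only over the cycles that are long enough. The helper names are hypothetical, and the valley positions are assumed to be given.

```python
import numpy as np

def split_at_valleys(signal, valleys):
    """Cut a cPPG signal into single cardiac cycles between adjacent valleys."""
    signal = np.asarray(signal, dtype=float)
    return [signal[a:b] for a, b in zip(valleys[:-1], valleys[1:])]

def average_cardiac_cycle(cycles):
    """Combine single cardiac cycles (cPPG-SCC) into one normalized ACC."""
    lengths = [len(c) for c in cycles]
    # ACC length: mean of the longest and shortest cycle (floored, assumed).
    acc_len = (max(lengths) + min(lengths)) // 2
    acc = np.zeros(acc_len)
    for n in range(acc_len):
        # Average the n-th sample over the cycles that contain it (assumed).
        acc[n] = np.mean([c[n] for c in cycles if len(c) > n])
    # Max-min normalization to the range [0, 1]; assumes a non-constant ACC.
    return (acc - acc.min()) / (acc.max() - acc.min())
```

For example, two cycles of lengths 4 and 6 give an ACC of length 5, where the last sample comes from the longer cycle alone.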

Fig. 5. Processing steps of cPPG data. (a) Perform valley detection and segment the cPPG signal into individual cardiac cycles based on the location of the valleys. (b) Generating average cardiac cycle by combining individual cardiac cycles. (c) Normalization and padding.

The length of $\mathrm {L}_\mathrm {fix}$ should be no less than the maximum length of the cardiac cycle signal. It is important to note that the relationship between cardiac cycle length and heart rate (HR) exhibits an inverse proportionality, which can be expressed using Eq. (1).

$$\mathrm{HR}=\frac{\mathrm{F}}{\mathrm{L}_{\mathrm{fix}}}\cdot 60 \tag{1}$$

where $\mathrm {F}$ is the sampling rate that equals 200 for the cPPG signal in this paper.

Generally, the human heart rate is not lower than 40 beats per minute (bpm), and the corresponding length of the cardiac cycle at 200 Hz is 300 samples. Therefore, cPPG ACC signals extracted in this study are uniformly padded to a data length of 300. The padding value used in this study is −1, as opposed to the commonly used value of 0. This padding method facilitates the separation of the transformed ACC signal from the padding data in the model output. The final label of the MSSA network after cPPG preprocessing is shown in Fig. 5(c).
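
The arithmetic of Eq. (1) and the padding step can be checked with a short sketch. The constants come from the text (200 Hz, length 300, padding value −1). The helper names and the cut-off threshold used to strip padding from a model output are illustrative assumptions.

```python
import numpy as np

F_CPPG = 200      # cPPG sampling rate in Hz (from the text)
L_FIX = 300       # fixed label length in samples (from the text)
PAD_VALUE = -1.0  # padding value (from the text)

def heart_rate_from_length(length, fs=F_CPPG):
    """Heart rate in bpm for a cardiac cycle of `length` samples, as in Eq. (1)."""
    return 60.0 * fs / length

def pad_acc(acc, l_fix=L_FIX, pad_value=PAD_VALUE):
    """Right-pad a normalized ACC signal to the fixed label length."""
    acc = np.asarray(acc, dtype=float)
    if acc.size > l_fix:
        raise ValueError("ACC is longer than the fixed label length")
    label = np.full(l_fix, pad_value)
    label[:acc.size] = acc
    return label

def strip_padding(output, threshold=-0.5):
    """Keep the samples before the first value below `threshold` (assumed rule)."""
    output = np.asarray(output, dtype=float)
    below = np.flatnonzero(output < threshold)
    end = below[0] if below.size else output.size
    return output[:end]
```

With these definitions, `heart_rate_from_length(300)` gives 40 bpm, which matches the lower heart rate bound used to choose the label length.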

2.4 Model construction

The MSSA network constructed in this paper consists of several MSSA blocks, followed by a global average pooling (GAP) layer and a fully connected (FC) layer (as shown in Fig. 1). The sizes of the input and output layers are (600,1) and (300,1), respectively. The method proposed in this paper is based on the assumption that the human pulse signal remains stable within a short time window. However, a shorter input signal may suffer more pronounced interference from noise. In this study, the input signal length is set to 600 (20 seconds at 30 Hz), which, in our experience, strikes a balance between these two considerations.

2.4.1 Multi-scale self-attention block

The detailed structure of MSSA block is presented in Fig. 6. We define convolution layers of varying scales to extract multi-scale features through one-dimensional convolutions with different kernel sizes. After that, batch normalization is used to normalize the output of multi-scale features extracted by the convolution layers. Then, the self-attention layers are employed to adjust weights according to the value of these features. After integrating information across all scales, a rectified linear unit (ReLU) activation layer is employed, followed by the addition of a dropout layer that can mitigate overfitting.

Fig. 6. Structure of MSSA block. The MSSA block extracts features through several one-dimensional convolutional kernels of different scales, while utilizing a self-attention mechanism to adjust the weight allocation of the features.

2.4.2 Loss function

In this paper, mean squared error (MSE) is selected as the loss function, and some modifications are made to the original function. Loss values corresponding to the ACC signal in the label should be assigned greater weights than those of the padding data, which helps the network learn useful information. The MSE loss function is weighted accordingly, as defined in Eq. (2) and Eq. (3).

$$\mathrm{L}_{\mathrm{n}}=\begin{cases} \alpha \left( \mathrm{Y}_{\mathrm{n}}-\mathrm{P}_{\mathrm{n}} \right)^2, & \mathrm{Y}_{\mathrm{n}}\geqslant 0\\ \left( \mathrm{Y}_{\mathrm{n}}-\mathrm{P}_{\mathrm{n}} \right)^2, & \mathrm{Y}_{\mathrm{n}}<0 \end{cases} \tag{2}$$

$$\mathrm{Loss}=\frac{1}{\mathrm{N}}\sum_{\mathrm{n}=1}^{\mathrm{N}}{\mathrm{L}_{\mathrm{n}}} \tag{3}$$

where ${ \mathrm {Y}_{\mathrm {n}}}$ and ${ \mathrm {P}_{\mathrm {n}}}$ are the ${ \mathrm {n}}$th values of the label and the prediction respectively, $\mathrm {N}$ is the length of the label, and the weight factor $\alpha$ should be no less than 1. Since ACC signals are normalized to the range from 0 to 1 and the padding data is −1, it is easy to distinguish between the valid and padding data by determining whether ${ \mathrm {Y}_{\mathrm {n}}}$ is less than 0.
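
Eq. (2) and Eq. (3) translate directly into a few lines of NumPy. The sketch below is a plain reference implementation for a single label and prediction pair, not the training code used in this paper; the function name is hypothetical, and the default weight factor of 5 is the value reported in Section 3.1.

```python
import numpy as np

def weighted_mse(label, prediction, alpha=5.0):
    """Weighted MSE of Eqs. (2)-(3): valid samples (label >= 0) are weighted
    by alpha, padding samples (label < 0) are weighted by 1."""
    y = np.asarray(label, dtype=float)
    p = np.asarray(prediction, dtype=float)
    weights = np.where(y >= 0, alpha, 1.0)
    return float(np.mean(weights * (y - p) ** 2))

# Two valid samples followed by two padding samples.
y = [0.0, 1.0, -1.0, -1.0]
p = [0.5, 1.0, -1.0, 0.0]
loss = weighted_mse(y, p, alpha=5.0)  # (5 * 0.25 + 0 + 0 + 1) / 4
```

Setting `alpha=1.0` recovers the standard MSE.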

3. Experiments

3.1 Setup

We randomly select the data of 191 out of 491 subjects as the test set and the remaining 300 as the training set. The BP distributions of the divided training and test sets are consistent. For each set of data, the first and last 2 seconds of the video and cPPG data are removed to reduce interference, after which a 20-second time window with a sliding step of 10 seconds is used to truncate the data. After truncation, the training set and test set contain 1500 cases and 833 cases, respectively. The division of test and training sets follows the subject-independent configuration, which means that the truncated data of the subjects in the training set do not appear in the test set. We use five-fold cross-validation to train five sets of models on the training set, and then test these models on the test set. The final experimental result is the mean of these test results.
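
The truncation step can be sketched as a simple sliding-window routine. The trimming, window, and step durations are the values given above; the function name and the decision to discard an incomplete final window are assumptions of this sketch.

```python
import numpy as np

def sliding_windows(signal, fs, trim_s=2.0, win_s=20.0, step_s=10.0):
    """Drop trim_s seconds at both ends, then cut fixed-length windows."""
    x = np.asarray(signal)
    trim = int(round(trim_s * fs))
    win = int(round(win_s * fs))
    step = int(round(step_s * fs))
    x = x[trim:len(x) - trim]
    # An incomplete final window is discarded (assumed behavior).
    return [x[s:s + win] for s in range(0, len(x) - win + 1, step)]

# A 60 s recording: video-derived iPPG at 30 Hz and cPPG at 200 Hz.
ippg_windows = sliding_windows(np.arange(60 * 30), fs=30)
cppg_windows = sliding_windows(np.arange(60 * 200), fs=200)
```

Each iPPG window then holds 600 samples, which matches the input size of the network, and each cPPG window holds 4000 samples.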

In this paper, the MSSA network consists of 4 MSSA blocks, and each MSSA block follows the same structure. Each MSSA block contains three convolutional layers at different scales, with kernel sizes of $1\times 3$, $1\times 7$, and $1\times 15$, respectively. The parameters are consistent across the convolution layers of each scale: the number of convolution filters is 32, the stride is 1, and the padding mode is "same". The dropout rate is set to 0.2 for each MSSA block. An Adam optimizer with a learning rate of 0.001 and a decay rate of 0.000001 is employed for training. The number of epochs and the batch size are 550 and 32, respectively. The loss function is the weighted MSE with the weight factor $\alpha$ set to 5. The model is trained on an NVIDIA GeForce RTX 3060 Laptop GPU using TensorFlow 2.7 (GPU version).

3.2 Evaluation metrics

3.2.1 Pulse features

A total of 24 pulse features are selected for the evaluation. The specific categories are shown below.

• Pulse width features: Consisting of the ascending and descending width at $\mathrm {h{\% }}$ pulse height [31]. These features are denoted as $\mathrm {AW}_\mathrm {h}$ and $\mathrm {DW}_\mathrm {h}$ respectively (shown in Fig. 7), where $\mathrm {h=10,25,50,75,90}$.
• Derivative features: Consisting of the slope of the ascending and descending branches that are denoted as AS and DS respectively [32]. AS and DS correspond to the slopes of lines $\mathrm {V_1}\mathrm {P}$ and $\mathrm {P}\mathrm {V_2}$ in Fig. 7, respectively.
• Time features: Consisting of the ascending and descending branch time that are denoted as AT and DT respectively (shown in Fig. 7) [33].
• Morphology features: Consisting of AA, DA, RAA, RDA, Kurt, Skew, K, and RI. AA and DA are the areas of the ascending and descending branches (shown in Fig. 7) [33]. RAA and RDA are the ratios of AA and DA to the whole pulse area (AA plus DA) [33]. Kurt and Skew are the kurtosis and skewness of the cardiac cycle signal [33]. K is the K value [34]. RI (reflection index) [35] is the ratio of DA to AA.
• Energy feature: Kaiser–Teager energy (KTE) [36].
• Frequency feature: Heart rate (HR) [33].

Fig. 7. Diagram of pulse features extraction
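
As an illustration of how several of these features can be computed from a single cardiac cycle, the sketch below derives the time, slope, area, and ratio features, plus one pair of width features. The exact definitions follow Fig. 7 and the cited references, which are not reproduced here, so the conventions in the code are assumptions: the cycle runs from valley V1 to valley V2, the peak P is the maximum sample, areas use the trapezoidal rule, and the widths at h% height are measured from the first and last samples at or above that level to the peak. The function names are hypothetical.

```python
import numpy as np

def _area(y, fs):
    # Trapezoidal rule with a sample spacing of 1 / fs seconds.
    return float(np.sum((y[1:] + y[:-1]) / 2.0) / fs)

def pulse_features(cycle, fs=200.0, h=50):
    """Subset of the pulse features for one valley-to-valley cycle (sketch)."""
    y = np.asarray(cycle, dtype=float)
    peak = int(np.argmax(y))
    if peak == 0 or peak == y.size - 1:
        raise ValueError("the peak must lie strictly between the two valleys")
    at = peak / fs                   # ascending branch time
    dt = (y.size - 1 - peak) / fs    # descending branch time
    aa = _area(y[:peak + 1], fs)     # area under the ascending branch
    da = _area(y[peak:], fs)         # area under the descending branch
    # Samples at or above h% of the pulse height (assumed width convention).
    level = y.min() + (h / 100.0) * (y[peak] - y.min())
    above = np.flatnonzero(y >= level)
    return {
        "AT": at, "DT": dt,
        "AS": (y[peak] - y[0]) / at,    # slope of line V1-P
        "DS": (y[-1] - y[peak]) / dt,   # slope of line P-V2
        "AA": aa, "DA": da,
        "RAA": aa / (aa + da), "RDA": da / (aa + da),
        "RI": da / aa,
        f"AW_{h}": float(peak - above[0]) / fs,
        f"DW_{h}": float(above[-1] - peak) / fs,
    }
```

The remaining features (Kurt, Skew, K, KTE, and HR) are omitted from this sketch.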

3.2.2 Pearson correlation coefficient

Pearson correlation coefficient (PCC) is used to evaluate the linear correlation between pulse features extracted from both transformed ACC signals and reference signals.

3.2.3 Maximal information coefficient

The correlation between pulse features and BP may be nonlinear. The PCC describes linear correlations well but is not applicable to nonlinear scenarios. The maximal information coefficient (MIC) [37] is a metric that determines the correlation between variables by globally optimizing the mutual information and describes arbitrary forms of correlation well. The metric is robust and the output range is normalized. In this paper, MIC is used to calculate the correlation between pulse features and BP. MIC_SBP is the MIC between pulse features and SBP. MIC_DBP is the MIC between pulse features and DBP.

3.2.4 Error of average cardiac cycle signal

The error of an ACC signal consists of the error of the ACC length and the error of the ACC waveform. Both are calculated as absolute percentage errors (APE) and are used to evaluate the shape error of transformed ACC signals.

The APE of ACC length is denoted as $\mathrm {APE}_{\mathrm {len}}$ and can be calculated as Eq. (4).

$$\mathrm{APE}_{\mathrm{len}}=\left| \frac{\mathrm{Len}_{\mathrm{T}}-\mathrm{Len}_{\mathrm{P}}}{\mathrm{Len}_{\mathrm{T}}} \right| \tag{4}$$

where ${ \mathrm {Len}_{\mathrm {T}}}$ and ${ \mathrm {Len}_{\mathrm {P}}}$ are the length of the reference ACC signal and the transformed ACC signal.

The APE of ACC waveforms is denoted as $\mathrm {APE}_{\mathrm {wave}}$ and can be calculated using Eq. (5) and Eq. (6).

$$\mathrm{Sum}\left( \mathrm{Y} \right) =\sum_{\mathrm{n}=1}^{\mathrm{L}_{\min}}{\left| \mathrm{Y}_{\mathrm{n}} \right|} \tag{5}$$

$$\mathrm{APE}_{\mathrm{wave}}=\frac{1}{\mathrm{Sum}\left( \mathrm{Y} \right)}\sum_{\mathrm{n}=1}^{\mathrm{L}_{\min}}{\left| \mathrm{Y}_{\mathrm{n}}-\mathrm{P}_{\mathrm{n}} \right|} \tag{6}$$

where $\mathrm {Y}_{\mathrm {n}}$ and $\mathrm {P}_{\mathrm {n}}$ are the $\mathrm {n}$th value of the reference ACC signal and transformed ACC signal respectively, $\mathrm {L}_{\min }$ is the minimum length of the reference ACC signal and transformed ACC signal.
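
The two shape-error metrics of Eqs. (4)–(6) can be written compactly as below. This is a minimal sketch with hypothetical function names; it assumes both signals have already been stripped of padding.

```python
import numpy as np

def ape_len(len_ref, len_pred):
    """Absolute percentage error of the ACC length, Eq. (4)."""
    return abs(len_ref - len_pred) / len_ref

def ape_wave(ref, pred):
    """Absolute percentage error of the ACC waveform, Eqs. (5)-(6)."""
    l_min = min(len(ref), len(pred))
    y = np.asarray(ref, dtype=float)[:l_min]
    p = np.asarray(pred, dtype=float)[:l_min]
    return float(np.sum(np.abs(y - p)) / np.sum(np.abs(y)))

# A reference cycle of 200 samples against a transformed cycle of 190 samples.
length_error = ape_len(200, 190)
# Waveforms are compared over the shorter of the two lengths.
wave_error = ape_wave([0.0, 0.5, 1.0, 0.5], [0.0, 0.4, 1.0, 0.6, 0.1])
```

Because the comparison is truncated to the shorter signal, a length mismatch is captured only by the length error and not by the waveform error.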

3.2.5 Error of BP estimation

The mean absolute error (MAE) and standard deviation (STD) of the error are calculated to evaluate the influence of different iPPG pulse extraction methods on BP estimation.

3.3 Comparison with cPPG reference signals

A set of examples demonstrating the effectiveness of the proposed method under different levels of interference is provided in Fig. 8. Our method accurately fits the shape of the reference signal when provided with a high-quality iPPG pulse signal (Fig. 8(a)). Furthermore, even in the presence of noise interference in the input signal, our method demonstrates stability to a certain extent and successfully captures the essential shape information of the reference signal (Fig. 8(b)-(d)).

Fig. 8. Several transformed ACC signals under different levels of interference. Reference signals are the ACC of cPPG.

The distributions of $\mathrm {APE}_{\mathrm {len}}$ and $\mathrm {APE}_{\mathrm {wave}}$ are shown in Fig. 9. Most values are less than 0.1: 95.9% for $\mathrm {APE}_{\mathrm {len}}$ and 76.8% for $\mathrm {APE}_{\mathrm {wave}}$. The means of $\mathrm {APE}_{\mathrm {len}}$ and $\mathrm {APE}_{\mathrm {wave}}$ are 0.0380 and 0.0889, and their standard deviations are 0.0254 and 0.0451, respectively.

Fig. 9. Distributions of $\mathrm {APE}_{\mathrm {len}}$ (a) and $\mathrm {APE}_{\mathrm {wave}}$ (b)

Since $\mathrm {APE}_{\mathrm {len}}$ is directly related to HR, the error of HR is also calculated here. The mean absolute error and the root mean square error of HR are 3.79 bpm and 7.51 bpm, respectively. The proportion of absolute HR errors less than 3 bpm is 52.5%, less than 5 bpm is 80.7%, and less than 10 bpm is 95.3%.

The PCC results and the scatter diagrams of the pulse features are presented in Fig. 10. Out of the 24 features, 18 exhibit a PCC greater than 0.7, with AS and DT achieving a PCC greater than 0.9. However, it is worth noting that the PCC results for certain pulse width features are not satisfactory. The reason lies in the fact that the pulse width features are more sensitive to changes in the shape of pulse signals, and even slight fluctuations can have a significant impact.

Fig. 10. PCC results and scatter diagrams of the 24 pulse features extracted from our method and the referenced cPPG signal.

Overall, our method is capable of successfully capturing valuable shape information of reference pulse signals and achieving satisfactory results in most pulse features. However, further improvement is needed in certain features that require higher precision.

3.4 Comparison with other iPPG pulse extraction methods

In this comparison, we first test whether our method enhances the correlation between pulse features and BP. Four methods that can extract iPPG pulse signals are selected for this test. The four methods are Bior-Wavelet [28], PulseGAN [29], SA-SSA [27], and LGI [26] in order.

In this test, MIC_SBP and MIC_DBP are calculated for each method. The results in Table 1 indicate that, compared to other methods, the majority of pulse features calculated by our method show significant improvement in both MIC_SBP and MIC_DBP. Although PulseGAN also employs cPPG signals as the reference signal for pulse transformation, its enhancement of the correlation between pulse features and BP is not as significant as ours. This indicates that the optimization of the pulse transformation in our method is necessary.

Table 1. MIC results of 24 pulse features for different iPPG pulse extraction methods

We also test whether the proposed method leads to an improvement in the performance of BP estimation (Table 2 and Table 3). Six BP estimation models using pulse signals are selected: SVR [16], FCN [18], LSTM [38], AlexNet [39], ResNet50 [39], and Transformer [40]. SVR, FCN, and LSTM take pulse features as input, while AlexNet, ResNet50, and Transformer take a segment of the pulse signal as input. In this study, the experimental data is divided into segments of 20 seconds. The average cardiac cycle obtained by our method is copied and spliced so that the length of the spliced signal is consistent with the duration of the experimental data segment. The spliced signal undergoes the same processing as signals obtained from other iPPG pulse extraction methods. Pulse features required for SVR, FCN, and LSTM are extracted from these signals, and the signals themselves are directly used as input for AlexNet, ResNet50, and Transformer. When extracting pulse features, each pulse feature other than HR is the mean of that feature over all cardiac cycles. HR is calculated by Eq. (7), where $\mathrm {F}_{\max }$ is the frequency with maximum amplitude in the Fourier transform of the pulse signal.

$$\mathrm{HR}=60\cdot \mathrm{F}_{\max} \tag{7}$$
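
Eq. (7) can be illustrated with a short FFT-based sketch. Removing the mean and restricting the search to a plausible heart rate band are assumptions added here so that the DC component and out-of-band noise do not dominate the spectrum; the band limits and the function name are illustrative and are not taken from this paper.

```python
import numpy as np

def heart_rate_fft(signal, fs, band=(0.7, 3.0)):
    """Heart rate in bpm from the dominant spectral peak, as in Eq. (7)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the DC component (assumed step)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    amplitude = np.abs(np.fft.rfft(x))
    # Search only within the assumed heart rate band, given in Hz.
    mask = (freqs >= band[0]) & (freqs <= band[1])
    f_max = freqs[mask][np.argmax(amplitude[mask])]
    return 60.0 * float(f_max)

# A 20 s segment at 30 Hz containing a 1.2 Hz pulse component.
t = np.arange(600) / 30.0
hr = heart_rate_fft(np.sin(2 * np.pi * 1.2 * t) + 0.5, fs=30.0)
```

For a 20-second segment the frequency resolution is 0.05 Hz, so the heart rate obtained this way is quantized in steps of 3 bpm.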

Table 2. Errors of SBP with different iPPG pulse extraction methods

Table 3. Errors of DBP with different iPPG pulse extraction methods

Test results show that, compared to other iPPG pulse extraction methods, our method effectively reduces the error of both SBP and DBP estimation. We also take the results of the Transformer [40] model as an example of a more comprehensive assessment using the British Hypertension Society (BHS) grading criteria. According to Table 4, our method shows obvious improvements in all error magnitudes.

Table 4. Comparison with BHS Standard (Transformer [40] model)

However, the above BP estimation methods generally perform poorly on the dataset of this paper, and even the best-performing Transformer [40] model fails to meet the AAMI/ESH standard (MAE$\leqslant$5mmHg, STD$\leqslant$8mmHg). For the BHS standard, only the accuracy of DBP comes close to grade C, which still does not meet the medical device standard. As BP varies greatly among the elderly, it is necessary to optimize BP estimation methods for this group in further study. It is also worth noting that, although AA, DA, RAA, and RDA are all features related to pulse area, RAA and RDA perform significantly better than AA and DA. The main reason is that the normalization in data preprocessing discards the pulse height information, which in turn affects the area features. RAA and RDA, on the other hand, are ratio features, which cancel the effect of pulse height in the calculation. Feature RI, which is the ratio of DA to AA, also gains significant improvement in this way.
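
The error statistics and the two standards mentioned above can be summarized with a small helper. The AAMI/ESH check uses the thresholds as stated in the text. The BHS grade boundaries (cumulative percentages of absolute errors within 5, 10, and 15 mmHg) are those of the BHS protocol as commonly cited and are not given in this paper, so they should be treated as an assumption of this sketch, as should the function name and the use of the population standard deviation.

```python
import numpy as np

# Minimum cumulative percentages within 5 / 10 / 15 mmHg per BHS grade
# (commonly cited protocol values; not taken from this paper).
BHS_GRADES = (("A", (60, 85, 95)), ("B", (50, 75, 90)), ("C", (40, 65, 85)))

def bp_error_summary(true_bp, pred_bp):
    """MAE, STD, BHS grade, and AAMI/ESH check for one BP component."""
    err = np.asarray(pred_bp, dtype=float) - np.asarray(true_bp, dtype=float)
    abs_err = np.abs(err)
    mae = float(abs_err.mean())
    std = float(err.std())  # population standard deviation (assumed)
    pct = tuple(100.0 * float(np.mean(abs_err <= lim)) for lim in (5, 10, 15))
    grade = "D"
    for name, minimums in BHS_GRADES:
        if all(p >= m for p, m in zip(pct, minimums)):
            grade = name
            break
    return {
        "mae": mae,
        "std": std,
        "pct_within_5_10_15": pct,
        "bhs_grade": grade,
        "meets_aami_esh": mae <= 5.0 and std <= 8.0,
    }
```

The function is applied separately to SBP and DBP, since the standards grade each component on its own.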

3.5 Ablation experiments

The ablation experiments are conducted to further evaluate the influence of different factors. Four factors are individually tested as follows.

• Without Self-Attention: The self-attention layer in each MSSA block is removed.
• Without Multi-Scale: The multi-scale convolution layer in each MSSA block is changed to a single-scale convolution layer with only $1\times 3$ filter size.
• Standard MSE: The loss function is reset to the standard MSE without weight factor.
• Whole cPPG: The label of MSSA model is replaced by cPPG pulse signals for the entire duration of 20 seconds, which are made by concatenating cPPG ACC signals (200 Hz) to create a data length of 4000. The output size of MSSA model is modified to (4000,) accordingly.

We have tested the effect of these four factors on pulse features and BP estimation respectively. The mean values of PCC and MIC are used to provide a concise representation of the influence on pulse features (shown in Table 5). The errors of SBP and DBP are also calculated to assess the influence of these factors on BP estimation, with Transformer [40] chosen as the estimation model (shown in Table 6). In the test, the baseline is our proposed method. According to Table 5 and Table 6, the absence of any key factor in our method has a significant influence on the results. The factor with the greatest impact is the absence of multi-scale features, as features extracted by a single-scale convolutional layer struggle to fully reflect the shape information of the pulse signal. The self-attention mechanism and the weighted MSE are also important for the performance of our method.

Table 5. Ablation experiment 1: influence on the 24 pulse features

Table 6. Ablation experiment 2: influence on the BP estimation (Transformer [40] model)

The Whole cPPG experiment illustrates that the ACC should be used as the reference signal for the transformation of pulse signals. An example of the Whole cPPG result is presented in Fig. 11. It can be observed that it is difficult to fit the waveform shapes of all cardiac cycles. Even though the main shapes of some cardiac cycles are fitted successfully, many cycles are still fitted poorly. Selecting well-fitted cardiac cycles for BP estimation is challenging and unnecessary; setting the ACC signal as the reference of the pulse transformation is a better choice.

Fig. 11. A result of the Whole cPPG experiment. The reference signal is spliced from cPPG ACC signals. It is difficult to fit the shapes of all cardiac cycles.

3.6 Discussion

It is important to note that the method proposed in this paper relies on the premise that both human BP and pulse signals remain stable during a short resting period. However, this assumption may not hold for subjects in non-resting states or individuals with acute medical conditions. Furthermore, our transformation method discards the phase information of the original pulse signals, rendering the transformed pulse signal unsuitable for the calculation of PTT. The method also loses the HRV information of the input iPPG signal. However, HRV analysis usually requires a signal duration of more than 5 minutes [41], so the significance of HRV at the duration studied in this paper (20 s) is limited. In addition, the heart rate extracted from the mean cardiac cycle can be regarded as an averaged, downsampled version of the heart rate over the time period covered by the input signal. For long-duration signals, the method in this paper can therefore still reflect the trend of HRV changes.

Another thing to note is that both iPPG and cPPG pulse signals are normalized during data preprocessing. Normalization is necessary because the ratio of pulse height between iPPG and cPPG pulse signals varies from person to person. The main reason is that the light reflectivity of the facial and finger skin differs between individuals. This phenomenon is particularly evident in the elderly population. A typical example is shown in Fig. 12: the iPPG pulse heights of the two individuals are not significantly different, but their cPPG pulse heights differ noticeably. Normalization causes the loss of pulse height information, so pulse features related to pulse height are directly affected.

Fig. 12. A sample illustrating the differences between individuals in the ratio of iPPG and cPPG pulse height.
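
The effect of normalization on area features and ratio features can be illustrated with a toy example. The sketch below assumes min-max normalization per cycle and takes AA and DA as the trapezoidal areas before and after the systolic peak. Both are assumptions made for the illustration and may differ from the exact preprocessing and feature definitions used in this paper; all function names are hypothetical.

```python
def min_max_normalize(cycle):
    """Scale one cardiac cycle to the range [0, 1]."""
    lo, hi = min(cycle), max(cycle)
    return [(v - lo) / (hi - lo) for v in cycle]


def area(samples):
    """Trapezoidal area with unit sample spacing."""
    return sum((a + b) / 2.0 for a, b in zip(samples, samples[1:]))


def ascending_descending_area(cycle):
    """Return (AA, DA): areas before and after the peak (assumed definitions)."""
    peak = cycle.index(max(cycle))
    return area(cycle[: peak + 1]), area(cycle[peak:])


# Two pulses with the same shape but different pulse heights.
shape = [0.0, 0.4, 1.0, 0.7, 0.5, 0.3, 0.1, 0.0]
small = [2.0 * v for v in shape]
large = [8.0 * v for v in shape]

aa_s, da_s = ascending_descending_area(small)
aa_l, da_l = ascending_descending_area(large)

# RI is the ratio of DA to AA, so the pulse height cancels out.
ri_s, ri_l = da_s / aa_s, da_l / aa_l

# After normalization the two pulses become indistinguishable by AA.
aa_s_norm, _ = ascending_descending_area(min_max_normalize(small))
aa_l_norm, _ = ascending_descending_area(min_max_normalize(large))

print(aa_s, aa_l, ri_s, ri_l, aa_s_norm, aa_l_norm)
```

Before normalization, AA differs between the two pulses by the height factor while RI is identical. After normalization, AA is identical as well, so any height-related information that AA carried is gone, whereas the ratio feature is unaffected.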

This paper utilizes multi-scale convolution and self-attention mechanisms to mitigate the impact of noise interference, which proves effective to a certain extent. Nevertheless, our proposed method does not eliminate the influence of signal quality. When the quality of the input iPPG signal is extremely poor, the model may degrade, leading to a substantial deviation between the transformed signal and the reference ACC signal (as shown in Fig. 13). We have explored alternative methods for extracting iPPG pulse signals, but these attempts did not yield improved performance. The main reason is the pronounced skin aging in some of the subjects, which makes the facial pulse signals so weak that effective pulse signals are difficult to extract.

Fig. 13. An example of model degradation with an extremely low-quality input signal. The reference signal is the ACC of cPPG.

4. Conclusion

This paper presents a method that preserves the shape details of pulse signals for video-based BP estimation. The quality of BP estimation, which is sensitive to shape distortion, is significantly improved by transforming iPPG signals into the corresponding cPPG ACC signals. Our method effectively reduces the difficulty of signal transformation from iPPG to cPPG pulse signals, which leads to lower shape errors. We have developed a neural network called MSSA for this transformation, which effectively reduces the shape error of the transformed signals and significantly improves the quality of the pulse features. Our method also provides an effective solution for other tasks that require high-quality iPPG pulse shapes.

Funding

National Natural Science Foundation of China (62271186); Hefei University of Technology-NIO Innovation Institute (W2022JSKF1120); Research on Key Technology of Driver Heart rate Monitoring based on Visual Perception (GXXT-2023-004).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

The data cannot be shared due to ethical policy, but can be made available on reasonable request.

References

1. S. E. Kjeldsen, “Hypertension and cardiovascular risk: general aspects,” Pharmacol. Res. 129, 95–99 (2018).

2. K. T. Mills, A. Stefanescu, and J. He, “The global epidemiology of hypertension,” Nat. Rev. Nephrol. 16(4), 223–237 (2020).

3. M. Elgendi, R. Fletcher, and Y. Liang, “The use of photoplethysmography for assessing hypertension,” NPJ Digit. Med. 2(1), 60 (2019).

4. P.-K. Man, K.-L. Cheung, and N. Sangsiri, “Blood pressure measurement: From cuff-based to contactless monitoring,” Healthcare 10(10), 2113 (2022).

5. R. Mukkamala, J. O. Hahn, and O. T. Inan, “Toward ubiquitous blood pressure monitoring via pulse transit time: theory and practice,” IEEE Trans. Biomed. Eng. 62(8), 1879–1901 (2015).

6. Z. Liu, C. Zhou, H. Wang, et al., “Blood pressure monitoring techniques in the natural state of multi-scenes: a review,” Front. Med. 9, 851172 (2022).

7. Y. Yoon, J. H. Cho, and G. Yoon, “Non-constrained blood pressure monitoring using ECG and PPG for personal healthcare,” J. Med. Syst. 33(4), 261–266 (2009).

8. X. He, R. A. Goubran, and X. P. Liu, “Secondary peak detection of PPG signal for continuous cuffless arterial blood pressure measurement,” IEEE Trans. Instrum. Meas. 63(6), 1431–1439 (2014).

9. R. Lazazzera, Y. Belhaj, and G. Carrault, “A new wearable device for blood pressure estimation using photoplethysmogram,” Sensors 19(11), 2557 (2019).

10. I. C. Jeong and J. Finkelstein, “Introducing contactless blood pressure assessment using a high speed video camera,” J. Med. Syst. 40(4), 77 (2016).

11. X. J. Fan, Q. L. Ye, X. B. Yang, et al., “Robust blood pressure estimation using an RGB camera,” J. Ambient. Intell. Humaniz. Comput. 11(11), 4329–4336 (2020).

12. N. Sugita, M. Yoshizawa, and M. Abe, “Contactless technique for measuring blood-pressure variability from one region in video plethysmography,” J. Med. Biol. Eng. 39(1), 76–85 (2019).

13. R. Takahashi, K. Ogawa-Ochiai, and N. Tsumura, “Non-contact method of blood pressure estimation using only facial video,” Artif. Life Robot. 25(3), 343–350 (2020).

14. X. Ding, W. Wang, Y. Chen, et al., “Feasibility study of pulse width at half amplitude of camera PPG for contactless blood pressure estimation,” in 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) (2021), pp. 365–368.

15. D. Djeldjli, F. Bousefsaf, and C. Maaoui, “Remote estimation of pulse wave features related to arterial stiffness and blood pressure using a camera,” Biomed. Signal Process. Control. 64, 102242 (2021).

16. M. Rong and K. Y. Li, “A blood pressure prediction method based on imaging photoplethysmography in combination with machine learning,” Biomed. Signal Process. Control. 64, 102328 (2021).

17. M. M. Padilla, E. J. Berjano, J. Saiz, et al., “Assessment of relationships between blood pressure, pulse wave velocity and digital volume pulse,” in Proc. Comput. Cardiol. (2006), pp. 893–896.

18. C. G. Viejo, S. Fuentes, D. D. Torrico, et al., “Non-contact heart rate and blood pressure estimations from video analysis and machine learning modelling applied to food sensory responses: A case study for chocolate,” Sensors 18(6), 1802 (2018).

19. Y. M. Zhou, H. Ni, Z. Qi, et al., “The noninvasive blood pressure measurement based on facial images processing,” IEEE Sens. J. 19(22), 10624–10634 (2019).

20. H. Luo, D. Y. Yang, and A. Barszczyk, “Smartphone-based blood pressure measurement using transdermal optical imaging technology,” Circ. Cardiovasc. Imaging 12(8), e008857 (2019).

21. B.-J. Wu, B.-F. Wu, and C.-P. Hsu, “Camera-based blood pressure estimation via windkessel model and waveform features,” IEEE Trans. Instrum. Meas. 72, 1–13 (2023).

22. M.-Z. Poh, D. McDuff, and R. W. Picard, “Non-contact, automated cardiac pulse measurements using video imaging and blind source separation,” Opt. Express 18(10), 10762–10774 (2010).

23. G. de Haan and V. Jeanne, “Robust pulse-rate from chrominance-based rPPG,” IEEE Trans. Biomed. Eng. 60(10), 2878–2886 (2013).

24. G. de Haan and A. van Leest, “Improved motion robustness of remote-PPG by using the blood volume pulse signature,” Physiol. Meas. 35(9), 1913–1926 (2014).

25. W. Wang, S. Stuijk, and G. de Haan, “A novel algorithm for remote photoplethysmography: Spatial subspace rotation,” IEEE Trans. Biomed. Eng. 63(9), 1974–1984 (2016).

26. C. S. Pilz, S. Zaunseder, J. Krajewski, et al., “Local group invariance for heart rate estimation from face videos in the wild,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2018), pp. 1335–13358.

27. D. Wang, X. Yang, and X. Liu, “Detail-preserving pulse wave extraction from facial videos using consumer-level camera,” Biomed. Opt. Express 11(4), 1876–1891 (2020).

28. Y. Tong, Z. Huang, and Z. Zhang, “Detail-preserving arterial pulse wave measurement based biorthogonal wavelet decomposition from remote rgb observations,” Measurement 222, 113605 (2023).

29. R. Song, H. Chen, and J. Cheng, “Pulsegan: Learning to generate realistic pulse waveforms in remote photoplethysmography,” IEEE J. Biomed. Health Inform. 25(5), 1373–1384 (2021).

30. M. P. Tarvainen, P. O. Ranta-aho, and P. A. Karjalainen, “An advanced detrending method with application to hrv analysis,” IEEE Trans. Biomed. Eng. 49(2), 172–175 (2002).

31. X. Teng and Y. Zhang, “Continuous and noninvasive estimation of arterial blood pressure using a photoplethysmographic approach,” in Proceedings of the 25th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE Cat. No.03CH37439), vol. 4 (2003), pp. 3153–3156.

32. J. Weng, Z. Ye, and J. Weng, “An improved pre-processing approach for photoplethysmographic signal,” in 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference (2005), pp. 41–44.

33. E. Seitsonen, I. Korhonen, and M. J. van Gils, “EEG spectral entropy, heart rate, photoplethysmography and motor responses to skin incision during sevoflurane anaesthesia,” Acta Anaesthesiol. Scand. 49(3), 284–292 (2005).

34. H. F. Yang, Q. Zhou, and J. Xiao, “Relationship between vascular elasticity and human pulse waveform based on FFT analysis of pulse waveform with different age,” in ICBBE 2009. 3rd International, (2009).

35. D. G. Brillante, A. J. O’Sullivan, and L. G. Howes, “Arterial stiffness indices in healthy volunteers using non-invasive digital photoplethysmography,” Blood Press. 17(2), 116–123 (2008).

36. J. Kaiser, “On a simple algorithm to calculate the ‘energy’ of a signal,” in International Conference on Acoustics, Speech, and Signal Processing, vol. 1 (1990), pp. 381–384.

37. D. N. Reshef, Y. A. Reshef, and H. K. Finucane, “Detecting novel associations in large data sets,” Science 334(6062), 1518–1524 (2011).

38. D. Wang, X. Yang, and X. Liu, “Photoplethysmography-based blood pressure estimation combining filter-wrapper collaborated feature selection with lasso-lstm model,” IEEE Trans. Instrum. Meas. 70, 1–14 (2021).

39. F. Schrumpf, P. Frenzel, C. Aust, et al., “Assessment of deep learning based blood pressure prediction from PPG and rPPG signals,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2021), pp. 3815–3825.

40. Y. Chu, K. Tang, and Y.-C. Hsu, “Non-invasive arterial blood pressure measurement and spo₂ estimation using PPG signal: a deep learning framework,” BMC Medical Inform. Decis. Mak. 23(1), 131 (2023).

41. A. Plaza-Florido, J. Sacha, and J. M. A. Alcantara, “Short-term heart rate variability in resting conditions: methodological considerations,” Kardiol. Pol. 79(7-8), 745–755 (2021).

	MIC_SBP					MIC_DBP
Features	Ours	Bior-Wavelet	PulseGAN	SA-SSA	LGI	Ours	Bior-Wavelet	PulseGAN	SA-SSA	LGI
$AW_{10}$	0.502	0.234	0.226	0.168	0.202	0.456	0.207	0.211	0.162	0.191
$AW_{25}$	0.533	0.257	0.255	0.197	0.183	0.450	0.259	0.209	0.174	0.180
$AW_{50}$	0.565	0.258	0.248	0.185	0.184	0.519	0.233	0.206	0.183	0.169
$AW_{75}$	0.558	0.239	0.263	0.192	0.175	0.506	0.234	0.229	0.187	0.183
$AW_{90}$	0.524	0.226	0.287	0.170	0.180	0.491	0.239	0.236	0.174	0.175
$DW_{10}$	0.409	0.269	0.246	0.193	0.179	0.553	0.218	0.202	0.188	0.167
$DW_{25}$	0.500	0.256	0.178	0.180	0.217	0.554	0.246	0.204	0.186	0.180
$DW_{50}$	0.514	0.221	0.375	0.185	0.177	0.589	0.269	0.221	0.196	0.172
$DW_{75}$	0.429	0.198	0.322	0.179	0.175	0.495	0.220	0.187	0.169	0.171
$DW_{90}$	0.410	0.229	0.310	0.198	0.190	0.525	0.231	0.199	0.202	0.186
AS	0.453	0.246	0.217	0.187	0.290	0.436	0.213	0.209	0.204	0.189
DS	0.369	0.242	0.218	0.213	0.218	0.408	0.227	0.228	0.185	0.195
AT	0.414	0.189	0.251	0.197	0.185	0.457	0.201	0.195	0.185	0.180
DT	0.409	0.219	0.211	0.190	0.191	0.531	0.190	0.192	0.203	0.173
AA	0.226	0.235	0.286	0.181	0.215	0.221	0.270	0.244	0.192	0.183
DA	0.250	0.227	0.228	0.186	0.230	0.284	0.232	0.207	0.191	0.176
RAA	0.422	0.240	0.264	0.205	0.181	0.483	0.225	0.191	0.164	0.161
RDA	0.434	0.225	0.197	0.178	0.187	0.534	0.241	0.201	0.182	0.182
Kurt	0.314	0.246	0.214	0.175	0.201	0.505	0.242	0.206	0.191	0.173
Skew	0.412	0.238	0.284	0.168	0.188	0.562	0.229	0.263	0.170	0.185
K	0.321	0.236	0.190	0.180	0.171	0.647	0.233	0.192	0.166	0.167
RI	0.486	0.260	0.256	0.178	0.193	0.577	0.263	0.213	0.175	0.193
KTE	0.304	0.244	0.203	0.173	0.250	0.386	0.241	0.195	0.189	0.194
HR	0.285	0.159	0.188	0.163	0.188	0.458	0.126	0.198	0.184	0.168

BP Models	SBP Error (mmHg)
	Ours		Bior-Wavelet		PulseGAN		SA-SSA		LGI
	MAE	STD	MAE	STD	MAE	STD	MAE	STD	MAE	STD
SVR	13.44	16.56	15.83	19.91	16.46	20.26	17.36	20.44	16.58	20.41
FCN	15.99	19.77	18.57	22.90	19.64	22.21	19.39	22.30	18.66	22.53
LSTM	12.36	15.05	16.09	19.59	17.18	19.60	15.53	18.90	17.41	21.32
AlexNet	14.11	16.50	17.99	22.91	16.78	18.69	18.97	22.71	17.55	22.31
ResNet50	13.88	17.36	15.53	19.04	15.81	19.88	19.07	22.78	16.61	19.68
Transformer	11.18	13.43	13.41	15.67	14.56	17.15	17.83	19.76	18.52	21.05

BP Models	DBP Error (mmHg)
	Ours		Bior-Wavelet		PulseGAN		SA-SSA		LGI
	MAE	STD	MAE	STD	MAE	STD	MAE	STD	MAE	STD
SVR	9.51	12.25	10.11	12.98	10.49	13.53	10.73	13.76	11.01	14.15
FCN	10.09	12.06	11.06	13.82	11.79	13.06	11.23	12.57	12.01	13.93
LSTM	8.84	11.40	9.64	12.35	10.95	13.29	9.78	12.02	10.68	13.58
AlexNet	9.90	12.88	10.97	12.69	10.77	12.32	11.09	13.81	11.68	13.09
ResNet50	10.03	12.72	10.85	13.33	10.53	13.50	12.58	14.01	10.81	13.52
Transformer	8.37	10.08	9.57	11.87	10.45	12.56	11.46	14.02	11.18	13.79

		Percentage Error (%)
		$\leqslant$ 5 mmHg	$\leqslant$ 10 mmHg	$\leqslant$ 15 mmHg
BHS Standard	Grade A	60%	85%	95%
	Grade B	50%	75%	90%
	Grade C	40%	65%	85%
Ours	SBP	24.49%	53.90%	72.75%
Ours	DBP	36.85%	69.27%	86.07%
Bior-Wavelet	SBP	17.17%	36.97%	57.74%
Bior-Wavelet	DBP	23.41%	54.50%	84.15%
PulseGAN	SBP	18.49%	37.58%	54.98%
PulseGAN	DBP	24.97%	51.14%	75.99%
SA-SSA	SBP	16.33%	33.73%	48.14%
SA-SSA	DBP	27.49%	50.54%	72.15%
LGI	SBP	11.52%	24.37%	42.62%
LGI	DBP	24.13%	51.38%	72.15%

		Mean MIC
Experiment	Mean PCC	SBP	DBP
Baseline	0.748	0.418	0.485
Without Self-Attention	0.430	0.211	0.243
Without Multi-Scale	0.178	0.174	0.192
Standard MSE	0.487	0.272	0.317
Whole cPPG	0.239	0.266	0.303


Biomedical Optics Express