R<sup>3</sup>-DICnet: an end-to-end recursive residual refinement DIC network for larger deformation measurement

Jiashuai Yang; Kemao Qian; Kemao Qian; Lianpo Wang; Lianpo Wang; Lianpo Wang

doi:10.1364/OE.505655

1. Introduction

Digital image correlation (DIC) is a non-contact and full-field optical technique for deformation measurement [1,2]. Over the past four decades, DIC has been widely used in many fields, such as experimental mechanics [3,4], biomechanics/cell mechanics [5,6] and structural health monitoring [7,8]. The traditional DIC first acquires digital images before and after the deformation of an object as the reference image and the deformed images, respectively. Then points of interest (POIs) and their respective subsets are defined in the reference image. The surface deformation of the object is obtained by matching each POI in the deformed image. To achieve accurate and robust matching, the DIC is divided into two main steps, i.e., initial value guess [9–11] and then sub-pixel iterations based on the initial guess using the classical Newton-Raphson method [12] or the inverse compositional Gauss-Newton (IC-GN) method [13].

The wide applicability of the traditional DIC is mainly due to the simplicity in the hardware setup and the high accuracy that it can achieve. However, the traditional DIC has the following three challenges: (i) The subset size and shape function are difficult to choose [14,15], which compromises accuracy for deformation measurements; (ii) Deformation accuracy is still unsatisfactory under high-gradient deformation cases [16,17], large deformation and discontinuous deformation cases [18]; (iii) Slow computation speed due to the iterative optimization process. GPU parallel processing of POIs has been proposed to speed up the whole process and reached a rate of about ${10^5}$ POI/s [19,20], which corresponds to about 300 × 300 POI/s. If we pursue real-time computation (24 frames per second), the computational capability is further reduced to 65 × 65 POIs, which is far from desirable.

Recently, deep learning has been introduced to DIC to solve the above problems. Inspired by Flownet [21] in the field of optical flow estimation, Boukhtache et al [22] proposed a pixel-level end-to-end StrainNet specifically for measuring sub-pixel deformation less than one pixel. A Speckle Dataset containing different deformation frequencies was generated for training. StrainNet is inferior to traditional methods for low-frequency deformation measurements, but superior to them for high-frequency deformation. It can reach speeds of ${10^7}$ POI/s, which is unmatched by traditional methods. Recently, they simplified the StrainNet by reducing the model parameters from 38.68 million to 0.67 million [23], but the measurement range of the displacement remains the same. To improve the accuracy, Wang et al [24] created a more realistic speckle dataset using higher-order Hermite elements and introduced residual blocks [25] in the U-Net network [26]. Despite the improved accuracy, it is only for small deformation objects with a deformation less than 2 pixels. To increase the measurement range, Yang [27] et al. proposed a strain network, a displacement network, and a large displacement dataset (up to 8 pixels). They achieved satisfactory results and demonstrated that deep learning based DIC has an advantage at deformation discontinuities compared to the traditional methods. Lan et al [28] also used U-Net to measure complex deformation fields, and their method was able to measure deformations up to 8 pixels and achieved good results in high frequency deformation measurements. In summary, the current deep learning based DIC methods have demonstrated exciting performances in complex deformation measurements due to their strong learning capability, high convenience without parameter setting after the network has been pre-trained, and have fast computational speeds due to their end-to-end nature. However, shortcomings are also noticed: (i) although the current methods have achieved good results at high-frequency and discontinuous positions, the measurement range of the current methods is limited to within 8 pixels, which is far from sufficient in practical applications; (ii) although some methods increased their measurement range by a proper dataset, the large difference in dataset distribution makes a simple U-Net structure unsatisfactory when dealing small deformation measurements.

Motivated by the interesting advantages of the deep-learning based DIC methods, this paper further improves these methods to obtain better results regarding the accuracy for both small and large deformations, i.e., with a large measurement range. To achieve this goal, we propose a recursive iterative residual refinement DIC network (R³-DICnet). Mimicking the two-step optimization idea in the traditional DIC methods, i.e., initial value estimation and sub-pixel iteration, R³-DICnet performs initial value estimation by using deep features, and iterative refinement using shallow features, which makes it possible to accurately measure both small and large deformations. In addition, Speckle datasets containing small and large deformations with a maximum deformation of 30 pixels were generated for training and validation. Our R³-DICnet achieved following deformations measurement capability: (i) R³-DICnet is comparable to Augmented Lagrangian Digital Image Correlation (ALDIC) [29] in terms of accuracy for small deformation measurements, while large deformation measurements outperform ALDIC. (ii) For the evaluation of different frequency deformations, R³-DICnet achieves a spatial resolution (SR) of 15.95 pixels on the Star1 sample of the DIC Challenge [30] that outperforms StrainNet-f and ALDIC, especially for high frequency deformations. (iii) For large deformation and discontinuous deformation, existing deep learning based DIC methods are unable to measure due to the limited measurement range. However, R³-DICnet outperforms ALDIC, which is evaluated through Sample 15 P200_K250 [31]. (iv) Overall, comparing with other deep learning based DIC methods, R³-DICnet pushes measurement accuracy higher and measurement range larger, while keeps the same order of magnitude of the computation speed. Compared with traditional methods, R³-DICnet shows the advantages in complex deformation measurements without parameter tuning and much faster computational speed. The superior performance of R³-DICnet is promising for large-scale data processing and complex deformation measurement tasks, which will have a positive impact on many application areas such as materials science and engineering.

The rest of this paper is organized as follows. In Sec. 2, the main structure of R³-DICnet, especially the design idea of each component, is introduced in detail; the dataset generation method and network training details are described. In Sec. 3, the superior performance of the proposed network on synthetic and real images is demonstrated. Sec. 4 draws the conclusion.

2. Methodology

2.1 Network architectures design

Similar to the traditional DIC, our network has an initial value estimation module (I-module) and recursive subpixel iteration module (R-module). In addition, both modules rely on features from the feature extraction module (F-module). The F-I-R three-module structure is shown in Fig. 1. It is worth mentioning that although R³-DICnet has been divided into three modules, it is functioning in an end-to-end manner with high convenience. The implementation details of each module of R³-DICnet are introduced below.

Fig. 1. Network structure of R³-DICnet. Given an image pair (${I_1}$ and ${I_2}$), the feature encoder and the context encoder constitute the feature extraction module (F-module), which can extract the image features ($\mathrm{{\cal F}}_k^1$, $\mathrm{{\cal F}}_k^2$ in red) and the deformation features (${\mathrm{{\cal M}}_k}$ in pink), where $k \in \{{1,2,3,4} \}$; The second stage performs the initial value estimation (I-module), where D (blue) represents the deformation analyzer and U (green) represents the update block; The third stage is the iterative refinement module (R-module), which calculates the deformation increments in a recursive manner and accumulates them to obtain the final deformation field.

Download Full Size | PDF

2.1.1 F-module

The input ${I_1},{I_2} \in {\mathrm{\mathbb{R}}^{H \times W \times C}}$ are the reference image and the deformed image, where $H \times W$ and C are the image size and number of channels of the input images, respectively. Similar to the backbone network of RAFT [32], the F-module has two main components: the feature encoder and the context encoder. The feature encoder uses a pyramid structure to transform the input images ${I_1}$ and ${I_2}$ into a 4-level pyramid of multi-scale feature $\mathrm{{\cal F}}_k^1,\mathrm{{\cal F}}_k^2\; $($k \in \{{1,2,3,4} \}$). Each pyramid level uses a multilevel residual block that down-samples the features by a factor of 2 by adjusting the stride size and convolution kernel size. From the first to the fourth levels, the number of feature channels are 32, 64, 128 and 256, and scales are 1/2, 1/4, 1/8 and 1/16 of the original image, respectively.

The context encoder is used to extract the deformation features ${\mathrm{{\cal M}}_k}$ at different scales between the reference image and the deformed image. The context encoder has exactly the same network structure as the feature encoder, except that the context encoder concatenates the reference and the deformed images in the C dimension for processes them simultaneously and extracts mixed features, which is different from the feature encoder where each image is processed independently.

2.1.2 I-module

The I-module (blue dashed box) is used to provide an initial value for the R-module. Similar to the multi-layer pyramid search strategy of traditional DIC methods [33], the I-module in this paper also calculates object deformation from low resolution to high resolution, which can more robustly measure large deformation. The I-module works at pyramid levels 4 and 3 with two sub-models, i.e, a deformation analyzer and an update sub-module. The details of the Level-4 computation in this module are as follows:

(A1) the zero-deformation field ${\hat{d}_0}$ is used to initialize ${d_{old}}$;

(A2) $\mathrm{{\cal F}}_4^2$ is warped towards $\mathrm{{\cal F}}_4^1$ via ${d_{old}}$ to obtain $\mathrm{\tilde{{\cal F}}}_4^2$. Unlike traditional methods, warping is performed on high-level CNN features instead of on images [34].

(A3) Calculate the correlation between $\mathrm{\tilde{{\cal F}}}_4^2$ and $\mathrm{{\cal F}}_4^1$ by constructing the cost volume [21].

(A4) The cost volume, $\mathrm{{\cal F}}_4^1$, $\mathrm{\tilde{{\cal F}}}_4^2$, $\mathrm{\tilde{{\cal F}}}_4^2 - \mathrm{{\cal F}}_4^1$, and ${d_{old}}$ are feature mapped using convolution, respectively. Then all the features are concatenated on the C-channel, and finally the concatenated features are feature mapped using the convolutional layer to obtain the incremental feature ${\cal F}^\Delta$.

The ${\cal F}^\Delta$ describes the feature-space distance and the matching costs between image features by the action of the current deformation field. The ${\cal F}^\Delta$ obtained by the deformation analyzer and the deformation features ${\mathrm{{\cal M}}_4}$ extracted by the context network are fed to the update module to complete the update of ${d_{old}}$ .The gate recurrent unit (GRU) and up-sampling module defined in RAFT [32] are used in the update module. The GRU determines whether information needs to be retained or discarded. The up-sampling module considers each pixel of the high-resolution deformation field as a convex combination of its low-resolution 9 neighbors and performs fine up-sampling of the deformation field by predicting the convex combination weights. Specifically,

(D1) Similar to the update module in RAFT, ${\mathrm{{\cal M}}_4}$ will be divided equally into $\mathrm{{\cal M}}_4^1$ and $\mathrm{{\cal M}}_4^2$ along the C channel. The input to the GRU has two parts: one is $\mathrm{{\cal M}}_4^1$ and the other is a concatenation of $\mathrm{{\cal M}}_4^2$ and ${\cal F}^\Delta$.

(D2) The output obtained from GRU passes through two branches to predict the weights of the convex combination (Mask) and the deformation field ${d_{new}}$, respectively.

(D3) The updated deformation ${d_{new}}$ is up-sampled by a factor of 2 using Mask to obtain ${\hat{d}_1}$.

In addition, the same operation will be performed once more at the third pyramid level, and the ${\hat{d}_2}$ obtained by updating ${\hat{d}_1}$ will be used as the initial value of the iterative refinement module.

2.1.3 R-module

Similar to subpixel iteration in the IC-GN method, the R-module iteratively calculates deformation increments and accumulates them to achieve subpixel measurement accuracy. Like the I-module, the R-module includes a deformation analyzer and an update block. The difference is that the update block is used to estimate the deformation increment instead of the deformation. The iterative refinement module is executed N times, where N is predefined, typically between 5 and 10. Specifically, the deformation analyzer and update block work at level 2 of the pyramid to update the ${\hat{d}_2}$ obtained in the previous stage to get deformation increment $\varDelta {d_1}$, and then define $d_1^m$=${\hat{d}_2}$+ $\Delta {d_1}$. $d_1^m$ is up-sampled by a factor of 4 to get the final ${d_1}$ while $d_1^m$ will continue to be fed to the deformation analyzer to complete the iteration. Finally, a series of deformation fields $\{{{d_1}, \cdots ,{d_N}} \}$ will be obtained by iteration. Our iterative optimization method avoids the Hessian matrix calculation in IC-GN method, thus greatly improving the calculation speed. It is noted that the R-module is recursively called, and the number of model parameters is determined and does not increase with the number of executions, which significantly reduces the number of model parameters. Compared to StrainNet-f ‘s 38.68 million parameters, R³-DICnet has only 8.77 million parameters.

2.2 Dataset generation

2.2.1 Existing datasets

Unlike the traditional DIC methods, the deep learning based DIC methods require labeled datasets with the ground truth of object deformation. In general, the dataset should consist of reference and deformed image pairs and their corresponding deformation fields. However, the ground truth is difficult to obtain, as the true object deformations are not easy to accurately determine. The available datasets are summarized in Table 1, which have a small deformation range, resulting in a limited network measurement range. Therefore, it is necessary to develop a dataset containing large deformations.

Table 1. Available datasets

View Table | View all tables in this article

2.2.2 Dataset development

Due to the challenging task of accurately determining real-world object deformations, this study suggests adopting a synthetic approach to generate a large deformation dataset. Similar to [22], the speckle generator proposed in [35] is used to generate 200 reference images with 256 × 256 pixels. Subsequently, various deformations are applied to the reference images to produce the deformed images. The deformation is defined according to the following strategy: First, deformations from -1 pixel to 1 pixel randomly generated within blocks of 3 × 3, 8 × 8, …, 58 × 58 and 63 × 63, which are then interpolated into images with a size of 256 × 256, so that deformation fields with varying frequency are simulated. Furthermore, these deformation fields are magnified by factors of 5, 10, 15, 20, 25, and 30 to obtain different ranges of deformation. In addition, the boundaries of the deformed image during image warping may introduce some irrelevant pixels, which will cause the network to learn some wrong information. To avoid this problem, the displacement direction of the image boundary is designed to be away from the center of the image so that the reference image boundary expands outwards. This strategy ensures that all pixel grayscale values of the deformed image are interpolated from the reference image pixels.

By varying both frequency and range, a total of 246 different deformation fields were generated. After applying these deformation fields to 200 reference images, a dataset of 49,200 image pairs and corresponding deformation fields is obtained. As small deformations are more often encounter in real applications and thus more important, 12,000 image pairs are generated for the deformations ranging from -1 pixel to 1 pixel. For the other ranges (-5 to 5 pixels, -10 to 10 pixels, -15 to 15 pixels, -20 to 20 pixels, -25 to 25 pixels, and -30 to 30 pixels) have 6,200 pairs each. In addition, in our network training, 44000 pairs of images are used as training set and the remaining 5200 pairs are used as validation set.

2.3 Training details

The proposed R³-DICnet is trained for 300 epochs on an NVIDIA GeForce 3090, with the following setting: the batch size of 32, the learning rate starts at 0.0002, the optimizer of AdamW [36] with a weight delay of 0.5 × 10⁻⁴, and the loss is defined as follows.

(1)$${{\cal L}} = \mathop {\mathop \sum \limits^N }\limits_{i = 1} {\gamma ^{N - i}}\left\| {{d_{gt}} - {d_i}_1} \right\|$$

where $\{{{d_1},\ldots \; ,{d_i}} \}$ is the deformation field calculated by each iteration optimization, N is the numbers of the iteration refinement, which is set to 8; ${d_{gt}}$ represents the ground truth of the deformation fields, and $\gamma $ represents the attenuation of the weight of each iteration, which is set to 0.8. In addition, during training and testing, the input grayscale images need to be normalized before feeding it to the network. To evaluate the performance of the R³-DICnet and StrainNet-f architectures, we trained StrainNet-f on the proposed dataset following the above training strategy to obtain StrainNet-r. The average endpoint error (AEE) of the two networks on the training and validation sets is shown in Fig. 2. It can be intuitively seen that the proposed network has a faster convergence speed and measurement accuracy.

Fig. 2. Average endpoint error (in pixels) for each epoch of the network training process. Left: average endpoint error on the training set. Right: average endpoint error on the validation set.

Download Full Size | PDF

3. Experimental verification

In this paper, the performance of the proposed R³-DICnet is verified by simulations and actual experiments. Simulations verify the measurement performance of the proposed network for different deformation ranges (Sec. 3.1), different deformation frequencies (Sec. 3.2) and discontinuous deformation (Sec. 3.3), while actual experiments (Sec. 3.4 and Sec. 3.5) demonstrate the practicality and accuracy of our method. Finally, computational costs are compared in Sec. 3.6.

3.1 Accuracy assessment on different deformation ranges

In this section, an experimental test set with 140 samples is prepared, which contains 2 different types of deformations, linear and periodic as expressed by Eq. (2) and Eq. (3), respectively,

(2)$$\left\{ {\begin{array}{{c}} {\; u = {a_1} + {a_2}x + {a_3}y + {a_4}xy}\\ {v = 0} \end{array}} \right.$$

(3)$$\left\{ {\begin{array}{{c}} {\; u = \sin({{\sigma_1}({x - {x_0}} )+ {\sigma_2}({y - {y_0}} )} )}\\ {v = 0} \end{array}} \right.$$

where u and v are the deformations in $x$- and $y$-directions, respectively; x and y are the coordinates of each pixel in the reference image; coefficients {${a_1}$, ${a_2}$, ${a_3}$, ${a_4}$, ${\sigma _1}$, ${\sigma _2}$, ${x_0}$, ${y_0}$} are randomly generated to control the center and magnitude of deformation. Each deformation type has 7 different deformation ranges (up to 30 pixels), with 10 samples per range. Each sample contains image pairs with a size of 256 × 256 pixels. To evaluate the accuracy of the methods, the following mean value of absolute error (MAE) is used to evaluate the gap between a calculated result and the ground truth. The MAE of the u field can be calculated as follows,

(4)$$MAE = \frac{1}{n}\; \; \mathop {\sum \; }\limits_{i,j} |{u({i,j} )- {u_{gt}}({i,j} )} |$$

where $i,j$ represent the row and column pixel locations, n is the total number of data points, $u({i,j} )$ and ${u_{gt}}({i,j} )$ are the calculated value and the ground truth. The MAE of v field can be similarly calculated by merely replacing u by v in Eq. (4).

We use R³-DICnet for evaluation on the test set, while ALDIC [29] is chosen as a representative of the traditional method for comparison. ALDIC combines the advantages of global DIC and local DIC with high accuracy and robustness in full-field measurements. Existing DIC methods based on deep learning cannot satisfy the 30-pixel measurement range. For comparison with deep learning DIC architectures, we used StrainNet-r mentioned in Sec. 2.3 for comparison. We find ALDIC performs best with a subset radius M = 12 in our evaluation. In addition, an ROI was selected so that it is 60 pixels away from all four image boundaries. The average MAEs of u-field calculated by ALDIC, StrainNet-r and R³-DICnet for different deformation ranges are shown in Table 2. ALDIC performs better in the 1-pixel deformation range, and R³-DICnet performs better in all other deformation ranges. For deformations larger than 5 pixels, StrainNet-r outperforms ALDIC, but not as well as R³-DICnet. R³-DICnet has an acceptable MAE of about 0.0435 pixels for large deformations. In addition, the result of a sample randomly selected in each deformation range of the linear deformation type is shown in Fig. 3, which is consistent to Table 2. One may notice that the maximum absolute errors in Table 3 are also generally consistent with the previous conclusions.

Fig. 3. Calculation results of ALDIC, StrainNet-r and R³-DICnet (in pixels).

Download Full Size | PDF

Table 2. Average MAE of the calculated results for different ranges of deformation (in pixels)

View Table | View all tables in this article

Table 3. Maximum absolute error of random sampling results (in pixels)

View Table | View all tables in this article

3.2 Accuracy assessment on different deformation frequencies

We now move on the evaluate the performance of R³-DICnet on the Star1 sample taken from DIC challenge 2.0 [30] which uses a star deformation field with a size of 50 × 2000 pixels and the deformation range of -0.5 to 0.5 pixel. The star deformation field is a synthetic vertical deformation field modelled by a cosine wave, whose period gently and linearly increases when going toward the right-hand side of the map, as can been seen in the ground truth field in Fig. 4.

Fig. 4. Comparison of results between different DIC measurement methods.

Download Full Size | PDF

Consistent with the previous section, ALDIC and StrainNet-r is still used for comparison. In addition, the pre-trained model StrainNet-f provided in [22] is compared, which was trained on the small deformation dataset Speckle-dataset 2.0 proposed in [22] and is specifically designed for sub-pixel measurements. The subset radius M in ALDIC was set to 4 pixels because it provides good results; the subset interval is 1 pixel. The ROI is away from the image boundaries by 10 pixels and is used for all the methods. The ROI, ground truth of the deformation field and the deformation field computed by the above methods are shown in Fig. 4, where the error maps are also given. It is apparent that all the methods perform well for the low frequency deformation while R³-DICnet and StrainNet-r perform better than StrainNet-f and ALDIC for the high frequency deformation. In order to show these results clearly, the deformation values along the horizontal centerline are shown in Fig. 5(a), and the MAE of the displacement values for each column pixels is shown in Fig. 5(b). The true displacement value on the horizontal centerline is equal to 0.5 pixels, and a green horizontal line is used to visualize this true value in Fig. 5(a). The results shown in Fig. 5 are consistent with the results in Fig. 4. It is worth mentioning that although both R³-DICnet and StrainNet-r are trained on our proposed dataset, R³-DICnet has smoother curves than StrainNet-r at low frequencies, which fully demonstrates the advantages of the R³-DICnet architecture.

Fig. 5. Comparison of the R³-DICnet, StrainNet-f, StrainNet-r, and ALDIC methods. (a) The displacement values along the horizontal centerline calculated by the above four methods; (b) The MAE of the displacement values for each column pixels calculated by the above four methods.

Download Full Size | PDF

For quantitative evaluation, the average MAE for low frequencies ($f < 1/45$) and high frequencies ($f \ge 1/45$) are calculated separately, as given in Table 4. In addition, spatial resolution (SR), noise level ${\sigma _u}$, and the metrological performance indicator $\alpha $ (the product of SR and ${\sigma _u}$) are also given. The details of these metrics are given in [22] and [30], and for all of them, a lower value indicates a better performance. R³-DICnet again shows a better performance in handling deformations of different frequencies.

Table 4. Comparison of different indicators

View Table | View all tables in this article

3.3 Accuracy assessment on discontinuous deformation

We further use Sample 15 from DIC challenge 1.0 [31] to evaluate the adaptability of R³-DICnet for discontinuous deformations. Sample 15 has eight cases that could be analyzed, among which the image “P200_K250.tif” is chosen for analysis because it has nicely separated peaks. The image has 1000 × 2000 pixels in size, and a discontinuous deformation field of within 10 pixels was applied along the $y$-direction. As in [31], we selected 67,304 data points at intervals of 5 pixels for comparison in the internal region from pixels (50,50) to (940,1925) for Sample 15.

The results of the calculations for the ALDIC with subset radii of M = 16, StrainNet-r, and R³-DICnet are shown in Fig. 6 and the error maps between the calculated results and the ground truth are also shown. It can be seen from the error maps that R³-DICnet and StrainNet-r outperforms ALDIC at the deformation discontinuity locations (rows 40, 90, and 140). To better reveal the results, the absolute error curves at discontinuous rows are shown in Fig. 7, and quantitative measures are given in Table 5. R³-DICnet has a consistent better performance.

Fig. 6. Deformation fields and error maps calculated by ALDIC, StrainNet-r and R³-DICnet.

Download Full Size | PDF

Fig. 7. Absolute error curves at discontinuous positions.

Download Full Size | PDF

Table 5. Full-field MAE and full-field maximum absolute error for different methods (in pixels)

View Table | View all tables in this article

3.4 Verification by real tensile experiments

As shown in Fig. 8(a), the experimental system contains a test specimen, a stretching machine, and an industrial camera (Basler_aca2440-75uc with 25 mm lens). The test specimen is a thin plate made of Q235 mild steel and with dimensions shown in Fig. 8(b). One end of the test specimen is fixed, and the other end is stretched by the stretching machine at a constant speed of 10 mm/s until fracture happens. During the experiment, we carefully aligned the camera and the test specimen, keeping the optical axis perpendicular to the test specimen. The camera is about 42 cm away from the test specimen, and shoots at a speed of 25 FPS. The actual image taken by the camera and the ROI are shown in Fig. 8(c). These captured image sequences are used to verify R³-DICnet.

Fig. 8. (a) Experimental setup; (b) Dimensional diagram of the specimen; (c) Example of images taken using the camera system.

Download Full Size | PDF

The R³-DICnet, StrainNet-r (already trained in the above sessions) and ALDIC (with the subset radius M = 16 pixels and the subset interval is set to 1 pixel) are used for deformation computation. Four frames were randomly selected for display, as shown in Fig. 9, where the difference between R³-DICnet and ALDIC, and between StrainNet-r and ALDIC were also shown. ALDIC yields slightly smoother result since it obtains the deformation field through interpolation operations. For quantitative comparison, we take the absolute value of the difference between the two methods and ALDIC, and calculate its mean and maximum values, as shown in Table 6. The results in Fig. 9 and Table 6 show that R³-DICnet outperforms StrainNet-r.

Fig. 9. The deformation fields in the $y$-direction calculated by the R³-DICnet, StrainNet-r and ALDIC and the corresponding difference maps on frames 23, 48,73 and 98.

Download Full Size | PDF

Table 6. The mean absolute difference and maximum absolute difference of the ${\boldsymbol y}$-direction deformation field, computed between R³-DICnet and ALDIC, and between StrainNet-r and ALDIC at frames 23, 48, 73, and 98 (in pixels)

View Table | View all tables in this article

3.5 Verification by wedge splitting tests

In order to fully demonstrate the performance of the proposed method on other actual applied scenarios, the test samples from the open-source Pydic [37] repository are chosen in this section, and the sample is obtained from real wedge splitting tests. The experimental setup is shown in Fig. 10(a), the top of the specimen is pressured by the wedge indenter, and the two ends of the specimen move towards the left and right directions respectively and fracture occurs. The reference image (“essaib000.BMP”) is shown in Fig. 10(b), with the size of 1920 × 2560 pixels. Consistent with the previous section, calculations were still performed using R³-DICnet, StrainNet-r, and ALDIC (with the subset radius M = 16 pixels and the subset interval is set to 1 pixel) on the ROI in Fig. 10(b). The results of the different methods on the deformed image (“essaib400.BMP”) are given in Fig. 11, which also shows the differences between R³-DICnet and ALDIC and between StrainNet-r and ALDIC. In addition, the absolute value of the difference between the two methods and ALDIC, and calculate its mean and maximum values, as shown in Table 7. It should be noted that since the ROI is not regular cannot be fed directly to the convolutional network, R³-DICnet and StrainNet-r first output the deformation field of the whole image, and then crop it according to the specified ROI to obtain the final deformation field. The results in Fig. 11 and Table 7 show that R³-DICnet still achieves better results, which further proves the effectiveness of the proposed method in real experiments (More examples of experimental validation as detailed in Supplement 1).

Fig. 10. (a) Experimental setup; (b) Reference image and ROI.

Download Full Size | PDF

Fig. 11. Calculation results for different methods and difference maps between R³-DICnet and ALDIC, and between StrainNet-r and ALDIC.

Download Full Size | PDF

Table 7. The mean absolute difference and maximum absolute difference between R³-DICnet and ALDIC, and between StrainNet-r and ALDIC (in pixels)

View Table | View all tables in this article

3.6 Computation time

We selected 10⁵ POIs on the ROI in Fig. 8(c) and performed calculations using StrainNet-r, ALDIC and R³-DICnet. In addition, OpenCorr [38] was chosen for comparison, which represents a local DIC with faster computation due to GPU acceleration. In both OpenCorr and ALDIC, the subset radius M was set to 16. In OpenCorr, the fast Fourier transform based cross-correlation (FFT-CC) [20] is used for initial value estimation and first-order IC-GN is used for sub-pixel matching. All calculations are performed on a laptop with an AMD Ryzen 9 5900 H processor and NVIDIA GeForce RTX 3080 Laptop GPU 8GB graphics card. The calculation times for the different methods are given in Table 8, which shows that speed of R³-DICnet is comparable to StrainNet-r, an order of magnitude faster than OpenCorr-GPU, and four orders of magnitude faster than ALDIC. In general, deep learning based DIC methods have clear advantage on execution speed for complicated computation algorithms. It is worth noting that the speeds mentioned here may differ from those described in [19] and [22], mainly due to differences in the equipment used.

Table 8. Comparison of calculation times

View Table | View all tables in this article

4. Conclusions

By integrating the two-step idea from traditional DIC methods into deep learning method, R³-DICnet achieves high accuracy and efficiency in deformation measurement. For accuracy, R³-DICnet is slightly worse than ALDIC for small deformation but outperforms it in larger deformations. R³-DICnet shows better performance for the deformation fields with different frequencies and discontinuities. For efficiency, R³-DICnet is among the fastest DIC algorithms. Furthermore, it is worth mentioning that R³-DICnet, trained on pure synthetic data, also is able to achieve good performance in real experiments, indicating its strong generalization ability and thus high convenience for practical use.

In conclusion, the proposed R³-DICnet not only shows the best performance in measuring large deformations with high efficiency and high accuracy, but also the accuracy of small deformation measurement is still good. This will be crucial for performing large-scale data processing as well as complex deformation measurement tasks in practice. In addition, designing a network with a larger measurement range, higher accuracy, efficiency, and convenience will also be our continuous pursuit.

Funding

Guangdong Basic and Applied Basic Research Foundation (2022A1515110036); National Natural Science Foundation of China (12302245); Natural Science Basic Research Program of Shaanxi Province (2023-JC-QN-0026); Shuangchuang Program of Jiangsu Province (JSSCBS20220943).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. B. Pan, “Digital image correlation for surface deformation measurement: historical developments, recent advances and future goals,” Meas. Sci. Technol. 29(8), 082001 (2018). [CrossRef]

2. F. Hild and S. Roux, “Digital Image Correlation: From Displacement Measurement to Identification of Elastic Properties - A Review,” Strain 42, 69–80 (2006). [CrossRef]

3. Y. Katz and Z. Yosibash, “New insights on the proximal femur biomechanics using Digital Image Correlation,” Journal of Biomechanics 101, 109599 (2020). [CrossRef]

4. R. Cao, W. Xiao, F. Pan, et al., “Displacement and strain mapping for osteocytes under fluid shear stress using digital holographic microscopy and digital image correlation,” Biomed. Opt. Express 12(4), 1922 (2021). [CrossRef]

5. M. A. Mousa, M. M. Yussof, U. J. Udi, et al., “Application of Digital Image Correlation in Structural Health Monitoring of Bridge Infrastructures: A Review,” Infrastructures 6(12), 176 (2021). [CrossRef]

6. R. Janeliukstis and X. Chen, “Review of digital image correlation application to large-scale composite structure testing,” Composite Structures 271, 114143 (2021). [CrossRef]

7. S. Roux and F. Hild, “Optimal procedure for the identification of constitutive parameters from experimentally measured displacement fields,” International Journal of Solids and Structures 184, 14–23 (2020). [CrossRef]

8. J. Curt, M. Capaldo, F. Hild, et al., “An algorithm for structural health monitoring by digital image correlation: Proof of concept and case study,” Optics and Lasers in Engineering 151, 106842 (2022). [CrossRef]

9. H. Jin and H. A. Bruck, “Pointwise digital image correlation using the genetic algorithm optimization method,” in 2003 SEM Annual Conference and Exposition on Experimental and Applied Mechanics (2003), pp. 1–8.

10. B. Pan, Y. Wang, and L. Tian, “Automated initial guess in digital image correlation aided by Fourier–Mellin transform,” Opt. Eng 56(1), 014103 (2017). [CrossRef]

11. J. Yang, J. Huang, Z. Jiang, et al., “SIFT-aided path-independent digital image correlation accelerated by parallel computing,” Optics and Lasers in Engineering 127, 105964 (2020). [CrossRef]

12. H. A. Bruck, S. R. McNeill, M. A. Sutton, et al., “Digital image correlation using Newton-Raphson method of partial differential correction,” Experimental Mechanics 29(3), 261–267 (1989). [CrossRef]

13. S. Baker and I. Matthews, “Lucas-Kanade 20 Years On: A Unifying Framework,” International Journal of Computer Vision 56(3), 221–255 (2004). [CrossRef]

14. H. W. S. C. H. R. E. I. E. R. and M and A. Sutton, “Systematic errors in digital image correlation due to undermatched subset shape functions,” Experimental Mechanics 42(3), 303–310 (2002). [CrossRef]

15. R. Zhu, H. Xie, Z. Hu, et al., “Performances of different subset shapes and control points in subset-based digital image correlation and their applications in boundary deformation measurement,” Appl. Opt. 54(6), 1290–1301 (2015). [CrossRef]

16. X. Li, G. Fang, J. Zhao, et al., “A practical and effective regularized polynomial smoothing (RPS) method for high-gradient strain field measurement in digital image correlation,” Optics and Lasers in Engineering 121, 215–226 (2019). [CrossRef]

17. S. Hwang and W. Wu, “Deformation measurement around a high strain-gradient region using a digital image correlation method,” J Mech Sci Technol 26(10), 3169–3175 (2012). [CrossRef]

18. G. M. Hassan, “Deformation measurement in the presence of discontinuities with digital image correlation: A review,” OPT LASER ENG 137, 106394 (2021). [CrossRef]

19. L. Zhang, T. Wang, Z. Jiang, et al., “High accuracy digital image correlation powered by GPU-based parallel computing,” Optics and Lasers in Engineering 69, 7–12 (2015). [CrossRef]

20. Z. Jiang, Q. Kemao, H. Miao, et al., “Path-independent digital image correlation with high accuracy, speed and robustness,” Optics and Lasers in Engineering 65, 93–102 (2015). [CrossRef]

21. P. Fischer, A. Dosovitskiy, E. Ilg, et al., “FlowNet: Learning Optical Flow with Convolutional Networks,” in 2015 IEEE International Conference on Computer Vision (ICCV) (2016).

22. S. Boukhtache, K. Abdelouahab, F. Berry, et al., “When Deep Learning Meets Digital Image Correlation,” Optics and Lasers in Engineering 136, 106308 (2021). [CrossRef]

23. S. Boukhtache, K. Abdelouahab, A. Bahou, et al., “A lightweight convolutional neural network as an alternative to DIC to measure in-plane displacement fields,” Optics and Lasers in Engineering 161, 107367 (2023). [CrossRef]

24. Y. Wang and J. Zhao, “DIC-Net: Upgrade the performance of traditional DIC with Hermite dataset and convolution neural network,” Optics and Lasers in Engineering 160, 107278 (2023). [CrossRef]

25. K. He, X. Zhang, S. Ren, et al., “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition (2016), pp. 770–778.

26. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,”

27. R. Yang, Y. Li, D. Zeng, et al., “Deep DIC: Deep learning-based digital image correlation for end-to-end displacement and strain measurement,” Journal of Materials Processing Technology 302, 117474 (2022). [CrossRef]

28. S. Lan, Y. Su, Z. Gao, et al., “Deep learning for complex displacement field measurement,” Sci. China Technol. Sci. 65(12), 3039–3056 (2022). [CrossRef]

29. J. Yang and K. Bhattacharya, “Augmented Lagrangian Digital Image Correlation,” Exp. Mech. 59(2), 187–205 (2019). [CrossRef]

30. P. L. Reu, B. Blaysat, E. Andó, et al., “DIC Challenge 2.0: Developing Images and Guidelines for Evaluating Accuracy and Resolution of 2D Analyses,” Exp. Mech. 62(4), 639–654 (2022). [CrossRef]

31. P. L. Reu, E. Toussaint, E. Jones, et al., “DIC challenge: developing images and guidelines for evaluating accuracy and resolution of 2D analyses,” Exp. Mech. 58(7), 1067–1099 (2018). [CrossRef]

32. Z. Teed and J. Deng, “Raft: Recurrent all-pairs field transforms for optical flow,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16 (Springer, 2020), pp. 402–419.

33. L. Wang, S. Bi, H. Li, et al., “Fast initial value estimation in digital image correlation for large rotation measurement,” Optics and Lasers in Engineering 127, 105838 (2020). [CrossRef]

34. T. Hui, X. Tang, and C. C. Loy, “Liteflownet: A lightweight convolutional neural network for optical flow estimation,” in Proceedings of the IEEE conference on computer vision and pattern recognition (2018), pp. 8981–8989.

35. F. Sur, B. Blaysat, and M. Grédiac, “Rendering Deformed Speckle Images with a Boolean Model,” J Math Imaging Vis 60(5), 634–650 (2018). [CrossRef]

36. I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv, arXiv:1711.05101 (2017). [CrossRef]

37. André. Damien, “Pydic, a Python suite for local digital image correlation,” https://gitlab.com/damien.andre/pydic/-/tree/master/examples/wedge-splitting-test/img.

38. Z. Jiang, “OpenCorr: An open source library for research and development of digital image correlation,” Optics and Lasers in Engineering 165, 107566 (2023). [CrossRef]

Network name	Dataset name	Deformation range (pixel)
StrainNet [22]	Speckle dataset 1.0 and 2.0	-1∼1
DIC-net [24]	Hermite dataset	-2∼2
DisplacementNet [27]	Null	-7∼7

Type	Method	Different ranges
Type	Method	1	5	10	15	20	25	30
Linear deformation	ALDIC	0.0058	0.0178	0.0684	0.2207	0.5947	0.6758	1.1817
	StrainNet-r	0.0202	0.0265	0.0395	0.0619	0.0872	0.1547	0.2174
	R³-DICnet	0.0092	0.0154	0.0156	0.0196	0.0203	0.0259	0.0423
Periodic deformation	ALDIC	0.0088	0.0184	0.0563	0.1279	0.4246	0.7025	0.9640
	StrainNet-r	0.0218	0.0303	0.0401	0.0584	0.0792	0.1236	0.1854
	R³-DICnet	0.0096	0.0158	0.0199	0.0201	0.0262	0.0306	0.0435

	Sample 1	Sample 2	Sample 3	Sample 4	Sample 5	Sample 6	Sample 7
ALDIC	0.0197	0.0514	0.0884	0.3174	0.9838	0.8488	0.8370
StrainNet-r	0.0988	0.1084	0.1571	0.2785	0.2705	0.6177	1.3136
R³-DICnet	0.0249	0.0686	0.0694	0.1090	0.0924	0.1198	0.2359

	ALDIC	StrainNet-f	StrainNet-r	R³-DICnet
Low frequency ( $f < 1 / 45$ )	0.0240	0.0244	0.0294	0.0231
High frequency ( $f \geq 1 / 45$ )	0.1170	0.1056	0.0948	0.0922
SR	36.32	29.04	19.52	15.95
$σ^{u}$	0.0209	0.0177	0.0208	0.0178
$α$	0.7591	0.5140	0.4060	0.2839

	ALDIC 2M + 1 = 33	StrainNet-r	R³-DICnet
MAE	0.0534	0.0523	0.0417
Maximum absolute error	0.4774	0.6056	0.2888

R³-DICnet: an end-to-end recursive residual refinement DIC network for larger deformation measurement

Abstract

1. Introduction

2. Methodology

2.1 Network architectures design

2.1.1 F-module

2.1.2 I-module

2.1.3 R-module

2.2 Dataset generation

2.2.1 Existing datasets

2.2.2 Dataset development

2.3 Training details

3. Experimental verification

3.1 Accuracy assessment on different deformation ranges

3.2 Accuracy assessment on different deformation frequencies

3.3 Accuracy assessment on discontinuous deformation

3.4 Verification by real tensile experiments

3.5 Verification by wedge splitting tests

3.6 Computation time

4. Conclusions

Funding

Disclosures

Data availability

Supplemental document

References

Supplementary Material (1)

Data availability

Cited By

Figures (11)

Tables (8)

Equations (4)

Optics Express

		Frame 23	Frame 48	Frame 73	Frame 98
Maximum deformation		6	12	18	23
Mean absolute difference	StrainNet-r	0.0480	0.0836	0.1656	0.2852
Mean absolute difference	R³-DICnet	0.0370	0.0485	0.0581	0.0802
Maximum absolute difference	StrainNet-r	0.5238	1.2955	1.9057	2.6809
Maximum absolute difference	R³-DICnet	0.4399	0.5856	0.5223	0.6772

	StrainNet-r		R³-DICnet
	U	V	U	V
Mean absolute difference	0.0369	0.0363	0.0315	0.0257
Maximum absolute difference	0.3021	0.3212	0.4778	0.2184

	R³-DICnet	StrainNet-r	ALDIC	OpenCorr-GPU
Seconds	0.0255	0.0248	332.23	0.6329
POI/s	3.92× $10^{6}$	4.03× $10^{6}$	3.01× $10^{2}$	1.58× $10^{5}$