Stereo-IA: stereo visual intensity alignment and beyond under radiation variation

Open Access

Abstract

Stereo vision is an active research topic, but radiometric variation introduces large intensity differences between the images of a stereo pair, which severely degrades stereo-vision-based matching, pose estimation, image segmentation and other downstream tasks. Previous methods are either not robust to radiometric changes or computationally expensive. Accordingly, this paper proposes a new stereo intensity alignment and image enhancement method built on recent SuperPoint features. It combines a triangle-based bearings-only metric, scale-ANCC and a belief propagation model, and is highly robust to radiometric changes. Quantitative and qualitative comparison experiments on the Middlebury datasets verify the effectiveness of the proposed method, which achieves better image restoration and matching under radiometric variation.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Compared with depth cameras and monocular cameras, binocular (stereo) cameras offer clear advantages and play a key role in outdoor ego-motion estimation, driving assistance, 3D reconstruction and underwater exploration [1–3]. However, in practical applications a binocular camera is often affected by different illuminations and exposures, as shown in Fig. 1.

Fig. 1. Stereo vision under radiation variation.

At present, there are two main approaches to the above problems: deep learning and geometric reasoning. According to how labels are obtained, deep learning methods can be divided into supervised, self-supervised and semi-supervised networks. Although these networks can improve final performance through continuous learning and error correction, they require large amounts of training data, and their parameter counts can reach the millions. In addition, most deep learning networks generalize only moderately well and are suited to specific scenes or datasets. Recently, deep learning combined with geometric reasoning has gradually become mainstream: on the one hand it improves the accuracy of geometric-reasoning methods, and on the other it improves the generalization ability of deep learning methods, as in [4–7]. For low-illumination or night images, deep learning methods [8–10] can achieve good results. However, these algorithms target the restoration of low-illumination images, i.e., underexposure. In the case of overexposure, image features are lost, and without constraints from another image it is difficult for a deep learning method to restore a single image. Moreover, an intensity difference still exists between an image restored by deep learning and the other image of the stereo pair, so such restoration does not greatly benefit stereo vision.

Geometric reasoning is the basis of robust deep learning and plays an important guiding role in stereo vision. Typical geometric metrics against photometric changes include normalized cross-correlation (NCC) [11–14] and Census [15]. To reduce the intensity difference between the images of a stereo pair, the log-chromaticity transform (LCT) has become a popular measure in some recent works [11–13]. However, significant intensity differences remain after such a transformation. The work by Heo et al. [16] presented a joint depth map and color consistency estimation with strong robustness to radiometric changes. However, this method changes the original texture information of the stereo images (similar to the LCT transform) and produces no aligned images, so it cannot be used for further image recognition or pose estimation. ANCC [11] and RANCC [12] are essentially combinations of NCC and the bilateral filter (BF) [17]. Because the guided filter (GF) [18] is superior to BF in filtering accuracy, GF-based fast cost-volume filtering (CVF) [19], stereo matching using dual fusion (DF) [20], local stereo matching with an adaptive guided filter [21], iterative guided filter (IGF) stereo matching [22] and the intensity guided cost metric (IGCM) [14] were proposed. However, it is difficult to obtain a guidance image that is little affected by radiometric changes, which limits the applicability of these algorithms. The work by Xu et al. [23] summarized the disadvantages of traditional matching algorithms under radiation changes and proposed an unsupervised inverse intensity compensation method, but this compensation is prone to holes or cracks in low-texture regions and at depth discontinuities. Recent studies [24,25] have shown that learned features outperform traditional features (e.g., SIFT [26], ORB [27]) under low texture, exposure and illumination changes. However, these learned features are still sparse; we therefore develop a novel robust stereo visual intensity alignment framework based on SuperPoint [24] (denoted Stereo-IA, as shown in Fig. 2). Stereo-IA realizes intensity alignment of stereo images affected by radiation, which can further improve the accuracy of downstream tasks such as image recognition and pose estimation.

Fig. 2. Workflow of Stereo-IA. (a) Input data: raw stereo images acquired by imaging setups under different illuminations or exposures. (b) SuperPoint for geometric correspondence (a self-supervised network for training interest point detectors and descriptors). (c) Delaunay-mesh based belief propagation, in which strong support points extracted by the SuperPoint network are exploited to achieve dense correspondence. (d, f) Stereo visual intensity alignment, in which the corresponding compensation parameters are estimated. (e) Matching cost calculation, in which a robust metric combining RGB, gradient, inverse and scale-ANCC is constructed. (g) Output: the aligned or restored stereo images.

The rest of this paper is organized as follows. Section 2 gives a detailed description of the proposed method. Qualitative and quantitative experiments are presented in Section 3, together with the corresponding discussion and analysis. Section 4 summarizes the paper and outlines future work.

2. Formulation

2.1 Overview

Figure 2 illustrates the workflow of the proposed intensity alignment approach. The basic idea is to first segment the input stereo pair into 2D triangles using sparse support points; dense correspondence and intensity alignment are then carried out within the triangles. The operations in each triangle region are independent. We exploit the three vertices to propagate confidence within the triangle region, which improves both the processing speed over the whole image and the matching accuracy. However, due to texture-less regions (foreground and background), the disparity within a triangle may not lie on a single plane, so a plane-interpolation strategy for the initial propagation value can incur large errors, as shown in Fig. 3: the disparity at the midpoint of an edge can be much smaller than at the vertices, and the same holds for the triangle centroid. Recently popular algorithms such as ELAS [28], MeshStereo [29] and LS-ELAS [30] all share this shortcoming.
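To make the meshing step concrete, the following minimal Python sketch builds a Delaunay triangulation over sparse support points and locates the triangle containing a target pixel. It uses scipy.spatial.Delaunay, and the random points stand in for SuperPoint detections; the function name and the stand-in data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.spatial import Delaunay

def build_support_mesh(points_uv):
    """Triangulate sparse support points (u, v) into a Delaunay mesh.

    points_uv: (N, 2) array of pixel coordinates of support points
    returns:   Delaunay object; .simplices gives the (M, 3) vertex indices of the triangles
    """
    return Delaunay(points_uv)

# Stand-in for SuperPoint detections: random pixel locations in a 640x480 image.
rng = np.random.default_rng(0)
pts = rng.uniform([0, 0], [640, 480], size=(200, 2))

mesh = build_support_mesh(pts)
print("triangles:", mesh.simplices.shape[0])

# Locate the triangle that contains a target pixel (-1 means outside the convex hull).
target = np.array([[320.0, 240.0]])
print("triangle index of target pixel:", mesh.find_simplex(target)[0])
```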

Fig. 3. Delaunay-mesh and triangles. In the left image, the red stars show support points extracted by SuperPoint, and the blue lines make up the triangular mesh.

To tackle this difficulty, inspired by LPSM [31], we propose a triangle-based bearings-only metric scheme that takes occlusion, texture-less regions and photo-consistency into consideration. The problem is formulated in the following steps:

  • 1. The support points are used to establish a triangular mesh, and the corresponding disparity is roughly calculated in the mesh.
  • 2. The developed scale-ANCC model and the confidence propagation model are exploited to restore the dense correspondence.
  • 3. Dense correspondence results and intensity alignment model are used to estimate the intensity compensation parameters.
  • 4. Graph filtering is used to further optimize the above results.

2.2 SuperPoint and Delaunay-mesh based robust cost metric

As shown in Fig. 4, we establish the angle model from the spatial relationship between the support points (i.e., the interest points extracted by SuperPoint) and the target pixel. Let $S = \{{{\mathbf s}_1,{\mathbf s}_2,{\mathbf s}_3} \}$ be a set of support points in the left image representing the three vertices of the triangle, where ${\mathbf s}_n = ({u_n,v_n,d_n} )$ denotes the n-th support point, a 3D vector composed of the pixel coordinates and the disparity. The target pixel is formulated as the 3D vector ${\mathbf t} = ({{\mathbf p},d} )$. The sum of the altitude angles from the three vertices to the target pixel is then:

$${{\rm H}^L} = \sum\nolimits_{n = 1}^3 {\Theta ({\mathbf s}_n - {\mathbf t})} ,$$
where $\Theta $ is the function computing the altitude angle (i.e., the angle $\psi$ in Fig. 4); it can be expressed as:
$$\Theta (\varDelta u,\varDelta v,\varDelta d) = {\tan ^{ - 1}}(\frac{{|{\varDelta d} |}}{{\sqrt {\varDelta {u^2} + \varDelta {v^2}} }}).$$
Then the angle metric can be expressed as:
$$\begin{aligned} {{\mathbf C}^{Angle}}({\mathbf p},d) &= |{{{\rm H}^L} - {{\rm H}^R}} |\\ &= \sum\nolimits_{n = 1}^3 {|{\Theta ({\mathbf s}_n - {\mathbf t}) - \Theta ({{\mathbf s}^R_n} - {{\mathbf t}^R})} |} . \end{aligned}$$
where ${{\rm H}^R}$, ${{\mathbf s}^R}$ and ${{\mathbf t}^R}$ denote the corresponding quantities in the right image, that is ${{\mathbf s}^R_n} = ({{u^R_n},{v^R_n},{d^R_n}} )$, ${{\mathbf t}^R} = ({{{\mathbf p}^R},d} )$.

Fig. 4. Triangle-based bearings-only metric.

Considering anisotropic diffusion, pixels at different positions have different effects on the target pixel. Inspired by [17,32], we add adaptive weights to (3), obtaining:

$${{\mathbf C}^{Angle}}({\mathbf p},d) = \sum\nolimits_{n = 1}^3 {\frac{{|{\Theta ({\mathbf s}_n - {\mathbf t}) - \Theta ({{\mathbf s}^R_n} - {{\mathbf t}^R})} |}}{{{e^{|{dis({\mathbf s}_n,\;{\mathbf t}) - dis({{\mathbf s}^R_n},\;{{\mathbf t}^R})} |/\delta }}}}} .$$
where $dis({\cdot} )$ represents the Euclidean distance between two 3D points, and $\delta$ is a distance parameter controlling the contribution of the corresponding altitude angle: if the distance between the target pixel and a support point changes greatly between the two images of the stereo pair, the contribution of the corresponding angle is reduced accordingly. In addition, because the angle difference is small relative to the intensity or gradient difference, a logarithmic operation is used to enlarge the angle metric, and a lower bound ${\lambda ^L}$ is imposed to keep the logarithm defined, so (4) becomes:
$${{\mathbf C}^{LogA}}({\mathbf p},d) = \lg (\max ({\lambda ^L},{{\mathbf C}^{Angle}}({\mathbf p},d))).$$
This new metric avoids matching singularities in texture-less regions. Within the smaller triangle regions, the proposed method attains higher confidence and smaller estimation error than LPSM [31] applied over the whole image. Consequently, the sparse support-point disparities and the coarse disparities estimated by the bearings-only metric within each triangle jointly generate the initial dense correspondence. Next, we adopt a belief-propagation strategy to further eliminate or reduce the remaining errors.
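As a concrete reference, the sketch below evaluates the bearings-only cost of Eqs. (2), (4) and (5) for one target pixel and one candidate disparity; the parameter values δ and λ^L and the toy support points are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def altitude_angle(offset):
    """Theta of Eq. (2): offset = (du, dv, dd); angle between the disparity offset and the image-plane offset."""
    du, dv, dd = offset
    return np.arctan2(abs(dd), np.hypot(du, dv))

def log_angle_cost(verts_L, verts_R, t_L, t_R, delta=2.0, lam=1e-3):
    """Weighted bearings-only cost of Eq. (4) with the log amplification of Eq. (5).

    verts_L / verts_R: (3, 3) arrays of triangle vertices (u, v, d) in the left / right image
    t_L / t_R:         (3,) target pixel (u, v, d) in each image
    """
    cost = 0.0
    for sL, sR in zip(verts_L, verts_R):
        a_L = altitude_angle(sL - t_L)
        a_R = altitude_angle(sR - t_R)
        # Eq. (4): a large change in support-point distance lowers the angle's contribution.
        w = np.exp(abs(np.linalg.norm(sL - t_L) - np.linalg.norm(sR - t_R)) / delta)
        cost += abs(a_L - a_R) / w
    return np.log10(max(lam, cost))            # Eq. (5), floored to keep the log defined

# Toy example: right-image vertices shifted horizontally by their own disparities.
vL = np.array([[10., 10., 5.], [100., 12., 6.], [50., 90., 4.]])
vR = vL.copy(); vR[:, 0] -= vL[:, 2]
print(log_angle_cost(vL, vR, np.array([40., 40., 5.]), np.array([35., 40., 5.])))
```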

2.3 Stereo visual intensity alignment

To further reduce the influence of radiation changes on stereo matching, we propose a visual alignment scheme that minimizes the intensity difference between the images of a stereo pair. We build an optimization model that realizes an initial global intensity alignment of the stereo pair from sparse support points that can be robustly matched, which greatly reduces matching ambiguities.

Here we again use the interest points extracted by SuperPoint as support points. Let ${S^{IA}} = \{{{\mathbf s}_1, \cdots ,{\mathbf s}_n} \}$ be a set of support points, where ${\mathbf s}_n = {(u_n,v_n,o_n)^T}$ is composed of the pixel coordinates $(u_n,v_n)$ and the corresponding intensity $o_n$. Assuming that the image affected by illumination and the ideal image satisfy a locally linear relationship, the following model can be established:

$$o{_n^L} \propto \;\;\{{\kappa \cdot o{_n^R} + \eta \cdot \Gamma_n} \},$$
where L and R indicate quantities associated with the left and right images, respectively, $\kappa$ and $\eta$ are the compensation parameters to be estimated, and $\Gamma_n = \sqrt {255 - o{_n^R}}$ denotes the inverse operation on the input image. The intensity alignment is formulated as an optimization problem whose goal is to minimize the following objective function $E$:
$$E(\kappa ,\eta ) = \frac{2}{{N_s}}\sum\limits_{i = 1}^{N_s} {\omega_i} \cdot {(o{_i^L} - \kappa \cdot o{_i^R} - \eta \cdot \Gamma_i)^2} + \frac{1}{{N_s}}\sum\limits_{j = 1}^{N_s} {\lambda_j} {(\hat{o}{_j^L} - \kappa \hat{o}{_j^R})^2},$$
which reduces to solving an $N_s \times 2$ linear system ${\boldsymbol A\zeta } = {\boldsymbol \varOmega }$, where
$$\begin{array}{l} {\boldsymbol A} = [{a_{ij}} ]\;\;\;with\;\;a_{ij} = \left\{ \begin{array}{l} 2\omega_i \cdot o{_i^R} + \lambda_i \cdot \hat{o}{_i^R},\;\;\;\;\;if\;\;j = 1\\ 2\omega_i \cdot \sqrt {255 - o{_i^R}} ,\;\;\;\;\;if\;\;j = 2 \end{array} \right.\\ {\boldsymbol \zeta } = {[{\kappa \;\;\;\eta } ]^T}\\ {\boldsymbol \varOmega } = {[{2\omega_1 \cdot o{_1^L} + \lambda_1 \cdot \hat{o}{_1^L}\;\;\;2\omega_2 \cdot o{_2^L} + \lambda_2 \cdot \hat{o}{_2^L}\;\;\; \cdots \;\;\;\;2\omega_{N_s} \cdot o{_{N_s}^L} + \lambda_{N_s} \cdot \hat{o}{_{N_s}^L}} ]^T}. \end{array}$$
In Eq. (7), the first term is a weighted average of the squared intensity differences of the corresponding pixels under the linear model (i.e., pixels that can be robustly matched). The second term is a weighted average of the squared differences of the intensity means of corresponding pixels; it can be interpreted as a regularization term and is minimized when all support points have the same corresponding intensity, with $\lambda$ the regularization coefficient. $N_s$ represents the number of support points, $\hat{o}$ denotes the corresponding mean in a 5×5 window, and $\omega_i$ is an adaptive weight that reduces the influence of mismatched support points in Eq. (7); it is modeled as a product of Laplacian kernels:
$$\omega_i = \exp ( - \frac{{\delta_i}}{{\mathrm{\gamma }_g}} - \frac{{\xi_i}}{{\mathrm{\gamma }_v}}),$$
where $\delta_i$ is the similarity distance, defined as the Euclidean distance between the gradient values of the i-th matched support points, and $\xi_i$ is the vertical-coordinate distance of the corresponding pixels. The parameters $\gamma_g$ and $\gamma_v$ are thresholds that control the contributions of the gradient and spatial distances, respectively. Note that this weighting differs from BF [17]: because the color difference of corresponding pixels varies greatly under radiation changes, we choose a relative color-difference operator, namely the gradient difference. In addition, we use the vertical-coordinate difference of the corresponding pixels to further constrain the weight $\omega_i$. In particular, $\omega_i$ should be large when the corresponding gradients are similar or the vertical-coordinate difference is small, and small otherwise. We set $\gamma_g = 5.5$ and $\gamma_v = 3$ based on experiments. The regularization coefficient is $\lambda_j = \exp ( - |{\hat{o}{_j^L} - \hat{o}{_j^R}} |)$, i.e., when the mean intensity in the 5×5 window around the target pixel in the left image is close to that in the right image, the role of $\kappa$ in the intensity alignment model is enhanced; otherwise it is decreased. Eq. (8) is a linear least-squares problem and can be solved with standard methods.
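A minimal numpy sketch of this global fit is given below: it minimizes Eq. (7) with the weights of Eq. (9) as a weighted linear least-squares problem. The 5×5 window means, gradient distances and vertical distances are assumed to be precomputed per matched support point, and the lstsq solve simply stands in for whatever solver the authors use.

```python
import numpy as np

def fit_compensation(oL, oR, oL_mean, oR_mean, grad_dist, v_dist,
                     gamma_g=5.5, gamma_v=3.0):
    """Estimate (kappa, eta) of Eq. (6) by minimizing Eq. (7) with the weights of Eq. (9).

    oL, oR:            intensities of matched support points, arrays of shape (N,)
    oL_mean, oR_mean:  5x5 window means around those points, shape (N,)
    grad_dist, v_dist: gradient distance and vertical-coordinate distance per match, shape (N,)
    """
    w = np.exp(-grad_dist / gamma_g - v_dist / gamma_v)      # Eq. (9)
    lam = np.exp(-np.abs(oL_mean - oR_mean))                 # regularization weight lambda_j
    gamma = np.sqrt(255.0 - oR)                              # inverse term Gamma_n of Eq. (6)

    # Stack both terms of Eq. (7) as rows of one weighted system A @ [kappa, eta] = b.
    A = np.vstack([
        np.column_stack([np.sqrt(2 * w) * oR,    np.sqrt(2 * w) * gamma]),
        np.column_stack([np.sqrt(lam) * oR_mean, np.zeros_like(lam)]),
    ])
    b = np.concatenate([np.sqrt(2 * w) * oL, np.sqrt(lam) * oL_mean])
    (kappa, eta), *_ = np.linalg.lstsq(A, b, rcond=None)
    return kappa, eta

# Toy usage: synthetic points obeying oL = 0.8*oR + 2*sqrt(255-oR) are recovered approximately.
rng = np.random.default_rng(1)
oR = rng.uniform(30, 220, 100)
oL = 0.8 * oR + 2.0 * np.sqrt(255.0 - oR)
print(fit_compensation(oL, oR, oL, oR, np.zeros(100), np.zeros(100)))
```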

Consequently, a raw intensity alignment of the stereo images is obtained. Because radiation changes leave the SuperPoint support points relatively sparse, only a global estimate of the model parameters is made in this subsection; dense support points will later be used to refine the alignment.

2.4 Fast belief propagation and dense correspondence refinement

On the one hand, the bearings-only metric still has estimation errors; on the other hand, the intensity compensation coefficients estimated from sparse support points are globally uniform (different regions are affected by different illumination, so the compensation should differ locally, but local compensation parameters are difficult to estimate from sparse points). Therefore, in this section the exact dense correspondence is recovered by confidence propagation. The method takes the initial disparity estimated by the bearings-only metric as a prior and 3σ as the confidence interval, so the scale-ANCC value of the target pixel can be obtained quickly.

The fast belief propagation model is shown in Fig. 5. Given a pixel ${\mathbf p}$ in the left image, its corresponding pixel in the target image is ${\mathbf q}$, with initial disparity $l_{ini}$ estimated by the bearings-only metric above. The refined dense correspondence is then obtained from the following formulas:

$$p(\tilde{l},\;{\mathbf p}) \propto p(\tilde{l}|{l_{ini}} ,\;{\mathbf p})p(\tilde{l},{\mathbf q}|{\mathbf p} ),$$
$$p(\tilde{l}|{l_{ini}},{\mathbf p}) = \left\{ \begin{array}{l} \exp ( - \frac{{{{(\tilde{l} - l_{ini})}^2}}}{{2{\sigma^2}}}) + \delta ,\;if\;|{\tilde{l} - l_{ini}} |< 3\sigma \\ 0,\;otherwise \end{array} \right.,$$
where $\sigma$ is a preset threshold (the confidence interval) and $\delta$ is a preset constant that adjusts the prior probability. Only similar disparities (i.e., within 3σ) propagate probability; this optimized search space replaces the initially given large-scale candidate space. The likelihood probability can be expressed as:
$$p(\tilde{l},{\mathbf q}|{\mathbf p} ) = \left\{ \begin{array}{l} \exp ( - {{C({\mathbf p},\tilde{l})} / \zeta }),\;if\;hor({\mathbf p} - {\mathbf q}) \le \Delta \\ 0,\;otherwise \end{array} \right.,$$
where $C({\cdot} )$ is the proposed scale-ANCC function shown in Fig. 6, $\zeta$ is a preset constant, $hor({\cdot} )$ takes the horizontal coordinate, and $\Delta $ is the acceptable vertical error. The posterior probability in (10) can then be expressed as:
$$\begin{aligned} p(\tilde{l},\;{\mathbf p}) &= p(\tilde{l}|{l_{ini}} ,\;{\mathbf p})p(\tilde{l},{\mathbf q}|{\mathbf p} ),\\ & = (\exp ( - \frac{{{{(\tilde{l} - l_{ini})}^2}}}{{2{\sigma ^2}}}) + \delta ) \cdot \exp ( - {{C({\mathbf p},\tilde{l})} / \zeta }), \end{aligned}$$
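To illustrate how the propagation narrows the search, the sketch below evaluates the posterior of Eq. (13) only over the 3σ window around the initial disparity and returns the maximizer. The matching cost is passed in as a callable so any metric (e.g., the scale-ANCC defined next) can be plugged in; σ, δ, ζ and the toy cost are illustrative assumptions.

```python
import numpy as np

def refine_disparity(l_ini, cost_fn, sigma=2.0, delta=0.05, zeta=1.0):
    """Disparity maximizing the posterior of Eq. (13) within the 3-sigma window around l_ini.

    l_ini:   initial disparity from the bearings-only metric
    cost_fn: callable cost_fn(d) returning the matching cost C(p, d) for candidate disparity d
    """
    lo = max(int(np.floor(l_ini - 3 * sigma)), 0)
    hi = int(np.ceil(l_ini + 3 * sigma))
    best_d, best_post = l_ini, -np.inf
    for d in range(lo, hi + 1):
        prior = np.exp(-((d - l_ini) ** 2) / (2 * sigma ** 2)) + delta   # Eq. (11)
        likelihood = np.exp(-cost_fn(d) / zeta)                          # Eq. (12)
        post = prior * likelihood                                        # Eq. (13)
        if post > best_post:
            best_post, best_d = post, d
    return best_d

# Toy usage: a quadratic cost around d = 17 with the initial guess at 15.
print(refine_disparity(15, lambda d: 0.1 * (d - 17) ** 2))
```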

Fig. 5. Fast belief propagation. It is a probability propagation model from the initial disparity map estimated by the bearings-only metric in triangles to the final accurate disparity map. In this way, the search range is greatly reduced from the initial maximum value to 3σ.

Fig. 6. Illustration of the scale-ANCC model. It mainly consists of 36 pixels at three resolutions, connected in a certain order. The green point marks the target pixel, whose pixel coordinate is (u,v) at full resolution.

The proposed scale-ANCC combines features at three scales: full resolution, half resolution and quarter resolution. We use 12 pixels around the target pixel at each scale to describe the similarity, i.e., 36 pixels in total across the three scales. In terms of complexity this is equivalent to a 6×6 window, but its robustness is far better. Suppose $\mathbf p$ is a pixel in the left (reference) image and $\mathbf q$ is the corresponding pixel in the right (target) image at candidate disparity level $d$, and let ${\mathbf I}$ be the set of pixels around the target pixel at the different scales (i.e., the 36 pixels in Fig. 6). The matching cost is then obtained as follows:

$${C_{scale - ancc}}({\mathbf p},d) = \frac{{\sum\limits_{t = 1}^{36} {{}^t{w^L}({\mathbf p})({}^t{{\mathbf I}^L}({\mathbf p}) - {\varpi ^L}({\mathbf p})) \cdot {}^t{w^R}({\mathbf q})({}^t{{\mathbf I}^R}({\mathbf q}) - {\varpi ^R}({\mathbf q}))} }}{{\sqrt {\sum\limits_{t = 1}^{36} {{{({}^t{w^L}({\mathbf p}){}^t{{\mathbf I}^L}({\mathbf p}) - {\varpi ^L}({\mathbf p}))}^2}\sum\limits_{t = 1}^{36} {{{({}^t{w^R}({\mathbf q}){}^t{{\mathbf I}^R}({\mathbf q}) - {\varpi ^R}({\mathbf q}))}^2}} } } }},$$
$$\mathop {{}^t{w^i}({\mathbf e})}\limits_{i \in \{ L,R\} } = \exp ( - {|{{}^t{{\mathbf I}^i}({\mathbf e}) - {{\mathbf I}^i}({\mathbf e})} |^2}/{\delta _c} - {|{{}^t{{\mathbf c}^{\mathbf e}} - {\mathbf e}} |^2}/{\delta _d}),$$
where $L$ and $R$ indicate quantities associated with the left and right images, respectively, ${\varpi (\cdot)}$ is the mean value of the corresponding set, ${\delta _c}$ is the color-difference parameter, and ${\delta _d}$ is the distance parameter between the pixel ${\mathbf e}$ and a surrounding pixel ${\mathbf c}$. The coefficients $\tilde{\vartheta } = (\kappa ,\eta )$ estimated in the global intensity alignment are exploited to refine scale-ANCC; the numerator of Eq. (14) can then be expressed as:
$$\begin{array}{l} \Pi = \sum\limits_{t = 1}^{36} {{\rm X} \cdot {}^t{w^R}({\mathbf q})({}^t{{\mathbf I}^R}({\mathbf q}) - {\varpi ^R}({\mathbf q}))} \\ \;\;\;\; = \sum\limits_{t = 1}^{36} {{\rm X} \cdot {}^t{w^R}({\mathbf q})(\mathrm{\tilde{\Lambda }}({}^t{{\mathbf I}^R}({\mathbf q})) - {{\tilde{\varpi }}^R}({\mathbf q}))} , \end{array}$$
$$\tilde{\Lambda }(t) = \tilde{\vartheta }\left( \begin{array}{l} \;\;\;\;\;t\\ \sqrt {255 - t} \end{array} \right),$$
where $\mathrm X$ denotes the left half of the numerator in Eq. (14), $\tilde{\Lambda }$ is the intensity alignment function, and $\tilde{\varpi }(\cdot)$ represents the corresponding mean value. The denominator of Eq. (14) is compensated in the same way. The accurate dense correspondence is then obtained by maximizing Eq. (13).
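The following sketch shows the adaptively weighted correlation in the spirit of Eqs. (14)–(17) for one pixel pair: the right-image samples are first compensated with the global (κ, η), then weighted and correlated. The multi-scale sampling of Fig. 6 is abstracted into arbitrary sample arrays, and the parameter values are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def scale_ancc(samples_L, samples_R, center_L, center_R, sq_offsets,
               kappa=1.0, eta=0.0, delta_c=10.0, delta_d=25.0):
    """Adaptively weighted NCC over multi-scale samples, following Eqs. (14)-(17).

    samples_L/R: intensities of the sampled neighbours (e.g., the 36 values of Fig. 6), shape (T,)
    center_L/R:  intensity of the target pixel in each image
    sq_offsets:  squared pixel distances of each sample from the target pixel, shape (T,)
    kappa, eta:  global compensation parameters from the intensity-alignment step
    """
    # Compensate the right-image samples with the linear model of Eqs. (16)-(17).
    samples_R = kappa * samples_R + eta * np.sqrt(255.0 - samples_R)
    center_R = kappa * center_R + eta * np.sqrt(255.0 - center_R)

    # Adaptive weights of Eq. (15): samples with similar intensity and small offset count more.
    wL = np.exp(-(samples_L - center_L) ** 2 / delta_c - sq_offsets / delta_d)
    wR = np.exp(-(samples_R - center_R) ** 2 / delta_c - sq_offsets / delta_d)

    aL = wL * (samples_L - samples_L.mean())
    aR = wR * (samples_R - samples_R.mean())
    return float(aL @ aR / (np.linalg.norm(aL) * np.linalg.norm(aR) + 1e-12))
```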

2.5 Graph filtering and beyond

The coefficients estimated by the global intensity alignment above are the same for the whole image. Because the influence of radiation change differs across regions (i.e., it is anisotropic), using the same coefficients for the whole image is not accurate. Therefore, we use the dense correspondence results (after hole filling) to further estimate locally refined intensity alignment coefficients. The disparity results estimated above contain outliers and holes in occluded regions and at depth discontinuities. We select the pixels satisfying the left-right consistency check (LRC) as vertices to establish a Delaunay-mesh, and the average depth at the three vertex positions is used to smooth the region inside each triangular cell. In this way, most outliers and holes are effectively removed from the initial results.

Then $\tilde{\vartheta } = (\kappa ,\eta )$ can be updated in a small window $W_s$ by Eq. (8). However, if the step length is smaller than the window size, the solution of Eq. (7) computed in different windows is not unique. We therefore take the average of all possible solutions as the final result:

$$\tilde{\vartheta }^{\prime} = (1/\psi ) \cdot \sum\nolimits_{t \in ws} {{{\tilde{\vartheta }}^t}} ,$$
where $\psi$ denotes the size of $W_s$. The estimated coefficients are exploited to align the intensity of each pixel, yielding the initial visually enhanced images. Next, we continue to optimize the output disparity map and enhanced image using the Delaunay-mesh to ensure the smoothness of the enhanced images. However, some support points (i.e., triangle vertices) may lie partially in depth-discontinuous regions, and such vertices introduce singularities into the triangle-based graph filtering, so we follow the work by Fickel et al. [32] to adjust the triangular mesh.
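As a sketch of this local refinement (under the assumption that Eq. (8) has already been solved independently in each sliding window), the per-window estimates covering a pixel are simply averaged, as in Eq. (18); the data structures and names are illustrative.

```python
import numpy as np

def local_compensation(coeff_per_window, image_shape):
    """Average overlapping per-window estimates (Eq. 18) into a per-pixel (kappa, eta) map.

    coeff_per_window: list of (window_mask, (kappa, eta)) pairs, one per sliding window,
                      where window_mask is a boolean (H, W) array marking the window's pixels
    image_shape:      (H, W) tuple
    """
    acc = np.zeros(image_shape + (2,))
    cnt = np.zeros(image_shape)
    for mask, (kappa, eta) in coeff_per_window:
        acc[mask] += (kappa, eta)
        cnt[mask] += 1
    cnt = np.maximum(cnt, 1)              # pixels covered by no window keep zero coefficients
    return acc / cnt[..., None]
```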

It is assumed that each vertex in the mesh is either related to a single object, presenting a continuous variation of disparity, or lies at the boundary between two objects, where discontinuities are allowed [32]. For a given vertex $V$, the distribution of the corner disparities $d_i$ related to $V$ is evaluated in the mesh. $V$ is considered to belong to a single object when $\max \{ d_i\} - \min \{ d_i\} < T_d$, where $T_d$ is a disparity threshold (e.g., 1.0); otherwise, $V$ lies on the boundary between two objects. For this situation, Fickel et al. [32] proposed an optimization strategy that generates new vertices, as shown in Fig. 7 and sketched in code below. Using binary clustering, all the corners meeting at a vertex either present the same disparity (type 1 vertex) or are clustered around exactly two disparity values (type 2 vertex). Then all the edges of the triangular tessellation are scanned and the states of their vertices are checked. If both vertices are type 1, nothing is done, since the mesh is already continuous at both ends. If one vertex is type 1 and the other type 2, the type 2 vertex is split into two vertices and a new edge is created between them. If both vertices are type 2, each is split into two new vertices (as in the previous case) and they are connected with three new edges (generating two new triangles). These three possibilities are illustrated in Figs. 7(a)-(c), respectively.
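A small sketch of this vertex classification follows; the two-cluster split via the largest gap and the names are illustrative assumptions rather than the exact procedure of [32].

```python
import numpy as np

def classify_vertex(corner_disparities, Td=1.0):
    """Return 1 if all corner disparities meeting at the vertex agree within Td, else 2."""
    d = np.asarray(corner_disparities, dtype=float)
    return 1 if d.max() - d.min() < Td else 2

def split_into_two(corner_disparities):
    """Binary clustering of a type-2 vertex's corner disparities: split at the largest gap."""
    d = np.sort(np.asarray(corner_disparities, dtype=float))
    gap = np.argmax(np.diff(d))
    return d[:gap + 1], d[gap + 1:]

print(classify_vertex([10.1, 10.3, 10.2]))   # 1: single object
print(classify_vertex([10.1, 10.3, 25.0]))   # 2: depth boundary between two objects
```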

Fig. 7. Optimization of the triangular mesh. The thick red edge is under consideration, black dots mean type 1 vertices, and blue squares represent type 2 vertices. The dashed lines mean the new inserted edges. (a) All type 1 vertices (no triangle created); (b) One type 1 and one type 2 vertex (one new triangle created); (c) All type 2 vertices (two new triangles created).

Due to low texture, occlusion and discontinuous regions, some mismatching errors remain in the estimated disparity maps and aligned images. Window-based filtering has little effect when the window is small and easily causes blur when the window is too large; ANCC [11], RANCC [12], IGCM [14], CVF [19] and IGF [22] all have similar problems. To solve these problems, we develop a triangle-based graph filtering method, as shown in Fig. 8. Because the graph filtering is carried out within triangles, there is no window size to set or tune, and the blurring caused by large windows is avoided. Inspired by GF [18], we first select the guide image according to texture complexity: if the $Te$ value of the left image is smaller than that of the right image, the left image is selected as the guide image, and vice versa, where $Te$ is computed as:

$$Te(I) = \frac{{Hi{s^1}(I) + Hi{s^2}(I) + Hi{s^3}(I)}}{{Hi{s^0}(I)}},$$
where $I$ denotes the input image, $Hi{s^i}$ represents the i-th component (rank) of its histogram, and $Hi{s^0}$ is the total number of pixels (the image size). The texture features of the guide image are then exploited to enhance the other stereo image within triangles. However, the left and right images have different views, and the corresponding triangles undergo a projective transformation between the two images, i.e., the corresponding triangles are not exactly the same, as shown in the upper part of Fig. 8. The yellow block indicates a Delaunay triangle in the left image, and the blue block the corresponding triangle in the right image. The projective transformation makes the areas of the two triangles differ, i.e., the numbers of pixels in the triangles are different. To avoid the influence of this difference, we use a sliding window inside the triangle for graph filtering. The sliding windows move along the edges of the two triangles simultaneously, and the minimum intersection of the two triangles is selected for the graph filtering calculation. More specifically, pixels outside the corresponding triangle are excluded, and the graph filtering in the sliding window can be expressed as:
$$\tilde{\Gamma }({\mathbf p}) = \sum\limits_k {w({{\mathbf g}^k},{\mathbf p}) \cdot } \Gamma ({\mathbf p}),$$
where $\Gamma ({\mathbf p})$ denotes the value at pixel ${\mathbf p}$ (the disparity value in stereo matching, or the pixel value in image enhancement), and $w$ is an adaptive weight in the window:
$$w({{\mathbf g}^k},{\mathbf p}) = \frac{1}{N}\sum\limits_{\;\;k \in \{ 1:N\} } {(1 + \frac{{({\mathbf J}({\mathbf p}) - \mu )({\mathbf J}({{\mathbf g}^k}) - \mu )}}{{{\sigma ^2} + \epsilon }})} ,$$
where $\mathbf g$ denotes a pixel in the window centered on pixel $\mathbf p$, $k$ is the pixel index, $\mathbf J$ is the guide map, and $N$, $\mu$ and ${\sigma ^2}$ represent the total number, mean value and variance of the pixels in the sliding window, respectively.
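To make the guide selection and the in-window filtering concrete, the sketch below implements one reading of Eq. (19) (share of pixels in the three most populated gray-level bins, so a lower score indicates richer texture) and one application of the GF-style weight of Eqs. (20)–(21) in a single window; the histogram interpretation, the choice of the window's center element and ε are assumptions.

```python
import numpy as np

def texture_score(img_gray, bins=256):
    """Te of Eq. (19), read as the share of pixels in the three most populated gray-level bins."""
    hist, _ = np.histogram(img_gray, bins=bins, range=(0, 256))
    return np.sort(hist)[-3:].sum() / img_gray.size      # smaller score = richer texture

def graph_filter_window(values, guide, eps=1e-2):
    """Filter the window's center value with the weights of Eq. (21), as in Eq. (20).

    values: disparity (stereo matching) or intensity (enhancement) values in the window, flattened
    guide:  corresponding guide-image values; the center is taken as the middle element
    """
    mu, var = guide.mean(), guide.var()
    center = guide[len(guide) // 2]
    w = (1.0 + (center - mu) * (guide - mu) / (var + eps)) / len(guide)   # Eq. (21)
    return float(np.sum(w * values))                                      # Eq. (20)

# Toy usage: the image with the lower texture score would be chosen as the guide image.
rng = np.random.default_rng(2)
left = rng.integers(0, 256, (100, 100))
right = np.full((100, 100), 128)
print(texture_score(left), texture_score(right))
```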

Fig. 8. Triangle-based graph filtering and enhancement. The aligned image has fracture problems in detail (red arrow). After graph filtering, a smoother image enhancement effect is obtained. The procedure has two steps: first, the gray histograms of the stereo pair are calculated and the higher-texture image is selected as the guide image; second, graph filtering is carried out in the triangular-mesh sliding window.

3. Experimental results

We now demonstrate that the proposed method generates state-of-the-art stereo matching, feature matching and interactive image segmentation results. To verify the effectiveness of the proposed method for stereo matching and image enhancement under radiation changes, this section carries out experiments on the widely used Middlebury datasets [33–35] and compares against advanced methods of recent years. These datasets contain multiple images (without rotation) accurately rectified along a fixed baseline (i.e., exactly the same baseline) using the same camera, and also provide a corresponding series of radiometrically varied images under three exposure and three illumination settings. For ease of comparison, the three exposure and illumination levels in the Middlebury dataset are denoted Exp0, Exp1, Exp2 and Illum1, Illum2, Illum3, respectively. In addition, the proposed method before and after triangular mesh filtering is denoted Stereo-IA-0 and Stereo-IA-1, respectively.

3.1 Stereo matching

Firstly, we conduct independent experiments on the different modules of the proposed algorithm, as shown in Fig. 9. It can be seen that the robustness of the traditional ANCC cost metric is reduced when the radiation change has a large impact (intensity overflow). Compared with ANCC, NCC + GF (IGCM [14]) brings great improvement, but it relies on a guide image and lacks an intensity alignment process, so its performance is not ideal when the radiation changes greatly. The improved scale-ANCC yields a significant performance gain. After intensity alignment and graph filtering, the matching results improve further, and disparity holes and mismatched points are significantly reduced. Moreover, in the Delaunay-mesh propagation and intensity alignment stages, SuperPoint outperforms the traditional SIFT scheme. We then verify the matching performance of the proposed method under different exposure and illumination combinations. IGCM [14] (OE' 2018) and ICIR [23] (Image' 2021) are the latest stereo matching algorithms under radiation changes; their performance is better than that of traditional methods such as ANCC [11], RANCC [12], CVF [19], IGF [22] and AD [36]. Therefore, the proposed method is only compared with IGCM and ICIR. To illustrate the limitations of a pure deep learning method (generalization ability), another group of experiments with the Cascade-stereo [7] (CVPR' 2020) deep learning network is carried out. The matching results for three extreme combinations are shown in Figs. 10–12; from top to bottom, the scenes are “Art”, “Dolls”, “Books”, “Bowling2” and “Baby2”, respectively.

Fig. 9. Stereo matching results before and after stereo visual intensity alignment (aggregated by CVF [19]). (a) Right image under different illumination and exposure; it loses some texture information due to exposure and intensity overflow. (b) Ground truth. (c) Matching result of the ANCC (6×6) [11] cost metric. (d) Matching result of the proposed scale-ANCC (6×6) cost metric (without intensity alignment). (e) Matching result of NCC + GF (IGCM [14]). (f) Matching result of SIFT [26] Delaunay-mesh belief propagation and intensity alignment. (g) Matching result of SuperPoint [24] Delaunay-mesh belief propagation and intensity alignment. (h) Triangle-based graph filtering.

Fig. 10. Stereo matching under different exposure (underexposed). From left to right: left image, right image, ground truth, disparity maps estimated by Cascade-stereo [7], IGCM [14] and Stereo-IA (ours), respectively.

Fig. 11. Stereo matching under different exposure (overexposed). From left to right: left image, right image, ground truth, disparity maps estimated by Cascade-stereo [7], IGCM [14] and Stereo-IA (ours), respectively.

Fig. 12. Stereo matching under different illumination and exposure. From left to right: left image, right image, ground truth, disparity maps estimated by Cascade-stereo [7], IGCM [14] and Stereo-IA (ours), respectively.

From the experimental results it can be seen that the proposed method is significantly better than the Cascade-stereo deep learning method and the IGCM algorithm. IGCM combines NCC and GF, which undoubtedly improves cost aggregation, but an ideal stereo guide image pair is difficult to obtain in real scenes. In addition, because the intensity difference of corresponding pixels in a stereo pair increases under radiation changes and IGCM has no measures to handle this, there are many mismatches in its estimated disparity maps, as shown in the OE' 2018 results in Figs. 10–12. The errors are more serious under overexposure, because overexposure loses more detail (intensity overflow) than underexposure. Most deep-learning-based methods are trained on standard images, and the many possible illumination conditions undoubtedly increase the complexity and convergence time of network training. Therefore, the generalization ability of most deep learning stereo matching networks is only moderate, and the estimated disparity maps often suffer from ambiguity, distortion and mismatches, as shown in the CVPR' 2020 results in Figs. 10–12. In contrast, the proposed Stereo-IA method contains global and local intensity alignment processes and combines multi-scale ANCC with the triangular-mesh-based bearings-only metric, which is strongly robust in low-texture regions. In addition, it exploits the triangular-mesh-based graph filter to optimize the matching cost and shows good overall performance. After intensity alignment, the intensity difference between the stereo images becomes very small, which significantly enhances image quality and greatly improves stereo matching accuracy.

Middlebury-2014 [35] is the latest test dataset. The right image of each stereo pair usually has three radiation settings (normal im1, exposure im1E and illumination im1L); we use the latter two to test the performance of the different algorithms. The visual comparison results for “recycle”, “playtable”, “sticks” and “pipes” in Middlebury-2014 are shown in Fig. 13. The over-smoothing problem of the deep learning method (CVPR' 2020) is obvious, and the image contour information is seriously damaged. Due to the edge-preserving property of GF, OE' 2018 performs better than CVPR' 2020 in detail, but it produces more disparity holes. The Image' 2021 method contains an intensity compensation process and is stable across multiple datasets, but it is vulnerable to depth discontinuities and occluded areas, with fracture artifacts in details. The proposed algorithm first uses the deep learning network to extract robust support points, and then combines the multi-scale ANCC and the bearings-only metric to refine results within the Delaunay-mesh, which is more robust to radiation changes and depth discontinuities.

Fig. 13. Stereo matching results on Middlebury-2014.

The error threshold is set to 2.0, and the performance comparison of the different matching algorithms on the above seven images is shown in Fig. 14. Compared with the advanced ANCC, IGF, OE' 2018, CVPR' 2020 and Image' 2021 methods, the average error rate of Stereo-IA-0 (without Delaunay-mesh filtering) decreases significantly, by 42.21%, 31.47%, 26.05%, 27.89% and 12.42%, respectively. After filtering, the non-Gaussian noise in the disparity map is smoothed, and the overall error rate of Stereo-IA-1 is 18.05% lower than that of Stereo-IA-0. This proves the effectiveness of the proposed Delaunay-mesh-based filtering, which plays an important role in improving the matching accuracy. Finally, quantitative experiments are carried out on the Middlebury-2014 dataset, with the performance comparison shown in Table 1. Radiation changes make the intensities of corresponding pixels in a stereo pair differ greatly, and traditional RGB, gradient and other similarity measures are prone to one-to-many ambiguous matching. Especially when information in an overexposed image is lost, traditional matching methods find it difficult to recover accurate stereo correspondence. Although the matching accuracy of deep learning networks has improved greatly after training on large amounts of data, the lack of an intensity compensation process often leads to weak generalization, blurring, over-smoothing and false matches. As a result, a single deep learning network is only dominant on some datasets, and its average performance or generalization is weaker than that of deep learning methods integrating traditional constraints. The proposed Stereo-IA shows more stable performance because it combines the advantages of a deep learning network with traditional triangular optimization, which is very effective for eliminating matching outliers during cost calculation and aggregation and thus greatly improves stereo matching accuracy.

Fig. 14. The average error percentages in all cases.

Table 1. The average performance comparison on Middlebury-2014 (Unit: %; The smaller the better)

3.2 Image enhancement

Generally, the indicators for evaluating image quality include RMSE (root mean squared error) and PSNR (peak signal-to-noise ratio), which are defined as follows:

$$\mathrm{Mean\,square\,error\,(MSE)}: \frac{1}{H \times W}\sum\nolimits_{\mathbf p} {||X_{\mathbf p} - Y_{\mathbf p}||^2}$$
$$\mathrm{Root\,mean\,square\,error\,(RMSE)}: \sqrt{\frac{1}{H \times W}\sum\nolimits_{\mathbf p} {||X_{\mathbf p} - Y_{\mathbf p}||^2}}$$
$$\mathrm{Peak\,signal\,to\,noise\,ratio\,(PSNR)}: 10\log_{10}\left\{ \frac{(2^n - 1)^2}{MSE} \right\}$$
where MSE represents the mean square error between the currently estimated image and the reference image, and $H$ and $W$ represent the height and width of the image, respectively. In the PSNR expression, $n$ denotes the number of bits per pixel (generally 8, i.e., 256 gray levels). The unit of PSNR is dB; the larger the value, the smaller the distortion. Because a simple difference-based calculation does not accord with the evaluation of the human visual system (HVS), a further evaluation index, SSIM (structural similarity), is introduced. It is a comprehensive quality index combining brightness, contrast and structure, described as follows:
$$\begin{array}{l}\mathrm{Structural\,similarity\,(SSIM)}: SSIM(X,Y) = l(X,Y) \cdot c(X,Y) \cdot s(X,Y)\\l(X,Y) = \frac{{2{\mu _X}{\mu _Y} + C_1}}{{\mu _X^2 + \mu _Y^2 + C_1}},\;\;\;c(X,Y) = \frac{{2{\sigma _X}{\sigma _Y} + C_2}}{{\sigma _X^2 + \sigma _Y^2 + C_2}},\;\;\;s(X,Y) = \frac{{{\sigma _{XY}} + C_3}}{{{\sigma _X}{\sigma _Y} + C_3}}.\end{array}$$
where ${\mu _X}$ and ${\mu _Y}$ are the mean values of $X$ and $Y$, ${\sigma _X}$ and ${\sigma _Y}$ are the standard deviations of $X$ and $Y$, and ${\sigma _{XY}}$ is the covariance of $X$ and $Y$. $C_1$, $C_2$ and $C_3$ are constants introduced mainly to avoid a zero denominator; typically $C_1 = {(K_1 \cdot L)^2}$, $C_2 = {(K_2 \cdot L)^2}$, $C_3 = {{C_2} / 2}$, with $K_1 = 0.01$, $K_2 = 0.03$ and $L = 255$. SSIM lies in $[{0,1} ]$; the larger the value, the smaller the image distortion. In practice, a sliding window can be used to divide the image into blocks; given $N$ blocks in total, the SSIM of each corresponding block is calculated and the average is taken as the structural similarity of the two images:
$$\mathrm{Average\,structural\,similarity\,(MSSIM)}: MSSIM(X,Y) = \frac{1}{N}\sum\limits_{k = 1}^N {SSIM({X_k},{Y_k})} .$$
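For reference, minimal numpy implementations of these indices (RMSE, PSNR, and a block-wise MSSIM with the constants stated above) are sketched below; the non-overlapping 8×8 windowing is one simple choice and may differ from the evaluation code actually used.

```python
import numpy as np

def rmse(x, y):
    return float(np.sqrt(np.mean((x.astype(float) - y.astype(float)) ** 2)))

def psnr(x, y, bits=8):
    mse = np.mean((x.astype(float) - y.astype(float)) ** 2)
    return float(10 * np.log10((2 ** bits - 1) ** 2 / mse))

def ssim_block(x, y, K1=0.01, K2=0.03, L=255.0):
    """SSIM of one block: luminance * contrast * structure with the stabilizing constants above."""
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    C3 = C2 / 2
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mx) * (y - my)).mean()
    l = (2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)
    c = (2 * sx * sy + C2) / (sx ** 2 + sy ** 2 + C2)
    s = (sxy + C3) / (sx * sy + C3)
    return l * c * s

def mssim(x, y, win=8):
    """Average SSIM over non-overlapping win x win blocks."""
    vals = []
    for i in range(0, x.shape[0] - win + 1, win):
        for j in range(0, x.shape[1] - win + 1, win):
            vals.append(ssim_block(x[i:i + win, j:j + win].astype(float),
                                   y[i:i + win, j:j + win].astype(float)))
    return float(np.mean(vals))
```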

To further verify the image enhancement performance of the proposed algorithm, we exploit the above evaluation indices to evaluate the quality of the images after intensity alignment; the results are shown in Fig. 15.

Fig. 15. Comparison before and after image enhancement.

In Fig. 15, from left to right are the Ground Truth, the original image, and the intensity-aligned images produced by Image' 2021, Stereo-IA-0 and Stereo-IA-1, respectively. The red arrow indicates fracture problems, the green box shows an enlarged display of some details, and the number at the bottom of each image is its PSNR value. Due to radiation variation, the intensity between the images of the stereo pair changes greatly, and the PSNR values of the initial images are generally small, no more than 20 dB. After intensity compensation, the stereo image quality improves significantly and the PSNR values increase. However, the Image' 2021 method does not handle low texture and depth discontinuities, and obvious fracture artifacts appear in its restored images. The proposed Stereo-IA combines the deep learning network with triangle optimization, significantly improving the intensity alignment accuracy and making the enhancement smoother. RMSE, PSNR and MSSIM are used to evaluate the enhancement results for “Art”, “Dolls”, “Books”, “Bowling2” and “Baby2”; the comprehensive performance comparison is shown in Table 2. Our method performs best in all three evaluation indices. Compared with the original images, the average RMSE is reduced by 93.83%, the average PSNR is increased by 196.32%, and the average MSSIM is increased by 69.08%; comparing before and after Delaunay-mesh-based filtering, the average RMSE is reduced by 6.76%, the average PSNR is increased by 1.4%, and the average MSSIM is increased by 0.85%. After image enhancement, ORB feature matching accuracy and Mask-RCNN semantic segmentation performance are both greatly improved, as shown in Fig. 16, where the left side of the blue arrow is the original result and the right side is the result after image enhancement. The local intensity curves of corresponding object points in “Art” are shown in Fig. 17; a near-perfect intensity alignment between the stereo images is realized.

Fig. 16. Feature matching and semantic segmentation after image enhancement.

Fig. 17. Local intensity curves of corresponding object points in “Art”.

Table 2. The performance comparison of image enhancement (RMSE: The smaller the better)

3.3 Running time of the algorithm

Table 3 shows the running times of the advanced ANCC, AD [36], IGF, IGCM, ICIR and Cascade-stereo algorithms and the proposed Stereo-IA method on an Intel Core i7-4558U 2.8 GHz machine (4 threads). The input image size is 347×277 and the maximum candidate disparity is 64. The matching costs of the first four algorithms are computed with a 19×19 window (their best-performing setting); Khan et al. [14] accelerated IGCM with integral images, which reduced its running time, and ICIR adopted a 5×5 window. Cascade-stereo is a deep learning method that requires GPU parallel processing. The support point extraction in Stereo-IA also needs a GPU for fast processing (0.15 s on average). Stereo-IA combines the deep learning network with the traditional Delaunay-mesh optimization model. Although its running time is slightly larger than those of IGCM and ICIR, its accuracy improves greatly: fracture and blur problems in the disparity map are reduced, the average estimation error is even lower than that of the Cascade-stereo method, and the image enhancement step takes only about 0.021 s.

Table 3. Runtime Comparison

4. Conclusion

Aiming at the problems that low-texture and depth-discontinuous areas easily cause matching ambiguity and image-enhancement fracture under radiation change, this paper proposes a novel stereo matching and image enhancement algorithm integrating a deep learning network with image-domain triangulation. The proposed method significantly enhances the texture information of stereo images and further improves the accuracy of subsequent tasks such as image recognition. Multiple stereo matching and image enhancement experiments show the effectiveness of the proposed method, which achieves a very effective compromise between accuracy and computational load.

Funding

National Natural Science Foundation of China (E1102/52071102, E1102/52071108); Central University Basic Research Fund of China (3072021CFJ0410).

Acknowledgements

The authors would like to thank the editor, the associate editor and anonymous reviewers for their valuable comments that have led to improvements in the quality and presentation of this paper.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. J. C. Zhou, T. Y. Yang, W. Q. Ren, D. Zhang, and W. S. Zhang, “Underwater image restoration via depth map and illumination estimation based on a single image,” Opt. Express 29(19), 29864–29886 (2021). [CrossRef]  

2. W. Wang, C. X. Zhang, and M. K. Ng, “Variational model for simultaneously image denoising and contrast enhancement,” Opt. Express 28(13), 18751–18777 (2020). [CrossRef]  

3. Y. Tao, L. L. Dong, L. Q. Xu, and W. H. Xu, “Effective solution for underwater image enhancement,” Opt. Express 29(20), 32412–32438 (2021). [CrossRef]  

4. P. Knobelreiter, C. Sormann, A. Shekhovtsov, F. Fraundorfer, and T. Pock, “Belief propagation reloaded: learning BP-layers for labeling problems,” in Proc. CVPR (IEEE, 2020), pp. 7900–7909.

5. J. L. Schnberger, S. N. Sinha, and M. Pollefeys, “Learning to fuse proposals from multiple scanline optimizations in semi-global matching,” in Proc. ECCV (IEEE, 2018), pp. 758–764.

6. F. Zhang, V. Prisacariu, R. Yang, and P. H. S. Torr, “GA-Net: guided aggregation net for end-to-end stereo matching,” in Proc. CVPR (IEEE, 2019), pp. 185–194.

7. X. D. Gu, Z. W. Fan, S. Y. Zhu, Z. Z. Dai, F. T. Tan, and P. Tan, “Cascade cost volume for high-resolution multi-view stereo and stereo matching,” in Proc. CVPR (IEEE, 2020), pp. 2495–2504.

8. Y. Atoum, M. Ye, L. Ren, Y. Tai, and X. M. Liu, “Color-wise attention network for low-light image enhancement,” in Proc. CVPR (IEEE, 2020), pp. 506–507.

9. L. Wang, Z. Liu, W. Siu, and D. P. K. Lun, “Lightening network for low-light image enhancement,” IEEE Trans. on Image Process. 29, 7984–7996 (2020). [CrossRef]  

10. Y. Y. Qu, K. Chen, C. Liu, and Y. S. Ou, “UMLE: Unsupervised multi-discriminator network for low light enhancement,” in Proc. CVPR (IEEE, 2021).

11. Y. S. Heo, K. M. Lee, and S. U. Lee, “Robust stereo matching using adaptive normalized cross-correlation,” IEEE Trans. Pattern Anal. Mach. Intell. 33(4), 807–822 (2011). [CrossRef]  

12. V. Q. Dinh, C. C. Pham, and J. W. Jeon, “Robust adaptive normalized cross-correlation for stereo matching cost computation,” IEEE Trans. Circuits Syst. Video Technol. 27(7), 1421–1434 (2017). [CrossRef]  

13. F. F. Gu, H. Zhao, X. Zhou, J. J. Li, P. H. Bu, and Z. X. Zhao, “Photometric invariant stereo matching method,” Opt. Express 23(25), 31779–31781 (2015). [CrossRef]  

14. A. Khan, K. Muk, and C. M. Kyung, “Intensity guided cost metric for fast stereo matching under radiometric variations,” Opt. Express 26(4), 4096–4111 (2018). [CrossRef]  

15. G. A. Kordelas, D. S. Alexiadis, P. Daras, and E. Izquierdo, “Enhanced disparity estimation in stereo images,” Image and Vision Computing. 35, 31–49 (2015). [CrossRef]  

16. Y. Heo, K. Lee, and S. Lee, “Joint depth map and color consistency estimation for stereo images with different illuminations and cameras,” IEEE Trans. Pattern Anal. Mach. Intell. 35(5), 1094–1106 (2013). [CrossRef]  

17. C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in Proc. 6th ICCV (IEEE, 1998), pp. 839–846.

18. K. He, S. Jian, and X. Tang, “Guided image filtering,” in Proc. ECCV (Springer, 2010), pp. 1–14.

19. C. Rhemann, A. Hosni, M. Bleyer, C. Rother, and M. Gelautz, “Fast cost-volume filtering for visual correspondence and beyond,” in Proc. CVPR (IEEE, 2011), pp. 3017–3024.

20. H. Dong, T. Wang, X. Yu, and P. Ren, “Stereo matching via dual fusion,” IEEE Signal Process. Lett. 25(5), 615–619 (2018). [CrossRef]  

21. S. Zhu and L. Yan, “Local stereo matching algorithm with efficient matching cost and adaptive guided image filter,” Vis Comput. 33(9), 1087–1102 (2017). [CrossRef]  

22. R. A. Hamzah, H. Ibrahim, and A. H. Abu Hassan, “Stereo matching algorithm based on per pixel difference adjustment, iterative guided filter and graph segmentation,” J. Vis. Commun. Image Represent. 42, 145–160 (2017). [CrossRef]  

23. C. L. Xu, C. D. Wu, D. K. Qu, H. B. Sun, and J. L. Song, “Efficient and robust unsupervised inverse intensity compensation for stereo image registration under radiometric changes,” Signal Processing: Image Communication 90, 116054 (2021). [CrossRef]  

24. D. DeTone, T. Malisiewicz, and A. Rabinovich, “SuperPoint: Self-supervised interest point detection and description,” in Proc. CVPR (IEEE, 2018), pp. 224–236.

25. P. E. Sarlin, C. Cadena, R. Siegwart, and M. Dymczyk, “From coarse to fine: Robust hierarchical localization at large scale,” in Proc. CVPR (IEEE, 2019), pp. 12716–12725.

26. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis. 60(2), 91–110 (2004). [CrossRef]  

27. E. Rublee, V. Rabaud, K. Konolige, and G. R. Bradski, “ORB: An efficient alternative to SIFT or SURF,” in Proc. ICCV (IEEE, 2011), p. 2564.

28. A. Geiger, M. Roser, and R. Urtasun, “Efficient large-scale stereo matching,” in Proc. ACCV (Springer, 2011), pp. 25–38.

29. C. Zhang, Z. Li, Y. Cheng, R. Cai, H. Chao, and Y. Rui, “MeshStereo: A global stereo model with mesh alignment regularization for view interpolation,” in Proc. ICCV (IEEE, 2015), pp. 2057–2065.

30. R. A. Jellal, M. Lange, W. Benjamin, A. Schilling, and A. Zell, “LS-ELAS: line segment based efficient large scale stereo matching,” in Proc. ICRA (IEEE, 2017), pp. 146–152.

31. C. Xu, C. Wu, D. Qu, F. Xu, H. Sun, and J. Song, “Accurate and efficient stereo matching by log-angle and pyramid-tree,” IEEE Trans. Circuits Syst. Video Technol. 31(10), 4007–4019 (2021). [CrossRef]  

32. G. P. Fickel, C. R. Jung, and T. Malzbender, “Stereo matching and view interpolation based on image domain triangulation,” IEEE Trans. on Image Process. 22(9), 3353–3365 (2013). [CrossRef]  

33. D. Scharstein and R. Szeliski, “High-accuracy stereo depth maps using structured light,” in Proc. CVPR (IEEE, 2003), pp. 195–202.

34. H. Hirschmüller and D. Scharstein, “Evaluation of cost functions for stereo matching,” in Proc. CVPR (IEEE, 2007), pp. 1–8.

35. D. Scharstein, H. Hirschmüller, Y. Kitajima, G. Krathwohl, N. Nesic, X. Wang, and P. Westling, “High-resolution stereo datasets with subpixel-accurate ground truth,” in Proc. GCPR (VMV, 2014), pp. 31–42.

36. Y. H. Kim, J. Koo, and S. Lee, “Adaptive descriptor-based robust stereo matching under radiometric changes,” Pattern Recognit. Lett. 78, 41–47 (2016). [CrossRef]  
