Obtaining accurate disparity values in textureless and texture-free regions is a very challenging task. To solve this problem, we present a novel algorithm. First, we use the guided filter method to fuse the color cost volume and the gradient cost volume. Second, we use three types of image category information to merge the different scale disparity maps and obtain the primary disparity map. Third, during the disparity refinement procedure, we also utilize the three types of category information to define different support regions and assign different weights for pixels remaining to be refined. Extensive experiments show that the performance of our method is not inferior to many state-of-the-art methods on the Middlebury data set.
© 2019 Optical Society of America
Stereo matching plays an important role in many applications, including 3D reconstruction, 3D scanning, and medical imaging. According to , stereo-matching algorithms can be broadly classified into local and global methods. Global stereo methods skip the cost aggregation step; they use belief propagation  or graph cut  to minimize an energy function and constrain the disparity smoothness between neighboring pixels. Global methods achieve higher accuracy, but their execution time is long. Local methods use local features to determine each pixel’s disparity. The matching accuracy of local methods is lower than that of global methods, but they benefit from fast running times and have therefore attracted much attention. In recent years, the semi-global method (SGM) and data-driven (deep learning) methods have also been widely used to solve the stereo-matching problem. Compared with these, the local method is simpler in principle and implementation. Therefore, in our work, we focus on computing an accurate disparity map with a local stereo-matching strategy.
Most local stereo-matching methods consist of four steps: cost computation, cost aggregation, disparity computation, and disparity refinement. Most stereo-matching algorithms use intensity-based initial cost measurements, for instance, the sum of absolute differences (SAD) and the sum of squared differences (SSD), but these methods are sensitive to noise and radiometric distortions . Fortunately, normalized cross correlation (NCC) , the gradient, the rank  and census transforms , and convolutional neural networks (CNNs)  have been widely used. In addition, to obtain a more robust cost, some methods combine different costs [9–11].
The cost-aggregation procedure is another key step for stereo matching, because cost aggregation can suppress noise interference in the cost volume. Many aggregation methods have been proposed and evaluated . Yoon and Kweon  proposed an adaptive support weight (ASW) method based on color and distance information at the cost aggregation step, in which the weight assigned to each pixel is variable. Its behavior at edges is much better than that of the SAD algorithm, and ASW achieved results comparable to those of global methods. After ASW, many edge-aware filters were introduced to smooth the cost volume. Yang  used improved bilateral filtering to filter the cost volume. Rhemann et al.  used a very fast edge-preserving filter (the guided filter ) to smooth the cost volume; it uses mean and variance information to calculate the filtering weights, which introduces the structure information of the image. The matching result is more accurate, with fewer noisy points than the ASW algorithm. After Rhemann, many methods used the guided filter or its variants to perform cost aggregation [17–19]. Besides the support weight, some scholars pay more attention to the support region shape [20–22]. Veksler  presented a variable window method by choosing a useful range of window shapes/sizes for evaluation. Zhang et al.  constructed a cross-skeleton support region for every pixel. Shi et al.  calculated an adaptive support window for each segmentation region based on the color correlation between adjacent pixels. In order to consider more global information, some segment tree methods were proposed [23,24]. Yang  proposed a non-local cost aggregation method, which extended the kernel size to the entire image. By computing a minimum spanning tree (MST) over the image graph, the non-local cost aggregation method runs extremely fast. Mei et al. 
followed the non-local cost aggregation idea and enforced the disparity consistency by using a segment tree instead of MST. Zhang et al.  proposed a cross-scale cost aggregation (CSCA) stereo-matching method. Compared with the single-scale cost aggregation, this multiscale constraint strengthened the consistency of inter-scale cost volume and behaved well in textureless regions.
Disparity refinement is designed to improve the matching accuracy in occluded regions and low-texture regions. The left-right consistency check (LRC) and the left-right filling strategy  are commonly used to detect and refine outliers. Huang and Zhang  used belief aggregation for outlier detection and belief propagation for disparity filling. Mei et al.  detected and classified pixels into occlusions and mismatches and then used an iterative region voting strategy to interpolate these outliers. Bilateral filtering and weighted median filtering  were also employed for disparity refinement.
Though many methods have been proposed to improve the matching accuracy, the low-accuracy problem in texture-free regions has not been solved very well. To improve the matching accuracy in texture-free regions, we propose a local stereo-matching method based on multi-scale fusion and multi-type support regions. The main contributions of our method are as follows.
- (1) We propose an adaptive proportion cost combination method, which uses the guided filter to combine the color and gradient costs adaptively without using the original image as the guidance. Compared with the fixed proportion combination methods [16–19], our combination method is more flexible, as it can retain more correct cost values in texture-free regions. Furthermore, our method simplifies the cost combination and guided filter cost aggregation procedures [16–19] into a single cost fusion procedure.
- (2) Different from the cross-scale cost aggregation method (CSCA) , we prefer to merge the disparity maps under different scales rather than merge the costs under different scales. This procedure can obtain a primary disparity map with fewer incorrect pixels in texture-free regions.
- (3) In the disparity refinement procedure, we propose a multi-type support region refinement method that considers the local and global information comprehensively. The four types of support regions are determined by the pixel category information. Our refinement method is more effective than the ordinary single left-right consistency check and filling refinement methods.
Figure 1 shows the flowchart of our algorithm. The red lines denote the cost computation step, the green lines denote the cost aggregation step, the blue lines denote the disparity computation step, and the black lines denote the disparity refinement step. The purple lines denote the proposed pixel category step.
The paper is organized as follows: Section 1 is the introduction. The proposed method is described in Section 2. Corresponding to the four steps of stereo matching, Section 2.A is the cost computation procedure, Section 2.B is the cost aggregation procedure, Section 2.C is the disparity computation procedure, and Section 2.D is the disparity refinement procedure. Experiments and comparison results are presented in Section 3. Finally, we come to the conclusion in Section 4.
2. PROPOSED METHOD
A. Cost Computation Using Color and Gradient Differences
The common cost measurements include absolute difference (AD), gradient amplitude, and non-parametric transformations such as rank and census. They all have their own strengths and weaknesses and apply to different backgrounds. In order to synthesize the advantages of different algorithms, the matching cost combination method is widely used. Zhu and Li  proposed an improved gradient cost function that incorporates the gradient phase because the gradient vector and phase have different invariance properties with respect to the radiometric distortion. Mei et al.  proposed a new cost measure by combining AD and census to achieve an impressive performance. Zhan et al.  proposed a novel double-RGB gradient model using the guidance image. In our method, we use the color and gradient differences to compute the matching cost.
Given a pixel p = (x, y) in the left image with a disparity d, the corresponding pixel in the right image is p − d = (x − d, y). First, we compute the color cost value C_color and the gradient cost value C_grad as follows:

C_color(p, d) = min( (1/3) Σ_{i∈{R,G,B}} |I_l^i(p) − I_r^i(p − d)|, τ_color ),  (1)

C_grad(p, d) = min( |∇_x G_l(p) − ∇_x G_r(p − d)|, τ_grad ),  (2)

where τ_color and τ_grad are the truncation thresholds [19,29], and ∇_x is the derivative in the x direction. G_l and G_r are the left and right gray images.
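As a concrete sketch of these two cost terms, the snippet below computes truncated color and x-gradient costs for a single disparity. The threshold values, the `np.roll` border handling, and the central-difference gradient are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def color_grad_costs(left, right, d, tau_c=0.028, tau_g=0.008):
    """Truncated color and x-gradient matching costs for one disparity d.

    left, right: H x W x 3 float arrays in [0, 1].
    tau_c, tau_g: truncation thresholds (illustrative values).
    Returns (c_color, c_grad), each H x W.
    """
    # Shift the right image so right[x - d] aligns with left[x].
    # np.roll wraps at the border; a real implementation would
    # handle borders explicitly.
    shifted = np.roll(right, d, axis=1)
    # Truncated mean absolute color difference over the RGB channels.
    c_color = np.minimum(np.abs(left - shifted).mean(axis=2), tau_c)
    # x-derivative of the gray images (central differences via np.gradient).
    gl = np.gradient(left.mean(axis=2), axis=1)
    gr = np.gradient(shifted.mean(axis=2), axis=1)
    c_grad = np.minimum(np.abs(gl - gr), tau_g)
    return c_color, c_grad
```

For identical images at disparity 0, both cost maps are zero everywhere, which is the expected degenerate case.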
B. Cost Aggregation by Cost Volume Fusion
After obtaining the color and gradient cost volumes, many cost computation methods use a fixed proportion [Eq. (5)] to merge the different cost volumes [15,19,29–31]:

C(p, d) = (1 − α) · C_color(p, d) + α · C_grad(p, d).  (5)

The fixed-proportion merging method is simple, but it is difficult to obtain the optimal merging parameter α. Different from these methods, we use the guided filter to merge the two cost volumes. The color cost volume is more accurate in texture regions, while the gradient cost volume is more accurate in texture-free regions, as shown in the yellow rectangle regions in Fig. 2.
Based on these characteristics, we assume that the fused cost volume C_F is a local linear transform of the gradient cost volume C_G and approximates the color cost volume C_C, as shown in Eqs. (6) and (7):

C_F(i) = a_k C_G(i) + b_k,  ∀ i ∈ ω_k,  (6)

C_F(i) = C_C(i) − n(i),  (7)

where ω_k is a square window centered at pixel k and n is a noise term. Minimizing the energy function

E(a_k, b_k) = Σ_{i∈ω_k} [ (a_k C_G(i) + b_k − C_C(i))² + ε a_k² ],  (8)

we can solve the guided filter parameters

a_k = ( (1/|ω|) Σ_{i∈ω_k} C_G(i) C_C(i) − μ_k c̄_k ) / (σ_k² + ε),  (9)

b_k = c̄_k − a_k μ_k,  (10)

where μ_k and σ_k² are the mean and variance of C_G in ω_k, c̄_k is the mean of C_C in ω_k, and ε is a regularization parameter. The fused cost at pixel i is obtained by averaging the coefficients of all windows that cover i: C_F(i) = ā_i C_G(i) + b̄_i.
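The fusion step can be sketched as a plain NumPy guided filter in which one disparity slice of the gradient cost volume serves as the guidance and the corresponding color cost slice as the input. The window radius `r`, the regularizer `eps`, and the cumulative-sum box filter are illustrative choices, not the paper's exact configuration.

```python
import numpy as np

def box_mean(x, r):
    """Box mean over a (2r+1) x (2r+1) window via padded cumulative sums."""
    k = 2 * r + 1
    xp = np.pad(x, r, mode='edge')
    c = np.cumsum(np.cumsum(xp, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))
    s = c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]
    return s / (k * k)

def guided_fuse(c_color, c_grad, r=4, eps=1e-4):
    """Guided-filter fusion of one disparity slice: gradient cost as the
    guidance signal, color cost as the filter input."""
    mu_i = box_mean(c_grad, r)                  # mean of guidance
    mu_p = box_mean(c_color, r)                 # mean of input
    cov_ip = box_mean(c_grad * c_color, r) - mu_i * mu_p
    var_i = box_mean(c_grad * c_grad, r) - mu_i * mu_i
    a = cov_ip / (var_i + eps)                  # linear coefficient
    b = mu_p - a * mu_i                         # offset
    # Average the window coefficients, then form the fused cost.
    return box_mean(a, r) * c_grad + box_mean(b, r)
```

On constant inputs the variance term vanishes, so the fused slice simply reproduces the (constant) color cost, which is a quick sanity check on the coefficients.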
Figure 3 contains the disparity maps under the fixed proportion cost merging method of  and our cost fusion method. The disparity maps obtained by our cost volume are more correct, especially in texture-free regions (the red rectangle areas).
C. Primary Disparity Computation Based on Multi-Scale Disparity Merging
1. Multi-Scale Disparity Map Computation
CSCA  proved that cross-scale cost aggregation behaves well in textureless regions. Inspired by that, we use multi-scale images in our method. We use the original and half-scale images to compute the cost volumes and obtain the disparity map corresponding to each scale, and we record the zoom parameter as scale. The block sizes [ in Eq. (6)] of the cost volume fusion process under the two scales are  and . Figure 4(b) shows the disparity map obtained from the half-scale image (the disparity map has been resized to the same size as the original image). In the red rectangle regions of the Plastic image, the half-scale image produces fewer error disparities than the original scale. But in the red rectangle regions of the Tsukuba image, the fine structural information under the half scale is not preserved as well as at the original scale. To get a primary disparity map that has fewer incorrect values in large texture-free regions and retains fine structural information, we propose a disparity merging method based on the pixel category.
2. Multi-Scale Disparity Maps Merging Based on Pixel Category
In our merging method, we define three pixel categories: segmentation category, texture category, and left-right consistency (LRC) category. We use the mean-shift segmentation method  to obtain the segmentation category image [as shown in Fig. 5(b)]. Pixels in the same segmentation region have the same label.
We use Eq. (11) on the segmentation image to produce the texture category image. In the texture category image, pixels with a value of 1 [white pixels in Fig. 5(c)] lie in the texture-free regions, and pixels with a value smaller than 1 lie in the texture regions;
Besides the previous two categories, we also use the LRC check to classify the pixels into stable and unstable points. Figure 5(d) is the LRC category image. Pixels with a value of 1 (white pixels in LRC category image) fail the LRC check, and pixels with a value of 0 (black pixels in LRC category image) pass the LRC check.
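A minimal sketch of the LRC category computation described above, assuming integer-valued left and right disparity maps and a hypothetical 1-pixel consistency threshold:

```python
import numpy as np

def lrc_category(disp_l, disp_r, thresh=1):
    """Left-right consistency category image: 1 where the pixel fails
    the LRC check, 0 where it passes (cf. the LRC category image)."""
    h, w = disp_l.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)
    ys = np.arange(h)[:, None].repeat(w, axis=1)
    # Column of the corresponding pixel in the right disparity map.
    xr = np.clip(xs - disp_l.astype(int), 0, w - 1)
    diff = np.abs(disp_l - disp_r[ys, xr])
    return (diff > thresh).astype(np.uint8)
```

Two mutually consistent maps yield an all-zero category image, while a contradictory pair marks every pixel as failed.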
In the end, for pixels whose texture category value equals 1 and that fail the LRC check simultaneously, we choose the half-scale disparity values as the primary disparity values. For other pixels, we choose the original-scale disparity values as the primary disparity values. Figure 4(c) shows the primary disparity maps produced by our disparity map merging method. Compared with Figs. 4(a) and 4(b), our primary disparity maps retain fine structural information and have fewer incorrect values in texture-free regions.
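The merging rule above reduces to a single masked selection; the function and array names here are illustrative, with the half-scale map assumed to be already upsampled to the original size:

```python
import numpy as np

def merge_scales(d_full, d_half_up, texture_free, lrc_fail):
    """Merge the original-scale and (upsampled) half-scale disparity maps.
    Pixels that lie in a texture-free region AND fail the LRC check take
    the half-scale value; all other pixels keep the original-scale value."""
    use_half = np.logical_and(texture_free == 1, lrc_fail == 1)
    return np.where(use_half, d_half_up, d_full)
```

Only pixels satisfying both conditions are replaced, so textured or LRC-consistent regions keep their original-scale fine structure.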
D. Disparity Refinement
1. Edge Disparity Optimization Based on Multi-Information Weights
The disparity map usually suffers from the edge foreground expansion problem. To solve this problem, we first use the Canny edge detection method on the first filled disparity map to find the disparity edges [Fig. 6(b)]. Then we use a dilation operation to expand the edge lines into edge regions. In Fig. 6(d), the green lines are the ground-truth disparity edges, and the black regions are the disparity edge regions extracted by us; the green lines are surrounded by our disparity edge regions. Therefore, performing a second cost aggregation on the detected disparity edge regions can yield more accurate edge disparity values.
To limit the time consumed, we do not perform the second cost aggregation over the whole cost volume for the disparity edge pixels. As shown in Fig. 7, for an edge point (the blue pixel), the effective disparity values in its support region (the red rectangle) on the first filled disparity map are 5 and 8, so we perform the aggregation operation on the cost volume only from disparity 5 to disparity 8 instead of over the whole cost volume from disparity 1 to 25 (the disparity level).
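A sketch of this range-restricted second aggregation for a single edge pixel, using a plain box mean in place of the paper's weighted aggregation (an assumption for brevity):

```python
import numpy as np

def refine_edge_pixel(cost_vol, disp, y, x, r=5):
    """Second aggregation restricted to the disparity range observed in the
    pixel's support window, instead of the full cost volume.

    cost_vol: D x H x W cost volume; disp: H x W filled disparity map."""
    win = disp[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
    d_lo, d_hi = int(win.min()), int(win.max())
    # Aggregate (here: plain box mean) only over the candidate range.
    sub = cost_vol[d_lo:d_hi + 1,
                   max(0, y - r):y + r + 1,
                   max(0, x - r):x + r + 1]
    agg = sub.mean(axis=(1, 2))
    # Winner-takes-all within the restricted range.
    return d_lo + int(np.argmin(agg))
```

Because only `d_hi - d_lo + 1` slices are touched instead of all D, the per-pixel cost of the second aggregation shrinks with the local disparity spread.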
The cost aggregation methods can be regarded as a filter operation [Eq. (13)]:

C'(p, d) = Σ_{q∈N(p)} w(p, q) C(q, d) / Σ_{q∈N(p)} w(p, q),  (13)

where N(p) is the support region of p. The color weight [Eq. (14)] and the spatial distance weight [Eq. (15)] are commonly used to assign the filter weights. In our method, we introduce an additional disparity weight [Eq. (16)] into the weight assignment procedure, and the three image categories (obtained in Section 2.C.2) are also used in the weight calculation:

w_c(p, q) = exp( −‖I(p) − I(q)‖ / γ_c ),  (14)

w_s(p, q) = exp( −‖p − q‖ / γ_s ),  (15)

w_d(p, q) = exp( −|d(p) − d(q)| / γ_d ).  (16)

In Eq. (16), the larger the disparity difference |d(p) − d(q)|, the smaller the weight w_d, so the disparity factor can restrain the edge foreground expansion problem. In different cases, different information is used to assign the weights. For edge pixels that do not lie in the texture-free regions, the weight is calculated with Eq. (17).
For edge pixels that lie in the texture-free regions, the weight is calculated with Eq. (18). The disparity values in texture-free regions are often erroneously low, and the parameter in Eq. (18) assigns low weights to pixels with incorrectly low disparity values. Figure 8(b) shows the edge-optimized result using only the color and spatial distance weights. Figure 8(c) shows the optimized result using our weight assignment method. In the red rectangle disparity edge areas, our disparities are more accurate.
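The color, spatial, and disparity weight terms can be sketched as below; the gamma parameters are hypothetical values, not the paper's tuned settings, and the simple product combination stands in for the category-dependent rules of Eqs. (17) and (18).

```python
import numpy as np

def edge_weights(I_p, I_q, p, q, d_p, d_q,
                 gamma_c=0.1, gamma_s=9.0, gamma_d=2.0):
    """Combined filter weight of pixel q for edge pixel p: a color term,
    a spatial distance term, and the proposed disparity term, which
    shrinks as |d_p - d_q| grows. Gammas are illustrative values."""
    # Color similarity: L1 distance over the channels.
    w_color = np.exp(-np.abs(np.asarray(I_p) - np.asarray(I_q)).sum() / gamma_c)
    # Spatial proximity: Euclidean pixel distance.
    w_space = np.exp(-np.hypot(p[0] - q[0], p[1] - q[1]) / gamma_s)
    # Disparity proximity: penalizes foreground/background mixing at edges.
    w_disp = np.exp(-abs(d_p - d_q) / gamma_d)
    return w_color * w_space * w_disp
```

A pixel identical to the center in color, position, and disparity gets weight 1; increasing only the disparity gap strictly decreases the weight, which is the behavior that suppresses edge foreground expansion.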
2. Disparity Map Filling Based on Multi-Type Support Regions
The ordinary left-right consistency check (LRC) is important for disparity refinement, but the LRC check and filling strategy has some disadvantages.
First, the LRC check is not always reliable: pixels that have passed the LRC check may not have the correct disparity, and pixels that have failed the LRC check may still have the correct disparity.
Second, the ordinary LRC only pays attention to the horizontal direction and ignores other surrounding pixels’ disparity information.
Third, the perceptual range of the ordinary LRC is limited; it cannot collect enough global information to deal with the texture-free regions.
To overcome these disadvantages, we define four types of support regions based on the three pixel categories defined in Section 2.C.2. Figure 9 shows the four types of support regions. The horizontal support region is the ordinary LRC check region; the eight-scan-lines support region and the rectangle support region pay more attention to the local information, while the cross-support region pays more attention to the global information.
In the horizontal support region, we find the first pixel that has passed the LRC check on the left side and on the right side, and then we calculate the minimum and maximum disparity values of these two pixels and record them as  and .
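A minimal sketch of this horizontal scan, assuming a binary LRC-failure map in which 0 marks pixels that passed the check; the function name is illustrative:

```python
import numpy as np

def horizontal_bounds(disp, lrc_fail, y, x):
    """For an outlier at (y, x), scan left and right for the first pixels
    that pass the LRC check and return (min, max) of their disparities,
    as in the ordinary horizontal support region."""
    row_d, row_f = disp[y], lrc_fail[y]
    # First passing pixel to the left, then to the right (None if absent).
    left = next((row_d[i] for i in range(x - 1, -1, -1) if row_f[i] == 0), None)
    right = next((row_d[i] for i in range(x + 1, row_d.size) if row_f[i] == 0), None)
    cands = [v for v in (left, right) if v is not None]
    return (min(cands), max(cands)) if cands else (None, None)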
In the eight-scan-lines support region, we find, in each of the eight directions, the first pixel that has passed the LRC check. Then, among these eight pixels, we find the one that is the most similar to the center pixel that failed the LRC check and record its disparity as . The similarity is calculated by Eq. (19).
In the rectangle support region, we compute the median disparity value for the pixels that lie in the same segmentation region as the center pixel and record it as . We also compute the median disparity value for the pixels that have passed the LRC check and simultaneously lie in the same segmentation region as the center pixel, and we record it as .
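The two medians of the rectangle support region can be sketched as follows, with the window radius `r` and the binary LRC-failure convention (0 = passed) as illustrative assumptions:

```python
import numpy as np

def rectangle_medians(disp, seg, lrc_fail, y, x, r=7):
    """Median disparities inside the rectangle support region of (y, x):
    over pixels sharing the center's segmentation label, and over the
    subset of those that also pass the LRC check."""
    ys, ye = max(0, y - r), y + r + 1
    xs, xe = max(0, x - r), x + r + 1
    # Pixels in the window with the same segmentation label as the center.
    same = seg[ys:ye, xs:xe] == seg[y, x]
    # The subset of those that also passed the LRC check.
    passed = np.logical_and(same, lrc_fail[ys:ye, xs:xe] == 0)
    d = disp[ys:ye, xs:xe]
    med_all = np.median(d[same]) if same.any() else None
    med_ok = np.median(d[passed]) if passed.any() else None
    return med_all, med_ok
```

The second median is the more trustworthy candidate, since it ignores disparities that already failed the consistency check.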
The half-arm length of the cross-support region is . In this support region, we compute the median disparity value for the pixels that lie in the same segmentation region as the center pixel and record it as . After that, we further compute the median disparity value and the largest disparity value for the pixels that have passed the LRC check and simultaneously lie in the same segmentation region as the center pixel. Then we record them as and .
Our disparity refinement contains two disparity filling steps. The first filling step mainly pays attention to the local disparity information, so the horizontal support region, the eight scan lines support region, and the rectangle support region are used. The concrete procedure for the first disparity filling is shown in Table 1.
In Table 1,  is one of the candidate pixels in the rectangle support region .  means that pixel  does not lie in the texture-free region.  means that pixel  lies in the same segmentation region as pixel .  means that pixel  has passed the LRC check. The  operation counts the number of pixels satisfying the conditions in the brackets. If the number of pixels that lie in the same region as the center pixel and have also passed the LRC check is not larger than the threshold , then  is less likely to be the correct disparity value. Considering this, we set the operations from line 9 to line 13.
The second filling step mainly pays attention to the global disparity information, so the horizontal support region and the cross-support region are used. In order to deal with texture-free regions, we divide pixels into four categories: pixels that fail the LRC check and lie in the texture-free region, pixels that fail the LRC check and lie in the texture region, pixels that pass the LRC check and lie in the texture-free region, and pixels that pass the LRC check and lie in the texture region. For these four pixel categories, we use the different filling strategies listed in Table 2.
In Table 2, the settings of  in line 5 and line 19 restrain low incorrect values in foreground areas and high incorrect values in background areas. In the cross-support region, if the number of pixels that lie in the same region as the center pixel and have also passed the LRC check is not larger than the threshold , the  value is less likely to be the correct disparity value. Considering this, we set the operations from line 11 to line 15. In line 24, for the pixels that pass the LRC check, the setting of  avoids unnecessary changes.
Figure 10 shows the results of each refinement step in our method. For the first and second filling steps in Figs. 10(b) and 10(d), the filled disparity maps have fewer error disparities than the previous non-filled disparity maps in Figs. 10(a) and 10(c).
3. EXPERIMENT AND ANALYSIS
A. Experimental Environment and Parameter Setting
In order to evaluate the performance of the proposed method, we carry out our experiments on an Intel i5-7300HQ 2.5 GHz CPU with 8 GB RAM, and the development tool is MATLAB 2016a. The parameters used in our experiments are listed in Table 3. In Table 3, wid and hei are the image width and height, respectively. The scale of 0.5 means that we use two scales: the original-scale image and the half-scale image. The win is the basic block size in our method, and many other block sizes are based on it. The  and  are the same as in [19,30]. Figure 11 shows the results obtained by our method.
The complexity of the cost fusion procedure is similar to that of the CostFilter  method, because both use the guided filter for cost aggregation. The complexity of the disparity refinement is mainly decided by the disparity edge-optimization procedure, which performs a second cost aggregation on the disparity edge pixels. Table 4 shows the corresponding time consumption for the images in Fig. 11. In Table 4, the time of the small image Midd1 is close to the time of the large image Adirondack; this is because the edge-optimization procedure finds and optimizes more disparity edge pixels in the texture-free regions. The whole matching time for the Midd1 image is 258 s: the disparity edge-optimization procedure takes 151 s, and the cost fusion procedure takes 4 s.
B. Comparison Experiments
As a method based on the guided filter, our method can be regarded as an improvement on the CostFilter method , so we first compare our method with CostFilter. Table 5 lists the final error rates of our method and the CostFilter method. We run the comparison experiments on the four Middlebury 2.0 test images. Compared with the CostFilter method, our method obtains a lower error rate on three images, and our method’s average error rate is also lower than that of the CostFilter method.
Our method is a multi-scale method that is inspired by the CSCA method. But different from the CSCA method, instead of merging costs under different scales, we merge disparity maps under different scales. Figure 12 shows the comparison results between CSCA and our method. None of the disparity maps in Fig. 12 has undergone disparity refinement. We can see that in the texture-free regions (the red rectangle regions), our disparity maps have more correct disparity values than those of CSCA. This proves that our multi-scale disparity map merging method is better suited to texture-free regions than the CSCA method.
The PatchMatch-based methods have attracted the attention of many scholars, and this kind of method currently achieves the best result in the non-deep-learning methods. Therefore, we also compare our method with some PatchMatch-based methods and other classical methods: PatchMatch Belief Propagation (PMBP) , Speed-up PatchMatch Belief Propagation (SPM-BP) , Graph Cut based continuous stereo matching using Locally Shared Labels (GCLSL) , Segment Tree (ST) , Local Expansion (LocalExp) , PatchMatch (PM) , and PatchMatch-based Superpixel Cut (PMSC) . Since most of the methods are designed for low-resolution images, we run the corresponding experiments on the Middlebury 2006 data set, which consists of 21 low-resolution image pairs. Figures 13 and 14 are part of the image results.
As shown in Figs. 13 and 14, the PM, PatchMatch Filter (PMF), PMBP, SPM-BP, and GCLSL methods are not good at dealing with texture-free regions: they have many error disparities (the red points in these two figures) in the texture-free regions. Table 6 shows the eight methods’ error rates on the 21 image pairs of Middlebury 2006, with 1 pixel error threshold. Error rates are evaluated on non-occluded regions. Our method reaches the best accuracy in 11 out of the 21 image pairs, and the average error rate is also the lowest. The PMSC and LocalExp methods used the pixel-matching cost from Matching Cost with a Convolutional Neural Network (MC-CNN) , but their average error rates are also not superior to ours, especially for images containing large texture-free regions like Midd1, Midd2, Monopoly, and Plastic in Table 6.
The Middlebury 2006 data sets contain low-resolution images, so we also evaluate our method on the Middlebury 2014 data sets, which have high-resolution images. There are many state-of-the-art methods on the Middlebury website, including local, global, semi-global, and data-driven (deep learning) methods. Because our method is not a data-driven method, we compare our method with the newest non-data-driven methods: Confidence Map based 3D cost aggregation with multiple Minimum Spanning Trees (3DMST-CM) , Coalesced Bidirectional Matching Volume Robust vision challenge (CBMV-ROB) , Dense and robust image registration by shift Adapted Weighted Aggregation (DAWA-F) , Fusing Adaptive Support Weights (FASW) , Improvement of Stereo Matching (ISM) , PieceWise Cost Aggregation Semi-Global Matching (PWCA-SGM) , Segment-based Disparity Refinement (SDR) , Adaptive Weighted bilateral filter Processing on Stereo Matching (SM-AWP) , Sparse Representation for Suitable and selective Stereo Matching (SMSSR) , Two-branch Convolutional Sparse Coding Stereo Matching (TCSCSM) , and Ref. . All these methods were proposed after the year 2018. The qualitative and quantitative comparison results are shown in Fig. 15 and Table 7. During our experiments, we used 1p perfect images (Adirondack, Jade Plant, Motorcycle, Piano, Pipes, Playroom, PlaytableP, Recycle, Shelves, and Teddy) without any interfering factors. In Table 7, our method achieved the fourth-lowest error rate among all twelve methods. This proves that our method is not inferior to the mainstream methods on large images.
In this paper, we present a novel accurate stereo-matching approach with multi-scale fusion and multi-type support regions. We propose a fusion-based cost aggregation method and a multi-scale disparity map merging strategy, with which we can obtain as many correct disparity values as possible in texture-free regions for the primary disparity map. During the disparity refinement procedure, we define four types of support regions that consider both the local and global information to refine the disparity map. Furthermore, we also define a new weight assignment strategy to refine the disparity values in edge regions. Evaluation shows that the proposed method obtains highly accurate disparity maps, and it is currently superior to many state-of-the-art methods on the Middlebury 2006 data set. There are certainly still some disadvantages to our method: as a local method, it does not obtain the highest score on the high-resolution Middlebury 2014 stereo images, and the error disparities always lie in slanted regions with obvious depth discontinuities. In the future, we would like to solve these problems.
We thank all editors and reviewers for their work to improve this paper.
1. D. Scharstein, R. Szeliski, and R. Zabih, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” Int. J. Comput. Vis. 47, 7–42 (2002). [CrossRef]
2. J. Sun, N. N. Zheng, and H. Y. Shum, “Stereo matching using belief propagation,” IEEE Trans. Pattern Anal. Mach. Intell. 25, 787–800 (2003). [CrossRef]
3. Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Anal. Mach. Intell. 23, 1222–1239 (2001). [CrossRef]
4. R. A. Hamzah, H. Ibrahim, and A. H. A. Hassan, “Stereo matching algorithm based on illumination control to improve the accuracy,” Image Anal. Stereol. 35, 39–52 (2016). [CrossRef]
5. K. Briechle and U. D. Hanebeck, “Template matching using fast normalized cross correlation,” Proc. SPIE 4387, 95–102 (2001). [CrossRef]
6. G. Zhao, Y. Du, and Y. Tang, “Adaptive rank transform for stereo matching,” in International Conference on Intelligent Robotics and Applications (2011), pp. 95–104.
7. R. Zabih and J. Woodfill, “Non-parametric local transforms for computing visual correspondence,” in European Conference on Computer Vision (1994), pp. 151–158.
8. J. Bontar and Y. Lecun, “Stereo matching by training a convolutional neural network to compare image patches,” J. Mach. Learn. Res. 17, 2287–2318 (2016).
9. S. Zhu and Z. Li, “Local stereo matching using combined matching cost and adaptive cost aggregation,” TIIS 9, 224–241 (2015). [CrossRef]
10. X. Mei, X. Sun, M. Zhou, S. Jiao, H. Wang, and X. Zhang, “On building an accurate stereo matching system on graphics hardware,” in IEEE International Conference on Computer Vision Workshops (IEEE, 2012), pp. 467–474.
11. Y. Zhan, Y. Gu, K. Huang, C. Zhang, and K. Hu, “Accurate image-guided stereo matching with efficient matching cost and disparity refinement,” IEEE Trans. Circuits Syst. Video Technol. 26, 1632–1645 (2015). [CrossRef]
12. F. Tombari, S. Mattoccia, L. D. Stefano, and E. Addimanda, “Classification and evaluation of cost aggregation methods for stereo correspondence,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2008), pp. 1–8.
13. K. J. Yoon and I. S. Kweon, “Adaptive support-weight approach for correspondence search,” IEEE Trans. Pattern Anal. Mach. Intell. 28, 650–656 (2006). [CrossRef]
14. Q. Yang, “Hardware-efficient bilateral filtering for stereo matching,” IEEE Trans. Pattern Anal. Mach. Intell. 36, 1026–1032 (2014). [CrossRef]
15. C. Rhemann, A. Hosni, and M. Bleyer, “Fast cost-volume filtering for visual correspondence and beyond,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2011), pp. 3017–3024.
16. K. He, J. Sun, and X. Tang, “Guided image filtering,” in European Conference on Computer Vision (2010), pp. 1–14.
17. S. Zhu and L. Yan, “Local stereo matching algorithm with efficient matching cost and adaptive guided image filter,” Vis. Comput. 33, 1087–1102 (2017). [CrossRef]
18. G. S. Hong and B. G. Kim, “A local stereo matching algorithm based on weighted guided image filtering for improving the generation of depth range image,” Displays 49, 80–87 (2017). [CrossRef]
19. H. Ma, S. Zheng, C. Li, Y. Li, L. Gui, and R. Huang, “Cross-scale cost aggregation integrating intra-scale smoothness constraint with weighted least squares in stereo matching,” J. Opt. Soc. Am. A 34, 648–656 (2017). [CrossRef]
20. O. Veksler, “Fast variable window for stereo correspondence using integral images,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2003), Vol. 1, pp. I-556–I-561.
21. K. Zhang, J. Lu, and G. Lafruit, “Cross-based local stereo matching using orthogonal integral images,” IEEE Trans. Circuits Syst. Video Technol. 19, 1073–1079 (2009). [CrossRef]
22. H. Shi, H. Zhu, J. Wang, S. Y. Yu, and Z. F. Fu, “Segment-based adaptive window and multi-feature fusion for stereo matching,” J. Algorithms Comput. Technol. 10, 3–200 (2016). [CrossRef]
23. Q. Yang, “A non-local cost aggregation method for stereo matching,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2012), pp. 1402–1409.
24. X. Mei, X. Sun, W. Dong, H. Wang, and X. Zhang, “Segment-tree based cost aggregation for stereo matching,” in Computer Vision and Pattern Recognition (IEEE, 2013), pp. 313–320.
25. K. Zhang, Y. Fang, D. Min, L. Sun, S. Yang, S. Yan, and Q. Tian, “Cross-scale cost aggregation for stereo matching,” in Computer Vision and Pattern Recognition (2014), pp. 1590–1597.
26. G. Egnal, M. Mintz, and R. P. Wildes, “A stereo confidence metric using single view imagery with comparison to five alternative approaches,” Image Vision Comput. 22, 943–957 (2004). [CrossRef]
27. X. Huang and Y. J. Zhang, “An O(1) disparity refinement method for stereo matching,” Pattern Recogn. 55, 198–206 (2016). [CrossRef]
28. Z. Ma, K. He, Y. Wei, J. Sun, and E. Wu, “Constant time weighted median filtering for stereo matching and beyond,” in IEEE International Conference on Computer Vision (2014), pp. 49–56.
29. Y. Li, D. Min, M. S. Brown, M. N. Do, and J. Lu, “SPM-BP: sped-up PatchMatch belief propagation for continuous MRFs,” in International Conference on Computer Vision (ICCV) (2015), pp. 4006–4014.
30. C. Lei and Y. H. Yang, “Optical flow estimation on coarse-to-fine region-trees using discrete optimization,” in IEEE International Conference on Computer Vision (ICCV) (2009), pp. 1562–1569.
31. T. Brox and J. Malik, “Large displacement optical flow: descriptor matching in variational motion estimation,” IEEE Trans. Pattern Anal. Mach. Intell. 33, 500–513 (2011). [CrossRef]
32. D. Comaniciu and P. Meer, “Mean shift: a robust approach toward feature space analysis,” IEEE Trans. Pattern Anal. Mach. Intell. 24, 603–619 (2002). [CrossRef]
33. F. Besse, C. Rother, A. Fitzgibbon, and J. Kautz, “PMBP: PatchMatch belief propagation for correspondence field estimation,” Int. J. Comput. Vis. 110, 2–13 (2014). [CrossRef]
34. T. Taniai, Y. Matsushita, and T. Naemura, “Graph cut based continuous stereo matching using locally shared labels,” in Conference on Computer Vision and Pattern Recognition (2014), pp. 1613–1620.
35. T. Taniai, Y. Matsushita, Y. Sato, and T. Naemura, “Continuous 3D label stereo matching using local expansion moves,” IEEE Trans. Pattern Anal. Mach. Intell. 40, 2725–2739 (2018). [CrossRef]
36. M. Bleyer, C. Rhemann, and C. Rother, “PatchMatch stereo–stereo matching with slanted support windows,” in British Machine Vision Conference (BMVA) (2011), pp. 1–11.
37. L. Li, S. Zhang, X. Yu, and L. Zhang, “PMSC: patchmatch-based superpixel cut for accurate stereo matching,” IEEE Trans. Circuits Syst. Video Technol. 28, 679–692 (2018). [CrossRef]
38. Y. Xiao, D. Xu, G. Wang, X. Hu, Y. Zhang, X. Ji, and L. Zhang, “Confidence map based 3D cost aggregation with multiple minimum spanning trees for stereo matching,” in International Conference on Computer Analysis of Images and Patterns (CAIP) (submitted).
39. K. Batsos, C. Cai, and P. Mordohai, “CBMV: a coalesced bidirectional matching volume for disparity estimation,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018), pp. 2060–2069.
40. J. Navarro and A. Buades, “Dense and robust image registration by shift adapted weighted aggregation and variational completion,” Image and Vision Computing (submitted).
41. W. Wu, H. Zhu, S. Yu, and J. Shi, “Stereo matching with fusing adaptive support weights,” IEEE Access 7, 61960–61974 (2019). [CrossRef]
42. R. Hamzah, A. Kadmin, M. Hamid, S. Fakhar, A. Ghani, and H. Ibrahim, “Improvement of stereo matching algorithm for 3D surface reconstruction,” Signal Process. Image Commun. 65, 165–172 (2018). [CrossRef]
43. H. Li, Y. Sun, and L. Sun, “Edge-preserved disparity estimation with piecewise cost aggregation,” International Journal of Geo-Information (submitted).
44. T. Yan, Y. Gan, Z. Xia, and Q. Zhao, “Segment-based disparity refinement with occlusion handling for stereo matching,” IEEE Trans. Image Process. 28, 3885–3897 (2019). [CrossRef]
45. S. Safwana Abd Razak, M. Othman, and A. Kadmin, “The effect of adaptive weighted bilateral filter on stereo matching algorithm,” Int. J. Eng. Adv. Technol. 8, C5839028319 (2019).
46. H. Li and C. Cheng, “Adaptive weighted matching cost based on sparse representation,” IEEE Transactions on Image Processing (submitted).
47. C. Cheng, H. Li, and L. Zhang, “A new stereo matching cost based on two-branch convolutional sparse coding and sparse representation,” IEEE Transactions on Image Processing (submitted).
48. S. Patil, T. Prakash, B. Comandur, and A. Kak, “A comparative evaluation of SGM variants for dense stereo matching,” IEEE Transactions on Pattern Analysis and Machine Intelligence (submitted).