We present a new multi-focus image fusion method based on dictionary learning with a rolling guidance filter for fusing multi-focus images that are either registered or mis-registered. First, we learn a dictionary from several classical multi-focus images blurred by a rolling guidance filter. Subsequently, we present a new model for identifying focus regions, which applies the learned dictionary to the input images to obtain the corresponding focus feature maps. Then, we determine the initial decision map by comparing the focus feature maps. Finally, the initial decision map is optimized and applied to the input images to obtain the fused image. Experimental results demonstrate that the proposed algorithm is competitive with the current state of the art and superior to some representative methods for both well-registered and mis-registered input images.
© 2017 Optical Society of America
Because the depth of field of the optical lenses in conventional cameras is limited, a single image cannot capture all objects of a scene in focus. To address this issue, image fusion technology, and more specifically multi-focus image fusion technology, has been developed: multiple images taken with different focus settings are integrated to obtain an all-in-focus image. The main goal of image fusion is to preserve all of the important salient visual information of the input images in the fused image. In particular, multi-focus image fusion combines two or more images of the same scene taken with different focus settings into a single all-in-focus image with an extended depth of field.
To date, many multi-focus image fusion algorithms have been proposed [1–8]. Most of these methods use the information within local regions of the input images to perform fusion, which usually works well when the input images are perfectly registered. In practice, however, input images are often not well registered, and some isolated regions with rich detail are misjudged as defocused content and wrongly selected during fusion. This produces unpleasant effects in the fused results, such as spatial inconsistency or block artifacts. Mis-registration often occurs within a set of multi-focus images because of relative movement between the object and the camera or differences in their focal points. In this paper, we propose a new multi-focus image fusion method that addresses the fusion of mis-registered multi-focus images: the information from each local region and its spatial neighborhood is used collaboratively to distinguish the focused regions in the multi-focus images. Specifically, we present a focus measurement model based on dictionary learning with a rolling guidance filter (DLRGF) to handle mis-registration and block effects. Most recently proposed multi-focus methods based on dictionary learning or sparse representation [9–14] obtain fused images by fusing the sparse coefficients directly, which generally introduces unpleasant artifacts and spatial inconsistencies into the fused images when the input images are obviously mis-registered.
In this paper, we take a new direction in understanding image blur (e.g., image defocus), using dictionary-based sparse representation learned from external data. Specifically, we found that when local image patches are decomposed into dictionary atoms in an additive manner, a clear-image dictionary and a blur-image dictionary produce visually different results. This salient difference shows that dictionary atoms can characterize structure in noticeably blurred images, thus amplifying the inherent difference between slightly blurred and clear regions. This closely mirrors the relationship between focused and defocused regions. Motivated by this property, we propose a simple but expressive salient feature extraction model for measuring the focused regions in multi-focus images. Our method differs from existing fusion methods based on sparse representation and dictionary-based sparse representation [15–17] in two respects. First, unlike these methods, which obtain fused images from the inverse transform of the fused sparse coefficients, we use the sparse coefficients to determine the focused and defocused regions in the multi-focus images. Second, we use the sparse representation over a dictionary learned from several classical multi-focus images blurred by a rolling guidance filter to extract features and classify pixels as focused or unfocused. To verify the feasibility and effectiveness of the proposed method, we conduct experiments on several sets of classical data using five objective quality metrics. Experimental results demonstrate that the proposed method is competitive with current state-of-the-art methods and outperforms the classical multi-scale transform (MST) and multi-scale transform with sparse representation (MST-SR) methods for registered and mis-registered images, in terms of both visual and quantitative evaluations.
The rest of this paper is organized as follows. Section 2 briefly reviews some related work. Section 3 details the proposed focus measurement model and its application to multi-focus image fusion. Experimental results and conclusions are given in Sections 4 and 5, respectively.
2. RELATED WORK
During the past few years, many multi-focus image fusion methods have been proposed in the literature. They fall mainly into two categories: transform-domain and spatial-domain methods. Among them, MST-based methods are among the most popular. Classical methods of this kind include pyramid decomposition, the wavelet transform, the shearlet transform, and the non-subsampled contourlet transform (NSCT). In recent years, newer MST methods have been presented based on the hybrid dual-tree complex wavelet transform (HDTCWT) and on the hybrid dual-tree complex wavelet transform with a support vector machine (HDTCWT-SVM). These newer methods generally achieve better results than the classical MST methods for both well-registered and mis-registered input images.
Among spatial-domain methods, the earliest fusion method averages the source images pixel by pixel, which usually causes undesirable artifacts in the fused results. In recent years, several block- and region-based fusion methods have also been proposed [21–23]. In particular, state-of-the-art pixel-level fusion methods have been presented based on guided filtering (GF), image matting, the dense scale-invariant feature transform (DSIFT), a quadtree with a weighted focus measure (Quadtree), and a guided-filter-based difference image. These methods generally perform well at extracting image details and preserving spatial consistency.
In addition, because of their practical value, sparse representation (SR) and dictionary learning have been widely used in image processing, including multi-focus image fusion [9–14,26,27]. By exploiting the characteristics of the sparse coefficients of source images, Yang et al. took the first step in applying sparse representation theory to image fusion. To overcome the respective disadvantages of MST- and SR-based methods, Liu et al. proposed MST-SR, a general image fusion framework that combines the complementary advantages of MST and SR. In this framework, the low-pass MST bands are merged with an SR-based scheme, while the high-pass bands are fused using the conventional “max-absolute” rule. As a result, the contrast of the fused image is improved, and the difficulty of choosing the decomposition level is largely resolved. The same authors also presented an image fusion method based on adaptive sparse representation. Dictionary-based sparse representation models are also presented in [12–14]. Experimental results in the literature have verified that SR- and dictionary-learning-based fusion methods are usually superior to traditional MST-based fusion methods.
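The “max-absolute” rule used for the high-pass bands in the MST-SR framework keeps, at every position, whichever source coefficient has the larger magnitude. Below is a minimal NumPy sketch. It assumes the two high-pass sub-bands are already available as equal-shape arrays. The MST decomposition and the SR-based low-pass fusion are not shown, and the function name `max_abs_fuse` is hypothetical.

```python
import numpy as np


def max_abs_fuse(band_a, band_b):
    """Fuse two high-pass sub-bands with the "max-absolute" rule:
    at each position keep the coefficient of larger magnitude
    (ties go to band_a)."""
    band_a = np.asarray(band_a, dtype=float)
    band_b = np.asarray(band_b, dtype=float)
    return np.where(np.abs(band_a) >= np.abs(band_b), band_a, band_b)


print(max_abs_fuse([[1.0, -3.0]], [[-2.0, 2.5]]))  # [[-2. -3.]]
```

Large-magnitude high-pass coefficients correspond to strong edges and texture, which is why this rule tends to favor the sharper source at each location.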
3. PROPOSED MULTI-FOCUS IMAGE FUSION METHOD
The human visual system can distinguish blurred, defocused regions (or patches) from focused regions (or patches) in a multi-focus image. This suggests that a system built from examples of multi-focus images in focused and defocused states could automatically distinguish defocused regions from focused ones. We therefore use a dictionary-learning-based sparse representation of several classical multi-focus images to separate the defocused regions from the focused regions in multi-focus images. On this basis, we propose a multi-focus image fusion method based on DLRGF, whose overall framework is shown in Fig. 1. First, a dictionary is learned offline from several classical multi-focus images blurred by a rolling guidance filter. Then, the learned dictionary is applied to the input images to obtain focus region feature maps. Subsequently, the initial decision map is obtained by comparing the focus region feature maps. This initial decision map is then optimized to obtain the final decision map. Finally, the fused image is obtained from the final decision map.
A. Rolling Guidance Filter and Sparse Dictionary
1. Rolling Guidance Filter
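Images contain important structures and edges at many scales. Most existing research on edge-preserving filters has seldom considered scale-aware local operations in a practical way. To address this problem, Zhang et al. [28] proposed the rolling guidance filter, which filters images with complete control over detail smoothing under a scale measure. It is based on rolling guidance implemented in an iterative manner. Let $I$ denote the input image, and let $p$ and $q$ index pixel coordinates in the image. Following [28], the filter first removes small structures with a Gaussian filter,
$$G(p) = \frac{1}{K_p}\sum_{q \in N(p)} \exp\left(-\frac{\|p-q\|^2}{2\sigma_s^2}\right) I(q),$$
and then iteratively recovers large-scale edges as
$$J^{t+1}(p) = \frac{1}{K_p}\sum_{q \in N(p)} \exp\left(-\frac{\|p-q\|^2}{2\sigma_s^2} - \frac{\|J^{t}(p)-J^{t}(q)\|^2}{2\sigma_r^2}\right) I(q),$$
with $J^1 = G$. Here $N(p)$ is the neighborhood of $p$, $\sigma_s$ and $\sigma_r$ control the spatial and range weights, respectively, and $K_p$ normalizes the weights so that they sum to one.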
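The two update steps of the rolling guidance filter can be sketched directly in NumPy. This is a brute-force sketch for readability, not an efficient implementation. The default values of `sigma_s`, `sigma_r`, and `iterations` are illustrative assumptions; the paper's settings are not given in this text. The first pass uses a constant guidance image, so it reduces to the Gaussian step ($J^1 = G$). Each later pass filters the original input $I$ while taking range weights from the previous result $J^t$.

```python
import numpy as np


def rolling_guidance_filter(img, sigma_s=3.0, sigma_r=0.1, iterations=4):
    """Brute-force rolling guidance filter (after Zhang et al. [28]).

    img: 2-D float image, assumed scaled to [0, 1].
    sigma_s / sigma_r: spatial and range standard deviations (illustrative).
    iterations: total passes; pass 1 uses a constant guidance image and is
    therefore a plain Gaussian blur (J^1 = G), later passes are joint
    bilateral filters of the input guided by the previous result J^t.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    radius = int(np.ceil(2 * sigma_s))          # window half-width
    padded_img = np.pad(img, radius, mode="reflect")
    guide = np.zeros_like(img)                  # constant guidance for pass 1
    for _ in range(iterations):
        padded_guide = np.pad(guide, radius, mode="reflect")
        num = np.zeros_like(img)
        den = np.zeros_like(img)
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                rows = slice(radius + dy, radius + dy + h)
                cols = slice(radius + dx, radius + dx + w)
                spatial = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                range_w = np.exp(-(guide - padded_guide[rows, cols]) ** 2
                                 / (2 * sigma_r ** 2))
                weight = spatial * range_w
                num += weight * padded_img[rows, cols]   # always filter input I
                den += weight                             # K_p normalization
        guide = num / den                                 # J^{t+1}
    return guide
```

A call such as `rolling_guidance_filter(img, sigma_s=3.0, sigma_r=0.1, iterations=4)` returns the smoothed image. Each pass costs on the order of H·W·(2r+1)² operations, so a practical implementation would replace the inner joint bilateral step with a fast approximation.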
2. Sparse Dictionary
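Sparse representation is commonly described as follows. Given a set of signals $Y = [y_1, \dots, y_N]$, each signal $y_i$ can be written as a sparse linear combination of a few dictionary atoms, $y_i \approx D\alpha_i$, where the dictionary $D$ and the sparse coefficient vectors $\alpha_i$ are obtained by solving
$$\min_{D,\{\alpha_i\}} \sum_{i=1}^{N} \|y_i - D\alpha_i\|_2^2 \quad \text{s.t.} \quad \|\alpha_i\|_0 \le T,\ i = 1,\dots,N, \tag{5}$$
where $T$ is the maximum number of nonzero coefficients (the sparsity level).
In natural image decomposition, we collect overlapping image patches as input. Each image patch is vectorized as in Eq. (5), and a dictionary is trained on all the image patches extracted from the input image. Based on the constructed dictionary, each image patch is decomposed into a few atoms and their corresponding non-zero coefficients, which form the reconstructed feature via Eq. (5).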
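Collecting and vectorizing overlapping patches amounts to stacking every patch as one column of the signal matrix in Eq. (5). The sketch below assumes grayscale input. The default patch size of 8 and unit stride are illustrative choices, not the paper's settings, and `extract_patches` is a hypothetical helper name.

```python
import numpy as np


def extract_patches(img, patch_size=8, stride=1):
    """Vectorize overlapping patch_size x patch_size patches of a 2-D image.

    Returns Y with shape (patch_size**2, num_patches); column j is patch j
    flattened in row-major order, scanning left-to-right, top-to-bottom.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    columns = [
        img[r:r + patch_size, c:c + patch_size].reshape(-1)
        for r in range(0, h - patch_size + 1, stride)
        for c in range(0, w - patch_size + 1, stride)
    ]
    return np.stack(columns, axis=1)


Y = extract_patches(np.arange(100.0).reshape(10, 10), patch_size=8)
print(Y.shape)  # (64, 9)
```

A 10 × 10 image yields (10 − 8 + 1)² = 9 overlapping 8 × 8 patches at unit stride. Increasing `stride` trades coverage for fewer training columns.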
Considering the inherent discrepancy between dictionaries trained on clear images and on blurred images, we propose a new blur metric and apply it to multi-focus image fusion. First, we learn a blur-image dictionary via Eq. (5). The dictionary is learned over 60 patches randomly cropped from four multi-focus images blurred by the rolling guidance filter. After blurring by the rolling guidance filter, these classical multi-focus images look similar to the corresponding defocused regions (or patches) in multi-focus images, as shown in Fig. 2. In addition, the filter controls the level of detail removed during filtering and automatically refines the edges that are preserved, so that large-scale structures are retained. We therefore expect a dictionary learned from images blurred by a rolling guidance filter to represent the defocused regions in multi-focus images efficiently. In the dictionary learning stage of the proposed method, we fix the spatial weight σ_s, the range weight σ_r, and the iteration number of the rolling guidance filter. We set the number of dictionary atoms used in patch decomposition (the sparsity level) to 5 and also fix the total dictionary size and the patch size. We also tried other choices, increasing and decreasing each of these six parameters, and found that this configuration is sufficient for our feature construction. The classical multi-focus images used to learn the dictionary are shown in Fig. 3.
After the dictionary $D$ is learned, it is applied to all image patches of the input images, both blurred and clear (defocused and focused), for defocus identification. For each new patch $x$, we use another sparse representation to decompose it into basic atoms:
$$\hat{\alpha} = \arg\min_{\alpha} \|x - D\alpha\|_2^2 \quad \text{s.t.} \quad \|\alpha\|_0 \le T, \tag{6}$$
where $T$ is the sparsity level. Eq. (6) is solved by orthogonal matching pursuit. The output atoms and corresponding coefficients reflect whether the input patch is defocused. From them, we build a sparsity feature for the input image patch $x$. It is expressed as
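The dictionary learning of Eq. (5) can be sketched with K-SVD, assuming K-SVD (Aharon et al., ref. 29 in the list) is the training algorithm. This section does not name the solver explicitly. The sketch alternates orthogonal matching pursuit sparse coding with rank-1 SVD updates of one atom at a time. Atom count, sparsity, and iteration count are parameters; the sparsity of 5 matches the text, while the other values used in the paper are not given here. This simple initialization draws atoms from the training patches, so `n_atoms` cannot exceed the number of patches (60 in the setting described above).

```python
import numpy as np


def omp(D, y, n_nonzero):
    """Sparse-code one signal y over unit-norm atoms D with at most n_nonzero atoms."""
    residual, support = y.astype(float).copy(), []
    coef, sol = np.zeros(D.shape[1]), np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) < 1e-10:
            break
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef[support] = sol
    return coef


def ksvd(Y, n_atoms, n_nonzero, n_iter=10, seed=0):
    """Minimal K-SVD: alternate OMP sparse coding and rank-1 atom updates.

    Y: (n, N) training patches as columns, with N >= n_atoms.
    Returns the dictionary D (n, n_atoms) and coefficients X (n_atoms, N).
    """
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    # initialize atoms with randomly chosen, normalized training patches
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].copy()
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    X = np.zeros((n_atoms, Y.shape[1]))
    for _ in range(n_iter):
        # sparse coding stage
        X = np.column_stack([omp(D, Y[:, i], n_nonzero) for i in range(Y.shape[1])])
        # dictionary update stage, one atom at a time
        for k in range(n_atoms):
            users = np.nonzero(X[k])[0]          # patches that use atom k
            if users.size == 0:
                continue
            # residual of those patches with atom k's contribution added back
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, k], X[k, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                    # best rank-1 fit: new atom
            X[k, users] = s[0] * Vt[0]           # and its coefficients
    return D, X
```

In use, `Y` would hold the vectorized patches cropped from the rolling-guidance-blurred training images, and `ksvd(Y, n_atoms, 5)` would return the blur-image dictionary. The SVD update keeps each atom unit-norm and leaves each patch's set of nonzero coefficients unchanged.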
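Eq. (6) is solved by orthogonal matching pursuit, which greedily selects the atom most correlated with the current residual. After each selection it re-fits all selected coefficients jointly by least squares. A minimal NumPy sketch follows, assuming unit-norm dictionary columns. It returns the selected atom indices and coefficients, which are the outputs the sparsity feature is built from. The feature itself (Eq. (7)) is not reproduced here, and the name `omp_decompose` is hypothetical.

```python
import numpy as np


def omp_decompose(D, x, sparsity):
    """Orthogonal matching pursuit for one vectorized patch x.

    D: (n, K) dictionary with unit-norm columns; sparsity: max number of atoms T.
    Returns the indices of the selected atoms and their coefficients.
    """
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support, coefs = [], np.zeros(0)
    for _ in range(sparsity):
        if np.linalg.norm(residual) < 1e-10:
            break                      # x is already represented exactly
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break                      # no new atom reduces the residual
        support.append(k)
        # re-fit all selected coefficients jointly (the "orthogonal" step)
        coefs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coefs
    return np.array(support, dtype=int), coefs
```

For example, with an orthonormal 16 × 16 dictionary `Q`, the patch `x = 2.0 * Q[:, 3] - 1.5 * Q[:, 5]` is recovered exactly as atoms `[3, 5]` with coefficients `[2.0, -1.5]`.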
B. Image Classification via Sparse Representation
Given two input multi-focus images, the proposed fusion method based on dictionary learning with a rolling guidance filter proceeds as follows. The focus feature map of each input image is calculated as follows. In Eq. (9), the sparsity feature is calculated by Eq. (7); one focus feature map corresponds to the first input image and, likewise, the other corresponds to the second input image. Because the difference between focused and defocused regions in the focus feature map obtained via Eq. (8) is not very pronounced, we amplify it by smoothing the feature map again with the rolling guidance filter, as follows:
In this subsection, we set the three parameters of the rolling guidance filter (the spatial weight σ_s, the range weight σ_r, and the iteration number) to fixed values.
C. Decision Map Regularization
After the smoothed classification map is obtained via Eq. (10), the initial decision map can be easily calculated as follows:
Because the initial decision map is not aligned with object boundaries and contains some small isolated focused regions as well as small holes surrounded by focused regions, we apply a simple post-processing step to remove these defects. Specifically, the morphological closing operator imclose with a disk-shaped structuring element is used to obtain the ideal fusion decision map, as follows:
Given the ideal decision map, the fused image can be calculated by
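Morphological closing (dilation followed by erosion) fills holes smaller than the structuring element inside focused regions and smooths their boundaries. The text uses MATLAB's imclose with a disk. The sketch below is a pure-NumPy stand-in, not the authors' code. It uses an exact Euclidean disk, which may differ slightly from MATLAB's default `strel('disk', r)` approximation, and the disk radius is a free parameter because the value used in the paper is not stated here.

```python
import numpy as np


def disk(radius):
    """Boolean disk-shaped structuring element of the given radius."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return xx * xx + yy * yy <= radius * radius


def _shifted_views(mask, se, fill):
    """Yield the mask shifted by every offset in the structuring element."""
    r = se.shape[0] // 2
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=fill)
    for dy, dx in zip(*np.nonzero(se)):
        yield padded[dy:dy + h, dx:dx + w]


def binary_close(mask, radius):
    """Morphological closing (dilation then erosion) with a disk element.

    Dilation pads with False and erosion pads with True, so the image
    border neither adds foreground nor erodes it.
    """
    mask = np.asarray(mask, dtype=bool)
    se = disk(radius)
    dilated = np.zeros_like(mask)
    for view in _shifted_views(mask, se, False):
        dilated |= view
    closed = np.ones_like(mask)
    for view in _shifted_views(dilated, se, True):
        closed &= view
    return closed
```

Applied to a decision map that is all ones except for a one-pixel hole, `binary_close(mask, 2)` fills the hole. A straight boundary, such as a half-plane mask, is returned unchanged.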
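The last two steps (comparing the two focus feature maps to obtain a binary decision map, then using that map to pick pixels from the inputs) can be sketched as below. The comparison and fusion equations are not reproduced in this text. The sketch therefore assumes a common convention: a larger feature value means "more in focus", and the fused image is a pixel-wise selection $F = D\,A + (1 - D)\,B$. Both the convention and the helper names are hypothetical.

```python
import numpy as np


def initial_decision_map(feat_a, feat_b):
    """True where image A's focus feature exceeds image B's.

    Assumes (hypothetically) that a larger feature value means more in focus.
    """
    return np.asarray(feat_a) > np.asarray(feat_b)


def fuse(img_a, img_b, decision):
    """Take pixels from A where decision is True, otherwise from B."""
    d = np.asarray(decision, dtype=float)
    img_a = np.asarray(img_a, dtype=float)
    img_b = np.asarray(img_b, dtype=float)
    if img_a.ndim == 3:               # broadcast the map over color channels
        d = d[..., None]
    return d * img_a + (1.0 - d) * img_b
```

In the pipeline described above, the closed decision map, not the raw comparison, would be passed to `fuse`. A binary map makes the fusion a pure selection, so no intensity blending occurs across the focus boundary.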
In this section, we verify the effectiveness and superiority of the proposed DLRGF fusion method for both registered and mis-registered images. Three sets of representative multi-focus images are selected for the experiments, as shown in Fig. 4. The first row of Fig. 4 shows three multi-focus images in which the focus is on the left part of each image; the second row shows the corresponding input images with the focus on the right part. Of these three sets, the first [Figs. 4(a1) and 4(b1)] is perfectly registered, and the other two are mis-registered. Figures 5, 7, and 9 show the fusion results obtained by different methods on the three sets of input images in Fig. 4. Figures 6, 8, and 10 give the corresponding difference images between each fused image in Figs. 5, 7, and 9 and the input image in Figs. 4(a1), 4(b2), and 4(b3), respectively. We compare the proposed method with existing representative or state-of-the-art fusion methods based on NSCT, adaptive sparse representation (ASR), NSCT-SR, GF, and DSIFT. To evaluate the performance of the proposed and compared methods, five representative image fusion metrics are selected, using the default parameters given in the corresponding papers. The first metric computes how well the salient features of the source images are preserved. The second evaluates how much edge information is transferred from the source images to the fused image. The third measures how much of the original information of the source images is preserved in the fused image. The fourth measures how well the structural information of the source images is preserved. The fifth is a human-perception-based fusion metric that uses the major features of the human visual system model. Furthermore, we also report the normalized difference image, defined by Eq. (14).
Here D denotes the difference between the fused image and one of the input images, and D_max and D_min denote the maximum and minimum values of the difference image D:
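Of the five metrics, the one measuring how much original information is preserved is commonly computed as normalized mutual information. The sketch below assumes it is the Hossny et al. normalization (ref. 33 in the list): $Q_{MI} = 2\left[\frac{MI(A,F)}{H(A)+H(F)} + \frac{MI(B,F)}{H(B)+H(F)}\right]$, computed from 8-bit grey-level histograms. The mapping of this formula to the paper's third metric is an assumption, and the function names are hypothetical.

```python
import numpy as np


def _entropy(p):
    """Shannon entropy (bits) of a probability array, ignoring empty bins."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def _mutual_information(x, y):
    """Return MI(x, y), H(x), H(y) from the joint grey-level histogram of two
    8-bit images (one bin per grey level)."""
    joint, _, _ = np.histogram2d(np.ravel(x), np.ravel(y), bins=256,
                                 range=[[0, 256], [0, 256]])
    joint /= joint.sum()
    hx = _entropy(joint.sum(axis=1))
    hy = _entropy(joint.sum(axis=0))
    return hx + hy - _entropy(joint.ravel()), hx, hy


def q_mi(a, b, fused):
    """Normalized mutual-information fusion metric (range 0 to 2)."""
    mi_af, h_a, h_f = _mutual_information(a, fused)
    mi_bf, h_b, _ = _mutual_information(b, fused)
    return 2.0 * (mi_af / (h_a + h_f) + mi_bf / (h_b + h_f))
```

Since MI(X, F) ≤ min(H(X), H(F)), each normalized term is at most 0.5. The metric therefore reaches its maximum of 2 only when the fused image shares all information with both sources, for example when A, B, and F are identical.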
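Given the definitions of D, D_max, and D_min, a natural reading of Eq. (14) is min-max normalization of the difference image to [0, 1]. The sketch below assumes that form, together with a signed (not absolute) difference. The equation itself is not reproduced in this text, and `normalized_difference` is a hypothetical name.

```python
import numpy as np


def normalized_difference(fused, source):
    """Min-max normalize D = fused - source to [0, 1] (assumed form of Eq. (14))."""
    diff = np.asarray(fused, dtype=float) - np.asarray(source, dtype=float)
    d_min, d_max = diff.min(), diff.max()
    if d_max == d_min:                 # identical images: no difference to show
        return np.zeros_like(diff)
    return (diff - d_min) / (d_max - d_min)
```

Normalizing this way stretches small residual differences to the full display range. This makes it easier to see where a fused image deviates from the chosen input.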
To ensure the reliability of our experiments, the multi-focus images used are selected from public image datasets. These images have been used in many related published papers [1,3,21,24] and are available at . Furthermore, the images used in Figs. 2 and 3 are available at  and .
A. Experimental Results and Discussion
The difference images in Figs. 6, 8, and 10 show that the DLRGF method is superior to the other methods whether or not the input images are well registered. For the well-registered input images in Fig. 5, serious spatial inconsistencies are still introduced in the fusion results obtained by NSCT, ASR, NSCT-SR, GF, and DSIFT [see the rectangular regions in Figs. 6(a)–6(e)]. In contrast, hardly any spatial inconsistency is introduced in the fusion result obtained by DLRGF, as shown in Fig. 5(f). Figures 7 and 9 give two further tests with mis-registered input images.
Figure 7 clearly shows that some defocused image regions are mistakenly introduced into the fusion results obtained by NSCT, ASR, and NSCT-SR. Consequently, these fused images also exhibit spatial inconsistency, especially along the boundaries of the mis-registered objects [e.g., the right clock; see the rectangular regions in Figs. 8(a)–8(c)]. The fusion results obtained by GF and DSIFT show no spatial inconsistency around the right clock. However, other regions, such as the desk, show slight spatial inconsistency in the results of these two methods, as shown in the rectangular regions in Figs. 8(d) and 8(e). By contrast, no such artifacts appear in the fusion result obtained by DLRGF [see the rectangular regions in Fig. 8(f)].
Similarly, Fig. 9 shows that the fusion results obtained by NSCT, ASR, NSCT-SR, and GF contain spatial inconsistency, especially in the regions of mis-registered objects [e.g., the head of the student; see the square regions in Figs. 10(a)–10(d)]. The fusion result obtained by DSIFT also shows slight spatial inconsistency [see the square region in Fig. 10(e)]. In contrast, the fusion result obtained by DLRGF shows no visible spatial inconsistency, as shown in the square region in Fig. 10(f).
Table 1 gives the objective evaluation results; the highest values in Table 1 are shown in bold. Table 1 shows that DLRGF is more robust than the compared methods for both registered and mis-registered input images. It is also superior to the classical fusion method (e.g., NSCT) and competitive with state-of-the-art fusion methods (e.g., GF and DSIFT) in terms of extracted salient information and spatial consistency. Overall, our experiments indicate that the proposed method achieves competitive fusion performance compared with previous methods, in both visual comparisons and objective evaluations.
In this paper, we propose a new multi-focus image fusion method based on dictionary learning with a rolling guidance filter for well-registered and mis-registered images. In the proposed method, several classical multi-focus images blurred by the rolling guidance filter serve as the training images for dictionary learning. The learned dictionary is then applied to the input images to construct a focus measurement model that produces a focus region map. Subsequently, we compare the focus region maps of the input images to determine the initial decision map. Finally, the initial decision map is refined by a morphological operator to obtain a satisfactory decision map. Experimental results demonstrate that the proposed method is competitive with the current state of the art and superior to some representative methods for both well-registered and mis-registered input images. Future research directions include improving the efficiency of the proposed method and extending it to other image fusion fields, such as infrared and visible image fusion and medical image fusion.
National Natural Science Foundation of China (NSFC) (61401343); Open Research Fund of Key Laboratory of Spectral Imaging Technology, Chinese Academy of Sciences (CAS) (LSIT201503).
1. Y. Liu, S. Liu, and Z. Wang, “Multi-focus image fusion with dense SIFT,” Inf. Fusion 23, 139–155 (2015).
2. Q. Zhang and B. Guo, “Multifocus image fusion using the nonsubsampled contourlet transform,” Signal Process. 89, 1334–1346 (2009).
3. S. Li, X. Kang, J. Hu, and B. Yang, “Image matting for fusion of multi-focus images in dynamic scenes,” Inf. Fusion 14, 147–162 (2013).
4. X. Bai, Y. Zhang, F. Zhou, and B. Xue, “Quadtree-based multi-focus image fusion using a weighted focus-measure,” Inf. Fusion 22, 105–118 (2015).
5. T. Wan, C. Zhu, and Z. Qin, “Multifocus image fusion based on robust principal component analysis,” Pattern Recognit. Lett. 34, 1001–1008 (2013).
6. Z. Zhou, S. Li, and B. Wang, “Multi-scale weighted gradient-based fusion for multi-focus images,” Inf. Fusion 20, 60–72 (2014).
7. Y. Liu, J. Jin, Q. Wang, Y. Shen, and X. Dong, “Region level based multi-focus image fusion using quaternion wavelet and normalized cut,” Signal Process. 97, 9–30 (2014).
8. N. Kausar and A. Majid, “Random forest-based scheme using feature and decision levels information for multi-focus image fusion,” Pattern Anal. Appl. 19, 221–236 (2016).
9. B. Yang and S. Li, “Multifocus image fusion and restoration with sparse representation,” IEEE Trans. Instrum. Meas. 59, 884–892 (2010).
10. Y. Liu, S. Liu, and Z. Wang, “A general framework for image fusion based on multi-scale transform and sparse representation,” Inf. Fusion 24, 147–164 (2015).
11. Y. Liu and Z. Wang, “Simultaneous image fusion and denoising with adaptive sparse representation,” IET Image Process. 9, 347–357 (2015).
12. M. Nejati, S. Samavi, and S. Shirani, “Multi-focus image fusion using dictionary-based sparse representation,” Inf. Fusion 25, 72–84 (2015).
13. M. Kim, D. K. Han, and H. Ko, “Joint patch clustering-based dictionary learning for multimodal image fusion,” Inf. Fusion 27, 198–214 (2016).
14. Y. Li, F. Li, B. Bai, and Q. Shen, “Image fusion via nonlocal sparse K-SVD dictionary learning,” Appl. Opt. 55, 1814–1823 (2016).
15. S. Li, B. Yang, and J. Hu, “Performance comparison of different multiresolution transforms for image fusion,” Inf. Fusion 12, 74–84 (2011).
16. V. N. Gangapure, S. Banerjee, and A. S. Chowdhury, “Steerable local frequency based multispectral multifocus image fusion,” Inf. Fusion 23, 99–115 (2015).
17. H. Li, B. Manjunath, and S. Mitra, “Multisensor image fusion using the wavelet transform,” Graph. Models Image Process. 57, 235–245 (1995).
18. Q. Miao, C. Shi, P. Xu, M. Yang, and Y. Shi, “A novel algorithm of image fusion using shearlets,” Opt. Commun. 284, 1540–1547 (2011).
19. Y. Yang, S. Tong, S. Huang, and P. Lin, “Dual-tree complex wavelet transform and image block residual-based multi-focus image fusion in visual sensor networks,” Sensors 14, 22408–22430 (2014).
20. B. Yu, B. Jia, L. Ding, Z. Cai, Q. Wu, R. Law, J. Huang, L. Song, and S. Fu, “Hybrid dual-tree complex wavelet transform and support vector machine for digital multi-focus image fusion,” Neurocomputing 182, 1–9 (2016).
21. S. Li and B. Yang, “Multifocus image fusion using region segmentation and spatial frequency,” Image Vis. Comput. 26, 971–979 (2008).
22. V. Aslantas and R. Kurban, “Fusion of multi-focus images using differential evolution algorithm,” Expert Syst. Appl. 37, 8861–8870 (2010).
23. X. Xia, S. Fang, and Y. Xiao, “High resolution image fusion algorithm based on multi-focused region extraction,” Pattern Recognit. Lett. 45, 115–120 (2014).
24. S. Li, X. Kang, and J. Hu, “Image fusion with guided filtering,” IEEE Trans. Image Process. 22, 2864–2875 (2013).
25. X. Yan, H. Qin, J. Li, H. Zhou, and T. Yang, “Multi-focus image fusion using a guided-filter-based difference image,” Appl. Opt. 55, 2230–2239 (2016).
26. H. Yin, Y. Li, Y. Chai, Z. Liu, and Z. Zhu, “A novel sparse-representation-based multi-focus image fusion approach,” Neurocomputing 216, 216–229 (2016).
27. Q. Zhang and M. Levine, “Robust multi-focus image fusion using multi-task sparse representation and spatial context,” IEEE Trans. Image Process. 25, 2045–2058 (2016).
28. Q. Zhang, X. Shen, L. Xu, and J. Jia, “Rolling guidance filter,” in European Conference on Computer Vision (ECCV, 2014), pp. 815–830.
29. M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process. 54, 4311–4322 (2006).
30. A. M. Bruckstein, D. L. Donoho, and M. Elad, “From sparse solutions of systems of equations to sparse modeling of signals and images,” SIAM Rev. 51, 34–81 (2009).
31. J. Zhao, R. Laganiere, and Z. Liu, “Performance assessment of combinative pixel-level image fusion based on an absolute feature measurement,” Int. J. Innovat. Comput. Inf. Control 6, 1433–1447 (2007).
32. C. S. Xydeas and V. S. Petrovic, “Objective image fusion performance measure,” Electron. Lett. 36, 308–309 (2000).
33. M. Hossny, S. Nahavandi, and D. Creighton, “Comments on ‘information measure for performance of image fusion’,” Electron. Lett. 44, 1066–1067 (2008).
34. C. Yang, J. Zhang, X. Wang, and X. Liu, “A novel similarity based quality metric for image fusion,” Inf. Fusion 9, 156–160 (2008).
35. Y. Chen and R. Blum, “A new automated quality assessment algorithm for image fusion,” Image Vis. Comput. 27, 1421–1432 (2009).