## Abstract

Due to the nature of the optics involved, the depth of field in imaging systems is usually restricted, so the captured image has only part of the scene in focus. Fusing images taken at different focus levels is a promising approach to extending the depth of field. This paper proposes a novel multifocus image fusion approach based on clarity enhanced image segmentation and regional sparse representation. On the one hand, by using a clarity enhanced image that contains both intensity and clarity information, the proposed method decreases the risk of partitioning in-focus and out-of-focus pixels into the same region. On the other hand, due to the regional selection of sparse coefficients, the proposed method is robust to the distortions and misplacements that usually result from pixel based coefficient selection. In short, the proposed method combines the merits of regional image fusion and sparse representation based image fusion. The experimental results demonstrate that the proposed method outperforms six recently proposed multifocus image fusion methods.

© 2013 OSA

## 1. Introduction

Image fusion [1], which combines two or more images obtained from a variety of sensors into a single composite image, is becoming more and more important because the composite image has less blur and gives a better view for human and machine perception or for further image processing tasks. Multifocus image fusion is one category of image fusion. Despite the variety of available sensors, most of them, such as cameras with a finite depth of field and optical microscopes, cannot generate a single image with all relevant objects in focus, since their depth of focus is restricted. A single image captured by such a sensor therefore cannot show all the information content of the scene, and studying a number of similar images separately is a time consuming and labor intensive process. Multifocus image fusion, which integrates valuable in-focus information from a number of images of the same scene to create a synthetic image [2], is a useful method to solve these problems. In recent years, multifocus image fusion techniques have been broadly used in many application fields such as optical microscopy [3, 4], medical imaging [5] and others [6, 7].

Recently, a variety of methods available to implement multifocus image fusion have been proposed. According to the nature of the processing of the input image stack, these methods can be broadly grouped into three families: region selection methods, multi-scale decomposition (MSD) methods and learning based methods [8].

Principally, the clearer the image, the more information is depicted. Region selection methods follow this principle: clearer regions of the source images, chosen according to their respective sharpness, are used to construct the final image [9]. For instance, Tian et al. [10] introduced the bilateral gradient-based sharpness criterion as a standard for choosing the clearer image pixels (or blocks/regions). Zhang et al. [11] evaluated the clearness of image pixels (or blocks/regions) by the blocking-to-masking ratio (BMR). Basically, the procedure of region selection methods contains the following two steps. First, the source images are partitioned by an image segmentation algorithm such as normalized cuts [12], watershed-based segmentation [13] and others [14]. Second, the regions chosen according to a sharpness criterion are used to construct the fused image. Of course, pixel by pixel selection and fusion can be regarded as an extreme case of region selection in which each region is just one pixel. But generally, pixel based selection is so sensitive to noise that noise may lead to incorrect choices of pixels from the corresponding input images. The fact that humans perceive images in regions instead of pixels [15] also suggests the use of region based selection. For region-based methods, segmentation algorithms are applied to partition the source images into coherent regions. However, up until now, there is no general consensus on image segmentation, and almost all image segmentation algorithms are complicated and time consuming. More importantly, the effects of focus in the source images are not considered in almost all the image segmentation methods applied in multifocus image fusion. This brings the risk of incorrect segmentation, in which in-focus and out-of-focus pixels are partitioned in the same region owing to their similarities in features like the intensity, texture and spatial information considered in common segmentation algorithms. Because regional image fusion methods select the building blocks from source images in a region-by-region manner, if in-focus and out-of-focus pixels are segmented in the same region, a region containing out-of-focus pixels is eventually used to build the fused image. Accordingly, a less clear fusion result is produced.

Multiscale decomposition (MSD) methods are based on the hypothesis that the higher the frequency content in the frequency domain, the higher the corresponding contrast in the spatial domain [8]. There are a variety of MSD based methods, which differ mainly in how frequency content is selected in the frequency domain. In the general procedure of MSD based methods, the source images are first decomposed into multiscale coefficients by different multiscale transforms; the coefficients are then chosen by corresponding selection methods; finally, fusion rules and the corresponding inverse transforms are applied to form the fused image. Common multiscale transforms, such as the wavelet transform [2], the quaternion curvelet transform [16], the contourlet transform [17], and the recently popularized sparse representation (SP) [18], have been applied to multifocus image fusion problems. Since MSD based methods can offer information on the sharp contrast variations that the human visual cortex is sensitive to, they can produce fused images of relatively high quality. But a disadvantage of MSD methods is that spatial information is barely considered in their pixel by pixel coefficient selection. This prevents MSD based methods from properly preserving edge and texture information and leads to distortions in fused images. Another drawback of MSD methods is that there is no universal wavelet kernel or overcomplete dictionary that can handle a variety of scenes.

The third way of multifocus image fusion is to learn from empirical data and then make intelligent decisions for fusing different pixels or regions into an all-in-focus image [19]. Due to the difficulty of obtaining empirical data in most multifocus image fusion cases, learning based methods are not widely used. This paper focuses on the region selection and MSD methods.

Considering the drawbacks of traditional regional methods and MSD methods discussed above, this paper proposes a novel approach that fuses multifocus images through the combination of normalized cuts on clarity enhanced images and the regional selection of coefficients in the sparse representation domain. On the one hand, the clarity information is calculated by sparse representation and is combined with the source images to build a clarity enhanced image for partitioning; the possibility of partitioning in-focus and out-of-focus pixels into the same region is thereby decreased, and the regions produced by normalized cuts become more suitable for multifocus image fusion. On the other hand, due to the regional selection of coefficients, the sparse representation based method becomes robust to the pixel distortions and misplacements usually resulting from pixel based coefficient selection. In short, the proposed method attempts to avoid the weaknesses of regional image fusion and MSD based image fusion and to take advantage of both.

This paper is organized as follows. Section 2 gives the basic theories of normalized cuts and sparse representation for multifocus image fusion. In Section 3, our proposed regional fusion method is detailed. Section 4 concretely describes the experimental results and the performance analysis. Finally, we conclude this paper in Section 5.

## 2. Related work

#### 2.1. Normalized cuts and image fusion

Normalized cuts (Ncut), an unsupervised segmentation algorithm, is extensively used in the field of image segmentation. To understand this algorithm, we assume an image *G* can be represented as a weighted graph with a set of nodes *V*, and is to be divided only into two sets *A* and *B* (*A* ∪ *B* = *V* and *A* ∩ *B* = Φ) by removing the edges connecting these two parts; dividing the image into more parts follows the same principle. Shi and Malik [12] put forward a method called normalized cuts (Ncut) to define a disassociation measure as follows:

$$\mathit{Ncut}\left(A,B\right)=\frac{\mathit{cut}\left(A,B\right)}{\mathit{assoc}\left(A,V\right)}+\frac{\mathit{cut}\left(A,B\right)}{\mathit{assoc}\left(B,V\right)}\tag{1}$$

where $\mathit{cut}\left(A,B\right)=\sum _{u\in A}\sum _{v\in B}w\left(u,v\right)$ is the degree of dissimilarity between the two parts *A* and *B* in graph theory, and $\mathit{assoc}\left(A,V\right)=\sum _{u\in A}\sum _{v\in V}w\left(u,v\right)$ is the total connection from nodes in *A* to all nodes in image *G*. Here, *w*(*u*, *v*) denotes the weight value between nodes *u* and *v*. Similarly, $\mathit{assoc}\left(B,V\right)=\sum _{u\in B}\sum _{v\in V}w\left(u,v\right)$ is the total connection from nodes in *B* to all nodes in image *G*. Due to the two identities *assoc*(*A*, *V*) = *assoc*(*A*, *A*) + *cut*(*A*, *B*) and *assoc*(*B*, *V*) = *assoc*(*B*, *B*) + *cut*(*A*, *B*), Eq. (1) can be transformed as follows:

$$\mathit{Ncut}\left(A,B\right)=2-\left(\frac{\mathit{assoc}\left(A,A\right)}{\mathit{assoc}\left(A,V\right)}+\frac{\mathit{assoc}\left(B,B\right)}{\mathit{assoc}\left(B,V\right)}\right)\tag{2}$$

In this form, Ncut measures not only the dissimilarity between the two parts *A* and *B* but also the similarity among nodes in the same part. Hence, the optimal segmentation is obtained when the value of Ncut is minimized. Reference [12] provides a detailed algorithm to segment images by minimization of the normalized cut.
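As an illustrative sketch (not part of the original algorithm), the disassociation measure above can be evaluated directly on a small weighted graph; the partition that cuts only the weak edges yields the smaller Ncut value:

```python
import numpy as np

def ncut_value(W, in_A):
    """Ncut(A,B) = cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V) for a
    symmetric weight matrix W and a boolean mask in_A selecting set A."""
    in_B = ~in_A
    cut_AB = W[np.ix_(in_A, in_B)].sum()   # total weight of removed edges
    assoc_AV = W[in_A, :].sum()            # connection of A to all nodes
    assoc_BV = W[in_B, :].sum()            # connection of B to all nodes
    return cut_AB / assoc_AV + cut_AB / assoc_BV

# Toy graph: two tight pairs {0,1} and {2,3} joined by weak bridge edges.
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
natural = ncut_value(W, np.array([True, True, False, False]))
crossed = ncut_value(W, np.array([True, False, True, False]))
```

Because the natural partition removes only the two 0.1-weight edges, its Ncut value is far smaller than that of the crossed partition, which is why minimizing Ncut recovers coherent groups.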

For the purpose of image fusion, in Li’s method [15], the normalized cuts algorithm is used to partition the intermediate fused image that is obtained by averaging all source images. According to the segmentation result on the intermediate fused image, the source images are partitioned. Then the spatial frequency of each region of the source images is calculated, because the spatial frequency to some extent indicates the overall activity level of an image [20]. Finally, by comparing the spatial frequencies of the corresponding regions, the regions are selected and fused into the final image. However, since the effects of focus in the intermediate fused image are not taken into account, in-focus and out-of-focus pixels tend to be partitioned into the same region if their features are similar. As a result, the inaccurate segmentation results can lead to a bad fusion result.

#### 2.2. Sparse representation and image fusion

Sparse representation (SP), which can be viewed as an extension of the classical wavelet transform, has recently become an invaluable tool and is widely used in image and signal processing tasks like denoising, classification [21], compressed sensing [22], and so on. Basically, sparse representation, which extracts the information of the original images effectively and completely, represents the information of the original images as a linear combination of a few atoms from an overcomplete dictionary [23].

In the sparse linear model, one patch of an image can be reshaped into a column vector *v* and expressed as

$$v=Dx=\sum_{t=1}^{T}x_{t}d_{t}\tag{3}$$

where *T* is the number of atoms and *D* = {*d*_{1}, *d*_{2},..., *d*_{T}} is the given overcomplete dictionary that can be created by using discrete cosine transforms, short-time Fourier transforms, wavelet transforms, or even directly learned from some images [24]. *x* = {*x*_{1}, *x*_{2},..., *x*_{T}} denotes the coefficient vector of *v* according to the overcomplete dictionary. Based on the sparse representation theory, if the actual pixel values of the source images are determined by the given dictionary, the number of non-zero entries in the coefficient vector *x* needs to be minimized. Let ||*x*||_{0} be the number of non-zero entries in *x*; the above discussion can be formulated as follows:

$$\min_{x}{\left\|x\right\|}_{0}\quad\text{subject to}\quad v=Dx\tag{4}$$

Allowing a bounded reconstruction error, the problem is relaxed to

$$\min_{x}{\left\|x\right\|}_{0}\quad\text{subject to}\quad{\left\|v-Dx\right\|}_{2}\leq\varepsilon\tag{5}$$

where *ε* denotes the global error. The optimization problem in (5) can in principle be solved by systematically testing all the potential combinations of columns of *D* [25]. In this paper, we choose a greedy algorithm, orthogonal matching pursuit (OMP), to solve the problem; the details of the OMP algorithm can be found in [26].
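As a rough, self-contained sketch of the greedy OMP idea (not the exact implementation of [26]), the following repeatedly selects the atom most correlated with the current residual and refits by least squares until the error bound *ε* is met:

```python
import numpy as np

def omp(D, v, eps=1e-6, max_atoms=None):
    """Greedy OMP sketch: approximately solve min ||x||_0 s.t. ||v - Dx||_2 <= eps.
    D: (n, T) dictionary with unit-norm columns; v: (n,) signal vector."""
    n, T = D.shape
    max_atoms = max_atoms or T
    residual = v.copy()
    support, coef = [], np.zeros(0)
    while np.linalg.norm(residual) > eps and len(support) < max_atoms:
        # pick the atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        support.append(k)
        # least-squares refit on the selected atoms, then update the residual
        coef, *_ = np.linalg.lstsq(D[:, support], v, rcond=None)
        residual = v - D[:, support] @ coef
    x = np.zeros(T)
    x[support] = coef
    return x

# Demo: code a synthetic signal built from two atoms of a random dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 64))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
v = 2.0 * D[:, 3] - 1.5 * D[:, 17]
x_hat = omp(D, v)
```

After the least-squares refit, the residual is orthogonal to every selected atom, so each iteration adds a new atom; the loop therefore terminates with a representation satisfying the error bound.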

In traditional sparse representation based image fusion like [18], the sparse coefficients are used to represent the information of the source images, and a larger sum of absolute coefficients means a higher activity level of the corresponding image patch. The choose-max fusion rule is then applied to combine the sparse coefficients. Finally, the fused image is constructed by applying the inverse transformation to the fused sparse coefficients. The drawback of this kind of method is that the spatial information of the source images is barely considered in the pixel/patch wise coefficient fusion phase, which may lead to distortions and misplacements of pixels in the fused images. As an illustrative example, we show several magnified portions of the fused images produced by the traditional sparse representation method and the proposed method in Fig. 1. As shown in Fig. 1, the traditional sparse representation method may produce inhomogeneity (highlighted in Fig. 1(a)) and artifacts (highlighted in Fig. 1(c)). On the contrary, the proposed method successfully avoids these distortions (shown in Fig. 1(b) and Fig. 1(d)).

## 3. Proposed method

After studying the advantages and disadvantages of region based methods and sparse representation methods, we propose a novel method based on the combination and enhancement of sparse representation and normalized cuts for the multifocus image fusion problem. The schematic diagram of our multifocus image fusion algorithm is shown in Fig. 2. For simplicity, we assume that there are only two source images A and B here; the rationale behind the proposed method applies to the fusion of more than two multifocus images. The proposed method consists of three main phases, detailed as follows.

#### 3.1. Clarity measurement based on sparse representation

We use the window sliding technique to handle the two source images A and B in sequence, from left to right and from top to bottom. Each source image can be divided into a series of overlapped image patches with the size of *n* × *n*. Each image patch is translated into a vector that can be approximated by a linear combination of the fixed and known overcomplete dictionary whose number of atoms is *T*. For any image patch of a source image, the corresponding vector *v* can be expressed as follows

$$v=\sum_{t=1}^{T}x_{t}d_{t}\tag{6}$$

where *d*_{t} denotes one atom of the overcomplete dictionary *D* = [*d*_{1}, *d*_{2},..., *d*_{T}] and *x*_{t} is the corresponding sparse coefficient. Assuming that each source image can be divided into *J* patches, we can derive all the *J* patches (their corresponding vectors) as

$$V=\left[v_{1},v_{2},\dots,v_{J}\right]=DS\tag{7}$$

where the sparse coefficient matrix *S* is defined as

$$S=\left[x_{1},x_{2},\dots,x_{J}\right]\tag{8}$$

Because a larger sum of absolute sparse coefficients indicates a higher activity level of the corresponding patch, the clarity level of the *j*-th patch is measured by

$$C_{j}={\left\|x_{j}\right\|}_{1}\tag{9}$$

where ||·||_{1} is the Manhattan norm. For the two source images A and B, the sparse coefficients *S*_{A} and *S*_{B} can be calculated by the OMP algorithm according to the principle of sparse representation [23]. With *S*_{A} and *S*_{B}, the patch clarity levels *C*_{A} and *C*_{B} for image patches of the source images can be derived as in (9). Then we can get the clarity level images *A*_{s} and *B*_{s}, in which the clarity level of a pixel is calculated by averaging the clarity levels of all the patches that cover the pixel. Finally, the relative clarity level image *B*′_{s} can be calculated as follows:

$$B{'}_{s}=\frac{B_{s}}{A_{s}+B_{s}}\tag{10}$$

According to the original sparse representation based image fusion paper [18], the major advantage of sparse representation based clarity is that sparse representation is “more effectively and completely extracting the underlying information of the original images”. In other words, compared to other decomposition methods like wavelets, sparse representation gives more accurate salient features of images, and therefore better clarity measurements. Of course, our choice of the sparse representation based clarity measure does not invalidate other clarity measures like the spatial frequency used in the original regional multifocus image fusion paper [15]; other clarity measures can also be applied to build the clarity enhanced image. But because the sparse coefficients are used in the later image fusion step, applying them to calculate the clarity is direct and convenient.
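To illustrate the clarity-map construction (a hypothetical helper, not the authors' code), the following accumulates per-patch L1-norm clarities into a per-pixel clarity image by averaging over all patches that cover each pixel:

```python
import numpy as np

def clarity_map(coeffs, img_shape, n, step=1):
    """Per-pixel clarity: average the L1-norm clarity of every n-by-n patch
    covering the pixel. coeffs[:, j] is the sparse code of the j-th patch,
    with patches taken in raster order at the given step."""
    H, W = img_shape
    acc = np.zeros((H, W))   # sum of clarities of patches covering each pixel
    cnt = np.zeros((H, W))   # number of patches covering each pixel
    j = 0
    for i in range(0, H - n + 1, step):
        for k in range(0, W - n + 1, step):
            c = np.abs(coeffs[:, j]).sum()   # patch clarity: an L1 norm
            acc[i:i + n, k:k + n] += c
            cnt[i:i + n, k:k + n] += 1
            j += 1
    return acc / np.maximum(cnt, 1)

# Demo: 4x4 image, four non-overlapping 2x2 patches with clarities 1..4.
cm = clarity_map(np.array([[1.0, 2.0, 3.0, 4.0]]), (4, 4), n=2, step=2)
```

Given `A_s` and `B_s` computed this way for the two sources, the relative clarity image would then be `B_s / (A_s + B_s)`, as in the text above.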

#### 3.2. Segmentation based on clarity enhanced image

In traditional region based fusion methods, the image to be partitioned is usually the simple average of the source images. When we apply segmentation algorithms to the average image, the focus information in the source images is not considered. Therefore, out-of-focus and in-focus pixels are very likely to be partitioned into the same region when they are similar in the features considered by traditional segmentation algorithms. The best way to solve this problem would be to let segmentation algorithms like normalized cuts take the focus information as one of the features considered in the segmentation procedure, but this approach is algorithm-specific and needs a major revision of the segmentation algorithm.

In this paper, we use a much simpler method: using an image containing both the clarity information and the intensity information as the image to be segmented. Specifically, in our proposed regional multifocus image fusion method, the clarity enhanced image derived as follows is segmented by normalized cuts to get the regions. Because the relative clarity measure *B*′_{s} is in the interval [0, 1], we normalize the source images into the same interval and denote them as *A*′ and *B*′. The clarity enhanced image *CC* is obtained by

$$CC=\alpha B{'}_{s}+\left(1-\alpha\right)\frac{A{'}+B{'}}{2}\tag{11}$$

where the weight *α* ∈ [0, 1] is designed to adjust the contribution of the relative clarity measure and the original information from the source images. When *α* = 1, the relative clarity measure is directly selected as the clarity enhanced image. On the contrary, if *α* = 0, the clarity enhanced image degrades to the traditional image to be partitioned, i.e., the simple average of the source images. Because the clarity enhanced image contains both the information from the source images and the clarity information generated by the sparse coefficients, the risk of partitioning in-focus and out-of-focus pixels into the same region is decreased, and a better segmentation for image fusion can be obtained. As an illustrative example, Fig. 3 shows a pair of source images, their simple average, the corresponding clarity enhanced image, and the segmentation results based on the different images. Clearly the segmentation result from the clarity enhanced image is much better than the one derived from the simple average of the source images.
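A minimal sketch of the clarity enhanced image construction, assuming the linear blend implied by the *α* = 1 and *α* = 0 limiting cases described above (the function name and min-max normalization are illustrative details, not the authors' interface):

```python
import numpy as np

def clarity_enhanced_image(A, B, rel_clarity, alpha):
    """Blend the relative clarity measure (already in [0, 1]) with the average
    of the min-max normalized sources; alpha = 1 keeps only the clarity,
    alpha = 0 degrades to the plain average image."""
    norm = lambda I: (I - I.min()) / (I.max() - I.min() + 1e-12)
    A1, B1 = norm(A.astype(float)), norm(B.astype(float))
    return alpha * rel_clarity + (1.0 - alpha) * (A1 + B1) / 2.0

A = np.array([[0.0, 255.0]])
B = np.array([[0.0, 255.0]])
rel = np.array([[0.3, 0.7]])
cc1 = clarity_enhanced_image(A, B, rel, alpha=1.0)   # just the clarity measure
cc0 = clarity_enhanced_image(A, B, rel, alpha=0.0)   # just the average image
```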

#### 3.3. Regional image fusion

According to the segmentation result of *CC* in phase 2, we partition the normalized images *A*′ and *B*′ into homogeneous regions. After calculating the mean clarity of each region of *A*′ and *B*′, for regions at corresponding positions of the source images, we compare their mean clarities and adopt the choose-max rule to select the regions. Then, according to the selected regions, we choose the corresponding column vectors of *S*_{A} and *S*_{B} to construct the fused sparse coefficient matrix *S*_{F}. According to Eq. (7), by using the fused sparse coefficient matrix *S*_{F} and the overcomplete dictionary *D*, the patch vectors of the fused image can be derived by

$$V_{F}=DS_{F}\tag{12}$$

The fused image *I*_{F} is then reconstructed using *V*_{F}. We reshape each vector *v*_{Fj} in *V*_{F} into a patch with size *n* × *n* and combine all the image patches according to their respective positions; this is basically an inverse process of phase 1. For each pixel position, the pixel value is the sum of several patch values, and this sum is divided by the number of patches covering the pixel to obtain the final reconstructed result. In this phase, we use the regional selection to construct the sparse coefficient matrix and thereafter the fused image. As a result, we take advantage of sparse representation, i.e., effectively and completely extracting more valuable information from the original images [23]. More importantly, the regional selection, instead of pixel based selection, considers more of the spatial information in the source images and attempts to keep more of such information in the fused images.
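The regional choose-max coefficient selection and the overlap-averaging reconstruction of this phase can be sketched as follows (helper names and the region-label representation are assumptions for illustration, not the authors' interface):

```python
import numpy as np

def fuse_regions(S_A, S_B, patch_region, clarity_A, clarity_B):
    """Regional choose-max rule: for each patch, take the column of S_A or S_B
    according to which source has the larger mean clarity in the patch's region.
    patch_region[j] is the region label of patch j; clarity_A[r], clarity_B[r]
    are the mean clarities of region r in the two sources."""
    pick_A = clarity_A[patch_region] >= clarity_B[patch_region]
    return np.where(pick_A[None, :], S_A, S_B)

def reconstruct(D, S_F, img_shape, n, step=1):
    """Overlap-add the fused patch vectors V_F = D @ S_F, then divide by the
    number of patches covering each pixel (inverse of the sliding-window phase)."""
    H, W = img_shape
    V_F = D @ S_F                        # fused patch vectors
    acc = np.zeros((H, W))
    cnt = np.zeros((H, W))
    j = 0
    for i in range(0, H - n + 1, step):
        for k in range(0, W - n + 1, step):
            acc[i:i + n, k:k + n] += V_F[:, j].reshape(n, n)
            cnt[i:i + n, k:k + n] += 1
            j += 1
    return acc / np.maximum(cnt, 1)

# Demo: one 2x2 patch, identity dictionary; source A wins the only region.
D = np.eye(4)
S_A = np.array([[1.0], [0.0], [0.0], [0.0]])
S_B = np.array([[0.0], [2.0], [0.0], [0.0]])
S_F = fuse_regions(S_A, S_B, np.array([0]), np.array([5.0]), np.array([3.0]))
I_F = reconstruct(D, S_F, (2, 2), n=2, step=2)
```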

## 4. Experimental results

In order to evaluate the performance of the proposed method, a series of experiments is conducted on eight sets of multifocus images: ‘lab’, ‘clocks’, ‘disk’, ‘leaf’, ‘newspaper’, ‘pepsi’, ‘aircraft’ and ‘bottle’, as shown in Fig. 4. Each set contains 2 source images with different depths of focus. For example, in Fig. 4(a), since the focus is on the clock, the clock is clear while the student is blurred; in Fig. 4(b), the situations of the clock and the student are switched. The other sets of images in Fig. 4 have similar situations. Our target is to fuse the images with different depths of focus into a clearer image with an extended depth of field.

For the purpose of comparison, apart from the proposed method, several recently proposed multifocus image fusion methods are also performed on the same sets of source images. These methods are listed as follows:

- Multifocus image fusion based on sparse representation [18]: This is the traditional sparse representation based multifocus image fusion discussed in section 2.2.
- Multifocus image fusion based on region segmentation and spatial frequency [15]: This is a typical region based multifocus image fusion method. Normalized cuts is used to segment the intermediate fused image that is obtained by applying the simple average method to the source images. The fused image is then composited according to the spatial frequencies of the regions [15].
- Multifocus image fusion based on homogeneity similarity [2]: In this method, an initial fused image, produced by a multi-resolution image fusion method, is improved by using homogeneity similarity. The fused image is obtained by weighting the neighborhood pixels of each point of the source images [2].
- Multifocus image fusion based on sum-modified-Laplacian [17]: This is a typical MSD multifocus image fusion method. The sharp frequency localized contourlet transform (SFLCT), one kind of multiscale transform, is used, and the fused image is obtained by using the sum-modified-Laplacian (SML) to distinguish the SFLCT coefficients of the clear parts from those of the blurry parts [17].
- Multifocus image fusion based on a bilateral gradient-based sharpness criterion [10]: This is a region selection method in which the bilateral gradient-based sharpness criterion is used to choose the clearer pixels (or blocks/regions) of the source images [10].
- Multifocus image fusion based on the blocking-to-masking ratio (BMR) [11]: In this method, the clearness of image pixels (or blocks/regions) is evaluated by the blocking-to-masking ratio, and the clearer ones are selected to construct the fused image [11].

For the compared methods, the parameters are set the same as the ones used in the corresponding references [2, 10, 11, 15, 17, 18]. The parameters of the proposed method are set as follows: a block size of 8 × 8, the global error *ε* = 0.01 and the number of regions *nbSegments* = 15. In order to obtain the best fused image, different values of the weight *α* used to construct the clarity enhanced image were tried between 0 and 1 until the maximum *Q*^{AB/F} was obtained. Here *Q*^{AB/F} is an index that shows the amount of edge information transferred from the input images to the fused images [27]; we provide more explanation of it later. The image segmentation tool provided by the original authors of normalized cuts [12] only needs the user to set one parameter, the segmentation number *nbSegments*. We set *nbSegments* = 15 by experience; in our experiments, this setting gives good segmentation results. To choose the best *nbSegments* or the best segmentation results, one possible approach is to test different values of *nbSegments* and find the best one according to a fusion quality index like *Q*^{AB/F}. Of course, this will increase the computational burden.

The resulting fused images obtained by the compared methods and the proposed method are illustrated in Figs. 5–12. According to these figures, it is clear that methods 1, 3, 4 and the proposed method generally provide better visual effects than methods 2, 5 and 6 in terms of the clarity of content and edges. For instance, incorrect placement appears at the position of the label in the fused images in Fig. 10(b), Fig. 10(f), and Fig. 10(g). Similar results are obtained on the other seven examples. From this analysis, we can conclude that our method and the compared methods 1, 3, 4 generally outperform the compared methods 2, 5 and 6 in the subjective (or qualitative) evaluation. The segmentation results with clarity enhanced images and the traditional segmentation results over average images are also illustrated in Figs. 5–12. The results based on the clarity enhanced images exhibit more correlation with the focal information pertaining to the source images.

Objective (or quantitative) evaluation is also included in our experiments. For a clearer comparison, two image fusion evaluation criteria, discussed in the following section, are used to provide objective performance comparison in this paper.

#### 4.1. Q^{AB/F}

The first criterion is the *Q*^{AB/F} proposed by C. Xydeas and V. Petrovic in [27]. This criterion considers the amount of edge information transferred from the input images to the fused images, and the final index is constructed from the gradient information of the images. In the *Q*^{AB/F} method, a Sobel edge detector is used to calculate the edge strength and orientation information at each pixel in both the source and fused images [15]. Then the relative edge strength and orientation information of the source images with respect to the fused image are calculated and accumulated to measure the image activity kept from the source images to the fused images. Further details can be found in reference [27]. In this paper, *Q*^{AB/F} is used to show the total information transferred during the multifocus image fusion. A larger *Q*^{AB/F} value indicates that more edge information from the source images is transferred to the fused image, while a smaller value indicates that less edge information is transferred and more edge information is lost.
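The exact *Q*^{AB/F} uses sigmoid-shaped perceptual weightings of the edge strength and orientation preservation [27]; as a simplified stand-in that only illustrates the Sobel strength/orientation idea (not Xydeas and Petrovic's exact formulation), one might write:

```python
import numpy as np

def sobel_edges(I):
    """Per-pixel edge strength and orientation via 3x3 Sobel kernels
    (zero-padded borders)."""
    Kx = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]])
    H, W = I.shape
    P = np.pad(I.astype(float), 1)
    gx = sum(Kx[r, c] * P[r:r + H, c:c + W] for r in range(3) for c in range(3))
    gy = sum(Kx.T[r, c] * P[r:r + H, c:c + W] for r in range(3) for c in range(3))
    return np.hypot(gx, gy), np.arctan2(gy, gx + 1e-12)

def q_abf_like(A, B, F):
    """Edge preservation from sources A, B into the fused image F, weighted by
    source edge strength; 1 means every source edge is perfectly kept."""
    gf, of = sobel_edges(F)
    def preserved(src):
        gs, os_ = sobel_edges(src)
        # ratio of the weaker to the stronger edge strength ...
        strength = np.minimum(gs, gf) / np.maximum(np.maximum(gs, gf), 1e-12)
        # ... discounted by orientation disagreement
        orient = np.clip(1.0 - np.abs(os_ - of) / np.pi, 0.0, 1.0)
        return strength * orient, gs
    qa, wa = preserved(A)
    qb, wb = preserved(B)
    return float((qa * wa + qb * wb).sum() / np.maximum((wa + wb).sum(), 1e-12))
```

A fused image identical to the sources scores 1, while a constant (edge-free) image scores 0, matching the intuition that larger values mean more edge information preserved.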

^{AB/F}#### 4.2. The average correlation coefficient between blocks of ground truth and blocks of fused image

The second criterion used to quantitatively compare different fusion methods is derived from the correlations between the fused image and the ground truth regions (actual in-focus regions) in the source images. Specifically, the criterion is defined as the average correlation coefficient between the matrix of a ground truth block and the matrix of the corresponding block of the fused image. For the source images A and B of size *m* × *n*, we pull one clearly focused image block from each of them by manual inspection, each block being of size *cm* × *cn* (*cm* < *m*, *cn* < *n*). Then blocks of the same size and same location are pulled from the fused image F. The correlation coefficient between a ground truth block and the corresponding block of the fused image is given by:

$$\mathit{corr}\left(A,F\right)=\frac{\sum_{i}\sum_{j}\left(A_{ij}-\bar{A}\right)\left(F_{ij}-\bar{F}\right)}{\sqrt{\left(\sum_{i}\sum_{j}{\left(A_{ij}-\bar{A}\right)}^{2}\right)\left(\sum_{i}\sum_{j}{\left(F_{ij}-\bar{F}\right)}^{2}\right)}}\tag{13}$$

where *Ā*, *B̄* and *F̄* are the means of the corresponding matrix elements, and *corr*(*B*, *F*) is defined analogously. The average correlation coefficient between the blocks of ground truth and the fused image is calculated as

$$\overline{\mathit{corr}}=\frac{\mathit{corr}\left(A,F\right)+\mathit{corr}\left(B,F\right)}{2}\tag{14}$$

Since a good multifocus fusion method should keep as much information as possible from the in-focus regions of the source images in the fused image, a larger average correlation coefficient indicates a better fusion method.
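A minimal sketch of the block correlation for one ground-truth block (a hypothetical helper; the averaging over the blocks taken from A and B follows the text above):

```python
import numpy as np

def block_corr(G, F_block):
    """Pearson correlation between a ground-truth block G and the co-located
    block F_block of the fused image."""
    g = G.astype(float).ravel()
    f = F_block.astype(float).ravel()
    g -= g.mean()   # center both blocks
    f -= f.mean()
    return float((g * f).sum() / np.sqrt((g ** 2).sum() * (f ** 2).sum()))

# The criterion then averages block_corr over the in-focus block from A and
# the in-focus block from B, each against the same-position block of F.
```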

For the image sets used in this paper, the ground truth blocks of the different pairs of source images are shown in Fig. 4. For all image sets tested in this paper, the *Q*^{AB/F} and the average correlation coefficient between blocks of ground truth and blocks of the fused image are listed in Tables 1 and 2, where the values in bold indicate the highest quality measures obtained by the different fusion methods. It can be seen in Tables 1 and 2 that, according to the two quantitative performance criteria, the proposed method is better than the other methods on all testing image sets. Comprehensively considering all the results we have obtained, the conclusion can be safely drawn that the quantitative evaluation results coincide with the visual effects very well and that our method provides the best performance in the experiments.

Table 3 lists the *α* values finally selected for the different image sets in the proposed method according to the *Q*^{AB/F}. Because all the selected *α* values are larger than zero, we know that embedding the clarity information into the image to be partitioned brings better image fusion results. One additional interesting finding is that for 6 of the 8 testing image sets, *α* = 1 is chosen, i.e., the relative clarity measure is directly taken as the clarity enhanced image being partitioned. This can be explained by Fig. 3: comparing the source images and the relative clarity measure (the clarity enhanced image in this case), we see that the relative clarity measure already contains enough information, such as the texture and edge information, from the source images, so the supplementary combination with the source images is not necessary. But as Fig. 13 illustrates, sometimes the relative clarity measure alone is still not enough (in that case, the important information in the white box is missing), and we need to add information from the source images to get a better candidate that can produce good partitions for image fusion. In any case, considering the computational cost of selecting different *α* values, one simplified version of the proposed method directly uses the relative clarity measure as the image being partitioned to obtain the regional information for regional sparse coefficient selection.

The proposed method shows better performance than several existing methods in our experiments, but it also has some limitations. One prominent limitation is its intensive computation. Both the normalized cuts algorithm for image segmentation and the orthogonal matching pursuit algorithm for sparse representation require heavy computation. Although this problem is alleviated by the development of hardware, the proposed method is not well suited to computing constrained environments. Another pitfall of the proposed method is possible inaccurate segmentation results. Although the well-known normalized cuts algorithm is applied to partition the clarity enhanced image, due to the difficulty of the image segmentation task itself, we cannot avoid the possibility that some images are not easy to segment well and that some inaccurate segmentation results are obtained; in the end, less than satisfactory fused images may be produced. For this image segmentation related limitation, we suggest comparing the traditional sparse representation based multifocus image fusion and the proposed method according to a fusion quality index like the *Q*^{AB/F}. If the proposed method produces an image with a better index, we accept it. Otherwise, we take the result of the traditional sparse representation method, because it is indeed an extreme case of the proposed method that segments the images into regions containing just one pixel.

## 5. Conclusion

In this paper, a new multifocus image fusion approach is proposed to generate a clearer image with an extended depth of field. The new method combines region based image fusion with sparse representation and takes the merits of both. First, the clarity enhanced image is constructed by using the sparse representation coefficients of the source images and the average image. Then, according to the segmentation result of the clarity enhanced image partitioned by normalized cuts, the fused sparse coefficient matrix is constructed. Finally, after the inverse transformation of the fused sparse coefficient matrix, the fused image is obtained. Several pairs of multifocus images are used to test the performance of the proposed method, and six different existing methods are applied to the same multifocus images for comparison. The experimental results demonstrate that the proposed method outperforms the six existing multifocus image fusion methods in terms of visual quality and objective evaluation. The proposed multifocus image fusion method is a good choice for extending the depth of field and obtaining an image with more valuable in-focus information from a number of images of the same scene.

The results of this paper point to several interesting directions for future work. One direction is to obtain better partition results for regional fusion, either by applying other clarity measurements in clarity concerned image segmentation or by directly embedding the clarity measurements into the segmentation procedure of the normalized cuts algorithm. As an extension of traditional sparse representation, structured sparse representation has recently attracted considerable attention [28, 29]. Therefore, another interesting direction of future work is to combine clarity enhanced image segmentation with structured sparse representation for multifocus image fusion and to check whether structured sparse representation can bring better image fusion results.

## Acknowledgments

This work is supported in part by the National Fundamental Research 973 Program of China under grant 2011CB302801, the Macau Science and Technology Development Fund under grant 008/2010/A1, and the University of Macau Start-up Research Grant and Multi-Year Research Grants.

## References and links

**1. **H. Li, B. Manjunath, and S. K. Mitra, “Multisensor image fusion using the wavelet transform,” Graph. Model. Im. Proc. **57**(3), 235–245 (1995). [CrossRef]

**2. **H. Li, Y. Chai, H. Yin, and G. Liu, “Multifocus image fusion and denoising scheme based on homogeneity similarity,” Opt. Commun. **285**(2), 91–100 (2012). [CrossRef]

**3. **Y. Song, M. Li, Q. Li, and L. Sun, “A new wavelet based multi-focus image fusion scheme and its application on optical microscopy,” in *Proceedings of IEEE Conference on Robotics and Biomimetics* (Institute of Electrical and Electronics Engineers, Kunming, China, 2006), pp. 401–405.

**4. **Y. Chen, L. Wang, Z. Sun, Y. Jiang, and G. Zhai, “Fusion of color microscopic images based on bidimensional empirical mode decomposition,” Opt. Express **18**(21), 21757–21769 (2010). [CrossRef]

**5. **Q. Guihong, Z. Dali, and Y. Pingfan, “Medical image fusion by wavelet transform modulus maxima,” Opt. Express **9**(4), 184–190 (2001). [CrossRef]

**6. **T. Stathaki, *Image Fusion: Algorithms and Applications* (Academic Press, 2008).

**7. **X. Bai, F. Zhou, and B. Xue, “Fusion of infrared and visual images through region extraction by using multi-scale center-surround top-hat transform,” Opt. Express **19**(9), 8444–8457 (2011). [CrossRef]

**8. **H. Hariharan, “Extending Depth of Field via Multifocus Fusion,” PhD Thesis, The University of Tennessee, Knoxville, 2011.

**9. **H. B. Mitchell, *Image Fusion: Theories, Techniques and Applications* (Springer, 2010). [CrossRef]

**10. **J. Tian, L. Chen, L. Ma, and W. Yu, “Multi-focus image fusion using a bilateral gradient-based sharpness criterion,” Opt. Commun. **284**(1), 80–87 (2011). [CrossRef]

**11. **Y. Zhang and L. Ge, “Efficient fusion scheme for multi-focus images by using blurring measure,” Digital Sig. Process. **19**(2), 186–193 (2009). [CrossRef]

**12. **J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell. **22**(8), 888–905 (2000). [CrossRef]

**13. **A. Bleau and L. J. Leon, “Watershed-based segmentation and region merging,” Comput. Vis. Image Und. **77**(3), 317–370 (2000). [CrossRef]

**14. **N. R. Pal and S. K. Pal, “A review on image segmentation techniques,” Pattern Recogn. **26**(9), 1277–1294 (1993). [CrossRef]

**15. **S. Li and B. Yang, “Multifocus image fusion using region segmentation and spatial frequency,” Image Vis. Comput. **26**(7), 971–979 (2008). [CrossRef]

**16. **L. Guo, M. Dai, and M. Zhu, “Multifocus color image fusion based on quaternion curvelet transform,” Opt. Express **20**(17), 18846–18860 (2012). [CrossRef]

**17. **X. Qu, J. Yan, and G. Yang, “Multifocus image fusion method of sharp frequency localized contourlet transform domain based on sum-modified-laplacian,” Opt. Precis. Eng. **17**(5), 1203–1212 (2009).

**18. **B. Yang and S. Li, “Multifocus image fusion and restoration with sparse representation,” IEEE Trans. Instrum. Meas. **59**(4), 884–892 (2010). [CrossRef]

**19. **Z. Wang, Y. Ma, and J. Gu, “Multi-focus image fusion using PCNN,” Pattern Recogn. **43**(6), 2003–2016 (2010). [CrossRef]

**20. **S. Li, J. T. Kwok, and Y. Wang, “Combination of images with diverse focuses using the spatial frequency,” Inf. Fusion **2**(3), 169–176 (2001). [CrossRef]

**21. **K. Huang and S. Aviyente, “Sparse representation for signal classification,” Adv. Neural Inf. Process. Syst. **19**, 609–616 (2007).

**22. **D. L. Donoho, “Compressed sensing,” IEEE Trans. Inform. Theory. **52**(4), 1289–1306 (2006). [CrossRef]

**23. **B. A. Olshausen and D. J. Field, “Emergence of simple-cell receptive field properties by learning a sparse code for natural images,” Nature (London) **381**, 607–609 (1996). [CrossRef]

**24. **R. Rubinstein, A. M. Bruckstein, and M. Elad, “Dictionaries for sparse representation modeling,” Proc. IEEE **98**(6), 1045–1057 (2010). [CrossRef]

**25. **G. Davis, S. Mallat, and M. Avellaneda, “Adaptive greedy approximations,” Constr. Approx. **13**(1), 57–98 (1997).

**26. **M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Sig. Proces. **54**(11), 4311–4322 (2006). [CrossRef]

**27. **C. Xydeas and V. Petrovic, “Objective image fusion performance measure,” Electron. Lett. **36**(4), 308–309 (2000). [CrossRef]

**28. **J. Huang, T. Zhang, and D. Metaxas, “Learning with structured sparsity,” in *Proceedings of the 26th Annual International Conference on Machine Learning* (2009), pp. 417–424.

**29. **J. Huang, X. Huang, and D. Metaxas, “Learning with dynamic group sparsity,” in *Proceedings of the 12th International Conference on Computer Vision* (2009), pp. 64–71.