
Scale-invariant pattern recognition using a combined Mellin radial harmonic function and the bidimensional empirical mode decomposition

Open Access

Abstract

A novel scale- and shift-invariant pattern recognition method is proposed that improves discrimination capability and noise robustness by combining the bidimensional empirical mode decomposition with the Mellin radial harmonic decomposition. The flatness of the correlation peak intensity response versus scale change is improved. This property is important, since a large range of scaled patterns (scale factors from 0.2 to 1) can be detected using a single global threshold. Within this range, the correlation peak intensity is relatively uniform, varying by less than 20%. The proposed filter has been tested in numerical simulations for inputs both with and without additive white noise.

©2009 Optical Society of America

1. Introduction

A major goal of pattern recognition is to find methods that provide distortion-invariant recognition, i.e., a detection process with low sensitivity to deformations of the input objects. Three deformations are typically considered: position, rotation, and scale changes of the input. Scale-invariant pattern recognition is the more difficult problem and requires special attention, owing to the difference in energy between different scales and the absence of any inherent periodicity.

In 1988, Mendlovic et al. introduced the Mellin radial harmonic function into filter design (the RHF filter) for scale-invariant pattern recognition [1]. However, as the scale of the input pattern varies, this method produces variations in the correlation peak intensity (CPI), which severely limits the allowable scale range of the input pattern. To improve the discrimination capability, several methods have been proposed, mainly the phase-only radial harmonic filter (PORHF) and the Mellin radial harmonic wavelet filter (MRHW) [2, 3]. Phase-only filters, however, provide poor discrimination in a noisy environment. The wavelet transform acts as a bandpass filter, partially removing the lower-frequency part while also reducing high-frequency noise; its main disadvantage is that the wavelet basis function must be designated in advance, and different basis functions lead to different results.

In the present paper, scale-invariant pattern recognition is achieved by incorporating the bidimensional empirical mode decomposition (BEMD) into the RHF filter design; the result is termed the BEMD_RHF filter. The BEMD_RHF is obtained by using the high-frequency component (the first BIMF) of the reference image as the reference pattern of the RHF filter and, for denoising, using the edge of the residual of the input pattern with the first three BIMFs removed as the test pattern. The proposed filter allows a wider range of scale change of the object.

2. Theory

Shift- and scale-invariant pattern recognition can be achieved by using the radial-harmonic expansion, in which the reference image is decomposed into a series of radial harmonic components [1]. Shift- and scale-invariant correlation is achieved if a single harmonic component from the expansion is used as a matched filter. A 2-D pattern $f(r,\theta)$ can be decomposed into the set of radial harmonic functions $\{r^{i2\pi m/L-1}\}$ in polar coordinates:

$$f(r,\theta)=\sum_{m=-\infty}^{\infty} f_m(\theta)\,r^{i2\pi m/L-1}\tag{1}$$

where the coefficient of the $m$'th order is

$$f_m(\theta;x_0,y_0)=L^{-1}\int_{r_0}^{R_0} f(r,\theta;x_0,y_0)\,r^{-i2\pi m/L-1}\,r\,\mathrm{d}r\tag{2}$$

and $(x_0,y_0)$ is the expansion center. The finite radius $R_0$ covering the pattern and the smallest radius $r_0$ define the range of the expansion, where a proper choice of $L=\ln R_0-\ln r_0$ is required. The $M$'th-order radial harmonic is chosen as the RHF filter function:

$$f_M(r,\theta;x_0,y_0)=f_M(\theta;x_0,y_0)\,r^{i2\pi M/L-1}\tag{3}$$
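For intuition, the coefficient integral in Eq. (2) can be approximated numerically. The following is a minimal Python sketch (our illustration, not the authors' code), assuming a fixed angle $\theta$ so the pattern reduces to a radial profile $f(r)$; the helper name and the trapezoidal quadrature are our choices.

```python
import numpy as np

def radial_harmonic_coeff(f, m, r0, R0, n=4096):
    """Approximate the m-th Mellin radial harmonic coefficient of Eq. (2),
    f_m = L^{-1} * integral_{r0}^{R0} f(r) r^{-i 2 pi m / L - 1} r dr,
    for a radial profile f(r) at a fixed angle (trapezoidal rule)."""
    L = np.log(R0) - np.log(r0)
    r = np.linspace(r0, R0, n)
    integrand = f(r) * r ** (-1j * 2 * np.pi * m / L - 1) * r
    # trapezoidal quadrature, written out to avoid version-specific numpy names
    val = np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(r))
    return val / L
```

By the orthogonality of the harmonics, feeding a pure $k$'th harmonic $r^{i2\pi k/L-1}$ into this routine returns approximately 1 at $m=k$ and 0 elsewhere.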

Thus, the CPI for the RHF filter and an input pattern $g(r,\theta)$ can be written as

$$\begin{aligned}
\left|C_{gf}(x_1,y_1;x_0,y_0)\right|^2
&=\left|\int_0^{2\pi}\!\!\int_{r_0}^{R_0} g(r,\theta;x_1,y_1)\,f_M^{*}(r,\theta;x_0,y_0)\,r\,\mathrm{d}r\,\mathrm{d}\theta\right|^2\\
&=\left|\int_0^{2\pi}\!\!\int_{r_0}^{R_0}\Big[\sum_{m=-\infty}^{\infty} g_m(\theta;x_1,y_1)\,r^{i2\pi m/L-1}\Big]\Big[f_M^{*}(\theta;x_0,y_0)\,r^{-i2\pi M/L-1}\Big]\,r\,\mathrm{d}r\,\mathrm{d}\theta\right|^2\\
&=\left|\int_0^{2\pi}\!\!\int_{r_0}^{R_0}\sum_{m=-\infty}^{\infty}\Big[g_m(\theta;x_1,y_1)\,f_M^{*}(\theta;x_0,y_0)\,r^{i2\pi(m-M)/L-1}\Big]\,\mathrm{d}r\,\mathrm{d}\theta\right|^2\\
&=\begin{cases}0, & m\neq M\\[2pt]
\left|\int_0^{2\pi}\!\!\int_{r_0}^{R_0} g_M(\theta;x_1,y_1)\,f_M^{*}(\theta;x_0,y_0)\,r^{-1}\,\mathrm{d}r\,\mathrm{d}\theta\right|^2, & m=M\end{cases}\\
&=L^2\left|\int_0^{2\pi} g_M(\theta;x_1,y_1)\,f_M^{*}(\theta;x_0,y_0)\,\mathrm{d}\theta\right|^2
\end{aligned}\tag{4}$$

For a scaled input pattern $g_\beta(r,\theta)=g(r/\beta,\theta)$, the CPI for this scaled pattern and the RHF filter becomes

$$\begin{aligned}
\left|C_{g_\beta f}(x_1,y_1;x_0,y_0)\right|^2
&=\left|\int_0^{2\pi}\!\!\int_{r_0}^{R_0} g(r/\beta,\theta;x_1,y_1)\,f_M^{*}(r,\theta;x_0,y_0)\,r\,\mathrm{d}r\,\mathrm{d}\theta\right|^2\\
&=\left|\int_0^{2\pi}\!\!\int_{r_0}^{R_0}\Big[\sum_{m=-\infty}^{\infty} g_m(\theta;x_1,y_1)\,r^{i2\pi m/L-1}\,\beta^{1-i2\pi m/L}\Big]\Big[f_M^{*}(\theta;x_0,y_0)\,r^{-i2\pi M/L-1}\Big]\,r\,\mathrm{d}r\,\mathrm{d}\theta\right|^2\\
&=\beta^2 L^2\left|\int_0^{2\pi} g_M(\theta;x_1,y_1)\,f_M^{*}(\theta;x_0,y_0)\,\mathrm{d}\theta\right|^2\\
&=\beta^2\left|C_{gf}(x_1,y_1;x_0,y_0)\right|^2
\end{aligned}\tag{5}$$

The CPI of the RHF filter is thus proportional to the square of the scale factor; it is not fully scale invariant. This effect can severely limit the allowable range of the scale of the input pattern, particularly when the sidelobes of the CPI are large or the noise level is high.
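The $\beta^2$ law of Eq. (5) is easy to verify numerically. Below is a small Python sketch (ours, with illustrative names, not part of the original work): the input is a single $M$'th order harmonic and the filter is the matched harmonic, so the integrals can be evaluated directly.

```python
import numpy as np

def cpi_ratio(beta, M=1, r0=0.1, R0=1.0, n=8000):
    """|C_{g_beta f}|^2 / |C_{g f}|^2 for a single M-th order radial harmonic
    input g(r) = r^{i 2 pi M / L - 1}; Eq. (5) predicts this ratio is beta^2."""
    L = np.log(R0 / r0)
    a = 1j * 2 * np.pi * M / L
    r = np.linspace(r0, R0, n)
    h = r ** (a - 1)                       # matched RHF filter (unit angular part)
    def corr2(b):
        integrand = (r / b) ** (a - 1) * np.conj(h) * r
        val = np.sum((integrand[1:] + integrand[:-1]) / 2 * np.diff(r))
        return abs(val) ** 2
    return corr2(beta) / corr2(1.0)
```

Because scaling a pure harmonic only multiplies it by $\beta^{1-i2\pi M/L}$, the ratio matches $\beta^2$ essentially to machine precision.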

BEMD is the 2-D extension of the empirical mode decomposition [4]. It decomposes a nonstationary and nonlinear signal into a finite number of high- and low-frequency oscillations of zero mean, called intrinsic mode functions (IMFs). The decomposition is carried out through a fully data-driven sifting process, so no basis functions need to be fixed in advance. Let the original image be denoted $I$, a bidimensional IMF a BIMF, and the residue $R$. In the decomposition, the $i$'th BIMF $F_i$ is obtained from its source image $S_i$, where $S_i$ is the residue image $S_i=I-\sum_{j=1}^{i-1}F_j$ and $S_1=I$.

The steps of the BEMD process can be summarized and simplified as follows:

1) Detect the extrema (both maxima and minima) of the image $S_i$ (for $i=1$, $S_1=I$);

2) Compute the 2-D upper and lower envelopes;

3) Determine the mean $m$ of the upper and lower envelopes;

4) Subtract the mean from the image: $h=S_i-m$, and let $S_i=h$;

5) Repeat steps 2)–4) until $h$ is a BIMF, $F_i$;

6) Let $i=i+1$ and $S_i=I-\sum_{j=1}^{i-1}F_j$; if the stopping criteria are not satisfied, go to step 1); otherwise, finish the process.

This process yields several BIMFs and a residue $R$. Superposition of the components reconstructs the data:

$$I=\sum_{i=1}^{n} F_i+R\tag{6}$$

The empirical mode decomposition (EMD), and BEMD, is a sifting process that decomposes a signal into IMFs or BIMFs and a residue based essentially on local frequency or oscillation information. The first IMF/BIMF contains the highest local frequencies of oscillation (the finest local spatial scales), the final IMF/BIMF contains the lowest, and the residue $R$ contains only the trend of the signal. More details about EMD and BEMD can be found in Refs. [4, 5].
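As a rough illustration of steps 1)–4), the sketch below performs one sifting iteration on a 1-D signal. This is our simplification, not the authors' implementation: real BEMD detects 2-D extrema and interpolates envelope surfaces, and cubic splines are standard rather than the linear envelopes used here for brevity.

```python
import numpy as np

def sift_once(t, x):
    """One simplified 1-D sifting step: find interior extrema, build the
    upper/lower envelopes (linear interpolation), subtract the envelope mean."""
    i = np.arange(1, len(x) - 1)
    maxima = i[(x[i] > x[i - 1]) & (x[i] > x[i + 1])]
    minima = i[(x[i] < x[i - 1]) & (x[i] < x[i + 1])]
    # anchor envelopes at the endpoints (a common boundary fix-up)
    mx = np.unique(np.r_[0, maxima, len(x) - 1])
    mn = np.unique(np.r_[0, minima, len(x) - 1])
    upper = np.interp(t, t[mx], x[mx])
    lower = np.interp(t, t[mn], x[mn])
    return x - (upper + lower) / 2
```

Applied to a fast oscillation riding on a slow trend, one pass already extracts a near-zero-mean oscillatory component; repeating until the result satisfies the IMF conditions gives $F_i$.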

The theory of scale invariant pattern recognition using RHF functions is taken as the guideline for our filter design; the difference is that we use the BEMD-filtered object pattern as the original reference pattern for harmonic expansion.

To decrease the effect of the factor $\beta^2$ in Eq. (5), a contour object can be used: the useless low-frequency components are removed by digital preprocessing that extracts the edges of the original block images [2]. To utilize the contour of the object, the first BIMF $F_1(f)$ of the reference image $f(x,y)$ acts as the reference pattern and is expanded as the combination of radial harmonics in Eq. (1). One particular order, $\mathrm{radial}\{F_1(f)\}$, as in Eq. (3), is chosen as the filter function. This filter and the input pattern $F_1(g)$ of the input image $g(x,y)$ are then used to perform the correlation operation:

$$C_{gf}(x_1,y_1;x_0,y_0)=\mathrm{radial}\{F_1(f)\}\otimes F_1(g)\tag{7}$$

The symbol ⊗ denotes the correlation operation.

Since the reference image can generally be obtained without noise, while real input images often contain random noise, the proposed method must also be tested on noisy patterns. It has been observed that the first BIMF $F_1(g)$ contains most of the noise in a signal [4]; hence removing it suppresses the highest spatial frequencies. Since BEMD is local in nature, image blurring is reduced; and because the filtering occurs in the spatial domain rather than in the frequency domain, any nonlinearity and nonstationarity present in the data are preserved. Although $F_1(g)$ carries most of the noise, the first few BIMFs usually still contain much of the remaining noise; removing them and reconstructing the image from the remaining BIMFs therefore denoises the image. The number of BIMFs to remove depends on the noise level. In this paper, the first three BIMFs of the input image $g(x,y)$ are discarded:

$$\mathrm{Resid}(g_4)=g-F_1(g)-F_2(g)-F_3(g)\tag{8}$$

The edge of this residual of the input image $g(x,y)$, $\mathrm{Edge}(\mathrm{Resid}(g_4))$, is used to perform the correlation operation; the Laplacian of Gaussian method is used to find the edge of $\mathrm{Resid}(g_4)$:

$$C_{gf}(x_1,y_1;x_0,y_0)=\mathrm{radial}\{F_1(f)\}\otimes \mathrm{Edge}(\mathrm{Resid}(g_4))\tag{9}$$
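The preprocessing of Eqs. (8)–(9) can be sketched in Python as follows (our illustration: `bimfs` stands in for the output of whatever BEMD routine is used, e.g. the toolbox of [6], and SciPy's `gaussian_laplace` plays the role of the Laplacian-of-Gaussian edge detector).

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def denoise_and_edge(img, bimfs, k=3, sigma=2.0):
    """Eq. (8)-(9) preprocessing: drop the first k BIMFs (which carry most of
    the noise), then take a Laplacian-of-Gaussian edge map of the residual."""
    resid = np.asarray(img, dtype=float) - sum(bimfs[:k])
    return gaussian_laplace(resid, sigma=sigma)
```

The returned edge map, $\mathrm{Edge}(\mathrm{Resid}(g_4))$ for $k=3$, is what gets correlated with $\mathrm{radial}\{F_1(f)\}$; edges appear as zero-crossings flanked by strong responses of opposite sign.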

3. Simulation and results

3.1. Experimental setup

The performance of the proposed method, BEMD_RHF, is evaluated via computer simulation. Each of the patterns under consideration is expressed as a 256×256 matrix. The Matlab code for BEMD is that developed by Damerval et al.; details can be found in [5, 6]. The BEMD_RHF programs were developed in Matlab and executed on a 2.4 GHz Intel Pentium system with 512 MB of RAM.

The F16 fighter, shown in Fig. 1(a), is used as the reference image. Six BIMFs and one residue are extracted from it, as shown in Figs. 1(b)–1(h).


Fig. 1. The reference object and the BEMD components. (a) F16 fighter as the reference object. (b) to (g): the corresponding BIMFs of the F16 (from BIMF1 to BIMF6). (h) The residue of the BEMD decomposition.


In the construction of the filter, the radial harmonic expansion is performed on this reference with its geometrical center as the expansion center, and the first-order radial harmonic is taken as the RHF.

3.2. Shift and scale invariance

The first experiment was performed to observe the shift and scale invariance. The input image contains four scaled versions of the F16 fighter with scale factors β = 1, 0.8, 0.5 and 0.2, as shown in Fig. 2(a). Using the traditional RHF filter, the PORHF and the BEMD_RHF of expansion order 1, each obtained from the object of Fig. 1(a), the correlation intensity distributions for the input of Fig. 2(a) are shown in Figs. 2(b), 2(d) and 2(f). To simplify the comparison of the peak intensities, the corresponding cross-sectional intensity distributions through all the correlation peaks are shown in Figs. 2(c), 2(e) and 2(g).

Shift- and scale-invariance are achieved for the RHF, the PORHF and the BEMD_RHF. As seen from Fig. 2, the normalized CPIs for the RHF and PORHF approximately follow Eq. (5) for β equal to 1, 0.8, 0.5 and 0.2. At the same time, the BEMD_RHF presents better flatness of the peak intensity versus scale change, indicating that the problem of quasi-invariance is solved.

The second experiment was performed to observe the capability to distinguish the scaled reference pattern from a false target. The input image contains four scaled versions of the F16 fighter with scale factors β = 1, 0.8, 0.5 and 0.2, and one Mig25 with scale factor β = 1, as shown in Fig. 3(a). The four scaled F16s occupy the top left, bottom left, top right and bottom right corners of the input image, and the Mig25 is at the center.

The correlation intensity distribution and the corresponding cross-sectional intensity distribution through the correlation peaks are shown in Figs. 3(b) and 3(c). The normalized CPIs (0.83 : 0.87 : 1 : 0.82) for the four F16s are larger than that (0.27) for the Mig25. A nonlinear effect on the CPI appears here, such that the CPI does not depend on the size of the input object. As seen from Fig. 3(c), the spread among the CPIs for the F16 is reduced to less than 20%. The two categories are therefore easy to separate with a threshold, which can be set anywhere in the wide gap between 0.3 and 0.8. This demonstrates the strong discrimination capability and scale-invariance of the proposed BEMD_RHF.


Fig. 2. Shift and scale invariance test. (a) Four scaled versions of the F16 fighter with scale factors β=1,0.8,0.5,0.2. (b) to (g): The correlation intensity distribution and the corresponding cross-sectional intensity distribution when using (b) and (c) the RHF, (d) and (e) the PORHF, (f) and (g) the BEMD_RHF.



Fig. 3. Distinguishing the scaled versions of the reference pattern from the false target. (a) The input image with 4 scaled F16 and 1 Mig25. (b) The correlation intensity distribution. (c) The corresponding cross-sectional intensity distribution.


To analyze how the CPI depends on the scale factor β of the F16 for the MRHW (M=1, a=0.5) and the BEMD_RHF, we simulated some typical cases; the results are presented in Fig. 4. Neither CPI curve depends on β in any simple way, but the recognition range of the BEMD_RHF is wider than that of the MRHW over almost the whole range of scale change, for any threshold value. The CPI curve of the BEMD_RHF is also more uniform than that of the MRHW (M=1, a=0.5). For instance, if the threshold is set at 0.6, the recognition range of the BEMD_RHF extends from 0.15 to 1, wider than that of the MRHW (from 0.25 to 1).


Fig. 4. Comparing the normalized correlation peak intensity of BEMD_RHF with that of the MRHW when the scale factor changes from 0.1 to 1.


3.3. Noise robustness

Since real images often contain random noise, introduced either by transmission through a noisy channel or by the imaging process itself, the proposed method must be tested on noisy images as well. To examine its robustness to additive noise, we performed experiments with additive white Gaussian noise of zero mean and a variance set by the required signal-to-noise ratio (SNR).
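For reproducibility, one common way to generate such noise is sketched below in Python (our illustration; it assumes the SNR in dB is defined against the mean-square signal power, a convention the paper does not spell out).

```python
import numpy as np

def add_awgn(img, snr_db, rng=None):
    """Add zero-mean white Gaussian noise scaled so that
    10*log10(P_signal / P_noise) equals snr_db, with P_signal = mean(img^2)."""
    rng = np.random.default_rng() if rng is None else rng
    p_signal = np.mean(np.asarray(img, dtype=float) ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)
    return img + rng.normal(0.0, np.sqrt(p_noise), np.shape(img))
```

Negative SNR values (as in Fig. 5) then simply mean the noise power exceeds the signal power.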


Fig. 5. Noise robustness test. (a), (d), (g), (j), (m) and (p): the input images with SNR = 25, 15, 5, 0, −2 and −5 dB. (b), (e), (h), (k), (n), (q) and (c), (f), (i), (l), (o), (r): the corresponding correlation intensity distribution planes and the corresponding cross-sectional intensity distributions through the correlation peaks.


We added white Gaussian noise to the input image of Fig. 3(a). Examples of the noisy images at the considered SNR values are given in Fig. 5: Figs. 5(a), 5(d), 5(g), 5(j), 5(m) and 5(p) are the input images with SNR = 25, 15, 5, 0, −2 and −5 dB, respectively, and Figs. 5(b), 5(e), 5(h), 5(k), 5(n), 5(q) and Figs. 5(c), 5(f), 5(i), 5(l), 5(o), 5(r) show the corresponding correlation intensity distribution planes and cross-sectional intensity distributions through the correlation peaks. As shown in Fig. 5, there is no obvious degradation in recognizing the different scaled versions of the F16 when the SNR is above 0 dB, and the additive noise markedly affects only the CPI of the false target. The noise produces some random correlation intensity in the output plane but does not significantly affect the correlation peaks: all CPIs of the four scaled F16s remain above 0.6, while the CPIs of the Mig25 stay below 0.4 and increase gradually as the SNR decreases. When the SNR falls below 0 dB, the additive noise gradually blurs the distinction between the two categories.

4. Conclusions

The traditional Mellin radial harmonic expansion of an object is capable of identifying the object at different scale factors, but its correlation peak intensity depends on the square of the scale factor. This is an important disadvantage and limits the allowable range of scale change. EMD is a multiresolution decomposition method whose basis functions are adapted from the signals themselves; it can act as a bandpass filter in a correlator, simultaneously reducing the low-frequency components of the object and the high-frequency components of the noise.

A novel method, BEMD_RHF, is proposed for scale-invariant pattern recognition by combining the BEMD with the Mellin radial harmonic decomposition. Compared with the RHF and the PORHF, the noise robustness and discrimination of the BEMD_RHF are improved, as is the flatness of its peak intensity response versus scale change. This property is important, since a large range of scaled patterns (0.2 < β < 1) can be detected using a single global threshold. Within this range, the CPI is relatively uniform, varying by less than 20%. The proposed BEMD_RHF filter has been tested to confirm the results of numerical simulation for cases both with and without input white noise.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No.60803087). The authors would like to thank the anonymous reviewers for their constructive comments, which helped us to improve this paper.

References and Links

1. D. Mendlovic, E. Maron, and N. Konforti, "Shift and scale invariant pattern recognition using Mellin radial harmonics," Opt. Commun. 67, 172–176 (1988).

2. A. Moya, J. J. Esteve-Taboada, J. García, and C. Ferreira, "Shift- and scale-invariant recognition of contour objects with logarithmic radial harmonic filters," Appl. Opt. 39, 5347–5351 (2000).

3. Y.-S. Cheng and H.-C. Chen, "Improved performance of scale-invariant pattern recognition using a combined Mellin radial harmonic function and wavelet transform," Opt. Eng. 46, 107204 (2007).

4. J. C. Nunes, Y. Bouaoune, E. Deléchelle, O. Niang, and Ph. Bunel, "Image analysis by bidimensional empirical mode decomposition," Image Vision Comput. 21, 1019–1026 (2003).

5. C. Damerval, S. Meignen, and V. Perrier, "A fast algorithm for bidimensional EMD," IEEE Signal Process. Lett. 12, 701–704 (2005).

6. C. Damerval, "BEMD Toolbox: Bidimensional Empirical Mode Decomposition," http://ljk.imag.fr/membres/Christophe.Damerval/software.html
