## Abstract

Many prior models for natural image wavelet coefficients have been proposed over the last two decades. Although most of them belong to the class of Scale Mixture of Gaussian (GSM) models, their analytical forms differ markedly. As a result, the Bayesian image denoising algorithms based on these prior models also differ greatly from one another. In this paper, we develop a novel image denoising algorithm by combining the Expectation Maximization (EM) scheme with the properties of the GSM models. The developed algorithm has a simple iterative form and converges quickly. It uses only the derivative information of a probability density function and is suitable for any GSM-type prior model that has an analytical probability density function. The developed algorithm can therefore be viewed as a unified Bayesian image denoising framework. As examples, several classical and recently-proposed prior models for natural image wavelet coefficients are tested and some new results are obtained.

©2008 Optical Society of America

## 1. Introduction

Prior models play a key role in wavelet-based Bayesian image denoising [1]. Over the last two decades, a number of models have been developed to characterize the statistical behavior of natural image wavelet coefficients. Widely-used *univariate* statistical models include the Generalized Laplacian model [2, 3], the Alpha-stable model [4, 5], the exponential model [6], the Bessel K Form density model [7, 8, 9] and mixtures of two distributions [10, 11]. These univariate models are well suited to characterizing an empirical property of natural images: the filter responses of natural images are sharply peaked at zero with heavy tails [12]. Recently, in order to describe the joint statistical behavior of natural image wavelet coefficients, some *multivariate* models have also been proposed, such as the multivariate Laplacian model [13, 14, 15, 16], the multivariate exponential model [17], the multivariate Alpha-stable model [18], the multivariate Bessel K Form density model [7], the multivariate scale mixture of Gaussian (GSM) model [19], the multivariate normal inverse Gaussian model [20], and the multivariate Elliptically Contoured Distribution (ECD) model [21].

Based on prior models such as those mentioned above, many Bayesian image denoising algorithms have been developed [2, 4, 5, 9, 10, 11, 13, 14, 16, 17, 18, 19, 20, 21, 22]. Because these algorithms use different prior models, as well as different Bayesian inference techniques (such as the Minimum Mean-Squared Error (MMSE) estimate or the Maximum a Posteriori (MAP) estimate), they take very different forms. Interestingly, although most existing models of natural image wavelet coefficients have obviously different analytical forms, they can be placed in the same class: they essentially belong to the GSM models [23]. Thus, a natural question is: can we develop a unified denoising framework for these models? Such a unified framework, if available, would be very useful not only for the image denoising application itself but also for research on natural image statistics. For example, using the unified framework, one can fairly compare the denoising performance of these models, and thus understand which model best characterizes the statistical behavior of natural image wavelet coefficients, and why. In addition, when a new GSM-type prior model becomes available, one only needs to plug it into the unified denoising framework rather than develop a new estimator based on its specific analytical form.

In this paper, we develop a novel image denoising algorithm by combining the EM scheme with the properties of the Scale Mixture of Gaussian (GSM) models. The developed algorithm provides a unified image denoising framework for the GSM models that have an analytical expression. The algorithm has a simple iterative form and uses only the derivative information of a prior model. When a new prior model becomes available, it can easily be adopted for image denoising with the developed algorithm. As examples, several classical and recently-proposed statistical models of the wavelet coefficients of natural images are tested in the denoising framework and some new results are obtained.

The outline of the rest of the paper is as follows. Section 2 derives the unified image denoising algorithm. Section 3 tests several classical and recently-proposed statistical models as examples. Section 4 reports experimental results and provides a detailed discussion. Concluding remarks are given in Section 5.

## 2. Unified denoising algorithm: derivation

Assume that the noisy image wavelet coefficients have been partitioned into *M* small neighborhoods, each containing *d* coefficients. Let **Y**={*Y*_{1},…,*Y*_{M}} denote the *M* observed *d*-dimensional noisy data, where each *Y*_{i}, 1≤*i*≤*M*, satisfies the model

$$Y_{i}=X_{i}+\varepsilon,$$

where *X*_{i}, 1≤*i*≤*M*, is the corresponding noise-free wavelet coefficient vector and *ε*~*N*(0⃗, *ρ*_{ε}) is the additive Gaussian white noise. Suppose that the noise-free wavelet coefficients *X*_{i}, 1≤*i*≤*M*, are i.i.d. random variables, each having a GSM distribution ${X}_{i}\stackrel{d}{=}\sqrt{{z}_{i}}U$, where *U*~*N*(0⃗, *ρ*) and *z*_{i} is a hidden positive scalar multiplier. Let *Z*={*z*_{1}, …, *z*_{M}}, where the *z*_{i}, 1≤*i*≤*M*, are statistically independent. Thus, we have $P(Z)=\prod_{i=1}^{M}P(z_{i})$.
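As a quick illustration (ours, not from the paper), taking the mixing density of *z* to be exponential makes the GSM variable exactly Laplacian, reproducing the sharply peaked, heavy-tailed marginal mentioned in the introduction. A minimal sampling sketch, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# GSM sample: x = sqrt(z) * u, with u Gaussian and z a positive scalar multiplier.
z = rng.exponential(scale=1.0, size=N)   # exponential mixing density for z
u = rng.normal(0.0, 1.0, size=N)
x = np.sqrt(z) * u                        # marginally Laplacian, a GSM member

# Kurtosis: 3 for a Gaussian, 6 for a Laplacian; the excess reflects the heavy tails.
kurt = np.mean(x**4) / np.mean(x**2)**2
print(kurt)   # close to 6
```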

In [24], J. M. Bioucas-Dias proposed a highly effective image deconvolution algorithm based on the *univariate* GSM models. The algorithm employed a new generalized Expectation Maximization (EM) scheme, in which the missing variables were the scalar multipliers of the GSM models and the maximization step was replaced with a linear stationary second-order iterative method. In this paper, we will use the standard EM scheme to estimate **X**={*X*_{1}, …, *X*_{M}} from the observed data **Y**={*Y*_{1}, …, *Y*_{M}}. Similarly to the technique adopted in [24], we set the missing variables to be the scalar multipliers of the GSM models. As we will show below, combined with several special properties of the GSM models, the EM scheme leads to a very simple iterative image denoising algorithm that is suitable for both *univariate* and *multivariate* GSM models.

The standard EM scheme works iteratively by alternating two steps: the E-step (expectation step) and the M-step (maximization step). In our setting, *Z* is the missing (or hidden) variable and (**Y**,*Z*) is the so-called complete data. The E-step finds the expected value of the log-likelihood log*P*(**X**,**Y**,*Z*) given the observed data **Y** and the current estimate **X**^{t}={*X*^{t}_{1}, …, *X*^{t}_{M}},

$$Q(\mathbf{X},\mathbf{X}^{t})=\int_{Z}\log P(\mathbf{X},\mathbf{Y},Z)\,P(Z|\mathbf{Y},\mathbf{X}^{t})\,dZ.$$

Using Bayes' rule, we have *P*(**X**,**Y**,*Z*)=*P*(**Y**|*Z*,**X**)*P*(*Z*,**X**)=*P*(**Y**|**X**)*P*(**X**|*Z*)*P*(*Z*). As a result, Eq. (2) can be rewritten as

$$Q(\mathbf{X},\mathbf{X}^{t})=\log P(\mathbf{Y}|\mathbf{X})+\int_{Z}\log P(\mathbf{X}|Z)\,P(Z|\mathbf{Y},\mathbf{X}^{t})\,dZ+C(Z,\mathbf{Y},\mathbf{X}^{t}),$$

where *C*(*Z*,**Y**,**X**^{t}) is a function of *Z*, **Y** and **X**^{t} only (it does not depend on **X**). In addition, since

$$P(Z|\mathbf{Y},\mathbf{X}^{t})=\frac{P(\mathbf{Y}|\mathbf{X}^{t},Z)P(\mathbf{X}^{t}|Z)P(Z)}{P(\mathbf{Y}|\mathbf{X}^{t})P(\mathbf{X}^{t})}=\frac{P(\mathbf{X}^{t}|Z)P(Z)}{P(\mathbf{X}^{t})}=P(Z|\mathbf{X}^{t}),$$

we get

$$Q(\mathbf{X},\mathbf{X}^{t})=\log P(\mathbf{Y}|\mathbf{X})+\int_{Z}\log P(\mathbf{X}|Z)\,P(Z|\mathbf{X}^{t})\,dZ+C(Z,\mathbf{Y},\mathbf{X}^{t}).$$

An important property of a GSM model is that it is Gaussian when conditioned on its positive scalar multiplier [19]. Hence, we have *X*_{i}|*z*_{i}~*N*(0⃗, *z*_{i}*ρ*), 1≤*i*≤*M*, and

$$P(\mathbf{X}|Z)=\prod_{i=1}^{M}P(X_{i}|z_{i})=(2\pi)^{-\frac{dM}{2}}\prod_{i=1}^{M}|z_{i}\rho|^{-\frac{1}{2}}\exp\Big[-\frac{1}{2}\sum_{i=1}^{M}X_{i}^{T}(z_{i}\rho)^{-1}X_{i}\Big].$$

It follows that

$$Q(\mathbf{X},\mathbf{X}^{t})=\log P(\mathbf{Y}|\mathbf{X})-\frac{1}{2}\sum_{i=1}^{M}X_{i}^{T}\rho^{-1}X_{i}\int_{z_{i}}z_{i}^{-1}P(z_{i}|X_{i}^{t})\,dz_{i}+C'(Z,\mathbf{Y},\mathbf{X}^{t}).$$

The M-step of the EM scheme maximizes the expectation Eq. (3) computed in the E-step above. Let $\Omega(X_{i}^{t})=\int_{z_{i}}z_{i}^{-1}P(z_{i}|X_{i}^{t})\,dz_{i}$. We have

$$X_{i}^{t+1}=\arg\max_{X_{i}}\Big[-\frac{1}{2}\sum_{i=1}^{M}(Y_{i}-X_{i})^{T}\rho_{\varepsilon}^{-1}(Y_{i}-X_{i})-\frac{1}{2}\sum_{i=1}^{M}X_{i}^{T}\rho^{-1}X_{i}\,\Omega(X_{i}^{t})\Big]$$

$$=\arg\operatorname{zero}\frac{d}{dX_{i}}\Big[\sum_{i=1}^{M}(Y_{i}-X_{i})^{T}\rho_{\varepsilon}^{-1}(Y_{i}-X_{i})+\sum_{i=1}^{M}X_{i}^{T}\rho^{-1}X_{i}\,\Omega(X_{i}^{t})\Big]$$

$$=\arg\operatorname{zero}\big[-2\rho_{\varepsilon}^{-1}(Y_{i}-X_{i})+2\Omega(X_{i}^{t})\rho^{-1}X_{i}\big].$$

Hence, we get

$$\rho_{\varepsilon}^{-1}(Y_{i}-X_{i}^{t+1})=\Omega(X_{i}^{t})\rho^{-1}X_{i}^{t+1},$$

or

$$X_{i}^{t+1}=\big(\Omega(X_{i}^{t})\rho^{-1}+\rho_{\varepsilon}^{-1}\big)^{-1}\rho_{\varepsilon}^{-1}Y_{i}.$$

Another important property of a GSM model is that its probability density function can be expressed as a function of the quadratic form of the random vector [21]. This property means that *P*(*X*^{t}_{i})=*f*(*r*) if we let *r*=(*X*^{t}_{i})^{T}*ρ*^{-1}*X*^{t}_{i}. It follows that $\Omega(X_{i}^{t})=-\frac{2}{f(r)}\frac{df}{dr}$ (see the Appendix). Now we can rewrite Eq. (4) as

$$X_{i}^{t+1}=\Big(-2\frac{d}{dr}\log f(r)\,\rho^{-1}+\rho_{\varepsilon}^{-1}\Big)^{-1}\rho_{\varepsilon}^{-1}Y_{i}.$$

So far, we have obtained an iterative denoising formula, Eq. (5), which uses only the derivative information of a GSM model, namely $\frac{d}{dr}\log f(r)$. During the derivation above, we utilized two important properties of the GSM models. The first is that a GSM random variable is Gaussian when conditioned on the scalar multiplier; this property was used in the E-step. The second is that the probability density function of a GSM random variable can be written as a function of its quadratic form; this is true since the GSM family is in fact a special subfamily of the Elliptically Contoured Distribution Family (ECDF) [21]. This property was used in the M-step. Considering that most existing models of the wavelet coefficients of natural images belong to the GSM models, Eq. (5) hence provides a unified framework for the image denoising application. To verify Eq. (5), we will use it to derive two classical denoising algorithms: one is the Wiener filtering and the other is the Bishrinkage function proposed in [13]. Letting *t*→+∞, Eq. (5) reduces to

$$\hat{X}_{i}=\Big(-2\frac{d}{dr}\log f(\hat{r})\,\rho^{-1}+\rho_{\varepsilon}^{-1}\Big)^{-1}\rho_{\varepsilon}^{-1}Y_{i},$$

where $\hat{X}_{i}$ denotes the limit of $X_{i}^{t}$ and $\hat{r}=\hat{X}_{i}^{T}\rho^{-1}\hat{X}_{i}$.

We first check the Wiener filtering case. Assume *X*_{i} obeys the Gaussian distribution, namely *X*_{i}~*N*(0⃗, *ρ*). Since $P(X_{i})=f(r)=(2\pi)^{-\frac{d}{2}}|\rho|^{-\frac{1}{2}}\exp(-\frac{1}{2}r)$, we have $\frac{d}{dr}\log f(r)=-\frac{1}{2}$. Equation (6) hence becomes

$$\hat{X}_{i}=\big(\rho^{-1}+\rho_{\varepsilon}^{-1}\big)^{-1}\rho_{\varepsilon}^{-1}Y_{i}=\rho\,(\rho+\rho_{\varepsilon})^{-1}Y_{i},$$

which is just the classical Wiener filtering for Gaussian data contaminated by Gaussian white noise.
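This limit is easy to check numerically. The scalar sketch below (our own illustration; the variable names are ours) iterates Eq. (5) with the Gaussian value d/dr log f(r) = −1/2 and compares the result against the Wiener gain:

```python
# Scalar (d = 1) check: with the Gaussian prior, -2 * d/dr log f(r) = 1, so the
# iterate x = (rho^{-1} + rho_eps^{-1})^{-1} rho_eps^{-1} y is the Wiener rule.
sigma2, sigma_n2 = 4.0, 1.0   # signal and noise variances (illustrative values)
y = 2.5                        # a noisy coefficient

x = 0.0
for _ in range(5):
    # Eq. (5) with d/dr log f(r) = -1/2; the r-dependence drops out entirely
    x = (1.0 / sigma2 + 1.0 / sigma_n2) ** -1 * (1.0 / sigma_n2) * y

wiener = sigma2 / (sigma2 + sigma_n2) * y
print(x == wiener)   # True: identical after the very first iteration
```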

In [13], Sendur and Selesnick developed a highly effective denoising algorithm referred to as the Bishrinkage. To develop the algorithm, they used the bivariate model

$$P(X_{i})=f(r)=\frac{3}{2\pi\sigma^{2}}\exp\big(-\sqrt{3}\,r^{\frac{1}{2}}\big),$$

where *X*_{i}=(*x*_{1}, *x*_{2})^{T} and $\rho=\left(\begin{array}{cc}\sigma^{2}&0\\0&\sigma^{2}\end{array}\right)$.

The Bishrinkage function in [13] was derived from a MAP estimator. It is easy to show that we can also obtain the Bishrinkage function using Eq. (6). We now have $\frac{d}{dr}\log f(r)=-\frac{\sqrt{3}}{2}r^{-\frac{1}{2}}$. Substituting it into Eq. (6), we obtain the Bishrinkage function

$$\hat{x}_{1}=\frac{\Big(\sqrt{y_{1}^{2}+y_{2}^{2}}-\frac{\sqrt{3}\,\sigma_{n}^{2}}{\sigma}\Big)_{+}}{\sqrt{y_{1}^{2}+y_{2}^{2}}}\,y_{1},$$

where $\sigma_{n}^{2}$ denotes the noise variance and $(\cdot)_{+}$ is defined as $(x)_{+}=\begin{cases}0,&\text{if }x<0\\ x,&\text{otherwise.}\end{cases}$
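The shrinkage rule can be written in a few lines of code. A sketch of the Şendur–Selesnick rule (the function name and the guard at zero magnitude are ours):

```python
import math

def bishrink(y1, y2, sigma, sigma_n):
    """Bivariate shrinkage of a noisy coefficient y1 given its parent y2 [13]."""
    mag = math.hypot(y1, y2)                                     # sqrt(y1^2 + y2^2)
    gain = max(mag - math.sqrt(3.0) * sigma_n**2 / sigma, 0.0)   # the (.)_+ operator
    return gain / mag * y1 if mag > 0 else 0.0

# Small coefficients are set to zero; large ones are shrunk toward zero.
print(bishrink(0.1, 0.1, 1.0, 1.0))   # below the threshold -> 0.0
print(bishrink(5.0, 2.0, 1.0, 1.0))   # shrunk, but kept
```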

The bivariate model Eq. (8) can be extended to multivariate cases, and the resulting multivariate models belong to the Elliptically Contoured Distribution Family (ECDF) [21]. In [21], we developed an effective image denoising algorithm using the Maximum a Posteriori (MAP) estimator based on these extended multivariate models. Interestingly, the algorithm developed in [21] has the same form as Eq. (6), the limit case of Eq. (5). Of course, it is not surprising that the EM scheme and the MAP estimator yield the same result, once we realize that the EM scheme is essentially a method for solving MAP (or ML) problems. We would like to emphasize that the iterative formula Eq. (5) converges by virtue of the EM scheme itself. Hence, the derivation above in fact provides an explanation of the convergence of the algorithm developed in [21].

Next, we briefly discuss the numerical implementation of the iterative formula Eq. (5); for more details, please see [21]. Let *S* be the symmetric square root of the positive definite matrix *ρ*_{ε} (i.e., *ρ*_{ε}=*SS*^{T}), and let {*Q*, Λ} be the eigenvector/eigenvalue expansion of the matrix *S*^{-1}*ρS*^{-T}, namely *Q*Λ*Q*^{T}=*S*^{-1}*ρS*^{-T} (we note here *Q*^{T}=*Q*^{-1}). It follows that

$$\rho=SQ\Lambda Q^{T}S^{T},\qquad \rho_{\varepsilon}=SQQ^{T}S^{T}.$$

Hence we can rewrite Eq. (5) as

$$X_{i}^{t+1}=\Big(-2\frac{d}{dr}\log f(r)\,\rho^{-1}+\rho_{\varepsilon}^{-1}\Big)^{-1}\rho_{\varepsilon}^{-1}Y_{i}=SQ\left[\frac{\lambda_{i}}{\lambda_{i}-2\frac{d}{dr}\log f(r)}\right]_{1}^{d}(SQ)^{-1}Y_{i},$$

where $\left[\frac{\lambda_{i}}{\lambda_{i}-2\frac{d}{dr}\log f(r)}\right]_{1}^{d}$ denotes the diagonal matrix $\operatorname{diag}\Big[\frac{\lambda_{1}}{\lambda_{1}-2\frac{d}{dr}\log f(r)},\dots,\frac{\lambda_{d}}{\lambda_{d}-2\frac{d}{dr}\log f(r)}\Big]$, and *λ*_{i}, 1≤*i*≤*d*, are the diagonal elements of Λ, namely Λ=diag(*λ*_{1},…,*λ*_{d}). In addition, using *S*, *Q* and Λ, we can write *r* as

$$r=(X_{i}^{t})^{T}\rho^{-1}X_{i}^{t}=\big(Q^{T}S^{-1}X_{i}^{t}\big)^{T}\Lambda^{-1}\big(Q^{T}S^{-1}X_{i}^{t}\big)=\sum_{i=1}^{d}\frac{1}{\lambda_{i}}V_{i}^{2},$$

where *V*_{i}, 1≤*i*≤*d*, denotes the *i*-th element of the *d*-dimensional vector *Q*^{T}*S*^{-1}*X*^{t}_{i}. Now we can obtain the estimate of *X*_{i}, 1≤*i*≤*M*, through the iterative formula Eq. (11).
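One EM step of Eq. (11) can be sketched with NumPy as follows (our own illustration, not the paper's code). As a self-check, with the Gaussian prior, d/dr log f(r) = −1/2, a single step must reproduce the multivariate Wiener estimate (ρ^{-1}+ρ_ε^{-1})^{-1}ρ_ε^{-1}Y:

```python
import numpy as np

def em_step(Y, rho, rho_eps, dlogf, X_t):
    """One iteration of Eq. (11): X^{t+1} = SQ diag(l_i/(l_i - 2 g(r))) (SQ)^{-1} Y."""
    # S: symmetric square root of rho_eps (rho_eps = S S^T)
    w, V = np.linalg.eigh(rho_eps)
    S = V @ np.diag(np.sqrt(w)) @ V.T
    Sinv = np.linalg.inv(S)
    lam, Q = np.linalg.eigh(Sinv @ rho @ Sinv.T)   # Q Lam Q^T = S^{-1} rho S^{-T}
    V_t = Q.T @ Sinv @ X_t                          # whitened current estimate
    r = float(np.sum(V_t**2 / lam))                 # quadratic form (X_t)^T rho^{-1} X_t
    g = dlogf(r)                                    # d/dr log f(r) of the prior
    D = np.diag(lam / (lam - 2.0 * g))
    SQ = S @ Q
    return SQ @ D @ np.linalg.inv(SQ) @ Y

rng = np.random.default_rng(1)
rho = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative signal covariance
rho_eps = 0.3 * np.eye(2)                  # illustrative noise covariance
Y = rng.normal(size=2)

X = em_step(Y, rho, rho_eps, lambda r: -0.5, np.zeros(2))
wiener = np.linalg.inv(np.linalg.inv(rho) + np.linalg.inv(rho_eps)) @ np.linalg.inv(rho_eps) @ Y
print(np.allclose(X, wiener))   # True
```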

## 3. Unified denoising algorithm: examples

As mentioned above, most existing statistical models for natural image wavelet coefficients belong to the GSM models. In this section, we will test several classical and recently-proposed GSM prior models for the image denoising application using the unified framework developed in Section 2. According to the iterative formula Eq. (11), we only need to write a GSM prior model as a function of the quadratic form and compute the derivative of its logarithm with respect to *r*. We summarize the unified denoising algorithm as follows.

*Unified Denoising Algorithm:*

- Write the probability density function of a GSM model as a function of the quadratic form, i.e., *P*(*X*)=*C*·*f*(*r*), where *r*=*X*^{T}*ρ*^{-1}*X* and *C* is a constant;
- Compute the derivative of log *f*(*r*) with respect to *r*, $\frac{d}{dr}\log f(r)$;
- Obtain the estimate using the iterative formula Eq. (11).
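In the univariate case (*d*=1, *ρ*=σ², *ρ*_{ε}=σ_n²), the three steps collapse to a scalar recursion. A sketch (ours), using the Laplacian (bi-exponential) prior of Section 3.1 and initializing at the noisy coefficient (both choices are our own):

```python
import math

def denoise_coeff(y, sigma2, sigma_n2, dlogf, iters=10):
    """Scalar (d = 1) version of the unified iteration:
    x_{t+1} = sigma2 / (sigma2 - 2*sigma_n2*g(r_t)) * y,  r_t = x_t^2 / sigma2."""
    x = y                                   # initialize at the noisy coefficient
    for _ in range(iters):
        r = max(x * x / sigma2, 1e-12)      # guard r = 0 for priors singular there
        g = dlogf(r)                        # step 2: d/dr log f(r) of the chosen prior
        x = sigma2 / (sigma2 - 2.0 * sigma_n2 * g) * y
    return x

# Laplacian (bi-exponential) prior: f_BE(r) = exp(-sqrt(2 r)),
# hence d/dr log f_BE(r) = -1 / sqrt(2 r).
dlogf_be = lambda r: -1.0 / math.sqrt(2.0 * r)

x_hat = denoise_coeff(y=3.0, sigma2=4.0, sigma_n2=1.0, dlogf=dlogf_be)
print(round(x_hat, 3))   # -> 2.293, the fixed point 3 - sqrt(2)/2
```

A handful of iterations already lands on the fixed point, matching the convergence behavior reported in Section 4.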

#### 3.1. Examples for univariate prior models

In this subsection, we will focus on the usage of several univariate models (namely, *d*=1). Note that in this case, we have *r*=*x*²/*σ*².

1). Univariate Gaussian Model

Although the 1-D Gaussian model is not suitable for characterizing the heavy-tailed marginal statistics of the wavelet coefficients of natural images, we still test this model for analysis and comparison purposes. The univariate Gaussian model is of the form

$$P(x)=(2\pi\sigma^{2})^{-\frac{1}{2}}\exp\Big(-\frac{x^{2}}{2\sigma^{2}}\Big)=C\cdot f_{G}(r),$$

where $f_{G}(r)=\exp(-\frac{r}{2})$. Obviously, we have

$$\frac{d}{dr}\log f_{G}(r)=-\frac{1}{2}.$$

2). Univariate Generalized Laplacian Model

To model the heavy-tailed property of natural image wavelet coefficients, a widely-used univariate model is the two-parameter Generalized Laplacian (GL) model of the form [2, 3]

$$P(x)=\frac{1}{\Psi(s,p)}\exp\big(-|x/s|^{p}\big),$$

where $\Psi(s,p)=\frac{2s}{p}\Gamma\big(\frac{1}{p}\big)$ is a normalization term and $\Gamma(x)=\int_{0}^{\infty}t^{x-1}e^{-t}\,dt$ is the Gamma function.

The GL model is zero-mean and symmetric, and can capture the large-kurtosis property of the 1-D marginal statistics of natural image wavelet coefficients [2, 3]. Typical values of *p* for the wavelet coefficients of natural images lie in the range [0.5, 1.0], corresponding to kurtosis values in the range [6, 25.2] [25]. The parameters {*s*, *p*} are directly related to the second and fourth moments, σ² and *m*_{4}, of the random variable *x*. Specifically, the second and fourth moments of a generalized Laplacian signal corrupted by additive Gaussian white noise are [2]

$$\tilde{\sigma}^{2}=\frac{s^{2}\,\Gamma(3/p)}{\Gamma(1/p)}+\sigma_{n}^{2},\qquad \tilde{m}_{4}=\frac{s^{4}\,\Gamma(5/p)}{\Gamma(1/p)}+6\sigma_{n}^{2}\frac{s^{2}\,\Gamma(3/p)}{\Gamma(1/p)}+3\sigma_{n}^{4}.$$

Using Eq. (16), it is convenient to compute {*s*, *p*} from the noisy wavelet coefficients.

In order to use the iterative formula Eq. (11), we rewrite Eq. (15) as

$$P(x)=C\cdot\exp\Big[-\Big(\frac{\sigma}{s}\Big)^{p}r^{\frac{p}{2}}\Big]=C\cdot\exp(-\alpha r^{\beta})=C\cdot f_{GL}(r),$$

where $\alpha=\big(\frac{\sigma}{s}\big)^{p}$, $\beta=\frac{p}{2}$ and $f_{GL}(r)=\exp(-\alpha r^{\beta})$. Obviously, we have

$$\frac{d}{dr}\log f_{GL}(r)=-\alpha\beta\,r^{\beta-1}.$$
3). Univariate Bessel K Form Model

For *p*>0 and *s*>0, the probability density function of the univariate Bessel K Form model can be expressed as [7, 8]

$$P(x)=\frac{2^{\frac{3}{4}-\frac{p}{2}}}{\sqrt{\pi}\,\Gamma(p)\,s^{\frac{p}{2}+\frac{1}{4}}}\,|x|^{p-\frac{1}{2}}\,\mathbf{K}_{p-\frac{1}{2}}\Big(\sqrt{\tfrac{2}{s}}\,|x|\Big),$$

where **K**_{ν} is the modified Bessel function of the second kind, defined as

$$\mathbf{K}_{\nu}(zx)=\frac{\Gamma(\nu+\frac{1}{2})(2x)^{\nu}}{\sqrt{\pi}\,z^{\nu}}\int_{0}^{\infty}\frac{\cos(zt)\,dt}{(t^{2}+x^{2})^{\nu+\frac{1}{2}}}\qquad\Big(\operatorname{Re}(\nu)>-\frac{1}{2},\ z>0,\ |\arg x|<\frac{\pi}{2}\Big),$$

and *p* and *s* are the shape and scale parameters, respectively.

The probability density function of the univariate Bessel K Form model is unimodal and symmetric around the mode. When *p*<1, *P*(*x*) is sharply peaked and has heavy tails. The parameters {*s*, *p*} are directly related to the variance and the kurtosis, σ² and κ, of the random variable *x*. Specifically, we have σ²=*ps* and $\kappa=\frac{3}{p}+3$ [8, 9]. Since adding a Gaussian process to a zero-mean process modifies only its variance and not the higher-order cumulants, we can estimate the parameters {*s*, *p*} from the noisy wavelet coefficients using the relations

$$p=\frac{3(\tilde{\sigma}^{2}-\sigma_{n}^{2})^{2}}{\tilde{\kappa}},\qquad s=\frac{\tilde{\sigma}^{2}-\sigma_{n}^{2}}{p},$$

where $\tilde{\sigma}^{2}$ and $\tilde{\kappa}$ denote the sample variance and the fourth cumulant of the noisy wavelet coefficients, respectively. Using the relation $s=\frac{\sigma^{2}}{p}$, we can rewrite Eq. (19) as

$$P(x)=C\cdot f_{BK}(r),$$
where $f_{BK}(r)=r^{\frac{p}{2}-\frac{1}{4}}\mathbf{K}_{p-\frac{1}{2}}(\sqrt{2pr})$. Since $\frac{d}{dz}\big[z^{\nu}\mathbf{K}_{\nu}(z)\big]=-z^{\nu}\mathbf{K}_{\nu-1}(z)$ [26], we get

$$\frac{d}{dr}f_{BK}(r)=-\sqrt{\frac{p}{2r}}\;r^{\frac{p}{2}-\frac{1}{4}}\,\mathbf{K}_{p-\frac{3}{2}}(\sqrt{2pr}).$$

Now, we have

$$\frac{d}{dr}\log f_{BK}(r)=-\sqrt{\frac{p}{2r}}\,\frac{\mathbf{K}_{p-\frac{3}{2}}(\sqrt{2pr})}{\mathbf{K}_{p-\frac{1}{2}}(\sqrt{2pr})}.$$
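The Bessel identity used above can be verified numerically. The sketch below (ours) evaluates **K**_ν through its integral representation K_ν(z)=∫₀^∞ e^{−z cosh t} cosh(νt) dt and checks the derivative with a central difference, assuming NumPy:

```python
import numpy as np

def besselk(v, z, T=20.0, n=200001):
    """Modified Bessel K_v(z) via K_v(z) = int_0^inf exp(-z cosh t) cosh(v t) dt."""
    t = np.linspace(0.0, T, n)
    f = np.exp(-z * np.cosh(t)) * np.cosh(v * t)
    return (0.5 * (f[0] + f[-1]) + f[1:-1].sum()) * (t[1] - t[0])  # trapezoid rule

# Check the identity d/dz [z^v K_v(z)] = -z^v K_{v-1}(z) [26].
v, z, h = 1.5, 2.0, 1e-4
lhs = ((z + h) ** v * besselk(v, z + h) - (z - h) ** v * besselk(v, z - h)) / (2 * h)
rhs = -(z ** v) * besselk(v - 1.0, z)
print(abs(lhs - rhs) < 1e-6)   # True
```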

4). Asymptotic Univariate Bessel K Form Model

Although the Bessel K Form model Eq. (19) describes the marginal statistical behavior of natural image wavelet coefficients very well [8, 9], its functional form is not easy to work with in some applications. The Bessel K Form model with parameters *p* and *s* can be approximated well by the function (which we call the asymptotic Bessel K Form model) [8]

$$P(x)=\frac{1}{\Gamma(p)}\Big(\frac{2}{s}\Big)^{\frac{p}{2}}|x|^{p-1}\exp\Big(-\sqrt{\tfrac{2}{s}}\,|x|\Big).$$

For denoising applications, the parameters of the asymptotic Bessel K Form model can be estimated using Eq. (20) directly from the noisy wavelet coefficients. Now we can rewrite Eq. (24) as

$$P(x)=\frac{1}{\Gamma(p)}\Big(\frac{2}{s}\Big)^{\frac{p}{2}}\sigma^{p-1}\Big(\frac{x^{2}}{\sigma^{2}}\Big)^{\frac{p-1}{2}}\exp\Big(-\sqrt{\frac{2\sigma^{2}}{s}}\Big(\frac{x^{2}}{\sigma^{2}}\Big)^{\frac{1}{2}}\Big)=C\cdot r^{\frac{p-1}{2}}\exp\Big(-r^{\frac{1}{2}}\sqrt{\frac{2\sigma^{2}}{s}}\Big).$$

Since σ²=*ps*, we get

$$P(x)=C\cdot f_{AABK}(r),$$

where $f_{AABK}(r)=r^{\frac{p-1}{2}}\exp(-\sqrt{2pr})$. Obviously, we have

$$\frac{d}{dr}\log f_{AABK}(r)=\frac{p-1}{2r}-\sqrt{\frac{p}{2r}}.$$

5). Laplacian Model (or Bi-exponential Model)

The probability density function of the Laplacian model is of the form

$$P(x)=\frac{1}{p}\exp\Big(-\frac{|x|}{s}\Big),$$

where *p*=2*s* is the normalization constant. The tails of the Laplacian model are intermediate in heaviness between the Student *t*-distribution and the Gaussian distribution. Since for the Laplacian model σ²=Var(*x*)=2*s*², we have

$$P(x)=C\cdot f_{BE}(r),$$

where $f_{BE}(r)=\exp(-\sqrt{2r})$. Obviously, we have

$$\frac{d}{dr}\log f_{BE}(r)=-\frac{1}{\sqrt{2r}}.$$

#### 3.2. Examples for multivariate prior models

It is well known that a residual dependency structure always remains among wavelet coefficients, even though the orthonormal wavelet transform decorrelates them fairly well. Recently, some multivariate models have been proposed to characterize the statistical dependence among natural image wavelet coefficients [7, 13, 15, 16, 17, 18, 19, 20, 21]. Generally, image denoising algorithms based on multivariate models produce better denoising performance than those based on univariate models. In this subsection, we focus on applying the multivariate GSM models that have an analytical expression to image denoising under the unified denoising framework.

1). Multivariate Gaussian Model

The multivariate Gaussian model is of the form

$$P(X)=(2\pi)^{-\frac{d}{2}}|\rho|^{-\frac{1}{2}}\exp\Big(-\frac{1}{2}X^{T}\rho^{-1}X\Big)=C\cdot f_{G}(r),$$

where $f_{G}(r)=\exp\big(-\frac{1}{2}r\big)$. The same as in the univariate Gaussian case, we have

$$\frac{d}{dr}\log f_{G}(r)=-\frac{1}{2}.$$

2). Multivariate Laplacian Model

The multivariate Laplacian model is of the form

$$P(X)=\frac{2}{(2\pi)^{\frac{d}{2}}|\rho|^{\frac{1}{2}}}\Big(\frac{q(X)}{2}\Big)^{\frac{1}{2}-\frac{d}{4}}\mathbf{K}_{\frac{d}{2}-1}\big(\sqrt{2\,q(X)}\big),$$

where *q*(*X*)=λ*X*^{T}*ρ*^{-1}*X*. The model is useful for representing multivariate, sparsely distributed data with mutually dependent components [15]. Equation (33) can be rewritten as

$$P(X)=C\cdot f_{ML}(r),$$

where $f_{ML}(r)=\frac{\mathbf{K}_{\frac{d}{2}-1}(\sqrt{2r})}{r^{\frac{d}{4}-\frac{1}{2}}}$. Using Eq. (22), we can obtain

$$\frac{d}{dr}\log f_{ML}(r)=-\frac{1}{\sqrt{2r}}\,\frac{\mathbf{K}_{\frac{d}{2}}(\sqrt{2r})}{\mathbf{K}_{\frac{d}{2}-1}(\sqrt{2r})}.$$

3). Multivariate Elliptically Contoured Distribution Model

In [21], we extended a generalized version of the bivariate model Eq. (8) to multivariate cases and obtained several new multivariate models that belong to the Elliptically Contoured Distribution Family (ECDF). The generalized bivariate model is of the form

where *X*=(*x*_{1}, *x*_{2})^{T}. Obviously, we have

The extended 4-D model is of the form

where *X*=(*x*_{1}, *x*_{2}, *x*_{3}, *x*_{4})^{T}. Similarly, we have

The extended 10-D model is of the form

where *X*=(*x*_{1}, …, *x*_{10})^{T}. Again, we have

4). Multivariate Exponential Model

Although the Elliptically Contoured Distribution models developed in [21] are highly effective for image denoising, how well they fit the empirical statistics of the wavelet coefficients of natural images remains unknown. In [17], we built several new multivariate models that not only match the observed statistics of the wavelet coefficients of natural images well, but also have simpler analytical forms than those in [21]. These models have exponential forms, and their parameters were determined by directly fitting the 1-D histogram of the quadratic form of the wavelet coefficients.

The exponential models can be expressed as [17]

$$P(X)=e^{a_{1}}\exp(-a_{2}r^{a_{3}})=e^{a_{1}}\cdot f_{MEM\_d}(r),$$

where *X*=(*x*_{1}, …, *x*_{d})^{T} and $f_{MEM\_d}(r)=\exp(-a_{2}r^{a_{3}})$. For the 2-D case, *a*_{1}=-2.1, *a*_{2}=6.8 and *a*_{3}=0.17; for the 4-D case, *a*_{1}=-3.2, *a*_{2}=6.3 and *a*_{3}=0.22; for the 9-D case, *a*_{1}=-4.0, *a*_{2}=5.6 and *a*_{3}=0.26; for the 10-D case, *a*_{1}=-4.5, *a*_{2}=5.5 and *a*_{3}=0.3. From Eq. (42), we obtain

$$\frac{d}{dr}\log f_{MEM\_d}(r)=-a_{2}a_{3}\,r^{a_{3}-1}.$$

## 4. Experiment results and discussion

In this section, we will test and compare the performance of prior models demonstrated in Section 3 for the image denoising application under the unified denoising framework.

We use several standard test images, namely Barbara, Lena, Boat and Pepper, for our experiments. These images are available at Portilla’s homepage http://www.io.csic.es/PagsPers/JPortilla/denoise/software/index.htm. The first three images are of size 512×512, and the last one is of size 256×256. To study the dependency of our denoising algorithm on the noise level, the test images are contaminated by additive Gaussian white noise with standard deviations 10, 20 and 30. The denoising performance is evaluated both in terms of PSNR and visually. A 4-scale orthonormal wavelet transform with the Symlet-8 filter is used. The PSNR of the restored images over the full range of input noise levels is listed in Table 1 and Table 2 (in both tables, 1×1+1, 3×1+1, 3×3 and 3×3+1 denote the neighborhood structures used; for example, 3×3+1 denotes a neighborhood containing a 3×3 local window plus the parent of the central coefficient of the window). For comparison, we also list in both tables the PSNR of several other image denoising algorithms, including the hard-threshold procedure (3σ_{n}) (Hard Threshold), the MMSE estimator based on the GL model proposed by Simoncelli *et al.* (GL_MMSE) [2], the GSM-based MMSE algorithm developed by Portilla *et al.* (GSM_MMSE) [19] and the Wiener filtering in the wavelet domain (Wiener).

In our experiments, we use the Successive Substitution Method for the iterative formula Eq. (11), with the initialization set to zero (for more details about the Successive Substitution Method, please see the Appendix in [21]). Figures 1(a–d) display the PSNR of the solution versus the number of iterations for the Pepper image at noise levels 20 and 30, when employing the asymptotic univariate Bessel K Form model and the multivariate exponential model (with a 3×3 neighborhood structure), respectively. Although the Successive Substitution Method is only linearly convergent, these examples clearly show that the unified iterative algorithm converges quickly. From a practical viewpoint, four or five iterations are enough to obtain a stable solution. For the Pepper image, when using the asymptotic univariate Bessel K Form model, the algorithm takes only 0.35 seconds for 4 iterations and 0.51 seconds for 10 iterations; when using the multivariate exponential model (with a 3×3 neighborhood structure), it takes only 1.28 seconds for 4 iterations and 2.64 seconds for 10 iterations (the platform is as follows: Hardware Architecture: PC; Operating System: Windows XP; Processor: 3.2 GHz; Memory Size: 2 GB; Application software: Matlab 7.1).

From Table 1 and Table 2, we can observe some interesting phenomena:

1). The classical Wiener filtering in the wavelet domain produces almost the same performance as the unified iterative algorithm using the Gaussian model (Gaussian_UA). One can check this for both the univariate and the multivariate cases in both tables. Figures 1(e,f) display two convergence plots for the multivariate Gaussian model (with a 3×3 neighborhood structure) for the Pepper image at noise levels 20 and 30, respectively. Interestingly, after only the first iteration, the algorithm has already converged to the optimal solution (in Section 2, we noted the equivalence between the unified algorithm and the Wiener filtering (see Eq. (7)), which is the optimal estimate in the minimum *L*_{2}-error sense for Gaussian data contaminated by additive Gaussian white noise). This verifies the validity of our unified algorithm for the image denoising application.

2). It is well known that the univariate Laplacian model is not very suitable for modeling the statistics of natural image wavelet coefficients, since its tails are not heavy enough. However, we find, surprisingly, in Table 1 and Table 2 that, for the univariate case, all models except the Gaussian model produce similar performance. In most cases, the univariate Laplacian model even produces slightly better results. It has been widely accepted in the literature that band-adaptive algorithms commonly perform better than non-band-adaptive ones. The finding is all the more surprising once we realize that the other models are band-adaptive (they adjust their parameters for every wavelet coefficient band), whereas the univariate Laplacian model is not. This can be explained by a relation in our denoising setting. The unified denoising algorithm operates on the quadratic form *r*=*X*^{T}*ρ*^{-1}*X* rather than on the wavelet coefficients themselves. In the univariate case, *r* can be written as $r=\frac{x^{2}}{\sigma^{2}}$. It follows that the expectation $E(r)=\frac{E(x^{2})}{\sigma^{2}}=1$, which, we would like to emphasize, is not band-dependent. Some non-band-adaptive algorithms can obey this relation better than band-adaptive ones; thus, they may perform better than the band-adaptive ones (provided, of course, that they also obey some higher-order statistical relations such as $\mathrm{Var}(r)=E(r^{2})-E^{2}(r)=\frac{E(x^{4})}{\sigma^{4}}-1$). From Table 1 and Table 2, we also find that the multivariate Laplacian model leads to very good denoising performance (note that no parameters need to be adjusted in this case).
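The relation E(r)=1 is easy to confirm on synthetic data (our own check, with an arbitrary band scale *b*):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic Laplacian "wavelet coefficients" with scale b: variance sigma^2 = 2 b^2.
b = 1.7
x = rng.laplace(0.0, b, size=500_000)

sigma2 = 2.0 * b * b
r = x**2 / sigma2            # the quadratic form used by the unified algorithm
print(np.mean(r))            # close to 1: E(r) = E(x^2)/sigma^2, for any band scale b
```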

3). In addition, we find that for the univariate GL model, the unified algorithm and the MMSE algorithm in [2] have very similar performance. Generally, the PSNR difference between the two algorithms is only about 0.1 dB. This again implies the validity of the unified algorithm.

To compare the visual quality of different denoising algorithms, some restored Pepper images at noise standard deviation 20 are shown in Fig. 2 and Fig. 3. As usual, the hard-threshold procedure produces many artifacts that seriously blemish the restored image (see Fig. 2(a)). Interestingly, in the *univariate* case, the visual results of the GL_MMSE algorithm in [2] and of the unified denoising algorithm using the GL model (GL_UA), the Bessel K Form model (Bessel K Form_UA), the asymptotic Bessel K Form model (Asy. Bessel K Form_UA) and the Laplacian model (Laplacian_UA) are very similar to each other. In Fig. 3, we can see that the visual results of the 3×3 Wiener filtering in the wavelet domain and of the unified denoising algorithm using the multivariate Gaussian model (Gaussian_UA) are almost identical. The unified denoising algorithm based on the multivariate Laplacian model (Laplacian_UA), the multivariate Elliptically Contoured Distribution model (MECDF_UA) and the multivariate exponential model (MEM_UA) all produce satisfactory visual results. To make the comparison clearer, we display crops of denoised Boat images at noise level 20 in Fig. 4. Among them, the multivariate Laplacian model produces the best visual quality.

## 5. Conclusion

An image denoising algorithm based on natural image prior models has been developed by combining the EM scheme and the properties of the GSM models. It not only has a simple iterative form but is also computationally tractable. Because most prior models of natural image wavelet coefficients belong to the GSM family, the developed algorithm can be viewed as a unified image denoising framework. We have tested several classical and recently proposed prior models under the unified algorithm and found that the algorithm based on the multivariate Laplacian model produces restoration results of high quality, not only in terms of the PSNR but also visually. Since the main purpose of this paper is to show how the algorithm is obtained and how to use it, we did not extensively test existing prior models for natural images; more test results will be given in the future. It is our hope that the developed algorithm is helpful to those who will use new natural image models for the image denoising application.

## Appendix

To compute $\Omega\left(X_{i}^{t}\right)$, we adopt a technique used in [24]:

$$\Omega\left(X_{i}^{t}\right)=\int_{z_{i}} z_{i}^{-1}P\left(z_{i}\mid X_{i}^{t}\right)\mathrm{d}z_{i}=\frac{1}{P\left(X_{i}^{t}\right)}\int_{z_{i}} z_{i}^{-1}P\left(X_{i}^{t}\mid z_{i}\right)P\left(z_{i}\right)\mathrm{d}z_{i}.$$

Considering that

$$\frac{\mathrm{d}P\left(X_{i}^{t}\mid z_{i}\right)}{\mathrm{d}X_{i\_\mathrm{cent}}^{t}}=P\left(X_{i}^{t}\mid z_{i}\right)\left(-\frac{1}{2}z_{i}^{-1}\right)\frac{\mathrm{d}\left(X_{i}^{t}\right)^{T}\rho^{-1}X_{i}^{t}}{\mathrm{d}X_{i\_\mathrm{cent}}^{t}}=-\frac{1}{2}z_{i}^{-1}P\left(X_{i}^{t}\mid z_{i}\right)\frac{\mathrm{d}\left(X_{i}^{t}\right)^{T}\rho^{-1}X_{i}^{t}}{\mathrm{d}X_{i\_\mathrm{cent}}^{t}},$$

where $X_{i\_\mathrm{cent}}^{t}$ denotes the central element of the *d*-dimensional vector $X_{i}^{t}$, we obtain

$$z_{i}^{-1}P\left(X_{i}^{t}\mid z_{i}\right)=-\frac{2}{\frac{\mathrm{d}\left(X_{i}^{t}\right)^{T}\rho^{-1}X_{i}^{t}}{\mathrm{d}X_{i\_\mathrm{cent}}^{t}}}\frac{\mathrm{d}P\left(X_{i}^{t}\mid z_{i}\right)}{\mathrm{d}X_{i\_\mathrm{cent}}^{t}}.$$

It follows that

$$\Omega\left(X_{i}^{t}\right)=\frac{1}{P\left(X_{i}^{t}\right)}\int_{z_{i}}-\frac{2}{\frac{\mathrm{d}\left(X_{i}^{t}\right)^{T}\rho^{-1}X_{i}^{t}}{\mathrm{d}X_{i\_\mathrm{cent}}^{t}}}\frac{\mathrm{d}P\left(X_{i}^{t}\mid z_{i}\right)}{\mathrm{d}X_{i\_\mathrm{cent}}^{t}}P\left(z_{i}\right)\mathrm{d}z_{i}=-\frac{2}{P\left(X_{i}^{t}\right)\frac{\mathrm{d}\left(X_{i}^{t}\right)^{T}\rho^{-1}X_{i}^{t}}{\mathrm{d}X_{i\_\mathrm{cent}}^{t}}}\frac{\mathrm{d}P\left(X_{i}^{t}\right)}{\mathrm{d}X_{i\_\mathrm{cent}}^{t}}.$$

Since $P\left(X_{i}^{t}\right)=f\left(r\right)$ and $r=\left(X_{i}^{t}\right)^{T}\rho^{-1}X_{i}^{t}$, we obtain

$$\Omega\left(X_{i}^{t}\right)=-\frac{2}{f\left(r\right)\frac{\mathrm{d}\left(X_{i}^{t}\right)^{T}\rho^{-1}X_{i}^{t}}{\mathrm{d}X_{i\_\mathrm{cent}}^{t}}}\frac{\mathrm{d}f}{\mathrm{d}r}\frac{\mathrm{d}\left(X_{i}^{t}\right)^{T}\rho^{-1}X_{i}^{t}}{\mathrm{d}X_{i\_\mathrm{cent}}^{t}}=-\frac{2}{f\left(r\right)}\frac{\mathrm{d}f}{\mathrm{d}r}.$$
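The final identity $\Omega\left(X_{i}^{t}\right)=-\frac{2}{f(r)}\frac{\mathrm{d}f}{\mathrm{d}r}$ can be checked numerically in the simplest case. The sketch below (our own illustration, not from the paper) works in one dimension ($d=1$, $\rho=1$, $r=x^{2}$) with the unit-variance Laplacian, which is the GSM with Gaussian conditional $P(x\mid z)=N(0,z)$ and exponential mixing $P(z)=e^{-z}$; there $f(r)=\frac{1}{\sqrt{2}}e^{-\sqrt{2r}}$, so the closed form reduces to $-2f'(r)/f(r)=\sqrt{2}/|x|$. We compare this against the defining integral $E\left(z^{-1}\mid x\right)$:

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule on a (possibly non-uniform) grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Univariate case: P(x|z) = N(0, z), P(z) = exp(-z)  =>  x is Laplacian with
# f(r) = (1/sqrt(2)) exp(-sqrt(2 r)) and -2 f'(r)/f(r) = sqrt(2)/|x|.
x = 1.5
z = np.logspace(-6, 2, 200_000)                           # grid for the z-integrals
gauss = np.exp(-x**2 / (2 * z)) / np.sqrt(2 * np.pi * z)  # P(x|z)
prior = np.exp(-z)                                        # P(z)
p_x = trapezoid(gauss * prior, z)                         # marginal P(x) = f(x^2)
omega_direct = trapezoid(gauss * prior / z, z) / p_x      # E(z^-1 | x), by definition
omega_closed = np.sqrt(2) / abs(x)                        # -2 f'(r)/f(r), closed form
print(omega_direct, omega_closed)                         # both ~ 0.9428
```

The agreement of the two values illustrates why the derivation matters in practice: the right-hand side needs only the derivative of the marginal density $f$, with no explicit integration over the hidden multiplier.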

## Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 60672126), the National Basic Research Program (973 Program) of China (Grant No. 2006CB705700), and the Science and Technology Innovation Engineering Important Project of Universities of China (Grant No. 706053).

## References and links

**1. **A. Srivastava, B. Lee, E. P. Simoncelli, and S.-C. Zhu, “On advances in statistical modeling of natural images,” Journal of Mathematical Imaging and Vision **18**, 17–33 (2003). [CrossRef]

**2. **E. P. Simoncelli and E. H. Adelson, “Noise removal via Bayesian wavelet coring,” in *Proc. 3rd Int. Conf. on Image Processing* (Lausanne, Switzerland, 1996), pp. 379–382. [CrossRef]

**3. **S. G. Mallat, “A theory for multiresolution signal decomposition: the wavelet representation,” IEEE Trans. Pattern Anal. Machine Intell. **11**, 674–693 (1989). [CrossRef]

**4. **A. Achim, A. Bezerianos, and P. Tsakalides, “Novel Bayesian multiscale method for speckle removal in medical ultrasound images,” IEEE Trans. Med. Imag. **20**, 772–783 (2001). [CrossRef]

**5. **A. Achim, P. Tsakalides, and A. Bezerianos, “SAR image denoising via Bayesian wavelet shrinkage based on heavy-tailed modeling,” IEEE Trans. Geosci. Remote Sensing **41**, 1773–1784 (2003). [CrossRef]

**6. **J. Huang, “Statistics of natural images and models,” Ph.D. thesis, Division of Applied Mathematics, Brown University (2000).

**7. **U. Grenander and A. Srivastava, “Probability models for clutter in natural images,” IEEE Trans. Pattern Anal. Machine Intell. **23**, 424–429 (2001). [CrossRef]

**8. **A. Srivastava, X. Liu, and U. Grenander, “Universal analytical forms for modeling image probability,” IEEE Trans. Pattern Anal. Machine Intell. **24**, 1200–1214 (2002). [CrossRef]

**9. **J. M. Fadili and L. Boubchir, “Analytical form for a Bayesian wavelet estimator of images using the Bessel K Form densities,” IEEE Trans. Image Processing **14**, 231–240 (2005). [CrossRef]

**10. **B. Vidakovic, “Nonlinear wavelet shrinkage with Bayes rules and Bayes factors,” J. Amer. Stat. Assoc. **93**, 173–179 (1998). [CrossRef]

**11. **H. A. Chipman, E. D. Kolaczyk, and R. E. McCulloch, “Adaptive Bayesian wavelet shrinkage,” J. Amer. Stat. Assoc. **92**, 1413–1421 (1997). [CrossRef]

**12. **D. J. Field, “Relations between the statistics of natural images and the response properties of cortical cells,” J. Opt. Soc. Am. A **4**, 2379–2394 (1987). [CrossRef] [PubMed]

**13. **L. Sendur and I. W. Selesnick, “Bivariate shrinkage functions for wavelet-based denoising exploiting interscale dependency,” IEEE Trans. Signal Processing **50**, 2744–2756 (2002). [CrossRef]

**14. **L. Sendur and I. W. Selesnick, “Bivariate shrinkage with local variance estimation,” IEEE Signal Processing Lett. **9**, 438–441 (2002). [CrossRef]

**15. **T. Eltoft, K. Taesu, and Te-Won Lee, “On the multivariate Laplace distribution,” IEEE Signal Processing Lett. **13**, 300–303 (2006). [CrossRef]

**16. **S. Tan and L. Jiao, “Multishrinkage: analytical form for a Bayesian wavelet estimator based on the multivariate Laplacian model,” Opt. Lett. **32**, 2583–2585 (2007). [CrossRef] [PubMed]

**17. **S. Tan and L. Jiao, “Wavelet-based Bayesian image estimation: from marginal and bivariate prior models to multivariate prior models,” IEEE Trans. Image Processing, *submitted*, (2006).

**18. **A. Achim and E. E. Kuruoglu, “Image denoising using bivariate-stable distributions in the complex wavelet domain,” IEEE Signal Processing Lett. **12**, 17–20 (2005). [CrossRef]

**19. **J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli, “Image denoising using scale mixtures of Gaussians in the wavelet domain,” IEEE Trans. Image Processing **12**, 1338–1351 (2003). [CrossRef]

**20. **T. A. Øigård, A. Hanssen, R. E. Hansen, and F. Godtliebsen, “EM-estimation and modeling of heavy-tailed processes with the multivariate normal inverse Gaussian distribution,” Signal Process. **85**, 1655–1673 (2005). [CrossRef]

**21. **S. Tan and L. Jiao, “Multivariate statistical models for image denoising in the wavelet domain,” International Journal of Computer Vision **75**, 209–230 (2007). [CrossRef]

**22. **P. V. Gehler and M. Welling, “Product of Edgeperts,” in *Advances in Neural Information Processing Systems*, Y. Weiss, B. Scholkopf, and J. Platt, eds. (Cambridge, MA, 2005), pp. 419–426.

**23. **M. J. Wainwright and E. P. Simoncelli, “Random cascades on wavelet trees and their use in analyzing and modeling natural images,” Applied and Computational Harmonic Analysis **11**, 89–123 (2001). [CrossRef]

**24. **J. M. Bioucas-Dias, “Bayesian wavelet-based image deconvolution: a GEM algorithm exploiting a class of heavy-tailed priors,” IEEE Trans. Image Processing **15**, 937–951 (2006). [CrossRef]

**25. **R. W. Buccigrossi and E. P. Simoncelli, “Image compression via joint statistical characterization in the wavelet domain,” IEEE Trans. Image Processing **8**, 1688–1701 (1999). [CrossRef]

**26. **M. Abramowitz and I. A. Stegun, *Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables* (Dover, New York, 1972).