## Abstract

Outdoor imaging in haze is plagued by poor visibility. A major problem is spatially-varying reduction of contrast by airlight, which is scattered by the haze particles towards the camera. However, images can be compensated for haze, and even yield a depth map of the scene. A key step in such scene recovery is subtraction of the airlight. In particular, this can be achieved by analyzing polarization-filtered images. This analysis requires parameters of the airlight, particularly its degree of polarization (DOP). These parameters were estimated in past studies by measuring pixels in sky areas. However, the sky is often unseen in the field of view. This paper derives several methods for estimating these parameters, when the sky is not in view. The methods are based on minor prior knowledge about a couple of scene points. Moreover, we propose blind estimation of the DOP, based on the image data. This estimation is based on independent component analysis (ICA). The methods were demonstrated in field experiments.

©2009 Optical Society of America

## 1. Introduction

Imaging in poor atmospheric conditions [1, 2] affects human activities [3], as well as remote sensing and surveillance. Hence, analysis of images taken in haze is important. Moreover, such research promotes other domains of vision through scattering media, such as water [4–6] and tissue. It is widely acknowledged that exploiting polarization in imaging can enhance visibility in such conditions [7–16]. Various studies suggest acquiring frames at distinct polarization states, and then mathematically manipulating the pair of frames to see better. This path is motivated by evidence of such a process in biological systems [17–19].

An additional approach to the problem of imaging in haze is to model and invert the image-formation process, based on a-priori knowledge about the scene [1, 20, 21]. The polarization and inversion paths were both combined in an approach described in [22, 23]. There, the inversion of the image formation process capitalizes on the fact that one of the sources of image degradation in haze is partially polarized. Such analysis yields an estimate for the distance map of the scene, in addition to a dehazed image. Interestingly, recovery of a three dimensional range map based on polarization is also used for studying solid objects [24]. The feasibility of inversion of haze effect based on analysis of polarization is enhanced by advances in polarimetric cameras [25–31].

To achieve such inversion, environmental parameters are required. In particular, it is important to know the parameters of stray light (called airlight [3,14,22,32–35]) created by haze, which greatly decreases image contrast. These parameters can be determined from the image data itself. Originally [23], the required parameters were derived from measurements of pixels that correspond to the *sky* by the *horizon* (even automatically [22]). In this paper we refer to that method as *sky-based*. For example, a hazy scene is shown [36] in Fig. 1(a) and the *sky-based* dehazing result is shown in Fig. 1(b).

The prior parameter estimator [22, 23] relies, thus, on the existence of sky in the field of view (FOV). This reliance has been a limiting factor, which inhibited the usefulness of the approach. Often, the FOV does not include a sky area. This occurs, for example, when viewing from a high vantage point, when the FOV is narrow, or when there is partial cloud cover on the horizon. In this work we address this problem. We manage calibration of the environmental parameters and consequently enable successful dehazing, despite the absence of sky in the FOV. Moreover, we propose a method that *blindly* separates the airlight radiance (the main cause for contrast degradation) from the object’s signal. The parameter that determines this separation is the degree of airlight polarization. It is estimated without any user interaction. The method exploits mathematical tools developed in the field of blind source separation (BSS), also known as independent component analysis (ICA).

ICA has already contributed to solving image separation [37–39], particularly with regard to reflections. The problem of haze is more complex than reflections, since object recovery is obtained by nonlinear interaction of the raw images. Nevertheless, we show that the radiance of haze (airlight) can be separated by ICA, by the use of a simple pre-process. Dehazing had been attempted by ICA based on color cues [40]. However, an implicit underlying assumption behind Ref. [40] is that radiance is identical in all the color channels, i.e. the scene is gray. This is untypical in nature.

We successfully performed calibration of the required atmospheric environmental parameters (including polarization), in multiple real experiments conducted in haze. Skyless dehazing then followed. We obtained blind parameter estimation which was consistent with direct sky measurements. Partial results were presented in Ref. [41].

## 2. Theoretical background

To make the paper self-contained, this section briefly reviews the known formation model of hazy images. It also describes a known inversion process of this model, which recovers visibility. This description is based on Ref. [23]. As shown in Fig. 2, an acquired frame is a combination of two main components. The first originates from the object radiance. Let us denote by *L*
^{object} the object radiance as if it was taken in a clear atmosphere, without scattering in the line of sight (LOS). Due to atmospheric attenuation [23], the camera senses a fraction of this radiance. This attenuated signal is the *direct transmission*

where

is the transmittance of the atmosphere. The transmittance depends on the distance *z* between the object and the camera, and on the atmospheric attenuation coefficient *β*, where ∞ > *β* > 0. The second component is the *path radiance* (*airlight*). It originates from the scene illumination (e.g., sunlight), a portion of which is scattered into the LOS by the haze. Let *a*(*z*) be the contribution to airlight from scattering at *z*, accounting for attenuation this component undergoes due to propagation in the medium. The aggregate of *a*(*z*) yields the airlight

Here *A*
_{∞} is the value of airlight at a non-occluded horizon. It depends on the haze and illumination conditions. Contrary to the direct transmission, airlight increases with the distance and dominates the acquired image irradiance

at long range. The addition of airlight [42] is a major cause for reduction of signal contrast.

In haze, the airlight is often partially polarized. Hence, the airlight image component can be modulated by a polarizer mounted on the camera (analyzer). At one polarizer orientation the airlight contribution is least intense. Since the airlight disturbance is minimal here, this is the *best state* of the polarizer. Denote this airlight component as *A*
^{min}. There is another polarizer orientation (perpendicular to the former), for which the airlight contribution is the strongest, and denoted as *A*
^{max}. The overall airlight given in (Eq. 3) is given by

Assuming that the direct transmission is not polarized, the energy of *D* is equally split among the two polarizer states. Hence, the overall measured intensities at the polarizer orientations
mentioned above are

The degree of polarization (DOP) of the airlight is defined as

where *A* is given in Eq. (3). For narrow FOVs, this parameter does not vary much. In this work we indeed use a narrow FOV, hence assume that *p* is laterally invariant. Eq. (7) refers to the aggregate airlight, integrated over the LOS. Is *p* invariant to distance? Implicitly, this would mean that the DOP of *a*(*z*) is unaffected by distance. This is not strictly true. Underwater, it has recently been shown [18] that the DOP of light emanating from *z* may decay with *z*. Such depolarization [43] may also be the case in haze, due to multiple scattering. Multiple scattering also causes blur of *D*. For simplicity, we neglect the consequence of multiple scattering (including blur), as an effective first order approximation, similarly to Refs. [22, 23]. Note that

It follows that

It is easy to see from Eqs. (4,9) that an estimate for *I*
^{total} can be obtained by

Dehazing is performed by inverting the image formation process. The first step separates the haze radiance (airlight) *A* from the object’s direct transmission *D*. The airlight is estimated as

Then, Eq. (4) is inverted to estimate *D*. Subsequently, Eq. (1) is inverted based on an estimate of the transmittance (following Eq. 3)

These operations are compounded to

Two problems exist in this process. First, the estimation (i.e., separation) of airlight requires the parameter *p*. Secondly, compensation for attenuation requires the parameter *A*
_{∞}. Both of these parameters are generally unknown, and thus provide the incentive for this paper. In past studies that used polarization for dehazing, these parameters were estimated based on pixels which correspond to the sky near the horizon.

A method to calibrate the parameters when the sky is unseen was theoretically proposed in Ref. [23]. However, it has not demonstrated practical feasibility. It required the presence in the FOV of at least two different classes of multiple objects, each class having a similar (unknown) radiance in the absence of haze. For example, the classes can be buildings and trees. We found that method to be impractical. It can be difficult to identify *two* sets, each having a distinct class of similar objects. Actually, sometimes scenes do not have two such classes. Moreover, to ensure a significant difference between the classes, one should be darker than the other. However, estimating the parameter based on dark objects is prone to error caused by noise. Therefore, in practical tests, we found the theoretical skyless possibility mentioned in Ref. [23] to be impractical.

We now recap the assumptions underlying our model, as expressed by the above expressions. First, single scattering is dominant. Hence, blur and depolarization due to multiple scattering do not have a strong influence. Second, the distribution of scattering particles in the scene is spatially homogeneous. In reality, these assumptions do not hold accurately, particularly as this paper deals with methods to use in field experiments. Therefore, they are used as an approximation model, that yields algorithms having practical effectiveness in the field. The errors of this approximation may cause small but noticeable deviations, as discussed in Sec. 6.

## 3. Skyless parameter calibration and dehazing

This section introduces several methods to recover the environmental parameters *p* and *A*
_{∞} when the sky is not in view. The method presented in Sec. 3.1 requires the use of just a single class of objects residing at different distances. The consecutive methods assume that the parameter *p* is known. This parameter can be *blindly* derived by a method described in Sec. 4. Consequently, there is reduction of the information needed about objects and their distances. The method presented in Sec. 3.2 only requires the relative distance of two areas in the FOV, regardless of their underlying objects. The method described in Sec. 3.3 requires two similar objects situated at different, but not necessarily known distances. Table 1 summarizes the requirements of each of these novel methods.

#### 3.1. Distance-based dehazing

In this section, we develop a method for estimating *p* and *A*
_{∞} based on known distances to similar objects in the FOV. An idea to estimate atmospheric parameters by marking selected scene points was suggested in [20]. Motivated by this idea, suppose we can mark two scene points (*x _{k},y_{k}*),

*k*= 1,2, which, in the absence of scattering, would have a similar (unknown) radiance. For example, these can be two similar buildings which have an unknown radiance

*L*

^{build}. The points, however, should be at different distances from the camera

*z*

_{2}>

*z*

_{1}. For example, the two circles in Fig. 1(a) correspond to two buildings, situated at known distances of 11km and 23km. Using Eqs. (1,2,3,6), the image values corresponding to the object at distance

*z*

_{1}are

Similarly, for the object at distance *z*
_{2},

Let us denote

It can be shown that *C*
_{2} > *C*
_{1}. Note that *C*
_{1} and *C*
_{2} are known, since *I*
^{max}
_{k} and *I*
^{min}
_{k} constitute the acquired data at coordinates in the FOV.

where

Dividing Eq. (19) by Eq. (20) yields the following constraint

Let a zero crossing of *G*(*V*) be at *V*
_{0}. We now show that based on this *V*
_{0}, it is possible to estimate *p* and *A*
_{∞}. Then, we prove the existence and uniqueness of *V*
_{0}.

Estimation of the parameters is done in the following way. It can be shown that

and

Following Eqs. (5,23,24), an estimate for *A*
_{∞} is obtained by

Thus, Eqs. (25,26,27) recover the required parameters *p* and *A*
_{∞}. Two known distances of similar objects in the FOV are all that is required to extract parameters used for polarization-based dehazing, when the sky is not available.

Let us now prove the existence and uniqueness of *V*
_{0}.

*G*|_{V=0}> 0, since*C*_{2}>*C*_{1}.*G*|_{V=1}= 0. This root of*G*is not in the domain.- The function
*G*(*V*) has only one extremum. The reason is that its derivative$$\genfrac{}{}{0.1ex}{}{\partial G}{\partial V}={z}_{2}{C}_{1}{V}^{{z}_{2}-1}-{z}_{1}{C}_{2}{V}^{{z}_{1}-1}$$is null only when

- This extremum is a minimum. It can be shown that
*∂*^{2}*G*/*∂V*^{2}> 0.

Due to these facts, Eq. (22) always has a unique root at *V*
_{0} ∈ (0,1). Typical plots of *G*(*V*) are shown in Fig. 3. Due to the simplicity of the function *G*(*V*), it is very easy to find *V*
_{0} using standard tools (e.g., Matlab).

The solution *V*
_{0} can be found even when *z*
_{1} and *z*
_{2} are only *relatively* known, i.e., it is possible to estimate the parameters *A*
_{∞} and *p* based only on the relative distance *z*̃ = *z*
_{2}/*z*
_{1}, rather than absolute distances. For example, in Fig. 1(a), *z*̃ = 2.091. Denote

Then, Eq. (22) is equivalent to the constraint

Similarly to (22), also Eq. (31) has a unique root at *V*̃_{0} ∈ (0,1). Hence, deriving the parameters is done similarly to Eqs. (25,26,27). Based on *V*̃, *A*
_{∞} is estimated as

Similarly to Eq. (26),

Then, Eq. (27) yields *p̂*.

Based on these parameters, Eqs. (11,12) yield *Â*(*x,y*) and *t̂*(*x,y*). Then *L̂*^{object}(*x,y*) is derived using Eq. (13), for the entire FOV. This dehazing method was applied to Scene 1, as shown in Fig. 1(e). There is a minor difference between Figs. 1(b) and 1(e). This is discussed in Sec. 7.

The absolute distances *z*
_{1},*z*
_{2} or their ratio *z*̃ can be determined in various ways. One option is to use a map (this can be automatically done using a digital map), assuming the camera location is known. Relative distance can be estimated using the apparent ratio of two similar features that are situated at different distances. Furthermore, the absolute distance can be estimated based on the typical size of known objects.

#### 3.2. Distance-based dehazing, with known p

In some cases, similar objects at known distances may not exist in the FOV, or may be hard to find. Then, we cannot use the method presented in Sec. 3.1. In this section we overcome this problem, by considering two regions at different distances *z*
_{1} < *z*
_{2}, regardless of the underlying objects. Therefore, having knowledge of two distances of arbitrary areas is sufficient. The approach assumes that the parameter *p* is known. This knowledge may be obtained by a method we describe in Sec. 4, which is based on ICA. Based on a known *p* and on Eq. (11), *Â* is derived for every coordinate in the FOV. As an example, the estimated airlight map *Â* corresponding to Scene 1 is shown in Fig. 4. The two rectangles represent two regions, situated at distances *z*
_{1} and *z*
_{2}. Note that unlike Sec. 3.1, there is no demand for the regions to correspond to similar objects.

For regions around image coordinates (*x*
_{1},*y*
_{1}) and (*x*
_{2},*y*
_{2}) having respective distances *z*
_{1} and *z*
_{2}, Eq. (34) can be written as

where *V* is defined in Eq. (21). It follows from Eq. (35) that

and

Dividing Eq. (36) by Eq. (37) yields the following constraint

where

Recall that *Â* is known via Eq. (11), since here *p* is known. Hence, *α* can be calculated. Since *z*
_{1} < *z*
_{2}, then *α* > 0.

Similarly to (22), Eq. (38) has a unique solution at *V*
_{0} ∈ (0,1). We now prove the existence and uniqueness of *V*
_{0}.

*G*|_{p}_{V=0}< 0, since (-2*α*) < 0.*G*|_{p}_{V=1}= 0. This root of*G*is not in the domain._{p}- The function
*G*(_{p}*V*) has only one extremum: its derivative is null only once. - This extremum is a maximum. It can be shown that
*∂*^{2}*G*/_{p}*∂V*^{2}< 0.

Due to these facts, Eq. (38) has a unique root at *V*
_{0} ∈ (0,1). Typical plots of *G _{p}*(

*V*) are shown in Fig. 5. Based on Eq. (35),

Similarly to Sec. 3.1, it is possible to estimate *A*
_{∞} based only on the relative distance *z*̃ = *z*
_{2}/*z*
_{1}, rather than absolute ones. Then,

where *V*̃ is defined in (30). Eq. (41) has a unique solution *V*̃_{0} ∈ (0,1). Based on Eqs. (35),

This dehazing method was applied to Scene 1, as shown in Fig. 1(d).

#### 3.3. Feature-based dehazing, with known p

In this section we describe a method to estimate *A*
_{∞}, based on identification of two similar objects in the scene. As in Sec. 3.1, these can be two similar buildings which have an unknown radiance *L*
^{build}. Contrary to Sec. 3.1, the distances to these objects are not necessarily known. Nevertheless, these distances should be different. As in Sec. 3.2, this method is based on a given estimate of *p̂*, obtained, say, by the BSS method of Sec. 4. Thus, an estimate of *Â*(*x,y*) is at hand.

It is easy to show [23] from Eqs. (11,12,13) that

where

is a constant for these objects. Buildings at different distances have different intensity readouts, due to the effects of scattering. Therefore, they have different values of *I*
^{total} and *A*. According to Eq. (43), *Î*^{total} as a function of *Â* forms a straight line. Such a line can be determined using two data points. Extrapolating the line, its intercept yields the estimated radiance value *L̂*^{build}. Let the slope of the fitted line be *S*
^{build}. We can now estimate *A*
_{∞} as

Based on *̂A*_{∞} and *p̂*, we can recover *L̂*^{object}(*x,y*) for all pixels, as explained in Sec. 3.1. As an example, the two circles in Fig. 1(a) mark two buildings residing at different distances. The values of these distances are ignored, as if they are unknown. The corresponding dehazing result is shown in Fig. 1(c).

## 4. Blind estimation of *p*

Both Secs. 3.2, 3.3 assume that the parameter *p* is known. In this section, we develop a method for blindly estimating *p*. First, note that Eq. (13) can be rewritten as

This is a nonlinear function of the raw images *I*
^{max} and *I*
^{min}, since they appear in the denominator, rather than just superimposing in the numerator. However, the image model illustrated in Fig. 2 has a linear aspect: in Eqs. (4,10), the sum of the two acquired images *I*
^{min},*I*
^{max} is equivalent to a linear mixture of two components, *A* and *D*. This linear interaction makes it easy to use tools that have been developed in the field of ICA for linear separation problems. This section describes our BSS method for hazy images. The result of this BSS yields *p̂*.

#### 4.1. Facilitating linear ICA

To facilitate linear ICA, we attempt to separate *A*(*x,y*) from *D*(*x,y*). ICA relies on independence of *A* and *D*. Thus, we describe a transformation that enhances the reliability of this assumption. From Eq. (9), the two acquired images constitute the following equation system:

where

Thus, the estimated components are

where

Eqs. (47,49) are in the form used by linear ICA. Since *p* is unknown, then the mixing matrix **M** and separation matrix **W** are unknown. The goal of ICA in this context is: given only the acquired images *I*
^{max} and *I*
^{min}, find the separation matrix **W** that yields “good” *Â* and *D̂*. A quality criterion must be defined and optimized. Typically, ICA would seek *Â* and *D̂* that are statistically independent (see [44–47]).Thus, ICA assumes independence of *A* and *D*. However, the airlight *A* always increases with the distance *z*, while *D* tends to fall, in general, with *z*. Thus, there is a negative correlation between *A* and *D*. To observe this, consider the hazy **Scene 2**, shown in Fig. 6. The negative correlation between *A* and *D*, corresponding to this scene is seen in Fig. 7. There are local effects that counter this observation, in places where the inherent object radiance *L*
^{object} increases with *z*. Thus, the significant negative correlation mentioned
above occurs mainly in the *lowest* spatial frequency components: *D* decays with the distance only *roughly*. On the other hand, in some frequency components we can expect significant independence (Fig. 7).

Hence, the assumption underlying ICA is enhanced by transforming the images to a representation that is more appropriate than raw pixels. We work only with linear transformations as in [48], since we wish to maintain the linear relations expressed in Eqs. (47)–(50). There are several common possibilities for linear band-pass operations. We opted for a wavelet transformation (see for example [49]), but the derivation is not limited to that domain. Define

as the wavelet (or sub-band) image representation of *D*. Here *c* denotes the sub-band channel, while 𝓦 denotes the linear transforming operator. Similarly, define the transformed version of *A*,*Â*,*D̂*,*I*
^{max} and *I*
^{min} as *A _{c}*,

*Â*

_{c},

*D̂*

_{c},

*I*

^{max}

_{c}and

*I*

^{min}

_{c}, respectively (see example in Fig. 7). Due to the commutativity of linear operations,

where **W** is the same as defined in Eq. (50).

We now perform ICA over Eq. (52). As we shall see in the experiments, this approach is very effective. The assumption of statistical independence in sub-band images is powerful enough to blindly deliver the solution. In our case, the solution of interest is the matrix **W**, from which we derive *p*. Based on *p*, the airlight is estimated, and can then be separated from *D*(*x,y*), as described in Sec. 2.

#### 4.2. Scale insensitivity

When attempting ICA, we should consider its fundamental ambiguities [46]. One of them is *scale*: if two signals are independent, then they remain independent even if we change the scale of any of them (or both). Thus, ICA does not reveal the true scale of the independent components. A special case of scale ambiguity is the *sign* ambiguity, for which the scale is -1. This scale (and sign) ambiguity can be considered both as a problem, and as a helpful feature. The problem is that the estimated signals may be ambiguous. However, in our case, we have a *physical model* behind the mixture formulation. As we shall see, this model eventually disambiguates the derived estimation. Moreover, *we benefit* from this scale-insensitivity. As we show in Sec. 4.3, the fact that ICA is insensitive to scale simplifies the intermediate mathematical steps we take [50].

#### 4.3. Optimization criterion

Minimization of statistical dependency is achieved by minimizing the mutual information (MI) of the sources. The MI of *Â*_{c} and *D̂*_{c} can be expressed as (see for example [51])

Here 𝓗_{Â̂c} and 𝓗_{D̂̂c} are the marginal entropies of *Â̂*_{c} and *D̂̂*_{c}, respectively, while 𝓗_{Â̂c,D̂̂c} is their joint entropy. However, estimating the joint entropy from samples is an unreliable calculation. Therefore, it is desirable to avoid joint entropy estimation. In the following, we bypass direct estimation of the joint entropy, and in addition we describe other steps that enhance the efficiency of the optimization.

Let us look at the separation matrix **W** (Eq. 50). Its structure implies that up to a scale *p*, the estimated airlight *Â* is a simple difference of the two acquired images. Denote *Ã _{c}* as an estimation for the airlight component

*Â*, up to this scale

_{c}Similarly, denote

as the estimation of *D̂ _{c}* up to a scale

*p*, where here

Hence, the separation matrix of *D̂ _{c}* and

*Â*is

_{c}Minimizing the statistical dependency of *Â _{c}* and

*D̂*means that

_{c}*Ã*and

_{c}*D̃*should minimize their dependency too. We thus minimize the MI of

_{c}*D̃*and

_{c}*Ã*,

_{c}as a function of *w*
_{1} and *w*
_{2}. This way, the number of degrees of freedom of **W**̃ is two. Minimizing Eq. (58) yields estimates *ŵ*_{1} and *ŵ*_{2} which are related to *w*
_{1} and *w*
_{2} by an unknown global scale factor. This aspect is treated in Sec. 4.4.

As mentioned, estimating the joint entropy is unreliable and complex. Yet, Eqs. (47,49) express pointwise processes of mixture and separation: the airlight in a point is mixed only with the direct transmission of the same point in the raw frames. Following [46], for pointwise mixtures Eq. (58) is equivalent to

Here 𝓗_{Imaxc,Iminc} is the joint entropy of raw frames. As such, its value is a constant set by the raw data, and hence does not depend on **W**̃. For this reason, we ignore it in the optimization process. Moreover, note from Eq. (54), that *Ã _{c}* does not depend on

*w*

_{1},

*w*

_{2}. Therefore, 𝓗

_{Ẫc}is constant and can also be ignored in the optimization process. Thus, the optimization problem we solve is simplified to

where the term log|*w*
_{2} + *w*
_{1}| expresses log|det(**W**̃)| for the matrix given in Eq. (57).

At this point, the following argument can come up. The separation matrix **W**̃ (Eq. 57) has essentially only one degree of freedom *p*, since *p* dictates *w*
_{1} and *w*
_{2}. Would it be simpler to optimize directly over *p*? The answer is no. Such a move implies that *p* = (*ŵ*_{1} +*ŵ*_{2})/2. This means that the scale of *ŵ*_{1} and *ŵ*_{2} is fixed to the true unknown value, and so is the scale of the estimated sources *D̂* and *Â*. Hence scale becomes important, depriving us of the ability to divide **W**̃ by *p*. Thus, if we wish to optimize the MI over *p*, we need to explicitly minimize Eq. (53). This is more complex than Eq. (60). Moreover, this requires estimation of 𝓗_{Âc}, which is unreliable, since the airlight *A* has very low energy in high-frequency channels *c*. Thus, minimizing Eq. (60) while enjoying the scale insensitivity is preferable to minimizing Eq. (53) over *p*.

#### 4.4. Back to polarization calibration

The optimization described in Sec. 4.3 (which will later be simplified in Sec. 4.5) yields estimates *ŵ*_{1} and *ŵ*_{2}. We now use these values to derive an estimate for *p*. Apparently, from Eq. (56), *p̂* is simply the average of *ŵ*_{1} and *ŵ*_{2}. However, ICA yields *ŵ*_{1} and *ŵ*_{2} up to a global scale factor, which is unknown. Fortunately, the following estimator

is invariant to that scale. This process is repeated in each color channel.

Once *p̂* is derived, it is used for constructing **W** in Eq. (50). Then, Eq. (49) separates the airlight *Â* and the direct transmission *D̂*. This recovery is *not* performed on the sub-band images. Rather, it is performed on the raw image representation, as in prior sky-based dehazing methods.

We stress that in this scheme, we bypass all inherent ICA ambiguities: permutation, sign and scale. Those ambiguities do not affect us, because we essentially recover the scene using a *physics-based method*, not a pure signal processing ICA. We use ICA only to find *p̂*, and this is done in a way (Eq. 61) that is scale invariant.

#### 4.5. Efficient optimization using a probability model

As written above, Eq. (60) yields *ŵ*_{1} and *ŵ*_{2}, from which *p̂* is subsequently derived. In this section, we take steps that further simplify the estimation of the cost function (60). This would allow for more efficient optimization.

Recall that we use sub-band images at various spatial frequencies. In natural scenes, sub-band images are known to be *sparse*. In other words, almost all the pixels in a sub-band image have values that are very close to zero. Hence, the probability density function (PDF) of a sub-band pixel value is sharply peaked at the origin. A PDF model which is widely used for such images is the generalized Laplacian (see for example [49])

where *ρ* ∈ (0,2) and *σ* are parameters of the distribution. Here *μ*(*ρ, σ*) is a normalization constant. The scale parameter *σ* is associated with the standard deviation (STD) of the distribution. However, we do not need this scale parameter. The reason is that ICA recovers each signal up to an arbitrary intensity scale, as mentioned. Thus, optimizing a scale parameter during ICA is meaningless. We can thus set a fixed unit scale (*σ* = 1) to the PDF in Eq. (62). This means that whatever *D̃ _{c}*(

*x,y*) is, its values are

*implicitly*re-scaled by the optimization process to fit this unit-scale model. Therefore, the generalized Laplacian in our case is

This prior of image statistics can be exploited for the entropy estimation needed in the optimization [53, 54]. Entropy is defined (see for example [51]) as

where 𝓔 denoted expectation. Substituting Eq. (63) into Eq. (64) and replacing the expectation with empirical averaging, the entropy estimate is

Here *N* is the number of pixels in the image, while *ν*(*ρ*) = log [*μ*(*ρ*)]. Note that **ν**(*ρ*) does not depend on *D̃ _{c}*, and thus is independent of

*w*

_{1}and

*w*

_{2}. Hence,

*ν*(

*ρ*) can be ignored in the optimization process. The generalized Laplacian model simplifies the optimization problem to

The cost function (66) is a simple expression of the variables. Recall that expressions such as Eq. (53) or (60) use entropies, which may imply complex estimation of probability density distributions based on image histograms. In contrast, Eq. (66) bypasses the need for entropies, densities and histograms: it relies directly on the pixel values *D̃ _{c}*(

*x,y*) in the sub-band images.

Eq. (66) appears to be relatively simple to optimize. However, we prefer a convex formulation of the cost function, as it guarantees a unique solution, which can be reached efficiently using gradient-based methods. Consider the following problem

$$\mathrm{where}\phantom{\rule[-0ex]{.2em}{0ex}}{\tilde{D}}_{c}={w}_{1}{I}_{c}^{max}+{w}_{2}{I}_{c}^{min}.\phantom{\rule[-0ex]{.2em}{0ex}}$$

This is an approximation of Eq. (66), in which *ρ* = 1. We explain in the appendix that this approximation is unimodal and convex. Furthermore, the appendix discusses why this approximation is reasonable. Thus, Eq. (67) is the core of our ICA optimization. For convex problems such as this, convergence speed is enhanced by use of local gradients. See [48] for the differentiation of the absolute value function.

#### A note about channel voting

In principle, the airlight DOP *p* should be independent of the wavelet channel *c*. However, in practice, the optimization described above yields, for each wavelet channel, a different estimated value *p̂*. The reason is that some channels better comply with the independence assumption of Sec. 4.1, than other channels. Nevertheless, there is a way to overcome poor channels. Channels that do not obey the assumptions yield a random value for *p̂*. On the other hand, channels that are “good” yield a consistent estimate. Hence the optimal *p̂* is determined by voting. Moreover, this voting is constrained to the range *p̂* ∈ [0,1], due to Eq. (8). Any value outside this range is ignored. As an example, the process described in this section was performed on Scene 1. The process yielded a set of *p̂* values, one for each channel. Fig. 8 plots the voting result as a histogram per color channel. The dominant bar in each histogram determines the selected values of *p̂*.

## 5. Inhomogeneous distance

To separate *A* from *D* using ICA, both must be spatially varying. Consider the case of a spatially constant *A*. This occurs when all the objects in the FOV are at the same distance *z* from the camera. In this case, 𝓗_{Ac} and 𝓘(*A _{c},D_{c}*) are null, no matter what the value of the constant

*A*is. Hence, ICA cannot derive

*p*here. Therefore, to use ICA for calibrating the DOP, the distance

*z*must vary across the FOV. Distance nonuniformity is also necessary in the other methods (not ICA-based), described in Sec. 3, for estimating

*p*and

*A*

_{∞}. We note that scenarios having laterally inhomogeneous

*z*are the most common and interesting ones. In the special cases where

*z*is uniform, dehazing by Eq. (13) is similar to rather standard contrast stretch: subtracting a constant from

*I*

^{total}, followed by global scaling.

## 6. Additional experiments and comparisons

In addition to the experiments that correspond to images displayed in the previous sections, we performed several additional experiments. The respective images are shown in Figs. 9,10 and 11. In all these figures, circles overlayed on the raw image mark buildings, while the overlayed rectangles mark arbitrary points at different distances. The prior dehazing literature [22, 23] estimates the parameters directly from the sky, at the time the images were acquired. This yields values *p*
_{sky} and *A*
^{sky}
_{∞}, respectively. We have done so also in the experiments shown in this paper. Then, we compare *p*
^{sky} with the value of *p̂* obtained by each of the methods and experiments described in this paper, per color channel. This comparison is shown in Table 2. A similar comparison is done with respect to *Â*_{∞} in Table 3. Per scene and per color, the values of *p̂* are quite similar to one another and to *p*
^{sky}. The absolute errors in the DOP are ≈ 1 – 3%. Since the DOP itself is given in percent, the relative deviations in the DOP are typically ≈ 5%. The relative deviations in *Â*_{∞} are ≈ 8%.

What can be the reason for these deviations, and how critical are they? Each estimation method can only be as good as its underlying assumptions. In this work, each of the methods is based on some assumptions, which are accurate up to a point. Let us consider specific methods described in this paper. Secs. 3.1 and 3.3 assume that two marked objects in the FOV, e.g., two buildings, have the *same* underlying radiance *L*
^{object}. However, this is a rough assumption. It can be expected that buildings at different geographical places built at different years would have a somewhat different reflectance. The deviation from equality of the object reflectance propagates to the numerical estimation of the airlight parameters.

Even *p*
^{sky} and *A*
^{sky}
_{∞} are based on an assumption. These sky values correspond to objects at an infinite distance from the camera. This assumes that atmospheric conditions and scattering effects are effectively uniform to *infinity*. However, along an infinite LOS, atmospheric parameters *do* change. Eventually, this LOS passes beyond the atmosphere, in outer space, due to the Earth curvature. The direct sky measurement method thus assumes that most effects accumulate within a finite *effective* distance. Hence, the sky-based measurement, which was used in the prior art [20, 23], is by itself an approximation, based on a rough assumption. Even within finite distances, the atmospheric effects may be slightly nonuniform. This somewhat affects each of the parameter estimation methods, as each is using sets of points at different distances. These effects and others make the resulting *p̂* and *Â*_{∞} in any estimator deviate from the true value and from the results obtained by other estimators. However, the results are rather close to each other, given the uncontrolled, outdoor field conditions.

Based on the estimated parameters, dehazing was performed. The variations in *p̂* and *Â*_{∞} affect the visual result of *L̂*^{object}, as seen in Figs. 1,6,9,10 and 11. Nevertheless, in each experiment, the result of each dehazing method is remarkably better than the original hazy image *I*
^{min}, which was taken using the best polarizer state. The contrast and color of distant objects (trees, red roofs) are recovered from their dull bluish data. Hence, the enhancement achieved by dehazing is significant, in any of the calibration methods. Dehazing is thus tolerant to parameter uncertainties that arise from small deviations from the assumptions.

## 7. Discussion

While on its own each dehazing result is much better than the raw data, it is interesting to determine which of the methods yields the “best visual” results. This is a perceptual matter, and it requires a psychophysical methodology for testing and grading. The choice between methods is complex since each of them is based on a different set of priors (distances, similar objects, independent components). Objective evaluation of the accuracy of each method would require very extensive empirical testing. This is required to capture the wide variability of natural conditions in a statistically meaningful way. Hence, it would be useful to conduct such an empirical study.

Multiple scattering events have two inhibiting effects. First, multiple scattering blurs the object. Second, this scatter depolarizes the airlight, limiting the ability to exploit polarization for estimation of the distance-dependent airlight and transmittance. However, the dehazing method we work with assumes that single-scattering events are dominant. This assumption may lose its validity at very long distances [18]. The reason is that light propagating for multiple attenuation lengths increases the chance of experiencing secondary scattering effects. It may thus be worth studying the limitations of the method, in this context.

A desirable task is to completely automate skyless calibration of the atmospheric parameters. A significant part of this paper deals with blind estimation of the the DOP _{p}. This was done by taking a mathematical approach (ICA), which has solid foundations. To complete the automation, the estimation of *A*
_{∞} should be blind. This is an important direction. It is worth pursuing adaptations of this work to other scattering modalities, such as underwater photography [6, 52].

## A. A convex formulation

In Sec. 4.5, the minimization problem of Eq. (66) is approximated using Eq. (67). In this appendix we explain this move. Recall that in numerical optimization it is preferable to use a convex formulation of the cost function.

First, note that *D̃ _{c}*(

*x,y*) is a convex function of

*w*

_{1}and

*w*

_{2}, as seen in the linear relation given in Eq. (55). Moreover, the term [- log |

*w*

_{2}+

*w*

_{1}|] in Eq. (66) is a convex function of

*w*

_{1}and

*w*

_{2}, in the domain (

*w*

_{2}+

*w*

_{1}) ∈ 𝓡

^{+}. The optimization search can be limited to this domain. The reason is that following Eq. (56), (

*ŵ*

_{2}+

*ŵ*

_{1}) = 2

*κp*, where

*κ*is an arbitrary scale arising from the ICA scale insensitivity. If

*κ*> 0, then (

*w*

_{2}+

*w*

_{1})∈ 𝓡

^{+}since by definition

*p*≥ 0. If

*κ*< 0, we may simply multiply

*κ*by -1, thanks to this same insensitivity. Hence, the overall cost function (66) is convex, if |

*Dį*|

_{c}^{ρ}is a convex function of

*D̃*.

_{c}The desired situation of having |*D̃ _{c}*|

^{ρ}convex occurs only if

*ρ*≥ 1. Apparently, we should estimate

*ρ*at each iteration of the optimization, by fitting the PDF model (Eq. 62) to the values of

*D̃*(

_{c}*x,y*). Note that this requires estimation of

*σ*as well. Such parameter estimation is computationally complex, however. Therefore, we preferred using an approximation and

*set*the value of

*ρ*, such that convexity is obtained. Note that

*ρ*< 1 for sparse signals, such as typical sub-band images. The PDF representing the sparsest signal that yields a convex function in Eq. (66) corresponds to

*ρ*= 1. Thus we decided to use

*ρ*= 1 (see also [53–55]).By this decision, we may have sacrificed some accuracy, but enabled convexity. In contrast to the other steps described in sections 4.1,4.3 and 4.5, the use of

*ρ*= 1 is an

*approximation*. How good is this approximation? To study this, we sampled 5364 different images

*D̃*(

_{c}*x,y*). These images were based on various values of

*p, c*and on different raw frames. Then, the PDF model (Eq. 62) was fitted to the values of each image. The PDF fit yielded an estimate of

*ρ*per image. A histogram of the estimates of

*ρ*over this ensemble is plotted in Fig. 12. Here,

*ρ*= 0.9 ± 0.3. It thus appears that the approximation is reasonable.

## Acknowledgments

We thank the reviewers for their useful comments. Yoav Schechner is a Landau Fellow - supported by the Taub Foundation, and an Alon Fellow. The research was done in collaboration with ElOp Ltd., and partially supported by the Israel Science Foundation (Grant 1031/08). It was conducted in the Ollendorff Minerva Center in the Elect. Eng. Dept. at the Technion. Minerva is funded through the BMBF

## References and links

**1. **N. S. Kopeika, *A System Engineering Approach to Imaging*, SPIE Press, Bellingham (1998).

**2. **R. T. Tan, N. Pettersson, and L. Petersson, “Visibility enhancement for roads with foggy or hazy scenes,” In Proc. IEEE Intelligent Vehicles Symposium 19–24 (2007).

**3. **R. C. Henry, S. Mahadev, S. Urquijo, and D. Chitwood “Color perception through atmospheric haze,” J. Opt. Soc. Am. A **17**, 831–835 (2000). [CrossRef]

**4. **J. S. Jaffe, “Computer modelling and the design of optimal underwater imaging systems,” IEEE J. Oceanic Eng. **15**, 101–111 (1990). [CrossRef]

**5. **D. M. Kocak, F. R. Dalgleish, F. M. Caimi, and Y. Y. Schechner, “A focus on recent developments and trends in underwater imaging,” MTS Journal **42**, 52–67 (2008).

**6. **Y. Y. Schechner and N. Karpel, “Recovery of underwater visibility and structure by polarization analysis,” IEEE J. Oceanic Eng. **30**, 570–587 (2005). [CrossRef]

**7. **P. C. Y. Chang, J. C. Flitton, K. I. Hopcraft, E. Jakeman, D. L. Jordan, and J. G. Walker, “Improving visibility depth in passive underwater imaging by use of polarization,” Appl. Opt. **42**, 2794–2803 (2003). [CrossRef] [PubMed]

**8. **D. B. Chenault and J. L. Pezzaniti, “Polarization imaging through scattering media,” In Proc. SPIE **4133**, 124–133 (2000).

**9. **S. G. Demos and R. R. Alfano, “Optical polarization imaging,” Appl. Opt. **36**, 150–155 (1997). [CrossRef] [PubMed]

**10. **X. Gan, S. P. Schilders, and M. Gu, “Image enhancement through turbid media under a microscope by use of polarization gating method,” J. Opt. Soc. Am. A **16**, 2177–2184 (1999). [CrossRef]

**11. **S. Harsdorf, R. Reuter, and S. Tönebön, “Contrast-enhanced optical imaging of submersible targets,” In Proc. SPIE **3821**, 378–383 (1999).

**12. **M. J. Raković, G. W. Kattawar, M. Mehrübeoğlu, B. D. Cameron, L. V. Wang, S. Rastegar, and G. L. Coté, “Light backscattering polarization patterns from turbid media: theory and experiment,” Appl. Opt. **38**, 3399–3408 (1999). [CrossRef]

**13. **S. P. Schilders, X. S. Gan, and M. Gu, “Resolution improvement in microscopic imaging through turbid media based on differential polarization gating,” Appl. Opt. **37**, 4300–4302 (1998). [CrossRef]

**14. **J. S. Tyo, M. P. Rowe, E. N. Pugh Jr., and N. Engheta, “Target detection in optically scattering media by polarization-difference imaging,” Appl. Opt. **35**, 1855–1870 (1996). [CrossRef] [PubMed]

**15. **J. S. Tyo, “Enhancement of the point-spread function for imaging in scattering media by use of polarization-difference imaging,” J. Opt. Soc. Am. A **17**, 1–10 (2000). [CrossRef]

**16. **K. M. Yemelyanov, S. S. Lin, E. N. Pugh Jr., and N. Engheta, “Adaptive algorithms for two-channel polarization sensing under various polarization statistics with nonuniform distributions,” Appl. Opt. **45**, 5504–5520 (2006). [CrossRef] [PubMed]

**17. **G. Horváth and D. Varjù, *Polarized Light in Animal Vision*, Springer-Verlag, Berlin (2004).

**18. **N. Shashar, S. Sabbah, and T. W. Cronin, “Transmission of linearly polarized light in seawater: implications for polarization signaling,” J. Exper. Biology , **207**, 3619–3628 (2004). [CrossRef]

**19. **R. Wehner, “Polarization vision a uniform sensory capacity?,” J. Exper. Biology **204**, 2589–2596 (2001).

**20. **F. Cozman and E. Kroktov, “Depth from scattering,” In Proc. IEEE CVPR, 801–806 (1997).

**21. **K. Tan and J. P. Oakley, “Physics-based approach to color image enhancement in poor visibility conditions,” J. Opt. Soc. Am. A **18**, 2460–2467 (2001). [CrossRef]

**22. **E. Namer and Y. Y. Schechner, “Advanced visibility improvement based on polarization filtered images,” In Proc. SPIE **5888**, 36–45 (2005).

**23. **Y. Y. Schechner, S. G. Narasimhan, and S. K. Nayar, “Polarization-based vision through haze,” Appl. Opt. **42**, 511–525 (2003). [CrossRef] [PubMed]

**24. **D. Miyazaki, M. Saito, Y. Sato, and K. Ikeuchi,“Determining surface orientations of transparent objects based on polarization degrees in visible and infrared wavelengths,” J. Opt. Soc. Am. A **19**, 687–694 (2002). [CrossRef]

**25. **V. Gruev, A. Ortu, N. Lazarus, J. V. der Spiegel, and N. Engheta, “Fabrication of a dual-tier thin film micropolarization array,” Opt. Express **15**, 4994–5007 (2007). [CrossRef] [PubMed]

**26. **N. Gupta, L. J. Denes, M. Gottlieb, D. R. Suhre, B. Kaminsky, and P. Metes, “Object detection with a field-portable spectropolarimetric imager,” Appl. Opt. **40**, 6626–6632 (2001). [CrossRef]

**27. **C. K. Harnett and H. G. Craighead, “Liquid-crystal micropolarizer array for polarization-difference imaging,” Appl. Opt. **41**, 1291–1296 (2002). [CrossRef] [PubMed]

**28. **J. S. Tyo, D. L. Goldstein, D. B. Chenault, and J. A. Shaw, “Review of passive imaging polarimetry for remote sensing applications,” Appl. Opt. **45**, 5453–5469 (2006). [CrossRef] [PubMed]

**29. **J. S. Tyo and H. Wei, “Optimizing imaging polarimeters constructed with imperfect optics,” Appl. Opt. **45**, 5497–5503 (2006). [CrossRef] [PubMed]

**30. **J. Wolfe and R. Chipman, “High speed imaging polarimeter,” In Proc. SPIE **5158**, 24–32 (2003).

**31. **L. B. Wolff, “Polarization camera for computer vision with a beam splitter,”J. Opt. Soc. Am. A **11**, 2935–2945 (1994). [CrossRef]

**32. **C. F. Bohren and A. B. Fraser, “At what altitude does the horizon cease to be visible?,” American Journal of Physics **54**, 222–227 (1986). [CrossRef]

**33. **D. K. Lynch, “Step brightness changes of distant mountain ridges and their perception,” Appl. Opt. **30**, 3508–3513 (1991). [CrossRef] [PubMed]

**34. **E. J. McCartney, *Optics of the Atmosphere: Scattering by Molecules and Particles*, John Willey & Sons (1975).

**35. **S. K. Nayar and S. G. Narasimhan, “Vision in bad weather,” Proc. IEEE ICCV , 820–827 (1999).

**36. **
For clarity of display, the images shown in this paper have undergone the same standard contrast stretch. This operation was done only towards the display. The algorithms described in the paper were run on raw, unstretched data. The data had been acquired using a Nikon D-100 camera, which has a linear radiometric response. The mounted zoom lens used with the camera was set to focal length of ≈ 200*mm*, except for Fig. 9 in which it was ≈ 85*mm*. The camera was pointed at or slightly below the horizon.

**37. **H. Farid and E. H. Adelson, “Separating reflections from images by use of independent component analysis,” J. Opt. Soc. Amer A **16**, 2136–2145 (1999). [CrossRef]

**38. **S. Shwartz, M. Zibulevsky, and Y. Y. Schechner, “Fast kernel entropy estimation and optimization,” Signal Processing **85**, 1045–1058 (2005). [CrossRef]

**39. **S. Umeyama and G. Godin, “Separation of diffuse and specular components of surface reflection by use of polarization and statistical analysis of images,” IEEE Trans. PAMI **26**, 639–647 (2004). [CrossRef]

**40. **D. Nuzilland, S. Curila, and M. Curila, “Blind separation in low frequencies using wavelet analysis, application to artificial vision,” In Proc. ICA, 77–82 (2003).

**41. **S. Shwartz, E. Namer, and Y. Y. Schechner, “Blind haze separation,” In Proc. IEEE CVPR **2**, 1984–1991 (2006).

**42. **E. H. Adelson, “Lightness perception and lightness illusions,” in *The New Cognitive Neuroscience*, 2nd ed. ch. 24 339–351, MIT Preess, Cambridge (2000).

**43. **R. A. Chipman, “Depolarization index and the average degree of polarization,” Appl. Opt. **44**, 2490–2495 (2005). [CrossRef] [PubMed]

**44. **S. J. Bell and T. J. Sejnowski, “An information-maximization approach to blind separation and blind deconvolution,” Neural Computation **7**, 1129–1159 (1995). [CrossRef] [PubMed]

**45. **J.-F. Cardoso, “Blind signal separation: statistical principles,” Proc. IEEE **86**, 2009–2025 (1998). [CrossRef]

**46. **A. Hyvärinen, J. Karhunen, and E. Oja, *Independent Component Analysis*, John Wiley and Sons, New York (2001). [CrossRef]

**47. **D. T. Pham and P. Garrat, “Blind separation of a mixture of independent sources through a quasi-maximum likelihood approach,” IEEE Trans. Signal Processing , **45**, 1712–1725 (1997). [CrossRef]

**48. **P. Kisilev, M. Zibulevsky, and Y. Y. Zeevi, “Multiscale framework for blind source separation,” J. Machine Learning Research **4**, 1339–1364 (2004).

**49. **E. P. Simoncelli, “Statistical models for images: Compression, restoration and synthesis,” In Proc. Conf. Sig. Sys. and Computers, 673–678 (1997).

**50. **
An additional ICA ambiguity is *permutation*, which refers to mutual ordering of sources. This ambiguity does not concern us at all. The reason is that our physics-based formulation dictates a special form for the matrix **W**, and thus its rows are not mutually interchangeable.

**51. **T. M. Cover and J. A. Thomas, *Elements of Information Theory*, John Wiley and Sons, New York (1991). [CrossRef]

**52. **T. Treibitz and Y. Y. Schechner, “Instant 3Descatter,” In Proc. IEEE CVPR 1861–1868 (2006).

**53. **P. Bofill and M. Zibulevsky, “Underdetermined blind source separation using sparse representations,” Signal Processing **81**, 2353–2362 (2001). [CrossRef]

**54. **M. Zibulevsky and B. A. Pearlmutter, “Blind source separation by sparse decomposition in a signal dictionary,” Neural Computation archive **13**, 863–882 (2001). [CrossRef]

**55. **Y. Li, A. Cichocki, and S. Amari, “Analysis of sparse representation and blind source separation,” Neural Computation **16**, 1193–1234 (2004). [CrossRef] [PubMed]