Estimation of physically realizable Mueller matrices from experiments using global constrained optimization

Jawad Elsayed Ahmad; Yoshitate Takakura

doi:10.1364/OE.16.014274

1. Introduction

Stokes-Mueller polarimetry is a powerful technique for quantitative investigations of the polarization properties of optical systems as well as for complex scattering targets [1]. Imaging Stokes-Mueller polarimeters are operative in many applications including remote sensing, dermatology, ophthalmology and industrial control [2, 3, 4]. They are generally composed of linear polarizers cascaded with rotating retarders and ended with a charge-coupled camera that measures intensity (Fig.1). Accordingly, the extraction of Mueller matrix coefficients becomes possible from raw data on condition that not less than 16 measurements have been recorded [5].

Conventional techniques used to estimate Mueller matrices are implemented with linear reconstruction algorithms: matrix inversion (MI) or pseudo-inversion (PI). They quest the solution that minimizes the residual error of either L ¹, L ², L ^∞ or Frobenius norm depending upon configuration. The major shortcoming within these conventional methods is their inadequacy to preserve the solution stability to perturbations and systematic errors. Therefore, in the presence of experimental noise, the situation regularly degenerates and reconstructed Mueller matrices often fail to be physically admissible [6]. In other words, these conventional methods might be effective to seek the solution that minimizes the residual error but fail to ensure that the obtained solution is robust.

Mathematically, for a stationary, Gaussian and independent noise, these inversions are considered as a maximum likelihood (ML) estimation of the Mueller matrix from experimental intensity data. ML estimators tend to amplify noise impact in reconstructed Mueller matrices.

This situation could be very critical especially when analyzing physical properties of optical devices which are usually represented by realizable Mueller matrices with error-free polarimeters. Perturbations in experimental measurements can produce inaccurate results for different physical properties of analyzed optical and scatterers elements [7, 8]. Thus, analyzed optical system physical values namely retardance, diattenuation and polarizance risk to be incorrectly estimated [9]. For that reason, it is essential to incorporate further information about the solution in order to determine the most probable physical characterization of optical targets under test. This will certainly improve the quality of analysis and understanding of complex systems.

In reference [10] a considerable effort was dedicated to the analysis of the physical realizability of experimental Mueller matrices. Their work was based on a confidence ratio that can be attributed to each experimental Mueller matrix. This level of confidence was calculated using error propagation functions extracted from the criterion formulated by Givens and Kostinski [11]. Their work can be considered reliable only for testing a given Mueller matrix. No further improvement was considered to extract a physical Mueller matrix from the inadmissible Mueller matrix under test. Error filtering from inadmissible Mueller matrices was analyzed by authors of Refs. [12, 13]. The filtering technique was based on eliminating all the negative eigenvalues present in the hermitian coherency matrix associated with the inadmissible Mueller matrix. In another terms, negative eigenvalues were associated with a depolarizing phenomenon and the filtering method has proposed to eliminate the depolarization term causing inadmissibility, as far as deterministic Mueller matrices are concerned. Thus by replacing each negative eigenvalue by zero, reconstruction of an admissible Mueller matrix can be achieved.

Recently, in the context of imposing admissibility to a reconstructed Mueller matrix, an interesting paper was published [14]. Authors of reference [14] have inferred an admissible Mueller matrix by a maximum likelihood method applied to the corresponding hermitian matrix. Their assumption reposes on the fact that an admissible Mueller matrix has its corresponding hermitian matrix positive semi-definite. In fact, at the end of section (5.2) we will illustrate by an experimental example that an admissible Mueller matrix can have a negative eigenvalue in its corresponding hermitian matrix, which is totally in compliance with the paper written by authors of reference [15].

In the present work we aim to provide an efficient method for including proper side active constraints that lead to the most probable physical solution inferred from experimental data. We show that including the Givens and Kostinski (GK) criterion as active constraints in the optimization yields to stabilized physical solution. Constrained global optimization techniques are employed to ensure proper convergence.

2. Experimental and theoretical considerations

2.1. Polarimetric system overview

Because of the mathematical nature of the Mueller matrix/image, it is impossible to measure its elements directly. Instead, these elements can be retrieved from at least 4×4 intensity measurements through different polarization generated and analyzed states. Let P be the complete matrix that characterizes the Stokes parameters of the polarization state generator (PSG) and let A be the complete matrix representing an elliptical diattenuator known as polarization state analyzer (PSA). The expected intensity matrix recorded by the observation system is given by:

I_{t} \propto AMP .

Intensity data I _t are obtained from averaging on pixels corresponding to the entrance pupil of the camera. Such values should be representative of the radiance received by the observation system. When the system is calibrated, i.e., A and P are known, M can be easily determined from at least 16 measurements. Practically, it is recommended to have an over-determined system of measurements [16]. A pseudo-inversion (PI) method can be used instead of the normal matrix inversion. Writing relation (1) in a more compact form we get: I_t ∝ B m_l, where m_l is a reshape column vector of the Mueller matrix. The elements of matrix B are obtained by identification with relation (1), they are directly connected to the Kronecker product of the analyzer + polarizer Stokes parameters, B=P ^T ⊗A.

Fig. 1. Classical imaging polarimeter [17]. LS, incoherent light source; F, filter; P_H,V, horizontal and vertical linear polarizers; L_1,2, rotating retardation plates; IF, interferential filter.

Download Full Size | PDF

A conventional PI operation is characterized by minimizing the error function of leastsquares distance [18]. Let ε be the error between the expected intensities and experimental measurements ε=I_t-I_e=B m_l-I_e. The minimization of the least-squares error is done by computing the solution vector m_l that minimizes the error module norm ‖ε‖²=ε ^T ε. Hence, we can write:

ε^{T} ε = {(B m_{l} - I_{e})}^{T} (B m_{l} - I_{e})

Differentiating the above equation with respect to m_l we get B ^T B _ml-B ^T I_e=0, thus the choice of the matrix that minimizes the least-squares error is:

m_{l} = {(B^{T} B)}^{- 1} B^{T} I_{e} = B^{+} I_{e}

where B ⁺ is the pseudo-inverse of matrix B. One clearly notices that the linear algorithm described above is devoted exclusively to calculate the solution that minimizes the residual error norm ‖B m_l-I_e‖. This conventional approach cannot guarantee the admissibility of the solution because this specification was not integrated within optimization.

2.2. Physical realizability

Let the column vector S=[s _ο, s ₁, s ₂, s ₃]^T represent an arbitrary Stokes vector. A given 4×4 real matrix is considered a physical Mueller matrix if for each possible incident physical Stokes vector S_in the emergent Stokes vector S_e=M S_in is also physical [19], i.e., it satisfies the following inequality:

s_{0} \geq 0, s_{0}^{2} - (s_{1}^{2} + s_{2}^{2} + s_{3}^{2}) \geq 0 .

The problem of finding a necessary and sufficient condition for determining whether a given matrix M is a realizable Mueller matrix has been addressed by many authors [11, 19, 20]. Among them, the criterion formulated in Ref. (11) states that a necessary and sufficient condition for the matrix M ∈ℝ^4×4 to be an admissible Mueller matrix if and only if the spectrum of GM ^T GM is real and the eigenvector S _σ1 associated with the largest eigenvalue is a physical Stokes vector. The matrix G is the Lorentz metric written in the form of a unitary rotation tensor in Minkowski space-time, it is defined as diag(1, -1, -1, -1).

Posing [S,∑k]=eig(GM ^T GM), where ∑_k=1...4=diag(σ₁,σ₂,σ₃,σ₄) has been arranged in the descending order of eigenvalues. The rigorous extraction of an admissible Mueller matrix can be realized by implementing a non-linear constrained optimization instead of simple matrix inversion.

This constrained minimization procedure to extract a physical Mueller matrix from intensity data will take the following form:

{min_{m_{l}} ∥ B m_{l} - I_{e} ∥}^{2}

subject to : {\begin{matrix} imag {(Σ_{k})}_{k = 1 . . . 4} = 0; \\ S_{σ_{1}}^{T} G S_{σ_{1}} \geq 0; \\ s_{0, σ_{1}} \geq 0 . \end{matrix}

The first equality that has been imposed ensures that all eigenvalues are real. The other two inequalities are employed to guarantee that the eigenvector S _σ1 corresponding to the largest eigenvalue is a physical Stokes vector. They are direct implementation of relation (4). In the remaining part of the paper we will denote the equality constraint by h(M) and the two inequality constraints by g_j(M) where j={1,2}.

3. Global optimization setups

3.1. Initialization process

Finding a constrained global minimum of (5) is challenging as well as difficult. First, the large number of variables of the Mueller matrix make it difficult to reach a global minimum in a reasonable computational time. Moreover, feasible regions in a 16-dimensional space may be disjoint and the search may need to visit multiple feasible regions before finding the global minimum. Furthermore, the minima points satisfying the constraints are not necessarily the global minima of the original problem. In order to decrease error impact on raw data and also to decrease computation time, this minimization approach needs an initial estimate. It is preferable to be physically acceptable and near the optimum solution. This initial estimate is calculated first by finding the hermitian coherency matrix H corresponding to the Mueller matrix extracted from the PI method. This coherency matrix is then decomposed using target decomposition [21, 31] into H=UDU ^†, where D is a diagonal matrix arranged in the descending order of positive eigenvalues, U is a unitary matrix of corresponding eigenvectors and U ^† is its transpose conjugate.

In the case where the original Mueller matrix is not admissible, the target decomposition of its corresponding coherency matrix contains at least one negative eigenvalue in D. We suggest setting this value to zero and recalculating the matrix H _i and to recalculate from this new coherency matrix, the Mueller matrix M _i which could be used as an initial estimate.

Once we have managed to start running the optimization routine from a feasible region, it appears that the first equality constraint mentioned in relation (6) can be held inactive and solution will not be altered. Nevertheless, to be coherent with theory, we have kept this equality constraint activated in all experiments.

3.2. Sequential quadratic programming

We have implemented at the first place a deterministic constrained optimization based on the sequential quadratic programming (SQP) method [22]. In this method, a quasi-Newton approach is used between consecutive iterations and Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula is maintained to approximate the Hessian. A line search is performed using Han’s merit function to determine the step length at each iteration [23].

Two major drawbacks emerge within Newton-like or SQP methods. First, they require the derivatives of both objective and constraints functions, this will restrict them to continuous differentiable functions. Thus treating discrete problems is subjected to the accuracy of several update formulae (interpolations) used to approximate first and second order derivatives. As far as Mueller matrix is concerned, these methods may frequently get trapped at local minima.

To circumvent from local minima, Gaussian perturbations were added to the attained minimum. The new matrix has been considered as the starting point for another SQP iteration. This operation is done until convergence to the same optimum for consecutive SQP iterations. Unfortunately results were not always satisfactory, convergence frequently occurs at a local minimum and a global minimum satisfying the realizability condition was rarely attained. The identified reason is that the number of local optimum (pseudo-solution) in the Mueller space is quite large and thus derivatives search methods based on local hill-climbing optimization become less effective. Moreover, this suggested iterative SQP technique becomes time-consuming especially since there do not exist any precise stop condition controlling the program flow.

3.3. Constrained simulated annealing

At this stage a new strategy orientation must be adopted to perform a global optimization that does not depend on derivatives search criterion. In this framework, using a stochastic approach based on the constrained simulated annealing (CSA) method is considered an alternate choice to SQP methods [24].

The proposed CSA algorithm consists of two major steps: generating trial points and accepting them based on some acceptance probability. To find a global minimum CSA seeks for a saddle point, in the joint space of the Mueller matrixMand the Lagrangian λ, satisfying both minimum objective function and constraints realization. The saddle point can be reached by carrying out probabilistic descents in the variable space (Mueller coefficients) and probabilistic local ascents in the Lagrange-multiplier space.

Without loss of generality, the CSA algorithm discussed in Algorithm 1 solves only equality constraints, in the form of h_i(M)=0, with the following augmented Lagrangian function:

ℒ (M, λ) = Residual (M) + λ^{T} ∣ h (M) ∣ + \frac{1}{2} ∥ h (M) ∥^{2}

since inequality constraints in the form g _j(M) ≤ 0 can always be transformed into equality constraints using a maximum function, g̃_j(M)=max(0,g_j(M))=0.

Algorithm 1 represents the proposed global search for solving Eq. (5). The algorithm starts by regrouping all necessary parameters to be initiated. A physical starting point X_i=(M _i,λ) is selected after calculating M _i from target decomposition and setting λ=0. At the beginning, optimum value X_opt is considered identical to X_i.

The control temperature T is initialized to be large enough in order to allow several search directions to be accepted at the beginning. In this setup, initial temperature is obtained first by calculating M _i and randomly generating 100 corresponding neighbors M′_i, then setting the initial temperature to take the maximum value between the Lagrangian and constraints functions:

T_{ini} = max (∣ ℒ ‒ (M_{i,}^{'} 1) - ℒ (M_{i,} 1) ∣, ∣ h (M_{i}) ∣) .

Our reasoning for choosing the initial temperature is based on the amount of violation observed in the problem. While the algorithm propagates, the temperature is reduced following a cooling schedule α after looping N_T times for the same temperature T. Theoretically, if T is reduced very slowly CSA will converge to a constrained global minimum [26]. Unfortunately, a very slowly cooled temperature is time-consuming and thus duration for treating a large number of inadmissible pixels may become extensively long. It appears that selecting a polynomial cooling schedule is very consistent and reliable when dealing with Mueller 16-variables space [27].

Algorithm 1 Constrained Simulated Annealing algorithm

Require: M⇐Mi;

Target decomposition;

Cooling rate: 0 < α < 1;

Starting temperature: Tini;

Number of trials per temperature: NT.

Stopping condition: Temperature ≪ 1 or Optimum is unchanged for several successive iterations

X_opt=(M,λ=0) and T=T_ini

1: while Stopping condition is not satisfied do

2: for k←1 to NT do

3: generate trial points X′=neighborhood(X_opt)

4: Ensure: feasible direction, X′ is physical

5: if 𝓛(X′) < 𝓛(Xopt) then

6: Accept Xopt=X′

7: else

8: Accept X′ if Pr(Xopt,X′) > rand[0,1]

9: end if

10: end for

11: update temperature by T←α×T

12: end while

After initialization, Algorithm 1 searches for a feasible neighbor around the optimal estimate X_opt. It searches first for a neighbor in the variable space by performing small perturbations to the Mueller matrix. If these perturbations were incapable of generating feasible points then it carries out probabilistic ascents of 𝓛(M,λ) with respect to λ for a fixed value of M in order to increase the penalty of violated constraints and to force them into satisfaction. A local examination in the Lagrangian space can be viewed as a global search in variable space.

We consider two cases in generating trial point X′=(M′,λ) or X′=(M,λ′). In the first case we set M′=M+α_τ ʘ N (0, σ_i), where α_τ is a 4×4 scaling matrix generated from a seed distribution that decreases with time; 𝒩(0,σ_i) is a varying Gaussian distribution with a randomly generated variance, and ʘ is the Hadamard product. The scaling matrix α _τ provides us with an adaptive varying step algorithm based on the experimental observation that the CSA algorithm needs larger searching steps at the beginning and as time passes the algorithm will approach a region near the optimum value. At this point, for a proper convergence, smaller steps in the searching directions will be more appropriate.

When generating a trial point X′=(M,λ′) we apply the following: λ′=λ+βψ, where β is a random variable uniformly generated in the range of [-1,1] and ψ is the maximum value of constraints violation. We set the ratio of generating (M′,λ) and (M,λ′) to be 20n to m, where n is the number of variables and m is the number of constraints [24], meaning thatMis updated more frequently than λ.

To this end, once a feasible neighbor is attained, CSA routine propagates similarly to classical SA routines but here the so-called energy cost function is replaced by an augmented Lagrangian function. The implemented CSA algorithm uses the Metropolis probability selection criterion controlled by temperature and the objective cost function, as in relation (9). The routine propagates as follows: given current solution X_opt, it generates a physical trial candidate X ′ in the vicinity of Xopt. If 𝓛(X′) < 𝓛(X_opt) then X′ is accepted with a probability of one as a starting point for the next iteration. Otherwise, X′ is accepted with probability:

P_{r} (X_{opt}, X') = {\begin{matrix} \exp (- \frac{ℒ (X') - ℒ (x_{opt})}{T}) & if & X' = (M', λ) . \\ \exp (- \frac{ℒ (X_{opt}) - ℒ (X')}{T}) & if & X' = (M, λ') \end{matrix}

which depends on whetherM or λ is changed in X′.

Finally, the stop conditions that govern the algorithm occur when the temperature becomes very small or when the optimum is kept unchanged for a couple of successive iterations at the same temperature.

4. Simulated case study

4.1. Modified Shepp-Logan phantom

In light of the analytical approach detailed in the previous section, a numerical code, that minimizes the residual error norm subjected to the physical admissibility constraints explicated in relation (6), was developed.

An interesting method to illustrate the validity of either the SQP or CSA techniques is by processing Mueller images instead of a single Mueller matrix. This will drive us to iterate the algorithm on each pixel of the image. In this section, presented results only concern the CSA technique described earlier.

Let us consider a simulated phantom composed of four different regions, each of them corresponds to a linear polarizer with a polarization angle of φ with respect to the vertical optical axis, Fig. 2.

Fig. 2. Modified Shepp-Logan phantom. Each color corresponds to different linear polarizer angle.

Download Full Size | PDF

Thus from each pixel a Mueller matrix can be generated [25] with:

M_{ϕ} = [\begin{matrix} 1 & \cos 2 ϕ & \sin 2 ϕ & 0 \\ \cos 2 ϕ & \cos^{2} 2 ϕ & \cos 2 ϕ \sin 2 ϕ & 0 \\ \sin 2 ϕ & \cos 2 ϕ \sin 2 ϕ & \sin^{2} 2 ϕ & 0 \\ 0 & 0 & 0 & 0 \end{matrix}] .

Fig. 3. Mueller matrix of the modified Shepp-Logan phantom with additive Gaussian noise.

Download Full Size | PDF

Fig. 4. Top view of the Poincaré sphere. The four linear polarizers are represented by small colored spheres where their centers are located at an angle of 2ϕ from the S ₁ axis. Left image: results without constrained optimization. Right image: results when employing CSA, all the points that were lying outside the Poincaré sphere are now translated into inside.

Download Full Size | PDF

From this Mueller image, synthesized intensities can be extracted from relation (1). Gaussian noise with zero mean and known standard deviation is then added to these intensities and the noise-affected intensities are injected back to retrieve a simulated Mueller image by performing a matrix inversion or a pseudo-inversion if the number of different orientation angles of the two wave plates was more than 16 combinations. Subsequently, all pixels of the noise-affected Mueller image are mapped on the Poincaré sphere, Fig. 4(a).

We can notice in Fig. 4 that added Gaussian noise transforms each region to a three-dimensional sphere centered at the theoretical location of each polarizer angle. The volume of each sphere is related to the noise variance. Furthermore, half of each sphere protrudes the Poincaré’s sphere physical boundaries. Hence, this part represents the location of inadmissible matrices. Figure 4(a) maps the noise impact of the first column of the Mueller matrix. In fact, to reflect the noise impact on all the Mueller matrix elements a surface plot is needed [28]. More precisely, for a noise with SNR≈30 dB, which is considered very modest, 84% of pixels inside the simulated Mueller image become outliers (physically inadmissible). Running a CSA routine will transform all these outliers to admissible pixels.

Fig. 5. Two extreme cases of possible transitions. Point A will be transformed to point A′ which will induce error reduction. Point B will be transformed to B′ which will result an error increase.

Download Full Size | PDF

4.2. Physical interpretation

After running the CSA minimization algorithm we can notice that the 3D spheres, corresponding to the intensity-added Gaussian perturbations, are then transformed to hemispheres with all inadmissible points lying in the outer part of the Poincaré sphere mapped to points lying inside the inner hemisphere. This will directly result in a reduction of the error volume occupied by each polarizer.

When adding Gaussian noise to ideal intensities with a signal-to-noise ratio of SNR≈30dB, Frobenius norm of the error between theoretical and synthesized data is about 8.30%. The CSA algorithm reduces such an error to 7.78%: an improvement can be noticed.

The physical interpretation of this phenomenon resides in studying two extreme cases based on the first column of the Mueller matrix (Fig. 5): a point A lying inside the outer hemisphere and having high error percentage (Euclidean distance d1) compared to the theoretical point C could be transformed to a point A′ inside the physical hemisphere. This operation will result in an error decrease between the admissible and the inadmissible Mueller matrices. On the contrary, CSA could transform an inadmissible point B with modest percentage error to a new admissible point B′ with larger percentage error.

To that end, if we sum up all possible combinations, we can expect that the CSA algorithm will not increase the error percentage of the experiment if we have only 50% of outliers. In fact, we have about 84% of outliers and thus when these outliers are transformed into inliers we have observed an error decrease. The reason is that CSA transforms outliers to an inner hemisphere with a smaller radius than a Gaussian distribution with a SNR of 30 dB.

On the other hand, SQP risks to increase the error percentage because it is not as accurate as CSA so there exist cases where a given inadmissible point has been transformed to a point inside the Poincaré sphere to a position very far from its corresponding class. The consecutive accumulation of such observations will increase the solution error percentage.

Fig. 6. Convergence test of the CSA algorithm performed on the modified Shepp-Logan phantom. (a) Geometrical definition of the distance δ: which is the minimum distance from the center of the Poincaré sphere to the CSA solution region. (b) Plot between the added noise variance σ ² with respect to the distance δ; this graph shows that the CSA algorithm converges to an admissible solution (δ≈1) very near the theoretical solution for different levels of added noise.

Download Full Size | PDF

4.3. CSA convergence check

A critical issue concerning any numerical method is the convergence test. Such a test will allow us to validate the robustness of the proposed numerical method under investigation. Here, we have tested the robustness, thus the convergence of the CSA algorithm with different added Gaussian noise variance σ ². The test was carried out as follows: First, the modified Shepp-Logan phantom is used to generate the intensity data. Second, Gaussian noise with controlled variance is added to these intensity images. Third, we retrieve the noisy Mueller image from the noisy intensity measurements. Finally, we run the CSA algorithm for each Mueller matrix associated to each pixel within the Mueller image.

To this end, to illustrate the convergence of the CSA method we calculate for each class its minimum degree of polarization distance δ, it is the nearest distance from each class to the center of the Poincaré sphere. This distance can be directly interpreted as the worst convergent point of the algorithm. If this distance is very far from the center of the sphere, δ≈1, then the algorithm has attained the convergence. Otherwise the algorithm is not converging and thus the estimated Mueller matrix is quite wrong.

In Fig. 6(b), a plot illustrating the value of δ with respect to different levels of noise variance. For an added noise variance of σ ²=2×10^-3 the algorithm converges to δ=0.97. Even for large noise values in the order of σ ²=3×10^-2 the CSA algorithm is converging near the theoretical (optimum) solution, δ=0.92.

5. Experimental results

5.1. Mueller matrix of the air

We have verified the proposed approach on experimental intensity measurements; intensities were recorded following 8×8 different angular positions of the quarter–wave plates of the PSG and PSA respectively. The Mueller matrix M is then obtained by inverting relation (1), M=A# I _t P#. Where # represents the matrix inverse when A and P are square matrices, or it represents the matrix Pseudo-Inverse when A and P are rectangular matrices.

When no sample (the air) is placed between the PSG and the PSA, one should get the 4×4 identity matrix[29]. This identity matrix should be representative of the theoretical Mueller matrix of the air. Classical unconstrained PI methods lead to:

$M_{PI} = [\begin{matrix} 1.000 & - 0.000 & 0.019 & 0.001 \\ 0.004 & 0.996 & 0.018 & - 0.001 \\ 0.001 & 0.016 & 0.995 & 0.000 \\ - 0.002 & 0.006 & - 0.003 & 0.992 \end{matrix}] .$

The above experimental matrix is very close to the theoretical identity matrix. The Frobenius norm distance between these two matrices is 3.34% but this matrix, M _PI, is not admissible: ∑={1.021,0.997,0.985,0.963} and $S_{σ_{1}}^{T} G S_{σ_{1}} = - 0.747 .$

Now when we use both SQP and CSA techniques and activate the admissibility constraints mentioned in Eq. (6) we get:

$M_{opt} = [\begin{matrix} 1.004 & - 0.002 & 0.017 & 0.001 \\ 0.006 & 0.985 & 0.010 & - 0.002 \\ 0.003 & 0.012 & 0.980 & 0.001 \\ - 0.002 & 0.005 & - 0.002 & 0.980 \end{matrix}] .$

In this case both methods have attained the same optimum solution. The above matrix is physically acceptable: ∑={1.003,0.988,0.960,0.947} and $S_{σ_{1}}^{T} G S_{σ_{1}} = - 0.817 .$ The relative Frobenius norm distance ‖M _PI-M _opt‖/‖M _PI+M _opt‖ between the physical matrix and the inadmissible one is 0.62%. The error in the physical matrix is slightly larger than the inadmissible matrix. This result highlights the fact that there exists an error increase penalty when using global optimization. In fact, the experimental polarimeter that we dispose of is very accurate, in addition we have estimated the Mueller matrix based on 8×8 measurements which is more precise than methods based on 16 acquisitions. The contribution of this method in noise reduction is more appreciated if the signal-to-noise ratio is lower than the threshold of 30 dB highlighted in the previous section.

5.2. Howell’s matrix

We have also carried out constrained minimization on the Mueller matrix taken from Ref.(6). It models a radiometer-collimator system. The rounded three decimal digits Howell’s matrix is:

$M_{H} = [\begin{matrix} 0.760 & - 0.062 & 0.029 & 0.118 \\ - 0.057 & 0.469 & - 0.181 & - 0.186 \\ 0.038 & - 0.171 & 0.539 & 0.028 \\ 0.124 & - 0.217 & - 0.012 & 0.661 \end{matrix}] .$

The spectrum of this matrix is real ∑={0.669,0.559,0.335,0.068}but the eigenvector $S_{σ_{1}} = {(0.116, 0.593, - 0.370, - 0.705)}^{T}$ associated with the largest eigenvalue does not correspond to a physical Stokes vector, $S_{σ_{1}}^{T} G S_{σ_{1}} = - 0.973 .$

To estimate an admissible representation from Howell’s experiment, simulation intensities are extracted based on an assumption that the acquisition system is a rotating PSA+PSG polarimeter. These intensities are then artificially injected back to estimate an admissible matrix using SQP and CSA routines. More precisely, the retrieval process of an admissible estimation from Howell’s matrix is done as follows: firstly an appropriate choice of angle combinations of the two wave plate retarders is selected in such a way that relation (1) becomes well-conditioned which means in order for the matrices A and P to become invertible, i.e., B becomes invertible [30]. This is done by searching for an angle combination of the two retardation wave plates L₁ and L₂ that minimizes the equal weighted variance, EWV=Trace [(B ⁺)^T B ⁺]. A suitable choice of retardation orientation angles based on minimizing the EWV criterion will eliminate numerical errors that may arise from ill-conditioned matrix inversion.

Secondly synthesized intensity matrix, I_e=B m_h, is simulated. After that, minimization algorithm SQP or CSA are conducted searching for the optimum vector m_l that minimizes the residual error ‖B m_l-I_e‖ provided that the solution must not violate the admissibility conditions explicated in Eq. (6). The admissible estimate given to initiate the process is extracted from target decomposition in the same way as explained earlier in the paper.

In this representative case when using the SQP optimization method we get:

$M_{SQP} = [\begin{matrix} 0.815 & - 0.083 & 0.056 & 0.142 \\ - 0.092 & 0.428 & - 0.164 & - 0.142 \\ 0.090 & - 0.187 & 0.488 & 0.005 \\ 0.140 & - 0.204 & - 0.050 & 0.693 \end{matrix}] .$

As expected this matrix turns out to be realizable. The spectrum is real ∑={0.626,0.602,0.344,0.054}, and $S_{σ_{1}} = {(0.970, - 0.159, 0.072, 0.167)}^{T}$ is indeed a physical Stokes vector, $S_{σ_{1}}^{T} G S_{σ_{1}} = - 0.883 .$ It should be noted that the relative Frobenius norm distance between M _H andM _SQP is 5.2%.

When we carry out global optimization based on the CSA method, for Howell’s matrix we get:

$M_{opt} = [\begin{matrix} 0.815 & - 0.093 & 0.039 & 0.123 \\ - 0.063 & 0.447 & - 0.181 & - 0.180 \\ 0.047 & - 0.160 & 0.492 & 0.030 \\ 0.127 & - 0.210 & - 0.009 & 0.647 \end{matrix}] .$

We can easily show that this matrix is realizable, Fig.7. The spectrum is real ∑={0.634,0.626,0.294,0.059} and $S_{σ_{1}}^{T} G S_{σ_{1}} = 0.327 .$ An improvement can be noticed compared to the SQP method: the relative error between this matrix and the original M _H matrix is 3.3%. It is believed that by using the CSA technique we have attained a physical saddle point closer to the global minimum of the objective function than the SQP method.

By further investigation of the above matrices we can consider two major observations: both matrices M _SQP and M _opt are realizable according to the GK criterion but if we analyze their corresponding hermitian coherency matrices we notice that they contain negative eigenvalues. The eigenvalues of H _SQP and H _opt are respectively {0.6261,0.1790,0.0932,-0.0834} and {0.6151,0.1938,0.0774,-0.0714}. Which means that the admissibility condition proposed by authors of references [19] and [31] is a sufficient condition but not a necessary one. For example, it is possible to obtain an admissible Mueller matrix by adding three deterministic Mueller matrices with positive weights to a Mueller matrix of a particular depolarizer with negative weight.

It should be mentioned that the GK criterion has permitted to obtain constraints that are easy to undertake within the present framework. The reason is that all the optimization procedures have been conducted by means of a calculator and the GK criterion can be easily expressed in terms of numerical matrix diagonalization. However, it is known that all Mueller matrices do not satisfy the GK criterion [15]. For example, the matrix M_pr:

Fig. 7. Degree of polarization (DoP) map. (a) For Howell’s experiment: the DoP map shows that this matrix is physically inadmissible because its DoP surface protrudes the Poincaré sphere boundaries. (b) The estimated physical matrix retrieved by a CSA optimization. It is clear that the CSA algorithm has estimated an admissible Mueller matrix from Howell’s experiment without altering the overall matrix properties preserved in the shape and orientation of its DoP map.

Download Full Size | PDF

M_{pr} = \frac{1}{2} [\begin{matrix} 1 & - 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix}]

built by combining a polarizer and a rotator does not satisfy the GK criterion while it transforms any Stokes vector into another (eventually null) Stokes vector. It is therefore physically realizable. In fact, it works out that the GK criterion is not a necessary and sufficient condition: there is a whole class of Mueller matrices (the so-called type II) that cannot be considered. However, converting mathematical properties of such a class into a simple algorithm is not easy. A more global approach appears to be necessary. Work is in progress in that direction.

6. Conclusion

In conclusion, we have established the validity of employing constrained optimization routines to estimate physical Mueller matrices from intensity data. The constrained optimization based on the SQP method will not always converge to the optimum solution due to the complexity of the 16-dimensional hyperspace. In contrast, employing stochastic global optimization methods based on simulated annealing techniques instead of deterministic optimization techniques was satisfactory. Higher accuracy results are obtained over similar computational time. The CSA technique possesses the robustness and potential to go forth from local minima regions that may block Newton-like or SQP optimization algorithms.

Experimental investigations were performed and detailed CSA routine was discussed. When implementing constrained simulated annealing method on a synthesized phantom, optimization results featured a trustworthy stability. In addition, CSA algorithm transforms each inadmissible pixel/matrix within a Mueller image to a physical pixel/matrix while preventing the overall error level to cross the limits set by a hemisphere with a SNR of 30 dB.

References and links

1. J. S. Tyo, D. L. Goldstein, D. B. Chenault, and J. A. Shaw, “Review of passive imaging polarimetry for remote sensing applications,” Appl. Opt. 45, 5453–5469 (2006). [CrossRef] [PubMed]

2. J. M. Bueno and M. C. W. Campbell, “Confocal scanning laser ophthalmoscopy improvement by use of Mueller-matrix polarimetry,” Opt. Lett. 27, 830–832 (2002). [CrossRef]

3. A. Weber, M. Cheney, Q. Smithwick, and A. Elsner, “Polarimetric imaging and blood vessel quantification,” Opt. Express 12, 5178–5190 (2004). [CrossRef] [PubMed]

4. D. Miyazaki, K. Kagesawa, and K. Ikeuchi, “Transparent Surface Modeling from a Pair of Polarization Images,” IEEE Trans. Pattern Anal. Mach. Intell. 26, (2004).

5. J. L. Pezzaniti and R. A. Chipman, “Mueller matrix imaging polarimetry,” Opt. Eng. 34, 1558–1568 (1995). [CrossRef]

6. B. J. Howell, “ Measurements of the polarization effects of an instrument using partially polarized light,” Appl. Opt. 18, 808–812 (1979). [CrossRef]

7. J. W. Hovenier, H. C. van de Hulst, and C. V. M. van der Mee, “Conditions for the elements of the scattering matrix,” Astron. Astrophys. 157, 301–310 (1986).

8. J. W. Hovenier and C. V. M. van der Mee, “Testing scattering matrices: A compendium of recipes,” J. Quant. Spectrosc. Radiat. Transfer 55, 649–661 (1996). [CrossRef]

9. J. -F. Xing, “ On the Deterministic and Non-deterministic Mueller Matrix,” J. Mod. Opt. 39, 461–484 (1992). [CrossRef]

10. E. Landi Degl’Innocenti and J. C. del Toro Iniesta, “Physical significance of experimental Mueller matrices,” J. Opt. Soc. Am. A 15, 533–537 (1998). [CrossRef]

11. C. R. Givens and A. B. Kostinski, “A simple necessary and sufficient condition on physically realizable Mueller matrices,” J. Mod. Opt. 40, 471–481 (1993). [CrossRef]

12. S. R. Cloude and E. Pottier, “A Review of Target Decomposition Theorems in Radar Polarimetry,” IEEE Trans. Geosci. Remote Sens. 34, 498–518 (1996). [CrossRef]

13. F. Le Roy-Brehonnet, B. Le Jeune, P. Eliès, J. Cariou, and J. Lotrian, “ Optical media characterization by Mueller matrix decomposition,” J. Phys. D: Appl. Phys. 29, 34–38 (1996). [CrossRef]

14. A. Aiello, G. Puentes, D. Voigt, and J. P. Woerdman, “ Maximum-likelihood estimation of Mueller matrices,” Opt. Lett. 31, 817–819 (2006). [CrossRef] [PubMed]

15. A. V. Gopala Rao, K. S. Mallesh, and Sudha, “On the algebraic characterization of a Mueller matrix in polarization optics, ” J. Mod. Opt. 45, 955–987 (1998).

16. M. Reimer and D. Yevick, “Least-squares analysis of the Mueller matrix,” Opt. Lett. 31, 2399–2401 (2006). [CrossRef] [PubMed]

17. R. M. Azzam, “ Photopolarimetric measurement of the Mueller matrix by Fourier analysis of a single detected signal,” Opt. Lett. 2, 148–150 (1978). [CrossRef] [PubMed]

18. R. A. Horn and C. R. Johnson, Matrix Analysis (Cambridge U. Press, 1985).

19. D. Anderson and R. Barakat, “Necessary and sufficient conditions for a Mueller matrix to be derivable from a Jones matrix,” J. Opt. Soc. Am. A. 11, 2305–2319 (1994). [CrossRef]

20. R. Barakat, “Bilinear constraints between elements of the 4×4 Mueller-Jones transfer matrix of polarization theory,” Opt. Commun. 38, 159–161 (1981). [CrossRef]

21. J. R. Huynen, Phenomenological Theory of Radar Targets, PhD. thesis, University of Technology, The Netherlands (1970).

22. P. Spellucci, “ A SQP method for general nonlinear programs using only equality constrained subproblems,” Math. Prog. , 413–448 (Springer, 1998). [CrossRef]

23. R. Fletcher, Practical Methods of Optimization (Wiley, 1987).

24. B. W. Wah and T. Wang, in Principles and Practice of Constraint Programming, Vol. 461 (Springer, Heidelberg, 1999).

25. R. A. Chipman, Handbook of Optics, vol. II, 2nd Ed. M. Bass ed. (McGraw-Hill, 1995).

26. P. J. M. Laarhoven and E. H. L. Aarts, Simulated annealing: theory and applications (Kluwer Academic Publishers, 1987).

27. M. Lundy and A. Mees, in Mathematical Programming (Springer, 1986).

28. B. DeBoo, J. Sasian, and R. Chipman, “Degree of polarization surfaces and maps for analysis of depolarization,” Opt. Express , 12, 4941–4958 (2004). [CrossRef] [PubMed]

29. Y. Takakura and J. Elsayed Ahmad, “Noise distribution of Mueller matrices retrieved with active rotating polarimeters,” Appl. Opt. 46, 7354–7364 (2007). [CrossRef] [PubMed]

30. D. S. Sabatke, M. R. Descour, E. Dereniak, W. C. Sweatt, S. A. Kemme, and G. S. Phipps, “ Optimization of retardance for complete Stokes polarimeter,” Opt. Lett. 25, 802–804 (2000). [CrossRef]

31. S. R. Cloude, “ Conditions for the realisability of matrix operators in polarimetry,” in Polarization Considerations for Optical Systems II, Proc. SPIE1166, 177–185 (1989).

Estimation of physically realizable Mueller matrices from experiments using global constrained optimization

Abstract

1. Introduction

2. Experimental and theoretical considerations

2.1. Polarimetric system overview

2.2. Physical realizability

3. Global optimization setups

3.1. Initialization process

3.2. Sequential quadratic programming

3.3. Constrained simulated annealing

4. Simulated case study

4.1. Modified Shepp-Logan phantom

4.2. Physical interpretation

4.3. CSA convergence check

5. Experimental results

5.1. Mueller matrix of the air

5.2. Howell’s matrix

6. Conclusion

References and links

Cited By

Figures (7)

Equations (12)

Optics Express