## Abstract

Source optimization (SO) has emerged as a key technique for improving lithographic imaging over a range of process variations. Current SO approaches are pixel-based: the source pattern is designed either by solving a quadratic optimization problem with gradient-based algorithms or by solving a linear programming problem. Most of these methods, however, are computationally intensive or yield a process window (PW) that could be further extended. This paper applies the rich theory of compressive sensing (CS) to develop an efficient and robust SO method. To accelerate the SO design, source optimization is formulated as an underdetermined linear problem, in which the number of equations can be much smaller than the number of source variables. Assuming that the source pattern is sparse on a certain basis, the SO problem is transformed into an *l*_{1}-norm image reconstruction problem based on CS theory. The linearized Bregman algorithm is applied to synthesize the sparse optimal source pattern on a representation basis, which effectively improves source manufacturability. It is shown that the proposed linear SO formulation is more effective than the traditional quadratic formulation in improving the contrast of the aerial image. The proposed SO method shows that sparse regularization in inverse lithography can indeed extend the PW of lithography systems. A set of simulations and analyses demonstrates the superiority of the proposed SO method over traditional approaches.

© 2014 Optical Society of America

## 1. Introduction

Due to the delay of extreme ultraviolet (EUV) lithography and the pressure to shrink the critical dimension (CD), semiconductor manufacturers are moving deeper into low-*k*_{1} lithography with 193nm ArF immersion systems. The electronics industry has to rely on resolution enhancement techniques (RETs) to compensate for and minimize imaging distortions as mask patterns are projected onto semiconductor wafers [1, 2]. Source optimization (SO) techniques form an important branch of RETs that modify the source intensity distribution so as to modulate the intensities and directions of the incident light rays. Lithographic sources are usually optimized jointly with lithographic masks in so-called source-mask optimization (SMO) methods [3], which are effective in attaining finer image features. SO, however, may also be carried out in a multi-stage flow that includes SMO steps as well as pure SO steps. This paper focuses on pure SO methods; the development of an SMO workflow based on this SO approach is left for future work.

A variety of SO approaches have been investigated in the literature [4–7]. Pixelated sources with flexible profiles and intensities were later realized through the development of freeform diffractive optical elements (DOEs) [8, 9]. Granik reviewed different source representations and optimization objectives, and highlighted the advantages of the pixel-based SO method [10]. Several pixelated SO methods were subsequently proposed [11, 12]. SO methods can also serve as a key step in SMO flows, which exploit the synergy between source and mask optimization to further push the lithographic resolution limit [13]. SMO approaches have been proposed to enhance lithographic image fidelity [14–16] or to improve the robustness of lithography systems to process variations [3, 17–26]. However, the SO and SMO methods mentioned above are computationally intensive and are likely to be trapped in locally optimal solutions.

Recently, Yu et al. proposed a fast SO method based on the conjugate-gradient (CG) algorithm, in which a linear combination of two quadratic objectives replaces the nonlinear sigmoid resist model [27]. The aerial image of the lithography system was formulated as $\overrightarrow{I}={\mathbf{I}}_{cc}\overrightarrow{J}$, where $\overrightarrow{I}\in {\mathbb{R}}^{{N}^{2}\times 1}$ and $\overrightarrow{J}\in {\mathbb{R}}^{{N}_{s}^{2}\times 1}$ are the raster-scanned aerial image and source pattern, respectively. ${\mathbf{I}}_{cc}\in {\mathbb{R}}^{{N}^{2}\times {N}_{s}^{2}}$ is the illumination cross coefficient (ICC) matrix, which represents the image formation process from source to wafer; its derivation is described at length in Section 3. $N$ and ${N}_{s}$ are the lateral dimensions on the image and source planes, respectively. The cost function of the SO in [27] was formulated as

$$\widehat{\overrightarrow{J}}=\text{arg}\underset{\overrightarrow{J}}{\text{min}}\;{\Vert {\mathbf{I}}_{cc}^{1}\overrightarrow{J}-{t}_{r}\overrightarrow{1}\Vert}_{2}^{2}+{\Vert {\mathbf{I}}_{cc}^{0}\overrightarrow{J}-\delta \overrightarrow{1}\Vert}_{2}^{2},\tag{1}$$

where ${\mathbf{I}}_{cc}^{1}$ is composed of the rows of ${\mathbf{I}}_{cc}$ corresponding to the monitoring pixels on the inner and outer margins of mask features, and ${t}_{r}$ is the constant threshold level representing the photoresist effect. ${\mathbf{I}}_{cc}^{0}$ is composed of the rows corresponding to the monitoring pixels in the non-pattern regions surrounding the drawn patterns, $\delta$ is chosen to be as small as possible, and $\overrightarrow{1}$ denotes an all-ones vector of appropriate length. The first term in Eq. (1) preserves the contour of the printed image after resist development, while the second term suppresses extra printing or side-lobes in the non-pattern regions. Since the cost function is quadratic, the optimal solution is guaranteed to be found by the CG method [28]. Yu's method, however, has several inherent limitations. First, reducing the number of monitoring pixels degrades the accuracy of the source synthesis process and leads to inferior imaging performance; thus, the computational efficiency cannot be effectively improved by reducing the dimension of Eq. (1). Second, Yu's method produces complex source patterns, which add to the implementation complexity. Third, minimizing the quadratic cost function is inadequate to ensure high contrast of the aerial image, which may lead to insufficient imaging quality under process variations. Finally, this method was based on a scalar imaging model, which is inaccurate for current hyper-NA (NA > 1) immersion lithography systems [29].

This paper exploits the emerging theory of compressive sensing (CS) and sparse representations to formulate SO as a basis pursuit problem. This new approach to SO attains a number of advantages over the methods developed to date. CS enables the reconstruction of sparse signals or images from a small set of projected measurements [30, 31]. Sparsity refers to a signal representation in which a significant number of the coefficients of the image on a certain basis are zero. This work makes the fundamental assumption that an efficient source pattern has structure; it is not random. As such, the pattern will exhibit a sparse representation in some basis. This paper assumes that the source pattern is sparse on the two-dimensional discrete cosine transform (2D-DCT) basis. The DCT has been shown to have good energy compaction properties across a wide variety of imagery types, and its energy compaction approaches that of the Karhunen-Loève transform for highly correlated first-order Markov processes [32]. Selecting the best representation basis requires further work and is beyond the scope of this paper.
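To make the energy-compaction argument concrete, the sketch below counts how many 2D-DCT coefficients are needed to capture 95% of the energy of a structured source and of a random pattern with a similar fill factor. The annular source, its radii and the 95% level are hypothetical choices for illustration, not values from this paper. The orthonormal DCT-II matrix is built explicitly so that the 2D transform takes the form $\mathbf{\Theta}=\tilde{\mathbf{\Psi}}\mathbf{J}{\tilde{\mathbf{\Psi}}}^{T}$ used in Section 3.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix: theta = C @ x and x = C.T @ theta."""
    k = np.arange(n)[:, None]   # frequency index (rows)
    i = np.arange(n)[None, :]   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row has its own normalization
    return c

def fraction_for_energy(img: np.ndarray, level: float = 0.95) -> float:
    """Fraction of 2D-DCT coefficients needed to capture `level` of the energy."""
    c = dct_matrix(img.shape[0])
    theta = c @ img @ c.T                      # 2D-DCT: Theta = Psi~ J Psi~^T
    e = np.sort(theta.ravel() ** 2)[::-1]      # coefficient energies, largest first
    k = np.searchsorted(np.cumsum(e) / e.sum(), level) + 1
    return k / e.size

# Hypothetical 41 x 41 annular source (inner radius 0.5, outer radius 0.9).
ns = 41
x = np.linspace(-1.0, 1.0, ns)
r = np.hypot(*np.meshgrid(x, x))
annular = ((r >= 0.5) & (r <= 0.9)).astype(float)

# Random binary pattern with roughly the same fill factor, for comparison.
rng = np.random.default_rng(0)
random_src = (rng.random((ns, ns)) < annular.mean()).astype(float)

print(f"annular source: {fraction_for_energy(annular):.3f} of coefficients")
print(f"random pattern: {fraction_for_energy(random_src):.3f} of coefficients")
```

Because the DCT is orthonormal, the coefficient energies sum to the pixel energy (Parseval), so the comparison is independent of any scaling of the source.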

In each iteration of the optimization process, faint pixels in the pupil domain whose intensity is lower than a prescribed threshold are turned off in order to induce source sparsity in the pupil domain. It will be shown that the sparsity assumption in the DCT domain regularizes the source shape and reduces the source complexity. In addition, the sparsity in the DCT domain avoids isolated (singular) source pixels and very low pupil fill percentages, which can cause aberrations through the lens-heating effect, or even lens damage [33, 16]. Furthermore, the enforced structure in the pupil domain moderately reduces the pupil fill percentage to improve the process window [33]. This occurs because sparsity in the pupil domain drives the source toward aggressive off-axis illumination and removes the illumination energy at the center of the pupil. Thus, the proposed algorithm takes advantage of sparsity in both the DCT and pupil domains, and keeps a balance between imaging performance and manufacturability. In general, the advantages of the proposed SO method are fourfold: (1) a significant improvement in speed, obtained by using random projections to reduce the dimensionality of the problem and by reconstructing the optimal source pattern from an underdetermined system of equations; (2) improved source manufacturability, obtained by sparsely representing the source pattern on the 2D-DCT basis; (3) a pronounced improvement of the process window (PW) and imaging contrast by virtue of the linear constraint on the source reconstruction; and (4) improved precision, obtained by applying a rigorous vector imaging model to formulate the SO problem.
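The pupil-domain projection (symmetry plus faint-pixel turn-off, with the $10^{-4}$ threshold given in Step 3 of Section 4) can be sketched as a single pass over a matrix of 2D-DCT coefficients. This is a minimal illustration under stated assumptions: mirroring the upper-left quadrant is our reading of "updating the left-up quarter first", and the function names and random input coefficients are hypothetical.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix Psi~ (rows are DCT basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def symmetrize_from_upper_left(src: np.ndarray) -> np.ndarray:
    """Mirror the upper-left quadrant (including the central row/column when
    the size is odd) into the other three quadrants."""
    n = src.shape[0]
    h = (n + 1) // 2
    q = src[:h, :h]
    out = np.empty_like(src)
    out[:h, :h] = q
    out[:h, n - h:] = q[:, ::-1]          # left-right mirror
    out[n - h:, :h] = q[::-1, :]          # up-down mirror
    out[n - h:, n - h:] = q[::-1, ::-1]   # both
    return out

def project_source(theta2d: np.ndarray, threshold: float = 1e-4):
    """DCT coefficients -> source -> symmetric, thresholded source -> DCT."""
    c = dct_matrix(theta2d.shape[0])
    src = symmetrize_from_upper_left(c.T @ theta2d @ c)  # inverse 2D-DCT, then mirror
    src[src < threshold] = 0.0            # turns off negative and faint pixels
    return src, c @ src @ c.T             # forward 2D-DCT for the next iteration

rng = np.random.default_rng(1)
theta = rng.normal(scale=1e-3, size=(41, 41))   # hypothetical DCT coefficients
src, theta_next = project_source(theta)
print(np.count_nonzero(src), "of", src.size, "source pixels kept")
```

Thresholding at a small positive level enforces positivity and pupil-domain sparsity in one step, since every negative pixel also falls below the threshold.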

In this paper, the SO is formulated as an *l*_{1}-norm inverse reconstruction problem:

$$\widehat{\overrightarrow{\theta}}=\text{arg}\underset{\overrightarrow{\theta}}{\text{min}}{\Vert \overrightarrow{\theta}\Vert}_{1}\quad \text{subject to}\quad {\overrightarrow{Z}}_{s}={\overrightarrow{I}}_{s}={\mathbf{I}}_{cc}^{s}\mathbf{\Psi}\overrightarrow{\theta},\tag{2}$$

where $\overrightarrow{\theta}$ is the vector of 2D-DCT coefficients of the source $\overrightarrow{J}$, and the source pattern is represented by $\overrightarrow{J}=\mathbf{\Psi}\overrightarrow{\theta}$. The *l*_{1}-norm is given by ${\Vert \overrightarrow{\theta}\Vert}_{1}={\sum}_{i=1}^{{N}_{s}^{2}}\left|{\overrightarrow{\theta}}_{i}\right|$, where ${\overrightarrow{\theta}}_{i}$ is the $i$th element of $\overrightarrow{\theta}$. ${\overrightarrow{Z}}_{s}$ and ${\overrightarrow{I}}_{s}$ are the target pattern and the actual aerial image at the monitoring pixels selected on the image plane, respectively. ${\mathbf{I}}_{cc}^{s}\in {\mathbb{R}}^{M\times {N}_{s}^{2}}$ is composed of the rows of ${\mathbf{I}}_{cc}$ corresponding to these monitoring pixels, while $M$ and ${N}_{s}^{2}$ are the number of equations and the number of variables, respectively. Equation (2) is a typical basis pursuit problem. CS theory indicates that, under the sparsity assumption, the source pattern can be synthesized from the underdetermined problem with $M\ll {N}_{s}^{2}$. Hence, the number of monitoring pixels can be much smaller than the number of source variables, which effectively reduces the computational complexity. The linearized Bregman algorithm, a classic method in the basis pursuit realm, is used to solve the optimization problem in Eq. (2), since it is computationally efficient and can enhance the contrast of the acquired image [34, 35]. Other optimization algorithms could be used as well [36, 37]. The underlying sparsity assumption constrains the complexity of the source pattern and effectively improves its manufacturability. In addition, the linear constraint ${\overrightarrow{Z}}_{s}={\overrightarrow{I}}_{s}={\mathbf{I}}_{cc}^{s}\mathbf{\Psi}\overrightarrow{\theta}$ forces the actual aerial image to equal the target pattern at the monitoring pixels. Therefore, the contrast of the aerial image is enhanced throughout the optimization process, which is significant for improving the PW. To meet the precision requirements of immersion lithography systems, we apply a vector imaging model that accounts for the vector nature of the electromagnetic fields [38, 39]. Note that some prior works also built a linear formulation of the SO problem [17] and solved the large-scale SO problem with "binding constraints" [3]. However, the work presented here is the first to use basis pursuit for fast SO. In future research, we may improve the current approach based on the methods in [3, 17], and may also investigate linear programming methods and compare them to the linearized Bregman algorithm.

The remainder of this paper is organized as follows. The relevant principles of CS are briefly described in Section 2. The SO framework based on vector imaging model is formulated in Section 3. The methodology to solve for the SO problem based on linearized Bregman algorithm is proposed in Section 4. Simulations and analysis are presented in Section 5. Conclusions are provided in Section 6.

## 2. Fundamentals of compressive sensing

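CS provides a new framework to jointly measure and compress a sparse signal [30, 31]. A signal $\overrightarrow{x}\in {\mathbb{R}}^{\tilde{N}\times 1}$ is $K$-sparse if it can be represented as $\overrightarrow{x}=\mathbf{\Psi}\overrightarrow{\theta}$, where $\mathbf{\Psi}=[{\overrightarrow{\psi}}_{1},{\overrightarrow{\psi}}_{2},\dots ,{\overrightarrow{\psi}}_{\tilde{N}}]\in {\mathbb{R}}^{\tilde{N}\times \tilde{N}}$ is the basis set and the coefficient vector $\overrightarrow{\theta}\in {\mathbb{R}}^{\tilde{N}\times 1}$ has just $K\ll \tilde{N}$ nonzero elements. CS shows that such a signal $\overrightarrow{x}$ can be reconstructed with high probability from a small set of nonadaptive random measurements $\overrightarrow{y}=\mathbf{\Phi}\overrightarrow{x}=\mathbf{\Phi}\mathbf{\Psi}\overrightarrow{\theta}$, where $\overrightarrow{y}\in {\mathbb{R}}^{M\times 1}$, and $\mathbf{\Phi}={[{\overrightarrow{\phi}}_{1},{\overrightarrow{\phi}}_{2},\dots ,{\overrightarrow{\phi}}_{M}]}^{T}\in {\mathbb{R}}^{M\times \tilde{N}}$ is a random projection matrix with $M\ll \tilde{N}$. Geometrically, the class of signals that can be accurately represented by $K$-term approximations in the basis $\mathbf{\Psi}$ is tightly clustered around a union of $K$-dimensional subspaces of ${\mathbb{R}}^{\tilde{N}\times 1}$. If $\mathbf{\Psi}$ and $\mathbf{\Phi}$ are incoherent with each other and the rows of $\mathbf{\Phi}$ are randomly chosen, then $\overrightarrow{x}$ can be recovered from $\overrightarrow{y}$ with high probability when

$$M\ge C\cdot K\cdot \text{log}\tilde{N},\tag{3}$$

where $C\ge 1$ is an oversampling factor. The incoherence between $\mathbf{\Psi}$ and $\mathbf{\Phi}$ is measured by the mutual coherence defined as [40]

$$\mu (\mathbf{\Phi},\mathbf{\Psi})=\sqrt{\tilde{N}}\cdot \underset{1\le i\le M,\;1\le j\le \tilde{N}}{\text{max}}\left|\langle {\overrightarrow{\phi}}_{i},{\overrightarrow{\psi}}_{j}\rangle \right|,\tag{4}$$

where the rows ${\overrightarrow{\phi}}_{i}$ and basis vectors ${\overrightarrow{\psi}}_{j}$ are normalized to unit $l_2$ norm, so that $1\le \mu (\mathbf{\Phi},\mathbf{\Psi})\le \sqrt{\tilde{N}}$; a smaller value indicates stronger incoherence.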
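As an illustration of Eq. (4), the sketch below evaluates the mutual coherence for a simple case: $\mathbf{\Phi}$ selects random samples (rows of the identity) and $\mathbf{\Psi}$ is the orthonormal 1D DCT basis. Sampling with spikes against the DCT basis gives a coherence close to the lower bound, whereas sampling in the same basis that carries the sparsity (here the identity) is maximally coherent. The sizes and random seed are arbitrary choices for the example.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix (rows are DCT basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def mutual_coherence(phi: np.ndarray, psi: np.ndarray) -> float:
    """Eq. (4): sqrt(N) * max |<phi_i, psi_j>| with unit-norm rows and columns."""
    phi = phi / np.linalg.norm(phi, axis=1, keepdims=True)   # rows phi_i
    psi = psi / np.linalg.norm(psi, axis=0, keepdims=True)   # columns psi_j
    return float(np.sqrt(psi.shape[0]) * np.abs(phi @ psi).max())

n, m = 64, 16
rng = np.random.default_rng(0)
rows = rng.choice(n, size=m, replace=False)
phi = np.eye(n)[rows]                  # random pixel (spike) sampling

psi_dct = dct_matrix(n).T              # columns are DCT basis vectors
mu_dct = mutual_coherence(phi, psi_dct)
mu_identity = mutual_coherence(phi, np.eye(n))
print(f"spikes vs DCT:      {mu_dct:.3f}")       # at most sqrt(2)
print(f"spikes vs identity: {mu_identity:.3f}")  # equals sqrt(n)
```

Every orthonormal DCT entry has magnitude at most $\sqrt{2/\tilde{N}}$, which is why spike sampling against the DCT basis can never exceed a coherence of $\sqrt{2}$.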

## 3. Formulation of the source optimization framework

Let $E_x$, $E_y$ and $E_z$ be the electric field components in the spatial coordinate system $(x,y,z)$. The normalized spatial coordinate on the source plane is $({x}_{s},{y}_{s})$. We define a vector matrix as one whose entries are all vectors or matrices, and a scalar matrix as one whose entries are all scalars.

Assume that ${\mathbf{E}}_{i}\in {\mathbb{R}}^{N\times N}$ represents the electric field emitted from the point source at $({x}_{s},{y}_{s})$, and $\mathbf{M}\in {\mathbb{R}}^{N\times N}$ denotes the mask pattern. The near field of the mask can then be calculated as $\mathbf{E}={\mathbf{E}}_{i}\odot \mathbf{B}\odot \mathbf{M}$, where $\odot$ is the entry-by-entry multiplication, and the scalar matrix $\mathbf{B}\in {\mathbb{R}}^{N\times N}$ is the mask diffraction matrix [38, 39]. According to Fourier optics, the electric field on the wafer contributed by one point source at $({x}_{s},{y}_{s})$ is ${\mathbf{E}}^{\text{wafer}}({x}_{s},{y}_{s})=\frac{2\pi}{{n}_{w}R}\times \left[{\mathcal{F}}^{-1}\left\{{\mathbf{V}}^{\prime}\right\}\otimes (\mathbf{B}\odot \mathbf{M})\right]$, where ${\mathcal{F}}^{-1}$ is the inverse Fourier transform, $\otimes$ denotes convolution, and ${\mathbf{V}}^{\prime}=\sqrt{\frac{\gamma}{{\gamma}^{\prime}}}\mathbf{V}\odot \mathbf{U}\odot {\mathbf{E}}_{i}$ is an $N\times N$ vector matrix [39]. The vector matrix $\mathbf{V}\in {\mathbb{R}}^{N\times N}$ is the rotation matrix, and the scalar matrix $\mathbf{U}\in {\mathbb{R}}^{N\times N}$ represents the pupil function. Therefore, the $x$, $y$ and $z$ components of ${\mathbf{E}}^{\text{wafer}}({x}_{s},{y}_{s})$ can be written as ${\mathbf{E}}_{p}^{\text{wafer}}({x}_{s},{y}_{s})={\mathbf{H}}_{p}\otimes (\mathbf{B}\odot \mathbf{M})$, where $p=x,y,z$, ${\mathbf{H}}_{p}=\frac{2\pi}{{n}_{w}R}{\mathcal{F}}^{-1}\left\{{{\mathbf{V}}^{\prime}}_{p}\right\}$ is referred to as the equivalent filter, and ${{\mathbf{V}}^{\prime}}_{p}$ is an $N\times N$ scalar matrix equal to the $p$-component of ${\mathbf{V}}^{\prime}$. The aerial image contributed by the point source at $({x}_{s},{y}_{s})$ can thus be calculated as

$${\mathbf{I}}^{{x}_{s}{y}_{s}}={\sum}_{p=x,y,z}{\left|{\mathbf{H}}_{p}\otimes (\mathbf{B}\odot \mathbf{M})\right|}^{2},\tag{5}$$

where ${\left|\cdot \right|}^{2}$ is the entry-by-entry squared modulus. Both ${\mathbf{H}}_{p}$ and $\mathbf{B}$ are functions of $({x}_{s},{y}_{s})$, and are denoted as ${\mathbf{H}}_{p}^{{x}_{s}{y}_{s}}$ and ${\mathbf{B}}^{{x}_{s}{y}_{s}}$, respectively.

$\mathbf{J}$ is an ${N}_{s}\times {N}_{s}$ scalar matrix representing the source intensity distribution. According to Abbe's method, the total aerial image contributed by the entire light source can be calculated as

$$\mathbf{I}={\sum}_{{x}_{s}}{\sum}_{{y}_{s}}\mathbf{J}({x}_{s},{y}_{s})\,{\mathbf{G}}^{{x}_{s}{y}_{s}},\tag{6}$$

where

$${\mathbf{G}}^{{x}_{s}{y}_{s}}={\sum}_{p=x,y,z}{\left|{\mathbf{H}}_{p}^{{x}_{s}{y}_{s}}\otimes ({\mathbf{B}}^{{x}_{s}{y}_{s}}\odot \mathbf{M})\right|}^{2}\in {\mathbb{R}}^{N\times N}\tag{7}$$

is the aerial image produced by a unit-intensity point source at $({x}_{s},{y}_{s})$. Equation (6) can be rewritten in matrix form as

$$\overrightarrow{I}={\mathbf{I}}_{cc}\overrightarrow{J},\tag{8}$$

where $\overrightarrow{I}\in {\mathbb{R}}^{{N}^{2}\times 1}$ and $\overrightarrow{J}\in {\mathbb{R}}^{{N}_{s}^{2}\times 1}$ are the raster-scanned aerial image and source pattern, respectively. The ICC matrix has dimension ${N}^{2}\times {N}_{s}^{2}$. As shown in Fig. 1, the ICC matrix is formulated as ${\mathbf{I}}_{cc}=[{\overrightarrow{g}}_{1},{\overrightarrow{g}}_{2},\dots ,{\overrightarrow{g}}_{{N}_{s}^{2}}]$, where ${\overrightarrow{g}}_{i}$ is the raster-scanned vector of ${\mathbf{G}}^{{x}_{s}{y}_{s}}$ corresponding to the source point ${\overrightarrow{J}}_{i}$. Each column of ${\mathbf{I}}_{cc}$ represents the coherent imaging intensity contributed by one source point ${\overrightarrow{J}}_{i}$, while each row represents the imaging intensity at one wafer pixel contributed by the different source points.

The total number of samples on the mask or wafer is usually much larger than that of the source pattern, i.e., ${N}_{s}\ll N$. Thus, Eq. (8) is an over-determined problem that cannot be solved efficiently, and the desired source pattern can hardly be reconstructed from it. Based on the CS theory described in Section 2, only a small subset of the rows of the ICC matrix is needed to reconstruct the optimal source pattern. In this paper, we compress the ICC matrix by choosing only $M$ critical rows from it, with $M\ll {N}_{s}^{2}\ll {N}^{2}$. First, we use a method similar to that in [27] to choose the rows corresponding to the inner and outer margins of mask features, as well as the monitoring pixels in the non-pattern regions around the drawn patterns. The new ICC matrix consisting of these selected rows is denoted as ${\mathbf{I}}_{cc}^{\prime}$. Figures 2(a) and 2(b) mark the selected monitoring pixels on the inner margins (green), outer margins (red), and non-pattern regions (blue) for a line-space pattern and a horizontal block pattern. These two patterns are used as the target and mask patterns in the simulations of Section 5. We then randomly pick $M$ rows from ${\mathbf{I}}_{cc}^{\prime}$ to generate a much smaller matrix ${\mathbf{I}}_{cc}^{s}\in {\mathbb{R}}^{M\times {N}_{s}^{2}}$. These $M$ rows correspond to $M$ selected monitoring pixels on the inner and outer margins and in the non-pattern regions. Using this method, the over-determined linear problem in Eq. (8) is reduced to an $M$-dimensional underdetermined problem:

$${\overrightarrow{Z}}_{s}={\overrightarrow{I}}_{s}={\mathbf{I}}_{cc}^{s}\overrightarrow{J},\tag{9}$$

where ${\overrightarrow{Z}}_{s}\in {\mathbb{R}}^{M\times 1}$ represents the desired pattern at the selected monitoring pixels, while ${\overrightarrow{I}}_{s}\in {\mathbb{R}}^{M\times 1}$ represents the corresponding actual imaging intensities. The selected subset of the ICC matrix corresponds to representative, critical monitoring pixels on the wafer that control the imaging performance. In addition, the random selection of monitoring pixels is intended to provide the incoherence between the projection matrix and the basis matrix required by CS reconstruction, as described in Section 2.
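The construction of ${\mathbf{I}}_{cc}$ and its compression to ${\mathbf{I}}_{cc}^{s}$ can be sketched in NumPy as follows. This is a toy under loud assumptions: the per-point images standing in for ${\mathbf{G}}^{{x}_{s}{y}_{s}}$ are random placeholders rather than vector-imaging results, and the margin and non-pattern definitions (4-neighbour dilation, a background ring a few pixels wide) and all function names are hypothetical; the paper itself follows the selection method of [27].

```python
import numpy as np

def dilate(mask: np.ndarray) -> np.ndarray:
    """4-neighbour binary dilation. np.roll wraps around the borders, so the
    pattern is assumed to stay away from the image edge."""
    out = mask.copy()
    for axis in (0, 1):
        for shift in (-1, 1):
            out |= np.roll(mask, shift, axis=axis)
    return out

def candidate_rows(target: np.ndarray, ring: int = 3) -> np.ndarray:
    """Raster indices of inner-margin, outer-margin and non-pattern monitoring
    pixels, i.e. the rows kept in I'_cc (hypothetical widths)."""
    inner = target & dilate(~target)      # pattern pixels touching the background
    outer = dilate(target) & ~target      # background pixels touching the pattern
    near = dilate(dilate(target))         # everything within 2 pixels of the pattern
    grown = target.copy()
    for _ in range(ring + 1):
        grown = dilate(grown)
    nonpattern = grown & ~near            # background ring 3..ring+1 pixels away
    return np.flatnonzero(inner | outer | nonpattern)

N, Ns, M = 16, 5, 10                      # toy sizes with M < Ns**2 (underdetermined)
rng = np.random.default_rng(0)

# Placeholder per-point images G^{xs ys}: random, NOT a vector imaging model.
G = rng.random((Ns * Ns, N, N))
Icc = G.reshape(Ns * Ns, N * N).T         # column i = raster-scanned G of point i
J = rng.random(Ns * Ns)                   # raster-scanned source
I_abbe = np.tensordot(J, G, axes=1)       # Abbe sum over source points, shape (N, N)

# Compress I_cc: keep candidate rows (I'_cc), then M random rows (I_cc^s).
target = np.zeros((N, N), dtype=bool)
target[4:12, 6:10] = True                 # hypothetical block feature
pool = candidate_rows(target)
pick = rng.choice(pool.size, size=M, replace=False)
Icc_s = Icc[pool][pick]                   # M x Ns^2
Z_s = target.ravel()[pool[pick]].astype(float)  # target values at those pixels
print(Icc.shape, Icc_s.shape, np.allclose(Icc @ J, I_abbe.ravel()))
```

Selecting rows of ${\mathbf{I}}_{cc}$ never requires the full $N^2\times N_s^2$ product: only the chosen monitoring pixels of each per-point image are needed, which is where the memory and runtime savings come from.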

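In general, the source pattern cannot be uniquely reconstructed from the underdetermined linear problem of Eq. (9). However, CS theory shows that the optimal solution of Eq. (9) can be recovered under a sparsity assumption on the source pattern. We assume that the optimal source is a $K$-sparse pattern on the 2D-DCT basis. Other representation bases, such as wavelets, could be used and are a topic of future research. The SO problem is thus transformed into an *l*_{1}-norm reconstruction problem with an underdetermined linear constraint, as shown in Eq. (2). Given the 2D-DCT transform matrix $\tilde{\mathbf{\Psi}}\in {\mathbb{R}}^{{N}_{s}\times {N}_{s}}$, the 2D-DCT coefficients of the source $\mathbf{J}$ can be computed as $\mathbf{\Theta}=\tilde{\mathbf{\Psi}}\mathbf{J}{\tilde{\mathbf{\Psi}}}^{T}$, where $\mathbf{\Theta}\in {\mathbb{R}}^{{N}_{s}\times {N}_{s}}$. In Eq. (2), we unfold the square matrix $\mathbf{J}$ into the raster-scanned vector $\overrightarrow{J}$. The 2D-DCT of the source can then be written in vector form as

$$\overrightarrow{\theta}={\mathbf{\Psi}}^{T}\overrightarrow{J},\tag{10}$$

where $\overrightarrow{\theta}$ is the raster-scanned vector of $\mathbf{\Theta}$, and ${\mathbf{\Psi}}^{T}\in {\mathbb{R}}^{{N}_{s}^{2}\times {N}_{s}^{2}}$ is a variant of $\tilde{\mathbf{\Psi}}$ that can be calculated as

$${\mathbf{\Psi}}^{T}=\text{kron}(\tilde{\mathbf{\Psi}},\tilde{\mathbf{\Psi}}),\tag{11}$$

where $\text{kron}(\cdot ,\cdot )$ denotes the Kronecker product. Since $\tilde{\mathbf{\Psi}}$ is orthonormal, so is $\mathbf{\Psi}$, and the source is recovered as $\overrightarrow{J}=\mathbf{\Psi}\overrightarrow{\theta}$.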
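Equations (10) and (11) can be checked numerically on a small example. The sketch assumes the orthonormal DCT-II matrix for $\tilde{\mathbf{\Psi}}$ and row-major raster scanning; the 5 × 5 size and the random source are illustrative only.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix Psi~ (rows are DCT basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

ns = 5                                  # toy size; Section 5 uses 41 x 41 sources
rng = np.random.default_rng(0)
J = rng.random((ns, ns))                # hypothetical source intensities
psi_tilde = dct_matrix(ns)

Theta = psi_tilde @ J @ psi_tilde.T     # separable 2D-DCT of the source
Psi_T = np.kron(psi_tilde, psi_tilde)   # Eq. (11)
theta = Psi_T @ J.ravel()               # Eq. (10); ravel() is a row-major raster scan
Psi = Psi_T.T                           # synthesis matrix: J = Psi @ theta

print(np.allclose(theta, Theta.ravel()), np.allclose(Psi @ theta, J.ravel()))
```

For the 41 × 41 sources of Section 5, $\mathbf{\Psi}$ is 1681 × 1681 (about 22.6 MB in double precision). Forming ${\mathbf{I}}_{cc}^{s}\mathbf{\Psi}$ does not strictly require it: each row of the product is the 2D-DCT of the corresponding row of ${\mathbf{I}}_{cc}^{s}$ reshaped to ${N}_{s}\times {N}_{s}$.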

In the traditional CG method, the ICC matrix dimension $M$ cannot be greatly reduced if a precise solution of the SO problem is to be obtained [27, 28]. This drawback precludes further reduction of the computational complexity by decreasing the number of monitoring pixels. In contrast, the proposed method relies on a sparse representation of the source pattern and can therefore use far fewer monitoring pixels. It will be shown in Section 5 that reducing the dimensionality $M$ leads to significantly faster reconstruction. According to Eq. (3), we randomly down-sample ${\mathbf{I}}_{cc}$ such that $K\times \text{log}{N}_{s}^{2}<M\ll {N}_{s}^{2}$, which leads to a much faster reconstruction process. The second advantage of the proposed method is that the *l*_{1}-norm reconstruction inherently biases the solution toward a sparse pattern on the 2D-DCT basis, so that the manufacturability of the source pattern is effectively improved. Further, the linear constraint in Eq. (9) forces the actual aerial image to equal the target pattern at the monitoring pixels. Thus, the contrast of the aerial image is enhanced throughout the optimization process, which helps improve the PW. Finally, the ICC matrix in Eq. (9) is generated using the vector imaging model, which is more accurate than the scalar model used in [27].

## 4. Methodology based on linearized Bregman algorithm

Based on Eqs. (2) and (10), the optimal source pattern can be obtained by solving a constrained *l*_{1}-norm optimization problem, which can be solved using one of the many algorithms developed in the compressive sensing realm [41]. The ${\mathbf{I}}_{cc}^{s}$ matrix can be calculated before the optimization to reduce the runtime. In this paper, the linearized Bregman algorithm is chosen to solve this problem because it is computationally efficient and can enhance the contrast of the acquired image [34, 35]. The linearized Bregman algorithm is summarized as follows:

**Step 1**: Initialize the weight parameter $\mu$ and the step parameter $\delta$, initialize the source as a normalized full-pupil source with unit overall dose, and initialize the intermediate variable ${\overrightarrow{g}}^{0}=\overrightarrow{0}$ and the iteration index $k=0$.

**Step 2**: Calculate the matrix ${\mathbf{I}}_{cc}^{sd}={\mathbf{I}}_{cc}^{s}\mathbf{\Psi}$, where $\mathbf{\Psi}$ is defined in Eq. (11).

**Step 3**: Use the linearized Bregman algorithm to iteratively update the 2D-DCT coefficients $\overrightarrow{\theta}$:

$${\overrightarrow{g}}^{k+1}={\overrightarrow{g}}^{k}+({\overrightarrow{Z}}_{s}-{\mathbf{I}}_{cc}^{sd}{\overrightarrow{\theta}}^{k}),\tag{12}$$

$${\overrightarrow{\theta}}^{k+1}=\delta {T}_{\mu}({\mathbf{I}}_{cc}^{sdT}{\overrightarrow{g}}^{k+1}).\tag{13}$$

In Eq. (13), ${\mathbf{I}}_{cc}^{sdT}{\overrightarrow{g}}^{k+1}$ is an ${N}_{s}^{2}\times 1$ vector, and ${T}_{\mu}(\cdot )$ is an entry-wise operator. Given a vector $\overrightarrow{x}\in {\mathbb{R}}^{N\times 1}$, the operator ${T}_{\mu}(\cdot )$ is defined as

$${T}_{\mu}(\overrightarrow{x})={\left[{t}_{\mu}({\overrightarrow{x}}_{1}),{t}_{\mu}({\overrightarrow{x}}_{2}),\dots ,{t}_{\mu}({\overrightarrow{x}}_{N})\right]}^{T},$$

where

$${t}_{\mu}({\overrightarrow{x}}_{i})=\{\begin{array}{cc}0& \left|{\overrightarrow{x}}_{i}\right|<\mu \\ \text{sgn}({\overrightarrow{x}}_{i})\hspace{0.17em}\left(\left|{\overrightarrow{x}}_{i}\right|-\mu \right)& \left|{\overrightarrow{x}}_{i}\right|\ge \mu \end{array},$$

and

$$\text{sgn}({\overrightarrow{x}}_{i})=\{\begin{array}{cc}1& {\overrightarrow{x}}_{i}\ge 0\\ -1& {\overrightarrow{x}}_{i}<0\end{array}.$$

In fact, the right-hand side of Eq. (13) is the exact solution of the following minimization problem:

$${\overrightarrow{\theta}}^{k+1}=\text{arg}\underset{\overrightarrow{\theta}\in {\mathbb{R}}^{{N}_{s}^{2}\times 1}}{\text{min}}\mu \delta {\Vert \overrightarrow{\theta}\Vert}_{1}+\frac{1}{2}{\Vert \overrightarrow{\theta}-\delta {\overrightarrow{v}}^{k+1}\Vert}_{2}^{2},$$

where ${\overrightarrow{v}}^{k+1}=\mu {\overrightarrow{p}}^{k+1}+\frac{1}{\delta}{\overrightarrow{\theta}}^{k+1}$, and ${\overrightarrow{p}}^{k+1}$ is the updated subgradient of ${\Vert \overrightarrow{\theta}\Vert}_{1}$. The relationship between ${\overrightarrow{g}}^{k+1}$ and ${\overrightarrow{v}}^{k+1}$ is ${\mathbf{I}}_{cc}^{sdT}{\overrightarrow{g}}^{k+1}={\overrightarrow{v}}^{k+1}$. To keep the source pattern symmetric, we first update the source pixels in the upper-left quadrant, and then update the symmetric pixels of the other three quadrants accordingly. To guarantee the positivity of the source pixels and remove faint source pixels, we use the updated DCT coefficients to calculate the current source pattern, and then turn off all source pixels with intensity lower than a threshold of $10^{-4}$. The modified source pattern is transformed back into the DCT domain, and the resulting DCT coefficients are used in the next iteration.

**Step 4**: If the residual error ${\Vert {\overrightarrow{Z}}_{s}-{\mathbf{I}}_{cc}^{sd}\overrightarrow{\theta}\Vert}_{2}^{2}$ converges below an acceptable level or the maximum iteration number is reached, stop the iteration and calculate the optimized source pattern as

$$\widehat{\overrightarrow{J}}=\mathbf{\Psi}\widehat{\overrightarrow{\theta}},$$

where $\widehat{\overrightarrow{\theta}}$ denotes the optimal 2D-DCT coefficients obtained by the linearized Bregman algorithm. Otherwise, return to **Step 3**.
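The core of Step 3, Eqs. (12) and (13), fits in a few lines of NumPy. To keep the example self-contained and checkable, the sketch replaces ${\mathbf{I}}_{cc}^{sd}$ with a hypothetical stand-in $A$ made of random rows of a 1D orthonormal DCT matrix, acting on a synthetic $K$-sparse coefficient vector; it omits the symmetry and faint-pixel projection of Step 3. It uses $\delta=0.9$ instead of the paper's $10^{-4}$: the iteration is a unit-step gradient descent on a smooth dual problem whose gradient has Lipschitz constant $\delta\Vert AA^{T}\Vert$, and the stand-in has orthonormal rows ($AA^{T}=I$), so this conservative step is an assumption tied to the toy matrix.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix (rows are DCT basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def shrink(x: np.ndarray, mu: float) -> np.ndarray:
    """Entry-wise operator T_mu: 0 if |x| < mu, else sgn(x) * (|x| - mu)."""
    return np.sign(x) * np.maximum(np.abs(x) - mu, 0.0)

def linearized_bregman(A, z, mu, delta, n_iter=20000, tol=1e-10):
    """Iterate Eqs. (12)-(13) from g = 0, theta = 0 until the squared
    residual drops below tol (Step 4) or n_iter is reached."""
    g = np.zeros_like(z)
    theta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = g + (z - A @ theta)               # Eq. (12)
        theta = delta * shrink(A.T @ g, mu)   # Eq. (13)
        if np.sum((z - A @ theta) ** 2) < tol:
            break
    return theta

# Hypothetical stand-in for I_cc^sd: m random rows of an orthonormal DCT basis.
n, m, k = 64, 32, 4
rng = np.random.default_rng(0)
psi = dct_matrix(n).T                          # columns are DCT basis vectors
A = psi[rng.choice(n, size=m, replace=False), :]
theta_true = np.zeros(n)
theta_true[rng.choice(n, size=k, replace=False)] = rng.choice([-1.0, 1.0], size=k)
z = A @ theta_true                             # consistent toy "target" data

theta_hat = linearized_bregman(A, z, mu=5.0, delta=0.9)
print("residual:", np.linalg.norm(A @ theta_hat - z))
print("nonzeros:", np.count_nonzero(theta_hat))
```

The sketch only relies on the iteration driving the residual $\Vert A\widehat{\overrightarrow{\theta}}-\overrightarrow{z}\Vert$ toward zero; whether $\widehat{\overrightarrow{\theta}}$ also equals the synthetic ground truth depends on the usual CS conditions on $m$, $k$ and $\mu$, so that is not claimed here.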

It should be noted that in a general compressive sensing reconstruction process, the linear constraint in the optimization formulation holds for the optimal solution. In contrast, SO is a signal synthesis process, in which the attainable performance encoded in ${\overrightarrow{Z}}_{s}$ is not known in advance. In other words, the linear constraint ${\overrightarrow{Z}}_{s}={\overrightarrow{I}}_{s}={\mathbf{I}}_{cc}^{s}\mathbf{\Psi}\overrightarrow{\theta}$ in Eq. (2) is an ideal case that is difficult to attain even for the optimal source. In this paper, we approximately treat the difference between the target ${\overrightarrow{Z}}_{s}$ and the actual aerial image ${\overrightarrow{I}}_{s}={\mathbf{I}}_{cc}^{s}\mathbf{\Psi}\overrightarrow{\theta}$ as random noise. The Bregman algorithm has been shown to be robust to random noise [34, 35], so the proposed method can be successfully applied to the synthesis of optimal source patterns, as shown in the following simulations.

Note that the linearized Bregman algorithm is applicable to the convex optimization problem defined in this paper. If other restrictions are imposed during the optimization, they must first be transformed into linear constraints, or other numerical algorithms must be applied to solve the SO problem. This topic is beyond the scope of this paper and will be investigated in the future.

## 5. Simulation and analysis

In this section, we use the proposed CS method to optimize the source pattern for a typical line-space pattern and a horizontal block pattern, as shown in Fig. 2. We also compare our method with the traditional CG method described in [27]. Figures 3 and 4 illustrate the simulations based on the dense line-space pattern using the CS and CG methods, respectively. The critical dimension of the line-space pattern is CD = 45nm, and the duty ratio is 1:1. The illumination wavelength is 193nm, the numerical aperture (NA) on the wafer side is 1.2, the demagnification factor is 4, and the refractive index of the immersion medium is 1.44. The mask dimension is 4020nm × 4020nm and the pixel size on the mask side is 20nm, so the mask consists of 201 × 201 pixels. The source patterns consist of 41 × 41 pixels. The photoresist effect is represented by a constant threshold resist (CTR) model. The print image is calculated as $\text{Print Image}=\Gamma \{{\mathbf{I}}_{norm}-{t}_{r}\}$, where ${\mathbf{I}}_{norm}=\mathbf{I}/{J}_{sum}$ is the normalized aerial image, ${J}_{sum}={\sum}_{{x}_{s}}{\sum}_{{y}_{s}}\mathbf{J}({x}_{s},{y}_{s})$ is a normalization factor, and ${t}_{r}$ is a constant threshold. $\Gamma \{\cdot \}=1$ when its argument is larger than 0; otherwise, $\Gamma \{\cdot \}=0$. In the following simulations, the threshold is ${t}_{r}=0.25$, $\delta ={10}^{-4}$ in Eq. (13), and the threshold of the operator ${T}_{\mu}(\cdot )$ is $\mu =1$. We use the normalized aerial image to calculate the print image because the photoresist threshold is chosen assuming unit exposure dose. From top to bottom, Figs. 3 and 4 show the simulations with *M* = 200, *M* = 100 and *M* = 25, where *M* is the number of randomly selected monitoring pixels on the wafer. From left to right, Figs. 3 and 4 show the optimized source patterns, the corresponding print images on the focal plane and on the 75nm defocus plane, and the print images on the focal plane with a 10% illumination dose variation. The intensities of the source patterns are normalized so that the overall dose is 1. Gray levels from black to white represent the range from 0 to the maximum intensity. For the print images, black and white represent 0 and 1, respectively. Figure 5 shows the 2D-DCT coefficients of the optimized source patterns: Figs. 5(a), 5(b) and 5(c) correspond to the line-space pattern with *M* = 200, *M* = 100 and *M* = 25, respectively. The optimized source patterns are moderately sparse in the DCT domain, because faint source pixels are turned off to keep a balance between sparsity in the DCT domain and in the pupil domain. The optimized source patterns also change with the number of selected monitoring pixels, because the linearized Bregman algorithm always pursues the optimal solution of the *l*_{1}-norm minimization problem under the given linear constraints. Changing the number of monitoring pixels changes the linear constraints, thus leading to different optimal source patterns. In future work, we will analyze mathematically how the final results change with the number of sample points. In addition, for a given number of monitoring pixels, the optimization results may change slightly for different random selections; however, the optimized source patterns and print images are similar once the number of monitoring pixels is fixed.

Figure 6 shows an instance of the randomly selected monitoring pixels on the wafer. The top row shows the distributions of randomly selected monitoring pixels for the line-space pattern with (a) *M* = 200, (b) *M* = 100 and (c) *M* = 25. White regions represent all of the candidate monitoring pixels before down-sampling, including the inner and outer margins of the mask features and the monitoring pixels in non-pattern regions. The black dots are the locations of the randomly selected monitoring pixels. Since the target pattern is vertically and horizontally symmetric, we only select monitoring pixels in the upper-left quadrant, marked by the red dashed lines. The monitoring pixels are thus drawn from a narrow pool around the target boundaries in the upper-left quadrant, so a small number of randomly selected monitoring pixels can approximately represent the geometric characteristics of the target pattern.

In our simulations, the pattern error (PE), aerial image contrast and PW are used to evaluate the imaging performance; they are summarized in Table 1 and Table 2. The pattern error is calculated as

$$\text{PE}={\Vert \overrightarrow{Z}-\Gamma \{{\overrightarrow{I}}_{norm}-{t}_{r}\}\Vert}_{2}^{2},$$

where $\overrightarrow{Z}$ and ${\overrightarrow{I}}_{norm}$ are the raster-scanned target pattern and normalized aerial image, so that PE counts the pixels at which the print image differs from the target. As shown in Table 1, as *M* decreases, the pattern error increases and the contrast decreases for both the CS and CG methods. This is reasonable, since the fewer monitoring pixels are used, the harder it is to control the image fidelity and aerial image contrast. On the other hand, with the same number of monitoring pixels, the proposed CS method always achieves better image fidelity and approximately twofold higher contrast than the CG method. This is because the CS method applies the linear constraint ${\overrightarrow{Z}}_{s}={\overrightarrow{I}}_{s}={\mathbf{I}}_{cc}^{s}\mathbf{\Psi}\overrightarrow{\theta}$, which pushes the aerial image toward the target at the monitoring pixels. Figures 7(a), 7(c) and 7(e) show the PE convergence of the CS and CG methods with *M* = 200, 100 and 25, respectively. Figures 7(b), 7(d) and 7(f) show the contrast convergence of the CS and CG methods with *M* = 200, 100 and 25, respectively. The CS method exhibits more stable convergence than the CG method. In addition, Figs. 3 and 4 show that the CS method preserves the sparsity of the source pattern and can thus achieve simpler source patterns than the CG method. Table 1 also presents the runtimes of the CS and CG methods, averaged over 5 independent runs. All programs were implemented in MATLAB, and the computations were carried out on an Intel(R) Xeon(R) X5650 CPU (2 cores), 2.66 GHz, with 32 GB of RAM. Compared with the CG method, the proposed CS method achieves a four-fold to five-fold speedup. Furthermore, the runtime of the CS method can be effectively reduced by using fewer monitoring pixels, while the runtime of the CG method appears independent of the number of monitoring pixels. This trend can be explained as follows. As shown in Eqs. (12) and (13), the computational cost of the update operations in the CS method is approximately proportional to the number of monitoring pixels *M*. On the other hand, the computational cost of the CG method is approximately proportional to the number of source pixels ${N}_{s}^{2}$. The PWs of the CS and CG methods with different numbers of monitoring pixels are illustrated in Fig. 8(a). The PWs are measured at the middle bar along the black dashed line in Fig. 2(a), and are calculated with the Prolith software. The values of depth of focus (DOF) corresponding to different exposure latitudes (EL) are summarized in the upper half of Table 2. For both the CS and CG methods, the PWs can be extended by increasing the number of monitoring pixels. Furthermore, the CS method achieves much larger PWs than the CG method, thus effectively improving the robustness of lithography systems to process variations.
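The CTR print image and the PE metric reduce to a threshold and a squared difference. The sketch below uses a hypothetical toy target and aerial image; the normalization by ${J}_{sum}$ and ${t}_{r}=0.25$ follow the text, while the function names are illustrative.

```python
import numpy as np

def print_image(aerial: np.ndarray, j_sum: float, t_r: float = 0.25) -> np.ndarray:
    """Constant threshold resist model: Gamma{I / J_sum - t_r}."""
    i_norm = aerial / j_sum
    return (i_norm - t_r > 0).astype(float)

def pattern_error(target: np.ndarray, printed: np.ndarray) -> float:
    """PE = ||Z - print||_2^2; for binary images this counts mismatched pixels."""
    return float(np.sum((target - printed) ** 2))

# Hypothetical 4 x 8 example: a 2-pixel-wide line that prints 1 pixel too wide.
target = np.zeros((4, 8))
target[:, 3:5] = 1.0
aerial = np.full((4, 8), 0.1)
aerial[:, 2:5] = 0.6

pimg = print_image(aerial, j_sum=1.0)
pe = pattern_error(target, pimg)
print(pe)  # 4 rows x 1 extra column -> 4.0
```

Dividing by ${J}_{sum}$ makes the print image invariant to the overall source dose, which is why a single threshold ${t}_{r}$ can be used for every candidate source.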

Figures 9 and 10 illustrate the simulations of the CS and CG methods based on another horizontal block pattern, whose critical dimension is CD=45nm. All optimization parameters are the same as those in Figs. 3 and 4. Table 3 summarizes the pattern errors, aerial image contrasts and runtimes of the CS and CG methods with *M*=200, 100 and 25. Figures 5(d), 5(e) and 5(f) show the 2D-DCT coefficients of the optimized source patterns for the horizontal block pattern with *M*=200, 100 and 25, respectively, and Figs. 6(d), 6(e) and 6(f) show the corresponding distributions of the randomly selected monitoring pixels. The convergence curves of pattern error and contrast are shown in Fig. 11. The PWs of these simulations are compared in Fig. 8(b), where the PWs are measured at the middle bar along the black dashed line in Fig. 2(b). Notably, the PW of the CG method with *M*=25 vanishes entirely. The values of DOF corresponding to different ELs are summarized in the bottom half of Table 2. The horizontal block pattern simulations lead to conclusions similar to those for the line-space pattern. Together, these simulations demonstrate the superiority of the CS method over the CG method in terms of imaging performance, source manufacturability and computational efficiency.

In the prior simulations, the monitoring pixels are selected randomly, because random selection helps preserve the incoherence between the projection matrix and the basis matrix described in Section 2. Figure 12 shows the simulation results obtained with a deterministic selection of the monitoring pixels. In Fig. 12, we choose *M*=200, and the monitoring pixels are selected by skipping a fixed number of pixels between consecutive selections. Figures 12(a) and 12(b) show the optimized source patterns obtained by the CS and CG methods for the line-space pattern, respectively, and Figs. 12(c) and 12(d) show those for the horizontal block pattern. The bottom row shows the nominal print images corresponding to the source patterns in the top row. Even with deterministic selection, the CS method is more effective than the CG method at improving image fidelity.
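The difference between the two selection schemes can be sketched as follows. This is an illustration only: the layout size `n`, the row-major scan order and the helper names are hypothetical, and the exact stride and selection region used for Fig. 12 are not specified here. Random selection draws *M* distinct pixels uniformly, while the deterministic scheme keeps every `step`-th pixel of the scan.

```python
import random


def random_monitoring_pixels(n, m, seed=None):
    """Choose m distinct monitoring pixels uniformly at random from an n x n
    layout; returns (row, col) pairs in row-major order."""
    rng = random.Random(seed)
    flat = rng.sample(range(n * n), m)
    return [divmod(k, n) for k in sorted(flat)]


def strided_monitoring_pixels(n, m):
    """Deterministic alternative: keep every `step`-th pixel in row-major
    order, i.e. skip a fixed number (step - 1) of pixels between picks."""
    step = (n * n) // m
    if step < 1:
        raise ValueError("m must not exceed the number of layout pixels")
    return [divmod(k * step, n) for k in range(m)]
```

For a hypothetical 64×64 layout with *M*=200, the stride is 4096 // 200 = 20, so the deterministic scheme picks (0, 0), (0, 20), (0, 40), and so on.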

This paper develops a general framework of SO based on compressive sensing theory and the basis pursuit approach. Although the above mathematical derivation and simulations are all based on the 2D-DCT basis, our approach is also applicable to other appropriate bases. Figure 13 shows simulations using the spatial basis for both the line-space pattern and the horizontal block pattern. In Fig. 13, we choose the basis matrix **Ψ** in Eq. (2) to be the identity matrix, i.e., we seek an optimal source pattern that is sparse in the pupil domain itself. The top and bottom rows show the simulations for the line-space pattern and the horizontal block pattern, respectively. From left to right, Fig. 13 shows the optimized source patterns, the corresponding print images on the focal plane and on the 75nm defocus plane, and the print images on the focal plane with a 10% illumination dose variation. Comparing Figs. 3, 9 and 13 shows that the spatial basis results in slightly worse image fidelity than the 2D-DCT basis.
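A minimal sketch of the two choices of **Ψ** follows, assuming the ${N}_{s}\times{N}_{s}$ source is vectorized and $\overrightarrow{\theta}$ holds its 2D-DCT coefficients in the same order (the paper's exact indexing convention may differ). With the orthonormal 1D DCT-II matrix $\mathbf{C}$, the inverse 2D-DCT $\mathbf{J}={\mathbf{C}}^{T}\mathbf{\Theta}\mathbf{C}$ becomes $\overrightarrow{J}=({\mathbf{C}}^{T}\otimes{\mathbf{C}}^{T})\overrightarrow{\theta}$ for both row-major and column-major vectorization, so $\mathbf{\Psi}={\mathbf{C}}^{T}\otimes{\mathbf{C}}^{T}$; the spatial basis of Fig. 13 is the identity. All function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal 1D DCT-II matrix C: C[k][j] = a_k * cos(pi * (2j + 1) * k / (2n))."""
    C = []
    for k in range(n):
        a_k = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        C.append([a_k * math.cos(math.pi * (2 * j + 1) * k / (2 * n)) for j in range(n)])
    return C


def transpose(M):
    return [list(col) for col in zip(*M)]


def kron(A, B):
    """Kronecker product of two dense matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def dct2_basis(ns):
    """Psi = C^T kron C^T, mapping ns*ns 2D-DCT coefficients to the vectorized source."""
    Ct = transpose(dct_matrix(ns))
    return kron(Ct, Ct)


def spatial_basis(ns):
    """Psi for the spatial (pixel) basis: the (ns*ns) x (ns*ns) identity."""
    n2 = ns * ns
    return [[1.0 if i == j else 0.0 for j in range(n2)] for i in range(n2)]


def synthesize(Psi, theta):
    """Vectorized source J = Psi theta."""
    return [sum(p * t for p, t in zip(row, theta)) for row in Psi]
```

Because each column of the 2D-DCT **Ψ** is a smooth cosine pattern spread over many pixels, a source that is sparse in the 2D-DCT domain is generally not sparse in the pupil domain, which is why the two bases can lead to different optimized sources.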

In order to justify the merit of the DCT-sparsity assumption, Fig. 14 presents SO simulations of the line-space pattern using an *l*_{p}-norm basis pursuit algorithm. Different from Eq. (2), the SO here is formulated as an *l*_{p}-norm inverse reconstruction problem:

$$\hat{\overrightarrow{\theta}}=\arg\min_{\overrightarrow{\theta}}{\Vert \overrightarrow{\theta}\Vert}_{p}\quad \text{subject to}\quad {\overrightarrow{Z}}_{s}={\mathbf{I}}_{cc}^{s}\mathbf{\Psi}\overrightarrow{\theta},\tag{21}$$

where ${\Vert \overrightarrow{\theta}\Vert}_{p}$ is the *l*_{p}-norm of $\overrightarrow{\theta}$. The improved Bregman iterative algorithm in [42] is used to solve the SO problem in Eq. (21). From left to right, Fig. 14 illustrates the SO simulations with *p*=1.3, 1.5 and 1.8, where the number of randomly selected monitoring pixels is *M*=200. The top and bottom rows show the optimized source patterns and the corresponding nominal print images, respectively. Compared with the simulations in Fig. 3, which correspond to *p*=1, the image fidelity degrades as the order *p* increases. Since a smaller *p* enforces higher sparsity in the 2D-DCT domain, these examples demonstrate that the image fidelity improves as the sources become sparser in the DCT domain.
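The role of *p* can be illustrated with a small worked comparison on hypothetical vectors (not source patterns from the paper). Take a one-hot vector $(1,0,0,0)$ and an evenly spread vector $(\tfrac{1}{2},\tfrac{1}{2},\tfrac{1}{2},\tfrac{1}{2})$, which have the same *l*_{2}-norm. The spread vector's *l*_{p}-norm is $(4\cdot 2^{-p})^{1/p}=2^{2/p-1}$ times that of the one-hot vector, so the penalty on spreading is 2 at *p*=1 and falls to 1 at *p*=2; the preference for sparse solutions weakens as *p* grows.

```python
def lp_norm(x, p):
    """(sum_i |x_i|^p)^(1/p); a true norm for p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


def spread_penalty(p):
    """Ratio of the l_p norms of an evenly spread vector and a one-hot vector
    with equal l_2 norm (4 entries); analytically 2 ** (2 / p - 1)."""
    one_hot = [1.0, 0.0, 0.0, 0.0]
    spread = [0.5, 0.5, 0.5, 0.5]
    return lp_norm(spread, p) / lp_norm(one_hot, p)
```

For *p*=1, 1.3, 1.5 and 1.8 the ratio is approximately 2.00, 1.45, 1.26 and 1.08.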

## 6. Conclusion

This paper proposed an efficient and robust source optimization algorithm based on compressive sensing. The source optimization was formulated as an *l*_{1}-norm image reconstruction problem with a linear constraint. A vector imaging model was used to generate the ICC matrix that represents the image formation process of each source point. The linearized Bregman algorithm was applied to efficiently solve the source optimization problem. Under the sparsity assumption, compressive sensing theory guarantees that the optimal source pattern can be synthesized from a small set of monitoring pixels, even far fewer than the number of source pixels. In addition, the computational complexity of the CS method is approximately proportional to the number of monitoring pixels. Compressive sensing is thus exploited to dramatically speed up the proposed algorithm. The *l*_{1}-norm cost function preserves the sparsity of the optimized source pattern in the 2D-DCT domain, thus improving source manufacturability. The linear constraint in the optimization framework makes the aerial image contrast increase gradually during the optimization process. Our simulations demonstrate that the proposed CS method outperforms the traditional CG method in terms of imaging performance, source manufacturability and computational efficiency.

## Acknowledgments

We acknowledge financial support from the Key Program of the National Natural Science Foundation of China (Grant No. 60938003), the National Science and Technology Major Project, the Program of the Education Ministry for Changjiang Scholars in University, the National Natural Science Foundation of China (Grant No. 61204113), the Program for New Century Excellent Talents in University (NCET, Grant No. NCET-10-0042), the Special Plan of Major Project Cultivation of Beijing Institute of Technology, the Basic Research Foundation of Beijing Institute of Technology (Grant No. 20120442001), and the Technology Foundation for Selected Overseas Chinese Scholar. We also thank KLA-Tencor Corporation for providing academic use of Prolith.

## References and links

**1. **A. K. Wong, *Resolution Enhancement Techniques in Optical Lithography* (SPIE Press, 2001).

**2. **X. Ma and G. R. Arce, *Computational Lithography*, Wiley Series in Pure and Applied Optics, 1st ed. (John Wiley and Sons, New York, 2010).

**3. **K. Lai, *et al.*, “Design specific joint optimization of masks and sources on a very large scale,” Proc. SPIE, vol. **7973**, p. 797308 (2011).

**4. **R. R. Vallishayee, S. A. Orszag, and E. Barouch, “Optimization of stepper parameters and their influence on OPC,” Proc. SPIE, vol. **2726**, pp. 660–669 (1996).

**5. **T. E. Brist and G. E. Bailey, “Effective multicutline QUASAR illumination optimization for SRAM and logic,” Proc. SPIE, vol. **5042**, pp. 153–159 (2003).

**6. **M. Burkhardt, A. Yen, C. Progler, and G. Wells, “Illumination design for printing of regular contact patterns,” Microelectron. Eng. **41**, 91–96 (1998).

**7. **T. S. Gau, R. G. Liu, C. K. Chen, C. M. Lai, F. J. Liang, and C. C. Hsia, “The customized illumination aperture filter for low k_{1} photolithography process,” Proc. SPIE, vol. **4000**, pp. 271–282 (2000).

**8. **Y. V. Miklyaev, W. Imgrunt, V. S. Pavelyev, D. G. Kachalov, T. Bizjak, L. Aschke, and V. N. Lissotschenko, “Novel continuously shaped diffractive optical elements enable high efficiency beam shaping,” Proc. SPIE, vol. **7640**, p. 764024 (2010).

**9. **J. Carriere, J. Stack, J. Childers, K. Welch, and M. D. Himel, “Advances in DOE modeling and optical performance for SMO applications,” Proc. SPIE, vol. **7640**, p. 764025 (2010).

**10. **Y. Granik, “Source optimization for image fidelity and throughput,” J. Microlith. Microfab. Microsyst. **3**, 509–522 (2004).

**11. **K. Tian, A. Krasnoperova, D. Melville, A. E. Rosenbluth, D. Gil, J. Tirapu-Azpiroz, K. Lai, S. Bagheri, C. C. Chen, and B. Morgenfeld, “Benefits and trade-offs of global source optimization in optical lithography,” Proc. SPIE, vol. **7274**, p. 72740C (2009).

**12. **K. Iwase, P. D. Bisschop, B. Laenens, Z. Li, K. Gronlund, P. V. Adrichem, and S. Hsu, “A new source optimization approach for 2x node logic,” Proc. SPIE, vol. **8166** (2011).

**13. **A. E. Rosenbluth, S. Bukofsky, C. Fonseca, M. Hibbs, K. Lai, A. Molless, R. N. Singh, and A. K. Wong, “Optimum mask and source patterns to print a given shape,” J. Microlith. Microfab. Microsyst. **1**(1), 13–30 (2002).

**14. **X. Ma and G. R. Arce, “Pixel-based simultaneous source and mask optimization for resolution enhancement in optical lithography,” Opt. Express **17**(7), 5783–5793 (2009).

**15. **J. Yu and P. Yu, “Gradient-based fast source mask optimization (SMO),” Proc. SPIE, vol. **7973**, p. 797320 (2011).

**16. **X. Ma, C. Han, Y. Li, L. Dong, and G. R. Arce, “Pixelated source and mask optimization for immersion lithography,” J. Opt. Soc. Am. A **30**(1), 112–123 (2013).

**17. **A. E. Rosenbluth and N. Seong, “Global optimization of the illumination distribution to maximize integrated process window,” Proc. SPIE, vol. **6154**, p. 61540H (2006).

**18. **A. Erdmann, T. Fühner, T. Schnattinger, and B. Tollkühn, “Towards automatic mask and source optimization for optical lithography,” Proc. SPIE, vol. **5377**, pp. 646–657 (2004).

**19. **S. Robert, X. Shi, and L. David, “Simultaneous source mask optimization (SMO),” Proc. SPIE, vol. **5853**, pp. 180–193 (Yokohama, Japan, 2005).

**20. **S. Hsu, L. Chen, Z. Li, S. Park, K. Gronlund, H. Liu, N. Callan, R. Socha, and S. Hansen, “An innovative source-mask co-optimization (SMO) method for extending low k_{1} imaging,” Proc. SPIE, vol. **7140**, p. 714010 (2008).

**21. **Y. Peng, J. Zhang, Y. Wang, and Z. Yu, “Gradient-based source and mask optimization in optical lithography,” IEEE Trans. Image Proc. **20**, 2856–2864 (2011).

**22. **N. Jia and E. Y. Lam, “Pixelated source mask optimization for process robustness in optical lithography,” Opt. Express **19**(20), 19384–19398 (2011).

**23. **J. Li, Y. Shen, and E. Lam, “Hotspot-aware fast source and mask optimization,” Opt. Express **20**(19), 21792–21804 (2012).

**24. **J. Li, S. Liu, and E. Lam, “Efficient source and mask optimization with augmented Lagrangian methods in optical lithography,” Opt. Express **21**(7), 8076–8090 (2013).

**25. **S. Li, X. Wang, and Y. Bu, “Robust pixel-based source and mask optimization for inverse lithography,” Opt. Laser Technol. **45**, 285–293 (2013).

**26. **X. Ma, C. Han, Y. Li, B. Wu, Z. Song, L. Dong, and G. R. Arce, “Hybrid source mask optimization for robust immersion lithography,” Appl. Opt. **52**(18), 4200–4211 (2013).

**27. **J. C. Yu, P. Yu, and H. Y. Chao, “Fast source optimization involving quadratic line-contour objectives for the resist image,” Opt. Express **20**(7), 8161–8174 (2012).

**28. **J. Nocedal and S. J. Wright, *Numerical Optimization*, 2nd ed. (Springer, New York, 2006).

**29. **G. M. Gallatin, “High-numerical-aperture scalar imaging,” Appl. Opt. **40**(28), 4958–4964 (2001).

**30. **E. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inform. Theory **52**(2), 489–509 (2006).

**31. **D. Donoho, “Compressed sensing,” IEEE Trans. Inform. Theory **52**(4), 1289–1306 (2006).

**32. **A. K. Jain, *Fundamentals of Digital Image Processing* (Prentice Hall, 1988).

**33. **K. Lai, *et al.*, “Experimental result and simulation analysis for the use of pixelated illumination from source mask optimization for 22nm logic lithography process,” Proc. SPIE, vol. **7274**, p. 72740A (2009).

**34. **S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, “An iterative regularization method for total variation-based image restoration,” Multiscale Model. Simul. **4**(2), 460–489 (2005).

**35. **J. F. Cai, S. Osher, and Z. Shen, “Linearized Bregman iterations for compressed sensing,” Mathematics of Computation **78**(267), 1515–1536 (2009).

**36. **Z. Wang and G. R. Arce, “Variable density compressed image sampling,” IEEE Trans. Image Proc. **19**(1), 264–270 (2010).

**37. **J. L. Paredes and G. R. Arce, “Compressive sampling signal reconstruction by weighted median regression estimates,” IEEE Trans. Signal Proc. **59**(6), 2585–2601 (2011).

**38. **D. Peng, P. Hu, V. Tolani, and T. Dam, “Toward a consistent and accurate approach to modeling projection optics,” Proc. SPIE, vol. **7640**, p. 76402Y (2010).

**39. **Y. Li, L. Dong, and X. Ma, “Aerial image calculation method based on the Abbe imaging vector model,” Chinese Patent, ZL 201110268282.X (Authorized in 2013).

**40. **D. Donoho and X. Huo, “Uncertainty principles and ideal atomic decomposition,” IEEE Trans. Inform. Theory **47**(7), 2845–2862 (2001).

**42. **X. Li, “An improved Bregman iterative algorithm,” Master’s thesis, Beijing Jiaotong University (2010).