## Abstract

Extreme ultraviolet (EUV) lithography mask defects, especially multilayer defects, may cause severe reflectivity deformation and phase shifts at advanced nodes. Geometric parameter characterization is essential for mask defect compensation or repair. In this paper, we propose a machine learning framework to predict the geometric parameters of multilayer defects on EUV mask blanks. With the proposed inception modules and cycle-consistent learning techniques, the framework enables a novel way of defect characterization with high accuracy.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

As the critical dimension scales down across technology nodes, EUV lithography has become the leading candidate for semiconductor manufacturing at the 7nm node and beyond. An EUV mask blank consists of Mo/Si multilayers and a substrate. Mask defects, such as particles deposited underneath or inside the multilayers, can cause deformation or disruption of the layer structure [1]. These defects can distort the multilayer reflectivity. If they cannot be fixed or compensated during the absorber deposition, they generate a phase shift during EUV imaging, leading to pattern shift. As illustrated in Fig. 1, the EUV projection image and the resist contour image should be as shown in Figs. 1(b) and (c) for a contact mask with no defect as in Fig. 1(a). On the contrary, an uncompensated multilayer defect on the mask (Fig. 1(d)) could lead to imaging deformation and pattern shift as shown in Figs. 1(e) and (f). Therefore, it is necessary to detect and characterize such defects for mask repair [2,3].

Figure 2 illustrates the typical flow of multilayer defect inspection, characterization and repair. First, the multilayer defects are inspected and located on a given mask blank. Then the geometry of these defects is characterized by measuring the height and full width at half maximum (FWHM) at the top and bottom of the multilayer, i.e., $h_{top}, w_{top}, h_{bot}, w_{bot}$. These measured parameters can then guide mask pattern manufacturing for defect repair. As the printed patterns are defined by an absorber layer on top of the multilayer mask, the locations of the patterns are adjusted to cover defects and the pattern shapes are modified to compensate for the imaging deformation caused by the defects [2,3]. This repair is a form of optical proximity correction (OPC): the target patterns are those intended to be printed on the photoresist through EUV lithography, and the modified patterns are corrected versions of the targets that compensate for the reflectivity change caused by the multilayer defect.

The conventional approach for multilayer defect characterization relies on constructing an EUV inspection system with extra equipment [4–15]. The system can be used to build a library with paired information of defects and deformation, e.g., defect geometry and scattering images, respectively. The defect characteristics of a new mask blank can be back-traced by matching the scattering image output of the inspection system against those in the library. Extensive research has been conducted to develop inspection systems that can detect and characterize defects as small as possible. For instance, the first actinic mask inspection system [16] can detect defects as small as 3nm in $h_{top}$ and 30nm in $w_{top}$. The micro-coherent EUV scatterometry microscope (micro-CSM) system [17,18] improves the resolution to 1.4nm in $h_{top}$ and 25nm in $w_{top}$. However, the conventional approach lacks the capability of handling unseen defects. Recently, the mapping function from the deformation to the defect geometric parameters has been constructed for multilayer defect characterization. Dou *et al.* [19] retrieved the phase deformation from scattering images, and leveraged partial least-square regression (PLSR) to map the phase deformation properties to the defect geometric parameters. Xu *et al.* [20] collected intensity and phase distributions of designed sets based on a reference multilayer defect with different pupil filters, and then mapped this distribution information to geometric parameters.

Motivated by the successful application of machine learning techniques in other lithography-related areas [21–33], we propose a machine learning framework to model the correlation between the deformation and the defect characteristics. In principle, any information containing the deformation could be used for model calibration, including aerial images and scattering images. In our research, we leverage an EUV imaging system to collect 1000x magnified aerial images and map them to the multilayer defect geometric parameters. The main contributions are summarized as follows.

- We explore a novel way of defect characterization based on machine learning techniques.
- We propose a cycle-consistent learning technique to assist the learning of an inverse projection function.
- We develop an inception-based neural network for high-performance defect characterization.
- The experimental results demonstrate that the framework achieves a 3.02% error rate on average, about 0.87% better than a plain convolutional neural network (CNN) and 1.78% better than the most recent work [20] using principal component analysis (PCA) and artificial neural networks (ANN).

The rest of the paper is organized as follows. Section 2 describes the background and provides the problem formulation. Section 3 presents the detailed algorithm of our approach. Section 4 validates the effectiveness of our approach with experimental results. Section 5 concludes the paper.

## 2. Preliminaries

In this section, we will review the background of multilayer defect characterization and provide the problem formulation in this work.

#### 2.1 Multilayer defect characterization

In the conventional characterization approach, we first build an EUV inspection system to collect the deformation information (e.g., scattering images) caused by multilayer defects. Then we plant programmed defects on a mask blank, collect the corresponding scattering images through the inspection system, and build a library pairing multilayer defect geometric parameters with scattering images. Once the library is built, the geometric parameters of a defect can be obtained in two steps: 1) measure the scattering image through the inspection system; 2) retrieve the matched geometric entry in the library using the image.
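The two retrieval steps above amount to a nearest-neighbor lookup, which can be sketched as follows; the array-based library format and the L2 image distance are illustrative assumptions, not the actual inspection-system data format:

```python
import numpy as np

def build_library(defect_params, scatter_images):
    """Pair programmed-defect geometry (h_top, w_top, h_bot, w_bot)
    with the scattering images measured for them."""
    return list(zip(scatter_images, defect_params))

def retrieve_geometry(library, measured_image):
    """Step 2: return the geometry whose stored scattering image is
    closest (L2 distance, an assumed metric) to the measured one."""
    best = min(library, key=lambda entry: np.linalg.norm(entry[0] - measured_image))
    return best[1]

# Toy library with two programmed defects (all values illustrative).
images = [np.full((4, 4), 0.2), np.full((4, 4), 0.8)]
params = [(1.0, 10.0, 8.0, 12.0), (2.0, 18.0, 14.0, 20.0)]
lib = build_library(params, images)

assert retrieve_geometry(lib, np.full((4, 4), 0.25)) == (1.0, 10.0, 8.0, 12.0)
```

The sketch also exposes the limitation discussed later: a defect absent from the library is silently matched to its nearest stored neighbor, however different its true geometry.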

Thus, the inspection system is the key to the characterization. Figure 3 illustrates a schematic example of such a system, i.e., the micro-CSM system [17]. This system consists of the coherent EUV illumination, the Fresnel zone plate (FZP) and a charge-coupled-device (CCD) camera. A circular region and a rectangular region are patterned on the FZP. During the inspection, the coherent EUV light is focused on the EUV mask blank with a multilayer defect through the circular region of the off-axis FZP. The EUV rays are then reflected by the mask blank and pass through the rectangular region. The CCD camera at the end records the scattering image formed by the reflected EUV rays. Constructing such a system demands careful design and assembly of optical equipment. For example, optical parameters (e.g., the focal length, the diameter of the circular region, etc.) of the equipment need to be designed and tuned carefully.

#### 2.2 Projection system based characterization

The conventional method lacks the ability to handle unseen defects: if the information of a defect is not contained in the paired library, its geometric parameters cannot be retrieved. A machine learning based method can overcome this limitation by learning a mapping function. In the proposed approach, we first plant programmed defects on a mask blank and collect the corresponding deformation information, then learn a function mapping the deformation to the geometric parameters. Since the contour on the photoresist relates to the aerial image, the aerial image is adopted as the input of the mapping function. We leverage the projection system shown in Fig. 4 to collect the aerial images. The inspection principle of this projection system is the same as that of the system in Fig. 3 except that it collects aerial images. The illuminator-directed EUV light is reflected by the mask, focused by the projection optics, and finally collected by an aerial image sensor. Considering the actual pixel resolution of the aerial image sensor, this projection system is designed to project 1000x magnified images. Deformation caused by multilayer defects affects the projection intensity, which is collected by the aerial image sensor; from this light intensity we can therefore inversely infer the information of the defects. To this end, if we model the projection system by a projection function $f$, characterizing the defects from projection images becomes a modeling task for the inverse function $f^{-1}$. Machine learning techniques can then be adopted for the characterization task.

#### 2.3 Problem formulation

The main task of this work is to learn the inverse projection function $f^{-1}$, i.e., to characterize multilayer defects from EUV projection images. The prediction target is the geometric parameters (i.e., $h_{top}, w_{top}, h_{bot}, w_{bot}$) of multilayer defects. As the geometric parameters are continuous, this is essentially a regression task. We adopt the error rate as the evaluation metric.

**Problem 1 (Machine learning based defect characterization)** *Given a dataset containing geometric parameters of multilayer defects and corresponding EUV projection images, train a model to accurately predict the geometric parameters of a multilayer defect given its EUV projection image*.
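Since the exact formula of the error rate is not spelled out here, the sketch below assumes the common choice of mean absolute relative error over the four parameters:

```python
import numpy as np

def error_rate(pred, true):
    """Assumed metric: mean absolute relative error over the four
    geometric parameters (h_top, w_top, h_bot, w_bot), in percent."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.mean(np.abs(pred - true) / np.abs(true)) * 100)

# A prediction off by 3% on every parameter scores a 3% error rate.
true = np.array([1.0, 10.0, 12.0, 15.0])   # h_top, w_top, h_bot, w_bot (nm)
pred = true * 1.03
assert abs(error_rate(pred, true) - 3.0) < 1e-9
```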

## 3. Framework

In this section, we will introduce our framework, starting from a CNN architecture and gradually incorporating new features like inception modules and cycle consistency loss.

#### 3.1 Convolution neural network

For simplicity, we define $g$ as the inverse projection function $f^{-1}$. Then the goal of multilayer defect characterization is to learn $g: X \rightarrow Y$, where $X$ denotes the EUV projection images and $Y$ denotes the defect geometric parameters. A straightforward way is to construct a CNN to learn the regression task, given its high performance in image-related tasks [34]. A CNN generally consists of convolution layers for feature extraction and fully connected layers for classification or regression. The rectified linear unit (ReLU) is adopted as the activation function for fast convergence and nonlinearity. The max pooling layer downsamples the feature maps, which provides a degree of translation invariance. Let $G$ be the learned function of $g$. Then the loss function is defined as follows,

$$L_{G_1}(G) = \left\| G(x) - y \right\|_{p}, \tag{2}$$
where $||\cdot ||_p$ denotes the $p$-norm. We adopt the $\ell _2$-norm ($p$=2) for the CNN, which is equivalent to the mean-square-error loss. We tune the hyperparameters for the best accuracy in this task. Table 1 summarizes the CNN architecture, with the convolution module (CM) defined in Fig. 5(a) as the building block. The size, stride and channel number of the filters for the convolution (Conv) layers are listed in the column “Filter” or marked in the module figures. The sequence “$M \times N, S, C$” denotes that the filter size is $M \times N$, the stride is $S$, and the channel number is $C$. We apply zero-padding for all filters. Five CMs are cascaded in the CNN architecture. Batch normalization (BN) and 50% dropout are selectively applied to certain layers. However, this plain CNN architecture does not account for the characteristics of EUV imaging, so there is still considerable room for improvement.

#### 3.2 Inception module structure

The inception architecture is built on the idea of approximating and covering an optimal local sparse structure with readily available dense components in a convolutional network [35]. We intend to find the optimal local construction and repeat it spatially. Figure 6 shows a basic inception module [35]. The module splits the input into four parallel branches: 1x1, 3x3 and 5x5 convolution layers and a pooling layer. These parallel layers help the network adapt to the scale of the input features. The output is formed by concatenating the outputs of all parallel branches into a single tensor. The inception module can be described as follows:

$$I(x) = \operatorname{concat}\big (i_1(x), i_2(x), \ldots, i_n(x)\big ), \tag{3}$$
where $I(x)$ denotes the whole inception process and $i_k(x)$, $k = 1,2,\ldots ,n$, denotes the function of each branch. The outputs of all branches are concatenated by stacking their channels. In this way, features with different views are extracted.

As different defect geometries can cause local and global changes in the scope and fidelity of the EUV projection image, applying feature extraction with different filter sizes can extract holistic features from both local and global views. Therefore, parallel branches with inception modules are introduced to extract features with different views. Figure 7 shows the structure of the two cascaded inception modules (IMs) used in our model. The inception module consists of four parallel feature extraction branches. We introduce $1 \times 1$ convolution layers and the factorization of convolutions [36,37] to reduce the complexity of the inception modules. Besides deepening the structure, $1 \times 1$ convolution layers also improve the nonlinearity by introducing extra ReLU layers. The factorization of convolutions replaces larger filters with smaller and deeper ones: a $5 \times 5$ filter can be replaced by two $3 \times 3$ filters, and a $3 \times 3$ filter can be replaced by one $1 \times 3$ filter followed by one $3 \times 1$ filter. Combining both factorizations increases the nonlinearity and reduces the number of trainable weights from $5 \times 5=25$ to $3+3+3+3=12$ while keeping the same receptive field. The inception structure of one module is highlighted with an orange rectangle, and the same module is used again in the next stage. The two cascaded inception modules replace two cascaded CMs in Table 1.
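The branch-and-concatenate behavior described above can be sketched in numpy; the two stand-in branches below are placeholders for real convolution layers:

```python
import numpy as np

def inception(x, branches):
    """I(x): run parallel branches i_k on the same input and
    concatenate their outputs along the channel axis."""
    return np.concatenate([branch(x) for branch in branches], axis=-1)

# Stand-in branches: each maps an (H, W, C) tensor to (H, W, C_k).
b1 = lambda x: x * 0.5           # mimics a 1x1 conv, keeps C channels
b2 = lambda x: x[..., :1] + 1.0  # mimics a reduced branch, 1 channel

x = np.zeros((8, 8, 3))
y = inception(x, [b1, b2])
assert y.shape == (8, 8, 4)  # channels stacked: 3 + 1

# Factorization keeps a 5x5 receptive field with fewer weights.
assert 5 * 5 == 25 and (3 + 3) + (3 + 3) == 12
```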

#### 3.3 Cycle-consistent learning

The modeling accuracy can be further improved if we consider the practical imaging process shown in Fig. 4. Given an EUV mask blank with a multilayer defect, we can obtain the projection intensity image. We intend to learn $g = f^{-1}$, the inverse projection function, for defect characterization with the EUV projection images as input. Besides learning $g$ directly, we can construct a network to learn $f$ to assist in learning the inverse function $g$. More specifically, given an EUV projection image $x$, we obtain the geometric parameters $y$ of the multilayer defect through the inverse projection function $g$. If we plant a defect with the same parameters $y$ on a mask blank and put it under the EUV projection system, we should recover the original projection image $x$. If we define $f: Y \rightarrow X$ as the projection function and $F$ as the learned $f$, then we have a cycle of transformations, i.e., $X \xrightarrow {G} Y \xrightarrow {F} X'$. If both $G$ and $F$ are accurate, then $X$ and $X'$ should be equal. Based on this intuition, we introduce the cycle consistency loss [38], as shown in Fig. 8. For each image $x$, the transformation cycle should bring $x$ back to the original image (i.e., satisfy forward cycle consistency [39]): $x \rightarrow G(x) \rightarrow F(G(x)) \approx x$. Similarly, backward cycle consistency, $y \rightarrow F(y) \rightarrow G(F(y)) \approx y$, should also be satisfied. The cost functions based on the cycle consistency loss are as follows,

$$L_{forward}(G, F) = \left\| F(G(x)) - x \right\|_1, \qquad L_{backward}(F, G) = \left\| G(F(y)) - y \right\|_1. \tag{4}$$
Similar to Eq. (2), the loss function for $F$ is defined as follows,

$$L_{F_1}(F) = \left\| F(y) - x \right\|_1. \tag{5}$$

Unlike CycleGAN [39], our task aims at learning $G$ accurately, while $F$ is not a direct optimization target. We define $\lambda$ as the parameter controlling the relative importance of the two cycle losses. Moreover, the $\ell _1$-norm is used for the final loss functions. We then define the full loss function as follows,

$$L(G, F) = L_{G_1}(G) + L_{F_1}(F) + L_{forward}(G, F) + \lambda L_{backward}(F, G). \tag{6}$$
The entire network is designed as an encoder-decoder structure, with the encoder network realizing the function $G$ and the decoder network realizing the function $F$. The encoder takes the input $x$, downsamples it through multiple layers, and produces the output $G(x)$ at the bottleneck layer. We use the CNN with inception modules for the encoder. The decoder reverses the process and upsamples $G(x)$ to $F(G(x))$. We value the performance of the encoder $G$ more, since the goal of our model is to predict the geometric parameters of defects. The details of the model architecture are summarized in Table 2. Two cascaded inception modules are adopted in the encoder. We define the structure marked with a purple rectangle in Fig. 5(b) as one deconvolution module (DM). Four cascaded DMs are adopted in the decoder.

The whole framework is illustrated in Fig. 9. The encoder G(Inception) in Table 2 takes the projection image $x$ as input and outputs the predicted geometric parameters; this encoder is our target model. The decoder F in Table 2 introduces the cycle-consistent loss and helps the training of G(Inception). Conversely, F takes the geometric parameters as input and outputs the projection image. The $x, \hat {y}, \hat {x}, {y}, \hat {x}', \hat {y}'$ are the same as in Fig. 8. The terms of the full loss function in Eq. (6), i.e., $L_{G_1}, L_{forward}, L_{F_1}, L_{backward}$, are also marked.
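Putting the pieces together, the full objective can be sketched as follows; placing $\lambda$ only on the backward cycle term is our reading of the loss composition, and the toy G and F are stand-ins for the encoder and decoder networks:

```python
import numpy as np

def full_loss(G, F, x, y, lam):
    """Cycle-consistent objective: direct losses for G and F plus
    forward/backward cycle terms, all with the l1-norm. Weighting
    only the backward term by lam is our assumption."""
    l_g1 = np.abs(G(x) - y).sum()       # G(x) should match the geometry y
    l_f1 = np.abs(F(y) - x).sum()       # F(y) should match the image x
    l_fwd = np.abs(F(G(x)) - x).sum()   # x -> G(x) -> F(G(x)) ~ x
    l_bwd = np.abs(G(F(y)) - y).sum()   # y -> F(y) -> G(F(y)) ~ y
    return l_g1 + l_f1 + l_fwd + lam * l_bwd

# With perfect, mutually inverse G and F every term vanishes.
G = lambda x: x * 2.0   # toy "encoder"
F = lambda y: y / 2.0   # toy "decoder", exact inverse of G
x = np.ones(4)
assert full_loss(G, F, x, G(x), lam=0.1) == 0.0
```

Any mismatch between the two toy networks would make the cycle terms positive, which is exactly the signal the decoder contributes during training.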

## 4. Experimental results

In this section, we describe the experimental settings and present the results that validate the proposed techniques.

#### 4.1 Data preparation

We employ the EUV projection images as the input and the geometric parameters of defects as the output for model training. To verify the performance of the proposed approach, rigorous simulation of a 1000x magnification projection system is performed with Synopsys Sentaurus Lithography (S-litho) on a server with a 12-core Intel Xeon CPU @2.1GHz for data collection. We place multilayer defects with different geometric parameters at the center of the mask blank and collect the EUV projection images at the focal plane. Note that there are two typical types of multilayer defects: bump defects and pit defects. We adopt bump Gaussian-shaped defects to demonstrate the proposed framework. The simulation is based on rigorous coupled wave analysis (RCWA). The imaging settings are as follows: wavelength $\lambda$ = 13.5nm, $NA$=0.33, and an annular source with center radius = 0.6 and width = 0.2. The source is unpolarized; both TE and TM polarizations are taken into consideration when computing the mask 3D effect. For better understanding, the defect geometry parameters are specified on wafer scale. Considering actual defect characteristics, the parameter settings are: $h_{top} = 0.4\sim 2.2nm, w_{top} = 5\sim 20nm, h_{bot}= 6\sim 18nm, w_{bot}= 9\sim 21nm$. The wafer scale for analysis is 200x200 $nm^{2}$. The 1000x magnified projection image obtained from the simulation is expensive for neural networks to process. Therefore, we downscale the projection images and represent each intensity image with $200 \times 200$ pixels, where the value of each pixel denotes the light intensity at that point. The collected dataset contains 2000 samples; we randomly divide it into 1600 training samples and 400 testing samples. Unless otherwise mentioned, the error rates reported in the following are for the testing set.
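The preprocessing steps above can be sketched as follows; the 400x400 raw resolution and the average-pooling downscaler are assumptions for illustration, as the text only fixes the final 200x200 representation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in arrays: the real inputs are 2000 S-litho projection images
# with (h_top, w_top, h_bot, w_bot) labels; 8 random ones suffice here.
images = rng.random((8, 400, 400))
labels = rng.random((8, 4))

def downscale(img, size=200):
    """Average-pool a square intensity image down to size x size."""
    k = img.shape[0] // size
    return img.reshape(size, k, size, k).mean(axis=(1, 3))

small = np.stack([downscale(im) for im in images])
assert small.shape == (8, 200, 200)

# Random 1600/400 split of the 2000-sample dataset (indices only here).
idx = rng.permutation(2000)
train_idx, test_idx = idx[:1600], idx[1600:]
assert len(train_idx) == 1600 and len(test_idx) == 400
```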

#### 4.2 Performance evaluation

The proposed framework for multilayer defect characterization is implemented in Python with TensorFlow 1.14.0 on a Linux server with a 20-core Intel Xeon Gold 6230 CPU @2.1GHz and one NVIDIA RTX 2080 Ti GPU. Adam is adopted as the gradient descent optimizer; the initial learning rate is 0.001 and it decays by 35% every 3000 training steps. The batch size is 32 and the total number of training steps is 15000. Moreover, to avoid statistical instability due to randomness, we train the model ten times with different random seeds and report the average results. The typical runtime for training the model is around 36 minutes. Note that training is a one-time cost: once the model is obtained, it can be reused for geometric parameter prediction, and the 36-minute training time is acceptable considering the benefit of a low prediction error rate. The prediction time on all testing samples is less than one second, so we do not separately report runtimes in the following discussion.
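Reading “decays 35% every 3000 steps” as a step-wise multiplication by 0.65, the schedule works out to:

```python
def learning_rate(step, base=1e-3, decay=0.35, every=3000):
    """Step decay: multiply the base rate by (1 - decay)
    once per 'every' training steps."""
    return base * (1.0 - decay) ** (step // every)

assert learning_rate(0) == 1e-3
assert abs(learning_rate(3000) - 6.5e-4) < 1e-12
# By the last of the 15000 training steps the rate is 0.001 * 0.65^4.
assert abs(learning_rate(14999) - 1e-3 * 0.65**4) < 1e-12
```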

Here we summarize the notations for different models. “CNN” adopts the architecture in Table 1; “CNN(wider)” increases the kernel channel of “CNN” from 32 to 64; “CNN(wider+deeper)” is a deeper version of “CNN(wider)” with two more CMs, the kernel channel size of which is also increased to 64; “CNN+inception” denotes the CNN with inception modules; “cycleCNN” denotes “$G \textrm {(CNN)} + F$” with cycle consistency loss; “cycleCNN+inception” denotes “$G \textrm {(Inception)} + F$” with cycle consistency loss. Model “PCA+ANN” denotes the algorithm in the most recent work [20] with PCA and ANN. Since the detailed ANN structure is not given, we build one for comparison. The output feature size of PCA is 200. The ANN consists of one input layer, two fully-connected hidden layers with 100 and 50 hidden units respectively, and one output layer with 4 units.
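A minimal numpy sketch of the baseline’s PCA front end (the reduction to 200 features); the SVD-based implementation and the toy sizes are our choices, not necessarily those of [20]:

```python
import numpy as np

def pca_features(X, k=200):
    """Project flattened projection images onto their top-k
    principal components (the baseline's assumed front end)."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors of the centered data are the PCs.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

X = np.random.default_rng(1).random((50, 400))  # 50 toy samples
feats = pca_features(X, k=20)                   # 20 PCs for the toy data
assert feats.shape == (50, 20)
```

The reduced features would then feed the two-hidden-layer ANN described above.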

Table 3 compares the performance of cycleCNN with different values of $\lambda$ as defined in Eq. (6). As $\lambda$ increases from 0 to 1, a general trend of degradation is observed. As the models with $\lambda$ equal to 0 and 0.1 perform very similarly, we evaluate the performance of cycleCNN+inception with both values. Table 4 shows this comparison. The previous work “PCA+ANN” [20] achieves an average error rate of 4.79%. The proposed “CNN” architecture drops the average error rate to 3.89% (a 0.9% improvement). “CNN” still performs best when compared with the wider and deeper CNN architectures. With two cascaded inception modules, “CNN+inception” improves the average error rate from 3.89% to 3.42% compared with “CNN”. By introducing cycle-consistent learning, “cycleCNN($\lambda =0$)+inception” further drops the average error rate to 3.02%. This is a 1.78% improvement over “PCA+ANN”, which is substantial considering that 4.79% is already a low error rate. The robustness of the proposed network can be further tested: if the training stage is skipped, the average error rate rises to 100.02%, which is unacceptable in real measurements. Hence, the training stage is essential for machine learning based methods, and as many training samples as possible should be collected for better model performance.

Figure 10 presents the average testing error rates and the standard deviations over ten random seeds for the different models. Among all CNN models, the proposed “cycleCNN($\lambda =0$)+inception” achieves both the lowest average error rate and the lowest standard deviation, demonstrating the model stability. Notably, the standard deviation of “PCA+ANN” is much lower than that of the proposed method, likely because its neural network is simpler than the proposed CNN series. Table 5 compares the number of trainable parameters and the average error rate for the different models. The row “Ratio” indicates the ratio between the number of trainable parameters of a model and that of “CNN”. For the cycleCNN series, as only the encoder is eventually used for prediction and the decoder only assists model training, we do not include the parameters of the decoder in the ratio computation.

We can see that “CNN” improves the error rate from 4.79% to 3.89% with twice the parameters compared to “PCA+ANN”. Although the wider and deeper CNNs have more parameters ($2.35\times$ and $1.39\times$, respectively), they do not improve on the accuracy of the baseline CNN. The integration of inception modules drops the error rate from 3.89% to 3.42% with only 8% more parameters. The cycle-consistent learning technique does not increase the model complexity for prediction, although extra parameters are introduced by the decoder during training. Therefore, at the same prediction-time model complexity, cycle-consistent learning further improves the error rate from 3.89% to 3.48% without inception, and from 3.42% to 3.02% with inception. These results demonstrate that introducing the cycle consistency loss is an effective and efficient way to improve the accuracy of the defect characterization task. Eventually, with both the inception and cycle-consistent learning techniques, the error rate drops from 3.89% to 3.02%, a 0.87% improvement. This is significant considering that the CNN baseline error rate (3.89%) is already very low.

## 5. Conclusion

In this work, we present a machine learning framework for multilayer defect characterization on EUV mask blanks. The framework takes EUV projection images as input and predicts the defect geometric parameters. A plain CNN achieves a 3.89% average error rate, which is already low. With the cascaded inception modules and the cycle-consistent learning technique, we further reduce the error rate to 3.02%, about 0.87% better than the plain CNN and 1.78% better than the most recent work using PCA and ANN. This work opens a new window for fast and precise EUV mask defect inspection and compensation in semiconductor manufacturing.

## Funding

National Major Science and Technology Projects of China (2017ZX02101004-003, 2017ZX02315001-003).

## Acknowledgments

We would like to acknowledge the support from EUV lithography simulation joint lab of Synopsys and IMECAS. We also would like to acknowledge the support from the Opening Project of Key Laboratory of Microelectronic Devices & Integrated Technology, Institute of Microelectronics, Chinese Academy of Sciences.

## Disclosures

The authors declare no conflicts of interest.

## References

**1. **K. Goldberg and I. Mochi, “Wavelength-specific reflections: A decade of extreme ultraviolet actinic mask inspection research,” J. Vac. Sci. Technol., B: Nanotechnol. Microelectron.: Mater., Process., Meas., Phenom. **28**(6), C6E1–C6E10 (2010). [CrossRef]

**2. **A. Barty, P. B. Mirkarimi, D. G. Stearns, D. W. Sweeney, H. N. Chapman, W. M. Clift, S. D. Hector, and M. Yi, “Euvl mask blank repair,” in * Emerging Lithographic Technologies VI*, vol. 4688 (International Society for Optics and Photonics, 2002), pp. 385–394.

**3. **S. Zhao and Z. J. Qi, “Phase-independent multilayer defect repair for euv photomasks,” in * Photomask Technology 2016*, vol. 9985 (International Society for Optics and Photonics, 2016), p. 998517.

**4. **S. Jeong, C.-W. Lai, S. Rekawa, C. C. Walton, and J. Bokor, “Actinic defect counting statistics over 1-cm2 area of euvl mask blank,” in * Emerging Lithographic Technologies IV*, vol. 3997 (International Society for Optics and Photonics, 2000), pp. 431–440.

**5. **K. A. Goldberg, S. B. Rekawa, C. D. Kemp, A. Barty, E. Anderson, P. Kearney, and H. Han, “Euv mask reflectivity measurements with micron-scale spatial resolution,” in * Emerging Lithographic Technologies XII*, vol. 6921 (International Society for Optics and Photonics, 2008), p. 69213U.

**6. **S. Huh, L. Ren, D. Chan, S. Wurm, K. Goldberg, I. Mochi, T. Nakajima, M. Kishimoto, B. Ahn, and I. Kang, “A study of defects on euv masks using blank inspection, patterned mask inspection, and wafer inspection,” in * Extreme ultraviolet (EUV) lithography*, vol. 7636 (International Society for Optics and Photonics, 2010), p. 76360K.

**7. **T. Hirano, S. Yamaguchi, M. Naka, M. Itoh, M. Kadowaki, T. Koike, Y. Yamazaki, K. Terao, M. Hatakeyama, and H. Sobukawa, “Development of eb inspection system ebeyem for euv mask,” in * Photomask Technology 2010*, vol. 7823 (International Society for Optics and Photonics, 2010), p. 78232C.

**8. **T. Harada, M. Nakasuji, T. Kimura, Y. Nagata, T. Watanabe, and H. Kinoshita, “The coherent euv scatterometry microscope for actinic mask inspection and metrology,” in * Photomask and Next-Generation Lithography Mask Technology XVIII*, vol. 8081 (International Society for Optics and Photonics, 2011), p. 80810K.

**9. **J. Lin, N. Weber, M. Escher, J. Maul, H.-S. Han, M. Merkel, S. Wurm, G. Schönhense, and U. Kleineberg, “Three-dimensional characterization of extreme ultraviolet mask blank defects by interference contrast photoemission electron microscopy,” Opt. Express **16**(20), 15343–15352 (2008). [CrossRef]

**10. **S. Yamaguchi, M. Naka, T. Hirano, M. Itoh, M. Kadowaki, T. Koike, Y. Yamazaki, K. Terao, M. Hatakeyama, and K. Watanabe, “Performance of ebeyem for euv mask inspection,” in * Photomask Technology 2011*, vol. 8166 (International Society for Optics and Photonics, 2011), p. 81662F.

**11. **R. Hirano, S. Iida, T. Amano, H. Watanabe, M. Hatakeyama, T. Murakami, S. Yoshikawa, K. Suematsu, and K. Terao, “Study of extreme ultraviolet lithography patterned mask inspection tool for half-pitch 11-nm node defect detection performance,” J. Micro/Nanolithogr., MEMS, MOEMS **15**(2), 021008 (2016). [CrossRef]

**12. **A. Barty, Y. Liu, E. Gullikson, J. S. Taylor, and O. Wood, “Actinic inspection of multilayer defects on euv masks,” in * Emerging Lithographic Technologies IX*, vol. 5751 (International Society for Optics and Photonics, 2005), pp. 651–659.

**13. **T. Yamane, T. Iwasaki, T. Tanaka, T. Terasawa, O. Suga, and T. Tomie, “The performance of an actinic full-field euvl mask blank inspection system,” in * Alternative Lithographic Technologies*, vol. 7271 (International Society for Optics and Photonics, 2009), p. 72713H.

**14. **L. Juschkin, R. Freiberger, and K. Bergmann, “Euv microscopy for defect inspection by dark-field mapping and zone plate zooming,” in Journal of Physics: Conference Series, vol. 186 (IOP Publishing, 2009), p. 012030.

**15. **S. Herbert, A. Maryasov, L. Juschkin, R. Lebert, and K. Bergmann, “Defect inspection with an euv microscope,” in 26th European Mask and Lithography Conference, vol. 7545 (International Society for Optics and Photonics, 2010), p. 75450O.

**16. **S. Jeong, M. Idir, Y. Lin, L. Johnson, S. Rekawa, M. Jones, P. Denham, P. Batson, R. Levesque, and P. Kearney, “At-wavelength detection of extreme ultraviolet lithography mask blank defects,” J. Vac. Sci. Technol., B: Microelectron. Process. Phenom. **16**(6), 3430–3434 (1998). [CrossRef]

**17. **T. Harada, Y. Tanaka, T. Watanabe, H. Kinoshita, Y. Usui, and T. Amano, “Phase defect characterization on an extreme-ultraviolet blank mask using microcoherent extreme-ultraviolet scatterometry microscope,” J. Vac. Sci. Technol., B: Nanotechnol. Microelectron.: Mater., Process., Meas., Phenom. **31**(6), 06F605 (2013). [CrossRef]

**18. **T. Harada, H. Hashimoto, T. Amano, H. Kinoshita, and T. Watanabe, “Phase imaging results of phase defect using micro-coherent extreme ultraviolet scatterometry microscope,” J. Micro/Nanolithogr., MEMS, MOEMS **15**(2), 021007 (2016). [CrossRef]

**19. **J. Dou, Z. Gao, Z. Yang, Q. Yuan, and J. Ma, “Euv multilayer defects reconstruction based on the transport of intensity equation and partial least-square regression,” in International Conference on Optical and Photonics Engineering (icOPEN 2016), vol. 10250 (International Society for Optics and Photonics, 2017), p. 1025004.

**20. **D. Xu, P. Evanschitzky, and A. Erdmann, “Extreme ultraviolet multilayer defect analysis and geometry reconstruction,” J. Micro/Nanolithogr., MEMS, MOEMS **15**(1), 014002 (2016). [CrossRef]

**21. **W. Ye, M. B. Alawieh, Y. Lin, and D. Z. Pan, “Lithogan: End-to-end lithography modeling with generative adversarial networks,” in Proceedings of the 56th Annual Design Automation Conference 2019, (ACM, 2019), p. 107.

**22. **Y.-T. Yu, G.-H. Lin, I. H.-R. Jiang, and C. Chiang, “Machine-learning-based hotspot detection using topological classification and critical feature extraction,” IEEE Transactions on Comput. Des. Integr. Circuits Syst. **34**(3), 460–470 (2015). [CrossRef]

**23. **X. Ma, B. Wu, Z. Song, S. Jiang, and Y. Li, “Fast pixel-based optical proximity correction based on nonparametric kernel regression,” J. Micro/Nanolithogr., MEMS, MOEMS **13**(4), 043007 (2014). [CrossRef]

**24. **N. Figueiro, F. Sanchez, R. Koret, M. Shifrin, Y. Etzioni, S. Wolfling, M. Sendelbach, Y. Blancquaert, T. Labbaye, and G. Rademaker, “Application of scatterometry-based machine learning to control multiple electron beam lithography: Am: Advanced metrology,” in 2018 29th Annual SEMI Advanced Semiconductor Manufacturing Conference (ASMC), (IEEE, 2018), pp. 328–333.

**25. **J. W. Park, A. Torres, and X. Song, “Litho-aware machine learning for hotspot detection,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. **37**(7), 1510–1514 (2018). [CrossRef]

**26. **S.-W. Chien, J.-S. Cai, C.-L. Lee, K.-Y. Tsai, J. Shiely, and M. S. John, “Investigation on MBOPC convergence improvement with location-dependent correction factors aided by machine learning,” in *Optical Microlithography XXXII*, vol. 10961 (International Society for Optics and Photonics, 2019), p. 1096107.

**27. **S. Shim, S. Choi, and Y. Shin, “Machine learning (ML)-based lithography optimizations,” in 2016 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), (IEEE, 2016), pp. 530–533.

**28. **S. Wang, J. Su, Q. Zhang, W. Fong, D. Sun, S. Baron, C. Zhang, C. Lin, B.-D. Chen, and R. C. Howell, “Machine learning assisted SRAF placement for full chip,” in Photomask Technology 2017, vol. 10451 (International Society for Optics and Photonics, 2017), p. 104510D.

**29. **M. Shin and J.-H. Lee, “Accurate lithography hotspot detection using deep convolutional neural networks,” J. Micro/Nanolithogr., MEMS, MOEMS **15**(4), 043507 (2016). [CrossRef]

**30. **Y. Tomioka, T. Matsunawa, C. Kodama, and S. Nojima, “Lithography hotspot detection by two-stage cascade classifier using histogram of oriented light propagation,” in 2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), (IEEE, 2017), pp. 81–86.

**31. **B. Jiang, H. Zhang, J. Yang, and E. F. Young, “A fast machine learning-based mask printability predictor for OPC acceleration,” in Proceedings of the 24th Asia and South Pacific Design Automation Conference, (ACM, 2019), pp. 412–419.

**32. **W. Sim, K. Lee, D. Yang, J. Jeong, J.-S. Hong, S. Lee, and H. Lee, “Automatic correction of lithography hotspots with a deep generative model,” in *Optical Microlithography XXXII*, vol. 10961 (International Society for Optics and Photonics, 2019), p. 1096105.

**33. **H. Yang, S. Li, Y. Ma, B. Yu, and E. F. Young, “GAN-OPC: Mask optimization with lithography-guided generative adversarial nets,” in 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), (IEEE, 2018), pp. 1–6.

**34. **A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, (2012), pp. 1097–1105.

**35. **C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2015), pp. 1–9.

**36. **S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in International Conference on Machine Learning, (2015), pp. 448–456.

**37. **C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2016), pp. 2818–2826.

**38. **T. Zhou, P. Krahenbuhl, M. Aubry, Q. Huang, and A. A. Efros, “Learning dense correspondence via 3d-guided cycle consistency,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2016), pp. 117–126.

**39. **J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, (2017), pp. 2223–2232.