## Abstract

The performance of metasurfaces measured experimentally often deviates from the values predicted by numerical optimization. These discrepancies are attributed to the poor tolerance of metasurface building blocks with respect to fabrication uncertainties and nanoscale imperfections. Quantifying the efficiency drop induced by geometry variations is crucial to broaden the range of application of this technology. Here, we present a novel optimization methodology to account for the manufacturing errors in metasurface designs. In this approach, probabilistic surrogate models are used to provide accurate results while reducing the number of costly numerical simulations. We employ our procedure to optimize a classical beam-steering metasurface made of cylindrical nanopillars. Our numerical results yield a design that is twice as robust as the one obtained in the deterministic case.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

In the past decades, we have witnessed an increased interest in metasurfaces to realize compact optical components with exceptional wavefront engineering capabilities [1–5]. Metasurfaces introduce highly resolved phase, amplitude, and polarization changes on the incoming wavefront over very short propagation distances [3–8]. The sophisticated nanofabrication facilities developed in the past few years, together with the exotic features of metasurfaces, have led to peculiar optical phenomena ranging from negative refraction [9] and sub-diffraction optical microscopy [10] to broadband achromatic lenses [11,12].

Notwithstanding, the nanofabrication process introduces stochastic disorder and systematic fabrication errors that are usually non-negligible and often degrade the performance compared to numerical predictions. It therefore appears indispensable to consider fabrication imperfections directly during the metasurface optimization procedure. The most straightforward design strategy relies on brute-force optimization, where each individual design is accompanied by a certain number of simulations to characterize the impact of noise on each design parameter. The brute-force approach is, however, unlikely to identify the most robust design in the parameter space, notably when a considerable number of parameters is considered. Ideally, optimization techniques would incorporate the fabrication imperfections directly, to guarantee that the resulting designs are robust with respect to fabrication inaccuracies.

Recent studies, which have demonstrated efficiencies beyond those of traditional engineering approaches, consist in coupling the simulation process to automated optimization algorithms [13–19]. However, these optimizations are usually carried out in a completely deterministic sense, i.e., under the assumption that the real system is perfectly described by its numerical model. Deterministic optimization, however, neglects all sources of inaccuracy. Among them, one usually distinguishes *errors*, which are clearly identified and can be controlled, from *uncertainties*, which are related to a lack of knowledge of the system and cannot be reduced. The combination of uncertainties and errors affects the design process, inducing large discrepancies between the numerically predicted performance and reality. In the context of nanophotonic design, numerical errors are typically related to the simulation technique, i.e., the spatial and temporal discretization, the convergence criterion employed, etc. While these errors can be reduced, notably through validation procedures such as numerical convergence studies, they do not account for experimental uncertainties and manufacturing tolerances. For example, the characteristics of the incoming waves or the geometry of the nanoscale device may deviate from nominal conditions. As a consequence, metasurfaces designed and optimized without accounting for uncertainties may exhibit unexpected behaviors in real conditions, yielding a significant loss of performance.

In the field of nanophotonics, geometric robustness has lately been studied in gradient-based topology optimization [20,21]. These strategies essentially limit the parameter space, focusing exclusively on designs that meet fabrication restrictions. Yet, topology-based optimization methods generally converge to a local solution and are highly sensitive to the initial design [22,23]. An alternative methodology for the optimization of robust metasurfaces leverages a concept termed Uncertainty Quantification (UQ), which has only recently been introduced in this field [24] using a large number of Monte Carlo simulations. In [25], a global evolutionary strategy that can take the design robustness into account has been deployed to optimize a color-filtering metasurface. This robust design process involves significant computational effort compared to classical evolutionary algorithms, which are already inherently costly.

We have recently illustrated in [18,19,26] that statistical-learning-based optimization techniques, specifically applied to metasurface design, can outperform most traditional global optimization approaches. In contrast to other deterministic optimization methods, statistical learning is applied to a metamodel, obtained by fitting the numerical results collected during the design of experiments, in order to significantly reduce the number of fullwave simulations. In this work, we extend this statistical learning optimization methodology to reduce the computational cost and to account for nanofabrication imperfections.

## 2. Problem formulation

Several formulations can be adopted to consider uncertainties during the early design phase. We refer the reader to the review article by Beyer &amp; Sendhoff [27] for a comprehensive understanding of this field. Most approaches consider the uncertain parameters of the system as random variables affecting the output of the system. Consequently, the cost function to be optimized also becomes a random quantity. One can distinguish between two main classes of robust optimization methods to tackle this randomness: (i) approaches that evaluate a robustness measure (e.g., expectation or worst-case value) of the cost function using some numerical technique, which is then used to formulate a fully deterministic optimization problem; here, the difficulty is related to the estimation of a relevant robustness measure; (ii) approaches that aim at directly optimizing the noisy cost function; here, the difficulty is related to the minimization of a non-deterministic function, which requires specific and usually costly optimization methods. The choice between these two approaches typically depends on the application domain.

In the context of metasurface design, we consider that the main source of uncertainty concerns the manufacturing tolerances. As a consequence, our objective is to determine a metasurface design that exhibits a high efficiency, corresponding to a low value of a cost function, while accounting for the uncertainties of the metasurface building blocks, i.e., potential variations of the nanostructure geometries. Adopting the first approach mentioned above, we formulate an optimization problem that relies on robustness measures. We consider the geometry of the metasurface, denoted by $\mathcal{G}$ and parameterized by a set of variables $x \in \mathcal{D} \subset \mathbb{R}^{n}$. These variables are optimized according to a cost function $f$ computed from the solution fields $E$. Thus the optimization problem without uncertainty can be written as:

$$\min_{x \in \mathcal{D}} f\left(E(x)\right) \tag{1}$$

To account for random perturbations of the geometry, we consider that the geometrical parameters are a set of independent random variables $X$, characterized by their probability density functions $\rho_{X}$. Note that, while in the following we make the assumption that the perturbations are uniformly distributed, our methodology could be extended to other distributions, e.g., a normal distribution or an empirical distribution of observed manufacturing errors. In the uniform case, the random variables $X$ are simply determined by their expected values $\mu_X$ and maximum perturbations $\delta_X$. In this non-deterministic context, $x$ are only instances of the random variables $X$ and thus cannot be considered as optimization variables anymore. As a consequence, the optimization is applied to $\mu_X$, which corresponds to the nominal parameters prescribed during the manufacturing phase, whereas $\delta_X$ is imposed by the accuracy of the fabrication process.
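As a minimal numerical sketch of this setting, the following snippet draws independent uniform perturbations around nominal parameters; the nominal diameters and the tolerance value are hypothetical placeholders, not taken from the paper's optimized designs:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_perturbed(mu_x, delta_x, n_samples):
    """Draw realizations of X ~ U(mu_x - delta_x, mu_x + delta_x),
    independently for each geometric parameter."""
    mu_x = np.asarray(mu_x, dtype=float)
    return rng.uniform(mu_x - delta_x, mu_x + delta_x,
                       size=(n_samples, mu_x.size))

# Hypothetical nominal diameters (nm) and a +/-12 nm tolerance.
samples = sample_perturbed([120.0, 150.0, 180.0, 200.0], 12.0, 1000)
```

Each row of `samples` is one possible manufactured instance $x$ of the random variables $X$.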

To complete the formulation, one should also choose a robustness measure for the optimization problem. Note that, similarly to the design parameters $X$, the cost function itself becomes by propagation a random quantity $F$. To fix ideas, one could consider a *worst-case* measure, so as to guarantee a given efficiency level regardless of the manufacturing perturbations. We define the *worst-case* cost function as a function of the design parameters $\mu_X$ in the presence of uncertainty as:

$$f^{wc}(\mu_X) = \max_{|x - \mu_X| \le \delta_X} f\left(E(x)\right) \tag{2}$$

The corresponding optimization problem can be summarized as:

$$\min_{\mu_X \in \mathcal{D}} f^{wc}(\mu_X) \tag{3}$$

While this formulation improves the worst efficiency value obtained under manufacturing tolerances, it does not provide information about the distribution of the efficiency and is therefore considered conservative. A second drawback is that this Min-Max problem is essentially non-differentiable.

A less conservative approach can be adopted by considering as robustness measure the $\alpha$-quantile $Q^{\alpha}_F$, i.e., the value such that the probability that $F$ is lower than $Q^{\alpha}_F$ is $\alpha$:

$$\mathbb{P}\left(F \le Q^{\alpha}_F\right) = \alpha \tag{4}$$

The corresponding optimization problem can be written as:

$$\min_{\mu_X \in \mathcal{D}} Q^{\alpha}_F(\mu_X) \tag{5}$$

For instance, by setting $\alpha =0.9$, one minimizes the cost function values obtained for 90% of manufactured designs.
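In practice, the $\alpha$-quantile can be estimated empirically from perturbed cost evaluations. The sketch below, with a hypothetical toy cost function standing in for the fullwave simulation, illustrates this with `numpy.quantile`:

```python
import numpy as np

def quantile_measure(cost_fn, mu_x, delta_x, alpha=0.9, n_mc=1000, seed=0):
    """Monte Carlo estimate of the alpha-quantile Q_F^alpha of the cost
    under uniform perturbations of the design parameters."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(mu_x - delta_x, mu_x + delta_x,
                    size=(n_mc, np.size(mu_x)))
    costs = np.array([cost_fn(xi) for xi in x])
    return float(np.quantile(costs, alpha))

# Hypothetical toy cost, minimal at x = 100 in each parameter.
cost = lambda x: float(np.sum((x - 100.0) ** 2)) / 1e4
q90 = quantile_measure(cost, np.array([100.0, 100.0]), 12.0, alpha=0.9)
```

Minimizing `q90` over the nominal parameters corresponds to problem (5) with $\alpha = 0.9$.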

An alternative is to consider as robustness measures the characteristics of the probability density function of the efficiency. Typically, one can minimize the expected value of the cost function, $\mu_F$, to improve the average efficiency of the metasurface. However, this does not prevent a large degradation of the performance for some geometry perturbations. Therefore, one often takes into account a second measure based on the variance of the cost function, $\sigma^{2}_F$, to reduce the dispersion of efficiency values as much as possible. This results in a bi-objective optimization problem:

$$\min_{\mu_X \in \mathcal{D}} \left( \mu_F(\mu_X),\; \sigma^{2}_F(\mu_X) \right) \tag{6}$$

This problem does not exhibit any loss of regularity, contrary to the problem based on the worst-case measure. However, a multi-objective optimization algorithm is required to solve it, relying on the *Pareto front* representing all possible compromises between the two criteria (we refer to Ref. [28] for more details about multiobjective optimization). With this approach, it is possible to obtain relatively high efficiency while avoiding a large dispersion. Consequently, we adopt this formulation in the remainder of this work. Note that the robustness measures can also be included in the problem as constraints, meaning that one may minimize the expectation of $F$ subject to a constraint on its $\alpha$-quantiles.
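The notion of Pareto front used above can be made concrete with a small non-dominated filter for the bi-objective (mean, standard deviation) minimization; the example objective values are hypothetical:

```python
import numpy as np

def non_dominated(points):
    """Boolean mask of Pareto-optimal points for a bi-objective
    minimization problem (columns = objectives, e.g. [mean, std])."""
    pts = np.asarray(points, dtype=float)
    mask = np.ones(len(pts), dtype=bool)
    for i in range(len(pts)):
        # i is dominated if another point is <= in all objectives
        # and strictly < in at least one.
        others = np.delete(pts, i, axis=0)
        dominated = np.any(np.all(others <= pts[i], axis=1) &
                           np.any(others < pts[i], axis=1))
        mask[i] = not dominated
    return mask

designs = np.array([[0.25, 0.075],   # low mean cost, large dispersion
                    [0.27, 0.033],   # slightly higher mean, small dispersion
                    [0.30, 0.090]])  # worse on both objectives
mask = non_dominated(designs)
```

The first two designs are incomparable (each is better on one criterion) and form the Pareto front, while the third is dominated.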

#### 2.1 Estimation of efficiency measures

To solve the optimization problem (6), we first estimate the different robustness measures, in particular the expectation of the cost function, $\mu_F$, and its variance, $\sigma^{2}_F$. These integrals could be evaluated by using high-dimensional numerical quadratures, like Gauss-Legendre rules or sparse-grid methods, but the computational cost would increase quickly with the dimension of the problem $n$. An alternative and simple approach is to employ Monte Carlo estimators:

$$\mu_F \approx \frac{1}{M} \sum_{m=1}^{M} f\left(E(x^{(m)})\right), \qquad \sigma^{2}_F \approx \frac{1}{M-1} \sum_{m=1}^{M} \left( f\left(E(x^{(m)})\right) - \mu_F \right)^{2} \tag{7}$$

where the $x^{(m)}$ are $M$ independent samples drawn from the distribution of $X$.
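These Monte Carlo estimators can be sketched in a few lines; the toy cost function below is a hypothetical stand-in for the fullwave evaluation of $1-\eta$:

```python
import numpy as np

def mc_mean_std(cost_fn, mu_x, delta_x, n_mc=100, seed=0):
    """Monte Carlo estimators of the expectation mu_F and standard
    deviation sigma_F of the cost under uniform input perturbations."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(mu_x - delta_x, mu_x + delta_x,
                    size=(n_mc, np.size(mu_x)))
    costs = np.array([cost_fn(xi) for xi in x])
    # ddof=1 gives the unbiased sample variance of Eq. (7).
    return float(costs.mean()), float(costs.std(ddof=1))

# Hypothetical smooth toy cost standing in for 1 - eta.
cost = lambda x: 0.2 + 1e-4 * float(np.sum((x - 150.0) ** 2))
mu_f, sigma_f = mc_mean_std(cost, np.full(4, 150.0), 12.0, n_mc=500)
```

For the expensive simulator, each evaluation of `cost_fn` would be one fullwave run, which motivates the surrogate-based approach of Section 2.3.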

#### 2.2 Gaussian processes

Gaussian process regression models rely on the idea that designs that are *close* in the design space share similar performance values. This intuition is encoded in the covariance function $k : \mathbb{X} \times \mathbb{X} \to \mathbb{R}$ defining a Gaussian process (GP). The covariance function needs to be positive semi-definite, as is the case for the Gaussian or Matérn families. Here we use the Matérn-$5/2$ kernel, which reads: $k(\mathbf{x}, \mathbf{x}') = \sigma^{2} \prod_{i=1}^{d} \left(1 + \frac{\sqrt{5}|x_i - x'_i|}{\theta_i} + \frac{5(x_i - x'_i)^{2}}{3\theta_i^{2}} \right) \exp\left(- \frac{\sqrt{5}|x_i - x'_i|}{\theta_i} \right)$ with $\sigma^{2}$ the process variance and $\theta_i > 0$ the length-scale for variable $i$. It corresponds to the assumption that the underlying unknown black-box is twice differentiable. From this zero-mean GP prior $Y(\mathbf{x})$, given observations $\mathbf{y}_n := (y_1, \dots, y_n)$ at a design of experiments $\mathbf{x}_{1:n} := (\mathbf{x}_1, \dots, \mathbf{x}_n)^{\top}$, often constructed via a Maximin Latin hypercube sample, the posterior distribution is still a GP, $Y(\mathbf{x})|\mathbf{x}_{1:n}, \mathbf{y}_n \sim \mathcal{N}(m_n(\mathbf{x}), s_n^{2}(\mathbf{x}))$, where the corresponding predictive mean and variance are:

$$m_n(\mathbf{x}) = \mathbf{k}_n(\mathbf{x})^{\top} \mathbf{K}_n^{-1} \mathbf{y}_n, \qquad s_n^{2}(\mathbf{x}) = k(\mathbf{x}, \mathbf{x}) - \mathbf{k}_n(\mathbf{x})^{\top} \mathbf{K}_n^{-1} \mathbf{k}_n(\mathbf{x})$$

with $\mathbf{k}_n(\mathbf{x}) := \left(k(\mathbf{x}, \mathbf{x}_1), \dots, k(\mathbf{x}, \mathbf{x}_n)\right)^{\top}$ and $\mathbf{K}_n := \left(k(\mathbf{x}_i, \mathbf{x}_j)\right)_{1 \le i,j \le n}$ [29,31].
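A minimal sketch of this kernel and the zero-mean GP posterior (fixed, hypothetical hyperparameters; no maximum-likelihood estimation as done by `DiceKriging`) is:

```python
import numpy as np

def matern52(X1, X2, theta, sigma2=1.0):
    """Anisotropic Matern-5/2 kernel as given in the text."""
    d = np.abs(X1[:, None, :] - X2[None, :, :]) / theta  # |x_i - x'_i| / theta_i
    per_dim = (1 + np.sqrt(5) * d + 5 * d**2 / 3) * np.exp(-np.sqrt(5) * d)
    return sigma2 * per_dim.prod(axis=-1)

def gp_posterior(X, y, Xnew, theta, sigma2=1.0, jitter=1e-10):
    """Posterior mean m_n and variance s_n^2 of a zero-mean GP."""
    K = matern52(X, X, theta, sigma2) + jitter * np.eye(len(X))
    k = matern52(Xnew, X, theta, sigma2)
    m = k @ np.linalg.solve(K, y)
    v = sigma2 - np.sum(k * np.linalg.solve(K, k.T).T, axis=1)
    return m, v

# Tiny 1D illustration with a hypothetical length-scale.
X = np.array([[0.0], [0.5], [1.0]])
y = np.sin(2 * np.pi * X[:, 0])
m, v = gp_posterior(X, y, X, theta=np.array([0.3]))
```

At the training points the posterior interpolates the data, with near-zero predictive variance, which is the behavior exploited by the acquisition functions of Section 2.3.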

#### 2.3 Bayesian optimization under input uncertainty

The principle of Bayesian optimization (BO) is to replace the expensive black-box by a probabilistic surrogate model, which is much cheaper to evaluate [32]. This surrogate is used to select the forthcoming evaluation points, relying on an acquisition function. Gaussian processes are the most common surrogate model, and a variety of acquisition functions is available; for more details, see, e.g., [33]. Here we solve (6), exploiting the fact that the numerical simulation can be evaluated exactly at $x$ to construct directly a surrogate $\tilde{f}$ of $f$. This $\tilde{f}$ is used instead of $f$ in the Monte Carlo estimators of Section 2.1 at the design points, enabling the fit of GPs for the mean and variance, $\tilde{\mu}_F$ and $\tilde{\sigma}_F$. Notice that, by doing so, the error related to the Monte Carlo estimation and surrogate modeling is ignored. For some input noise distributions, such as the uniform or Gaussian ones, analytical approximations of such errors are given in [34,35]. For the resulting bi-objective optimization problem, we rely on the Expected Hypervolume Improvement (EHI) acquisition function [36]. Based on a current estimation of the Pareto front, it corresponds to the expectation, over the GP prediction, of the volume added to the current set of non-dominated solutions (i.e., solutions that belong to the Pareto front). The resulting procedure, starting with a Maximin Latin hypercube sample as design of experiments, is implemented with the R packages `DiceKriging` [30] and `GPareto` [37]. The procedure is summarized in Algorithm 1.
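The quantity underlying EHI is the hypervolume dominated by the current Pareto front. As a minimal deterministic sketch (not the expectation over the GP computed by `GPareto`), the dominated hypervolume of a 2-objective minimization front, with hypothetical objective values and reference point, can be computed as:

```python
import numpy as np

def hypervolume_2d(front, ref):
    """Hypervolume dominated by a 2-objective Pareto front (minimization),
    bounded by the reference point `ref`."""
    pts = np.asarray(front, dtype=float)
    pts = pts[np.argsort(pts[:, 0])]   # sort by first objective
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:                 # each front point adds a rectangle
            hv += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return hv

front = np.array([[0.25, 0.075], [0.27, 0.033]])  # (mean, std) compromises
ref = (1.0, 1.0)                                   # hypothetical reference point
hv0 = hypervolume_2d(front, ref)
# A new non-dominated candidate increases the dominated hypervolume:
hv1 = hypervolume_2d(np.vstack([front, [0.22, 0.10]]), ref)
```

EHI scores a candidate by the expected value of `hv1 - hv0` under the GP posterior predictions of its two objectives.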

## 3. Numerical results

In this section, we present a numerical illustration of the methodology outlined above. We consider one of the most popular metasurface configurations (a beam deflector) and compare the performance in terms of robustness without and with the UQ analysis, as stated in Eq. (1) and Eq. (6), respectively. The main goal of this section is to illustrate the essential role of the UQ analysis in achieving designs that are robust with respect to fabrication uncertainties. In other words, we present the optimization results for both the deterministic case and the UQ optimization case and compare the robustness of the optimized designs with respect to fabrication imperfections.

In Fig. 1(a), we study a phase-gradient metasurface made of a periodic repetition of deflecting unit cells composed of four cylindrical GaN nanopillars (green regions) placed over a semi-infinite substrate of Al$_{2}$O$_{3}$ represented by the yellow region. We consider a normally incident plane wave with an electric field polarized in the $y$-direction (impinging from the substrate), and we aim at maximizing the diffraction efficiency of the first-order mode $\eta(0,-1)$ (i.e., deflecting light in the $y$-$z$ plane, which is the plane of incidence) at the wavelength $\lambda = 600$ nm. To circumvent undesired diffraction inside the substrate, we consider a subwavelength period in the $x$-direction (300 nm), and we fix the unit cell deflecting period along the $y$-direction at 1500 nm. Furthermore, the height of the nanopillars is fixed at $h = 1000$ nm during the optimization process. The considered configuration involves seven optimization parameters, as depicted in Fig. 1(a): the distances between the four cylinders, referred to as $\rho_{i}~\forall i \in \{1,2,3\}$, and the diameters of the four cylinders, $D_{i}~\forall i \in \{1,2,3,4\}$. It is worth mentioning that the smallest feature size is fixed at $90$ nm and the maximum distance between ridges can reach $222$ nm. In this work, we consider only the uncertainty associated with the diameter values, i.e., the four parameters $D_{i}~\forall i \in \{1,2,3,4\}$. Although our methodology is general and can account for any other type of noise, we consider here a uniform noise distribution. In addition, the uncertainty on the diameter values is set to $\pm 12$ nm.

Fullwave simulations are based on a high-order Discontinuous Galerkin Time-Domain (DGTD) solver from the DIOGENeS [38] software suite, using high-order polynomial adaptation [18,19,39].

In Fig. 1(b), we present a sketch of the optimization setup. The distances between the cylinders are not included in the figure, as we consider the UQ on the diameter values only. Figure 1(b) thus represents a schematic view of the diameter space, whose dimension equals the number of diameters (4D for four diameters). In our design, the diameter values vary between the minimum feature size $D_{min} = 90$ nm and the maximum diameter value $D_{max} = 210$ nm (a value sufficient to achieve a $2\pi$ phase shift, as we demonstrated in Ref. [18]). These limits are depicted by the blue borders in Fig. 1(b). To avoid optimizing nanostructure diameter values close to the borders, and thus risking exceeding the limits (between $D_{min}$ and $D_{max}$) after adding the $\pm 12$ nm noise, we perform the optimization for the deterministic case (see Eq. (1)) in the white box, bounded between $D_{min} + \delta d = 90 + 12 = 102$ nm and $D_{max} - \delta d = 210 - 12 = 198$ nm (Fig. 1(b)). In other words, during the optimization iterations, the diameters in the deterministic case are allowed to vary only in the white box (see $X1_{Det}$ and $X2_{Det}$). After determining the optimized design (maximum deflection efficiency), a uniformly distributed noise within $\pm 12$ nm is added to the optimized diameters to study the robustness.

For the UQ optimization, the objective differs from the one defined in the deterministic case: we seek an optimized design with high efficiency in the presence of fabrication errors, see Eq. (6). By contrast, the deterministic case seeks the design with the highest efficiency without considering robustness, which is usually assessed in a subsequent step by perturbing the optimized diameters. In the UQ case, we perturb the diameters during the optimization iterations with a uniform noise within $\pm 12$ nm (see the grey boxes around $X1_{UQ}$ and $X2_{UQ}$), seeking designs with the highest mean deflection efficiency and the smallest standard deviation with respect to the associated noise (Eq. (6)). The UQ optimization thus allows designs at the border between the white and blue regions in Fig. 1(b), with perturbation noise lying partially in the blue region (see $X_{UQ2}^{*}$ for instance).

#### 3.1 Deterministic case

For the deterministic case, the optimization process relies on the Efficient Global Optimization (EGO) algorithm [32], and no noise is applied to the diameters (see Eq. (1)). The grey region in Fig. 2(a) refers to the DoE phase, where we considered only 75 designs during the learning process. In the second phase, we performed 150 iterations, after which an optimized design achieves an efficiency of $85\%$ (in our optimization framework we minimize $1-\eta$, i.e., we maximize $\eta$). The next step is to check the robustness of the optimized design to the fabrication uncertainty (on the diameter values only). For this purpose, we perturb the optimized diameters with 100 uniform noise samples within $\pm 12$ nm; the results are presented in Fig. 2(b). The black point at iteration 18 refers to the best design in Fig. 2(a) without noise. One can directly notice that the average efficiency deteriorates to $75\%$ ($1-\eta = 0.25$), as denoted by the horizontal brown line. Moreover, the points are not located around the mean value, indicating that the variance of the efficiency is relatively high; the standard deviation (STD) is estimated as 0.075. These results indicate that the optimized design in the deterministic case is not robust to the given noise and that the efficiency is highly sensitive to the diameter values.

#### 3.2 UQ optimization case

In the UQ optimization, we used the same DoE as in the deterministic case; yet, since we are solving a bi-objective optimization problem (see Eq. (6)), two metamodels are created [28]: one for the mean and one for the standard deviation. These two metamodels are used to predict the next design to be simulated. In the optimization phase, we performed 259 iterations; the results are presented in Fig. 3(a). The black points denote the objective values in the mean-STD space predicted by the metamodels. The yellow points represent the set of non-dominated designs (Pareto front). Along the Pareto front, we refer to the three designs $X^{*}_{UQ1}$, $X^{*}_{UQ2}$, and $X^{*}_{UQ3}$. The performance of the two metamodels is demonstrated in Figs. 3(b) and 3(c), respectively. In these figures, we present leave-one-out predictions of the objective functions from the metamodels plotted against the observed values. Specifically, the models use all observations except one to predict the remaining value (without re-estimating the hyperparameters); see, e.g., [40] for the expressions and [41] for the generalization to multiple folds. In other words, when all the points lie along the straight line, the prediction is accurate. As can be seen, the metamodel for the mean, shown in Fig. 3(b), is quite accurate compared to the metamodel for the standard deviation, exhibited in Fig. 3(c). The blue, red, and green vertical lines correspond to the values of the mean and the STD of the three points in Fig. 3(a), respectively. The prediction for the red point $X^{*}_{UQ2}$ is accurate for both the mean and the STD (the black points match the black line in Figs. 3(b) and 3(c) at the corresponding values). Furthermore, for the blue and green points, the predictions of the mean are accurate, as depicted in Fig. 3(b). Nevertheless, for these points, the predictions of the STD are not fully accurate, as demonstrated in Fig. 3(c).
In order to validate these results, we extracted the designs corresponding to the three colored points $X^{*}_{UQ1}$, $X^{*}_{UQ2}$, and $X^{*}_{UQ3}$ and tested the efficiency for 100 designs with uniform noise on the diameter values (as in Fig. 2(b)). The results are depicted in Figs. 3(d)-(f), respectively. As expected, the exact calculation of the mean matches the values predicted by the metamodels (horizontal lines and black points in Figs. 3(d)-(f)). In addition, the exact value of the STD for the red design coincides with the predicted value (the two metamodels are accurate for this design). However, the STD predictions for the blue and green designs do not fit the exact values in Fig. 3(d) and Fig. 3(f), respectively. This was to be anticipated, given that their STD metamodels are not fully accurate at the corresponding values.

The two metamodels are able to predict the standard deviation and the mean without performing additional simulations at each iteration. Moreover, we performed only 334 simulations in total, including the DoE. Without the metamodeling procedure, one would have to perform a classical brute-force search, for instance by applying 100 uniform noise samples to each of 100 candidate designs, which yields at least 10 000 simulations. Furthermore, it is not guaranteed that such a search would converge to the global set of solutions. To summarize, we believe that the metamodeling methodology presented here is the most suitable approach for performing the UQ analysis, despite some error associated with the metamodeling. One nevertheless has to find the best compromise between the number of simulations and the prediction accuracy.

Interestingly, the best compromise between the two objectives is given by the design $X^{*}_{UQ2}$. The seven optimized parameters for the deterministic case and for the design $X^{*}_{UQ2}$ are listed in Table 1. The results depicted in Fig. 3(e) show that the average efficiency is approximately $73\%$ ($1-\eta(\text{mean}) = 0.27$) for the design $X^{*}_{UQ2}$ when 100 uniform perturbations are applied to the diameter values. This mean value is very close to the one obtained for the deterministic design (see Fig. 2(b)). Nevertheless, in the UQ optimization, and unlike the deterministic case, all 100 designs are clustered around the mean value, meaning that the standard deviation is remarkably small; in this case it is estimated as 0.033. This standard deviation is two times smaller than the one obtained for the deterministic design (under the same noise). We have thus demonstrated that the UQ optimization provides designs that are robust with respect to fabrication errors, as illustrated in Fig. 4.

## 4. Conclusion

In this article, we have presented a novel optimization framework to account for the manufacturing errors of a simple metasurface with a classical cylindrical nanopillar geometry. Our procedure relies on a global statistical-learning-based optimization method that substitutes a surrogate model for the resource-intensive simulations. Our numerical results reveal that incorporating the UQ analysis into the optimization scheme is crucial to achieving robust designs. Our approach requires solving a bi-objective optimization problem that accounts for the mean and the variance of the efficiency under the given noise. With the UQ analysis, we obtained designs that are twice as robust as the analogous ones in the deterministic case. This work enriches the field of metasurface inverse design with a solution to rigorously account for manufacturing issues. Aside from device reproducibility, the optimization of metasurface designs in the presence of small manufacturing errors is a necessary step to address one of the main causes of performance deterioration between the simulation and the manufacturing phases.

## Funding

Agence Nationale de la Recherche (ANR-18-ASMA-0006-01).

## Acknowledgments

The authors acknowledge support from the French defence procurement agency under the ANR ASTRID Maturation program, grant agreement number ANR-18-ASMA-0006-01.

## Disclosures

The authors declare no conflicts of interest.

## Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

## References

**1. **S. Astilean, P. Lalanne, P. Chavel, E. Cambril, and H. Launois, “High-efficiency subwavelength diffractive element patterned in a high-refractive-index material for 633 nm,” Opt. Lett. **23**(7), 552–554 (1998). [CrossRef]

**2. **P. Lalanne, S. Astilean, P. Chavel, E. Cambril, and H. Launois, “Design and fabrication of blazed binary diffractive elements with sampling periods smaller than the structural cutoff,” J. Opt. Soc. Am. A **16**(5), 1143–1156 (1999). [CrossRef]

**3. **N. Yu, P. Genevet, M. A. Kats, F. Aieta, J.-P. Tetienne, F. Capasso, and Z. Gaburro, “Light propagation with phase discontinuities: Generalized laws of reflection and refraction,” Science **334**(6054), 333–337 (2011). [CrossRef]

**4. **S. Sun, Q. He, S. Xiao, Q. Xu, X. Li, and L. Zhou, “Gradient-index meta-surfaces as a bridge linking propagating waves and surface waves,” Nat. Mater. **11**(5), 426–431 (2012). [CrossRef]

**5. **D. Lin, P. Fan, E. Hasman, and M. L. Brongersma, “Dielectric gradient metasurface optical elements,” Science **345**(6194), 298–302 (2014). [CrossRef]

**6. **M.-S. L. Lee, P. Lalanne, J. Rodier, P. Chavel, E. Cambril, and Y. Chen, “Imaging with blazed-binary diffractive elements,” J. Opt. A: Pure Appl. Opt. **4**(5), S119–S124 (2002). [CrossRef]

**7. **A. Y. Zhu, A. I. Kuznetsov, B. Luk’yanchuk, N. Engheta, and P. Genevet, “Traditional and emerging materials for optical metasurfaces,” Nanophotonics **6**(2), 452–471 (2017). [CrossRef]

**8. **P. Genevet, F. Capasso, F. Aieta, M. Khorasaninejad, and R. Devlin, “Recent advances in planar optics: from plasmonic to dielectric metasurfaces,” Optica **4**(1), 139–152 (2017). [CrossRef]

**9. **D. R. Smith, W. J. Padilla, D. C. Vier, S. C. Nemat-Nasser, and S. Schultz, “Composite medium with simultaneously negative permeability and permittivity,” Phys. Rev. Lett. **84**(18), 4184–4187 (2000). [CrossRef]

**10. **A. Salandrino and N. Engheta, “Far-field subdiffraction optical microscopy using metamaterial crystals: Theory and simulations,” Phys. Rev. B **74**(7), 075103 (2006). [CrossRef]

**11. **S. Wang, P. C. Wu, V.-C. Su, Y.-C. Lai, M.-K. Chen, H. Y. Kuo, B. H. Chen, Y. H. Chen, T.-T. Huang, J.-H. Wang, R.-M. Lin, C.-H. Kuan, T. Li, Z. Wang, S. Zhu, and D. P. Tsai, “A broadband achromatic metalens in the visible,” Nat. Nanotechnol. **13**(3), 227–232 (2018). [CrossRef]

**12. **R. J. Lin, V.-C. Su, S. Wang, M. K. Chen, T. L. Chung, Y. H. Chen, H. Y. Kuo, J.-W. Chen, J. Chen, Y.-T. Huang, J.-H. Wang, C. H. Chu, P. C. Wu, T. Li, Z. Wang, S. Zhu, and D. P. Tsai, “Achromatic metalens array for full-colour light-field imaging,” Nat. Nanotechnol. **14**(3), 227–231 (2019). [CrossRef]

**13. **T. Phan, D. Sell, E. W. Wang, S. Doshay, K. Edee, J. Yang, and J. A. Fan, “High-efficiency, large-area, topology-optimized metasurfaces,” Light: Sci. Appl. **8**(1), 48 (2019). [CrossRef]

**14. **J. Yang and J. A. Fan, “Topology-optimized metasurfaces: impact of initial geometric layout,” Opt. Lett. **42**(16), 3161 (2017). [CrossRef]

**15. **C. Forestiere, Y. He, R. Wang, R. M. Kirby, and L. Dal Negro, “Inverse design of metal nanoparticles’ morphology,” ACS Photonics **3**(1), 68–78 (2016). [CrossRef]

**16. **V. Egorov, M. Eitan, and J. Scheuer, “Genetically optimized all-dielectric metasurfaces,” Opt. Express **25**(3), 2583–2593 (2017). [CrossRef]

**17. **P. Forster, V. Ramaswamy, P. Artaxo, T. Bernsten, R. Betts, D. Fahey, J. Haywood, J. Lean, D. Lowe, G. Myhre, J. Nganga, R. Prinn, G. Raga, M. Schulz, and R. V. Dorland, “Changes in atmospheric consituents and in radiative forcing,” in “Climate Change 2007: The Physical Science Basis. Contribution of Working Group 1 to the Fourth assesment report of Intergovernmental Panel on Climate Change,”, S. Solomon, D. Qin, M. Manning, Z. Chen, M. Marquis, K. B. Averyt, M. Tignor, and H. L. Miler, eds. (Cambridge University, 2007).

**18. **M. M. Elsawy, S. Lanteri, R. Duvigneau, G. Brière, M. S. Mohamed, and P. Genevet, “Global optimization of metasurface designs using statistical learning methods,” Sci. Rep. **9**(1), 17918 (2019). [CrossRef]

**19. **M. M. R. Elsawy, K. Hassan, S. Boutami, and S. Lanteri, “Bayesian optimization and rigorous modelling of a highly efficient 3d metamaterial mode converter,” OSA Continuum **3**(6), 1721–1729 (2020). [CrossRef]

**20. **E. W. Wang, D. Sell, T. Phan, and J. A. Fan, “Robust design of topology-optimized metasurfaces,” Opt. Mater. Express **9**(2), 469–482 (2019). [CrossRef]

**21. **J. Jung, “Robust design of plasmonic waveguide using gradient index and multiobjective optimization,” IEEE Photonics Technol. Lett. **28**(7), 756–758 (2016). [CrossRef]

**22. **A. Y. Piggott, J. Lu, K. G. Lagoudakis, J. Petykiewicz, T. M. Babinec, and J. Vučković, “Inverse design and demonstration of a compact and broadband on-chip wavelength demultiplexer,” Nat. Photonics **9**(6), 374–377 (2015). [CrossRef]

**23. **Z. Ren, M.-T. Pham, and C. S. Koh, “Robust global optimization of electromagnetic devices with uncertain design parameters: comparison of the worst case optimization methods and multiobjective optimization approach using gradient index,” IEEE Trans. Magn. **49**(2), 851–859 (2013). [CrossRef]

**24. **N. Schmitt, N. Georg, G. Brière, D. Loukrezis, S. Héron, S. Lanteri, C. Klitis, M. Sorel, U. Römer, H. D. Gersem, S. Vézian, and P. Genevet, “Optimization and uncertainty quantification of gradient index metasurfaces,” Opt. Mater. Express **9**(2), 892–910 (2019). [CrossRef]

**25. **S. S. Panda, H. S. Vyas, and R. S. Hegde, “Robust inverse design of all-dielectric metasurface transmission-mode color filters,” Opt. Mater. Express **10**(12), 3145–3159 (2020). [CrossRef]

**26. **M. M. Elsawy, S. Lanteri, R. Duvigneau, J. A. Fan, and P. Genevet, “Numerical optimization methods for metasurfaces,” Laser Photonics Rev. **14**(10), 1900445 (2020). [CrossRef]

**27. **H.-G. Beyer and B. Sendhoff, “Robust optimization: a comprehensive survey,” Comput. Methods Appl. Mech. Eng. **196**(33-34), 3190–3218 (2007). [CrossRef]

**28. **M. M. R. Elsawy, A. Gourdin, M. Binois, R. Duvigneau, D. Felbacq, S. Khadir, P. Genevet, and S. Lanteri, “Multiobjective statistical learning optimization of RGB metalens,” ACS Photonics **8**(8), 2498–2508 (2021). [CrossRef]

**29. **C. E. Rasmussen and C. Williams, *Gaussian Processes for Machine Learning* (MIT, 2006).

**30. **O. Roustant, D. Ginsbourger, and Y. Deville, “DiceKriging, DiceOptim: Two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization,” J. Stat. Soft. **51**(1), 1–55 (2012). [CrossRef]

**31. **R. B. Gramacy, *Surrogates: Gaussian Process Modeling, Design, and Optimization for the Applied Sciences* (CRC, 2020).

**32. **D. Jones, “Efficient global optimization of expensive black-box functions,” J. Glob. Optim. **13**(4), 455–492 (1998). [CrossRef]

**33. **B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. De Freitas, “Taking the human out of the loop: A review of Bayesian optimization,” Proc. IEEE **104**(1), 148–175 (2016). [CrossRef]

**34. **A. Girard, C. Rasmussen, J. Q. Candela, and R. Murray-Smith, “Gaussian process priors with uncertain inputs application to multiple-step ahead time series forecasting,” Advances in Neural Information Processing Systems **15**, 545–552 (2002).

**35. **A. McHutchon and C. Rasmussen, “Gaussian process training with input noise,” Advances in Neural Information Processing Systems **24**, 1341–1349 (2011).

**36. **M. T. Emmerich, A. H. Deutz, and J. W. Klinkenberg, “Hypervolume-based expected improvement: Monotonicity properties and exact computation,” in 2011 IEEE Congress on Evolutionary Computation (CEC) (IEEE, 2011), pp. 2147–2154.

**37. **M. Binois and V. Picheny, “GPareto: An R package for Gaussian-process-based multi-objective optimization and analysis,” J. Stat. Soft. **89**(8), 1–30 (2019). [CrossRef]

**38. ***Diogenes: A Discontinuous-Galerkin based software suite for nano-optics*. https://diogenes.inria.fr/.

**39. **J. Viquerat, “Simulation of electromagnetic waves propagation in nano-optics with a high-order discontinuous Galerkin time-domain method,” Ph.D. thesis, University of Nice-Sophia Antipolis (2015).

**40. **O. Dubrule, “Cross validation of kriging in a unique neighborhood,” J. Int. Assoc. Math. Geol. **15**(6), 687–699 (1983). [CrossRef]

**41. **D. Ginsbourger and C. Schärer, “Fast calculation of gaussian process multiple-fold cross-validation residuals and their covariances,” arXiv preprint arXiv:2101.03108 (2021).