## Abstract

Metasurfaces are an emerging technology that may supplant many of the conventional optics found in imaging devices, displays, and precision scientific instruments. Here, we develop a method for designing optical systems composed of multiple unique metasurfaces aligned in sequence and separated by distances much larger than the design wavelengths. Our approach is based on computational inverse design, also known as the adjoint-gradient method. This technique enables thousands or millions of independent design variables (e.g., the shapes of individual meta-atoms) to be optimized in parallel, with little or no intervention required by the user. The assumptions underlying our method are as follows: we use the local periodic approximation to determine the phase-response of a given meta-atom, we use the scalar wave approximation to propagate light fields between metasurface layers, and we do not consider multiple reflections between metasurface layers (analogous to a sequential-optics ray-tracer). To demonstrate the broad applicability of our method, we use it to design an achromatic doublet metasurface lens, a spectrally-multiplexed holographic element, and an ultra-compact optical neural network for classifying handwritten digits.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Optical metasurfaces are composite structures that utilize nanoscale patterning to achieve properties not found in nature [1,2]. By tailoring the dimensions of nanoscale scattering elements (meta-atoms) arranged in a periodic lattice, it is possible to precisely shape the wavefronts (phase) of incident light (see Fig. 1(a) for a representative image of a metasurface). Using metasurfaces based on dielectric meta-atoms, high numerical aperture lenses have been demonstrated with high transmission and focusing efficiency, as well as holograms, beam steerers, and gratings [3–8]. While glass optics bend light using macroscale curvature, metasurfaces feature a nearly-flat form factor, and are not subject to the same geometric design constraints. Metasurfaces have the potential to streamline complex optical assemblies, reducing their bulkiness, weight, and cost, and may be used to achieve fundamentally novel functionalities that cannot be realized by conventional means [9–11].

The design of high-performance metasurface optics is an open problem that spans the fields of computational electromagnetics, optical engineering, and numerical optimization. While single-wavelength metasurfaces may be specified using ad-hoc approximations and parameter sweeps over a limited set of meta-atom geometries, the design of polychromatic and multi-functional devices is not as clear-cut. To address these issues, we propose a method for designing optical systems composed of multiple unique metasurfaces cascaded along the optical propagation axis [12,13]. Our method utilizes simple, polarization-independent meta-atoms as the basic building-blocks for our design. Yet, we discover capabilities that would not be possible using “singlet” metasurfaces composed of just one diffracting element (and the same library of meta-atom geometries). Our approach relies on a powerful optimization technique called computational inverse design [14,15], also known as the adjoint gradient method (furthermore, in the field of machine learning, the method is similar to that of *backpropagation*, which is often used for training neural networks). Inverse design provides a means of calculating the gradients of a cost function with respect to *all* design variables by solving a so-called “adjoint problem.” The adjoint problem has similar computational complexity to a single solution of the “forward problem,” which is used to evaluate the cost function at a given design iteration. The advantage of inverse design is the fact that the computational complexity of evaluating the gradients depends only on that of the forward problem, even as the number of design variables increases. Hence, many parameters may be optimized in parallel. Computational inverse design has found applications within a multitude of engineering disciplines [16,17]. The technique has been used to design electromagnetic and optical devices, and revolutionary silicon photonics components [18–28]. Recently, researchers have recognized the utility of inverse-design-based approaches for metasurface optics [29–31], among other methods [32]. Such work has led to designs for polychromatic and high-incidence-angle metasurface lenses and gratings [29,33–35], and has facilitated the exploration of novel fabrication platforms, such as arrangements of Mie-scattering dielectric spheres [36]. Notably, recent works have considered the optimization of metasurfaces composed of multilayer meta-atoms [37–39]. In such designs, differently-shaped dielectric scatterers are stacked on top of each other within a subwavelength volume (or volumes of a few wavelengths to avoid near-field coupling between layers [40]), in order to achieve superior performance to single-layer meta-atoms. In contrast, this paper considers only simple meta-atom geometries (rectangular titanium dioxide (TiO_{2}) nanopillars of different widths), and instead leverages the interplay between multiple metasurfaces spaced distances $\gg \lambda $ apart to realize capabilities not possible using singlet metasurfaces. While the adjoint-gradient approach is well known and has been in use for some time, this work is primarily concerned with applying the method to the optimization of devices containing multiple flat optical elements. Accordingly, this work should be viewed as complementary to previous approaches, in that the method described here may be applied to the large-scale optimization of an entire optical system, while previously reported nanoscale topology-optimization methods may concurrently be used to significantly improve the geometries of the individual meta-atoms that compose a particular metasurface.

Our inverse design framework relies heavily upon Fourier optics, and the underlying mathematics are very similar to the analytical gradient calculations used in some phase-retrieval algorithms [41–43]. Here we are primarily interested in designing novel optical systems, as opposed to characterizing optical aberrations from a diverse set of images. Secondly, we demonstrate how the optimization framework may be extended to enable computationally-efficient Fourier optics simulations to be combined with finite-difference-time-domain (FDTD) parameter sweeps to directly optimize a complex optical system with respect to individual meta-atom geometries. Finally, although this paper is primarily concerned with metasurface design, the methods may be applied to optical systems containing multiple air-spaced diffractive optics, photolithographically fabricated phase masks, or spatial light modulators.

This paper is organized as follows: In Section 2 we briefly summarize other approaches to metasurface design and introduce a distinction between what we term the *macroscale* and *nanoscale* design problems. In Section 3 we develop the computational inverse design framework. In Section 4 we apply our method to a range of applications: We design an achromatic doublet lens for use in the visible spectrum, a spectrally multiplexed holographic element, and a compact optical neural network for classifying handwritten digits. In Section 5, we discuss future extensions and generalizations of the method. Finally, in Section 6, we analyze the validity of the approximations used by our optical propagation model.

## 2. Relation to previous work

A key difficulty encountered in metasurface design problems is their inherently multiscale nature: It is desirable to build devices with macroscale dimensions (from 100s of µm up to cm in size), but it is also necessary to precisely tailor the geometries of nanoscale structures (individual meta-atoms) in order to achieve the intended function of the macroscale device. Rigorously modeling an entire metasurface, using e. g. FDTD methods, simultaneously over the nanoscale and macroscale length-scales is computationally prohibitive, due to the narrow simulation mesh sizes required, and formidable memory requirements (however there have been some notable examples of full metalens FDTD simulations carried out using parallel computing [39]). The approach described in this paper provides a relatively simple means of designing metasurface optics, without requiring rigorous FDTD simulations over the entire design volume. It is often customary to divide the task of designing a metasurface into what we term a *macroscale* problem and a *nanoscale* problem. In the macroscale problem, some desired output to the overall metasurface device specified, usually in the form of a desired wavefront or phase function. For example, in the case of a singlet metasurface lens, a spherically converging phase would be specified, or in the case of a beam deflector, a linear phase ramp would be required. In the nanoscale problem, the individual meta-atom geometries must be chosen to realize the desired output at each spatial location throughout the metasurface. This step is normally performed by simulating a library of meta-atom geometries (using e. g. an FDTD parameter sweep with periodic boundary conditions), and recording output parameters of interest such as phase and transmission into a lookup-table [5,44–47]. This approach is often known as the local periodic approximation: The underlying assumption is that adjacent meta-atoms will have similar geometries (See Section 6.2 for some caveats to this strategy). Since each simulation used to construct the lookup-table is performed over the dimensions of just a single meta-atom, a complete parameter-sweep over every design variable is computationally tractable when the number of design variables per meta-atom is small (otherwise the method would suffer from the curse of dimensionality). A prescription for a complete metasurface optic can subsequently be specified by choosing meta-atoms from the lookup table that most closely match the desired output at all lattice positions across the metasurface aperture. Other approaches for metasurface design have involved the use of the Pancharatnam-Berry (geometric) phase [6,7,48–50], asymmetric polarization-dependent meta-atom geometries [6,51–53], Huygens’ surfaces [54–56], and libraries of complex meta-atom shapes used to achieve unique dispersion properties [57]. For simplicity and to avoid the curse of dimensionality, in this paper, we focus on simple, polarization-independent meta-atoms (square-shaped nanostructures of varying widths). However, in Section 5, we discuss how our method could be potentially extended to consider more complex meta-atom shapes, possibly improving overall design performance.

The above described approaches to metasurface design have been used in many contexts and are highly effective when the solution to the macroscale problem is intuitive and can be easily specified. This is the case when designing singlet metasurface lenses, since the phase function associated with an ideal thin lens is already known. Challenges arise, however, when the solution to the macroscale problem is not obvious. If an optical system is composed of multiple cascaded metasurfaces, it is not clear what metasurface phase should be specified at each layer of the optical system to achieve ideal performance. This problem becomes especially difficult when polychromatic functionality is desired – in this case, different desired phases must be selected for different wavelengths.

Recently, methods for designing two-layer, single-wavelength metasurfaces (devices composed of a metasurface patterned on both sides of a single substrate), and combined refractive-diffractive optics have been developed which utilize ray-tracing software such as Zemax Optics Studio [58], to determine the phase pattern required for the metasurface components. This approach has led to the development of wide field-of-view doublet metasurfaces [59,60], and achromatically corrected microscope objective lenses [61]. While this general workflow is appropriate for designing many useful optical components, it has the following shortcomings: Firstly, reliance on traditional ray-tracing software requires specifying the metasurface phase with smoothly varying basis functions, such as the Zernike polynomials. Potentially advantageous designs involving discontinuous phase functions are not considered. Secondly, polychromatic designs are challenging, since the achieved phase at each desired wavelength will be constrained by the nanoscale problem. One cannot independently solve the macroscale problem at each design wavelength, and then hope that a meta-atom geometry exists that achieves the correct phase at *all* desired wavelengths. Thirdly, while conventional lens design software is well suited for developing focusing optics and high-performance imaging systems, one may desire a metasurface to perform highly unusual functions where light-focusing may not be the intended effect—such as projecting unique intensity distributions at different wavelengths [62,63], or acting as a layer of an optical neural network [64,65]. Finally, for optical systems composed of many unique metasurface layers, we do not wish for the computational complexity of the underlying design problem to scale with the increasing number of design variables. We address each of these issues with our proposed inverse design method. Firstly, our method permits all design variables (the shapes of all meta-atoms within all metasurface layers of an optical system) to be computationally optimized independent of each other. Specification of a limited set of basis functions such as the Zernike polynomials is not required. Secondly, each iteration of our optimization method considers both the nanoscale and macroscale problem in tandem, so there is no risk of arriving at a solution to the macroscale problem, that cannot be realized due to the physical constraints of the nanoscale problem. This feature facilitates the design of polychromatic (and achromatic) devices. Thirdly, our design framework permits a user to specify an arbitrary desired output intensity distribution. Finally, as is the case with all inverse design methods, additional forward problem simulations are not required as the number of design variables increases. This feature permits thousands or millions of parameters to be optimized in parallel.

## 3. Mathematical framework

In this section, we develop the mathematical framework that underlies our approach to computational inverse design. We will first describe how a user of this framework can specify a cost function in terms of the squared errors between a desired set of output intensity distributions, and realized set of intensity distributions at a given design iteration. This cost function must then be iteratively minimized using the method of steepest-descent. In order to perform this optimization, it will be necessary to use a computationally efficient means of computing the gradient with respect to all of the design parameters (the individual meta-atom geometries). The gradient calculation is performed by solving a so-called adjoint problem within the context of a Fourier optics simulation and combining this result with an FDTD parameter sweep.

#### 3.1 Optimization problem description

Figure 1(b) depicts the types of optical systems for which our design method is readily suited. We use a vector ${{\boldsymbol E}^{in}}$ to denote the electric field associated with light injected at an input plane of the optical system. Here, ${{\boldsymbol E}^{in}}$ is an $S \times 1$-dimensional (complex) vector containing discrete samples of the continuous input electric field distribution. In general, ${{\boldsymbol E}^{in}}$ may be a vectorization of the sampled 2D input plane for full 3D design problem, or 1D input plane for a 2D problem. The input field subsequently propagates through multiple layers of thin substrates coated with metasurfaces. At the plane of the *m*’th metasurface, the incident electric field accumulates a spatially-varying phase-change determined by the nanoscale geometric properties of the metasurface, which are denoted by the vector ${{\boldsymbol w}^m}$. Each metasurface consists of $S$ unique meta-atoms that can be independently parameterized by an arbitrary number of variables. For simplicity, here we only consider simple geometries (Fig. 1(c)), in which each meta-atom is a rectangular post of fixed height, and variable width (*w*). Hence, ${{\boldsymbol w}^m}$ will also be an $S \times 1$-dimensional (real) vector containing the widths of the individual meta-atoms. (Our method is applicable to arbitrarily complex meta-atom geometries, by increasing the dimensionality of ${{\boldsymbol w}^m}$ to store additional design parameters per individual meta-atom.) At the image plane of the optical system, the resulting electric field ${{\boldsymbol E}^{out}}$ will be a function of the parameters $\{{{{\boldsymbol w}^1},{{\boldsymbol w}^2}, \cdots ,{{\boldsymbol w}^M}} \}$ at each of the *M* metasurfaces. The output intensity at the *s*’th sample position within the image plane is computed as ${{\boldsymbol I}_s} = {\boldsymbol E}_s^{out\ast }{\boldsymbol E}_s^{out}$. Subscripts denote individual elements of a given vector quantity. Our overarching goal will be to minimize the squared error between the realized intensity ${\boldsymbol I}$, and a user-defined *desired* intensity distribution ${{\boldsymbol I}^{des}}$ at each of the *S* sample positions. Practically, the optical system should be designed to operate for different input light wavelengths $(\lambda )$ as well as different incident field distributions $(f )$ for each wavelength. Hence, the user may define multiple desired output intensity distributions as a function of wavelength and/or input fields. Given these specifications, we may formulate a cost-function *C* that we hope to minimize, and an accompanying optimization problem:

*J*input wavelengths,

*K*input field distributions at a given wavelength, and

*S*sample points within the image plane. In our formulation of the optimization problem, we account for meta-atom geometric constraints $\{{{w^{min}},{w^{max}}} \}$ which respectively specify the minimum and maximum width of a given meta-atom. In order to minimize

*C*, one must iteratively adjust the design parameters $\{{{{\boldsymbol w}^1},{{\boldsymbol w}^2}, \cdots ,{{\boldsymbol w}^M}} \}$. It is desirable that this iterative optimization be performed using the steepest-descent algorithm [66]. At each step of the optimization, the design parameters are updated using the following rule:

*C*. To estimate the gradient using this approach would entail a minimum of $M \times S + 1$ evaluations of $C$.

In the following sections, we will show how ${{dC} \mathord{\left/ {\vphantom {{dC} {d{{\boldsymbol w}^m}}}} \right.} {d{{\boldsymbol w}^m}}}$ may be computed in an efficient manner by solving an adjoint problem. This will enable steepest-descent optimization to be utilized for challenging design tasks. First, we will describe our forward problem, which is used to determine how light propagates through the optical system, and evaluate *C* at a given design iteration. We will next derive an adjoint problem that has similar computational complexity to a single evaluation of *C*, but permits us to analytically evaluate the gradients with respect to all ${{\boldsymbol w}^m}$. Use of the adjoint-gradient method reduces the computational requirements of a single steepest-descent iteration from $M \times S + 1$ forward simulations (using finite differences) to just one forward simulation, and one adjoint simulation.

#### 3.2 Forward problem formulation

Our forward propagation model is a discretized version of the angular spectrum wave
propagator described in [67].
Briefly, the sampled electric field at the output of the
*m*’th metasurface plane of the optical system
is decomposed into a superposition of plane wave components using the
discretized Fourier transform (this is practically accomplished using
the Fast Fourier Transform Algorithm). Each plane wave (Fourier)
component is multiplied by an appropriate phase factor that accounts
for the phase accumulated as the wave propagates to the next surface
within the optical system. An inverse Fourier transform operation is
then performed to determine the resulting electric field. This
propagation model permits large (∼mm-scale) optical systems to
be simulated efficiently, however a shortcoming of this approach is
that multiple Fresnel-like reflections off of intermediate layers of
the optical system are not modeled, leading to potential
under-estimates of spurious background intensity at the image plane
(see the discussion in Section 6.1). Additionally, the Fourier optics framework does not
capture near-field effects occurring at the metasurfaces, or between
adjacent meta-atoms. At each metasurface plane, the incident electric
field is multiplied by a diagonal matrix ${{\boldsymbol \varPhi
}^{{{\boldsymbol w}^m},{\lambda _j}}}$ that contains complex exponentials
associated with the phase delays induced by the choice of design
variables ${{\boldsymbol
w}^m}$ at input wavelength ${\lambda _j}$. We additionally find it helpful to
define another diagonal matrix ${{\boldsymbol \phi
}^{{{\boldsymbol w}^m},{\lambda _j}}}$ that contains the (real-valued) phase
factors associated with ${{\boldsymbol
w}^m}$. That is, ${{\boldsymbol \varPhi
}^{{{\boldsymbol w}^m},{\lambda _j}}} = \exp ({i{{\boldsymbol \phi
}^{{{\boldsymbol w}^m},{\lambda_j}}}} )$ for all diagonal entries of ${{\boldsymbol \varPhi
}^{{{\boldsymbol w}^m},{\lambda _j}}}$, and zero otherwise. The components
of ${{\boldsymbol \varPhi
}^{{{\boldsymbol w}^m},{\lambda _j}}}$ and ${{\boldsymbol \phi
}^{{{\boldsymbol w}^m},{\lambda _j}}}$ may be readily determined from a
lookup table such as the one shown in Fig. 1(d). For all of the design problems discussed
in this paper, this lookup table was generated by performing a
parameter sweep over titanium dioxide (TiO_{2}) nanopillars of
different widths, a constant height of 600 nm, and a meta-atom pitch
of 400 nm using the commercial software package Lumerical-FDTD [68], and measuring the output phase
for all design wavelengths of interest. In each FDTD simulation,
perfectly matched layer boundary conditions were used along the
z-axis, and periodic boundary conditions were used along the x- and
y-axis. We may express the forward propagation model for wavelength ${\lambda _j}$ and input field distribution ${f_k}$ as the following series of
matrix-vector multiplications:

*m*’th plane to the (

*m*+ 1)’th plane in the optical system, and ${n_m}$ is the refractive index of the propagation medium (either air ${n_m} = 1$ or glass ${n_m} = 1.5$). ${\nu _\xi }$ is the spatial frequency corresponding to the $\xi $’th entry of the Fourier transformed electric field.

From Eq. (3), it is straightforward to evaluate the output field ${{\boldsymbol E}^{out,{\lambda _j},{f_k}}}$ from a given input field ${{\boldsymbol E}^{in,{\lambda _j},{f_k}}}$. Once the output field is known, the output intensity and cost function *C* may be evaluated. We note a couple differences between our formulation of the forward propagation model and other approaches [67]: First, we have chosen to use the angular spectrum wave propagator over the more familiar and computationally efficient Fresnel diffraction integral. The Fresnel diffraction integral requires only a single Fourier transform operation to propagate the electric field from one surface to the next, while our approach requires both a Fourier transform and inverse Fourier transform. However, Fresnel propagation assumes only paraxial wavefronts, while the angular spectrum wave propagator requires no such approximation (our approach is still a scalar-wave approximation and does not capture polarization effects). Secondly, previous work has utilized the Rayleigh-Sommerfeld diffraction integral to design multilayer optical systems in the context of training optical neural networks [64]. However, our approach enables the fast Fourier transform algorithm to be leveraged, amounting to significant computational savings–${\rm O}({MS\log (S )} )$ operations required to forward propagate the electric field, as opposed to ${\rm O}({M{S^2}} )$ operations for an arbitrary series of *M* matrix multiplications.

We point out some of the approximations that underlie our forward problem formulation: First, we assume that the output phase associated with a given meta-atom depends only on its own geometry, and is unaffected by its neighbors. We justify this approximation by reasoning that the optimal metasurface will be locally periodic over most regions. That is, the optimal metasurface geometry is usually not expected to change drastically as one moves a small number of lattice positions. In Section 6.2, we examine worst-case scenarios in which appreciable errors are incurred under the local periodic approximation. Secondly, we assume uniform transmission for all metasurface geometries. That is, our optimization framework assumes that a change in meta-atom geometry will affect output phase, but not output intensity. Adjoint problem formations may be derived that consider both the effects of phase and intensity, but for simplicity and ease of mathematical calculations, we will restrict our optimization to consider only changes in phase. Once an optimized metasurface has been obtained, non-uniform transmission effects may be straightforwardly incorporated from FDTD simulations to calculate parameters such as focusing efficiency (see Section 3.1).

#### 3.3 Adjoint problem derivation and fast calculation of gradients

Given a forward model for efficiently calculating *C*, we now specify a method for computing the gradients ${{dC} \mathord{\left/ {\vphantom {{dC} {d{{\boldsymbol w}^m}}}} \right.} {d{{\boldsymbol w}^m}}}$ using the adjoint-gradient approach. While the general use of adjoint problems is well known, this specific application to our forward propagation model is novel. We begin by dividing the gradient computation into a nanoscale problem that is computationally easy (when considering only simple metasurface structures), and a macroscale problem that is more computationally challenging. We may use the chain rule to express the gradient of $C$ as:

We begin by considering the (scalar) derivative ${{dC} \mathord{\left/ {\vphantom {{dC} {d{ \mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over {\boldsymbol{\phi}} } }}}} \right. } {d{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over {\boldsymbol{\phi}} } }}}_{s^{\prime}}^{{{\boldsymbol w}^m},{\lambda _j}}$ at a single spatial location *s’* on the *m*’th metasurface. We again make use of the chain rule:

*m*’th metasurface of the optical system. That is:

*adjoint field*as the the vector:

*entire*gradient ${{dC} \mathord{\left/ {\vphantom {{dC} {d{{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over {\boldsymbol{\phi}} } }}^{{{\boldsymbol w}^m},{\lambda_j}}}}}} \right. } {d{{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over {\boldsymbol{\phi}} } }}^{{{\boldsymbol w}^m},{\lambda _j}}}}}$: We recognize that the matrix ${{\boldsymbol \Delta }^{s^{\prime}}}$ effectively ‘picks out’ one element of a vector, and then multiplies that element by an additional factor of

*i*. We therefore conclude:

*backwards*through the optical system. By performing element-wise multiplication of the intermediate (forward-propagated) fields with the (backward-propagated) adjoint fields at each metasurface plane, ${{dC} \mathord{\left/ {\vphantom {{dC} {d{{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over {\boldsymbol{\phi}} } }}^{{{\boldsymbol w}^m},{\lambda_j}}}}}} \right. } {d{{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over {\boldsymbol{\phi}} } }}^{{{\boldsymbol w}^m},{\lambda _j}}}}}$ is evaluated. This quantity is then element-wise multiplied by ${{d{{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over {\boldsymbol{\phi}} } }}^{{{\boldsymbol w}^m},{\lambda _j}}}} \mathord{\left/ {\vphantom {{d{{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over {\boldsymbol{\phi}} } }}^{{{\boldsymbol w}^m},{\lambda_j}}}} {d{{\boldsymbol w}^m}}}} \right. } {d{{\boldsymbol w}^m}}}$ to determine ${{dC} \mathord{\left/ {\vphantom {{dC} {d{{\boldsymbol w}^m}}}} \right.} {d{{\boldsymbol w}^m}}}$. It is straightforward to appreciate the computational advantage of using this approach. Instead of performing a series of completely separate forward propagations to evaluate all of the ${{dC} \mathord{\left/ {\vphantom {{dC} {d{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over {\boldsymbol{\phi}} } }_{s^{\prime}}^{{{\boldsymbol w}^m},{\lambda_j}}}}} \right. } {d{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over {\boldsymbol{\phi}} } }_{s^{\prime}}^{{{\boldsymbol w}^m},{\lambda _j}}}}$, as would be required by Eq. (10), we backward-propagate a single adjoint field, then perform a series of element-wise multiplications to determine the entries in the vector ${{dC} \mathord{\left/ {\vphantom {{dC} {d{{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over {\boldsymbol{\phi}} } }}^{{{\boldsymbol w}^m},{\lambda_j}}}}}} \right. } {d{{{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over {\boldsymbol{\phi}} } }}^{{{\boldsymbol w}^m},{\lambda _j}}}}}$ (Eq. (13)). If we wished to design an optical system to operate at $J$ wavelengths and $K$ input fields at each unique wavelength, then a gradient calculation would require $J \times K$ forward propagations of the input fields, and $J \times K$ backward propagations (of similar computational complexity) of the adjoint fields. However, the required number of forward/backward propagations required would remain the same, regardless of the number of design variables used. In the following section, we will demonstrate the utility of this inverse design method, by using it to design optical devices for broadband imaging, spectral multiplexing, and optical neural networks.

## 4. Computational inverse design examples

In this section, we present three examples of novel optical systems designed using computational inverse design. All of the designs presented here utilize the same square TiO_{2} on glass meta-atom library, and the FDTD results shown in Fig. 1(d). We first design an achromatic doublet metasurface lens or “metalens” that consists of two metasurfaces alternately patterned on the front and back side of a single glass substrate. Our design has a 160 nm bandwidth across the visible spectrum, an aperture diameter of 800 µm and a back focal length of 2 mm. The numerical aperture and diameter are superior to previous (experimentally demonstrated) achromatic devices. Furthermore, as our design does not utilize the geometric phase, or depend upon input light of a particular polarization state, focusing and transmission efficiency will be uniform for light of any input polarization. In our second example, we design a spectrally-multiplexed holographic device that produces a different output intensity distribution depending upon the wavelength of input light. We specify a five-color design for wavelengths each separated by 40 nm. Finally, due to the similarity between our computational inverse design method and the backpropagation algorithm used for neural networks, we use our approach to “train” an optical neural network composed of five glass substrates each patterned on either side with a unique metasurface. Our design may be used to classify handwritten digits projected onto the front of the device. Our proposed design can be trained in a computationally efficient manner, is suitable for visible wavelengths, and features a compact form-factor.

#### 4.1 Achromatic metalens doublet, with a large numerical aperture

Chromatic aberration is a longstanding issue that impedes the adoption of metasurface optics for a wide range of imaging applications. Metasurface lenses designed for operation at a single wavelength usually focus red wavelengths a shorter distance than blue wavelengths – a highly undesirable feature for broadband imaging. (This phenomena is termed negative dispersion, and afflicts conventional diffractive elements in addition to metasurfaces [71–73].) In the context of metasurfaces, current approaches to correcting chromatic aberration involve spatial segmentation [74], layered polychromatic devices [75], use of the Pancharatnam-Berry (geometric) phase [76–79], polarization rotation [53,80], dispersion engineering with reflective substrates [81,82], and computational deconvolution combined with extended-depth-of-field metasurfaces [83]. Related computational design methods for broadband diffractive optical elements have also been proposed [84]. Previously reported metasurface achromats have small diameters, low numerical apertures, may require a specific input polarization and often suffer from low overall transmission efficiency. Here, we present a design for an achromatic metasurface doublet lens that is polarization independent. Furthermore, our design uses simple rectangular nanostructures, which would potentially make fabrication easier.

A schematic of our design is shown in Fig. 2(a). Both sides of a 1 mm thick glass substrate are patterned with a unique metasurface. It is desired that normally-incident broadband light come to a focus 2 mm beyond the second metasurface (the backside of the glass substrate). We specify that the metasurface aperture be 800 µm in diameter. If we perform optimization across a single line through the center of the optic (i. e. we explicitly perform optimization within a two-dimensional (XZ) simulation region) and assume that the metalens is symmetric, it follows that we must optimize two thousand design parameters (given a 400 nm pitch, one thousand unique meta-atom widths must be determined for both the front and back metasurface of the optic). To specify a target intensity distribution for our design, ${{\boldsymbol I}^{des,{\lambda _j}}}$, we calculated the ideal, diffraction-limited focal spot associated with a singlet lens with focal length 2 mm, and aperture diameter 800 µm, (the desired intensity distribution was changed for each input wavelength). For our input fields, we chose normally-incident planewaves with wavelengths λ = 480 nm through 640 nm in 20 nm increments (nine input fields in total). Forward propagation and adjoint calculations were performed in a two-dimensional simulation (e. g. along a central cross-section of the metalens). To perform steepest-descent iterations, a separate adjoint simulation was solved for each of the nine design wavelengths. The gradients calculated at each wavelength were summed to determine the overall gradient ${{dC} \mathord{\left/ {\vphantom {{dC} {d{{\boldsymbol w}^m}}}} \right.} {d{{\boldsymbol w}^m}}}$. All meta-atom widths were initially set to 200 nm, (ensuring an initially flat phase response for all design wavelengths). Optimization proceeded by rescaling the gradients such that the maximum change in the width of any meta-atom was set to 5 nm per iteration, and 100 iterations were performed in this manner. The maximum width change was then set to 1 nm, and 500 more iterations were performed. To enforce manufacturing constraints, we specified that the minimum meta-atom width was ${w^{min}}$ = 85 nm, and the maximum was ${w^{max}}$ = 370 nm. If a given steepest-descent iteration caused a meta-atom width to move outside of these constraints, the step was truncated such that that meta-atom was instead set to either ${w^{min}}$ or ${w^{max}}$.

The resulting optimized meta-atom widths for the front and back metasurfaces are plotted in Fig. 2(b). Inspection of these plots indicates that the design does not strictly satisfy the local periodic approximation over all regions of the optimized optic. Analysis of potential error sources and workarounds is included in Section 6.2. Further intuition about the resulting design can be gained by plotting the unwrapped phase across the metasurface aperture at a single wavelength (λ=640 nm, see Fig. 2(c)). Unlike conventional lenses that utilize a parabolic or spherical phase to focus light, Our design features sharp changes in the gradient of the phase at a radial distance ∼180 µm from the center of the metasurface aperture. Such sudden phase discontinuities would be difficult to realize using e. g. Zernike polynomials, and conventional optical design software—highlighting a key advantage of an inverse design approach that independently optimizes all design variables. To estimate the overall optical efficiency of our design after optimization, transmission data (determined from the FDTD parameter sweep) for each optimized meta-atom width was incorporated into our Fourier optics simulations by multiplying the incident electric field at each metasurface layer with a diagonal matrix containing the spatially-varying transmission amplitudes (we also consider the effects of a single internal reflection within the substrate in Section 6.1). In Fig. 2(d), the simulated intensity across the center of the image plane is plotted for each of the design wavelengths. We achieve nearly diffraction-limited focusing performance across a relatively wide bandwidth of 160 nm. In Fig. 2(e), overall optical transmission through the two metasurface layers is plotted as a function of wavelength. In addition, we plot focusing efficiency, defined as the fraction of incident intensity contained in a diffraction-limited region (∼1.6 µm radius) at the image plane. Overall, focusing efficiency is reduced in comparison to singlet metasurfaces designed for single-wavelength operation [5]. However, performance is superior to many existing achromatic designs over comparable bandwidths. Furthermore, recent work [30] has demonstrated that high focusing efficiency may be achieved by optimizing the geometries of individual meta-atoms (of sizes ∼2.5λ) using adjoint-gradient methods. Hence, we speculate that focusing efficiency could be further enhanced by combining such nanoscale optimization methods with our approach. XZ-profiles of the optimized metalens point spread function are shown in Fig. 3(a). These plots confirm that for each of the design wavelengths, peak intensity is directed at a plane 2 mm from the rear surface of the metalens. To highlight how the interplay between the front and back metasurface is necessary for achieving broadband performance, we re-ran the optimization procedure, but optimized only with respect to the rear metasurface (uniform phase and perfect transmission were assumed at the surface that would normally contain the front metasurface). XZ-profiles of the point spread function associated with the singlet design are shown in Fig. 3(b). Intensity distributions are plotted on the same intensity scale as in Fig. 3(a). For the case of optimizing a metasurface singlet lens, focal plane intensity is severely reduced, and spurious intensity peaks occur at multiple axial positions removed from the focal plane. These results demonstrate how systems composed of multiple metasurfaces introduce an expanded range of capabilities.

#### 4.2 Generation of arbitrary, spectrally-multiplexed intensity distributions

A key advantage of the inverse design method is that it permits a user to define an arbitrary set of output intensity distributions at the image plane $({{{\boldsymbol I}^{des,{\lambda_j},{f_k}}}} )$. There is no need to restrict the desired output to the functions performed by conventional optical elements. To illustrate this feature, we show how a single metasurface device may be used to generate a set of unique output intensities, depending solely upon the input wavelength. In the context of silicon photonics, devices with wavelength-splitting functionality have been designed using electromagnetic inverse solvers [22,25]. Furthermore, wavelength-multiplexed metasurfaces could find application in fields such as fluorescence bioimaging, where under some circumstances it is advantageous to encode wavelength-specific optical aberrations into acquired images for classification and 3D localization tasks [85–88].

A schematic of our design is illustrated in Fig. 4(a). Input light (assumed to be spatially coherent) is normally incident on a 1 mm thick glass substrate coated with a metasurface on the front and back face. Here we perform optimization over a full three-dimensional simulation region. Each of the two metasurfaces is 150 µm in diameter and has a circular aperture within an XY plane. A wavelength-dependent desired image is expected to form 500 µm beyond the back metasurface. We again assume a 400 nm pitch between adjacent meta-atoms. Hence, each meta-surface contains 110,461 meta-atoms, and the optimization problem will consist of a total of 220,922 independent design variables. We specify five unique input wavelengths (λ = 480 through 640 nm in 40 nm increments), and five unique desired output intensity distributions. Optimization consisted of 500 steepest descent iterations with a maximum change in meta-atom width of 5 nm per iteration. Unlike in the previous example, in this case it was necessary to solve forward and adjoint problems over a full three-dimensional simulation region.

The optimized meta-atom widths (Fig. 4(b)) show that the optimized design makes use of all degrees of freedom offered by the numerous design variables. Output intensity distributions for the five input wavelengths are plotted in Fig. 4(c). The output intensities spell the words BLUE, GREEN, YELLOW, ORANGE and RED, for different input wavelengths. To estimate the overall optical efficiency ${O^{{\lambda _i}}}$ of this holographic element, for each of the five output images the following sum based on the root-mean square error was calculated:

#### 4.3 Design of an optical neural network for image classification

As a final demonstration of our inverse design method, we show how it may be used to train an optical neural network to classify handwritten digits 0 through 9. Optical neural networks are an emerging field that has recently generated much interest within the machine learning and artificial intelligence communities [64,65,89,90]. Conventional neural networks are implemented entirely in software and perform a series of complex mathematical operations on input data in order to assign the data to one of many possible output classes. Alternatively, optical neural networks utilize light propagation to perform a multitude of mathematical operations and use the intensity distribution at the output of the network to perform classification tasks, or as an intermediate input for further computation performed in software. Optical neural networks effectively perform computations such as Fourier transforms and convolutions “at the speed of light”. Hence, an advantage to implementing a neural-network partially or completely by optical means is the potential to drastically speed up and further parallelize computations on large datasets. Previous work has experimentally demonstrated how image classification tasks may be performed optically [64,65]. Neural network training may also be done by optical means [90]. Here, we recognize that our inverse-design method operates in a manner similar to the conventional backpropagation algorithm used for training (software) neural networks. However, our method further increases training speed. In a standard linear network, forward propagation and gradient calculation requires ${\rm O}({M{S^2}} )$ operations, where *S* is the number of meta-atoms (neurons) contained in a single metasurface (layer) of the network, and *M* is the number of layers. However, since our method uses the fast Fourier transform algorithm to simulate propagation of input and adjoint fields, computation time is reduced to ${\rm O}({MS\log (S )} )$ operations.

Our proposed design is shown schematically in Fig. 5(a). Pixilated images of handwritten digits are projected upon the input aperture of the device using monochromatic, coherent light (λ = 560 nm). The neural network consists of five glass substrates 100 µm thick, axially arranged with 100 µm air separations. A 150 µm diameteter circular metasurface is patterned on the front and back of each element, such that the entire neural network contains ten unique layers. After light exits the final glass substrate, it propagates 0.5 mm, until it impinges upon an array of ten detectors, arranged in a circular configuration at the image plane. Each detector is assigned a specific digit. Image classification proceeds by measuring the total intensity incident upon each of the ten detectors, and assigning input images to the digit class that has the greatest associated output intensity (note that this classification strategy is effectively the same as that used by [64]).

Given this optical system layout, the next task is to train the neural network by optimizing individual meta-atom dimensions such that different input intensity distributions illuminate the correct output detectors. This optimization problem involves 1,104,610 design variables (110,461 meta-atoms per metasurface, and ten metasurfaces). Training was performed using the 60,000 handwritten digit images contained in the MNIST training dataset [91]. Digits were rescaled such that they encompassed a bounding box of 75 µm diameter. It is assumed that each of the digits contained in the training dataset was appropriately centered within the bounding box. For each training image *k*, an output intensity distribution $({{{\boldsymbol I}^{des,{f_k}}}} )$ was defined as a two-dimensional Gaussian centered over the appropriate output detector with variance parameter σ^{2} = 2.25 µm^{2}. Optimization of the design proceeded using the stochastic steepest-descent method. The 60,000 training images and associated desired outputs were divided into batches of five images, and training consisted of five epochs. For the first epoch, a maximum step-size (maximum change of meta-atom width per steepest descent iteration) of 5 nm was used. For the second epoch a 1 nm step size was used, and for the final three epochs a 0.1 nm step size was used. Figure 4(d) plots the widths of the meta-atoms of the final design. To gain an intuitive understanding of the overall function of this optical system, Fig. 5(b) plots three representative input digits, and the resulting simulated output intensity distributions. For all digits, the majority of the output intensity is concentrated in ten regions centered over the ring of detectors. However, in each of the cases shown, a different detector receives the greatest overall intensity, enabling each of the unique digits to be classified correctly.

Once an optimized design was arrived at, the performance of the system was evaluated by attempting to classify the 10,000 handwritten digits contained in MNIST testing dataset. On the test data, we achieve a classification accuracy of 84.00%. A confusion matrix for our result is plotted in Fig. 5(c). The optical network has greatest difficulty distinguishing between “4” and “9”, and “5” and “3”, based on the similarities in the shapes of these digits. In comparison, [64] obtained a simulated test accuracy of 91.75%. This difference in performance could be due to factors such as the overall geometry and dimensions of our design, or differences in the initialization and training routine. Furthermore, we note that real-world performance of the optical neural network could be degraded by misalignment of the layers. In Section 6.3 we consider axial misalignment of one of the substrates containing two metasurfaces, and observe an ∼11% reduction in classification accuracy due to a 20 µm misalignment. In closing this section, we note that even though our optical design consists of multiple layers of metasurfaces, the neural network is effectively still a linear system [92]. That is, each metasurface layer and subsequent propagation can be thought of a linear transformation upon the incident intensity distribution. Use of optical nonlinearities [93–95] could potentially improve overall performance.

## 5. Discussion

We have developed an inverse design method for optimizing cascaded systems of metasurface optics. The method is computationally efficient and realizes a variety of innovative design solutions. Each of the designs presented here utilizes a TiO_{2}-on-glass fabrication platform based on square meta-atoms arranged in a periodic lattice. Hence the only adjustable variable for each individual meta-atom is its associated width. This simple fabrication platform has a number of advantages: The manufacturing feasibility of the meta-atom geometries is well established, the four-fold symmetry of the meta-atoms ensures polarization-independent performance, and the total number of design variables remains relatively modest. Nevertheless, it is reasonable to expect that performance of the example designs considered could be further enhanced if more complex meta-atom geometries were considered throughout the design optimization routine, especially in the context of polychromatic and achromatic designs. For example, multi-layer (singlet) metasurfaces, and meta-atom libraries of crosses, hollow rectangles and concentric cylinders have been investigated [57]. We briefly discuss a couple ways in which our inverse design framework could be extended to include this additional functionality. The most straightforward approach to consider more complex meta-atom geometries would be to perform a multi-dimensional FDTD parameter sweep over the multiple design variables. In this case, estimating the gradients of the nanoscale problem (calculation of ${{d{{\boldsymbol \phi }^{{{\boldsymbol w}^m},{\lambda _j}}}} \mathord{\left/ {\vphantom {{d{{\boldsymbol \phi }^{{{\boldsymbol w}^m},{\lambda_j}}}} {d{{\boldsymbol w}^m}}}} \right.} {d{{\boldsymbol w}^m}}}$, see Eq. (5)) would require computing finite differences over a multi-dimensional parameter space as opposed to a single width parameter, but the overall method would remain nearly unchanged. An obvious drawback to this strategy is that a complete parameter sweep becomes computationally intractable as the number of design variables per meta-atom increases. In the FDTD parameter sweeps that we have performed for this paper, a single meta-atom geometry can take ∼minutes to simulate using Lumerical FDTD on most standard desktop workstations. Hence, a well-sampled sweep over three or more design variables could easily take ∼weeks to complete, without utilizing high-performance computing resources. An approach that would not require an exhaustive parameter sweep would be to solve a set of different adjoint problems at each design iteration: First, a macroscale adjoint problem using the framework described here would be solved to determine appropriate gradients with respect to phase at each meta-atom location within the optical system. Initial meta-atom geometries could then be selected from a (relatively small) library of simple shapes such as rectangles or cylinders. Next, using some suitable error threshold, lattice positions within the metasurface are identified where varying the parameters of the simple meta-atom structures does not produce desired gradients with respect to phase. At these locations, a separate set of nanoscale adjoint problems could be solved to determine gradients in phase with respect to a larger set of design parameters, and meta-atom shapes updated accordingly. As each nanoscale adjoint problem would be performed over the dimensions of a single meta-atom, this process would not require unreasonable computational resources, and could be trivially parallelized. Means of formulating appropriate nanoscale adjoint problems have been described in previous literature [19–21], and commercial software packages for implementing adjoint solvers in FDTD are currently available [68]. Our inverse design framework, combined with the possible extensions discussed here may lead to numerous applications for metasurfaces to the fields of imaging, display, and optical computing.

## 6. Validity of approximations

In this section, we briefly examine the accuracy of the assumptions that underlie our optical propagation model. Specifically, we assess whether we can ignore scattering and back-reflections between multiple layers of the metasurface optical system, we analyze errors associated with the local periodic approximation, and we quantify the effects of potential misalignments within the optical system.

#### 6.1 Internal reflections between multiple metasurface layers

The optical propagation model used throughout the main text does not consider the effects of stray light reflected between multiple metasurface layers (analogous to the assumptions of a sequential ray-tracer used by conventional optical-design software). This reflected light could propagate into the image plane, and influence the overall output intensity distribution. Here, we show that for the most part such effects are relatively minimal (at least for the design examples considered in this manuscript). Figure 6(a) illustrates the simulation that was performed in order to test the above claim: Using the optimized meta-atom geometries and optical system described in Section 4.1 (achromatic metalens doublet), a single reflection of the input beam off of the back and front metasurface was simulated. Values for the complex reflection coefficients accrued after each subsequent reflection were determined directly from the same FDTD simulations used to estimate the phase imparted by transmission through a metasurface layer. The image-plane electric fields of purely transmitted and multiply-reflected light were summed to determine the overall intensity distribution ${{\boldsymbol I}^{{\lambda _j},ref}}$, where the superscript *ref* indicates that reflected light was considered. Figure 6(b) plots the difference ${{\boldsymbol I}^{{\lambda _j},ref}} - {{\boldsymbol I}^{{\lambda _j}}}$ at the image plane, for each of the nine design wavelengths considered in Section 4.1. The intensity scale is the same as that used in Fig. 2(d). We conclude that the peak intensity will be altered by less than 1% due to multiple reflections. This result is due, in part, to the fact that the optical path length of reflected light from input to the image plane is significantly longer than that of purely transmitted light. Hence, the reflected beam will not be well-focused, and will contribute only a highly-diffuse background intensity. Nevertheless, future work could consider explicitly optimizing metasurface designs with respect to both transmitted an internally reflected light.

#### 6.2 Shortcomings of the local periodic approximation

In order to select meta-atom geometries that will impart a desired phase response on incident light, we assume that a given meta-atom will be surrounded by other meta-atoms of similar geometries. This permits us to perform a modest set of FDTD simulations using periodic boundary conditions in order to determine the relationship between a given meta-atom width and output phase. This approach is known as the local periodic approximation. However, inspection of the results in Fig. 2(b) shows that our inverse-design approach can sometimes lead to metasurfaces with wildly varying widths between adjacent lattice positions. To quantify potential errors associated with the local periodic approximation, here we simulate a “worst-case scenario” in which a single meta-atom of fixed geometry is neighbored on either side by meta-atoms of significantly different geometries. Figure 7(a) illustrates the simulation geometry: A meta-atom 200 nm in width and 600 nm in height is neighbored by two meta-atoms on either side by meta-atoms of a different width. The lattice period was fixed at 400 nm, and simulations were performed over a 400 × 2000 nm grid using periodic boundary conditions in x and y. Four unique simulations were performed, in which the widths of the flanking meta-atoms were set to {85 nm, 200 nm, 300 nm, 375 nm}. The simulation wavelength was λ = 570 nm. Figure 7(b) plots the output phase, relative to the case in which all meta-atoms are 200 nm in width. At the location immediately above the center meta-atom, we note that up to a ∼1.63 radian phase change can occur, depending on whether the flanking meta-atoms are 85 nm or 375 nm in width. Figure 7(c) plots the real portions of the output electric fields, along with the plane at which output phase was measured. While previous work [6,52] has experimentally demonstrated holographic optical elements using similar assumptions (a scenario in which the local periodic approximation is not strictly valid), it is highly likely that phase changes associated with disparately-sized neighboring meta-atoms would degrade experimental results. To mitigate such issues, the following strategies could be employed: Firstly, instead of optimizing over individual meta-atom lattice positions, one could specify that meta-atoms geometries must be constant, or nearly constant over a “super-cell” of multiple meta-atoms. Secondly, a total-variation penalty could be added to the cost function *C* [96], to ensure that meta-atom widths do not change drastically over short distances. Finally, the inverse-design method described here could be potentially combined with a topology-optimization code that utilizes overlapping domains [97], in addition to an online Maxwell’s equations solver.

#### 6.3 Effect of misalignments within the optical system

In the design examples presented in the main text, we did not consider the effects of misalignment between multiple metasurfaces of a given optical system. Under real-world conditions, misalignments could arise due to incorrect substrate thicknesses, or improper axial placement of separate metasurface layers. To assess the impact of optical misalignment, we revisit the design example described in Section 4.3 (optical neural network for image classification). Here, we tweak the optimized design described in the main text, such that the third substrate (containing two metasurfaces spaced 100 µm apart) is axially shifted a distance 20 µm closer to the fourth substrate (see Fig. 8(a)). Without re-training the neural network, we attempted to re-classify the same set of 10,000 test digits. Example output for a given input image is shown in Fig. 8(b), and compared with output for a perfectly aligned optical system. Inspection of images shows that in the case of a misalignment, output intensity is no longer well-focused into tight Gaussian spots, and in this particular example the input image would be mis-classified. A confusion matrix for all 10,000 test images is plotted in Fig. 8(c). Our results indicate that a misalignment of this magnitude will cause classification accuracy to decrease by ∼11%, however the network continues to function adequately, and is still capable of correctly classifying a majority of test images. It also is interesting to note that although overall classification accuracy declines, for the accuracy marginally increases for digits 1, 6, 9, and 0. Furthermore, a 20 µm misalignment is quite severe, and it is likely that better precision could be achieved under most circumstances (we also simulated an axial misalignment of λ/2, however there was no impact on the classification results).

## Funding

Laboratory Directed Research and Development (LDRD), Sandia National Laboratories; U.S. Department of Energy (DE-NA0003525).

## Acknowledgments

The author wishes to acknowledge Michael B. Sinclair for discussions regarding the adjoint problem derivation and careful reading of the manuscript. The author thanks Maxwell D. Aiello for providing the representative SEM images of a metasurface shown in Fig. 1(a) of the text. The author also acknowledges Victor M. Acosta, John D. Perreault, and Patrick Llull for helpful conversations regarding computational inverse design and its applications. This work was supported by the Laboratory Directed Research and Development program at Sandia National Laboratories, a multimission laboratory managed and operated by the National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract no. DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the U.S. government.

## References

**1. **P. Genevet, F. Capasso, F. Aieta, M. Khorasaninejad, and R. Devlin, “Recent advances in planar optics: from plasmonic to dielectric metasurfaces,” Optica **4**(1), 139–152 (2017). [CrossRef]

**2. **S. M. Kamali, E. Arbabi, A. Arbabi, and A. Faraon, “A review of dielectric optical metasurfaces for wavefront control,” Nanophotonics **7**(6), 1041–1068 (2018). [CrossRef]

**3. **D. Fattal, J. Li, Z. Peng, M. Fiorentino, and R. G. Beausoleil, “Flat dielectric grating reflectors with focusing abilities,” Nat. Photonics **4**(7), 466–470 (2010). [CrossRef]

**4. **D. Fattal, Z. Peng, T. Tran, S. Vo, M. Fiorentino, J. Brug, and R. G. Beausoleil, “A multi-directional backlight for a wide-angle glasses-free three-dimensional display,” Nature **495**(7441), 348–351 (2013). [CrossRef]

**5. **A. Arbabi, Y. Horie, A. J. Ball, M. Bagheri, and A. Faraon, “Subwavelength-thick lenses with high numerical apertures and large efficiency based on high-contrast transmitarrays,” Nat. Commun. **6**(1), 7069 (2015). [CrossRef]

**6. **A. Arbabi, Y. Horie, M. Bagheri, and A. Faraon, “Dielectric metasurfaces for complete control of phase and polarization with subwavelength spatial resolution and high transmission,” Nat. Nanotechnol. **10**(11), 937–943 (2015). [CrossRef]

**7. **M. Khorasaninejad, W. T. Chen, R. C. Devlin, J. Oh, A. Y. Zhu, and F. Capasso, “Metalenses at visible wavelengths: diffraction-limited focusing and subwavelength resolution imaging,” Science **352**(6290), 1190–1194 (2016). [CrossRef]

**8. **D. Lin, M. Melli, E. Poliakov, P. St. Hilaire, S. Dhuey, C. Peroz, S. Cabrini, M. Brongersma, and M. Klug, “Optical metasurfaces for high angle steering at visible wavelengths,” Sci. Rep. **7**(1), 2286 (2017). [CrossRef]

**9. **S. M. Kamali, A. Arbabi, E. Arbabi, Y. Horie, and A. Faraon, “Decoupling optical function and geometrical form using conformal flexible dielectric metasurfaces,” Nat. Commun. **7**(1), 11618 (2016). [CrossRef]

**10. **H. Pahlevaniznehad, M. Khorasaninejad, Y. Huang, Z. Shi, L. P. Hariri, D. C. Adams, V. Ding, A. Zhu, C. Qiu, F. Capasso, and M. J. Suter, “Nano-optic endoscope for high-resolution optical coherence tomography in vivo,” Nat. Photonics **12**(9), 540–547 (2018). [CrossRef]

**11. **R. J. Lin, V. Su, S. Wang, M. K. Chen, T. L. Chung, Y. H. Chen, H. Y. Kuo, J. Chen, J. Chen, Y. Huang, J. Wang, C. H. Chum P, C. Wu, T. Li, Z. Wang, S. Zhu, and D. P. Tsai, “Achromatic metalens array for full-colour light-field imaging,” Nat. Nanotechnol. **14**(3), 227–231 (2019). [CrossRef]

**12. **C. Pfeiffer and A. Grbic, “Cascaded metasurfaces for complete phase and polarization control,” Appl. Phys. Lett. **102**(23), 231116 (2013). [CrossRef]

**13. **C. Pfeiffer, N. K. Emani, A. M. Shalout, A. Boltasseva, V. M. Shalaev, and A. Grbic, “Efficient light bending with isotropic metamaterial Hugens’ surfaces,” Nano Lett. **14**(5), 2491–2497 (2014). [CrossRef]

**14. **S. G. Johnson, “Notes on Adjoint Methods for 18.335,” https://math.mit,edu/∼stevenj/18.336/adjoint.pdf (Last accessed May 29, 2019).

**15. **S. Molesky, Z. Lin, A. Y. Piggott, W. Jin, J. Vučković, and A. W. Rodriguez, “Inverse design in nanophotonics,” Nat. Photonics **12**(11), 659–670 (2018). [CrossRef]

**16. **J. Reuther, A. Jameson, J. J. Alonso, M. J. Remlinger, and D. Saunders, “Constrained multipoint aerodynamic shape optimization using an adjoint formulation and parallel computers, part 2,” J. Aircr. **36**(1), 61–74 (1999). [CrossRef]

**17. **M. B. Giles and N. A. Pierce, “An introduction to the adjoint approach to design,” Flow, Turbul. Combust. **65**(3/4), 393–415 (2000). [CrossRef]

**18. **N. K. Georgieva, S. Glavic, M. H. Bakr, and J. W. Bandler, “Feasible adjoint sensitivity technique for EM design optimization,” IEEE Trans. Microwave Theory Tech. **50**(12), 2751–2758 (2002). [CrossRef]

**19. **J. S. Jensen and O. Sigmund, “Topology optimization for nano-photonics,” Laser Photonics Rev. **5**(2), 308–321 (2011). [CrossRef]

**20. **C. M. Lalau-Keraly, S. Bhargava, O. D. Miller, and E. Yablonovitch, “Adjoint shape optimization applied to electromagnetic design,” Opt. Express **21**(18), 21693–21701 (2013). [CrossRef]

**21. **O. D. Miller, “Photonic design: fundamental solar cell physics to computational inverse design,” https://arxiv.org/abs/1308.0212 (2013).

**22. **J. Lu and J. Vučković, “Nanophotonic computational design,” Opt. Express **21**(11), 13351–13367 (2013). [CrossRef]

**23. **A. C. R. Niederberger, D. A. Fattal, N. R. Gauger, S. Fan, and R. G. Beausoleil, “Sensitivity analysis and optimization of sub-wavelength optical gratings using adjoints,” Opt. Express **22**(11), 12971–12981 (2014). [CrossRef]

**24. **V. Ganapati, O. D. Miller, and E. Yablonovitch, “Light trapping textures designed by electromagnetic optimization for subwavelength thick solar cells,” IEEE J. Photovolt. **4**(1), 175–182 (2014). [CrossRef]

**25. **A. Y. Piggott, J. Lu, K. G. Lagoudakis, J. Petykiewicz, T. M. Babinec, and J. Vučković, “Inverse design and demonstration of a compact and broadband on-chip wavelength demultiplexer,” Nat. Photonics **9**(6), 374–377 (2015). [CrossRef]

**26. **A. Y. Piggot, J. Petykiewicz, L. Su, and J. Vučković, “Fabrication-constrained nanophotonic inverse design,” Sci. Rep. **7**(1), 1786 (2017). [CrossRef]

**27. **L. Su, R. Trivedi, N. V. Sapra, A. Y. Piggott, D. Vercruysse, and J. Vučković, “Fully-automated optimization of grating couplers,” Opt. Express **26**(4), 4023–4034 (2018). [CrossRef]

**28. **T. W. Hughes, M. Minkov, I. A. D. Williamson, and S. Fan, “Adjoint method and inverse design for nonlinear nanophotonic devices,” ACS Photonics **5**(12), 4781–4787 (2018). [CrossRef]

**29. **R. Pestourie, C. Pérez-Arancibia, Z. Lin, W. Shin, F. Capasso, and S. G. Johnson, “Inverse design of large-area metasurfaces,” Opt. Express **26**(26), 33732–33747 (2018). [CrossRef]

**30. **T. Phan, D. Sell, E. W. Wang, S. Doshay, K. Edee, J. Yang, and J. A. Fan, “High-efficiency, large-area topology-optimized metasurfaces,” Light: Sci. Appl. **8**(1), 48 (2019). [CrossRef]

**31. **31. J. Jiang and J. A. Fan, “Dataless training of generative models for the inverse design of metasurfaces,” https://arxiv.org/abs/1906.07843 (2019).

**32. **V. Egorov, M. Eitan, and J. Scheuer, “Genetically optimized all-dielectric metasurfaces,” Opt. Express **25**(3), 2583–2593 (2017). [CrossRef]

**33. **D. Sell, J. Yang, S. Doshay, and J. A. Fan, “Periodic dielectric metasurfaces with high-efficiency, multiwavelength functionalities,” Adv. Opt. Mater. **5**(23), 1700645 (2017). [CrossRef]

**34. **D. Sell, J. Yang, S. Doshay, R. Yang, and J. A. Fan, “Large-angle multifunctional metagratings based on freeform multimode geometries,” Nano Lett. **17**(6), 3752–3757 (2017). [CrossRef]

**35. **J. Yang and J. A. Fan, “Topology-optimized metasurfaces: impact of initial geometric layout,” Opt. Lett. **42**(16), 3161–3164 (2017). [CrossRef]

**36. **A. Zhan, T. K. Fryett, S. Colburn, and A. Majumdar, “Inverse design of optical elements based on arrays of dielectric spheres,” Appl. Opt. **57**(6), 1437–1446 (2018). [CrossRef]

**37. **F. Callewaert, V. Velev, P. Kumar, A. V. Sahakian, and K. Aydin, “Inverse-designed broadband all-dielectric electromagnetic metadevices,” Sci. Rep. **8**(1), 1358 (2018). [CrossRef]

**38. **Z. Lin, B. Groever, F. Capasso, A. W. Rodriguez, and M. Lončar, “Topology-optimized multilayered metaoptics,” Phys. Rev. Appl. **9**(4), 044030 (2018). [CrossRef]

**39. **Z. Lin, V. Liu, R. Pestourie, and S. G. Johnson, “Topology optimization of freeform large-area metasurfaces,” Opt. Express **27**(11), 15765–15775 (2019). [CrossRef]

**40. **Y. Zhou, I. I. Kravchenko, H. Wang, J. R. Nolen, G. Gu, and J. Valentine, “Multilayer noninteracting dielectric metasurfaces for multiwavelength metaoptics,” Nano Lett. **18**(12), 7529–7537 (2018). [CrossRef]

**41. **J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt. **21**(15), 2758–2769 (1982). [CrossRef]

**42. **J. R. Fienup, “Phase-retireval algorithms for a complicated optical system,” Appl. Opt. **32**(10), 1737–1746 (1993). [CrossRef]

**43. **V. Ganapati, L. Waller, and E. Yablonovitch, “Adjoint method for phase retrieval,” https://doi.org/10.1364.COSI.2014.CW4C.2 (2014).

**44. **P. Lalanne, S. Astilean, P. Chavel, E. Cambril, and H. Launois, “Design and fabrication of blazed binary diffractive elements with sampling periods smaller than the structural cutoff,” J. Opt. Soc. Am. A **16**(5), 1143–1156 (1999). [CrossRef]

**45. **S. Vo, D. Fattal, W. V. Sorin, Z. Peng, T. Tran, M. Fiorentino, and R. G. Beausoleil, “Sub-wavelength grating lenses with a twist,” IEEE Photonics Technol. Lett. **26**(13), 1375–1378 (2014). [CrossRef]

**46. **M. Khorasaninejad, A. Y. Zhu, C. Roques-Carmes, W. T. Chen, J. Oh, I. Mishra, R. C. Devlin, and F. Capasso, “Polarization-insensitive metalenses at visible wavelengths,” Nano Lett. **16**(11), 7229–7234 (2016). [CrossRef]

**47. **A. Zhan, S. Colburn, R. Trivedi, T. K. Fryett, C. M. Dodson, and A. Majumdar, “Low-contrast dielectric metasurface optics,” ACS Photonics **3**(2), 209–214 (2016). [CrossRef]

**48. **E. Hasman, V. Kleiner, G. Biener, and A. Niv, “Polarization dependent focusing lens by use of quantized Pancharatnam-Berry phase diffractive optics,” Appl. Phys. Lett. **82**(3), 328–330 (2003). [CrossRef]

**49. **D. Lin, P. Fan, E. Hasman, and M. L. Brongersma, “Dielectric gradient metasurface optical elements,” Science **345**(6194), 298–302 (2014). [CrossRef]

**50. **50. R. C. Devlin, M. Khorasaninejad, W. T. Chen, J. Oh, and F. Capasso, “Broadband high-efficiency dielectric metasurfaces for the visible spectrum,” Proc. Natl. Acad. Sci. U. S. A. **113**(38), 10473–10478 (2016). [CrossRef]

**51. **E. Schonbrun, K. Seo, and K. Crozier, “Reconfigurable imaging systems using elliptical nanowires,” Nano Lett. **11**(10), 4299–4303 (2011). [CrossRef]

**52. **J. P. Balthasar Mueller, N. A. Rubin, R. C. Devlin, B. Groever, and F. Capasso, “Metasurface polarization optics: independent phase control of arbitrary orthogonal states of polarization,” Phys. Rev. Lett. **118**(11), 113901 (2017). [CrossRef]

**53. **M. D. Aiello, A. S. Backer, A. J. Sapon, J. Smits, J. D. Perreault, P. Llull, and V. M. Acosta, “Achromatic varifocal metalens for the visible spectrum,” https://arxiv.org/abs/1903.03930 (2019).

**54. **M. Decker, I. Staude, M. Falkner, J. Dominguez, D. N. Neshev, I. Brener, T. Pertsch, and Y. S. Kivshar, “High-efficiency dielectric Huygens’ surfaces,” Adv. Opt. Mater. **3**(6), 813–820 (2015). [CrossRef]

**55. **K. E. Chong, I. Staude, A. James, J. Dominguez, S. Liu, S. Campione, G. S. Subramania, T. S. Luk, M. Decker, D. N. Neshev, I. Brener, and Y. S. Kivshar, “Polarization-independent silicon metadevices for efficient optical wavefront control,” Nano Lett. **15**(8), 5369–5374 (2015). [CrossRef]

**56. **S. Liu, A. Vaskin, S. Campione, O. Wolf, M. B. Sinclair, J. Reno, G. A. Keeler, I. Staude, and I. Brener, “Huygens’ metasurfaces enabled by magnetic dipole resonance tuning in split dielectric nanoresonators,” Nano Lett. **17**(7), 4297–4303 (2017). [CrossRef]

**57. **S. Shrestha, A. C. Overvig, M. Lu, A. Stein, and N. Yu, “Broadband achromatic dielectric metalenses,” Light: Sci. Appl. **7**(1), 85 (2018). [CrossRef]

**58. **Zemax LLChttps://www.zemax.com/products/opticstudio (Last accessed June 2, 2019).

**59. **A. Arbabi, E. Arbabi, S. M. Kamali, Y. Horie, S. Han, and A. Faraon, “Miniature optical planar camera based on a wide-angle metasurface doublet corrected for monochromatic aberrations,” Nat. Commun. **7**(1), 13682 (2016). [CrossRef]

**60. **B. Groever, W. T. Chen, and F. Capasso, “Meta-lens doublet in the visible region,” Nano Lett. **17**(8), 4902–4907 (2017). [CrossRef]

**61. **W. T. Chen, A. Y. Zhu, J. Sisler, Y. Huang, K. M. A. Yousef, E. Lee, C. Qiu, and F. Capasso, “Broadband achromatic metasurface-refractive optics,” Nano Lett. **18**(12), 7801–7808 (2018). [CrossRef]

**62. **B. Wang, F. Dong, Q. Li, D. Yang, C. Sun, J. Chen, Z. Song, L. Xu, W. Chu, Y. Xiao, Q. Gong, and Y. Li, “Visible-frequency dielectric metasurfaces for multiwavelength achromatic and highly dispersive holograms,” Nano Lett. **16**(8), 5235–5240 (2016). [CrossRef]

**63. **Z. Shi, M. Khorasaninejad, Y. Huang, C. Roques-Carmes, A. Y. Zhu, W. T. Chen, V. Sanjeev, Z. Ding, M. Tamagnone, K. Chaudhary, R. C. Devlin, C. Qiu, and F. Capasso, “Single-layer metasurface with controllable multiwavelength functions,” Nano Lett. **18**(4), 2420–2427 (2018). [CrossRef]

**64. **X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science **361**(6406), 1004–1008 (2018). [CrossRef]

**65. **J. Chang, V. Sitzmann, X. Dun, W. Heidrich, and G. Wetzstein, “Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification,” Sci. Rep. **8**(1), 12324 (2018). [CrossRef]

**66. **P. E. Gill, W. Murray, and M. H. Wright, * Practical Optimization* (Emerald Group Publishing Limited, 1982).

**67. **J. W. Goodman, * Introduction to Fourier Optics* (McGraw-Hill, 1996).

**68. **Lumerical Inc. https://www.lumerical.com/products (Last accessed June 2, 2019).

**69. **R. Remmert, * Theory of Complex Functions* (Springer Science & Business Media, 2012).

**70. **E. Nehme, D. Freedman, R. Gordon, B. Ferdman, T. Michaeli, and Y. Shechtman, “Dense three dimensional localization microscopy by deep learning,” https://arxiv.org/abs/1906.09957 (2019).

**71. **S. J. Bennett, “Achromatic combinations of hologram optical elements,” Appl. Opt. **15**(2), 542–545 (1976). [CrossRef]

**72. **D. A. Buralli and J. R. Rogers, “Some fundamental limitations of achromatic holographic systems,” J. Opt. Soc. Am. A **6**(12), 1863–1868 (1989). [CrossRef]

**73. **M. W. Farn and J. W. Goodman, “Diffractive doublets corrected at two wavelengths,” J. Opt. Soc. Am. A **8**(6), 860–867 (1991). [CrossRef]

**74. **E. Arbabi, A. Arbabi, S. M. Kamali, Y. Horie, and A. Faraon, “Multiwavelength metasurfaces through spatial multiplexing,” Sci. Rep. **6**(1), 32803 (2016). [CrossRef]

**75. **O. Avayu, E. Almeida, Y. Prior, and T. Ellenbogen, “Composite functional metasurfaces for multispectral achromatic optics,” Nat. Commun. **8**(1), 14992 (2017). [CrossRef]

**76. **S. Wang, P. C. Wu, V. Su, Y. Lai, C. H. Chu, J. Chen, S. Lu, J. Chen, B. Xu, C. Kuan, T. Li, S. Zhu, and D. P. Tsai, “Broadband achromatic optical metasurface devices,” Nat. Commun. **8**(1), 187 (2017). [CrossRef]

**77. **S. Wang, P. C. Wu, V. Su, Y. Lai, M. Chen, H. Y. Kuo, B. H. Chen, Y. H. Chen, T. Huang, J. Wang, R. Lin, C. Kuan, T. Li, Z. Wang, S. Zhu, and D. P. Tsai, “A broadband achromatic metalens in the visible,” Nat. Nanotechnol. **13**(3), 227–232 (2018). [CrossRef]

**78. **W. T. Chen, A. Y. Zhu, V. Sanjeev, M. Khorasaninejad, Z. Shi, E. Lee, and F. Capasso, “A broadband achromatic metalens for focusing and imaging in the visible,” Nat. Nanotechnol. **13**(3), 220–226 (2018). [CrossRef]

**79. **W. T. Chen, A. Y. Zhu, J. Sisler, Z. Bharwani, and F. Capasso, “A broadband achromatic polarization-insensitive metalens consisting of anisotropic nanostructures,” Nat. Commun. **10**(1), 355 (2019). [CrossRef]

**80. **E. Arbabi, A. Arbabi, S. M. Kamali, Y. Horie, and A. Faraon, “High efficiency double-wavelength dielectric metasurface lenses with dichroic birefringent meta-atoms,” Opt. Express **24**(16), 18468–18477 (2016). [CrossRef]

**81. **E. Arbabi, A. Arbabi, S. M. Kamali, Y. Horie, and A. Faraon, “Controlling the sign of chromatic dispersion in diffractive optics with dielectric metasurfaces,” Optica **4**(6), 625–632 (2017). [CrossRef]

**82. **M. Khorasaninejad, Z. Shi, A. Y. Zhu, W. T. Chen, V. Sanjeev, A. Zaidi, and F. Capasso, “Achromatic metalens over 60 nm bandwidth in the visible and metalens with reverse chromatic dispersion,” Nano Lett. **17**(3), 1819–1824 (2017). [CrossRef]

**83. **S. Colburn, A. Zhan, and A. Majumdar, “Metasurface optics for full-color computational imaging,” Sci. Adv. **4**(2), eaar2114 (2018). [CrossRef]

**84. **P. Wang, N. Mohammad, and R. Menon, “Chromatic-aberration-corrected diffractive lenses for ultra-broadband focusing,” Sci. Rep. **6**(1), 21545 (2016). [CrossRef]

**85. **C. Smith, M. Huisman, M. Siemons, D. Grünwald, and S. Stallinga, “Simultaneous measurement of emission color and 3D position of single molecules,” Opt. Express **24**(5), 4996–5013 (2016). [CrossRef]

**86. **A. Jesacher, S. Bernet, and M. Ritsche-Marte, “Colored point spread function engineering for parallel confocal microscopy,” Opt. Express **24**(24), 27395–27402 (2016). [CrossRef]

**87. **Y. Shechtman, L. E. Weiss, A. S. Backer, M. Y. Lee, and W. E. Moerner, “Multicolor localization microscopy by point-spread-function engineering,” Nat. Photonics **10**(9), 590–594 (2016). [CrossRef]

**88. **E. Hershko, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Multicolor localization microscopy and point-spread-function engineering by deep learning,” Opt. Express **27**(5), 6158–6183 (2019). [CrossRef]

**89. **Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics **11**(7), 441–446 (2017). [CrossRef]

**90. **T. W. Hughes, M. Minkov, Y. Shi, and S. Fan, “Training of photonic neural networks through *in situ* backpropagation and gradient measurement,” Optica **5**(7), 864–871 (2018). [CrossRef]

**91. **Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE **86**(11), 2278–2324 (1998). [CrossRef]

**92. **H. Wang, G. Huang, X. Wei, Y. Sun, and H. Wang, “Comment on ‘all-optical machine learning using diffractive deep neural networks,’” https://arxiv.org/abs/1809.08360 (2018).

**93. **S. Liu, P. P. Vabishchevich, A. Vaskin, J. L. Reno, G. M. Peake, G. A. Keeler, M. B. Sinclair, I. Staude, and I. Brener, “Optical nonlinearities in all-dielectric metasurfaces,” 2018 International Conference on Optical MEMS and Nanophotonics doi:10.1109/OMN.2018.8454615.

**94. **M. Miscuglio, A. Mehrabian, Z. Hu, S. I. Azzam, J. George, A. V. Kildishev, M. Pelton, and V. J. Sorger, “All-optical nonlinear activation function for photonic neural networks,” Opt. Mater. Express **8**(12), 3851–3863 (2018). [CrossRef]

**95. **Y. Zuo, B. Li, Y. Zhao, Y. Jiang, Y. Chen, P. Chen, G. Jo, J. Liu, and S. Du, “All optical neural network with nonlinear activation functions,” https://arxiv.org/abs/1904.10819 (2019).

**96. **L. J. Rudin, S. Osher, and E. Fatemi, “Nonlinear total-variation-based noise removal algorithms,” Phys. D **60**(1-4), 259–268 (1992). [CrossRef]

**97. **Z. Lin and S. G. Johnson, “Overlapping domains for topology optimization of large area metasurfaces,” https://arxiv.org/abs/1906.05376 (2019).