## Abstract

We derive an adjoint shape optimization algorithm with a compound figure of merit and demonstrate its use with both gradient descent and Levenberg-Marquart updates for the case of SiO_{2}-buried SOI coplanar waveguide crossings. We show that a smoothing parameter, basis function width, can be used to eliminate small feature sizes with a small cost to device performance. The Levenberg-Marquardt update produces devices with larger bandwidth. A waveguide crossing with simulated performance values of > 60 dB cross power extinction ratio and > −0.08 dB through power over the 1500-1600 nm band is presented. A fabricated device is measured to have a maximum of −0.06 dB through power and a 50 dB cross power extinction ratio.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

As the size and complexity of photonic circuits increase the cumulative effect of single-device losses becomes significant. This necessitates device optimization methods that take working designs and alter their shape such that insertion loss is reduced, or other metrics further improved. Traditionally this process was accomplished through simulation sweeps of a small set of parameters yielding a design optimized within that space.

More recently design optimization methods have been developed that greatly increase the number of parameters that can be used to represent a shape, and thus the ultimate performance ceiling [1]. These include genetic [2–6] and particle swarm algorithms [7–9], objective first [10–14], Bayesian optimization [15], and the adjoint methods [16–26]. Whereas the objective first method tends to produce devices that are hard to manufacture with photolithography, both genetic and particle swarm algorithms require running many simulations per design iteration and are computationally inefficient. While Bayesian optimization appears to be promising in terms of its convergence rate, here we choose to focus on the simpler adjoint method.

With the adjoint method only at most two simulations are needed per design update iteration, depending on structure symmetry. The adjoint method was originally developed to design mechanical structures [27] but has been adapted to design photonic devices. This method involves calculating a shape or topological derivative that is used in a gradient descent algorithm. While robust, the gradient descent algorithm is known to get stuck at local optima in the parameter space. A higher order alternative, better capable of navigating through local optima is Newton’s method which involves calculating the Hessian matrix. This matrix can be expensive to calculate, however, and methods using approximations to the Hessian, the so-called quasi-Newton methods, have been developed to mitigate this. In this work we will show how the use of the Fisher Hessian matrix in a Levenberg-Maquardt update algorithm can be used to yield improved broadband performance over gradient descent using a coplanar waveguide crossing as an example. As it is important to ensure that the shapes remain manufacturable with standard photolithographic techniques, we also introduce a shape smoothing parameter that comes with a small cost to final device performance.

## 2. Theory

#### 2.1 Shape gradient

Central to the optimization algorithm presented here is the calculation of a shape gradient along which direction the shape parameters must change to increase a given figure of merit (FOM). Consider the case where an object is illuminated with a continuous wave source in the three-dimensional domain Ω, shown in Fig. 1. The background permittivity of Ω is *ɛ _{1}* and the subdomain,

*Ψ*, has linear isotropic permittivity

*ɛ*, boundary ∂

_{2}*Ψ*, and local outward surface normal coordinate, $\mathop{r_n}\limits^{\rightharpoonup}$, representing a small change in the boundary position along the direction normal to it. The magnetic permeability throughout the domain

*Ω*is µ

_{0}. The FOM is defined across the subdomain

*Φ*. Typically, the FOM is defined as the overlap integral of the field from the forward source with a desired field profile: for instance, a waveguide mode. Two simulations need to be performed to calculate the adjoint shape gradient: a forward simulation with the forward source producing a field

*E*in

_{F}*Ω*and an adjoint simulation yielding a field distribution

*E*.

_{A}If ∂*Ψ* has been parameterized by a finite set of values, *p _{i}*, one can express the adjoint shape gradient as the following expression derived in the Appendix:

*k*and

*l*to account for discrete design frequencies and different FOM components (i.e. output power at different waveguide ports or different modes at the same port), respectively. The weights

*α*allow one to give more emphasis to a particular frequency or FOM component. The field vector

_{kl}*E*and

_{F||kl}*E*are the components of the forward and adjoint fields tangential to ∂

_{A||kl}*Ψ*, respectively. Analogously

*D*and

_{F⊥kl}*D*are the electric flux densities normal to the interface. The unit imaginary number is

_{A⊥kl}*j*. The factor

*φ*accounts for the fact that the adjoint source field profile is defined in terms of the derivative of the figure of merit with respect to the electric field. For waveguide devices with FOM’s equal to power at various ports

_{Fkl}*φ*, where

_{F}= a_{k}**a*is the complex conjugate of the field transmission coefficient into the forward propagating waveguide mode in question using the conjugated overlap integral derived from the Lorentz Reciprocity Theorem [17]. The mode profile used to perform the overlap must be the same, including phase, as the mode profile of the adjoint source, hence the

_{k}**m*superscript on the adjoint fields.

#### 2.2 Shape representation

In the context of planar waveguide circuits, it is convenient to represent ∂*Ψ* as a polygon. Since polygons are piecewise linear continuous shapes one can represent them in terms of locally-supported one-dimensional triangular basis functions [28] where a basis function is defined for each *x* and *y* coordinate over the polygon vertices. In this case the *x* and *y* coordinates themselves are simply the weights for the respective basis functions. The polygon representation of the shape is depicted in Fig. 2.

Figure 2(a) shows the triangular basis functions with the basis function corresponding to vertex 5 in green bold. Figures 2(b) and 2(c) give the x and y coordinate weights for the triangular basis functions, respectively, and yielding the ellipse shown in black with solid black vertices in Fig. 2(d).

The quantity $\partial \mathop{r_n}\limits^{\rightharpoonup} /\partial {p_i}$ in Eq. (1) remains to be evaluated. This can be accomplished by considering that the vector *∂p _{i}* represents a change in the polygon boundary to a new location. We can arbitrarily force

*∂p*to be parallel to $\partial \mathop{r_n}\limits^{\rightharpoonup} $, and of the same magnitude, since movements parallel to ∂

_{i}*Ψ*do not result in a change of shape. According to [16], using the level set representation of a shape, expanding a shape outward leads to a decrease in $\mathop{r_n}\limits^{\rightharpoonup} $ because $\mathop{r_n}\limits^{\rightharpoonup} $ is the gradient of the level set function evaluated on ∂

*Ψ*and this means that sign of $\partial \mathop{r_n}\limits^{\rightharpoonup} /\partial {p_i}$ should be negative.

In addition to $\partial \mathop{r_n}\limits^{\rightharpoonup} /\partial {p_i}$ we also include a weight function localized at each vertex, *W _{i}* to facilitate shape smoothing. We set each

*W*equal to the corresponding triangular basis functions shown in Fig. 2(a), like the representation of the shape itself. In the case of the weight functions, however, we allow their width to span a distance greater than one polygon vertex away from the

_{i}*i*vertex, which we call

^{th}*basis function width*(BFW). These weight functions are shown in Fig. 2(d) with the red-shaded weight function having a BFW of 1 polygon vertex on either side of the peak and the green-shaded weight function having a BFW of 3 vertices. BFW can also be expressed in terms of distance along the shape boundary away from the vertex. The peak value of the weight function is normalized such that the total area under the weight function is constant for all BFW’s.

Equation (1) can be rewritten with the weight function and definition for $\partial \mathop{r_n}\limits^{\rightharpoonup} /\partial {p_i}$:

*x*and

*y*coordinates separately, splitting on the

*p*into

_{i}*x*and

_{i}*y*. The factor

_{i}*n*is the

_{pi}*x*or

*y*unit normal component. By increasing the basis function width of

*W*, the integral in Eq. (2) is evaluated over a larger section of the polygon and thus adjacent polygon vertices tend to move with a greater degree of correlation as compared to the case where BFW = 1. Hence BFW acts like a low-pass smoothing filter. In the case of a two-dimensional planar polygon shape the double integral of Eq. (2) becomes a simple path integral.

_{i}#### 2.3 Optimization algorithms

The gradient of the FOM with respect to the shape parameters, specifically the coordinates of the polygon, is given by Eq. (2). Increasing the FOM using the shape parameters can be done simply by scaling the shape gradient by a *learning rate* (LR) and adding this to the existing shape parameters. Thus, where the gradient is larger the polygon points will shift by a larger amount. This simple procedure is called *gradient descent* (GD) and is a well-known and simple method for reaching an optimum. The challenge with this algorithm is the optimum that is obtained depends strongly on the starting point. The update equation for GD can be expressed as the following for the case of maximizing *F* with respect to $\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over p} $:

*N*x2 matrix of ordered polygon vertex coordinates, where

*N*is the number of polygon vertices, the first column of the matrix is the

*x*-coordinates and the second column is for the

*y*-coordinates.

*l*is the learning rate scalar factor, and ${\nabla _p}F$ is the gradient of the FOM to be maximized with respect to each vertex given by Eq. (1). For our as-implemented gradient descent optimization we have found learning rates in the range of 0.01 to 0.1 to give acceptable convergence rates.

Equation (3) represents the lowest order Taylor’s series approximation for *F* as a function of $\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over p} $ which allows linear convergence of *F* with successive recursive updates to $\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over p} $ followed by recalculation of ${\nabla _p}F$ at the new $\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over p} $. It is possible to take more terms from the Taylor’s expansion to get an update that follows a curved path which can give a more direct and quicker (in terms of number of iterations) optimization. In general, one can expand the FOM as a function of small deviations of ${\Delta _{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over p} }}$ with a Taylor series [29]:

*H*is the Hessian matrix which contains quadratic curvature information: If we want to include the next higher order terms we then can use the following procedure to derive the update equation. First, we ignore the cubic and higher order terms designated by

*O(n*. We then differentiate Eq. (4) with respect to the ${\Delta _{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over p} }}$ and set it to zero, enforcing the constraint that ${\Delta _{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over p} }}$ is a stationary point:

^{3})*H = H*) as an outer product of ${\nabla _p}F$ with itself, known as the Fisher matrix, with a summation of a scaled identity matrix to ensure invertibility [30]:

_{a}*I*denotes the identity matrix and the scaling parameter

*λ*≥ 0 is also known as the damping parameter and is used to tune the condition number of

*H*. We have found that values over a wide range between 10

_{a}^{−12}and 10

^{−4}to work well for the cases explored here. In addition, we still use a learning rate

*l*with values ranging between roughly 10

^{−12}to 10

^{−11} for LM.

## 3. Simulation

We use Lumerical FDTD Solutions with the Matlab interface to calculate the broadband fields and perform the calculation of ${\Delta _{\mathord{\buildrel{\lower3pt\hbox{$\scriptscriptstyle\rightharpoonup$}} \over p} }}$ and update the shape. The shape optimization script a performs the following steps. (1) In Lumerical, input parameters are parsed, and the simulation is initialized. (2) The forward simulation for the current iteration is performed in Lumerical. (3) If the device does not possess forward-backward symmetry the forward source and monitors are turned off and an adjoint simulation is performed. (4) The frequency domain field for the forward and adjoint simulations are sent to Matlab along with necessary input parameters. (5) Matlab interpolates the fields at the polygon vertices (using magnetic fields to ensure continuity near the boundary), calculates the shape gradient, updates the shape, removes self-intersections and re-interpolates the polygon to ensure that the polygon vertices are evenly spaced and to maintain the same spatial (linear) polygon vertex density. (6) Matlab returns the final shape to Lumerical which updates the vertices of the polygon and saves the latest performance parameters and polygon vertices to a file. (7) Continue at step 2 until target number of iterations or performance is reached.

As an example of a multiport device we demonstrate the optimization of a 90-degree co-planar waveguide crossing on a silicon-on-insulator (SOI) platform. It has two primary figures of merit to be extremized: the power straight through to be maximized and the cross power (here defined as total power into *both* cross waveguides) to be minimized. Other performance parameters of interest are the power reflected at the input port and the radiated power defined as the source power minus the sum of through, cross and reflected powers. The device is single-mode for each polarization at the input. Here we select the fundamental quasi-TE mode for optimization. The waveguide crossing structure is shown in Fig. 3. This structure has 8-fold symmetry and so an explicit adjoint simulation does not need to be performed as the adjoint fields are derived from the forward fields via in-plane rotation and applying the appropriate field symmetries.

Gradient-based optimization algorithms are sensitive to the initial starting conditions. To ensure good performance we initialize the crossing to consist of a single mode waveguide feeding into a short mode expansion taper (transitioning linearly from 400 nm to 1 µm in width) followed by a wider multimode waveguide to allow multimode interference to focus the guided power across the center unbounded section.

Here we use a 20 nm FDTD grid and optimize over 3 wavelengths at 1520, 1550 and 1580 nm with an in-plane polarized electric field (quasi-TE polarization). The input waveguides are 400 nm wide and the planar structure is either 230 or 250 nm thick. At each wavelength we set the FOM weights to be 1 for the through power and something in the range of −1.5 to 0 for the cross power which can be expressed as a vector, i.e. (1, −1.5) for the through and cross power. The simulation is run in 2D at with the effective index method used to model the core permittivity until the through power converges after which 3D iterations are performed. The polygon points are spaced nominally 5 nm apart.

## 4. Simulation results and discussion

#### 4.1 Shape smoothing by increasing basis function width

We begin by showing how increasing the basis function width can be used to improve the manufacturability of the optimized shapes albeit at the cost of some performance. Here BFW’s of 100, 200 and 300 nm are used with LM optimization and with *λ* = 1e-6 and *l* = 1e-12. The optimization is done in 2D and the waveguide thickness used is 250 nm. The FOM weights are 1 and −1.5 for the through and cross powers for the fundamental quasi-TE mode, respectively. Figure 4 shows the first-quadrant polygon shape profiles for the three cases, offset for clarity. Qualitatively speaking, there is a large increase in shape smoothness going from 100 to 200 nm BFW. This smoothing comes at the cost of performance, however.

Table 1 shows normalized performance parameters for the different BFW values, aggregated over the three design wavelengths 1520, 1550 and 1580 nm. When increasing the BFW from 100 to 300 nm the through power decreases from −0.02 to −0.07 dB and the radiated power increases from −24 dB to −18 dB. Both the cross power extinction ratio (P_{through}/P_{cross}) and radiated power have no clear trend with BFW’s up to 300 nm. The performance parameters in Table 1 compare well with previous works considering the design of waveguide crossings [8–9,32–42].

#### 4.2 Gradient descent and Levenberg-Marquardt convergence

After the initial large increase in performance the iterative optimization algorithms will slow down and eventually reach an asymptote where the optimum is achieved. Here we compare the convergence rates and optima for the GD and LM shape updates. The learning rate for GD was 0.02 and for LM the learning rate was 1e-12 and the *λ* parameter was 1e-10. The BFW was 250** **nm. As before, the design wavelengths were set to 1520, 1550 and 1580** **nm. A waveguide thickness of 230** **nm, corresponding to a structure undergoing oxidation smoothing after the patterning of the Si is used. The BFW is set to 250** **nm. The FOM weight vector is (1, -1.5) with the first component being the weight for the through power and the second component is the weight for the cross power (negative so that we can minimize it). The iterations are performed in 2D for >600 iterations until the asymptote is reached after which the simulations are performed in 3D. The normalized power results are shown in Fig. 5. As Fig. 5(a) shows LM has a higher through power asymptote of -0.08** **dB versus -0.13** **dB for the GD-optimized waveguide crossing. Similarly, Fig. 5(c) shows that the radiated power for LM is 2** **dB lower for than the GD-optimized crossing. The extinction ratio in Fig. 5(b) and the reflected power shown in Fig. 5(d) show that GD is improved over LM by 3 and 5** **dB, respectively. In general, for both the GD and LM methods, waveguide crossings with close to their maximum performance can be achieved in just over 200 2D iterations followed by roughly 100 3D iterations.

#### 4.3 Improved Device Bandwidth with Levenberg-Marquardt

We now examine the bandwidth performance of the GD and LM-optimized devices where we vary the FOM weight vector from (1, 0), (1, -1), and (1, -1.5). The simulation setup is the same as for the devices in Subsection 4.2. As before, > 600 2D iterations were run after which the simulations were performed in 3D for > 100 iterations. Here we systematically select the final 3D-optimized design by finding the maximum of an augmented FOM defined by a linear combination of the through power, and the extinction ratio normalized to its maximum value. This augmented FOM thus represents a scaled comparison of through power and extinction ratio. The weights of the linear combination are the respective absolute values of the FOM weight vector. The results are shown in Fig. 6. In Fig. 6(a) one can see that for the FOM weight combinations that LM-optimized crossings have a wider bandwidth (by roughly 10 nm on either side) and in addition for the combination of (1, -1.5) the peak through power for LM does not decrease as much as for GD. In Fig. 6(b) one can see that the extinction ratio increases dramatically when moving from FOM weights of (1, 0) to (1, -1) which is a validation that the compound FOM optimization is worthwhile. Here the bandwidth for LM and GD methods is maximized at FOM weights of (1, -1). The reflected power shown in Fig. 6(c) does not appear to be affected much by the optimization method, with the reflected power being mostly below -30 dB. The radiated power, shown in Fig. 6(d), is more broadband for LM as well. Figs. 6(a)–6(b) for the FOM weight of (1, -1) show that it is possible to have a crossing with > 60 dB crosstalk extinction ratio while maintaining > -0.08 dB through power from 1500 to 1600 nm free space wavelength.

## 5. Fabrication and testing

A waveguide crossing was LM optimized with 250 nm thick Si, FOM weights of (1, -1.5), learning rate of 1e-12 and a *λ* parameter of 1e-6. The BFW was set to 500 nm. The waveguide crossing devices were fabricated using Sandia’s CMOS-compatible silicon photonics process with 248 nm DUV photolithography [31]. Figure 7 shows a top-down scanning electron micrograph image of the resist profile of an optimized device with a total width of 10 µm.

The through and cross power of the optimized crossings were characterized experimentally. Power was coupled into the test structures at the edge of the chip using a tunable laser with lensed fibers in the wavelength range 1500-1635 nm. Device layout for the power spectrum characterization is shown in Fig. 8. The through power structure shown in Fig. 8(a) consists of different waveguide crossing arrays with varying numbers of crossings used to obtain the insertion loss per crossing. Figure 8(b) shows the cross power test layout. The output waveguides are jogged relative to the input waveguide position by 2 mm to enable high extinction ratios to be measured.

## 6. Measured vs. simulated results

The mask pattern of the polygon for the shape shown in Fig. 7 was biased by 35 nm on all sides by expanding the polygon 2.5 nm in 14 increments along the directions of the vertex normal vectors to account for feature shrinkage during photolithography and waveguide dry etch. These devices underwent oxidation smoothing [43], however, which is known to reduce the thickness of the Si waveguiding layer by 15-20 nm from 250 down to ∼230 nm thick. Simulated through and cross power spectra were obtained in 3D for a 250 nm thick waveguide while biasing the polygon shape profile, using the same routine as for the mask design, from -50 nm to 50 nm on each side. The measured and simulated spectra are shown in Fig. 9. The measured through power shown in Fig. 9(a) matches up well with the simulated spectrum with 0 nm shape bias. The measured cross power extinction ratio in Fig. 9(b) reaches 50 dB at 1600 nm and is a bit lower than the simulated 0 nm bias peak. This is consistent with the cross power extinction ratio being much more sensitive to manufacturing variations than the through power and is most likely caused by the unintended oxidation smoothing that this sample received which reduced the Si waveguide thickness from 250 nm to about 230 nm. This is in line with our simulations for the 250 nm design with a 230 nm Si waveguide thickness.

## 7. Conclusion

In this work we have presented an adjoint shape algorithm capable of a gradient descent or a Levenberg-Marquardt update. For the cases studied here, the Levenberg-Marquardt update produces waveguide crossings with broader band responses (in terms of through power, cross power extinction ratio and radiated power) with a larger through power.

The basis function width was shown to improve shape manufacturability by eliminating narrow features. Basis function widths of 200–300 nm are sufficient to remove most of the small features in shape optimized SiO_{2}-buried SOI waveguiding structures, with a small cost to device performance.

While the measured through power spectrum of a 230 nm thick fabricated device with + 35 nm pattern bias on all sides matched well the simulated device spectrum of 250 nm thick with 0 nm bias, the measured cross power extinction ratio spectrum was 20 dB lower at its peak than the simulated device and redshifted by 50 nm. This is caused by the cross power extinction ratio being more sensitive to manufacturing deviations and the waveguide thickness being 20 nm smaller than designed due to oxidation smoothing. Future designs could also average the compound figure of merit over different waveguide thicknesses to ensure the sensitivity due to thickness is decreased.

We have produced an adjoint shaped optimized design with a Levenberg-Marquardt update and FOM weights of (1, -1) for the through and cross power with > 60 dB extinction ratio and > -0.08 dB through power over the 1500 to 1600 nm wavelength range.

## Appendix

The treatment begins with the definition in Eq. (9) of the FOM, *F*, as an integral of a scalar function, *f*, over the domain Ω. Both *F* and *f* are real-valued functions of the forward complex electric field, *E _{F}*. Let ∂

*Ψ*be represented by a finite set of parameters,

*p*. Then the gradient of the FOM with respect to the

_{i}*p*is given in Eq. (10), applying the chain rule.

_{i}*p*:

_{i}*∂E*:

_{F}/∂p_{i}*∂J*is zero:

_{F}/∂p_{i}*∂f/∂E*. The superscript

_{F}*T*denotes the transpose operation.

*adjoint*(transposed) system that represents a simulation run backwards with the desired field profile as the source with amplitude

*∂f/∂E*. As mentioned previously typically the FOM is defined in terms of an overlap integral with a desired mode field profile. The amplitude of the adjoint source thus has a complex phase coefficient associated with it here called

_{F}*φ*. This definition of the phase factor requires us to generate

_{F}*E*

_{A}^{m}using the same mode profile used to calculate the FOM at this iteration.

*µ*=

*µ*.

_{0}*∂Ψ*changes in the direction of the surface normal. An expansion or contraction at a particular point along

*∂Ψ*corresponds to a change in the permittivity at that point and thus we can apply the chain rule. In addition, the integrand is nonzero only along

*∂Ψ*. Using Eq. (22) in Eq. (20) yields the following:

*∂Ψ*one can split

*dv*into $d\mathop{r_n}\limits^{\rightharpoonup} dA$, where $\mathop{r_n}\limits^{\rightharpoonup} $ is the local surface normal coordinate:

*p*values along which

_{i}*F*increases maximally. For each iteration of an optimization algorithm one can scale the gradient vector and deform the shape according to the definition of the

*p*shape parameters. At each iteration one simply needs to run a forward and then an adjoint, backward, simulation to get the necessary information. In the case where the shape is two-dimensional, for instance for planar waveguide circuits where the effective index method has been applied, the surface integral of Eq. (25) becomes a path integral around the shape.

_{i}Finally, if one desires to have a compound FOM across multiple frequencies or output profiles one can simply average over them in a weighted fashion. This is possible because of the linearity of the shape gradient operator:

*k*sums over the wavelength and the index

*l*sums over the components of the figure of merit. Note that in Eq. (27) we have lumped the factor of 2 and the frequencies

*ω*into the scalar weight parameters

_{k}*α*. Note that Eq. (27) describes the gradient of the FOM and gives the direction along which the FOM increases maximally. If our FOM was designed to decrease we would want to move in the opposite direction, corresponding to negative

_{kl}*α*values.

_{kl}## Acknowledgments

The authors would like to thank Jonathan Cox and Jonathan Sterk for fruitful discussions on this topic.

## Disclosures

The authors declare that there are no conflicts of interest related to this article.

Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

## References

**1. **S. Molesky, Z. Lin, A. Piggott, W. Jin, J. Vuckovic, and A. Rodriguez, “Inverse design in nanophotonics,” Nat. Photonics **12**(11), 659–670 (2018). [CrossRef]

**2. **J. Jiang, J. Cai, G. P. Nordin, and L. Li, “Parallel microgenetic algorithm design for photonic crystal and waveguide structures,” Opt. Lett. **28**(23), 2381–2383 (2003). [CrossRef]

**3. **E. Kerrinckx, L. Bigot, M. Douay, and Y. Quiquempois, “Photonic crystal fiber design by means of a genetic algorithm,” Opt. Express **12**(9), 1990–1995 (2004). [CrossRef]

**4. **L. Sanchis, A. Håkansson, D. López-Zanón, J. Bravo-Abad, and J. Sánchez-Dehesa, “Integrated optical devices design by genetic algorithm,” Appl. Phys. Lett. **84**(22), 4460–4462 (2004). [CrossRef]

**5. **B. West and S. Honkanen, “MMI devices with weak guiding designed in three dimensions using a genetic algorithm,” Opt. Express **12**(12), 2716–2722 (2004). [CrossRef]

**6. **G. Roelkens, D. Van Thourhout, and R. Baets, “High efficiency silicon-on-insulator grating coupler based on a poly-Silicon overlay,” Opt. Express **14**(24), 11622–11630 (2006). [CrossRef]

**7. **J. Robinson and Y. Rahmat-Samii, “Particle swarm optimization in electromagnetics,” IEEE Trans. Antennas Propag. **52**(2), 397–407 (2004). [CrossRef]

**8. **Y. Ma, Y. Zhang, S. Yang, A. Novack, R. Ding, A. Lim, G.-Q. Lo, T. Baehr-Jones, and M. Hochberg, “Ultralow loss single layer submicron silicon waveguide crossing for SOI optical interconnect,” Opt. Express **21**(24), 29374–29382 (2013). [CrossRef]

**9. **Y. Ma, Y. Liu, R. Ding, T. Baehr-Jones, P. Magill, H. Guan, A. Gazman, Q. Li, K. Bergman, and M. Hochberg, “Optimized silicon photonics components for high-performance interconnect systems,” in IEEE Photonics Conference (IPC), Reston, VA, pp. 353–354 (2015).

**10. **J. Lu and J. Vuckovic, “Inverse design of nanophotonic structures using complementary convex optimization,” Opt. Express **18**(4), 3793–3804 (2010). [CrossRef]

**11. **J. Lu and J. Vuckovic, “Objective-first design of high-efficiency, small-footprint couplers between arbitrary nanophotonic waveguide modes,” Opt. Express **20**(7), 7221–7236 (2012). [CrossRef]

**12. **J. Lu and J. Vuckovic, “Nanophotonic computational design,” Opt. Express **21**(11), 13351–13367 (2013). [CrossRef]

**13. **A. Piggott, J. Lu, T. Babinec, K. Lagoudakis, J. Petykiewicz, and J. Vuckovic, “Inverse design and implementation of a wavelength demultiplexing grating coupler,” Sci. Rep. **4**(1), 7210 (2015). [CrossRef]

**14. **A. Piggott, J. Lu, K. Lagoudakis, J. Petykiewicz, T. Babinec, and J. Vuckovic, “Inverse design and demonstration of a compact and broadband on-chip wavelength demultiplexer,” Nat. Photonics **9**(6), 374–377 (2015). [CrossRef]

**15. **P.-I. Schneider, X. Santiago, V. Soltwisch, M. Hammerschmidt, S. Burger, and C. Rockstuhl, “Benchmarking five global optimization approaches for nano-optical shape optimization and parameter reconstruction,” https://arxiv.org/abs/1809.06674.

**16. **O. Miller, PhD thesis (2012), University of California at Berkeley, http://optoelectronics.eecs.berkeley.edu/ThesisOwenMiller.pdf

**17. **C. Lalau-Keraly, S. Bhargava, O. Miller, and E. Yablonovitch, “Adjoint shape optimization applied to electromagnetic design,” Opt. Express **21**(18), 21693–21701 (2013). [CrossRef]

**18. **A. Niederberger, D. Fattal, N. Gauger, S. Fan, and R. Beausoleil, “Sensitivity analysis and optimization of sub-wavelength optical gratings using adjoints,” Opt. Express **22**(11), 12971–12981 (2014). [CrossRef]

**19. **S. Bhargava and E. Yablonovitch, “Multi-objective inverse design of sub-wavelength optical focusing structures for heat assisted magnetic recording,” Proc. SPIE 9201, Optical Data Storage 2014, 92010M (5 September 2014).

**20. **A. Piggott, J. Petykiewicz, L. Su, and J. Vuckovic, “Fabrication-constrained nanophotonic inverse design,” Sci. Rep. **7**(1), 1786 (2017). [CrossRef]

**21. **A. Michaels and E. Yablonovitch, “Inverse design of near unity efficiency perfectly vertical grating couplers,” Opt. Express **26**(4), 4766–4779 (2018). [CrossRef]

**22. **A. Michaels and E. Yablonovitch, “Leveraging continuous material averaging for inverse electromagnetic design,” Opt. Express **26**(24), 31717–31737 (2018). [CrossRef]

**23. **N. Sapra, D. Vercruysse, L. Su, K. Yang, J. Skarda, A. Piggott, and J. Vuckovic, “Inverse design and demonstration of broadband grating couplers,” IEEE J. Sel. Top. Quantum Electron. **25**(3), 1–7 (2019). [CrossRef]

**24. **L. Frellsen, Y. Ding, O. Sigmund, and L. Frandsen, “Topology optimized mode multiplexing in silicon-on-insulator photonic wire waveguides,” Opt. Express **24**(15), 16866–16873 (2016). [CrossRef]

**25. **J. Jensen and O. Sigmund, “Topology optimization for nano-photonics,” Laser Photonics Rev. **5**(2), 308–321 (2011). [CrossRef]

**26. **C. Lalau-Keraly, “Continuous optimization wrapper for Lumerical” (GitHub, 2019). https://github.com/chriskeraly/lumopt.

**27. **M. Giles and N. Pierce, “An introduction to the adjoint approach to design,” Flow, Turbul. Combust. **65**(3/4), 393–415 (2000). [CrossRef]

**28. **K. Okamoto, * Fundamentals of Optical Waveguides*, 2

^{nd}Ed., (Academic Press, 2005), Chap. 6.

**29. **J. Martens, “Deep learning via Hessian-free optimization,” in Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, pp. 735–742 (2010). http://www.cs.toronto.edu/∼jmartens/docs/Deep_HessianFree.pdf

**30. **H. Yu and B. Wilamowski, * Industrial Electronics Handbook, vol. 5 – Intelligent Systems*, 2nd Edition (CRC Press, 2011), Chap. 12.

**31. **A. Lentine, C. DeRose, P. Davids, N. Martinez, W. Zortman, J. Cox, A. Jones, D. Trotter, A. Pomerene, A. Starbuck, D. Savignon, T. Bauer, M. Wiwi, and P. Chu, “Silicon photonics platform for national security applications,” in IEEE Aerospace Conference, Big Sky, MT, (2015). [CrossRef] .

**32. **S. Johnson, C. Manolatou, S. Fan, P. Villeneuve, J. Joannopoulos, and H. Haus, “Elimination of cross talk in waveguide intersections,” Opt. Lett. **23**(23), 1855–1857 (1998). [CrossRef]

**33. **H. Liu, H. Tam, P. Wai, and E. Pun, “Low-loss waveguide crossing using a multimode interference structure,” Opt. Commun. **241**(1-3), 99–104 (2004). [CrossRef]

**34. **M. Popovic, E. Ippen, and F. Kärtner, “Low-loss Bloch waves in open structures and highly compact, efficient Si waveguide-crossing arrays,” in *IEEE LEOS Ann. Mtg.*, Lake Buena Vista, FL, pp. 56–57 (2007).

**35. **W. Bogaerts, P. Dumon, D. Van Thourhout, and R. Baets, “Low-loss, low-crosstalk crossings for silicon-on-insulator nanophotonic waveguides,” Opt. Lett. **32**(19), 2801 (2007). [CrossRef]

**36. **P. Bock, P. Cheben, J. Schmid, J. Lapointe, A. Delage, D.-X. Xu, S. Janz, A. Densmore, and T. Hall, “Subwavelength grating crossings for silicon wire waveguides,” Opt. Express **18**(15), 16146–16155 (2010). [CrossRef]

**37. **C.-H. Chen and C.-H. Chiu, “Taper-integrated multimode-interference based waveguide crossing design,” IEEE J. Quantum Electron. **46**(11), 1656–1661 (2010). [CrossRef]

**38. **A. Jones, C. DeRose, A. Lentine, D. Trotter, A. Starbuck, and R. Norwood, “Ultra-low crosstalk, CMOS compatible waveguide crossings for densely integrated photonic interconnection networks,” Opt. Express **21**(10), 12002–12013 (2013). [CrossRef]

**39. **H.-L. Han, H. Li, X.-P. Zhang, A. Liu, T.-Y. Lin, Z. Chen, H.-B. Lu, M.-H. Lu, X.-P. Liu, and Y.-F. Chen, “High performance ultra-compact SOI waveguide crossing,” Opt. Express **26**(20), 25602–25610 (2018). [CrossRef]

**40. **Z. Yu, A. Feng, X. Xi, and X. Sun, “Inverse-designed low-loss and wideband polarization-insensitive silicon waveguide crossing,” Opt. Lett. **44**(1), 77–79 (2019). [CrossRef]

**41. **Y. Liu, J. Shainline, X. Zeng, and M. Popovic, “Ultra-low-loss CMOS-compatible waveguide crossing arrays based on multimode Bloch waves and imaginary coupling,” Opt. Lett. **39**(2), 335–338 (2014). [CrossRef]

**42. **Y. Zhang, A. Hosseini, X. Xu, D. Kwong, and R. Chen, “Ultralow-loss silicon waveguide crossing using Bloch modes in index-engineered cascaded multimode-interference couplers,” Opt. Lett. **38**(18), 3608–3610 (2013). [CrossRef]

**43. **K. Lee, D. Lim, L. Kimerling, J. Shin, and F. Cerrina, “Fabrication of ultralow-loss Si/SiO_{2} waveguides by roughness reduction,” Opt. Lett. **26**(23), 1888–1890 (2001). [CrossRef]

**44. **K. Warnick, R. Selfridge, and D. Arnold, “Teaching electromagnetic field theory using differential forms,” IEEE Trans. Educ. **40**(1), 53–68 (1997). [CrossRef]

**45. **S. Johnson, M. Ibanescu, M. Skorobogaitiy, O. Weisberg, J. Joannopoulos, and Y. Fink, “Perturbation theory for Maxwell’s equations with shifting material boundaries,” Phys. Rev. E **65**(6), 066611 (2002). [CrossRef]