Over the past decades, photonics has transformed many areas in both fundamental research and practical applications. In particular, we can manipulate light in a desired and prescribed manner by rationally designed subwavelength structures. However, constructing complex photonic structures and devices is still a time-consuming process, even for experienced researchers. As a subset of artificial intelligence, artificial neural networks serve as one potential solution to bypass the complicated design process, enabling us to directly predict the optical responses of photonic structures or perform the inverse design with high efficiency and accuracy. In this review, we will introduce several commonly used neural networks and highlight their applications in the design process of various optical structures and devices, particularly those in recent experimental works. We will also comment on the future directions to inspire researchers from different disciplines to collectively advance this emerging research field.
© 2021 Chinese Laser Press
Novel optical devices consisting of elaborately designed structures have become an extremely dynamic and fruitful research area because of their capability of manipulating light flow down to the nanoscale. Thanks to advanced numerical simulation, fabrication, and characterization techniques, people are able to design, fabricate, and demonstrate dielectric and metallic micro- and nano-structures with sophisticated geometries and arrangements. For instance, metamaterials and metasurfaces comprising subwavelength structures, called meta-atoms, can show extraordinary properties beyond those of natural materials. Many metadevices have been reported that offer enormous opportunities for technology breakthroughs in a wide range of applications, including light steering [2–5], holography [6–9], imaging [10–14], sensing [15–17], and polarization control [18–21].
At present, we can handle most photonic design problems by accurately solving Maxwell’s equations with numerical algorithms such as the finite element method (FEM) and the finite-difference time-domain (FDTD) method. However, those methods often require plenty of time and computational resources, especially when it comes to the inverse design problem, which aims to retrieve the optimal structure from target optical responses and functionalities. In the conventional procedure, we normally start with full-wave simulations of an initial design based on empirical knowledge and then adjust the geometric/material parameters iteratively to approach the customer-specific requirements. Such a trial-and-error process is time-consuming, even for the most experienced researchers. The initial design strongly relies on our experience and cognition, and usually some basic structures are chosen, including split-ring resonators [22,23], helix, cross, bowtie, L-shape, and H-shape [27,28] structures. Although it is known that a specific type of structure can produce a certain optical response (e.g., strong magnetic resonance from split-ring resonators and chiroptical response from helical structures), sometimes the well-established knowledge may limit our aspiration to seek an entirely new design that is suitable for the same applications or even more complicated ones when the traditional approach is not applicable.
Artificial neural networks (ANNs) provide a new and powerful approach for photonic designs [29–37]. ANNs can build an implicit relationship between the input (i.e., geometric/material parameters) and the output (i.e., optical responses), mimicking the nonlinear nerve conduction process in the human body. With the help of well-trained ANNs, we can bypass the complicated and time-consuming design process that heavily relies on numerical simulations and optimization. The functions of most ANN models for photonic designs are twofold: forward prediction and inverse design. The forward prediction network is used to determine the optical responses from the geometric/material parameters, and it can serve as a substitute for full-wave simulations. The inverse design network aims to efficiently retrieve the optimal structure from given optical responses, which is usually more important and challenging in the design process. One main advantage of the ANN models is their speed. For example, producing the spectrum of a meta-atom from a well-trained forward prediction model only takes a few milliseconds, orders of magnitude faster than typical full-wave simulations based on FEM or FDTD [38–40]. In the meantime, the accuracy of the ANN models is comparable with rigorous simulations; for instance, the mean squared loss of spectrum prediction is typically very small [40,41]. Moreover, ANNs can unlock the nonintuitive and nonunique relationship between the physical structure and the optical response, and hence potentially enlighten researchers with an entirely new class of structures.
Solving the photonic design problem by ANNs is a data-driven approach, which means a large number of training samples with both geometric/material parameters and optical responses are needed. Once the ANN model works well on the training data set, it can be tested on a test set or a real problem. The test and training data sets should be in the same design framework but contain completely different data. The general workflow for a forward prediction network includes four steps. First, a large number of input structures and output optical responses are generated from either simulations or experiments. In most of the published works, the amount of data ranges from thousands to hundreds of thousands of samples. It is noted that the performance of the neural networks depends on both the size and the quality of the data. To improve the quality of the training data, some researchers have applied rule-based optimization methods in the generation of the initial training data or have attempted to progressively expand the training data set with new data generated by the trained model. Then we design the ANNs with a certain network structure, such as fully connected layers (FCLs)-based neural networks or convolutional neural networks (CNNs). Next, the training data set is fed into the network, and we optimize the weight and bias of each node. Finally, the well-trained ANNs can be used to predict the responses of other input structures that are outside the training and test data sets. As for the inverse design problem, one can simply reverse the input and output and use a similar network structure. However, for some problems, more complex methods and algorithms are required.
This review is devoted to the topic of designing photonic structures and devices with ANNs. We will focus on very recent works on this topic, especially the experimental demonstrations, after introducing the widely used ANNs. The remaining part of the review is organized as follows. In Section 2, we will discuss the basic FCLs and their application in the prediction of design parameters. Then, in Section 3, we will focus on the CNNs that are used to retrieve much more complicated structures described by pixelated images. In Section 4, we will discuss other useful and efficient hybrid algorithms that combine deep learning with conventional optimization methods for photonic design. In the last section, we will conclude the review by discussing the achievements, current challenges, and future outlook.
2. PHOTONIC DESIGN BY FULLY CONNECTED NEURAL NETWORK
A. Introduction of FCLs
In the nervous system, electric signals and information are transmitted by neurons. Figure 1(a) is the schematic illustration of a neuron, in which the main components include the dendrite, cell body, and axon. The dendrites receive and integrate signals from other neurons. Once the signal is strong enough, the cell is activated and then passes the signal to the next neuron through the axon. In analogy to biological neurons, spiking neural networks (SNNs) were introduced decades ago [44,45]. In SNNs, not all neurons are activated at each propagation loop. Only when the action potential, which mimics the membrane potential, reaches a certain value will the neurons transmit information to the next neuron. The FCLs, also called dense layers, are mathematically simplified structures in comparison with the SNNs, in which every neuron is connected to all neurons in the adjacent layers, as shown in Fig. 1(b). The FCLs-based network consists of an input layer, hidden layers, and an output layer. Each layer receives the input from the preceding layer and combines the signals with a tensor operation that takes the weights and biases as learnable parameters. This is a purely linear process, and the combination of the linear calculations of all hidden layers is still linear, which is not able to build complex relationships between the input and output. In this context, a nonlinear “activation” process is essential for the neurons. Therefore, the activation function should be a nonlinear function and properly selected so that the ANNs can approximate arbitrary functions. The Sigmoid, tanh, and ReLU functions plotted in Fig. 1(c) are three commonly used activation functions. The Sigmoid function maps the input into the (0,1) range, while tanh maps it into (−1,1). When the input is extremely small (or extremely large), the gradients of these two functions vanish, which is not favorable for training ANNs. The ReLU function is widely used to overcome this issue in the positive range, and it is less computationally expensive than the Sigmoid and tanh functions.
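As a minimal illustration of these remarks, the three activation functions and their derivatives can be written out directly; evaluating them at a large input shows numerically why the saturating functions suffer from vanishing gradients while ReLU does not:

```python
import math

# Three common activation functions and their derivatives.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

# At a large input, the saturating functions barely respond, while ReLU
# passes the gradient through unchanged in the positive range.
print(sigmoid_grad(10.0))  # ~4.5e-5 (vanishing)
print(tanh_grad(10.0))     # ~8.2e-9 (vanishing)
print(relu_grad(10.0))     # 1.0
```

This is also why deep networks trained with Sigmoid or tanh hidden layers can learn very slowly: the small gradients are multiplied layer by layer during backpropagation.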
The training process of the fully connected neural network is quite straightforward. The training set contains input vectors x and output vectors y (y can be a vector of complex/real values for regression problems or a vector of discrete integers as labels for classification problems). The performance of the model is highly dependent on the quantity and quality of the training data set. During the training process, the network first takes the vector x as input and calculates the output through the tensor operations and activations from left to right. Then a loss function (or cost function) L, which quantifies the performance of the neural network, is defined and minimized. For instance, we can use the mean squared error (MSE) for regression problems and the cross-entropy loss for classification problems. The next step, the backpropagation of error, is the most critical part of ANNs. In the ANN, there is a series of learnable parameters to be optimized, i.e., the weights and biases of each layer. We can then derive the partial derivative of the loss L with respect to each parameter. To calculate those values, we need to apply the chain rule layer by layer from the end of the ANN to the front, which is why the process is called “backpropagation.” Finally, all the parameters are optimized by the stochastic gradient descent method:

w ← w − η ∂L/∂w,   b ← b − η ∂L/∂b,

where η is the learning rate.
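The whole loop (forward pass, MSE loss, gradient, SGD update) can be sketched with a single linear neuron on a toy regression task; the data, learning rate, and epoch count below are arbitrary illustrative choices:

```python
import random

random.seed(0)

# Toy regression: fit y = 2x + 1 with one linear neuron, MSE loss,
# and the SGD update  w <- w - lr * dL/dw.
w, b = 0.0, 0.0
lr = 0.1
data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]

for epoch in range(500):
    random.shuffle(data)          # stochastic: visit samples in random order
    for x, y in data:
        pred = w * x + b          # forward pass
        err = pred - y            # dL/dpred for L = 0.5 * (pred - y)^2
        w -= lr * err * x         # backpropagated gradient dL/dw
        b -= lr * err             # dL/db

print(round(w, 3), round(b, 3))   # converges to w ~ 2, b ~ 1
```

For a multilayer network the only extra ingredient is the chain rule applied through each layer's weights and activation functions, exactly as described above.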
B. Design Parameterized Structure by FCLs-Based ANNs
FCLs have been extensively adopted to design optical devices, especially in the field of metasurface and nanostructure design. In early 2018, D. Liu et al. introduced, for the first time, a tandem network architecture for the inverse design problem. There is one fundamental challenge in training ANNs for inverse design, arising from the fact that very similar optical responses may be achieved by different structures. Such nonunique one-to-many mapping makes the neural network hard to converge if conflicting instances with almost the same optical responses but different geometric labels exist in the training data set. Mathematically, the gradient of the function to be approximated by the ANNs is extremely large at such data points. To tackle this challenge, the authors proposed a network structure consisting of a pretrained forward model and inverse-design FCLs, which is illustrated in the top panel of Fig. 2(a). The network structure avoids directly comparing the retrieved geometric parameters with the ground-truth labels. Instead, it compares the predicted spectra of the retrieved structures with the target spectra. Therefore, the prediction of the network will converge to one structure that satisfies the required spectra, solving the one-to-many problem in the inverse design. The authors used the tandem neural network to design dielectric multilayers composed of two alternating dielectric materials. The results are plotted in the bottom panel of Fig. 2(a), in which the transmission spectra of the retrieved structure (green dashed line) match well with the desired Gaussian-shaped spectra (blue solid lines).
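The key idea, training the inverse model on the response error rather than the parameter error, can be sketched with a deliberately simple stand-in for the pretrained forward network. Here f(p) = p² is a hypothetical forward model for which p and −p produce identical responses (a one-to-many mapping), and the "inverse network" is reduced to a single trainable parameter:

```python
# Stand-in for a pretrained forward network: response f(p) = p * p,
# so p and -p give the same response (one-to-many mapping).
def forward(p):
    return p * p

target_response = 4.0    # achievable by p = 2 or p = -2

p = 0.5                  # output of the toy "inverse model"
lr = 0.01
for step in range(2000):
    resp = forward(p)
    # loss = (resp - target)^2; gradient via the chain rule through
    # the (frozen) forward model, as in the tandem architecture
    grad = 2.0 * (resp - target_response) * 2.0 * p
    p -= lr * grad

print(p, forward(p))     # one valid design whose response hits the target
```

Had we instead penalized the distance to a fixed "correct" parameter label, conflicting labels (+2 and −2) would pull the model toward their useless average; comparing responses lets it settle on either valid solution.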
Subsequent works have further confirmed the good performance of the tandem network architecture. For instance, S. So et al. used a similar ANN structure to design core-shell structures (with three layers) that support strong electric and magnetic dipole resonances. The ANN was built to learn the correlation between the extinction spectra and the core-shell nanoparticle designs, including the material information and shell thicknesses. In Fig. 2(b), the predicted (open circles) extinction cross sections of the electric dipole (red) and magnetic dipole (black) of core-shell nanoparticles are compared with the target responses (solid lines). It is clear that both the electric dipole and magnetic dipole spectra of the designed core-shell nanoparticles fit well with the expectations. J. Peurifoy et al. also studied the inverse design with ANNs for multilayered particles (up to eight layers), with a focus on the scattering spectra. The FCLs were used in both the forward prediction of scattering cross-section spectra and the inverse design from the spectra. Using a model trained with 50,000 training samples, they achieved a mean relative error of around 1%. One example is shown in the top panel of Fig. 2(c), in which the result from the neural network is compared with numerical nonlinear optimization as well as the desired spectra. The comparison demonstrates that the neural network model performs better in this design problem. Moreover, the running time of the ANNs-aided inverse design is shortened by more than 100 times in comparison with full-wave simulation, as demonstrated in the bottom panel of Fig. 2(c). This result clearly shows the advantage of ANNs in terms of efficiency.
Besides the tandem network, other approaches have been introduced to improve the performance of the FCLs-based neural network. In 2019, Y. Chen et al. employed an adaptive batch-normalized (BN) neural network, targeting the smart and quick design of graphene-based metamaterials, as illustrated in the top panel of Fig. 2(d). Specifically, a layer using an adaptive BN algorithm is placed before each hidden layer to overcome the limitation of BN in small sampling spaces. The adaptive BN layer takes the activation of each neuron in a minibatch B, the batch normalization parameters, and the adaptive parameters as its inputs, and outputs the new activation for each neuron. The authors tested their method by deriving the thickness of each layer in the structures. A prediction accuracy of over 95% was achieved. The bottom panel of Fig. 2(d) plots the optical responses of two different examples with varied absorbance in graphene, showing excellent accordance between the target and designed responses.
In parallel, T. Qiu et al. proposed a new method, named REACTIVE, to conduct the inverse design based on reflection spectra. The authors applied this method to inversely design a metasurface whose unit cell can be described as a binary matrix, as shown in the left panel of Fig. 3(a). The input data sets are preprocessed by Gaussian smoothing and then transformed by a discrete cosine transform. The right panel of Fig. 3(a) shows the results from REACTIVE, including the S-parameter (i.e., the reflection coefficient) and the absorptivity, which perfectly match the design targets.
Due to the data-driven nature of deep learning, the performance of a well-trained ANN highly relies on the training set, and the prediction loss is likely to increase as the inputs deviate from the training set. Therefore, a challenge in the deep-learning-aided inverse design lies in extending the capability of ANNs to an alternated data set that is very different from the training data. Usually, one needs to generate an entirely new training set for similar but different physical scenarios. In this context, reducing the demand for computational data is an efficient way to accelerate the training of deep learning models. Y. Qu et al. proposed a transfer learning method, which is schematically illustrated in Fig. 3(b), to migrate knowledge between different physical scenarios. The prediction accuracy is significantly improved, even with a much smaller data set for new tasks. Two sets of ANNs are involved in this work. The first one, named BaseNet, is trained with the initial data. The second one, called TransferNet, copies the first several layers from the BaseNet, and then the entire network is fine-tuned. The authors first transferred the spectra prediction task from a 10-layer film to an 8-layer film, where the source and target tasks were trained with 50,000 and 5000 examples, respectively. Compared with direct learning, transfer learning yields a lower error, which further drops as the amount of training data increases, as shown in Fig. 3(c). The TransferNet is applicable to different structures, ranging from multilayer nanoparticles to multilayer films. Based on this model, a multitask learning scheme was also studied, which combines the learning of multiple tasks at the same time. It was shown that the neural network in conjunction with the transfer learning method can produce more accurate predictions.
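The layer-copying step of such a transfer scheme can be sketched schematically; the layer names and weight values below are hypothetical placeholders, not the architecture of the paper:

```python
import copy

# Hypothetical BaseNet weights learned on the source task.
base_net = {
    "layer1": [0.5, -0.2, 0.1],
    "layer2": [0.3, 0.7],
    "head":   [1.2],
}

def make_transfer_net(base, n_copied_layers):
    """Copy the first n layers of a trained network; re-initialize the rest.
    The returned network would then be fine-tuned end-to-end on the new task."""
    net = {}
    for i, name in enumerate(base):
        if i < n_copied_layers:
            net[name] = copy.deepcopy(base[name])   # transferred knowledge
        else:
            net[name] = [0.0] * len(base[name])     # re-initialized layer
    return net

transfer_net = make_transfer_net(base_net, n_copied_layers=2)
print(transfer_net["layer1"], transfer_net["head"])
```

The copied layers start from features already useful for the source task, which is why far fewer target-task samples are needed than when training from scratch.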
The FCLs have also been utilized for the inverse design problem in reinforcement learning [50–53], another active area of machine learning. Reinforcement learning has already achieved great performance in robotics, system control, and game playing (e.g., AlphaGo). Instead of predicting the optimized geometry directly, the ANNs in reinforcement learning act as an iterative optimization method. In each step, an action to optimize the geometric parameters is predicted; for instance, the action can be increasing or decreasing several parameters by a certain value. The advantage of this approach is that it can be adapted to specific problems, and it can provide guidance for conventional trial-and-error optimization methods.
People have devoted experimental efforts in conjunction with the development of general models and algorithms using ANNs. For example, I. Malkiel et al. experimentally demonstrated that a deep neural network trained with thousands of synthetic experiments can retrieve subwavelength meta-atoms from far-field measurements and address the inverse design problem. In their work, the first step was to train the inverse network to predict the structure based on the transmission spectra, with the material properties also considered as additional inputs. The second step was to train the direct network for forward prediction on top of the first network, as shown in the left panel of Fig. 4(a). A significant and encouraging improvement in accuracy was noted when using eight joint layers. Based on the far-field spectra and the developed neural networks, the authors were able to derive the geometries of nanostructures. They achieved great agreement between the desired spectra and the simulated spectra of the fabricated samples, as shown in the right panel of Fig. 4(a).
In addition to spectrum prediction [55,56], the FCLs-based ANNs have also been used in the inverse design to realize other functionalities and benefit real-world applications [57–62]. Holographic images, for example, can be optimized by ANNs to achieve a wide viewing angle and a three-dimensional vectorial field, as recently demonstrated by H. Ren et al. They used a network named multilayer perceptron ANN (MANN), which was composed of an input layer fed with an arbitrary three-dimensional (3D) vectorial field, four hidden layers, and an output layer for the synthesis of a two-dimensional (2D) vector field. There are 1000 neurons within each hidden layer. The scheme of this ANN is shown in the top left panel of Fig. 4(b). The authors showed that an arbitrary 3D vectorial field can be achieved with a 2D vector field predicted by the well-trained model. A 2D Dirac comb function was then applied to sample the desired image. Subsequently, a digital hologram, calculated from the desired image, was combined with the 2D vector field. This process can be visualized in the right panel of Fig. 4(b). With a split-screen spatial light modulator that independently controls the amplitude and phase of orthogonal circularly polarized light, any desired 2D vector beam can be generated. As a result, the experimentally measured image from the hologram can show four different 3D vectorial fields in different regions, as presented in the bottom left panel of Fig. 4(b). The authors experimentally realized an ultrawide viewing angle of 94° and a high diffraction efficiency of 78%. The demonstrated 3D vectorial holography opens avenues to widespread applications such as holographic displays, multidimensional data storage, machine learning microscopy, and imaging systems.
Another exciting work enabled by ANNs is a self-adaptive cloak that can respond within milliseconds to ever-changing incident waves and surrounding environments without human intervention. A pretrained ANN was adopted to achieve this function. As schematically illustrated in the left panel of Fig. 4(c), a single layer of active meta-atoms was applied at the surface of the cloak, and the reflection spectrum of each varactor diode was controlled independently by a DC bias voltage. To achieve the invisibility cloak function, the bias voltage was determined by the pretrained ANN with the incident wave characteristics (such as the incident angle, frequency, and reflection amplitude) as the input. The temporal response of the cloak was simulated, and an extremely fast transient response of 16 ms was observed in the simulation. The authors then conducted the experiment, where a p-polarized Gaussian beam illuminated at an angle a chameleon object covered by the cloak. Two detectors were used to extract the signals from the background and the incident wave to characterize the cloak. The right panel of Fig. 4(c) shows the experimental results at two incident angles (9° and 21°) and two frequencies (6.7 and 7.4 GHz). The magnetic field distribution in the case of a cloaked object is similar to that when only the background is present, while it is distinctly different from the bare object case. Differential radar cross-section (RCS) measurements further confirmed the performance of the cloak.
3. RETRIEVE COMPLEX STRUCTURES BY CONVOLUTIONAL NEURAL NETWORKS
A. Introduction of CNNs
The desired designs and structures are oftentimes hard to parameterize, especially when the structure of interest contains many basic shapes [41,65] or is freeform [66,67]. In some cases, we also need to deal with complex optical responses as the input. Therefore, converting the structure to a 2D or 3D image is usually a good approach in these studies. Moreover, it can offer much larger degrees of freedom in the design process. However, preprocessing is required to handle the image input if we still want to use the FCLs-based model. Reshaping the image to a one-dimensional vector and applying feature extraction with linear embeddings, such as principal component analysis and random projection, are two effective ways to preprocess the image so that the input is compatible with the FCLs. However, the performance is usually not satisfactory. The reason is that these conversions will either break down the correlation of the nearest pixels in the vertical direction within an individual image or miss part of the information describing the integrality of the whole image. An extremely large dimension of the input is another big issue, which will increase the number of connections between layers quadratically. For conventional parameter input, the input dimension is usually a few tens or hundreds, while for a vectorized image, even a small image with 64 × 64 pixels will result in a 4096-dimensional input vector. CNNs are very suitable to deal with such circumstances. CNNs accept an image input without preprocessing, and then several filters move along the horizontal and vertical directions of the image to extract different features. Each filter has a certain weight to perform a convolutional operation at each subarea of the image, that is, the summation of the pointwise multiplication between the values of the subarea and the weights of the filter.
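Both issues, the dimension blow-up and the loss of 2D locality, can be seen directly by flattening a 64 × 64 image (an arbitrary checkerboard pattern here) into the vector an FCL would receive:

```python
# Flatten a 64 x 64 binary image into a 1D vector, as an FCL input requires.
rows, cols = 64, 64
image = [[(i + j) % 2 for j in range(cols)] for i in range(rows)]

flat = [pixel for row in image for pixel in row]
print(len(flat))  # 4096 input dimensions for a single small image

# Pixel (i, j) lands at index i*cols + j, so the vertically adjacent
# pixels (i, j) and (i+1, j) end up 64 positions apart in the vector:
i, j = 10, 20
print(((i + 1) * cols + j) - (i * cols + j))  # 64
```

A first fully connected layer with, say, 1000 hidden neurons would already need about 4 million weights for this tiny image, and nothing in the flattened vector tells the network which entries were spatial neighbors.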
To explain the function of CNNs in detail, let us assume an input of dimension C × H × W. Here, C is the number of channels of an image, while H and W are the numbers of pixels in the horizontal and vertical directions, respectively. For binary or grayscale images C = 1, and for RGB images C = 3. Each convolutional layer then consists of a weight tensor that has N filters with the dimension C × k × k, meaning each filter is built with C channels of a k × k matrix (usually a 3 × 3 or 5 × 5 matrix is used). The CNN is normally built with three operations, including convolution, activation, and pooling (sometimes a batch normalization layer is added). Figure 5(a) illustrates the convolution operation (consider C = 1). Each filter is initially placed on the top left subarea of the image. The pointwise multiplication of the two matrices is calculated and summed to a single value in the output image. Then the filter moves a certain number of pixels (known as the “stride”) and repeats the process until the whole image is mapped to the output. The dimension of the output is usually smaller than that of the input. However, the output dimension can be easily tuned by adding padding to the input images, which expands the dimension of the input image with zero pixels. In this example, where one round of padding is added, the output image will have the same dimension as the input (the stride equals 1, and the filter dimension is 3 × 3). The activation function plays a significant role in the CNNs for the same reason as in FCLs, and we can choose similar functions as previously mentioned. A pooling layer helps to reduce the dimension of the image. It usually maps a 2 × 2 (or 3 × 3) area in the input to a single value in the output according to the maximum or mean value of the four (or nine) values, as represented in Fig. 5(b). The entire workflow for conventional CNNs is shown in Fig. 5(c). The inputs are several images, each representing a certain structural design.
The inputs pass through layers of CNNs with the three operations, and the size of the tensor gradually shrinks while the number of channels expands. The output eventually becomes a 1D vector. It can be regarded as the features extracted from the image, and these features are fed into the FCLs to predict the final output that is related to the optical response. The MSE and cross-entropy loss discussed in the previous section can also serve as the loss function in many cases of CNNs. The loss calculated by comparing the predicted and true responses undergoes backpropagation through all layers to update the parameters. We want to emphasize that other loss functions, such as the Kullback–Leibler divergence and the mean absolute error, can also be used in ANNs, depending on the physical constraints and the expected functions of the ANNs.
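The two core operations described above can be written out for a single channel; the 4 × 4 image and the 2 × 2 filter below are arbitrary toy values (stride 1, no padding for the convolution):

```python
# Single-channel 2D convolution: slide the filter over each subarea,
# pointwise-multiply, and sum into one output pixel.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(s)
        out.append(row)
    return out

# 2x2 max pooling: each 2x2 block maps to its maximum value.
def max_pool2x2(image):
    out = []
    for i in range(0, len(image) - 1, 2):
        row = []
        for j in range(0, len(image[0]) - 1, 2):
            row.append(max(image[i][j], image[i][j + 1],
                           image[i + 1][j], image[i + 1][j + 1]))
        out.append(row)
    return out

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
kernel = [[1, 0], [0, -1]]        # a toy 2x2 filter
feat = conv2d(img, kernel)        # 3x3 feature map
pooled = max_pool2x2(img)         # 2x2 pooled map
print(feat, pooled)
```

With this filter, each output pixel is img[i][j] − img[i+1][j+1], so the feature map is constant (−5) for this smoothly increasing toy image; a real CNN learns the filter weights during backpropagation rather than fixing them by hand.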
B. Design Complex Photonic Structures by CNNs
The CNNs have greatly expanded the design space of the possible structures that one can explore. For example, plasmonic structures have been extensively studied over the past decades, due to their unique features in optics and photonics and far-reaching impacts on other disciplines [70–74]. By carefully designing the geometry and composite materials, we can confine light into a sub-10 nm dimension, with the local field amplified by 10–1000 times at the resonant wavelengths. Therefore, building a relationship between the design of the plasmonic structure and the corresponding optical responses is of great interest. In the work of I. Sajedian et al. published in 2019, the authors combined CNNs with recurrent neural networks (RNNs) to predict the absorption spectra of complex plasmonic structures in the near-infrared region. The CNNs helped to extract the features from the pixelated structures, and the RNNs with gated recurrent unit layers were used to predict the spectra. The model showed a very low MSE loss when trained with 100,000 data samples. The authors also examined the output after each layer to investigate how higher-level features are extracted as the model goes deeper. In the same year, S. So et al. reported the use of conditional deep convolutional generative adversarial networks (cDCGANs) to retrieve silver plasmonic structures with six basic shapes, such as circle, square, and cross, from given reflection spectra under linearly polarized illumination. The generative adversarial networks (GANs) consist of a generator network and a discriminator network [65,75]. The training process for the GANs-based model is a competition between the generator and the discriminator. The generator generates structures from the input spectrum and a noise vector, trying to fool the discriminator into believing that the generated structure is a rational structure according to the knowledge learned from the training set.
The noise vectors are sampled from a conditional distribution, which is dependent on the prescribed spectra in this case. The discriminator tries to distinguish the “fake” structures generated by the generator from the “true” structures in the training data set. In the beginning, each input structure is pixelated into an image, and the CNNs are used to extract the features of the images in both networks. After running several epochs of the training process, even the optimized discriminator can hardly distinguish between “fake” and “true” inputs, since the generator can generate structures extremely similar to the desired ones, resulting in a good model for inverse design. As shown in the top panel of Fig. 6(a), the simulated spectra of the retrieved structures (red line) agree well with the desired spectra (black line), which are either simulated with an existing structure (first row) or randomly generated with a Lorentzian shape (second row). The overall accuracy is noticeable, reaching a mean absolute error of 0.0322 among 12 test samples after the model is trained with 10,150 training data. The authors also showed that the model can inversely design different structures (but still within the basic shape groups) whose spectra meet the target, as illustrated at the bottom of Fig. 6(a). The emergence of structures different from the ground truths can be attributed to the one-to-many mapping issue that we discussed in the introduction section.
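The alternating adversarial updates can be sketched with drastically simplified, hypothetical stand-ins: scalar "structures", a one-parameter generator that ignores its noise input, and a logistic discriminator. Nothing below reflects the actual cDCGAN architecture; it only shows the competition between the two updates:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

REAL = 3.0          # toy "real structure": a single scalar value
theta = 0.0         # generator output (a single trainable parameter)
w, b = 0.0, 0.0     # discriminator D(x) = sigmoid(w*x + b)
lr = 0.05

for step in range(3000):
    fake = theta
    # --- discriminator step: push D(real) toward 1 and D(fake) toward 0 ---
    d_real = sigmoid(w * REAL + b)
    d_fake = sigmoid(w * fake + b)
    w += lr * ((1 - d_real) * REAL - d_fake * fake)
    b += lr * ((1 - d_real) - d_fake)
    # --- generator step: push D(fake) toward 1 (fool the discriminator) ---
    d_fake = sigmoid(w * fake + b)
    theta += lr * (1 - d_fake) * w

print(theta)  # drifts toward the "real" value and oscillates around it
```

As the generator output approaches the real data, the discriminator can no longer separate the two and its scores collapse toward 0.5, which is exactly the "hardly distinguish" regime described above.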
W. Ma et al. also demonstrated a probabilistic approach for the inverse design of plasmonic structures in 2019. In this work, the structure of interest was a metal-insulator-metal (MIM) structure, with geometries pixelated into images as training data. The authors focused on the co- and cross-polarized reflection spectra in the mid-infrared region from 40 to 100 THz. The developed neural network is shown at the top of Fig. 6(b), which comprises the prediction, recognition, and generation models. Again, the input geometry passes through the CNNs to extract the features from the image. Then the prediction model with FCLs can automatically predict the reflection spectra from the geometry features. For the inverse design part, the authors incorporated a variational auto-encoder (VAE) structure [76,77], which is a probabilistic approach, into the model. It works in the following way. First, the recognition network encodes both the structures and the corresponding spectra into a latent space with a standard Gaussian prior distribution. In the generation model, the network then takes the desired spectra together with a latent variable randomly sampled from the conditional latent distribution to reconstruct one geometry. Here, the three models are trained together in an end-to-end manner. The well-trained model can not only predict the spectra from a given structure, serving as a powerful alternative to numerical simulation, but also reconstruct multiple structures from user-defined spectra. The bottom part of Fig. 6(b) shows the performance of the model trained with 30,000 data samples for spectral prediction and the inverse design for both user-defined spectra (first row) and spectra from a test structure (second row). The first column in the figure shows the target spectra.
In the case where a test structure is used to generate the spectra, the predicted spectrum from the prediction model is also plotted as a scatter plot, which agrees well with the spectra from full-wave simulation (solid lines). In the second and third columns, two examples of the geometry from the inverse design model and their simulated spectra are depicted. One can find that even though the structures are very different from each other and from the ground truth, their spectra resemble the target ones. The authors further expanded the basic shapes by transfer learning to enable the reconstruction of a wide range of geometry groups. The generality of the model was exemplified by the design of double-layer chiral metamaterials. Very recently, W. Ma and Y. Liu developed a semi-supervised learning strategy to accelerate the training data generation process, the most time-consuming part of deep-learning-aided inverse design. In addition to the labeled data that contain both the geometries of the structures and the simulated spectra, unlabeled data with only the geometry information are included. Unlike the labeled data, where simulated spectra serve as the input to the inverse design model, the predicted spectra of the unlabeled data are used as the input to reconstruct the geometry. Without numerical simulation, the unlabeled data can be generated several orders of magnitude faster. They also help to lower the training loss by 10%–30% for the model trained with the same number of labeled data.
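The latent sampling at the heart of the VAE-based models above is usually implemented with the reparameterization trick, which keeps the sampling step differentiable with respect to the encoder outputs. A minimal sketch (the latent dimension and values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)

# Reparameterization trick: instead of sampling z ~ N(mu, sigma^2) directly,
# sample eps ~ N(0, I) and set z = mu + sigma * eps, so gradients can flow
# through mu and log_var (the encoder outputs) during training.
def sample_latent(mu, log_var, rng):
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

mu = np.zeros(8)        # encoder-predicted mean (toy values)
log_var = np.zeros(8)   # encoder-predicted log-variance (sigma = 1)
z = sample_latent(mu, log_var, rng)
print(z.shape)
```

Sampling different `z` for the same conditioning spectra is what lets the generation model reconstruct multiple distinct geometries.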
Z. Liu et al. introduced a hybrid approach by combining the VAE model and an evolution strategy (ES). The framework of the hybrid model is shown on the left of Fig. 6(c). In each iteration, a generation of latent vectors is fed into the model and the corresponding structures are reconstructed. A well-trained simulator then predicts the transmittance spectra of the structures, and a fitness score is calculated. If the criteria are not yet satisfied, the ES performs reproduction and mutation with a given mutation strength to create a new generation of latent vectors. This process is repeated until the criteria are met. The details of the ES will be discussed in the genetic algorithm part of the next section. The right panel of Fig. 6(c) shows the performance of the inverse design model. The solid line and dashed line are the simulated spectra of the test pattern (orange) by the finite element method and of the reconstructed pattern (black) from the hybrid model, respectively. All the works in Fig. 6 solve the one-to-many mapping issue with a probabilistic approach such as VAEs and GANs, where a randomly sampled parameter or vector is combined with the desired optical response as the input to reconstruct the structure. This enables the ANNs to explore the full physical possibilities of the design space to produce sophisticated structures for novel functions.
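The evolution-strategy loop over latent vectors can be sketched as follows. The quadratic "fitness" below is a toy stand-in for the score the trained spectrum simulator would provide; the population size, latent dimension, and mutation strength are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Minimal (mu, lambda) evolution strategy over latent vectors. The hidden
# 'target' and quadratic fitness stand in for the neural-network simulator
# comparing predicted spectra against the design goal.
target = rng.normal(size=6)

def fitness(z):
    return -np.sum((z - target) ** 2)   # higher is better

pop = rng.normal(size=(20, 6))          # initial generation of latent vectors
sigma = 0.3                             # mutation strength
for _ in range(200):
    scores = np.array([fitness(z) for z in pop])
    parents = pop[np.argsort(scores)[-5:]]                    # selection
    children = parents[rng.integers(0, 5, size=20)]           # reproduction
    pop = children + sigma * rng.normal(size=children.shape)  # mutation

best = pop[np.argmax([fitness(z) for z in pop])]
```

After enough generations the best latent vector, decoded by the VAE, yields a structure whose predicted spectrum approaches the target.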
In 2019, Q. Zhang et al. demonstrated a digital coding metasurface designed with CNNs. They explored different meta-atoms, each 8 mm in size and represented as a pixelated pattern, to control the reflection phase. The CNN model was built upon residual learning blocks and 70,000 training patterns. After training, the model can precisely predict the reflection phase; 90.05% of the test samples exhibited a deviation of less than 2° in the 360° phase range. Subsequently, the model was used for the inverse design of meta-atoms with a prescribed phase response. More specifically, the goal was to create a 1-bit coding scheme with two meta-atoms such that the reflection phases for the two orthogonal linear polarizations of the incident light satisfy the conditions illustrated in Fig. 7(a). By carefully combining the phase profiles on a metasurface consisting of 16 designed units, the authors demonstrated the independent manipulation of the phase for orthogonal polarizations. As one example of potential applications, the authors fabricated several dual- and triple-beam coding metasurfaces that can deflect light with different polarizations into different angles at 10 GHz. The measurement was performed in a microwave chamber with a horn antenna as the excitation source. On the right of Fig. 7(a), we can find excellent agreement between the measured far-field scattering patterns and the simulated ones.
CNNs are widely applied in 2D image processing. Their significance is attributed to their ability to treat a local segment of the input as a whole, which in principle works in arbitrary dimensions. Taking advantage of this property, P. R. Wiecha and O. L. Muskens built a model with 3D CNNs to predict the near-field and far-field electric/magnetic response of arbitrary nanostructures. They pixelated the dielectric or plasmonic nanostructure of interest into a 3D image and fed the image into several layers of 3D CNNs. The output is a 3D image of the same size as the input, representing the electric field at a fixed wavelength and polarization in the same coordinate system, as shown in Fig. 7(b). The residual and shortcut connections in the network, known from residual learning and U-Net blocks, help to stabilize the gradients and make the network deeper without compromising its performance [84,85]. From the predicted near-field response, other physical quantities, such as far-field scattering patterns, energy flux, and electromagnetic chirality, can then be deduced. The authors studied two cases: 2D gold nanostructures with random polygonal shapes and 3D silicon structures consisting of several pillars. Each scheme was trained on simulation data of 30,000 distinct geometries. With the well-trained model, the authors reproduced several nano-optical effects from the near-field prediction of the 3D CNNs, such as the antenna behavior of gold nanorods and Kerker-type scattering of Si nanoblocks. The model can potentially serve as an extremely fast replacement for current full-wave simulation methods, with the trade-off of slightly decreased accuracy.
In parallel, a one-dimensional (1D) CNN was introduced to analyze the scattering spectra of silicon nanostructures for optical information storage, as demonstrated by P. R. Wiecha et al. in 2019. The authors used Si nanostructures to store bit information with high density, as shown in the left panel of Fig. 7(c). The nanostructure was divided into N parts. If a certain part contained a silicon block, the corresponding bit was defined as “1;” otherwise it was “0.” Therefore, an N-bit information storage unit was created. The information encoded in the nanostructure was read out through far-field measurement. Here, the dark-field spectra under two orthogonal linear polarizations in the visible range were chosen as the measured information. The 1D CNNs together with FCLs were used to analyze the spectra: the input of the classification problem was the scattering spectra, and the output was the class index among all possible N-bit sequences. The network was trained with experimentally measured dark-field spectra of 625 fabricated nanostructures for each geometry. The model trained after 100 epochs shows quasi-error-free prediction with accuracy higher than 99.97% for the 2-bit to 5-bit (or even 9-bit) geometries, as demonstrated in the right panel of Fig. 7(c). The authors further showed that the input information can be greatly reduced by feeding the network with only a small spectral window of around 100 nm or even several discrete data points of the spectra, with a negligible effect on the accuracy. Finally, the authors managed to retrieve the stored information from the RGB values of a dark-field color image of the nanostructures. This new approach can reduce the complexity and equipment cost of the readout process and at the same time promises massively parallel retrieval of information.
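Framing the readout as classification means each of the 2^N possible bit sequences of an N-bit storage unit is one class. The conversion between a class index and a bit sequence is plain binary encoding, which can be sketched as:

```python
# Class index <-> bit sequence conversion for an n-bit storage unit:
# the classifier outputs an index among 2**n classes, and the index is
# decoded back into the stored bits.
def index_to_bits(index, n):
    return [(index >> i) & 1 for i in reversed(range(n))]

def bits_to_index(bits):
    out = 0
    for b in bits:
        out = (out << 1) | b
    return out

print(index_to_bits(5, 4))          # [0, 1, 0, 1]
print(bits_to_index([0, 1, 0, 1]))  # 5
```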
CNNs are not always the best choice for image inputs, as found by A. Turpin et al. in 2018. The scheme of this work is shown on the left of Fig. 7(d). They studied the speckle formed when light from an illuminated digital micromirror device (DMD) pattern passes through a layer of scattering material, such as a glass diffuser or a multimode fiber. They intended to inversely design the DMD pattern required for the output speckle to form a certain image. The authors built two models, one with a single FCL and one with multilayer CNNs. The right panel of Fig. 7(d) presents the inverse designs for the desired Gaussian beam outputs based on the two models. The measured results of the single FCL look better than those of the multilayer CNNs. Quantitatively, both models achieve a signal-to-noise ratio larger than 10. However, the enhancement metric, defined as the intensity at the generated focal point divided by the mean intensity of the background speckle, is higher for the first model than the value of 3.6 obtained for the second. The authors therefore concluded that in this particular application, CNNs can reduce the number of network parameters by almost 80% compared to the single FCL, but at the cost of worse performance when trained on a comparable amount of data. The well-trained model can then predict the required illumination pattern for varied output images. In this way, the authors achieved a dynamic scan of the focal point by manipulating the input illumination at a high frame rate of 22.7 kHz.
4. OTHER INTELLIGENT ALGORITHMS FOR PHOTONIC DESIGNS
There are other well-developed computational methods and algorithms that can be applied to the inverse design with satisfactory performance in specific circumstances. One of the most popular is the genetic algorithm [88,89], inspired by Charles Darwin’s theory of natural evolution. As previously discussed, in the design toward a target response, a group of initial designs is created either randomly or empirically. The performance of this first generation of “species” is tested and compared to the target response, and a fitness score based on the comparison is calculated. The algorithm selects the several “species” in the current generation that have the highest fitness scores. Then reproduction, which combines the information of two or more designs, and mutation, which adds random noise to a design, are performed to generate the next generation of species. The process is repeated until all or most of the species in the new generation have good fitness scores. This algorithm was already applied to photonic design problems a decade ago and achieved great success [90–94]. Recently, Z. Liu et al. published a work that integrated the genetic algorithm with ANNs. They studied “meta-molecules” consisting of multiple meta-atoms that can realize polarization conversion and anomalous light deflection, as shown on the left of Fig. 8(a). The model is composed of a compositional pattern-producing network (CPPN), which decodes 2D patterns from a latent variable, and a cooperative coevolution (CC) algorithm that identifies a set of vectors in the latent space. The CPPN takes one coordinate tuple at a time together with a latent vector that controls the shape of the pattern, and assembles the predictions over all coordinates into a pattern. The CC then performs the genetic algorithm with a fitness score calculated from the output polarization state, the ellipticity, and the phase and intensity of the electric field.
The authors first trained a neural network simulator with the responses of 8000 meta-atoms of different shapes. This simulator can be adopted in the CC to greatly reduce the time of fitness score computation, predicting the real and imaginary parts of the spectra with an accuracy above 97%. The authors designed and fabricated meta-molecules comprising two (or eight) meta-atoms to implement polarization conversion under linear polarization as well as anomalous light deflection under circular polarization. The simulated and measured results of polarization conversion are plotted in the right panel of Fig. 8(a), showing excellent agreement with the target.
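The selection–reproduction–mutation loop described above can be sketched minimally over binary "pixel" designs. The fitness here simply counts matches to a hidden target pattern, a toy stand-in for the score a neural-network simulator would compute from the optical response; all sizes and rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy genetic algorithm over binary pixel designs.
target = rng.integers(0, 2, size=32)

def fitness(design):
    return int(np.sum(design == target))   # matches to the hidden target

pop = rng.integers(0, 2, size=(30, 32))    # initial random population
for _ in range(100):
    scores = np.array([fitness(d) for d in pop])
    parents = pop[np.argsort(scores)[-10:]]                 # selection
    pairs = rng.integers(0, 10, size=(30, 2))
    cuts = rng.integers(1, 32, size=30)
    children = np.array([np.concatenate([parents[a][:c], parents[b][c:]])
                         for (a, b), c in zip(pairs, cuts)])  # crossover
    flips = rng.random(children.shape) < 0.01                 # mutation
    pop = np.where(flips, 1 - children, children)

best_score = max(fitness(d) for d in pop)
```

A real implementation would add termination criteria and elitism; the structure of the loop is the same.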
Another widely used optimization algorithm for the inverse design is gradient-based topology optimization [21,96–103]. In the optimization process, the design space is discretized into pixels whose properties (e.g., the refractive index) are represented by a parameter set. The parameter set is optimized for a prescribed target response by maximizing (or minimizing) a user-defined objective function. Starting from an initial parameter set, both a forward simulation and an adjoint simulation are performed to calculate the gradient of the objective function with respect to each parameter. The parameters are then updated according to the gradient ascent (descent) method. This iterative process continues until the objective function is well optimized. Taking advantage of topology optimization, J. Jiang et al. presented a global optimizer for highly efficient metasurfaces that deflect light to desired angles. As illustrated in the top panel of Fig. 8(b), the metagrating in one period is divided into 256 segments, and each segment can be filled with either air or Si. To optimize the metagrating, the authors used a global optimization method named GLOnet, which is based on both a generative neural network (GNN) and topology optimization, as shown in the bottom panel of Fig. 8(b). The GNN takes the desired deflection angle and the working wavelength together with a random noise vector as inputs. The inputs pass through FCLs and layers of deconvolutional blocks, and a metagrating design is generated. A Gaussian filter at the last layer of the generator eliminates small features that are hard to fabricate. Next, topology optimization is applied: by performing both a forward simulation and an adjoint simulation, the gradient of the objective function (the efficiency) is calculated, and the weights of the ANNs are updated according to the gradient ascent method.
To make the model work for any deflection angle and wavelength, the initialization of the model is essential to span the full design space. Therefore, an identity shortcut is added to map the random noise directly to the output design, which enables all kinds of designs when the initial weights of the GNN are small. It should be noted that the GLOnet differs from conventional topology optimization. In conventional topology optimization, the structural parameters (such as the refractive index of individual segments) are updated for a single device with a fixed deflection angle and wavelength. When the goal (the deflection angle) or the working wavelength is changed, the optimization must be performed again for the new device. In the GLOnet, by contrast, the parameters optimized in each iteration are the weights of the neural network. The GNN therefore gains the ability to inversely design devices for varied goals and working wavelengths, without the need to retrain the model when the target changes. The performances of conventional topology optimization and the GLOnet optimization were compared in this work: 92% of the devices designed by the GLOnet have efficiencies higher than, or within 5% of, the devices designed by the other method. In addition, the retrieved devices gradually converge to a high-efficiency region as the iteration number of the training process increases.
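The pixel-wise gradient-ascent update at the core of topology optimization can be sketched with a toy analytic objective. Here the quadratic objective and its gradient stand in for the efficiency and the forward-plus-adjoint simulation pair; the target profile and learning rate are illustrative only.

```python
import numpy as np

# Toy gradient-ascent loop in the spirit of topology optimization: each
# "pixel" parameter is nudged along the gradient of the objective F.
target = np.linspace(0.0, 1.0, 64)        # hypothetical ideal index profile

def objective(eps):
    return -np.sum((eps - target) ** 2)   # higher is better (max at 0)

def gradient(eps):
    # In real topology optimization this is what one forward plus one
    # adjoint simulation provide, at a cost independent of pixel count.
    return -2.0 * (eps - target)

eps = np.full(64, 0.5)                    # initial uniform design
lr = 0.1
for _ in range(200):
    eps = np.clip(eps + lr * gradient(eps), 0.0, 1.0)  # gradient ascent
```

In the GLOnet, the same gradient is backpropagated one step further, into the weights of the generator network, instead of stopping at the pixel parameters.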
Combining topology optimization and ANNs, Z. A. Kudyshev et al. studied the structure optimization of high-efficiency thermophotovoltaic (TPV) cells operating in a desired wavelength range. The design is based on a gap plasmonic structure. As shown in the top panel of Fig. 8(c), the optimization can be divided into three main steps. First, the topology optimization method is applied to generate a group of appropriate structures for training. Then an adversarial autoencoder (AAE) network is trained. Similar to the VAE, the AAE consists of an encoder that maps the input designs to a latent space and a decoder that retrieves the structure from a latent vector sampled from the latent space. Both the VAE and AAE models try to make the latent distribution approach a predefined prior (a 15-dimensional Gaussian distribution in that work). In the VAE model, a Kullback–Leibler divergence between the latent distribution and the prior forms one part of the loss function; in the AAE, a discriminator is built to distinguish samples drawn from the two distributions, and the encoder is trained to generate samples that can fool the discriminator. In the last step, the structure retrieved from the decoder is refined with topology optimization to remove the blurring of the generated designs. As a result, the hybrid method combining the AAE and topology optimization shows great performance, providing a mean efficiency of 90% for the retrieved structures, compared with 82% via direct topology optimization. The comparison between these two methods is shown at the bottom of Fig. 8(c), together with the emissivity and emission plots for the best designs from either method. In a very recent work, the same group further developed a global optimization method in which a global optimization engine generates latent vectors and a VGGnet (Visual Geometry Group network) rapidly assesses the performance of the design.
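The VAE's Kullback–Leibler term mentioned above has a closed form when the latent distribution is a diagonal Gaussian and the prior is the standard Gaussian, which is a common choice. A minimal sketch:

```python
import numpy as np

# Closed-form KL(q || p) between a diagonal Gaussian q = N(mu, sigma^2)
# and the standard Gaussian prior p = N(0, I):
#   KL = 0.5 * sum(sigma^2 + mu^2 - 1 - log sigma^2)
# This is the term a VAE adds to its loss to pull the latent distribution
# toward the prior.
def kl_to_standard_normal(mu, log_var):
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

# When q already equals the prior, the divergence vanishes:
print(kl_to_standard_normal(np.zeros(15), np.zeros(15)))  # 0.0
```

The AAE replaces this analytic penalty with a learned discriminator, which removes the need for a closed-form expression and allows more flexible priors.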
Conventional machine learning methods, such as Bayesian learning, clustering, and manifold learning, are also very helpful in solving photonic design problems. In 2019, L. Li et al. showcased a machine-learning-based imager that can efficiently record the microwave image of a moving object using a reprogrammable metasurface. This work may pave the way for intelligent surveillance with both fast response time and high accuracy. The meta-atom has three metallic patches connected via PIN diodes to encode 2-bit information, as schematically shown in the top panel of Fig. 8(d). The digital phase step is around 90° between adjacent states, and the state can be tuned by applying an external bias voltage. The authors recorded a moving person for less than 20 min to generate the training data for the model. With principal component analysis (or random projection), the main modes with significant contributions were calculated. All meta-atoms were then tuned by a bias voltage to match the principal component analysis modes for each measurement. In this way, the measurement became more efficient because it always captured the information with a high contribution to reconstructing the microwave image. To test the well-trained model, another person moved in front of the metasurface, and images of the movements were reconstructed as shown at the bottom of Fig. 8(d). With only 400 measurements, far fewer than the number of pixels, high-quality images could be produced even when the person was blocked by a 3-cm-thick paper wall. This method was further extended to a classification problem in which the authors defined three different movements (i.e., standing, bending, and raising arms). With a simple nearest-neighbor algorithm, only 25 measurements led to good recognition of the movements.
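Extracting the dominant modes by principal component analysis can be sketched with an SVD over a stack of recorded frames. The data here are random placeholders; in the imager the retained modes define the metasurface patterns for the most informative measurements.

```python
import numpy as np

rng = np.random.default_rng(5)

# PCA via SVD: from a stack of recorded frames, keep the modes with the
# largest singular values and project each frame onto them.
frames = rng.normal(size=(200, 64))   # toy data: 200 frames, 64 "pixels"
frames -= frames.mean(axis=0)         # center the data
U, S, Vt = np.linalg.svd(frames, full_matrices=False)
modes = Vt[:10]                       # 10 principal modes (highest variance)

measurements = frames @ modes.T       # a few projections per frame,
print(modes.shape, measurements.shape)  # far fewer than the pixel count
```

Reconstructing an image from these few projections is what makes 400 measurements suffice in place of a full pixel-by-pixel scan.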
5. CONCLUSION AND OUTLOOK
In this review, we have introduced the basic idea of applying ANNs and other advanced algorithms to accelerate and optimize photonic designs, including plasmonic nanostructures and metamaterials. We have highlighted some representative works in this field and discussed the performance and applications of the proposed models. In the inverse design problem, the neural network is usually built upon FCLs and CNNs, integrated with other neural network units such as ResNets and RNNs. It is beneficial to combine ANNs with conventional optimization methods such as genetic algorithms and topology optimization, because the conventional methods can perform global optimization and provide feedback to further improve the ANNs. Together, these methods offer a great opportunity to increase the structural complexity of the devices, enabling much more complex and novel functionalities.
The development of photonics can also benefit the study of computational methods. For instance, it has long been sought to push the computation speed to the speed of light. All-optical neuromorphic computing [108–112] via optical networks is one approach toward this goal. In principle, the diffraction of light between layers can serve as the connections of a neural network: the intensity profiles in two diffractive layers “connected” by light diffraction are a good analogy to the connections between neurons in ANNs. Based on this idea, researchers have demonstrated a new kind of neural network built upon all-optical components, known as optical neural networks (ONNs) [113–117]. As a comprehensive example, X. Lin et al. reported an all-optical system that can serve as a diffractive deep neural network (D²NN) for image classification in 2018. The system is composed of several layers of 3D-printed structures. According to the Huygens–Fresnel principle, the points in each layer can be regarded as secondary sources of light. Therefore, each point in the front layer contributes to the amplitude and phase distribution of each point in the following layers, while the propagation phase functions as the nonlinearity. The analogy between the D²NN and the ANN is illustrated in the top panel of Fig. 9(a). The authors designed the D²NN using the same error backpropagation method as in ANNs, adjusting the phase distribution in each layer. This design process runs on a computer, but once the design is finished, the fabricated device performs prediction (classification) all-optically. In the measurement, the light passed through an input plane with the same shape as the image. By detecting the position with the maximum output intensity after light passes through all layers, the class of the input image can be read out. The authors trained and tested the classifier with images of handwritten digits and fashion products.
The experimental results agree well with the expectations, as shown in the bottom panel of Fig. 9(a), with accuracies of 91.75% and 86.60% for the two tasks, respectively. Two years later, C. Qian et al. demonstrated optical logic operations with a diffractive neural network. The goal was to perform logic operations such as “and,” “or,” and “not” on the inputs. As shown in the first two rows of Fig. 9(b), the input wave was shaped so that it could only pass through certain regions before illuminating the diffractive metasurface. In this way, the two binary inputs and the logic operation could be controlled. The results can be read out by detecting the intensity at two positions representing “0” and “1.” The last two rows of Fig. 9(b) show the experimental measurements for 10 different operations, and all the profiles indicate the correct results. More efforts are needed to further advance this exciting direction, for instance, by reducing the footprint and increasing the efficiency of optical neural networks.
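The free-space propagation that "connects" successive diffractive layers can be sketched with the angular spectrum method, a standard scalar-diffraction propagator consistent with the Huygens–Fresnel picture. The wavelength, pixel pitch, and propagation distance below are illustrative values only.

```python
import numpy as np

# Angular-spectrum propagation of a scalar field between two planes:
# transform to spatial frequencies, multiply by the free-space transfer
# function exp(i * kz * d), and transform back.
def propagate(field, wavelength, pitch, distance):
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pitch)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))  # clip evanescent part
    H = np.exp(1j * kz * distance)                  # unit-modulus transfer fn
    return np.fft.ifft2(np.fft.fft2(field) * H)

field = np.zeros((64, 64), dtype=complex)
field[32, 32] = 1.0                                 # point source on layer 1
out = propagate(field, wavelength=0.75e-3, pitch=0.4e-3, distance=3e-3)
```

Cascading such propagations with a learned phase mask applied at each layer is, in essence, the forward pass of a diffractive network.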
ANNs are typically considered a “black box,” since the relationship between inputs and outputs learned by the network is usually implicit. In some published works, researchers visualize the output of each individual layer to gain some insight into what feature is learned (or what function is performed) by each layer, which is a useful first step. However, if we can further extract the learned relations explicitly from well-trained ANNs, it will be very helpful for finding new structure groups that lie outside the conventional geometry groups (such as H-shape, C-shape, and bowtie). At the same time, it will provide guidelines and insights for the design of optical devices. Another important direction is to extend the generality of ANN models. When applying ANNs to traditional tasks, such as image recognition and natural language processing, we want the neural networks to learn the information and distributions that lie inside the natural images or languages themselves and to reconstruct or approximate these distributions. ANNs have been proven to work well in learning and summarizing such distributions, and it is relatively easy to extend a model to deal with other kinds of images or languages. However, the inverse design tasks in photonics are more complicated, because the ANNs need to learn the implicit physical rules (such as Maxwell’s equations) connecting the structures and their optical responses, instead of the information and distributions associated with the structures themselves. Therefore, extending the capability of a well-trained neural network in inverse design problems remains a challenge. Most of the ANNs described in this review are specific to a certain design platform or application. A model can be fine-tuned to handle different tasks, but it needs to be retrained and, at the same time, an additional training data set is required.
When the original training set contains training data for multiple tasks, multiple design rules are likely to be involved and learned by the ANNs. The performance of the model on each individual task will then be inferior to that of a model trained with a data set specific to that task, because the rules for the other tasks act as perturbations or noise. Finding the right trade-off between generality and per-task performance is therefore important.
Over the past decades, photonics and artificial intelligence have been evolving largely as two separate research disciplines. The intersection and combination of these two fields in recent years have brought exciting achievements. On one hand, innovative ANN models provide a powerful tool to accelerate the optical design and implementation process, and some nonintuitive structures and phenomena have been discovered through this new strategy. On the other hand, the developed optical designs are expected to enable a variety of real-world applications, such as optical imaging, holography, communications, and information encryption, with high efficiency, fidelity, and robustness. Toward this goal, we need to include the practical fabrication constraints and underlying material properties in the design space in order to globally optimize the devices and systems. We believe that the field at the interface of photonics and artificial intelligence will move forward significantly as more researchers from different backgrounds join this effort.
National Science Foundation (ECCS-1916839).
The authors declare no conflicts of interest.
1. Y. Liu and X. Zhang, “Metamaterials: a new frontier of science and technology,” Chem. Soc. Rev. 40, 2494–2507 (2011). [CrossRef]
2. N. Yu, P. Genevet, M. A. Kats, F. Aieta, J.-P. Tetienne, F. Capasso, and Z. Gaburro, “Light propagation with phase discontinuities: generalized laws of reflection and refraction,” Science 334, 333–337 (2011). [CrossRef]
3. Y. W. Huang, H. W. H. Lee, R. Sokhoyan, R. A. Pala, K. Thyagarajan, S. Han, D. P. Tsai, and H. A. Atwater, “Gate-tunable conducting oxide metasurfaces,” Nano Lett. 16, 5319–5325 (2016). [CrossRef]
4. A. M. Shaltout, V. M. Shalaev, and M. L. Brongersma, “Spatiotemporal light control with active metasurfaces,” Science 364, eaat3100 (2019). [CrossRef]
5. L. Li, K. Yao, Z. Wang, and Y. Liu, “Harnessing evanescent waves by bianisotropic metasurfaces,” Laser Photon. Rev. 14, 1900244 (2020). [CrossRef]
6. L. Huang, X. Chen, H. Mühlenbernd, H. Zhang, S. Chen, B. Bai, Q. Tan, G. Jin, K.-W. Cheah, C.-W. Qiu, J. Li, T. Zentgraf, and S. Zhang, “Three-dimensional optical holography using a plasmonic metasurface,” Nat. Commun. 4, 2808 (2013). [CrossRef]
7. G. Zheng, H. Mühlenbernd, M. Kenney, G. Li, T. Zentgraf, and S. Zhang, “Metasurface holograms reaching 80% efficiency,” Nat. Nanotechnol. 10, 308–312 (2015). [CrossRef]
8. B. Wang, F. Dong, Q. T. Li, D. Yang, C. Sun, J. Chen, Z. Song, L. Xu, W. Chu, Y. F. Xiao, and Q. Gong, “Visible-frequency dielectric metasurfaces for multiwavelength achromatic and highly dispersive holograms,” Nano Lett. 16, 5235–5240 (2016). [CrossRef]
9. L. Jin, Z. Dong, S. Mei, Y. F. Yu, Z. Wei, Z. Pan, S. D. Rezaei, X. Li, A. I. Kuznetsov, Y. S. Kivshar, and J. K. Yang, “Noninterleaved metasurface for (2⁶–1) spin- and wavelength-encoded holograms,” Nano Lett. 18, 8016–8024 (2018). [CrossRef]
10. X. Chen, L. Huang, H. Mühlenbernd, G. Li, B. Bai, Q. Tan, G. Jin, C. W. Qiu, S. Zhang, and T. Zentgraf, “Dual-polarity plasmonic metalens for visible light,” Nat. Commun. 3, 1198 (2012). [CrossRef]
11. X. Ni, S. Ishii, A. V. Kildishev, and V. M. Shalaev, “Ultra-thin, planar, Babinet-inverted plasmonic metalenses,” Light Sci. Appl. 2, e72 (2013). [CrossRef]
12. S. Wang, P. C. Wu, V. C. Su, Y. C. Lai, M. K. Chen, H. Y. Kuo, B. H. Chen, Y. H. Chen, T. T. Huang, J. H. Wang, and R. M. Lin, “A broadband achromatic metalens in the visible,” Nat. Nanotechnol. 13, 227–232 (2018). [CrossRef]
13. W. T. Chen, A. Y. Zhu, V. Sanjeev, M. Khorasaninejad, Z. Shi, E. Lee, and F. Capasso, “A broadband achromatic metalens for focusing and imaging in the visible,” Nat. Nanotechnol. 13, 220–226 (2018). [CrossRef]
14. X. Zang, H. Ding, Y. Intaravanne, L. Chen, Y. Peng, J. Xie, Q. Ke, A. V. Balakin, A. P. Shkurinov, X. Chen, and Y. Zhu, “A multi-foci metalens with polarization-rotated focal points,” Laser Photon. Rev. 13, 1900182 (2019). [CrossRef]
15. M. Faraji-Dana, E. Arbabi, A. Arbabi, S. M. Kamali, H. Kwon, and A. Faraon, “Compact folded metasurface spectrometer,” Nat. Commun. 9, 4196 (2018). [CrossRef]
16. A. Tittl, A. Leitis, M. Liu, F. Yesilkoy, D. Y. Choi, D. N. Neshev, Y. S. Kivshar, and H. Altug, “Imaging-based molecular barcoding with pixelated dielectric metasurfaces,” Science 360, 1105–1109 (2018). [CrossRef]
17. A. Leitis, A. Tittl, M. Liu, B. H. Lee, M. B. Gu, Y. S. Kivshar, and H. Altug, “Angle-multiplexed all-dielectric metasurfaces for broadband molecular fingerprint retrieval,” Sci. Adv. 5, eaaw2871 (2019). [CrossRef]
18. N. K. Grady, J. E. Heyes, D. R. Chowdhury, Y. Zeng, M. T. Reiten, A. K. Azad, A. J. Taylor, D. A. Dalvit, and H. T. Chen, “Terahertz metamaterials for linear polarization conversion and anomalous refraction,” Science 340, 1304–1307 (2013). [CrossRef]
19. M. Kim, K. Yao, G. Yoon, I. Kim, Y. Liu, and J. Rho, “A broadband optical diode for linearly polarized light using symmetry-breaking metamaterials,” Adv. Opt. Mater. 5, 1700600 (2017). [CrossRef]
20. L. Kang, S. P. Rodrigues, M. Taghinejad, S. Lan, K. T. Lee, Y. Liu, D. H. Werner, A. Urbas, and W. Cai, “Preserving spin states upon reflection: linear and nonlinear responses of a chiral meta-mirror,” Nano Lett. 17, 7102–7109 (2017). [CrossRef]
21. Z. Shi, A. Y. Zhu, Z. Li, Y. W. Huang, W. T. Chen, C. W. Qiu, and F. Capasso, “Continuous angle-tunable birefringence with freeform metasurfaces for arbitrary polarization conversion,” Sci. Adv. 6, eaba3367 (2020). [CrossRef]
22. Z. Wang, K. Yao, M. Chen, H. Chen, and Y. Liu, “Manipulating Smith-Purcell emission with babinet metasurfaces,” Phys. Rev. Lett. 117, 157401 (2016). [CrossRef]
23. D. Schurig, J. J. Mock, B. J. Justice, S. A. Cummer, J. B. Pendry, A. F. Starr, and D. R. Smith, “Metamaterial electromagnetic cloak at microwave frequencies,” Science 314, 977–980 (2006). [CrossRef]
24. J. K. Gansel, M. Thiel, M. S. Rill, M. Decker, K. Bade, V. Saile, G. von Freymann, S. Linden, and M. Wegener, “Gold helix photonic metamaterial as broadband circular polarizer,” Science 325, 1513–1515 (2009). [CrossRef]
25. X. Liu, T. Tyler, T. Starr, A. F. Starr, N. M. Jokerst, and W. J. Padilla, “Taming the blackbody with infrared metamaterials as selective thermal emitters,” Phys. Rev. Lett. 107, 045901 (2011). [CrossRef]
26. D. P. Fromm, A. Sundaramurthy, P. J. Schuck, G. Kino, and W. Moerner, “Gap-dependent optical coupling of single “bowtie” nanoantennas resonant in the visible,” Nano Lett. 4, 957–961 (2004). [CrossRef]
27. M. Choi, S. H. Lee, Y. Kim, S. B. Kang, J. Shin, M. H. Kwak, K. Y. Kang, Y. H. Lee, N. Park, and B. Min, “A terahertz metamaterial with unnaturally high refractive index,” Nature 470, 369–373 (2011). [CrossRef]
28. S. Sun, Q. He, S. Xiao, Q. Xu, X. Li, and L. Zhou, “Gradient-index meta-surfaces as a bridge linking propagating waves and surface waves,” Nat. Mater. 11, 426–431 (2012). [CrossRef]
29. K. Yao, R. Unni, and Y. Zheng, “Intelligent nanophotonics: merging photonics and artificial intelligence at the nanoscale,” Nanophotonics 8, 339–366 (2019). [CrossRef]
30. Q. Zhang, H. Yu, M. Barbiero, B. Wang, and M. Gu, “Artificial neural networks enabled by nanophotonics,” Light Sci. Appl. 8, 42 (2019). [CrossRef]
31. R. S. Hegde, “Deep learning: a new tool for photonic nanostructure design,” Nanoscale Adv. 2, 1007–1023 (2020). [CrossRef]
32. S. So, T. Badloe, J. Noh, J. Bravo-Abad, and J. Rho, “Deep learning enabled inverse design in nanophotonics,” Nanophotonics 9, 1041–1057 (2020). [CrossRef]
33. W. Ma, Z. Liu, Z. A. Kudyshev, A. Boltasseva, W. Cai, and Y. Liu, “Deep learning for the design of photonic structures,” Nat. Photonics 15, 77–90 (2020). [CrossRef]
34. M. M. R. Elsawy, S. Lanteri, R. Duvigneau, J. A. Fan, and P. Genevet, “Numerical optimization methods for metasurfaces,” Laser Photon. Rev. 14, 1900445 (2020). [CrossRef]
35. D. Piccinotti, K. F. MacDonald, S. Gregory, I. Youngs, and N. I. Zheludev, “Artificial intelligence for photonics and photonic materials,” Rep. Prog. Phys. 84, 012401 (2020). [CrossRef]
36. L. Huang, L. Xu, and A. E. Miroshnichenko, “Deep learning enabled nanophotonics,” in Advances and Applications in Deep Learning (IntechOpen, 2020).
37. J. Jiang, M. Chen, and J. A. Fan, “Deep neural networks for the evaluation and design of photonic devices,” Nat. Rev. Mater. (2020). [CrossRef]
38. J. Peurifoy, Y. Shen, L. Jing, Y. Yang, F. Cano-Renteria, B. G. DeLacy, J. D. Joannopoulos, M. Tegmark, and M. Soljačić, “Nanophotonic particle simulation and inverse design using artificial neural networks,” Sci. Adv. 4, eaar4206 (2018). [CrossRef]
39. T. Qiu, X. Shi, J. Wang, Y. Li, S. Qu, Q. Cheng, T. Cui, and S. Sui, “Deep learning: a rapid and efficient route to automatic metasurface design,” Adv. Sci. 6, 1900128 (2019). [CrossRef]
40. I. Sajedian, J. Kim, and J. Rho, “Finding the optical properties of plasmonic structures by image processing using a combination of convolutional neural networks and recurrent neural networks,” Microsyst. Nanoeng. 5, 27 (2019). [CrossRef]
41. W. Ma, F. Cheng, Y. Xu, Q. Wen, and Y. Liu, “Probabilistic representation and inverse design of metamaterials based on a deep generative model with semi-supervised learning strategy,” Adv. Mater. 31, 1901111 (2019). [CrossRef]
42. Z. A. Kudyshev, A. V. Kildishev, V. M. Shalaev, and A. Boltasseva, “Machine-learning-assisted metasurface design for high-efficiency thermal emitter optimization,” Appl. Phys. Rev. 7, 021407 (2020). [CrossRef]
43. F. Wen, J. Jiang, and J. A. Fan, “Robust freeform metasurface design based on progressively growing generative networks,” ACS Photon. 7, 2098–2104 (2020). [CrossRef]
44. W. Maass, “Networks of spiking neurons: the third generation of neural network models,” Neural Netw. 10, 1659–1671 (1997). [CrossRef]
45. W. Gerstner and W. M. Kistler, Spiking Neuron Models: Single Neurons, Populations, Plasticity (Cambridge University, 2002).
46. D. Liu, Y. Tan, E. Khoram, and Z. Yu, “Training deep neural networks for the inverse design of nanophotonic structures,” ACS Photon. 5, 1365–1369 (2018). [CrossRef]
47. S. So, J. Mun, and J. Rho, “Simultaneous inverse design of materials and structures via deep learning: demonstration of dipole resonance engineering using core-shell nanoparticles,” ACS Appl. Mater. Interfaces 11, 24264–24268 (2019). [CrossRef]
48. Y. Chen, J. Zhu, Y. Xie, N. Feng, and Q. H. Liu, “Smart inverse design of graphene-based photonic metamaterials by an adaptive artificial neural network,” Nanoscale 11, 9749–9755 (2019). [CrossRef]
49. Y. Qu, L. Jing, Y. Shen, M. Qiu, and M. Soljačić, “Migrating knowledge between physical scenarios based on artificial neural networks,” ACS Photon. 6, 1168–1174 (2019). [CrossRef]
50. I. Sajedian, T. Badloe, H. Lee, and J. Rho, “Deep Q-network to produce polarization-independent perfect solar absorbers: a statistical report,” Nano Converg. 7, 26 (2020). [CrossRef]
51. I. Sajedian, H. Lee, and J. Rho, “Double-deep Q-learning to increase the efficiency of metasurface holograms,” Sci. Rep. 9, 10899 (2019). [CrossRef]
52. I. Sajedian, H. Lee, and J. Rho, “Design of high transmission color filters for solar cells directed by deep Q-learning,” Sol. Energy 195, 670–676 (2020). [CrossRef]
53. H. Wang, Z. Zheng, C. Ji, and L. J. Guo, “Automated multi-layer optical design via deep reinforcement learning,” Mach. Learn.: Sci. Technol. 2, 025013 (2020). [CrossRef]
54. I. Malkiel, M. Mrejen, A. Nagler, U. Arieli, L. Wolf, and H. Suchowski, “Plasmonic nanostructure design and characterization via deep learning,” Light Sci. Appl. 7, 60 (2018). [CrossRef]
55. T. Zhang, J. Wang, Q. Liu, J. Zhou, J. Dai, X. Han, Y. Zhou, and K. Xu, “Efficient spectrum prediction and inverse design for plasmonic waveguide systems based on artificial neural networks,” Photon. Res. 7, 368–380 (2019). [CrossRef]
56. C. C. Nadell, B. Huang, J. M. Malof, and W. J. Padilla, “Deep learning for accelerated all-dielectric metasurface design,” Opt. Express 27, 27523–27535 (2019). [CrossRef]
57. G. Alagappan and C. E. Png, “Deep learning models for effective refractive indices in silicon nitride waveguides,” J. Opt. 21, 035801 (2019). [CrossRef]
58. M. H. Tahersima, K. Kojima, T. Koike-Akino, D. Jha, B. Wang, C. Lin, and K. Parsons, “Deep neural network inverse design of integrated photonic power splitters,” Sci. Rep. 9, 1368 (2019). [CrossRef]
59. Y. Long, J. Ren, Y. Li, and H. Chen, “Inverse design of photonic topological state via machine learning,” Appl. Phys. Lett. 114, 181105 (2019).
60. L. Pilozzi, F. A. Farrelly, G. Marcucci, and C. Conti, “Machine learning inverse problem for topological photonics,” Commun. Phys. 1, 57 (2018). [CrossRef]
61. G. Alagappan and C. E. Png, “Modal classification in optical waveguides using deep learning,” J. Mod. Opt. 66, 557–561 (2018). [CrossRef]
62. I. Sajedian, T. Badloe, and J. Rho, “Optimization of colour generation from dielectric nanostructures using reinforcement learning,” Opt. Express 27, 5874–5883 (2019). [CrossRef]
63. H. Ren, W. Shao, Y. Li, F. Salim, and M. Gu, “Three-dimensional vectorial holography based on machine learning inverse design,” Sci. Adv. 6, eaaz4261 (2020). [CrossRef]
64. C. Qian, B. Zheng, Y. Shen, L. Jing, E. Li, L. Shen, and H. Chen, “Deep-learning-enabled self-adaptive microwave cloak without human intervention,” Nat. Photonics 14, 383–390 (2020). [CrossRef]
65. Z. Liu, D. Zhu, S. P. Rodrigues, K. T. Lee, and W. Cai, “Generative model for the inverse design of metasurfaces,” Nano Lett. 18, 6570–6576 (2018). [CrossRef]
66. J. Jiang, D. Sell, S. Hoyer, J. Hickey, J. Yang, and J. A. Fan, “Free-form diffractive metagrating design based on generative adversarial networks,” ACS Nano 13, 8872–8878 (2019). [CrossRef]
67. R. Trivedi, L. Su, J. Lu, M. F. Schubert, and J. Vuckovic, “Data-driven acceleration of photonic simulations,” Sci. Rep. 9, 19728 (2019). [CrossRef]
68. T. Zahavy, A. Dikopoltsev, D. Moss, G. I. Haham, O. Cohen, S. Mannor, and M. Segev, “Deep learning reconstruction of ultrashort pulses,” Optica 5, 666–673 (2018). [CrossRef]
69. S. So and J. Rho, “Designing nanophotonic structures using conditional deep convolutional generative adversarial networks,” Nanophotonics 8, 1255–1261 (2019). [CrossRef]
70. J. A. Schuller, E. S. Barnard, W. Cai, Y. C. Jun, J. S. White, and M. L. Brongersma, “Plasmonics for extreme light concentration and manipulation,” Nat. Mater. 9, 193–204 (2010). [CrossRef]
71. D. K. Gramotnev and S. I. Bozhevolnyi, “Plasmonics beyond the diffraction limit,” Nat. Photonics 4, 83–91 (2010). [CrossRef]
72. J. Lin, J. B. Mueller, Q. Wang, G. Yuan, N. Antoniou, X. C. Yuan, and F. Capasso, “Polarization-controlled tunable directional coupling of surface plasmon polaritons,” Science 340, 331–334 (2013). [CrossRef]
73. Z. Cai, Y. Xu, C. Wang, and Y. Liu, “Polariton photonics using structured metals and 2D materials,” Adv. Opt. Mater. 8, 1901090 (2019). [CrossRef]
74. Y. Liu, S. Palomba, Y. Park, T. Zentgraf, X. Yin, and X. Zhang, “Compact magnetic antennas for directional excitation of surface plasmons,” Nano Lett. 12, 4853–4858 (2012). [CrossRef]
75. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” in Advances in Neural Information Processing Systems (2014), pp. 2672–2680.
76. D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” in 2nd International Conference on Learning Representations (ICLR) (2014), pp. 1–14.
77. S. J. Wetzel, “Unsupervised learning of phase transitions: from principal component analysis to variational autoencoders,” Phys. Rev. E 96, 022140 (2017). [CrossRef]
78. W. Ma and Y. Liu, “A data-efficient self-supervised deep learning model for design and characterization of nanophotonic structures,” Sci. China Phys. Mech. Astron. 63, 284212 (2020). [CrossRef]
79. Z. Liu, L. Raju, D. Zhu, and W. Cai, “A hybrid strategy for the discovery and design of photonic structures,” IEEE J. Emerg. Sel. Top. Circuits Syst. 10, 126–135 (2020). [CrossRef]
80. Q. Zhang, C. Liu, X. Wan, L. Zhang, S. Liu, Y. Yang, and T. J. Cui, “Machine-learning designs of anisotropic digital coding metasurfaces,” Adv. Theor. Simul. 2, 1800132 (2018). [CrossRef]
81. P. R. Wiecha and O. L. Muskens, “Deep learning meets nanophotonics: a generalized accurate predictor for near fields and far fields of arbitrary 3D nanostructures,” Nano Lett. 20, 329–338 (2020). [CrossRef]
82. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770–778.
83. O. Ronneberger, P. Fischer, and T. Brox, “U-net: convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (2015), pp. 234–241.
84. D. Balduzzi, M. Frean, L. Leary, J. P. Lewis, K. W. Ma, and B. McWilliams, “The shattered gradients problem: if resnets are the answer, then what is the question?” in Proceedings of the 34th International Conference on Machine Learning (2017), pp. 342–350.
85. M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: inverted residuals and linear bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 4510–4520.
86. P. R. Wiecha, A. Lecestre, N. Mallet, and G. Larrieu, “Pushing the limits of optical information storage using deep learning,” Nat. Nanotechnol. 14, 237–244 (2019). [CrossRef]
87. A. Turpin, I. Vishniakou, and J. D. Seelig, “Light scattering control in transmission and reflection with neural networks,” Opt. Express 26, 30911–30929 (2018). [CrossRef]
88. D. E. Goldberg and J. H. Holland, Genetic Algorithms and Machine Learning (Springer, 1988).
89. J. H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence (MIT, 1992).
90. L. Shen, Z. Ye, and S. He, “Design of two-dimensional photonic crystals with large absolute band gaps using a genetic algorithm,” Phys. Rev. B 68, 035109 (2003). [CrossRef]
91. S. Preble, M. Lipson, and H. Lipson, “Two-dimensional photonic crystals designed by evolutionary algorithms,” Appl. Phys. Lett. 86, 061111 (2005). [CrossRef]
92. T. Feichtner, O. Selig, M. Kiunke, and B. Hecht, “Evolutionary optimization of optical antennas,” Phys. Rev. Lett. 109, 127701 (2012). [CrossRef]
93. C. Wang, S. Yu, W. Chen, and C. Sun, “Highly efficient light-trapping structure design inspired by natural evolution,” Sci. Rep. 3, 1025 (2013). [CrossRef]
94. M. D. Huntington, L. J. Lauhon, and T. W. Odom, “Subwavelength lattice optics by evolutionary design,” Nano Lett. 14, 7195–7200 (2014). [CrossRef]
95. Z. Liu, D. Zhu, K. T. Lee, A. S. Kim, L. Raju, and W. Cai, “Compounding meta-atoms into metamolecules with hybrid artificial intelligence techniques,” Adv. Mater. 32, 1904790 (2020). [CrossRef]
96. P. I. Borel, A. Harpøth, L. H. Frandsen, M. Kristensen, P. Shi, J. S. Jensen, and O. Sigmund, “Topology optimization and fabrication of photonic crystal structures,” Opt. Express 12, 1996–2001 (2004). [CrossRef]
97. J. S. Jensen and O. Sigmund, “Topology optimization for nano-photonics,” Laser Photon. Rev. 5, 308–321 (2011). [CrossRef]
98. Z. Lin, B. Groever, F. Capasso, A. W. Rodriguez, and M. Lončar, “Topology-optimized multilayered metaoptics,” Phys. Rev. Appl. 9, 044030 (2018). [CrossRef]
99. R. Matzen, J. S. Jensen, and O. Sigmund, “Topology optimization for transient response of photonic crystal structures,” J. Opt. Soc. Am. B 27, 2040–2050 (2010). [CrossRef]
100. J. Jiang and J. A. Fan, “Global optimization of dielectric metasurfaces using a physics-driven neural network,” Nano Lett. 19, 5366–5372 (2019). [CrossRef]
101. T. Phan, D. Sell, E. W. Wang, S. Doshay, K. Edee, J. Yang, and J. A. Fan, “High-efficiency, large-area, topology-optimized metasurfaces,” Light Sci. Appl. 8, 48 (2019). [CrossRef]
102. M. Mansouree, H. Kwon, E. Arbabi, A. McClung, A. Faraon, and A. Arbabi, “Multifunctional 2.5D metastructures enabled by adjoint optimization,” Optica 7, 77–84 (2020). [CrossRef]
103. S. Molesky, Z. Lin, A. Y. Piggott, W. Jin, J. Vucković, and A. W. Rodriguez, “Inverse design in nanophotonics,” Nat. Photonics 12, 659–670 (2018). [CrossRef]
104. L. Li, H. Ruan, C. Liu, Y. Li, Y. Shuang, A. Alù, C. W. Qiu, and T. J. Cui, “Machine-learning reprogrammable metasurface imager,” Nat. Commun. 10, 1082 (2019). [CrossRef]
105. Z. A. Kudyshev, A. V. Kildishev, V. M. Shalaev, and A. Boltasseva, “Machine learning–assisted global optimization of photonic devices,” Nanophotonics 10, 371–383 (2020). [CrossRef]
106. R. Patel, K. Roy, J. Choi, and K. J. Han, “Generative design of electromagnetic structures through Bayesian learning,” IEEE Trans. Magn. 54, 9900138 (2018). [CrossRef]
107. Y. Long, J. Ren, and H. Chen, “Unsupervised manifold clustering of topological phononics,” Phys. Rev. Lett. 124, 185501 (2020). [CrossRef]
108. D. Marković, A. Mizrahi, D. Querlioz, and J. Grollier, “Physics for neuromorphic computing,” Nat. Rev. Phys. 2, 499–510 (2020). [CrossRef]
109. Y. van de Burgt, A. Melianas, S. T. Keene, G. Malliaras, and A. Salleo, “Organic electronics for neuromorphic computing,” Nat. Electron. 1, 386–397 (2018). [CrossRef]
110. J. Torrejon, M. Riou, F. A. Araujo, S. Tsunegi, G. Khalsa, D. Querlioz, P. Bortolotti, V. Cros, K. Yakushiji, A. Fukushima, and H. Kubota, “Neuromorphic computing with nanoscale spintronic oscillators,” Nature 547, 428–431 (2017). [CrossRef]
111. D. Brunner, S. Reitzenstein, and I. Fischer, “All-optical neuromorphic computing in optical networks of semiconductor lasers,” in IEEE International Conference on Rebooting Computing (ICRC) (2016), pp. 1–2.
112. A. Katumba, M. Freiberger, F. Laporte, A. Lugnan, S. Sackesyn, C. Ma, J. Dambre, and P. Bienstman, “Neuromorphic computing based on silicon photonics and reservoir computing,” IEEE J. Sel. Top. Quantum Electron. 24, 8300310 (2018). [CrossRef]
113. Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441–446 (2017). [CrossRef]
114. H. Bagherian, S. Skirlo, Y. Shen, H. Meng, V. Ceperic, and M. Soljacic, “On-chip optical convolutional neural networks,” arXiv:1808.03303 (2018).
115. J. Bueno, S. Maktoobi, L. Froehly, I. Fischer, M. Jacquot, L. Larger, and D. Brunner, “Reinforcement learning in a large-scale photonic recurrent neural network,” Optica 5, 756–760 (2018). [CrossRef]
116. T. Zhang, J. Wang, Y. Dan, Y. Lanqiu, J. Dai, X. Han, X. Sun, and K. Xu, “Efficient training and design of photonic neural network through neuroevolution,” Opt. Express 27, 37150–37163 (2019). [CrossRef]
117. E. Khoram, A. Chen, D. Liu, L. Ying, Q. Wang, M. Yuan, and Z. Yu, “Nanophotonic media for artificial neural inference,” Photon. Res. 7, 823–827 (2019). [CrossRef]
118. X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science 361, 1004–1008 (2018). [CrossRef]
119. C. Qian, X. Lin, X. Lin, J. Xu, Y. Sun, E. Li, B. Zhang, and H. Chen, “Performing optical logic operations by a diffractive neural network,” Light Sci. Appl. 9, 59 (2020). [CrossRef]