## Abstract

An approach to optimizing the *Q* factors of
two-dimensional photonic crystal (2D-PC) nanocavities based on deep
learning is hereby proposed and demonstrated. We prepare a data set
consisting of 1000 nanocavities generated by randomly displacing the
positions of many air holes in a base nanocavity and calculate their
*Q* factors using a first-principles method. We train a
four-layer neural network including a convolutional layer to recognize
the relationship between the air holes’ displacements and the
*Q* factors using the prepared data set. After the
training, the neural network is able to estimate the
*Q* factors from the air holes’ displacements with an
error of 13% in standard deviation. Crucially, the trained neural
network can estimate the gradient of the *Q* factor
with respect to the air holes’ displacements very quickly using
back-propagation. A nanocavity structure with an extremely high
*Q* factor of 1.58 × 10^{9} was successfully
obtained by optimizing the positions of 50 holes over ~10^{6}
iterations, taking advantage of the very fast evaluation of the
gradient in high-dimensional parameter spaces. The obtained
*Q* factor is more than one order of magnitude higher
than that of the base cavity and more than twice that of the highest
*Q* factors reported so far for cavities with similar
modal volumes. This approach can optimize 2D-PC structures over a
parameter space of a size unfeasibly large for previous optimization
methods that were based solely on direct calculations. We believe that
this approach is also useful for improving other optical
characteristics.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Photonic nanocavities based on artificial defects in two-dimensional (2D)
photonic crystal (PC) slabs have been used to realize high quality
(*Q*) factors from ~thousand [1], tens of thousands [2], hundreds of thousands [3], millions [4–9], to more than ten million [10] together with small modal volumes
(*V*_{cav}) of the order of one cubic wavelength or
less. A higher *Q* factor increases the storage time of
photons and light-matter interaction time, while a smaller
*V*_{cav} enhances the light matter interaction
strength and decreases the footprint. There have been various efforts to
increase the *Q* factors and/or
*Q*/*V* of 2D-PC slab nanocavities both in
theory and experiments [2–17]. The developed PC nanocavities have been used
for various applications including ultracompact channel add/drop devices
[1], nano-lasers [18], laser arrays for sensing [19], strongly coupled light-matter
systems in solids [20,21], ultra-low-power consumption optical
bi-stable systems [22],
ultracompact and low-threshold all-Si Raman lasers [23], and photonic buffer memories [24–26]. However, further improvement is desirable
for the realization of more advanced applications.

The fundamental design principle to increase the designed
*Q* factor (*Q*_{des}) of 2D-PC
nanocavities is well known: the wavevector components of the cavity
electro-magnetic field within the light cone should be decreased as much
as possible to reduce radiation loss [11]. Many design methods including Gaussian envelope approaches
[2,3], analytic inverse problem approaches [12,13], genetic algorithms [14,15],
and leaky position visualization [16] have been proposed for obtaining a higher
*Q*_{des} while keeping a small
*V*_{cav}. For example, a five-step heterostructure
nanocavity comprising a defect waveguide with lattice constant modulation
analytically designed to realize a Gaussian envelope function for the mode
field was reported to have a *Q*_{des} of 7 ×
10^{8} and a *V*_{cav} of 1.3 cubic
wavelength in the material (λ/n)^{3} with an assistance of leaky
mode visualization technique [12].
In addition, a two-step heterostructure nanocavity with a
*Q*_{des} of 1.4 × 10^{8} and a
*V*_{cav} of 1.5 (λ/n)^{3} was reported
[9], where the positions of the
eight air holes near the center of the cavity (two parameters) were tuned
using the leaky position visualization method [16]. Recently, the L4/3 cavity, in which positions of 22
air holes (11 parameters) were tuned by genetic algorithms [14,15],
was reported to have a *Q*_{des} of 2.1 ×
10^{7} and a *V*_{cav} of 0.32
(λ/n)^{3} [17]. Though
these approaches were successful, there still remain numbers of unused
design freedoms in PC nanocavities that are difficult to fully utilize due
to the large cost involved in calculating the gradient of the
*Q* factor in a high dimensional structural parameter space
in each step of structural optimization.

In this paper, we propose an approach to optimize 2D-PC nanocavities based
on deep learning of the relationship between the nanocavities’ structures
and their *Q* factors. We prepare a data set consisting of
1000 different nanocavities whose air hole’s positions are randomly but
symmetrically displaced. Their *Q* factors are calculated
by a first-principles method where multiple parallel computation
techniques can be fully utilized to reduce the computation time. Next, we
train a four-layer artificial neural network (NN) including a
convolutional layer using the data set to recognize the relationship
between the air hole displacement patterns and their *Q*
factors. The trained NN is able to predict the gradient of the
*Q* factor with respect to the air holes’ displacements at
a speed extremely faster than the first-principles calculation. This is
used to optimize the displacements of many air holes (50 air holes, 27
parameters) for a large number of repetitions (>1,000,000). This
optimization method demonstrates a very high *Q* factor
that exceeds one billion.

## 2. Framework

In general, automatic structural optimization with respect to target characteristic value(s) requires at least three steps; (a) select a set of parameters that represents the structure to be optimized, (b) calculate the gradient of the target characteristic value(s) with respect to the structural parameters, and (c) modify the structural parameters based on the calculated gradient. (b) and (c) are repeated until the target value(s) saturates. It is important to select all parameters that have a strong correlation with the target characteristic(s) in step (a). In step (b), fast evaluation of the gradient is required to ensure enough repetition of the optimization. However, it is difficult to fulfill these requirements when the structures to be optimized have large degrees of freedom and requires a large computation cost for the evaluation of the gradient.

This situation applies to *Q* factor optimization in 2D-PC
nanocavities, and we utilize a deep neural network (DNN) [27] to resolve these requirements.
(Structural optimization of optical nanostructures using neural network
has been reported for multilayered films [28] and metamaterials [29,30].) A DNN implements a
complex non-linear function that associates a fixed-size input to a fixed
size output through multiple units connected from layer to layer by linear
and nonlinear operations. Because a DNN contains a large number of
internal adjustable parameters (such as connection weights and biases) for
tuning, it can approximate various input-output relationships once the
internal parameters are properly tuned using many sets of example
input-output data (training data). In particular, a DNN that contains
convolutional layer(s), called a convolutional network (CNN), is very
effective for learning the spatial features of input data [31]. Because such a CNN is effective for
image processing [32], it is
considered useful for learning the relationship between the structure of
2D-PC nanocavities and their *Q* factors. Once we obtain a
properly trained CNN using a data set prepared by first-principles
calculations, the gradient of the *Q* factor with respect
to the structural parameters can be estimated much faster than the direct
calculations. Therefore, optimization of a number of parameters can be
repeated many times to fully exert the potential of 2D-PC structures,
whereas this is impossible using method that are solely based on a direct
calculation due to the exponential increase in computational costs with
increasing dimensions of the structural parameters. Our strategy is
summarized as follows:

- I. Select a base cavity structure to be optimized.
- II. Select the type of structural parameter to be optimized. (e.g. air hole position, air hole size, air hole shape)
- III. Generate many 2D-PC nanocavities from the base structure by randomly fluctuating all structural parameters selected in (II) in an area much larger than the cavity field.
- IV. Calculate the Q factors of the nanocavities prepared in (III) by a first principles method, where many structures can be calculated separately in a multiple parallel fashion to reduce the computation time.
- V. Prepare an NN to learn the relationship between the structural parameters and the Q factors.
- VI. Train the NN using the relationship between the subset of the parameters selected from (III) and the Q factors calculated in (IV).
- VII. Find the best set of parameters that minimizes the error between the Q factors predicted by the NN and those calculated by the first principles method.
- VIII. Starting from an initial cavity structure, gradually change the structural parameters selected in (VII) using the gradient of the Q factor with respect to the parameters predicted by the trained NN many times until the Q factor saturates.
- IX. Check the true Q factor of the optimized structure obtained in (VIII) by the first principles calculation.

## 3. Results

In this section, we describe the optimization of a two-step heterostructure
nanocavity as an example. Fig.
1 shows the structure of a base nanocavity [step (I)] that is a
two-step heterostructure nanocavity made of silicon slab with a thickness
of 220 nm. The radii of air holes were 110 nm, and a line defect waveguide
was formed by filling a row of air holes. The base lattice constant,
*a* was 410 nm, and those around the center of the
nanocavity were modulated by 3 nm in two steps to confine light by the
mode gap effect [3], as shown in
Fig. 1. The eight air holes shown
in the figure were shifted from their original positions by an order of
*a*/1000 through a manual tuning process based on the leaky
position visualization technique [16]. This manual tuning process increased the *Q*
factor of the nanocavity from 50 million (before tuning) to 140 million
(after tuning) [10], however
*V*_{cav} was almost unchanged [~1.5
(λ/n)^{3}].

#### 3.1 Preparation of data set for learning

In step (II), we selected the displacements of the air holes as the
parameters to be optimized. This is because the displacement of the
air holes can be implemented in the fabrication process more
accurately than other parameters such as the air hole radii or shapes.
The air holes’ positions can be accurately controlled in the electron
beam writing process, while their radii and shapes are largely
influenced by an etching process that is more difficult to control. In
step (III), we added random displacements to all air holes in the
*x* and *y* directions in such a way
that the symmetry of the structure was maintained. We maintained the
symmetry because an asymmetry of a PC cavity increases radiation loss
[11]. The induced random
displacements obeyed a Gaussian distribution with a standard deviation
of *a*/1000. This magnitude of fluctuation was
determined from experience of the manual optimization mentioned above
[10]. We generated 1000
randomly-fluctuated nanocavity structures using the above
procedure.

In step (IV), we calculated the electromagnetic field and
*Q* factors of the fundamental resonant mode for the
1000 structures using the three-dimensional (3D) finite difference
time domain (FDTD) method. We used sub cells with a size of
$\left(\text{\delta x},\text{\delta y},\text{\delta z}\right)=\left(\frac{a}{10\times 400},\frac{\sqrt{3}a}{16\times 400},\frac{0.5366\times a}{6\times 50}\right)$ for the discretization of the dielectric
constant distributions. Next, the dielectric constants of the cells
for FDTD calculation were determined by averaging the sub cells in
each FDTD cell. The size of the FDTD cell was $\left(\text{\Delta x},\text{\Delta y},\text{\Delta z}\right)=\left(\frac{a}{10},\frac{\sqrt{3}a}{16},\frac{0.5366\times a}{6}\right)$. Therefore, change of the order of
a/4000 in dielectric constant distribution according to the air holes’
displacements can be reflected in the FDTD calculation.

A histogram of the calculated *Q* factors
(*Q*_{FDTD}) is plotted in Fig. 2, and examples of the generated cavity structures are shown in
Fig. 3 with their electric fields (*E*_{y}) and
*Q*_{FDTD}. It can be seen from Fig. 2 that
*Q*_{FDTD} is distributed in a range of almost
two-orders of magnitudes, from ~10^{6} to more than 3 ×
10^{8}, however, it was mainly concentrated in the region
below ~3 × 10^{7}. In addition, there is a steep drop at the
lower-*Q* side of the peak. We thought that this
nonuniform distribution of *Q*_{FDTD} is
relatively difficult to be learned by an NN [27], and transformed
*Q*_{FDTD} to
log_{10}(*Q*_{FDTD}), as shown in Fig. 2(b) that shows a more uniform
distribution similar to the Gaussian distribution. As a result, we
used the relationships between the air hole’s displacement patterns
and log_{10}(*Q*_{FDTD})s of the 1000
prepared structures as data set for the machine learning in step
(VI).

#### 3.2 Configuration of neural network

Fig. 4 shows the configuration of the neural network prepared in step
(V). The input data were a set of displacement vectors
${\stackrel{\u20d7}{d}}_{i,j}$ of air holes [from the positions of a
base cavity (I)] in an *N*_{x}
(*a*) × *N*_{y} (rows) area
around the center of the nanocavity that is normalized by the unit of
*a*/1000, where *i* and
*j* are the discrete *x* and
*y* coordinates of the air holes. The first layer was a
convolutional layer [31] in
which ${\stackrel{\u20d7}{d}}_{i,j}$ in a local area of the input are
summarized into one unit in the next layer by element-wise
multiplication with a weight matrix of size
*N*_{fw} (holes) ×
*N*_{fh} (rows) (called filter) and summation.
By iteratively shifting the application area of this operation, where
the amount of shift is defined as a stride, the input is convoluted
with the filter to be summarized into a feature map. We used 50
filters of size 3 × 5 ( × 2 channels: *x* and
*y* displacements) so that the second layer contained
50 different summaries (feature maps) of the input, where the strides
in the *x* and *y* directions were 1 and
2, respectively.

The second layer was fully connected to the third layer with 200 units
through a rectified linear unit (ReLU [33]) and Affine transformation (multiplication with a weight
matrix followed by summation with a bias vector). The third layer was
fully connected to the fourth layer with 50 units through ReLU, random
information selection units (Dropout [34]), and Affine transformation. Finally, the information in
the fourth layer was summarized into the one output unit through ReLU
and Affine transformation. This output unit was supposed to predict
log_{10}(*Q*_{FDTD}).

#### 3.3. Training of neural network

### 3.3.1 Loss function

In step (VI), we trained this neural network using 900 data among
the 1000 prepared in steps (III and IV) and left the remaining 100
data as a test data set. A test data set is necessary to avoid
overfitting that is a situation in which an NN is too optimized to
the training examples so that it cannot predict meaningful answers
for other inputs. It is important to maximize the generalization
ability of an NN that is an ability to predict meaningful answers
to new inputs it did not see during the training. Therefore, the
generalization ability of an NN should be checked using a test
data set at intervals [27].
For the training, we set the loss function, *L*
as

The first term in Eq.
(1) represents the deviation from the true answer. The
second term was introduced to penalize large connection weights
${w}_{i}$ in the network, where the summation
is taken over all the weights in the network. This additional term
is effective in preventing overfitting (the weight decay method
[35]), and $\text{\lambda}$ is the control parameter (we used
$\text{\lambda}=$ 0.001). We randomly selected one set
of “${\stackrel{\u20d7}{d}}_{i,j}$
pattern”-“log_{10}(*Q*_{FDTD})”
from the training data set, and gradually changed the internal
parameters of the NN to reduce *L* (stochastic
gradient method [27]) based
on the gradient of *L* with respect to the internal
parameters that was obtained using the back-propagation method
[36]. We also applied the
Momentum optimization method to speed up the convergence [37], where the learning rate and
the momentum decay rate were set to 0.001 and 0.9, respectively.
The learning step was repeated until *L* of the
test data set converges. The accuracy of the prediction was
evaluated as the standard deviations of the output [ =
log_{10}(*Q*_{NN})] from the true
answer [log_{10}(*Q*_{FDTD})] for
the test data set (${\sigma}_{\text{test}}$) and training data set
(${\sigma}_{\text{train}}$). These values were further
converted into more comprehensive average prediction errors for
the *Q* values (*E*_{Q})
using the following equation:

*Q*

_{NN}’s are statistically distributed within (1 ±

*E*) ×

_{Q}*Q*

_{FDTD}’s

### 3.3.2 Training example

An example of a learning curve [iteration of learning
(optimization) vs. prediction error] is shown in Fig. 5(a), where the input area size (*N*_{x},
*N*_{y}) = (10, 5). It can be seen from the
figure that *E _{Q}* for the training and
test data sets are initially more than 80%, however, it decreases
to less than 20% after ~2000 learning iterations. It is natural
that

*E*

_{Q}_{test}is always larger than

*E*

_{Q}_{train}because the NN has never learned the test data set. Nevertheless, the minimum

*E*

_{Q}_{test}became as small as ~16% within 2 × 10

^{5}iterations. The correlation between

*Q*

_{FDTD}and

*Q*

_{NN}for the test and training data sets (for the case with minimum

*E*

_{Q}_{test}) is shown in Fig. 5(b) and 5(c), respectively. Good correlation (with a correlation coefficient of 0.92) was obtained even for the test data set. This result demonstrates successful achievement of the generalization ability, at least within the parameter space in which the prepared structure was distributed (dimensions: 10 × 5 × 2, range of each displacement: ~ ±

*a*/1000). We noticed that the correlation is better for the lower-

*Q*region, and that the deviation from

*Q*

_{FDTD}increases in the higher-

*Q*region. This is because the number of training data sets is much smaller in the higher-

*Q*region compared to the lower

*Q*region as shown in Fig. 2.

### 3.3.3 Dependence on input area size

Next, we trained the NN by changing the input area size
(*N _{x}*,

*N*), and plotted the minimum

_{y}*E*

_{Q}_{test}(obtained during 2 × 10

^{5}iterations of the learning steps) as a function of

*N*and

_{x}*N*in Fig. 6. It can be observed from the figure that the prediction error is higher than 60% when (

_{y}*N*,

_{x}*N*) is as small as (2, 5). The prediction error decreases as the input area size increases, and the case with (

_{y}*N*,

_{x}*N*) = (13, 5) shows the minimum prediction error of ~13%, where the correlation coefficients between

_{y}*Q*

_{FDTD}and

*Q*

_{NN}for the test and training data sets were 0.96 and 0.99, respectively. However, the error increases again for a larger input area size. It is considered that the learning process was disturbed when the air holes’ displacements that have small correlations with

*Q*

_{FDTD}were input in the NN because they acted as noise for the learning process. This provides a hint: Fig. 6 allows us to pick up the structural parameters that have strong correlations with the target value, that is, the parameters that are effective in the optimization process. In this case, we decided to optimize the displacements of airholes in the input area with (

*N*,

_{x}*N*) = (13,5) in step (VIII) that has 27 parameters in total. It can be seen from the lower-right inset of Fig. 6 that this area corresponds to the area where the electric field of the cavity is mainly distributed. Therefore, we can omit this step (VII) by simply setting the input area size to be the main area of cavity field distribution from the next time.

_{y}#### 3.4. Structural optimization by trained neural network

### 3.4.1. Loss function

In step (VIII), we performed a structural optimization of the
nanocavity using the gradient method, where the gradient was
calculated by the trained NN. More precisely, we took advantage of
the error back propagation method [36] to enable a high-speed calculation of the gradient.
The back-propagation method is extremely effective for calculating
the gradient of loss with respect to the internal parameters of an
NN, which was already used in the training process (VI). This
method can also be used to rapidly obtain the gradient of loss
with respect to the input parameters (i.e., the air holes’
displacements). We set the loss $L\text{'}$ as ${\left|{\mathrm{log}}_{10}{Q}_{NNW}-{\mathrm{log}}_{10}{Q}_{target}\right|}^{2}$, and calculated the gradient of
$L\text{'}$ with respect to ${\stackrel{\u20d7}{d}}_{i,j}$ using the same framework used for
the training process, where ${Q}_{target}$ was set to a very high constant
value (10^{10}). We also added an artificial loss to
penalize large displacements to constrain the air holes from
moving far away from the parameter space that the NN learned in
step (VI), where the prediction error was small:

*L*’ based on the Momentum [37] method:

*i*,

*j*in the

*n*-th step (${\overrightarrow{v}}_{i,j}^{0}=0)$, $\gamma $ is a control parameter called the momentum damping factor ( = 0.9), and ${o}_{r}$ is the optimization rate ( = 1 × 10

^{−5}). This process was iterated 10

^{6}times.

### 3.4.2 Optimization curve

In 3.3.3, we decided to use NN with an input area size of
(*N _{x}*,

*N*) = (13,5). Consequently, only the displacements of airholes in this area were recognized by the NN and modified in the optimization step. Fig. 7(a) shows the evolution of

_{y}*Q*

_{NN}during the optimization for various $\text{\lambda}$’s from 0.01 to 1, where the initial structure ${\overrightarrow{d}}_{i,j}^{0}$ was set to the structure that had the highest

*Q*

_{FDTD}[ = 3.8 × 10

^{8}, Fig. 3(a)] among the 1000 randomly prepared structures in step (III-IV). The displacements of the air holes for this initial structure recognized by the

*NN*are shown in Fig. 7(b). We also optimized the structure using $\text{\lambda '}=0.001$, however, the obtained result was identical to the case with $\text{\lambda '}=0.01$. It is seen in the figure that

*Q*

_{NN}increased from the original value after optimization in the cases with $\text{\lambda '}\le 0.1$.

*Q*

_{NN}slightly decreased from the original value after optimization in the cases with $\text{\lambda '}\ge 0.5$. This is considered because the high additional loss of $\frac{1}{2}{\lambda}^{\prime}{\displaystyle \sum _{i,j}{\left|{\overrightarrow{d}}_{i,j}\right|}^{2}}$ inhibited large deformation from the original structure (Fig. 1). We also performed the optimization process with a completely different randomly created initial structure (denoted as fluc46398753, Fig. 7(c)) with $\text{\lambda '}=0.05$. The result is plotted in Fig. 7(a) as indicated by the brown solid line. The initial

*Q*

_{NN}of this structure is as low as 1 × 10

^{7}, however, it increased to 4.84 × 10

^{8}after optimization. The final

*Q*

_{NN}is almost the same as the case started from the structure in Fig. 3(a) with $\text{\lambda '}=0.05$, and the obtained structures were also almost identical [see Fig. 8(b) and 8(f)]. Therefore, the selection of the initial structure is not so important.

### 3.4.3 Validation by FDTD

In step (IX), we calculated the *Q* factors for the
structures obtained in step (VIII) using the 3D-FDTD method. Here,
the displacements of air holes outside the NN input area were set
to zero because these air holes have small correlation with the Q
factor as discussed in 3.3.3. The results are summarized in Fig. 8, where the structure after
optimization, *Q*_{FDTD},
*Q*_{NN}, electric field distribution
(*E*_{y}), and cavity modal volume
(*V*_{cav}) are shown. (Precise
displacement values of air holes are provided in
Dataset 1
[38].) It can be seen from
the figure that an extremely high
*Q*_{FDTD} of 1.58 × 10^{9} was
obtained with $\text{\lambda '}=\text{}0.1$ [Fig. 8(c)]. This *Q*_{FDTD} is one
order of magnitude higher than the manually optimized two-step
heterostructure nanocavity (*Q*_{FDTD} =
1.37 × 10^{8}, Fig.
1), and more than twice the highest
*Q*_{FDTD} of the 2D-PC nanocavity reported
so far (*Q*_{FDTD} = 7 × 10^{8}
[11]). The successful
achievement of such an extremely high
*Q*_{FDTD} demonstrates the effectiveness
of the proposed optimization method, where large degrees of
freedom of the 2D-PC structure were effectively utilized, as can
be seen from a comparison between Fig. 1 and Fig.
8.

## 4. Discussion

#### 4.1 Discussion of results

It can be observed from Fig.
8(c) that *Q*_{NN} is less than 1/3 of
*Q*_{FDTD}. This is understood from the
response of the trained NN shown in Fig. 5(b) and 5(c), where *Q*_{NN}
tends to be lower than *Q*_{FDTD} as
*Q*_{FDTD} increases. As discussed before, the
number of data with *Q*_{FDTD} > 1 ×
10^{8} is rare (40 samples among 1000), and there are many
data samples with lower *Q*_{FDTD} (Fig. 2) so that the prediction tends
to be low for high *Q*_{FDTD} structures.
Nevertheless, the fact that the structure optimized by this method has
a larger *Q*_{FDTD} than the initial structure
indicates that the *direction* of the gradient of
*Q*_{FDTD} with respect to the air holes’
displacements were properly evaluated by the trained NN.

It is also interesting that the highest
*Q*_{FDTD} was achieved by constraining the
magnitude of the airholes’ displacements to some extent by increasing
$\text{\lambda '}$ in the loss function *L*’
[Eq. (3))]. In Fig. 8(a) to 8(c),
*Q*_{FDTD} increases from 4.48 × 10^{8}
to 1.58 × 10^{9} as $\text{\lambda '}$ increases from 0.01 to 0.1, and the
corresponding air holes’ displacements decreases. The structure in
Fig. 8(c) shows a much larger
*Q*_{FDTD} than that of Fig. 8(a) because the accuracy of the
*Q* prediction becomes lower as the displacements of
the air holes move away from the center of the parameter space that
the NN has learned (i.e. ${\stackrel{\u20d7}{d}}_{\text{i},\text{j}}$ = 0). For the case with $\text{\lambda '}$ >0.5, the magnitudes of the
displacements are too constraint to obtain the highest
*Q*_{FDTD}, however,
*Q*_{FDTD}’s > 1.39 × 10^{9} were
still realized.

We also compare Fig. 8(b) and
8(f). Although the initial structures for these two cases are
completely different (as can be seen in Fig. 7(b) and 7(c)), the final optimized structures
and their *Q*_{FDTD}’s are almost the same.
This result indicates that the structures obtained using this method
are globally optimized at least within and near the parameter space
that the NN learned.

#### 4.2 Comparison with other optimization methods

Finally, we compare the proposed method with the other state of the art optimization methods: (A) Genetic algorithm [14,15] and (B) leaky component visualization [16].

(A) Optimization based on genetic algorithm prepare many randomly
generated cavities (individuals) and select better individuals to
prepare individuals in the next generation with natural-selection,
cross-over and random mutations that is repeated until the
*Q* factor converges [14]. This method can optimize cavities automatically, however,
it requires the calculation of relatively large numbers of cavities.
Ref [15] reports that a few
tens of generations with 80 individuals (i.e. calculation of 1600~2400
patterns of cavities) were required to optimize 3 parameters (shift of
3 air holes) in the L3 cavity. The requirement increases as the
numbers of parameters to be optimized increases. To optimize 5
parameters in the L3 cavity, where *Q*_{FDTD}
of 4.2 million with *V*_{cav} =
0.95(λ/n)^{3} has been achieved, 100 generations were required
(i.e. calculation of ~8000 cavities). To optimize 7 parameters in the
H0 cavity, where *Q*_{FDTD} of 8.3 million with
*V*_{cav} = 0.64(λ/n)^{3} and
*Q*_{FDTD} of 1.66 million with
*V*_{cav} = 0.34(λ/n)^{3} have been
achieved, 300 generations with 120 individuals in each generation were
required (i.e. calculation of 36000 cavities).

(B) Optimization based on leaky component visualization utilizes
Fourier transformation of a cavity mode field followed by clipping of
the components within the light cone and inverse Fourier
transformation to visualize the real space region where out of plane
loss occurs. The air holes located at the leaky region were manually
tuned by scanning the parameters (e.g. positions or radii) and by
calculating the *Q* factor to obtain local maxima. This
procedure was repeated until the *Q* factor converges.
This method required less numbers of cavity patterns to optimize the
structure. In [16], we repeated
the procedure 8 times and calculated 10~20 cavities in each step to
optimize the L3 cavity (i.e. calculation of < 200 cavities), where
*Q*_{FDTD} of 5.02 million with
*V*_{cav} = 0.75(λ/n)^{3} has been
achieved by tuning 9 parameters. To optimize the H0 cavity, the
procedure was repeated four times (i.e. calculation of < 100
cavities), where *Q*_{FDTD} of 1.67 million
with *V*_{cav} = 0.31(λ/n)^{3} has been
achieved by tuning five parameters. Method (B) required much less
samples for the optimization because the most important region is
determined and optimized in each step. However, this method needs
manual tuning of parameters because the parameter(s) to be optimized
are selected manually and the scanning range is also determined
manually. In addition, there is a risk that the optimization becomes
local because only one or two parameters were tuned in each step.

In comparison to these two methods, the proposed method based on deep
learning requires relatively small numbers of sample cavities (1000)
to optimize large numbers of parameters (27 parameters), and the
optimization is performed automatically. (Learning and optimization
process consume much less time (< 2 hours) compared to the FDTD
calculation of 1000 sample cavities.) Because the features of the
cavities were recognized by a layered network, not only spatially
local features but spatially global features that relate to the
*Q* factors were separately learned in each layer of
the network that were then used to optimize the cavity structure very
efficiently. The recognition of cavity structures by deep layered
networks is the key of this optimization method, and can be applied to
more general structures even though the target cavity optimized in
this paper is different from those discussed the above. However, as
discussed in 4.1, our approach works well in the vicinity of the
parameter space where the learning data set was prepared. When the
structure needs to be modified significantly, two-step approach (rough
and fine data sets) is considered effective. In such cases, the number
of sample cavities required for optimization will increase.

## 5. Conclusion

We have proposed and demonstrated a novel approach for optimizing 2D-PC
nanocavities based on deep learning of the relationship between
nanocavities’ structures and their *Q* factors. We have
successfully trained a neural network consisting of a convolutional layer
and three fully connected layers using 1000 randomly generated
nanocavities and their *Q* factors. After the training, the
convolutional neural network was able to predict *Q*
factors from the displacement patterns of air holes with an error of 13%
in standard deviation. Structural optimization was performed by estimating
the gradient of *Q* with respect to the displacements of
the air holes using the trained neural network. We successfully obtained
a nanocavity structure with an extremely high theoretical
*Q* factor of 1.58 × 10^{9}, which is 10 times larger than
that of the manually optimized base structure and more than twice the
highest *Q* factor ever reported for 2D-PC cavities with
similar modal volumes. We attribute our unprecedentedly high
*Q* factor to the ability of our method to optimize the
nanocavity over a parameter space of a size unfeasibly large for previous
methods that were based solely on direct calculations. We believe that
this approach is effective for the optimization of various types of 2D-PC
nanocavity structures, not only for increasing *Q* factors
but also for improving other target characteristics.

## Funding

JSPS KAKENHI (15H03993); New Energy and Industrial Technology Development Organization (NEDO).

## Acknowledgment

The authors would like to thank Mr. Koki Saito for his helpful textbook on deep learning written in Japanese (Deep learning from scratch, O’Reilly Japan).

## References

**1. **S. Noda, A. Chutinan, and M. Imada, “Trapping and emission of photons by a single defect in a photonic bandgap structure,” Nature **407**(6804), 608–610 (2000).

**2. **Y. Akahane, T. Asano, B.-S. Song, and S. Noda, “High-Q photonic nanocavity in a two-dimensional photonic crystal,” Nature **425**(6961), 944–947 (2003).

**3. **B.-S. Song, S. Noda, T. Asano, and Y. Akahane, “Ultra-high-Q photonic double-heterostructure nanocavity,” Nat. Mater. **4**(3), 207–210 (2005).

**4. **T. Asano, B.-S. Song, and S. Noda, “Analysis of the experimental Q factors (~1 million) of photonic crystal nanocavities,” Opt. Express **14**(5), 1996–2002 (2006).

**5. **E. Kuramochi, M. Notomi, S. Mitsugi, A. Shinya, T. Tanabe, and T. Watanabe, “Ultrahigh-Q photonic crystal nanocavities realized by the local width modulation of a line defect,” Appl. Phys. Lett. **88**(4), 041112 (2006).

**6. **Y. Takahashi, H. Hagino, Y. Tanaka, B.-S. Song, T. Asano, and S. Noda, “High-Q nanocavity with a 2-ns photon lifetime,” Opt. Express **15**(25), 17206–17213 (2007).

**7. **E. Kuramochi, H. Taniyama, T. Tanabe, A. Shinya, and M. Notomi, “Ultrahigh-Q two-dimensional photonic crystal slab nanocavities in very thin barriers,” Appl. Phys. Lett. **93**(11), 111112 (2008).

**8. **Z. Han, X. Checoury, D. Néel, S. David, M. El Kurdi, and P. Boucaud, “Optimized design for 2×10^{6} ultra-high Q silicon photonic crystal cavities,” Opt. Commun. **283**(21), 4387–4391 (2010).

**9. **H. Sekoguchi, Y. Takahashi, T. Asano, and S. Noda, “Photonic crystal nanocavity with a Q-factor of ~9 million,” Opt. Express **22**(1), 916–924 (2014).

**10. **T. Asano, Y. Ochi, Y. Takahashi, K. Kishimoto, and S. Noda, “Photonic crystal nanocavity with a Q factor exceeding eleven million,” Opt. Express **25**(3), 1769–1777 (2017).

**11. **K. Srinivasan and O. Painter, “Momentum space design of high-Q photonic crystal optical cavities,” Opt. Express **10**(15), 670–684 (2002).

**12. **D. Englund, I. Fushman, and J. Vučković, “General recipe for designing photonic crystal cavities,” Opt. Express **13**(16), 5961–5975 (2005).

**13. **Y. Tanaka, T. Asano, and S. Noda, “Design of photonic crystal nanocavity with Q-factor of ~10^{9},” J. Lightwave Technol. **26**(11), 1532–1539 (2008).

**14. **Y. Lai, S. Pirotta, G. Urbinati, D. Gerace, M. Minkov, V. Savona, A. Badolato, and M. Galli, “Genetically designed L3 photonic crystal nanocavities with measured quality factor exceeding one million,” Appl. Phys. Lett. **104**(24), 241101 (2014).

**15. **M. Minkov and V. Savona, “Automated optimization of photonic crystal slab cavities,” Sci. Rep. **4**(1), 5124 (2014).

**16. **T. Nakamura, Y. Takahashi, Y. Tanaka, T. Asano, and S. Noda, “Improvement in the quality factors for photonic crystal nanocavities via visualization of the leaky components,” Opt. Express **24**(9), 9541–9549 (2016).

**17. **M. Minkov, V. Savona, and D. Gerace, “Photonic crystal slab cavity simultaneously optimized for ultra-high Q/V and vertical radiation coupling,” Appl. Phys. Lett. **111**(13), 131104 (2017).

**18. **M. Nomura, N. Kumagai, S. Iwamoto, Y. Ota, and Y. Arakawa, “Laser oscillation in a strongly coupled single quantum dot-nanocavity system,” Nat. Phys. **6**(4), 279–283 (2010).

**19. **S. Kita, K. Nozaki, and T. Baba, “Refractive index sensing utilizing a cw photonic crystal nanolaser and its array configuration,” Opt. Express **16**(11), 8174–8180 (2008).

**20. **T. Yoshie, A. Scherer, J. Hendrickson, G. Khitrova, H. M. Gibbs, G. Rupper, C. Ell, O. B. Shchekin, and D. G. Deppe, “Vacuum Rabi splitting with a single quantum dot in a photonic crystal nanocavity,” Nature **432**(7014), 200–203 (2004).

**21. **S. Sun, H. Kim, Z. Luo, G. S. Solomon, and E. Waks, “A single-photon switch and transistor enabled by a solid-state quantum memory,” Science **361**(6397), 57–60 (2018).

**22. **K. Nozaki, A. Shinya, S. Matsuo, Y. Suzaki, T. Segawa, T. Sato, Y. Kawaguchi, R. Takahashi, and M. Notomi, “Ultralow-power all-optical RAM based on nanocavities,” Nat. Photonics **6**(4), 248–252 (2012).

**23. **Y. Takahashi, Y. Inui, M. Chihara, T. Asano, R. Terawaki, and S. Noda, “A micrometre-scale Raman silicon laser with a microwatt threshold,” Nature **498**(7455), 470–474 (2013).

**24. **Y. Tanaka, J. Upham, T. Nagashima, T. Sugiya, T. Asano, and S. Noda, “Dynamic control of the Q factor in a photonic crystal nanocavity,” Nat. Mater. **6**(11), 862–865 (2007).

**25. **Y. Sato, Y. Tanaka, J. Upham, Y. Takahashi, T. Asano, and S. Noda, “Strong coupling between distant photonic nanocavities and its dynamic control,” Nat. Photonics **6**(1), 56–61 (2012).

**26. **R. Konoike, H. Nakagawa, M. Nakadai, T. Asano, Y. Tanaka, and S. Noda, “On-demand transfer of trapped photons on a chip,” Sci. Adv. **2**(5), e1501690 (2016).

**27. **Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature **521**(7553), 436–444 (2015).

**28. **D. Liu, Y. Tan, E. Khoram, and Z. Yu, “Training deep neural networks for the inverse design of nanophotonic structures,” ACS Photonics **5**(4), 1365–1369 (2018).

**29. **W. Ma, F. Cheng, and Y. Liu, “Deep-learning-enabled on-demand design of chiral metamaterials,” ACS Nano **12**(6), 6326–6334 (2018).

**30. **S. Inampudi and H. Mosallaei, “Neural network based design of metagratings,” Appl. Phys. Lett. **112**(24), 241102 (2018).

**31. **Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Handwritten digit recognition with a back-propagation network,” in Proceedings of Advances in Neural Information Processing Systems, pp. 396–404 (1990).

**32. **A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proceedings of Advances in Neural Information Processing Systems, pp. 1097–1105 (2012).

**33. **X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in Proceedings of Artificial Intelligence and Statistics, pp. 315–323 (2011).

**34. **N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res. **15**, 1929–1958 (2014).

**35. **A. Krogh and J. A. Hertz, “A simple weight decay can improve generalization,” in Proceedings of Advances in Neural Information Processing Systems, pp. 950–957 (1991).

**36. **D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature **323**(6088), 533–536 (1986).

**37. **B. T. Polyak, “Some methods of speeding up the convergence of iteration methods,” USSR Comput. Math. Math. Phys. **4**(5), 791–803 (1964).

**38. **Data for the precise displacement values of air holes are provided: https://doi.org/10.6084/m9.figshare.7223222.