## Abstract

We present a parameter set for obtaining the maximum number of atoms in a grating magneto-optical trap (gMOT) by employing a machine learning algorithm. In the multi-dimensional parameter space, which imposes a challenge for global optimization, the atom number is efficiently modeled via Bayesian optimization with the evaluation of the trap performance given by a Monte-Carlo simulation. Modeling gMOTs for six representative atomic species - ^{7}Li, ^{23}Na, ^{87}Rb, ^{88}Sr, ^{133}Cs, ^{174}Yb - allows us to discover that the optimal grating reflectivity is consistently higher than a simple estimation based on balanced optical molasses. Our algorithm also yields the optimal diffraction angle which is independent of the beam waist. The validity of the optimal parameter set for the case of ^{87}Rb is experimentally verified using a set of grating chips with different reflectivities and diffraction angles.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

For the practical applications of atom-based quantum technologies, significant efforts have been made to reduce the physical size and complexity of experimental apparatuses for a magneto-optical trap (MOT). One approach is to use single beam MOTs such as a pyramid MOT [1–3], a grating MOT (gMOT) [4], and most recently, a MOT with a metasurface beam splitter [5]. These attempts are motivated by the fact that a conventional MOT requires three orthogonal pairs of counter-propagating beams which consist of various space-consuming optical elements. Recently, a gMOT has become a popular choice owing to the open optical access enabled by fully planar geometry. Since the first realization with $^{87}$Rb [4], the use of gMOTs have been extended to $^{7}$Li [6] and $^{88}$Sr [7]. Meanwhile, a number of design factors for microfabricated grating chips have been investigated [8–10], and sub-Doppler temperature has been reached [4,8,11,12]. Applications of gMOTs now include a two-dimensional (2D) MOT [13], a miniaturized cold-atom cell [14,15], a magnetometer [16], an atomic clock [17], an ultracold electron source [18], and an atom interferometer [19].

In general, experimental parameters of a conventional MOT including optical beam paths and intensities can be controlled independently and manually so that optimization of each parameter is straightforward with immediate feedback from measurement. Unfortunately, however, a gMOT obtains its compactness at the expense of parameter controllability. In other words, the grating introduces additional experimental parameters, for example, the diffraction angle and the reflectivity of the incident beam, which are fixed for a given grating chip [8–12]. Under current technology, tuning of the grating parameters can only be done by fabricating many grating chips of different characteristics, and manually replacing them for a targeted parameter set. Moreover, a change in a single grating parameter affects multiple physical mechanisms involved in a gMOT making it hard to predict an efficient route for the manual optimization. Thus, gMOTs are considerably hindered from exploring the large parameter space accessible to conventional MOTs. Indeed, previous works have implemented only a limited number of different gratings over the past decade. The current best performing and widely adopted grating comprises three segments of one-dimensional (1D) line-and-space pattern ordered in a triangular geometry with a wide range of diffraction angles from $27{}^{\circ }$ to $69 {}^{\circ }$ [Fig. 1],[6,7,14,16–18].

In this paper, we investigate the optimal parameters for the maximum number of atoms in a triangular gMOT by building an efficient machine learning (ML) model for a large multi-dimensional parameter space. For this purpose, we first implemented Monte Carlo (MC) numerical simulations which show good agreement with our experimental observations for $^{87}$Rb. Then, we ran a machine learning algorithm upon the MC simulation based on the Bayesian optimization (BO) for a set of atomic species including $^{7}$Li, $^{23}$Na, $^{87}$Rb, $^{88}$Sr, $^{133}$Cs, and $^{174}$Yb. Our BO algorithm can search for the optimal parameters in a six-dimensional parameter space with higher efficiency than conventional optimization algorithms. The parameter dependence around the optimum is again verified by experiments with $^{87}$Rb for four gratings which span three diffraction angles and four reflectivities.

Results from this study indicate that the optimal reflectivity is consistently larger than the widely-used value for balanced optical molasses, which is $R=1/3$ for a three segmented grating [4]. For example, in the case of $^{87}$Rb with the beam waist $w_0=2$ cm as in the experiment, the optimal reflectivity is about 44% (41%) when the grating chip is placed outside (inside) the glass cell. Variation among different atomic species fall within a few percent. We also observe that even higher reflectivity is required as the beam waist of the input laser is reduced.

For the case of optimal diffraction angle, a simple method of estimation is not known since we have to consider nontrivial interplay between the trapping volume and restoring force due to non-orthogonal propagation of the trapping beams. Our optimization result states that the optimal diffraction angle is not very sensitive to the beam size while it can vary by up to $\sim 4^{\circ }$ depending on the atomic species we consider. For example, the optimal angle is about $\sim 38^{\circ }$ for $^{7}$Li and $^{23}$Na while it is $\sim 42^{\circ }$ for $^{87}$Rb, $^{88}$Sr, $^{133}$Cs and $^{174}$Yb when the grating is placed outside the glass cell. The optimal angle is increased by a few degrees if the grating is inside the vacuum chamber.

This paper is organized as follows. In Sec. 2., we discuss the details of our numerical MC simulation. In Sec. 3., the machine learning algorithm and the analysis of its performance is presented. In Sec. 4., we describe the experimental apparatus and the result. Finally, a summary and outlooks are provided in Sec. 5.

## 2. Numerical Monte Carlo simulation

MC simulation is a proven method to numerically study particle trajectories and physical properties of various types of MOTs [20–28] and optical traps [25,29]. The simulation works by first evaluating the force field of the problem of interest. Then, the subsequent particle trajectories are obtained by numerically solving the equations of motion for random initial conditions.

#### 2.1 Average scattering force field in a gMOT

The grating chip we consider here is the type that is most frequently used in gMOT experiments. It is comprised of three equally-divided sections filled with identically patterned binary gratings. The radius of the grating $r_g$ is 1 cm, and the input beam is also apertured to have radius of 1 cm [Fig. 1(b)]. Then, by geometric consideration, the quadrupole field center ($z_{\textrm {off}}$) should be placed around $z_q = \frac {r_g}{2}\cot \phi$ above the grating surface where $\phi$ is the diffraction angle with respect to the normal input direction [Fig. 1(c)].

To proceed the MC simulation, the scattering force from each beam involved in gMOTs should be known. We assume, for the sake of simplicity, that the absorption and the spontaneous emission of photons occur within the $F$ = 0 and $F^\prime$ = 1 manifold where quantum numbers $F$ and $F^\prime$ represent the ground and excited state total angular momentum respectively. Then, the scattering force $\mathbf {F}_i$, which an atom moving with a velocity $\mathbf {v}$ in the quadrupole magnetic field $\mathbf {B} = \frac {B_q}{2}(x,y,-2z)$ experiences by a laser beam labeled by $i$, is written as

In the course of computing the force, one should evaluate the laser beam distribution first. Unlike the conventional MOT, due to the segmented triangular sections of the grating, the diffracted beams are truncated Gaussian beams. The typical size of each grating section is on the order of $\sim 1$ cm$^2$ and the gMOT is formed less than 1 cm above from the grating. Accordingly, the Fresnel number $\pi a^2/\lambda z$ ($a$: size of an aperture, $\lambda$: wavelength of the light, $z$: propagation distance) is much larger than unity, meaning that the light distribution in the region of interest can be approximated as the projection of each aperture [34]. Mathematically, we first calculate a two dimensional step function whose shape is the projected sections of the grating in the coordinate frame in which the $z -$ axis is aligned with the wavevector $\mathbf {k}_i$ as shown in Fig. 2. Then we impose the step function upon the Gaussian beam intensity distribution, creating a truncated beam. One might consider calculating the beam intensity distribution via a decomposition method for exact beam shapes [35], but it is undesirable for the MC simulation where the beam intensity should be evaluated numerous times. Examples of computed beam intensity distribution, and the stream line of the total scattering force for various input beams are shown in Fig. 2. Here we used the detuning $\Delta = -1.5~\Gamma$, the diffraction angle $\phi = 45^{\circ }$, and the input beam peak intensity $I = 10$ mW/cm$^2$ in the case of $^{87}$Rb atoms.

We note that, as McGilligan *et al.* [12] pointed out, the inhomogeneous intensity distribution of a Gaussian beam changes the properties of the gMOT significantly due to the non-orthogonal beam alignment. For example, the vertical ($z$) position of the gMOT is lower compared to the quadrupole field center due to the weakened upward force [Fig. 2(g)], and the relative damping and trapping forces are affected accordingly [Fig. 2(h)]. We further notice that the corresponding modification of the force could eventually repel atoms from the gMOT. The scattering force at the trap center for atoms with near zero velocities can be approximated as a restoring force in the form of $F_x = kx$. An atom can be trapped at the center of the trap only when the spatial gradient of the force at the trap center $dF_x/dx = k$ has a negative value as in Fig. 2(d) and (e). However, in a gMOT, if the Gaussian beam waist is too small, the combined effect of non-orthogonal and non-uniform beam distribution could change the sign of the gradient [Fig. 2(f), (h)], making the formation of the gMOT impossible. For example, when $I$ = 10 mW/cm$^2$ , the scattering force repels atoms unless $w_0 >$ 1.0 cm. One can circumvent this to some extent by increasing the quadrupole field strength, but this may not be desirable because it requires more currents in the anti-Helmholtz coils. In this work, we limit ourselves to the case of $w_0 >$ 1 cm.

#### 2.2 Simulation conditions

Grating chips are often placed outside the glass chamber for easier construction of apparatus. However, in some recent works, the grating is installed inside the vacuum chamber for compactness and better performance [6,7,15,18,19]. Here we consider both cases in the numerical optimization. In the first case, the grating is positioned outside of a glass cell which is anti-reflection (AR) coated on the outside surface only. Thus we should consider the angle-dependent Fresnel reflection loss at the single boundary which corresponds to our experiment. Note that we do not consider the polarization rotation occurring at the surfaces as it is not significant when the amount of the polarization rotation is expressed in terms of the intensity of right or left circular polarization [13]. In the second case, the grating surface is directly exposed to atoms in the vacuum chamber.

Atomic trajectories for 15,000 randomly chosen initial conditions were computed for a given parameter set. The initial positions are randomly sampled in a cylinder which has the same radius (1 cm) as the grating and a height of 2 cm from the grating surface ($z=0$ plane). This cylinder also defines the simulation volume. The initial velocities are randomly taken from a three-dimensional (3D) Maxwell-Boltzmann (MB) distribution in the Cartesian coordinate. There are two strategies to speed up the calculation. One is to use realistic temperature but with a cut-off particle speed [27]. The other is to use the MB distribution with lower temperatures [25]. Here, we use the former with an appropriate cut-off speed $v_c$ which is determined by the estimated capture speed of a gMOT. For our calculations of $^{87}$Rb we use $v_c$=20 m/s. Then, we let each atom evolve under the total force $\mathbf {F}_{\textrm {tot}}=\Sigma \mathbf {F}_i$. Our simulation neglects the higher-order effects such as atom-atom interactions, radiation pressure caused by re-absorption of spontaneously emitted photons, and the optical pumping effects within the hyberfine manifold. See Appendix A for details of the numerical integration procedure.

An atom is considered to be trapped if the position of the atom is found within 1 cm from the gMOT center *and* the velocity is lower than 1 m/s. The number of trapped atom reflects the capture rate $\gamma$ (expressed in the unit of %) of the gMOT for a given condition. Since the atom number in the steady state is linearly proportional to $\gamma$ when the background vapor pressure is low enough [36], we can compare the simulation results with experimental data with proper atom number scaling.

## 3. Optimization using machine learning

In cold-atom experiments, ML technologies have demonstrated the ability to efficiently optimize experimental results often surprising us with nonintuitive yet better results [37–41]. They also have been actively utilized for recognizing features in images of atoms enabling efficient and accurate data analysis [42–49]. In our case, an ML algorithm is implemented to optimize the atom number in a gMOT while minimizing the computational resource required for MC simulation described in the previous section.

#### 3.1 Optimization algorithm

When it comes to optimization using ML, we rely on the fact that artificial intelligence can be very fast in estimating an approximate model for the landscape of the cost function for a given parameter space. Furthermore, evaluating a point in the parameter space using the model is inexpensive in terms of computational cost, allowing for finding its maximum or minimum value effectively. In our case, the minus of the capture rate $-\gamma$ is chosen for the cost function and the parameter space is composed of laser detuning, laser beam intensity, grating reflectivity, etc.

We use Bayesian optimization (BO), also known as Gaussian process regression, which has recently become a popular choice to accomplish an effective search of optimal parameters for a black box function [50]. BO is a sequential optimization algorithm that uses a Gaussian process as a surrogate model of the cost function where a Gaussian process is a multivariate Gaussian probability distribution over functions that satisfy previous observations. From the Gaussian process, BO gives us a next evaluation point by calculating an acquisition function over the uncharted region of the parameter space. The acquisition function is determined by trade-off between the high expectation obtained by evaluating points near the current optimum and the reduced uncertainty given by evaluating the most uncertain point to avoid being trapped in a local optimum. In this context, BO is preferred for our purpose of global optimization compared to other algorithms relying more on the local information. Although many candidates such as upper confidence bound and Thompson sampling have been suggested, expected improvement is widely adopted as an acquisition function [51].

The algorithm enables us to conduct much fewer number of function assessment, i.e. MC simulation, compared to simple grid search or random search. In addition, the stochastic nature of BO makes it work well for noisy functions, which is advantageous for our case because the MC simulation inherently shows stochastic behavior. Indeed, BO routinely appears in hyper parameter optimization for a deep neural network [52], in the previous works on online optimization of cold-atom experiments [37,40,41], and have shown a better performance than algorithms such as the differential evolution, the Nelder-Mead algorithm, and even a deep-neural network based algorithm [37,41].

We have implemented a BO algorithm for gMOTs and found optimal parameters in the six-dimensional parameter space $\{\Delta, B_q, R, \phi, I, z_{\textrm {off}}\}$ where $z_{\textrm {off}}$ is defined as the offset of the quadrupole center position from the grating surface along $z$-axis. See Appendix B for detailed implementation of the algorithm. For example of $^{87}$Rb, the range of search domain is set to be relevant to typical experimental limits: $\Delta \in [-3, 0)~\Gamma$, $B_q \in [0, 20)$ G/cm, $R \in [30, 50)$ %, $\phi \in [30, 50)^{\circ }$, $I \in [0, 50)$ mW/cm$^2$, and $z_{\textrm {off}} \in z_q+[-1.5, 1.5)$ mm, where the left bracket [ includes the end points while the right parenthesis ) excludes them. Since each parameter has a different range, this may baffle the performance of the ML. Thus, each parameter is normalized in the numerical computation to circumvent the problem.

Note that we do not include the input polarization state in our parameter space since it has been shown that the imperfect input polarization only hinders ideal formation of the gMOT in 3D [4]. We also notice from finite-element analysis of gratings that modifying grating structures in order to include polarization imperfection in the reflected 1st-order beams usually accompanies increase in 0th-order beam intensity which is known to be detrimental to gMOT performance [9].

Before the Gaussian process works, the model was trained with 20 randomly selected parameter sets (random search). Then, the algorithm automatically starts to search for the optimal parameters. The optimization ends when it reaches 100 trials. The entire procedure typically takes 10 minutes on a computer described in Appendix A. This is a remarkably huge improvement compared to a simple grid search where, if we were to take 10 points for each parameter, the computation could take up to $(2 \times 10)^6~\textrm {s} \approx 740$ days.

#### 3.2 Discussion on the optimal set of parameters for $^{87}$Rb

We depict the optimization procedure for a $^{87}$Rb gMOT in Fig. 3. The artificial intelligence quickly approaches to the maximum point while it keeps searching for possibilities of another global maximum [Fig. 3(a), (b)]. We can also notice that, when we compare the cost function landscape between the grid search [Figs. 3(c)] and the Gaussian process [Figs. 3(d)] in a specific 2D *slice* of the parameter space, the artificial intelligence creates a good approximate model for the parameter space within only 120 trials including 20 random initial searches. It particularly shows good agreement on the location of the optimum and its corresponding maximum value. Although the detailed landscapes look slightly different, if we consider this is created by only 120 evaluations for a six dimensional space, the result seems very impressive. The functional form of parameter space created by the Gaussian process model [Fig. 3(d)] would look much similar to that of the grid scan [Fig. 3(c)] if we try more points during the BO procedure. Since the optimization is a stochastic procedure and it depends on the initial random sampling, it shows different performances for independent trials of the optimization. Thus we repeat the optimization 10 times and take the average. The averaged result is displayed in Fig. 3(b). As it is verified in Nakamura *et al.* [40], BO converges to the optimum much faster than simple random search.

The resultant optimal parameter sets for $^{87}$Rb are summarized in Table 1 and Table 2. Although it is obvious that $\gamma$ itself increases for larger $w_0$ [12], we investigate several cases with different beam waists to consider applications where using small $w_0$ is required [14,53]. An interesting observation is that, even though the inhomogeneous Gaussian beam profile modifies the behaviors of a gMOT significantly, optimal gMOTs are created for similar parameter sets within the margin of error.

First of all, the parameters in common with the conventional MOT, i.e. {$\Delta$, $B_q$, $I$}, share similar optimal values which is consistent with previous works which investigated $I$-$\Delta$ plane of gMOTs [12]. In addition, as shown later in Sec. 4., the MOTs equipped with gratings of different optical properties have very similar optimal values as to these parameters. Although the model does not lead us to unexpected parameter range in this case, it is worth confirming the result from a model constructed in the entire parameter space by global optimization.

The optimal gMOT-specific parameters were also found, where the diffraction angle $\phi$ turns out to be, regardless of $w_0$, about 42$^{\circ }$ and 45$^{\circ }$ for the grating outside and inside of the glass cell respectively. In terms of trap volume, reduced $\phi$ is preferred since smaller diffraction angle leads to increased volume of trapping region. In a conventional MOT, for example, the trapped-atom number scales as $\sim V^{1.2}$ where $V$ is the trapping volume. Previous works applied this assumption in estimating atom numbers of a gMOT [8,18]. However, in the case of a gMOT, we should be aware that decreasing $\phi$ inevitably results in smaller trapping force on the $x-y$ plane [7,12] due to the non-orthogonal configuration of the beams. The compromise between the trapping force and volume determines the optimal angle mentioned above. It is worth noting that, previous reports with two gratings with distinct diffraction angles (32$^{\circ }$ and 41$^{\circ }$) [12] showed three times difference in trapped atom number agreeing quantitatively with our result [Fig. 4(b)]. In fact, our optimization results confirm that one of their gMOTs with the diffraction angle of 41$^{\circ }$ is actually close to the best condition for the maximum atom number.

Another gMOT-specific parameter, the optimal grating reflectivity, unveils a rather unknown aspect of the gMOT. Based on a simple argument of balanced optical molasses, the reflectivity $R = 1/N$ where $N$ is the number of segments has been considered optimum [10]. However, the scattering force by a single reflected beam may not be as effective as $1/N$ of the input beam due to the polarization decomposition in the presence of angle between the local quantization axis and the beam propagation direction. And it is not obvious how to modify the simple relationship $R=1/N$ by including those nontrivial aspects of gMOTs. Our finding from BO is that the optimal $R$ of the grating with $N=3$ should be higher than 33%, and the exact value varies according to $w_0$ and Fresnel reflection loss.

As a representative example, a 2D slice of $\gamma$ in $\phi -R$ plane for {$\Delta$, $B_q$, $I$, $z_{\textrm {off}}$} = {−1.5 $\Gamma$, 15 G/cm, 15 mW/cm$^2$, $z_q$} for the grating outside the cell is displayed in Fig. 4. The gratings with different $\phi$ and $R$ are optimized at the similar parameters common with conventional MOT, and the values we choose for Fig. 4 are, in fact, similar with those in Table 1, suggesting each data point in the figure represents the optimal $\gamma$ for each $\phi$, $R$ pair. We observe the increase of $\gamma$ as we increase $R$ to the optimal value ($R$= 41% for $w_0 = \infty$ and $R$= 44% for $w_0 =$ 2 cm). A larger reflectivity $R$ introduces a larger force, but the change in $R$ does not alter the trapping volume. Thus, the change in $\gamma$ is mainly due to the increase of the scattering forces.

In order to further investigate the effect of the increased scattering force with $R$, we run the MC simulation by modifying the scattering force of the reflected beams with a scaling factor. Here, we only discuss the case of $w_0 = \infty$ for simplicity. First, we multiply an arbitrary scaling factor $\alpha _{xy}$ to the $x$ and $y$ components of the average scattering force as $\mathbf {F}^{\prime } = (\alpha _{xy} f_x,~\alpha _{xy} f_y,~f_z)$, where $\mathbf {F} = (f_x, f_y, f_z)$ is the force vector obtained by calculating Eq. (1) with $R=1/3$ [Fig. 5(a)]. We notice that $\gamma$ increases with $\alpha _{xy}$, verifying the increase of the scattering force is responsible for the increase of $\gamma$. However, the rate of increase in $\gamma$ for the large diffraction angles, say 42$^\circ$ and 46$^\circ$, rather decreases for $\alpha _{xy} > 1.2$ in spite of larger scattering forces. Also, as discussed previously, a small trapping volume of 46$^{\circ }$ case results in the smaller values of $\gamma$ than those of 42$^{\circ }$ case for $\alpha _{xy} > 1.2$ although the scattering force with $\phi = 46^{\circ }$ is the largest among the angles listed in the figure. Second, when we multiply a scaling factor $\alpha _z$ to the $z$ component of the scattering force, we notice that $\gamma$ does not increase monotonically but decrease for $\alpha _z > 1.2$ in the same manner as $\gamma$ drops for $R$ > 41% [Fig. 4(a)]. Note that $R=41\%$ actually corresponds to $\alpha _z=1.2$ since $(41/33)\sim 1.2$. Because increasing upward force brings up the MOT center position, it will limit the physical size of the gMOT in the $x-y$ plane [Fig. 4(c)]. In general, a gMOT extends to a few millimeters, and it has an oblate spheroidal shape. Due to the trapping region geometry that has a double hexagonal pyramid shape, when the gMOT center is raised above the central region of the trapping volume by increased $R$, the region wherein atoms can be trapped in the $x-y$ plane gets narrower. The trade-off between the increment of the force and the reduction of trapping region brings in an optimal value of $R$. Quantitatively, changing $R$ from 33% to 41% results in the increase of $\gamma$ from 2.3 to 2.9 for the case of $\phi =42 ^\circ$ and from 1.0 to 1.7 for the case of $\phi =34^\circ$. Since changing $R$ can be thought as changing $\alpha _{xy}$ and $\alpha _z$ simultaneously, we find the values of increment of $\gamma$ mentioned above are similar to the sum of the increment of $\gamma$ obtained from changing $\alpha _{xy,z} = 1.0$ to $1.2$.

We also notice that the optimal $R$ is dependent on $w_0$ for gratings both inside and outside the glass cell. The difference of the optimal reflectivity can be explained in terms of Fresnel reflection loss at the inner surface of the glass chamber.

It is also worth mentioning here that gMOTs with more than three sections will have worse performance. The increased number of sections does not give any benefits in terms of average scattering forces, yet its trapping volume is reduced, which will decrease the atom number after all. But we note that a gMOT having more sections may achieve high atom-number if it is coupled with a 2D MOT source and implemented with a larger grating ($\sim 4$ cm) as in Imhof *et al.* [13].

#### 3.3 Other atomic species

So far, we have discussed the optimization result for $^{87}$Rb which has been the most popular species for gMOT experiments to date. However, miniaturizing cold-atom systems with various atomic species is recently getting more attention for a wide range of applications from optical clocks to inertial sensors. In this regards, $^{7}$Li [6] and $^{88}$Sr [7] gMOT has recently been reported.

Owing to our efficient algorithm, we could easily repeat the same optimization procedure for five more atomic species - $^{174}$Yb, $^{88}$Sr, $^{7}$Li, $^{23}$Na, and $^{133}$Cs. Since our simplified gMOT model does not take into account the detailed energy level structure of the atom, the numerical calculations can be straightforwardly extended to other atomic species. The changes that should be made for the MC simulation are the atomic mass, transition wavelength, transition linewidth, and saturation intensity in Eq. (1). We also adjust the search range of $B_q$ and $I$ since different atomic species require different range. For example, Yb or Sr requires larger $B_q$ due to larger linewidths ($\Gamma /2\pi \sim 30$ MHz) [54–56]. Here we consider only $w_0 = \infty$ case for simplicity with one exception for $^{174}$Yb which includes $w_0 =$ 2 cm for the grating outside the cell in order to compare the result with that of the $^{87}$Rb gMOT.

The resulting optimal sets are summarized in Table 3 and Table 4. For each species we observe that the optimal parameters for $\Delta, B_q,$ and $I$ are quantitatively very similar to previously reported values for their corresponding conventional MOTs. This result is consistent with that of $^{87}$Rb discussed above. For example, in the case for alkali atoms, optimal $I/I_{\textrm {sat}} \sim$ a few $I_{\textrm {sat}}$ and $\Delta$ ranges from $-1.5~\Gamma$ to $-2.5~\Gamma$. We also note that our findings for optimal $\Delta$, $B_q$, and $I$ are close to the experimental condition of gMOTs in the previous reports of Sr and Li [6,7], although different values of $R$ and $\phi$ were used.

When it comes to the optimal grating design, we observe that the optimal reflectivities and the angles for various atomic species are remarkably close to each other in spite of different atomic properties. For a top-hat input beam, all optimal reflectivities fall within the range from 38% to 41% with the grating outside the cell, and they all decrease by a few percent with the grating inside. The optimal angle converges to 42$^\circ$ (46$^\circ$) for heavier elements, that is $^{87}$Rb, $^{88}$Sr, $^{133}$Cs, $^{174}$Yb for the grating outside (inside) the cell, while $^{7}$Li and $^{23}$Na tend to have a few degree lower optimal angles. This is a surprising result because, although we do not consider the complex hyperfine structures, we take into account atoms with a variety of atomic properties, thereby the force field distribution around the gMOT composed of different atomic species would look very different.

## 4. Experimental verification

Figure 6(a) shows the experimental apparatus for our $^{87}$Rb gMOT system. The quadrupole and bias coils for $x, y$, and $z$ directions are mounted on a single coil holder frame, and the entire frame is movable along the $z-$direction by using a translation stage. The grating, which is placed outside the glass cell, is also mounted on a separate translation stage for easy chip replacement. The mount for grating is designed to fit into the inner coil holder such that the insertion of the mount ensures the alignment of the grating center and the quadrupole field center along the $z$-axis.

The input beam of the gMOT is circularly polarized and its beam waist is first adjusted to $\sim$ 2 cm ($1/e^2$ radius of Gaussian beam), and later passes through an aperture of 1 cm radius. We actively servo the beam intensity in order to make sure that we keep the intensity unchanged while we control the frequency of an acousto-optic modulator to change the detuning. We carefully align the input beam to be perpendicular to the grating as possible by monitoring the diffracted beams from the grating. The imaging and repumping beams are applied to the atomic cloud along the $y-$direction.

The trapped atom number is obtained from absorption images taken by shining a resonant laser tuned to $|F=2,m_F=2\rangle$ $\rightarrow$ $|F'=3, m_F'=3\rangle$ transition. We take images after 3 ms of time of flight [Fig. 6(b)]. The center of the quadrupole field is inferred from the position of the gMOT recorded on the CCD camera and the corresponding theoretical offsets between $z_{\textrm {off}}$ and the center of the cloud. To avoid complications caused by the ambient magnetic field, we null out the field using a set of bias coils. Since a finite bias field would hamper the performance of sub-Doppler cooling, we confirm the reduction of the bias field by measuring the temperature of the cloud after polarization gradient cooling procedure, i.e., turning off the quandrupole field after 1 s of loading the gMOT along with a detuning jump to $\Delta = -10~\Gamma$. The atomic cloud is further cooled to 8 $\mu$K which is comparable to the lowest temperatures for gMOTs reported so far, to our knowledge [11,16].

We have used four types of gratings, two of which have $\phi ~=$ 33$^{\circ }$ with different reflectivities $R$: 35% and 40% (We name them as grating 1 and 2, respectively). These grating parameters can be achieved by coating 100 nm of aluminum and gold on the top of the same silicon grating structure, respectively [10]. Another grating corresponds to $\phi ~=$ 41$^{\circ }$ and $R$ = 40% (grating 3). We also tested a commercially available chip (Kelvin Nanotechnology) from which we measured $\phi ~=$ 45$^{\circ }$, and $R$ = 36% (grating 4). The 1st-order diffracted beam from all of the gratings shows very high polarization purity $\eta >$ 0.98, where $\eta$ is defined as the fractional power of the diffracted beam with desired helicity [4]. This is mostly the case for gratings which are designed to minimize the 0th-order reflection, thereby justifying our BO which excludes the polarization imperfections from the parameter space.

We summarize the experimental observations and corresponding MC simulation results in Fig. 7. They show an excellent agreement not only in optimal parameter values but also in the overall tendency of the atom number for a wide range of parameters up to a single scaling constant. Notice that the grating with the parameters close to the optimal one presented in Table 1 (grating 3) shows the best performance, confirming the optimization result. One discrepancy between the experiment and simulation to notice is that the range that we can actually observe the gMOT signal is narrower in the experiment than in the simulation. Specifically, the trapped atom number suppression in the experiment is noticeable in the parameter region close to the atomic resonance ($\Delta < \Gamma$). This might be because the near-resonant light tends to blow away the atoms in the gMOT, which is not included in our model.

## 5. Conclusion

In conclusion, we obtained the parameter set for the highest atom number achievable in a gMOT with the help of ML. In particular, we employ BO on the MC simulation to dramatically reduce the time taken for locating the global optimum in the six dimensional parameter space associated with a gMOT. With the capability of efficient optimization, we could investigate a number of popular atomic species for laser cooling as well as realistic experimental settings such as a glass chamber wall.

As a result of optimization, we could confirm that the optimal parameter subset, excluding the diffraction angle and efficiency, is indeed close to that of a conventional MOT. On the other hand, the grating-related parameters are strongly affected by nontrivial aspects of gMOTs, which end up with consistently larger reflectivities than simple estimation of 33% and beam-size independent optimal diffraction angles. These findings provide useful information for diverse applications of gMOTs with various atomic species. The validity of our optimization was experimentally confirmed with a $^{87}$Rb gMOT which shows good agreement between simulation and experimental data over a wide range of parameters.

A natural extension of this study will be to investigate the gratings with various geometries or non-trivial input beam intensity distributions. In such cases, the ML technology will be more useful due to the increased number of parameters. Finally, for future work, we will apply other types of algorithms for the optimization such as using deep learning [39] or unsupervised learning such as reinforcement learning [57].

## Appendix

## A. Numerical integration in Monte Carlo method

The atom trajectories subjected to the force field are computed by using the Runge-Kutta method. The code is written in the programming language - Julia. We take advantages of Julia’s DifferentialEquations.jl package that provides very fast numerical solvers. The time step is $\delta t = 200~\mu$s, and a trajectory is evaluated to 10 ms. At each time step, we record the positions and the velocities of the atoms, and, if an atom trajectory exits the simulation volume, we stop calculating and reject the trajectory. This gives us additional speed up of calculation by eliminating unnecessary computations and saves memory. Finally, as for the benchmark, we note that we parallelize the computation by utilizing all of the 8 cores of the Intel Core i9-9900 processor, resulting in obtaining 15,000 trajectories in typically 2 s.

## B. Implementation of Bayesian optimization

The code is written in a Python script that can communicate with the MC code written in Julia via Pyjulia library. Doing so, we can make use of the advantages of both languages - Julia’s fast computation and rich, convenient open source libraries such as Scikit-learn of Python for implementing Gaussian process regression [58].

Specifically, in our implementation of BO algorithm, we make use of the min-max scaler for normalizing parameters to a range [0,1]. The Gaussian process kernel $k$ we adopt is the sum of two terms $k=k_{\textrm {se}} + k_{\textrm {w}}$, where $k_{\textrm {se}}$ is the squared exponential kernel defined as $k_{\textrm {se}}(x_i, x_j) = \exp {\left [-\frac {(x_i-x_j)^2}{2l^2}\right ]}$ with $l$ being the length scale of the kernel, and a white kernel $k_{\textrm {w}}(x_i,x_j) = n~\textrm {if}~x_i = x_j~\textrm {else}~0$ with $n$ being the noise level. The white kernel is introduced in order to consider the results of the MC simulation which contain noise. We set the both hyperparameters $l=1$ and $n=1$.

## Funding

Institute of Information and communications Technology Planning and evaluation (2019-0-00720) 90%; Korea Research Institute of Standards and Science (KRISS–2021–GP2021-0001) 10%.

## Disclosures

The authors declare no conflicts of interest.

## Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

## References

**1. **K. I. Lee, J. A. Kim, H. R. Noh, and W. Jhe, “Single-beam atom trap in a pyramidal and conical hollow mirror,” Opt. Lett. **21**(15), 1177 (1996). [CrossRef]

**2. **S. Ravenhall, B. Yuen, and C. Foot, “High-flux, adjustable, compact cold-atom source,” Opt. Express **29**(14), 21143 (2021). [CrossRef]

**3. **M. Vangeleyn, P. F. Griffin, E. Riis, and A. S. Arnold, “Single-laser, one beam, tetrahedral magneto-optical trap,” Opt. Express **17**(16), 13601 (2009). [CrossRef]

**4. **M. Vangeleyn, P. F. Griffin, E. Riis, and A. S. Arnold, “Laser cooling with a single laser beam and a planar diffractor,” Opt. Lett. **35**(20), 3453 (2010). [CrossRef]

**5. **L. Zhu, X. Liu, B. Sain, M. Wang, C. Schlickriede, Y. Tang, J. Deng, K. Li, J. Yang, M. Holynski, S. Zhang, T. Zentgraf, K. Bongs, Y. Lien, and G. Li, “A dielectric metasurface optical chip for the generation of cold atoms,” Sci. Adv. **6**(31), eabb6667 (2020). [CrossRef]

**6. **D. S. Barker, E. B. Norrgard, N. N. Klimov, J. A. Fedchak, J. Scherschligt, and S. Eckel, “Single-Beam Zeeman Slower and Magneto-Optical Trap Using a Nanofabricated Grating,” Phys. Rev. Appl. **11**(6), 064023 (2019). [CrossRef]

**7. **A. Sitaram, P. K. Elgee, G. K. Campbell, N. N. Klimov, S. Eckel, and D. S. Barker, “Confinement of an alkaline-earth element in a grating magneto-optical trap,” Rev. Sci. Instrum. **91**(10), 103202 (2020). [CrossRef]

**8. **C. C. Nshii, M. Vangeleyn, J. P. Cotter, P. F. Griffin, E. A. Hinds, C. N. Ironside, P. See, A. G. Sinclair, E. Riis, and A. S. Arnold, “A surface-patterned chip as a string source ultracold atoms for quantum technologies,” Nat. Nanotechnol. **8**(5), 321–324 (2013). [CrossRef]

**9. **J. P. Cotter, J. P. McGilligan, P. F. Griffin, I. M. Rabey, K. Docherty, E. Riis, A. S. Arnold, and E. A. Hinds, “Design and fabrication of diffractive atom chips for laser cooling and trapping,” Appl. Phys. B **122**(6), 172 (2016). [CrossRef]

**10. **J. P. McGilligan, P. F. Griffin, E. Riis, and A. S. Arnold, “Diffraction-grating characterization for cold-atom experiments,” J. Opt. Soc. Am. B **33**(6), 1271 (2016). [CrossRef]

**11. **J. Lee, J. A. Grover, L. A. Orozco, and S. L. Rolston, “Sub-Doppler cooling of neutral atoms in a grating magneto-optical trap,” J. Opt. Soc. Am. B **30**(11), 2869 (2013). [CrossRef]

**12. **J. P. McGilligan, P. F. Griffin, E. Riis, and A. S. Arnold, “Phase-space properties of magneto-optical traps utilising micro-fabricated gratings,” Opt. Express **23**(7), 8948 (2015). [CrossRef]

**13. **E. Imhof, B. K. Stuhl, B. Kasch, B. Korese, S. E. Olson, and M. B. Squires, “Two-dimensional grating magneto-optical trap,” Phys. Rev. A **96**(3), 033606 (2017). [CrossRef]

**14. **J. P. McGilligan, K. R. Moore, A. Dellis, G. D. Martinez, E. de Clercq, P. F. Griffin, A. S. Arnold, E. Riis, R. Boudot, and J. Kitching, “Laser cooling in a chip-scale platform,” Appl. Phys. Lett. **117**(5), 054001 (2020). [CrossRef]

**15. **O. S. Burrow, P. F. Osborn, E. Boughton, F. Mirando, D. P. Burt, P. F. Griffin, A. S. Arnold, and E. Riis, “Stand-alone vacuum cell for compact ultracold quantum technologies,” arXiv: 2101.07851

**16. **J. P. McGilligan, P. F. Griffin, R. Elvin, S. J. Ingleby, E. Riis, and A. S. Arnold, “Grating chips for quantum technologies,” Sci. Rep. **7**(1), 384 (2017). [CrossRef]

**17. **R. Elvin, G. W. Hoth, M. Wright, B. Lewis, J. P. McGilligan, A. S. Arnold, P. F. Griffin, and E. Riis, “Cold-atom clock based on a diffractive optic,” Opt. Express **27**(26), 38359–38366 (2019). [CrossRef]

**18. **J. G. H. Franssen, T. C. H. de Raadt, M. A. W. van Ninhuijs, and O. J. Luiten, “Compact ultracold eletron source based on a grating magneto-optical trap,” Phys. Rev. Accel. Beams **22**(2), 023401 (2019). [CrossRef]

**19. **J. Lee, R. Ding, J. Christensen, R. R. Rosenthal, A. Ison, D. P. Gillund, D. Bossert, K. H. Fuerschbach, W. Kindel, P. S. Finnegan, J. R. Wendt, M. Gehl, H. McGuinness, C. A. Walker, A. Lentine, S. A. Kemme, G. Biedermann, and P. D. D. Schwindt, “A Cold-Atom Interferometer with Microfabricated Gratings and a Single Seed Laser,” arXiv:2107.04792

**20. **W. Wohlleben, F. Chevy, K. Madison, and J. Dalibard, “An atom faucet,” Eur. Phys. J. D **15**(2), 237–244 (2001). [CrossRef]

**21. **J. M. Kohel, J. Ramirez-Serrano, R. J. Thompson, L. Maleki, J. L. Bliss, and K. G. Libbrecht, “Generation of an intense cold-atom beam from a pyramidal magneto-optical trap: Experiment and simulation,” J. Opt. Soc. Am. B **20**(6), 1161 (2003). [CrossRef]

**22. **S. Chaudhuri, S. Roy, and C. S. Unnikrishnan, “Realization of an intense cold Rb atomic beam based on a two-dimensional magneto-optical trap: Experiments and comparison with simulations,” Phys. Rev. A **74**(2), 023406 (2006). [CrossRef]

**23. **D. Comparat, “Molecular cooling via sisyphus processes,” Phys. Rev. A **89**(4), 043410 (2014). [CrossRef]

**24. **A. Szulc, Master’s thesis, Ben-Gurion University of the Negev, 2016.

**25. **R. K. Hanley, P. Huillery, N. C. Keegan, A. D. Bounds, D. Boddy, R. Raoro, and M. P. A. Jones, “Quantitative simulation of a magneto-optical trap operating near the photon recoil limit,” J. Mod. Opt. **65**(5-6), 667–676 (2018). [CrossRef]

**26. **A. D. Bounds, N. C. Jackson, R. K. Hanley, R. Faoro, E. M. Bridge, P. Huillery, and M. P. A. Jones, “Rydberg Dressed Magneto-Optical Trap,” Phys. Rev. Lett. **120**(18), 183401 (2018). [CrossRef]

**27. **M. Barbiero, M. G. Tarallo, D. Calonico, F. Levi, G. Lamporesi, and G. Ferrari, “Sideband-Enhanced Cold Atomic Source for Optical Clocks,” Phys. Rev. Appl. **13**(1), 014013 (2020). [CrossRef]

**28. **C. S. Nichols, L. M. Nofs, M. A. Viray, L. Ma, E. Paradis, and G. Raithel, “Magneto-Optical Trap with Millimeter Ball Lenses,” Phys. Rev. Appl. **14**(4), 044013 (2020). [CrossRef]

**29. **R. W. Mu, Z. L. Wang, Y. L. Ki, X. M. Ji, and J. P. Yin, “A controllable double-well trap for cold atoms (or molecules) using a binary *π*-phase plate: Experimental demonstration and monte carlo simulation,” Eur. Phys. J. D **59**(2), 291–300 (2010). [CrossRef]

**30. **M. Vangeleyn, Ph.D. thesis, University of Strathclyde, Glasgow, United Kindom, 2011

**31. **M. H. Shah, H. A. Camp, M. L. Trachy, G. Veshapidze, M. A. Gearba, and B. D. DePaola, “Model-independent measurement of the excited fraction in a magneto-optical trap,” Phys. Rev. A **75**(5), 053418 (2007). [CrossRef]

**32. **G. Kregar, N. Santic, D. Aumiler, and T. Ban, “Radiation pressure force on cold rubidium atoms due to excitation to a non-cooling hyperfine level,” Eur. Phys. J. D **68**(12), 360 (2014). [CrossRef]

**33. **D. A. Steck, Rubidium 87 D Line Data (2008).

**34. **M. Gu, *Advanced Optical Imaging Theory*, vol. 75. Springer, 1999.

**35. **Z. Xiao, R. Lanning, M. Zhang, I. Novikova, E. E. Mikhailov, and J. P. Dowling, “An analytically simple and computationally efficient Gaussian beam mode-decomposition approach to classical diffraction theory,” arXiv:1810.06757

**36. **C. Monroe, W. Swann, H. Robinson, and C. Wieman, “Very Cold Trapped Atoms in a Vapor Cell,” Phys. Rev. Lett. **65**(13), 1571–1574 (1990). [CrossRef]

**37. **P. B. Wigley, P. J. Everitt, A. V. D. Hengel, J. W. Bastian, M. A. Sooriyabandara, G. D. Mcdonald, K. S. Hardman, C. D. Quinlivan, P. Manju, C. C. N. Kuhn, I. R. Petersen, A. N. Luiten, J. J. Hope, N. P. Robins, and M. R. Hush, “Fast machine-learning online optimization of ultra-cold-atom experiments,” Sci. Rep. **6**(1), 25890 (2016). [CrossRef]

**38. **T. Lausch, M. Hohmann, F. Kindermann, D. Mayer, F. Schmidt, and A. Widera, “Optimizing quantum gas production by an evolutionary algorithm,” Appl. Phys. B **122**(5), 112 (2016). [CrossRef]

**39. **A. D. Tranter, H. J. Slatyer, M. R. Hush, A. C. Leung, J. L. Everett, K. V. Paul, P. Vernaz-Gris, P. K. Lam, B. C. Buchler, and G. T. Campbell, “Multiparameter optimisation of a magneto-optical trap using deep learning,” Nat. Commun. **9**(1), 4360 (2018). [CrossRef]

**40. **I. Nakamura, A. Kanemura, T. Nakaso, R. Yamamoto, and T. Fukuhara, “Non-standard trajectories found by machine learning for evaporative cooling of ^{8}7Rb atoms,” Opt. Express **27**(15), 20435 (2019). [CrossRef]

**41. **A. J. Barker, H. Style, K. Luksch, S. Sunami, D. Garrick, F. Hill, C. J. Foot, and E. Bentine, “Applying machine learning optimization methods to the production of a quantum gas,” Mach. Learn.: Sci. Technol. **1**(1), 015007 (2020). [CrossRef]

**42. **B. S. Rem, N. Käming, M. Tarnowski, L. Asteria, N. Fläschner, C. Becker, K. Sengstock, and C. Weitenberg, “Identifying quantum phase transitions using artificial neural networks on experimental data,” Nat. Phys. **15**(9), 917–920 (2019). [CrossRef]

**43. **G. Torlai, B. Timar, E. P. L. van Nieuwenburg, H. Levine, A. Omran, A. Keesling, H. Bernien, M. Greiner, V. Vuletić, M. D. Lukin, R. G. Melko, and M. Endres, “Integrating Neural Networks with a Quantum Simulator for State Reconstruction,” Phys. Rev. Lett. **123**(23), 230504 (2019). [CrossRef]

**44. **S. Pilati and P. Pieri, “Supervised machine learning of ultracold atoms with speckle disorder,” Sci. Rep. **9**(1), 5613 (2019). [CrossRef]

**45. **G. Bess, A. Vainbaum, C. Shkedrov, Y. Florshaim, and Y. Sagi, “Single-Exposure absorption Imaging of Ultracold Atoms using Deep Learning,” Phys. Rev. Appl. **14**(1), 014011 (2020). [CrossRef]

**46. **A. U. J. Lode, R. Lin, M. Büttner, L. Pararello, C. Lévêque, R. Chitra, M. C. Tsatsos, D. Jaksch, and P. Molignini, “Optimized Observable Readout from Single-shot Images of Ultracold Atoms via Machine Learning,” arXiv:2010.14510v1 (2020).

**47. **L. R. hofer, M. Kratajić, P. Juhász, A. L. Marchant, and R. P. Smith, “Atom Cloud detection Using a Deep Neural Network,” arXiv:2011, 10536v1 (2020).

**48. **F. Metz, J. Polo, N. Weber, and T. Busch, “Deep learning based quantum vortex detection in atomic Bose-Einstein condensates,” arXiv:2012.13097v1 (2020).

**49. **S. Guo, A. R. Fritsch, G. Greenberg, I. B. Spielman, and J. P. Zwolak, “Machine-learning enhanced dark soliton detection in Bose-Einstein condensates,” arXiv:2101:05404v1 (2021).

**50. **C. E. Rasmussen and C. K. I. Williams, * Gaussian Process for Machine Learning* (Massachusetts Institute of Technology, 2006)

**51. **J. Snoek, H Larochelle, and R. P. Adams, “Practical Bayesian Optimization of Machine Learning Algorithms,” arXiv:1206.2944

**52. **K. Eggensperger, M. Feurer, F. Hutter, J. Bergstra, J. Snoek, H. H. Hoos, and K. Leyton-Brown, “Towards an Empirical Foundation for Assessing Bayesian Optimization of Hyperparameters,” NIPS Workshop on Bayesian Optimization in Theory and Practice **10**, 3 (2013).

**53. **G. W. Hoth, E. A. Donley, and J. Kitching, “Atom number in magneto-optic traps with milimeter scale laser beams,” Opt. Lett. **38**(5), 661 (2013). [CrossRef]

**54. **F. Scazza, Ph. D. thesis, Ludwig-Maximilians-Universität München 2015.

**55. **A. Kawasaki, B. Braverman, Q. Yu, and V. Vuletić, “Two-Color Magneto-Optical Trap with Small Magnetic Field for Ytterbium,” J. Phys. B: At., Mol. Opt. Phys. **48**(15), 155302 (2015). [CrossRef]

**56. **J. Lee, J. H. Lee, J. Noh, and J. Mun, “Core-shell magneto-optical trap for alaline-earth-metal-like atoms,” Phys. Rev. A **91**(5), 053405 (2015). [CrossRef]

**57. **M. Y. Niu, S. Boixo, V. N. Smelyanskiy, and H. Neven, “Universal quantum control through deep reinforcement learning,” npj Quantum Inf. **5**(1), 33 (2019). [CrossRef]

**58. **F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine Learning in Python,” J. Mach. Learn. Res. **12**, 2825 (2011).