GPU-accelerated full-field modelling of highly dispersive ultrafast optical parametric oscillators

Sebastian C. Robarts; Diana E. Hunter; Derryck T. Reid; Richard A. McCracken

doi:10.1364/OE.509307

1. Introduction

Optical parametric oscillators (OPOs) are one of the most versatile platforms for the generation of visible, near- and mid-infrared pulses, and have found widespread use in spectroscopy and imaging applications [1–4]. In its simplest form, the energy exchange between discrete pump $(\omega _p)$, signal $(\omega _s)$, and idler $(\omega _i)$ frequencies in an OPO can be approximated using the coupled wave equations [5], which rely on the assumption that the pulse bandwidth is narrow with respect to the carrier frequency. In contrast, ultrafast synchronously-pumped OPOs are broadband systems that display complex dynamics [6] and spectral structures [7] that cannot be described by this slowly-varying envelope approximation. These systems are more readily described by a $\chi ^{(2)}$ nonlinear envelope equation (NEE) [8], an analytical technique for studying ultra-broadband $\chi ^{(2)}$ interactions.

We have previously demonstrated [9] that the NEE provides good agreement with OPOs containing short ($\leq 1$ mm) nonlinear crystals operating in a low-dispersion regime [10–12], however dispersive OPO cavities such as those containing long crystals [13–15] or significant material dispersion [16–20] result in fields exhibiting large computational time-bandwidth products, requiring a simulation that comprises both large temporal and spectral ranges. Modelling such fields is computationally expensive due to the data required to maintain sufficient sampling in large temporal/spectral windows (space complexity), compounded by the many round-trips needed to achieve steady-state (time complexity), an issue not encountered when simulating single-pass nonlinear frequency conversion processes [21]. Although evolution of the full field is inherently sequential, independent calculation of all sample points at each time step allows parallel processing to be leveraged for large data sets.

Recently a GPU-based method for solving the coupled wave equations was reported [22], in which the authors employed NVIDIA’s CUDA libraries to improve the execution time by ~$50\times$ when compared to their CPU benchmark, however the results are restricted to nanosecond and CW regimes, and no experimental comparison is made. Here we demonstrate GPU acceleration of full-field split-step Fourier methods (SSFM) using the NEE through extreme parallelism and an adaptive step-size algorithm, leading to more than three orders of magnitude improvement in computation time for large time-bandwidth product OPOs. We apply our NEE simulation to both idealised picosecond and experimental femtosecond systems, where we detail efforts to create an OPO digital twin by incorporating physical properties of optical components.

2. Model

In this section we describe the modifications to our original approach that enable GPU implementation and the simulation of large time-bandwidth systems. We include details that may benefit those without a computational science background.

2.1 Algorithm

2.1.1 Nonlinear envelope equation

The NEE describes the evolution of the complex electric field envelope ${\bf A}(z,t) = {\bf E}(z,t)e^{-i\omega _0t + i\beta _0z}$ in a transparent dispersive medium with position-sensitive nonlinear susceptibility $\chi ^{(2)}(z)$, where ${\bf E}(z,t)$ is the inverse Fourier transform of the positive components of the physical electric field ${\bf E}(z,\omega )$, $\omega _0$ is a chosen reference frequency, and $\beta _0$ is the real-valued wavevector at $\omega _0$. The NEE is written as [8]:

(1)$$\frac{\delta{\bf A}}{\delta z} + iD{\bf A} ={-}i\frac{\chi^{(2)}(z)\omega^2_0}{4\beta_0c^2}\left(1-\frac{i}{\omega_0}\frac{\partial}{\partial\tau}\right)\left[{\bf A}^2 e^{i\omega_0\tau - i\left(\beta_0-\beta_1\omega_0\right)z} + 2\left|{\bf A}\right|^2 e^{{-}i\omega_0\tau + i\left(\beta_0-\beta_1\omega_0\right)z}\right],$$

where

(2)$$D = \sum_{m\geq2}^{\infty}\frac{i^{m+1}}{m!}\left.\frac{\partial^mk}{\partial\omega^m}\right|_{\omega_0}\frac{\partial^m}{\partial\tau^m}$$

is a dispersive function shifted into frame of reference $\tau = t-\beta _1z$ that propagates at the group velocity of $\omega _0$. Equation (1) can be numerically solved using a SSFM which allows the linear and nonlinear components to be treated separately, with the dispersive linear part on the left analysed in the frequency domain, and the nonlinear right-hand side integrated by a partial differential equation computational method. Optical feedback is implemented with a complex filter function that mimics the net cavity reflectivity, and a delay is added to the resonant pulse after each roundtrip to adjust the synchronicity relative to the next pump field.

2.1.2 Adaptive step size algorithm

In a traditional approach the nonlinear medium is split into discrete steps of fixed size $h$, with the nonlinear component calculated at each point by integrating over the corresponding temporal step. A fast Fourier transform is performed and dispersion applied over one step, before an inverse transform is performed and the process repeated for the next step. The length of the medium dictates the number of steps and transforms, $N_z$, scaling the sample size ($N_k = 2^n$) dependent linearithmic time complexity to $\mathcal{O}(N_zN_k\log_{2}N_k)$ with significant implications for performance.

Our previous work [9] carried out the integration using a second-order Runge-Kutta (RK) method with two stages, known as the midpoint method. This method treats pulse evolution as a linear function between steps, requiring uniform step size $h$ to be small enough to allow convergence of the solution, such that the error at each step is of order $\mathcal{O}(h^3)$. The tangent to the slope at the start of step $n$ is computed at the point $(z_n,E_n)$, as is the case for the one-stage basic Euler’s method, but here is used only to approximate the function value at the midpoint step $(z_n + h/2, E_{n+1/2})$. The computed tangent at this approximated midpoint is used as the gradient for the full step from $z_n$ to $z_{n+1}$. This process is illustrated in Fig. 1(a).

Fig. 1. Graphical representation contrasting two split-step Fourier approaches for solving the NEE. (a) A simple approach uses a fixed step-size, $h$ (top), with integration carried out using the midpoint method (bottom). (b) Our adaptive step-size algorithm alters $h$ as the local $E$-field evolves (top), with Heun’s method providing an error measurement (bottom).

Download Full Size | PDF

An alternative two-stage second-order approach is Heun’s method, as shown in Fig. 1(b). Here we take the full Euler step with the tangent calculated at $(z_n,E_n)$, find the tangent at the approximated end point ($K_1$), take the same step again with the new tangent ($K_2$) and return the average of the two end points as $(z_{n+1},E_{n+1})$. For any function which is monotonic over one suitably small step, the mean of these two tangent estimates will provide an improved prediction.

In most cases Heun’s method yields similar results to the midpoint method, with the advantage that by utilising a full step first-order approximation as an integral part of the second-order approximation, one can easily use the difference between the two as an error measurement, here written as $\displaystyle 100\left \lVert \frac {K_2 - K_1}{2E_{n+1}} \right \rVert$. This allows implementation of an adaptive step size, where $h$ is allowed to increase as long as the error remains within specified bounds. The initial step size is selected to ensure sufficient granularity, however adapting the step size in accordance with required levels of precision allows much larger steps to be taken at non-critical points in the field evolution, increasing computational performance while maintaining the desired precision, here 0.001 – 0.1${\% }$ [23,24].

2.2 Implementation

2.2.1 Discretisation

The computational time-bandwidth product (TBP) is defined for the entire simulated field and is given by the product of the temporal range ($\Delta t$) and the spectral range ($\Delta \omega$). The Fourier uncertainty principle dictates the trade off between the resolution of these two sampling windows, and the maximum simulated TBP depends on the number of spectral sampling points $N_k$:

(3)$$\Delta t \Delta\omega = 2\pi N_k.$$

This relation is further constrained by the highest frequency component expected in the simulated spectrum according to the Nyquist criterion

(4)$$\omega_{max} \Delta t \leqslant \pi N_k,$$

resulting in a direct relation between the number of spectral components and the maximum temporal range.

Low temporal resolution increases the discretisation error that arises when solving the NEE using the SSFM. We illustrate this effect in Fig. 2 by simulating the single-pass parametric gain of the signal pulses from a 7-mm-long PPLN crystal with a 28.4 $\mu$m grating period pumped by a 600-fs, 35-MHz, 1030-nm Yb:YAG thin disk laser, as reported by Südmeyer et al. [25]. When $N_k = 2^{14}$ the system has insufficient granularity, with spectral components that would ordinarily walk off due to group velocity mismatch instead remaining together and experiencing artificial gain, evidenced by a large output power integrated across the 1.4-1.5 $\mu$m signal band. With $N_k \geq 2^{20}$ the temporal resolution is sufficiently fine to sample all interactions, with the output power reaching a plateau around 500 mW, providing strong agreement with the results in [25].

Fig. 2. Single-pass parametric gain in a 7-mm PPLN crystal with insufficient (left) and sufficient sampling (right). See text for simulation details. Middle: integrated power across the 1.4-1.5 $\mu$m signal band as the number of spectral sampling points $N_k$ is increased.

Download Full Size | PDF

2.2.2 Aliasing

Full-field OPO modelling requires a minimum time window equal to the maximum accumulated delay between interacting temporal components, initially determined by dispersion in the nonlinear crystal. Additional dispersion arising from other cavity elements such as chirped mirrors, lenses or prisms is incorporated as a single phase term along with the round-trip cavity delay, and can significantly increase this minimum window. Insufficient temporal range in the NEE model will produce non-zero field elements at the time window boundaries, resulting in component wrapping after a Fourier transform is carried out at each propagation step. This is illustrated in Fig. 3(a), where the choice of reference frequency $\omega _0$ leads to pulse walk-off that extends beyond the time window at both the primary (orange) and second harmonic frequencies (green-blue). Temporal wrapping can be eliminated with a hard boundary, however this introduces unwanted reflections (Fig. 3(b)) and spurious results as it allows mixing between wave components that would otherwise not be delay matched. While temporal aliasing can be reduced by increasing the time window until all non-zero field components remain suitably far from temporal boundaries, this negatively impacts processing memory and will reduce the spectral resolution, which can result in discretised components falsely exhibiting gain, as described above. We therefore introduce an absorbing boundary condition (ABC) to gradually remove components responsible for aliasing, without introducing boundary reflections (Fig. 3(c)) [26]. The ABC is of the form:

(5)$${\bf E}(z,t) = {\bf E}(z,t)\begin{cases} 1, & \text{if } \Delta t - \left|2t\right| < \beta \Delta t;\\ 1-\text{sech}^{2}{(\alpha(1 - \left|\frac{2t}{\Delta t}\right|))}, & \text{otherwise} \end{cases} $$

where $\beta \in (0,1) = 0.25$ defines the effective boundaries and $\alpha = 24$ is a real number defining the rate of decay. We note that the presence of temporal components hitting this boundary does not suggest that temporal window $\Delta t$ is insufficiently large. Rather, it is those components of the field which are weakly generated in a single pass and which lie outside the cavity filter function which are reduced, as these would otherwise become temporally aliased and artificially see gain. A simulation that incorporates ABCs should produce the same results as one with an extremely large time window, which we have verified for the results presented here.

Fig. 3. Illustration of temporal aliasing. (a) Due to the choice of reference frequency $\omega _0$ the temporal range is insufficient to account for pulse walk-off, leading to wrapping at the boundaries. (b) The use of a hard boundary eliminates wrapping but introduces unwanted reflections. (c) An absorbing boundary condition eliminates both temporal aliasing and reflections. (d) Changing $\omega _0$ alters the pulse walk-off angle, but does not eliminate the need for a boundary in a dispersive system.

Download Full Size | PDF

Balancing resolution and temporal aliasing requires an appropriate choice of reference frequency $\omega _0$, with a suitable frame of reference reducing the absolute maximum walk-off between $\omega _0$ and other frequency components (Fig. 3(d)). The optimum choice of reference frequency can often be the component which experiences the median group delay across the wavelength range of interest, however a similar effect can be achieved by offsetting the arrival time of the pump pulse into the nonlinear medium.

2.2.3 Crystal transparency and cavity filter function

OPO crystals must exhibit sufficient transparency across the pump, signal and idler wavebands to overcome any significant material absorption losses. Oxide crystals such as MgO:PPLN or PPKTP exhibit broad transparency from 300 – 6000 nm, while quasi-phase matched semiconductors such as OP-GaP and OP-GaAs possess excellent mid-IR transparency yet suffer from two-photon absorption in the visible to near-IR. The NEE is agnostic to these limitations and considers all possible phase matched interactions simultaneously, artificially enhancing wave components that would otherwise experience significant loss. For example, MgO:PPLN with a 20.8 $\mu$m poling period pumped at 795 nm will produce phase matched signal outputs at 944 nm and 1305 nm, with 5037 nm and 2034 nm idler outputs respectively. The longer wavelength idler will experience $\sim$10x greater attenuation loss compared to the lower wavelength [27]. We incorporate crystal absorption as part of each linear step, limiting the gain of spurious spectral components, noting that total energy is conserved only if this absorption is applied at every step. These components then cannot erroneously build up and siphon energy away from the pump.

We extend a similar argument to the cavity filter function. Previously we have found good agreement between our NEE model and experiment for OPOs with relatively simple, low dispersion cavities, where the cavity function is implemented as a super-Gaussian centered over the resonant wavelength range. This approximation has proved to be invalid for more exotic cavity designs, motivating a more rigorous approach to characterisation.

Often, in such cavities, each component has an associated coating with differing transmission bands, and a smooth filter function allows some frequencies to erroneously exhibit gain due to spectral discretisation. We model the cavity filter function by combining coating and transmission data from the component manufacturers, using numerical values where possible. This approach also allows transmission and dispersion to be applied element-wise around the cavity, reducing the size of the time window required to account for the full cavity dispersion of frequency components not present past a particular interface.

2.2.4 Duty cycle errors

Modelling periodically poled nonlinear crystals with a perfect 50:50 duty cycle artificially amplifies idealised interactions. Discussion with manufacturers reveals a "worst-case" scenario of a duty cycle between 40:60 and 60:40 at any point in the crystal, with shorter grating periods and thicker samples being prone to more significant deviations from the ideal case. We implement a poling structure based on domain widths which are drawn from a normal distribution of pseudo-random values, centred around half the specified grating period, with a standard deviation of 5%, commensurate with the periods required for quasi-phase matching in Ti:sapphire- and Yb-pumped OPOs [28,29]. An offset duty cycle reduces the gain of the fundamental QPM process while also increasing the gain for non-ideal interactions, with similar effects observed for random domain distributions.

2.2.5 Accessible parallelism

Implementation of the NEE through a SSFM is inherently a serial problem, as each step through the nonlinear medium is entirely dependent on the output of the previous step. CPU execution in reasonable timescales is readily achievable for systems where $N_z$ is low, such as the OPO containing a 1-mm PPLN crystal modelled in [10] that employed a fixed step size of $h = 1~\mu$m $(N=1000)$. The temporal evolution of the pulse cannot be split into independent steps for parallel processing, however operations on discrete spectral components can be performed independently. While multi-core CPU architectures can improve simulation times, the additional processing overheads associated with executing simple individual operations many thousands of times makes a GPU approach more attractive, as this offers high-level parallelism that is well suited for the task. Modern GPUs offer thousands of individual processors optimised for relatively simple calculations, mirroring the thousands of sample points needed in simulations, where complexity comes from the scale of the problem as a whole rather than the atomic operations performed in each step.

The benefits of parallelism scale with the number of spectral components used in the model and the GPU can simultaneously perform the error calculations required for the adaptive step size algorithm for many frequencies. As many of the operations as possible should be performed on the GPU, in order to limit the number of transfers to and from the device, since these transfers carry significant computational overheads. NVIDIA’s CUDA libraries offer a GPU-optimised Fourier transform algorithm based on FFTW [30] which further improves the performance of the linear step in the algorithm. This is particularly relevant in considering the optimum order of RK method to implement, as sequential implementations of adaptive SSFM can leverage higher order solvers to improve performance by reducing the number of FFTs performed in each pass. The increased cost of the solver is offset because the FFT step is many times more expensive on a CPU. Conversely, CUDA’s FFT kernels only account for around half the GPU processing time in our second order implementation. FFT performance scales approximately linearly with precision, so appropriate array data types must be chosen accordingly. Given that our adaptive step size algorithm sets a 0.001${\% }$ minimum deviation between Euler’s and Heun’s methods, the 0.00001${\% }$ minimum precision of 32-bit floating point values is more than sufficient for the relatively small number of compounding calculations performed in each step.

A flowchart illustrating our GPU implementation is shown in Fig. 4. Our simulation is hosted in MATLAB with NVIDIA’s CUDA platform providing an interface to the GPU. To facilitate the integration between MATLAB and CUDA, an intermediary file known as a MEX gateway is employed. This MEX gateway acts as a translation layer, converting the simulation parameters defined and initialized in MATLAB into a format compatible with CUDA code, allowing for their efficient utilisation on the GPU. Serving as a crucial entry point from the host side (CPU) to the CUDA file, the MEX gateway bridges the gap between MATLAB’s high-level environment and the intricacies of GPU architecture.

Fig. 4. Flowchart illustrating our implementation of the NEE using GPU-enabled parallelism. Split-step Fourier calculations are executed on the GPU, while component-wise transmission and dispersion properties of the OPO cavity are executed on the CPU.

Download Full Size | PDF

Within the MEX gateway, the simulation parameters are transformed into data structures that conform to the memory model of the GPU. These structures, commonly referred to as device memory, reside in the GPU’s memory space and are accessible to the CUDA code for efficient parallel processing. The MEX gateway ensures that the data are correctly packaged and transferred from the host’s memory to the device’s memory. Once the data are prepared, they are passed to the CUDA code, which forms the core of the GPU-accelerated computation. The CUDA code, written in C/C++, defines kernels, which are specialized functions designed to be executed in parallel by multiple threads on the GPU. The serialised instructions for calling these kernels and the conditional logic of the adaptive step size algorithm are implemented in host side code within the MEX CUDA file.

3. Results and experimental validation

We tested our model against two highly dispersive OPO configurations. First, we verify our implementation of parallel processing of the NEE in a theoretical system containing a long nonlinear crystal pumped by chirped pulses, comparable to examples reported elsewhere [31,32]. Second, we introduce our digital twinning approach and model an OPO comprising a short crystal with significant intracavity dispersion introduced by a pair of Pellin-Broca prisms, recently reported by Hunter et al. [19]. Verification of the GPU-based approach was carried out using an NVIDIA GeForce RTX 4090 GPU with 24 GB of VRAM and 16384 CUDA cores, with an Alder Lake Intel Core i9-12900K CPU with 5 GHz processor base frequency acting as a performance baseline. Our model can be executed on any NVIDIA GPU that supports CUDA implementation in MATLAB. Our simulation is 1-dimensional and assumes a uniform spot size along the length of the nonlinear medium, calculated from the fundamental optical mode for each OPO cavity using ray transfer matrix analysis [33]. We implement a wavelength-dependent overlap integral to scale the energy of each frequency component according to the predicted spot size.

3.1 Example 1 – high dispersion from a nonlinear crystal

We consider the linear OPO outlined the upper right of Fig. 5, with a 20-mm MgO:PPLN crystal pumped by the chirped output of an amplified 100-MHz laser with 2-W average power. We assume an ideal Gaussian spectrum with 20 nm FWHM bandwidth centered at 1034 nm, with transform-limited 78 fs pulses chirped to 1 ps through the addition of 30,000 fs$^2$ of GDD to reflect the dispersion introduced by a fiber amplifier, resulting in a peak power of $\sim$18kW. The crystal grating period is 30.0 $\mu$m, producing a 1500 nm phase matched signal output at 30 $^\circ$C. We assume cavity mirrors with R $>99.8{\% }$ over 1450 – 1800 mm and a 50${\% }$ output coupler. After accounting for additional dispersion from a second pass through the nonlinear crystal, this cavity filter function was incorporated by multiplying the field by a 10${\rm ^{th}}$-order super-Gaussian with suitably chosen wavelength extremes. The initial CPU implementation in Matlab achieved steady state in 30 round trips with a step size of 1 $\mu$m, 2$^{20}$ sample points and a temporal range of 100 ps. The reference wavelength was 1500 nm and the round trip cavity delay was 40 fs. This small offset ensures that the relative group velocity of the signal pulse has a consistent sign, and prevents discretized components that fall on either side of reference wavelength from walking away from each other, which can cause the simulated pulse to split into two parts.

Fig. 5. Left: Comparison of spectral evolution plots and final spectral intensity for CPU and CUDA methods. Intensity is relative to the undepleted pump at the crystal input. Top right: Illustration of a linear OPO containing a long nonlinear crystal. Lower right: Iterative improvements in computation time for CPU, MATLAB GPU and CUDA implementations.

Download Full Size | PDF

Figure 5 shows the evolution of the cavity intensity after each round trip for both CPU and GPU approaches, along with iterative improvements in the execution time. Both results display the broadband idler spectrum that arises from pump-idler group-velocity matching and high signal-idler temporal walkoff, as reported elsewhere [31,32]. On our 64-bit CPU the simulation took 35,130 seconds to complete, illustrating how prohibitive investigation of long nonlinear crystals with the NEE can be when adjusting even a single cavity parameter would require another 10 hours to simulate. The inclusion of the adaptive step size algorithm and 32-bit CUDA implementation resulted in a $1265\times$ improvement in execution time, with results generated in less than 30 seconds using 1.8 GB of VRAM. We note that moving from a 64-bit CPU fixed step size to a 32-bit CUDA adaptive step size will naturally produce subtle differences in the simulation output, however this is a result of the change in implementation approach, rather than a hardware effect.

Figure 6 shows the variation of $h$ in the CUDA implementation due to the adaptive step size algorithm for different round trips. We see step size $h$ modulating in accordance with the complexity of the pulse evolution process at different stages in the crystal. The initially large step size multiplier decreases as parametric down conversion begins, finding a suitable step size for that process, then further reducing the step size for multi-wave mixing processes. The multiplier then increases as an equilibrium is reached. Crucially, these processes occur at different points in the crystal as a function of round trip number, as subsequent crystal passes are seeded by the resonant signal field.

Fig. 6. Left: Comparison of spectral evolution through the 20-mm crystal for round trip 7 and round trip 30. Right: Corresponding step size evolution through the crystal.

Download Full Size | PDF

3.2 Example 2 – high dispersion from intracavity optics

In a direct comparison with experiment, we simulated the prism-based OPO reported in [19]. This system was pumped by a Ti:sapphire laser operating with a center wavelength of 803 nm and 28-nm bandwidth with 1.16 W of average power at a repetition rate of 333 MHz. The pump was focused into a 1-mm-long MgO:PPLN fan-out crystal with grating periods ranging from $\Lambda = 20.50~\mu$m to $\Lambda = 23.50~\mu$m, with the temperature held at $\sim$90 $^\circ$C. The OPO cavity was formed using a pair of Brewster-angled BK7 Pellin-Broca prisms acting as retro-reflective elements, with intracavity lenses creating a focus in the crystal, as illustrated in Fig. 7. Pump input coupling and signal output coupling were achieved via a dichroic mirror and the Fresnel reflection from a single uncoated surface of a sapphire plate, respectively.

The pump was initially simulated as a transform-limited 24-fs sech$^2$ pulse, before replacing its spectrum in the frequency domain with experimental data. Mode-matching optics and the intracavity focusing lens resulted in chirped pump pulses arriving at the crystal plane, therefore $\sim$850 fs$^2$ of GDD was added to the simulated pump pulse. The resulting peak power was 41 kW and the intensity in the crystal focus was 0.25 MW cm$^{-2}$. Including the input and output couplers, the intracavity elements contributed to a net GDD that ranged from 2000 fs$^2$ at 1000 nm to −2600 fs$^2$ at 1700 nm, crossing zero at 1352 nm. The relative group delay introduced by these elements requires a simulation time window of $\Delta t>100$ ps. Considering Eq. (4), for $\omega _{max} = 1$ PHz (300 nm) the Nyquist criterion is satisfied when $N_k = 2^{18}$. Applying the cavity function before dispersion allows frequencies outside this region to be ignored, reducing the time window to 10 ps and the corresponding number of points to $2^{14}$.

Fig. 7. Rendering of the OPO described in [19]. Our simulation includes component-wise transmission functions for the crystal and pre/post output coupling (OC).

Download Full Size | PDF

The simulation results are shown in Fig. 8. Each OPO element operates in transmission rather than reflection, therefore the cavity function cannot be approximated using a super-Gaussian. A component-wise function is implemented using manufacturer coating data, shown in the lower panels of Fig. 8(a) and (b). Material absorption in 5%-doped MgO:PPLN (red) was accounted for at each step during the NEE calculations. Idler absorption in the collimating lens was combined with anti-reflection (AR) coating losses outside the signal region to form a pre-output coupling transmission function (yellow). The wavelength-dependent Fresnel loss from the sapphire plate (${\sim }7.5{\% }$) is shown in purple. Additional AR coatings post-output coupling, along with absorption losses from the N-BK7 prisms and atmospheric water [34], are shown in green, with the net cavity function shaded in blue. The crystal was modelled after $\S$2.2.4, with a representative histogram shown in Fig. 8(d). Each simulation run implemented a pseudo-random distribution of domain widths, however we did not observe significant wavelength shifts between runs.

Fig. 8. Simulation of the prism-based OPO reported in [19]. (a) Top: Energy spectral density (ESD) plot of the output coupled power after 140 round trips. Middle: Intensity evolution over 140 round trips for −80 fs of cavity delay. Bottom: Cavity transmission function. (b) Results from (a) highlighting the 0.95 – 1.65 $\mu$m band. Top: Output coupled signal spectra (lines) and experimental data (shaded areas) for different cavity delays. (c) Output power integrated over two regions of signal operation. (d) Duty cycle histogram.

Download Full Size | PDF

An example of the cavity intensity evolution is shown in the middle panels of Fig. 8(a) and (b), normalised to the input pump power. A quasi-steady-state condition was achieved after ${\sim }60$ round trips, however we observed oscillatory behavior that is common when simulating resonant OPOs [9]. When plotting the final output spectrum, we Fourier transformed the round-trip-varying output power in the signal band to identify the oscillatory frequency and averaged the output spectra over the corresponding period to more accurately reflect the acquisition rate of our spectrometer. The upper panels of Fig. 8(a) and (b) compare the simulation output (lines) against experimental results (shaded areas) for three cavity delays, specifically −160 fs, −80 fs and −36 fs relative to the reference frame at 1352 nm. Further highlighting the parallelism enabled by the GPU, the three simulations of 140 round trips were performed simultaneously with an execution time of 34.3 s using 730 MB of VRAM. Dual-wavelength operation was reported in [19], with a weak secondary signal present at longer wavelengths arising from the minimum in net cavity group delay at 1352 nm. We find strong agreement between our simulation and experiment, resolving fine structure in the 1215 nm signal and accurate distribution of power between primary and secondary outputs, with the integrated output power in these bands shown in Fig. 8(c) for the −80 fs delay simulation. In the temporal domain, simulated pulse durations agree with experimental autocorrelation results to within ${\sim }5{\% }$.

4. Conclusion

We have demonstrated a GPU-accelerated OPO simulator that exploits high-level parallelism in order to solve the $\chi ^{(2)}$ nonlinear envelope equation over $1000\times$ faster than a CPU-based approach, enabling studies of highly dispersive systems that are otherwise computationally inaccessible. Implementing absorbing boundary conditions along with a realistic cavity filter function which incorporates manufacturer coatings, material properties and absorption losses enables a reduction in the temporal simulation window, while the use of an adaptive step-size algorithm allows for faster execution, particularly when modelling longer crystals. Our model can easily be modified to investigate more complex dispersive systems such as fiber-feedback OPOs [35,36], solving the generalised nonlinear Schrödinger equation for the fiber via the SSFM and adaptive step-size algorithm. Our approach can also be extended to simulating supercontinuum generation in aperiodically-poled waveguides [37], where GPU parallelism enables rapid iteration of chirped grating periods via optimisation of an appropriate fitness function.

Future developments in our approach will include the incorporation of $\chi ^{(3)}$ nonlinearities [37] and an extension to (2+1)D to more accurately reflect the spatial overlap of the pump, signal and idler within the focus of the nonlinear crystal. We note the importance of training such a tool on experimental data, a prerequisite for a future platform that with automated suggestion of system designs based on a desired spectral/temporal pulse output and other chosen constraints, and would encourage researchers to make their experimental data sets available for analysis.

Funding

Science and Technology Facilities Council (ST/T000651/1, ST/T003242/1); European Office of Aerospace Research and Development (FA8655-21-2-7001).

Disclosures

The authors declare no conflicts of interest.

Data availability

The source code is publicly accessible through GitHub [38].

References

1. O. Kara, F. Sweeney, M. Rutkauskas, et al., “Open-path multi-species remote sensing with a broadband optical parametric oscillator,” Opt. Express 27(15), 21358–21366 (2019). [CrossRef]

2. K. Johnson, P. Castro-Marin, O. Kara, et al., “High resolution ZrF4-fiber-delivered multi-species infrared spectroscopy,” OSA Continuum 3(12), 3595–3603 (2020). [CrossRef]

3. K. Johnson, P. Castro-Marin, C. Farrell, et al., “Hollow-core fiber delivery of broadband mid-infrared light for remote spectroscopy,” Opt. Express 30(5), 7044–7052 (2022). [CrossRef]

4. L. Maidment, Z. Zhang, C. R. Howle, et al., “Stand-off detection of liquid thin films using active mid-infrared hyperspectral imaging,” in Electro-Optical and Infrared Systems: Technology and Applications XII; and Quantum Information Science and Technology, vol. 9648D. A. Huckridge, R. Ebert, M. T. Gruneisen, eds., et al., International Society for Optics and Photonics (SPIE, 2015), p. 964804.

5. C. L. Tang and L. K. Cheng, Fundamentals of Optical Parametric Processes and Oscillators (CRC Press, 2020).

6. R. Hamerly, A. Marandi, M. Jankowski, et al., “Reduced models and design principles for half-harmonic generation in synchronously pumped optical parametric oscillators,” Phys. Rev. A 94(6), 063809 (2016). [CrossRef]

7. L. Xu, X. Zhong, J. Zhu, et al., “Efficient femtosecond optical parametric oscillator with dual-wavelength operation,” Opt. Lett. 37(9), 1436–1438 (2012). [CrossRef]

8. M. Conforti, F. Baronio, and C. De Angelis, “Nonlinear envelope equation for broadband optical pulses in quadratic media,” Phys. Rev. A 81(5), 053841 (2010). [CrossRef]

9. D. T. Reid, “Ultra-broadband pulse evolution in optical parametric oscillators,” Opt. Express 19(19), 17979–17984 (2011). [CrossRef]

10. K. Balskus, Z. Zhang, R. A. McCracken, et al., “Mid-infrared 333 MHz frequency comb continuously tunable from 1.95 to 4.0 µm,” Opt. Lett. 40(17), 4178–4181 (2015). [CrossRef]

11. R. A. McCracken and D. T. Reid, “Few-cycle near-infrared pulses from a degenerate 1-GHz optical parametric oscillator,” Opt. Lett. 40(17), 4102–4105 (2015). [CrossRef]

12. H. Wu, Z. Zhang, S. Chen, et al., “Development of a deep-ultraviolet pulse laser source operating at 234 nm for direct cooling of Al⁺ ion clocks,” Opt. Express 29(8), 11468–11478 (2021). [CrossRef]

13. K. A. Tillman, D. T. Reid, D. Artigas, et al., “Low-threshold, high-repetition-frequency femtosecond optical parametric oscillator based on chirped-pulse frequency conversion,” J. Opt. Soc. Am. B 20(6), 1309–1316 (2003). [CrossRef]

14. D. Descloux, C. Laporte, J.-B. Dherbecourt, et al., “Spectrotemporal dynamics of a picosecond OPO based on chirped quasi-phase-matching,” Opt. Lett. 40(2), 280–283 (2015). [CrossRef]

15. B. Nandy, S. C. Kumar, and M. Ebrahim-Zadeh, “Yb-fiber-pumped high-average-power picosecond optical parametric oscillator tunable across 1.3-1.5 µm,” Opt. Express 30(10), 16340–16350 (2022). [CrossRef]

16. T. Südmeyer, E. Innerhofer, F. Brunner, et al., “High-power femtosecond fiber-feedback optical parametric oscillator based on periodically poled stoichiometric LiTaO3,” Opt. Lett. 29(10), 1111–1113 (2004). [CrossRef]

17. F. Kienle, P. S. Teh, S.-U. Alam, et al., “Compact, high-pulse-energy, picosecond optical parametric oscillator,” Opt. Lett. 35(21), 3580–3582 (2010). [CrossRef]

18. K. A. Ingold, A. Marandi, M. J. F. Digonnet, et al., “Fiber-feedback optical parametric oscillator for half-harmonic generation of sub-100-fs frequency combs around 2 µm,” Opt. Lett. 40(18), 4368–4371 (2015). [CrossRef]

19. D. E. Hunter and R. A. McCracken, “Ultrashort-pulsed optical parametric oscillator employing Brewster angle prism retroreflectors,” Opt. Express 29(23), 37013–37020 (2021). [CrossRef]

20. D. E. Hunter, S. C. Robarts, and R. A. McCracken, “Brewster mirror ultrafast optical parametric oscillator with high precision wavelength tuning,” Opt. Express 31(24), 39917–39926 (2023). [CrossRef]

21. T. Voumard, M. Ludwig, T. Wildi, et al., “Simulating supercontinua from mixed and cascaded nonlinearities,” APL Photonics 8(3), 036114 (2023). [CrossRef]

22. A. Sanchez, S. Chaitanya Kumar, and M. Ebrahim-Zadeh, “CUDA-based optical parametric oscillator simulator,” Comput. Phys. Commun. 294, 108910 (2024). [CrossRef]

23. O. V. Sinkin, R. Holzlöhner, J. Zweck, et al., “Optimization of the split-step Fourier method in modeling optical-fiber communications systems,” J. Lightwave Technol. 21(1), 61–68 (2003). [CrossRef]

24. C. W. Gear, Numerical Initial Value Problems in Ordinary Differential Equations (Prentice Hall PTR, USA, 1971).

25. T. Südmeyer, J. A. der Au, R. Paschotta, et al., “Novel ultrafast parametric systems: high repetition rate single-pass OPG and fibre-feedback OPO,” J. Phys. D: Appl. Phys. 34(16), 2433–2439 (2001). [CrossRef]

26. G. Mur, “Absorbing boundary conditions for the finite-difference approximation of the time-domain electromagnetic-field equations,” IEEE Trans. Electromagn. Compat. EMC-23(4), 377–382 (1981). [CrossRef]

27. L. E. Myers, R. C. Eckardt, M. M. Fejer, et al., “Multigrating quasi-phase-matched optical parametric oscillator in periodically poled LiNbO3,” Opt. Lett. 21(8), 591–593 (1996). [CrossRef]

28. C. R. Phillips, J. S. Pelc, and M. M. Fejer, “Parametric processes in quasi-phasematching gratings with random duty cycle errors,” J. Opt. Soc. Am. B 30(4), 982–993 (2013). [CrossRef]

29. C.-T. Liu, C.-L. Tsai, J.-Y. Lai, et al., “Electro-optic measurement of averaged duty ratio for periodically poled crystals,” in Conference on Lasers and Electro-Optics, (Optica Publishing Group, 2017), p. JW2A.3.

30. M. Frigo and S. Johnson, “The design and implementation of FFTW3,” Proc. IEEE 93(2), 216–231 (2005). [CrossRef]

31. Z. Zhang, J. Sun, T. Gardiner, et al., “Broadband conversion in an Yb:KYW-pumped ultrafast optical parametric oscillator with a long nonlinear crystal,” Opt. Express 19(18), 17127–17132 (2011). [CrossRef]

32. S. C. Kumar, S. Parsa, and M. Ebrahim-Zadeh, “Fiber-laser-based, green-pumped, picosecond optical parametric oscillator using fan-out grating PPKTP,” Opt. Lett. 41(1), 52–55 (2016). [CrossRef]

33. H. Kogelnik and T. Li, “Laser beams and resonators,” Appl. Opt. 5(10), 1550–1567 (1966). [CrossRef]

34. I. Gordon, L. Rothman, R. Hargreaves, et al., “The hitran2020 molecular spectroscopic database,” J. Quant. Spectrosc. Radiat. Transfer 277, 107949 (2022). [CrossRef]

35. F. Kienle, D. Lin, S.-U. Alam, et al., “Green-pumped, picosecond MgO:PPLN optical parametric oscillator,” J. Opt. Soc. Am. B 29(1), 144–152 (2012). [CrossRef]

36. C. F. O’Donnell, S. C. Kumar, T. Paoletta, et al., “Widely tunable femtosecond soliton generation in a fiber-feedback optical parametric oscillator,” Optica 7(5), 426–433 (2020). [CrossRef]

37. M. Rutkauskas, A. Srivastava, and D. T. Reid, “Supercontinuum generation in orientation-patterned gallium phosphide,” Optica 7(2), 172–175 (2020). [CrossRef]

38. , "OptiFax - a GPU accelerated full-field model of optical parametric interactions,” Github. https://github.com/McCrackenLab/OptiFax.

GPU-accelerated full-field modelling of highly dispersive ultrafast optical parametric oscillators

Abstract

1. Introduction

2. Model

2.1 Algorithm

2.1.1 Nonlinear envelope equation

2.1.2 Adaptive step size algorithm

2.2 Implementation

2.2.1 Discretisation

2.2.2 Aliasing

2.2.3 Crystal transparency and cavity filter function

2.2.4 Duty cycle errors

2.2.5 Accessible parallelism

3. Results and experimental validation

3.1 Example 1 – high dispersion from a nonlinear crystal

3.2 Example 2 – high dispersion from intracavity optics

4. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (8)

Equations (5)

Optics Express

Sebastian C. Robarts	https://orcid.org/0009-0004-5023-4953
Diana E. Hunter	https://orcid.org/0000-0001-6011-0795
Derryck T. Reid	https://orcid.org/0000-0002-0385-4939
Richard A. McCracken	https://orcid.org/0000-0002-9294-1910