## Abstract

This paper presents a computationally efficient wavefront aberration prediction framework for data-driven control in large-scale adaptive optics systems. Our novel prediction algorithm splits prediction into two stages: a high-resolution and a low-resolution stage. For the former, we exploit sparsity structures in the system matrices in a data-driven Kalman filtering algorithm and constrain the identified gain to be likewise sparse; for the latter, we identify a dense Kalman gain and perform corrections to the suboptimal predictions of the former on a smaller grid. This novel prediction framework is able to retain the robustness to measurement noise of the standard Kalman filter in a much more computationally efficient manner, in both its offline and online aspects, while minimally sacrificing performance; its data-driven nature further compensates for modeling errors. As an intermediate result, we present a sparsity-exploiting data-driven Kalman filtering algorithm able to quickly estimate an approximate Kalman gain without solving the Riccati equation.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. INTRODUCTION

Adaptive optics (AO) systems aim to compensate for incoming phase aberrations in optical systems. The incoming light is split between a focal plane camera and a wavefront sensor (WFS). A reconstruction of the aberrated wavefront is developed from the WFS image, which is used by a controller that regulates a deformable mirror (DM) that is able to correct for the phase aberrations. Determining a minimum variance wavefront reconstruction has been extensively studied [1,2]. Due to the existence of modes of the wavefront that are not visible to the sensor and the presence of measurement noise, this is not trivial. Usually, these issues are addressed by including prior information regarding the statistical properties of the aberration and noise.

Due to a delay within the AO control loop, each control input is used to correct an aberration ahead in time, meaning that wavefront prediction techniques are important for effective correction. Extending the minimum variance wavefront reconstruction to the online prediction case, with a dynamic wavefront subject to process and measurement noises, suggests the use of a Kalman filter (KF) in an optimal control framework [3–5]. Because computing the steady-state Kalman gain involves solving the computationally complex discrete algebraic Riccati equation (DARE), straightforward KF implementations are impractical for large-scale applications. As a result, multiple KF-based methods have been proposed to decrease the computational complexity in the large-scale AO context [6–10].

Alongside their large computational complexity, KF-based methods in AO have another important drawback: the computation of the Kalman gain via the DARE requires knowledge of the temporal dynamics of the wavefront and of the statistical properties of the process and measurement noises. In the Kalman filtering literature, data-driven KFs have been proposed to avoid the need for prior knowledge of the statistical properties of both the process and measurement noises [11,12], instead relying on measurement data and the deterministic part of a state-space model; however, practical application of these techniques can be challenging in large-scale systems when limited data are available, and due to their computational complexity. While data-driven Kalman filtering has not previously been brought to AO, the reduction of reliance on prior information has motivated the development of data-driven optimal controllers [13–15] in AO, and the reduction of their computational complexity has led to an increased attention toward the use of structured models, as matrix structures such as sparsity and tensor structures appear in models describing the turbulence dynamics [15–18].

Furthermore, significant work has been dedicated to the design of fast wavefront reconstruction algorithms for online aberration correction [10,19–21], which must accompany the sampling frequency of the system in large-scale AO systems. Among these, the idea of conducting operations on multiple grids of different densities has been explored in methods such as Refs. [20,21] and is explored likewise in the present work. Specifically, we propose a second low-resolution grid designed to intuitively fit into the discussed modeling and data-driven Kalman filtering frameworks with lessened computational burden.

This paper addresses both the offline and online computational complexities
involved in wavefront prediction in a data-driven setting. First, it
introduces sparse matrix techniques into data-driven Kalman filtering
using AO modeling insights, allowing obtention of the Kalman gain more
efficiently than with the DARE and with less reliance on prior knowledge,
and serving as a first step for the second contribution. Second, it
proposes a framework, the high-low-resolution filter (HLRF), in which
wavefront prediction is split into a high-resolution stage and a
low-resolution one; by enforcing sparsity in the gain identified for the
high-resolution stage and correcting the error thus induced with the
low-resolution stage, a computationally efficient *online* prediction framework is created, while reducing the
*offline* (identification) time complexity
down to linear in the total number of lenslets in the WFS array.

The novelty of the new proposed data-driven framework for predictive AO control is twofold with respect to existing prediction solutions [7–10,19–21]. First, it incorporates information about the process and measurement noises without assuming it is known beforehand, instead deriving it implicitly from measurement data. Second, efficient algorithms, both for the off- and online computations, are derived; in particular, the HLRF sacrifices some of the performance of the KF due to heuristic choices, with simulations showing that it nonetheless achieves high performance in comparison with the full data-driven and DARE-based KFs, with improved prediction times.

This paper is structured as follows: Section 2 provides an introduction to Kalman filtering theory in standard AO systems, including the modeling of the WFS and atmospheric turbulence; the data-driven Kalman filtering algorithm from [22] is overviewed in Section 3; Section 4 discusses an implementation of this algorithm in AO that exploits sparsity structures in the system matrices and introduces the HLRF. The results of both data-driven Kalman filtering and the HLRF are discussed in Section 5. The main conclusions and practical applicability of the methods are addressed in Section 6.

*Notation:* When considering the temporal
dynamics, the current discrete time-step is denoted by a subscript
$k$, i.e., ${\phi _k}$ represents the wavefront at moment
$k$ in time. The pseudo-inverse of a
rectangular matrix is denoted by ${A^\dagger}$. The computational complexity, which is
considered to be the order of magnitude of the total number of elementary
arithmetic operations, is expressed by ${\rm O}(\cdot)$. Rather than including both the number
of pixels $n$ of the predictions and the number
$m$ of lenslets in the WFS array in the
complexities, $m$ will take the place of $n$, using the fact that $n \approx m/2$ in large-scale AO applications. Notation
$w \sim {\cal N}(\mu
,Q)$ is used to describe a Gaussian random
(vector) signal $w$ with mean $\mu$ and covariance $Q$, and E $(\cdot)$ denotes the expectation operator.
Operator $\parallel \cdot \parallel
_{{\rm F}}$ represents the Frobenius norm.

## 2. KALMAN FILTERING IN AO

This section provides an introduction to the application of standard KF theory in AO systems. The theory presented in this section can be found in various textbooks; see, for example, [23] for more information about AO and [24] for more information about system identification and Kalman filtering.

#### A. Modeling for AO

This subsection will introduce the components of a standard AO system for astronomy and will discuss the aberration dynamics caused by atmospheric turbulence. A square aperture is considered so as to accommodate the HLRF, as detailed later.

### 1. WFS

We consider a grid of $L \times
L$ lenslets measuring local
wavefront gradients (often called *slopes*) ${y_k} \in {{\mathbb
R}^m}$ corresponding to the subaperture
of each lenslet (as is the case with Shack–Hartmann or, for
example, pyramid sensors). Assuming a square aperture and *Fried geometry*, $m =
2{L^2}$, and the wavefront is sampled on
a grid of $(L + 1) \times (L +
1)$ *pixels*. Defining the total number of pixels as
$n = (L +
{1)^2}$, the relation between the sensor
signal ${y_k}$ and the discretized wavefront
${\phi _k} \in
{{\mathbb R}^n}$ is formulated as

*sensor geometry matrix*and ${v_k}$ is a measurement noise vector. Usually, ${v_k}$ is assumed to be a white Gaussian measurement noise vector with diagonal covariance matrix $R$, i.e., ${v_k} \sim {\cal N}(0,R)$ [3,8,15].

This sensor configuration is unable to measure two particular modes of the wavefront phase: the piston (spatially constant) and waffle (checkerboard) modes, which is reflected in a rank deficiency of $G$, with rank $(G) = n - 2$.

### 2. Modeling the Turbulence Dynamics

The wavefront aberrations caused by atmospheric turbulence are time-varying and commonly assumed to adhere to Taylor’s frozen flow assumption [25], according to which atmospheric turbulence is represented by multiple “frozen” turbulence layers, each moving at its own constant speed in its own direction. A common choice to model the temporal evolution of atmospheric turbulence is the vector auto-regressive (VAR) model of order 1 [3,4,8,26],

where ${w_k}$ is assumed to be Gaussian white noise ${w_k} \sim {\cal N}(0,Q)$, uncorrelated with ${\phi _k}$. Since the speed and direction of the phase screens are typically unknown, a common choice is to assume the state-transition matrix $A$ is diagonal, ignoring the temporal dynamics of the wavefront [3,4,8]. Although this model may be sufficiently accurate when the movement of the turbulence in between consecutive time steps is negligible, a better $A$ would represent this movement, especially when the temporal evolution is not negligible. Defining the covariance matrices: ${C_{\phi ,j}} = {\rm E}({\phi _{k + j}}\phi _k^{\rm T})$, $A$ in model (2) is given by In practice, covariances ${C_{\phi ,0}}$ and ${C_{\phi ,1}}$ are unknown and would have to be estimated using a finite set of wavefront data. Let the wavefront phase data, available from time step $0$ to $N$, be arranged into “past” and “future” data matrices,### 3. Modeling Errors

On top of the wavefront data set being estimated, a major cause of modeling errors is that the true dynamic system driving the turbulence is more complex than the simple VAR-1 model of Eq. (2). For one, when considering the atmosphere as a stack of multiple dynamic phase screens with each phase screen dynamics exactly represented by a VAR-1 model, their combined influence in the observed wavefront ${\phi _k}$ is no longer described by any VAR model in terms of ${\phi _k}$. As a consequence, the VAR-1 model can only approximate the true system dynamics.

The VAR-1 model is additionally only capable of accurately representing movements of the wavefront phase screen that are exact pixel-size shifts in between time steps, given the definition of a pixel as in Section 2.A.1. This issue is addressed in the appendix of [27], which proposes using a lagged version of the VAR-1 model in Eq. (2) such that it more closely represents whole-pixel movement, and running alternating KFs at no additional computational cost.

Both of these modeling errors result in VAR-1 models with nonwhite ${w_k}$, thus violating one of the assumptions made in KF design and negatively impacting its performance. Due to its data-driven nature, the Kalman filtering algorithm presented in this paper is nonetheless capable of compensating for a certain degree of modeling error, as illustrated in Fig. 3. More details on modeling errors are included in [27].

#### B. Kalman Filtering

The combination of the VAR-1 turbulence model [Eq. (2)] and the sensor equation [Eq. (1)] yields the following state-space model:

Since the model is linear time-invariant (LTI), the prediction of the steady-state KF is given in observer form by Eq. (6) and in innovation form by Eq. (7) [24],

where ${e_k} \sim {\cal N}(0,{R_e})$ is the prediction-error sequence defined via The steady-state Kalman gain $K$ is obtained from where $P \in {{\mathbb R}^{n \times n}}$ is the state-prediction-error covariance matrix, given by the positive-definite solution to the DARE,The computational complexity of the KF will be split into online and
offline complexities. The *online*
computational complexity will refer to the computations that have to
be performed within each sample time of the KF. In the case of the
standard LTI KF of this section, the online computation pertains to
Eq. (7), which has a
computational complexity of ${\cal
O}({m^2})$. The combined computational
complexity of all the other operations will be referred to as the
*offline* computational complexity.
Finding the solution to the DARE and applying Eq. (9) are, with a complexity of
O$({m^3})$, the offline bottlenecks in the
computation of the Kalman gain $K$.

## 3. DATA-DRIVEN KALMAN FILTERING

The previous section discussed how the use of standard Kalman filters has several difficulties when applied to large-scale AO applications. The data-driven Kalman filtering algorithm by authors Juang and Chen [22], which estimates the Kalman gain directly from measurement data, is here introduced and shortly derived. The use of this data-driven Kalman filtering algorithm instead of the standard DARE-based KF is motivated by the following three properties:

- • The data-driven algorithm directly identifies the Kalman gain without solving the DARE nor applying Eq. (9), replacing these with equations that are apt for exploitation of the sparsity of the system matrices in AO.
- • It requires only (an estimate of) the deterministic part of the state-space model in Eq. (5) (matrices $A$ and $G$) and measurement data, implicitly accounting for the stochastic part of the model (matrices $Q$ and $R$).
- • It compensates for a certain degree of modeling error by implicitly considering the modeling errors to be a form of process noise [28] and identifying a corresponding Kalman gain. Given the sources of modeling error discussed in Section 2, it is possible for the identified Kalman gain to outperform the DARE-based gain.

Let matrices ${Y_{i,j,N}}$ and ${Y_{i,N}}$ be defined as

Let there be a total of ${N_d} + 1$ time steps of measurement data from time step 0 to ${N_d}$. Then, stacking the left-hand side of (13) horizontally into matrix ${Y_{s,N}}$,

where $N = {N_d} - s - p + 2$, so that ${y_{s + p + N - 2}} = {y_{{N_d}}}$. The first step toward identification of the Kalman gain is estimation of $G{\cal L}$, which can be done in a linear least-squares setting,With $G{\cal L}$ estimated from Eq. (15), one has obtained estimates of the observer form Markov parameters ${{\cal A}_j}$ up to $j = s$. From these, the next step is to compute the innovation form Markov parameters,

The data-driven Kalman filtering procedure of [22] is summarized as follows:

- (2) Solve the least-squares problem in Eq. (15) for an estimate $\widehat{G{\cal L}}$ of $G{\cal L}$.

## 4. SPARSE METHODS FOR LARGE-SCALE AO

Solving the optimization problems in Eqs. (15)–(20) for an arbitrary system with dense system matrices has a computational complexity of O$({m^3})$. This section will describe the intuitive reasoning behind sparsity patterns in AO, enabled by the frozen-flow nature of turbulence [17,18] and exploit them in two novel algorithms. The first, in Subsection 4.B, combines the data-driven Kalman filtering of Section 3 with sparse VAR modeling to quickly estimate an approximate Kalman gain, but still in ${\rm O}({m^3})$ time. The second algorithm, in Subsection 4.C, aims to reduce the online computational complexity by directly enforcing sparsity in the identified gain and compensating for the error thus induced with an additional set of fast operations, while additionally reducing the offline identification complexity down to ${\rm O}(m)$.

#### A. Sparse Turbulence Modeling

The sparsity of the matrix $A$ can easily be illustrated by recalling the frozen turbulence assumption. Assuming a wind speed of $\nu$, the frozen turbulence assumption implies that, when $\nu$ is at least overestimated, and without assuming knowledge of the wind direction, one knows that information travels only within a radius of $r = \tau \nu$ in between consecutive time steps, $\tau$ being the sampling time. Because $A$ in Eq. (2) establishes this movement within a single sampling period, a sparsity pattern can be set beforehand to directly identify a sparse model.

Consequently, for a single turbulence layer, the sparsity pattern of $A$ is dictated by the wind speed given in pixels per time step, here denoted by $\omega$,

such that $\delta$ will be referred to as the*pixel width*, which is defined as the width of a single pixel: $\delta = D/L$. The sparsity structure of $A$ is then proposed as follows: any entries of $A$ that relate pixels further than $\omega$ pixel widths away from each other should be zero. With the adoption of this

*a priori*structure for matrix $A$, a sparse multibanded matrix is obtained, similar to the structure adopted in [17].

In practice, a single layer may not accurately represent the total atmosphere, and the wind speed $\nu$ should be replaced by the dominant wind speed of all the phase screens. The exact wind speed may also be unknown; it is therefore necessary to approximate the sparsity pattern based on the limited information available: if a rough estimate (or overapproximation) of the wind speed is available, this can be used instead in Eq. (21). For example, when $\delta = 0.05$ m, $\tau = 0.005$ s and an overestimation $\nu \le 20$ m/s is available, then $\omega \le 2$ pixel widths per sampling time.

Let ${{\cal S}_A}$ denote the set of valid $A$ matrices with the previously described sparsity pattern. The linear least-squares problem in Eq. (4), reformulated as

#### B. Sparse Data-Driven Kalman Filtering

This section deals with the introduction of sparsity into the data-driven Kalman filtering algorithm of Section 3 and its use in AO. The algorithm is divided into two linear least-squares problems: the first problem fits a VAR model to measurement data, and the second one determines the Kalman gain from these models. These are explored separately ahead.

### 1. Sparse VAR Slope Modeling

In order to obtain efficient solutions to the optimization problem presented in Eq. (15), which fits a VAR-$s$ model to slope data, it is important to obtain a sparse representation of matrix $G{\cal L}$. Consider a VAR-$s$ model for the slopes,

In the model of Eq. (23), each ${{\cal A}_i}$ coefficient establishes a relationship between slope vectors $i$ time steps apart, so each coefficient should have a sparsity radius ${r_i}$ that increases with $i$.

Let ${{\cal S}_{G{\cal L}}}$ refer to the set of valid $G{\cal L}$ matrices with a sparsity pattern fulfilling the description above, and constrain the first least-squares problem of the data-driven Kalman filtering procedure [Eq. (15)],

Furthermore, the consequently low number of parameters to identify permits, for large-scale application, the use of “tall” data sets, with a number of time -steps of data much smaller than the number $m$ of lenslets in the WFS array.

### 2. Retrieval of the Kalman Gain

After constructing the sparse estimate $\widehat{{{\cal O}_p}K}$ from Eqs. (18) and (19), which involves only negligible ${\rm O}(m)$ sparse matrix products and additions for sensible choices of $s$ and $p$, the remaining final step is to obtain a computationally efficient solution to Eq. (20). However, since the Kalman gain itself is dense, the approach used to formulate the sparse solutions in Eqs. (22) and (24) is not applicable in this case.

Instead, the linear least-squares problem of Eq. (20) has a highly sparse regressor ${{\cal O}_p}$ and regressand $\widehat{{{\cal O}_p}K}$, which enables the use of dedicated solvers. The sparse rectangular solver built into MATLAB’s \ operator is [29] as of R2020b, and solves Eq. (20) in roughly ${\rm O}({m^3})$ time, as is the case without leveraging sparsity or finding the gain with the DARE, but is efficient in practice due to its sparsity-exploiting nature, leading to the results of Fig. 2 that confirm that the entire identification procedure is far faster than the DARE.

Alongside the computational complexity of its computation, which remains the bottleneck for applications of arbitrarily large scale, another major drawback of the dense matrix $K$ is that it also directly influences the online predictions, given by Eq. (7). Since the online computations involve matrix-vector multiplications with $K$, the online computational complexity will scale as $O({m^2})$. In order to obtain a scalable online computational complexity, the use of the large and dense matrix $K$ should be avoided. Section 4.C introduces a new framework, partly based on the presented sparsity-exploiting data-driven KF, that aims to decrease the computational complexity of the online computations and, in addition, decreases the offline computational complexity further.

### 3. Implementation Details and Approximations in AO

Practical issues arise when applying the proposed data-driven Kalman filtering method to AO systems using the approximate VAR-1 turbulence model derived in Section 2. This subsection acknowledges and discusses these issues.

The assumption that ${(A - KG)^s} \approx 0$ any small $s$ generally does not hold for the AO case. Because sparsity is progressively lost with increases in $s$, $s$ is nonetheless consciously kept low, meaning that the resulting Kalman gain will be approximate. In practice, the presence of the modeling errors (see Section 2) implies that the identified Kalman gain may outperform the DARE-based gain even for low values of $s$ and that arbitrarily increasing $s$ does not necessarily mean better performance. The influence of $s$ on the performance of the KF will be addressed in the simulation results of Section 5.

Furthermore, a near-$1$ eigenvalue of $(A - KG)$ corresponding (approximately) to the piston mode leads as well to the appearance of column-wise constant distortions in the identified Kalman gain due to the implicit pseudo-inversion of ${{\cal O}_p}$ when solving for $\hat K$ in Eq. (20). The column-wise means of the identified Kalman gain should therefore be removed after retrieval with Eq. (20), as should the piston mode of the filter’s predictions, lest their mean become unstable and lead to numerical problems.

#### C. HLRF

The computational complexity of the standard KF prediction operation [Eq. (7)] is ${\rm O}({m^2})$. Even with a sparse $A$, the denseness of the Kalman gain remains an issue. This section presents an algorithm that aims to have the noise robustness of Kalman filtering, reduce the computation time of the online prediction operation, and improve the time complexity of the offline identification procedure. The algorithm separates prediction into two distinct stages, one in the original array using only sparse matrix operations, and one in an artificially constructed low-resolution array where dense operations are acceptable without significant computational burden. These are as follows:

- • The
**high-resolution stage**performs prediction in the original, potentially extremely large fine array. In this stage, only sparse operations are acceptable. To achieve this, high sparsity is imposed in the identified gain, resulting in a data-driven sparse observer that is capable of predicting detailed local high-spatial-frequency structures in the wavefront phase, but fails to satisfactorily account for its overarching shape. - • The
**low-resolution stage**predicts the remainder of the wavefront in a coarse, low-resolution array using a (dense) data-driven KF. In fact, this stage is effectively predicting the state prediction errors of the sparse observer of the high-resolution stage; these prediction errors comprise low-spatial-frequency modes of the wavefront phase, which should not require an array as dense as the original one for prediction. This low-resolution array is a mathematical construct whose measurements are built from measurement data from the original array using sparse operations likewise, hence demanding no hardware modification.

The algorithm is formulated for square lenslet arrays due to the current construction of the low-resolution stage, which is done by designing each lenslet of the low-resolution array to be a square comprising lenslets of the original high-resolution array. Extension to circular apertures is briefly discussed in Section 6.

### 1. Sparse Predictor Gain

The HLRF algorithm is based upon setting a radius-based sparsity pattern on the identified gain, analogous to those in Eqs. (22) and (24); denote by ${{\cal S}_K}$ the set of potential gains ${K^*}$ with such a sparsity pattern, and constrain Eq. (20) accordingly,

Solving Eq. (25) was observed to systematically produce asymptotically stable observers. These, with a tight constraint on the gain, are able to accurately predict high-spatial-frequency modes of the wavefront, but perform poorly for the low-spatial-frequency modes, as shown in Fig. 5 of Section 5, which motivates the use of the low-resolution stage. Note that unlike the dense gain of Section 4.B, the column-wise means of the sparse gain should not be removed, as Eq. (25) will not fit for the distortions, and removal of the means would destroy sparsity.

### 2. HLRF Formulation

This subsection overviews the HLRF algorithm. The low-resolution lenslet array is of size $L^\prime \times L^\prime $; define $m^\prime = 2(L^\prime {)^2}$ and $n^\prime = (L^\prime + {1)^2}$. The exact definitions of the low- and high-resolution sampling arrays will be discussed further in Section 4.C.3. With sparse gain ${K^*}$, the high-resolution stage is given by

where $\hat \phi _k^{[{\rm h}]} \in {{\mathbb R}^n}$ is its wavefront prediction for time $k$ and the measurement prediction errors of the high-resolution stage $e_k^{[{\rm h}]} \in {{\mathbb R}^m}$ are Next, for the derivation of the low-resolution stage, define a system (still in the original high-resolution array) whose states are the wavefront prediction errors of the high-resolution stage, The dynamics of this wavefront prediction error are given byLet a superscript prime indicate matrices in the low-resolution array, which are obtained as described below. Using this notation, the wavefront prediction given by the HLRF is computed as follows:

In Eq. (29), the Kalman gain $K^\prime \in {{\mathbb R}^{n^\prime \times m^\prime}}$ is the low-resolution (dense) Kalman gain, herein identified using the data-driven method of Section 3, considering ${\cal H}{e_k}$ as measurement data, and $(A - {K^*}G)^\prime $ and $G^\prime $ as system matrices. The state-transition matrix of the low-resolution stage, $(A - {K^*}G)^\prime $, with available wavefront phase data, can be obtained following the procedures of Section 2, after running the measurement data through the high-resolution stage first and subtracting its wavefront predictions from the corresponding wavefront data as per Eq. (27). Matrix $G^\prime \in {{\mathbb R}^{m^\prime \times n^\prime}}$ is built exactly like $G$ but for the dimensions of the low-resolution array.

### 3. Definition of the Low-Resolution Array

Whereas the high-resolution array is defined by the lenslets of the WFS, the low-resolution array is a mathematical construct that is not the result of any hardware within the AO loop. This subsection will define the matrix ${\cal H}$ introduced in Section 4.C.2.

In Eq. (29), the
measurement prediction errors $e_k^{[{\rm
h}]}$ were mapped onto the
low-resolution array by matrix ${\cal H}$. In order to define this
mapping, consider the problem of transforming a set of slope
measurements ${y_k}$ obtained from an
$L \times
L$ array of WFS lenslets into an
artificial set of measurements ${z_k}$ that give the slopes on a
lower-dimensional $L^\prime \times
L^\prime $ sampling grid, $L^\prime \ll
L$. Moreover, assume that
$L^\prime
$ is chosen such that
${L_{\rm u}} =
L/L^\prime $, the width of each lenslet of
the low-resolution array, henceforth called a *unit*, is an integer value.

Matrix ${\cal H} \in {{\mathbb R}^{m^\prime \times m}}$ linearly combines the measurements from the original array,

such that, disregarding measurement noise, ${z_k}$ is related to the wavefront sampled on a low-resolution array, denoted by ${\psi _k} \in {{\mathbb R}^{{n^\prime}}}$, via The low-resolution wavefront ${\psi _k}$ can be defined as a selection of entries of ${\phi _k}$ that are shared by the low-resolution array. Let matrix $Z \in {{\mathbb R}^{n^\prime \times n}}$ of ones and zeros define this selection, Combining Eqs. (30)–(32), the problem lies in finding a combination matrix ${\cal H}$ such that for any arbitrary ${\phi _k}$. In the present setting, in which the units are square, and as detailed in Appendix A, it is possible to find an exact solution to Eq. (33) for all ${\phi _k} \in {{\mathbb R}^n}$ with a highly sparse ${\cal H}$ by solving the following least-squares problem:## 5. TUNING AND SIMULATION RESULTS

This section overviews the performance of the proposed algorithms with
respect to quality of the predictions and execution times. Prediction
quality is measured by the mean squared 2-norm of the
**piston-removed** wavefront prediction error, normalized by the
mean squared 2-norm of the piston-removed incoming wavefront; this measure
of performance is henceforth referred to as normalized mean-squared error
(NMSE),

As a proof of concept for the proposed algorithm, a numerical simulation study of a standard AO system is performed. A square telescope aperture is used to match the square array of WFS lenslets assumed in the obtention of matrices ${\cal H}$ and ${\cal I}$ in Section 4.C.3. The setting of the analyses is the following:

- • The sensor array is an $L \times L$ square array configured in Fried geometry, with $L = 60$. Square arrays were chosen so as to accommodate our HLRF and allow it to be compared to the alternatives; square arrays are adopted throughout for consistency.
- • The turbulence parameters are set as follows: the Fried parameter is ${r_0} = 0.1$ m, outer scale is ${L_0} = 25$ m, and the telescope aperture width is $D = 4$ m. The wind speed is $\omega = 0.25$ pixels per time step, which, to illustrate, corresponds to $v \approx 8.2$ m/s assuming a sampling frequency of 500 Hz [see Eq. (21) for details]. The signal-to-noise ratio (SNR) of the slope measurements (of both the simulation and identification data sets) is set to 5 dB.
- • As for the data-driven Kalman filtering parameters, the VAR model orders $s$, as seen in Eq. (23) and used in Eq. (24), are specified prior to each specific set of results, and $p = 2$ always, as it was observed to produce the best results; the radial constraints applied to the VAR model of the slopes in Eq. (24) are ${r_1} = {r_2} = 1.5$, ${r_3} = {r_4} = 2$, ${r_5} = 2.5$. In the low-resolution stage, the parameters are always set to ($s^\prime = 4$, $p^\prime = 2$).
- • The turbulence dynamics are represented by a single translating phase screen, simulated using the Object-Oriented MATLAB Adaptive Optics (OOMAO) toolbox [30]. The sensor signals were obtained simply via the application of Eq. (1) to the simulated turbulence, giving us direct control over the SNR of the slope measurements.
- • The radius-based constraint of the sparse gain, whose width is denoted ${r_K}$, was here implemented as a square of lenslets around each pixel: a constraint of width zero is such that each pixel accepts only information from the four lenslets immediately adjacent to it; each unit increase in the constraint expands the square one lenslet outward. These implementation details are arbitrary, and another user can define the constraint otherwise.
- • The matrix-vector multiplication (MVM) method used herein as a benchmark predicts the wavefront with a regularized least-squares estimator of ${\phi _k}$ and a one-step progression via multiplication with $A$,

#### A. Tuning the Model Order

The model order $s$ is a vital parameter of our identification procedures. A larger $s$ yields a VAR model with more time steps of information to smooth out measurement noise, but the added Markov parameters, each with equal or less sparsity than the previous, increase the identification time. Additionally, it has been observed that in the presence of erroneous modeling, $s$ cannot be increased arbitrarily for better performance.

The influence of $s$ in the NMSE [Eq. (35)] of both the identified
KF and the HLRF is illustrated in Fig. 1. The best performance of the data-driven KF was attained for
$s = 4$, with slight drops visible for
larger $s$. As for the HLRF, the tighter the
constraint applied to the gain, the sooner performance is maximized
with respect to $s$; for example, for $({r_K} = 3)$, which is used in results below,
$s = 3$ yielded the best performance of the
HLRF. Figure 2 shows how the
time taken for *offline* identification of
the gain (Kalman gain, in the case of 4.B) using the methods from Section 4 increases with $m$ and $s$, and further compares it with that
of the DARE.

#### B. Performance and Prediction Times

Figure 3 presents a comparison between the NMSE obtained with the DARE-based KF, the MVM method, the identified KF (from Section 4.B), and the HLRF algorithm with varying parameters. Good relative performance of the HLRF is achieved early in the values of its tuning parameters, and both data-driven procedures can outperform the DARE-based KF due to the modeling error induced by a fractional $\omega$, as suggested in Section 2. The error-time trade-off is made evident with Fig. 4. Notice that, for instance, the (${r_K} = 3$, $L^\prime = 10$) combination yields a better NMSE than the DARE-based KF, while being an order of magnitude faster than dense Kalman filtering and MVM.

#### C. Spatial Frequency Domain Performance

As argued in Section 4, the sparse data-driven observer of the high-resolution stage of the HLRF fails to predict the low-spatial-frequency modes of the incoming wavefront as accurately as a dense data-driven KF does. This motivated the creation of the low-resolution stage and is illustrated in Fig. 5 alongside results for the DARE-based KF and MVM. Although the low frequencies are generally less present in the prediction errors than the high frequencies, the error induced by sparsity is sufficient to bring the NMSE [Eq. (35)] of the sparse observer slightly over that of MVM. Worse models further raise the importance of the low-resolution stage [27].

## 6. CONCLUSIONS AND RECOMMENDATIONS

This section summarizes the main conclusions and addresses the practical applicability of the developed data-driven KF by presenting several recommendations for future research.

#### A. Conclusions

This paper has presented a new data-driven wavefront prediction method for real-time application in large-scale AO systems. By exploiting sparsity structures within the system matrices, we first developed a data-driven Kalman filtering method capable of, from measurement data, quickly identifying an approximate Kalman gain, but still in ${\rm O}({m^3})$ time.

The novel HLRF was further presented, which splits the wavefront prediction into a low- and a high-resolution stage. With the enforcement of sparsity in all matrices of the high-resolution stage, linearity with respect to $m$ of both the identification (offline) and prediction (online) computational complexities was obtained, as the low-resolution stage is designed to be computationally light. This algorithm exchanges some of the performance of the KF for lower prediction times. Because the complexities are reduced to linear in $m$ by shifting this dependence onto tuning parameters that should take the array size into consideration, it was shown in the example of a $60 \times 60$ lenslet array that substantial improvements to prediction times demand only minimal sacrifices in performance compared to dense data-driven Kalman filtering.

#### B. Applicability and Future Research

Although the results presented in Section 5 are based on simplified numerical simulations only, the HLRF of Section 4.C shows potential as an accurate real-time wavefront prediction method for large-scale AO systems. However, several issues still need to be addressed to make this method practically applicable. Note, however, that the radius-based sparsity of Sections 4.A and 4.B is by default applicable to nonsquare apertures, and that 4.B thus presents an already realistically applicable method for identification of the Kalman gain for AO systems.

The main issues that remain to be addressed in the HLRF are the limitations imposed upon the (original) sensor array by the current formulation of the low-resolution array and the corresponding matrix ${\cal H}$. First, the algorithm is still to be extended to nonsquare lenslet arrays (e.g., corresponding to circular or annular apertures); second, the width of the original array currently must be divisible by that of the low-resolution array, which limits the choices of both widths.

The issue of algorithm extension to nonsquare arrays has been likewise
reported in the literature of computationally efficient wavefront
estimation methods; see, for example [15], where zero padding is proposed to artificially form a
square data array from a nonsquare one. Zero padding can, however,
lead to significant estimation errors. Since the low-dimensional array
is merely a mathematical construct, future research could find
alternative definitions of ${\cal H}$ that enable additional freedom in
the shape of the low-dimensional units, which could then accommodate
nonsquare arrays and more choices of widths. Furthermore, the results
shown herein explored only linear interpolation from the
low-dimensional array to the high-dimensional one. Interpolation
methods that better replicate the shape of a wavefront should
naturally be explored. The method has so far only been tested in open
loop. Due to its low online computational time, the method could be
applied in a closed-loop environment, where the predictions pertain to
the *residual* wavefront, whose frequency
content should differ from that of the uncorrected incoming wavefront.
In order to properly assess its closed-loop performance, the algorithm
should be paired with a control algorithm that uses its predictions
and be tested in a more detailed simulation study, for example using
OOMAO [30], or in a laboratory
setup, and should be compared to existing fast data-driven optimal
control methods such as [15,17].

## APPENDIX A: DETAILED SPARSE CONSTRUCTION OF ${\cal H}$

Recall that the low-resolution array consists of larger lenslets, here
called *units*, that are ${L_{\rm u}}$ high-resolution lenslets wide.
Consider the example of Fig. 6,
with $L = {L_{\rm
u}}$ and $L^\prime = 1$, in order to isolate a single
$2 \times 2$ unit. Let a subscript
$x$ or $y$ indicate the axis to which a
particular slope corresponds. In this example, $y_{{\rm
x},k}^{(1)}$ and $y_{{\rm
y},k}^{(1)}$ are given by

In truth, Eq. (A7) takes ${\rm O}({m^3})$ time for fixed $L^\prime $ and ${\rm O}({m^{3/2}})$ time for $L^\prime \approx \sqrt L$, but the time effectively taken to solve it is so insignificant regardless (using MATLAB and a sparse ${G_{\rm u}}$), even for unrealistically massive units of tens of thousands of lenslets themselves, that it is ignored throughout the paper.

## Funding

H2020 European Research Council (339681); Seventh Framework Programme (2007-2013).

## Disclosures

The authors declare no conflicts of interest.

## Data Availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

## REFERENCES

**1. **B. Ellerbroek and T. Rhoadarmer, “Adaptive wavefront control
algorithms for closed loop adaptive optics,”
Math. Comput. Modell. **33**,
145–158 (2001). [CrossRef]

**2. **B. L. Ellerbroek, “Efficient computation of
minimum-variance wave-front reconstructors with sparse matrix
techniques,” J. Opt. Soc. Am. A **19**, 1803–1816
(2002). [CrossRef]

**3. **C. Kulcsár, H.-F. Raynaud, C. Petit, J.-M. Conan, and P. V. De Lesegno, “Optimal control, observers
and integrators in adaptive optics,” Opt.
Express **14**,
7464–7476 (2006). [CrossRef]

**4. **B. Le Roux, J.-M. Conan, C. Kulcsár, H.-F. Raynaud, L. M. Mugnier, and T. Fusco, “Optimal control law for
classical and multiconjugate adaptive optics,”
J. Opt. Soc. Am. A **21**,
1261–1276 (2004). [CrossRef]

**5. **G. Sivo, C. Kulcsár, J.-M. Conan, H.-F. Raynaud, É. Gendron, A. Basden, F. Vidal, T. Morris, S. Meimon, C. Petit, D. Gratadour, O. Martin, Z. Hubert, A. Sevin, D. Perret, F. Chemla, G. Rousset, N. Dipper, G. Talbot, E. Younger, R. Myers, D. Henry, S. Todd, D. Atkinson, C. Dickson, and A. Longmore, “First on-sky SCAO validation
of full LQG control with vibration mitigation on the CANARY
pathfinder,” Opt. Express **22**, 23565–23591
(2014). [CrossRef]

**6. **A. Beghi, A. Cenedese, and A. Masiero, “On the computation of Kalman
gain in large adaptive optics systems,” in
*21st Mediterranean Conference on Control and
Automation* (IEEE,
2013),
pp. 1374–1379.

**7. **C. Correia, J.-M. Conan, C. Kulcsár, H.-F. Raynaud, and C. Petit, “Adapting optimal LQG methods
to ELT-sized AO systems,” in *1st AO4ELT
Conference-Adaptive Optics for Extremely Large Telescopes*
(EDP Sciences, 2010), p.
07003.

**8. **P. Massioni, H.-F. Raynaud, C. Kulcsár, and J.-M. Conan, “An approximation of the
Riccati equation in large-scale systems with application to adaptive
optics,” IEEE Trans. Control Syst.
Technol. **23**,
479–487 (2014). [CrossRef]

**9. **P. Massioni, L. Gilles, and B. Ellerbroek, “Adaptive distributed Kalman
filtering with wind estimation for astronomical adaptive
optics,” J. Opt. Soc. Am. A **32**, 2353–2364
(2015). [CrossRef]

**10. **M. Gray, C. Petit, S. Rodionov, M. Bocquet, L. Bertino, M. Ferrari, and T. Fusco, “Local ensemble transform
Kalman filter, a fast non-stationary control law for adaptive optics
on ELTs: theoretical aspects and first simulation
results,” Opt. Express **22**, 20894–20913
(2014). [CrossRef]

**11. **R. Mehra, “Approaches to adaptive
filtering,” IEEE Trans. Autom.
Control **17**,
693–698 (1972). [CrossRef]

**12. **J. Duník, O. Straka, O. Kost, and J. Havlík, “Noise covariance matrices in
state-space models: a survey and comparison of estimation methods.
Part I,” Int. J. Adapt. Control Signal
Process. **31**,
1505–1543
(2017). [CrossRef]

**13. **K. Hinnen, M. Verhaegen, and N. Doelman, “A data-driven
${{\cal H}_2}$-optimal control approach for
adaptive optics,” IEEE Trans. Control Syst.
Technol. **16**,
381–395 (2008). [CrossRef]

**14. **B. Sinquin, L. Prengere, C. Kulcsár, H.-F. Raynaud, E. Gendron, J. Osborn, A. Basden, J.-M. Conan, N. Bharmal, L. Bardou, L. Staykov, T. Morris, T. Buey, F. Chemla, and M. Cohen, “On-sky results for adaptive
optics control with data-driven models on low-order
modes,” Mon. Not. R. Astron. Soc. **498**, 3228–3240
(2020). [CrossRef]

**15. **B. Sinquin and M. Verhaegen, “Tensor-based predictive
control for extremely large-scale single conjugate adaptive
optics,” J. Opt. Soc. Am. A **35**, 1612–1626
(2018). [CrossRef]

**16. **R. Fraanje, J. Rice, M. Verhaegen, and N. Doelman, “Fast reconstruction and
prediction of frozen flow turbulence based on structured Kalman
filtering,” J. Opt. Soc. Am. A **27**, A235–A245
(2010). [CrossRef]

**17. **C. Yu and M. Verhaegen, “Structured modeling and
control of adaptive optics systems,” IEEE
Trans. Control Syst. Technol. **26**,
664–674 (2017). [CrossRef]

**18. **P. Piscaer, “Sparse VARX model identification for
large-scale adaptive optics,” Master’s thesis
(Delft University of Technology,
2016).

**19. **M. Rosensteiner, “Wavefront reconstruction for
extremely large telescopes via CuRe with domain
decomposition,” J. Opt. Soc. Am. A **29**, 2328–2336
(2012). [CrossRef]

**20. **L. Gilles, C. R. Vogel, and B. L. Ellerbroek, “Multigrid preconditioned
conjugate-gradient method for large-scale wave-front
reconstruction,” J. Opt. Soc. Am. A **19**, 1817–1822
(2002). [CrossRef]

**21. **D. L. Fried, “Adaptive optics wave function
reconstruction and phase unwrapping when branch points are
present,” Opt. Commun. **200**, 43–72
(2001). [CrossRef]

**22. **J. Juang, C. Chen, and M. Phan, “Estimation of Kalman filter
gain from output residuals,” J. Guid. Control
Dyn. **16**,
903–908 (1993). [CrossRef]

**23. **F. Roddier, *Adaptive Optics in Astronomy*
(Cambridge University,
1999).

**24. **M. Verhaegen and V. Verdult, *Filtering and System Identification:
A Least Squares Approach* (Cambridge
University, 2007).

**25. **G. I. Taylor, “The spectrum of
turbulence,” Proc. R. Soc. London A **164**, 476–490
(1938). [CrossRef]

**26. **F. Assémat, R. W. Wilson, and E. Gendron, “Method for simulating
infinitely long and non stationary phase screens with optimized memory
storage,” Opt. Express **14**, 988–999
(2006). [CrossRef]

**27. **P. Cerqueira, “Data-driven filtering for
large-scale adaptive optics,” Master’s thesis
(Delft University of Technology,
2020).

**28. **A. H. Jazwinski, “Adaptive
filtering,” Automatica **5**, 475–485
(1969). [CrossRef]

**29. **T. A. Davis, “Algorithm 915, SuiteSparseQR:
multifrontal multithreaded rank-revealing sparse QR
factorization,” ACM Trans. Math.
Softw. **38**, 8
(2011). [CrossRef]

**30. **R. Conan and C. Correia, “Object-oriented Matlab
adaptive optics toolbox,” Proc. SPIE **9148**, 2066–2082
(2014). [CrossRef]