Optica Publishing Group

Centroid computation for Shack-Hartmann wavefront sensor in extreme situations based on artificial neural networks

Open Access

Abstract

This paper proposes a method to calculate the centroid for a Shack-Hartmann wavefront sensor (SHWFS) in adaptive optics (AO) systems that suffer from strong environmental light and noise pollution. In these extreme situations, traditional centroid calculation methods are invalid. The proposed method is based on artificial neural networks designed for the SHWFS, named SHWFS-Neural Network (SHNN). By transforming the spot detection problem into a classification problem, SHNNs first find the spot center and then calculate the centroid. In extremely low signal-to-noise ratio (SNR) situations with a peak SNR (SNRp) of 3, the False Rate of SHNN-50 (an SHNN with 50 hidden-layer neurons) is 6%, and that of SHNN-900 (an SHNN with 900 hidden-layer neurons) is 0%, while the best result of traditional methods is 26%. As the power of environmental light interference increases, the False Rate of SHNN-900 remains around 0%, while the performance of traditional methods degrades dramatically. In addition, experimental results of wavefront reconstruction are presented. The proposed SHNNs achieve significantly improved performance compared with the traditional method: the Root Mean Square (RMS) of the residual decreases from 0.5349 μm to 0.0383 μm. This method can improve the robustness of the SHWFS.

© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Adaptive optics (AO) systems work successfully in many areas by estimating and correcting the phase aberration. Normally, the derivative of the phase is measured by a Shack-Hartmann wavefront sensor (SHWFS) to reconstruct the phase or to calculate the voltages of a deformable mirror directly [1–4]. To obtain better performance in AO systems, researchers have proposed many algorithms to improve the accuracy of phase measurement. These methods can be roughly classified as advanced correlation methods and advanced Center of Gravity (CoG) methods. Poyneer found that the periodic correlation algorithm, which employs fast Fourier transforms, works well for extended scenes [5]. Y. Wang et al. presented a shift estimation algorithm based on gradient cross-correlation, which effectively solves the wraparound effect present in periodic correlation [6]. However, a disadvantage of correlation methods is their extra computational cost. In this paper we are concerned with measuring the phase in a real-time system using a point source; therefore, the CoG algorithm and its variations are better suited than correlation methods. Advanced CoG methods include threshold CoG (TCoG), weighted CoG (WCoG) and Windowing [7–11]. TCoG methods such as TmCoG (using m% of the maximum spot intensity as the threshold) and TkCoG (using μN + kσN as the threshold, where μN and σN are the mean value and deviation of the background noise) are two optimum threshold selection methods. They can achieve high accuracy given a priori information about the noise. The WCoG method exists in two versions: the weight can either be fixed or recentered on the spot ("following weight"), in a manner similar to thresholding. Windowing and WCoG are quite similar, because the latter can be seen as a square-weighted WCoG, which means both methods rely heavily on a priori information about the spot position. Recently, Kong et al. proposed a centroid estimator based on stream processing.
This stream-based CoG (SCoG) identifies the size of the useful signal within the subaperture on the detector to achieve high accuracy in centroid calculation [12,13]. The SCoG algorithm is very similar to the traditional CoG method except that a floating CoG window is applied to each incoming pixel together with its surrounding pixels; the floating CoG window can be selected to match the spot size without cutting off useful signal pixels. Due to boundary effects and noise, the SCoG algorithm may misidentify fake spots as potential centroids. Hence, a threshold must be applied to the local sum of pixels within the CoG window to eliminate those fake spots. However, in extreme situations, AO systems have to work under strong noise and weak signal conditions, and there may also be strong interference from environmental light. Fake spots may be brighter than real spots, and if the spot is not detected, centroid calculation with such methods is completely wrong. Thus, the most crucial task in extreme situations is spot detection [14–16].

Meanwhile, artificial intelligence (AI) techniques, including artificial neural networks (ANNs), have achieved great success, especially in computer vision. Therefore, we try to solve the problem above using this technique. In fact, ANNs have been used in AO systems since the mid-1990s. Barrett and Sandler used ANNs to estimate the static aberration in the Hubble Space Telescope primary mirror [17]. Jorgenson and Aitken used ANNs to predict the tip, tilt and piston components of wavefront distortions [18]. Hart and McGuire also attempted to predict the output of the wavefront sensor based on its immediate past history [19]. Kendrick et al. used the General Regression Neural Network to calculate wavefront errors based on a phase-diversity wavefront sensor [20]. Montera et al. used ANNs to estimate key parameters such as the Fried coherence length, the wind-speed profile and the noise level of a wavefront sensor; in their work, ANNs were also used for wavefront sensor slope measurement [21,22]. Weddell and Webb used the echo state network to predict the space-varying point spread function [23]. Guo et al. presented a method using ANNs to reconstruct the wavefront from spot displacements measured by a SHWFS, in order to replace the least-squares fit and singular value decomposition (SVD) method [24]. This algorithm relies on the output of a SHWFS, which is exactly what we focus on. In this paper, we are not concerned with the wavefront prediction or wavefront reconstruction studied above. Instead, we use ANNs to detect and locate the spot position in extreme situations where traditional methods are invalid, and then calculate the centroid based on the detected spot center. Unlike traditional object detection tasks such as car detection and face detection [25,26], spots in a SHWFS do not have as many features as cars or faces.
Besides, spot detection should be as fast as possible, because time delay decreases the performance of a real-time AO system. Therefore, we use fully-connected neural networks with one hidden layer, instead of the convolutional neural networks or very deep fully-connected networks that other researchers use for natural object detection problems.

The paper is organized as follows: In Section 2 we analyze the centroid computation task for the SHWFS and introduce related work that will be compared with our method. We then transform the spot detection problem into a classification problem and build a kind of ANN called the SHWFS-Neural Network (SHNN) in Section 3, where the architectures of SHNNs are explained along with details of the training process. In Section 4, we perform simulations and experiments to validate the algorithm, followed by the conclusion.

2. Centroid computation models

2.1 Spot model and error analysis

Throughout the paper, we consider only one subaperture of a SHWFS and calculate its centroid separately. The most common spot model for point-source imaging is the two-dimensional Gaussian function shown in Eq. (1), where (x0, y0) is the true centroid position, Nph (ADU) represents the total energy of the spot, and σspot is the equivalent Gauss width, which equals 1.7 pixels in this paper.

$$I(x,y)=\frac{N_{ph}}{2\pi\sigma_{spot}^{2}}\exp\left[-\frac{(x-x_{0})^{2}+(y-y_{0})^{2}}{2\sigma_{spot}^{2}}\right] \tag{1}$$

As real spot images are always degraded by noise from both the detector and environmental light, the main noise sources of a CCD can be summarized as follows [10]: readout noise, background level, background photon noise, signal photon noise, discrete sampling error, nonlinear response error, and so on. Some of these noises obey a Gaussian distribution, while others obey a Poisson distribution, which can also be treated as Gaussian with mean equal to variance, as long as the average photon number of the Poisson-distributed signal is greater than ten per pixel. That is to say, supposing the background light is flat, the noise sources can be approximated and combined into a single Gaussian noise: $N_G \sim \mathcal{N}(\overline{N_G}, \sigma_G^2)$.
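The spot model of Eq. (1) and the flat Gaussian noise approximation above can be sketched in NumPy as follows (a minimal sketch; the parameter values are illustrative, not the paper's):

```python
import numpy as np

def gaussian_spot(size=25, x0=12.0, y0=12.0, n_ph=1000.0, sigma_spot=1.7):
    # Sample the spot model of Eq. (1): a 2-D Gaussian with total energy
    # n_ph (ADU) and equivalent Gauss width sigma_spot (1.7 px in this paper).
    y, x = np.mgrid[0:size, 0:size].astype(float)
    return (n_ph / (2.0 * np.pi * sigma_spot**2)
            * np.exp(-((x - x0)**2 + (y - y0)**2) / (2.0 * sigma_spot**2)))

def add_gaussian_noise(img, mean=5.0, sigma=2.0, seed=None):
    # Approximate the combined detector/background noise as N(mean, sigma^2),
    # as justified above for sufficiently bright Poisson components.
    rng = np.random.default_rng(seed)
    return img + rng.normal(mean, sigma, img.shape)
```

Because σspot is small compared with the 25 × 25 grid, the discrete sum of the sampled spot is very close to the total energy Nph.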

However, in some situations, such as strong interference from environmental light, the background light cannot be considered flat but may instead be a ramp interference (Fig. 1(a)). The ramp interference NS can be expressed as:

Fig. 1 (a) Simulated data with Gaussian distribution noise and ramp interference. (b) Simulated data with Gaussian distribution noise and vertical/horizontal Gaussian interference.

$$N_S = A_S\,(r_x x + r_y y) \tag{2}$$

where r_x and r_y are random coefficients between −0.5 and 0.5, representing the horizontal and vertical noise ratios respectively, and A_S stands for the maximum noise intensity.

Another possible noise is a kind of vertical/horizontal Gaussian interference (Fig. 1(b)):

$$N_v = A_v \exp\left[-\frac{(x-x_v)^2}{2\sigma_{nv}^2}\right] \tag{3}$$
$$N_h = A_h \exp\left[-\frac{(y-y_h)^2}{2\sigma_{nh}^2}\right] \tag{4}$$

where N_v and N_h represent the vertical and horizontal Gaussian interference respectively, A_v and A_h are the maximum intensities, x_v / y_h is the center position, and σ_nv / σ_nh is the equivalent Gauss width. The vertical and horizontal Gaussian interference may not appear in a subaperture simultaneously, and the ramp interference and the vertical/horizontal Gaussian interference do not occur at the same time. Thus, the total noise N_total is the sum of the random Gaussian noise and either the ramp interference or the vertical/horizontal Gaussian interference.
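The two interference models above can be simulated with a short sketch (amplitudes and centers below are illustrative assumptions, not values from the paper):

```python
import numpy as np

def ramp_interference(size=25, a_s=50.0, seed=None):
    # Ramp model above: N_S = A_S (r_x x + r_y y), with random slope
    # ratios r_x, r_y drawn uniformly from [-0.5, 0.5].
    rng = np.random.default_rng(seed)
    rx, ry = rng.uniform(-0.5, 0.5, 2)
    y, x = np.mgrid[0:size, 0:size].astype(float)
    return a_s * (rx * x + ry * y)

def stripe_interference(size=25, a_v=40.0, center=8.0, sigma_n=2.0, vertical=True):
    # Vertical/horizontal Gaussian interference: a bright Gaussian stripe
    # of peak intensity a_v centred on column (or row) `center`.
    y, x = np.mgrid[0:size, 0:size].astype(float)
    coord = x if vertical else y
    return a_v * np.exp(-(coord - center)**2 / (2.0 * sigma_n**2))
```

Adding either field to a noisy spot image reproduces the extreme situations of Fig. 1.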

In order to express how strong the noise is in different situations, a proper SNR criterion must be chosen. SNRp (based on the peak intensity of the spot, Ip) is a valid measurement that has been broadly used to characterize noise conditions:

$$SNR_p = \frac{I_p}{\sigma_{total}} \tag{5}$$

σ_total is the standard deviation of the total noise. It is worth noting that I_p here does not always equal $N_{ph}/(2\pi\sigma_{spot}^2)$ but may be smaller, because of the discrete sampling of the CCD.

If environmental light interference exists, SNRp alone is not sufficient. We define a relative interference power to measure how strong the interference is compared with the signal:

$$Power = \frac{E_N}{E_S} \tag{6}$$

E_N and E_S are the total energy of the interference and the signal, respectively. Note that when we calculate SNRp, we consider only the Gaussian noise; the interference is not included.

In order to compare performances among different methods, Centroid Estimation Error (CEE) is proposed to measure the distance between the theoretical spot center position and the calculated center position, as shown in Eq. (7):

$$CEE = \sqrt{(x_c - x_0)^2 + (y_c - y_0)^2} \tag{7}$$

where (x_c, y_c) is the calculated centroid and (x_0, y_0) is the theoretical centroid.

In relatively low SNRp situations, or when environmental light interference exists, the CEE may be very large and the computed center may be untrustworthy. Therefore, the False Rate is proposed to evaluate the reliability of algorithms: when CEE > σspot, the result is judged to be false and the algorithm is considered to have failed in that case.
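The two evaluation metrics above reduce to a few lines (a sketch of Eq. (7) and the failure criterion; the helper names are ours):

```python
import math

def cee(xc, yc, x0, y0):
    # Eq. (7): Euclidean distance between computed and theoretical centroid.
    return math.hypot(xc - x0, yc - y0)

def false_rate(errors, sigma_spot=1.7):
    # A trial is judged false when CEE > sigma_spot; False Rate is the
    # fraction of failed trials over a set of Monte Carlo runs.
    errors = list(errors)
    return sum(e > sigma_spot for e in errors) / len(errors)
```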

2.2 Related work in centroid computation

The CoG is the simplest and most direct way to calculate the position of a symmetric spot, with the following formula [27]:

$$x_i = \frac{\sum_{m=1}^{M}\sum_{n=1}^{N} x_{nm} I_{nm}}{\sum_{m=1}^{M}\sum_{n=1}^{N} I_{nm}}, \qquad y_i = \frac{\sum_{m=1}^{M}\sum_{n=1}^{N} y_{nm} I_{nm}}{\sum_{m=1}^{M}\sum_{n=1}^{N} I_{nm}} \tag{8}$$

where (x_i, y_i) is the computed centroid of the i-th subaperture, (x_nm, y_nm) are the pixel coordinates, and I_nm is the corresponding intensity. Due to the poor performance of CoG in low-SNR situations, several improved algorithms have been studied.
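For one subaperture, Eq. (8) is simply an intensity-weighted mean of the pixel coordinates, e.g.:

```python
import numpy as np

def cog(img):
    # Eq. (8): intensity-weighted mean of the pixel coordinates.
    img = np.asarray(img, dtype=float)
    y, x = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    total = img.sum()
    return (x * img).sum() / total, (y * img).sum() / total
```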

Threshold CoG methods estimate a threshold and set pixels with intensity smaller than the threshold to zero, thereby reducing some of the noise. How to choose the threshold, however, is a major problem, and some researchers have studied threshold selection in detail. TmCoG assumes the spot is the brightest feature: it first finds the maximum intensity and then uses m% of the maximum intensity as the threshold [8]:

$$T_m = \mu_n + (I_m - \mu_n)\,\frac{m}{100} \tag{9}$$

where μ_n is the mean of the noise and I_m is the maximum of the whole image, which is supposed to be the maximum spot intensity under noisy conditions. Lardiere et al. first mentioned this method [9]. However, there is no theory directing how to choose m to obtain the optimum threshold. Li et al. analyzed this problem and provided an empirical equation by fitting the curve of optimum m versus SNRp [8]. Simulations show that m can be set to 90 if SNRp < 4 and to 10 when SNRp > 30. Otherwise, m can be calculated from the following empirical equation:

$$m = \frac{368.5}{SNR_p} \tag{10}$$

where 368.5 is an empirical constant.
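Combining the empirical m(SNRp) rule with the threshold of Eq. (9) gives a complete TmCoG sketch (our own illustrative implementation, assuming μ_n is known):

```python
import numpy as np

def tmcog(img, snr_p, mu_n=0.0):
    # TmCoG sketch: pick m from the empirical rule above (m = 90 for
    # SNRp < 4, m = 10 for SNRp > 30, else m = 368.5 / SNRp), threshold
    # the image at T_m = mu_n + (I_m - mu_n) * m / 100 (Eq. (9)),
    # then apply a plain CoG to the clipped image.
    img = np.asarray(img, dtype=float)
    if snr_p < 4:
        m = 90.0
    elif snr_p > 30:
        m = 10.0
    else:
        m = 368.5 / snr_p
    t = mu_n + (img.max() - mu_n) * m / 100.0
    clipped = np.where(img >= t, img, 0.0)
    y, x = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    total = clipped.sum()
    return (x * clipped).sum() / total, (y * clipped).sum() / total
```

Note that this still presumes the brightest pixel belongs to the spot, which is exactly the assumption that breaks down under strong interference.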

The WCoG and Windowing algorithms rely heavily on the selection of the spot center pixel. The idea of the WCoG is to weight pixels according to their flux, a kind of "soft" threshold: the contribution of noisy pixels with very little signal, outside the core of the spot, is attenuated but not eliminated [7]:

$$x_i = \frac{\sum_{m=1}^{M}\sum_{n=1}^{N} W_{nm} x_{nm} I_{nm}}{\sum_{m=1}^{M}\sum_{n=1}^{N} W_{nm} I_{nm}}, \qquad y_i = \frac{\sum_{m=1}^{M}\sum_{n=1}^{N} W_{nm} y_{nm} I_{nm}}{\sum_{m=1}^{M}\sum_{n=1}^{N} W_{nm} I_{nm}} \tag{11}$$

The Gaussian model is often chosen as the weight function because the intensity of spots usually follows a Gaussian distribution. Windowing can be seen as a WCoG with a rectangular or other "hard" weight function: the weights are one for pixels inside the window and zero outside it. The most important step is to find the spot center.

Nicolle et al. reported that, for Gaussian noise statistics, the WCoG with a Gaussian weight is the maximum-likelihood estimate if the spot center is well chosen [28]. In practice, the weight function center is usually taken as the brightest pixel, which can obviously be wrong in low-SNR and background light interference situations. Some schemes simply pick the aperture center as the weight function center (fixed-weight WCoG); this may be advantageous in closed-loop AO systems, because spots are usually near the aperture center, but it is invalid in open-loop AO systems and large-dynamic-range situations.
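A Gaussian-weighted WCoG can be sketched as follows; the weight-centre arguments (xw, yw) make explicit the dependence on spot-center selection discussed above (a sketch, not a reference implementation):

```python
import numpy as np

def wcog(img, xw, yw, sigma_w=1.7):
    # WCoG formula above, with a Gaussian weight function centred at
    # (xw, yw). The choice of weight centre (brightest pixel vs aperture
    # centre) is the method's weak point in low-SNR or interfered images.
    img = np.asarray(img, dtype=float)
    y, x = np.mgrid[0:img.shape[0], 0:img.shape[1]].astype(float)
    w = np.exp(-((x - xw)**2 + (y - yw)**2) / (2.0 * sigma_w**2))
    wi = w * img
    total = wi.sum()
    return (x * wi).sum() / total, (y * wi).sum() / total
```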

3. Artificial neural networks for centroid computation

3.1 Architectures of traditional methods in the perspective of ANNs

A neural network is a machine learning model that uses successive layers of neurons to compute an approximate mapping from an input space to an output space, as shown in Fig. 2(a). Each neuron computes a weighted sum of the output vector of the previous layer, optionally applies an activation function, and outputs the value, as shown in Fig. 2(b).

Fig. 2 (a) ANNs with one hidden layer and one output. (b) The basic operation and function of a neuron. The bias (or threshold) b is important but is omitted from the following figures for simplicity.

$$Y = \delta\left(\sum_j x_j w_j + b\right) \tag{12}$$

In fact, the CoG algorithm itself can be expressed as an ANN. Suppose the size of a subaperture image is 25 × 25 pixels and the CoG method is given by Eq. (8); it can then be expressed as the computation graphs in Fig. 3.

Fig. 3 CoG algorithm expressed by ANN. (a) Computation graph of coordinate X. (b) Computation graph of coordinate Y.

When computing coordinate X, layer 1 calculates $\sum_{n=1}^{25} x_{nm} I_{nm}$, where m is the sequence number of the neurons in layer 1 and varies from 1 to 25. The horizontal pixel coordinates x_nm in each column play the role of weights, while the intensities I_nm play the role of input data. Thus, the weights in layer 1 are as follows:

$$w^{[1]}_{1,1} = \dots = w^{[1]}_{25,1} = 1,\quad w^{[1]}_{26,2} = \dots = w^{[1]}_{50,2} = 2,\quad \dots,\quad w^{[1]}_{n,m} = m,\quad \dots,\quad w^{[1]}_{601,25} = \dots = w^{[1]}_{625,25} = 25$$

There is no bias b in the network, and the activation function is simply a constant one. The layer-2 computation is shown below: the weights in layer 2 are all ones, and the activation function divides by the constant $\sum_{m=1}^{25}\sum_{n=1}^{25} I_{nm}$.

$$z = \sum_{m=1}^{25}\sum_{n=1}^{25} x_{nm} I_{nm}, \qquad y = \frac{z}{\sum_{m=1}^{25}\sum_{n=1}^{25} I_{nm}}$$

The process of computing coordinate Y is almost the same as for X, except that the weights in layer 1 are:

$$w^{[1]}_{1,1},\dots,w^{[1]}_{25,1} = 1,2,\dots,25,\quad w^{[1]}_{26,2},\dots,w^{[1]}_{50,2} = 1,2,\dots,25,\quad \dots,\quad w^{[1]}_{n,m} = n,\quad \dots,\quad w^{[1]}_{601,25},\dots,w^{[1]}_{625,25} = 1,2,\dots,25$$

Combining these two computation graphs into one, the layer-1 weights are the same as the corresponding ones above, and all weights not mentioned are zeros. The architecture is shown in Fig. 4:

Fig. 4 CoG algorithm expressed by ANN.
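As a compact illustration of this fixed-weight network (a sketch of our own, collapsing the 25 column-sum and 25 row-sum neurons of layer 1 into a single 625 × 2 matrix multiplication), the CoG of Eq. (8) is exactly one linear layer followed by a normalization:

```python
import numpy as np

size = 25
y, x = np.mgrid[0:size, 0:size].astype(float)
# Layer-1 weights are the fixed pixel coordinates (w_{n,m} = m for X and
# w_{n,m} = n for Y), stacked here as one 625 x 2 weight matrix.
W1 = np.stack([x.ravel(), y.ravel()], axis=1)

def cog_network(img):
    v = np.asarray(img, dtype=float).ravel()   # 1 x 625 input layer
    z = v @ W1                                 # layer 1: [sum x*I, sum y*I]
    return tuple(z / v.sum())                  # layer 2: divide by total intensity
```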

Not only the CoG method but also the other traditional methods can be expressed as fixed-weight networks. Fixed-weight WCoG raises some weights and lowers others relative to the CoG ANN. All the other methods can be seen as special cases of following-weight WCoG, whose weights differ for different inputs: for example, Windowing sets some weights to zero according to geometric relationships, while TCoG sets some weights to zero according to intensity. Those weights are all determined by human a priori information. By using AI techniques, we can instead let the data speak and build ANNs that learn the weights themselves.

3.2 Approximation capabilities of ANNs

In order to build ANNs that learn weights themselves, we must first ensure that ANNs have the ability to approximate the "right" function. In the mathematical theory of ANNs, the universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of R^n, under mild assumptions on the activation function. Many researchers have worked on this [29–31]. Leshno et al. relaxed those conditions, showing that universal approximation holds if and only if the network's activation function is not a polynomial [31].

Leshno Theorem: Let δ ∈ M. Set

$$\Sigma_n = \mathrm{span}\{\delta(w \cdot x + b) : w \in \mathbb{R}^n,\ b \in \mathbb{R}\}$$

Then Σ_n is dense in C(R^n) if and only if δ is not an algebraic polynomial.

Here δ is the activation function. M denotes the set of functions with the following property: the closure of the set of points of discontinuity of any function in M has zero Lebesgue measure. x denotes an input vector, w a vector of weights, and b a threshold value. R denotes the set of real numbers and R^n the n-dimensional vector space over the reals. C(R^n) denotes the continuous functions on R^n (with density understood as uniform convergence on compact subsets), which can be seen as the family of "real world" functions one may wish to approximate. The span of a set of vectors in a vector space is the intersection of all linear subspaces that contain every vector in that set.

The results above guarantee that a network with one hidden layer, as shown in Fig. 2, can approximate any such function arbitrarily well using the rectified linear unit (ReLU) activation function below. Although this does not address the algorithmic learnability of the parameters, it is the foundation of ANNs and explains why our algorithm can work.

$$\mathrm{ReLU}(t) = \max(0, t) = \begin{cases} t, & t > 0 \\ 0, & t \le 0 \end{cases}$$

3.3 Transforming the spot detection problem into a classification problem

As discussed before, the CoG and improved CoG methods can be seen as solving a regression problem with ANNs that take subaperture images as input and output center coordinates. But the loss function of a regression problem is the mean square error (MSE), which is less trainable than cross-entropy. Besides, compared with centroid calculation, spot detection is more important. If we label the center pixel with a single class number rather than a vector (x, y), we can use the softmax algorithm and transform the regression problem into a classification problem.

Figure 5 shows the architecture of a classification network for spot detection. We number the pixels from left to right and top to bottom; we therefore have 625 classes in total, because every pixel can be the center of the spot. Instead of using one neuron whose value varies from 1 to 625 to represent all these classes, we use 625 neurons. In each output, only one neuron has value one while the others are zero, so the index of that non-zero neuron is the class number and locates the center of the spot. We call this architecture with 50 hidden-layer neurons SHNN-50.

Fig. 5 A classification network similar to the CoG method for spot detection.

This feed-forward process can be expressed as follows:

$$z = XW_1 + b_1$$
$$a = \max(z, 0)$$
$$Y = aW_2 + b_2$$

where X is a 1 × 625 vector reshaped from the 25 × 25 image; W_1 is a 625 × 50 matrix; b_1 is a 1 × 50 vector; W_2 is a 50 × 625 matrix; and b_2 is a 1 × 625 vector.
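With the shapes above, the whole feed-forward pass is three lines of NumPy (a sketch with random, untrained parameters; the initialization scale is an assumption):

```python
import numpy as np

def shnn_forward(X, W1, b1, W2, b2):
    # The feed-forward equations above: one ReLU hidden layer followed by
    # a 625-way linear output; the argmax of Y is the predicted
    # spot-centre pixel class.
    z = X @ W1 + b1
    a = np.maximum(z, 0.0)        # ReLU activation
    return a @ W2 + b2

# Untrained parameters with the SHNN-50 shapes given in the text.
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (625, 50)); b1 = np.zeros(50)
W2 = rng.normal(0.0, 0.1, (50, 625)); b2 = np.zeros(625)
```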

In the training process, a multi-class classifier called softmax is needed. It can be seen as an activation function in the output layer that yields the probability of each spot-center location. Softmax is expressed as follows:

$$Y_{predict,i} = g(Y_i) = \frac{e^{Y_i}}{\sum_{j=1}^{n} e^{Y_j}}$$

We then calculate the loss function using cross-entropy. To prevent overfitting, we add a regularization term as well:

$$Loss = J(Y_{label}, Y_{predict}) = -\sum_i Y_{label,i} \log Y_{predict,i} + \lambda \lVert w \rVert^2$$

where Y_label is the label, Y_predict is the softmax output, and λ is the regularization coefficient.
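The softmax and the regularized cross-entropy loss above can be sketched as follows (the numerical-stability shift and the small log offset are standard implementation details, not part of the paper's formulation):

```python
import numpy as np

def softmax(y):
    # Softmax over the 625 output neurons, computed stably by
    # subtracting the row maximum before exponentiating.
    e = np.exp(y - y.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def loss(y_label, y_pred, weights, lam=1e-4):
    # Cross-entropy between the one-hot label and the softmax output,
    # plus an L2 regularization term summed over all weight matrices.
    ce = -(y_label * np.log(y_pred + 1e-12)).sum(axis=-1).mean()
    return ce + lam * sum((w**2).sum() for w in weights)
```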

Back-propagation uses a gradient descent algorithm to minimize the loss. There are several advanced gradient descent algorithms, such as Momentum, RMSProp and Adam. We focus more on the effectiveness of ANNs applied to SHWFS centroid computation than on the convergence performance of back-propagation algorithms; thus, we simply use plain gradient descent, and details of the advanced algorithms are beyond the scope of this paper.

The selection of the number of hidden-layer neurons is largely empirical. Leshno's theory demonstrates that shallow networks with a broad family of non-polynomial activation functions have universal approximation properties. It specifies that a sufficiently wide rectifier network can approximate any function, but gives no guidance on how wide it should be. In fact, there is no need to approximate every function; we only need to ensure that the network has the ability to find the spot. Larger ANNs can deal with more complicated noise situations, but too many neurons require more computation and may cause overfitting. If we increase the number of hidden-layer neurons from 50 to 900, we get a more powerful model called SHNN-900 at the cost of consuming more computing resources (Fig. 6). Experiments comparing these two architectures are shown in Section 4.

Fig. 6 An architecture with 900 hidden layer neurons.

Once the network is trained successfully, it can be used in real applications with a post-processing step to achieve subpixel accuracy. The feed-forward process first finds the spot center position; we then select a 3 × 3 window around the spot center, similar to the Windowing method, and finally sort the 9 pixels in the window by intensity and pick the five brightest pixels to calculate the centroid.
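The post-processing step can be sketched as follows (our own illustration, assuming the detected center is not on the image border):

```python
import numpy as np

def refine_centroid(img, cx, cy):
    # Post-processing: take the 3x3 window around the detected centre
    # pixel (cx, cy), keep its five brightest pixels, and compute their
    # CoG for subpixel accuracy.
    img = np.asarray(img, dtype=float)
    win = img[cy - 1:cy + 2, cx - 1:cx + 2]
    yy, xx = np.mgrid[cy - 1:cy + 2, cx - 1:cx + 2]
    flat, fx, fy = win.ravel(), xx.ravel().astype(float), yy.ravel().astype(float)
    keep = np.argsort(flat)[-5:]          # indices of the five brightest pixels
    w = flat[keep]
    return (fx[keep] * w).sum() / w.sum(), (fy[keep] * w).sum() / w.sum()
```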

3.4 SHNN and space transformation

SHNNs are powerful in spot detection tasks, as will be shown in Section 4. In fact, this ANN-based algorithm can be seen as a kind of space transformation, like the Fourier transform and the wavelet transform. The brightest-pixel selection algorithm performs well if the spot center has the largest intensity, and in normal situations that assumption holds. However, it fails when noise or interference makes some other pixel(s) brighter than the spot center. Our algorithm, as shown in Fig. 7(a)-7(c), performs a space transformation that makes the brightest-pixel algorithm valid again. In other words, by adjusting the parameters W and b over tens of thousands of samples with different noise and interference conditions, our algorithm learns features such as spot shape and intensity and generates a new space. It represents the input image in that new space and then projects it back in a proper way; at that point, the spot center pixel is the brightest, and all we have to do is pick the brightest pixel as the spot center. Unlike the frequency and wavelet domains generated by the Fourier and wavelet transforms, the new space generated by the SHNN does not rely on human a priori information or explicit equations; it is learned directly from the data.

Fig. 7 (a) The original image. (b) Network output. (c) The spot center found by the algorithm.

4. Simulations and experiments

4.1 Training process

Taking SHNN-50 as an example, we randomly choose 200 pictures from the training set as a batch and use the batch gradient descent algorithm to train the neural network for 50,000 steps; the loss decreases from 6.630642 to 2.386451. We then randomly choose 200 pictures from the test set to compute the 3 × 3 accuracy: if the predicted spot center lies within the 3 × 3 window around the real spot center, we count the result as correct. The accuracy increases from 1.39% to 95.93%, as shown in Fig. 8.

Fig. 8 Accuracy and Loss in training process.

4.2 Accuracy comparison in strong noise situations

We use the Monte Carlo method to run simulations and compare performance. In [8], TmCoG is the best optimum threshold selection method. Because simulation is controllable, we can choose the optimal parameters for TmCoG, which cannot be done in a real system. The performance of CoG, Windowing, TmCoG, SHNN-50 and SHNN-900 is compared by generating 100 pictures for each SNRp and calculating the mean CEE. To give an intuition of extremely noisy situations, example pictures are shown below: Figs. 9(a)–9(d) are images with SNRp of 1, 3, 5 and 7, respectively. Figure 10 shows the CEE of the different algorithms as SNRp varies from 1 to 10. In these cases, the False Rate matters most; the False Rate at different SNRp is therefore also shown in Fig. 11 and Table 1. The results show that our method is more reliable than any other method.

Table 1. False Rate of different methods in low SNR situations.

Fig. 9 Low SNR subaperture images.

Fig. 10 CEE of different methods in low SNR situations.

Fig. 11 False Rate of different methods in low SNR situations.

It is obvious that SHNNs perform much better in extremely low SNR situations. For example, when SNRp is 3, the False Rate of SHNN-50 is 6% and that of SHNN-900 is 0%, while the best result among traditional methods is 26%.

4.3 Accuracy comparison in background interference situations

We introduce interference into the pictures. To highlight the impact of the interference, the Gaussian noise SNRp is fixed at 10,000. We analyze ramp interference (Fig. 12) and vertical/horizontal Gaussian interference (Fig. 13) separately. The False Rate varies with the relative interference power, as shown below.

Fig. 12 False Rate with different power of ramp interference.

The results show that our ANN-based algorithm is the most reliable. Under ramp interference, the three traditional methods become invalid once the relative interference power exceeds two. Under vertical/horizontal Gaussian interference, Windowing performs well until the relative interference power exceeds seven; this is because the maximum intensity of the vertical/horizontal Gaussian interference may not exceed the maximum intensity of the spot, so this kind of interference affects the Windowing algorithm less than ramp interference does. We also note that, except in Fig. 13 where the performance of SHNN-50 coincides with that of SHNN-900, SHNN-900 always performs better than SHNN-50. This is because the ability to generalize is related to the number of hidden-layer neurons, which should be chosen according to the concrete noise conditions.

Fig. 13 False Rate with different power of vertical/horizontal interference.

4.4 Experimental validation

To test and verify our method, we gathered real data from a SHWFS with different levels of environmental light interference, as shown in Fig. 14(a), and used SHNNs to calculate the centroid. Although we do not know where the real centroid is, we can assume that the centroid computed under ideal experimental conditions, with high SNR and no interference, represents the original wavefront. After that, environmental light interference was introduced into the experimental environment while the wavefront to be measured remained fixed. Example images of two subapertures with computed centroids are shown in Fig. 14(b).

Fig. 14 (a) Real data with different levels of environment light interference. (b) Subaperture images with calculated centroids. The cross stands for the SHNN-50 result, and the triangle stands for the TCoG result.

The reconstructed and residual wavefronts are shown in Fig. 15(a)-15(g), and Table 2 compares the mean CEE, total False Rate, Peak-to-Valley value (PV) and Root Mean Square (RMS) of the residual wavefront. The results show that SHNN-50 and SHNN-900 are much better than the TCoG method.

Fig. 15 (a) Original wavefront. (b) Reconstructed wavefront in interfered situation by using SHNN-50. (c) Reconstructed wavefront in interfered situation by using SHNN-900. (d) Reconstructed wavefront in interfered situation by using TCoG. (e) Residual wavefront by using SHNN-50. (f) Residual wavefront by using SHNN-900. (g) Residual wavefront by using TCoG.

Table 2. Experiments of TCoG, SHNN-50 and SHNN-900.

5. Conclusion

In this paper, we have reviewed previous work on wavefront sensing and the challenges faced by traditional methods. Because correlation algorithms require extra computational cost, the CoG algorithm and its variations are the most practical in real-time AO systems and are widely used in point-source SHWFS. However, traditional CoG methods such as TCoG and Windowing cannot work well in extreme situations; for example, the threshold in TCoG is invalid when the background light interference is brighter than the real spot. Since ANNs have been used successfully in AO systems, for example for wavefront prediction, we presented a new application of ANNs in SHWFS to make centroid computation more robust under extreme conditions. By transforming the regression problem into a classification problem, the proposed SHNN method becomes trainable for the spot detection task.

Simulations show that the proposed algorithm outperforms other methods in extremely low SNR situations. Moreover, under strong environmental light interference, ramp interference in particular, all methods are nearly invalid except the SHNNs. The impact of the number of hidden-layer neurons has also been studied: although SHNN-50 cannot achieve accuracy as high as SHNN-900's, it is still better than the traditional methods. We performed experiments to test our algorithm, evaluating the centroid calculation performance of three different methods. The results show that our algorithms are much more accurate than the optimum TCoG algorithm in extreme situations. We then reconstructed the wavefront using the same SVD method and compared the residual wavefronts. Both the RMS and PV of the residual wavefront are much smaller when using SHNNs, which means this method can be used in AO systems under extreme conditions.

Funding

National Natural Science Foundation of China (61675205 and 61505215).

Acknowledgments

We gratefully acknowledge the comments and suggestions of Linhai Huang, Institute of Optics and Electronics, Chinese Academy of Sciences.

References

1. J. Ares, T. Mancebo, and S. Bará, “Position and displacement sensing with Shack-Hartmann wave-front sensors,” Appl. Opt. 39(10), 1511–1520 (2000).

2. C. Rao, W. Jiang, and N. Ling, “Atmospheric characterization with Shack-Hartmann wavefront sensors for non-Kolmogorov turbulence,” Opt. Eng. 41(2), 534–541 (2002).

3. D. Dayton, S. Sandven, J. Gonglewski, S. Browne, S. Rogers, and S. McDermott, “Adaptive optics using a liquid crystal phase modulator in conjunction with a Shack-Hartmann wave front sensor and zonal control algorithm,” Opt. Express 1(11), 338–346 (1997).

4. F. Rigaut, B. L. Ellerbroek, and M. J. Northcott, “Comparison of curvature-based and Shack-Hartmann-based adaptive optics for the Gemini telescope,” Appl. Opt. 36(13), 2856–2868 (1997).

5. L. A. Poyneer, “Scene-based Shack-Hartmann wave-front sensing: analysis and simulation,” Appl. Opt. 42(29), 5807–5815 (2003).

6. Y. Wang, X. Chen, Z. Cao, X. Zhang, C. Liu, and Q. Mu, “Gradient cross-correlation algorithm for scene-based Shack-Hartmann wavefront sensing,” Opt. Express 26(13), 17549–17562 (2018).

7. S. Thomas, T. Fusco, A. Tokovinin, M. Nicolle, V. Michau, and G. Rousset, “Comparison of centroid computation algorithms in a Shack–Hartmann sensor,” Mon. Not. R. Astron. Soc. 371(1), 323–336 (2006).

8. X. Li, X. Li, and C. Wang, “Optimum threshold selection method of centroid computation for Gaussian spot,” Proc. SPIE 9675, 967517 (2015).

9. O. Lardiere, R. Conan, R. Clare, C. Bradley, and N. Hubin, “Compared performance of different centroiding algorithms for high-pass filtered laser guide star Shack–Hartmann wavefront sensors,” Proc. SPIE 7736, 773672 (2010).

10. X. Ma, C. Rao, and H. Zheng, “Error analysis of CCD-based point source centroid computation under the background light,” Opt. Express 17(10), 8525–8541 (2009).

11. C. Leroux and C. Dainty, “Estimation of centroid positions with a matched-filter algorithm: relevance for aberrometry of the eye,” Opt. Express 18(2), 1197–1206 (2010).

12. F. Kong, M. C. Polo, and A. Lambert, “Centroid estimation for a Shack-Hartmann wavefront sensor based on stream processing,” Appl. Opt. 56(23), 6466–6475 (2017).

13. M. C. Polo, F. Kong, and A. Lambert, “FPGA implementations of low latency centroiding algorithms for adaptive optics,” in Imaging and Applied Optics 2018 (3D, AO, AIO, COSI, DH, IS, LACSEA, LS&C, MATH, pcAOP), OSA Technical Digest (Optical Society of America, 2018), paper OTh3E.3.

14. J. Vargas, L. González-Fernandez, J. A. Quiroga, and T. Belenguer, “Shack–Hartmann centroid detection method based on high dynamic range imaging and normalization techniques,” Appl. Opt. 49(13), 2409–2416 (2010).

15. J. Vargas, R. Restrepo, J. C. Estrada, C. O. S. Sorzano, Y. Z. Du, and J. M. Carazo, “Shack-Hartmann centroid detection using the spiral phase transform,” Appl. Opt. 51(30), 7362–7367 (2012).

16. J. Vargas, R. Restrepo, and T. Belenguer, “Shack-Hartmann spot dislocation map determination using an optical flow method,” Opt. Express 22(2), 1319–1329 (2014).

17. T. K. Barrett and D. G. Sandler, “Artificial neural network for the determination of Hubble Space Telescope aberration from stellar images,” Appl. Opt. 32(10), 1720–1727 (1993).

18. M. B. Jorgenson and G. J. M. Aitken, “Neural network prediction of turbulence induced wavefront degradations with applications to adaptive optics,” in Adaptive and Learning Systems, F. A. Sadjadi, ed., Proc. SPIE 1706, 113–121 (1992).

19. M. Lloyd-Hart and P. McGuire, “Spatio-temporal prediction for adaptive optics wavefront reconstructors,” in Proc. European Southern Observatory Conf. on Adaptive Optics, pp. 95–102 (1995).

20. R. L. Kendrick, D. S. Acton, and A. L. Duncan, “Phase-diversity wave-front sensor for imaging systems,” Appl. Opt. 33(27), 6533–6546 (1994).

21. D. A. Montera, B. M. Welsh, M. C. Roggemann, and D. W. Ruck, “Processing wave-front-sensor slope measurements using artificial neural networks,” Appl. Opt. 35(21), 4238–4251 (1996).

22. D. A. Montera, B. M. Welsh, M. C. Roggemann, and D. W. Ruck, “Prediction of wave-front sensor slope measurements with artificial neural networks,” Appl. Opt. 36(3), 675–681 (1997).

23. S. J. Weddell and R. Y. Webb, “Reservoir computing for prediction of the spatially-variant point spread function,” IEEE J. Sel. Top. Signal Process. 2(5), 624–634 (2008).

24. H. Guo, N. Korablinova, Q. Ren, and J. Bille, “Wavefront reconstruction with artificial neural networks,” Opt. Express 14(14), 6456–6462 (2006).

25. C. Szegedy, A. Toshev, and D. Erhan, “Deep neural networks for object detection,” in Proceedings of Advances in Neural Information Processing Systems (NIPS, 2013), pp. 2553–2561.

26. S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017).

27. D. R. Neal, J. Copland, and D. Neal, “Shack-Hartmann wavefront sensor precision and accuracy,” Proc. SPIE 4779(1), 148–160 (2002).

28. M. Nicolle, T. Fusco, G. Rousset, and V. Michau, “Improvement of Shack-Hartmann wave-front sensor measurement for extreme adaptive optics,” Opt. Lett. 29(23), 2743–2745 (2004).

29. G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Math. Control Signal Syst. 2(4), 303–314 (1989).

30. K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural Netw. 4(2), 251–257 (1991).

31. M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, “Multilayer feedforward networks with a nonpolynomial activation function can approximate any function,” Neural Netw. 6(6), 861–867 (1993).


Figures (15)

Fig. 1 (a) Simulated data with Gaussian distribution noise and ramp interference. (b) Simulated data with Gaussian distribution noise and vertical/horizontal Gaussian interference.
Fig. 2 (a) ANNs with one hidden layer and one output. (b) The basic operation and function of a neuron. The bias (or threshold) b is important, but for convenience it is omitted from the following figures.
Fig. 3 CoG algorithm expressed by ANN. (a) Computation graph of coordinate X. (b) Computation graph of coordinate Y.
Fig. 4 CoG algorithm expressed by ANN.
Fig. 5 A classification network similar to the CoG method for spot detection.
Fig. 6 An architecture with 900 hidden layer neurons.
Fig. 7 (a) The original image. (b) Network output. (c) The spot center the algorithm found.
Fig. 8 Accuracy and loss during training.
Fig. 9 Low SNR subaperture images.
Fig. 10 CEE of different methods in low SNR situations.
Fig. 11 False Rate of different methods in low SNR situations.
Fig. 12 False Rate with different power of ramp interference.
Fig. 13 False Rate with different power of vertical/horizontal interference.
Fig. 14 (a) Real data with different levels of environment light interference. (b) Subaperture images with calculated centroids. The cross stands for the SHNN-50 result, and the triangle stands for the TCoG result.
Fig. 15 (a) Original wavefront. (b) Reconstructed wavefront in interfered situation using SHNN-50. (c) Reconstructed wavefront in interfered situation using SHNN-900. (d) Reconstructed wavefront in interfered situation using TCoG. (e) Residual wavefront using SHNN-50. (f) Residual wavefront using SHNN-900. (g) Residual wavefront using TCoG.

Tables (2)

Table 1 False Rate of different methods in low SNR situations.
Table 2 Experiments of TCoG, SHNN-50 and SHNN-900.

Equations (23)

Equations on this page are rendered with MathJax.

$$ I(x,y) = \frac{N_{ph}}{2\pi\sigma_{spot}^{2}} \exp\!\left[ -\frac{(x-x_{0})^{2} + (y-y_{0})^{2}}{2\sigma_{spot}^{2}} \right] $$

$$ N_{S} = A_{S}\,( r_{x} x + r_{y} y ) $$

$$ N_{v} = A_{v} \exp\!\left[ -\frac{(x-x_{v})^{2}}{2\sigma_{nv}^{2}} \right] $$

$$ N_{h} = A_{h} \exp\!\left[ -\frac{(y-y_{h})^{2}}{2\sigma_{nh}^{2}} \right] $$
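The spot and interference models above can be combined into a simple simulator. The following is a minimal sketch, assuming a 25×25 subaperture, additive Gaussian read noise, and the ramp interference term N_S; the function name and parameter defaults are illustrative, not the authors' code.

```python
import numpy as np

def simulate_subaperture(size=25, n_ph=1000.0, sigma_spot=2.0,
                         x0=12.0, y0=12.0, a_s=0.0, rx=1.0, ry=1.0,
                         noise_sigma=1.0, seed=0):
    """Simulate one subaperture: Gaussian spot + ramp interference + noise."""
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[0:size, 0:size].astype(float)
    # Gaussian spot model I(x, y)
    spot = n_ph / (2 * np.pi * sigma_spot**2) * np.exp(
        -((x - x0)**2 + (y - y0)**2) / (2 * sigma_spot**2))
    # Ramp interference N_S = A_S (r_x x + r_y y)
    ramp = a_s * (rx * x + ry * y)
    noise = rng.normal(0.0, noise_sigma, (size, size))
    return spot + ramp + noise
```

The vertical/horizontal Gaussian stripes N_v and N_h can be added the same way as extra terms.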
$$ SNR_{p} = \frac{I_{p}}{\sigma_{total}} $$

$$ Power = \frac{E_{N}}{E_{S}} $$

$$ CEE = \sqrt{ (x_{c}-x_{0})^{2} + (y_{c}-y_{0})^{2} } $$
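The two evaluation quantities follow directly from their definitions. A minimal sketch (function names are ours):

```python
import numpy as np

def cee(xc, yc, x0, y0):
    """Centroid estimation error: Euclidean distance to the true center."""
    return np.hypot(xc - x0, yc - y0)

def snr_p(image, sigma_total):
    """Peak SNR: peak intensity of the spot over the total noise deviation."""
    return image.max() / sigma_total
```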
$$ \begin{cases} x_{i} = \dfrac{\sum_{m=1}^{M}\sum_{n=1}^{N} x_{nm} I_{nm}}{\sum_{m=1}^{M}\sum_{n=1}^{N} I_{nm}} \\[2ex] y_{i} = \dfrac{\sum_{m=1}^{M}\sum_{n=1}^{N} y_{nm} I_{nm}}{\sum_{m=1}^{M}\sum_{n=1}^{N} I_{nm}} \end{cases} $$
$$ T_{m} = \mu_{n} + ( I_{m} - \mu_{n} )\,\frac{m}{100} $$

$$ m = \frac{368.5}{SNR_{p}} $$
$$ \begin{cases} x_{i} = \dfrac{\sum_{m=1}^{M}\sum_{n=1}^{N} W_{nm} x_{nm} I_{nm}}{\sum_{m=1}^{M}\sum_{n=1}^{N} W_{nm} I_{nm}} \\[2ex] y_{i} = \dfrac{\sum_{m=1}^{M}\sum_{n=1}^{N} W_{nm} y_{nm} I_{nm}}{\sum_{m=1}^{M}\sum_{n=1}^{N} W_{nm} I_{nm}} \end{cases} $$
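The CoG family can be sketched in a few lines. This is an illustrative implementation, assuming the threshold is subtracted before summation in TmCoG and that the WCoG weight map has the same shape as the image; it is not the authors' exact code.

```python
import numpy as np

def cog(img):
    """Plain center of gravity of an intensity image; returns (x, y)."""
    y, x = np.mgrid[0:img.shape[0], 0:img.shape[1]].astype(float)
    s = img.sum()
    return (x * img).sum() / s, (y * img).sum() / s

def tm_cog(img, m, mu_n=0.0):
    """TmCoG: threshold at mu_n + (I_max - mu_n) * m/100, then CoG."""
    t = mu_n + (img.max() - mu_n) * m / 100.0
    clipped = np.where(img > t, img - t, 0.0)
    return cog(clipped)

def wcog(img, weight):
    """WCoG: CoG with a fixed weight map W_nm applied to the intensities."""
    return cog(img * weight)
```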
$$ Y = \delta\!\left( \sum_{j} x_{j} w_{j} + b \right) $$

$$ w^{[1]}_{1,1} = \cdots = w^{[1]}_{25,1} = 1,\quad w^{[1]}_{26,2} = \cdots = w^{[1]}_{50,2} = 2,\quad \ldots,\quad w^{[1]}_{n,m} = m,\quad \ldots,\quad w^{[1]}_{601,25} = \cdots = w^{[1]}_{625,25} = 25 $$

$$ z = \sum_{m=1}^{25}\sum_{n=1}^{25} x_{nm} I_{nm} $$

$$ y = \frac{z}{\sum_{m=1}^{25}\sum_{n=1}^{25} I_{nm}} $$

$$ w^{[1]}_{1,1},\ldots,w^{[1]}_{25,1} = 1,2,\ldots,25;\quad w^{[1]}_{26,2},\ldots,w^{[1]}_{50,2} = 1,2,\ldots,25;\quad \ldots;\quad w^{[1]}_{n,m} = n;\quad \ldots;\quad w^{[1]}_{601,25},\ldots,w^{[1]}_{625,25} = 1,2,\ldots,25 $$

$$ \sum\nolimits_{n} = \operatorname{span}\{\, \delta( wx + b ) : w \in \mathbb{R}^{n},\ b \in \mathbb{R} \,\} $$
$$ \mathrm{ReLU}(t) = \max(0, t) = \begin{cases} t & \text{as } t > 0 \\ 0 & \text{as } t \le 0 \end{cases} $$

$$ z = X W_{1} + b_{1} $$

$$ a = \max( z, 0 ) $$

$$ Y = a W_{2} + b_{2} $$

$$ Y_{predict,\,i} = g( Y_{i} ) = \frac{ e^{Y_{i}} }{ \sum_{j=1}^{n} e^{Y_{j}} } $$

$$ Loss = J( Y_{label}, Y_{predict} ) = -\sum Y_{label} \log Y_{predict} + \lambda \lVert w \rVert^{2} $$
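The forward pass and loss defined by the last equations can be sketched as follows, assuming a single ReLU hidden layer and a softmax over the 625 output classes; variable names are illustrative, not the authors' code.

```python
import numpy as np

def shnn_forward(x, w1, b1, w2, b2):
    """One-hidden-layer forward pass: z = xW1+b1, a = max(z, 0),
    Y = aW2+b2, then softmax to get class probabilities."""
    z = x @ w1 + b1
    a = np.maximum(z, 0.0)          # ReLU activation
    y = a @ w2 + b2
    e = np.exp(y - y.max())         # numerically stable softmax
    return e / e.sum()

def cross_entropy_loss(y_label, y_predict, weights, lam=0.0):
    """Loss = -sum(Y_label * log Y_predict) + lambda * ||w||^2."""
    ce = -np.sum(y_label * np.log(y_predict + 1e-12))
    reg = lam * sum(np.sum(w**2) for w in weights)
    return ce + reg
```

With a 25×25 subaperture flattened to a 625-vector and a hidden width of 50 or 900, this matches the SHNN-50 / SHNN-900 architectures described in the text.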