
Optical image centroid prediction based on machine learning for laser satellite communication

Open Access

Abstract

Optical image tracing is one of the key technologies for establishing and maintaining satellite-to-ground laser communication. Since machine learning has been proved to be a powerful tool for modeling nonlinear systems, a model containing a preprocessing module, a CNN module (convolutional neural network module), and an LSTM module (long short-term memory module) was developed to process digital images in time series and then predict centroid positions under the influence of atmospheric turbulence. Different from most previous models composed purely of neural networks, several important physical considerations about the light field distributed on the CMOS detector are built into the model. Once built and trained, the model can predict centroid positions in real time for practical applications in laser satellite communication.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Laser communication has been considered a powerful candidate for satellite communication due to its high speed and large capacity [1]. Among the many technologies required for laser satellite communication, tracing the optical signal is one of the most critical. A telescope under automatic control and a camera connected to a computer are usually used for both transmitting and receiving laser signals, and an algorithm for tracing the centroid of the optical image is used to maintain the stability of the laser link. Post feedback control is more widely used than other schemes in current experiments on laser satellite communication. Due to the complexity and strong fluctuations of atmospheric turbulence, it is always challenging to trace optical signals stably, especially in bad weather. On the other hand, machine learning has been an extremely active topic in recent years. The basic ideas behind neural networks were proposed decades ago, but major advances did not arrive until this decade, mainly because of newly developed hardware such as GPUs and CPUs for high-speed computation. With reasonable design and thorough training, a machine learning system can become a very efficient tool for fitting a nonlinear system, producing results or decisions that linear algorithms could not achieve before [2,3]. Scientists in different areas are still developing machine learning technologies rapidly. In optics, for light propagating through inhomogeneous media, there is already interesting work on phase-retrieval wavefront sensing using convolutional neural networks [4]. In this paper we turn to light propagating in random media and develop a model to trace the centroid of optical images for laser satellite communication.

Optical signals from satellites always fluctuate because of disturbances from atmospheric turbulence as well as oscillations of the satellite platform. Computational fluid dynamics can be applied in aerodynamics to obtain results as accurate as possible by solving the Navier-Stokes equations numerically, but the computation consumes far too much computing power to yield real-time results. Statistical theory is used to analyze the scintillation and decoherence of light propagating through a random medium, with the shortcoming that it is difficult to obtain results reliable enough for real-time prediction in engineering applications. Machine learning methods have more potential to solve these problems. Both newly developed deep neural networks and traditional machine learning methods are available for this job. Among the traditional machine learning methods, we choose an extended Kalman filter to create labels (Section 2.1) and a probabilistic graphical model to describe our model architecture (Section 2.5). In addition, some classic computer vision techniques are used to obtain optical and kinetic information about the light fields (Section 2.2). As for neural networks, a CNN (convolutional neural network) has a strong ability to fit nonlinear systems (Section 2.3), and an LSTM is an efficient tool for predictions on time-series signals (Section 2.4). Hence, we combine a CNN with an LSTM to obtain a powerful architecture for the nonlinear random process of light propagating through the atmosphere.

Although the relation between neural networks and probabilistic graphical models goes beyond the main content of this paper, it should be noted that a neural network's goal is, in some sense, to estimate the likelihood of a Bayesian network; correspondingly, finding the best weights for a neural network is equivalent to maximizing the likelihood estimate for a Bayesian network [5].

2. Theory

For the sake of tracing optical signals, our final task is to predict, in real time, the centroid positions of optical images that will arrive at future times, according to previously obtained images. The probabilistic graphical model is a conditional random field written as

$$P[{{{\vec{x}}_\textrm{C}}({{t_{m + 1}}} ),{{\vec{x}}_\textrm{C}}({{t_{m + 2}}} ),\ldots ,{{\vec{x}}_\textrm{C}}({{t_{m + k}}} )|I({x({t_m})} ),I({x({t_{m - 1}})} ),\ldots ,I({x({t_{m - n + 1}})} )} ]$$
Here, m is an integer and the system is assumed to be at time ${t_m}$; $I({{t_m}} ),I({{t_{m - 1}}} ),\ldots ,I({{t_{m - n + 1}}} )$ are images obtained at times ${t_m},{t_{m - 1}},\ldots ,{t_{m - n + 1}}$, and ${\vec{x}_\textrm{C}}({{t_{m + 1}}} ),{\vec{x}_\textrm{C}}({{t_{m + 2}}} ),\ldots ,{\vec{x}_\textrm{C}}({{t_{m + k}}} )$ are the predicted centroid positions for the images to be obtained at times ${t_{m + 1}},{t_{m + 2}},\ldots ,{t_{m + k}}$.

2.1 Label creation

It is well known that deep neural networks based on supervised learning can be efficient prediction tools only if they are well trained with reliable data. Hence, before discussing the newly developed model in detail, we first describe how to create labels for the supervised learning process. Usually, the centroid position of an image is calculated with the formula below [6]

$$\begin{array}{l} {\textrm{X}_C} = \frac{{\sum\limits_{i,j} {({i \times I({i,j} )} )} }}{{\sum\limits_{i,j} {I({i,j} )} }},\\ {Y_C} = \frac{{\sum\limits_{i,j} {({j \times I({i,j} )} )} }}{{\sum\limits_{i,j} {I({i,j} )} }}. \end{array}$$
in which $I({i,j} )$ is the gray intensity at position $({i,j} )$ in the image, and ${X_C},{Y_C}$ are the X and Y coordinates of the centroid position, respectively. Due to disturbances from environmental light, as well as the noise of the CMOS detector, we apply a Gaussian filter to each image and use an extended Kalman filter (EKF) [7] to estimate the current centroid position of the captured optical image.
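As a minimal illustration of Eq. (2), the following sketch computes the intensity-weighted centroid of a smoothed grayscale frame with NumPy and OpenCV; the Gaussian kernel size and this particular workflow are assumptions for illustration, not the exact implementation used in this work.

import cv2
import numpy as np

def centroid(frame, ksize=5, sigma=1.0):
    """Intensity-weighted centroid (Eq. (2)) of a Gaussian-smoothed grayscale frame."""
    img = cv2.GaussianBlur(frame.astype(np.float64), (ksize, ksize), sigma)
    total = img.sum()
    if total == 0:
        return None
    i = np.arange(img.shape[1])            # x index runs along the image width
    j = np.arange(img.shape[0])            # y index runs along the image height
    x_c = (img.sum(axis=0) * i).sum() / total
    y_c = (img.sum(axis=1) * j).sum() / total
    return x_c, y_c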

As a recursive estimator first developed in the middle of the last century, the Kalman filter has proved efficient for tracing objects disturbed by Gaussian noise. With a similar structure to the Kalman filter, the EKF is widely used for estimation in nonlinear systems; it assumes that the true state at time t evolves from the state at (t−1) according to [7]

$$X({{t_m}} )= F({{t_m}} )X({{t_{m - 1}}} )+ B({{t_m}} )U({{t_m}} )+ W({{t_m}} )$$
where F(tm) is the state transition model, B(tm) is the control-input model, and W(tm) is the process noise. Given that our task is to predict centroid positions according to previously obtained images in time series, from time tm-n+1 to tm+1 we obtain n + 1 images; n + 1 centroid positions are calculated using Eq. (2), and then n centroid positions for the images obtained from time tm-n+2 to tm+1 are estimated by applying the EKF, which yields higher accuracy. These centroid positions are used as labels for both training and testing.
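The label-creation step can be sketched as follows, assuming a constant-velocity state [X_C, Y_C, vx, vy] with Gaussian process and measurement noise; the paper applies an EKF [7], whose exact transition and noise models are not given here, so the matrices and covariances below are illustrative assumptions rather than the actual filter used.

import numpy as np

def kalman_labels(raw_centroids, dt, q=1e-2, r=1.0):
    """Smooth raw centroid measurements (Eq. (2)) into labels with a constant-velocity Kalman filter."""
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]])
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]])
    Q, R = q * np.eye(4), r * np.eye(2)
    x = np.array([*raw_centroids[0], 0.0, 0.0])     # state: [X_C, Y_C, vx, vy]
    P = np.eye(4)
    labels = []
    for z in raw_centroids[1:]:
        x, P = F @ x, F @ P @ F.T + Q               # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)              # Kalman gain
        x = x + K @ (np.asarray(z) - H @ x)         # update with the measured centroid
        P = (np.eye(4) - K @ H) @ P
        labels.append(x[:2].copy())
    return labels                                   # n estimates for frames t_{m-n+2} ... t_{m+1}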

It should be pointed out that, before the current model was developed, we built a Model I and a Model II to predict centroid positions: Model I uses only a CNN, while Model II combines a CNN with an LSTM. Our current model contains optical flow [8], a CNN, an LSTM [9], and some techniques from digital image processing [10]. Neither Model I nor Model II gives ideal convergence. Because temporal correlations are not sufficiently accounted for by a CNN, no acceptable convergence was obtained with Model I. By combining a CNN with an LSTM, Model II can obtain acceptable results, but without enough physical meaning built into the model it is difficult to do multiple time-step prediction. Hence, a Model III was developed with significant improvement; it contains three modules, a preprocessing module (P), a CNN module (C) and an LSTM module (L). For convenience, this model is referred to as Model III or Model PCL in this paper.

2.2 Preprocessing module

Different from many deep learning models, a preprocessing module is applied before the data are processed by neural networks, in order to extract important features from the viewpoints of kinetics and optics. Since these kinetic and optical features are extracted with well-established computer vision techniques, the algorithm is expected to be robust under different weather conditions. Assume there is a coordinate system $Oxyz$ fixed on the CMOS, with its origin at pixel (0,0) of the CMOS, $Ox$ along the width, and $Oy$ along the height. Given that the entrance pupil of the receiving telescope is parallel to the CMOS, the wave function of the light field at the entrance, denoted $R({x,y,l,{t_m}^\prime } )$, is written as

$$R({x,y,l,{t_m}^\prime } )= A^{\prime}({x,y,{t_m}^\prime } )\exp [{ - i\Phi^{\prime}({x,y,{t_m}^\prime } )} ],$$
where ${t_m}^\prime$ is the time at which the telescope entrance receives the light signal, $l$ is the length of the telescope, which is also the distance between the entrance and the CMOS surface, and $A^{\prime}({x,y,{t_m}^\prime } )$ and $\Phi ^{\prime}({x,y,{t_m}^\prime } )$ are respectively the amplitude and the phase distribution at time ${t_m}^\prime$. The wave function of the light field on the CMOS can then be computed by [11]
$$\Psi ({x,y,{t_m}} )= F_{x,y}^{ - 1}\{{{F_{\xi ,\eta }}[{R({x,y,l,{t_m}^\prime } )} ]T({\xi ,\eta } )} \}$$
in which $T({\xi ,\eta } )$ is the optical transfer function of the telescope in the frequency domain, ${F_{\xi ,\eta }}(\cdot )$ is the Fourier transform from the space domain $({x,y} )$ to the frequency domain $({\xi ,\eta } )$, and $F_{x,y}^{ - 1}(\cdot )$ is the inverse Fourier transform from the frequency domain $({\xi ,\eta } )$ back to the space domain $({x,y} )$. Here ${t_m}$ is the time at which the CMOS receives the light signal, so that ${t_m} = {t_m}^\prime + l/c$. The digital image obtained in the computer can be written as [10]
$$I({{x_i},{y_j},{t_m}} )= {S_{i,j}}[{\Psi ({x,y,0,{t_m}} ){\Psi ^\ast }({x,y,0,{t_m}} )} ]$$
where the operator ${S_{i,j}}(\cdot )$ includes sampling, AD conversion, local integration, calibration and some other processes; ${S_{i,j}}$ can be approximately treated as a linear operator when the camera works in its normal regime, especially for optical signals that are neither too strong nor too weak [12], which fits the situation of laser satellite communication. $I({{x_i},{y_j},{t_m}} )$ is the digital image grayscale value obtained at the discrete spacetime point $({{x_i},{y_j},{t_m}} )$. Hence, the digital image obtained at time ${t_m}$ can be considered the projection of a distribution in an infinite-dimensional space onto a matrix space ${I_{{\textrm{s}_x} \times {s_y},m}}$, in which ${s_x}$ and ${s_y}$ are respectively the width and the height of the digital image (depending on the camera resolution), and m corresponds to time ${t_m}$. According to the photon theory of modern optics [13], there is the equation
$$n({x,y,0,{t_m}} )= K\psi ({x,y,0,{t_m}} ){\psi ^\ast }({x,y,0,{t_m}} )$$
in which $n({x,y,0,{t_m}} )$ is the photon density distribution on the CMOS surface and K is a normalization constant. Hence, we obtain the relation between $I({{x_i},{y_j},{t_m}} )$ and the photon density $n({x,y,0,{t_m}} )$:
$$I({{x_i},{y_j},{t_m}} )= {S_{i,j}}[{n({x,y,0,{t_m}} )} ]$$
Assuming we are trying to predict the image intensity $I({{x_i},{y_j},{t_{m + 1}}} )$ at time ${t_{m + 1}}$ from the intensity $I({{x_i},{y_j},{t_m}} )$ at time ${t_m}$, we can use the first-order expansion
$$\begin{array}{l} I({{x_i},{y_j},{t_{m + 1}}} )= I({{x_i},{y_j},{t_m}} )+ \Delta t\frac{{dI({{x_i},{y_j},{t_m}} )}}{{dt}} + {O_I}({\Delta t} )\\ = I({{x_i},{y_j},{t_m}} )+ \Delta t\frac{{\partial I({{x_i},{y_j},{t_m}} )}}{{\partial x}}{{\dot{x}}_i} + \Delta t\frac{{\partial I({{x_i},{y_j},{t_m}} )}}{{\partial y}}{{\dot{y}}_j} + \Delta t\frac{{\partial I({{x_i},{y_j},{t_m}} )}}{{\partial t}} + {O_I}({\Delta t} )\end{array}$$
in which $\Delta t$ is the time interval between two neighboring frames, ${O_I}({\Delta t} )$ contains terms of order higher than one, the time derivatives ${\dot{x}_i} = dx/dt$ and ${\dot{y}_j} = dy/dt$ are the velocities along the x and y directions, and the derivatives $\partial I({{x_i},{y_j},{t_m}} )/\partial x$ and $\partial I({{x_i},{y_j},{t_m}} )/\partial y$ are the image grayscale gradients along the x and y directions, respectively. Similarly, the photon density distribution at time ${t_{m + 1}}$ can be written as [14]
$$\begin{array}{l} n({x,y,0,{t_{m + 1}}} )= n({x,y,0,{t_m}} )+ \Delta t\frac{{\partial n({x,y,0,{t_m}} )}}{{\partial x}}\dot{x} + \Delta t\frac{{\partial n({x,y,0,{t_m}} )}}{{\partial y}}\dot{y}\\ + \Delta t\frac{{\partial n({x,y,0,{t_m}} )}}{{\partial t}} + {O_n}({\Delta t} ). \end{array}$$
Since ${S_{i,j}}$ can be considered as a linear operator, according to Eqs. (9) and (10), we have
$$\frac{{\partial I({{x_i},{y_j},{t_m}} )}}{{\partial x}}{\dot{x}_i} = {S_{i,j}}\left[ {\frac{{\partial n({x,y,0,{t_m}} )}}{{\partial x}}\dot{x}} \right]$$
here the left side represents motion at the spacetime point $({{x_i},{y_j},{t_m}} )$ along the x direction in the image; the x component of the grayscale gradient $\partial I({{x_i},{y_j},{t_m}} )\textrm{/}\partial x$ gives the weight of motion along the x direction at the corresponding spacetime point. Similarly, the right side of Eq. (11) describes photon motion along the x direction on the CMOS surface, where $\dot{x}$ is the 'statistical velocity' that indicates the variation of the photon distribution along the x axis, and $\partial n({x,y,0,{t_m}} )\textrm{/}\partial x$ is the photon distribution gradient along the x direction at spacetime point $({x,y,0,{t_m}} )$. This is in accordance with the well-known fact that, in a flow, regions with higher density gradients contribute more to motions and fluctuations. Similarly, we obtain Eq. (12), which relates motion in the image along the y axis to the variation of the photon distribution along the same direction on the CMOS:
$$\frac{{\partial I({{x_i},{y_j},{t_m}} )}}{{\partial y}}{\dot{y}_j} = {S_{i,j}}\left[ {\frac{{\partial n({x,y,0,{t_m}} )}}{{\partial y}}\dot{y}} \right]$$
Besides, we also have Eq. (13) presented as
$$\frac{{\partial I({{x_i},{y_j},{t_m}} )}}{{\partial t}} = {S_{i,j}}\left[ {\frac{{\partial n({x,y,0,{t_m}} )}}{{\partial t}}} \right]$$
which relates the temporal derivatives of the images (a video) at spacetime point $({{x_i},{y_j},{t_m}} )$ to the increment or decrement of the photon density at the corresponding spacetime point on the CMOS surface. The spacetime point where and when an increment or decrement of photons occurs can be regarded as a virtual source or sink of photons (during a given sampling period) on the CMOS surface.

The term ${O_I}({\Delta t} )$ in Eq. (9) contains combinations of products of higher-order derivatives, including ${\partial ^n}I({{x_i},{y_j},{t_m}} )\textrm{/}\partial {x^n}$, ${\partial ^n}I({{x_i},{y_j},{t_m}} )\textrm{/}\partial {y^n}$, ${\partial ^n}I({{x_i},{y_j},{t_m}} )\textrm{/}\partial {t^n}$, ${d^n}{x_i}\textrm{/}d{t^n}$, ${d^n}{y_j}\textrm{/}d{t^n}$ $({n \ge 2} )$. Based on discrete numerical analysis, all these higher-order derivatives can be represented by combinations of products of first-order derivatives, namely $\partial I({{x_i},{y_j},{t_m}} )\textrm{/}\partial x$, $\partial I({{x_i},{y_j},{t_m}} )\textrm{/}\partial y$, $\partial I({{x_i},{y_j},{t_m}} )\textrm{/}\partial t$, $d{x_i}/dt$, and $d{y_j}/dt$.

According to Eq. (9), treated as a simple Markov process, if the time interval $\Delta t$ between neighboring images is small enough, it would be possible to predict the image $I({{x_i},{y_j},{t_{m + 1}}} )$ from the current image $I({{x_i},{y_j},{t_m}} )$ together with the previous image $I({{x_i},{y_j},{t_{m - 1}}} )$. However, due to the strong nonlinearity and complexity of turbulence, the time interval needed for such a prediction is far shorter than the interval between two neighboring frames, so reliable prediction based on Eq. (9) alone is infeasible. Hence statistical learning is used, taking into account a group of images in time series (introduced in Section 2.4). Assume we are going to predict the centroid position of the image at time ${t_{m + 1}}$ from the (n + 1) images $I({{t_m}} ),I({{t_{m - 1}}} ),\ldots ,I({{t_{m - n}}} )$, which are considered strongly correlated with the image $I({{t_{m + 1}}} )$; here n is an integer chosen as a compromise between prediction reliability and computing efficiency.

In the preprocessing module, for each pair of images obtained at times ${t_{m - s}}$ and ${t_{m - s - 1}}$ $({s = 0,1,\ldots ,({n - 1} )} )$, digital image processing and computer vision techniques are used to obtain first-order derivative features, in which $\partial I({{x_i},{y_j},{t_{m - s}}} )/\partial x$ and $\partial I({{x_i},{y_j},{t_{m - s}}} )/\partial y$ are easily obtained by

$$\frac{{\partial I({{x_i},{y_j},{t_{m - s}}} )}}{{\partial x}} = I({{x_{i + 1}},{y_j},{t_{m - s}}} )- I({{x_i},{y_j},{t_{m - \textrm{s}}}} )({i < {s_x} - 1} )$$
$$\frac{{\partial I({{x_i},{y_j},{t_{m - s}}} )}}{{\partial y}} = I({{x_i},{y_{j + 1}},{t_{m - s}}} )- I({{x_i},{y_j},{t_{m - s}}} )({j < {s_y} - 1} )$$
which respectively show the contours of the image along the x and y directions [10]. The time derivative is usually obtained by
$$\frac{{\partial I({{x_i},{y_j},{t_{m - s}}} )}}{{\partial t}} = I({{x_i},{y_j},{t_{m - s}}} )- I({{x_i},{y_j},{t_{m - s - 1}}} )$$
During each sampling period of the CMOS, the spacetime point $({{x_i},{y_j},{t_{m - s}}} )$ can be regarded as a photon source if $\partial I({{x_i},{y_j},{t_{m - s}}} )/\partial t > \textrm{0}$, and as a photon sink if $\partial I({{x_i},{y_j},{t_{m - s}}} )/\partial t < \textrm{0}$. The velocities $d{x_i}/dt$ and $d{y_j}/dt$ are computed by dense optical flow from the two neighboring images $I({{t_{m - s}}} ),I({{t_{m - s - 1}}} )$. Dense optical flow is an algorithm introduced by Gunnar Farnebäck in 2003 [8] that has proved efficient for tracing moving objects and computing velocities [15]. There is already some research on tracing optical signals with optical flow for inter-satellite laser communication [16]. In our research, dense optical flow is applied to measure variations between neighboring optical images disturbed by both satellite motions and atmospheric disturbances. The process is written as
$$[{{V_x},{V_y}} ]= VDOF[{I({{t_{m - s}}} ),I({{t_{m - s - 1}}} )} ]$$
where ${V_x} = {({\dot{x}({i,j,{t_{m - s}}} )} )_{{s_x} \times {s_y}}}$ and ${V_y} = {({\dot{y}({i,j,{t_{m - s}}} )} )_{{s_x} \times {s_y}}}$ are two-dimensional velocity fields along the x and y axes, respectively, ${s_x}$ and ${s_y}$ are the width and height of the captured images as defined before, and VDOF is the operator that produces velocity fields with dense optical flow [17]. According to the description above, in the preprocessing module the derivatives $\partial I({{t_{m - s}}} )\textrm{/}\partial x$ and $\partial I({{t_{m - s}}} )\textrm{/}\partial y$ show the contours of image $I({{t_{m - s}}} )$ along the x and y directions [18], which are related to the density gradients of the photon distribution on the CMOS at the corresponding spacetime points. The temporal variation field $\partial I({{t_{m - s}}} )/\partial t$ derives from increments or decrements of the photon density distribution on the CMOS. The velocity fields ${V_x}({{t_{m - s}}} )$ and ${V_y}({{t_{m - s}}} )$ are calculated by dense optical flow from the two neighboring images $I({{t_{m - s}}} )$ and $I({{t_{m - s - 1}}} )$, and respectively indicate photon motion along the x and y directions on the CMOS. Together with the image $I({{t_{m - s}}} )$ itself, six matrices in total are obtained from the calculations above. According to Eqs. (9) and (10), these matrices contain extremely useful information for predicting the next optical images.
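The six feature maps of Eqs. (14)–(17) can be computed with standard NumPy and OpenCV operations; the sketch below is one possible realization, and the Farnebäck parameters are illustrative assumptions rather than the settings used in the experiment.

import cv2
import numpy as np

def preprocess_pair(I_curr, I_prev):
    """Six-channel input of the preprocessing module from two neighboring frames (Eqs. (14)-(17))."""
    a, b = I_curr.astype(np.float32), I_prev.astype(np.float32)
    dI_dx = np.zeros_like(a); dI_dx[:, :-1] = a[:, 1:] - a[:, :-1]   # Eq. (14): forward difference along x (width)
    dI_dy = np.zeros_like(a); dI_dy[:-1, :] = a[1:, :] - a[:-1, :]   # Eq. (15): forward difference along y (height)
    dI_dt = a - b                                                    # Eq. (16): photon 'source/sink' map
    flow = cv2.calcOpticalFlowFarneback(I_prev.astype(np.uint8), I_curr.astype(np.uint8),
                                        None, 0.5, 3, 15, 3, 5, 1.2, 0)   # Eq. (17): Farneback dense flow
    Vx, Vy = flow[..., 0], flow[..., 1]                              # pixel displacements per frame interval
    return np.stack([dI_dx, dI_dy, dI_dt, Vx, Vy, a], axis=-1)       # shape (height, width, 6)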

2.3 CNN module

Figure 1 shows the architecture of the CNN module. It has a six-channel input that accepts data from the preprocessing module. Since the final task is to predict centroid positions for coming images, there is no need to predict the whole image; instead, at time ${t_{m - s}}$ we predict a feature vector about the image $I({{t_{m - s + 1}}} )$ using convolutional neural networks. As shown in Fig. 1, the CNN contains nine convolutional layers, a dropout layer and two fully connected layers. A four-dimensional vector is obtained as the output of the CNN module:

$${\vec{U}_4}({{t_{m - s + 1}}} )= CNN\left[ {\frac{{\partial I({{t_{m - s}}} )}}{{\partial x}},\frac{{\partial I({{t_{m - s}}} )}}{{\partial y}},\frac{{\partial I({{t_{m - s}}} )}}{{\partial t}},{V_x}({{t_{m - s}}} ),{V_y}({{t_{m - s}}} ),I({{t_{m - s}}} )} \right]$$
where ${\vec{U}_4}({{t_{m - s + 1}}} )$ is expected to be a feature vector related to the centroid position and velocity of the image at time ${t_{m - s + 1}}$. After every pair of neighboring images is processed by the CNN, we obtain $n$ four-dimensional feature vectors ${\vec{U}_4}({{t_m}} ),{\vec{U}_4}({{t_{m - 1}}} ),\ldots ,{\vec{U}_4}({{t_{m - n + 1}}} )$. Because of the entanglement between different orientations, as well as the correlations among data points sampled at different spacetime points, all n four-dimensional vectors are grouped into a $4 \times n$ matrix and then reshaped into a $4n$-dimensional vector, as shown in Eq. (19)
$${\vec{F}_{4n}}({{t_m},{t_{m - 1}},\ldots ,{t_{m - n + 1}}} )= reshape[{{{\vec{U}}_4}^T({{t_m}} ),{{\vec{U}}_4}^T({{t_{m - 1}}} ),\ldots ,{{\vec{U}}_4}^T({{t_{m - n + 1}}} )} ],$$
where ${\vec{F}_{4n}}$ is a 4n-dimensional vector that is fed to an LSTM with a single-layer input channel.
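A minimal Keras sketch of the CNN module in Fig. 1 is given below, assuming the six-channel input produced by the preprocessing module; the filter counts, kernel sizes, strides and dropout rate are not specified in the text and are placeholders, not the actual settings.

import tensorflow as tf

def build_cnn(height, width):
    """Nine convolutional layers, one dropout layer and two fully connected layers -> 4-d feature vector."""
    inp = tf.keras.Input(shape=(height, width, 6))        # six channels from the preprocessing module
    x = inp
    for k, filters in enumerate([16, 16, 32, 32, 64, 64, 128, 128, 128]):
        stride = 2 if k % 2 == 0 else 1                   # placeholder downsampling scheme
        x = tf.keras.layers.Conv2D(filters, 3, strides=stride, padding="same", activation="relu")(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    out = tf.keras.layers.Dense(4)(x)                     # U_4(t_{m-s+1}) of Eq. (18)
    return tf.keras.Model(inp, out)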

Fig. 1. Architecture of CNN.

2.4 LSTM module

Theoretically, temporal relations in the air flow can be calculated with computational aerodynamics, but it is obviously unwise to solve the Navier-Stokes equations in free space just for laser communication. Instead, some other fitting method for time-series signals is necessary. There are two important large time scales for atmospheric turbulence, ${T_1}$ and ${T_2}$. Here, ${T_1}$ is due to advection and is estimated by ${L_0}/{V_ \bot }$, where ${L_0}$ is the outer scale of turbulence and ${V_ \bot }$ is the mean wind speed transverse to the observation path; ${T_1}$ is typically on the order of 1 s. The other time scale ${T_2}$, associated with the eddy turnover time, is typically on the order of 10 s [19]. It should be emphasized that the light field incident on a detector is influenced by large-scale turbulence, which includes the whole propagation path and a circular area, usually several meters in diameter, near the receiver transverse to the observation path. Hence current turbulence evolution can influence the light field on the detector many milliseconds or even several seconds later; this is related to the chaotic nature of turbulence. Since a CCD or CMOS detector used to receive optical signals from a satellite usually takes pictures at a frame rate of 30 fps or higher, a group of pictures taken within seconds can be correlated with each other, so it is a wise choice to take advantage of an LSTM, which has proved efficient for processing nonlinear time-series signals. The equations for a typical LSTM cell are [9]

$$\begin{array}{l} {{\vec{f}}_s} = {\sigma _g}[{{W_f}{{\vec{x}}_s} + {U_f}{{\vec{h}}_{s - 1}} + {{\vec{b}}_f}} ]\\ {{\vec{i}}_s} = {\sigma _g}({{W_i}{{\vec{x}}_s} + {U_i}{{\vec{h}}_{s - 1}} + {{\vec{b}}_i}} )\\ {{\vec{o}}_s} = {\sigma _g}({{W_o}{{\vec{x}}_s} + {U_o}{{\vec{h}}_{s - 1}} + {{\vec{b}}_o}} )\\ {{\vec{c}}_s} = {{\vec{f}}_s} \circ {{\vec{c}}_{s - 1}} + {{\vec{i}}_s} \circ {\sigma _c}({{W_c}{{\vec{x}}_s} + {U_c}{{\vec{h}}_{s - 1}} + {{\vec{b}}_c}} )\\ {h_s} = {o_s} \circ {\sigma _h}({{c_s}} )\end{array}$$
where $\vec{x}$ is the input vector to the LSTM module, $\vec{f}$ is the forget gate's vector, $\vec{i}$ is the input gate's vector, $\vec{o}$ is the output gate's vector, $\vec{h}$ is the hidden state vector, which is also the output vector of the LSTM unit, $\vec{c}$ is the cell state vector, W and U are weight matrices (or tensors), $\vec{b}$ is the bias vector, ${\sigma _g}$ is the sigmoid function, and both ${\sigma _c}$ and ${\sigma _h}$ are hyperbolic tangent functions.

As described before, the output of the CNN module forms a 4n-dimensional feature vector ${\vec{F}_{4n}}({{t_m},{t_{m - 1}},\ldots ,{t_{m - n + 1}}} )$, which is a time-series signal. An LSTM module with a single-layer input is used instead of one with a multiple-channel input, so that the elements of the feature vectors obtained at different times are fully mixed in the LSTM. The output $\vec{Q}({{t_{m + k}},{t_{m + k - 1}},\ldots ,{t_{m + 1}}} )$ of the LSTM is a $2 \times k$ matrix, which is also a time-series signal:

$$\vec{Q}({{t_{m + k}},{t_{m + k - 1}},\ldots ,{t_{m + 1}}} )= LSTM[{{{\vec{F}}_{4n}}({{t_m},{t_{m - 1}},\ldots ,{t_{m - n + 1}}} )} ],$$
here n is the input length of the LSTM. For any LSTM module, the weight matrices have to match the n-dimensional input; there is a parameter L for the dimension of the weight matrices/tensors in the LSTM module that satisfies L = n. For nonlinear systems with longer temporal correlations, both n and L are usually set larger.
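One possible reading of Eqs. (19)–(22) is sketched below: the n four-dimensional feature vectors are reshaped into a single 4n-element sequence fed to the LSTM one element per step (the 'single-layer input'), and the final hidden state is mapped to k two-dimensional positions. The hidden size and this particular Keras realization are assumptions, not the exact architecture used here.

import tensorflow as tf

def build_lstm(n, k, hidden=64):
    """Map the 4n-d feature vector F_4n (Eq. (19)) to k predicted centroid positions (Eq. (22))."""
    inp = tf.keras.Input(shape=(n, 4))                    # n feature vectors U_4 in time order
    seq = tf.keras.layers.Reshape((4 * n, 1))(inp)        # single-channel sequence of length 4n
    h = tf.keras.layers.LSTM(hidden)(seq)                 # Eq. (20): last hidden state of the LSTM
    out = tf.keras.layers.Dense(2 * k)(h)                 # k two-dimensional centroid predictions
    return tf.keras.Model(inp, tf.keras.layers.Reshape((k, 2))(out))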

The matrix $\vec{Q}({{t_{m + k}},{t_{m + k - 1}},\ldots ,{t_{m + 1}}} )$ is composed of k two-dimensional position vectors at $k$ different times:

$$\vec{Q}({{t_{m + k}},{t_{m + k - 1}},\ldots ,{t_{m + 1}}} )= [{\vec{X}({{t_{m + k}}} ),\vec{X}({{t_{m + k - 1}}} ),\ldots ,\vec{X}({{t_{m + 1}}} )} ],$$
where $\vec{X}({{t_{m + \textrm{s}}}} )({s = 1,2,\ldots ,k} )$ is a two-dimensional vector predicting the centroid position of the optical image at time ${t_{m + s}}$.

2.5 Description of Model PCL as a whole

Assume the centroid positions of the optical images can be fully predicted by the feature vectors ${\vec{U}_4}\left( {{t_i}} \right)\left( {i = m + 1, m,\ldots, m - n + 2} \right)$; then the Bayesian network for Model PCL is given by Eq. (23).

$$\begin{array}{l} P[{\vec{Q}({{t_{m + k}},{t_{m + k - 1}},\ldots ,{t_{m + 1}}} )|I({{t_m}} ),I({{t_{m - 1}}} ),I({{t_{m - 2}}} ),\ldots ,I({{t_{m - n}}} )} ]\\ = P[{\vec{Q}({{t_{m + k}},{t_{m + k - 1}},\ldots ,{t_{m + 1}}} )|{{\vec{U}}_4}({{t_{m + 1}}} ),{{\vec{U}}_4}({{t_m}} ),{{\vec{U}}_4}({{t_{m - 1}}} ),\ldots ,{{\vec{U}}_4}({{t_{m - n + 2}}} )} ]\\ P[{{{\vec{U}}_4}({{t_{m + 1}}} ),{{\vec{U}}_4}({{t_m}} ),{{\vec{U}}_4}({{t_{m - 1}}} ),\ldots ,{{\vec{U}}_4}({{t_{m - n + 2}}} )|I({{t_m}} ),I({{t_{m - 1}}} ),I({{t_{m - 2}}} ),\ldots ,I({{t_{m - n}}} )} ]\\ = P[{\vec{Q}({{t_{m + k}},{t_{m + k - 1}},\ldots ,{t_{m + 1}}} )|{{\vec{U}}_4}({{t_{m + 1}}} ),{{\vec{U}}_4}({{t_m}} ),{{\vec{U}}_4}({{t_{m - 1}}} ),\ldots ,{{\vec{U}}_4}({{t_{m - n + 2}}} )} ]\\ P[{{{\vec{U}}_4}({{t_{m + 1}}} ),{{\vec{U}}_4}({{t_m}} ),{{\vec{U}}_4}({{t_{m - 1}}} ),\ldots ,{{\vec{U}}_4}({{t_{m - n + 2}}} )|} \\ {({I({{t_m}} ),I({{t_{m - 1}}} )} ),({I({{t_{m - 1}}} ),I({{t_{m - 2}}} )} ),\ldots ,({I({{t_{m - n + 1}}} ),I({{t_{m - n}}} )} )} ]\\ = P[{\vec{Q}({{t_{m + k}},{t_{m + k - 1}},\ldots ,{t_{m + 1}}} )|{{\vec{U}}_4}({{t_{m + 1}}} ),{{\vec{U}}_4}({{t_m}} ),{{\vec{U}}_4}({{t_{m - 1}}} ),\ldots ,{{\vec{U}}_4}({{t_{m - n + 2}}} )} ]\\ P[{{{\vec{U}}_4}({{t_{m + 1}}} )|({I({{t_m}} ),I({{t_{m - 1}}} )} )} ]P[{{{\vec{U}}_4}({{t_m}} )|({I({{t_{m - 1}}} ),I({{t_{m - 2}}} )} )} ]\ldots \\ P[{{{\vec{U}}_4}({{t_{m - n + 2}}} )|({I({{t_{m - n + 1}}} ),I({{t_{m - n}}} )} )} ], \end{array}$$
where $\vec{Q}({{t_{m + k}},{t_{m + k - 1}},\ldots ,{t_{m + 1}}} )$ is composed of k 2d position vectors as described in Eq. (22).

The conditional probability $P[{\vec{x}({{t_{m + 1}}} )|{{\vec{U}}_4}({{t_{m + 1}}} ),{{\vec{U}}_4}({{t_m}} ),{{\vec{U}}_4}({{t_{m - 1}}} ),\ldots ,{{\vec{U}}_4}({{t_{m - n + 2}}} )} ]$ corresponds to the LSTM module, which performs statistical learning on the feature vectors in time series. According to Eqs. (14)–(18), the feature vector ${\vec{U}_4}({{t_i}} )$ can be computed from $I({{t_m}} ),I({{t_{m - 1}}} )$. Hence the decoupling in Eq. (23) is reasonable, and $P[{{{\vec{U}}_4}({{t_{m + 1}}} )|({I({{t_m}} ),I({{t_{m - 1}}} )} )} ]$ corresponds to the preprocessing module and the CNN module described in Sections 2.2 and 2.3. Since the architecture of Model PCL combines a CNN with an LSTM, all the parameters in both the CNN and LSTM modules are updated recursively in the same optimization process.

Figure 2 shows the whole architecture of Model PCL. The preprocessing module is clearly important: several kinds of algorithms are used to extract useful information about photon distributions and motions on the CMOS, which also reflects variations of the light propagating through the atmosphere. The CNN module is used to predict feature vectors that represent information about coming images. The LSTM module treats the feature vectors in time series and outputs predicted centroid positions. In addition, based on Eq. (23), Model PCL is capable of multiple time-step prediction, which is in fact the highlight of this model, and the LSTM plays an important role in this capability. From Eq. (21) and Eq. (22), suppose the input to the model (including the LSTM module) is a time series of n time-steps and the goal is m time-step prediction; the output is then first an n×b array, where b is the batch size. To predict centroid positions for m time-steps, m of these n signals can be easily extracted (usually m < 0.5n) by data selection or matrix dot multiplication, and the loss function can then be defined together with the labels for multiple time-step prediction. However, the random initialization of the LSTM usually causes large differences between different output channels, which leads to imbalances in the optimization process (the output for one time-step might still be underfitting while the output for another is already overfitting). Hence some additional terms are needed to suppress these imbalances.

Fig. 2. Architecture of Model PCL.

Assume $({{x_{i,1}},{y_{i,1}}} ),({{x_{i,2}},{y_{i,2}}} ),\ldots ,({{x_{i,m}},{y_{i,m}}} )$ are the centroid positions predicted by an m time-step model for times ${t_1},{t_2},\ldots ,{t_m}$, where i indexes the images in a batch. Correspondingly, $({{X_{i,1}},{Y_{i,1}}} ),({{X_{i,2}},{Y_{i,2}}} ),\ldots ,({{X_{i,m}},{Y_{i,m}}} )$ are the labels for the centroid positions. To suppress the imbalances in the optimization process, the loss functions are defined as follows:

$${L_1} = \sum\limits_{i \in batch} {[{{{({{x_{i,1}} - {X_{i,1}}} )}^2} + {{({{y_{i,1}} - {Y_{i,1}}} )}^2}} ]}$$
is the loss function for time-step t1,
$${L_2} = \sum\limits_{i \in batch} {[{{{({{x_{i,2}} - {X_{i,2}}} )}^2} + {{({{y_{i,2}} - {Y_{i,2}}} )}^2}} ]}$$
is the loss function for time-step t2,
$${L_m} = \sum\limits_{i \in batch} {[{{{({{x_{i,m}} - {X_{i,m}}} )}^2} + {{({{y_{i,m}} - {Y_{i,m}}} )}^2}} ]}$$
is the loss function for time-step tm.

Then the loss function for the m time-step prediction model can be expressed as

$$Loss = \sum\limits_{n = 1}^m {{L_n}} + \sum\limits_{i,j = 1\atop i \ne j}^m {{{({{L_i} - {L_j}} )}^2}} .$$
in which the second term,
$${L_{binding}} = \sum\limits_{i,j = 1\atop i \ne j}^m {{{({{L_i} - {L_j}} )}^2}} .$$
given by Eq. (28), is called the 'binding term'; it suppresses separations among the different output channels. As shown in these equations, in an ideal situation each ${L_i}({i = 1,2, \ldots m} )$ vanishes. In the optimization process, the absolute difference between ${L_i}$ and ${L_j}$ is also minimized. This binding term works very well to decrease the gaps between the different output channels.
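A sketch of Eqs. (24)–(28) in TensorFlow is given below, assuming predictions and labels are stacked as (batch, m, 2) arrays; the per-time-step sums L_1,...,L_m follow the definitions above, and the binding term penalizes their pairwise squared differences (the diagonal i = j contributes zero, so summing over the full matrix equals the sum over i ≠ j).

import tensorflow as tf

def pcl_loss(y_true, y_pred):
    """Loss of Eq. (27): sum of per-time-step losses plus the binding term of Eq. (28)."""
    # y_true, y_pred: (batch, m, 2) tensors of centroid labels and predictions
    per_step = tf.reduce_sum(tf.square(y_pred - y_true), axis=[0, 2])              # L_1 ... L_m, Eqs. (24)-(26)
    binding = tf.reduce_sum(tf.square(per_step[:, None] - per_step[None, :]))      # Eq. (28); diagonal terms are zero
    return tf.reduce_sum(per_step) + binding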

3. Experiment

We carried out an experiment to receive optical signals and predict centroid positions using the equipment shown in Fig. 3, which mainly consists of an 800 nm semiconductor laser, an emitting telescope, a Cassegrain telescope for signal receiving, and a camera connected to a computer.

Fig. 3. Equipment for experiment.

The main devices used in the experiment are listed in Table 1. The angle subtended at the entrance by one pixel can be calculated by

$$\alpha = \frac{{({pixel\, size} )}}{{({Focal\, Length} )}} = 10.3\mu rad$$
Assume the prediction error is m pixels; then the corresponding angular error is defined by
$$\beta = m\alpha$$
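Converting a pixel error to an angular error is a single multiplication by α; in the snippet below the pixel size and focal length are hypothetical values chosen only so that α ≈ 10.3 µrad (the actual devices are listed in Table 1).

pixel_size = 5.5e-6                        # m, hypothetical value
focal_length = 0.534                       # m, hypothetical value
alpha = pixel_size / focal_length          # ~10.3e-6 rad per pixel, Eq. (29)
angular_error_urad = 1.5 * alpha * 1e6     # e.g. a 1.5-pixel error -> ~15.4 urad, Eq. (30)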

Table 1. Equipment for on-line process.

The receiving telescope is set (unfixed) at a window of the lab, which is on the roof of a 15-floor building. The telescope may be shaken by wind, which is exploited to train and test the model under oscillating conditions. The main task of the experiment is to keep the prediction error below 1 pixel (or 10.3 µrad).

Figure 4 shows a Google Maps screenshot of the local area, in which the green triangle is the emitting terminal at an altitude of 216 m and the red triangle is the receiving terminal at an altitude of 195 m. The two terminals are located in two buildings 11.16 km apart across the Songhua River in Harbin.

Fig. 4. Google map for local Area.

Shown in Fig. 5 are four images obtained by the receiver with a 550 fps camera. To capture image data for training Model PCL, experiments were carried out at different times and under different wind conditions. An ideal model should be robust to atmospheric turbulence as well as mechanical oscillations. Whether the disturbance is atmospheric turbulence, wind or mechanical oscillation, the standard deviation (SD) of the centroid positions is an effective quantity describing the influence of the environment. Hence, the SD of the centroid positions is calculated in the on-line process to indicate the strength of the fluctuations. In this experiment, the prediction error is expected to be smaller than one half of the standard deviation.

Fig. 5. Images of Light Spots.

It should be mentioned that deep machine learning still consumes a great deal of computing power, so it is challenging to run predictions 550 times per second on a single computer. There are two ways to solve the problem: one is distributed computation combining several computers, the other is reliable multiple time-step prediction. In fact, multiple time-step prediction is truly valuable for practical use. Since the model contains an LSTM module, doing multiple time-step prediction does not consume more computing power than predicting the centroid position for a single time-step. Hence, with an n time-step predictor the forecasting job can be done at a frequency of 550/n Hz, saving a great deal of computing power. In Sections 3.1 and 3.2, one time-step, two time-step and four time-step predictions are discussed; the four time-step prediction should be highlighted, since it is very useful for high frequency prediction.

3.1 Off-line training

As is well known, experiments with neural networks always rely on training. Since our task is real-time prediction of centroid positions, our first target is an 'easy training' model with fast convergence. Due to the strong nonlinearity, randomness and complexity of turbulence, it is infeasible to build and train the model once and expect it to fit centroid positions under atmospheric disturbances at all times. Neither is it wise to run real-time prediction with an untrained model. Hence, to obtain a practical model for real-time prediction, we do both. First, after enough image data have been sampled and preprocessed, we train the model with 36000 images (30000 for training, 6000 for testing), which we call 'off-line' training. The pretrained model is then loaded onto the working computer together with its optimized parameters, and 'on-line' training is carried out together with real-time prediction.

In the experiment, image data for off-line training were obtained both in the early evening and after midnight. In the city of Harbin, we observed that optical signals captured at dusk fluctuate more strongly than those obtained after midnight.

Some important parameters should be introduced before discussing the training process. In every training epoch, a sub-dataset is extracted from the whole dataset starting at a random initial position. The sub-dataset is then reshaped into a certain number (call it N) of large batches, each containing L small batches. The number L is an internal parameter of the LSTM, equal to its input length (as described below Eq. (21)). Every small batch contains m images (m = 25 is frequently used in this paper). Images captured in the evening are not mixed with those obtained after midnight within the same large batch, while large batches obtained at different times can be mixed. In every epoch, the number of large batches is N = M/(L×m), where M is the sub-dataset size; of course, these numbers are all integers.

The parameter L is determined first; several values were tried, and 40 was chosen for the experiment (the same for training and testing). Both M and m are adjustable. The sub-dataset size M is an important parameter that can be tuned to control the training time and to prevent overfitting and underfitting. The influence of these parameters on convergence efficiency is more obvious in on-line training than in off-line training.
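The epoch construction described above can be sketched as follows: a sub-dataset of M consecutive frames, starting at a random position, is reshaped into N = M/(L×m) large batches, each holding L small batches of m frames. The variable names follow the text; the slicing itself is an assumption for illustration.

import numpy as np

def make_epoch(dataset, M, L=40, m=25, rng=np.random.default_rng()):
    """One epoch: N = M // (L*m) large batches, each with L small batches of m images."""
    N = M // (L * m)
    start = int(rng.integers(0, len(dataset) - M + 1))   # random initial position in the whole dataset
    sub = np.asarray(dataset[start:start + M])[:N * L * m]
    return sub.reshape(N, L, m, *sub.shape[1:])          # axes: (large batch, small batch, frame, ...)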

Shown in Fig. 6 are the training and testing errors of an off-line process. The sub-dataset size is M = 10000, reshaped into 10 large batches that are fed into the model in every epoch. Both the training and testing errors decrease quickly, and stable convergence is sustained after the 200th epoch. (It is not advisable to run off-line training for too long, or the parameters may overfit and the pre-trained model becomes hard to fit to images obtained in a new environment.) The training curve is very close to the testing curve. This usually happens in successful high-frequency prediction, especially for one time-step prediction: on the one hand it verifies that Model PCL is a powerful fitting tool, and on the other hand images obtained at 550 fps are sometimes close to each other in both the training and testing datasets, so stable convergence is easy to obtain in this condition. In Fig. 6, both the training and testing errors eventually lie between 1.3 and 1.5 pixels; since they are square distances, the corresponding angular errors are constrained to be between 13.39 and 15.45 µrad.

Fig. 6. Off-Line one time-step training and testing.

Presented in Fig. 7 are the off-line training and testing error curves of a two time-step prediction model. The sub-dataset size is M = 12000, reshaped into 12 large batches that are fed into the model in every epoch. Training error 1 and testing error 1 denote the training and testing errors for the centroid position at time ${t_{m + 1}}$; training error 2 and testing error 2 denote the training and testing errors for the centroid position at time ${t_{m + 2}}$. The flat section of the curves before the 600th epoch can be relieved by decreasing the weight of the norm function [22], but this has little influence on the final convergence. As shown in Fig. 7, the different curves overlap each other, so the curves for different time-steps are shown separately below: Fig. 8 presents the first time-step training and testing errors of the two time-step model, and Fig. 9 presents the second time-step training and testing errors of the same model.

Fig. 7. Off-line training and testing for a two time-step prediction model.

Fig. 8. Two time-step training and testing curves (for the 1st time-step).

Fig. 9. Two time-step training and testing curves (for the 2nd time-step).

From Fig. 7, Fig. 8 and Fig. 9, we can see the seemingly abnormal phenomenon that the training errors fluctuate strongly while the testing curves evolve smoothly. These fluctuations actually come from the binding term of Eq. (28) in the loss function. Because of the binding term, the interaction between the outputs at times ${t_{m + 1}}$ and ${t_{m + 2}}$ is enhanced. According to Eqs. (24), (25) and (27), the loss functions for times ${t_{m + 1}}$ and ${t_{m + 2}}$ (Loss1 and Loss2) have to be minimized together; hence in the optimization process Loss1 and Loss2 'drag' each other. Because of this asynchronization, the two loss functions are sometimes 'dragged up' and sometimes 'dragged down' by each other. Interestingly, the training error is dragged up only slightly above the testing error and is mostly dragged down below it. Since the model parameters are optimized in the right direction, the testing errors still converge smoothly even though the training errors fluctuate. The second time-step training error fluctuates more strongly because of the way tensors are multiplied in the neural network model. Compared with other loss functions, the binding term has proved helpful for achieving effective convergence in multiple time-step prediction.

Shown in Fig. 10 are the two testing curves of the dual time-step prediction model. Although there are obvious differences between the two testing error curves at the beginning, the two channels finally come close to each other. The final errors for the 1st and 2nd time-steps are between 1.48∼1.55 pixels and 1.52∼1.58 pixels respectively, corresponding to angular errors of 15.28∼15.96 µrad and 15.62∼16.29 µrad.

Fig. 10. Testing curves for the dual time-step model.

Shown in Fig. 11 are the training and testing curves of a four time-step model. The sub-dataset size is M = 15000, reshaped into 15 large batches that are fed into the model in every epoch. Assuming the current time is tm, the curves for the 1st, 2nd, 3rd and 4th time-steps present the training and testing errors at times tm+1, tm+2, tm+3 and tm+4 respectively. Similar to the situation in Fig. 7, the curves for different times are entangled and very close to each other. The training curves do not fluctuate as strongly as in Fig. 7, mainly because the binding term, now containing 4 squared differences, offers more constraints than before. As before, 32000 images are fed into the neural networks in every training epoch. The training and testing curves for the different time-steps are presented separately below.

Fig. 11. Off-line training and testing for a four time-step prediction model.

Figure 12 presents the first time-step training and testing curves for the four time-step off-line process.

Fig. 12. Four time-step training and testing curves (for the 1st time-step).

Figure 13 presents the 2nd time-step training and testing curves for the four time-step off-line process.

Fig. 13. Four time-step training and testing curves (for the 2nd time-step).

Figure 14 presents the 3rd time-step training and testing curves for the four time-step off-line process.

Fig. 14. Four time-step training and testing curves (for the 3rd time-step).

Figure 15 presents the 4th time-step training and testing curves for the four time-step off-line process.

Fig. 15. Four time-step training and testing curves (for the 4th time-step).

Figure 16 presents the four testing curves of the off-line four time-step process. Although the four training curves fluctuate during optimization, all the testing curves converge smoothly and nicely.

Fig. 16. Testing errors for the four time-step training and testing model.

The final errors for the 1st, 2nd, 3rd and 4th time-steps are between 1.61∼1.70, 1.44∼1.48, 1.54∼1.61 and 1.55∼1.63 pixels, respectively; the corresponding angular errors are between 16.61∼17.54, 14.83∼15.28, 15.96∼16.61 and 16.04∼16.86 µrad.

3.2 On-line training and prediction

As described before, after off-line training is done, on-line training and testing are carried out, taking advantage of the optimized parameters. It is obviously infeasible to perform on-line training with datasets as large as those used in off-line training; instead, much smaller datasets have to be used. Moreover, a multi-process technique is employed [20], in which two processes are started: one for on-line prediction, which is the main process and also receives the images, and the other for on-line training. Images obtained in the on-line prediction process are used directly in the on-line training process, and parameters optimized in the on-line training process are used directly for on-line prediction, both through memory sharing [21].
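A minimal sketch of this two-process arrangement with the Python multiprocessing module [20] is shown below; the frame grabbing, the model itself and the real parameter exchange are omitted, the shared buffer uses multiprocessing.Array (the references also point to the shared_memory module [21]), and the queue and array names are placeholders.

import multiprocessing as mp
import numpy as np

def trainer(frame_queue, shared_weights, lock):
    """On-line training process: consume frames, publish updated parameters to the shared buffer."""
    while True:
        batch = frame_queue.get()                                    # frames forwarded by the prediction process
        new_value = float(np.asarray(batch, dtype=np.float64).mean())  # placeholder for an optimizer step
        with lock:
            shared_weights[0] = new_value                            # publish updated parameters

if __name__ == "__main__":
    frame_queue = mp.Queue(maxsize=100)
    lock = mp.Lock()
    shared_weights = mp.Array("d", 10)                               # shared buffer read by the prediction process
    mp.Process(target=trainer, args=(frame_queue, shared_weights, lock), daemon=True).start()
    # Main (prediction) process: grab frames from the camera, push them to frame_queue,
    # read shared_weights under the lock, and run the predictor at ~550 fps.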

However, because machine learning consumes substantial computing power, it is still very challenging to make predictions 550 times per second. Hence two identical computers can be used, each containing an RTX 2070 graphics card and an i7-9700K CPU. Distributed computing is applied in both the off-line and on-line processes. For the on-line process, one computer receives images from the camera and carries out predictions about 550 times per second, while the other computer is responsible for training the model; the communication between the two computers is based on the protocol within TensorFlow. Figure 17 illustrates the distributed computing scheme for the high-frequency on-line optimizing and prediction process: image datasets are transferred from computer 1 to computer 2, and optimized parameters are transferred from computer 2 to computer 1.

Fig. 17. Distributed computing for high frequency on-line process.

For the single and double time-step on-line training and prediction processes, distributed computing as shown in Fig. 17 is carried out. As described before, once the four time-step on-line training and prediction process is realized, distributed computing can be replaced by the multi-process technique.

Figure 18 presents the data arrangement for multi-process computing in the on-line process. Similar to the off-line process, the training dataset is composed of N1 large batches, each large batch is composed of L small batches, and every small batch contains m1 images. The training dataset of ${M_1} = {N_1} \times {m_1} \times L$ images is maintained in memory on a first-small-batch-in, first-small-batch-out (FIFO) principle, so that the training dataset can be updated at a reasonable rate to keep up with the camera and newly received images can be fed into the model in time. The small batch size m1 is adjustable: the optimizing efficiency will be low if m1 is too small, while it may be difficult to keep up with the camera frame rate (550 fps) if m1 is too large. In the experiment described in this paper, m1 = 10 and N1 = 10. About 2 epochs are finished per second, which means the model parameters are updated 20 times per second.
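The FIFO buffer described here can be kept with a deque holding N1×L small batches, so that the oldest small batch is discarded as each new one arrives and the on-line training set tracks the camera; this deque-based realization is an assumption, not the actual implementation.

from collections import deque

N1, L, m1 = 10, 40, 10                       # values used in the on-line experiment described above
buffer = deque(maxlen=N1 * L)                # holds N1*L small batches -> M1 = N1*m1*L images

def push_small_batch(frames):
    """Append the newest m1 frames; the oldest small batch is discarded automatically (FIFO)."""
    assert len(frames) == m1
    buffer.append(list(frames))

def current_training_set():
    """Snapshot of the on-line training dataset (N1*L small batches, newest last)."""
    return list(buffer)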

Fig. 18. Data arrangement for multi-process on-line training and prediction.

Figure 19 presents an on-line training and testing process of 1000 seconds. Figure 20 magnifies the last half (from the 500th to the 1000th second) of Fig. 19 to distinguish the training and testing curves, and the SD (standard deviation) of the centroid positions is also shown. Similar to the off-line process, the training curve is very close to the testing curve, and nice convergence is obtained after the 400th second. According to Fig. 20, the training error finally reaches 0.418 pixel, corresponding to an angular error of 4.310 µrad, and the testing error is finally around 0.430 pixel, corresponding to an angular error of 4.436 µrad. Both the training and testing errors stabilize to within 1/3 of the standard deviation of the centroid positions.

Fig. 19. On-line training and testing error for a single time-step model.

Fig. 20. On-line training and testing error compared with standard deviation of centroid position (500∼1000s).

Figure 21 presents the training and testing errors of a two time-step prediction model. The data were sampled on a clear evening, from 9:00 p.m. to 9:25 p.m., while the temperature was between 18 °C and 20 °C. In Fig. 21, training error 1 and testing error 1 respectively denote the training and testing errors for the centroid position predicted at time ${t_{m + 1}}$; similarly, training error 2 and testing error 2 denote the training and testing errors for the centroid position predicted at time ${t_{m + 2}}$. Unlike off-line training, the training errors do not fluctuate, which indicates that the parameters have been pretrained very well. The errors decrease quickly after the 400th second and stabilize after the 800th second.

Fig. 21. On-line training and testing error for a two time-step prediction model.

Figure 22 shows the last half of the optimization process. Training error 1 and testing error 1 are very close to each other, while training error 2 and testing error 2 are separated by a narrow gap of about 0.15 pixel. The standard deviation of the centroid positions varies around 1.6 pixels. Testing error 1 and testing error 2 finally reach 0.742 and 0.765 pixel respectively, corresponding to angular errors of 7.643 µrad and 7.880 µrad.

Fig. 22. On-line training and testing error for two time-step training and prediction (750∼1500s) together with SD for centroid positions.

Figure 23 presents the training and testing errors of a four time-step model. The data were sampled on an early evening with a slight wind, from 7:00 p.m. to 8:10 p.m., while the temperature was between 17 °C and 19 °C. Nice convergence is obtained after the 500th second.

Fig. 23. On-line training and testing for a four time-step prediction model.

To demonstrate the robustness of the model, we kept the on-line training and prediction process running for more than one hour. Figure 24 shows the training and prediction errors for the four time-step on-line process from the 3000th to the 4000th second, together with the SD (standard deviation) of the centroid positions; the different curves are distinguished in the figure. Although the standard deviation of the centroid positions is higher than before, nice convergence of the training and testing errors for the different time-steps can still be obtained. According to Fig. 24, the final testing errors for the 1st, 2nd, 3rd and 4th time-steps are respectively 0.483, 0.605, 0.741 and 0.572 pixel, corresponding to angular errors of 4.97, 6.232, 7.632 and 5.892 µrad. Sometimes the curves are not flat and slight fluctuations exist, but the errors can still be kept under 1 pixel, while the standard deviation of the centroid positions is around 2 pixels.

Fig. 24. On-line training and testing error for four time-step training and prediction process (3000∼4000s) together with SD for centroid positions.

The experiments show that the model is insensitive to the standard deviation of the centroid positions when the SD varies between 1.5 and 2.2 pixels due to atmospheric turbulence, windy conditions and telescope oscillations. For a model well pretrained in the off-line process, the predictions are more likely to depend on parameter adjustment, data arrangement and variable initialization. More detailed investigations of these relations will be carried out in further research.

4. Conclusion

According to the analysis above, a powerful tool for centroid prediction under nonlinear disturbances can be obtained by combining a preprocessing module with a CNN module and an LSTM module. By adjusting parameters, the prediction error can be suppressed to less than 1 pixel, which demonstrates the potential for practical use. There is still plenty to do, for example: (1) to enhance robustness with more training data and parameter adjustment; (2) to improve the optical and photoelectronic systems so that accuracy can be further improved; (3) to take optical aberrations into account. Some interesting work is ongoing, and more results will be reported in the future.

References

1. D. Cornwell, “Space-Based Laser Communications Break Threshold,” Opt. Photonics News 27(5), 24–31 (2016). [CrossRef]  

2. M. R. Clark, “Application of Machine Learning Principles to Modeling of Nonlinear Dynamic Systems,” J. Arkansas Acad. Sci. 48, 36–40 (1994).

3. K. Worden and P. L. Green, “A machine learning approach to nonlinear modal analysis,” Mech. Syst. Signal. Process. 84, 34–53 (2017). [CrossRef]  

4. G. Ju, X. Qi, H. Ma, and C. Yan, “Feature-based phase retrieval wavefront sensing approach using machine learning,” Opt. Express 26(24), 31767–31783 (2018). [CrossRef]  

5. D. Margaritis, “Learning Bayesian Network Model Structure from Data,” PhD dissertation, Carnegie Mellon University, (2013)

6. S. Arnon and N. S. Kopeika, “Adaptive suboptimum detection of an optical pulse-position-modulation signal with a detection matrix and centroid tracking,” J. Opt. Soc. Am. A 15(2), 443–448 (1998). [CrossRef]  

7. P. Zarchan and H. Musoff, Fundamentals of Kalman Filtering (Progress in Aeronautics and Astronautics) (AIAA, 2015), pp. 210–212

8. G. Farnebäck, “Two-Frame Motion Estimation Based on Polynomial Expansion,” in Proceedings of the 13th Scandinavian Conference on Image Analysis (SCIA, 2003), pp. 363–370.

9. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation 9(8), 1735–1780 (1997). [CrossRef]  

10. R. C. Gonzalez and R. E. Woods, Digital Image Processing (Pearson, 2017), pp. 57–120.

11. J. W. Goodman, Introduction to Fourier Optics (W.H. Freeman, 2017), pp. 441–452.

12. R. Y. Tsai, “A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses,” IEEE J. Robot. Automat. 3(4), 323–344 (1987). [CrossRef]  

13. O. Keller, Light - The Physics of the Photon (CRC Press, 2014), pp. 378–402.

14. K. Cahill, Physical Mathematics (Cambridge University Press, 2013), pp. 245–256.

15. J. Corso, "Motion and Optical Flow," https://web.eecs.umich.edu/∼jjcorso/t/598F14/files/lecture_1015_motion.pdf, University of Michigan (2014).

16. Q. Wang, Y. Siyuan, L. Tan, and J. Ma, “Approach for Recognizing and Tracking Beacon in Inter-Satellite Optical Communication Based on Optical Flow Method,” Opt. Express 26(21), 28080–28090 (2018). [CrossRef]  

17. D. B. Bung and D. Valero, "Application of the Optical Flow Method to Velocity Determination in Hydraulic Structure Models," in 6th International Symposium on Hydraulic Structures and Water System Management, Portland (2016).

18. E. R. Davies, Computer Vision: Principles, Algorithms, Applications, Learning (Academic Press, 2017), pp. 347–357.

19. L. C. Andrews and R. L. Phillips, Laser Beam Propagation through Random Media (SPIE Press, 1998).

20. Python documentation, "multiprocessing," https://docs.python.org/2/library/multiprocessing.html

21. Python documentation, "multiprocessing.shared_memory," https://docs.python.org/3.8/library/multiprocessing.shared_memory.html

22. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (The MIT Press, 2016), pp. 542–550.
