Abstract
Optical image tracing is one of the key technologies for realizing and maintaining satellite-to-ground laser communication. Since machine learning has proven to be a powerful tool for modeling nonlinear systems, a model containing a preprocessing module, a CNN (Convolutional Neural Network) module and an LSTM (Long Short-Term Memory) module was developed to process digital images in time series and predict centroid positions under the influence of atmospheric turbulence. Unlike most previous models composed purely of neural networks, several important physical properties of the light field distributed on the CMOS are taken into account. Once built and trained, the model can predict centroid positions in real time for practical applications in satellite laser communication.
© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
1. Introduction
Laser communication is considered a powerful candidate for satellite communication due to its high speed and large capacity [1]. Among the many technologies required for satellite laser communication, tracing the optical signal is one of the most critical. A telescope under automatic control and a camera connected to a computer are usually used for both transmitting and receiving laser signals, and an algorithm for tracing the centroid of the optical image is employed to maintain the stability of the laser link. Post-feedback control is the approach most widely used in current experiments on satellite laser communication. Due to the complexity and strong fluctuations of atmospheric turbulence, it is always challenging to trace optical signals stably, especially in bad weather. Machine learning, on the other hand, has become an extremely active topic in recent years. Although the basic ideas of neural networks were proposed decades ago, major breakthroughs did not arrive until this decade, mainly because of newly developed hardware such as GPUs and CPUs for high-speed computation. With reasonable design and sufficient training, a machine learning system can become a very efficient tool for fitting a nonlinear system, producing results or decisions that were unattainable with earlier linear algorithms [2,3]. Researchers in many areas are still developing machine learning technologies rapidly. In optics, for light propagating through inhomogeneous media, there is already interesting work on phase-retrieval wavefront sensing using convolutional neural networks [4]. In this paper we turn to light propagating in random media and develop a model to trace the centroid of optical images for satellite laser communication.
Optical signals from satellites always fluctuate because of disturbances from atmospheric turbulence as well as oscillations of the satellite platform. Computational fluid dynamics, as used in aerodynamics, can produce accurate results by solving the Navier-Stokes equations numerically, but such computations consume far too much computing power for real-time results. Statistical theory is used to analyze the scintillation and decoherence of light propagating through random media, with the shortcoming that it is difficult to obtain results reliable enough for real-time prediction in engineering applications. Machine learning methods have more potential to solve these problems. Both newly developed deep neural networks and traditional machine learning methods are available for this job. Among the traditional methods, we choose the extended Kalman filter to create labels (Section 2.1) and a probabilistic graphical model to describe the model architecture (Section 2.5). In addition, some classic computer vision techniques are used to obtain optical and kinetic information about the light field (Section 2.2). As for neural networks, CNNs (convolutional neural networks) have a strong ability to fit nonlinear systems (Section 2.3), and LSTM is an efficient tool for prediction on time-series signals (Section 2.4). Hence, we combine a CNN with an LSTM to obtain a powerful architecture for the nonlinear random process of light propagating through the atmospheric medium.
Though the relation between neural networks and probabilistic graphical models goes beyond the main contents of this paper, it should be noted that a neural network's goal is, in some sense, to estimate the likelihood of a Bayesian network. Correspondingly, finding the best weights for a neural network is equivalent to maximizing the likelihood of a Bayesian network [5].
2. Theory
For the sake of tracing optical signals, our final task is real-time prediction of the centroid positions of optical images arriving at future times, based on previously obtained images. The probabilistic graphical model is a conditional random field shown as
2.1 Label creation
As is well known, deep neural networks based on supervised learning can be efficient tools for prediction only if they are well trained with reliable data. Hence, before discussing the details of the newly developed model, we first describe how labels are created for the supervised learning process. Usually, the centroid position of an image is calculated with the formulation below [6]
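The centroid formulation of Eq. (2) is an intensity-weighted mean over pixel coordinates. A minimal numpy sketch (array layout and pixel indexing are illustrative assumptions):

```python
import numpy as np

def centroid(I):
    """Intensity-weighted centroid of an image I (H x W array):
    x_c = sum_ij x_j * I_ij / sum_ij I_ij, and likewise for y_c."""
    ys, xs = np.mgrid[0:I.shape[0], 0:I.shape[1]]
    total = I.sum()
    return (xs * I).sum() / total, (ys * I).sum() / total
```

For example, an image whose only bright pixel sits at row 2, column 3 has centroid (3.0, 2.0) in (x, y) order.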
As a recursive estimator first developed in the middle of the last century, the Kalman filter has proven efficient for tracing objects disturbed by Gaussian noise. Sharing the structure of the Kalman filter, the extended Kalman filter (EKF) is widely used for estimation in nonlinear systems; it assumes the true state at time t evolves from the state at (t−1) according to [7]
where F(tm) is the state transition model, B(tm) is the control-input model, and W(tm) is the process noise. Given that our task is to build a model that predicts centroid positions from previously obtained images in time series, from time $t_{m-n+1}$ to $t_{m+1}$ we obtain n + 1 images and calculate n + 1 centroid positions using Eq. (2); then the n centroid positions for the images obtained from $t_{m-n+2}$ to $t_{m+1}$ are re-estimated with the EKF to obtain higher accuracy. These centroid positions are used as labels for both training and testing.

It should be pointed out that, before the current model was developed, we built a Model I and a Model II to predict centroid positions: Model I uses a CNN, and Model II combines a CNN with an LSTM. The current model contains optical flow [8], a CNN, an LSTM [9], and several techniques from digital image processing [10]. However, neither Model I nor Model II converged satisfactorily. Temporal correlations are not sufficiently captured by a CNN alone, so Model I achieved no acceptable convergence. By combining a CNN with an LSTM, Model II obtained some acceptable results, but without enough physical meaning built into the model, multiple time-step prediction remained difficult. Hence, a significantly improved Model III was developed, containing three modules: a preprocessing module (P), a CNN module (C) and an LSTM module (L). For convenience, this model is referred to as Model III or Model PCL throughout the paper.
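Since F(tm), B(tm) and W(tm) are not reproduced in this excerpt, the label-refinement step can only be sketched under an assumed linear constant-velocity state model, in which case the EKF reduces to the ordinary Kalman filter (noise levels q and r are illustrative):

```python
import numpy as np

def kalman_step(x, P, z, dt=1.0, q=1e-3, r=0.5):
    """One predict/update cycle on state [x, y, vx, vy] given a measured
    centroid z = (x_meas, y_meas).  F is a constant-velocity transition;
    q and r are assumed process/measurement noise levels."""
    F = np.eye(4); F[0, 2] = F[1, 3] = dt   # constant-velocity dynamics
    H = np.eye(2, 4)                        # we observe positions only
    Q = q * np.eye(4); R = r * np.eye(2)
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update with the raw Eq. (2) centroid
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (np.asarray(z, dtype=float) - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P
```

Feeding the raw centroids of Eq. (2) through such a filter yields the smoothed positions used as training and testing labels.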
2.2 Preprocessing module
Unlike many deep learning models, Model PCL applies a preprocessing module before the data enter the neural networks, extracting important kinetic and optical features. Since these features are extracted with well-established computer vision techniques, the algorithm is expected to remain robust under different weather conditions. Assume a coordinate system $Oxyz$ fixed on the CMOS, with the origin at pixel (0,0), $Ox$ along the width and $Oy$ along the height. Given that the entrance pupil of the receiving telescope is parallel to the CMOS, the wave function of the light field at the entrance, denoted $R({x,y,l,{t_m}^\prime } )$, is shown below
The term ${O_I}({\Delta t} )$ in Eq. (9) contains combinations of products of higher-order derivatives, including ${\partial ^n}I({{x_i},{y_j},{t_m}} )\textrm{/}\partial {x^n}$, ${\partial ^n}I({{x_i},{y_j},{t_m}} )\textrm{/}\partial {y^n}$, ${\partial ^n}I({{x_i},{y_j},{t_m}} )\textrm{/}\partial {t^n}$, ${d^n}{x_i}\textrm{/}d{t^n}$, ${d^n}{y_j}\textrm{/}d{t^n}$, $({n \ge 2} )$. Based on discrete numerical analysis, all these higher-order derivatives can be represented by combinations of products of first-order derivatives, namely $\partial I({{x_i},{y_j},{t_m}} )\textrm{/}\partial x$, $\partial I({{x_i},{y_j},{t_m}} )\textrm{/}\partial y$, $\partial I({{x_i},{y_j},{t_m}} )\textrm{/}\partial t$, $d{x_i}/dt$, $d{y_j}/dt$.
According to Eq. (9), treating the process as a simple Markov process, if the time interval $\Delta t$ between capturing neighboring images were small enough, the image $I({{x_i},{y_j},{t_{m + 1}}} )$ could be predicted from the current image $I({{x_i},{y_j},{t_m}} )$ together with the previous image $I({{x_i},{y_j},{t_{m - 1}}} )$. However, due to the strong nonlinearity and complexity of turbulence, the time interval required for such a prediction is far below the interval between two neighboring frames, so reliable prediction based on Eq. (9) alone is infeasible. Hence statistical learning is used by taking into account a group of images in time series (introduced in Section 2.4). Assume we predict the centroid position of the image at time ${t_{m + 1}}$ from the (n + 1) images $I({{t_m}} ),I({{t_{m - 1}}} ),\ldots ,I({{t_{m - n}}} )$, which are considered strongly correlated with the image $I({{t_{m + 1}}} )$, where n is an integer chosen as a compromise between prediction reliability and computing efficiency.
In the preprocessing module, for each pair of images obtained at times ${t_{m - s}}$ and ${t_{m - s - 1}}$ $({s = 0,1,\ldots ,({n - 1} )} )$, digital image processing and computer vision techniques are used to extract first-order derivative features, in which $\partial I({{x_i},{y_j},{t_{m - s}}} )/\partial x$ and $\partial I({{x_i},{y_j},{t_{m - s}}} )/\partial y$ can easily be obtained by
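The paper's exact difference scheme is given in the omitted equations; a plausible stand-in uses central differences for the spatial derivatives and a frame difference for the temporal one:

```python
import numpy as np

def first_order_features(I_prev, I_curr, dt=1.0):
    """Approximate dI/dx and dI/dy by central differences on the current
    frame, and dI/dt by a backward difference between consecutive frames."""
    dIdy, dIdx = np.gradient(I_curr)   # np.gradient returns d/drow, d/dcol
    dIdt = (I_curr - I_prev) / dt
    return dIdx, dIdy, dIdt
```

The remaining kinetic channels ($d{x_i}/dt$, $d{y_j}/dt$) would come from the optical flow between the frame pair [8]; the dense-flow algorithm actually used is not specified in this excerpt.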
2.3 CNN module
Figure 1 shows the architecture of the CNN module. It has a six-channel input that accepts data from the preprocessing module. Since the final task is to predict centroid positions of coming images, there is no need to predict the whole image; instead, at time ${t_{m - s}}$ we predict a feature vector about the image $I({{t_{m - s + 1}}} )$ using convolutional neural networks. As shown in Fig. 1, the CNN contains nine convolutional layers, a dropout layer and two fully connected layers. A four-dimensional vector is obtained as the output of the CNN module, shown below,
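The layer hyperparameters (channel counts, kernel sizes, strides) of Fig. 1 are not listed in the text. As a shape-only illustration under the assumption of unpadded 3×3 kernels, nine convolutional layers shrink each spatial dimension by 18 pixels before the fully connected layers reduce the result to the four-dimensional output:

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive single-channel 'valid' convolution, for shape illustration only."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.random.rand(64, 64)                    # assumed input resolution
k = np.random.rand(3, 3)                      # assumed 3x3 kernel
for _ in range(9):                            # nine convolutional layers
    x = np.maximum(conv2d_valid(x, k), 0.0)   # conv + ReLU
# x.shape is now (46, 46); flattening plus two dense layers yield the 4-vector
```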
2.4 LSTM module
Theoretically, temporal relations in the air flow can be calculated with computational aerodynamics, but it would obviously be unwise to solve the Navier-Stokes equations in free space just for laser communication; some other fitting method for time-series signals is necessary. There are two important large time scales for atmospheric turbulence, ${T_1}$ and ${T_2}$. Here ${T_1}$ is due to advection and is estimated by ${L_0}/{V_ \bot }$, where ${L_0}$ is the outer scale of turbulence and ${V_ \bot }$ is the mean wind speed transverse to the observation path; ${T_1}$ is typically on the order of 1 s. The other time scale ${T_2}$, associated with the eddy turnover time, is typically on the order of 10 s [19]. It should be emphasized that the light field incident on a detector is influenced by large-scale turbulence, both along the propagation path and within a circular area, usually more than several meters in diameter, near the receiver transverse to the observation path. Hence the current evolution of the turbulence can influence the light field on the detector many milliseconds or even several seconds later; this is related to the chaotic nature of turbulence. Since a CCD or CMOS detector used to receive optical signals from a satellite usually captures images at 30 fps or higher, a group of pictures taken within seconds are correlated with each other. It is therefore a wise choice to take advantage of the LSTM, which has proven efficient for processing nonlinear time-series signals. The equations for a typical LSTM cell are presented below [9]
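The standard LSTM cell equations [9] referenced above can be written out in numpy (the gate ordering and stacked-weight layout are conventional choices, not taken from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h, c, W, b):
    """One LSTM time-step.  x: input (D,), h/c: hidden and cell states (H,),
    W: (4H, H+D) stacked gate weights, b: (4H,) biases; gate order i, f, o, g:
        i = sigma(W_i [h; x] + b_i)   input gate
        f = sigma(W_f [h; x] + b_f)   forget gate
        o = sigma(W_o [h; x] + b_o)   output gate
        g = tanh (W_g [h; x] + b_g)   candidate cell state
        c' = f * c + i * g,  h' = o * tanh(c')"""
    H = h.size
    z = W @ np.concatenate([h, x]) + b
    i, f, o = (sigmoid(z[k * H:(k + 1) * H]) for k in range(3))
    g = np.tanh(z[3 * H:])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

Iterating this cell over the feature vectors of Section 2.3 is what allows the module to carry cell-state memory across the correlated frames discussed above.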
As described before, the output of the CNN module forms a 4n-dimensional feature vector ${\vec{F}_{4n}}({{t_m},{t_{m - 1}},\ldots ,{t_{m - n + 1}}} )$, which is a time-series signal. An LSTM module with a single-layer input is used instead of one with multiple input channels, so that the elements of feature vectors obtained at different times are fully mixed inside the LSTM. The output $\vec{Q}({{t_{m + k}},{t_{m + k - 1}},\ldots ,{t_{m + 1}}} )$ of the LSTM is a $2 \times k$ matrix, also a time-series signal, shown as
The matrix $\vec{Q}({{t_{m + k}},{t_{m + k - 1}},\ldots ,{t_{m + 1}}} )$ is composed of k two-dimensional position vectors at $k$ different times, shown as
2.5 Description of Model PCL as a whole
Assume the centroid positions of optical images can be fully predicted by the feature vectors ${\vec{U}_4}\left( {{t_i}} \right)\left( {i = m + 1, m,..., m - n + 2} \right)$; then the Bayesian network for Model PCL is presented as Eq. (23).
The conditional probability $P[{\vec{x}({{t_{m + 1}}} )|{{\vec{U}}_4}({{t_{m + 1}}} ),{{\vec{U}}_4}({{t_m}} ),{{\vec{U}}_4}({{t_{m - 1}}} ),\ldots ,{{\vec{U}}_4}({{t_{m - n + 2}}} )} ]$ corresponds to the LSTM module, which performs statistical learning on the feature vectors in time series. According to Eqs. (14)–(18), the feature vector ${\vec{U}_4}({{t_i}} )$ can be computed from $I({{t_i}} )$ and $I({{t_{i - 1}}} )$. Hence the decoupling in Eq. (23) is reasonable, and $P[{{{\vec{U}}_4}({{t_{m + 1}}} )|({I({{t_m}} ),I({{t_{m - 1}}} )} )} ]$ corresponds to the preprocessing module and the CNN module described in Sections 2.2 and 2.3. Since the architecture of Model PCL combines a CNN with an LSTM, the parameters of both modules should be updated recursively in the same optimization process.
Figure 2 shows the whole architecture of Model PCL. The preprocessing module is clearly important: several algorithms are used to extract information about photon distributions and motions on the CMOS, which also reflects variations of the light propagating through the atmospheric medium. The CNN module predicts feature vectors that represent information about coming images. The LSTM module treats the feature vectors in time series and outputs predictions of centroid positions. In addition, based on Eq. (23), Model PCL is capable of multiple time-step prediction, which is in fact the highlight of this model, and the LSTM plays an important role in this capability. From Eq. (21) and Eq. (22), suppose the input to the model (including the LSTM module) is a time series of n time-steps and the goal is an m time-step prediction; the output is first an n×b array, where b is the batch size. To predict centroid positions for m time-steps, m of these n signals can easily be extracted (usually m < 0.5n) by data selection or matrix multiplication, and the loss function can then be defined together with the labels for multiple time-step prediction. However, the random initialization of the LSTM usually causes large differences between output channels, which leads to imbalances in the optimization process (the output for one time-step may still be underfitting while the output for another is already overfitting). Hence, additional terms to suppress these imbalances are necessary.
Assume $({{x_{i,1}},{y_{i,1}}} ),({{x_{i,2}},{y_{i,2}}} ),\ldots ,({{x_{i,m}},{y_{i,m}}} )$ are the centroid positions predicted by an m time-step model for times ${t_1},{t_2},\ldots ,{t_m}$, where i indexes the images in a batch. Correspondingly, $({{X_{i,1}},{Y_{i,1}}} ),({{X_{i,2}},{Y_{i,2}}} ),\ldots ,({{X_{i,m}},{Y_{i,m}}} )$ are the labels for the centroid positions. To suppress the imbalances in the optimization process, the loss functions are defined as below,
Then the loss function for the m time-step prediction model can be expressed by
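Since Eqs. (24)–(28) are not reproduced in this excerpt, the total loss can only be sketched under an assumption consistent with the text: per-time-step squared errors plus a "binding" term that couples the time-steps by penalizing differences between their losses. The exact form of the binding term in Eq. (28) may differ from the one assumed here:

```python
import numpy as np

def multi_step_loss(pred, label, lam=0.1):
    """pred, label: (batch, m, 2) predicted and labeled centroids.
    Returns the sum of per-step MSE losses plus lam * binding, where the
    ASSUMED binding term is the squared difference between the losses of
    neighboring time-steps, suppressing inter-step imbalance."""
    step_loss = np.mean(np.sum((pred - label) ** 2, axis=2), axis=0)  # (m,)
    binding = np.sum((step_loss[1:] - step_loss[:-1]) ** 2)
    return float(np.sum(step_loss) + lam * binding)
```

Any per-step loss that runs ahead of its neighbors inflates the binding term, so the optimizer is pushed to keep the m output channels converging at a similar pace.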
3. Experiment
We carried out the experiment to receive optical signals and predict centroid positions using the equipment shown in Fig. 3, which mainly consists of an 800 nm semiconductor laser, an emitting telescope, a Cassegrain telescope for signal receiving, and a camera connected to a computer.
Shown in Table 1 are the main devices used in the experiment. The angle subtended at the entrance per pixel can be calculated by
Assume the prediction error is m pixels; then the corresponding angular error can be defined by

The receiving telescope is set at the window of the lab (unfixed), on the roof of a 15-floor building. The telescope may be shaken by wind, which allows the model to be trained and tested under oscillating conditions. The main task of the experiment is to keep the prediction error below 1 pixel (or 10.3 µrad).
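The per-pixel angle above depends on the focal length and pixel pitch of the devices in Table 1, which are not reproduced here; the conversion itself is a small-angle ratio. The pitch and focal length below are hypothetical placeholders chosen only to reproduce the quoted 10.3 µrad per pixel:

```python
def angular_error_urad(err_pixels, pixel_pitch_m=5.5e-6, focal_length_m=0.534):
    """Convert a centroid error in pixels to microradians:
    angle per pixel = pitch / focal_length (small-angle approximation).
    The default pitch and focal length are PLACEHOLDERS tuned to give
    ~10.3 urad/pixel; the real values are those of Table 1."""
    return err_pixels * (pixel_pitch_m / focal_length_m) * 1e6
```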
Figure 4 shows a Google Maps screenshot of the local area, in which the green triangle is the emitting terminal at an altitude of 216 m and the red triangle is the receiving terminal at an altitude of 195 m. The two terminals are located in two buildings 11.16 km apart, across the Songhua River in Harbin.
Shown in Fig. 5 are four images obtained by the receiver with a 550 fps camera. To capture image data for training Model PCL, experiments were carried out at different times and under different wind conditions. An ideal model should be robust to atmospheric turbulence as well as mechanical oscillations. Whether the disturbance comes from atmospheric turbulence, wind, or mechanical oscillations, the standard deviation (SD) of the centroid positions is an effective quantity for describing environmental influences. Hence the SD of centroid positions is calculated online to indicate the strength of the fluctuations. In this experiment, the prediction error is expected to be smaller than one half of the standard deviation.
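Maintaining the SD of centroid positions online calls for a streaming estimator; Welford's algorithm is a natural fit. The windowing used in the actual experiment is not specified, so this sketch simply accumulates over the whole stream:

```python
class RunningSD:
    """Welford's online mean/variance for one centroid coordinate."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)   # numerically stable accumulation

    def sd(self):
        return (self.m2 / self.n) ** 0.5 if self.n else 0.0
```

One such accumulator per axis, updated once per frame, gives the SD curves plotted alongside the error curves in the figures below.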
It should be mentioned that deep machine learning still consumes a great deal of computing power, so it is challenging to make 550 predictions per second with a single computer. There are two ways to solve this problem: one is distributed computation combining several computers; the other is reliable multiple time-step prediction. Multiple time-step prediction is truly valuable for practical use: for a model containing an LSTM module, a multiple time-step prediction costs no more computing power than a single time-step prediction. Hence, with an n time-step predictor, the forecasting job can be done at a frequency of 550/n Hz, saving a great deal of computing power. In Sections 3.1–3.2, one, two and four time-step predictions are discussed, among which the four time-step prediction should be highlighted, since it is very useful for high-frequency prediction.
3.1 Off-line training
As is well known, experiments with neural networks always rely on training. Since our task is real-time prediction of centroid positions, our first target is an 'easy-training' model with fast convergence. Due to the strong nonlinearity, randomness and complexity of turbulence, it is infeasible to build and train a model once and have it fit centroid positions under atmospheric disturbances forever; neither is it wise to predict centroid positions in real time with an untrained model. Hence, to obtain a practical model for real-time prediction, we do both. First, after enough image data have been sampled and preprocessed, we train the model with 36000 images (30000 for training, 6000 for testing); this is called 'off-line' training. Subsequently, the pretrained model, together with its optimized parameters, is installed on the working computer, and 'on-line' training is carried out together with real-time prediction.
In the experiment, image data for off-line training were obtained both in the early evening and after midnight. In the city of Harbin, optical signals captured at dusk have been observed to fluctuate more than those obtained after midnight.
Some important parameters should be introduced before discussing the training process. In every training epoch, a sub-dataset of size M is extracted from the whole dataset starting at a random initial position. The sub-dataset is then reshaped into a number N of large batches, each containing L small batches. The number L is an internal parameter of the LSTM equal to its input length (as described below Eq. (21)). Every small batch contains m images (m = 25 is used frequently in this paper). Images captured in the evening are not mixed with those obtained at midnight within the same large batch, although large batches obtained at different times can be mixed. In every epoch, the number of large batches is calculated by N = M/(L×m); all these numbers are integers.
The parameter L was determined first; after trying several values, 40 was chosen for the experiment, the same for training and testing. Both M and m are adjustable. The sub-dataset size M is an important parameter that can be tuned to control the training time and to prevent overfitting and underfitting. The influence of these parameters on convergence efficiency is more obvious in on-line training than in off-line training.
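The reshaping rule N = M/(L×m) can be made concrete with index arithmetic; frame indices stand in for images, and the random initial position is the `start` offset:

```python
def arrange_batches(M, L=40, m=25, start=0):
    """Split a sub-dataset of M frames into N = M // (L*m) large batches,
    each holding L small batches of m consecutive frame indices."""
    N = M // (L * m)
    return [[[start + (n * L + l) * m + i for i in range(m)]
             for l in range(L)]
            for n in range(N)]
```

With the values used below (M = 10000, L = 40, m = 25) this yields exactly the 10 large batches per epoch quoted for Fig. 6.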
Shown in Fig. 6 are the training and testing errors of an off-line process. The sub-dataset size is M = 10000, reshaped into 10 large batches, which are fed into the model in every epoch. Both the training and testing errors decrease quickly, and stable convergence is sustained after the 200th epoch. (Off-line training should not run too long here, or the parameters may overfit and the pretrained model becomes hard to fit to images obtained in a new environment.) The training curve is very close to the testing curve, a phenomenon that usually appears in successful high-frequency prediction processes, especially for one time-step prediction. For one thing, it verifies that Model PCL is a powerful fitting tool; for another, images obtained at 550 fps are sometimes close to each other in both the training and testing datasets, so stable convergence is easy to achieve in this condition. In Fig. 6, both the training and testing errors finally lie between 1.3 and 1.5 pixels; the corresponding angular errors are between 13.39 and 15.45 µrad.
Presented in Fig. 7 are the off-line training and testing error curves of a two time-step prediction model. The sub-dataset size is M = 12000; each sub-dataset is reshaped into 12 large batches, which are fed into the model in every epoch. Training error 1 and testing error 1 present the training and testing errors for the centroid position at time ${t_{m + 1}}$; training error 2 and testing error 2 show those at time ${t_{m + 2}}$. The flat region of the curves before the 600th epoch can be relieved by decreasing the weight of the norm function [22], but this has little influence on the final convergence. As shown in Fig. 7, the curves cover each other, so the processes for the two time-steps are shown separately below: Fig. 8 presents the first time-step training and testing errors for the two time-step model, and Fig. 9 presents the second time-step errors for the same model.
From Figs. 7, 8 and 9, we can see the unusual phenomenon that the training errors fluctuate fiercely while the testing curves stay smooth. These fluctuations come from the binding term in the loss function, shown in Eq. (28). Because of the binding term, the interaction between the outputs at times ${t_{m + 1}}$ and ${t_{m + 2}}$ is enhanced: according to Eqs. (24), (25) and (27), the loss functions for times ${t_{m + 1}}$ and ${t_{m + 2}}$ (Loss1 and Loss2) must be minimized together. Hence, in the optimization process, Loss1 and Loss2 'drag' each other; because they are not synchronized, the two loss functions are sometimes dragged up and sometimes dragged down by each other. Interestingly, the training error is dragged up only slightly above the testing error and is mostly dragged below it. Since the model parameters are still optimized in the right direction, the testing errors converge smoothly despite the fluctuating training errors. The second time-step training error fluctuates more fiercely due to the way tensors are multiplied in the neural network model. Compared with other loss functions, the binding term has been verified to help achieve effective convergence in multiple time-step prediction.
Shown in Fig. 10 are the two testing curves of the two time-step prediction model. Though there are obvious differences between the two curves at the beginning, the two channels end up close to each other. The final errors for the 1st and 2nd time-steps are between 1.48∼1.55 pixels and 1.52∼1.58 pixels respectively; the corresponding angular errors are between 15.28∼15.96 µrad and 15.62∼16.29 µrad.
Shown in Fig. 11 are the training and testing curves of a four time-step model. The sub-dataset size is M = 15000, reshaped into 15 large batches, which are fed into the model in every epoch. Taking the current time as tm, the curves for the 1st, 2nd, 3rd and 4th time-steps present the training and testing errors at times tm+1, tm+2, tm+3 and tm+4 respectively. Similar to the situation in Fig. 7, the curves for different times are entangled and very close to each other. The training curves do not fluctuate as much as in Fig. 7, mainly because the binding term, containing four square differences, offers more constraints than before. As before, 32000 images are fed into the neural networks in every training epoch. The training and testing curves for the different time-steps are presented separately below.
Figure 12 presents the first time-step training and testing curves for the four time-step off-line process.
Figure 13 presents the 2nd time-step training and testing curves for the four time-step off-line process.
Figure 14 presents the 3rd time-step training and testing curves for the four time-step off-line process.
Figure 15 presents the 4th time-step training and testing curves for the four time-step off-line process.
Figure 16 presents the four testing curves of the off-line four time-step process. Though the four training curves fluctuate during optimization, all the testing curves converge smoothly.
The final errors for the 1st, 2nd, 3rd and 4th time-steps are between 1.61∼1.70, 1.44∼1.48, 1.54∼1.61 and 1.55∼1.63 pixels respectively. The corresponding angular errors are between 16.61∼17.54, 14.83∼15.28, 15.96∼16.61 and 16.04∼16.86 µrad.
3.2 On-line training and prediction
As described before, after off-line training is done, on-line training and testing are carried out, taking advantage of the optimized parameters. It is obviously infeasible to use datasets as large as those in off-line training for on-line training; much smaller datasets must be used instead. Moreover, a multi-process technique is adopted [20], with two processes: the main process performs on-line prediction and receives images, while the other performs on-line training. Images obtained in the prediction process are used directly in the training process, and parameters optimized in the training process are used directly for prediction, both through memory sharing [21].
However, because machine learning consumes a great deal of computing power, it is still very challenging to make 550 predictions per second. Hence, two identical computers are used, each containing a GTX 2070 graphics card and an i7 9700K CPU, and distributed computing is applied in both the off-line and on-line processes. For the on-line process, one computer receives images from the camera and carries out predictions about 550 times per second, while the other is responsible for training the model; communication between the two computers is based on the protocol within TensorFlow. Image datasets are transferred from computer 1 to computer 2, and optimized parameters are transferred from computer 2 to computer 1. Figure 17 illustrates this distributed computing scheme for the high-frequency on-line optimization and prediction process.
For the single and double time-step on-line training and prediction processes, distributed computing as shown in Fig. 17 is carried out. As described before, once four time-step on-line training and prediction are realized, distributed computing can be replaced by the multi-process technique.
Figure 18 presents the data arrangement for multi-process computing in the on-line process. Similar to the off-line process, the training dataset is composed of N1 large batches, each composed of L small batches, and every small batch contains m1 images. The training dataset of ${M_1} = {N_1} \times {m_1} \times L$ images is maintained in memory on a first-small-batch-in, first-out (FIFO) principle, so that it can be updated frequently enough to keep up with the camera and newly received images can be fed into the model in time. The small batch size m1 is adjustable: optimization efficiency is low if m1 is too small, while it may be difficult to keep up with the 550 fps camera if m1 is too large. In the experiment described in this paper, m1 = 10 and N1 = 10. About 2 epochs are finished per second, which means the model parameters are updated 20 times per second.
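The first-small-batch-in, first-out principle can be sketched with a bounded deque holding N1 × L small-batch slots; the queueing details of the actual experiment are assumed:

```python
from collections import deque

class TrainingBuffer:
    """Holds the on-line training set of M1 = N1 * m1 * L images as
    N1 * L small batches; pushing a new small batch silently evicts
    the oldest one (FIFO), keeping the dataset fresh."""
    def __init__(self, n1=10, L=40):
        self.buf = deque(maxlen=n1 * L)

    def push(self, small_batch):
        self.buf.append(small_batch)   # oldest small batch drops automatically

    def dataset(self):
        return list(self.buf)
```

The prediction process pushes each newly received small batch; the training process reads `dataset()` each epoch, so the model always optimizes on the most recent M1 frames.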
Figure 19 presents an on-line training and testing process lasting 1000 seconds. Figure 20 magnifies the last half (from the 500th to the 1000th second) of Fig. 19 to distinguish the training and testing curves, and the SD (standard deviation) of centroid positions is also presented. As in the off-line process, the training curve is very close to the testing curve. Nice convergence is obtained after the 400th second. According to Fig. 20, the training error finally reaches 0.418 pixel, corresponding to an angular error of 4.310 µrad, and the testing error is finally around 0.430 pixel, corresponding to 4.436 µrad. Both errors are stabilized within 1/3 of the standard deviation of the centroid positions.
Figure 21 presents the training and testing errors of a two time-step prediction model. The data were sampled on a clear evening, from 9:00 p.m. to 9:25 p.m., with the temperature between 18° and 20°. In Fig. 21, training error 1 and testing error 1 present the training and testing errors for the centroid position predicted at time ${t_{m + 1}}$; training error 2 and testing error 2 show those at time ${t_{m + 2}}$. Unlike in off-line training, the training errors do not fluctuate, which indicates the parameters have been pretrained very well. The errors decrease quickly after the 400th second and stabilize after the 800th second.
Figure 22 shows the last half of the optimizing process. Training error 1 and testing error 1 are very close to each other, while training error 2 and testing error 2 are separated by a narrow gap about 0.15 pixel wide. The standard deviation of the centroid positions varies around 1.6 pixels. Testing error 1 and testing error 2 finally reach 0.742 and 0.765 pixel, respectively, corresponding to angular errors of 7.643 µrad and 7.880 µrad.
Figure 23 presents the training and testing errors for a four time-step model. The data were sampled in an early evening with a slight wind, from 7:00 p.m. to 8:10 p.m., with the temperature between 17° and 19°. Nice convergence is obtained after the 500th second.
To prove the robustness of this model, we kept the on-line training and prediction process running for more than one hour. Figure 24 shows the training and prediction errors for the four time-step on-line process from the 3000th to the 4000th second, together with the SD (standard deviation) of the centroid positions; the curves for the different time steps are distinguished in the figure. Though the standard deviation of the centroid positions is higher than before, nice convergence of the training and testing errors at the different time steps can still be obtained. According to Fig. 24, the final testing errors for the 1st, 2nd, 3rd and 4th time steps are respectively 0.483, 0.605, 0.741 and 0.572 pixel, corresponding to angular errors of 4.970 µrad, 6.232 µrad, 7.632 µrad and 5.892 µrad. The curves are not always flat and slight fluctuations may exist, but the errors are still suppressed to stay under 1 pixel, while the standard deviation of the centroid positions is around 2 pixels.
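The per-time-step bookkeeping used in these comparisons can be sketched as follows: given predicted and true centroids for a four time-step model, compute the mean Euclidean error of each prediction step separately, so each step can be compared against the roughly 2 pixel standard deviation of the centroid positions. The arrays and the error metric here are illustrative assumptions.

```python
import numpy as np

def per_step_error(pred: np.ndarray, true: np.ndarray) -> np.ndarray:
    """Mean pixel error for each prediction step.

    pred, true: (samples, steps, 2) arrays of predicted and measured
    centroid positions. Returns a (steps,) array of mean Euclidean
    distances in pixels, one per time step.
    """
    # Euclidean distance per sample and step, then average over samples.
    return np.linalg.norm(pred - true, axis=-1).mean(axis=0)
```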
Experiments show that the model is insensitive to the standard deviation of the centroid positions when the SD varies between 1.5 and 2.2 pixels due to atmospheric turbulence, windy conditions and telescope oscillations. For a model well pretrained in the off-line process, the prediction quality depends mainly on parameter adjustment, data arrangement and variable initialization. These relations will be investigated in more detail in further research.
4. Conclusion
According to the analysis above, a powerful tool for centroid prediction under nonlinear disturbances can be obtained by combining a preprocessing module, a CNN module and an LSTM module. By adjusting parameters, the prediction error can be suppressed to less than 1 pixel, which demonstrates the model's potential for practical use. There is still plenty of work to do: 1. enhancing robustness through more training data and parameter adjustment; 2. improving the optical and photoelectronic systems so that the accuracy can be raised; 3. taking optical aberrations into account. Some interesting work along these lines is ongoing, and more results will be reported in the future.
References
1. D. Cornwell, “Space-Based Laser Communications Break Threshold,” Opt. Photonics News 27(5), 24–31 (2016). [CrossRef]
2. M. R. Clark, “Application of Machine Learning Principles to Modeling of Nonlinear Dynamic Systems,” J. Arkansas Acad. Sci. 48, 36–40 (1994).
3. K. Worden and P. L. Green, “A machine learning approach to nonlinear modal analysis,” Mech. Syst. Signal. Process. 84, 34–53 (2017). [CrossRef]
4. G. Ju, X. Qi, H. Ma, and C. Yan, “Feature-based phase retrieval wavefront sensing approach using machine learning,” Opt. Express 26(24), 31767–31783 (2018). [CrossRef]
5. D. Margaritis, “Learning Bayesian Network Model Structure from Data,” Ph.D. dissertation (Carnegie Mellon University, 2013).
6. S. Arnon and N. S. Kopeika, “Adaptive suboptimum detection of an optical pulse-position-modulation signal with a detection matrix and centroid tracking,” J. Opt. Soc. Am. A 15(2), 443–448 (1998). [CrossRef]
7. P. Zarchan and H. Musoff, Fundamentals of Kalman Filtering (Progress in Aeronautics and Astronautics) (AIAA, 2015), pp. 210–212.
8. G. Farnebäck, “Two-Frame Motion Estimation Based on Polynomial Expansion,” in Proceedings of the 13th Scandinavian Conference on Image Analysis (SCIA, 2003), pp. 363–370.
9. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation 9(8), 1735–1780 (1997). [CrossRef]
10. R. C. Gonzalez and R. E. Woods, Digital Image Processing (Pearson, 2017), pp. 57–120.
11. J. W. Goodman, Introduction to Fourier Optics (W.H. Freeman, 2017), pp. 441–452.
12. R. Y. Tsai, “A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses,” IEEE J. Robot. Automat. 3(4), 323–344 (1987). [CrossRef]
13. O. Keller, Light - The Physics of the Photon (CRC Press, 2014), pp. 378–402.
14. K. Cahill, Physical Mathematics (Cambridge University Press, 2013), pp. 245–256.
15. J. Corso, “Motion and Optical Flow,” https://web.eecs.umich.edu/~jjcorso/t/598F14/files/lecture_1015_motion.pdf, University of Michigan (2014).
16. Q. Wang, S. Yu, L. Tan, and J. Ma, “Approach for Recognizing and Tracking Beacon in Inter-Satellite Optical Communication Based on Optical Flow Method,” Opt. Express 26(21), 28080–28090 (2018). [CrossRef]
17. D. B. Bungbung and D. Valero, “Application of the Optical Flow Method to Velocity Determination in Hydraulic Structure Models,” in 6th International Symposium on Hydraulic Structures and Water System Management, Portland, (2016).
18. E. R. Davies, Computer Vision: Principles, Algorithms, Applications, Learning (Academic Press, 2017), pp. 347–357.
19. L. C. Andrews and R. L. Phillips, Laser Beam Propagation through Random Media (SPIE Press, 1998).
20. Python multiprocessing documentation, https://docs.python.org/2/library/multiprocessing.html
21. Python multiprocessing.shared_memory documentation, https://docs.python.org/3.8/library/multiprocessing.shared_memory.html
22. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (The MIT Press, 2016), pp. 542–550.