Abstract
Optical image tracing is one of the key technologies for realizing and maintaining satellite-to-ground laser communication. Since machine learning has proven to be a powerful tool for modeling nonlinear systems, a model containing a preprocessing module, a CNN (Convolutional Neural Network) module and an LSTM (Long Short-Term Memory) module was developed to process digital images in time series and predict centroid positions under the influence of atmospheric turbulence. Unlike most previous models composed purely of neural networks, several important physical properties of the light field distributed on the CMOS are taken into account. Once built and trained, the model can predict centroid positions in real time for practical applications in satellite laser communication.
© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
1. Introduction
Laser communication is considered a powerful candidate for satellite communication due to its high speed and large capacity [1]. Among the many technologies required for satellite laser communication, tracing the optical signal is one of the most critical. A telescope under automatic control and a camera connected to a computer are usually used for both transmitting and receiving laser signals, and an algorithm for tracing the centroid of the optical image is employed to maintain the stability of the laser link. Post-feedback control is the approach most widely used in current experiments on satellite laser communication. Due to the complexity and strong fluctuations of atmospheric turbulence, it is always challenging to trace optical signals stably, especially in bad weather. Machine learning, on the other hand, has become an extremely active topic in recent years. Although the basic ideas of neural networks were proposed decades ago, major breakthroughs did not arrive until this decade, mainly because of newly developed hardware such as GPUs and CPUs for high-speed computation. With reasonable design and sufficient training, a machine learning system can become a very efficient tool for fitting a nonlinear system, producing results or decisions that were unattainable with earlier linear algorithms [2,3]. Researchers in many areas are still developing machine learning technologies rapidly. In optics, for light propagating through inhomogeneous media, there is already interesting work on phase-retrieval wavefront sensing using convolutional neural networks [4]. In this paper we turn to light propagating in random media and develop a model to trace the centroid of optical images for satellite laser communication.
Optical signals from satellites always fluctuate because of disturbances from atmospheric turbulence as well as oscillations of the satellite platform. Computational fluid dynamics, as used in aerodynamics, can produce accurate results by solving the Navier-Stokes equations numerically, but such computations consume far too much computing power for real-time results. Statistical theory is used to analyze the scintillation and decoherence of light propagating through random media, with the shortcoming that it is difficult to obtain results reliable enough for real-time prediction in engineering applications. Machine learning methods have more potential to solve these problems. Both newly developed deep neural networks and traditional machine learning methods are available for this job. Among the traditional methods, we choose the extended Kalman filter to create labels (Section 2.1) and a probabilistic graphical model to describe the model architecture (Section 2.5). In addition, some classic computer vision techniques are used to obtain optical and kinetic information about the light field (Section 2.2). As for neural networks, CNNs (convolutional neural networks) have a strong ability to fit nonlinear systems (Section 2.3), and LSTM is an efficient tool for prediction on time-series signals (Section 2.4). Hence, we combine a CNN with an LSTM to obtain a powerful architecture for the nonlinear random process of light propagating through the atmospheric medium.
Though the relation between neural networks and probabilistic graphical models goes beyond the main contents of this paper, it should be noted that a neural network's goal is, in some sense, to estimate the likelihood of a Bayesian network. Correspondingly, finding the best weights for a neural network is equivalent to maximizing the likelihood of a Bayesian network [5].
2. Theory
For the sake of tracing optical signals, our final task is real-time prediction of the centroid positions of optical images arriving at future times, based on previously obtained images. The probabilistic graphical model is a conditional random field shown as
2.1 Label creation
As is well known, deep neural networks based on supervised learning can be efficient tools for prediction only if they are well trained with reliable data. Hence, before discussing the details of the newly developed model, we first describe how labels are created for the supervised learning process. Usually, the centroid position of an image is calculated with the formulation below [6]
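The centroid formulation of Eq. (2) is an intensity-weighted mean over pixel coordinates. A minimal numpy sketch (array layout and pixel indexing are illustrative assumptions):

```python
import numpy as np

def centroid(I):
    """Intensity-weighted centroid of an image I (H x W array):
    x_c = sum_ij x_j * I_ij / sum_ij I_ij, and likewise for y_c."""
    ys, xs = np.mgrid[0:I.shape[0], 0:I.shape[1]]
    total = I.sum()
    return (xs * I).sum() / total, (ys * I).sum() / total
```

For example, an image whose only bright pixel sits at row 2, column 3 has centroid (3.0, 2.0) in (x, y) order.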
As a recursive estimator first developed in the middle of the last century, the Kalman filter has proven efficient for tracing objects disturbed by Gaussian noise. Sharing the structure of the Kalman filter, the extended Kalman filter (EKF) is widely used for estimation in nonlinear systems; it assumes the true state at time t evolves from the state at (t−1) according to [7]
where F(tm) is the state transition model, B(tm) is the control-input model, and W(tm) is the process noise. Given that our task is to build a model that predicts centroid positions from previously obtained images in time series, from time $t_{m-n+1}$ to $t_{m+1}$ we obtain n + 1 images and calculate n + 1 centroid positions using Eq. (2); then the n centroid positions for the images obtained from $t_{m-n+2}$ to $t_{m+1}$ are re-estimated with the EKF to obtain higher accuracy. These centroid positions are used as labels for both training and testing.

It should be pointed out that, before the current model was developed, we built a Model I and a Model II to predict centroid positions: Model I uses a CNN, and Model II combines a CNN with an LSTM. The current model contains optical flow [8], a CNN, an LSTM [9], and several techniques from digital image processing [10]. However, neither Model I nor Model II converged satisfactorily. Temporal correlations are not sufficiently captured by a CNN alone, so Model I achieved no acceptable convergence. By combining a CNN with an LSTM, Model II obtained some acceptable results, but without enough physical meaning built into the model, multiple time-step prediction remained difficult. Hence, a significantly improved Model III was developed, containing three modules: a preprocessing module (P), a CNN module (C) and an LSTM module (L). For convenience, this model is referred to as Model III or Model PCL throughout the paper.
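Since F(tm), B(tm) and W(tm) are not reproduced in this excerpt, the label-refinement step can only be sketched under an assumed linear constant-velocity state model, in which case the EKF reduces to the ordinary Kalman filter (noise levels q and r are illustrative):

```python
import numpy as np

def kalman_step(x, P, z, dt=1.0, q=1e-3, r=0.5):
    """One predict/update cycle on state [x, y, vx, vy] given a measured
    centroid z = (x_meas, y_meas).  F is a constant-velocity transition;
    q and r are assumed process/measurement noise levels."""
    F = np.eye(4); F[0, 2] = F[1, 3] = dt   # constant-velocity dynamics
    H = np.eye(2, 4)                        # we observe positions only
    Q = q * np.eye(4); R = r * np.eye(2)
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update with the raw Eq. (2) centroid
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (np.asarray(z, dtype=float) - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P
```

Feeding the raw centroids of Eq. (2) through such a filter yields the smoothed positions used as training and testing labels.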
2.2 Preprocessing module
Unlike many deep learning models, Model PCL applies a preprocessing module before the data enter the neural networks, extracting important kinetic and optical features. Since these features are extracted with well-established computer vision techniques, the algorithm is expected to remain robust under different weather conditions. Assume a coordinate system $Oxyz$ fixed on the CMOS, with the origin at pixel (0,0), $Ox$ along the width and $Oy$ along the height. Given that the entrance pupil of the receiving telescope is parallel to the CMOS, the wave function of the light field at the entrance, denoted $R({x,y,l,{t_m}^\prime } )$, is shown below
The term ${O_I}({\Delta t} )$ in Eq. (9) contains combinations of products of higher-order derivatives, including ${\partial ^n}I({{x_i},{y_j},{t_m}} )\textrm{/}\partial {x^n}$, ${\partial ^n}I({{x_i},{y_j},{t_m}} )\textrm{/}\partial {y^n}$, ${\partial ^n}I({{x_i},{y_j},{t_m}} )\textrm{/}\partial {t^n}$, ${d^n}{x_i}\textrm{/}d{t^n}$, ${d^n}{y_j}\textrm{/}d{t^n}$, $({n \ge 2} )$. Based on discrete numerical analysis, all these higher-order derivatives can be represented by combinations of products of first-order derivatives, namely $\partial I({{x_i},{y_j},{t_m}} )\textrm{/}\partial x$, $\partial I({{x_i},{y_j},{t_m}} )\textrm{/}\partial y$, $\partial I({{x_i},{y_j},{t_m}} )\textrm{/}\partial t$, $d{x_i}/dt$, $d{y_j}/dt$.
According to Eq. (9), treating the process as a simple Markov process, if the time interval $\Delta t$ between capturing neighboring images were small enough, the image $I({{x_i},{y_j},{t_{m + 1}}} )$ could be predicted from the current image $I({{x_i},{y_j},{t_m}} )$ together with the previous image $I({{x_i},{y_j},{t_{m - 1}}} )$. However, due to the strong nonlinearity and complexity of turbulence, the time interval required for such a prediction is far below the interval between two neighboring frames, so reliable prediction based on Eq. (9) alone is infeasible. Hence statistical learning is used by taking into account a group of images in time series (introduced in Section 2.4). Assume we predict the centroid position of the image at time ${t_{m + 1}}$ from the (n + 1) images $I({{t_m}} ),I({{t_{m - 1}}} ),\ldots ,I({{t_{m - n}}} )$, which are considered strongly correlated with the image $I({{t_{m + 1}}} )$, where n is an integer chosen as a compromise between prediction reliability and computing efficiency.
In the preprocessing module, for each pair of images obtained at times ${t_{m - s}}$ and ${t_{m - s - 1}}$ $({s = 0,1,\ldots ,({n - 1} )} )$, digital image processing and computer vision techniques are used to extract first-order derivative features, in which $\partial I({{x_i},{y_j},{t_{m - s}}} )/\partial x$ and $\partial I({{x_i},{y_j},{t_{m - s}}} )/\partial y$ can easily be obtained by
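The paper's exact difference scheme is given in the omitted equations; a plausible stand-in uses central differences for the spatial derivatives and a frame difference for the temporal one:

```python
import numpy as np

def first_order_features(I_prev, I_curr, dt=1.0):
    """Approximate dI/dx and dI/dy by central differences on the current
    frame, and dI/dt by a backward difference between consecutive frames."""
    dIdy, dIdx = np.gradient(I_curr)   # np.gradient returns d/drow, d/dcol
    dIdt = (I_curr - I_prev) / dt
    return dIdx, dIdy, dIdt
```

The remaining kinetic channels ($d{x_i}/dt$, $d{y_j}/dt$) would come from the optical flow between the frame pair [8]; the dense-flow algorithm actually used is not specified in this excerpt.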
2.3 CNN module
Figure 1 shows the architecture of the CNN module. It has a six-channel input that accepts data from the preprocessing module. Since the final task is to predict centroid positions of coming images, there is no need to predict the whole image; instead, at time ${t_{m - s}}$ we predict a feature vector about the image $I({{t_{m - s + 1}}} )$ using convolutional neural networks. As shown in Fig. 1, the CNN contains nine convolutional layers, a dropout layer and two fully connected layers. A four-dimensional vector is obtained as the output of the CNN module, shown below,
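The layer hyperparameters (channel counts, kernel sizes, strides) of Fig. 1 are not listed in the text. As a shape-only illustration under the assumption of unpadded 3×3 kernels, nine convolutional layers shrink each spatial dimension by 18 pixels before the fully connected layers reduce the result to the four-dimensional output:

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive single-channel 'valid' convolution, for shape illustration only."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.random.rand(64, 64)                    # assumed input resolution
k = np.random.rand(3, 3)                      # assumed 3x3 kernel
for _ in range(9):                            # nine convolutional layers
    x = np.maximum(conv2d_valid(x, k), 0.0)   # conv + ReLU
# x.shape is now (46, 46); flattening plus two dense layers yield the 4-vector
```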
2.4 LSTM module
Theoretically, temporal relations in the air flow can be calculated with computational aerodynamics, but it would obviously be unwise to solve the Navier-Stokes equations in free space just for laser communication; some other fitting method for time-series signals is necessary. There are two important large time scales for atmospheric turbulence, ${T_1}$ and ${T_2}$. Here ${T_1}$ is due to advection and is estimated by ${L_0}/{V_ \bot }$, where ${L_0}$ is the outer scale of turbulence and ${V_ \bot }$ is the mean wind speed transverse to the observation path; ${T_1}$ is typically on the order of 1 s. The other time scale ${T_2}$, associated with the eddy turnover time, is typically on the order of 10 s [19]. It should be emphasized that the light field incident on a detector is influenced by large-scale turbulence, both along the propagation path and within a circular area, usually more than several meters in diameter, near the receiver transverse to the observation path. Hence the current evolution of the turbulence can influence the light field on the detector many milliseconds or even several seconds later; this is related to the chaotic nature of turbulence. Since a CCD or CMOS detector used to receive optical signals from a satellite usually captures images at 30 fps or higher, a group of pictures taken within seconds are correlated with each other. It is therefore a wise choice to take advantage of the LSTM, which has proven efficient for processing nonlinear time-series signals. The equations for a typical LSTM cell are presented below [9]
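The standard LSTM cell equations [9] referenced above can be written out in numpy (the gate ordering and stacked-weight layout are conventional choices, not taken from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h, c, W, b):
    """One LSTM time-step.  x: input (D,), h/c: hidden and cell states (H,),
    W: (4H, H+D) stacked gate weights, b: (4H,) biases; gate order i, f, o, g:
        i = sigma(W_i [h; x] + b_i)   input gate
        f = sigma(W_f [h; x] + b_f)   forget gate
        o = sigma(W_o [h; x] + b_o)   output gate
        g = tanh (W_g [h; x] + b_g)   candidate cell state
        c' = f * c + i * g,  h' = o * tanh(c')"""
    H = h.size
    z = W @ np.concatenate([h, x]) + b
    i, f, o = (sigmoid(z[k * H:(k + 1) * H]) for k in range(3))
    g = np.tanh(z[3 * H:])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new
```

Iterating this cell over the feature vectors of Section 2.3 is what allows the module to carry cell-state memory across the correlated frames discussed above.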
As described before, the output of the CNN module forms a 4n-dimensional feature vector ${\vec{F}_{4n}}({{t_m},{t_{m - 1}},\ldots ,{t_{m - n + 1}}} )$, which is a time-series signal. An LSTM module with a single-layer input is used instead of one with multiple input channels, so that the elements of feature vectors obtained at different times are fully mixed inside the LSTM. The output $\vec{Q}({{t_{m + k}},{t_{m + k - 1}},\ldots ,{t_{m + 1}}} )$ of the LSTM is a $2 \times k$ matrix, also a time-series signal, shown as
The matrix $\vec{Q}({{t_{m + k}},{t_{m + k - 1}},\ldots ,{t_{m + 1}}} )$ is composed of k two-dimensional position vectors at $k$ different times, shown as
2.5 Description of Model PCL as a whole
Assume the centroid positions of optical images can be fully predicted by the feature vectors ${\vec{U}_4}\left( {{t_i}} \right)\left( {i = m + 1, m,..., m - n + 2} \right)$; then the Bayesian network for Model PCL is presented as Eq. (23).
The conditional probability $P[{\vec{x}({{t_{m + 1}}} )|{{\vec{U}}_4}({{t_{m + 1}}} ),{{\vec{U}}_4}({{t_m}} ),{{\vec{U}}_4}({{t_{m - 1}}} ),\ldots ,{{\vec{U}}_4}({{t_{m - n + 2}}} )} ]$ corresponds to the LSTM module, which performs statistical learning on the feature vectors in time series. According to Eqs. (14)–(18), the feature vector ${\vec{U}_4}({{t_i}} )$ can be computed from $I({{t_i}} )$ and $I({{t_{i - 1}}} )$. Hence the decoupling in Eq. (23) is reasonable, and $P[{{{\vec{U}}_4}({{t_{m + 1}}} )|({I({{t_m}} ),I({{t_{m - 1}}} )} )} ]$ corresponds to the preprocessing module and the CNN module described in Sections 2.2 and 2.3. Since the architecture of Model PCL combines a CNN with an LSTM, the parameters of both modules should be updated recursively in the same optimization process.
Figure 2 shows the whole architecture of Model PCL. The preprocessing module is clearly important: several algorithms are used to extract information about photon distributions and motions on the CMOS, which also reflects variations of the light propagating through the atmospheric medium. The CNN module predicts feature vectors that represent information about coming images. The LSTM module treats the feature vectors in time series and outputs predictions of centroid positions. In addition, based on Eq. (23), Model PCL is capable of multiple time-step prediction, which is in fact the highlight of this model, and the LSTM plays an important role in this capability. From Eq. (21) and Eq. (22), suppose the input to the model (including the LSTM module) is a time series of n time-steps and the goal is an m time-step prediction; the output is first an n×b array, where b is the batch size. To predict centroid positions for m time-steps, m of these n signals can easily be extracted (usually m < 0.5n) by data selection or matrix multiplication, and the loss function can then be defined together with the labels for multiple time-step prediction. However, the random initialization of the LSTM usually causes large differences between output channels, which leads to imbalances in the optimization process (the output for one time-step may still be underfitting while the output for another is already overfitting). Hence, additional terms to suppress these imbalances are necessary.
Assume $({{x_{i,1}},{y_{i,1}}} ),({{x_{i,2}},{y_{i,2}}} ),\ldots ,({{x_{i,m}},{y_{i,m}}} )$ are the centroid positions predicted by an m time-step model for times ${t_1},{t_2},\ldots ,{t_m}$, where i indexes the images in a batch. Correspondingly, $({{X_{i,1}},{Y_{i,1}}} ),({{X_{i,2}},{Y_{i,2}}} ),\ldots ,({{X_{i,m}},{Y_{i,m}}} )$ are the labels for the centroid positions. To suppress the imbalances in the optimization process, the loss functions are defined as below,
Then the loss function for the m time-step prediction model can be expressed by
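Since Eqs. (24)–(28) are not reproduced in this excerpt, the total loss can only be sketched under an assumption consistent with the text: per-time-step squared errors plus a "binding" term that couples the time-steps by penalizing differences between their losses. The exact form of the binding term in Eq. (28) may differ from the one assumed here:

```python
import numpy as np

def multi_step_loss(pred, label, lam=0.1):
    """pred, label: (batch, m, 2) predicted and labeled centroids.
    Returns the sum of per-step MSE losses plus lam * binding, where the
    ASSUMED binding term is the squared difference between the losses of
    neighboring time-steps, suppressing inter-step imbalance."""
    step_loss = np.mean(np.sum((pred - label) ** 2, axis=2), axis=0)  # (m,)
    binding = np.sum((step_loss[1:] - step_loss[:-1]) ** 2)
    return float(np.sum(step_loss) + lam * binding)
```

Any per-step loss that runs ahead of its neighbors inflates the binding term, so the optimizer is pushed to keep the m output channels converging at a similar pace.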
3. Experiment
We carried out the experiment to receive optical signals and predict centroid positions using the equipment shown in Fig. 3, which mainly consists of an 800 nm semiconductor laser, an emitting telescope, a Cassegrain telescope for signal receiving, and a camera connected to a computer.
Shown in Table 1 are the main devices used in the experiment. The angle subtended at the entrance per pixel can be calculated by
Assume the prediction error is m pixels; then the corresponding angular error can be defined by

The receiving telescope is set at the window of the lab (unfixed), on the roof of a 15-floor building. The telescope may be shaken by wind, which allows the model to be trained and tested under oscillating conditions. The main task of the experiment is to keep the prediction error below 1 pixel (or 10.3 µrad).
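The per-pixel angle above depends on the focal length and pixel pitch of the devices in Table 1, which are not reproduced here; the conversion itself is a small-angle ratio. The pitch and focal length below are hypothetical placeholders chosen only to reproduce the quoted 10.3 µrad per pixel:

```python
def angular_error_urad(err_pixels, pixel_pitch_m=5.5e-6, focal_length_m=0.534):
    """Convert a centroid error in pixels to microradians:
    angle per pixel = pitch / focal_length (small-angle approximation).
    The default pitch and focal length are PLACEHOLDERS tuned to give
    ~10.3 urad/pixel; the real values are those of Table 1."""
    return err_pixels * (pixel_pitch_m / focal_length_m) * 1e6
```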
Figure 4 shows a Google Maps screenshot of the local area, in which the green triangle is the emitting terminal at an altitude of 216 m and the red triangle is the receiving terminal at an altitude of 195 m. The two terminals are located in two buildings 11.16 km apart, across the Songhua River in Harbin.
Shown in Fig. 5 are four images obtained by the receiver with a 550 fps camera. To capture image data for training Model PCL, experiments were carried out at different times and under different wind conditions. An ideal model should be robust to atmospheric turbulence as well as mechanical oscillations. Whether the disturbance comes from atmospheric turbulence, wind, or mechanical oscillations, the standard deviation (SD) of the centroid positions is an effective quantity for describing environmental influences. Hence the SD of centroid positions is calculated online to indicate the strength of the fluctuations. In this experiment, the prediction error is expected to be smaller than one half of the standard deviation.
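Maintaining the SD of centroid positions online calls for a streaming estimator; Welford's algorithm is a natural fit. The windowing used in the actual experiment is not specified, so this sketch simply accumulates over the whole stream:

```python
class RunningSD:
    """Welford's online mean/variance for one centroid coordinate."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += d * (x - self.mean)   # numerically stable accumulation

    def sd(self):
        return (self.m2 / self.n) ** 0.5 if self.n else 0.0
```

One such accumulator per axis, updated once per frame, gives the SD curves plotted alongside the error curves in the figures below.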
It should be mentioned that deep machine learning still consumes a great deal of computing power, so it is challenging to make 550 predictions per second with a single computer. There are two ways to solve this problem: one is distributed computation combining several computers; the other is reliable multiple time-step prediction. Multiple time-step prediction is truly valuable for practical use: for a model containing an LSTM module, a multiple time-step prediction costs no more computing power than a single time-step prediction. Hence, with an n time-step predictor, the forecasting job can be done at a frequency of 550/n Hz, saving a great deal of computing power. In Sections 3.1–3.2, one, two and four time-step predictions are discussed, among which the four time-step prediction should be highlighted, since it is very useful for high-frequency prediction.
3.1 Off-line training
As is well known, experiments with neural networks always rely on training. Since our task is real-time prediction of centroid positions, our first target is an 'easy-training' model with fast convergence. Due to the strong nonlinearity, randomness and complexity of turbulence, it is infeasible to build and train a model once and have it fit centroid positions under atmospheric disturbances forever; neither is it wise to predict centroid positions in real time with an untrained model. Hence, to obtain a practical model for real-time prediction, we do both. First, after enough image data have been sampled and preprocessed, we train the model with 36000 images (30000 for training, 6000 for testing); this is called 'off-line' training. Subsequently, the pretrained model, together with its optimized parameters, is installed on the working computer, and 'on-line' training is carried out together with real-time prediction.
In the experiment, image data for off-line training were obtained both in the early evening and after midnight. In the city of Harbin, optical signals captured at dusk have been observed to fluctuate more than those obtained after midnight.
Some important parameters should be introduced before discussing the training process. In every training epoch, a sub-dataset of size M is extracted from the whole dataset starting at a random initial position. The sub-dataset is then reshaped into a number N of large batches, each containing L small batches. The number L is an internal parameter of the LSTM equal to its input length (as described below Eq. (21)). Every small batch contains m images (m = 25 is used frequently in this paper). Images captured in the evening are not mixed with those obtained at midnight within the same large batch, although large batches obtained at different times can be mixed. In every epoch, the number of large batches is calculated by N = M/(L×m); all these numbers are integers.
The parameter L was determined first; after trying several values, 40 was chosen for the experiment, the same for training and testing. Both M and m are adjustable. The sub-dataset size M is an important parameter that can be tuned to control the training time and to prevent overfitting and underfitting. The influence of these parameters on convergence efficiency is more obvious in on-line training than in off-line training.
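The reshaping rule N = M/(L×m) can be made concrete with index arithmetic; frame indices stand in for images, and the random initial position is the `start` offset:

```python
def arrange_batches(M, L=40, m=25, start=0):
    """Split a sub-dataset of M frames into N = M // (L*m) large batches,
    each holding L small batches of m consecutive frame indices."""
    N = M // (L * m)
    return [[[start + (n * L + l) * m + i for i in range(m)]
             for l in range(L)]
            for n in range(N)]
```

With the values used below (M = 10000, L = 40, m = 25) this yields exactly the 10 large batches per epoch quoted for Fig. 6.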
Shown in Fig. 6 are the training and testing errors of an off-line process. The sub-dataset size is M = 10000, reshaped into 10 large batches, which are fed into the model in every epoch. Both the training and testing errors decrease quickly, and stable convergence is sustained after the 200th epoch. (Off-line training should not run too long here, or the parameters may overfit and the pretrained model becomes hard to fit to images obtained in a new environment.) The training curve is very close to the testing curve, a phenomenon that usually appears in successful high-frequency prediction processes, especially for one time-step prediction. For one thing, it verifies that Model PCL is a powerful fitting tool; for another, images obtained at 550 fps are sometimes close to each other in both the training and testing datasets, so stable convergence is easy to achieve in this condition. In Fig. 6, both the training and testing errors finally lie between 1.3 and 1.5 pixels; the corresponding angular errors are between 13.39 and 15.45 µrad.
Presented in Fig. 7 are the off-line training and testing error curves of a two time-step prediction model. The sub-dataset size is M = 12000; each sub-dataset is reshaped into 12 large batches, which are fed into the model in every epoch. Training error 1 and testing error 1 present the training and testing errors for the centroid position at time ${t_{m + 1}}$; training error 2 and testing error 2 show those at time ${t_{m + 2}}$. The flat region of the curves before the 600th epoch can be relieved by decreasing the weight of the norm function [22], but this has little influence on the final convergence. As shown in Fig. 7, the curves cover each other, so the processes for the two time-steps are shown separately below: Fig. 8 presents the first time-step training and testing errors for the two time-step model, and Fig. 9 presents the second time-step errors for the same model.
From Figs. 7, 8 and 9, we can see the unusual phenomenon that the training errors fluctuate fiercely while the testing curves stay smooth. These fluctuations come from the binding term in the loss function, shown in Eq. (28). Because of the binding term, the interaction between the outputs at times ${t_{m + 1}}$ and ${t_{m + 2}}$ is enhanced: according to Eqs. (24), (25) and (27), the loss functions for times ${t_{m + 1}}$ and ${t_{m + 2}}$ (Loss1 and Loss2) must be minimized together. Hence, in the optimization process, Loss1 and Loss2 'drag' each other; because they are not synchronized, the two loss functions are sometimes dragged up and sometimes dragged down by each other. Interestingly, the training error is dragged up only slightly above the testing error and is mostly dragged below it. Since the model parameters are still optimized in the right direction, the testing errors converge smoothly despite the fluctuating training errors. The second time-step training error fluctuates more fiercely due to the way tensors are multiplied in the neural network model. Compared with other loss functions, the binding term has been verified to help achieve effective convergence in multiple time-step prediction.
Shown in Fig. 10 are the two testing curves of the two time-step prediction model. Though there are obvious differences between the two curves at the beginning, the two channels end up close to each other. The final errors for the 1st and 2nd time-steps are between 1.48∼1.55 pixels and 1.52∼1.58 pixels respectively; the corresponding angular errors are between 15.28∼15.96 µrad and 15.62∼16.29 µrad.
Shown in Fig. 11 are the training and testing curves of a four time-step model. The sub-dataset size is M = 15000, reshaped into 15 large batches, which are fed into the model in every epoch. Taking the current time as tm, the curves for the 1st, 2nd, 3rd and 4th time-steps present the training and testing errors at times tm+1, tm+2, tm+3 and tm+4 respectively. Similar to the situation in Fig. 7, the curves for different times are entangled and very close to each other. The training curves do not fluctuate as much as in Fig. 7, mainly because the binding term, containing four square differences, offers more constraints than before. As before, 32000 images are fed into the neural networks in every training epoch. The training and testing curves for the different time-steps are presented separately below.
Figure 12 presents the first time-step training and testing curves for the four time-step off-line process.
Figure 13 presents the 2nd time-step training and testing curves for the four time-step off-line process.
Figure 14 presents the 3rd time-step training and testing curves for the four time-step off-line process.
Figure 15 presents the 4th time-step training and testing curves for the four time-step off-line process.
Figure 16 presents the four testing curves of the off-line four time-step process. Though the four training curves fluctuate during optimization, all the testing curves converge smoothly.
The final errors for the 1st, 2nd, 3rd and 4th time-steps are between 1.61∼1.70, 1.44∼1.48, 1.54∼1.61 and 1.55∼1.63 pixels respectively. The corresponding angular errors are between 16.61∼17.54, 14.83∼15.28, 15.96∼16.61 and 16.04∼16.86 µrad.
3.2 On-line training and prediction
As described before, after off-line training is done, on-line training and testing are carried out, taking advantage of the optimized parameters. It is obviously infeasible to use datasets as large as those in off-line training for on-line training; much smaller datasets must be used instead. Moreover, a multi-process technique is adopted [20], with two processes: the main process performs on-line prediction and receives images, while the other performs on-line training. Images obtained in the prediction process are used directly in the training process, and parameters optimized in the training process are used directly for prediction, both through memory sharing [21].
However, because machine learning consumes a great deal of computing power, it is still very challenging to make 550 predictions per second. Hence, two identical computers are used, each containing a GTX 2070 graphics card and an i7 9700K CPU, and distributed computing is applied in both the off-line and on-line processes. For the on-line process, one computer receives images from the camera and carries out predictions about 550 times per second, while the other is responsible for training the model; communication between the two computers is based on the protocol within TensorFlow. Image datasets are transferred from computer 1 to computer 2, and optimized parameters are transferred from computer 2 to computer 1. Figure 17 illustrates this distributed computing scheme for the high-frequency on-line optimization and prediction process.
For the single and double time-step on-line training and prediction processes, distributed computing as shown in Fig. 17 is carried out. As described before, once four time-step on-line training and prediction are realized, distributed computing can be replaced by the multi-process technique.
Figure 18 presents the data arrangement for multi-process computing in the on-line process. Similar to the off-line process, the training dataset is composed of N1 large batches, each composed of L small batches, and every small batch contains m1 images. The training dataset of ${M_1} = {N_1} \times {m_1} \times L$ images is maintained in memory on a first-small-batch-in, first-out (FIFO) principle, so that it can be updated frequently enough to keep up with the camera and newly received images can be fed into the model in time. The small batch size m1 is adjustable: optimization efficiency is low if m1 is too small, while it may be difficult to keep up with the 550 fps camera if m1 is too large. In the experiment described in this paper, m1 = 10 and N1 = 10. About 2 epochs are finished per second, which means the model parameters are updated 20 times per second.
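The first-small-batch-in, first-out principle can be sketched with a bounded deque holding N1 × L small-batch slots; the queueing details of the actual experiment are assumed:

```python
from collections import deque

class TrainingBuffer:
    """Holds the on-line training set of M1 = N1 * m1 * L images as
    N1 * L small batches; pushing a new small batch silently evicts
    the oldest one (FIFO), keeping the dataset fresh."""
    def __init__(self, n1=10, L=40):
        self.buf = deque(maxlen=n1 * L)

    def push(self, small_batch):
        self.buf.append(small_batch)   # oldest small batch drops automatically

    def dataset(self):
        return list(self.buf)
```

The prediction process pushes each newly received small batch; the training process reads `dataset()` each epoch, so the model always optimizes on the most recent M1 frames.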
Figure 19 presents an on-line training and testing process lasting 1000 seconds. Figure 20 magnifies the last half (from the 500th to the 1000th second) of Fig. 19 to distinguish the training and testing curves, and the SD (standard deviation) of centroid positions is also presented. As in the off-line process, the training curve is very close to the testing curve. Nice convergence is obtained after the 400th second. According to Fig. 20, the training error finally reaches 0.418 pixel, corresponding to an angular error of 4.310 µrad, and the testing error is finally around 0.430 pixel, corresponding to 4.436 µrad. Both errors are stabilized within 1/3 of the standard deviation of the centroid positions.
Figure 21 presents the training and testing errors of a two time-step prediction model. The data were sampled on a clear evening, from 9:00 p.m. to 9:25 p.m., with the temperature between 18° and 20°. In Fig. 21, training error 1 and testing error 1 present the training and testing errors for the centroid position predicted at time ${t_{m + 1}}$; training error 2 and testing error 2 show those at time ${t_{m + 2}}$. Unlike in off-line training, the training errors do not fluctuate, which indicates the parameters have been pretrained very well. The errors decrease quickly after the 400th second and stabilize after the 800th second.
Figure 22 shows the last half of the optimizing process. Training error 1 and testing error 1 are very close to each other, while training error 2 and testing error 2 are separated by a narrow gap about 0.15 pixel wide. The standard deviation of the centroid positions varies around 1.6 pixels. Testing error 1 and testing error 2 finally reach 0.742 and 0.765 pixel, respectively, corresponding to angular errors of 7.643 µrad and 7.880 µrad.
Figure 23 presents the training and testing errors for a four time-step model. The data were sampled in an early evening with a slight wind, from 7:00 p.m. to 8:10 p.m., with the temperature between 17° and 19°. Nice convergence is obtained after the 500th second.
To prove the robustness of this model, we kept the on-line training and prediction process running for more than one hour. Figure 24 shows the training and prediction errors for the four time-step on-line process from the 3000th to the 4000th second, together with the SD (standard deviation) of the centroid positions; the curves for the different time steps are distinguished in the figure. Though the standard deviation of the centroid positions is higher than before, nice convergence of the training and testing errors at the different time steps can still be obtained. According to Fig. 24, the final testing errors for the 1st, 2nd, 3rd and 4th time steps are respectively 0.483, 0.605, 0.741 and 0.572 pixel, corresponding to angular errors of 4.970 µrad, 6.232 µrad, 7.632 µrad and 5.892 µrad. The curves are not always flat and slight fluctuations may exist, but the errors are still suppressed to stay under 1 pixel, while the standard deviation of the centroid positions is around 2 pixels.
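The per-time-step bookkeeping used in these comparisons can be sketched as follows: given predicted and true centroids for a four time-step model, compute the mean Euclidean error of each prediction step separately, so each step can be compared against the roughly 2 pixel standard deviation of the centroid positions. The arrays and the error metric here are illustrative assumptions.

```python
import numpy as np

def per_step_error(pred: np.ndarray, true: np.ndarray) -> np.ndarray:
    """Mean pixel error for each prediction step.

    pred, true: (samples, steps, 2) arrays of predicted and measured
    centroid positions. Returns a (steps,) array of mean Euclidean
    distances in pixels, one per time step.
    """
    # Euclidean distance per sample and step, then average over samples.
    return np.linalg.norm(pred - true, axis=-1).mean(axis=0)
```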
Experiments show that the model is insensitive to the standard deviation of the centroid positions when the SD varies between 1.5 and 2.2 pixels due to atmospheric turbulence, windy conditions and telescope oscillations. For a model well pretrained in the off-line process, the prediction quality depends mainly on parameter adjustment, data arrangement and variable initialization. These relations will be investigated in more detail in further research.
4. Conclusion
According to the analysis above, a powerful tool for centroid prediction under nonlinear disturbances can be obtained by combining a preprocessing module, a CNN module and an LSTM module. By adjusting parameters, the prediction error can be suppressed to less than 1 pixel, which demonstrates the model's potential for practical use. There is still plenty of work to do: 1. enhancing robustness through more training data and parameter adjustment; 2. improving the optical and photoelectronic systems so that the accuracy can be raised; 3. taking optical aberrations into account. Some interesting work along these lines is ongoing, and more results will be reported in the future.
References
1. D. Cornwell, “Space-Based Laser Communications Break Threshold,” Opt. Photonics News 27(5), 24–31 (2016). [CrossRef]
2. M. R. Clark, “Application of Machine Learning Principles to Modeling of Nonlinear Dynamic Systems,” J. Arkansas Acad. Sci. 48, 36–40 (1994).
3. K. Worden and P. L. Green, “A machine learning approach to nonlinear modal analysis,” Mech. Syst. Signal. Process. 84, 34–53 (2017). [CrossRef]
4. G. Ju, X. Qi, H. Ma, and C. Yan, “Feature-based phase retrieval wavefront sensing approach using machine learning,” Opt. Express 26(24), 31767–31783 (2018). [CrossRef]
5. D. Margaritis, “Learning Bayesian Network Model Structure from Data,” Ph.D. dissertation (Carnegie Mellon University, 2013).
6. S. Arnon and N. S. Kopeika, “Adaptive suboptimum detection of an optical pulse-position-modulation signal with a detection matrix and centroid tracking,” J. Opt. Soc. Am. A 15(2), 443–448 (1998). [CrossRef]
7. P. Zarchan and H. Musoff, Fundamentals of Kalman Filtering (Progress in Aeronautics and Astronautics) (AIAA, 2015), pp. 210–212.
8. G. Farnebäck, “Two-Frame Motion Estimation Based on Polynomial Expansion,” in Proceedings of the 13th Scandinavian Conference on Image Analysis (SCIA, 2003), pp. 363–370.
9. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation 9(8), 1735–1780 (1997). [CrossRef]
10. R. C. Gonzalez and R. E. Woods, Digital Image Processing (Pearson, 2017), pp. 57–120.
11. J. W. Goodman, Introduction to Fourier Optics (W.H. Freeman, 2017), pp. 441–452.
12. R. Y. Tsai, “A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses,” IEEE J. Robot. Automat. 3(4), 323–344 (1987). [CrossRef]
13. O. Keller, Light - The Physics of the Photon (CRC Press, 2014), pp. 378–402.
14. K. Cahill, Physical Mathematics (Cambridge University Press, 2013), pp. 245–256.
15. J. Corso, “Motion and Optical Flow,” https://web.eecs.umich.edu/~jjcorso/t/598F14/files/lecture_1015_motion.pdf, University of Michigan (2014).
16. Q. Wang, S. Yu, L. Tan, and J. Ma, “Approach for Recognizing and Tracking Beacon in Inter-Satellite Optical Communication Based on Optical Flow Method,” Opt. Express 26(21), 28080–28090 (2018). [CrossRef]
17. D. B. Bungbung and D. Valero, “Application of the Optical Flow Method to Velocity Determination in Hydraulic Structure Models,” in 6th International Symposium on Hydraulic Structures and Water System Management, Portland, (2016).
18. E. R. Davies, Computer Vision: Principles, Algorithms, Applications, Learning (Academic Press, 2017), pp. 347–357.
19. L. C. Andrews and R. L. Phillips, Laser Beam Propagation through Random Media (SPIE Press, 1998).
20. Python multiprocessing documentation, https://docs.python.org/2/library/multiprocessing.html
21. Python multiprocessing.shared_memory documentation, https://docs.python.org/3.8/library/multiprocessing.shared_memory.html
22. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (The MIT Press, 2016), pp. 542–550.