The pistons of sparse aperture systems need to be controlled within a fraction of a wavelength for the system to achieve optimal imaging performance. In this paper, we demonstrate that deep learning is capable of performing piston sensing with a single wide-band image after appropriate training. Treating the sensing problem as a fitting task, the deep learning-based method uses a deep convolutional neural network to learn the complex input-output mapping between broadband intensity distributions and the corresponding piston values. Given a trained network and one broadband focal intensity image as the input, the piston can be obtained directly, with a capture range that reaches the coherence length of the broadband light. Simulations and experiments demonstrate the validity of the proposed method. Using only in-focus broadband images as inputs, without defocus division or wavelength dispersion, considerably relaxes the optical complexity. In view of these advantages, we expect that the proposed method may be widely applied in multi-aperture imaging.
© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
The synthetic aperture imaging system, initially proposed by A. B. Meinel in 1970, is composed of several small sub-apertures and can achieve a resolution equivalent to that of a traditional monolithic primary mirror, making it possible to build lighter telescopes at lower manufacturing cost. For such systems, high-resolution images can be obtained only when the optical path difference (OPD) between sub-apertures, also referred to as the piston, is reduced to a fraction of the operating wavelength. Piston sensing is therefore a critical issue for synthetic aperture imaging.
So far, many piston sensing methods have been developed, including modified Shack-Hartmann sensors [2,3], pyramid sensors, dispersed fringe sensors [5,6] and so on. These methods have shown great success, but they require additional optical components, which increases the hardware complexity. Image-based piston sensing techniques grew out of the quest for simplicity and low cost. Mourard et al. proposed extracting pistons directly from the surrounding peaks of the modulation transfer function and used three wavelengths to obtain a large capture range [7]. The use of multiple wavelengths, however, does not help to simplify the system. Phase diversity has been used to sense phasing errors from diverse images, usually captured on the focal and defocused planes [8,9]. However, phase diversity needs an iterative optimization procedure that is computationally expensive, and it suffers from ambiguity.
Neural networks offer another effective image-based technique for piston sensing. The back propagation (BP) neural network was first proposed to detect phasing errors in 1990 [10,11], and Kendrick et al. extended the method to extended objects [12,13]. Limited by its number of layers, the traditional BP neural network for piston sensing needs a pair of focus diversity images and lacks an ample capture range. Deep learning, proposed by Hinton in 2006 [14–16], has removed this limitation on the number of layers. It has been successfully applied to wavefront sensing for traditional single-aperture systems, and recent research has demonstrated that a convolutional neural network (CNN) can directly estimate the wavefront from a single white-light intensity image. Unlike wavefront sensing for single-aperture systems, piston sensing faces some unique difficulties brought by multiple apertures, such as phase discontinuity and 2π ambiguity. Recently, a CNN using defocused images as inputs for piston alignment of segmented optical mirrors has been demonstrated by means of simulations [19]. This method eliminates the requirement of focus diversity and achieves a large capture range. However, several drawbacks remain: 1) The network used is not deep enough to work in an end-to-end mode. That is, the raw images need elaborate preprocessing before being fed to the CNN. 2) An undesirable beam splitter is necessary, since the input images are captured on the defocused plane while the image of interest is on the focal plane. 3) Two networks and four different wavelengths are required for the large capture range, which inevitably increases the complexity.
To alleviate these limitations, in this paper we establish a deep convolutional neural network (DCNN) and demonstrate that it is capable of learning to sense pistons from a single broadband focal image. With more layers and several deep learning strategies, our DCNN can discern patterns in the raw focal image without any preprocessing and output the piston directly, thus realizing end-to-end piston sensing. The piston is extracted directly from the broadband intensity without 2π ambiguity, and the capture range is equal to the coherence length of the wide-band light. Because it needs neither wavelength dispersion nor defocus division, this end-to-end method based on deep learning greatly simplifies the sparse aperture system. Both simulations and experiments are performed to validate the effectiveness and accuracy of the proposed method with broadband illumination.
In the following, we describe the imaging system model, generation of training data, and the implementation of DCNN in our study in Section 2. Then the results are presented in Section 3, including simulation results and experimental results. Finally, concluding thoughts are offered in Section 4.
2.1 Sparse aperture imaging system
Based on the imaging principle, the synthetic image captured on the focal CCD can be expressed as:
The purpose of this paper is to correct the pistons, so here we assume that the tip-tilt and sub-aperture aberrations have been corrected. The capacity of deep learning to sense all wavefront distortions remains to be explored in future work. When only pistons are taken into consideration, the corresponding expression can be written as:
With broadband illumination, the intensity distribution captured on the CCD is the superposition of the energy from all the wavelengths contained in the spectral range. With a spectrum spanning a finite range of wavelengths, the corresponding PSF can be modeled as:
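The broadband superposition described above can be sketched numerically. The following is a minimal illustration under stated assumptions, not the authors' MATLAB model: the 15 mm centre-to-centre baseline, the 30 mm pupil extent and the 64 × 64 grids are hypothetical values that do not appear in the text, and the function names are illustrative. The focal-plane field is evaluated with a direct Fourier sum so that the 1.67 μm pixel pitch stays fixed for every wavelength.

```python
import numpy as np

def two_aperture_pupil(n=64, extent=30e-3, diameter=10e-3, baseline=15e-3):
    """Pupil coordinates (m) and boolean masks of two circular sub-apertures.

    extent and baseline are assumed values; only the 10 mm diameter is from the text.
    """
    u = (np.arange(n) - (n - 1) / 2) * (extent / n)
    ux, uy = np.meshgrid(u, u)
    left = (ux + baseline / 2) ** 2 + uy ** 2 <= (diameter / 2) ** 2
    right = (ux - baseline / 2) ** 2 + uy ** 2 <= (diameter / 2) ** 2
    return u, left, right

def mono_psf(piston, wavelength, focal=0.5, n_pix=64, pixel=1.67e-6):
    """Monochromatic PSF on a fixed focal-plane grid for a given piston (OPD, m)."""
    u, left, right = two_aperture_pupil()
    # Piston adds a phase of 2*pi*OPD/lambda to one sub-aperture.
    field = left + right * np.exp(2j * np.pi * piston / wavelength)
    x = (np.arange(n_pix) - (n_pix - 1) / 2) * pixel
    # Direct Fourier sum (Fraunhofer propagation to the focal plane).
    kernel = np.exp(-2j * np.pi * np.outer(x, u) / (wavelength * focal))
    amplitude = kernel @ field @ kernel.T
    # Dividing by lambda^2 equalises the total focal-plane energy per wavelength.
    return np.abs(amplitude) ** 2 / wavelength ** 2

def broadband_psf(piston, wavelengths=np.linspace(500e-9, 600e-9, 21)):
    """Incoherent sum of monochromatic PSFs with uniform spectral weights."""
    psf = sum(mono_psf(piston, lam) for lam in wavelengths)
    return psf / psf.sum()
```

In this sketch, `mono_psf(0.0, 550e-9)` and `mono_psf(550e-9, 550e-9)` are numerically indistinguishable, which is the 2π ambiguity of monochromatic sensing, whereas `broadband_psf(0.0)` and `broadband_psf(550e-9)` differ because each wavelength sees a different phase for the same OPD.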
2.2 Generation of data set
To validate the feasibility of this piston sensing method based on a deep neural network, a training data set is needed to establish the mapping between the intensity images and the pistons. In practical engineering applications, the illumination cannot be regarded as monochromatic; it is broadband light with a certain bandwidth. It is therefore of great significance to study co-phasing error sensing with polychromatic light. In this paper, all studies are carried out with a wide-band light source covering 500-600 nm. We sample this spectral range at 21 monochromatic wavelengths with an interval of 5 nm and assume that the energy is uniformly distributed over the wavelengths.
In the simulation stage, we model the two-aperture imaging system and the four-aperture imaging system in MATLAB according to the principles of Fourier optics. A series of pistons is then introduced to generate intensity images. Part of these intensity images, together with the corresponding piston values, constitutes the training set, and the rest are used to test the performance of the trained network. Considering the coherence length, the maximum piston between two sub-apertures should be smaller than 3 μm. Therefore, random values ranging from −3 μm to 3 μm are generated as optical path differences to produce 10000 degraded PSF images, which are divided into 8000 images for training and 2000 images for testing. In addition, to approximate a practical imaging environment, zero-mean Gaussian noise with a variance of 0.01 is added to the simulated PSF images. Also, since the tip-tilt and sub-aperture aberrations cannot be corrected entirely, slight phase distortions are generated by summing the first 11 Zernike polynomials excluding the first one. The RMS error of each wavefront aberration is 0.05λ. Since the wavefront aberration is quite small, we assume that the aberration distribution is the same for each wavelength. The PSF images produced by the two-aperture imaging system and the four-aperture imaging system are shown in Figs. 1(a)-1(h). For experimental validation, an experimental platform is also set up to generate real PSF images for training and testing.
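The sampling choices above can be written down in a few lines. This is a minimal sketch rather than the authors' MATLAB script: the coherence length uses the standard estimate L = λ0²/Δλ, the random seed is arbitrary, and the noise helper assumes that the 0.01 variance refers to images scaled to [0, 1], which the text does not state.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# 21 monochromatic samples across 500-600 nm at 5 nm spacing.
wavelengths_nm = np.arange(500, 601, 5)

# Standard coherence-length estimate: L = lambda0^2 / delta_lambda.
lambda0 = wavelengths_nm.mean()
bandwidth = wavelengths_nm[-1] - wavelengths_nm[0]
coherence_length_um = lambda0 ** 2 / bandwidth / 1000.0

# 10000 random pistons (OPD) in [-3, 3] micrometres, split 8000 / 2000.
pistons_um = rng.uniform(-3.0, 3.0, size=10000)
train_pistons, test_pistons = pistons_um[:8000], pistons_um[8000:]

def add_gaussian_noise(image, variance=0.01):
    """Add zero-mean Gaussian noise; assumes the image is scaled to [0, 1]."""
    scaled = image / image.max()
    return scaled + rng.normal(0.0, np.sqrt(variance), size=image.shape)
```

With a 550 nm centre wavelength and a 100 nm bandwidth the estimate gives about 3.03 μm, consistent with the 3 μm bound used for the piston range.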
2.3 DCNN implementation
The DCNN is the type of deep learning model most commonly used in image processing. Typically, a DCNN architecture comprises one input layer, several convolutional layers, pooling layers, fully-connected layers, and one output layer, among which the convolutional layers together with the pooling layers act as feature extractors. Our DCNN has 26 layers, and its structure is shown in Fig. 2. In a convolutional layer, each convolution kernel is connected to local patches of the image from the previous layer, and feature maps are extracted by several different convolution kernels. Every pixel in one feature map comes from the same kernel, and this weight sharing greatly reduces the number of parameters, lowering both the complexity of the network and the risk of over-fitting compared with a conventional neural network built from multiple fully-connected layers. The down-sampling operation in the pooling layers further reduces the number of parameters and improves the generalization ability of the model. After a series of feature extraction operations, the pixels in each map are linked together in the fully-connected layers. With these abstract features extracted, the network can learn the mapping between the intensity distributions and the corresponding pistons more readily.
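The effect of weight sharing on the parameter count can be made concrete with a short calculation. The feature-map size and channel counts below are hypothetical, chosen only for illustration; the paper's layer dimensions are given in Fig. 2 and are not reproduced here.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of one convolutional layer (shared across positions)."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of one fully-connected layer."""
    return n_in * n_out + n_out

height = width = 64           # hypothetical feature-map size
c_in, c_out = 1, 32           # hypothetical channel counts

# 3x3 convolution versus a dense layer producing the same output size.
shared = conv_params(3, c_in, c_out)
unshared = dense_params(height * width * c_in, height * width * c_out)

# 2x2 pooling quarters the activations passed on to later layers.
pooled = (height // 2) * (width // 2) * c_out
```

Under these assumed sizes the convolutional layer needs 320 parameters, against roughly 5.4 × 10^8 for the fully-connected equivalent. Pooling itself has no weights; it reduces parameters indirectly by shrinking the input of the fully-connected layers that follow.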
To improve the performance of our DCNN model, several deep learning strategies are introduced. In terms of architecture, stacked small convolution kernels are used instead of traditional large convolution kernels, since they have proved more effective in extracting abstract features. In addition, Batch Normalization (BN) is introduced into the network, which mitigates the vanishing-gradient and exploding-gradient problems while speeding up the training process. In terms of parameter updating, we use an adaptive learning rate algorithm [20], which dynamically adjusts the learning rate of every parameter as training proceeds. With this algorithm, the effective learning rate is kept within a fixed range after each update, which makes the weight updates more stable.
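The bounded-step behaviour of an adaptive learning rate can be illustrated with the Adam update rule. This sketch assumes that the adaptive algorithm referred to is Adam, which the reference list suggests but the text does not name; the hyperparameters are the commonly used defaults, not values reported by the authors.

```python
import numpy as np

def new_state(shape):
    """Fresh optimizer state: step counter and zeroed moment estimates."""
    return {"t": 0, "m": np.zeros(shape), "v": np.zeros(shape)}

def adam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; state holds the step count and both moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    # Bias correction compensates for the zero initialisation of m and v.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)
```

On the first step the update magnitude is close to `lr` for every parameter regardless of the gradient scale: gradients of 1000 and 0.001 both move their parameter by about 0.001.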
The training procedure is illustrated in Fig. 3 and can be summarized as the forward propagation of data and the back propagation of error. In the forward propagation stage, each pixel of the training images is fed to the neurons in the input layer and processed by the hidden layers to produce feature maps. Figure 4 displays some example features from the convolution kernels, pooling layers and non-linear mapping layers, which give a more intuitive view of how the CNN processes an image. In the back propagation stage, mini-batch stochastic gradient descent is used to adjust the weights and biases so as to minimize the loss function until the outputs of the network are as close as possible to the target values.
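The forward-propagation and back-propagation cycle with mini-batch stochastic gradient descent can be sketched on a toy problem. A linear model stands in for the DCNN here purely for brevity; the data, learning rate and batch size are illustrative assumptions and are unrelated to the values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for (image features, piston) pairs.
x = rng.normal(size=(512, 4))
true_w = np.array([0.5, -1.0, 2.0, 0.1])
y = x @ true_w

w = np.zeros(4)
lr, batch_size = 0.1, 32
losses = []
for epoch in range(20):
    order = rng.permutation(len(x))              # reshuffle every epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        pred = x[idx] @ w                        # forward propagation
        error = pred - y[idx]
        grad = 2 * x[idx].T @ error / len(idx)   # gradient of the MSE loss
        w -= lr * grad                           # weight update
    losses.append(np.mean((x @ w - y) ** 2))     # loss after each epoch
```

The recorded `losses` fall from epoch to epoch, and `w` approaches `true_w`; a deep network follows the same loop with the gradient obtained by back propagation through all layers.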
After the training procedure is completed, the parameters of the network are fixed. The testing set is then used to evaluate the performance of the trained network. If the training accuracy is high while the testing accuracy is low, the network is over-fitting; if the training accuracy and testing accuracy are both low, it is under-fitting. During the training of our DCNN, the training set and testing set are fed to the network simultaneously and the loss values on the two sets are printed alternately on the console, which helps us to determine when training is complete.
3.1 Simulation results
In the numerical simulation, a two-aperture imaging system in the presence of wavefront errors caused by spatially varying and the imperfections of the surfaces shown in Fig. 5 is modeled, where the diameter of sub-aperture is 10 mm. The broadband light with spectrum 500-600 nm, passing through the pupil plane, is captured by the CCD in the focal plane with 500 mm focal length, and the pixel size of the CCD is 1.67 μm. As mentioned in Section 2, 8000 PSF images and 2000 other PSF images are then generated for training and testing, respectively. Specifically, we find that the deep learning algorithm can converge to the optimal solution only if the effective size of the training PSF images generated from the two-aperture imaging system is not less than pixels.
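For a sense of scale, the size of the PSF envelope on the detector can be estimated from the stated parameters. This is only a back-of-the-envelope sketch using the standard Airy-pattern formula at the band centre; it is not the minimum image size found by the authors, whose value is missing from the text above.

```python
wavelength = 550e-9   # centre of the 500-600 nm band (m)
focal = 0.5           # focal length (m)
diameter = 10e-3      # sub-aperture diameter (m)
pixel = 1.67e-6       # CCD pixel pitch (m)

# Diameter of the first dark ring of a single sub-aperture's Airy pattern.
airy_diameter = 2.44 * wavelength * focal / diameter
airy_pixels = airy_diameter / pixel
```

The central lobe of each sub-aperture's diffraction pattern is about 67 μm wide, or roughly 40 pixels, and the interference fringes that encode the piston lie inside this envelope.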
To evaluate the performance of our piston sensing method, several simulations are performed using TensorFlow, an open-source machine learning library, on a desktop computer with an NVIDIA GTX 1080 Ti graphics card. First, the computing framework of the network is defined, including the number of layers, the loss function, the optimization algorithm, the initialization of the network weights, and parameters such as the learning rate, mini-batch size and dropout rate, all of which strongly influence the training result. Then the 8000 images and corresponding piston values are converted to the TFRecord data format and fed to the network. When the response of the network to the input reaches a predetermined target range through repeated iterations, the weights and biases in every layer are fixed. After training, the testing images are used to evaluate the performance of the trained network.
The average RMS error between the outputs of the network and the true piston values of the 2000 testing samples is about 9 nm, and the maximum is 27 nm. Figure 6 shows the residual RMS errors of 500 samples randomly selected from the testing set; the vast majority of the RMS errors are under 20 nm. Figure 7 shows the distribution of the RMS errors on the training set and testing set. The histogram shows that the training results are largely consistent with the testing results, which demonstrates that there is no over-fitting problem in our trained network. Furthermore, almost 70 percent of the piston sensing errors are within 10 nm.
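The summary statistics quoted above can be computed as follows. This is a minimal sketch of one reasonable definition: the per-sample RMS error is taken over the pistons of that sample, which for the two-aperture system reduces to the absolute error. The function name and the example values are illustrative and are not data from the paper.

```python
import numpy as np

def piston_error_stats(predicted_nm, true_nm, threshold_nm=10.0):
    """Mean and maximum per-sample RMS error, and the fraction below a threshold."""
    diff = np.asarray(predicted_nm, dtype=float) - np.asarray(true_nm, dtype=float)
    if diff.ndim == 1:
        diff = diff[:, None]          # one piston per sample (two-aperture case)
    rms = np.sqrt(np.mean(diff ** 2, axis=1))
    return rms.mean(), rms.max(), np.mean(rms <= threshold_nm)

# Made-up values, for illustration only.
mean_rms, max_rms, fraction = piston_error_stats(
    [5.0, 88.0, -200.0, 60.0], [0.0, 100.0, -200.0, 50.0]
)
```

For a multi-aperture system the inputs would be arrays of shape (samples, pistons), and the same function applies without change.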
To further evaluate the generalization ability of our network, we design another simulation based on the previous one. For a DCNN, learning the mapping between images and piston values is a data-driven process, which means that the performance of the network depends strongly on the training set, and this limits its generalization capability. In this simulation, we change the imaging system parameters, namely the diameter of each sub-aperture (D) and the focal length (f), one at a time, and generate intensity images from these systems as testing sets to examine how well the network trained on the original system (10 mm diameter and 500 mm focal length) generalizes to new systems. Note that the spacing between the sub-apertures is kept constant when the diameter of each sub-aperture is changed. Table 1 shows that the testing sets generated from systems with different focal lengths yield similar accuracy, while the generalization ability declines rapidly when the diameter is changed.
Next, we test whether this DCNN model can be used to sense the pistons in a four-aperture imaging system with polychromatic light. As in the simulation above, we model a four-aperture imaging system in MATLAB, which is shown in Fig. 8. We collect 20000 PSF images for training and another 2000 for testing. The average RMS error over the testing samples is about 17 nm. The distribution of the RMS errors between the outputs of the network and the true piston values over the training set and the testing set is shown in Fig. 9. These results demonstrate that the proposed piston sensing method based on the deep neural network still works well for the four-aperture imaging system.
However, the performance on the four-aperture imaging system is slightly lower than that on the two-aperture imaging system. The main reason is that the more apertures there are in the system, the more complex the mapping becomes. When the method is applied to four or more apertures, much more training data is needed to achieve an accuracy equivalent to that of the two-aperture system. Considering the complexity of the non-linear relation and the amount of computation involved, some decrease in accuracy is inevitable when pistons are detected on systems with more sub-apertures.
3.2 Experimental results
In this section, we validate the proposed method on experimental images. The experimental setup is shown schematically in Fig. 10. Broadband light emitted from the laser source is collimated and expanded by a collimator to form a parallel beam of a certain width, and is then reflected by a nanopositioning stage placed at a 45 degree angle. The nanopositioning stage, composed of one sub-mirror and one reference mirror, is controlled by the SC-200 Controller. The sub-mirror, embedded in the reference mirror, is connected to three actuators. When voltage is applied to the actuators, the sub-mirror translates in the direction perpendicular to the reference mirror plane, introducing piston errors into the light path. The light beams reflected by the sub-mirror and the reference mirror pass through two sub-apertures, respectively, and are then focused by a lens and captured by the CCD. The CCD used in our experiments is MER-1070-14U3x, with pixels and a 1.67 μm pitch. Here the light source is still broadband light ranging from 500 nm to 600 nm. Note that the laser source is not characterized, and its spectral distribution does not match the illumination assumptions made in the simulations.
The piston between the two pupils is first calibrated to zero by using chromatic phase diversity. Then, within the coherence length range [−3 μm, 3 μm], we save one image after each additional 10 nm of piston is introduced. The 600 images obtained are divided into two parts: 500 for training and 100 for testing. The training procedure is shown in Fig. 11. For clarity of display, the loss values from the first few iterations, which decline sharply, are omitted. It can be seen that the loss function begins to converge after about 200 iterations. Training yields a DCNN capable of sensing pistons, and we then check its piston sensing performance on the testing set. To eliminate the piston calibration error, relative values are computed as the results. The average RMS error between the predicted values and the true inputs of the testing samples is about 15 nm, and the maximum is 49 nm. As shown in Fig. 12 and Fig. 13, the sensing accuracy using experimental images is only slightly lower than that using simulated images. There are several potential reasons for the decreased accuracy. First, because of the limited accuracy of the nanopositioning stage, the network is trained with only 500 images, far fewer than the 8000 images used in the simulation. Second, the pistons introduced by the nanopositioning stage deviate from their nominal values, resulting in inconsistency between the actual piston values in the PSF images and the training labels. Another factor may be the tilt error caused by jitter in the experimental setup, which has not been calibrated entirely. The PSF images before and after correction are shown in Figs. 14(a) and 14(b), which illustrate visually the effectiveness of the deep network on real images.
The piston mainly influences the mid-frequency response of the sparse aperture system. When the system suffers from large piston errors, the loss of mid-frequency content can badly degrade the resolution. The pistons need to be controlled within 0.1λ for the system to achieve optimal imaging performance. The fact that the sensing accuracy of our method in both simulations and experiments almost meets this criterion demonstrates that our method can effectively improve the imaging performance.
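The comparison with the 0.1λ criterion can be checked with simple arithmetic. The sketch below uses only the error figures reported in Section 3 and evaluates the tolerance at the edges and centre of the 500-600 nm band; which wavelength should be taken as the reference is an assumption, so the strictest one is used.

```python
tolerance_nm = {lam: 0.1 * lam for lam in (500, 550, 600)}   # 0.1 * lambda

reported_nm = {
    "two-aperture simulation (mean)": 9,
    "two-aperture simulation (max)": 27,
    "four-aperture simulation (mean)": 17,
    "experiment (mean)": 15,
    "experiment (max)": 49,
}

strictest = min(tolerance_nm.values())   # tolerance at the 500 nm band edge
within = {name: value <= strictest for name, value in reported_nm.items()}
```

The mean errors sit well inside the 50 to 60 nm tolerance, while the largest experimental error of 49 nm is right at the strictest bound.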
To conclude, this paper has demonstrated that deep learning is capable of performing piston sensing with a single wide-band image after appropriate training. First, simulations are performed to validate the piston sensing performance of the trained DCNN. The results show that the average RMS errors on the two-aperture system and the four-aperture system reach 9 nm and 17 nm, respectively. In addition, several simulations are performed to further evaluate the generalization ability of this DCNN. The results indicate that changing the focal length causes no obvious decrease in piston sensing accuracy, whereas the accuracy declines sharply when the size of the pupil is changed.
An experimental platform is then set up to generate real images. The piston sensing accuracy on the two-aperture system with real images is similar to that obtained in simulation, which further demonstrates the effectiveness of piston sensing with real images. More surprisingly, the experiments on a practical system show that 500 intensity images are enough to train a decent network, which means that the training process can be greatly simplified. In theory, the more samples there are in the training set, the more precise the predicted outcome is. In practical applications, the training set can be expanded as needed to achieve the accuracy that the application requires.
The proposed method can directly learn the mapping between the broadband intensity images and the corresponding pistons, avoiding the tendency to become trapped in local minima and achieving greater robustness during training. Compared with existing piston sensing approaches based on neural networks, our DCNN-based method removes the 2π ambiguity and achieves high sensing accuracy while requiring only a single broadband in-focus image as input, which greatly relaxes the optical complexity. Moreover, the network can be trained directly on raw images without hand-designed feature extraction, making the training independent of prior knowledge and preprocessing effort, and thus realizing end-to-end sensing. In view of these advantages, we expect that piston sensing based on the DCNN proposed in this paper may find wide application in multi-aperture imaging.
State Key Laboratory of Pulsed Power Laser Technology (SKL2018KF05); Excellent Youth Foundation of Sichuan Scientific Committee (2019JDJQ0012); Youth Innovation Promotion Association, CAS (2018411); CAS “Light of West China” Program; Young Talents of Sichuan Thousand People Program.
2. G. Chanan, M. Troy, F. Dekens, S. Michaels, J. Nelson, T. Mast, and D. Kirkman, “Phasing the mirror segments of the Keck telescopes: the broadband phasing algorithm,” Appl. Opt. 37(1), 140–155 (1998).
6. F. Shi, D. C. Redding, A. E. Lowman, C. W. Bowers, L. A. Burns, P. Petrone III, C. M. Ohara, and S. A. Basinger, “Segmented mirror coarse phasing with a dispersed fringe sensor: experiment on NGST’s wavefront control testbed,” Proc. SPIE 4850, 318–328 (2003).
7. D. Mourard, W. Dali Ali, A. Meilland, N. Tarmoul, F. Patru, J. M. Clausse, P. Girard, F. Henault, A. Marcotto, and N. Mauclert, “Group and phase delay sensing for cophasing large optical arrays,” Mon. Not. R. Astron. Soc. 445(2), 2082–2092 (2014).
8. R. L. Kendrick, J.-N. Aubrun, R. Bell, R. Benson, L. Benson, D. Brace, J. Breakwell, L. Burriesci, E. Byler, J. Camp, G. Cross, P. Cuneo, P. Dean, R. Digumerthi, A. Duncan, J. Farley, A. Green, H. H. Hamilton, B. Herman, K. Lauraitis, E. de Leon, K. Lorell, R. Martin, K. Matosian, T. Muench, M. Ni, A. Palmer, D. Roseman, S. Russell, P. Schweiger, R. Sigler, J. Smith, R. Stone, D. Stubbs, G. Swietek, J. Thatcher, C. Tischhauser, H. Wong, V. Zarifis, K. Gleichman, and R. Paxman, “Wide-field Fizeau imaging telescope: experimental results,” Appl. Opt. 45(18), 4235–4240 (2006).
9. R. G. Paxman and J. R. Fienup, “Optical misalignment sensing and image reconstruction using phase diversity,” J. Opt. Soc. Am. A 5(6), 914–923 (1988).
10. J. R. P. Angel, P. Wizinowich, M. Lloyd-Hart, and D. Sandler, “Adaptive optics for array telescopes using neural-network techniques,” Nature 348(6298), 221–224 (1990).
11. P. L. Wizinowich, M. Lloydhart, B. A. Mcleod, D. Colucci, R. G. Dekany, D. M. Wittman, J. R. P. Angel, D. W. McCarthy, W. G. Hulburd, and D. G. Sandler, “Neural network adaptive optics for the multiple-mirror telescope,” Proc. SPIE 1542, (1991).
13. H. Yi, Y. Li, C. Fan, and J. Wang, “A New Method of Phase Diversity Wave-front Sensing Based on SOFM NN,” Guangzi Xuebao 7(5), 352–354 (2008).
14. G. E. Hinton, “Deep belief networks,” Scholarpedia 4(6), 5947 (2006).
16. R. K. Olga, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Khosla, K. Aditya, M. Bernstein, A. C. Berg, and F. Li, “ImageNet Large Scale Visual Recognition Challenge,” Int. J. Comput. Vis. 115(3), 211–252 (2014).
19. D. Guerra-Ramos, L. Díaz-García, J. Trujillo-Sevilla, and J. M. Rodríguez-Ramos, “Piston alignment of segmented optical mirrors via convolutional neural networks,” Opt. Lett. 43(17), 4264–4267 (2018).
20. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” Int. Conf. on Learn. Represent. (ICLR) (2015).
21. S.-J. Chung, D. W. Miller, and O. L. de Weck, “ARGOS testbed: study of multidisciplinary challenges of future spaceborne interferometric arrays,” Opt. Eng. 43(9), 2156–2167 (2004).