Deep neural networks (DNNs) are used to classify and reconstruct input images from the intensity of the speckle patterns that result after the inputs propagate through a multimode fiber (MMF). We demonstrate this result for fibers up to 1 km long by training the DNNs with a database of 16,000 handwritten digits. Better recognition accuracy was obtained when the DNNs were trained to first reconstruct the input and then classify based on the recovered image. We observed remarkable robustness against environmental instabilities and tolerance to deviations of the input pattern from the patterns with which the DNN was originally trained.
© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
Optical fibers are prominently used in telecommunications and in endoscopy for medical diagnosis. The need to increase information throughput has encouraged the consideration of transmitting information through parallel channels in multimode fibers (MMFs). In telecommunications, this translates to launching multiple channels with different spatial modes. For endoscopic applications, the spatial modes of the MMF are used to carry the information of the different pixels in the image. However, when a pattern is projected on the proximal side of a MMF, the image obtained on the distal side is a speckle pattern, since the input couples into multiple fiber modes that travel with different propagation constants along the fiber length. Additionally, local defects along the fiber induce mode coupling, which further randomizes the propagation of the input field. Therefore, the phase between local image features decorrelates after only a few millimeters of propagation in a MMF, resulting in the formation of a speckle pattern.
The transmission of images through a MMF was first presented in 1967 by Spitz and Wertz [3], who demonstrated experimentally that the distortion introduced by modal dispersion can be undone by phase conjugation. In the intervening years, numerous publications have described methods for transmitting image information through MMFs or scattering media [4–15]. These methods rely on coherent (holographic) recording of the speckle pattern detected at the distal end of the fiber, and they use phase conjugation or the transmission matrix method to compensate for the effects of modal dispersion and either focus the light at the distal end or project focused images. Iterative optimization algorithms can also adjust the phase of the input field to a MMF in order to obtain a desired output [16]. However, such calibration-based techniques face practical difficulties when the system is perturbed, since the speckle pattern then changes in time even for a constant input. Because this can prevent image recovery, especially for very long fibers, there has always been a demand for more robust ways to transmit information through long fibers.
There have been two previous reports on the use of artificial neural networks (ANNs) for recovering images transmitted through MMFs [17,18]. In these early demonstrations, two-layer networks were trained and were able to recognize a few images after a 10 m long step-index fiber. Furthermore, learning techniques have been employed in a variety of ways in optical systems. Examples include training a reconfigurable optical system [19–23] and training a digital computer to interpret or control the operation of a fixed optical system [24–26]. The work presented in this paper belongs to the second category. We present a novel application of deep neural networks (DNNs) in the field of optics for information transmission through MMFs that could have a large impact in various fields. We collect a large number of intensity speckle patterns produced by launching images through a MMF and use these examples to train a DNN to interpret the input to the fiber. Specifically, we demonstrate the use of modern DNN architectures [25,26] with up to 14 hidden layers. A database of 20 k handwritten digits was used to train and assess these networks; this database was randomly split into a 16 k training set, a 2 k validation set, and a 2 k testing set. Recognition or reproduction of an image launched at the proximal end of the fiber was achieved by detecting only the light intensity at the distal end facet. We then show that the classification performance of the DNN is greatly enhanced if we first use a network to reconstruct the input to the fiber, followed by a separate DNN that is trained to recognize the reconstructed input images.
We believe that the work presented here will be a cornerstone for research concerning MMFs. Imaging through MMFs has attracted great interest in recent years using wavefront shaping and linear techniques such as phase conjugation or the transmission matrix. Fiber length and environmental instabilities have always been two main factors limiting the efficient throughput of information through MMFs. In the present paper, we show that combining optics with modern DNNs could open a path toward overcoming these limitations and set the stage for new applications in telecommunications and medical endoscopic technologies.
In the following section, we describe the experimental apparatus used to collect the database with which we trained the DNN to recognize the digits presented at the input. The same dataset was used to train a different DNN to reconstruct the input digits given the speckle pattern measured at the distal end. The performance was evaluated for different fiber lengths up to a maximum of 1 km. We conclude with a discussion about the relative merits of using a combination of intensity detection with a DNN to interpret the measured data versus coherent (holographic) recording and linear inverse scattering methods to retrieve the input to the fiber.
2. MATERIALS AND METHODS
A. Experimental Setup
The optical system used to collect the data is shown in Fig. 1. The beam of a 560 nm wavelength diode laser is used to illuminate a graded-index (GRIN) multimode fiber with a 62.5 μm core diameter and a numerical aperture (NA) of 0.275 (GIF625, Thorlabs). The fiber supports approximately 4500 spatial modes at this wavelength. The input patterns are displayed on a spatial light modulator (SLM, Pluto-Vis, Holoeye), and the SLM plane is imaged onto the proximal facet of the MMF by an imaging system. Another imaging system is placed at the distal end of the fiber to image the speckle pattern emerging from the distal facet onto a charge-coupled device (CCD) camera (Chameleon 3, Mono, Point Grey). An additional camera is used on the proximal side to monitor the images reflected by the SLM. A half-wave plate and a linear polarizer are placed before and after the SLM, respectively (see Fig. 1), in order to test both phase and amplitude patterns as inputs to the GRIN fiber.
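As a rough sanity check on the quoted mode count, the number of guided modes can be estimated from the fiber's V-number; for a parabolic graded-index profile the mode count per polarization is approximately V²/4 (a back-of-the-envelope estimate, not the method used in the paper):

```python
import math

# Fiber parameters quoted in the text (GIF625, Thorlabs)
core_radius_um = 62.5 / 2      # 62.5 um core diameter
na = 0.275                     # numerical aperture
wavelength_um = 0.560          # 560 nm diode laser

# Normalized frequency (V-number)
v = 2 * math.pi * core_radius_um * na / wavelength_um

# Graded-index mode count: ~V^2/4 per polarization, doubled for
# the two polarization states
n_modes = 2 * (v ** 2) / 4
print(round(v, 1), round(n_modes))   # 96.4 4649, consistent with ~4500 modes
```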
In our experiments, the patterns generated by the SLM were handwritten digits from the MNIST database. Before being processed by the DNN, each image recorded by CCD1 or CCD2 is cropped to a window centered on the digit or the speckle pattern, respectively. The cropped images of the speckle patterns recorded by CCD2 were then downsampled and used as input for the DNNs. Examples of the projected digits at the proximal fiber facet are shown in Fig. 2, where the digits zero and four are shown for both amplitude [Figs. 2(c) and 2(d)] and phase modulation [Figs. 2(e) and 2(f)], along with the corresponding speckle patterns captured at the distal end of a 2 cm long GRIN fiber. The speckle patterns [Figs. 2(g) and 2(h)] look similar to one another because their appearance is dominated by the DC component of the light from the SLM. However, when we subtract the intensity patterns [Figs. 2(d) and 2(h)] corresponding to the two digits, a significant difference is revealed [Fig. 2(i)], which can be picked up by the DNN to distinguish the two inputs. The results presented in the remainder of the paper are obtained by adjusting the SLM so that the patterns entering the fiber are phase-only or amplitude-modulated images of the digits.
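The crop-and-downsample preprocessing can be sketched as follows; the frame size, window size, and target resolution below are placeholders, since the exact values are not reproduced here:

```python
import numpy as np

def crop_center(img, size):
    """Crop a square window of side `size` centered on the image."""
    h, w = img.shape
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]

def downsample(img, factor):
    """Block-average downsampling by an integer factor."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# Example: a placeholder 512x512 speckle frame reduced to a 64x64 network input
speckle = np.random.rand(512, 512)
cropped = crop_center(speckle, 256)   # window centered on the speckle
net_input = downsample(cropped, 4)    # 256 / 4 = 64
print(net_input.shape)                # (64, 64)
```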
B. Data Processing
A “Visual Geometry Group (VGG)” type convolutional neural network (CNN), as developed by Simonyan and Zisserman [27], was used to classify the distal speckle images and the reconstructed SLM input images [Fig. 3(a)]. These networks consist of a convolutional front end with downsampling for encoding and a fully connected back end for classification; see Fig. 3(a) for details. The use of such a deep CNN with very small filter kernels has been shown to provide high image classification accuracy.
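The benefit of stacking very small kernels can be made concrete: n stacked 3 × 3 convolutions see a (2n + 1) × (2n + 1) receptive field while using fewer weights than a single large kernel. A quick check:

```python
def receptive_field(n_layers, kernel=3):
    """Receptive field of n stacked stride-1 conv layers."""
    rf = 1
    for _ in range(n_layers):
        rf += kernel - 1
    return rf

# Three stacked 3x3 convolutions cover the same 7x7 area as one 7x7 kernel...
assert receptive_field(3) == 7

# ...but with fewer weights per input/output channel pair: 3*(3*3) vs 7*7
params_stacked = 3 * 3 * 3   # 27
params_single = 7 * 7        # 49
print(receptive_field(3), params_stacked, params_single)  # 7 27 49
```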
A “U-net” type CNN with 14 hidden layers, as developed by Ronneberger et al. [28], was used to reconstruct the SLM input image from the recorded distal speckle intensity pattern [Fig. 3(b)]. This nearly symmetric network architecture comprises a convolutional encoding front end with downsampling to capture context and a deconvolutional decoding back end with upsampling for localization; see Fig. 3(b) for details. Skip connections concatenate feature maps produced in the contracting path with same-size feature maps in the expanding path, thus improving localization.
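The skip-connection idea can be sketched with plain arrays: decoder features are upsampled and concatenated channel-wise with the same-size encoder features. This is a minimal illustration of the mechanism, not the actual network; the feature-map sizes are toy values:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def skip_connect(decoder_feat, encoder_feat):
    """Upsample decoder features and concatenate the matching encoder
    features along the channel axis, as in a U-net expanding path."""
    up = upsample2x(decoder_feat)
    assert up.shape[:2] == encoder_feat.shape[:2]
    return np.concatenate([up, encoder_feat], axis=-1)

# Toy feature maps: 8x8x64 from deeper in the network, 16x16x32 from the encoder
decoder = np.random.rand(8, 8, 64)
encoder = np.random.rand(16, 16, 32)
merged = skip_connect(decoder, encoder)
print(merged.shape)  # (16, 16, 96)
```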
For training both networks, the 20 k distal speckle pattern images were randomly split into 16 k training, 2 k validation, and 2 k testing sets. The training sets were processed in batches of 50 and 500 images for the reconstruction and classification networks, respectively, with batch shuffling to minimize over-fitting. An Adam optimizer was used to minimize a mean square error cost function. The networks were trained for a maximum of 50 epochs. For each case, training was carried out 10 times to provide statistics for the training accuracies. The DNNs were implemented using the TensorFlow 1.5 Python library on a single NVIDIA GeForce GTX 1080Ti graphics processing unit.
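The random split and per-epoch batch shuffling described above can be sketched as follows (the split sizes are from the text; the batching helper is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

n_total = 20_000
indices = rng.permutation(n_total)      # random split of the dataset
train_idx = indices[:16_000]
val_idx = indices[16_000:18_000]
test_idx = indices[18_000:]

def batches(idx, batch_size, rng):
    """Yield freshly shuffled mini-batches each epoch to reduce over-fitting."""
    shuffled = rng.permutation(idx)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]

# e.g. 50-image batches for the reconstruction network
n_batches = sum(1 for _ in batches(train_idx, 50, rng))
print(len(train_idx), len(val_idx), len(test_idx), n_batches)  # 16000 2000 2000 320
```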
3. RESULTS

A. Input Reconstruction
In a first step, the ability of our DNN to reconstruct the input digits from the distal speckle intensity patterns was tested. In Fig. 4, we present the reconstruction results for the 0.1 m, 10 m, and 1 km fiber lengths with amplitude-modulated inputs at the proximal facet of the GRIN fiber.
Although they appear random, the speckle patterns contain information about the propagation of the input field through the fiber. Indeed, the results confirm that recovery of the input is possible from an intensity-only image of the distal speckle pattern using the U-net CNN. Based on the reconstructed images obtained from our experiments (Fig. 4), the fidelity of the reconstruction decreases from 97.6% for a 0.1 m fiber to 90.0% for a 1 km fiber.
In the case of the 1 km long GRIN fiber, the speckle pattern at the distal end was unstable. Local temperature nonuniformities in the fiber induce changes in the optical path due to both thermal expansion of the material and the change of its refractive index. Thermal convection or mechanical vibrations around the fiber can cause the distal speckle pattern to drift in time, creating extra “noise” in the acquired speckle patterns (Supplement 1, see Visualization 1). Further care to thermally isolate the fiber and maintain an isothermal environment might therefore increase performance. Fidelity was measured as the percent mean square error of the reconstruction compared to the input. The high fidelity of the reconstructed SLM input images also shows that this technique effectively denoises the system by removing artifacts associated with the optical setup. For example, the network recovers the SLM input image shown in Fig. 2(a) from the distal speckle intensity pattern shown in Fig. 2(g), while eliminating artifacts projected onto the proximal facet of the fiber, as shown in Figs. 2(c) and 2(e), as well as artifacts due to flaws, dirt, or even misalignments on the proximal facet of the fiber.
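A fidelity metric based on the percent mean square error can be written explicitly; the normalization to [0, 1] full scale below is an assumption, since the exact convention is not spelled out here:

```python
import numpy as np

def fidelity_percent(reconstruction, target):
    """Fidelity as 100% minus the mean square error, with images assumed
    normalized to [0, 1] so the MSE is a fraction of full scale.
    The exact normalization used in the paper is an assumption here."""
    reconstruction = np.clip(reconstruction, 0.0, 1.0)
    target = np.clip(target, 0.0, 1.0)
    mse = np.mean((reconstruction - target) ** 2)
    return 100.0 * (1.0 - mse)

target = np.zeros((28, 28))
target[10:18, 10:18] = 1.0            # toy "digit"
perfect = target.copy()
biased = np.clip(target + 0.1, 0, 1)  # uniformly biased reconstruction

print(fidelity_percent(perfect, target))         # 100.0
print(fidelity_percent(biased, target) < 100.0)  # True
```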
B. Input Classification
Results for the classification of the distal speckle intensity patterns are presented in Table 1 and as solid lines in Fig. 5. They show that the classification accuracy, defined as the percentage of correctly recognized digits, decreases with increasing fiber length for both amplitude- and phase-modulated proximal inputs, roughly from 90% for a 2 cm fiber to 30% for a 1 km fiber. The accuracy is calculated based on the label (class) of the inputs on the SLM [Figs. 2(a) and 2(b)]. This compares with a classification accuracy of 98.4% for the original SLM input digit images. The decrease can be attributed to increased scattering losses, mode coupling, and drifting of the distal speckle pattern with increasing fiber length.
In the movie included in the online version as Visualization 1, one can observe changes in the distal speckle pattern over a 5 s period. This movie was recorded at a frame rate of 83 fps for the 1 km long GRIN fiber with a constant blank image as the SLM input. Two frames taken 2 s apart from this video are shown in Fig. 6 along with the difference between the two intensity patterns. Clearly, the speckle pattern has changed substantially: although the proximal input does not change, the speckle intensity at the distal end of the fiber changes rapidly with time. This can be attributed to fluctuations of the ambient temperature or airflow over the optical setup, which induce slight perturbations of the GRIN fiber that become significant over its 1 km length. Changes in the distal output caused by the projection of different digits while the training dataset is acquired can therefore be buried in the “noise” caused by the drifting of the speckle pattern. In order to test the effect of the drifting distal speckle patterns on the classification accuracy, the VGG network was trained on the first 10,000 images of an acquired dataset and tested on images from the second half (recorded several hours later), and vice versa. The results showed no significant change in the classification accuracy. This suggests that the fluctuations seen in the video are not entirely random and that the neural network has learned them.
The results also show that, for fibers up to 10 m in length, phase-modulated input provides slightly better classification accuracies, probably due to the more uniform distribution of the injected light across the fiber modes. For amplitude-modulated inputs, there is a selective spatial excitation of the fiber modes at the input facet, which may limit the number of modes that actually participate in transporting the information. On the other hand, for the 1 km fiber, the amplitude-modulated proximal input image provides better classification accuracies. This can be attributed to the fact that as light propagates for long distances, the information is distributed to all the modes.
In order to improve the classification accuracies, the neural network was also trained with the reconstructed SLM input images. As shown in Table 1 and by the normalized confusion matrices in Fig. 7, this provided a significant increase in classification accuracy. For the 1 km case, there is a general confusion between the digits 4 and 9, and among the digits 3, 5, 6, and 8. The similarities between these classes are also evident in the reconstructed SLM input images for the 1 km fiber shown in Fig. 4.
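Row-normalized confusion matrices such as those in Fig. 7 can be computed as follows; the toy labels mimic the 4 ↔ 9 confusion seen for the 1 km fiber:

```python
import numpy as np

def normalized_confusion(true_labels, pred_labels, n_classes=10):
    """Row-normalized confusion matrix: entry (i, j) is the fraction of
    class-i digits that were classified as class j."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, pred_labels):
        cm[t, p] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    return cm / np.where(row_sums == 0, 1, row_sums)

true_l = [4, 4, 4, 4, 9, 9, 9, 9]
pred_l = [4, 4, 9, 4, 9, 4, 9, 9]
cm = normalized_confusion(true_l, pred_l)
print(cm[4, 4], cm[4, 9])  # 0.75 0.25
```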
The main result of this paper is that DNNs can efficiently reconstruct and recognize the inputs to a MMF from intensity-only measurements of its corresponding output. The measured classification accuracy was excellent for the 10 m fiber (96.8%) and reduced to 69.9% for the 1 km fiber. In all cases, the classification performance improved when we first used the U-net DNN to reconstruct the input image, followed by a second DNN (VGG) trained to classify the reconstructed images. We attribute this to the fact that for the longer fiber the mapping from input to output becomes a random mapping [29], and objects that are similar to one another at the input are dispersed in the intensity measurement at the distal end. Therefore, the ability of the classifier DNN to generalize (recognize objects it has not seen before) diminishes for longer fibers. When we first recover the input images with a U-net DNN, the random mapping is partially inverted and the classification network can recognize objects of the same class that it has not seen before. This behavior is evidenced in Fig. 8, where the classification performance of the VGG network trained on the intensity of the raw speckle patterns is plotted as a function of iteration number during the learning process for the 10 m [Fig. 8(a)] and the 1 km [Fig. 8(c)] fibers. For the 10 m fiber, the classification accuracy is the same for the training and validation sets. On the contrary, for the 1 km fiber at steady state, the training set is memorized well, but the validation set is classified accurately only 29.3% of the time. This discrepancy between training and validation recognition rates indicates that the network is not able to generalize well. Also shown in Fig. 8 are the learning curves obtained when training the VGG network with the images reconstructed by the U-nets; in this case, the recognition rate is the same for the training and validation sets.
In general, we can improve the recognition rate on the validation and test sets while decreasing the performance on the training set by reducing the number of weights in the network and/or increasing the size of the training set.
The recognition or reconstruction of the field at the proximal end of a MMF from complex field measurements at the distal end can be considered as an alternative to the DNN-based inversion methods we describe in this paper. The simplicity of intensity-only detection is a clear advantage in practice. At the same time, linear inversion methods (e.g., the transmission matrix) learn the fiber, not the inputs; in other words, any input can be recognized or reproduced. DNNs, on the other hand, are trained on a class of objects and rely on statistical averaging within that class. In principle, the performance of the transmission matrix method should be independent of fiber length. However, as the fiber length is increased, additional background noise accumulates at the output because of scattering in the core of the fiber and at the core–cladding interface. In addition, the temperature and mechanical instabilities that contaminate the measured data are to some extent learned by the DNN (Supplement 1, see Visualization 1), whereas they directly degrade the reconstructions of coherent methods. Finally, the neural network can be directly trained to reproduce or recognize the versions of the input images as they are stored in the computer. Any nonlinearities, aberrations, speckle, pixelation, phase wrapping, or other distortions introduced before the light enters the input facet of the MMF [i.e., Fig. 2(a) versus Fig. 2(b)] are conveniently accounted for.
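The contrast with linear inversion can be illustrated with a toy complex transmission matrix: given the full complex field at the distal end, a pseudo-inverse recovers any input, whereas intensity-only detection discards the phase and defeats the same linear inversion. This is a minimal simulation under stated toy dimensions, not the experimental calibration procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

n_in, n_out = 32, 64   # toy mode counts
# Random complex transmission matrix standing in for the fiber
T = (rng.standard_normal((n_out, n_in))
     + 1j * rng.standard_normal((n_out, n_in))) / np.sqrt(2 * n_in)

x = rng.standard_normal(n_in) + 1j * rng.standard_normal(n_in)  # arbitrary input field
y = T @ x                                                       # distal field

# Coherent detection: linear inversion recovers ANY input, not just a trained class
x_hat = np.linalg.pinv(T) @ y
assert np.allclose(x_hat, x)

# Intensity-only detection: the phase of y is lost, so the same inversion fails
intensity = np.abs(y) ** 2
x_bad = np.linalg.pinv(T) @ np.sqrt(intensity)   # phase discarded
print(np.allclose(x_bad, x))  # False (phase information was lost)
```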
These results can provide solutions to various fiber applications, such as unscrambling of the propagated field in spatial multiplexing for telecommunication or endoscopy. Concerning the time frame for the DNNs to classify and reconstruct an image for in vivo applications, it is important to point out that the time-consuming part of the process is the DNN training stage. Once the DNN weights have been optimized, it effectively becomes a look-up table and can be rapidly applied to any input in milliseconds, depending on the available computational hardware.
We have shown that DNNs can recognize and reconstruct distorted images at the output of a MMF. The DNNs can directly recover the phase of the input image; this is a doubly nonlinear mapping due to the square-law detection at the output and the exponential dependence of the field on the input phase. An interesting subject to explore in the future is whether DNNs can recognize inputs when the intensity of the light source is increased to the point where nonlinear effects significantly impact the output intensity patterns. A related question is whether neural network controllers [30–32] can be used to control nonlinear light propagation in MMFs [31,32].
CEPF SFA, CERAMIC X.0: High-precision micro-manufacturing of ceramics; Bertarelli Program in Translational Neuroscience and Neuroengineering (10271).
See Supplement 1 for supporting content.
1. D. J. Richardson, J. M. Fini, and L. E. Nelson, “Space-division multiplexing in optical fibres,” Nat. Photonics 7, 354–362 (2013). [CrossRef]
2. B. A. Flusberg, E. D. Cocker, W. Piyawattanametha, J. C. Jung, E. L. M. Cheung, and M. J. Schnitzer, “Fiber-optic fluorescence imaging,” Nat. Methods 2, 941–950 (2005). [CrossRef]
3. E. Spitz and A. Wertz, “Transmission des images à travers une fibre optique,” Comptes Rendus Hebdomadaires Des Seances De L Academie Des Sciences Serie B 264, 1015 (1967).
4. A. Gover, C. P. Lee, and A. Yariv, “Direct transmission of pictorial information in multimode optical fibers,” J. Opt. Soc. Am. 66, 306–311 (1976). [CrossRef]
5. Y. Choi, C. Yoon, M. Kim, T. D. Yang, C. Fang-Yen, R. R. Dasari, K. J. Lee, and W. Choi, “Scanner-free and wide-field endoscopic imaging by using a single multimode optical fiber,” Phys. Rev. Lett. 109, 203901 (2012). [CrossRef]
6. A. M. Caravaca-Aguirre, E. Niv, D. B. Conkey, and R. Piestun, “Real-time resilient focusing through a bending multimode fiber,” Opt. Express 21, 12881–12887 (2013). [CrossRef]
7. D. B. Conkey, N. Stasio, E. E. Morales-Delgado, M. Romito, C. Moser, and D. Psaltis, “Lensless two-photon imaging through a multicore fiber with coherence-gated digital phase conjugation,” J. Biomed. Opt. 21, 045002 (2016). [CrossRef]
8. D. B. Conkey, E. Kakkava, T. Lanvin, D. Loterie, N. Stasio, E. Morales-Delgado, C. Moser, and D. Psaltis, “High power, ultrashort pulse control through a multi-core fiber for ablation,” Opt. Express 25, 11491–11502 (2017). [CrossRef]
9. I. N. Papadopoulos, S. Farahi, C. Moser, and D. Psaltis, “High-resolution, lensless endoscope based on digital scanning through a multimode optical fiber,” Biomed. Opt. Express 4, 260–270 (2013). [CrossRef]
10. T. Čižmár and K. Dholakia, “Exploiting multimode waveguides for pure fibre-based imaging,” Nat. Commun. 3, 1027 (2012). [CrossRef]
11. E. R. Andresen, G. Bouwmans, S. Monneret, and H. Rigneault, “Toward endoscopes with no distal optics: video-rate scanning microscopy through a fiber bundle,” Opt. Lett. 38, 609–611 (2013). [CrossRef]
12. S. M. Popoff, G. Lerosey, M. Fink, A. C. Boccara, and S. Gigan, “Controlling light through optical disordered media: transmission matrix approach,” New J. Phys. 13, 123021 (2011). [CrossRef]
13. R. N. Mahalati, D. Askarov, J. P. Wilde, and J. M. Kahn, “Adaptive control of input field to achieve desired output intensity profile in multimode fiber with random mode coupling,” Opt. Express 20, 14321–14337 (2012). [CrossRef]
14. K. Aoki, A. Okamoto, Y. Wakayama, A. Tomita, and S. Honma, “Selective multimode excitation using volume holographic mode multiplexer,” Opt. Lett. 38, 769–771 (2013). [CrossRef]
15. R. Di Leonardo and S. Bianchi, “Hologram transmission through multi-mode optical fibers,” Opt. Express 19, 247–254 (2011). [CrossRef]
16. I. M. Vellekoop, “Feedback-based wavefront shaping,” Opt. Express 23, 12189–12206 (2015). [CrossRef]
17. S. Aisawa, K. Noguchi, and T. Matsumoto, “Remote image classification through multimode optical fiber using a neural network,” Opt. Lett. 16, 645–647 (1991). [CrossRef]
18. R. K. Marusarz and M. R. Sayeh, “Neural network-based multimode fiber-optic information transmission,” Appl. Opt. 40, 219–227 (2001). [CrossRef]
19. D. Psaltis, D. Brady, and K. Wagner, “Adaptive optical networks using photorefractive crystals,” Appl. Opt. 27, 1752–1759 (1988). [CrossRef]
20. Y. Owechko, G. J. Dunning, E. Marom, and B. H. Soffer, “Holographic associative memory with nonlinearities in the correlation domain,” Appl. Opt. 26, 1900–1910 (1987). [CrossRef]
21. G. Van der Sande, D. Brunner, and M. C. Soriano, “Advances in photonic reservoir computing,” Nanophotonics 6, 561–576 (2017). [CrossRef]
22. A. N. Tait, T. F. de Lima, E. Zhou, A. X. Wu, M. A. Nahmias, B. J. Shastri, and P. R. Prucnal, “Neuromorphic photonic networks using silicon photonic weight banks,” Sci. Rep. 7, 7430 (2017). [CrossRef]
23. Y. Shen, N. C. Harris, S. Skirlo, M. Prabhu, T. Baehr-Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljačić, “Deep learning with coherent nanophotonic circuits,” Nat. Photonics 11, 441–446 (2017). [CrossRef]
24. U. S. Kamilov, I. N. Papadopoulos, M. H. Shoreh, A. Goy, C. Vonesch, M. Unser, and D. Psaltis, “Learning approach to optical tomography,” Optica 2, 517–522 (2015). [CrossRef]
25. A. Sinha, J. Lee, S. Li, and G. Barbastathis, “Lensless computational imaging through deep learning,” Optica 4, 1117–1125 (2017). [CrossRef]
26. Y. Rivenson, Z. Göröcs, H. Günaydin, Y. Zhang, H. Wang, and A. Ozcan, “Deep learning microscopy,” Optica 4, 1437–1443 (2017). [CrossRef]
27. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in International Conference on Learning Representations (ICLR) (2015).
28. O. Ronneberger, P. Fischer, and T. Brox, “U-net: convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention– MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, eds. (Springer, 2015), Vol. 9351, pp. 234–241.
29. A. Saade, F. Caltagirone, I. Carron, L. Daudet, A. Drémeau, S. Gigan, and F. Krzakala, “Random projections through multiple optical scattering: approximating Kernels at the speed of light,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2016), pp. 6215–6219.
30. D. Psaltis, A. Sideris, and A. A. Yamamura, “A multilayered neural network controller,” IEEE Control Syst. Mag. 8(2), 17–21 (1988). [CrossRef]
31. R. Guenard, K. Krupa, R. Dupiol, M. Fabert, A. Bendahmane, V. Kermene, A. Desfarges-Berthelemot, J. L. Auguste, A. Tonello, A. Barthélémy, G. Millot, S. Wabnitz, and V. Couderc, “Nonlinear beam self-cleaning in a coupled cavity composite laser based on multimode fiber,” Opt. Express 25, 22219–22227 (2017). [CrossRef]
32. Z. Liu, L. G. Wright, D. N. Christodoulides, and F. W. Wise, “Kerr self-cleaning of femtosecond-pulsed beams in graded-index multimode fiber,” Opt. Lett. 41, 3675–3678 (2016). [CrossRef]