
Optics-free imaging of complex, non-sparse and color QR-codes with deep neural networks

Open Access

Abstract

We demonstrate optics-free imaging of complex color and monochrome QR codes using a bare image sensor and trained artificial neural networks (ANNs). The ANN is trained to interpret the raw sensor data for human visualization. The image sensor is placed at a specified gap (1 mm, 5 mm, and 10 mm) from the QR code. We studied the robustness of our approach by experimentally testing the ANN outputs under perturbations of this gap and of the translational and rotational alignment of the QR code relative to the image sensor. Our demonstration opens up the possibility of using completely optics-free, non-anthropocentric cameras for application-specific imaging of complex, non-sparse objects.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

Lensless cameras [1] have been studied extensively because they enable “seeing” around corners, either with natural or “accidental” cameras [2] or with active illumination [3], and more generally because they promise smaller, simpler, and cheaper cameras [4]. Phase-only diffractive masks with no lenses have been used for 3D imaging [5] and for absorption-free color and multi-spectral imaging [6]. Lensless imaging can also be applied to deep-brain microscopy [7,8]. Imaging of simple objects via transport of incoherent signals over multi-mode fibers and machine learning has also been demonstrated [9]. By replacing all optics with a transparent window, we have shown “see-through” computational cameras [10]. Although there are many other examples of lensless imaging with coherent light, here we are interested only in cameras that image incoherent light, for generality. Machine learning has also been widely applied to lensless imaging, primarily for image interpretation for human consumption [11] and less widely for image classification or inferencing directly from the raw (non-human) images [12].

Optics-free imaging has been explored in microscopy with both partially coherent and incoherent, self-emitting objects (fluorophores) [13]. However, these approaches are limited to either high temporal coherence (narrow spectral bands) or high spatial coherence, and as far as we are aware, have not been explored for imaging beyond microscopy. Furthermore, previous approaches typically required the object to be in close proximity (<1 mm) to the sensor. In this paper, we demonstrate imaging of objects with low temporal and low spatial coherence (a liquid-crystal display), with the object spaced 1 mm or more from the sensor. Previously, we demonstrated imaging of simple objects with an optics-free camera (using only the bare image sensor) [14]. Machine learning was used to classify these simple objects without reconstructing images for human consumption [15]. Such non-anthropocentric cameras promise enhanced privacy, among other advantages. However, it was not clear whether more complex objects could be imaged in the same manner. Compared to our previous work, where only images from the relatively simple MNIST dataset were considered, here we utilize arbitrary QR codes, which have much higher spatial frequencies (see discussion in the supplement). Furthermore, identification of QR codes is a very useful application in its own right. Rather than using regularization-based singular-value decomposition, here we utilize a deep artificial neural network (ANN) to perform the conversion from the machine (“raw sensor”) image to a human-readable form. We note that such conversion may be unnecessary in the future, when only machine inferencing is required [12].

Similar to our previous work [14,15], the experiment is performed with a bare image sensor (Mini-2MP-Plus, Arducam) placed at a distance z from a liquid-crystal display (LCD, Acer G276HL, 1920 × 1080). The QR code is displayed on the LCD as illustrated in Fig. 1(a). The QR code consists of 21 × 21 boxes, each of which can be either white (or colored, in the case of color codes) or black. There is a white 4-box-wide frame around the periphery of the code, resulting in a total size of 29 × 29 boxes. The physical size of the 21 × 21 boxes is 6 mm × 6 mm, corresponding to a box width of 286 μm. The sensor pixel width is 2.2 μm. Each QR code is created using Python (“qrcode” library) with a randomly generated 10-character string. The experiment was performed in a dark room with only the LCD turned on. The exposure time for each image was 66 ms.
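For concreteness, the generation of a single code can be sketched as follows with the same “qrcode” library; the payload alphabet and the library options shown are illustrative assumptions rather than the exact settings used to build the dataset.

```python
# Minimal sketch: generate one Version-1 QR code (21 x 21 modules plus a
# 4-module quiet zone, i.e. 29 x 29 boxes) from a random 10-character string.
# Payload alphabet and error-correction level are illustrative assumptions.
import random
import string

import numpy as np
import qrcode

def make_random_qr(seed=None):
    rng = random.Random(seed)
    payload = "".join(rng.choices(string.ascii_letters + string.digits, k=10))
    qr = qrcode.QRCode(version=1, border=4, box_size=1,
                       error_correction=qrcode.constants.ERROR_CORRECT_L)
    qr.add_data(payload)
    qr.make(fit=False)                    # keep Version 1 (21 x 21 modules)
    boxes = np.array(qr.get_matrix())     # 29 x 29 booleans, quiet zone included
    return payload, boxes.astype(np.uint8)

payload, boxes = make_random_qr(seed=0)
print(payload, boxes.shape)               # a 10-character string and (29, 29)
```

The resulting binary array is what is displayed on the LCD and also serves as the ground-truth label when computing the per-box accuracy described below.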


Fig. 1. Details of Experiments. (a) Photograph of setup. (b) Architecture of artificial-neural network. (c) Exemplary output image. The corresponding input image from the sensor is shown in (a).


Previously, we used the singular-value decomposition (SVD) method with regularization to invert a transfer function and obtain images for human consumption from the raw data [14]. Here, we train artificial neural networks (ANNs) to achieve the same result. The main advantage of ANNs is that they scale more efficiently to larger images and higher resolutions. Furthermore, their performance can be enhanced with additional data and with transfer learning. Finally, although not demonstrated here, ANNs could be adapted to perform inferencing directly from the raw data, bypassing the image-reconstruction step entirely.
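For comparison, the earlier regularized-SVD inversion can be sketched in a few lines; the transfer matrix A (relating scene pixels to sensor pixels) and the regularization weight below are placeholders for illustration, not the calibrated quantities of Ref. [14].

```python
# Hedged sketch of regularized (Tikhonov) inversion via the SVD: recover a
# scene estimate x from a raw sensor frame b given a measured transfer
# matrix A. Both A and alpha are illustrative placeholders.
import numpy as np

def svd_regularized_inverse(A, b, alpha=1e-2):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    filt = s / (s**2 + alpha**2)          # Tikhonov-filtered singular values
    return Vt.T @ (filt * (U.T @ b))

# Example (real use requires a calibrated transfer matrix A):
# A = np.random.rand(240 * 320, 29 * 29); b = A @ x_true + noise
# x_hat = svd_regularized_inverse(A, b)
```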

We utilized an ANN based on the encoder-decoder architecture shown in Fig. 1(b) [11]. A separate ANN was trained for each value of z=1 mm, 5 mm and 10 mm. Each ANN comprised 87 hidden layers and was trained for 100 passes (epochs) over a set of 30,000 training images. Subsequently, the trained network was validated with a set of 5,000 images, which were excluded from the training set. Each training batch contained 32 frames from the sensor. Each frame comprised 3 color channels, each of size 320 × 240 sensor pixels. The output of the ANN was a single frame of size 320 × 240 pixels. The frame sizes at each major layer of the network are illustrated in Fig. 1(b). We empirically determined that images at the full sensor resolution performed worse, which led us to use this reduced image size. Exploiting the fact that a QR code is a binary array, we formulated a classification problem in which the ANN predicts a 0/1 for each box in the 29 × 29 QR-code array. This allows us to use the average classification accuracy (labelled accuracy in Fig. 2) as the metric. We also augmented the dataset (only for the monochrome codes) by synthetically rotating the images by a random angle between −5 and 5 degrees about the central axis normal to the image, which improved performance. The training and validation accuracies, averaged over the 30,000 training and 5,000 validation images, are plotted as functions of epoch in Fig. 2(a) for z=1 mm, 5 mm and 10 mm, labelled ANN1, ANN5 and ANN10, respectively. The smallest value of z performs best. This is expected, since in any optics-free system, free-space propagation over longer distances tends to increase the mixing of the spatial details of the image. Exemplary images reconstructed by each ANN are shown in Fig. 2(b). Clearly, qualitatively good reconstructions of the QR code are obtained at z=1 mm. However, we note that the reconstruction is not of sufficient fidelity to be read by a conventional QR-code scanner.
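A minimal sketch of this kind of encoder-decoder, together with one plausible way to turn its 320 × 240 reconstruction into the per-box accuracy metric, is given below. The layer counts, filter widths, optimizer, and the box-averaging step are illustrative assumptions, not the exact 87-layer network of Fig. 1(b).

```python
# Minimal Keras sketch of an encoder-decoder of the kind described above:
# a raw 240 x 320 x 3 sensor frame in, a 240 x 320 reconstruction of the
# displayed QR code out. Layer counts, filter widths, and the optimizer are
# illustrative assumptions, not the authors' exact 87-layer network.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def build_encoder_decoder(input_shape=(240, 320, 3)):
    inp = layers.Input(shape=input_shape)
    x = inp
    # Encoder: repeated convolution + downsampling
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    # Decoder: upsample back to the input resolution
    for filters in (128, 64, 32):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.UpSampling2D(2)(x)
    out = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

def box_accuracy(recon, truth_boxes, grid=29):
    """One plausible reading of the per-box metric: average the reconstruction
    over each of the grid x grid box regions, threshold at 0.5, and compare
    with the ground-truth QR-code array."""
    h, w = recon.shape[0], recon.shape[1]
    ys = np.linspace(0, h, grid + 1).astype(int)
    xs = np.linspace(0, w, grid + 1).astype(int)
    boxes = np.array([[recon[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                       for j in range(grid)] for i in range(grid)])
    return ((boxes > 0.5) == (truth_boxes > 0.5)).mean()

model = build_encoder_decoder()
# model.fit(train_frames, train_targets, batch_size=32, epochs=100,
#           validation_data=(val_frames, val_targets))
```

The 32-frame batches and 100 epochs match the training schedule described above; the ±5-degree rotation augmentation would be applied to the training frames before fitting.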


Fig. 2. Results from the ANNs. (a) Accuracy for training and validation with z=1 mm, 5 mm and 10 mm. (b) Exemplary sensor, ground-truth and ANN-output images at z=1 mm and 5 mm. More images are included in the supplement. (c) Effect of defocus on accuracy. Accuracy is highest at the trained value of z (“focus”).


The sensitivity of our optics-free camera to perturbations is important for practical applications. To study this, we experimentally collected images while varying the gap (z) from 1 mm to 10 mm (100 images were captured at each gap), reconstructed them with each of the previously trained ANNs, and computed the average accuracy at each gap (Fig. 2(c)). A change in the gap is equivalent to defocus in a lensed camera. As expected, each ANN performs best at the gap at which it was trained (“focus”). The rate of degradation of accuracy with z appears to be similar for all 3 ANNs. We conclude from this study that a focus precision of ±0.5 mm should be sufficient to keep the accuracy within ∼10% of its peak value. We also explored the robustness of the ANNs to other perturbations, such as relative translation and rotation between the sensor and the object. The results, summarized in the supplement, suggest that the ANNs can be very robust to small perturbations of the system.
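This defocus study amounts to a simple evaluation loop, sketched below; the file names, the data layout, and the 0.5 mm step between capture gaps are hypothetical placeholders, and box_accuracy is the helper defined in the earlier sketch.

```python
# Sketch of the defocus sweep of Fig. 2(c): run each trained network on frames
# captured at every gap and average the per-box accuracy. File names, array
# layouts, and the 0.5 mm gap step are hypothetical.
import numpy as np
import tensorflow as tf

gaps_mm = np.arange(1.0, 10.5, 0.5)                         # capture gaps
anns = {z: tf.keras.models.load_model(f"ann_z{z}mm.h5") for z in (1, 5, 10)}

accuracy = {z: [] for z in anns}
for gap in gaps_mm:
    frames = np.load(f"frames_gap_{gap:.1f}mm.npy")          # (100, 240, 320, 3)
    truth = np.load(f"codes_gap_{gap:.1f}mm.npy")            # (100, 29, 29)
    for z, ann in anns.items():
        recon = ann.predict(frames, verbose=0)                # (100, 240, 320, 1)
        acc = np.mean([box_accuracy(r, t) for r, t in zip(recon, truth)])
        accuracy[z].append(acc)                               # curves of Fig. 2(c)
```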

We then collected images of color QR codes on a white background. The QR codes were equally divided among the 3 primary colors: red, green, and blue. The same network architecture was used as before, but the output was of size 29 × 29 × 3. The ANNs were trained on 30,000 images and validated on 5,000 images. The results are summarized in Fig. 3(a). Exemplary images for each of the 3 colors are shown for z=1 mm (Fig. 3(b)), 5 mm (Fig. 3(c)), and 10 mm (Fig. 3(d)).


Fig. 3. Imaging primary-colored QR codes. (a) Training and validation accuracy of the ANNs. Exemplary images of the 3 colors for z=1 mm (b), 5 mm (c), and 10 mm (d).


Next, we also trained a new network (ANN1C*) with QR codes of 6 colors (including 3 non-primary colors). A total of 60,000, 5,000, and 1,000 images were used for training, validation, and testing, respectively. The experiments were performed at z=1 mm, and an accuracy of 96% was achieved after 70 epochs (Fig. 4(a)). Exemplary images are shown in Fig. 4(b). To demonstrate the versatility of the approach, we also trained another network on a dataset acquired by displaying colored emojis (Emojipedia.org) on the LCD; 30,000, 5,500, and 1,110 images were used for training, validation, and testing, respectively, and an accuracy of 77% was achieved. All experiments in Fig. 4 were performed at z=1 mm. Note that while the QR codes contain binary values in each color channel, the emojis contain 8-bit values.


Fig. 4. Imaging non-primary-colored QR codes and emojis, all at z=1 mm. (a) Training and validation accuracy of ANN1C*. (b) Exemplary results from the 6 colors used. (c) Exemplary color-emoji results. Left to right: sensor image, ground truth, and ANN output.


Finally, we also explored the impact of sensor noise on the ANN reconstructions by synthetically adding Gaussian noise with zero mean and varying standard deviation to the raw frames, and then processing these with the trained ANN1 (monochrome QR codes at z=1 mm); a minimal sketch of this test is given below. The results, summarized in the supplement, show that our approach can attain a reconstruction accuracy of 90% even with noise of 10% standard deviation.

In conclusion, we demonstrated an optics-free camera consisting of a bare image sensor and a trained artificial neural network that converts the raw sensor frames into human-recognizable images. It is important to note that our approach does not rely on high temporal or spatial coherence, and can be used for fast imaging of objects spaced 1 mm or more from the sensor. Such optics-free cameras have the potential to enable ultra-thin, lightweight, and inexpensive application-specific non-anthropocentric imaging.
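The synthetic-noise test referenced above can be sketched as follows; the file names, the normalization of frames to [0, 1], and the interpretation of the standard deviation as a fraction of full scale are assumptions, and box_accuracy is again the helper from the earlier sketch.

```python
# Sketch of the sensor-noise study: add zero-mean Gaussian noise of varying
# standard deviation (expressed as a fraction of full scale -- an assumption)
# to the raw frames before decoding with the trained ANN1. File names are
# hypothetical; box_accuracy is the helper defined in the earlier sketch.
import numpy as np
import tensorflow as tf

ann1 = tf.keras.models.load_model("ann_z1mm.h5")
frames = np.load("test_frames_z1mm.npy") / 255.0     # (N, 240, 320, 3) in [0, 1]
truth = np.load("test_codes_z1mm.npy")               # (N, 29, 29)

rng = np.random.default_rng(0)
for sigma in (0.0, 0.05, 0.10, 0.20):
    noisy = np.clip(frames + rng.normal(0.0, sigma, frames.shape), 0.0, 1.0)
    recon = ann1.predict(noisy, verbose=0)
    acc = np.mean([box_accuracy(r, t) for r, t in zip(recon, truth)])
    print(f"sigma = {sigma:.2f}: mean box accuracy = {acc:.3f}")
```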

Funding

National Science Foundation (1533611); National Institutes of Health (10052061).

Acknowledgments

We would like to thank Z. Pan and R. Guo for fruitful discussion, and assistance with experiments and software.

Disclosures

RM: University of Utah (P).

See Supplement 1 for supporting content.

References

1. V. Boominathan, J. K. Adams, M. S. Asif, B. W. Avants, J. T. Robinson, R. G. Baraniuk, A. C. Sankaranarayanan, and A. Veeraraghavan, “Lensless imaging: A computational renaissance,” IEEE Signal Process. Mag. 33(5), 23–35 (2016). [CrossRef]  

2. A. Torralba and W. T. Freeman, “Accidental pinhole and pinspeck cameras: revealing the scene outside the picture,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2012).

3. A. Velten, T. Willwacher, O. Gupta, A. Veeraraghavan, M. G. Bawendi, and R. Raskar, “Recovering three-dimensional shape around a corner using ultrafast time-of-flight imaging,” Nat. Commun. 3, 745 (2012). [CrossRef]  

4. P. R. Gill and D. G. Stork, “Lensless ultra-miniature imagers using odd-symmetry spiral phase gratings,” in Computational Optical Sensing and Imaging (2013).

5. N. Antipa, G. Kuo, R. Heckel, B. Mildenhall, E. Bostan, R. Ng, and L. Waller, “DiffuserCam: lensless single-exposure 3D imaging,” Optica 5(1), 1–9 (2018). [CrossRef]  

6. P. Wang and R. Menon, “Ultra-high sensitivity color imaging via a transparent diffractive-filter array and computational optics,” Optica 2(11), 933–939 (2015). [CrossRef]  

7. G. Kim and R. Menon, “An ultra-small 3D computational microscope,” Appl. Phys. Lett. 105(6), 061114 (2014). [CrossRef]  

8. G. Kim, N. Nagarajan, E. Pastuzyn, K. Jenks, M. Capecchi, J. Sheperd, and R. Menon, “Deep-brain imaging via epi-fluorescence computational cannula microscopy,” Sci. Rep. 7(1), 44791 (2017). [CrossRef]  

9. N. Shabairou, E. Cohen, O. Wagner, D. Malka, and Z. Zalevsky, “Color image identification and reconstruction using artificial neural networks on multimode fiber images: Towards an all-optical design,” Opt. Lett. 43(22), 5603–5606 (2018). [CrossRef]  

10. G. Kim and R. Menon, “Computational imaging enables a “see-through” lensless camera,” Opt. Express 26(18), 22826–22836 (2018). [CrossRef]  

11. G. Barbastathis, A. Ozcan, and G. Situ, “On the use of deep learning for computational imaging,” Optica 6(8), 921–943 (2019). [CrossRef]  

12. Z. Pan, B. Rodriguez, and R. Menon, “Machine-learning enables Image Reconstruction and Classification in a “see-through” camera,” OSA Continuum 3(3), 401–409 (2020). [CrossRef]  

13. A. Ozcan and E. McLeod, “Lensless imaging and sensing,” Annu. Rev. Biomed. Eng. 18(1), 77–102 (2016). [CrossRef]  

14. G. Kim, K. Isaacson, R. Palmer, and R. Menon, “Lensless photography with only an image sensor,” Appl. Opt. 56(23), 6450–6456 (2017). [CrossRef]  

15. G. Kim, S. Kapetanovic, R. Palmer, and R. Menon, “Lensless-camera based machine learning for image classification,” arXiv:1709.00408 [cs.CV] (2017).
