LumiConSense, a transparent, flexible, scalable, and disposable thin-film image sensor, has the potential to lead to new human-computer interfaces that are unconstrained in shape and sensing-distance. In this article we make four new contributions: (1) a new real-time image reconstruction method that results in a significant enhancement of image quality compared to previous approaches; (2) the efficient combination of image reconstruction and shift-invariant linear image processing operations; (3) various hardware and software prototypes which realize the above contributions, demonstrating the current potential of our sensor for real-time applications; and finally, (4) a higher-quality offline reconstruction algorithm.
© 2014 Optical Society of America
1. Introduction and previous work
Advances in material science as well as in mechanical and optical engineering will enable entirely new applications. Imagine car windshields that can sense passengers and shopping windows that can recognize customers. Glasses can track gaze direction, while floors can track crowds. 2D screens no longer have to be touched, and 3D screens can be interacted with from a distance. Plastic or glass containers can be aware of their content, and our clothing can be aware of our environment. As a small step toward this grander goal, we are developing LumiConSense, the first image sensor to use a luminescent concentrator (LC), a transparent polycarbonate film doped with fluorescent dyes. LumiConSense is transparent, and has no integrated circuits or other structures such as grids of optical fibers or photodiodes; flexible, making curved sensor shapes possible; scalable, in that the sensor can range from small to large sizes at a similar cost, and the pixel size is not restricted to the photodiode size; and disposable in that the sensing area is low-cost and replaceable.
Related imaging approaches that are based on organic photodiodes [1–3] or silicon photodiodes [4, 5] enable curved and flexible sensors. Due to integrated circuits, however, they are neither fully transparent nor arbitrarily flexible and scalable. Common touch sensor technology, such as [6–9], is mainly limited to planar shapes and interaction through direct touch. These are constraints that we want to overcome with LumiConSense. Luminescent concentrators are described in detail in [10]. While they have been used earlier for simple (laser-)point imaging [11–14], we have been the first to show that reconstruction of entire grayscale images is possible with LC-based sensors [15].
In our previous work, we have presented the optical design, manufacturing details, basic image reconstruction and calibration methods [15], as well as an approach for lensless multi-focal imaging that enables LumiConSense to sense distant objects and estimate their depth without additional optics [16]. We summarize the basic principles of LumiConSense in section 2.
This article makes four new contributions: (1) In section 3, a new machine learning image reconstruction algorithm is presented which calibrates LumiConSense with a set of training images that are randomly selected from online image-databases, such as Flickr. In combination with linear regression, this leads to a significant enhancement of real-time image reconstruction compared to previous approaches. (2) Section 4 describes how general rules of linear systems can be applied to efficiently combine image reconstruction with common image processing operations, such as convolution and integration. (3) Enabled by the improved image reconstruction and embedded image processing pipeline, we outline three hardware prototypes (cf. Figs. 10(a)–10(c): a curved sensor, a planar sensor with front-projection screen, and a vertical transparent sensor), and four sample demonstrators (6DoF marker tracking, 3D hand-tracking and gesture recognition, a shadow controlled game interface, and 2D scanning) that illustrate LumiConSense’s current application potential. They are described in section 6. (4) Finally, we further present an offline method for image reconstruction using more sophisticated non-linear machine learning techniques in section 5.
2. Imaging with thin-film luminescent concentrators
Our LC consists of a transparent thin-film polycarbonate (Bayer Makrofol LISA Green) that is doped with a fluorescent dye. Depending on the fluorophore used, a particular band of the light spectrum that penetrates the film is partially absorbed and re-emitted at longer wavelengths. While only a lower amplitude of the affected band is transmitted, all remaining wavelengths pass the film uninfluenced. The LC shown in Fig. 1(a) absorbs part of the blue light and emits green light. The emitted light is mostly trapped inside the LC film due to total internal reflection. It propagates inside the LC towards the outer edges while losing energy over the travel distance according to the Beer-Lambert law, i.e., $I = I_0 e^{-\mu d}$, where $I_0$ and $I$ are the intensities of the incident and the transmitted light, d is the travel distance, and μ is the absorption coefficient of the material. Thus, the LC film acts as a two-dimensional light guide.
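The attenuation described by the Beer-Lambert law can be illustrated with a minimal sketch. The function name and the numeric values of the absorption coefficient and travel distances are hypothetical and are not taken from our prototype.

```python
import math

def transmitted_intensity(i0, mu, d):
    """Beer-Lambert law: I = I0 * exp(-mu * d)."""
    return i0 * math.exp(-mu * d)

# Hypothetical absorption coefficient (per mm) and travel distances (mm).
mu = 0.01
distances = [0.0, 50.0, 100.0, 200.0]
intensities = [transmitted_intensity(1.0, mu, d) for d in distances]

for d, i in zip(distances, intensities):
    print(f"d = {d:6.1f} mm  ->  I/I0 = {i:.3f}")
```

Light that enters the film far from an edge therefore contributes less to the edge measurement than light that enters close to it.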
Different light impulses that penetrate the LC simultaneously at various positions on the film surfaces accumulate to a complex light integral inside the LC. If these local light impulses are caused by focused image points, then the challenge of image reconstruction lies in measuring and decoding the accumulated light integral to estimate the corresponding image that caused it.
As explained in [15], the optical solution to this problem is to multiplex the light integral into a two-dimensional light field, l(x, ϕ), before it reaches the LC edges where it is measured. This is achieved by cutting out triangular areas at various spatial positions, x, along the four LC edges, and by filling them with a light-blocking material (plasticine in our case), as shown in Fig. 1(b). The gaps in between the remaining LC material (also triangularly shaped) through which light can still propagate represent simple one-dimensional pinhole apertures that multiplex the passing light signal with respect to its direction of incidence, ϕ, before it is projected onto the LC edge. The spatio-directionally multiplexed light signal is then forwarded with optical fibers from the LC edges to external (linearized) line-scan cameras, where it is measured. This allows the LC film to be flexible and have a non-planar shape that is not constrained to the shape of the photosensor technology used for measurement. How to determine the ideal dimensions of these triangle apertures and additional details on material, manufacturing, and optical configurations are described in [15].
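Mathematically, the light transport can be formulated with
$$l = Tp + e, \tag{1}$$
where l is the vector of light-field measurements, p is the vector of image points, T is the light-transport matrix, and e is an additive term that is calibrated offline together with T.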
In [15], the coefficients of T are determined by measuring the resulting basis light-fields, $l_i(x, \phi) - e$, for single light impulses projected (with a video projector) at each discrete image point position, $p_i$, on the sensor surface. Thereby, each basis light-field represents one column of T. Thus, T is of size i × j (i columns and j rows), where j is the number of photosensor measurements and i is the image resolution. For further information on calibration and linearization, we refer the reader to [15].
With offline calibrated T and e, and online measured l, tomographic image reconstruction techniques can be applied to estimate p. While
$$p = T^{\top}(l - e) \tag{2}$$
represents a simple backprojection [17] that leads to blurred reconstructions, solving Eq. (1) iteratively for p results in blur-free images, as achieved with filtered backprojection [17] and ideal filter kernels that represent the sampling PSF optimally. In [15, 16], a combination of the biconjugate gradients stabilized method (BiCGStab) [18] and the simultaneous algebraic reconstruction technique (SART) [19], called BiSART, was used for image reconstruction.
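The impulse-based calibration and the subsequent solution for p can be sketched on a small synthetic example. This is a minimal sketch under stated assumptions: the sensor is simulated with a random matrix `T_true`, measurements are noise-free, all sizes are illustrative, and NumPy's least-squares solver stands in for BiSART, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
num_pixels = 16    # number of image points (illustrative)
num_sensors = 40   # number of photosensor measurements (illustrative)

# Hypothetical ground-truth sensor: transport matrix and additive term.
T_true = rng.uniform(0.0, 1.0, size=(num_sensors, num_pixels))
e = rng.uniform(0.0, 0.1, size=num_sensors)

def measure(p):
    """Simulated noise-free measurement l = T p + e."""
    return T_true @ p + e

# Calibration: project one unit impulse per image point. Each basis
# light field minus e becomes one column of the estimated matrix.
T_est = np.empty((num_sensors, num_pixels))
for i in range(num_pixels):
    impulse = np.zeros(num_pixels)
    impulse[i] = 1.0
    T_est[:, i] = measure(impulse) - e

# Reconstruction: solve T p = l - e in the least-squares sense
# (a stand-in for the iterative BiSART solver).
p = rng.uniform(0.0, 1.0, size=num_pixels)
l = measure(p)
p_solved, *_ = np.linalg.lstsq(T_est, l - e, rcond=None)
```

In this idealized setting the image is recovered exactly; with real, noisy basis light-fields the columns of the estimated matrix inherit the measurement noise, which is the limitation discussed next.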
The signal-to-noise ratio (SNR) of the line-scan cameras sets a fundamental limitation for the reconstruction quality that is achievable with the approach described above. With an increasing image resolution and a decreasing size of the image points, $p_i$, the SNR of the basis light-fields, $l_i(x, \phi)$, that are measured for calibrating T drops dramatically. As a consequence, only relatively poor and low-resolution reconstruction results have been possible previously.
3. Enhanced learning-based image reconstruction
To overcome this limitation, we propose a new machine learning technique to determine the inverse light-transport matrix, $T^{-1}$, directly during calibration from a large set of training images, and to implement online image reconstruction interactively using simple matrix-vector multiplication (see also [20]):
$$p = T^{-1}(l - e). \tag{3}$$
During calibration, we randomly select images from online image-databases, such as Flickr, project them onto the LC surface at a high resolution, and measure the corresponding integral light-fields. Since the entire LC surface is illuminated during one measurement rather than only a single image point, the SNR is constantly high and independent from the calibrated image resolution. Figure 2 illustrates the sample training set of 60,000 Flickr images that was used in our experiments.
With n trained image-measurement pairs, $T^{-1}$ is calculated by linear regression (Eq. (4)). Equation (4) simultaneously solves i systems of linear equations, one for each image pixel, that are underdetermined only if the number of image-measurement pairs, n, is lower than the number of sensor elements on the line-scan cameras. Thus, the estimation of the inverse light-transport matrix is not limited by the number of image pixels i.
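The regression step can be sketched as an ordinary least-squares problem. This is a minimal sketch, assuming the training images are stacked as columns of a matrix `P` and the offset-corrected measurements as columns of a matrix `D`; these names, the synthetic sensor `T_true`, the noise level, and all sizes are hypothetical and chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
num_pixels, num_sensors, num_train = 16, 40, 200   # illustrative sizes

# Hypothetical sensor used only to simulate training measurements.
T_true = rng.uniform(0.0, 1.0, size=(num_sensors, num_pixels))
e = rng.uniform(0.0, 0.1, size=num_sensors)
noise_std = 1e-3

def measure(p):
    """Simulated noisy measurement of image p."""
    return T_true @ p + e + rng.normal(0.0, noise_std, size=num_sensors)

# Training set: one image per column of P, one measurement per column of D.
P = rng.uniform(0.0, 1.0, size=(num_pixels, num_train))
D = np.stack([measure(P[:, k]) - e for k in range(num_train)], axis=1)

# Least squares: find T_inv such that T_inv @ D approximates P.
# lstsq solves D.T @ X = P.T, i.e. one linear system per image pixel.
X, *_ = np.linalg.lstsq(D.T, P.T, rcond=None)
T_inv = X.T   # one row per image pixel, one column per sensor element

# Online reconstruction of an unseen image is a matrix-vector product.
p_test = rng.uniform(0.0, 1.0, size=num_pixels)
p_hat = T_inv @ (measure(p_test) - e)
rmse = float(np.sqrt(np.mean((p_hat - p_test) ** 2)))
```

Here the number of training pairs (200) exceeds the number of sensor elements (40), so none of the per-pixel systems is underdetermined.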
Figure 3 compares reconstruction results that have previously been possible with BiSART and the row-wise calibration of T (as summarized in section 2 and explained in detail in [15]) with the results that are achieved with our new approach.
While image quality improves significantly, image resolution is no longer dominantly limited by the SNR of the sensor but by the sampling rate of the integral light-field measurements. With 4 × 32 = 128 positional samples (i.e., triangle apertures, x), and 54 directional samples per position (i.e., optical fibers / photosensors for each triangle aperture, ϕ) in our prototype, a total of 128 × 54 = 6,912 light samples are recorded. We found that this sampling rate is too sparse to reconstruct additional image details in resolutions higher than 64 × 64 pixels (see Fig. 5).
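The sample count can be set against the number of unknown pixels at different reconstruction resolutions. This is only a rough counting argument and not the analysis behind Fig. 5; the variable names and the list of resolutions are chosen for illustration.

```python
edges = 4
apertures_per_edge = 32
directions_per_aperture = 54

positional_samples = edges * apertures_per_edge                 # 128
light_samples = positional_samples * directions_per_aperture    # 6912

# Ratio of recorded light samples to unknown pixels per resolution.
samples_per_unknown = {}
for side in (16, 32, 64, 128):
    unknowns = side * side
    samples_per_unknown[side] = light_samples / unknowns
    print(f"{side:3d} x {side:<3d}: {unknowns:5d} unknowns, "
          f"{samples_per_unknown[side]:.2f} samples per unknown")
```

At 64 × 64 pixels there are still more light samples than unknowns, whereas at 128 × 128 pixels the unknowns outnumber the samples.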
Figure 4 indicates that the reconstruction error (when compared to the ground truth at the reconstruction resolution with root-mean-square error (RMSE) and structural similarity index (SSIM) [21]) saturates at different levels that depend on the desired resolution. While low resolutions reach a high optimum quickly, larger resolutions saturate more slowly at a lower optimum. Figure 9 presents additional reconstruction results at a resolution of 64 × 64 pixels. Note that these images have been recorded while being focussed onto the LC surface with a video projector, and that none of them was part of the training set. By implementing Eq. (3) on the GPU and with an adjusted exposure time of 50 ms, we achieve an imaging rate (measurement and reconstruction) of approximately 8–10 fps and an average RMSE (unfiltered, for 4,000 test images) of 0.021. With BiSART, the highest achievable reconstruction resolution was 32 × 32 pixels, and the average RMSE could not be improved beyond 0.0874.
4. Linear operation rules
Since Eq. (3) is linear, we can apply general rules of linear systems to efficiently combine image reconstruction with common image processing operations. The most obvious linear systems rule is homogeneity (scalar rule), which defines that scaling the input of a linear system scales its output by the same factor (Eq. (5)); the corresponding rule for convolution is given by Eq. (6). Equations (5) and (6) imply that arbitrarily complex image processing operations, formulated as a combination of convolution and scaling, can be carried out directly as part of the image reconstruction, instead of performing them additionally after image reconstruction. Thus, the time needed for image reconstruction plus image processing is constant and does not exceed the time needed for stand-alone image reconstruction (i.e., a simple matrix-vector multiplication in our case). To support this, the inverse light-transport matrix has to be (offline) updated.
For updating the inverse light-transport matrix, T−1, with respect to the desired image processing operation (an arbitrary, possibly non-separable 2D image filter), its i column vectors have to be first reshaped to matrices of the reconstruction resolution. The image processing operation is then applied to each of these matrices before they are reshaped back to column vectors of the new, updated inverse light-transport matrix. This is illustrated in Fig. 6 for several classical filter operations. Figure 7 presents images that have been reconstructed with one of the updated inverse light-transport matrices that are shown in Fig. 6.
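The update procedure can be sketched as follows. This is a minimal sketch, assuming the inverse light-transport matrix is stored with one row per image pixel, so that each column has as many entries as the image has pixels. The helper names `filter2d` and `fold_filter_into_inverse`, the random stand-in for the calibrated matrix, the zero-padded border handling, and the 3 × 3 Gaussian kernel are all assumptions made for illustration.

```python
import numpy as np

def filter2d(image, kernel):
    """Same-size 2D correlation with zero padding (linear in the image)."""
    kh, kw = kernel.shape
    h, w = image.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((h, w), dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def fold_filter_into_inverse(T_inv, kernel, height, width):
    """Filter every column of T_inv, viewed as a height x width image."""
    updated = np.empty(T_inv.shape, dtype=float)
    for col in range(T_inv.shape[1]):
        column_image = T_inv[:, col].reshape(height, width)
        updated[:, col] = filter2d(column_image, kernel).ravel()
    return updated

rng = np.random.default_rng(2)
height, width, num_sensors = 8, 8, 30
T_inv = rng.normal(size=(height * width, num_sensors))  # stand-in matrix
e = rng.uniform(0.0, 0.1, size=num_sensors)
l = rng.uniform(0.0, 1.0, size=num_sensors)

kernel = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16.0

# Filtering after reconstruction ...
filtered_after = filter2d((T_inv @ (l - e)).reshape(height, width), kernel)

# ... equals reconstruction with the offline-updated matrix.
T_inv_updated = fold_filter_into_inverse(T_inv, kernel, height, width)
filtered_during = (T_inv_updated @ (l - e)).reshape(height, width)
```

Because the updated matrix has the same size as the original one, the online cost remains a single matrix-vector multiplication regardless of the filter.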
Additivity is the last linear systems rule that we can apply to efficiently compute temporal integral images. They find applications in, for instance, coded exposure imaging for motion deblurring [22] or motion history imaging for hand gesture recognition [23] and body movement classification [24]. Equation (7) states that summing τ reconstructed images equals summing their corresponding sensor measurements and reconstructing the integrated values (see also [23, 24]):
$$\sum_{t=1}^{\tau} T^{-1}(l_t - e) = T^{-1}\sum_{t=1}^{\tau}(l_t - e). \tag{7}$$
Figure 8 illustrates examples computed by summing two and three measurement sets of projected static images before reconstruction, as well as a video sequence of temporal integral images (using Eqs. (7) and (8)) of a hand moving over the LC surface (summing a window of five subsequent measurement sets for each temporal integral image, i.e., τ = 5).
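The additivity rule can be checked numerically with a short sketch. The matrix and the measurements below are random stand-ins with illustrative sizes, not data from our sensor.

```python
import numpy as np

rng = np.random.default_rng(3)
num_pixels, num_sensors, tau = 16, 40, 5

T_inv = rng.normal(size=(num_pixels, num_sensors))   # stand-in matrix
e = rng.uniform(0.0, 0.1, size=num_sensors)
measurements = rng.uniform(0.0, 1.0, size=(tau, num_sensors))  # l_1 .. l_tau

# tau reconstructions followed by a sum ...
sum_of_reconstructions = sum(T_inv @ (l_t - e) for l_t in measurements)

# ... versus one sum over the measurements followed by one reconstruction.
reconstruction_of_sum = T_inv @ (measurements - e).sum(axis=0)
```

The second variant needs only one matrix-vector multiplication per temporal integral image, independent of τ.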
5. Image reconstruction with non-linear methods
Our previous linear regression approach combined with linear operation rules is designed to perform in real-time, and ultimately trades some image reconstruction quality for efficiency. For other offline imaging applications, more sophisticated machine learning techniques can be used to achieve higher quality image reconstructions. In this section, we describe a kernel-based method for image reconstruction [25–27]. By making use of the representer theorem [28], we can rewrite the reconstructed image, p, as a linear combination of kernel functions, $p = \sum_{j=1}^{n} c_j K((l - e)_j, (l - e))$, for some coefficients $c_j$. The function K is a kernel function evaluated using all the training points, $(l - e)_j$, and the current input, $(l - e)$ (i.e., the measured light signal). The associated optimization problem is given by Eq. (9).
The coefficients, when placed into an n × i matrix C, are computed by minimizing Eq. (9). We used GURLS [29] for computing the solution of the optimization problem. Notably, the optimization problem in Eq. (9) introduces two extra parameters that need to be estimated: the regularization term λ and the scale factor of the RBF kernel σ. For estimating these parameters we used a grid search on two intervals [λmin, λmax] and [σmin, σmax].
The bounds of the intervals strongly depend on the training data. The σ bounds are determined considering the minimum and the maximum distances between the training sensor signals, whereas the regularization bounds are computed from the minimum and maximum eigenvalues of the kernel matrix. All these functionalities are implemented in GURLS [29].
At run time, an image, p, is reconstructed by evaluating the linear combination of kernel functions introduced above with the computed coefficients. Our non-linear method leads to more accurate results than the linear approach explained in section 3. However, its computational cost increases linearly with the number of training samples, n. Indeed, at testing time, each sensor vector of the training set, $(l - e)_j$, is used to compute the kernel function with the current measured sensor signal, $(l - e)$. Thus, for real-time applications we prefer the method explained in section 3. Figure 9 presents reconstruction results that compare our offline technique with our online method described in the previous sections. Compared to the online method, the RMSE of the offline method drops from 0.021 down to 0.017 (unfiltered, for 4,000 test images).
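A kernel-based reconstruction of this kind can be sketched with plain NumPy. This is a minimal sketch and not the GURLS implementation: it assumes a Gaussian RBF kernel of the form exp(−‖a − b‖² / (2σ²)), a regularized least-squares objective whose minimizer is C = (K + λnI)⁻¹P, a simple hold-out split, and fixed hypothetical grids for λ and σ instead of the data-dependent bounds described above. The synthetic sensor and all sizes are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq_dist = ((A ** 2).sum(axis=1)[:, None]
               + (B ** 2).sum(axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))

def fit_coefficients(D_train, P_train, lam, sigma):
    """Coefficient matrix C (n x i) of regularized least squares."""
    n = D_train.shape[0]
    K = rbf_kernel(D_train, D_train, sigma)
    return np.linalg.solve(K + lam * n * np.eye(n), P_train)

def reconstruct(d, D_train, C, sigma):
    """p = sum_j c_j K((l - e)_j, (l - e)); cost grows linearly with n."""
    return rbf_kernel(d[None, :], D_train, sigma)[0] @ C

rng = np.random.default_rng(4)
num_pixels, num_sensors = 4, 12
T_true = rng.uniform(0.0, 1.0, size=(num_sensors, num_pixels))  # hypothetical
P_all = rng.uniform(0.0, 1.0, size=(300, num_pixels))   # one image per row
D_all = P_all @ T_true.T                                 # rows are (l - e)
D_train, P_train = D_all[:240], P_all[:240]
D_val, P_val = D_all[240:], P_all[240:]

# Grid search over hypothetical parameter grids, scored on held-out data.
best = None
for sigma in (0.5, 1.0, 2.0, 4.0):
    for lam in (1e-6, 1e-4, 1e-2):
        C = fit_coefficients(D_train, P_train, lam, sigma)
        P_hat = rbf_kernel(D_val, D_train, sigma) @ C
        rmse = float(np.sqrt(np.mean((P_hat - P_val) ** 2)))
        if best is None or rmse < best[0]:
            best = (rmse, lam, sigma, C)

best_rmse, best_lam, best_sigma, best_C = best
p_hat = reconstruct(D_val[0], D_train, best_C, best_sigma)
```

Each call to `reconstruct` evaluates the kernel against all n training vectors, which is why this approach is better suited to offline use.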
6. Prototypes and demonstrators
We have built three sensor prototypes and implemented four interactive software demonstrators that are enabled by the improved reconstruction quality and the efficient combination of basic image processing and image reconstruction. Figures 10(a)–10(c) present our hardware prototypes: one concavely shaped and two planar LC sensors (one located underneath a diffuse front-projection layer, and a transparent one in a vertical alignment). For all prototypes, we used 216 mm × 216 mm × 300 μm Bayer Makrofol LISA Green thin-film polycarbonate, Jiangxi Daishing POF Co., Ltd polymethylmethacrylate (PMMA) step-index multi-mode fibers with a numerical aperture of 0.5, and four CMOS Sensor Inc. M106-A4-R1 CIS (contact image sensors) with 1728 elements on 216 mm together with four programmable USB controllers (USB-Board-M106A4 of Spectronic Devices Ltd). Each of the 128 aperture triangles was 6.25 mm wide, 3.25 mm high, and spanned an aperture opening of 500 μm. A Samsung SP-M250S LCD projector was applied for calibration and for projecting evaluation images. Image reconstruction, as explained in sections 3 and 4, was implemented in CUDA/cuBLAS and is carried out on an NVIDIA GTX285 graphics processing unit (GPU). Details on optimal optical configurations, manufacturing, and calibration can be found in [15, 16].
Figures 10(d)–10(g) present four simple software applications that demonstrate the capabilities of our sensor. The examples shown in Figs. 10(d)–10(f) are based on shadows cast onto the film surface. In Fig. 10(d), 6DoF pose estimation of a rigid tracking marker is shown. The marker is transparent and made from the same LC material as the sensor. Since it absorbs the same spectrum as the sensor, the marker casts a shadow in the sensing wavelength (blue in our case) while other wavelengths are transmitted. Figure 10(e) shows the sensor placed below a diffuser. Front-projection onto the diffuser allows displaying arbitrary graphics with a constant bias in the blue color channel that is necessary for illumination. The shadow cast by a hand is analyzed to support 3D hand-tracking and gesture recognition. Figure 10(f) presents a vertical and transparent configuration that projects a video game through the LC onto a large front-projection screen. Since reconstructed and projected images are registered, detected shadow images can be used to control the video game (2D collision detection in our example). In Fig. 10(g), our sensor is used as a 2D scanner that detects the printed image of a paper page placed on top of the sensor surface while illuminating the page from above. Handwriting and drawings on the page while on the sensor can also be reconstructed. Based on the application, the inverse light-transport matrix used for image reconstruction was updated with adequate filter-kernel combinations, such as Gaussian filters for de-noising. All demonstrators are presented in the accompanying video. Note that while we implemented these demonstrators with the planar sensor prototypes exclusively, the curved sensor prototype (also shown in the accompanying video) delivers equal image reconstruction results. However, an appropriate application for it, such as omnidirectional imaging, has not yet been realized and belongs to our future work.
7. Limitations
Several limitations exist that are directly related to our hardware prototypes:
- First, the highest reconstruction resolution is mainly constrained by the sampling rate of the integral light-fields. A larger number of smaller triangle apertures, higher resolution line-scan cameras, and finer optical fibers will lead to denser sampling and consequently to the reconstruction of finer image details.
- Second, our prototypes are fairly fragile. More professional manufacturing will lead to more robust devices.
- Third, we currently support only the reconstruction of back-lit objects that cast shadows onto the LC surface. This requires a back light source with a relatively small area footprint, since large area light sources result in soft and dim shadows of distant objects that are difficult to sense and reconstruct. To overcome this, and for capturing focussed front-lit objects directly, the depth-of-field of our sensor (i.e., the solid angle in which light is collected at every point on the LC surface) needs to be reduced. An optical solution is to add a second thin-film layer of micro-apertures on top of the LC layer, as explained in [16]. How to shift focus of our sensor in axial direction to image distant objects is also described in [16].
- Fourth, the imaging speed of our sensors is mainly limited by the exposure time required for the line-scan cameras to achieve a proper SNR. Higher quality HDR line-scan cameras that allow lower exposure times at an adequate SNR level will increase the imaging speed.
- Fifth, the LC film used in our prototypes is greenish, and therefore not fully transparent, as part of the blue spectrum is absorbed and emitted in green. For implementing a fully transparent sensor, the polycarbonate has to be doped with fluorophores that emit light in the invisible part of the spectrum (e.g., infrared or ultraviolet). This, for instance, can be achieved by absorbing red or infrared light and by emitting a longer wavelength in the infrared range.
8. Summary and future work
We presented LumiConSense, a transparent, flexible, scalable, and disposable thin-film image sensor, and made four contributions: A new real-time image reconstruction method that leads to a significant enhancement of image quality compared to previous approaches; the efficient combination of image reconstruction and shift-invariant linear image processing operations; various hard- and software prototypes that, enabled by the above contributions, demonstrate the current potential of our sensor for real-time applications; and a more sophisticated offline method for image reconstruction.
Stacking multiple LC layers enables a variety of information, such as color, dynamic range, spatial resolution, defocus, or 4D light fields, to be sampled simultaneously. Investigating stacked and geometrically complex thin-film LC configurations is at the top of our research agenda. We will further investigate other image reconstruction methods, such as compressed sensing, to increase image quality and image resolution. In contrast to widely applied touch sensors, which are mainly limited to planar shapes and interaction through direct touch, LumiConSense has the potential to lead to new human-computer interfaces that are unconstrained in shape and sensing-distance.
We thank Robert Koeppe of isiQiri interface technologies GmbH for fruitful discussions and for providing LC samples. This work was supported by Microsoft Research under contract number 2012-030(DP874903) – LumiConSense.
References and links
1. T. N. Ng, W. S. Wong, M. L. Chabinyc, S. Sambandan, and R. A. Street, “Flexible image sensor array with bulk heterojunction organic photodiode,” Appl. Phys. Lett. 92(21), 213303 (2008).
2. G. Yu, J. Wang, J. McElvain, and A. J. Heeger, “Large-area, full-color image sensors made with semiconducting polymers,” Adv. Mater. 10(17), 1431–1434 (1998).
3. T. Someya, Y. Kato, S. Iba, Y. Noguchi, T. Sekitani, H. Kawaguchi, and T. Sakurai, “Integration of organic fets with organic photodiodes for a large area, flexible, and lightweight sheet image scanners,” IEEE T. Electron Dev. 52(11), 2502–2511 (2005).
4. H. C. Ko, M. P. Stoykovich, J. Song, V. Malyarchuk, W. M. Choi, C. J. Yu, J. B. Geddes III, J. Xiao, S. Wang, Y. Huang, and J. A. Rogers, “A hemispherical electronic eye camera based on compressible silicon optoelectronics,” Nature 454(7205), 748–753 (2008).
5. Y. M. Song, Y. Xie, V. Malyarchuk, J. Xiao, I. Jung, K. J. Choi, Z. Liu, H. Park, C. Lu, R. Kim, R. Li, K. B. Crozier, Y. Hung, and J. A. Rogers, “Digital cameras with designs inspired by the arthropod eye,” Nature 497(7447), 95–99 (2013).
6. S. Hodges, S. Izadi, A. Butler, A. Rrustemi, and B. Buxton, “ThinSight: versatile multi-touch sensing for thin form-factor displays,” in Proceedings of the 20th annual ACM symposium on User interface software and technology (Association for Computing Machinery, New York, 2007), 259–268.
7. J. Y. Han, “Low-cost multi-touch sensing through frustrated total internal reflection,” in Proceedings of the 18th annual ACM symposium on User interface software and technology (Association for Computing Machinery, New York, 2005), 115–118.
8. N. Villar, S. Izadi, D. Rosenfeld, H. Benko, J. Helmes, J. Westhues, S. Hodges, E. Ofek, A. Butler, X. Cao, and B. Chen, “Mouse 2.0: multi-touch meets the mouse,” in Proceedings of the 22nd annual ACM symposium on User interface software and technology (Association for Computing Machinery, 2009), 33–42.
9. J. Moeller and A. Kerne, “Scanning FTIR: unobtrusive optoelectronic multi-touch sensing through waveguide transmissivity imaging,” in Proceedings of the fourth international conference on Tangible, embedded, and embodied interaction (Association for Computing Machinery, New York, 2010), 73–76.
10. J. S. Batchelder, A. H. Zewail, and T. Cole, “Luminescent solar concentrators. 1: Theory of operation and techniques for performance evaluation,” Appl. Optics 18(18), 3090–3110 (1979).
11. P. J. Jungwirth, “Photoluminescent concentrator based receptive fields,” Ph.D. Thesis, Simon Fraser University (1996).
12. P. J. Jungwirth, I. S. Melnik, and A. H. Rawicz, “Position-sensitive receptive fields based on photoluminescent concentrators,” P. Soc. Photo-Opt. Ins. 3199, 239–247 (1998).
13. I. S. Melnik and A. H. Rawicz, “Thin-film luminescent concentrators for position-sensitive devices,” Appl. Optics 36(34), 9025–9033 (1997).
14. R. Koeppe, A. Neulinger, P. Bartu, and S. Bauer, “Video-speed detection of the absolute position of a light point on a large-area photodetector based on luminescent waveguides,” Opt. Express 18(3), 2209–2218 (2010).
15. A. Koppelhuber and O. Bimber, “Towards a transparent, flexible, scalable and disposable image sensor using thin-film luminescent concentrators,” Opt. Express 21, 4796–4810 (2013).
16. A. Koppelhuber, C. Birklbauer, S. Izadi, and O. Bimber, “A transparent thin-film sensor for multi-focal image reconstruction and depth estimation,” Opt. Express 22, 8928–8942 (2014).
17. G. T. Herman, Fundamentals of Computerized Tomography: Image Reconstruction from Projections, 2nd ed. (Springer Verlag, 2010).
18. Y. Saad, Iterative Methods for Sparse Linear Systems (Society for Industrial Mathematics, 2003).
20. T. M. Buzug, Computed Tomography from Photon Statistics to Modern Cone-Beam CT (Springer Verlag, 2008).
21. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE T. Image Process. 13(4), 600–612 (2004).
22. R. Raskar, A. Agrawal, and J. Tumblin, “Coded exposure photography: Motion deblurring using fluttered shutter,” ACM T. Graphic 25(3), 795–804 (2006).
23. S. Taylor, C. Keskin, O. Hilliges, S. Izadi, and J. Helmes, “Type-hover-swipe in 96 bytes: A motion sensing mechanical keyboard,” in Proceedings of the 32nd annual ACM conference on Human factors in computing systems (Association for Computing Machinery, 2014), 1695–1704.
24. A. F. Bobick and J. W. Davis, “The recognition of human movement using temporal templates,” IEEE T. Pattern Anal. 23(3), 257–267 (2001).
25. V. Vapnik, Statistical Learning Theory (John Wiley and Sons, Inc., 1998).
26. A. N. Tikhonov, A. S. Leonov, and A. G. Yagola, Nonlinear Ill-Posed Problems (Chapman & Hall, 1998).
27. T. Evgeniou, M. Pontil, and T. Poggio, “Regularization networks and support vector machines,” Adv. Comput. Math. 13(1), 1–50 (2000).
28. B. Schölkopf, R. Herbrich, and A. J. Smola, “A generalized representer theorem,” in Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory (Springer-Verlag, London, 2001), 416–426.
29. A. Tacchetti, P. K. Mallapragada, M. Santoro, and L. Rosasco, “GURLS: A Least Squares library for supervised learning,” J. Mach. Learn. Res. 14, 3201–3205 (2013).