We present a fully transparent and flexible light-sensing film that, based on a single thin-film luminescent concentrator layer, supports simultaneous multi-focal image reconstruction and depth estimation without additional optics. Together with the sampling of two-dimensional light fields propagated inside the film layer under various focal conditions, it allows entire focal image stacks to be computed after only one recording that can be used for depth estimation. The transparency and flexibility of our sensor unlock the potential of lensless multilayer imaging and depth sensing with arbitrary sensor shapes – enabling novel human-computer interfaces.
© 2014 Optical Society of America
1. Introduction and previous work
New findings in material science, progress in mechanical and optical engineering, and advances in computational methods have led to novel sensors and camera designs that go beyond conventional CCD or CMOS chips. Hemispherical image sensors that are inspired by the curved retina of the human eye [1] or the compound apposition layouts of arthropod eyes [2] demonstrate wide field-of-view low-aberration imaging. Woven polymeric photodetecting fibers [3], thin-film luminescent concentrator foils [4], and organic photodiodes together with ink-jet digital lithography [5] enable scalable image sensors with complex geometric shapes. Compressive sensing [6] has made lensless imaging approaches possible.
Our sensor consists of a thin, transparent polycarbonate film, referred to as luminescent concentrator (LC), that is doped with fluorescent dyes. Light of a particular wavelength sub-band that penetrates the film is absorbed and re-emitted at longer wavelengths, while wavelengths outside the sub-band are fully transmitted. The example shown in Fig. 1(a) absorbs blue and emits green light. The emitted light is mostly trapped inside the film by total internal reflection (TIR), and is transported with reduced multi-scattering towards the LC edges while losing energy over transport distance. The bright film edges indicate decoupling of the light integral transported to each edge point from all directions inside the LC. Luminescent concentrators are described in detail in Batchelder et al. [8].
The challenge of reconstructing an image that is focussed on the LC surface without in situ photodetectors was addressed previously [4]. It lies in measuring and demultiplexing the complex light integral that leaves the edges. As illustrated in Figs. 1(b) and 1(d), we achieve this by cutting the LC border regions into a structure of adjacent triangular apertures. The apertures act as an array of one-dimensional pinholes that multiplex the light integral into a two-dimensional light field l(x, ϕ), which encodes the fractions of the light transported from a particular direction ϕ to a particular position x on the LC edges. Our new sensor prototype applies ribbons of optical fibers to transport the projected light field to line scan cameras (LSCs) for measurement (Figs. 1(b)–1(e)). Compared to gluing the LSCs directly to the LC surface (as done in [4]), this leads to a brighter light signal and to true flexibility of the sensor. Details of the optical design of our new prototype are provided in section 5.
The goal is to reconstruct the image that corresponds to the measured light field (more details are provided in [4]):
Initially, the sensor is calibrated by sequentially projecting a uniform light impulse to each discrete image point (pixel) pi on the sensor surface while measuring the resulting basis light field li(x, ϕ). These basis light-field measurements (Figs. 2(a) and 2(b)) are stored in the columns of the light transport matrix T, and represent the individual transport of light from each pixel, through the apertures and optical fibers, to each photosensor of the LSCs. Since our film sensor does not consist of physical pixels, its resolution is virtual and depends on the resolution applied to measure T. Thus, the size of T equals the number of virtual pixels (number of columns) times the number of photosensors (number of rows) of the line scan cameras.
In principle, the light transport of a complex monochrome image p focussed on the LC can be formulated as l = Tp + e, where p is the vector-form representation of the image’s pixels, and l and e are the vector-form representations of the integral light fields measured during online image recording and offline recording of the baseline light level (part of the calibration), respectively. Image reconstruction (Fig. 2(c)) can thus be achieved by solving for p = T⁻¹(l − e). Tomographic image reconstruction techniques, such as SART (simultaneous algebraic reconstruction technique) [9], are well suited to addressing this problem. In practice, the resolution of the reconstructed images is limited by the signal-to-noise ratio (S/N) of the LSCs used. Higher image resolutions can be generated from LSCs with low S/N by combining the reconstruction results of multiple transport matrices calibrated with sub-pixel-shifted light impulses.
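The forward model and its inversion can be illustrated with a minimal sketch, assuming a small random matrix as a stand-in for a calibrated T and a constant baseline e (both hypothetical, as are all function and variable names). The update rule follows the standard SART form with row and column sums of T and clamps the solution to [0, 1]; it is not meant to reproduce the real-time CUDA implementation.

```python
import numpy as np

def sart(T, b, iterations=5000, relax=1.0):
    """Iteratively solve b = T p with SART updates; p is clamped to [0, 1]."""
    T = np.asarray(T, dtype=float)
    row_sums = T.sum(axis=1)       # one sum per photosensor (row of T)
    col_sums = T.sum(axis=0)       # one sum per virtual pixel (column of T)
    row_sums[row_sums == 0] = 1.0  # guard against division by zero
    col_sums[col_sums == 0] = 1.0
    p = np.zeros(T.shape[1])
    for _ in range(iterations):
        residual = (b - T @ p) / row_sums
        p = np.clip(p + relax * (T.T @ residual) / col_sums, 0.0, 1.0)
    return p

rng = np.random.default_rng(0)
n_pixels, n_sensors = 16, 64                       # hypothetical sizes
T = rng.uniform(0.0, 1.0, (n_sensors, n_pixels))   # stand-in for a calibrated matrix
e = np.full(n_sensors, 0.05)                       # stand-in baseline light level
p_true = rng.integers(0, 2, n_pixels).astype(float)
l = T @ p_true + e                                 # forward model l = T p + e
p_rec = sart(T, l - e)                             # reconstruct from l - e
print(float(np.abs(p_rec - p_true).max()))
```

In this noise-free toy setting the reconstruction error becomes very small; with measured data the achievable accuracy depends on the S/N discussed above.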
While our previous work addressed the reconstruction of a single image that is focussed on the sensor surface, the main contribution of this work is a novel solution that allows the sensor to shift its focus in axial direction without additional optics. We present an image reconstruction technique that computes an entire focal image stack after only one recording. The focal stack can then be used for depth estimation. Thus, we describe a lensless and transparent thin-film sensor for simultaneous multi-focal imaging and depth estimation that, in contrast to widely applied touch sensors (which are mainly limited to planar shapes and interaction through direct touch), has the potential to lead to new human-computer interfaces that are unconstrained in shape and sensing distance.
2. Multi-focal image reconstruction
If the light impulses used for calibration are focussed on the LC surface (f0 in Fig. 3), the measured light transport matrix T0 can only reconstruct sharp images that are optically focussed on the LC surface. Computing images that are optically out of focus (f1 and f2) with T0 leads to defocussed reconstructions.
The focus of our sensor can, for instance, be shifted to f1 or f2 without additional optics by using transport matrices T1 or T2 for image reconstruction, calibrated with light impulses focussed at f1 or f2 respectively (Fig. 3). Instead of individually calibrating matrices that correspond to all possible focal distances, we measure only T0 and compute the others at an arbitrary range and axial step width by assigning the calculated basis light fields li,f = T0pi,f + e of a defocussed point image pi,f to column i of Tf. Each image pi,f contains a blurred (by the amount f) version of pixel i. Since in our experiments we assume that optical defocus causes a Gaussian point spread, we apply convolution with a 2D Gaussian kernel (where f correlates with the standard deviation σ of the Gaussian) to compute these images. Thus, by measuring only a single integral light field l we can determine a whole focal stack by reconstructing multiple images with transport matrices computed for varying σ. In contrast to conventional lens optics, images of objects reconstructed with our sensor at too long focal distances will result in overestimated focus as opposed to defocus (Fig. 3(b)).
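The construction of Tf from T0 can be sketched as follows, assuming a square n × n virtual pixel grid in row-major order, a Gaussian kernel truncated at 3σ, and zero padding at the image border (all of these are assumptions of this sketch, as are the function names). Column i of the result is T0 applied to the blurred point image of pixel i, optionally with the baseline e added.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian, truncated at 3 sigma (an assumption of this sketch)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def blur_1d(v, k):
    """Centered convolution with zero padding; output has the length of v."""
    radius = (len(k) - 1) // 2
    return np.convolve(v, k)[radius:radius + len(v)]

def blurred_point_image(n, i, sigma):
    """n x n image containing pixel i (row-major) blurred by a 2D Gaussian."""
    img = np.zeros((n, n))
    img[i // n, i % n] = 1.0
    if sigma > 0:
        k = gaussian_kernel(sigma)
        img = np.apply_along_axis(blur_1d, 1, img, k)  # blur along rows
        img = np.apply_along_axis(blur_1d, 0, img, k)  # blur along columns
    return img

def defocused_transport_matrix(T0, n, sigma, e=None):
    """Column i of T_f is T0 applied to the blurred point image p_{i,f} (plus e)."""
    cols = [T0 @ blurred_point_image(n, i, sigma).ravel() for i in range(n * n)]
    Tf = np.stack(cols, axis=1)
    return Tf if e is None else Tf + np.asarray(e, dtype=float)[:, None]

rng = np.random.default_rng(1)
n = 8
T0 = rng.uniform(0.0, 1.0, (20, n * n))   # stand-in for a measured T0
T_stack = [defocused_transport_matrix(T0, n, s) for s in (0.0, 0.5, 1.0, 1.5)]
print(len(T_stack), T_stack[0].shape)
```

Reconstructing the same recorded light field l with each matrix in `T_stack` would then yield one focal stack slice per σ-value.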
A naive alternative to reconstructing focal stacks with various instances of light transport matrices Tf is to first recover the image on the LC surface pf=0 with T0, and then apply a de-convolution sequence using a Gaussian PSF with increasing σ-values to pf=0 for computing all focal stack slices. Figure 4 compares the results achieved with Lucy-Richardson deconvolution (Matlab’s deconvlucy) with our results. In both cases, the same σ-values have been applied for each corresponding focal stack slice.
The PSF of the optical image is permuted by the light-field sampling and the tomographic image reconstruction of our sensor. Both lead to locally variant, non-constant PSFs that are also influenced by noise. Since this information is contained in T0 and in all instances Tf thereof, focal stack reconstruction succeeds in our case, while it fails for deconvolution that requires a spatially constant PSF.
While the Lucy-Richardson algorithm maximizes the likelihood of the deconvolved image being an instance of the original image without knowledge of the noise level, we have also evaluated other deconvolution methods, such as Wiener filtering (Matlab’s deconvwnr) and blind deconvolution (Matlab’s deconvblind). Since the Wiener filter requires the noise-to-signal power ratio of additive noise to be given, we brute-force evaluated a whole series of possible noise levels. Blind deconvolution estimates the optimal PSF based on an initial guess (we used the Gaussian PSF with corresponding σ-value for this) for each focal stack slice individually.
In both cases, it was impossible to enhance the focal stack reconstruction over the Lucy-Richardson results presented in Fig. 4(a).
3. Depth from shadow-defocus
With reconstructed focal stacks, we can apply a variation of depth from defocus [10, 11] to estimate the distance to objects in front of the sensor. Figure 5 presents results of real experiments with physical measurements.
The low S/N of the applied LSCs, however, makes the reconstruction of fine and dark blur differences impossible. For this reason, we applied back illumination to reconstruct bright images of defocussed shadows with an adequate contrast for a proof-of-concept in our experiments. Nevertheless, section 4 discusses how our approach can be extended for depth estimation from image-defocus of front-lit objects.
For depth from shadow-defocus, we found that the image variance of a focal stack slice is a good indicator of a depth-matching focal distance. It increases with increasing focus and saturates with increasing focus overestimation (Figs. 5(d) and 5(g)).
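The blur radius rb of the shadow that an object at distance dp from the LC casts under a light source of radius rl at distance dl from the LC is

$$r_b = \frac{r_l\, d_p}{d_l - d_p}. \tag{1}$$

Solving Eq. (1) for dp leads to

$$d_p = \frac{r_b\, d_l}{r_l + r_b}. \tag{2}$$

The blur radius equals the scaled standard deviation, where s is a constant scaling factor that depends on imaging and sampling properties of the applied optics:

$$r_b = s\,\sigma. \tag{3}$$

Substituting rb in Eq. (1) with sσ and solving for dp again yields

$$d_p = \frac{s\,\sigma\, d_l}{r_l + s\,\sigma} = \frac{\sigma\, d_l}{r_l/s + \sigma}. \tag{4}$$

Thus, we register our experimental setup to physical depths by measuring (σ1, dp1) and (σ2, dp2) to determine rl/s and dl that are used in Eq. (4) to relate any σ-value derived by our depth-from-defocus algorithm to the corresponding depth in world space. Note that the relation between dp and σ (i.e., Eq. (4)) is not linear, as shown in the plots of Figs. 5(d) and 5(g).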
In our experiments, we used the top and bottom target positions shown in Figs. 5(a) and 5(e) for registration. All other depths (center in Figs. 5(a) and 5(e) and bottom, center in Figs. 5(f) and 5(h)) are based on this registration.
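The two-point registration can be sketched as a small linear solve. The sketch assumes that depth and σ are related by dp = dl·σ / (a + σ) with a = rl/s, a form that is consistent with the geometry and the values reported in section 4 but is an assumption here; the registration pairs are hypothetical and generated from a = 2.6 and dl = 538 mm.

```python
def register(sigma1, d_p1, sigma2, d_p2):
    """Solve d_p = d_l * sigma / (a + sigma) for a = r_l / s and d_l from two pairs."""
    det = sigma1 * d_p2 - sigma2 * d_p1
    if det == 0:
        raise ValueError("registration pairs are degenerate")
    a = sigma1 * sigma2 * (d_p1 - d_p2) / det
    d_l = d_p1 * d_p2 * (sigma1 - sigma2) / det
    return a, d_l

def depth_from_sigma(sigma, a, d_l):
    """Assumed sigma-to-depth relation; nonlinear in sigma."""
    return d_l * sigma / (a + sigma)

# Hypothetical registration targets (not measured values).
a_true, d_l_true = 2.6, 538.0
pairs = [(s, depth_from_sigma(s, a_true, d_l_true)) for s in (1.0, 4.0)]
a, d_l = register(pairs[0][0], pairs[0][1], pairs[1][0], pairs[1][1])
print(round(a, 3), round(d_l, 1), round(depth_from_sigma(2.0, a, d_l), 1))
```

Any σ-value found between the two registration targets then maps to a depth between the two registered depths.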
How the σ-values are determined for different lateral scanning regions of a reconstructed focal stack in the course of our shadow-defocus experiments is explained below:
We apply SART (simultaneous algebraic reconstruction technique) [9] for tomographic reconstruction of the focal stack. Its solutions are constrained between 0 and 1. For every lateral position x, y in the focal stack, we extract a sub-focal stack around a (2w + 1) × (2w + 1) lateral neighborhood window and along the entire axial depth z (number of σ-steps), and compute the image variance v(z) for each of its slices. We then filter v(x, y, z) along the z dimension with a Parzen-Rosenblatt window [12, 13]. The size of the Parzen-Rosenblatt window depends on the reconstructed depth resolution (number of σ-steps). In our experiments, we reconstructed up to 100 σ-steps in z direction and applied a Parzen-Rosenblatt window of size 17.
For the filtered variance, we compute the first- and second-order partial derivatives v′(x, y, z) = ∂v(x, y, z)/∂z and v″(x, y, z) = ∂²v(x, y, z)/∂z². To determine the correct focal distance within this window region, the first slice with steepest descending variance-gradient is to be found in the axial direction z (away from the sensor). The point of steepest descending variance-gradient (PSDVG) is located at the first minimum of v″(x, y, z) following the maximum of v′(x, y, z) in direction z (increasing σ). This is illustrated in Figs. 5(d) and 5(g). Equation (4) can then be used to convert the corresponding σ to the registered depth. Since image regions that belong to uniform objects lead to a small variance value (independently of defocus), the PSDVG is considered unreliable if its variance is below the threshold tv. In this case, we successively increase the neighborhood window w around x, y to cover a larger lateral region, until the PSDVG is reliable. Thus, we initially start with a small w and adaptively increase it with respect to the image content (object size) around x, y.
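The PSDVG search for one lateral window can be sketched as follows. The sketch assumes a focal stack stored as an array of shape (Z, H, W), uses a simple box window as a stand-in for the Parzen-Rosenblatt filter, and approximates the derivatives with finite differences; function names and the synthetic test curve are hypothetical, and the reliability test against tv and the adaptive growth of w are left out.

```python
import numpy as np

def window_variance(stack, x, y, w):
    """Variance v(z) of every slice inside a (2w+1) x (2w+1) window around x, y."""
    z, h, wd = stack.shape
    patch = stack[:, max(y - w, 0):min(y + w + 1, h),
                  max(x - w, 0):min(x + w + 1, wd)]
    return patch.reshape(z, -1).var(axis=1)

def psdvg_index(v, window=17):
    """Index of the first minimum of v'' that follows the maximum of v'."""
    v = np.asarray(v, dtype=float)
    pad = window // 2                      # window is assumed to be odd
    padded = np.pad(v, pad, mode="edge")
    smooth = np.convolve(padded, np.ones(window) / window, mode="valid")
    d1 = np.gradient(smooth)               # finite-difference v'
    d2 = np.gradient(d1)                   # finite-difference v''
    start = int(np.argmax(d1))
    for z in range(max(start, 1), len(d2) - 1):
        if d2[z] <= d2[z - 1] and d2[z] < d2[z + 1]:
            return z
    return start + int(np.argmin(d2[start:]))  # fallback: global minimum after start

# Synthetic variance curve that rises and then saturates, similar in shape to Fig. 5(d).
z = np.arange(100)
v = 0.25 / (1.0 + np.exp(-(z - 40) / 5.0))
print(psdvg_index(v))
```

The returned slice index corresponds to one σ-step, which Eq. (4) converts to a registered depth.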
Once a depth value has been determined for each lateral pixel x, y, we apply k-means clustering to the resulting depth map, where k depends on the maximal number of objects to be detected. For each depth cluster, we determine one PSDVG in the same way as we did for neighborhood windows.
Once all clusters c with depths zc are found and all pixels x, y are assigned to one of these clusters, we additionally subtract the background from the depth map. That is, we set all depths to 0 if S(x, y, zc) is above a foreground-intensity threshold (0.3–0.5 in our shadow-defocus experiments, as bright pixels are background and dark pixels are foreground).
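Clustering and background subtraction can be sketched as below. This is a simplified illustration: it runs a plain one-dimensional k-means on the depth values, uses the cluster mean as the cluster depth instead of re-estimating one PSDVG per cluster, and takes the in-focus intensity S(x, y, zc) as a precomputed array. All names and the tiny example data are hypothetical.

```python
import numpy as np

def kmeans_1d(values, k, iterations=50):
    """Plain 1D k-means with a deterministic quantile initialization."""
    values = np.asarray(values, dtype=float)
    centers = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(iterations):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = values[labels == c].mean()
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    return centers, labels

def cluster_and_mask(depth_map, focused_intensity, k, threshold=0.4):
    """Replace depths by cluster depths and set bright (background) pixels to 0."""
    centers, labels = kmeans_1d(depth_map.ravel(), k)
    clustered = centers[labels].reshape(depth_map.shape)
    clustered[focused_intensity > threshold] = 0.0
    return clustered

depth = np.array([[10.0, 11.0, 30.0, 29.0],
                  [9.0, 10.0, 31.0, 30.0]])
intensity = np.array([[0.9, 0.1, 0.1, 0.9],
                      [0.1, 0.1, 0.1, 0.1]])
result = cluster_and_mask(depth, intensity, k=2)
print(result)
```

In this example the two bright pixels are treated as background and set to 0, while the remaining pixels take the depth of their cluster (10 or 30).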
Figures 6 – 8 illustrate the different steps of our depth reconstruction algorithm for a simulation of the physical experiment shown in Figs. 5(f)–5(h). In the simulation, the transport matrix T0 and the recorded blur images p were computed (assuming a perfect sensor) instead of measured. All remaining steps were identical to the physical experiment. The computation of T0 and p avoids introducing noise. Thus, in contrast to our physical experiments, the simulation results are near-optimal (no sensor influence, but remaining limitations of tomographic image reconstruction with SART). Figures 6(a) and 6(b) show the computed depth maps before and after k-means clustering. Figure 6(c) is the final result after background subtraction.
Figures 7(a) and 7(b) plot the variance and its first- and second-order partial derivatives for the two detected depth clusters. The estimated and the ground-truth PSDVGs are indicated in both cases. Figures 7(c) and 7(d) present the same plots for the measured results of our physical experiment. Figure 8 illustrates the variance plots (simulated and measured) for the experiment shown in Figs. 5(c)–5(e).
By comparing the simulation results (Figs. 7(a) and 7(b) and Figs. 8(a)–8(c)) with the measurement results from our physical experiments (Figs. 7(c) and 7(d) and Figs. 8(d)–8(f)), the influence of sensor noise for the precise determination of σ can be seen. Remaining image reconstruction artifacts caused by the SART solver lead to the PSDVG-estimation error shown in Fig. 7(b).
Figure 9 illustrates 16 × 16, 32 × 32, and 64 × 64 simulations for high-frequency scenes with multiple depth levels. Depth reconstruction becomes more imprecise at cluster borders (where blurs of various σ-values largely overlap). In this case, background subtraction fails.
4. Shadow-defocus vs. image-defocus
Figure 10(a) depicts that the depth-of-field of our sensor depends on the radius rl of the light source and its distance dl to the LC. It is directly related to the cone in which each point on the LC integrates light, and can be represented by the integration angle α:
The field-of-view angle β of our sensor is constrained by the smallest incident angle of light rays that reach the LC surface. Therefore it increases with increasing sensor radius rs, growing light source radius rl, and decreasing light source distance dl:
The PSF of objects outside the field-of-view frustum or beyond the clipping distance dc cannot be (fully) imaged:
From Eq. (1), it can be seen that the blur radius rb(σ) (i.e., the amount of defocus) of an object increases as rl increases and the object-to-light distance dl − dp decreases.
For the depth from shadow-defocus experiments illustrated in Fig. 5, for instance, we determined dl = 538 mm and rl/s from registration. We measured a light source radius of rl = 12.5 mm. This yields a scaling factor of s = 4.8, an integration angle of α = 2.7°, a field-of-view angle of β = 14°, and a clipping distance dc = 440 mm. Thus, for the farthest object distance in our experiments of dp = 395 mm, the largest blur radius of rb = 35 mm (or approx. 5 pixels for a pixel size of 108 mm / 16 pixels = 6.75 mm) was achieved (Eq. (1)).
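These values can be cross-checked with a few lines of arithmetic. The sketch below assumes similar-triangle forms for the blur radius and the two angles, and takes the sensor radius as half the edge length of the 108 mm sensor; these forms are assumptions chosen because they are consistent with the reported numbers, not expressions quoted from the text.

```python
import math

r_l, d_l = 12.5, 538.0   # light source radius and distance in mm (from the text)
d_p = 395.0              # farthest object distance in mm (from the text)
r_s = 108.0 / 2.0        # sensor radius as half the edge length (assumption)
pixel = 108.0 / 16.0     # pixel size in mm (from the text)

r_b = r_l * d_p / (d_l - d_p)                          # assumed blur-radius relation
alpha = 2.0 * math.degrees(math.atan(r_l / d_l))       # assumed integration angle
beta = 2.0 * math.degrees(math.atan((r_s + r_l) / d_l))  # assumed field-of-view angle

print(round(r_b, 1), round(r_b / pixel, 1), round(alpha, 1), round(beta, 1))
# prints: 34.5 5.1 2.7 14.1
```

Under these assumptions the results agree with the rounded values given above (rb ≈ 35 mm or about 5 pixels, α ≈ 2.7°, β ≈ 14°).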
Due to the low S/N, however, our sensor cannot reconstruct low PSF intensities. This leads to a truncation of the captured PSFs (i.e., low PSF intensities are clipped to zero) and consequently to an underestimation of σ. The registration procedure described in section 3 compensates for this with the scaling factor s (s >1) to estimate correct object distances dp.
Figure 10(b) and Eq. (12) present the direct relationship between the defocus of an object’s back-lit shadow, cast through a light source of radius rl at distance dl, and the same object’s front-lit image captured through an aperture of radius ra at distance da:
In the case of a pin-hole camera model with non-infinitely small aperture, the depth-of-field increases with decreasing aperture radius ra and increasing distance da:
The field-of-view angle increases with increasing sensor radius rs and aperture radius ra, and decreasing aperture distance da:
From Eqs. (9) and (10) as well as Eqs. (13) and (14), it becomes clear that field-of-view and depth-of-field in both back- and front-lit defocus cases are identical if rl = ra and dl = da. The light source has the same function as the aperture – it constrains the light rays that reach the LC surface. Only the relation between object distance and defocus reverses (Eq. (12)). While shadow-defocus decreases as the distance between the object and LC decreases, image-defocus decreases with an increasing distance to the aperture. Consequently, object points must be above the clipping distance dc to be fully imaged (Eq. (11)).
A directional restriction of light that is being integrated at every point on the LC is essential for efficient image reconstruction. Without optical constraint (e.g., a finite aperture or light source), α extends to 180° and each LC point integrates light over a hemispherical solid angle. The resulting depth-of-field would be extremely shallow, so that adequately focussed images could be reconstructed only for objects that are extremely close to the LC surface. Increasing the depth-of-field by decreasing α extends the applicable focus (depth) range of our sensor, but reduces light-efficiency if apertures are applied.
An alternative to a pin-hole aperture that retains the thin-film characteristics of our sensor is to apply an additional light directing layer of spaced, two-dimensional microlouvers directly in front of the LC layer, as shown in Fig. 10(c). In this case, the depth-of-field increases (the integration angle α decreases) with a decreasing spacing sm and with increasing height hm of the microlouvers:
The field-of-view angle β is independent of the sensor size and increases with increasing spacing and decreasing height of the microlouvers (the clipping distance again follows from Eq. (11)). The following relations exist between field-of-view angle β and integration angle α:
To achieve the same depth-of-field (with integration angle α = 2.7°) with microlouvers as in our shadow-defocus experiments, the field-of-view angle would be β = 5.4° while the clipping distance rises to dc = 1146 mm (for our 108 × 108 mm sensor). This can be achieved with a microlouver ratio of sm/hm = 0.047, for instance, with hm = 300 μm (which matches the thickness of the LC film) and sm = 14 μm (which is far above the diffraction limit). Note that the light absorbed by the microlouvers leads to a reduced light efficiency of the sensor, which would require LSCs with a higher S/N.
5. Optical design
Figure 11 outlines our new optical design and the light multiplexing capability of our sensor. The luminescent concentrators used for our experiments were Bayer Makrofol® LISA Green LC films in two sizes: 108 mm × 108 mm × 300 μm and 216 mm × 216 mm × 300 μm. In both cases, the aperture triangles that were cut into the film with a GraphRobo Graphtec cutting plotter were 6.25 mm wide, 3.25 mm high, and span aperture openings of 500 μm. These dimensions were found by constrained optimization, as explained in [4]. The optical fibers were Jiangxi Daishing POF Co., Ltd polymethylmethacrylate (PMMA) step-index multi-mode fibers with a numerical aperture of 0.5, which equals an acceptance cone of 60°. To improve the coupling of light into the optical fibers, a 40 μm thick layer of carbon paper has been placed between the LC edges and the fiber ribbons. It diffuses light in a wider range that better covers the optical fibers’ acceptance cones, but also decreases light intensity. To maximize the light yield of our 300 μm thick LC film, we applied the best-matching commercially available PMMA fibers with a diameter of 250 μm. Since the size of our LSCs’ photosensors is 125 μm, each fiber was sampled by two photosensors. The LC edges and the ends of the fiber ribbons were protected with opaque, flexible acrylic covers. The line scan cameras were two CMOS Sensor Inc. M106-A4-R1 CIS (contact image sensor) modules with 1728 elements on 216 mm together with two programmable USB controllers (USB-Board-M106A4 of Spectronic Devices Ltd). The CIS modules record 10-bit gray scales with low S/N (we measured a 20-log ratio as low as 20 dB). For calibration, we used a Samsung SP-M250S LCD projector. Real-time image reconstruction was implemented using the Compute Unified Device Architecture (CUDA) language and an NVIDIA GTX285 graphics processing unit (GPU).
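The fiber and sensor dimensions quoted above can be verified with a short calculation. It assumes that the acceptance cone is the full cone angle in air, 2·arcsin(NA), and that the 1728 photosensors are spaced evenly over 216 mm; variable names are chosen for illustration only.

```python
import math

numerical_aperture = 0.5
# Full acceptance cone in air (assumption: cone angle = 2 * asin(NA)).
acceptance_cone_deg = 2.0 * math.degrees(math.asin(numerical_aperture))
# Photosensor pitch: 1728 elements spread evenly over 216 mm, in micrometers.
sensor_pitch_um = 216.0 / 1728 * 1000.0
# Number of photosensors that sample one 250 um fiber.
sensors_per_fiber = 250.0 / sensor_pitch_um

print(round(acceptance_cone_deg, 1), sensor_pitch_um, sensors_per_fiber)
# prints: 60.0 125.0 2.0
```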
We have presented a transparent, lensless thin-film image sensor that is capable of changing focus without additional optics. It allows reconstructing an entire focal stack after only one recording. The focal stack can be applied to estimate depth from defocus. By stacking multiple film sensors, it enables a variety of information, such as color, dynamic range, spatial resolution, and defocus, to be sampled simultaneously. Given that our film is flexible and low-cost, it can be leveraged as a “smart skin” for autonomous robots or other devices and objects to sense their interactions with environments and humans in richer ways. Several limitations that are related to the current implementation of our sensor and experiments have to be discussed.
As explained earlier, the strongest limitation of our sensor is set by the signal-to-noise ratio of the applied LSCs. They are not capable of measuring small differences in radiance or very low or very high intensities. As shown by the simulation results presented in section 3, this restricts the spatial and depth resolutions of the reconstructed focal stacks and depth values. Enhanced CIS cameras and advances in photosensor technology will relax these constraints.
The reconstruction quality and performance of focal stacks is also limited by the image reconstruction method applied, and by both the number and length of exposure times required for adequate measurements. Both can be enhanced by considering more advanced image reconstruction techniques that are optimized for sparse and noisy data. Besides classical tomographic image reconstruction methods, such as SART, new approaches based on dictionary learning [14] could be suitable alternatives.
For focal stack reconstruction and depth estimation, we assume a Gaussian PSF. For our shadow-defocus experiments, the shape of the PSF, however, depends on the shape of the light source. We could include the measurement of the PSF shape in our calibration, as done for the baseline light level. For aperture-based imaging alternatives, as discussed in section 4, the corresponding PSF would be an intrinsic and constant property of the sensor.
The depth reconstruction algorithm presented in section 3 is limited to the reconstruction of shadows in our experiments. The applied descending variance-gradient measure is robust for images that tend to be binary if perfectly focused (causing a maximal variance of 0.25), but fails for arbitrary grayscale images. In case of arbitrary images, alternative blur or ringing metrics, as commonly applied for blind-deconvolution, could be used. They, however, usually require higher S/N and image resolution. Many depth-from-defocus methods (like ours) apply a focal stack for depth estimation. Alternative algorithms that estimate depth from a single defocussed image suffer from blur/sharp edge ambiguity (i.e., blurred in-focus edges and sharp out-of-focus edges are indistinguishable).
The numerical stability of image reconstruction by solving the equation system l = Tσp + e for p decreases with increasing image resolution and increasing image-defocus σ. Figure 12 plots the condition number κ for simulated transport matrices Tσ computed for an increasing σ and various reconstruction resolutions. The simulation avoids the influence of measurement noise. For a κ close to 1, the equation system is well-conditioned and an accurate solution can be determined. A very large κ indicates an ill-conditioned equation system that is prone to large numerical errors (Tσ is almost singular). For an infinitely high κ, Tσ is not invertible and the equation system is not solvable. As illustrated in Fig. 12, our sensor is limited by the amount of defocus that can reliably be compensated or diminished, as the numerical stability decreases mainly with increasing σ (only slightly with increasing resolution). In our experiments, we could compensate defocus with σ-values of up to 2.0 before the low S/N and resolution of the sensor cause image reconstruction to fail entirely. With a higher S/N and resolution, we expect that larger defocus levels can be compensated. However, we also expect an upper bound beyond which very large defocus cannot be handled due to numerical instabilities.
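The trend described here can be reproduced qualitatively with a small simulation. The sketch below assumes an idealized sensor whose T0 is the identity, so that Tσ reduces to a 2D Gaussian blur matrix built as a Kronecker product of a row-normalized 1D blur matrix with itself; this simplification and all names are assumptions of the sketch and do not correspond to the simulated matrices behind Fig. 12.

```python
import numpy as np

def blur_matrix_1d(n, sigma):
    """Row-normalized 1D Gaussian blur matrix of size n x n."""
    idx = np.arange(n, dtype=float)
    B = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)
    return B / B.sum(axis=1, keepdims=True)

def simulated_T(n, sigma):
    """Transport matrix of an idealized n x n sensor (T0 assumed to be the identity)."""
    B = blur_matrix_1d(n, sigma)
    return np.kron(B, B)

def condition_numbers(n, sigmas):
    """Condition number kappa of the simulated matrix for each sigma."""
    return [float(np.linalg.cond(simulated_T(n, s))) for s in sigmas]

for sigma, kappa in zip((0.5, 1.0, 1.5), condition_numbers(8, (0.5, 1.0, 1.5))):
    print(sigma, kappa)
```

Even in this idealized setting κ grows rapidly with σ, which illustrates why only moderate defocus can be compensated reliably.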
We thank Robert Koeppe of isiQiri interface technologies GmbH for fruitful discussions and for providing LC samples. This work was supported by Microsoft Research under contract number 2012-030(DP874903) – LumiConSense.
References and links
1. H. C. Ko, M. P. Stoykovich, J. Song, V. Malyarchuk, W. M. Choi, C. J. Yu, J. B. Geddes III, J. Xiao, S. Wang, Y. Huang, and J. A. Rogers, “A hemispherical electronic eye camera based on compressible silicon optoelectronics,” Nature 454(7205), 748–753 (2008).
2. Y. M. Song, Y. Xie, V. Malyarchuk, J. Xiao, I. Jung, K. J. Choi, Z. Liu, H. Park, C. Lu, R. Kim, R. Li, K. B. Crozier, Y. Hung, and J. A. Rogers, “Digital cameras with designs inspired by the arthropod eye,” Nature 497(7447), 95–99 (2013).
3. A. F. Abouraddy, O. Shapira, M. Bayindir, J. Arnold, F. Sorin, D. S. Hinczewski, J. D. Joannopoulos, and Y. Fink, “Large-scale optical-field measurements with geometric fibre constructs,” Nat. Mater. 5(7), 532–536 (2006).
4. A. Koppelhuber and O. Bimber, “Towards a transparent, flexible, scalable and disposable image sensor using thin-film luminescent concentrators,” Opt. Express 21, 4796–4810 (2013).
5. T. N. Ng, W. S. Wong, M. L. Chabinyc, S. Sambandan, and R. A. Street, “Flexible image sensor array with bulk heterojunction organic photodiode,” Appl. Phys. Lett. 92(21), 213303 (2008).
6. D. L. Donoho, “Compressed sensing,” IEEE T. Inform. Theory 52(4), 1289–1306 (2006).
8. J. S. Batchelder, A. H. Zewail, and T. Cole, “Luminescent solar concentrators. 1: theory of operation and techniques for performance evaluation,” Appl. Optics 18(18), 3090–3110 (1979).
9. A. H. Andersen and A. C. Kak, “Simultaneous algebraic reconstruction technique (SART): a superior implementation of the ART algorithm,” Ultrasonic Imaging 6(1), 81–94 (1984).
10. A. Pentland, “A new sense for depth of field,” IEEE T. Pattern Anal. 9, 523–531 (1987).
11. S. Chaudhuri and A. N. Rajagopalan, Depth From Defocus: A Real Aperture Imaging Approach (Springer Verlag, 1999).
12. M. Rosenblatt, “Remarks on some nonparametric estimates of a density function,” Ann. Math. Stat. 27, 832–837 (1956).
13. E. Parzen, “On estimation of a probability density function and mode,” Ann. Math. Stat. 33, 1065 (1962).
14. I. Tosic and P. Frossard, “Dictionary learning,” IEEE Signal Proc. Mag. 28, 27–38 (2011).