We present a thin-film sensor that optically measures the Radon transform of an image focussed onto it. Measuring and classifying directly in Radon space, rather than in image space, is fast and yields robust and high classification rates. We explain how the number of integral measurements required for a given classification task can be reduced by several orders of magnitude. Our experiments achieve classification rates of 98%–99% for complex hand gesture and motion detection tasks with as few as 10 photosensors. Our findings have the potential to stimulate further research towards a new generation of application-oriented classification sensors for use in areas such as biometry, security, diagnostics, surface inspection, and human-computer interfaces.
© 2015 Optical Society of America
The Radon transform (RT) [1] is a fundamental tool with applications in many areas of image processing, such as tomographic reconstruction [2, 3], pattern recognition [4–6], shape detection [7, 8], texture classification [9, 10], pose estimation [11], feature description [12, 13], radar imaging [14, 15], geophysical imaging [16], medical imaging [17–19], and nondestructive testing [20]. It concentrates the information of an image into a few high-valued coefficients that allow detection of linear singularities while boosting low frequencies and being robust to noise. Not only linear, but also quasi-linear and non-linear patterns can be exploited effectively in Radon space. For an image of resolution N × N, the complexity of the RT is O(N^3), but it can be decreased to O(N^2 log N) by means of the projection slice theorem.
The 2D Radon transform Rf of a continuous function f = f (x,y) is defined by line integrals (Fig. 1(e))
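For reference, one common way of writing this definition is sketched here. The notation is an assumption and may differ from the original Eq. (1): ϕ is the projection angle, s is the signed projection position (the coordinate that this text calls x when it is paired with ϕ), and t runs along the integration line.

```latex
Rf(\phi, s) = \int_{-\infty}^{\infty} f\bigl(s\cos\phi - t\sin\phi,\; s\sin\phi + t\cos\phi\bigr)\,\mathrm{d}t
```

In this classical form, Rf(ϕ + 180°, −s) = Rf(ϕ, s), so sampling ϕ over a 180° range is sufficient.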
2. Optical Radon transform
Our sensor (Figs. 1(a) and 2(a)) consists of a thin-film luminescent concentrator [22, 23] (LC) – a transparent and flexible polycarbonate foil doped with fluorescent dyes – that serves as a two-dimensional multimode light guide. A particular wavelength sub-band that penetrates the film is absorbed and re-emitted at longer wavelengths, while wavelengths outside the sub-band are fully transmitted. Most of the emitted light is trapped inside the film by total internal reflection (TIR), and is transported with reduced multi-scattering towards the edges while losing energy over transport distance. The edges are cut to form triangular apertures (Figs. 1(b) and 1(c)) that multiplex the integral light-transport into a two-dimensional light field which encodes the fractions of light transported from a particular direction to a particular position on the edges. This light field has been used for image reconstruction [24, 26] and depth estimation [25].
In this work, we exploit the fact that the light field equates to a variant of the Radon transform of the image focussed on the sensing surface. It can be directly measured by forwarding it from the edges through ribbons of optical fibers (Figs. 1(a)–(c)) to the photosensors of line scan cameras (LSC).
With given aperture geometry (Fig. 1(d))
The LC material itself introduces various sources of light loss that affect the measured Radon transform: (1) Matrix absorption, governed by the material-specific absorption coefficient μ, attenuates the transported light exponentially with increasing travel distance d. According to the Beer-Lambert law, the matrix absorption factor is e^(−μd). (2) Self-absorption occurs due to overlapping absorption and emission spectra of the fluorescent dye. Light emitted by a fluorescent particle (first-generation photons) is absorbed and re-emitted by other particles. These second-generation photons are subject to the same losses as the first-generation photons. The probability of self-absorption depends on the distance the light travels through the LC, the density of fluorescent particles, and how much the absorption and emission spectra overlap. Self-absorption depends on (3) cone loss, which denotes the amount of emitted light that escapes at the surface of the LC when the critical angle for TIR is not exceeded.
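The following minimal sketch puts numbers on two of these loss terms. It is illustrative only: the absorption coefficient `MU_ASSUMED` is a hypothetical value (the text does not state μ), and `textbook_cone_loss` uses the standard textbook expression for isotropic emission in a slab surrounded by air, which is not necessarily the exact form used for the sensor model.

```python
import math


def matrix_absorption_factor(mu_per_mm: float, distance_mm: float) -> float:
    """Beer-Lambert transmission factor exp(-mu * d)."""
    return math.exp(-mu_per_mm * distance_mm)


def textbook_cone_loss(refractive_index: float) -> float:
    """Fraction of isotropically emitted light that escapes through both
    faces of a slab in air: 1 - sqrt(1 - 1/n^2) (textbook expression)."""
    n = refractive_index
    return 1.0 - math.sqrt(1.0 - 1.0 / (n * n))


MU_ASSUMED = 0.005  # 1/mm, hypothetical value chosen only for illustration

for d in (0.0, 50.0, 216.0):  # travel distances in mm
    factor = matrix_absorption_factor(MU_ASSUMED, d)
    print(f"d = {d:6.1f} mm -> matrix absorption factor {factor:.3f}")

# Refractive index n = 1.58 is the value quoted for the prototype foil.
print(f"textbook cone loss for n = 1.58: {textbook_cone_loss(1.58):.3f}")
```

Because the factor depends on the travel distance d, light from the same image point arrives with different strength at opposite edges of the film.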
Cone loss (lc) and self-absorption (ls) are given by
Considering these differences leads to an extended Radon transform that describes the measurements of our sensor:
Note that, due to the non-linear matrix absorption, integration is no longer symmetric about the integration angle ϕ, which implies that 0° ≤ ϕj ≤ 359° in our case. Furthermore, the apertures’ FOV α limits the integration range (Fig. 1(e)). Depending on ϕ, light is blocked at various projection positions x if it approaches an aperture outside its FOV. The sinograms shown in Fig. 1(f) illustrate the difference between classical Radon transform (Eq. (1)) and our sensor measurements (Eq. (5)). Note that ϕ has been phase-shifted by 45° to cluster integrals measured at the four edges of the sensor.
3. Sensor prototype
We experimented with a square sensor prototype (216mm×216mm) made of Bayer Makrofol® LISA Green LC foil (thickness: 300μm, refractive index: n=1.58). The triangular structure was cut out with a cutting plotter and filled with light-blocking plasticine. While a narrow aperture opening a is desirable to reduce the integration area β, its size is constrained by the precision of the cutting plotter and the flexibility of the LC host material, which might break during extreme flexing if a is too small. For our prototype, we used 500μm wide apertures (triangle base width: 6.75mm, triangle height: 3.25mm), which results in 32 aperture triangles per edge. The light integrals are transported to four contact image sensor (CIS) line scan cameras (CMOS, each with 1728 photosensors measuring 125μm × 125μm behind an array of rod lenses) by ribbons of polymethylmethacrylate (PMMA) optical fibers (Jiangxi Daishing POF Co. C250, diameter: 250μm). The ribbons consist of step-index multi-mode optical fibers with a numerical aperture of 0.5, which equals an acceptance cone of 60°. To improve the coupling of light into the optical fibers, a 40μm thick diffusor layer was placed between the LC edges and the fiber ribbons. The CIS modules are controlled by a programmable USB controller (Spectronic Devices Ltd.). A total number of 54 integrals per aperture triangle were measurable, yielding a constant directional sampling resolution of 4 × 54 = 216 samples over ϕ = 1..360° and a projection resolution of 32 samples in x.
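The prototype figures quoted above are mutually consistent, as the short check below illustrates. It only re-derives numbers stated in the text. The one added assumption is that the acceptance cone is computed for light entering the fiber from air (n = 1), so that the half-angle is asin(NA).

```python
import math

EDGE_MM = 216.0              # edge length of the square film
TRIANGLE_BASE_MM = 6.75      # base width of one aperture triangle
INTEGRALS_PER_TRIANGLE = 54  # measurable integrals per triangle
EDGES = 4
NUMERICAL_APERTURE = 0.5     # of the PMMA fibers

triangles_per_edge = round(EDGE_MM / TRIANGLE_BASE_MM)
sensors_per_edge = triangles_per_edge * INTEGRALS_PER_TRIANGLE
total_coefficients = EDGES * sensors_per_edge
directional_samples = EDGES * INTEGRALS_PER_TRIANGLE

# Full acceptance cone = 2 * asin(NA), assuming coupling from air (n = 1).
acceptance_cone_deg = 2.0 * math.degrees(math.asin(NUMERICAL_APERTURE))

print("triangles per edge:      ", triangles_per_edge)
print("photosensors per edge:   ", sensors_per_edge)
print("total Radon coefficients:", total_coefficients)
print("directional samples:     ", directional_samples)
print("acceptance cone (deg):   ", round(acceptance_cone_deg, 1))
```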
4. Classification in Radon space
For classification, we use a random forest [27], which is a set of different decision trees [28]. It is computed from training observations of classes to be distinguished, where each observation consists of multiple variables. In our case, the observations are measured Radon transforms, and the variables are the Radon coefficients. To achieve a required degree of invariance, multiple varying observations are measured and trained for each class.
Figure 3 illustrates an experiment that shows 22 classes of different hand gestures being cast as shadows onto the sensor. The recorded Radon transforms contain a total of 6912 coefficients (i.e., the number of photosensors used by our sensor prototype). Each hand gesture is recorded in 300 different pose variations. Thus, our training set consists of 22 × 300 = 6600 Radon transforms. In addition, we recorded an independent test set of 22 × 110 = 2420 Radon transforms.
Computing a random forest from the training set means deriving a number of binary decision trees. Each tree is built by randomly selecting a coordinate ϕ,x per tree node, and determining a threshold within the coefficient range at ϕ,x that splits all classes into two groups: classes with ϕ,x-coefficients above and below the threshold. These two groups define the two branches at the node. Repeating this recursively until single-class groups can no longer be split leads to one binary decision tree. Computing multiple such trees results in a random forest. Due to the random variable selection at all nodes, each tree in the forest is different. The leaf nodes of the trees are associated with the corresponding leaf classes. Thus, traversing a decision tree with a new Radon transform of the test set yields a classification decision. Traversing all trees in the forest with the same Radon transform might lead to different decisions. The class with the highest decision count over all trees is chosen. Details on splitting decisions, tree depths, forest sizes, etc. can be found in the literature [27–30].
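The procedure described above can be sketched in a few lines of plain Python. This is a simplified illustration and not the implementation used for the experiments: thresholds are drawn uniformly at random within the coefficient range (the literature cited above discusses better split criteria), splits are applied to individual observations, each Radon transform is assumed to be flattened into a list so that one index stands for one (ϕ, x) coordinate, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def build_tree(samples, labels, rng, max_tries=50):
    """Grow one randomized binary decision tree.

    samples: flattened Radon transforms (equal-length lists of floats)
    labels:  integer class ids, one per sample
    Returns an int (leaf class) or a tuple (index, threshold, left, right).
    """
    if len(set(labels)) == 1:
        return labels[0]  # single-class group: leaf node
    for _ in range(max_tries):
        idx = rng.randrange(len(samples[0]))  # random (phi, x) coordinate
        values = [s[idx] for s in samples]
        lo, hi = min(values), max(values)
        if lo == hi:
            continue  # this coefficient cannot separate the group
        thr = rng.uniform(lo, hi)  # threshold within the coefficient range
        left = [i for i, v in enumerate(values) if v <= thr]
        right = [i for i, v in enumerate(values) if v > thr]
        if left and right:
            return (
                idx,
                thr,
                build_tree([samples[i] for i in left],
                           [labels[i] for i in left], rng, max_tries),
                build_tree([samples[i] for i in right],
                           [labels[i] for i in right], rng, max_tries),
            )
    # No usable split found: fall back to the most frequent class.
    return Counter(labels).most_common(1)[0][0]


def classify_with_tree(tree, sample):
    """Traverse one tree from the root to a leaf class."""
    node = tree
    while isinstance(node, tuple):
        idx, thr, left, right = node
        node = left if sample[idx] <= thr else right
    return node


def build_forest(samples, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    return [build_tree(samples, labels, rng) for _ in range(n_trees)]


def classify(forest, sample):
    """Majority vote over the decisions of all trees."""
    votes = Counter(classify_with_tree(tree, sample) for tree in forest)
    return votes.most_common(1)[0][0]


# Made-up toy data: three coefficients per "transform", two classes.
train_x = [[0.1, 0.2, 0.9], [0.2, 0.1, 0.8], [0.8, 0.9, 0.1], [0.9, 0.8, 0.2]]
train_y = [0, 0, 1, 1]
forest = build_forest(train_x, train_y)
predictions = [classify(forest, x) for x in train_x]
print(predictions)
```

Because every tree is grown until its groups contain a single class, the training observations themselves are reproduced exactly. The classification rate that matters is the one measured on an independent test set.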
The advantages of a random forest compared to other classifiers are that it is fast, robust against overfitting, and does not require parameter optimization. In particular, when classification is carried out in Radon space, high classification rates are achieved. Training a random forest with the training set from the experiment shown in Fig. 3 leads to a classification rate of 99.59% for the test set if full Radon transforms are applied.
Figure 4 presents the results of a motion detection experiment. Here, 12 different classes of hand movements (horizontal, vertical, diagonal, upwards/downwards movements, and rotations) in 30 different pose and speed combinations can be classified correctly in 99.39% of 390 test cases (test set different from training set) if full Radon transforms are applied. Age-weighted sensor integrals (i.e., multiple photosensor values) measured, weighted and added over a constant detection time of 200ms are used as input for training and classification of movements, as explained in . This is equivalent to multiple Radon transforms integrated over the detection time.
Note that the pose variations were limited in our experiments. For hand gesture recognition, the rotation of the hand was kept within a range of approximately ±10°, and the hand was kept at a distance of no more than 20cm above the sensor. Relaxing these limitations (e.g., full rotation invariance) requires a larger amount of training data. Errors in the training set (e.g., contradicting class labels) and strongly varying lighting conditions may also lead to lower classification rates.
5. Dimensionality reduction
The number of Radon coefficients required to achieve high classification rates can, however, be decreased by several orders of magnitude through dimensionality reduction (Fig. 2(b)). This is achieved by iteratively building a list of the most important coefficients (or photosensors): in each iteration, all previously selected coefficients are combined with each of the remaining coefficients in turn, and the combination with the highest classification rate is kept. Training M one-coefficient random forests for all M coefficients in a first iteration allows selecting the coefficient that leads to the highest classification rate. In a second iteration, this coefficient is combined with each of the M−1 remaining coefficients to train M−1 two-coefficient random forests. The coefficient pair that leads to the highest classification rate is selected again. Repeating this over I iterations leads to a ranked list of I coefficients (each corresponding to one photosensor) that are most important for the classification task.
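The iterative selection described above is a greedy forward selection, which can be sketched as follows. The sketch is illustrative: `score` is a placeholder for "train a random forest on these coefficients and return its classification rate", and `toy_score` with its made-up weights is a hypothetical stand-in so that the example runs without sensor data.

```python
def greedy_forward_selection(n_coefficients, score, n_select):
    """Rank coefficients by greedy forward selection.

    score(indices) must return the classification rate obtained when a
    classifier is trained and tested on only the listed coefficient indices.
    Returns the ranked indices and the rate reached after each iteration.
    """
    selected = []
    remaining = set(range(n_coefficients))
    history = []
    for _ in range(min(n_select, n_coefficients)):
        best_idx, best_rate = None, float("-inf")
        for idx in sorted(remaining):  # sorted for deterministic tie-breaking
            rate = score(selected + [idx])
            if rate > best_rate:
                best_idx, best_rate = idx, rate
        selected.append(best_idx)
        remaining.remove(best_idx)
        history.append(best_rate)
    return selected, history


def n_forests_trained(m, iterations):
    """Number of classifiers trained: M + (M - 1) + ... + (M - I + 1)."""
    return iterations * m - iterations * (iterations - 1) // 2


# Hypothetical stand-in for the real train-and-test step.
TOY_WEIGHTS = [0.1, 0.3, 0.2, 0.4]


def toy_score(indices):
    return sum(TOY_WEIGHTS[i] for i in indices)


ranking, rates = greedy_forward_selection(len(TOY_WEIGHTS), toy_score, n_select=2)
print(ranking, rates)
print(n_forests_trained(6912, 10))
```

For the prototype's M = 6912 coefficients and I = 10 iterations, this scheme trains 69,075 small random forests, which is why the selection is done once, offline, for a given classification task.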
Determining the optimal sensor configuration for a given classification task (i.e., computing the photosensors necessary and their positions on the LC edges) requires input measurements of a high (spatial and directional) resolution sensor for dimensionality reduction. To support this, we simulate the training measurements of such a sensor. Based on the dimensionality reduction results, a physical sensor can be built.
Note that our physical prototype always takes high-resolution measurements. This is necessary for comparisons with results before dimensionality reduction and for the same sensor to be usable in multiple classification experiments. However, we use only the pre-selected photosensors after dimensionality reduction for computing the results of each classification experiment.
Our experiments revealed that, even for difficult classification tasks, the number of required integral samples can be reduced by about two to three orders of magnitude. Figures 3 and 4 illustrate the top-ten photosensors in Radon space and the sensor’s light-field space. The plots show the increase in classification rate of our sensor relative to the number of selected highest-ranked photosensors. In the two experiments, respective classification rates of 97% and 93% were achieved for 5 photosensors and 99% and 98% for 10 photosensors.
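As a quick check of the stated reduction against the prototype's 6912 coefficients (a worked calculation added for illustration, using only numbers given in the text):

```latex
\log_{10}\frac{6912}{10} \approx 2.84, \qquad \log_{10}\frac{6912}{5} \approx 3.14
```

Keeping 10 or 5 photosensors therefore corresponds to a reduction of roughly three orders of magnitude.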
6. Sampling analysis
Equations (1) and (5) compare the classical Radon transform with the measurements of our sensor. Integral sampling of our sensor depends on the choice of design parameters (Fig. 1): the triangular aperture height (h), width (w), and opening (a), which influence its FOV (α) and integration area (β); and the number of triangular apertures per edge and the number of photosensors per aperture, which influence the sampling density of the Radon transform in the x and ϕ domains. Below, we discuss the sampling limitations of our current prototype and how the sensor design can be changed to improve sampling.
Figure 5 illustrates three examples of images (Siemens star, sinusoidal zone plate, and finger print) with spatial image resolutions that are too high to be resolved with our prototype. The bottom row shows original images and corresponding Radon transforms.
The top row in Fig. 5 presents the full Radon transforms (RT) integrated by our sensor (32 triangular apertures per edge, 54 photosensor measurements per triangle, w = 6.75mm, h = 3.25mm, and a = 500μm), and the results of the inverse Radon transform (IRT) computed with filtered back projection. The fundamental differences between our measurements and the classical RT were described in section 2. Clearly, high-frequency integrals are lost in our measurements for several reasons: (1) area integration over β rather than line integration; (2) exponential light loss of the LC material rather than an unattenuated integral measure; (3) limited FOV α that restricts directional sampling; and (4) lower sampling rates in x and ϕ that are constrained by the number of aperture triangles and the number of photosensors per triangle. Since high-frequency integrals are not covered in our RT, IRT cannot reconstruct high-frequency spatial image details.
The second row in Fig. 5 shows a sensor simulation with higher spatial and directional sampling rates, and with a narrow integration area (128 triangular apertures per edge, 128 photosensor measurements per triangle, w = 1.68mm, h = 0.85mm, and a = 10μm). These parameters are easy to achieve with a manufacturing process that is more advanced than that of our prototype. In this case, high-frequency integrals are better covered in the RT, while spatial images are better reconstructed in the IRT.
The third row in Fig. 5 illustrates a hypothetical sensor with the same configuration as above (128 triangular apertures per edge, 128 photosensor measurements per triangle, w = 1.68mm, h = 0.84mm) but with line integrals (rather than area integrals) and without light attenuation (rather than the exponential light attenuation due to the LC material). In this case, the resulting coefficients in the RT are nearly identical to those in the ground truth RT (bottom row) with the exception of the blocked-out regions that are due to the small FOV α of 90°, as explained in section 2, which limits the integration range.
When the FOV α is, additionally, increased to 160°, as shown in the fourth row of Fig. 5, the RT of our sensor simulations is nearly identical to the ground truth RT (bottom row). Remaining differences are due to interpolation as part of our simulation.
The simulations presented in Fig. 5 illustrate that our sensor can be improved by increasing the number of photosensors per triangle (leading to a higher sampling rate in ϕ), by increasing the number of aperture triangles (leading to a higher sampling rate in x), and by applying wider but lower aperture triangles (leading to a larger FOV α). In theory, the number of projections in x must at least equal the maximum spatial resolution of the input image, while the number of directions in ϕ must be equal to or greater than the number of projections in x times π/2. Most important, however, is the reduction of the integration area β. Compared to a line integral, the integration over a wider area corresponds to a low-pass filter over x that drastically reduces high-frequency integral information in the RT. Hence, the aperture opening a must be made as narrow as manufacturing tolerance and the highest acceptable exposure time of the photosensors used allow.
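The directional sampling criterion stated above can be checked numerically for the configurations discussed in this section. The sketch below is illustrative: the helper name `min_directions` is hypothetical, and the directional count of the simulated sensor (4 × 128) is an assumption that follows the same "four edges times photosensors per triangle" pattern as the prototype's 4 × 54 = 216.

```python
import math


def min_directions(n_projections: int) -> int:
    """Smallest integer number of directions >= n_projections * pi / 2."""
    return math.ceil(n_projections * math.pi / 2.0)


configurations = (
    ("prototype", 32, 4 * 54),          # 32 projections, 216 directions
    ("simulated sensor", 128, 4 * 128), # assumed 4 * 128 = 512 directions
)

for name, n_proj, n_dirs in configurations:
    needed = min_directions(n_proj)
    status = "met" if n_dirs >= needed else "not met"
    print(f"{name}: {n_proj} projections need >= {needed} directions, "
          f"{n_dirs} available ({status})")
```

Meeting this criterion only addresses the ratio between the two sampling rates. The number of projections in x still bounds the spatial image resolution that can be resolved.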
As explained in section 3, the smallest possible aperture opening of our prototype was limited to a = 500μm due to manufacturing constraints. The dimensions of the triangular apertures (w = 6.75mm, h = 3.25mm) were chosen such that their FOV α is at least 90° (96° in our case), which is the minimum FOV to ensure a full 360° sampling in ϕ over all four edges (i.e., w ≥ 2h).
Note that the hand gesture and motion classification experiments presented above require no high-frequency RT coefficients. However, for classification tasks that heavily depend on fine spatial image details and their corresponding high-frequency integrals (such as finger print detection), the sampling capabilities of our current prototype must be improved.
7. Summary and future work
Designing and manufacturing task-optimized classification sensors that record a minimal number of samples leads to simple devices with low power consumption and fast read-out times. Both depend on the number of photosensor measurements. For CMOS sensors, the read-out time is proportional to the number of photosensors and inversely proportional to the sensor’s clock frequency, while the power consumption of the analog signal chain is proportional to both the number of photosensors and the clock frequency [31].
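Written compactly, with N the number of photosensors and f_clk the clock frequency (the symbols are introduced here for illustration and are not taken from the original text), these proportionalities read:

```latex
t_{\mathrm{read}} \propto \frac{N}{f_{\mathrm{clk}}}, \qquad P_{\mathrm{analog}} \propto N \, f_{\mathrm{clk}}
```

Under this idealized model, reading out 10 instead of 6912 photosensors at an unchanged clock frequency would scale both the read-out time and the analog power by 10/6912, i.e., by a factor of about 1/691.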
Measuring and classifying directly in Radon space yields robust and high classification rates. We believe that our findings will stimulate further research towards a new generation of application-oriented classification sensors for use in areas such as biometry, security, diagnostics, surface inspection, and human-computer interfaces.
The relatively wide FOV of our apertures, which is currently determined by manufacturing constraints of our prototype, is the main limitation that needs to be addressed. We are also interested in investigating constraints on sensor shape in the course of reducing physical size in order to create application-specific classification sensors with minimal area and specific aspect ratio or contours. An additional thin-film layer of a 2D microlens array on top of the LC film enables the detection of front-lit objects, while arrays of 1D microlenses instead of our triangular apertures can improve the signal-to-noise ratio (we measured a 20-log ratio as low as 20 dB for our prototype).
We thank Robert Koeppe from isiQiri interface technologies GmbH for fruitful discussions and for providing LC samples. This work was supported by Microsoft Research under contract number 2012-030(DP874903) – LumiConSense.
References and links
1. J. Radon, “On the determination of functions from their integral values along certain manifolds,” IEEE Trans. Medical Imaging 5(4), 170–176 (1986).
2. G. T. Herman, Fundamentals of Computerized Tomography: Image Reconstruction from Projections, 2nd ed. (Springer, 2010).
3. A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging (SIAM, 2001).
4. E. Magli, G. Olmo, and L. L. Presti, “Pattern recognition by means of the Radon transform and the continuous wavelet transform,” Signal Process. 73(3), 277–289 (1999).
5. W. Jian-Da and Y. Siou-Huand, “Driver identification using finger-vein patterns with Radon transform and neural network,” Expert Syst. Appl. 36(3), 5793–5799 (2009).
6. J. S. Seo, J. Haitsma, T. Kalker, and C. D. Yoo, “A robust image fingerprinting system using the Radon transform,” Signal Process.: Image Commun. 19(4), 325–339 (2004).
7. S. Tabbone, L. Wendling, and J.-P. Salmon, “A new shape descriptor defined on the Radon transform,” Comput. Vis. Image Und. 102(1), 42–51 (2006).
8. S. Tabbone and L. Wendling, “Technical symbols recognition using the two-dimensional Radon transform,” in Proc. of the 16th Int. Conf. Pattern Recognition 3, 200–203 (2002).
9. K. Jafari-Khouzani and H. Soltanian-Zadeh, “Rotation-invariant multiresolution texture analysis using Radon and wavelet transforms,” IEEE Trans. Image Process. 14(6), 783–795 (2005).
10. P. Cui, J. Li, Q. Pan, and H. Zhang, “Rotation and scaling invariant texture classification based on Radon transform and multiscale analysis,” Pattern Recognit. Lett. 27(5), 408–413 (2006).
11. M. Singh, M. Mandal, and A. Basu, “Pose recognition using the Radon transform,” in 48th Midwest Symposium on Circuits and Systems 2, 1091–1094 (2005).
12. D. V. Jadhav and R. S. Holambe, “Feature extraction using Radon and wavelet transforms with application to face recognition,” Neurocomputing 72(7), 1951–1959 (2009).
14. D. L. Mensa, S. Halevy, and G. Wade, “Coherent Doppler tomography for microwave imaging,” in Proceedings of the IEEE 71(2), 254–261 (1983).
15. D. C. Munson Jr., J. D. O’Brien, and W. K. Jenkins, “A tomographic formulation of spotlight-mode synthetic aperture radar,” in Proceedings of the IEEE 71(8), 917–925 (1983).
16. G. Nolet, Seismic Tomography: With Applications in Global Seismology and Exploration Geophysics (D. Reidel, 1987).
17. P. Kuchment, The Radon Transform and Medical Imaging (SIAM, 2014).
18. L. Giancardo, F. Meriaudeau, T. P. Karnowski, Y. Li, K. W. Tobin, and E. Chaum, “Microaneurysm detection with Radon transform-based classification on retina images,” in Proceedings Intl. Conf. IEEE Eng. Med. Biol. Soc., 5939–5942 (2011).
19. P. J. Drew, P. Blinder, G. Cauwenberghs, A. Y. Shih, and D. Kleinfeld, “Rapid determination of particle velocity from space-time images using the Radon transform,” J. Comput. Neurosci. 29(1–2), 5–11 (2010).
20. E. H. Lehmann, A. Kaestner, C. Grünzweig, D. Mannes, P. Vontobel, and S. Peetermans, “Materials research and non-destructive testing using neutron tomography methods,” Int. J. Mater Res. 105(7), 664–670 (2014).
21. W. A. Götz and H. J. Druckmüller, “A fast digital radon transform–an efficient means for evaluating the hough transform,” Pattern Recogn. 29(4), 711–718 (1996).
22. J. S. Batchelder, A. H. Zewail, and T. Cole, “Luminescent solar concentrators. 1: Theory of operation and techniques for performance evaluation,” Appl. Optics 18(18), 3090–3110 (1979).
23. J. Roncali and F. Garnier, “Photon-transport properties of luminescent solar concentrators: analysis and optimization,” Appl. Optics 23(16), 2809–2817 (1984).
24. A. Koppelhuber and O. Bimber, “Towards a transparent, flexible, scalable and disposable image sensor using thin-film luminescent concentrators,” Opt. Express 21(4), 4796–4810 (2013).
25. A. Koppelhuber, C. Birklbauer, S. Izadi, and O. Bimber, “A transparent thin-film sensor for multi-focal image reconstruction and depth estimation,” Opt. Express 22(8), 8928–8942 (2014).
26. A. Koppelhuber, S. Fanello, C. Birklbauer, D. Schedl, S. Izadi, and O. Bimber, “Enhanced learning-based imaging with thin-film luminescent concentrators,” Opt. Express 22(24), 29531–29543 (2014).
27. L. Breiman, “Random forests,” Mach. Learn. 45(1), 5–32 (2001).
28. L. Breiman, J. H. Friedman, R. Olshen, and C. Stone, Classification and Regression Trees (Wadsworth, 1984).
29. T. G. Dietterich, “An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, Boosting, and Randomization,” Mach. Learn. 40(2), 139–157 (2000).
30. A. Criminisi and J. Shotton, Decision Forests for Computer Vision and Medical Image Analysis (Springer, 2013).
31. R. LiKamWa, B. Priyantha, M. Philipose, L. Zhong, and P. Bahl, “Energy characterization and optimization of image sensing toward continuous mobile vision,” in Proceedings of the 11th annual international conference on Mobile systems, applications, and services, 69–82 (2013).