We report the first computational super-resolution imaging using a camera array at longwave infrared wavelengths combined with the first demonstration of video-rate multi-camera integral imaging at these wavelengths. We employ an array of synchronized FLIR Lepton cameras to record video-rate data that enables light-field imaging capabilities, such as 3D imaging and recognition of partially obscured objects, while also providing a four-fold increase in effective pixel count. This approach to high-resolution imaging enables a fundamental reduction in the track length and volume of LWIR imaging systems, while also enabling use of low-cost lens materials.
© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
The application of multi-camera and multi-aperture imaging for light-field imaging and computational integral-imaging reconstruction (CIIR) [1–10] provides additional 3D-imaging capabilities such as imaging through obscurants and novel ‘refocusing’ techniques. For 2D imaging, (pixel) super-resolution (SR) of recorded images offers more compact imaging systems with a shorter optical track length, lower weight and lower cost than equivalent imaging systems employing a single aperture [12–14]. This is of particular interest at long-wave infrared (LWIR) wavelengths (8–12 µm), where the correspondingly more compact and thinner optics enable the use of lower-cost materials. Multi-camera imaging, in which each lens is associated with an independent detector array, offers many advantages over the use of a multi-aperture array of lenslets combined with a single detector (as discussed below), but has not previously been demonstrated at thermal-infrared wavelengths. Furthermore, the progress in light-field imaging [7–10] and its combination with SR [16,17] at visible wavelengths has not yet been equaled at thermal-infrared wavelengths.
Here we report the first demonstrations at longwave-infrared (LWIR) wavelengths of: (1) multi-camera SR, (2) snapshot and video-rate CIIR, and (3) simultaneous SR and CIIR. This approach is made practical by the recent availability of low-cost cameras at LWIR wavelengths: hitherto camera arrays in the LWIR have been prohibitively expensive.
The application of SR to multiple low-resolution (LR) aliased images enables construction of 2D images with an increased pixel count and with an increased angular resolution that is limited only by diffraction. The effectiveness of SR depends on diversity in the spatial sampling of the scene by the detector array, so that each LR image is recorded with a distinct spatial sampling offset. An attractive route to super-resolution of a single LR imaging system is to exploit the irregular sampling introduced by video-rate imaging of a scene from a non-stable platform, as is typical for hand-held imaging. Multiple LR images extracted from an image sequence can then be registered and combined, after a time delay, into a single high-resolution image. This approach has the advantage of being relatively low cost, but for many applications the reduction in effective time resolution and the ineffectiveness of SR for dynamic scene components, such as moving people or vegetation, are major disadvantages. In these cases snapshot multi-image capture provides an advantage and can be achieved using either a multi-aperture lenslet array with a single detector array [17,19,20] or an array of independent cameras, where each objective lens is coupled to an independent detector. SR imaging using camera arrays has become particularly attractive in recent years due to the transformative reduction in the cost of cameras developed for the consumer market (in particular for mobile telephones) and provides the fundamental advantage of enabling a reduction in the track length of imaging systems by use of shorter focal-length lenses [12–14]. In particular, the use of arrays of independent cameras also enables CIIR with an arbitrary arrangement of lenses of arbitrary focal length and dimension.
Furthermore, in contrast to monolithic integrated systems, arbitrary baselines and angular resolution can be designed, where the camera apertures and spacing can be much greater than the dimensions of a single detector array, to enable optimization of field of view, imaging through obscurations and depth-estimation sensitivity (similarly to multiple-baseline stereo).
The feasibility of CIIR in the LWIR has recently been demonstrated using multiple images recorded in time sequence from a single camera scanned in the aperture plane. Here we extend this concept to real-time operation using an array of non-identical synchronized cameras to record video sequences from which we have generated video-rate CIIR and also SR. This has involved the development of calibration and synchronization techniques necessary for high-quality, video-rate imaging using mutually calibrated cameras that are subject to independent thermal drifts. We achieve the multi-camera sub-pixel registration accuracy required for effective SR using a multi-pair stereo homographic calibration.
This multi-camera approach also facilitates a reduction in lens cost. Low-cost LWIR cameras achieve reduced cost through the use of small focal-plane arrays (FPA), which in turn require lenses with a short focal length. These lenses are thinner and enable the use of higher-loss materials such as silicon or polyethylene that can also be molded at low cost. For a given pixel count in the final output image, the application of SR with arrays of low-resolution LWIR cameras with short-focal-length, thin lenses thus offers a route to lower total lens cost than the use of a single high-resolution array combined with a single longer-focal-length lens (which typically would be diamond machined from germanium). Figure 1 compares the variation of lens optical transmission with focal length for representative anti-reflection- (AR-) coated germanium and silicon f/1 and f/2 doublet lenses (averaged across the LWIR band using published optical transmission data). The lenses were individually optimized and modelled in Zemax OpticStudio. While the transmission loss for a high-cost f/1 germanium lens is insignificant at f = 10 mm and rises to only 5% at f = 100 mm (in addition to losses due to imperfect AR coating of four surfaces), for a silicon lens the additional transmission losses are 13% at f = 10 mm and an unusable 80% at f = 100 mm.
Super-resolution is effective only when there is significant aliasing of recorded images, as is normally the case for the fast optics required for high sensitivity at LWIR wavelengths. For example, a pixel width of 17 µm is almost twice the size of the diffraction-limited point-spread function produced by an f/1 lens; however, pixel sizes are progressively reducing with time, potentially reducing the possible benefit from SR. In Fig. 2(a) we show calculated system MTFs (averaged across the LWIR band) for several pixel widths for a diffraction-limited, f/1, LWIR imaging system, normalized to the sampling frequency (the reciprocal of the pixel width, assuming a 100% fill factor), and in Fig. 2(b) we show the MTF at the Nyquist frequency as a function of pixel size. It can be seen that aliasing occurs, and hence SR is effective, for pixel sizes down to 5 µm. There appears to be little prospect of introducing LWIR detectors with pixels this small in the foreseeable future, and hence the SR techniques described here are of long-term importance for low-cost imaging at LWIR wavelengths.
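The dependence of aliasing on pixel size can be illustrated numerically. The following is a minimal sketch, not the calculation used for Fig. 2: it assumes an aberration-free circular aperture, a square pixel with 100% fill factor, and a uniform spectral weighting over 8–12 µm. The function names and the number of wavelength samples are illustrative choices.

```python
import numpy as np

def diffraction_mtf(freq, wavelength_mm, f_number):
    """Incoherent MTF of an aberration-free circular aperture."""
    cutoff = 1.0 / (wavelength_mm * f_number)  # optical cutoff, cycles/mm
    x = np.clip(np.asarray(freq, dtype=float) / cutoff, 0.0, 1.0)
    return (2.0 / np.pi) * (np.arccos(x) - x * np.sqrt(1.0 - x**2))

def pixel_mtf(freq, pixel_mm):
    """MTF of a square pixel with 100% fill factor.

    np.sinc(t) is sin(pi t) / (pi t), so this is |sinc(pixel * freq)|.
    """
    return np.abs(np.sinc(pixel_mm * np.asarray(freq, dtype=float)))

def system_mtf_at_nyquist(pixel_um, f_number=1.0, band_um=(8.0, 12.0),
                          n_wavelengths=41):
    """System MTF at the detector Nyquist frequency, averaged over the band."""
    pixel_mm = pixel_um * 1e-3
    nyquist = 1.0 / (2.0 * pixel_mm)  # cycles/mm
    wavelengths_mm = np.linspace(band_um[0], band_um[1], n_wavelengths) * 1e-3
    optics = np.mean([diffraction_mtf(nyquist, w, f_number)
                      for w in wavelengths_mm])
    return float(optics * pixel_mtf(nyquist, pixel_mm))

if __name__ == "__main__":
    for pixel_um in (17.0, 12.0, 8.0, 5.0, 4.0):
        print(pixel_um, system_mtf_at_nyquist(pixel_um))
```

Under these assumptions the MTF at Nyquist is substantial for a 17-µm pixel and falls towards zero as the pixel width approaches λN/2, which is 5 µm for λ = 10 µm and f/1.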
2. Optical characterization and image construction
Our demonstration system employs an array of six synchronized FLIR Lepton LWIR cameras, each of which employs a focal-plane array of 80×60, 17-µm pixels with an f = 3 mm silicon doublet lens, yielding a 25° horizontal field of view with a depth of field from 10 cm to infinity. The cameras are arranged in a 2×3 array with a 27×33 mm cell size as shown in Fig. 3(a), although the technique is applicable to arbitrary configurations of co-aligned cameras. FLIR Lepton cameras can operate at a frame rate of ~27 Hz (that is, approximately video rate), although the devices we employed were restricted to 9 Hz in accordance with US export restrictions.
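The quoted field of view follows from the array size and focal length. The sketch below assumes a distortion-free pinhole model and a pixel pitch equal to the 17-µm pixel width; the function names are illustrative.

```python
import math

def field_of_view_deg(n_pixels, pitch_um, focal_mm):
    """Full field of view of a pinhole camera along one axis, in degrees."""
    half_width_mm = 0.5 * n_pixels * pitch_um * 1e-3
    return math.degrees(2.0 * math.atan(half_width_mm / focal_mm))

def ifov_mrad(pitch_um, focal_mm):
    """Instantaneous field of view of one pixel (small-angle), in mrad."""
    return (pitch_um * 1e-3 / focal_mm) * 1e3

if __name__ == "__main__":
    print("horizontal FOV (deg):", field_of_view_deg(80, 17.0, 3.0))
    print("vertical FOV (deg):", field_of_view_deg(60, 17.0, 3.0))
    print("pixel IFOV (mrad):", ifov_mrad(17.0, 3.0))
```

With 80 pixels of 17 µm and f = 3 mm this gives a horizontal field of view of about 25.5°, consistent with the nominal 25°.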
Each camera is controlled by a dedicated single-board computer (Raspberry Pi 2B) connected via Ethernet to a personal computer. System synchronization is achieved by a combination of broadcasting an Ethernet datagram to initiate video capture from all cameras simultaneously and hardware synchronization of the individual camera clocks at the beginning of every video sequence. Following calibration as described below, the camera array enables multi-functional operation: 3D imaging using CIIR and computational SR to enhance angular resolution.
The scope for image enhancement through computational SR is contingent upon the optics yielding a sufficiently high modulation-transfer function (MTF) at frequencies above the Nyquist frequency of the detector array. Figure 3(b) shows the calculated MTFs of the camera optics, the detector, and the combined system, together with the measured camera spatial-frequency response (SFR). The MTFs are calculated based on the 17-µm pixel size of the FLIR Lepton camera, assuming diffraction-limited, f/1.1 optics, averaged across the LWIR band. The SFR has been calculated using the standard slanted-edge method as an approximation to the MTF, and there is good consistency between the cameras.
The presence of spectral components in the system SFR with significant amplitudes above the Nyquist frequency suggests that the lens is well corrected and that SR can be effective in enhancing spatial resolution up to the optical cutoff frequency, which is about twice the Nyquist frequency. In principle this can yield an increase in the system pixel count (space-bandwidth product) by a factor of four, requiring only four cameras. This is true, however, only for an ideal sampling phase by each detector array (i.e., the sampling phase is uniformly offset by one half pixel); in practice, the phase is pseudo-random in both transverse directions due to imperfect alignment, optical distortion, and parallax, yielding a less-than-optimal increase in spatial resolution. Indeed, this randomization of sampling is in general beneficial since it leads to a consistent image quality with range, whereas for a hypothetical perfect sampling phase for infinite-conjugate imaging, the change in sampling phase with range leads to strong variations in the quality of SR. We have employed six rather than four cameras since this offers an efficient and pragmatic trade between improvement in angular resolution with increasing camera number and the increasing complexity of the camera array.
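The change of sampling phase with range can be made concrete with a short worked example. It assumes ideal pinhole cameras with parallel optical axes, and takes the 27-mm cell dimension as the baseline B between two neighbouring cameras, with f = 3 mm and pixel pitch p = 17 µm. The parallax between the two images of an object at range z, in pixels, is then approximately:

```latex
\Delta(z) = \frac{f\,B}{z\,p}, \qquad
\Delta(0.9\,\mathrm{m}) = \frac{3 \times 27}{900 \times 0.017} \approx 5.29\ \text{pixels}, \qquad
\Delta(1.0\,\mathrm{m}) = \frac{3 \times 27}{1000 \times 0.017} \approx 4.76\ \text{pixels}
```

Only the fractional part of Δ(z) sets the relative sampling phase. Under these assumptions it changes from about 0.29 to about 0.76 of a pixel between 0.9 m and 1.0 m, so a sampling phase that is ideal at one range is not ideal at another.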
Computational SR requires accurate sub-pixel registration of the six images to enable accurate reconstruction of the high-frequency components from the aliased camera images. The feature-matching techniques that are effective for high-resolution visible-band images are insufficiently robust for registration of low-resolution, low-contrast LWIR images; however, we have found that CIIR combined with SR provides effective image registration. To this end we have performed a multi-camera calibration, adapting calibration procedures employing test targets that are commonly used for calibrating stereo cameras in the visible band [27,28]. We used a 3D-printed target, consisting of a square array of holes shown in Fig. 4(a), back-illuminated by a heated surface to provide well-defined calibration loci, and captured a set of calibration images (with the calibration target at different positions and orientations) as illustrated in Fig. 4(b). From this calibration, we obtained an estimate of the multiple extrinsic, intrinsic, and distortion parameters of the camera array. The mean retro-projection errors (~0.1 pixels or less) are shown to be compatible with the sub-pixel registration accuracy required for simultaneous CIIR-SR. Using these calibrations, the images could be registered by homography with sufficient accuracy to perform computational SR.
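Registration by homography can be sketched from calibrated intrinsic and extrinsic parameters. The example below is an illustration, not the calibration pipeline used here. It assumes distortion-corrected pinhole cameras, a fronto-parallel plane at distance z in the reference-camera frame, and the extrinsic convention X_k = R X_ref + t. The function names and the numerical values in the demonstration are hypothetical.

```python
import numpy as np

def plane_homography(K_ref, K_k, R, t, z):
    """Homography from reference-camera pixels to camera-k pixels.

    Valid for points on the plane Z = z in the reference-camera frame,
    with the assumed extrinsic convention X_k = R @ X_ref + t.
    """
    n = np.array([0.0, 0.0, 1.0])  # plane normal in the reference frame
    H = K_k @ (R + np.outer(t, n) / z) @ np.linalg.inv(K_ref)
    return H / H[2, 2]

def apply_homography(H, uv):
    """Map one pixel coordinate (u, v) through H."""
    p = H @ np.array([uv[0], uv[1], 1.0])
    return p[:2] / p[2]

def project(K, X):
    """Pinhole projection of a 3D point X given in the camera frame."""
    p = K @ X
    return p[:2] / p[2]

if __name__ == "__main__":
    f_px = 3.0 / 0.017  # focal length in pixels (3 mm lens, 17 um pitch)
    K = np.array([[f_px, 0.0, 40.0], [0.0, f_px, 30.0], [0.0, 0.0, 1.0]])
    t = np.array([-27.0, 0.0, 0.0])  # hypothetical 27 mm lateral offset
    H = plane_homography(K, K, np.eye(3), t, z=900.0)
    print(apply_homography(H, (40.0, 30.0)))
```

Evaluating the homography for a set of candidate ranges z gives one registration per depth plane, which is the quantity needed for range-dependent registration.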
Using the extrinsic and intrinsic parameters deduced in the previous calibration process, we correct for the optical distortion of each camera and deduce the relation between the coordinates of each camera k and the reference camera (k = 1) by calculating the 3×3 homography matrix, H_k,z, for a plane at a distance z from the reference camera. One option to solve the computational SR problem is to invert the forward model that describes the image capture,
Computational SR aims to reconstruct the high-resolution image y_HR that gives rise to the set of captured images y_LR,k. Several SR algorithms have been reported, such as non-uniform interpolation, maximum-likelihood estimation, error-reduction energy, maximum a posteriori estimation, and projection onto convex sets. Here we have applied maximum-likelihood estimation, commonly used to estimate parameters from noisy data, specifically a Richardson-Lucy deconvolution approximation of y_HR.
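As an illustration of maximum-likelihood SR, the sketch below applies a multi-frame Richardson-Lucy update to a deliberately simplified forward model. It is not the reconstruction used for the results reported here. It assumes that each low-resolution image is a copy of the high-resolution image shifted by a whole number of high-resolution pixels (with circular boundaries) and then integrated over factor × factor blocks; optical blur, noise and general homographies are omitted. All function names are illustrative.

```python
import numpy as np

def forward(x_hr, shift, factor=2):
    """Simplified capture model: shift on the HR grid, then pixel integration."""
    dy, dx = shift
    shifted = np.roll(x_hr, (-dy, -dx), axis=(0, 1))
    h, w = shifted.shape
    blocks = shifted.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def adjoint(y_lr, shift, factor=2):
    """Adjoint of forward(): spread each LR pixel over its block, shift back."""
    dy, dx = shift
    up = np.kron(y_lr, np.ones((factor, factor))) / factor**2
    return np.roll(up, (dy, dx), axis=(0, 1))

def richardson_lucy_sr(lr_images, shifts, factor=2, n_iter=50, eps=1e-12):
    """Multi-frame Richardson-Lucy estimate of the HR image."""
    h, w = lr_images[0].shape
    x = np.full((h * factor, w * factor), float(np.mean(lr_images)))
    norm = sum(adjoint(np.ones_like(y), s, factor)
               for y, s in zip(lr_images, shifts))
    for _ in range(n_iter):
        # Back-project the ratio of measured to predicted LR images.
        corr = sum(adjoint(y / (forward(x, s, factor) + eps), s, factor)
                   for y, s in zip(lr_images, shifts))
        x = x * corr / norm  # multiplicative update keeps x non-negative
    return x

def data_residual(x, lr_images, shifts, factor=2):
    """Sum of squared differences between predicted and measured LR images."""
    return sum(float(np.sum((forward(x, s, factor) - y) ** 2))
               for y, s in zip(lr_images, shifts))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.uniform(0.5, 1.5, size=(16, 16))
    shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]  # half-LR-pixel offsets
    lr = [forward(truth, s) for s in shifts]
    x_sr = richardson_lucy_sr(lr, shifts)
    print(data_residual(x_sr, lr, shifts))
```

The four shifts in the demonstration correspond to the ideal half-pixel sampling offsets; with pseudo-random sub-pixel offsets the shift step would be replaced by a warp on the high-resolution grid.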
By repeating this procedure using the forward model described by Eq. (1) with the corresponding plane-induced homographies at range z, we obtain a 3D volumetric CIIR-SR reconstruction of the scene, in which digital refocusing consists not only of blurring objects that are “out-of-focus” (as in conventional light-field imaging), but also of increasing the native pixel resolution of objects that are “in-focus”, towards the diffraction limit of the camera apertures.
We report here the application of these processes with the six-camera LWIR multi-aperture system described above, for registration and CIIR-SR reconstruction in imaging three scenes: (a) static objects at a known distance, to provide a qualitative and quantitative demonstration of the resolution improvement achieved by computational SR; (b) multiple static objects at several distances, showing the simultaneous CIIR-SR digital refocusing capabilities in a static scenario; and (c) a dynamic scene demonstrating the full capability of 4D LWIR imaging (the spatial dimensions plus time) by performing a video-rate volumetric reconstruction of people at dissimilar ranges.
For the first example, we show images of four static objects: a 3D-printed bust of Lord Kelvin at ~1.5 m (with approximate height-width-depth dimensions of 25×12×15 cm) heated obliquely by radiation from a ceramic thermal lamp, and three resolution targets at ~0.9 m: a concentric-circles pattern (with a diameter of 16 cm), a star chart (with a star diameter of 15 cm) and a standard USAF-51 target (within a square frame of 17 cm), which had been 3D printed in plastic (PLA) and back-illuminated by a heated high-emissivity surface. In Fig. 5 we present visible-band images of the four objects in the left column, example low-resolution images from a single LWIR camera module in the center column and SR images constructed from the six low-resolution images in the right column. The SR reconstructions demonstrate a clear improvement over the native resolution of a single camera module. For example, the higher spatial frequencies in the test targets that are not visible in the low-resolution images are clearly visible in the SR images. Furthermore, the aliasing effects in the low-resolution images, which are particularly striking in the images of the star chart and the concentric circles, are absent in the SR images.
We have quantified the SR performance by estimating the contrast transfer function (CTF) from the measured elements of the USAF-51 target, using a similar USAF-51 target and distance as in Fig. 5. We have also analyzed the effect on SR performance of thermal drift of the camera array during an extended period of shutter-less operation. The typical temporal thermal drift of the maximum and minimum values in thermal images recorded by the six cameras over a period of 2.25 hours is shown in Fig. 6(a), with the initial minimum value of each camera subtracted for reference. The resulting CTF estimated from the elements of the target is shown in Fig. 6(b). As can be observed, the CTF shows significant contrast up to approximately double the Nyquist frequency, improving the effective resolution by a factor of approximately 2. Computational SR not only increases the effective resolution, but also eradicates aliasing artefacts, as is apparent in the images in Fig. 5. The CTF shows only marginal variation during the long acquisition, represented by the error bars in Fig. 6(b), indicating robustness of the SR image recovery against thermal drift of the camera array, even during a long shutter-less acquisition.
We further demonstrate the SR imaging performance using the star resolution targets shown in Fig. 7 (using a similar star target and distance as in Fig. 5), where the left column depicts the ground-truth target and its spectrum, the middle column shows a low-resolution image from a single LWIR module and its spatial-frequency spectrum, and the right column shows the SR-reconstructed image and its spectrum. As can be seen from the frequency spectra, SR recovers the high-frequency components that, in the low-resolution image, are aliased into the baseband of frequencies, where they yield the characteristic aliasing artefacts in Fig. 7(b) that are absent in the SR image in Fig. 7(c).
We now discuss the simultaneous CIIR-SR capabilities of the system in the second scenario of imaging a 3D scene. A visible-band image of a 3D scene of model trees and a car is shown in Fig. 8(a) and a low-resolution LWIR image is shown in Fig. 8(b). Digital refocusing at the ranges of 1.02 m (rear bush), 0.885 m (toy car), and 0.820 m (front bush), shown in Fig. 8(c), 8(d) and 8(e), demonstrates simultaneous digital refocusing and SR on each object. Digital refocusing is a term often used in light-field imaging to refer to digital defocus of the images of scene components displaced from a plane of interest; that is, it corresponds to a localized reduction in information. The digital refocusing applied here combines SR at the targeted object range, which increases the local information content of those scene components, with digital defocusing of displaced scene components.
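The defocusing part of digital refocusing can be sketched as a shift-and-average operation over the camera array. The example below is a simplified illustration without SR. It assumes pinhole cameras with parallel optical axes, so that the plane-induced homography reduces to a translation of f·B/(z·p) pixels, rounded here to whole pixels. The camera layout, sign convention and function names are hypothetical.

```python
import numpy as np

def disparity_px(baseline_mm, z_mm, f_mm=3.0, pitch_mm=0.017):
    """Parallax in pixels (row, column) for a camera offset by baseline_mm."""
    return f_mm * np.asarray(baseline_mm, dtype=float) / (z_mm * pitch_mm)

def refocus(images, baselines_mm, z_mm):
    """Average the camera images after removing the parallax for range z_mm.

    Assumed sign convention: a camera at baseline b sees the scene displaced
    by +disparity_px(b, z) pixels relative to the reference camera.
    """
    acc = np.zeros_like(images[0], dtype=float)
    for img, b in zip(images, baselines_mm):
        dy, dx = np.rint(disparity_px(b, z_mm)).astype(int)
        acc += np.roll(img, (-dy, -dx), axis=(0, 1))
    return acc / len(images)

if __name__ == "__main__":
    # Hypothetical 2x3 layout with 33 mm row spacing and 27 mm column spacing.
    baselines = [(0, 0), (0, 27), (0, 54), (33, 0), (33, 27), (33, 54)]
    scene = np.zeros((60, 80))
    scene[30, 40] = 1.0  # point source at a range of 900 mm
    images = [np.roll(scene,
                      tuple(np.rint(disparity_px(b, 900.0)).astype(int)),
                      axis=(0, 1))
              for b in baselines]
    print(refocus(images, baselines, 900.0).max())
    print(refocus(images, baselines, 1500.0).max())
```

Refocusing at the object range superimposes the six images of the point, whereas refocusing at a different range spreads them over displaced positions, which is the discrete blurring seen for out-of-focus scene components.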
In the third scenario we demonstrate video-rate imaging of dynamic 3D scenes, with simultaneous CIIR-SR for 4D volumetric reconstruction. We include digital refocusing and an improvement of the native resolution at arbitrary object planes in the video sequence. The images in Fig. 9 are taken from a video sequence: Fig. 9(a) shows a single low-resolution image, Fig. 9(b) and 9(c) show CIIR-SR reconstructions of the distal and proximal personnel, and Fig. 9(d) and 9(e) are expanded versions of the hand detail from the low-resolution and SR sequences respectively. These highlight the resolution enhancement and digital refocusing of CIIR-SR, while the digitally defocused areas appear as a discrete overlapping of shifted object images in Fig. 9(b) and 9(c), producing a blurring effect. This video sequence (see Visualization 2) was processed using Matlab (2015a) on a PC (Intel Xeon 3.4 GHz, 8 GB RAM), with a computation time of 60 milliseconds per super-resolved frame. We processed the video sequence offline and did not optimize the processing speed for real-time operation; however, even using this high-level, non-compiled programming language, the processing speed was sufficient for registration and super-resolution in real time at up to ~17 FPS (which exceeds the current 9-Hz maximum frame rate of our FLIR Lepton cameras). Processing using a compiled language or a field-programmable gate array (FPGA) would enable real-time operation at much higher frame rates.
In summary, we report the first demonstrations at LWIR wavelengths of: (a) multi-camera SR, (b) a full implementation of multi-camera CIIR, and (c) the simultaneous application of both CIIR and SR. We have demonstrated SR construction of high-resolution LWIR images to yield a two-fold improvement in the angular resolution and hence a four-fold improvement in the space-bandwidth product of recovered images. Image reconstruction is robust in the presence of multi-camera thermal drifts during long periods of shutter-less image acquisition.
This approach to high-resolution imaging enables a fundamental reduction in both the track length and volume of an imaging system by enabling the use of reduced-focal-length lenses, which may be manufactured at lower cost from silicon or polyethylene. Overall multi-camera LWIR imaging offers a flexible route to high pixel count combined with the established capabilities of integral imaging. The future reduction in cost of LWIR detectors, in accordance with Moore’s law, offers fertile ground for the development of intelligent and agile imaging systems employing camera arrays to demonstrate simultaneously high space-bandwidth products and integral-imaging capabilities such as 3D imaging, digital refocusing, ranging, and recognition of obscured objects. Furthermore, these camera arrays may be deployed conformally to host platforms.
European Union’s Horizon 2020 research and innovation Programme (645114); Leverhulme Trust (ECF-2016-757).
1. X. Xiao, B. Javidi, M. Martinez-Corral, and A. Stern, “Advances in three-dimensional integral imaging: sensing, display, and applications [Invited],” Appl. Opt. 52(4), 546–560 (2013).
4. B. Wilburn, N. Joshi, V. Vaish, E.-V. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy, “High performance imaging using large camera arrays,” ACM Trans. Graph. 24(3), 765 (2005).
6. M. G. Lippmann, “Épreuves réversibles. Photographies intégrales,” Comptes Rendus de l’Académie des Sciences. 146, 446–451 (1908).
8. E. H. Adelson and J. Y. A. Wang, “Single lens stereo with a plenoptic camera,” IEEE Trans. Pattern Anal. Mach. Intell. 14(2), 99–106 (1992).
9. R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan, “Light field photography with a hand-held plenoptic camera,” in Tech. Rep. CSTR 2005–02 (Stanford University Computer Science Department, 2005).
10. R. Ng, “Digital light field photography,” Ph.D. thesis (Stanford University, 2006).
11. W. Luo, Y. Zhang, A. Feizi, Z. Göröcs, and A. Ozcan, “Pixel super-resolution using wavelength scanning,” Light Sci. Appl. 5(4), e16060 (2016).
15. T. Grulois, G. Druart, N. Guérineau, A. Crastes, H. Sauer, and P. Chavel, “Extra-thin infrared camera for low-cost surveillance applications,” Opt. Lett. 39(11), 3169–3172 (2014).
17. K. Venkataraman, D. Lelescu, J. Duparré, A. McMahon, G. Molina, P. Chatterjee, R. Mullis, and S. Nayar, “PiCam: an ultra-thin high performance monolithic camera array,” ACM Trans. Graph. 32(6), 166 (2013).
18. M. Elad and A. Feuer, “Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images,” IEEE Trans. Image Process. 6(12), 1646–1658 (1997).
19. J. Tanida, T. Kumagai, K. Yamada, S. Miyatake, K. Ishida, T. Morimoto, N. Kondou, D. Miyazaki, and Y. Ichioka, “Thin Observation Module by Bound Optics (TOMBO): Concept and Experimental Verification,” Appl. Opt. 40(11), 1806–1813 (2001).
20. M. Shankar, R. Willett, N. Pitsianis, T. Schulz, R. Gibbons, R. Te Kolste, J. Carriere, C. Chen, D. Prather, and D. Brady, “Thin infrared imaging systems through multichannel sampling,” Appl. Opt. 47(10), B1–B10 (2008).
21. M. Okutomi and T. Kanade, “A multiple-baseline stereo,” IEEE Trans. Pattern Anal. Mach. Intell. 15(4), 353–363 (1993).
22. S. Komatsu, A. Markman, A. Mahalanobis, K. Chen, and B. Javidi, “Three-dimensional integral imaging and object detection using long-wave infrared imaging,” Appl. Opt. 56(9), D120–D126 (2017).
23. R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. (Cambridge University Press, 2003).
24. E. D. Palik, Handbook of Optical Constants of Solids (Academic, 1985).
26. “Resolution and spatial frequency responses,” ISO:12233:2000(E).
27. Z. Zhang, “A flexible new technique for camera calibration,” IEEE Trans. Pattern Anal. Mach. Intell. 22(11), 1330–1334 (2000).
28. J. Heikkila and O. Silven, “A four-step camera calibration procedure with implicit image correction,” in IEEE International Conference on Computer Vision and Pattern Recognition, (1997).
29. J. Schott, Remote Sensing: The Image Chain Approach (Oxford University Press, 2007).
30. A. W. Lohmann, R. G. Dorsch, D. Mendlovic, C. Ferreira, and Z. Zalevsky, “Space–bandwidth product of optical signals and systems,” J. Opt. Soc. Am. A 13(3), 470–473 (1996).