Focus and depth of field are conventionally addressed by adjusting longitudinal lens position. More recently, combinations of deliberate blur and computational processing have been used to extend depth of field. Here we show that dynamic control of transverse and longitudinal lens position can be used to decode focus and extend depth of field without degrading static resolution. Our results suggest that optical image stabilization systems may be used for autofocus, extended depth of field, and 3D imaging.
© 2015 Optical Society of America
Real-world scenes are characterized by three spatial dimensions, a temporal dimension, a spectral dimension, and a polarization dimension. Commercial cameras are instantaneously sensitive only to the two transverse spatial dimensions and partially sensitive to the spectral dimension through use of color filter arrays. For the frequent sensing case of planar, incoherent scenes, pixelated detectors integrate the focal and polarization channels with sequential temporal sampling and interlaced spectral sampling. Integrating the focal dimension results in information loss (in the form of optical blur) for sections of the scene volume not optically conjugated to the detector. This blur may be reduced by stopping down the aperture at the expense of light throughput and information capacity [1].
A variety of approaches to improve the information capture rate have been proposed over the past few decades. Focal sweeping [2] and pupil coding [3] methods have been proposed for extended depth of field (EDOF) imaging. Focal stacking, array cameras [4], depth from defocus [5], depth-variant pupil coding [6,7], light field imaging [8], and image-space coding [9,10] have been proposed for 3D (transverse space and depth) imaging for incoherent light. In [11] we considered these strategies in the context of compressive measurement and tomography. Pupil coding strategies engineer the optical point spread function (PSF) to change as a function of depth while retaining sufficient modulation transfer sensitivity over all focal channels. In [7] the authors proposed a two-lobe PSF that rotates when defocused; the rotation angle encodes the object’s depth. Captured images may be deconvolved and localized with nanometer-scale precision for 3D superresolution microscopy and with centimeter-scale precision under incoherent, broadband illumination for larger systems [12]. Such pupil coding strategies may be implemented directly within the aperture of the system or indirectly via a spatial light modulator (SLM) placed in a pupil plane.
Image space coding works on the premise of code division multiple access (CDMA). We have demonstrated CDMA-based physical-layer compression for hyperspectral [13], polarization [14], temporal [15], and focal [9] datacubes. These strategies embed pseudo-random projections of the underlying high-dimensional structure into lower-dimensional subspaces spanned by the compressed measurements. Reconstruction algorithms are used to recover the high-dimensional data.
Both pupil coding and image-space coding require physical masks to be placed within the optical system and, hence, sacrifice light throughput, increase system complexity, and/or permanently degrade the system’s modulation transfer function (MTF). Here, we demonstrate a method that overcomes these limitations by augmenting intraexposure focal motion with intraexposure transverse image motion. Such focal motion is classically deployed for EDOF applications; such transverse motion (which cannot by itself yield depth information) is classically deployed for motion invariant photography [16] and synthetic blurring [17]. We integrate focal and transverse intraexposure image translation to engineer depth-varying PSFs in a way that does not require coding masks. Importantly, such 3D image translation may be used to form any PSF attainable by pupil or image modulation but, in contrast with these methods, may be easily changed between exposures or switched off when high-MTF 2D imaging is desired. We show how the proposed framework may be implemented with a camera’s optical image stabilization (OIS) module, and we present tomographic imagery reconstructed from a single experimental measurement.
Let object- and image-space coordinates be denoted, respectively, with unprimed and primed variables. An incoherent 4D object $f(x, y, z, t)$ produces a 2D image $g(x', y')$ based on the in-focus object distance $z_0$ according to

$$g(x', y') = \iiiint f(x, y, z, t)\, h(x' - x, y' - y; z - z_0)\, \mathrm{rect}\!\left(\tfrac{x'}{\Delta}\right) \mathrm{rect}\!\left(\tfrac{y'}{\Delta}\right) \mathrm{rect}\!\left(\tfrac{t}{\Delta_t}\right) \mathrm{d}x\, \mathrm{d}y\, \mathrm{d}z\, \mathrm{d}t, \tag{1}$$

where $h$ is the defocus-dependent optical PSF (see Supplement 1 for details), $\mathrm{rect}(\cdot)$ denotes a rectangular pulse, and $\Delta$, $\Delta_t$ are the pixel pitch and integration time, respectively. Sweeping the focus with motion parameterized by $z_s(t)$ modifies the PSF according to

$$h_{fs}(x', y'; z) = \frac{1}{\Delta_t} \int h(x', y'; z - z_s(t))\, \mathrm{rect}\!\left(\tfrac{t}{\Delta_t}\right) \mathrm{d}t. \tag{2}$$

This PSF is classically employed with $z_s(t) = vt$ in EDOF applications [2], where the translation velocity $v$ depends on the exposure time and desired depth of field. Although this approach can extend a camera’s depth of field after a single deconvolution, sweeping the focus in this way does not lend itself to tomographic imaging since the resulting PSFs are approximately depth-invariant. Our proposed framework augments this focal sweeping strategy with concurrent transverse image translation to gain tomographic imaging performance.
Similar to tomographic pupil coding strategies proposed in [6,7], we can encode depth into 2D measurements by engineering PSFs that have two properties: (1) they must change as a function of depth, and (2) their transfer functions must retain as much modulation as possible over the object ranges in question to maximize reconstruction quality. Rather than coding the pupil explicitly, we may engineer depth-dependent, high-bandwidth kernels by synchronously translating the optics or sensor in the transverse planes (as shown in Fig. 1) with motion parameterized by $s(t) = [s_x(t), s_y(t)]$, modifying Eq. (2) to

$$h_{it}(x', y'; z) = \frac{1}{\Delta_t} \int h(x' - s_x(t), y' - s_y(t); z - z_s(t))\, \mathrm{rect}\!\left(\tfrac{t}{\Delta_t}\right) \mathrm{d}t. \tag{3}$$
We now simulate this image translation framework for a 50 mm, diffraction-limited, incoherently illuminated circular pupil at a wavelength of 550 nm. For this simulation, we specify the transverse translation trajectory in units of pixels and the focal sweep in units of diopters, spanning the target depth of field. The standard and image translation PSFs are shown for various levels of defocus relative to optical infinity in Fig. 2. The image translation PSF rotates in a quarter circle with defocus and maintains a compact support relative to the standard PSF.
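Such a simulation can be sketched numerically by approximating the continuous sweep as a sum of shifted, defocused PSFs along a quarter-circle trajectory. The Python sketch below is our own illustration (the paper reports a MATLAB implementation): it assumes a scalar Fourier-optics pupil model with defocus expressed in waves of the $W_{20}$ aberration coefficient, and the grid size, step count, sweep range, and 11-pixel radius are illustrative choices, not the paper's simulation parameters.

```python
import numpy as np

def defocus_psf(n=256, aperture_frac=0.25, w20_waves=0.0):
    """Incoherent PSF of a circular pupil with quadratic defocus phase
    W(rho) = w20 * rho^2 (in waves), computed via Fourier optics."""
    u = np.linspace(-1, 1, n)
    U, V = np.meshgrid(u, u)
    rho2 = (U**2 + V**2) / aperture_frac**2      # normalized pupil radius squared
    pupil = (rho2 <= 1.0).astype(complex)
    pupil *= np.exp(2j * np.pi * w20_waves * rho2)
    psf = np.abs(np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(pupil))))**2
    return psf / psf.sum()

def image_translation_psf(w20_obj, n=256, steps=64, radius_px=11, sweep_waves=3.0):
    """PSF engineered by sweeping focus linearly over +/- sweep_waves of
    defocus while translating the image on a quarter circle of the given
    pixel radius; w20_obj is the object's defocus in waves."""
    psf = np.zeros((n, n))
    for t in np.linspace(0.0, 1.0, steps):
        sweep = -sweep_waves + 2 * sweep_waves * t   # instantaneous focal setting
        sx = radius_px * np.cos(0.5 * np.pi * t)     # quarter-circle trajectory
        sy = radius_px * np.sin(0.5 * np.pi * t)
        frame = defocus_psf(n, w20_waves=w20_obj - sweep)
        psf += np.roll(np.roll(frame, int(round(sx)), axis=1),
                       int(round(sy)), axis=0)
    return psf / psf.sum()
```

Evaluating `image_translation_psf` for several values of `w20_obj` reproduces the qualitative behavior of Fig. 2: the energy concentrates at different points along the quarter circle as defocus varies.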
We compare the MTF cross sections of the image translation PSF and of the traditional focal sweep PSF in Fig. 3 over the same defocus range. We note that the focal sweep MTF (red) appears nearly depth-invariant and that the horizontal (blue) and vertical (purple) image translation MTFs switch at 0.03 diopters of defocus due to the 90° rotation over this range.
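The MTF cross sections follow directly from a sampled PSF; a minimal Python sketch (our own illustration, not the authors' code):

```python
import numpy as np

def mtf_cross_sections(psf):
    """Normalized MTF (modulus of the PSF's 2D Fourier transform) and its
    horizontal/vertical cross sections through the zero-frequency origin."""
    otf = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(psf)))
    mtf = np.abs(otf)
    mtf /= mtf.max()
    cy, cx = np.array(mtf.shape) // 2
    return mtf[cy, :], mtf[:, cx]   # horizontal, vertical cross sections
```

Applying this to the engineered PSFs at several defocus values yields curves analogous to the blue and purple traces of Fig. 3.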
Since Eq. (1) is a shift-variant transformation, we may reconstruct focal channels of the object from the data by using a local deconvolution algorithm. Given $K$ patches of image data (indexed with superscript $k$), our algorithm takes the $k$th measurement patch $g^{(k)}$ and a known set of local image translation PSFs $\{h_j^{(k)}\}$, one per candidate object depth $j$, as inputs. We reconstruct each patch of image data using the hyper-Laplacian priors algorithm proposed in [18]. For all $K$ image patches, we repeat this process and estimate the true object and depth by solving

$$(\hat{f}^{(k)}, \hat{z}^{(k)}) = \arg\min_{f,\, j} \left\| g^{(k)} - h_j^{(k)} * f \right\|_2^2 + \lambda \sum_i \left| (\nabla f)_i \right|^{\alpha}, \quad 0 < \alpha < 1. \tag{4}$$

Equation (4) is highly parallelizable since it operates at the patch level. See Supplement 1 for more details about the reconstruction algorithm.
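The patch-wise depth selection can be sketched as follows. This is our own hedged illustration, not the authors' MATLAB implementation: it substitutes simple Wiener deconvolution and a crude gradient-sparsity score for the hyper-Laplacian solver of Krishnan and Fergus, and the `snr`, `lam`, and `alpha` values are illustrative assumptions.

```python
import numpy as np

def wiener_deconv(g, h, snr=100.0):
    """Wiener-deconvolve patch g with PSF h (both the same shape;
    h is centered and is ifftshift-ed to the FFT origin)."""
    H = np.fft.fft2(np.fft.ifftshift(h))
    G = np.fft.fft2(g)
    F = np.conj(H) * G / (np.abs(H)**2 + 1.0 / snr)
    return np.real(np.fft.ifft2(F))

def reconstruct_patch(g, psfs, lam=1e-3, alpha=2.0 / 3.0):
    """For one measurement patch, deconvolve with each candidate depth's
    PSF and keep the depth minimizing data misfit plus a gradient-sparsity
    penalty (a stand-in for the hyper-Laplacian prior term)."""
    best = None
    for j, h in enumerate(psfs):
        f = wiener_deconv(g, h)
        H = np.fft.fft2(np.fft.ifftshift(h))
        g_hat = np.real(np.fft.ifft2(np.fft.fft2(f) * H))   # re-blur the estimate
        gx, gy = np.diff(f, axis=1), np.diff(f, axis=0)
        score = (np.sum((g - g_hat)**2)
                 + lam * (np.sum(np.abs(gx)**alpha) + np.sum(np.abs(gy)**alpha)))
        if best is None or score < best[0]:
            best = (score, j, f)
    return best[1], best[2]   # estimated depth index, deblurred patch
```

In practice the prior term is essential for disambiguating candidate depths, since an aggressive deconvolution can make almost any PSF fit the data.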
We now turn to the physical implementation of image translation in an imaging system. Though image translation may in principle be implemented via a camera’s OIS module, low-level access to such hardware was unavailable at the time of writing. Additionally, the software and firmware modifications required to deploy such functionality on modern cameras could be substantial. We therefore identify the components necessary to emulate the functionality of an OIS module.
OIS modules function by mechanically or electrically altering the angle and/or position of the lens or sensor relative to the system’s optical axis. Though this capability is commercially used to counteract camera shake during long-exposure photography or video capture, it is also suitable for arbitrary 3D image translation when combined with control of the focus module. To emulate this functionality, we must translate the optical datacube in three dimensions within a single exposure time $\Delta_t$.
To facilitate alignment and synchronization, we choose a deformable mirror (DM) to simultaneously sweep focus and translate the image. Our prototype features OKOTech’s MMDM-17-TT, a 15 mm diameter, 17-actuator DM with integrated tip/tilt control, a maximum central stroke of 10 μm, and operating frequencies of 200 Hz and 2 kHz for the tip/tilt and actuator channels, respectively. The DM is controlled via UDP communication with a 40-channel Ethernet DAC board (EDAC40, OKOTech). One of the extra channels is configured to synchronously send trigger pulses to the camera.
We employ a beam-splitter cube (Edmund Optics) to direct incident light onto the DM. The DM is placed one focal length away from the front principal plane of our 50 mm microvideo lens (Edmund Optics #66-897), which is mounted on a CCD camera (AVT Guppy Pro). In this configuration [shown in Fig. 4(a)], the mirror is the aperture stop, the system is image-side telecentric, and the focal and transverse image modulation ranges are 0.7 diopters and 15 image pixels in each of the transverse directions, respectively.
Due to two passes through the 50%-transmissive beam-splitter cube, 75% of the incident light is lost immediately, not counting surface reflections at the lens and beam-splitter faces. We compensate for this loss with a correspondingly longer exposure time. The camera’s integration is synchronized with the sinusoidal modulation of the DM’s tip/tilt channels [Fig. 4(b)].
The DM’s tip/tilt actuators drive the PSFs to rotate in a quarter circle with an 11-pixel radius during a 0.313-diopter focal sweep; hence the transverse motion is parameterized (in units of pixels) by $s_x(t) = 11\cos(\pi t / 2\Delta_t)$, $s_y(t) = 11\sin(\pi t / 2\Delta_t)$, and the focal sweep motion is given by $z_s(t) = 0.313\, t/\Delta_t$ diopters. Here, the rotation angle of the PSFs’ irradiance peaks varies linearly with dioptric distance (Fig. 5). The rotation angle is found by fitting a circle with an 11-pixel radius to each PSF and computing the angle of the peak irradiance relative to the circle’s center.
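The drive trajectory described above is straightforward to generate in software. The sketch below is our own illustration; it normalizes time by the exposure length and uses the 11-pixel radius and 0.313-diopter sweep stated in the text:

```python
import numpy as np

def translation_trajectory(t, T, radius_px=11.0, sweep_diopters=0.313):
    """Quarter-circle transverse trajectory plus linear focal sweep over one
    exposure of length T; returns (s_x, s_y) in pixels and defocus in diopters."""
    phase = 0.5 * np.pi * (t / T)        # sweeps 0 -> pi/2 over the exposure
    sx = radius_px * np.cos(phase)
    sy = radius_px * np.sin(phase)
    z = sweep_diopters * (t / T)         # linear focal sweep
    return sx, sy, z
```

Sampling `t` on a fine grid yields the sinusoidal tip/tilt waveforms that the DAC channels must output.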
To demonstrate the proposed image translation framework, we sense volumetric imagery within a 0.313-diopter focal workspace. The volumetric scene consists of a colored brick figurine, a page of Chinese characters, and a resolution target, each located at a different depth from the camera. The scene, fixed-focus images, and video of the scene as observed by the prototype when translating at 1/14 the snapshot rate are given in Fig. 6. As a calibration procedure, we capture the image translation PSFs shown in Fig. 5 at these depths; these PSFs are input to the reconstruction algorithm. Three focal planes are selected as an example to demonstrate the hardware and algorithm; more planes may be used in future work. The single frame of data captured with our prototype is shown in Fig. 7(a). An unoptimized MATLAB implementation of Eq. (4) takes roughly 60 s to produce an all-in-focus image and a rough depth map on an 8-core CPU running at 3.3 GHz. The reconstructed EDOF image and depth map are shown in Figs. 7(b) and 7(c). We can see detail within the resolution target, the characters, and the figurine that greatly surpasses the total observable detail of any single fixed-focus image. Each level of the depth map corresponds to one of the depth-varying PSFs that minimize Eq. (4) at the patch level. We use a sliding window technique with 4 pixels of overlap between adjacent patches.
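The sliding-window tiling can be sketched as follows. This is our own illustration: the 32-pixel patch size is a hypothetical choice (the paper's patch size is not reproduced here), while the 4-pixel overlap matches the text.

```python
import numpy as np

def sliding_patches(img, patch=32, overlap=4):
    """Tile the image into patch x patch windows whose borders overlap by
    the given number of pixels (stride = patch - overlap); returns a list
    of ((row, col), patch_array) pairs for patch-wise reconstruction."""
    stride = patch - overlap
    out = []
    for r in range(0, img.shape[0] - patch + 1, stride):
        for c in range(0, img.shape[1] - patch + 1, stride):
            out.append(((r, c), img[r:r + patch, c:c + patch]))
    return out
```

Each extracted patch would then be passed to the local deconvolution step, and the overlapping borders blended when the patches are stitched back together.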
Unfortunately, the deformable mirror was fabricated with Zernike astigmatism; this was mostly corrected directly with the DM (see Supplement 1 for details about the correction procedure). Despite the correction, residual aberrations associated with the DM and beam splitter remain, hence the observable sidelobes of the PSFs (compare the shapes of the experimental PSFs shown in Fig. 5 with those of the simulated PSFs shown in Fig. 2). Despite these degraded PSFs, the algorithm still converges to a plausible all-in-focus image and depth map, with the exception of areas with little texture, a known challenge for single-aperture 3D imaging techniques.
In this Letter, we have demonstrated a maskless imaging technique capable of snapshot focal tomography. The image translation framework may be used to form PSFs attainable by pupil coding or image-space coding strategies, with the benefit that it may be switched off when high-MTF 2D imaging is desired. Future work will focus on designing image translation PSFs that improve tomographic imaging performance, on increasing the number of resolvable object ranges, and on improving the depth map processing.
Defense Advanced Research Projects Agency (DARPA) (N66001-11-1-4002).
See Supplement 1 for supporting content.
1. D. J. Brady, Optical Imaging and Spectroscopy (Wiley, 2009).
2. D. Miau, O. Cossairt, and S. K. Nayar, in IEEE International Conference on Computational Photography (ICCP) (IEEE, 2013).
3. E. R. Dowski and W. T. Cathey, Appl. Opt. 34, 1859 (1995). [CrossRef]
4. K. Venkataraman, D. Lelescu, J. Duparré, A. McMahon, G. Molina, P. Chatterjee, R. Mullis, and S. K. Nayar, ACM Trans. Graphics 32, 1 (2013). [CrossRef]
5. C. Zhou, S. Lin, and S. K. Nayar, in IEEE 12th International Conference on Computer Vision (IEEE, 2009), pp. 325–332.
6. A. Levin, R. Fergus, F. Durand, and W. T. Freeman, ACM Trans. Graphics 26, 70 (2007). [CrossRef]
7. S. R. P. Pavani, M. A. Thompson, J. S. Biteen, S. J. Lord, N. Liu, R. J. Twieg, R. Piestun, and W. E. Moerner, Proc. Natl. Acad. Sci. USA 106, 2995 (2009).
8. R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan, Comput. Sci. Tech. Rep. 2, 11 (2005).
9. P. Llull, X. Yuan, X. Liao, J. Yang, L. Carin, G. Sapiro, and D. J. Brady, in Computational Optical Sensing and Imaging (Optical Society of America, 2014), paper CM2D-3.
10. X. Lin, J. Suo, G. Wetzstein, Q. Dai, and R. Raskar, in IEEE International Conference on Computational Photography (ICCP) (IEEE, 2013).
11. D. J. Brady and D. L. Marks, Appl. Opt. 50, 4436 (2011). [CrossRef]
12. S. Quirin and R. Piestun, Appl. Opt. 52, A367 (2013). [CrossRef]
13. A. A. Wagadarikar, N. P. Pitsianis, X. Sun, and D. J. Brady, Opt. Express 17, 6368 (2009). [CrossRef]
14. T.-H. Tsai and D. J. Brady, Appl. Opt. 52, 2153 (2013). [CrossRef]
15. P. Llull, X. Liao, X. Yuan, J. Yang, D. Kittle, L. Carin, G. Sapiro, and D. J. Brady, Opt. Express 21, 10526 (2013). [CrossRef]
16. Y. Bando, B.-Y. Chen, and T. Nishita, Comput. Graphics Forum 30, 1869 (2011). [CrossRef]
17. A. Mohan, D. Lanman, S. Hiura, and R. Raskar, in IEEE International Conference on Computational Photography (ICCP) (IEEE, 2009).
18. D. Krishnan and R. Fergus, in Advances in Neural Information Processing Systems (NIPS, 2009), pp. 1033–1041.