## Abstract

Optical imaging systems in which the lens and sensor are free to rotate about independent pivots offer greater degrees of freedom for controlling and optimizing the process of image gathering. However, to benefit from the expanded possibilities, we need an imaging model that directly incorporates the essential parameters. In this work, we propose a model of imaging which can accurately predict the geometric properties of the image in such systems. Furthermore, we introduce a new method for synthesizing an omnifocus (all-in-focus) image from a sequence of images captured while rotating a lens. The crux of our approach lies in insights gained from the new model.

© 2017 Optical Society of America

## 1. INTRODUCTION

In this paper, we introduce a geometric model for imaging systems in which the lens and the sensor are free to rotate about independent pivots. An example of such an imager is a Scheimpflug camera.

Although models of such systems exist, several rely on the thin-lens approximation, which is overly simplistic. For example, a thin-lens model cannot describe the shift of the image field observed upon tilting a lens. On the other hand, thick-lens models that represent the lens by its cardinal planes do not explicitly consider the effects of the pupils on image formation. The absence of pupil parameters in these models makes it difficult to predict the exact nature of the warping of the image field induced by lens rotation.

The model in this paper is no more (or less) accurate than current thick-lens models, yet it is better suited for predicting the nature of the warping in the image when we rotate the lens about an arbitrary point along the optical axis. In the absence of aberrations, the center of perspective projection resides at the center of the entrance pupil [1]. That is, the bundle of chief rays emerging from points in the object space converges at the center of the entrance pupil, which forms the vertex of the object-side perspective cone. Concomitantly, the center of the exit pupil, from which the bundle of chief rays diverges, forms the vertex of the perspective cone on the image side. Therefore, it seems natural that, to make accurate predictions, the image formation model must explicitly incorporate the pupil parameters (the locations and sizes of the entrance and exit pupils), which directly influence the nature of the image warping.

We have divided this paper into two main parts. In the first part (Sections 2 and 3), we derive two general relationships: (1) Eq. (12) represents the mapping between image and object points in a system with arbitrarily rotated lens and sensor planes; (2) Eq. (19) is the relationship between the positions and orientations of the object, lens, and sensor planes required for focusing in such systems.

In the second part of the paper (Section 4), we combine salient features from Scheimpflug imaging and focus stacking to propose a new computational technique for circumventing the problem of limited depth of field (DOF). Specifically, we demonstrate how to synthesize an omnifocus (all-in-focus) image from a sequence of images captured while rotating a lens about the center of its entrance pupil.

The underlying mechanisms of our technique stem from the insights we gain about the properties of the geometric image using the derived model. We discuss the impact of the pupils on the correspondence problem between the images in the stack. In particular, we demonstrate that we can register the sequence of images analytically if we rotate the lens about the center of its entrance pupil. Analytic registration is advantageous because it avoids iterative algorithms and is unaffected by the noise and optical blurring that are inevitable in such methods. Our model also shows that if the entrance and exit pupils are of equal size, then the transformation between the images obtained while rotating a lens reduces to a combination of simple translation and scaling.

## 2. GEOMETRIC MODEL OF IMAGING WITH TILTED LENS AND SENSOR

### A. Geometric Image for Tilted Lens and Sensor

#### 1. Transfer of Chief Rays’ Direction Cosine from Entrance to Exit Pupil

To make the problem tractable, yet retain sufficient complexity for the model, we have made a few assumptions. Specifically, we assume the lens to be rectilinear and free of optical aberrations. We have also utilized a few constructs of paraxial optics theory: we approximate the entrance and exit pupils as planes, and we assume that the object and image distances from the entrance and exit pupils, respectively, are large compared to the semi-diameters of the corresponding pupils. We also restrict the rotation angles of the object, lens, and sensor planes to between $-\pi /2$ and $\pi /2$ about both the $x$ and $y$ axes (in-plane rotations, i.e., rotations about the $z$ axis, are irrelevant for our purpose). This constraint guarantees non-negative values for the $z$-component of the plane normals and permits us to estimate a plane normal unambiguously.

Figure 1 shows a schematic of a general camera represented by the pupil and sensor planes. The figure also enumerates the set of symbols used in the mathematical derivation of our model. We denote the camera coordinate frame by $\{C\}$. The pivot for the lens is at the origin of $\{C\}$, about which the optical axis may rotate about the $x$ and $y$ axes. The centers of the paraxial entrance and exit pupils—represented by $E$ and $E'$—lie along the optical axis at distances ${d}_{e}$ and ${d}'_{e}$, respectively, from the origin of $\{C\}$. The diameters of the entrance and exit pupils are ${h}_{e}$ and ${h}'_{e}$, respectively. The symbol $\{I\}$ denotes the two-dimensional image coordinate frame. The origin of $\{I\}$, at which the sensor plane is pivoted, is located at ${\mathit{t}}_{i}={[0,0,{z}'_{o}]}^{T}$ in $\{C\}$. The figure also illustrates two rays that are fundamental to geometric optics from the object space to the image space—the chief ray and the marginal ray. These two rays, along with the optical axis, always lie in the meridional plane that spans across the object and image spaces [2,3].

The chief ray, with direction cosine $\mathit{l}$, emerges from the object point $\mathbf{x}$, passes through the center of the entrance pupil $E$, reemerges from the center of the exit pupil $E'$ with direction cosine $\mathit{l}'$, and intersects the sensor plane at $\mathbf{x}'$. We expect the input direction cosine $\mathit{l}$ and the output direction cosine $\mathit{l}'$ to be coplanar; but are $\mathit{l}$ and $\mathit{l}'$ equal? Before we attempt to answer this question, we first consider a simpler question: if the chief ray makes angles $\omega $ and $\omega'$ with the optical axis in the object and image space, respectively, then is $|\omega |=|\omega'|$?

To find the relationship between $\omega $ and $\omega'$, we consider the marginal rays and the pupils. The marginal ray in the object space originates at the base (projection) of the object point $\mathbf{x}$ on the optical axis and travels to the edge of the paraxial entrance pupil at height ${h}_{e}/2$. The marginal ray in the image space travels from the edge of the exit pupil at height ${h}'_{e}/2$ to the base of the image point $\mathbf{x}'$ on the optical axis. Suppose the marginal ray makes an angle $\mathrm{\Omega}$ with the optical axis in the object space and an angle $\mathrm{\Omega}'$ with the optical axis in the image space. Then, if ${z}_{e}\gg {h}_{e}$ and ${z}'_{e}\gg {h}'_{e}$ (generally the case in macroscopic imaging), we obtain

Note that although the image point $\mathbf{x}'$ lies in the sensor plane (by definition), its projection on the optical axis may not. The projection of $\mathbf{x}'$ on the optical axis lies in the sensor plane only in the special, yet common, case when the optical axis is normal to the sensor plane.

The ratio of the paraxial exit pupil height to the entrance pupil height, ${h}'_{e}/{h}_{e}$, is defined as the *pupil magnification* ${m}_{p}$ [1,4,5]. Further, according to the *Lagrange invariant* property [4] of the two rays (the chief and the marginal rays), the transverse magnification ($y'/y$) is the reciprocal of the angular magnification ($\mathrm{\Omega}'/\mathrm{\Omega}$). Therefore, Eq. (1) reduces to
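In terms of the quantities defined above, a plausible form of this chief-ray angle relation—reconstructed here from the definition of pupil magnification and the Lagrange invariant, and consistent with the result cited from [5]—is:

```latex
% Chief-ray angle transfer between object and image space,
% under the stated assumptions z_e >> h_e and z'_e >> h'_e:
\tan \omega' \;=\; \frac{\tan \omega}{m_p},
\qquad \text{where } m_p = \frac{h'_e}{h_e}.
```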

Equation (2) has been derived in [5] using a different approach. We see that, unlike for nodal rays, the angle that the chief ray makes with the optical axis in the object space is, in general, not equal to the angle it makes with the optical axis in the image space.

To derive the relation between the object and image space direction cosines of the chief ray—$\mathit{l}$ and $\mathit{l}'$—let us initially suppose that the lens is in the nominal orientation in which the optical axis is coincident with the $z$ axis of frame $\{C\}$ (we shall relax this condition later). Consequently, the zenith angles of a chief ray in the object space and in the image space are $\omega $ and $\omega'$, respectively. For a chief ray, let the azimuthal angles in the object and image space be $\phi $ and $\phi'$, respectively. If we represent $\mathit{l}={[l,m,n]}^{T}$ and $\mathit{l}'={[l',m',n']}^{T}$, then in terms of the azimuthal and zenith angles we have

Please note that if the optical axis and the $z$ axis of $\{C\}$ were always coincident, then, utilizing axial symmetry, we could simplify the definition of the direction cosines by letting $\phi =90^{\circ}$ (and $\phi'=90^{\circ}$) and restricting our analysis to the meridional plane. However, since the lens and the sensor are free to rotate about their independent pivots, the system is not axially symmetric. Therefore, we use the full definition of the direction cosines, in terms of both azimuthal and zenith angles.

Following a few algebraic steps using Eqs. (2) and (3), and using the fact that a chief ray in the object and image space is always confined to the same meridional plane (i.e., $\phi'=\phi $), we obtain

We can write Eq. (4) compactly as

where ${M}_{p}$ is a $3\times 3$ diagonal matrix with 1, 1, and ${m}_{p}$ as the diagonal elements. Further, we can safely drop the negative sign in Eq. (5) since the ray emerging from the exit pupil travels in the direction of the positive $z$ axis toward the sensor plane. Equation (5) represents the relationship between the input and output direction cosines $\mathit{l}$ and $\mathit{l}'$ when the lens is not rotated (i.e., the optical axis coincides with the $z$ axis of frame $\{C\}$).

To derive the general expression for the transfer of the chief ray’s direction cosine, we first introduce ${R}_{\ell}\in {\mathbb{R}}^{3\times 3}$—the rotation matrix applied to the optical axis to rotate the lens about its pivot (at the origin of $\{C\}$). We also introduce a local coordinate frame $\{\mathcal{L}\}$ with its origin also at the lens’s pivot, but fixed to the lens such that the $z$ axis of $\{\mathcal{L}\}$ is along the optical axis. The pupil planes and the reference frame $\{\mathcal{L}\}$ rotate along with the optical axis when the lens rotates. As before, we represent the input direction cosine of the chief ray in frame $\{C\}$ as $\mathit{l}$. The vector $\mathit{l}$ in frame $\{\mathcal{L}\}$ becomes ${}^{\mathcal{L}}\mathit{l}={R}_{\ell}^{T}\mathit{l}$. Consequently, ${}^{\mathcal{L}}n$, the $z$-component of ${}^{\mathcal{L}}\mathit{l}$, is ${}^{\mathcal{L}}n={\mathit{r}}_{\ell ,3}^{T}\mathit{l}$, where ${\mathit{r}}_{\ell ,3}$ is the third column of ${R}_{\ell}$. Using Eq. (5), we obtain the output direction cosine of the chief ray in reference frame $\{\mathcal{L}\}$ as

Finally, we obtain the output direction cosine of the chief ray, in frame $\{C\}$, that emerges from the exit pupil as $\mathit{l}'={R}_{\ell}{}^{\mathcal{L}}\mathit{l}'$. That is,

We expect the direction cosine $\mathit{l}'$ to have unit magnitude. It is indeed straightforward to show that the ${\ell}^{2}$-norm of $\mathit{l}'$ equals unity, with ${(1+({m}_{p}^{2}-1){({}^{\mathcal{L}}n)}^{2})}^{-1}$ as the normalizing term. Note that if the pupil magnification ${m}_{p}$ of the lens equals unity, then $\mathit{l}'=\mathit{l}$. This result implies that if ${m}_{p}=1$, the opening angles of the image and object space perspective cones are equal irrespective of the orientation of the optical axis. In terms of geometric optics, ${m}_{p}=1$ also implies that the paraxial entrance and exit pupil planes are coincident with the front and rear principal planes, respectively. Lenses for which ${m}_{p}=1$ are called symmetric lenses.
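As a concrete illustration, the transfer described by Eqs. (5)–(7) can be sketched in a few lines of NumPy (the function name and the explicit renormalization are our own; the sign convention follows the text, with the output ray traveling toward the positive $z$ axis):

```python
import numpy as np

def transfer_chief_ray(l, R_lens, m_p):
    """Map a chief ray's object-space direction cosine l (unit 3-vector,
    camera frame {C}) to its image-space direction cosine, per Eqs. (5)-(7)."""
    l_local = R_lens.T @ l                 # express l in the lens frame {L}
    M_p = np.diag([1.0, 1.0, m_p])         # pupil-magnification matrix
    l_out = M_p @ l_local
    l_out /= np.linalg.norm(l_out)         # renormalize to unit magnitude
    return R_lens @ l_out                  # rotate back into the camera frame {C}
```

For $m_p = 1$, $M_p$ is the identity and the routine returns $\mathit{l}$ unchanged for any lens rotation, matching the symmetric-lens result noted above.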

#### 2. Expression of Image Coordinates for Arbitrary Orientation of Lens and Sensor Planes

Equation (7) relates the direction cosines of the chief ray in the object and image spaces. The expression already includes important parameters we would like to model—the pupil magnification and the lens rotation matrix. All that is left to do is to incorporate the sensor plane’s orientation, the object point $\mathbf{x}$, and the image point $\mathbf{x}'$. In this section, we build upon Eq. (7) and use properties of planes and ray–plane intersection to obtain an expression for the image point coordinates $\mathbf{x}'$.

The centers of the entrance and exit pupils are located at distances ${d}_{e}$ and ${d}'_{e}$ from the origin of $\{C\}$ along the optical axis. Following the rotation of the optical axis, the new locations of the pupil centers in frame $\{C\}$ become ${R}_{\ell}{[0,0,{d}_{e}]}^{T}={d}_{e}{\mathit{r}}_{\ell ,3}$ and ${R}_{\ell}{[0,0,{d}'_{e}]}^{T}={d}'_{e}{\mathit{r}}_{\ell ,3}$. Further, we express the chief ray emerging from the exit pupil as $\mathit{k}(\lambda )={d}'_{e}{\mathit{r}}_{\ell ,3}+\lambda \mathit{l}'$, where the parameter $\lambda \in \mathbb{R}$ determines the length of the ray. Substituting Eq. (7) for $\mathit{l}'$, we obtain

We would like to determine the expression for $\lambda $ for which $\mathit{k}(\lambda )=\mathbf{x}'$. Let ${z}'_{o\perp}$ be the perpendicular distance of the sensor plane from the origin of $\{C\}$. Further, if the sensor plane has surface normal ${\widehat{\mathit{n}}}_{i}$, then ${\widehat{\mathit{n}}}_{i}^{T}\mathbf{x}'={z}'_{o\perp}$ represents the equation of the sensor plane in frame $\{C\}$ in Hessian normal form. Therefore, when $\mathit{k}(\lambda )=\mathbf{x}'$, we obtain

Furthermore, if we represent the orientation of the sensor plane by ${R}_{i}\in {\mathbb{R}}^{3\times 3}$, then ${\widehat{\mathit{n}}}_{i}={R}_{i}{[0,0,1]}^{T}$. Also, since the point ${\mathit{t}}_{i}={[0,0,{z}'_{o}]}^{T}$ lies on the sensor plane, we can write ${z}'_{o\perp}={\widehat{\mathit{n}}}_{i}^{T}{\mathit{t}}_{i}={\widehat{\mathit{n}}}_{i}(3){z}'_{o}$.

Substituting Eq. (9) into Eq. (8) and using ${z}'_{o\perp}={\widehat{\mathit{n}}}_{i}(3){z}'_{o}$, we obtain the expression for the image point $\mathbf{x}'$ as

Let the location of the entrance pupil center in $\{C\}$ be ${\mathbf{x}}_{e}={d}_{e}{\mathit{r}}_{\ell ,3}$. We express $\mathit{l}$ in terms of $\mathbf{x}$ and ${\mathbf{x}}_{e}$ as $\mathit{l}=-(\mathbf{x}-{d}_{e}{\mathit{r}}_{\ell ,3})/\Vert {\mathbf{x}}_{e}-\mathbf{x}\Vert $. Substituting $\mathit{l}$ into Eq. (10) yields

Equation (11) expresses the image point $\mathbf{x}'$ in the camera frame. It is more useful to represent $\mathbf{x}'$ in the two-dimensional image frame $\{I\}$. If we represent the coordinates of the image point in the camera frame $\{C\}$ as ${}^{C}\mathbf{x}'$, and the equivalent image point coordinates in the image frame $\{I\}$ as ${}^{I}\mathbf{x}'$, then ${}^{I}\mathbf{x}'={R}_{i}^{T}({}^{C}\mathbf{x}'-{\mathit{t}}_{i})$. Therefore, the expression for the image point in the two-dimensional image coordinates when the lens and sensor planes are free to rotate about their own pivots follows as
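To make the chain from Eq. (7) to Eq. (12) concrete, here is a NumPy sketch of the full mapping (function and variable names are our own; `d_e_p` and `z_o_p` stand for ${d}'_{e}$ and ${z}'_{o}$):

```python
import numpy as np

def image_point(x, R_lens, R_sensor, d_e, d_e_p, z_o_p, m_p):
    """Chief-ray intersection with the sensor plane (a sketch of Eq. (12));
    returns the image point in the two-dimensional image frame {I}."""
    r3 = R_lens[:, 2]                        # rotated optical axis
    x_e = d_e * r3                           # entrance-pupil center in {C}
    l = (x_e - x) / np.linalg.norm(x_e - x)  # chief ray toward entrance pupil
    # transfer to image space, Eqs. (5)-(7)
    l_loc = R_lens.T @ l
    l_out = np.diag([1.0, 1.0, m_p]) @ l_loc
    l_out = R_lens @ (l_out / np.linalg.norm(l_out))
    # ray-plane intersection: k(lam) = d_e' r3 + lam l', with n_i . x' = z'_perp
    n_i = R_sensor @ np.array([0.0, 0.0, 1.0])
    z_perp = n_i[2] * z_o_p
    lam = (z_perp - d_e_p * (n_i @ r3)) / (n_i @ l_out)
    x_img = d_e_p * r3 + lam * l_out         # image point in {C}
    t_i = np.array([0.0, 0.0, z_o_p])
    return (R_sensor.T @ (x_img - t_i))[:2]  # coordinates in {I}
```

With no tilts, $m_p = 1$, and $d_e = {d}'_{e} = 0$, the routine reduces to the familiar pinhole projection about the coincident pupil centers. It also makes the pivot-at-entrance-pupil property discussed in Section 4 easy to check: because $\mathit{l}$ depends only on the direction of $\mathbf{x}-{\mathbf{x}}_{e}$, scaling the object point along the chief ray leaves the image point unchanged when $d_e = 0$.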

Equation (12) gives the coordinates of the image point (in the image frame) as a function of the corresponding object point (in the camera frame), the position and orientation of the sensor plane, the orientation of the lens, the locations of the entrance and exit pupils, and the pupil magnification. Note that the image coordinates obtained in Eq. (12) are expressed in physical units.

We establish the veracity of Eq. (12) in Section 3.A by comparing the image point coordinates computed using Eq. (12) against the corresponding values generated via ray tracing in Zemax for several different combinations of the parameters. But first, we provide a brief qualitative study of the effects of lens rotations on the geometric properties of the image. Specifically, we investigate how the pupil magnification ${m}_{p}$ and the location of the lens’s pivot affect the geometric distortions. This study will help us understand the underlying mechanisms of the omnifocus image synthesis technique presented in Section 4.

Figure 2 shows the type of distortions in “images” of two planes in the object space. For this qualitative study the term “image” just means the point of intersection (POI) of the chief ray from the object point with the sensor plane obtained using Eq. (12). The object space consists of two planes—a near plane and a far plane. The near plane is a square of 88.15 mm on each side, and the far plane is a square of 178.3 mm on each side placed at twice the distance of the near plane from the entrance pupil. The object points consist of $7\times 7$ square grids on each of the object planes; however, the subplots in Fig. 2 show only three rows out of seven. The exact distance of the near plane (and consequently the far plane) from the lens varies depending upon the pupil magnification, such that the images of the two planes are 4.5 mm on each side on the sensor plane. The sensor plane is not tilted for this study. Furthermore, when the optical axis is perpendicular to the sensor and object planes, the images of the two object planes perfectly overlap. The rotation of the lens distorts the image fields (set of image points) from the two object planes. We can observe that the nature of the distortion is affected by both the pupil magnification ${m}_{p}$ and the location of the lens’s pivot point with respect to the entrance pupil (${d}_{e}$). If ${m}_{p}=1$, the extent of scaling and transverse shift is uniform across the field. More importantly, if the lens is pivoted at the entrance pupil, then the geometric warping of the image field becomes independent of the object distance.

### B. Object, Lens, and Image Plane Relationships for Focusing using a Scheimpflug Camera

Hitherto, we have expressed the coordinates of the image point corresponding to an object point when the lens and sensor planes are free to rotate about their respective pivots. However, we did not apply any constraints on the orientations of the object, lens, and sensor planes such that points on the object plane are brought to (geometric) focus on the sensor plane. To that effect, we use a variant of the Gaussian lens formula for the parallel-plane imaging configuration that relates the object-plane-to-entrance-pupil distance $u$, the exit-pupil-to-image-plane distance $u'$, the pupil magnification ${m}_{p}$, and the focal length $f$ as shown below [5,6]:

In Eq. (13), we specify the directed distances $u$ and $u'$ along the optical axis. Let us suppose that the object plane is pivoted at $(0,0,{z}_{o})$ in the camera frame $\{C\}$. Also, we represent the orientation of the object plane using the rotation matrix ${R}_{o}\in {\mathbb{R}}^{3\times 3}$. Then, the object plane normal, following rotation, is the vector ${\widehat{\mathit{n}}}_{o}={R}_{o}{[0,0,1]}^{T}$. Now, suppose the orientations of the three planes are such that points in the arbitrarily tilted object plane form focused images on the arbitrarily tilted image plane. Then, the projection on the optical axis of the chief ray in the object space from $\mathbf{x}$ to ${\mathbf{x}}_{e}$ and the projection on the optical axis of the chief ray in the image space from ${\mathbf{x}}'_{e}$ to $\mathbf{x}'$ must satisfy Eq. (13).

Following a formulation of the chief ray similar to that in Section 2.A.2, we obtain $\tilde{u}$, the length of the chief ray from $\mathbf{x}$ to ${\mathbf{x}}_{e}$, as (${\widehat{\mathit{c}}}_{z}={[0,0,1]}^{T}$)

The ray vector of length $\tilde{u}$ and direction $\mathit{l}$ in the object space is $\tilde{u}\mathit{l}$. The projection of this ray vector on the optical axis $\widehat{\mathit{o}}$ ($={R}_{\ell}{\widehat{\mathit{c}}}_{z}$) is $\tilde{u}(\mathit{l}\cdot \widehat{\mathit{o}})$, and the corresponding directed distance (from ${\mathbf{x}}_{e}$ toward $\mathbf{x}$) is $u=-\tilde{u}(\mathit{l}\cdot \widehat{\mathit{o}})$. Similarly, the projection of the ray in the image space on the optical axis (and the corresponding directed distance) is $u'=\tilde{u}'(\mathit{l}'\cdot \widehat{\mathit{o}})$. Substituting $u$ and $u'$ into Eq. (13), and using Eq. (7), we obtain

Following some algebraic manipulations, especially noting that ${\widehat{\mathit{n}}}_{i}^{T}{R}_{\ell}{M}_{p}{R}_{\ell}^{T}\mathit{l}$ is equivalent to ${({R}_{\ell}{M}_{p}{R}_{\ell}^{T}{\widehat{\mathit{n}}}_{i})}^{T}\mathit{l}$ (because ${M}_{p}$ is diagonal and ${R}_{\ell}$ is a rotation matrix, ${R}_{\ell}{M}_{p}{R}_{\ell}^{T}$ is symmetric), we obtain

The ${\ell}^{2}$-norm of the direction cosine $\mathit{l}$ equals one, and $\mathit{l}$, in general, cannot be perpendicular to the vector $[-\frac{{\widehat{\mathit{n}}}_{o}}{{m}_{p}[{z}_{o}{\widehat{\mathit{n}}}_{o}(3)-{d}_{e}({\widehat{\mathit{n}}}_{o}^{T}{\mathit{r}}_{\ell ,3})]}+\frac{{R}_{\ell}{M}_{p}{R}_{\ell}^{T}{\widehat{\mathit{n}}}_{i}}{[{z}'_{o}{\widehat{\mathit{n}}}_{i}(3)-{d}'_{e}({\widehat{\mathit{n}}}_{i}^{T}{\mathit{r}}_{\ell ,3})]}-\frac{{\mathit{r}}_{\ell ,3}}{f}]$. Therefore, we obtain

Further, we can simplify Eq. (18) if we let ${\widetilde{\mathit{n}}}_{o}=\frac{{\widehat{\mathit{n}}}_{o}}{{\widehat{\mathit{n}}}_{o}(3)}={[\frac{{\widehat{\mathit{n}}}_{o}(1)}{{\widehat{\mathit{n}}}_{o}(3)},\frac{{\widehat{\mathit{n}}}_{o}(2)}{{\widehat{\mathit{n}}}_{o}(3)},1]}^{T}$ and ${\widetilde{\mathit{n}}}_{i}=\frac{{\widehat{\mathit{n}}}_{i}}{{\widehat{\mathit{n}}}_{i}(3)}={[\frac{{\widehat{\mathit{n}}}_{i}(1)}{{\widehat{\mathit{n}}}_{i}(3)},\frac{{\widehat{\mathit{n}}}_{i}(2)}{{\widehat{\mathit{n}}}_{i}(3)},1]}^{T}$. Then, after factoring ${\widehat{\mathit{n}}}_{o}(3)$ and ${\widehat{\mathit{n}}}_{i}(3)$ out of the denominator terms, we can write Eq. (18) as

This expedient simplification from Eq. (18) to Eq. (19) is possible because we can describe the unit normal vectors ${\widehat{\mathit{n}}}_{o}$ and ${\widehat{\mathit{n}}}_{i}$ using only the components along the $x$ and $y$ axes. In other words, if we know the $x$ and $y$ components of a normal, we can determine the $z$ component uniquely because the planes are limited to rotations between $-\pi /2$ and $\pi /2$ about both the $x$ and $y$ axes (one of the assumptions in this model).
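This unique recovery of the $z$ component can be sketched directly (a trivial helper of our own):

```python
import numpy as np

def normal_from_xy(nx, ny):
    """Recover the full unit normal of a plane from its x and y components.
    Valid because tilts are restricted to (-pi/2, pi/2) about the x and y
    axes, which forces a non-negative z component."""
    nz_sq = 1.0 - nx * nx - ny * ny
    if nz_sq <= 0.0:
        raise ValueError("components imply a tilt of 90 degrees or more")
    return np.array([nx, ny, np.sqrt(nz_sq)])
```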

Equation (19) is fully general in the sense that it yields the specific formulas for standard imaging configurations such as fronto-parallel imaging, focusing with only sensor tilt, focusing with only lens tilt, and focusing with both sensor and lens tilts.

## 3. VERIFICATION OF MODEL FOR IMAGING WITH TILTED LENS AND SENSOR

### A. Verification of the Imaging Equation in Zemax

We verified the accuracy of the imaging equation [Eq. (12)] by comparing the image points (intersections of the chief ray with a tilted image plane) computed numerically using Eq. (12) against the corresponding image points obtained by tracing chief rays from a grid of object points belonging to a tilted object plane. Figure 3 shows the layout plot of the optical system implemented in Zemax, showing (1) an object plane, (2) an ideal lens made from two paraxial surfaces and pivoted about a point away from the entrance pupil (${d}_{e}=-5\ \mathrm{mm}$), and (3) an image plane pivoted about a point on the $z$ axis.

The results of the simulation are tabulated in Table 1, which shows the set of object points, the numerically computed image points, the ray traced image points, and the absolute difference between the numerically computed and ray traced image points. We observe that the numerically computed and ray traced values of the image points are very close; the small difference in their values can be attributed to the error associated with floating point operations. This comparison demonstrates the accuracy of the analytically derived expression [Eq. (12)] representing the geometric relationship between a three-dimensional object point and its image point in the absence of optical aberrations.

### B. Verification of Equation for Focusing on Tilted Planes in Zemax

While several relationships between the object, lens, and image planes can be derived from Eq. (19), corresponding to specific Scheimpflug imaging configurations, here we show and verify the relationships for focusing on an object plane tilted about the $x$ axis by rotating a thick lens about the center of its entrance pupil. For this configuration, we obtain the following two relationships—expressions for the image plane pivot distance ${z}'_{o}$ and the object tilt angle $\beta $—starting from Eq. (19):

Table 2 enumerates the results of our test. To verify the above equations, we implemented a thick-lens model of focal length $f=24.0\ \mathrm{mm}$ in Zemax using two paraxial surfaces (to simulate aberration-free, geometric imaging) with pupil magnification ${m}_{p}=2$. The lens surfaces were grouped within two coordinate-break surfaces that allowed the lens to be tilted about the entrance pupil. The object plane surface was placed at ${z}_{o}=-504.0\ \mathrm{mm}$ from $\{C\}$ (and from the entrance pupil). For every object plane orientation $\beta $ (*col.* 1), the appropriate lens tilt angle $\alpha $ (*col.* 2) and image plane distance ${z}'_{o}$ (*col.* 3) were obtained using Zemax’s optimization function to minimize the spot radius across the field. Following optimization for every $\beta $, the value of $\alpha $ obtained from Zemax (along with the values of ${m}_{p}$, ${z}_{o}$, and $f$) was used to numerically compute $\beta $ (*col.* 4) and ${z}'_{o}$ (*col.* 5) using the derived Eqs. (20) and (21). We can observe that the values of $\beta $ and ${z}'_{o}$ obtained numerically using the derived equations match very closely.

It must be noted that while Eq. (21) is useful for finding the value of the object plane tilt angle $\beta $ for a given value of the lens tilt angle $\alpha $, obtaining the inverse function for evaluating $\alpha $ in terms of $\beta $ is not straightforward. However, a simple iterative algorithm, which starts from an initial estimate of $\alpha $ obtained by setting ${m}_{p}=1$, can be used to estimate the lens tilt angle $\alpha $ required for focusing on a tilted object surface.

## 4. APPLICATION OF THE MODEL FOR OMNIFOCUS IMAGING USING LENS TILT

### A. Theory

We can infer several insights about the geometric properties of the image formed in a Scheimpflug camera from Eq. (12). In this section, we use one such interesting consequence of Eq. (12) that is useful for synthesizing an omnifocus image by selectively blending multiple images captured while rotating a lens about its entrance pupil center.

An omnifocus image has everything from the close foreground to the far background in sharp focus [7]. Lenses can focus only on a single surface—usually, the plane of sharp focus—as dictated by the laws of optics. Consequently, objects fore and aft of the plane of sharp focus gradually become out of focus and appear blurry in the image. This interplay of light and lenses leads to the limited depth of field (DOF) problem. Several methods have been proposed to circumvent this problem, for example, depth-dependent image deconvolution, wavefront coding, plenoptic imaging, Scheimpflug imaging, and focus stacking.

In Scheimpflug imaging, the lens or the sensor or both are rotated, which induces a rotation of the plane of sharp focus, allowing scenes with significant depths (or object planes that are tilted) to be in focus at the image plane [8].

In focus stacking (or z-stacking), a number of images are captured at multiple focus depths by changing either the focal length or the image plane distance. Consequently, regions of the scene that are a specific distance from the lens and within the DOF are in focus only in a single image. Collectively, however, the stack contains all or most regions of the scene in focus distributed among the images in the stack. An omnifocus image is created by registering the images, followed by identifying and blending the in-focus regions [7,9].
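The detection-and-blending step described above can be sketched with a simple per-pixel sharpness rule (the Laplacian-energy focus measure and all names here are illustrative choices of ours, not the specific algorithms of [7,9]):

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def fuse_focus_stack(stack):
    """Fuse a list of pre-registered grayscale images (2D float arrays)
    by picking, per pixel, the image with the highest local sharpness."""
    # local sharpness: smoothed magnitude of the Laplacian (a focus measure)
    sharpness = np.stack([uniform_filter(np.abs(laplace(im)), size=5)
                          for im in stack])
    best = np.argmax(sharpness, axis=0)  # index of sharpest image per pixel
    imgs = np.stack(stack)
    rows, cols = np.indices(best.shape)
    return imgs[best, rows, cols]        # omnifocus composite
```

In practice the per-pixel decision map is usually smoothed or regularized before blending to avoid seams; the argmax rule above is the bare minimum that demonstrates the idea.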

The DOF in Scheimpflug imaging is still limited to a small region (approximately a wedge) around the plane of sharp focus. In focus stacking, significant portions of each DOF region extend perpendicular to the optical axis of the lens and beyond the field of view of the camera, resulting in suboptimal utilization of each DOF.

Our analysis of Eq. (12) suggests that we can borrow the central ideas of Scheimpflug imaging and focus stacking methods to devise a simple technique for creating omnifocus images while bypassing the above shortcomings of either method. Our technique relies on capturing multiple images of the scene while rotating a lens about the entrance pupil center. We also show that the proposed method is simplest if the pupil magnification ${m}_{p}$ of the lens equals one (i.e., a symmetric lens).

A critical step in the synthesis of an omnifocus image from a set of images is registration, which is the process of spatially aligning the images in the stack to a reference image by applying a mapping function—either known *a priori* from the model or estimated from the images. The degree of accuracy of image registration directly influences the quality of the synthesized image.

In general, a rotation of the lens about a pivot along the optical axis results in a complex depth-dependent warping of the image field. The extent of distortion of the points in the image is a function of the point’s depth in the object space. In other words, different parts of the scene warp by different amounts when the lens is rotated. This phenomenon is called the parallax effect. Although there are algorithms for registering images of the same scene exhibiting local variations, the methods are typically iterative in nature, and there are fundamental limits to the achievable registration accuracy [10], especially in the presence of noise and non-geometric distortions such as defocus blur.

If, however, the lens is rotated about its entrance pupil, then the image field warping is independent of the scene depth, and we can unwarp the image using a single transformation matrix. Moreover, from a purely geometric standpoint, the images in the stack are pair-wise related through a mapping $H(\delta \alpha )\text{:}{\mathbf{x}}'_{i}\to {\mathbf{x}}'_{j}$, where $\delta \alpha $ is the difference in the lens’s orientation between ${\mathbf{x}}'_{i}$ and ${\mathbf{x}}'_{j}$. Further, we can derive this mapping, called the *inter-image homography* [11], from Eq. (12), allowing us to analytically register the images in the sequence. Thus, the registration process is efficient (not requiring an iterative algorithm) and exact.

The specific structure of the inter-image homography matrix depends on the pupil magnification ${m}_{p}$. Interestingly, if the pupil magnification equals one (i.e., a perfectly symmetric lens), the inter-image homography between the image obtained under a lens tilt of $\alpha $ about the $x$ axis (from the $+z$ axis toward the $+y$ axis) and the reference image obtained under no lens tilt reduces to a simple similarity transformation consisting of only scaling and translation components. This mapping between $\acute{\mathbf{x}}_{0}$ and $\acute{\mathbf{x}}_{n}$ is given in Eq. (22).
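The practical consequence of this similarity structure is that the registration transform can be built and inverted in closed form. The sketch below (ours; the scale $s$ and shift $t_y$ are left as free parameters, since their exact dependence on $\alpha$ is what Eq. (22) supplies) shows the matrix shape and its closed-form inverse:

```python
def similarity_homography(s, ty):
    """Inter-image homography for a unity-pupil-magnification lens:
    isotropic scaling s plus a transverse (y) shift ty. The actual
    dependence of s and ty on the tilt angle alpha comes from
    Eq. (22); here they are placeholders."""
    return [[s,   0.0, 0.0],
            [0.0, s,   ty],
            [0.0, 0.0, 1.0]]

def invert_similarity(H):
    """Closed-form inverse: undo the scale and shift, registering a
    tilted-lens image back to the zero-tilt reference."""
    s, ty = H[0][0], H[1][2]
    return similarity_homography(1.0 / s, -ty / s)
```

No iterative estimation is needed: inverting the two scalar parameters registers the image exactly.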

In the following subsection, we verify the above theory of omnifocus image synthesis using a simulation in Zemax. Note that our goal is not to present a new or best-possible algorithm for detecting and fusing focused regions from the images in the stack, but rather to present another method of overcoming the depth-of-field problem, one that we believe has some advantages over existing methods.

#### B. Simulation

Figure 4(a) shows a schematic of the image simulation setup in Zemax. We implemented an F/2.5 thick-lens model using two paraxial surfaces of focal lengths ${f}_{1}=40\,\mathrm{mm}$ and ${f}_{2}=30\,\mathrm{mm}$ with a separation of $s=20\,\mathrm{mm}$, resulting in an effective focal length $f=24\,\mathrm{mm}$ $[1/f=1/{f}_{1}+1/{f}_{2}-s/({f}_{1}{f}_{2})]$. A circular stop (diameter $7.14\,\mathrm{mm}$) was placed behind the first paraxial surface at a distance $a=11.43\,\mathrm{mm}$, resulting in a pupil magnification ${m}_{p}=1$ $[{m}_{p}=({f}_{2}/{f}_{1})(a-{f}_{1})/(s-a-{f}_{2})]$. To tilt the object and the lens independently, we set the object surface type to “Tilted” and bracketed all surfaces associated with the lens within coordinate breaks.
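The two bracketed formulas can be verified numerically; a few lines of Python (ours, purely a sanity check) reproduce the quoted values:

```python
f1, f2, s = 40.0, 30.0, 20.0   # paraxial focal lengths and separation, mm
a = 11.43                      # stop distance behind first surface, mm

# Effective focal length of the two-surface thick lens
f = 1.0 / (1.0 / f1 + 1.0 / f2 - s / (f1 * f2))

# Pupil magnification of the stop as seen through the two surfaces
mp = (f2 / f1) * (a - f1) / (s - a - f2)

print(f, mp)   # f is exactly 24.0 mm; mp is ~1.0 (to rounding of a)
```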

The Image Simulation analysis tool in Sequential Mode in Zemax is powerful and offers an extensive set of tuning parameters. However, to produce a representative simulation, the parameters must be chosen carefully based on the objective of the experiment. The most important parameters in the context of the current simulation are (1) the field height of the source bitmap, (2) the oversampling factor (if required), (3) pupil sampling, (4) image sampling, (5) aberrations, (6) reference, (7) pixel size, and (8) X pixels and Y pixels. The image simulation process in Zemax consists of three steps [12]: (a) the source bitmap is convolved with a space-variant point spread function (PSF) grid, generated in the object space and accounting for optical aberrations, whose fidelity depends on the set field height, oversampling factor, and number of pixels; (b) the convolved image is transferred to the image space to account for geometric distortions and system magnification; and (c) the sampling effects of a discrete detector are simulated based on the set pixel size and detector size (inferred from the pixel size and number of pixels). Since the paraxial surfaces are devoid of aberrations, we inserted a Zernike Standard Phase surface at the location of the exit pupil to introduce slight spherical aberration. This small amount of spherical aberration also increased the spot size of the PSFs, ensuring adequate pixels to represent each PSF. Additionally, we set sufficiently fine pupil and image sampling (both $64\times 64$), which influence how accurately the PSFs represent the system aberrations.
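Steps (a) and (c) can be illustrated with a minimal, self-contained sketch (ours, not Zemax's implementation). For simplicity it assumes a single fixed PSF rather than a space-variant PSF grid, and it omits the geometric remapping of step (b):

```python
def convolve2d_same(img, psf):
    """Step (a), simplified: convolve img with a small normalized PSF
    kernel using zero padding (Zemax uses a space-variant PSF grid)."""
    h, w = len(img), len(img[0])
    kh, kw = len(psf), len(psf[0])
    oy, ox = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy, xx = y + j - oy, x + i - ox
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx] * psf[j][i]
            out[y][x] = acc
    return out

def detector_sample(img, pixel):
    """Step (c): average `pixel` x `pixel` blocks to mimic the spatial
    integration performed by each detector pixel."""
    h, w = len(img) // pixel, len(img[0]) // pixel
    return [[sum(img[y * pixel + j][x * pixel + i]
                 for j in range(pixel) for i in range(pixel)) / pixel ** 2
             for x in range(w)] for y in range(h)]
```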

The three-dimensional scene consists of three playing cards ($64\,\mathrm{mm}\times 89\,\mathrm{mm}$) placed at 800 mm, 1000 mm, and 1200 mm from the lens's vertex (before rotating the lens). However, the Image Simulation tool was not designed to simulate imaging of three-dimensional scenes. Therefore, we ran the image simulation for each of the three depth planes with identical settings and integrated the outputs into a single image. An obvious shortcoming of this simple integration process is that it fails to accurately simulate portions of the scene where objects overlap in the image space. To avoid this problem, we spatially separated the three cards along the transverse direction (using appropriate field settings in Zemax) such that their images (even after blurring) do not overlap in the image plane (by picking “Vertex” as the reference under the detector settings). This limitation (and the workaround) does not, however, detract from the main purpose of the simulation: to test the feasibility of synthesizing an omnifocus image from a series of images captured under lens tilts.
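Because the cards never overlap in the image plane and the background is dark, the per-plane outputs can be integrated by a simple per-pixel maximum. A sketch of that compositing step (ours; a true 3D renderer would need occlusion handling):

```python
def integrate_depth_planes(planes):
    """Combine per-depth-plane simulation outputs into one image by
    taking the per-pixel maximum. Valid only under the assumption in
    the text: the (blurred) card images never overlap and the
    background is dark."""
    h, w = len(planes[0]), len(planes[0][0])
    return [[max(p[y][x] for p in planes) for x in range(w)]
            for y in range(h)]
```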

To simulate the imaging of a scene consisting of $m$ depth planes for $n$ orientations of the lens, we need to execute the Image Simulation tool $m\times n$ times, setting the appropriate simulation parameters and integrating the $m$ outputs for every orientation. We used PyZDDE [13] to automate the entire process of tilting the lens about the $x$ axis, pivoted at the center of the entrance pupil, to create a sequence of 13 images between $\pm 8^{\circ}$.
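The overall automation loop has the following shape (an illustrative sketch, not our released script; `simulate_plane` is a placeholder standing in for the PyZDDE-driven Zemax calls, not a PyZDDE API function):

```python
def tilt_angles(n=13, limit=8.0):
    """n equally spaced lens tilts spanning [-limit, +limit] degrees."""
    step = 2.0 * limit / (n - 1)
    return [-limit + i * step for i in range(n)]

def simulate_plane(alpha, depth):
    """Placeholder for one Zemax run driven through PyZDDE: set the
    coordinate-break tilt to alpha (pivoted at the entrance pupil
    center), select the depth plane, and run Image Simulation."""
    raise NotImplementedError

def capture_stack(depths):
    # m x n executions: one Image Simulation run per (plane, tilt)
    # pair; the m per-plane outputs are integrated for each tilt.
    stack = []
    for alpha in tilt_angles():
        planes = [simulate_plane(alpha, d) for d in depths]
        stack.append(planes)  # in practice, integrate planes here
    return stack
```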

Figure 4(b) shows the integrated image of the scene for a lens tilt angle of $\alpha =-8^{\circ}$. Note the transverse (downward) shift of the image field. Although not apparent in the figure, the individual images of the three cards in the image plane are vertically shifted and demagnified by the same amount, as predicted by Eq. (22). The in-focus regions in this image, detected using a Laplacian of Gaussian (LoG) filter, are shown in Fig. 4(c). Note that no single plane is in complete focus; instead, the parts of each plane that lie within the wedge-shaped DOF surrounding the tilted plane of sharp focus form sharp regions in the image plane. The 13 images were registered analytically using the inter-image homography matrix $H(\alpha )$ shown in Eq. (22). Following registration, a composite image was created by blending the in-focus regions (detected using the LoG filter) from the images. Figure 4(d) shows the synthesized image in which the complete scene, consisting of three depth planes, is in focus. Figure 4(e) shows the degree of focus on the three planes in the composite image, measured using the LoG filter.
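The detect-and-blend step can be sketched as follows (our illustration, not the exact code used; for brevity a plain 4-neighbor Laplacian replaces the LoG filter, which would add Gaussian pre-smoothing for noise robustness):

```python
def focus_measure(img):
    """Absolute 4-neighbor Laplacian response as a per-pixel focus
    measure (a LoG filter additionally pre-smooths with a Gaussian)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y-1][x] + img[y+1][x] + img[y][x-1] + img[y][x+1]
                   - 4.0 * img[y][x])
            out[y][x] = abs(lap)
    return out

def fuse(stack):
    """Blend a registered stack by copying, at each pixel, the value
    from the image with the strongest focus response there."""
    measures = [focus_measure(img) for img in stack]
    h, w = len(stack[0]), len(stack[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = max(range(len(stack)), key=lambda k: measures[k][y][x])
            fused[y][x] = stack[best][y][x]
    return fused
```

Practical implementations usually smooth the per-pixel selection (e.g., with majority filtering) to avoid isolated mis-selected pixels.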

We have made the simulation code (including Zemax files, Python scripts, and computational notebook) and results available for the interested reader. See Code 1, Ref. [14].

## 5. DISCUSSION AND CONCLUSION

We have proposed a new geometric model for imaging in systems in which the lens and sensor are free to rotate about independent pivots. The proposed model is useful for describing and predicting the properties of images in such systems because it incorporates all the optical parameters that directly influence image formation. The pair of equations—Eq. (12) and Eq. (19)—completely describe the imaging and focusing relationships in these systems, such as in a Scheimpflug camera. Following the verification of these two equations, we presented an application for addressing the problem of limited depth of field in optical imaging systems. Specifically, we showed a method of computationally generating an omnifocus (all-in-focus) image from a sequence of images obtained while rotating a lens about the entrance pupil. We demonstrated, using a simulation in Zemax, that we can analytically register the images in the stack if the lens is rotated about its entrance pupil. Furthermore, if the lens has unity pupil magnification (a symmetric lens), then the transformation required for registering the images is a simple combination of scaling and transverse shift. The mechanisms underlying our technique for generating an omnifocus image can be fully appreciated only in light of the geometric model presented. The closed-form expressions for analytic registration were obtained directly from Eq. (12). Note that if the exact values of the sensor pivot $\acute{z}_{o}$ and the inter-pupil distance $d$ are unknown, then we must rely on algorithmic registration. Furthermore, the above technique can also be used to increase the depth of field of a Scheimpflug camera from a sequence of images obtained while perturbing the lens's orientation around the baseline orientation given by Eq. (19).

## Funding

U.S. Army Research Laboratory (ARL) (W911NF-06-2-0035).

## Acknowledgment

The work described in this paper was sponsored by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-06-2-0035. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

## REFERENCES

**1. **A. Walther, *The Ray and Wave Theory of Lenses*, 1st ed. (Cambridge University, 2006).

**2. **R. Kingslake and R. B. Johnson, *Lens Design Fundamentals*, 2nd ed. (Academic, 2009).

**3. **R. R. Shannon, *The Art and Science of Optical Design*, 1st ed. (Cambridge University, 1997).

**4. **J. E. Greivenkamp, *Field Guide to Geometrical Optics* (SPIE Publications, 2003).

**5. **A. Hornberg, *Handbook of Machine Vision*, 1st ed. (Wiley-VCH, 2006).

**6. **P. Rangarajan, *Pushing the Limits of Imaging Using Patterned Illumination* (Southern Methodist University, 2014).

**7. **N. Xu, K.-H. Tan, H. Arora, and N. Ahuja, *Generating Omnifocus Images Using Graph Cuts and a New Focus Measure* (IEEE, 2004), pp. 697–700.

**8. **R. Jacobson, S. Ray, G. G. Attridge, and N. Axford, *Manual of Photography*, 9th ed. (Focal, 2000).

**9. **C. H. Anderson, J. R. Bergen, P. J. Burt, and J. M. Ogden, *Pyramid Methods in Image Processing* (RCA Engineers, 1984).

**10. **L. G. Brown, “A survey of image registration techniques,” ACM Comput. Surv. **24**, 325–376 (1992). [CrossRef]

**11. **A. Criminisi, *Accurate Visual Metrology from Single and Multiple Uncalibrated Images* (University of Oxford, 1999).

**12. **ZEMAX, *Optical Design Program, User’s Manual* (ZEMAX Development Corporation, 2011).

**13. **I. Sinharoy, C. Holloway, and J. Stuermer, “PyZDDE,” in *Zenodo* (2016).

**14. **I. Sinharoy, cosi2016_omnifocus: release of simulation code, files and dataset [Software] (2016), Zenodo. http://doi.org/10.5281/zenodo.59647.