## Abstract

Lack of accurate focus cues in conventional stereoscopic displays has potentially significant effects on depth perception accuracy and visual fatigue. Recently, several multi-focal plane display prototypes have been demonstrated with the promise of improving the accuracy of focus cue rendering in stereoscopic displays. In this paper, we present a systematic method to address two fundamental issues in designing a multi-focal plane display: (1) the appropriate dioptric spacing between adjacent focal planes; and (2) the depth-weighted fusing function to render a continuous three-dimensional (3-D) volume using a sparse number of focal planes placed in the space. By taking account of both ocular factors of the human visual system (HVS) and display factors of a multi-focal plane system, we determine that an appropriate spacing between two adjacent focal planes should be ~0.6 diopter (D), while a smaller spacing may be necessary for further improving retinal image quality. We further develop a set of nonlinear depth-weighted fusing functions with the promise of balancing perceptual continuity of a 3-D scene and retinal image quality. Our method is based on quantitative evaluation of the modulation transfer functions (MTF) of depth-fused images formed on the retina.

© 2010 OSA

## 1. Introduction

Stereoscopic displays have found numerous applications, spanning the fields of scientific visualization, training and simulation, cinema, and home entertainment systems. Although conventional stereoscopic displays can render pictorial and binocular disparity cues fairly well and thus create compelling depth perception, they suffer from the drawbacks of vergence-accommodation conflict [1,2], distorted depth perception [3], and visual fatigue [4]. Several alternative technologies may potentially overcome these drawbacks, including holographic displays [5], volumetric displays [6–8], and multi-focal plane displays [9–16]. Among those technologies, multi-focal plane three-dimensional (3-D) displays leverage computational power, accuracy of depth perception, and conventional two-dimensional (2-D) display technologies.

Different from conventional stereoscopic displays where stereoscopic image pairs are rendered on 2-D flat surfaces at a single fixed focal distance from the eye, a multi-focal plane display consists of a series of carefully placed focal planes at discrete focal distances. These discrete focal planes sample a 3-D scene volume into multiple zones along the visual axis. Objects contained within a zone are rendered on a corresponding pair of adjacent focal planes. Multi-focal plane displays may be implemented either by spatially multiplexing a stack of 2-D displays [9–12] in the visual space or by fast switching the focal distance of a single 2-D display in synchronization with the frame rendering of multi-focal images [13–16].

Although multi-focal plane displays have shown promise for reducing these visual artifacts to various extents [2,3,10–12,15,16], there have been relatively few systematic investigations of the optical design and image quality assessment of such displays. Two fundamental issues need to be addressed: (1) the optimal dioptric spacing between adjacent focal planes to produce acceptable image quality and depth resolution, which essentially determines the number of focal planes necessary for a multi-focal plane display to cover a given volumetric space; and (2) a method to render correct or near-correct focus cues for a continuous scene volume through a finite number of sparsely spaced focal planes without creating noticeable depth aliasing and depth discontinuity. Several pioneering works addressed some of these issues from different aspects. For instance, in a theoretical work Rolland *et al.* suggested that the spacing between adjacent focal planes should be a constant dioptric distance of 1/7 diopter (D) and that 14 focal planes are needed to cover the depth range from 0D to 2D [9]; Akeley *et al.* demonstrated a three-focal plane display prototype by optically combining three 2-D displays through beamsplitters, which creates a constant dioptric separation of 0.67D between adjacent focal planes [10]; Liu and Hua presented a dual-focal plane display prototype with addressable focal distances throughout a volumetric space from 8D to 0D enabled by a liquid lens device [15]; Love *et al.* recently developed a four-focal plane prototype with discretely addressable focal planes enabled by birefringent lenses [16]; and Lee *et al.* demonstrated an immaterial two-focal plane display with the two image planes configured in an L-shape [12].

A large number of focal planes and small dioptric spacing are desirable for improving image quality and reducing perceptual artifacts in multi-focal plane displays. On the other hand, it is practically very challenging to achieve a large number of focal planes with current technologies. In order to reduce the number of necessary focal planes to a manageable level, a depth-weighted blending technique may be implemented. Such a method leads to a depth-fused 3-D (DFD) perception [11], where two overlapped images displayed on two transparent screens at two different depths may be perceived as a single-depth image. In addition, the luminance ratio between the two images may be modulated to change the perceived depth of the fused image. Hereafter, multi-focal plane technologies that incorporate the DFD effect are referred to as depth-fused multi-focal plane displays, or DFD displays.

Due to the lack of systematic quality metrics, the choice of the spacing between adjacent focal planes in existing DFD displays differs significantly, ranging from 0.028D [11] to 0.67D [10]. Furthermore, a linear form of depth-weighted fusing function was mainly adopted in existing DFD prototypes [2,10,11,16], with little consideration given to optimizing the function form to account for key ocular factors of the human visual system (HVS). Hoffman *et al.* [2] investigated the retinal image quality in a three-focal plane DFD display prototype. The contrast modulation of the retinal images was found to be affected by pupil size, ocular aberrations, and target spatial frequency.

In this paper, we present a systematic approach to address the two aforementioned fundamental issues in designing a depth-fused multi-focal plane display. Our method is based on quantitative evaluation of the modulation transfer functions (MTF) of the DFD images formed on the retina. It takes into account most of the ocular factors, such as pupil size, monochromatic and chromatic aberrations, diffraction, Stiles-Crawford effect (SCE) [17], and accommodation, as well as display factors, such as dioptric midpoint, dioptric spacing, depth filter, and spatial frequency of the target. Based on the MTFs of the retinal images of a DFD display and the depth of field (DOF) of the human visual system under photopic viewing conditions, we determine the optimal arrangement of the focal planes and characterize the depth-weighted fusing function between adjacent focal planes. In Section 2, we describe the generalized DFD display method. In Section 3, the simulation models and methods to predict the retinal image quality, or MTF, of a DFD display are described. In Section 4, we present the simulation results that investigate the effects of various display factors on retinal image quality and suggest guidelines for optimizing a DFD display design. Finally, in Section 5, we simulate retinal images and MTFs of DFD displays using our proposed methods and compare them to those generated by the prior methods.

## 2. Generalized DFD display method

Figure 1 illustrates the depth fusion concept of two images displayed on two adjacent focal planes separated by a dioptric distance of $\Delta z$. The dioptric distance from the eye to the front focal plane is $z_1$ and to the rear plane is $z_2$. When the images shown on the two-layer displays are aligned such that each pixel on the front and rear planes subtends the same visual angle to the eye, the front and back pixels (e.g. A and B, respectively) are viewed as completely overlapped at the viewpoint and fused as a single pixel (e.g. C). The luminance of the fused pixel ($L$) is summed from the front and rear pixels ($L_1$ and $L_2$, respectively), and the luminance distribution between the front and back pixels is weighted by the rendered depth $z$ of the fused pixel. These relationships may be expressed as:

$$L = L_1(z) + L_2(z) = w_1(z)\,L + w_2(z)\,L \tag{1}$$

Here, $w_1(z)$ and $w_2(z)$ are the depth-weighted fusing functions modulating the luminance of the front and back focal planes, respectively. Typically $w_1(z) + w_2(z) = 1$ is enforced such that the luminance of the fused pixel is $L_1$ when $w_1(z) = 1$ and is $L_2$ when $w_2(z) = 1$. We hereafter assume the peak luminance of individual focal planes is normalized to be uniform, without considering system-specific optical losses that may arise in some forms of multi-focal plane displays (e.g. in spatially multiplexed displays where light may be projected through a thick stack of display panels). Optical losses of a system should be characterized to normalize non-uniformity across the viewing volume before applying depth-weighted fusing functions.

The depth-fused 3-D perception effect indicates [10,11] that, as the depth-weighted fusing functions ($w_1$ and $w_2$) change, the perceived depth $\widehat{z}$ of the fused pixel changes accordingly, formulated as:

For instance, when $w_1(z) = 1$, the perceived depth should be $z_1$; conversely, it should be $z_2$ when $w_2(z) = 1$.
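
The luminance split between the front and back pixels can be illustrated with a minimal Python sketch. It assumes the linear fusing weights used by prior DFD prototypes (a ramp in dioptric depth); the function names `linear_weights` and `split_luminance` and the numeric values are hypothetical and chosen only for illustration.

```python
def linear_weights(z, z1, z2):
    """Return (w1, w2) for a pixel rendered at depth z, with z1 >= z >= z2 (diopters).

    Assumption: a linear ramp in dioptric depth, as in the prior linear filters.
    """
    if not z2 <= z <= z1:
        raise ValueError("rendered depth must lie between the two focal planes")
    w1 = (z - z2) / (z1 - z2)  # 1 at the front plane, 0 at the back plane
    return w1, 1.0 - w1        # enforces w1 + w2 = 1


def split_luminance(total, z, z1, z2):
    """Split the fused-pixel luminance between the front and back pixels."""
    w1, w2 = linear_weights(z, z1, z2)
    return w1 * total, w2 * total


# Hypothetical example: a pixel of luminance 100 rendered at 1.0 D
# between focal planes at 1.3 D (front) and 0.7 D (back).
front, back = split_luminance(100.0, 1.0, 1.3, 0.7)
```

Because the two weights sum to one, the front and back luminances always add up to the luminance of the fused pixel, regardless of which weighting function is substituted for the linear ramp.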

In a generalized $n$-focal plane DFD system, the dioptric distances from the eye to the $n$ focal planes are denoted as $z_1, z_2, \ldots, z_n$ in distance order, where $z_1$ is the closest to the eye. We assume that the 3-D scenes contained between a pair of adjacent focal planes are rendered only on this corresponding focal plane pair. Under this assumption, a given focal plane at $z_i$ will render all the 3-D scenes contained between the $(i-1)^{\mathrm{th}}$ and the $(i+1)^{\mathrm{th}}$ focal planes. Within the depth range of $z_{i-1} \geq z \geq z_{i+1}$, many scene points may be projected onto the same pixel of the $i^{\mathrm{th}}$ focal plane, among which only the closest scene point to the eye is un-occluded and thus effectively determines the depth-weighted fusing function modulating the luminance of the specific pixel. The closest scene point corresponding to a specific pixel can typically be retrieved from the $z$-buffer in a computer graphics renderer [18]. Let us assume the depth of the closest 3-D scene point projected onto a given pixel of the $i^{\mathrm{th}}$ focal plane is $z$. Based on the depth-fused 3-D perception described above, the luminance of the 3-D point is distributed between the $(i-1)^{\mathrm{th}}$ and $i^{\mathrm{th}}$ focal planes if $z_{i-1} \geq z \geq z_i$, and otherwise between the $i^{\mathrm{th}}$ and $(i+1)^{\mathrm{th}}$ focal planes if $z_i \geq z \geq z_{i+1}$. The luminance attribution to the $i^{\mathrm{th}}$ focal plane is weighted by the depth $z$. It may be characterized by the ratio of the luminance attribution $L_i(z)$ on the $i^{\mathrm{th}}$ focal plane at $z_i$ to the total scene luminance $L(z)$, written as $g_i(z) = L_i(z)/L(z)$, where $L(z) = L_{i-1}(z) + L_i(z)$ if $z_{i-1} \geq z \geq z_i$ or $L(z) = L_i(z) + L_{i+1}(z)$ if $z_i \geq z \geq z_{i+1}$. In general, the depth-weighted fusing function, $w_i(z)$, of the $i^{\mathrm{th}}$ focal plane can be defined as:

$$w_i(z) = \begin{cases} g_i(z), & z_i \geq z \geq z_{i+1} \quad (1 \leq i < n) \\ 1 - g_{i-1}(z), & z_{i-1} \geq z \geq z_i \quad (2 \leq i \leq n) \end{cases} \tag{3}$$

In summary, by knowing the rendered depth *z* of a 3-D virtual scene, the luminance levels of the multi-focal plane images can be modulated accordingly by the depth-weighted fusing functions in Eq. (3) to render pseudo-correct focus cues.
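
The per-plane weighting can be sketched in Python under the assumption stated above, namely that a scene point lying between planes $i$ and $i+1$ is shared only by that pair, so the front plane receives the ratio $g_i(z)$ and the back plane receives $1 - g_i(z)$. The linear ramp `g_linear` is a hypothetical placeholder for the luminance ratio, and the plane positions are the 0.6D-spaced example used later in the paper; none of the function names come from the original work.

```python
PLANES = [3.0, 2.4, 1.8, 1.2, 0.6, 0.0]  # z_1 ... z_n in diopters, nearest first


def g_linear(i, z, planes):
    """Hypothetical front-plane luminance ratio for the pair (i, i+1), 1-based i."""
    z_front, z_back = planes[i - 1], planes[i]
    return (z - z_back) / (z_front - z_back)


def fusing_weight(i, z, planes, g=g_linear):
    """Weight of the i-th focal plane (1-based) for a scene point at depth z."""
    n = len(planes)
    if i < n and planes[i - 1] >= z >= planes[i]:
        return g(i, z, planes)            # plane i is the front plane of its pair
    if i > 1 and planes[i - 2] >= z >= planes[i - 1]:
        return 1.0 - g(i - 1, z, planes)  # plane i is the back plane of its pair
    return 0.0                            # the point is outside both neighbouring zones


# Weights of all six planes for a point rendered at 1.5 D.
weights = [fusing_weight(i, 1.5, PLANES) for i in range(1, len(PLANES) + 1)]
```

Only the two planes that bracket the rendered depth receive non-zero weight, and those two weights sum to one.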

## 3. Models and methods for DFD quality assessment

In DFD displays, adjacent focal planes are separated by a considerable distance. The retinal image quality is expected to be worse when the eye is accommodated at a distance between the two planes than when it is focused on either the front or back focal plane. However, both the dioptric spacing between adjacent focal planes and the depth-weighted fusing functions can be selected such that the perceived depth of the fused pixel $\widehat{z}$ closely matches the rendered depth $z$ and the image quality degradation is minimally perceptible as the observer accommodates to different distances between the focal planes.

The optical quality of a fused pixel in DFD displays may be quantitatively measured by the point spread function (PSF) of the retinal image, or equivalently by the modulation transfer function (MTF), which is characterized by the ratio of the contrast modulation of the retinal image to that of a sinusoidal object on the 3-D display. Without loss of generality, hereafter a dual-focal plane display is assumed and the results can be extended to $n$ focal planes. Based on Eq. (1), when the eye is accommodated at the rendered distance $z$, the PSF of the fused pixel, $\mathrm{PSF}_{12}$, may be described as:

$$\mathrm{PSF}_{12}(z) = w_1(z)\,\mathrm{PSF}_1(z, z_1) + w_2(z)\,\mathrm{PSF}_2(z, z_2) \tag{4}$$

Here, $\mathrm{PSF}_1(z, z_1)$ and $\mathrm{PSF}_2(z, z_2)$ are the point spread functions of the front and back pixels, respectively, when the eye is accommodated at distance $z$. The MTF of a DFD display can then be calculated via the Fourier transform (FT) of $\mathrm{PSF}_{12}$, and subsequently from the FTs of $\mathrm{PSF}_1$ and $\mathrm{PSF}_2$.

Many factors may affect the retinal image quality ($\mathrm{PSF}_{12}$ and $\mathrm{MTF}_{12}$) of a DFD display. Table 1 categorizes those parameters, along with their notation and typical range, into two types: ocular and display factors. Ocular factors are mostly related to the human visual system when viewing DFD images from a viewer’s perspective. These variables, including pupil size, pupil apodization, reference wavelength, and accommodation state, need to be carefully considered when we model the eye optics. Display factors are related to the practical configurations of DFD displays, such as the covered depth range, dioptric midpoint of two adjacent focal planes to the eye, dioptric spacing between two adjacent focal planes, depth-weighted fusing functions, as well as the spatial frequency of the displayed target.

Instead of using observer- and display-specific measurements to evaluate the PSF and MTF of DFD displays [2,3,16], we adopt the schematic Arizona eye model to simulate and analyze the retinal image quality from simulated targets and thereby derive generalizable results. In the fields of optical design and ophthalmology, various schematic eye models have been widely used to predict the performance of optical systems involving human observers [19–21]. In this study, the Arizona eye model was set up in CODE V [22]. The Arizona eye model is designed to match clinical levels of aberration for both on- and off-axis fields, and can accommodate to different distances. Detailed parameters of the model can be found in [19,23]. The accommodative distance $z$, as shown in Fig. 1, determines the lens shape, conic constant, and refractive index of the surfaces in the schematic eye. The distances of the front and back focal planes, $z_1$ and $z_2$ respectively, and their spacing $\Delta z$ are varied to simulate different display configurations.

Ocular characteristics of the HVS, such as depth of field, pupil size, diffraction, Stiles-Crawford effect, monochromatic and chromatic aberrations, and accommodation, play important roles in the perceived image quality of DFD displays. Although previous work [2] has investigated the dependence of image quality upon pupil size, high-order aberrations, and accommodation, its treatment of these factors lacks generality to average observers and to a full-color DFD display with different display configurations. For instance, only monochromatic aberrations specific to one user’s eye were considered and a linear depth-weighted fusing function was assumed.

In order to accurately simulate the PSF/MTF of the retinal images in DFD displays, we first examined the dependence of the polychromatic MTF of a fused pixel upon eye pupil diameter while fixing other ocular and display factors. In particular, we examined the MTFs under the condition that the luminance of a rendered pixel is equally distributed between the front and back focal planes separated by 0.5D and the eye is accommodated at the midpoint between the two focal planes. The midpoint is generally expected to have the worst retinal image quality for a fused pixel. Assuming the same pupil size, we further compared the MTFs of the fused pixel against those of a real pixel physically placed at the dioptric midpoint of the two focal planes. For pupil diameters no larger than 4mm, we found that the MTF difference between the fused pixel and a real pixel at the same distance is acceptable for spatial frequencies below 20cpd, while a considerable degradation is observed for larger pupils. Therefore, we set the eye pupil diameter of the eye model to 4mm, which in fact corresponds well to the pupil size when viewing typical HMD-like displays. Second, in order to account for the directional sensitivity of photoreceptors on the human retina, commonly referred to as the Stiles-Crawford effect (SCE) [17], a Gaussian apodization filter was applied to the entrance pupil with an amplitude transmittance coefficient of $\beta = -0.116\,\mathrm{mm}^{-2}$ [23,24]. Consequently, the SCE may induce a slightly contracted effective pupil, and thus reduce spherical aberration and improve the MTF. Furthermore, the image source in the model was set up with polychromatic wavelengths, including the F, d, and C components as listed in Table 1, to simulate a full-color DFD display. To compensate for the longitudinal chromatic aberration (LCA) that commonly exists in human eyes [23,25], we inserted a zero-optical-power achromat 15mm away from the cornea vertex with an LCA opposite to that of the Arizona eye model. A similar approach has been used in subjective measurement of the DOF of the human eye [26]. In practical DFD designs, instead of inserting an achromat directly in front of the eye, the display optics may be optimized to have equivalent chromatic aberration to compensate for the LCA of the visual system. Finally, the effect of diffraction is accounted for by the modeling software, CODE V, while simulating PSFs, and the effect of accommodation will be discussed with the depth filters in the next section.

Based on the model setup described above, for a given eye accommodation status and display settings, $\mathrm{PSF}_1(z, z_1)$ and $\mathrm{PSF}_2(z, z_2)$ for an on-axis point source are simulated separately in CODE V. Using the relationship in Eq. (4), a series of $\mathrm{PSF}_{12}(z)$ is computed by varying $w_1$ from 1 to 0, which corresponds to varying the rendered depth $z$ from $z_1$ to $z_2$. The corresponding $\mathrm{MTF}_{12}(z)$ of the DFD display is derived by taking the FT of $\mathrm{PSF}_{12}$.
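
The weighted PSF combination and the PSF-to-MTF step can be sketched as follows. This is a minimal illustration, not the CODE V workflow: the one-dimensional Gaussian profiles stand in for the simulated eye-model PSFs, their widths are hypothetical, and the helper names `gaussian_psf`, `fuse_psf`, and `mtf` are assumptions introduced here.

```python
import cmath
import math


def gaussian_psf(sigma, n=65):
    """Toy 1-D PSF (unit area) centred in an n-sample window; stands in for a simulated PSF."""
    centre = n // 2
    values = [math.exp(-0.5 * ((k - centre) / sigma) ** 2) for k in range(n)]
    total = sum(values)
    return [v / total for v in values]


def fuse_psf(psf_front, psf_back, w1):
    """Weighted sum of the front and back PSFs, with w2 = 1 - w1."""
    return [w1 * a + (1.0 - w1) * b for a, b in zip(psf_front, psf_back)]


def mtf(psf):
    """MTF as the magnitude of the discrete Fourier transform, normalised to 1 at zero frequency."""
    n = len(psf)
    spectrum = []
    for f in range(n // 2 + 1):
        coeff = sum(p * cmath.exp(-2j * math.pi * f * k / n) for k, p in enumerate(psf))
        spectrum.append(abs(coeff))
    return [s / spectrum[0] for s in spectrum]


psf_front = gaussian_psf(1.0)  # hypothetical: plane close to the accommodation distance
psf_back = gaussian_psf(3.0)   # hypothetical: plane with more defocus
mtf_front = mtf(psf_front)
mtf_back = mtf(psf_back)
mtf_12 = mtf(fuse_psf(psf_front, psf_back, 0.5))
```

In this toy case the two PSFs are symmetric about the same centre, so the fused MTF is simply the weighted average of the two individual MTFs; with aberrated, asymmetric PSFs the complex transfer functions would have to be summed before taking the magnitude.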

To evaluate the retinal image quality of a depth-fused pixel against a physical pixel placed at the same distance, we further simulate the PSF of a real point source placed at distance $z$, $\mathrm{PSF}_{\mathrm{ideal}}(z)$, and compute the corresponding $\mathrm{MTF}_{\mathrm{ideal}}(z)$. The degradation of $\mathrm{MTF}_{12}(z)$ from $\mathrm{MTF}_{\mathrm{ideal}}(z)$ is expected to vary with the dioptric spacing of the two adjacent focal planes, the rendered depth $z$, and eye-specific parameters. Through comprehensive analysis of the retinal image quality of DFD displays, threshold values are established to ensure that the degradation from a real display condition is minimally perceptible to average observers. Optimal depth-weighted fusing functions are then obtained.

## 4. Optimal DFD designs

As mentioned earlier, a fused pixel that is rendered to be at the dioptric midpoint of two adjacent focal planes is expected to have the worst retinal image quality compared with other points between the focal planes [2,16]. Therefore, in the analysis hereafter, we use the retinal image quality of a fused pixel rendered at the midpoint of two adjacent focal planes as a criterion to determine appropriate settings for display designs.

#### 4.1 Optimal dioptric spacing

In this study, the overall focal range of a DFD display covers depths from 3D ($z_1$) to 0D ($z_n$). Within this range, we further assume a constant dioptric spacing between two adjacent focal planes (e.g. $z_i$ and $z_{i+1}$), independent of the dioptric midpoint of the focal plane pair relative to the eye, denoted as $z_{i,i+1} = (z_i + z_{i+1})/2$ in Table 1. Using the simulation method described in Section 3, we validated this assumption by examining the dependence of the MTF of a fused pixel at the midpoint of two focal planes upon the dioptric distance of the midpoint to the eye while fixing other ocular and display factors (i.e. $w_1 = w_2 = 0.5$, $\Delta z = 0.5$D, $z = z_{i,i+1}$). As expected, the MTF of a fused pixel at the midpoint varies as the midpoint gets closer to the eye, because ocular aberrations are highly correlated with accommodation [27,28]. However, the average variation is less than 15% for spatial frequencies below 20cpd for $z_{i,i+1}$ within the 0D~3D range.

Under these assumptions, it is feasible to study the effect of dioptric spacing on DFD displays by setting the midpoint of a pair of adjacent focal planes at an arbitrary position within the depth range without loss of generality. We thus chose 1D as the midpoint of a focal plane pair and varied the dioptric spacing $\Delta z$ from 0.2D to 1D at an interval of 0.2D. For each dioptric spacing condition, the MTF of a fused pixel at the dioptric midpoint of the two focal planes (i.e. $\mathrm{MTF}_{12}(z = z_{i,i+1})$) was calculated with the assumption that the luminance level is evenly divided between the front and back focal planes. Figure 2 (a) plots the results corresponding to different dioptric spacings. For comparison, on the same figure we also plotted $\mathrm{MTF}_{\mathrm{ideal}}$, which corresponds to the MTF of a real pixel placed at the midpoint, and $\mathrm{MTF}_{+0.3\mathrm{D}}$ and $\mathrm{MTF}_{-0.3\mathrm{D}}$, which correspond to the MTFs of the eye model with +0.3D and −0.3D defocus from the midpoint focus, respectively. It is worth noting that the ±0.3D defocus was chosen to match the commonly accepted DOF of the human eye [25,26,29]. As expected, $\mathrm{MTF}_{12}$ consistently degrades as the spacing of the focal planes increases. However, when $\Delta z$ is no larger than 0.6D, $\mathrm{MTF}_{12}$ falls within the region enclosed by $\mathrm{MTF}_{\mathrm{ideal}}$ (green dashed line) and the ±0.3D defocused MTFs (the overlapped blue and red dashed lines). The results indicate that the DOF of the human eye under photopic viewing conditions can be selected as the threshold value of the dioptric spacing in multi-focal plane display designs, which ensures that the degradation of the retinal image quality of a DFD display from an ideal display condition is minimally perceptible to average observers. If better retinal image quality is required for certain applications, a smaller $\Delta z$ may be used, but at the expense of adding more focal planes. For instance, if $\Delta z = 0.6$D is selected, 6 focal planes would be sufficient to cover the depth range from 3.0D to 0D, while 9 focal planes would be necessary to cover the same range if $\Delta z = 0.4$D is selected. At the medium spatial frequency of 20cpd, the improvement of $\mathrm{MTF}_{12}$ at the dioptric midpoint from 0.1 to 0.4 may be worth the cost for demanding applications.
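
The focal plane counts quoted above follow from simple arithmetic: the number of intervals is the depth range divided by the spacing, rounded up, and the number of planes is one more than the number of intervals. A short Python check, with the hypothetical helper name `planes_needed`:

```python
import math


def planes_needed(z_near, z_far, spacing):
    """Number of focal planes covering [z_far, z_near] (diopters) with at most `spacing` between neighbours."""
    # Round first so that floating-point noise (e.g. 3.0 / 0.6) does not add a spurious interval.
    intervals = math.ceil(round((z_near - z_far) / spacing, 9))
    return intervals + 1


count_06 = planes_needed(3.0, 0.0, 0.6)  # 5 intervals
count_04 = planes_needed(3.0, 0.0, 0.4)  # 7.5 intervals, rounded up to 8
```

With a 0.4D spacing the range does not divide evenly, so the last interval is shorter than 0.4D (or, equivalently, the spacing could be reduced slightly to keep it uniform).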

#### 4.2 Optimal depth-weighted fusing function

By setting a dioptric spacing of $\Delta z = 0.6$D and a dioptric midpoint of $z_{12} = 1$D from the eye, we further examined the MTF of a fused pixel while incrementally varying the eye accommodation distance from the front focal plane ($z_1 = 1.3$D) to the back focal plane ($z_2 = 0.7$D) at an increment of 0.1D, as shown in Fig. 2 (b). As expected, an accommodation distance at the dioptric midpoint ($z = z_{12} = 1$D) maximizes the MTF of the fused pixel, while shifting the accommodation distance toward either the front or back focal plane always decreases the MTF. For instance, the MTF value for a target spatial frequency of 10cpd is reduced from 0.6 when $z = 1$D to nearly 0.when $z = 1.3$D or $z = 0.7$D. Many studies have investigated the effects of stimulus contrast and contrast gradient on eye accommodation in viewing real-world scenes [30]. Evidence from these studies suggests that the accommodative response attempts to maximize the contrast of the foveal retinal image and that the contrast gradient helps stabilize the accommodation fluctuation of the eye on the target of interest. Therefore, we may conclude that pseudo-correct focus cues can be generated at the dioptric midpoint by applying an appropriate depth fusing filter even without the existence of a real focal plane.

To further demonstrate the pseudo-correct focus cues created through DFD displays, we set up a dual-focal plane display model identical to that used in the previous paragraph (i.e. $z_{12} = 1$D and $\Delta z = 0.6$D). We simulated multiple retinal images of a Snellen E target by convolving the target with the $\mathrm{PSF}_{12}(z)$ defined in Eq. (4) while the luminance of the target is evenly divided between the two focal planes (i.e. $w_1 = w_2 = 0.5$). Thus the fused target is expected to appear at the dioptric midpoint of the two focal planes. In Fig. 3, the left to right columns correspond to the eye accommodation distances of $z = 1.3$, 1, and 0.7D, respectively, while the top to bottom rows correspond to the target spatial frequencies of $v = 2$, 5, 10, and 30cpd, respectively. As predicted by the results in Fig. 2 (b), the retinal image contrast is higher when the eye is focused at $z = 1$D than at either $z = z_1 = 1.3$D or $z = z_2 = 0.7$D. Meanwhile, at the same accommodation distance, the retinal image contrast clearly depends on the spatial frequency of the target, where the targets with lower spatial frequencies (e.g. 2, 4, and 10cpd) have better image contrast than those with higher frequencies (e.g. $v = 30$cpd).

To derive the dependence of the rendered accommodation cue on the depth-weighted fusing function as described in Eq. (2), we extended the MTF simulation shown in Fig. 2 (b) by incrementally varying $w_1$ from 1 to 0 at an increment of 0.01 while setting $w_2 = 1 - w_1$. For each $w_1$ increment, we simulated the $\mathrm{MTF}_{12}$ of a fused pixel while incrementally varying the eye accommodation distance from the front focal plane ($z_1 = 1.3$D) to the back focal plane ($z_2 = 0.7$D) at an increment of 0.02D. We selected the accommodation distance that maximizes $\mathrm{MTF}_{12}$ as the rendered accommodation cue corresponding to the given depth-weighted fusing factor ($w_1$) of the front focal plane. The accumulated results yield the optimal ratio of the depth-weighted luminance ($L_1$ and $L_2$) of the front and back focal planes to the luminance of the fused target ($L$) as a function of the accommodation distance ($z$) for a focal plane pair.

We may further extend the method described above to more than two focal planes covering a much larger depth range. As an example, we chose a 6-focal plane DFD display design covering a depth range from 3D to 0D. Assuming a 0.6D dioptric spacing, the 6 focal planes were placed at 3D ($z_1$), 2.4D ($z_2$), 1.8D ($z_3$), 1.2D ($z_4$), 0.6D ($z_5$), and 0D ($z_6$), respectively. In this display configuration, we repeated the above-described simulations independently for each adjacent pair of focal planes. The black solid curves in Fig. 4 plot the luminance ratio $g_i = L_i/L$ ($i = 1,2,3,4,5$) of the front focal plane in each focal plane pair $(i, i+1)$ as a function of the rendered accommodation cue $z$. On the same figure we also plotted a typical box filter (blue dashed curves), which corresponds to multi-focal plane displays without a depth-weighted fusing method [9], and a linear depth-weighted filter (green dashed curves) as proposed in [10,11]. It is clear that the fusing functions based on the maximal $\mathrm{MTF}_{12}$ values demonstrate noticeable nonlinearity. As mentioned in Section 3, since the retinal image quality is affected by defocus, we tentatively attribute the nonlinearity to the nonlinear degradation of the retinal image quality with defocus.
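
The accommodation-cue search described above (step $w_1$ from 1 to 0, scan the accommodation distance in 0.02D steps, keep the distance with the highest MTF) can be sketched in Python. The eye model is replaced here by a toy assumption: the MTF contribution of each plane at a fixed spatial frequency falls off as a Gaussian of its defocus. The function names and the `tolerance` value are hypothetical, so the resulting curve only illustrates the procedure and does not reproduce Fig. 4.

```python
import math


def plane_mtf(defocus, tolerance=0.6):
    """Toy MTF of one focal plane at a fixed spatial frequency versus defocus (diopters).

    Hypothetical Gaussian falloff; a real analysis would use a schematic eye model.
    """
    return math.exp(-(defocus / tolerance) ** 2)


def fused_mtf(z, w1, z1=1.3, z2=0.7):
    """Toy MTF of the fused pixel when the eye accommodates at z."""
    return w1 * plane_mtf(z - z1) + (1.0 - w1) * plane_mtf(z - z2)


def rendered_cue(w1, z1=1.3, z2=0.7, step=0.02):
    """Accommodation distance between z2 and z1 that maximises the fused MTF."""
    n_steps = round((z1 - z2) / step)
    candidates = [z2 + k * step for k in range(n_steps + 1)]
    return max(candidates, key=lambda z: fused_mtf(z, w1, z1, z2))


# Sweep w1 from 1 to 0 in steps of 0.01 and record (w1, rendered cue) pairs.
cue_curve = [(k / 100, rendered_cue(k / 100)) for k in range(100, -1, -1)]
```

Inverting the recorded pairs, that is, reading $w_1$ as a function of the rendered cue, gives the luminance ratio curve for the focal plane pair.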

Based on the simulated results shown in Fig. 4, a periodic function $g_i(z)$ can be used to describe the dependence of the luminance ratio of the front focal plane in a given pair of focal planes upon the scene depth:

Here, $z'_{i,i+1}$ represents the pseudo-correct accommodation cue rendered by a luminance ratio of $g_i(z = z'_{i,i+1}) = 0.5$, and $\Delta z'$ characterizes the nonlinearity of $g_i(z)$. Ideally, $z'_{i,i+1}$ should be equal to the dioptric midpoint $z_{i,i+1}$. Table 2 lists detailed parameters of $g_i(z)$ for the 6-focal plane DFD display. As the focal plane pairs move farther from the eye, with the dioptric midpoint going from 2.7D to 0.3D, we found that the difference between $z_{i,i+1}$ and $z'_{i,i+1}$ increased from −0.013D to +0.024D. The slight mismatch between $z'_{i,i+1}$ and $z_{i,i+1}$ may be attributed to the dependence of spherical aberration on eye accommodation distance. The nonlinear fittings of the luminance ratio functions are plotted as red dashed curves in Fig. 4, with a correlation coefficient of 0.985 to the simulated black curves. The depth-weighted fusing function $w_i$, as defined in Eq. (3), for each focal plane of an $n$-focal plane DFD display can then be obtained.
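
As an illustration of how the two parameters act, the sketch below assumes a logistic (sigmoid) form for the luminance ratio. This form is an assumption chosen because it satisfies the two stated properties: it equals 0.5 at $z'_{i,i+1}$, and $\Delta z'$ sets how sharply it departs from a linear ramp. The parameter values are hypothetical and are not the Table 2 values.

```python
import math


def luminance_ratio(z, z_half, dz_nl):
    """Assumed logistic form of the front-plane luminance ratio.

    z      : rendered depth in diopters
    z_half : depth at which the ratio equals 0.5 (plays the role of z')
    dz_nl  : nonlinearity parameter in diopters (plays the role of delta z')
    """
    return 1.0 / (1.0 + math.exp(-(z - z_half) / dz_nl))


# Hypothetical focal plane pair at 1.8 D (front) and 1.2 D (back).
g_mid = luminance_ratio(1.5, z_half=1.5, dz_nl=0.1)    # dioptric midpoint
g_front = luminance_ratio(1.8, z_half=1.5, dz_nl=0.1)  # at the front plane
g_back = luminance_ratio(1.2, z_half=1.5, dz_nl=0.1)   # at the back plane
```

A smaller `dz_nl` makes the transition steeper and closer to a box filter, while a larger value flattens it toward a linear ramp over the plane pair. Note that this assumed form approaches, but does not exactly reach, 1 and 0 at the two focal planes.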

## 5. Retinal image quality analysis

Figure 5 shows the simulated retinal images of a 3-D scene through a 6-focal plane DFD display with the depth-weighted nonlinear fusing functions given in Eq. (5), as well as with the box and linear filters shown in Fig. 4. The six focal planes were placed at 3, 2.4, 1.8, 1.2, 0.6, and 0D, and the observer’s eye was assumed to be accommodated at 0.5D. The 3-D scene consists of a planar object extending from 3D to 0.5D at a slanted angle relative to the z-axis (depth axis) and a green grid as a ground plane spanning the same depth range. The planar object is textured with a sinusoidal grating whose spatial frequency ranges from 1.5 to 9cpd from its left (front) to right (back) ends. The entire scene subtends a FOV of 14.2 × 10.7 degrees. The simulation of the DFD images takes five steps. First, we render a regular 2-D perspective image of the 3-D scene using computer graphics rendering techniques. Second, a 2-D depth map (Fig. 5a) of the same size as the 2-D perspective image is generated by retrieving the depth ($z$) of each rendered pixel from the $z$-buffer in OpenGL shaders [18]. Third, a set of six depth-weighted maps is generated, one for each focal plane, by applying the depth-weighted filtering functions in Eq. (5) to the 2-D depth map. Fourth, we render six focal plane images by individually applying each of the depth-weighted maps to the 2-D perspective image rendered in the first step through an alpha-blending technique. Finally, the six focal plane images are convolved with the corresponding PSFs of the eye, determined by the specific accommodation distance ($z = 0.5$D) and the focal plane distances, and the resultant retinal image is obtained by summing the convolved images. Figures 5 (b), (c), and (d) show the simulated retinal images of the DFD display obtained with a box [10], linear [10,11], and nonlinear depth-weighted fusing function, respectively. As expected, the 3-D scene rendered by the box filter (Fig. 5b) shows a strong depth discontinuity effect around the midpoint of two adjacent focal planes, while those rendered by the linear and nonlinear filters show smoothly rendered depths. In terms of the contrast of these images, although we expect the nonlinear filters to yield higher image contrast in general than the linear filters, the contrast differences are barely visible when solely comparing Fig. 5 (c) and (d), partially due to the low spatial frequency of the grating target.
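
The five simulation steps can be sketched on a one-dimensional "image" in Python. This is a simplified illustration under loudly stated assumptions: the perspective image and depth map (steps 1 and 2) are given as plain lists, the weight maps use a linear ramp instead of the nonlinear functions, and the eye's PSF is replaced by a box blur whose radius grows with defocus. The names `weight_maps`, `box_blur`, `retinal_image`, and the `px_per_diopter` factor are hypothetical.

```python
PLANES = [3.0, 2.4, 1.8, 1.2, 0.6, 0.0]  # focal plane depths in diopters


def weight_maps(depth_map, planes):
    """Step 3: one weight map per focal plane (linear ramp between neighbouring planes)."""
    maps = [[0.0] * len(depth_map) for _ in planes]
    for px, z in enumerate(depth_map):
        for i in range(len(planes) - 1):
            front, back = planes[i], planes[i + 1]
            if front >= z >= back:
                w_front = (z - back) / (front - back)
                maps[i][px] = w_front
                maps[i + 1][px] = 1.0 - w_front
                break
    return maps


def box_blur(signal, radius):
    """Toy stand-in for convolution with the eye's PSF (zero padding at the edges)."""
    if radius == 0:
        return list(signal)
    n, width = len(signal), 2 * radius + 1
    return [sum(signal[max(0, k - radius):min(n, k + radius + 1)]) / width
            for k in range(n)]


def retinal_image(image, depth_map, planes, z_accom, px_per_diopter=4):
    """Steps 3 to 5: weight, blur each focal plane image by its defocus, and sum."""
    retina = [0.0] * len(image)
    for z_plane, wmap in zip(planes, weight_maps(depth_map, planes)):
        plane_img = [w * v for w, v in zip(wmap, image)]         # step 4: apply weight map
        radius = round(abs(z_plane - z_accom) * px_per_diopter)  # hypothetical blur model
        blurred = box_blur(plane_img, radius)                    # step 5: convolve
        retina = [r + b for r, b in zip(retina, blurred)]        # step 5: accumulate
    return retina


# Steps 1 and 2 (assumed given): a flat image whose pixels all lie at 0.6 D.
image = [1.0] * 8
depth = [0.6] * 8
in_focus = retinal_image(image, depth, PLANES, z_accom=0.6)
```

When every pixel lies on a focal plane and the eye accommodates to that plane, only that plane carries luminance and its blur radius is zero, so the simulated retinal image equals the input image.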

In order to quantitatively evaluate the retinal image quality differences between the linear and nonlinear fusing functions, we further evaluated the MTFs of the retinal images simulated with the method described in Section 3. Without loss of generality, a dual-focal plane display configuration with $z_1$ = 1.8D and $z_2$ = 1.2D was assumed in the simulation. The eye accommodation distance $z$ was varied from $z_1$ to $z_2$ at an interval of 0.1D. For each eye accommodation distance, Fig. 6 plots the MTFs of the retinal images simulated with the linear (green circle) and nonlinear (red square) depth-weighted fusing functions. As shown in Figs. 6 (a), (d), and (g), when the accommodation distance is at $z_1$, $z_2$, or the dioptric midpoint $z_{12}$, the MTFs obtained with the linear depth filter are nearly identical to those obtained with the nonlinear filters, while at all other accommodation distances the MTFs obtained with the nonlinear filter are consistently better than those obtained with the linear filter, as indicated by Figs. 6 (b), (c), (e), and (f). More interestingly, even though previous literature [2] has assumed that, with a linear depth filter, the worst image quality occurs at the dioptric midpoint, our quantitative analysis shows that this assumption does not hold for the linear filter, although it does hold for the nonlinear filter. For instance, the green-colored MTF in Fig. 6 (b) (at $z$ = 1.7D) is even worse than that in Fig. 6 (d) (at $z$ = $z_{12}$ = 1.5D). In summary, the nonlinear depth-weighted fusing functions shown in Fig. 4 can produce better retinal image quality than a linear filter. Consequently, they may better approximate the real 3-D viewing condition and further improve the accuracy of depth perception.
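The accommodation sweep described above can be illustrated with a short sketch. It assumes that the retinal PSF of a fused pixel is the weighted sum of the two per-plane PSFs, so that the fused MTF is the weighted sum of the per-plane MTFs, and it replaces the eye-model MTF with a toy Gaussian defocus model. The constant `blur_per_diopter` and all function names are hypothetical, and the numbers it prints are not expected to reproduce Fig. 6; only the structure of the evaluation is shown, for the linear fusing function.

```python
import math

Z1_D, Z2_D = 1.8, 1.2  # focal plane positions from the text (diopters)


def linear_weights(z_d):
    """Linear fusing weights (w1, w2) for a pixel rendered at depth z_d."""
    w1 = (z_d - Z2_D) / (Z1_D - Z2_D)
    return w1, 1.0 - w1


def defocus_mtf(freq_cpd, defocus_d, blur_per_diopter=0.05):
    """Toy Gaussian MTF; blur_per_diopter (deg per diopter) is assumed."""
    sigma_deg = blur_per_diopter * abs(defocus_d)
    # MTF of a Gaussian PSF with standard deviation sigma_deg.
    return math.exp(-2.0 * (math.pi * sigma_deg * freq_cpd) ** 2)


def fused_mtf(freq_cpd, z_d):
    """MTF of a fused pixel when the eye accommodates at its depth z_d."""
    w1, w2 = linear_weights(z_d)
    return (w1 * defocus_mtf(freq_cpd, z_d - Z1_D)
            + w2 * defocus_mtf(freq_cpd, z_d - Z2_D))


# Sweep the accommodation distance from z1 to z2 in 0.1 D steps.
accommodations = [round(Z1_D - 0.1 * i, 1) for i in range(7)]
for z in accommodations:
    print(f"z = {z:.1f} D  fused MTF at 9 cpd = {fused_mtf(9.0, z):.3f}")
```

Swapping `linear_weights` for a nonlinear weighting and `defocus_mtf` for an MTF computed from a schematic eye model would bring the sketch closer to the comparison reported in Fig. 6.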

## 6. Conclusion

We presented a systematic method to address two fundamental issues in designing a multi-focal plane display: (1) the appropriate dioptric spacing between adjacent focal planes; and (2) the depth-weighted fusing function to render a continuous 3-D volume. By taking account of both ocular and display factors, we determined the optimal spacing between two adjacent focal planes to be ~0.6D, which ensures that the MTF of a fused pixel at the dioptric midpoint is comparable to the DOF effect of the HVS on the MTF of a real pixel at the same distance under photopic viewing conditions. We further characterized the optimal form of a set of depth-weighted fusing functions as a function of rendered accommodation cues. Based on simulation results, the proposed nonlinear form of depth filters appears to be better than a box filter in terms of depth continuity, and better than a linear filter in terms of retinal image contrast modulation. Although our method, as discussed in Section 2, does not take into account some other ocular factors, such as scattering on the retina, or psychophysical factors, such as the neuron response, it provides a systematic framework that can objectively predict the optical quality and guide the design of DFD displays. Further validation of the proposed DFD design methods requires a supplemental approach in which the optical quality of a DFD display is evaluated through objective or subjective tasks with real observers in the loop. This approach requires a DFD display platform that offers the flexibility to reconfigure key display parameters such as focal plane placements, and it requires careful control of experimental conditions to maintain the generality of the measurements. Due to the slow response speed of the microdisplay and active optical elements, our previous prototypes [15,31] were unable to render as many as six focal planes at an acceptable refresh rate without visible flickering. In the future, we plan to build a 6-focal plane DFD display with faster microdisplay and active optical components and to carry out further experiments to validate our methods.

## Acknowledgements

This research was funded by the National Science Foundation (NSF) grant award 09-15035. We thank Professor Martin Banks and David Hoffman for the discussions on their previous paper. We also thank Craig Pansing for providing the I/O macro for CODE V.

## References and links

**1.** J. P. Wann, S. Rushton, and M. Mon-Williams, “Natural problems for stereoscopic depth perception in virtual environments,” Vision Res. **35**(19), 2731–2736 (1995).

**2.** D. M. Hoffman, A. R. Girshick, K. Akeley, and M. S. Banks, “Vergence-accommodation conflicts hinder visual performance and cause visual fatigue,” J. Vis. **8**(3), 1–30 (2008).

**3.** S. J. Watt, K. Akeley, M. O. Ernst, and M. S. Banks, “Focus cues affect perceived depth,” J. Vis. **5**(10), 834–862 (2005).

**4.** M. Mon-Williams, J. P. Wann, and S. Rushton, “Binocular vision in a virtual world: visual deficits following the wearing of a head-mounted display,” Ophthalmic Physiol. Opt. **13**(4), 387–391 (1993).

**5.** J. F. Heanue, M. C. Bashaw, and L. Hesselink, “Volume holographic storage and retrieval of digital data,” Science **265**(5173), 749–752 (1994).

**6.** G. E. Favalora, J. Napoli, D. M. Hall, R. K. Dorval, M. G. Giovinco, M. J. Richmond, and W. S. Chun, “100 million-voxel volumetric display,” Proc. SPIE **4712**, 300–312 (2002).

**7.** A. Sullivan, “A solid-state multi-planar volumetric display,” SID Symposium Digest of Technical Papers **34**, 1531–1533 (2003).

**8.** A. Jones, I. McDowall, H. Yamada, M. Bolas, and P. Debevec, “Rendering for an interactive 360° light field display,” ACM Trans. Graph. **26**, 40–1–40–10 (2007).

**9.** J. P. Rolland, M. W. Krueger, and A. Goon, “Multifocal planes head-mounted displays,” Appl. Opt. **39**(19), 3209–3215 (2000).

**10.** K. Akeley, S. J. Watt, A. R. Girshick, and M. S. Banks, “A stereo display prototype with multiple focal distances,” ACM Trans. Graph. **23**(3), 804–813 (2004).

**11.** S. Suyama, S. Ohtsuka, H. Takada, K. Uehira, and S. Sakai, “Apparent 3-D image perceived from luminance-modulated two 2-D images displayed at different depths,” Vision Res. **44**(8), 785–793 (2004).

**12.** C. Lee, S. Diverdi, and T. Höllerer, “Depth-fused 3D imagery on an immaterial display,” IEEE Trans. Vis. Comput. Graph. **15**(1), 20–33 (2009).

**13.** S. Suyama, M. Date, and H. Takada, “Three-dimensional display system with dual frequency liquid crystal varifocal lens,” Jpn. J. Appl. Phys. **39**(Part 1, No. 2A), 480–484 (2000).

**14.** B. T. Schowengerdt and E. J. Seibel, “True 3-D scanned voxel displays using single or multiple light sources,” J. Soc. Inf. Disp. **14**(2), 135–143 (2006).

**15.** S. Liu and H. Hua, “Time-multiplexed dual-focal plane head-mounted display with a liquid lens,” Opt. Lett. **34**(11), 1642–1644 (2009).

**16.** G. D. Love, D. M. Hoffman, P. J. W. Hands, J. Gao, A. K. Kirby, and M. S. Banks, “High-speed switchable lens enables the development of a volumetric stereoscopic display,” Opt. Express **17**(18), 15716–15725 (2009), http://www.opticsinfobase.org/abstract.cfm?URI=oe-17-18-15716.

**17.** W. S. Stiles and B. H. Crawford, “The luminous efficiency of rays entering the eye pupil at different points,” Proc. R. Soc. Lond., B **112**(778), 428–450 (1933).

**18.** G. Riguer, N. Tatarchuk, and J. Isidoro, *ShaderX2: Shader Programming Tips and Tricks with DirectX 9* (Wordware, 2003).

**19.** J. E. Greivenkamp, J. Schwiegerling, J. M. Miller, and M. D. Mellinger, “Visual acuity modeling using optical raytracing of schematic eyes,” Am. J. Ophthalmol. **120**(2), 227–240 (1995).

**20.** Y. L. Chen, B. Tan, and J. W. L. Lewis, “Simulation of eccentric photorefraction images,” Opt. Express **11**(14), 1628–1642 (2003), http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-11-14-1628.

**21.** H. Hua, C. W. Pansing, and J. P. Rolland, “Modeling of an eye-imaging system for optimizing illumination schemes in an eye-tracked head-mounted display,” Appl. Opt. **46**(31), 7757–7770 (2007).

**22.** http://www.opticalres.com.

**23.** J. Schwiegerling, *Field Guide to Visual and Ophthalmic Optics* (SPIE Press, 2004).

**24.** R. A. Applegate and V. Lakshminarayanan, “Parametric representation of Stiles-Crawford functions: normal variation of peak location and directionality,” J. Opt. Soc. Am. A **10**(7), 1611–1623 (1993).

**25.** D. A. Atchison and G. Smith, *Optics of the Human Eye* (Oxford, 2000).

**26.** F. W. Campbell, “The depth of field of the human eye,” J. Mod. Opt. **4**(4), 157–164 (1957).

**27.** D. A. Atchison, M. J. Collins, C. F. Wildsoet, J. Christensen, and M. D. Waterworth, “Measurement of monochromatic ocular aberrations of human eyes as a function of accommodation by the Howland aberroscope technique,” Vision Res. **35**(3), 313–323 (1995).

**28.** H. Cheng, J. K. Barnett, A. S. Vilupuru, J. D. Marsack, S. Kasthurirangan, R. A. Applegate, and A. Roorda, “A population study on changes in wave aberrations with accommodation,” J. Vis. **4**(4), 272–280 (2004).

**29.** K. N. Ogle and J. T. Schwartz, “Depth of focus of the human eye,” J. Opt. Soc. Am. **49**(3), 273–280 (1959).

**30.** P. A. Ward, “The effect of stimulus contrast on the accommodation response,” Ophthalmic Physiol. Opt. **7**(1), 9–15 (1987).

**31.** S. Liu, H. Hua, and D. W. Cheng, “A novel prototype for an optical see-through head-mounted display with addressable focus cues,” IEEE Trans. Vis. Comput. Graph. **16**(3), 381–393 (2010).