Optica Publishing Group

Optimal allocation of quantized human eye depth perception for multi-focal 3D display design

Open Access

Abstract

Creating immersive 3D stereoscopic, autostereoscopic, and lightfield experiences is becoming the centerpiece of optical design for future head-mounted displays and lightfield displays. However, despite the advances in 3D and lightfield displays, there is no consensus on the quantized depth levels necessary for such emerging displays in stereoscopic or monocular modalities. Here we start from psychophysical theories and work toward defining and prioritizing quantized levels of depth that would saturate human depth perception. We propose a general optimization framework that locates the depth levels in a globally optimal way for band-limited displays. While the original problem is computationally intractable, we find a tractable reformulation as maximally covering a region of interest with a selection of hypographs corresponding to the monocular depth of field profiles. The results indicate that on average 1731 stereoscopic and 7 monocular depth levels (distributed optimally from 25 cm to infinity) would saturate visual depth perception, such that adding further depth levels yields negligible improvement. Moreover, the first three depth levels should be allocated at (148), then (83, 170), then (53, 90, 170) distances respectively from the face plane to minimize the monocular error over the entire population. The study further discusses the 3D spatial profile of the quantized stereoscopic and monocular depth levels, and provides fundamental guidelines for designing optimal near-eye displays, lightfield monitors, and 3D screens.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

There has been considerable traction toward more immersive lightfield and autostereoscopic experiences due to advances in electronics and microfabrication. Unlike stereoscopic 3D, where the 3D perception is created based only on the left-eye/right-eye disparity with fixed accommodation of the eye lens, a richer lightfield experience manipulates the divergence of the beam of light entering each eye with finer angular fidelity across each eye’s pupil, creating depth perception via accommodation monocularly. This reduces or eliminates the well-known accommodation-vergence mismatch [1,2] and therefore the eye stress [3,4]. There has been a push, and many breakthroughs, toward realizing ever more realistic lightfield experiences in the optics, graphics, and display communities [5–17]. The major categories of methods for creating such experiences are geometrical [5,6], computational [7–9], multi-focal [10–13], phase-based holographic [14,15], and multi-angular [16,17]. Each method has its own weaknesses and advantages. For example, super multi-view (SMV) provides a lightfield in a very compact form factor but is limited to a very small viewing zone (also known as the eyebox for head-mounted displays) and low resolution [16,17]; computational methods increase the resolution but come with haze and temporal flickering artifacts [7–9]; and holographic methods mostly struggle with color nonuniformity and fringing or speckle artifacts [14,15]. The multi-focal method provides a clean image but requires a large volume and gives a sense of depth discontinuity if it is not combined with parallax [10–13].

Despite all the research toward realizing richer lightfield experiences in both industry and academia [1–17], since the majority of lightfield modalities are rather recent, there is no clear notion of the quantized limit at which human depth perception saturates, or even of the elementary lightfield depth levels that should be created in a discrete, quantized manner to provide the richest experience with current band-limited solutions [10,18]. What is the equivalent digital resolution of the human eye in depth, both with binocular and monocular view? Where are these depth levels located in 3D space? Which distances are most important if the system is band-limited and cannot render all the depths? These are fundamental questions with major impact on the emerging virtual reality (VR) and augmented reality (AR) industry as well as upcoming lightfield displays.

Some of these questions have been pondered for decades in the psychophysics of human perception and neuroscience literature with completely different approaches [19–33]. For example, there have been numerous studies on depth perception in specific age groups [25,26], stereoscopic depth acuity [19,20], monocular depth of field variations [24,30], contrast perception [22], spatial acuity [21], accommodation optical characterization [23,27–33], and color perception of the eye [34], as well as the relations between all these different parameters.

While these studies are very informative, psychophysical studies are human-centric, with no specific interest in digital augmentation or quantization of human perception with emerging display technologies [33]. For example, while there are scattered studies on human eye depth of field for different age groups and pupil conditions [24–26], there is no accurate scientific guideline or technical tool for designing a VR or AR display that would satisfy such perception acuity [10,18]. A specific example of this gap between the two research communities is the case of the eye’s temporal response, which for a long time [35] was considered to be around 60 Hz by the psychophysics literature, but was contradicted by display manufacturers as they realized that customers, specifically those in the gaming and computer graphics sector, were demanding 120–240 Hz frame rates. This contradiction was resolved when new studies found that the eye’s temporal response is very much color and contrast dependent [36].

In this study we derive such design guidelines for depth perception by developing a mathematically efficient optimization framework built on psychophysical theories. The optimization framework that we put forth is general and can operate with any set of potential depth of field (DoF) profiles at an arbitrary spatial frequency. Given a dense set of potential DoF profiles, our program allows maximally covering the scope of vision with any $T$ number of profiles, where $T$ is a positive integer. The covering is performed in a way that, on average, achieves the highest visual quality. In its conventional form the underlying optimization is intractable; however, using ideas from hypographs and shape composition, we present an equivalent formulation with an efficient convex relaxation.

We explore the depth perception quantization to understand the equivalent depth resolution of the eye from a display design perspective. We then explore and discuss the priority of these depth levels for a band-limited system as impacted by user statistics and display parameters. Finally, we briefly discuss the experimental method that may be applied to validate these guidelines. These guidelines can be implemented and mapped onto different parameters in a broad range of lightfield displays, such as super multi-view [5,13], high-density directional [37], and depth-filtering multi-focal [2,3,38,39] lightfield displays, in both head-mounted and far-standing, augmented or virtual reality modalities. They may also be adapted for the design of lightfield cameras, eliminating redundant acquisition of depth information [40,41].

In some display technologies, such as super multi-view (SMV) displays, the accommodation is triggered by providing angular variation to the light entering each pupil, while in other embodiments the monocular depth is set by providing discrete focal planes. In this study our focus is on the latter, as the divergence of the beam of light has a direct one-to-one relation with depth, whereas in SMV it goes through yet another angular subsampling to mimic the original intended monocular depth. The method still applies to SMV design, but since SMV does not provide direct focal planes, the designer has to consider the relation between the intended monocular depth and the angular distribution that needs to be projected onto each eye pupil at variable brightness levels.

Mathematical Notations: For our technical discussions, we use lowercase and uppercase boldface for vectors and matrices, respectively. Given two vectors $\boldsymbol {a}, \boldsymbol {b}\in \mathbb {R}^{n}$ with elements $a_i$ and $b_i$, the notation $\boldsymbol {a}\leq \boldsymbol {b}$ indicates that $a_i\leq b_i$, for $i=1,2,\ldots ,n$. For a more concise representation, given an integer $n$, we use $[n]$ to denote the set $\{1,2,\ldots ,n\}$. Given a function $f:\mathbb {R}^{n}\to \mathbb {R}$, $\texttt {supp}~\! f$ denotes the support of the function, that is, $\texttt {supp}~\! f = \{\boldsymbol {x}\in \mathbb {R}^{n}: f(\boldsymbol {x})\neq 0\}$. Also, $[f(\boldsymbol {x})]^{+}$ and $[f(\boldsymbol {x})]^{-}$ represent the positive and negative thresholding of $f(\boldsymbol {x})$, i.e., $[f(\boldsymbol {x})]^{+} = \max (f(\boldsymbol {x}),0)$, and $[f(\boldsymbol {x})]^{-} = \min (f(\boldsymbol {x}),0)$.

2. Problem statement: monocular depth resolution

There are dozens of stereoscopic and monocular depth cues that contribute to human depth perception [32]: perspective, depth from motion, occlusion, shadow stereopsis, etc. Here, for simplicity and practicality, we focus only on the optics of two major depth cues, namely crystalline-lens accommodation and the vergence of the two eyes, and assume that the rest are perception based and solely neurological. We frame our mathematical model on monocular depth levels and later briefly cover stereoscopic depth in the discussion section. The main idea is to quantize and prioritize a discrete set of depth levels that the display can provide so as to minimize the depth perception error for an arbitrary image with varying contrast. For this purpose, we first need a rough estimate of the maximum total monocular resolution measured in the lab. The assumption is that the perceived continuum of depth versus distance from the eye can be decomposed into an overlap of a set of fixed sparse DoF profiles. A fair assumption is that the distance between the peaks of these DoF profiles should not be smaller than the minimum DoF measured in former experimental studies, because otherwise the human eye would not be able to discern the two depths from each other. The DoF is assumed to be measured at highest acuity in the foveal region; this helps us put an upper bound on the number of profiles in the set. In reality, the DoF profile is a three-dimensional function impacted by numerous parameters, as will be further explored in the following sections. The assumptions made here help to find the optimal number of depth levels needed to reach a desired accommodation error for images with the maximum discernible spatial frequency (narrowest DoF). If the spatial frequency of the image is reduced, the DoF is elongated and the optimization can be repeated to find the new optimal locations.

The accommodation capability of the individual eye provides monocular depth perception. If the largest diopter range reported in the literature for a very young eye (15 diopters [25,26]) is divided by the shallowest depth of field reported in the literature (0.15 D FWHM [24]), then 100 depth levels is the absolute maximum number that a human eye at age 10 can distinguish with a 6 mm pupil in a dark environment. However, if one assumes an average range of 6 D for adults with a 0.15 D depth of field, this maximum number of depth levels drops rapidly to 40. It is worth noting that the depth of field is a diopter range that depends on many other parameters, such as image intensity and defocus, wavelength, the eye point spread function, and/or the modulation transfer function (MTF) at different spatial frequencies, all of which also vary with eye pupil diameter and age. A common way to measure the DoF at a given eye pupil diameter is to record the image intensity of a delta function, take the maximum intensities, and threshold the normalized profile at an agreed level. The maximum intensities tend to have a Gaussian-like profile based on [24]. This type of profile is also observed for the MTF at a given number of cycles per degree. The accommodation range varies significantly with age [26]; therefore, it is essential to consider age and pupil diameter when laying out the physical locations of these depth levels. Here we use an iterative method to localize the depth levels up to 10 meters.

The DoF for 2 mm, 4 mm, and 6 mm pupil diameters has been previously studied in [24]. Anderson et al. [26] used an objective method to measure the accommodative amplitude in individuals across a wide age range, and provided a sigmoidal fit to the measured data. These functions were used to find the maximum accommodative amplitude. For the quantization of monocular depth levels, we started at the nearest focal plane given by this maximum accommodation and iteratively found the next focal plane in steps of one DoF from [24]. The iteration stops when the focal plane distance exceeds 10 m (Fig. 1(a) and (b)). This figure shows that as one grows older, the distance to the nearest focal plane becomes larger and the total number of focal planes decreases; for example, one is able to distinguish 13 focal planes at age 10 but only 2 quantized focal planes at age 50. For depths larger than 10 m, the depth levels become exponentially sparse under this iterative quantization. Additionally, experimental studies confirm that after only 8 m the eye reaches its monocular infinity at all pupil sizes and starts to show accumulative negative error, even more severely than stereoscopic depth perception, as will be further discussed in the discussion section [42]. If one uses the depth of field profile measurement in [24] as a convolving profile at a given depth in the iterative method, then a continuum of DoF profiles with distance can be calculated, as in Fig. 1(c). Here the vertical axis is the normalized maximum intensity profile of a delta function seen at each distance enforced by the iteration steps.
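The iterative placement described above can be sketched in a few lines. This is a minimal illustration with a constant DoF step; the function name and the example accommodation/DoF values are illustrative, whereas in the paper the DoF step itself varies with depth, pupil diameter, and age per [24,26]:

```python
def quantize_focal_planes(max_accommodation_d, dof_d, max_dist_m=10.0):
    """Place focal planes one depth-of-field step apart in diopters,
    starting from the nearest plane (maximum accommodation) and stopping
    once the plane distance exceeds max_dist_m."""
    planes_m = []
    d = max_accommodation_d            # accommodation in diopters (1/m)
    while d > 0 and 1.0 / d <= max_dist_m:
        planes_m.append(1.0 / d)       # focal-plane distance in meters
        d -= dof_d                     # step back by one DoF width
    return planes_m

# e.g. a 2 D accommodation range quantized with a fixed 0.15 D DoF;
# the nearest plane then sits at 1/2 D = 0.5 m
planes = quantize_focal_planes(2.0, 0.15)
```

The planes are uniformly spaced in diopters but grow exponentially sparse in metric distance, matching the behavior noted above for depths beyond 10 m.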


Fig. 1. Quantized monocular distinguishable focal planes with age and pupil diameter variations. Focal plane distances are within 10 meters. Colorbar shows the corresponding total eye diopter (60 D from a relaxed eye, plus accommodation power). (a): Pupil size equals 2 mm. (b): Pupil size is 6 mm. (c): The depth of field profiles for a pupil diameter 6 mm and age 30.


Now that we have found a rough estimate of these monocular depth levels using this iterative approach, given the statistics of eye diopters across different ages [26] and daily task operations [43], we want to prioritize the allocation of depth levels in an optimal fashion. This is significant in the design of 3D displays, where bandwidth is limited and only a limited number of monocular depth levels can be rendered. The model has to be universally applicable to any type of display that provides monocular depth; therefore, the methodology we put forth is general.

To explain the problem mathematically, Fig. 2(a) shows a train of 30 functions (symbolically representing a continuum of DoF profiles), where the goal is to select $T=4$ functions such that the union of the areas under them maximally covers the space (representing the design of a display that restricts the number of depth levels to $T=4$). Figure 2(b) shows the optimal selection of the functions, whereas Fig. 2(c) shows an alternative selection that covers a significantly smaller area. As will be discussed, this is a challenging combinatorial problem, the solution of which is the focus of the next section.


Fig. 2. Problem statement. (a): A set of $n=30$ positive and bounded Gaussian knolls defined in $[0,5]$. (b): A selection of $T=4$ knolls from the functions in (a), which together maximally cover the box defined by $[0,5] \times [0,1]$. (c): For any alternative selection of four knolls other than that shown in panel (b), the union of the areas under the knolls covers a smaller area.


3. Mathematical modeling

Consider a closed domain $D\subseteq \mathbb {R}^{s}$ and a set of $n$ positive and bounded functions $f_1,f_2,\ldots ,f_n:D\to [0,1]$. In our DoF allocation problem, $s$ is the number of factors (such as depth, age, etc.) that affect the DoF profile, and the $f_i$ are a continuum of normalized profiles. For convenience, we adopt the naming convention in [44] and refer to each $f_i$ as a knoll. To maintain a general formulation, the only assumptions we make about a knoll are that it is bounded (i.e., $0\leq f_i(\boldsymbol {x})\leq 1$) and defined for every point in $D$. Now, consider $T\in \mathbb {N}$, a number less than $n$. Restricting the selection to $T$ knolls, the goal is to pick a subset of $f_1,f_2,\ldots ,f_n$ such that the union of the volumes under the selected knolls maximally covers the box $D\times [0,1]$. To relate to the notation, Fig. 2(a) shows $n=30$ Gaussian-shaped knolls with varying means and variances defined on $D= [0,5]$, and Fig. 2(b) shows the optimal selection of $T=4$ functions that maximally cover the box $[0,5]\times [0,1]$.
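For small $n$, the covering objective can be evaluated by brute force, which helps build intuition for the selection problem. The Gaussian knolls, grid, and parameters below are illustrative choices, not the paper's measured DoF profiles:

```python
import itertools

import numpy as np

def best_cover(knolls, T, grid):
    """Exhaustively search all T-subsets of knolls for the one whose
    pointwise maximum covers the largest area over `grid` (Riemann sum).
    Only tractable for tiny n; the LP formulation handles realistic sizes."""
    vals = np.array([f(grid) for f in knolls])          # shape: n x len(grid)
    dx = grid[1] - grid[0]
    best_area, best_subset = -1.0, None
    for subset in itertools.combinations(range(len(knolls)), T):
        area = vals[list(subset)].max(axis=0).sum() * dx
        if area > best_area:
            best_area, best_subset = area, subset
    return best_subset, best_area

# six Gaussian knolls on D = [0, 5] with assorted centers and widths
gauss = lambda mu, sig: (lambda x: np.exp(-(x - mu) ** 2 / (2 * sig ** 2)))
centers = [0.5, 1.0, 2.0, 2.5, 3.5, 4.5]
knolls = [gauss(mu, 0.3 + 0.1 * i) for i, mu in enumerate(centers)]
grid = np.linspace(0, 5, 501)
subset, area = best_cover(knolls, T=2, grid=grid)
```

The number of subsets grows as $\binom{n}{T}$, which is exactly why a convex relaxation is needed for the 151-profile trains used later in the paper.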

The proposed problem can be cast as an optimization program with binary decision variables. More specifically, to find the $T$ optimal covering knolls, one may address the following combinatorial program

$$\begin{aligned} \mathop{\textrm{minimize}}\limits_{\alpha_1,\ldots,\alpha_n}&~~\int_D \mbox{d}\boldsymbol{x} - \int_D ~\max\{\alpha_1f_1(\boldsymbol{x}),\alpha_2 f_2(\boldsymbol{x}),\ldots,\alpha_n f_n(\boldsymbol{x})\}~\mbox{d}\boldsymbol{x}\\ \mbox{subject to:}& ~~\sum_{i=1}^{n}\alpha_i \leq T , ~~\mbox{and}~~\alpha_i\in\{0,1\}, ~~~i\in[n].\end{aligned}$$

Notice that the objective of (1) is the difference between the area (volume) of the box $D\times [0,1]$ and the selected knolls. Hence minimizing this objective certifies a maximum covering of the box with the $T$ selected knolls.

Mathematically, the objective function in (1) is concave in the $\alpha _i$ (simply because $\max \{\alpha _1f_1(\boldsymbol {x}),\alpha _2 f_2(\boldsymbol {x}),\ldots ,\alpha _n f_n(\boldsymbol {x})\}$ is convex), which poses an additional challenge on top of the binary decision variables. Usually, when the objective function and the constraints are convex (disregarding the binary constraints), relaxing the binary decision variables yields a convex relaxation of the original combinatorial problem. Such a convex relaxation is useful as a computational approximation of the original problem, and in many cases can even accurately recover the combinatorial solution. Unfortunately, in its original form, the combinatorial problem (1) does not offer a straightforward convex relaxation.

In the sequel we propose a reformulation of (1) that remedies the computational issues stated above and allows us to solve for the global minimizer.

3.1 Alternative binary program with convex relaxation

To present a computationally tractable reformulation of (1), we proceed by introducing some technical notions.

Given a function $f(\boldsymbol {x}):\mathbb {R}^{s}\to \mathbb {R}$, the hypograph of $f$ is the set of points lying on or below its graph, i.e.,

$$ \texttt{hyp}(f) = \left\{ (\boldsymbol{x},y)\in\mathbb{R}^{s}\times\mathbb{R}: y\leq f(\boldsymbol{x})\right\}\subseteq \mathbb{R}^{s+1}. $$

The following theorem relates the pointwise maximum of a set of functions to the union of their corresponding hypographs.

Theorem 1 Given functions $f_1,f_2,\ldots ,f_n:\mathbb {R}^{s}\to \mathbb {R}$, consider the pointwise maximum-function defined as

$$ f_\lor (\boldsymbol{x}) = \max \{f_1(\boldsymbol{x}), f_2(\boldsymbol{x}),\ldots,f_n(\boldsymbol{x})\}. $$
Then $\texttt {hyp}(f_\lor ) = {\bigcup} _{i=1}^{n} \texttt {hyp}(f_i).$

Proof:

The proof follows from the basic properties of the pointwise-maximum operation:

$$\begin{aligned} \texttt{hyp}(f_\lor) &= \left\{ (\boldsymbol{x},y)\in\mathbb{R}^s\times\mathbb{R}: y\leq \max \{f_1(\boldsymbol{x}), f_2(\boldsymbol{x}),\ldots,f_n(\boldsymbol{x})\}\right\}\\ &= \left \{ (\boldsymbol{x},y)\in\mathbb{R}^s\times\mathbb{R}: \left(y\leq f_1(\boldsymbol{x})\right) \lor \left(y\leq f_2(\boldsymbol{x})\right)\lor \ldots \left (y\leq f_n(\boldsymbol{x})\right )\right\}\\ &=\bigcup_{i=1}^n \left\{ (\boldsymbol{x},y)\in\mathbb{R}^s\times\mathbb{R}: y\leq f_i(\boldsymbol{x})\right\}\\&= \bigcup_{i=1}^n \texttt{hyp}(f_i). \end{aligned}$$

In the second equality, the operator $\lor$ denotes logical OR.
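Theorem 1 is easy to sanity-check numerically: on any grid, a point $(\boldsymbol{x},y)$ lies in the hypograph of the maximum exactly when it lies in at least one individual hypograph. The two Gaussians below are arbitrary illustrative choices:

```python
import numpy as np

f1 = lambda x: np.exp(-(x - 1.0) ** 2)           # two arbitrary knolls
f2 = lambda x: np.exp(-4.0 * (x - 3.0) ** 2)

xs = np.linspace(0, 5, 201)
ys = np.linspace(0, 1, 101)
X, Y = np.meshgrid(xs, ys)

in_hyp_max = Y <= np.maximum(f1(X), f2(X))       # membership in hyp(f_v)
in_union = (Y <= f1(X)) | (Y <= f2(X))           # membership in hyp(f1) U hyp(f2)

assert np.array_equal(in_hyp_max, in_union)      # identical at every grid point
```

The identity holds pointwise, so the agreement is exact rather than approximate, independent of the grid resolution.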

Using the result of Theorem 1, program (1) can be cast as finding a set $\Omega \subseteq \{1,2,\ldots ,n\}$, which addresses

$$ \mathop{\textrm{minimize}}\limits_{\Omega~\subseteq ~ \{1,2,\ldots,n\}}~~\int_D\mbox{d}\boldsymbol{x} - \int_{\bigcup_{i\in\Omega} \texttt{hyp}(f_i) } \mbox{d}\boldsymbol{x}~\mbox{d}y, \quad \mbox{subject to:} ~~|\Omega| \leq T. $$

Here, $|\Omega |$ denotes the cardinality of the set $\Omega$. One may immediately observe that if $\Omega ^{*}$ is a solution to (2), then the set of $\alpha _i^{*}$ quantities defined via

$$ \alpha_i^{*} = \left\{\begin{array}{cc}1 & i\in\Omega^{*}\\ 0 & i\notin\Omega^{*} \end{array} \right., i\in[n], $$
forms a solution to (1).

Program (2) can be further reshaped into a more standard form. Consider $\pi _{f_i}(\boldsymbol {x},y)$ to be the indicator function of the set $\texttt {hyp}(f_i)$, i.e.,

$$ \pi_{f_i}(\boldsymbol{x},y) = \left\{\begin{array}{lc} 1 & (\boldsymbol{x},y)\in \texttt{hyp}(f_i) \\ 0 & (\boldsymbol{x},y)\notin \texttt{hyp}(f_i) \end{array}\right.. $$

It is straightforward to see that

$$ \bigcup_{i\in\Omega} \texttt{hyp}(f_i) = \texttt {supp} \sum_{i\in\Omega} \pi_{f_i}(\boldsymbol{x},y), $$
where $\texttt {supp}$ denotes the support of a given function (the set of points on which the function takes nonzero values). While the functions $\pi _{f_i}(\boldsymbol {x},y)$ are binary-valued, the function $\sum _{i\in \Omega } \pi _{f_i}(\boldsymbol {x},y)$ can take integer values greater than one, depending on the level of overlap among the functions $\pi _{f_i}$. Using the function $c(u) = \min (u,1)$ to clip the values of $\sum _{i\in \Omega } \pi _{f_i}$ that are greater than one, the objective in (2) can be related to $\sum _{i\in \Omega } \pi _{f_i}$ via
$$ \int_{\bigcup_{i\in\Omega} \texttt{hyp}(f_i) } \mbox{d}\boldsymbol{x}~\mbox{d}y = \int_{D\times [0,1]} \min\left( \sum_{i\in\Omega} \pi_{f_i}(\boldsymbol{x},y),1\right)\mbox{d}\boldsymbol{x}~\mbox{d}y. $$
This again allows using a set of binary decision variables $\alpha _i$, to reformulate (2) as
$$\begin{aligned} \mathop{\textrm{minimize}}\limits_{\alpha_1,\ldots,\alpha_n}&~~\int_D\mbox{d}\boldsymbol{x} - \int_{D\times [0,1]} \min\left( \sum_{i=1}^{n} \alpha_i \pi_{f_i}(\boldsymbol{x},y),1\right)\mbox{d}\boldsymbol{x}~\mbox{d}y\\ \mbox{subject to:}& ~~\sum_{i=1}^{n}\alpha_i \leq T, ~~\mbox{and} ~~\alpha_i\in\{0,1\}, ~~~i\in[n].\end{aligned}$$
A major advantage of (3) over the original program (1) is that the objective in (3) is convex in the $\alpha _i$, and a convexification of the combinatorial program is possible simply by relaxing the binary constraints to $0\leq \alpha _i\leq 1$. In fact, program (3) is a special case of the shape composition problem proposed and implemented in [45–48]. Specifically, in [48] the authors present a general set of sufficient conditions under which the convex relaxation accurately identifies the solution to the combinatorial problem. Furthermore, the relaxed program can be cast as a linear program (LP) and solved very efficiently. In the next sections, we provide some notes on the stability and sensitivity of the proposed framework to errors in characterizing the knolls. We also discuss the process of accurately solving (3) by either solving an LP with binary constraints or an LP representing the relaxed program.

3.2 Stable identification of optimal knolls

A key component of the proposed framework is the availability of the knolls $f_1(\boldsymbol {x}),\ldots ,f_n(\boldsymbol {x})$, which represent our measured DoF profiles at different depths. In this section we show that the error introduced by an inaccurate characterization of the knolls affects the coverage area of the selected knolls at most linearly, indicating a stable identification of the optimal knolls through our formulation.

Suppose that $f_1(\boldsymbol {x}),\ldots ,f_n(\boldsymbol {x})$ are the ideal knolls (DoF profiles), which are unavailable and instead the inaccurate profiles $f_1(\boldsymbol {x})+\delta _1(\boldsymbol {x}),\ldots ,f_n(\boldsymbol {x})+\delta _n(\boldsymbol {x})$ are available. Each function $\delta _i(\boldsymbol {x})$ is a representative of the point-wise error in an exact characterization of $f_i(\boldsymbol {x})$. Given a binary vector $\boldsymbol {\alpha } = (\alpha _1,\ldots ,\alpha _n)$ representing our selection of the knolls, based on (1), the area (volume) covered by the selected knolls can be written as

$$\mathcal{C}(\boldsymbol{\alpha};f_1,\ldots,f_n) = \int_D ~\max\{\alpha_1f_1(\boldsymbol{x}),\alpha_2 f_2(\boldsymbol{x}),\ldots\alpha_n f_n(\boldsymbol{x})\}~\mbox{d}\boldsymbol{x}.$$
The following theorem relates the reduction introduced in the coverage to the error in the precise characterization of the knolls.

Theorem 2 Suppose that $\boldsymbol {\alpha }^{*}$ is the solution to (1) when $f_1(\boldsymbol {x}),\ldots ,f_n(\boldsymbol {x})$ are used as the knolls, and $\tilde {\boldsymbol \alpha }$ is the solution to (1) when $f_1(\boldsymbol {x})+\delta _1(\boldsymbol {x}),\ldots ,f_n(\boldsymbol {x})+\delta _n(\boldsymbol {x})$ are used as the knolls. Then

$$0\leq \mathcal{C}(\boldsymbol{\alpha}^{*};f_1,\ldots,f_n)-\mathcal{C}(\tilde{\boldsymbol \alpha};f_1,\ldots,f_n)\leq T(u_\delta+\ell_\delta),$$
where
$$ u_\delta = \max_{i=1,\ldots,n}\int_D \left[ \delta_i(\boldsymbol{x})\right]^{+}~\mbox{d}\boldsymbol{x}, ~~~~~\mbox{and}~~~~~ \ell_\delta ={-} \min_{i=1,\ldots,n}\int_D \left[ \delta_i(\boldsymbol{x})\right]^{-}~\mbox{d}\boldsymbol{x}. $$
A complete proof of Theorem 2 is presented in Supplementary Note 1. Intuitively, Eq. (5) shows that the ideal coverage reduction caused by inaccuracies in characterizing the exact knolls scales, in the worst case, with the number of selected knolls and the term $u_\delta +\ell _\delta$, which relates linearly to the magnitude of the perturbations $\delta _i$. One can immediately see that as the $\delta _i(\boldsymbol {x})$ tend to zero, the coverage $\mathcal {C}(\tilde {\boldsymbol \alpha };f_1,\ldots ,f_n)$ converges to $\mathcal {C}( \boldsymbol {\alpha }^{*};f_1,\ldots ,f_n)$.
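The bound of Theorem 2 can be spot-checked numerically on a 1-D toy problem. Here triangular knolls and small random perturbations are illustrative choices, and a brute-force subset search stands in for solving (1) exactly:

```python
import itertools

import numpy as np

xs = np.linspace(0, 5, 1001)
dx = xs[1] - xs[0]
rng = np.random.default_rng(0)

tri = lambda mu, w: np.clip(1 - np.abs(xs - mu) / w, 0, 1)   # triangular knoll
F = np.array([tri(mu, 0.8) for mu in [0.7, 1.5, 2.3, 3.1, 3.9, 4.6]])
Delta = 0.05 * rng.standard_normal(F.shape)                  # perturbations delta_i

def coverage(vals, subset):
    # area under the pointwise max of the selected knolls, clipped at 0
    # (matching max{alpha_i f_i}, where unselected terms contribute zero)
    return np.maximum(vals[list(subset)].max(axis=0), 0).sum() * dx

def best_subset(vals, T):
    return max(itertools.combinations(range(len(vals)), T),
               key=lambda s: coverage(vals, s))

T = 2
s_star = best_subset(F, T)              # optimal selection on the true knolls
s_tilde = best_subset(F + Delta, T)     # selection driven by perturbed knolls
gap = coverage(F, s_star) - coverage(F, s_tilde)
u_delta = max(np.maximum(d, 0).sum() * dx for d in Delta)
l_delta = -min(np.minimum(d, 0).sum() * dx for d in Delta)
assert 0 <= gap <= T * (u_delta + l_delta) + 1e-9            # bound of Theorem 2
```

The left inequality holds by construction (no subset can beat $\boldsymbol{\alpha}^{*}$ on the true knolls); the right one is the content of the theorem.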

3.3 LP representation of the problem

Consider uniformly discretizing the domain $D\times [0,1]$ into $p$ voxels (or pixels in 2D): $(\boldsymbol {x}_1,y_1),\ldots ,(\boldsymbol {x}_p,y_p)$. We can form a matrix $\boldsymbol {\Pi } \in \mathbb {R}^{p\times n}$, where each column corresponds to a discrete representation of $\pi _{f_i}$. To reformulate (3) as an LP, consider a variable $\boldsymbol {\beta }\in \mathbb {R}^{p}$ with the entries

$$\begin{aligned} \beta_j &= \min \left( \sum_{i=1}^{n} \alpha_i \pi_{f_i} (x_j,y_j),1\right)\\ & = \min\left(\boldsymbol{e}_j^{\top} \boldsymbol{\Pi} \boldsymbol{\alpha} ,1 \right),~~j\in[p],\end{aligned}$$
where $\boldsymbol {e}_j$ is the $j$-th standard basis vector. Equation (6) naturally implies $\boldsymbol {e}_j^{\top } \boldsymbol {\Pi } \boldsymbol {\alpha }\geq \beta _j$ and $\beta _j\leq 1$. Representing the integral with a sum and dropping the constant term $\int _D\mbox {d}\boldsymbol {x}$, program (3) can now be cast as the following mixed binary program (MBP):
$$ \textrm{minimize}_{\boldsymbol{\alpha},\boldsymbol{\beta} }~ - \boldsymbol{1}^{\top} \boldsymbol{\beta} ~~ ~ \mbox{subject to:}~ \left\{\begin{array}{l}\boldsymbol{\beta}\leq \boldsymbol{\Pi}\boldsymbol{\alpha} , ~~~ \boldsymbol{1}^{\top} \boldsymbol{\alpha}\leq T \\ \boldsymbol{\beta}\leq \boldsymbol{1},~~~~~~~ \alpha_i\in\{0,1\} ~~ i\in[n] \end{array} \right.,$$
which offers the straightforward LP relaxation
$$ \textrm{minimize}_{\boldsymbol{\alpha},\boldsymbol{\beta} }~~ - \boldsymbol{1}^{\top} \boldsymbol{\beta} \quad \mbox{subject to:}\quad \left\{\begin{array}{l}\boldsymbol{\beta}\leq \boldsymbol{\Pi}\boldsymbol{\alpha}, ~~~ \boldsymbol{1}^{\top} \boldsymbol{\alpha}\leq T \\ \boldsymbol{\beta}\leq \boldsymbol{1},~~~~ \boldsymbol{0}\leq \boldsymbol{\alpha}\leq \boldsymbol{1} \end{array} \right..$$
We propose using Gurobi [49] to address (MBP). While Gurobi is capable of solving the combinatorial problem (MBP) using integer programming routines, our strategy is to first solve (LP), which is computationally much faster than the original combinatorial problem. If the $\boldsymbol {\alpha }$ component of the solution to (LP) is binary, that solution automatically corresponds to a solution of (MBP) as well; otherwise we solve (MBP) using the integer programming tools of Gurobi.
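Since Gurobi is a commercial solver, the (LP) relaxation can also be sketched with SciPy's HiGHS interface. The tiny $\boldsymbol{\Pi}$ below is a made-up toy instance, not a discretized DoF train:

```python
import numpy as np
from scipy.optimize import linprog

def select_knolls_lp(Pi, T):
    """Solve the (LP) relaxation: maximize 1'beta subject to
    beta <= Pi @ alpha, beta <= 1, 1'alpha <= T, 0 <= alpha <= 1,
    with Pi the p x n matrix of discretized hypograph indicators."""
    p, n = Pi.shape
    c = np.concatenate([np.zeros(n), -np.ones(p)])   # minimize -1'beta
    A_ub = np.vstack([
        np.hstack([-Pi, np.eye(p)]),                         # beta - Pi @ alpha <= 0
        np.concatenate([np.ones(n), np.zeros(p)])[None, :],  # 1'alpha <= T
    ])
    b_ub = np.concatenate([np.zeros(p), [T]])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, 1)] * (n + p), method="highs")
    return res.x[:n], -res.fun                       # alpha, covered voxels

# toy: 3 knolls over p = 4 voxels; knolls 1 and 3 together cover everything
Pi = np.array([[1, 0, 0],
               [1, 1, 0],
               [0, 1, 1],
               [0, 0, 1]], dtype=float)
alpha, covered = select_knolls_lp(Pi, T=2)
```

On this instance the full cover forces $\alpha_1=\alpha_3=1$ and the budget forces $\alpha_2=0$, so the relaxation returns a binary vector, mirroring the tightness observed in the experiments reported below.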

A MATLAB implementation of our algorithm is available online. An overview of the implementation, along with a link to the code, is provided in Section 2 of the Supplementary Notes. It is noteworthy that while our code handles both (MBP) and (LP), thanks to the tight relaxation, in all the experiments performed in this paper the LP relaxation produced binary solutions (which naturally correspond to solutions of (MBP)). Moreover, in the program overview we discuss a way to condense the matrix $\boldsymbol {\Pi }$ without any change to the solution (see Section A of Supplement 1). Even when $p$ (the number of voxels) is on the order of millions, after condensing $\boldsymbol {\Pi }$ our program can find a solution in a fraction of a second on a standard desktop computer. This is essential in applications such as real-time foveated rendering in head-mounted or lightfield displays, where one must compute, per frame, which sparse set of depths to show to minimize the error.

Finally, as discussed above, the knoll functions are general; therefore, if the knolls (the eye response) are changed to reflect a certain desired parameter, such as human eye wavelength sensitivity or the 3D spatial profile of depth, there is no impact on the algorithm's performance, and the proposed programs (MBP) and (LP) remain applicable.

4. Assessment and results

We use the DoF profile patterns proposed in [26], which are representable in terms of the distance from the eye and the age. To generate a train of DoF profiles, we consider uniform diopter spacing between the DoF planes, ranging between $D_{\min }$ and $D_{\max }$. The value of $D_{\max }$ is calculated based on [26], and we choose $D_{\min }$ to be 0.5 D (corresponding to a maximum visual range of 2 meters). This is the range for indoor visual activities such as monitor use. We have also performed similar experiments for $D_{\min }=0.09$ D (i.e., the visual range expanded to 11.1 meters), which are mainly moved to Supplementary Note 3 to abide by the paper length guidelines. In calculating the DoF profiles, we set the pupil diameter to $p=3$ mm and 2 mm, where the former corresponds to an average 250-nit monitor at a 66 cm working distance, and the latter is considered for high dynamic range (HDR) monitors. Finally, we consider different ways of weighting the profiles based on the age distribution of the users and the average monitor distance setting.

We use Fig. 3 to present a schematic of the plain DoF profiles and the different ways they are weighted based on age and office working distance. For all the plotted profiles in this figure we have used $p=3$ mm and $D_{\min } = 0.5$ D, which yield a DoF train with 151 profiles at uniform 0.044 D increments in the diopter domain. The vertical axis is the normalized maximum image intensity profile plotted versus distance, captured from a delta function (bright spot) positioned at that distance. The horizontal axis on the left is the depth, and the horizontal axis on the right corresponds to the user's age. Since plotting all 151 profiles occupies the entire domain, in Fig. 3(a) we show a sparse subset of only 14 DoF profiles uniformly picked from the original train. Figure 3(b) shows the result of weighting the profiles by a desired target age distribution for which the display is being designed. Specifically, the users' age is considered to obey a Gamma distribution with shape parameter $k=3$ and scale parameter $\theta = 10$. This corresponds to a skewed bell-shaped distribution with mean 30 and standard deviation 17.3, which models the distribution that we empirically found for the users' age. Figure 3(c) shows an alternative weighting of the profiles by age, which uses the US population, obtained from [50]. Figure 3(d) shows a similar setting to Fig. 3(c), where, aside from using the US population weight for the age, the diopter range is weighted by a Gaussian distribution with mean 1.5 D and standard deviation 0.5 D. Notice that such a distribution introduces a reciprocal-normal distribution weight on the depth axis. Finally, Fig. 3(e) shows a similar setting to Fig. 3(c), where the depth axis is weighted by a Gaussian with mean 66 cm and standard deviation 20 cm, representing an average monitor distance setting.


Fig. 3. A schematic of 14 DoF profiles uniformly selected from a train of 151 profiles calculated for $p=3$ mm, with uniform diopter spacing between $D_{\min } = 0.5$ D and $D_{\max } = 7.08$ D. The right horizontal axis is age and the left horizontal axis is distance in mm. (a): The plain profiles without any weight on the depth or age. (b): The profiles after weighting the age axis by an empirical Gamma distribution. (c): The profiles after weighting the age axis by the US population. (d): Using the US population to weight the age, and a Gaussian weight in the diopter range. (e): The DoF profiles after weighting the age component by the US population and the depth range by a Gaussian.


We apply the optimization scheme proposed in Section 3 to allocate the DoF planes so as to minimize the accommodation error. Figure 4 shows the result of this optimization for different values of $T$ and for the parameter settings outlined in Fig. 3. These results show exactly how the depth levels should be allocated for a band limited display, starting from $T=1$ monocular depth level all the way to $T=9$ levels, such that the accommodation error is minimized while accounting for factors such as the age profile, the working distance profile, and the brightness of the display.
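The paper solves this allocation globally via an integer program with a convex relaxation (Section 3). As a rough illustration of the maximal-coverage objective only, a simple greedy baseline can be sketched; the toy Gaussian "knolls" echo Fig. 2, and all names here are our own assumptions, not the paper's implementation:

```python
import numpy as np

def greedy_allocate(profiles, T):
    """Greedily pick T profiles whose hypograph union covers the most area.

    profiles: (n, m) array of DoF profiles sampled on a common depth grid.
    This greedy baseline carries the usual (1 - 1/e) coverage guarantee;
    the paper instead finds the globally optimal selection.
    """
    chosen = []
    cover = np.zeros(profiles.shape[1])
    for _ in range(T):
        # pointwise max models the union of the hypographs
        gains = [np.maximum(cover, p).sum() - cover.sum() for p in profiles]
        best = int(np.argmax(gains))
        chosen.append(best)
        cover = np.maximum(cover, profiles[best])
    return chosen, cover

# Toy example in the spirit of Fig. 2: three Gaussian knolls on [0, 5].
x = np.linspace(0.0, 5.0, 501)
knolls = np.array([np.exp(-(x - c) ** 2 / 0.5) for c in (1.0, 2.5, 4.0)])
selected, union = greedy_allocate(knolls, T=2)
```

On this toy input the center knoll is picked first (its mass is not truncated by the domain boundary), followed by one of the two side knolls.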


Fig. 4. The optimized DoF plane allocation results for different values of $T$, and the parameter settings outlined in Fig. 3. The optimized allocations for the plain DoFs without a weight on the age or the depth are plotted with green circles as the background in all the panels. The red crosses indicate the allocations after applying different age/depth weights. (a): The allocation after weighting the age axis by an empirical Gamma distribution. (b): The allocation after weighting the age axis by the US population. (c): Optimal allocation, using the US population to weight the age, and a Gaussian weight in the diopter range. (d): The allocation after weighting the age by the US population and the depth range by the Gaussian with center at 66 cm.


The optimized locations for the plain DoF train without a weight on the age or the depth are plotted with green circles in the background throughout all the sub-figures, to facilitate comparison as different factors are introduced. The red cross marks in Fig. 4(a) show the optimized locations when the age component of the profiles is weighted by the Gamma distribution outlined above (a display product made for an average age of 30 with a standard deviation of 17.3). The target age distribution pushes the depth levels to shorter distances for smaller numbers of planes, favoring younger eyes, but for larger numbers of planes the impact becomes increasingly negligible.

Figure 4(b) corresponds to weighting the age by the US population, which impacts the diopter ranges. This distribution is relatively flat; therefore, little impact is noticed compared to a uniform age distribution. In a similar fashion, Figs. 4(c) and (d) correspond to the settings described in panels (d) and (e) of Fig. 3. There is a major shift of the depth allocation in Figs. 4(c) and (d) because of the targeted diopter range around 1.5 D and the targeted distance range around 66 cm, respectively. What is important here is the transition between the first three depth levels. It is evident that the population profile impact is less pronounced if the display is designed for a certain working range. Another notable observation is how the optimization leans toward longer distances for smaller numbers of depth levels. This figure indicates that, for example, if a VR or AR headset with two monocular depth levels is designed for the mass population, then one depth level around 170 cm and the other around 83 cm together introduce the minimal amount of accommodation error (panel (b)). However, if this headset is expected to represent a virtual monitor setup at a 66 cm distance, then based on panel (d) the optimal depth choices are 58 cm and 87 cm.

Figure 5 shows a comparison of the unweighted optimized levels for a normal display (assuming a 3 mm pupil diameter) and an HDR-enabled display with over 250 nits of brightness (assuming a 2 mm pupil diameter). In reality, a display may have a high dynamic range, so the pupil size varies with the brightness of the displayed content. Since for brighter content the pupil diameter is smaller and the DoF is larger, fewer depth levels are expected to be needed to cover a desired diopter range. Here we assume that the content is bright enough that the pupil diameter stays fixed at 2 mm at all times. As seen, the depth levels become more spread apart, indicating a larger coverage per level.


Fig. 5. A comparison of the optimized DoF allocation for the unweighted reference cases, with $p=2$ mm and $p=3$ mm.


Figure 6 shows the impact of different factors on the allocation of the depth levels for a 2 mm pupil diameter (HDR, or brighter displays). Figure 6(a) shows the impact of age profiling (an average age of 30 with a standard deviation of 17.3). Figure 6(b) shows the results for the average US population, which is relatively uniform. As noted in both panels (a) and (b), up to the first three levels the age range has almost no impact on the optimized allocation of the depth levels. This is a significant observation, as one might think that because of the dominant farsightedness in the older population, the depth levels would need to shift to farther distances. For example, one would expect that allocating a depth level at 52 cm, when the bandwidth limits the depth levels to $T=3$, would be undesirable since almost all people over the age of 50 would lose acuity at that distance. But this optimization shows that such a counter-intuitive allocation still reduces the cost of accommodation error across the entire population. The population profile ultimately only slightly skews the optimized depth level allocations at higher $T$ values. Figures 6(c) and (d) show the impact of distance and diopter targeting, respectively (with the same depth and diopter profiles as in Fig. 4(c) and (d)). Because of the longer DoF at higher brightness, here again the impact of the profiling is less significant compared to the 3 mm pupil diameter.


Fig. 6. Optimized allocation results similar to the cases indicated in Fig. 4, with the pupil diameter set to $p=2$ mm.


A similar analysis is performed for a larger diopter (depth) range that extends all the way to 0.09 D (11.1 meters). The results are shown in Supplementary Figure S2. In such a long range, the optimization dominantly favors the far end of the range in its selection to minimize the error. However, for $T=1$, the optimized allocation (not impacted by the age or application) is at 5.6 m (and 7 m for $p=2$ mm). This is rather different from the 1.48 (1.41) meter distance obtained when the optimization is run only up to 2 meters. Certainly, if one expects the user to see mostly nearby 3D objects (as is the case for AR and VR), then the 2 meter range optimization is a better fit. If a display is intended to give the minimum accommodation error over the largest distance range, all the way to infinity, then setting that first level at 5.6 meters is the best choice for an average 250-nit display. Supplementary Figure S3 shows these depth levels compared to an average male body at one-to-one scale.

Figures 7(a)–(d) show the difference between the coverage errors for the different values of $p$ and $D_{\min }$ discussed in the paper. The coverage error is the accommodation error calculated between a continuum of accommodation (the real world) and what the eye perceives based on the quantization of accommodation, considering the DoF profile at each level. Mathematically, this error corresponds to the portion of the area (volume) of the box that is not covered by the union of the DoF profiles. Considering the iterative method (mentioned in Fig. 1), and based on the shallowest measured DoF of 0.15 D at the largest pupil diameter of 6 mm, the human eye does not discern a true continuum of depth, due to its limited acuity. Therefore, there is an intrinsic accommodation error in the image formed at the back of the eye at the highest-acuity region of the fovea, which is (at least based on the psychophysics literature) not perceivable to the human eye. One can consider this intrinsic error as the base error level of the human eye optics. This intrinsic error is about 16.7% of the 2 meter range and about 1.8% of the 11.1 meter range in distance space.
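The coverage-error metric described above (the uncovered fraction of the box under the union of peak-normalized DoF hypographs) can be sketched numerically; the function name and the toy flat profiles are our own assumptions for illustration:

```python
import numpy as np

def coverage_error(profiles, selected):
    """Fraction of the (depth x normalized-intensity) box left uncovered.

    profiles: list/array of DoF profiles sampled on a common depth grid,
    peak-normalized to 1 so the box height is 1. The union of the selected
    hypographs is their pointwise maximum.
    """
    union = np.max(np.asarray(profiles)[list(selected)], axis=0)
    return float(np.mean(1.0 - union))

# Toy check: a flat profile at height 1 covers the box completely,
# while a flat profile at height 0.4 leaves 60% of the box uncovered.
full = np.ones(100)
partial = 0.4 * np.ones(100)
```

For uniformly weighted grids this is exactly the unit-box area minus the area under the pointwise maximum; with the age/depth weights of Fig. 3 the mean would be replaced by a weighted sum.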


Fig. 7. Comparison between the coverage errors of the equidistant DoF plane allocation in red and the optimized allocation in blue (no weighting applied to the age or depth). The dashed red lines show the intrinsic errors along with their values. The setups are: (a) $p=3$ mm, $D_{\min }=0.5$ D; (b) $p=3$ mm, $D_{\min }=0.09$ D; (c) $p=2$ mm, $D_{\min }=0.5$ D; (d) $p=2$ mm, $D_{\min }=0.09$ D.


Here the red bars show the coverage error for an approach where the depth levels are allocated by dividing the range into equal distances, and the blue bars show the coverage error for the optimized allocation. The horizontal axis is the number of depth levels, $T$. By comparing the results of the optimized depth allocation to the equidistant allocation, it is evident that the optimization significantly reduces the coverage error. For the two meter range optimization (panels (a) and (c) of Fig. 7), the error reduction may not seem significant at first glance, due to the existence of the intrinsic errors. However, a closer look shows that, for example, with the optimized allocation of only $T=3$ depth levels, the coverage error is on par with or better than that of all 9 depth levels under equidistant allocation. This is a factor of three reduction in the needed number of layers based on this metric. The results become even more significant for the longer range (11.1 m) optimization, where the coverage error is reduced by up to a factor of five for $T=9$ depth levels, and only 2-3 optimally allocated levels can be on par with 9 equidistant ones. This is strong evidence that uniform equidistant allocation of depth in the diopter domain is a waste of bandwidth in lightfield displays with monocular depth, and such allocation becomes even more inefficient for larger numbers of layers. Also, considering the intrinsic coverage error (16.7% for panels (a), (c) and 1.8% for panels (b), (d)), one can see that with only 6-7 optimally allocated depth levels the coverage error already starts to become imperceptible. This means that, on average, the lack of sharpness in the perceived image resulting from the accommodation error becomes on par with the intrinsic lack of sharpness due to the DoF of the human eye at the given pupil diameter.

5. Discussion

5.1 Quantized monocular depth levels in 3D space

Another parameter of interest is the aberration profile of the eye for defining the 3D shape of the monocular depth levels. This is especially interesting for applications like foveated rendering in lightfield displays or head mounted displays, where the effort is to reduce the rendering computational cost by adapting to the eye's nonuniform spatial acuity profile in 3D space. While there is a thorough understanding of the eye's aberrations and point spread function (PSF) profile [51–53], it is difficult to theoretically pinpoint the single-eye depth profile in the $(x,y)$ plane for each level. This is because the sampling outside the fovea becomes exponentially less dense, so defining a quantization parameter based on a single objective contrast sensitivity parameter becomes a multi-variable function of $(x,y,z)$ [54–56].

If one considers eye rotation (regardless of the type of eye movement), it can be roughly assumed that the monocular depth levels lie on hemispherical surfaces centered at the eye's rotation center. These spherical surfaces are then trimmed by the horizontal and vertical field of view of the eye into an irregular shape that varies with the structure of each person's nose. For a fixated eye on this sphere, the monocular DoF grows as the angle increases from the fovea to the parafovea to the perifovea, passing the macula boundary into the near, mid, and far peripheral vision areas. Following a typical perimetry result for the eye's visual field (the normal hill of vision), the monocular depth profile would resemble a horn torus with concave sides, reaching its minimum at the fovea region, with a small hole at the location of the optic nerve head (Fig. 8(a)). Certainly, many parameters such as the eye aberration, pupil diameter, and retina curvature will impact this shape, but that level of accuracy is most likely unnecessary for designing a band limited display and thus is not the focus of our study. The mathematical formulation presented in Section 3 is general for any number of dimensions. Hence, based on the 3D monocular depth profile, one can increase the number of variables in our optimization approach to find the optimal allocations in 3D space and the coverage error in higher dimensions.


Fig. 8. (a): A schematic of the Traquair hill of vision. (b): Stereoscopic distinguishable depth variation in the longitudinal horopter vs. the depth of observation distance. (c): The 3D horopters in space based on empirical models from the Ogle and Helmholtz measurements [19,20,57]. Color indicates the depth change in centimeters that is distinguishable on that horopter.


With a higher number of focal planes, for example 20 or 30, the accommodation error improvement is negligible, as indicated in Fig. 7. This is because the distance between the focal planes becomes much smaller than the depth of field of the eye; therefore, the human eye cannot distinguish the depth difference monocularly. However, this does not mean that such a distinction cannot be made binocularly or through motion parallax (also known as the cardboard effect). As we discuss in the next section, the density of the binocular layers is much higher than that of the monocular levels; however, providing binocular depth with displays is historically known to be much easier than providing monocular depth levels. At larger scales, binocular parallax relates directly to the spatial resolution of only two images given to the left and right eyes, whereas monocular depth requires additional rendering per layer in an uncompressed scheme. More specifically, if the bandwidth needed to show one image is $B$, the bandwidth needed to show two images (one left and one right) for binocular depth is at most $2B$; however, to provide an additional $T$ monocular depth levels, the maximum bandwidth is multiplied by the number of layers (i.e., $2TB$). The correlation and compression of such a data cube is not the interest of this paper; however, based on the results in Fig. 7, it is evident that with 7 depth levels optimally distributed per eye (a maximum bandwidth of $14B$, 25 cm-11.1 m) the monocular depth error should be indistinguishable to the human eye.
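The bandwidth bookkeeping above can be written out explicitly (a trivial sketch; the helper name is our own):

```python
def max_bandwidth(B, T=1, stereo=True):
    """Upper-bound uncompressed bandwidth: B per image, doubled for a
    stereo pair, and multiplied by T monocular depth layers per eye."""
    eyes = 2 if stereo else 1
    return eyes * T * B

stereo_pair = max_bandwidth(1.0)        # 2B for plain stereoscopic 3D
seven_layers = max_bandwidth(1.0, T=7)  # 14B, the case cited in the text
```

This is a worst-case bound: in practice the layers of a focal stack are highly correlated, so compressed representations need far less than $2TB$.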

5.2 Generalization to binocular depth resolution

Similar to the monocular depth in Fig. 1, using the horizontal angular disparity $\delta$ and its relation with the interpupillary distance (IPD) and depth, one can iteratively find the binocular depth levels (horopters) and use our optimization approach to allocate a limited number of horopters for a band limited stereoscopic display. This can be used to compress stereoscopic data not based on data redundancy, but based on the limits of human vision. The maximum stereoscopic acuity reported in the literature ranges from 0.167 arcmin at best to 0.5 arcmin on average [42,58]. However, this acuity varies with angle relative to the center line perpendicular to the face plane, which is why the horopter levels differ at different angles. Assuming an experimental stereoscopic acuity of 0.5 arcmin and an average interpupillary distance, $I$, of 64 mm [42,58], at each observation distance $z$ we calculate the physical depth between the two closest objects that are detected to be at different depths (see Supplement 1). We find the next observation distance, or horopter level, by substituting the distance with $z+\Delta z$. The iteration starts from 25 cm and terminates when the observation distance exceeds 15 m.
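The iteration above can be sketched under the standard small-angle disparity relation $\Delta z \approx \delta z^{2}/I$ (an assumption on our part; the paper's exact derivation is in Supplement 1). Counting the steps from 25 cm to 15 m reproduces the order of magnitude of the level count quoted in the text:

```python
import math

def count_stereo_levels(z0=0.25, z_max=15.0, ipd=0.064, acuity_arcmin=0.5):
    """Iteratively count distinguishable stereoscopic depth levels.

    At each distance z, the smallest detectable depth step is approximated
    by dz = delta * z**2 / IPD, with delta the stereo acuity in radians.
    """
    delta = acuity_arcmin * math.pi / (180.0 * 60.0)  # arcmin -> radians
    z, n = z0, 0
    while z < z_max:
        z += delta * z * z / ipd
        n += 1
    return n

levels = count_stereo_levels()  # on the order of 1700 levels for 25 cm..15 m
```

The closed-form estimate $N \approx (I/\delta)(1/z_0 - 1/z_{\max})$ gives the same figure, and makes explicit why the levels crowd toward the near limit: the count is dominated by the $1/z_0$ term.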

As shown in Fig. 8(b), human eyes are very sensitive at differentiating depth levels, down to the sub-millimeter order, when the observation distance is near (less than 66 cm). This holds only for the center line of the stereoscopic depth levels, or horopters, at maximum acuity. Based on this iterative simulation, one can find the center points of the stereoscopic depth levels, their relative locations, and their total number. The total number of levels from 25 cm to 15 m is 1731 (for an IPD of 64 mm), and based on the ANSUR IPD data for different races [59], this number varies by 7.7% between races. There is minimal effect from the monocular DoF on the number of these levels in the 25 cm-15 m range [58]. If one ignores the effect of the eye PSF [31] down to 15 cm from the face, then the number of levels increases to 2905 from 15 cm to infinity; but as pointed out in the literature, the shallowest depth detectable based on human vernier acuity (hyperacuity) cannot be smaller than 100 microns [60,61]. This caps the number of depth levels at 2667. Beyond a 15 m distance, stereoscopic depth perception shows a super-linear accumulative negative error [42]. For example, at 15 m this error is about -5 m, so subjects perceive a 15 m target to be closer, at around 10 m; at 31 m they perceive the depth to be around 18 m; and at 248 m they perceive the depth to be only 50 m. These data indicate that not many more depth levels can be quantized from 15 m all the way to infinity, at least based on non-contextual monocular or stereoscopic cues. Figure 8(b) further illustrates these three different regimes of stereoscopic depth perception (near distances dominated by the PSF, a mid range that is iteratively quantized, and a far range with negative accumulative error). Supplementary Figure S4(a) shows these depth levels at one-to-one scale relative to an average 40-year-old male anatomy.

5.3 Quantized binocular depth levels in 3D space

In order to find the 3D spatial profile of the binocular depth levels, we used the Ogle equation [19,20] for horopters along with the Helmholtz experimental results, as in Supplementary Eq. (S12) [57]. This estimation, unlike the Vieth-Muller circle theory, considers the Hering-Hillebrand deviation. To find all $H$ values at an arbitrary distance, we performed a linear fit with respect to the dioptric distance. The resulting horopter shapes are shown in Fig. 8(c) (see Visualization 1 for full 3D renders of the horopters, and Supplementary Figure S4(b)). Horopters get denser as the distance to the face decreases. At shorter observation distances, horopters take a convex shape and gradually become concave along the x axis while keeping a convex shape along y. Based on this result, it is evident that a flat screen or virtual image is significantly sub-optimal. More accurate empirical measurements were recently performed in [62,63] to study the different mechanisms of saccade and vergence from a neurological viewpoint.

Finally, in order to experimentally validate the iterative depth quantization criteria in this study, a Badal setup as in Supplementary Figure S5 can be used to measure the number of monocular depth levels for subjects. The experiment has to run on different age groups and under different lighting conditions to verify the optimized allocation of the levels. This is a topic of our future studies. One can expand the same approach to the allocation of binocular horopters.

6. Conclusion

We defined a method to quantize and allocate monocular depth levels in an optimized fashion. The method sets a fundamental guideline for designing 3D displays based on the human visual perception capacity. From a display science perspective, this is essentially equivalent to the depth resolution of the human eye. The iterative depth allocation approach results in a maximum of 40 monocular depth levels and 2667 binocular levels, which saturate even the best and youngest eyes under the most demanding experimental conditions. This maximum number of discernible quantized levels quickly falls to 15 monocular levels and 1731 binocular levels, spread from 25 cm to 15 m, for a 3 mm pupil diameter and an IPD of 64 mm. To optimally allocate monocular depth levels to a lightfield or 3D display, we cast the problem as an integer program with a computationally accurate and efficient convex relaxation. In some cases, the results beat an equidistant allocation of monocular depth levels in diopter space by a factor of 5. A variety of design parameters are examined in this study. Our method shows that with only 6-7 quantized monocular depth levels allocated optimally for an AR/VR headset or a lightfield display application, the error is on par with the intrinsic error of the human eye. The method further pinpoints the locations of these depth levels for a variety of desired scenarios.

Acknowledgements

The authors thank Steven Cholewiak of UC Berkeley, Adam Samaniego, Youngeun Park, and Stefano Baldasi from the analytic department at former Meta Co. for their help and consultation.

Disclosures

The authors declare no conflicts of interest.

Supplemental document

See Supplement 1 for supporting content.

References

1. R. Patterson, “Human factors of 3-d displays,” J. Soc. Inf. Disp. 15(11), 861–871 (2007). [CrossRef]  

2. D. M. Hoffman, A. R. Girshick, K. Akeley, and M. S. Banks, “Vergence–accommodation conflicts hinder visual performance and cause visual fatigue,” J. Vis. 8(3), 33 (2008). [CrossRef]  

3. K. Ukai and P. A. Howarth, “Visual fatigue caused by viewing stereoscopic motion images: Background, theories, and observations,” Displays 29(2), 106–116 (2008). [CrossRef]  

4. C. Vienne, L. Sorin, L. Blondé, Q. Huynh-Thu, and P. Mamassian, “Effect of the accommodation-vergence conflict on vergence eye movements,” Vision Res. 100, 124–133 (2014). [CrossRef]  

5. H. Huang and H. Hua, “Systematic characterization and optimization of 3d light field displays,” Opt. Express 25(16), 18508–18525 (2017). [CrossRef]  

6. D. Fattal, Z. Peng, T. Tran, S. Vo, M. Fiorentino, J. Brug, and R. G. Beausoleil, “A multi-directional backlight for a wide-angle, glasses-free three-dimensional display,” Nature 495(7441), 348–351 (2013). [CrossRef]  

7. G. Wetzstein, D. Lanman, W. Heidrich, and R. Raskar, “Layered 3d: tomographic image synthesis for attenuation-based light field and high dynamic range displays,” in ACM SIGGRAPH 2011 papers, (2011), pp. 1–12.

8. D. Lanman, G. Wetzstein, M. Hirsch, W. Heidrich, and R. Raskar, “Polarization fields: dynamic light field display using multi-layer lcds,” in Proceedings of the 2011 SIGGRAPH Asia Conference, (2011), pp. 1–10.

9. A. Jones, I. McDowall, H. Yamada, M. Bolas, and P. Debevec, “Rendering for an interactive 360 light field display,” in ACM SIGGRAPH 2007 papers, (2007), pp. 40–es.

10. D. Yoo, J. Cho, and B. Lee, “Optimizing focal plane configuration for near-eye multifocal displays via the learning-based algorithm,” in Ultra-High-Definition Imaging Systems III, vol. 11305 (International Society for Optics and Photonics, 2020), p. 113050W.

11. Y. Jo, S. Lee, D. Yoo, S. Choi, D. Kim, and B. Lee, “Tomographic projector: large scale volumetric display with uniform viewing experiences,” ACM Trans. Graph. 38(6), 1–13 (2019). [CrossRef]  

12. J. P. Rolland, M. W. Krueger, and A. Goon, “Multifocal planes head-mounted displays,” Appl. Opt. 39(19), 3209–3215 (2000). [CrossRef]  

13. N. Matsuda, A. Fix, and D. Lanman, “Focal surface displays,” ACM Trans. Graph. 36(4), 1–14 (2017). [CrossRef]  

14. S. Tay, P.-A. Blanche, R. Voorakaranam, A. Tunç, W. Lin, S. Rokutanda, T. Gu, D. Flores, P. Wang, G. Li, P. St. Hilaire, J. Thomas, R. Norwood, M. Yamamoto, and N. Peyghambarian, “An updatable holographic three-dimensional display,” Nature 451(7179), 694–698 (2008). [CrossRef]  

15. F. Yaraş, H. Kang, and L. Onural, “State of the art in holographic displays: a survey,” J. Disp. Technol. 6(10), 443–454 (2010). [CrossRef]  

16. Y. Takaki and N. Nago, “Multi-projection of lenticular displays to construct a 256-view super multi-view display,” Opt. Express 18(9), 8824–8835 (2010). [CrossRef]  

17. D. Teng, Z. Pang, Y. Zhang, D. Wu, J. Wang, L. Liu, and B. Wang, “Improved spatiotemporal-multiplexing super-multiview display based on planar aligned oled microdisplays,” Opt. Express 23(17), 21549–21564 (2015). [CrossRef]  

18. B. Heshmat, “Fundamental limitations for augmented reality displays with visors, waveguides, or other passive optics,” in 3D Image Acquisition and Display: Technology, Perception and Applications, (Optical Society of America, 2018), pp. 3M5G–1.

19. K. N. Ogle, “An analytical treatment of the longitudinal horopter; its measurement and application to related phenomena, especially to the relative size and shape of the ocular images,” J. Opt. Soc. Am. 22(12), 665–728 (1932). [CrossRef]  

20. K. N. Ogle, “Stereoscopic acuity and the role of convergence,” J. Opt. Soc. Am. 46(4), 269–273 (1956). [CrossRef]  

21. F. W. Campbell, “The depth of field of the human eye,” Opt. Acta: Int. J. Opt. 4(4), 157–164 (1957). [CrossRef]  

22. D. Green and F. Campbell, “Effect of focus on the visual response to a sinusoidally modulated spatial stimulus,” J. Opt. Soc. Am. 55(9), 1154–1157 (1965). [CrossRef]  

23. G. E. Legge, K. T. Mullen, G. C. Woo, and F. Campbell, “Tolerance to visual defocus,” J. Opt. Soc. Am. A 4(5), 851–863 (1987). [CrossRef]  

24. S. Marcos, E. Moreno, and R. Navarro, “The depth-of-field of the human eye from objective and subjective measurements,” Vision Res. 39(12), 2039–2049 (1999). [CrossRef]  

25. C. E. Granrud, A. Yonas, and L. Pettersen, “A comparison of monocular and binocular depth perception in 5-and 7-month-old infants,” J. Exp. Child Psychol. 38(1), 19–32 (1984). [CrossRef]  

26. H. A. Anderson, G. Hentz, A. Glasser, K. K. Stuebing, and R. E. Manny, “Minus-lens–stimulated accommodative amplitude decreases sigmoidally with age: a study of objectively measured accommodative amplitudes from age 3,” Invest. Ophthalmol. Vis. Sci. 49(7), 2919–2926 (2008). [CrossRef]  

27. I. Bülthoff, H. Bülthoff, and P. Sinha, “Top-down influences on stereoscopic depth-perception,” Nat. Neurosci. 1(3), 254–257 (1998). [CrossRef]  

28. A. J. Parker, “Binocular depth perception and the cerebral cortex,” Nat. Rev. Neurosci. 8(5), 379–391 (2007). [CrossRef]  

29. Y. Tsushima, K. Komine, Y. Sawahata, and N. Hiruma, “Higher resolution stimulus facilitates depth perception: Mt+ plays a significant role in monocular depth perception,” Sci. Rep. 4(1), 6687 (2015). [CrossRef]  

30. P. Bernal-Molina, R. Montés-Micó, R. Legras, and N. López-Gil, “Depth-of-field of the accommodating eye,” Optom. Vis. Sci. 91(10), 1208–1214 (2014). [CrossRef]  

31. H. Ginis, G. M. Pérez, J. M. Bueno, and P. Artal, “The wide-angle point spread function of the human eye reconstructed by a new optical method,” J. Vis. 12(3), 20 (2012). [CrossRef]  

32. I. P. Howard and B. J. Rogers, Seeing in depth, Vol. 2: Depth perception. (University of Toronto Press, 2002).

33. J. M. Fulvio and B. Rokers, “Use of cues in virtual reality depends on visual feedback,” Sci. Rep. 7(1), 16009–13 (2017). [CrossRef]  

34. L. Hofmann and K. Palczewski, “Advances in understanding the molecular basis of the first steps in color vision,” Prog. Retinal Eye Res. 49, 46–66 (2015). [CrossRef]  

35. D. Kelly, “Spatio-temporal frequency characteristics of color-vision mechanisms,” J. Opt. Soc. Am. 64(7), 983–990 (1974). [CrossRef]  

36. J. Davis, Y.-H. Hsieh, and H.-C. Lee, “Humans perceive flicker artifacts at 500 hz,” Sci. Rep. 5(1), 7861 (2015). [CrossRef]  

37. Y. Takaki, “High-density directional display for generating natural three-dimensional images,” Proc. IEEE 94(3), 654–663 (2006). [CrossRef]  

38. K. J. MacKenzie, D. M. Hoffman, and S. J. Watt, “Accommodation to multiple-focal-plane displays: Implications for improving stereoscopic displays and for accommodation control,” J. Vis. 10(8), 22 (2010). [CrossRef]  

39. S. Liu and H. Hua, “A systematic method for designing depth-fused multi-focal plane three-dimensional displays,” Opt. Express 18(11), 11562–11573 (2010). [CrossRef]  

40. T. E. Bishop and P. Favaro, “The light field camera: Extended depth of field, aliasing, and superresolution,” IEEE Trans. Pattern Anal. Mach. Intell. 34(5), 972–986 (2012). [CrossRef]  

41. K. Takahashi, Y. Kobayashi, and T. Fujii, “From focal stack to tensor light-field display,” IEEE Trans. on Image Process. 27(9), 4571–4584 (2018). [CrossRef]  

42. S. Palmisano, B. Gillam, D. G. Govan, R. S. Allison, and J. M. Harris, “Stereoscopic perception of real depths at large distances,” J. Vis. 10(6), 19 (2010). [CrossRef]  

43. F. Richter, “Always on: Media usage amounts to 10+ hours a day.”

44. A. Aghasi and J. Romberg, “Sparse shape reconstruction,” SIAM J. Imaging Sci. 6(4), 2075–2108 (2013). [CrossRef]  

45. A. Aghasi and J. Romberg, “Convex cardinal shape composition,” SIAM J. Imaging Sci. 8(4), 2887–2950 (2015). [CrossRef]  

46. A. Aghasi and J. Romberg, “Convex cardinal shape composition and object recognition in computer vision,” in 2015 49th Asilomar Conference on Signals, Systems and Computers, (IEEE, 2015), pp. 1541–1545.

47. A. Redo-Sanchez, B. Heshmat, A. Aghasi, S. Naqvi, M. Zhang, J. Romberg, and R. Raskar, “Terahertz time-gated spectral imaging for content extraction through layered structures,” Nat. Commun. 7(1), 12665–7 (2016). [CrossRef]  

48. A. Aghasi and J. Romberg, “Extracting the principal shape components via convex programming,” IEEE Trans. on Image Process. 27(7), 3513–3528 (2018). [CrossRef]  

49. L. Gurobi Optimization, “Gurobi optimizer reference manual,” (2020).

50. U.S. Census Bureau, “Age and sex composition in the United States.”

51. J.-M. Wang, C.-L. Liu, Y.-N. Luo, Y.-G. Liu, and B.-J. Hu, “Statistical virtual eye model based on wavefront aberration,” Int. J. Ophthalmol. 5(5), 620–624 (2012). [CrossRef]  

52. L. A. Carvalho, “Accuracy of zernike polynomials in characterizing optical aberrations and the corneal surface of the eye,” Invest. Ophthalmol. Vis. Sci. 46(6), 1915–1926 (2005). [CrossRef]  

53. L. A. V. Carvalho, J. Castro, and L. A. V. Carvalho, “Measuring higher order optical aberrations of the human eye: techniques and applications,” Braz. J. Med. Biol. Res. 35(11), 1395–1406 (2002). [CrossRef]  

54. A. B. Watson, “A formula for human retinal ganglion cell receptive field density as a function of visual field location,” J. Vis. 14(7), 15 (2014). [CrossRef]  

55. R. Navarro, E. Moreno, and C. Dorronsoro, “Monochromatic aberrations and point-spread functions of the human eye across the visual field,” J. Opt. Soc. Am. A 15(9), 2522–2529 (1998). [CrossRef]  

56. S. H. Wong and G. T. Plant, “How to interpret visual fields,” Pract. Neurol. 15(5), 374–381 (2015). [CrossRef]  

57. A. Ames, K. N. Ogle, and G. H. Gliddon, “Corresponding retinal points, the horopter and size and shape of ocular images,” J. Opt. Soc. Am. 22(11), 575–631 (1932). [CrossRef]  

58. W. L. Larson and A. Lachance, “Stereoscopic acuity with induced refractive errors,” Am. J. Optom. Physiol. Opt. 60(6), 509–513 (1983). [CrossRef]

59. N. A. Dodgson, “Variation and extrema of human interpupillary distance,” in Stereoscopic Displays and Virtual Reality Systems XI, vol. 5291 (International Society for Optics and Photonics, 2004), pp. 36–46.

60. A. Lit, “Depth-discrimination thresholds as a function of binocular differences of retinal illuminance at scotopic and photopic levels,” J. Opt. Soc. Am. 49(8), 746–752 (1959). [CrossRef]  

61. G. Westheimer and M. W. Pettet, “Contrast and duration of exposure differentially affect vernier and stereoscopic acuity,” Proc. R. Soc. Lond. B 241(1300), 42–46 (1990). [CrossRef]  

62. A. Gibaldi and M. S. Banks, “Binocular eye movements are adapted to the natural environment,” J. Neurosci. 39(15), 2877–2888 (2019). [CrossRef]  

63. A. Gibaldi, A. Canessa, and S. P. Sabatini, “The active side of stereopsis: Fixation strategy and adaptation to natural environments,” Sci. Rep. 7(1), 44800 (2017). [CrossRef]  

Supplementary Material (2)

Supplement 1: Supplemental Material
Visualization 1: 3D Horopter Rotation (animated version of Figure 8(c))



Figures (8)

Fig. 1.
Fig. 1. Quantized monocular distinguishable focal planes with age and pupil diameter variations. Focal plane distances are within 10 meters. The colorbar shows the corresponding total eye diopter (60 D from a relaxed eye, plus accommodation power). (a): Pupil diameter of 2 mm. (b): Pupil diameter of 6 mm. (c): The depth of field profiles for a pupil diameter of 6 mm at age 30.
Fig. 2.
Fig. 2. Problem statement. (a): A set of $n=30$ positive and bounded Gaussian knolls defined in $[0,5]$. (b): A selection of $T=4$ knolls from the functions in (a), which together maximally cover the box defined by $[0,5] \times [0,1]$. (c): For any alternative selection of four knolls other than that shown in panel (b), the union of the areas under the knolls covers a smaller area.
Fig. 3.
Fig. 3. A schematic of 14 DoF profiles uniformly selected from a train of 151 profiles calculated for $p=3$ mm, with uniform diopter spacing between $D_{\min } = 0.5$ D and $D_{\max } = 7.08$ D. The right horizontal axis is age and the left horizontal axis is distance in mm. (a): The plain profiles without any weight on the depth or age. (b): The profiles after weighting the age axis by an empirical Gamma distribution. (c): The profiles after weighting the age axis by the US population. (d): Using the US population to weight the age, and a Gaussian weight in the diopter range. (e): The DoF profiles after weighting the age component by the US population and the depth range by a Gaussian.
Fig. 4.
Fig. 4. The optimized DoF plane allocation results for different values of $T$, and the parameter settings outlined in Fig. 3. The optimized allocations for the plain DoFs without a weight on the age or the depth are plotted with green circles as the background in all the panels. The red crosses indicate the allocations after applying different age/depth weights. (a): The allocation after weighting the age axis by an empirical Gamma distribution. (b): The allocation after weighting the age axis by the US population. (c): Optimal allocation, using the US population to weight the age, and a Gaussian weight in the diopter range. (d): The allocation after weighting the age by the US population and the depth range by a Gaussian centered at 66 cm.
Fig. 5.
Fig. 5. A comparison of the optimized DoF allocation for the unweighted reference cases, with $p=2$ mm and $p=3$ mm.
Fig. 6.
Fig. 6. Optimized allocation results similar to the cases indicated in Fig. 4, with the pupil diameter set to $p=2$ mm.
Fig. 7.
Fig. 7. Comparison between the coverage errors of equidistant DoF plane allocation in red, and the optimized allocation in blue (no weighting applied to the age or depth). The dashed red lines show the intrinsic errors along with their values. The setups are: (a) $p=3$ mm, $D_{\min}=0.5$ D; (b) $p=3$ mm, $D_{\min}=0.09$ D; (c) $p=2$ mm, $D_{\min}=0.5$ D; (d) $p=2$ mm, $D_{\min}=0.09$ D.
Fig. 8.
Fig. 8. (a): A schematic of the Traquair hill of vision. (b): Stereoscopic distinguishable depth variation in the longitudinal horopter vs. depth of observation distance. (c): The 3D horopters in space based on empirical models from Ogle and Helmholtz measurements [19,20,57]. Color indicates the depth change in centimeters that is distinguishable on that horopter.

Equations (16)


$$\begin{aligned} \underset{\alpha_1,\ldots,\alpha_n}{\text{minimize}}\;\; & \int_{\mathcal{D}} \mathrm{d}\boldsymbol{x} \;-\; \int_{\mathcal{D}} \max\{\alpha_1 f_1(\boldsymbol{x}),\, \alpha_2 f_2(\boldsymbol{x}),\, \ldots,\, \alpha_n f_n(\boldsymbol{x})\}\,\mathrm{d}\boldsymbol{x} \\ \text{subject to:}\;\; & \sum_{i=1}^{n}\alpha_i \le T, \quad \text{and} \quad \alpha_i \in \{0,1\}, \;\; i \in [n]. \end{aligned}\tag{1}$$
$$\operatorname{hyp}(f) = \left\{(\boldsymbol{x}, y) \in \mathbb{R}^s \times \mathbb{R} : y \le f(\boldsymbol{x})\right\} \subset \mathbb{R}^{s+1}.\tag{2}$$
$$f(\boldsymbol{x}) = \max\{f_1(\boldsymbol{x}),\, f_2(\boldsymbol{x}),\, \ldots,\, f_n(\boldsymbol{x})\}.\tag{3}$$
$$\begin{aligned} \operatorname{hyp}(f) &= \left\{(\boldsymbol{x}, y) \in \mathbb{R}^s \times \mathbb{R} : y \le \max\{f_1(\boldsymbol{x}),\, f_2(\boldsymbol{x}),\, \ldots,\, f_n(\boldsymbol{x})\}\right\} \\ &= \left\{(\boldsymbol{x}, y) \in \mathbb{R}^s \times \mathbb{R} : (y \le f_1(\boldsymbol{x})) \vee (y \le f_2(\boldsymbol{x})) \vee \cdots \vee (y \le f_n(\boldsymbol{x}))\right\} \\ &= \bigcup_{i=1}^{n} \left\{(\boldsymbol{x}, y) \in \mathbb{R}^s \times \mathbb{R} : y \le f_i(\boldsymbol{x})\right\} = \bigcup_{i=1}^{n} \operatorname{hyp}(f_i).\end{aligned}\tag{4}$$
$$\begin{aligned} \underset{\Omega \,\subseteq\, \{1,2,\ldots,n\}}{\text{minimize}}\;\; & \int_{\mathcal{D}} \mathrm{d}\boldsymbol{x} \;-\; \iint_{\bigcup_{i\in\Omega} \operatorname{hyp}(f_i)} \mathrm{d}\boldsymbol{x}\,\mathrm{d}y, \\ \text{subject to:}\;\; & |\Omega| \le T.\end{aligned}\tag{5}$$
$$\alpha_i = \begin{cases} 1 & i \in \Omega \\ 0 & i \notin \Omega\end{cases}, \quad i \in [n],\tag{6}$$
$$\pi_{f_i}(\boldsymbol{x}, y) = \begin{cases} 1 & (\boldsymbol{x}, y) \in \operatorname{hyp}(f_i) \\ 0 & (\boldsymbol{x}, y) \notin \operatorname{hyp}(f_i).\end{cases}\tag{7}$$
$$\bigcup_{i\in\Omega} \operatorname{hyp}(f_i) = \operatorname{supp}\left(\sum_{i\in\Omega} \pi_{f_i}(\boldsymbol{x}, y)\right),\tag{8}$$
$$\iint_{\bigcup_{i\in\Omega} \operatorname{hyp}(f_i)} \mathrm{d}\boldsymbol{x}\,\mathrm{d}y = \iint_{\mathcal{D}\times[0,1]} \min\left(\sum_{i\in\Omega}\pi_{f_i}(\boldsymbol{x}, y),\, 1\right) \mathrm{d}\boldsymbol{x}\,\mathrm{d}y.\tag{9}$$
$$\begin{aligned} \underset{\alpha_1,\ldots,\alpha_n}{\text{minimize}}\;\; & \int_{\mathcal{D}} \mathrm{d}\boldsymbol{x} \;-\; \iint_{\mathcal{D}\times[0,1]} \min\left(\sum_{i=1}^{n}\alpha_i \pi_{f_i}(\boldsymbol{x}, y),\, 1\right)\mathrm{d}\boldsymbol{x}\,\mathrm{d}y \\ \text{subject to:}\;\; & \sum_{i=1}^{n}\alpha_i \le T, \quad \text{and} \quad \alpha_i \in \{0,1\}, \;\; i \in [n].\end{aligned}\tag{10}$$
$$C(\boldsymbol{\alpha}; f_1, \ldots, f_n) = \int_{\mathcal{D}} \max\{\alpha_1 f_1(\boldsymbol{x}),\, \alpha_2 f_2(\boldsymbol{x}),\, \ldots,\, \alpha_n f_n(\boldsymbol{x})\}\,\mathrm{d}\boldsymbol{x}.\tag{11}$$
$$0 \le C(\boldsymbol{\alpha}; f_1, \ldots, f_n) - C(\tilde{\boldsymbol{\alpha}}; f_1, \ldots, f_n) \le T\,(u_\delta + \ell_\delta),\tag{12}$$
$$u_\delta = \max_{i=1,\ldots,n} \int_{\mathcal{D}} [\delta_i(\boldsymbol{x})]_+\,\mathrm{d}\boldsymbol{x}, \qquad \text{and} \qquad \ell_\delta = \min_{i=1,\ldots,n} \int_{\mathcal{D}} [\delta_i(\boldsymbol{x})]_-\,\mathrm{d}\boldsymbol{x}.\tag{13}$$
$$\beta_j = \min\left(\sum_{i=1}^{n}\alpha_i \pi_{f_i}(\boldsymbol{x}_j, y_j),\, 1\right) = \min\left(\boldsymbol{e}_j^\top \Pi \boldsymbol{\alpha},\, 1\right), \quad j \in [p],\tag{14}$$
$$\begin{aligned} \underset{\boldsymbol{\alpha},\, \boldsymbol{\beta}}{\text{minimize}}\;\; & -\mathbf{1}^\top \boldsymbol{\beta} \\ \text{subject to:}\;\; & \begin{cases} \boldsymbol{\beta} \le \Pi\boldsymbol{\alpha}, \quad \mathbf{1}^\top\boldsymbol{\alpha} \le T, \quad \boldsymbol{\beta} \le \mathbf{1}, \\ \alpha_i \in \{0,1\} \;\; \forall i \in [n],\end{cases}\end{aligned}\tag{15}$$
$$\begin{aligned} \underset{\boldsymbol{\alpha},\, \boldsymbol{\beta}}{\text{minimize}}\;\; & -\mathbf{1}^\top \boldsymbol{\beta} \\ \text{subject to:}\;\; & \begin{cases} \boldsymbol{\beta} \le \Pi\boldsymbol{\alpha}, \quad \mathbf{1}^\top\boldsymbol{\alpha} \le T, \quad \boldsymbol{\beta} \le \mathbf{1}, \\ \boldsymbol{0} \le \boldsymbol{\alpha} \le \mathbf{1}.\end{cases}\end{aligned}\tag{16}$$
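The final mixed-integer coverage program above can be sketched numerically. The following is a minimal illustrative sketch, not the authors' implementation (they report using Gurobi [49]): it builds a toy set of Gaussian knolls as in Fig. 2, forms the hypograph-indicator matrix $\Pi$ on a coarse grid, and solves the resulting mixed-integer program with SciPy's HiGHS-backed `milp`. All sizes, seeds, and variable names here are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Toy instance of the knoll-selection problem (cf. Fig. 2): pick T of n
# Gaussian knolls on [0, 5] whose hypograph union maximally covers
# [0, 5] x [0, 1], via the discretized mixed-integer formulation.
rng = np.random.default_rng(0)
n, T = 12, 4
centers = rng.uniform(0.0, 5.0, n)
widths = rng.uniform(0.2, 0.6, n)

# Grid over D x [0, 1]; row j of Pi holds the hypograph indicators
# pi_{f_i}(x_j, y_j) for all i at grid point (x_j, y_j).
xs = np.linspace(0.0, 5.0, 80)
ys = np.linspace(0.0, 1.0, 25)
X, Y = np.meshgrid(xs, ys)
F = np.exp(-((X[..., None] - centers) ** 2) / (2 * widths ** 2))  # f_i(x_j)
Pi = (Y[..., None] <= F).reshape(-1, n).astype(float)
p = Pi.shape[0]

# Variables z = [alpha (n binaries), beta (p continuous in [0, 1])].
# Maximizing 1'beta subject to beta <= Pi @ alpha and 1'alpha <= T
# is the same program, written as a minimization for the solver.
c = np.concatenate([np.zeros(n), -np.ones(p)])              # minimize -1'beta
A_cov = np.hstack([-Pi, np.eye(p)])                         # beta - Pi @ alpha <= 0
A_bud = np.concatenate([np.ones(n), np.zeros(p)])[None, :]  # 1'alpha <= T
constraints = [LinearConstraint(A_cov, -np.inf, 0.0),
               LinearConstraint(A_bud, -np.inf, T)]
bounds = Bounds(np.zeros(n + p), np.ones(n + p))
integrality = np.concatenate([np.ones(n), np.zeros(p)])     # alpha integer

res = milp(c=c, constraints=constraints, bounds=bounds,
           integrality=integrality)
chosen = np.flatnonzero(res.x[:n] > 0.5)
coverage = -res.fun / p  # fraction of grid points under a selected knoll
print("selected knolls:", chosen.tolist(), "covered fraction: %.3f" % coverage)
```

Setting `integrality` to all zeros yields the linear-programming relaxation, which is the convexified variant stated last above; the budget constraint makes at most $T$ knolls active in either case.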