## Abstract

In an integral imaging near-eye light field display using a microlens array, a point on a reconstructed depth plane (RDP) is reconstructed by sampled rays. Previous studies respectively suggested the accommodative response may shift from the RDP under two circumstances: (i) the RDP is away from the central depth plane (CDP) to introduce defocusing in sampled rays; (ii) the sampled ray number is too low. However, sampled rays’ defocusing and number may interact, and the interaction’s influence on the accommodative response has been little revealed. Therefore, this study adopts a proven imaging model providing retinal images to analyze the accommodative response. As a result, when the RDP and the CDP coincide, the accommodative response matches the RDP. When the RDP deviates from the CDP, defocusing is introduced in sampled rays, causing the accommodative response to shift from the RDP towards the CDP. For example, in a system with a CDP of 4 diopters (D) and 45 sampled rays, when the RDP is at 3, 2, 1, and 0 D, the accommodative response shifts to 3.25, 2.75, 2, and 1.75 D, respectively. With fewer rays, the accommodative response tends to further shift to the CDP. Eventually, with fewer than five rays, the eye accommodates to the CDP and loses the 3D display capacity. Moreover, under different RDPs, the ray number influences differently, and vice versa. An x-y polynomial equation containing three interactive terms is finally provided to reveal the interaction between RDP position and ray number. In comparison, in a pinhole-based system with no CDP, the accommodative response always matches the RDP when the sampled ray number is greater than five.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

For near-eye displays in virtual reality (VR) and augmented reality (AR) devices, high resolution, wide field of view (FoV), compact volume, and true-3D display free of the vergence-accommodation conflict (VAC) are vital requirements [1–3]. To fulfill these requirements, several near-eye display technologies have been proposed, including birdbath optics [4,5], waveguide systems [6–8], retinal scan [9,10], holographic display [11,12], integral imaging light field display (InIm-LFD) [13], and other recently proposed solutions based on novel components [14–16]. Among these technologies, the InIm-LFD is particularly promising because it intrinsically provides near-continuous monocular focus cues for true-3D display with simple hardware and compact volume [13]. Accordingly, many studies on near-eye InIm-LFDs have appeared in recent years [17–26].

The true-3D display feature of InIm-LFDs is, in essence, achieved by using multiple vectorial rays, also known as the 4D plenoptic function, to sample the continuous angular information of the light emanated from a real object [27–29], as shown in Fig. 1(a) and Fig. 1(b). Moreover, the vectorial rays used for the sampling are implemented by an image rendering component (usually a microdisplay) and a light modulation component (typically a microlens or pinhole array), as shown in Fig. 1(c) and Fig. 1(d). Here, the near-eye InIm-LFD focused in this paper refers to the system generating virtual images, as shown in Fig. 1(c). Another type of InIm-LFD that generates real images is shown in Fig. 1(d), which shares the same InIm principle with the near-eye type but is usually used for floating-3D or table-top systems with shorter depth of field (DoF) and larger viewing zone [20,21].

For a real object, a human eye’s accommodative response must match the real object to obtain the most focused retinal image, which constitutes the focus cue. When the object is discretely sampled, that is, mimicked by sampled rays, in either a near-eye or a real-image InIm-LFD, it is presumed that the accommodative response still matches the plane on which the sampled rays intersect, known as the reconstructed depth plane (RDP). This presumption is the foundation on which elemental images are derived, as shown in Fig. 1(c) and Fig. 1(d), where the pixels that can provide sampled rays to reconstruct a target point are turned on. There is another way to understand the focus cue: multiple elemental views of a 3D scene (i.e., views corresponding to elemental images) with slight parallaxes are simultaneously projected to an eyebox, and the parallaxes determine the RDP [17–19], as shown in Fig. 1(e) with a near-eye system as an example. According to this understanding, each elemental image is obtained by projecting its corresponding elemental view onto the image rendering component. The two ways of understanding the focus cue are equivalent and yield the same elemental images.

The accommodative response presumed to match the RDP is the critical mechanism that elicits the true-3D display feature of InIm-LFDs. However, the real-world imaging using continuous angular information and the integral imaging using sampled angular information are not exactly the same. Moreover, the sampled rays in a real microlens or pinhole-based system are not ideal rays but light bundles, as shown in Fig. 2(a) and Fig. 2(b), making a real system different from real-world imaging. Thus, it is necessary to investigate whether the accommodative response in an InIm-LFD coincides with the mimicked real object (i.e., the RDP). If the accommodative response shifts, a designer must compensate for the binocular vergence to match the shifted accommodative response.

Many studies have addressed the accommodative response [28–36], forming a distinct research line in the area of LFDs. In early studies using typical InIm-LFDs and optometric experiments [30–32], the VAC and related visual fatigue were suppressed for a majority of experimental subjects compared with conventional autostereoscopic displays, suggesting that the accommodative response roughly matched the RDP. Nevertheless, in a recent study on microlens-based near-eye InIm-LFDs [33], Chen et al. measured a non-negligible accommodative shift when the RDP did not coincide with the central depth plane (CDP) of the microlens array.

In addition to optometric measurements, several studies analyzed the accommodative response from a theoretical perspective by adopting imaging models providing retinal images. Huang et al. [28,29] utilized a Fresnel diffraction-based model. They demonstrated that when the sampled rays on the eye pupil were insufficient, the accommodative response in a real-image InIm-LFD shifted from the RDP by a few tenths of a diopter (D). This phenomenon in real-image InIm-LFDs was then supported by an experimental study by Chen et al. [34]. In addition to the finding that insufficient sampled rays cause accommodative shifts, we, in a previous study [35], considered that sampled rays are defocused when an RDP is not at a system’s CDP due to the optical nature of a microlens array. We adopted a Rayleigh-Sommerfeld diffraction-based imaging model and revealed that the accommodative response in a near-eye system always shifts from the RDP towards the CDP, sometimes by more than 1 D, because the defocusing, which is the lightest at the CDP, also contributes to the integral imaging. Experimental verification was also provided in [35]. The revelation of the defocusing-induced accommodative shift supported the experimental result in [33] because non-negligible accommodative shifts under similar RDP-CDP combinations were measured in [33]. Zhao et al. [36] further revealed that two conditions should be met for a shift-free accommodative response in a microlens-based near-eye system: (1) the RDP should not be too far from the CDP; (2) the sampled ray number on the eye pupil should be large enough. This study not only agreed with our previous study [35] that the accommodative response shifts more from the RDP with increasing defocusing but also implied that the RDP position and the sampled ray number might interactively influence the accommodative response.

Previous studies analyzed the accommodative response as a single-variable function of RDP position (i.e., defocusing) or ray number. They revealed significant accommodative shifts independently caused by defocused sampled rays or a low ray number. However, to our knowledge, few have discussed how the two factors interact, i.e., have not derived the accommodative response under different combinations of RDP position and ray number, not to mention quantitatively revealing the two factors’ interaction. While designing an InIm-LFD, the RDP position and the sampled ray number are two primary parameters to determine; thus, the accommodative response jointly influenced by them and their interaction must be highlighted to maintain the true-3D display feature. In particular, the accommodative shift issue tends to be more severe in near-eye systems because the DoF is much longer compared with real-image systems. Thus, it is incumbent upon us to focus on the accommodative response problem in this promising near-eye display technology.

In what follows, we adopt an imaging model proposed and proven in our previous studies [21,22,35] to analyze the accommodative response as a two-variable function. In addition to a microlens-based near-eye system, a pinhole-based system where the defocusing is absent will also be examined for comparison. In Sec. 2, the modeling method of InIm-LFDs will be introduced. In Secs. 3 and 4, analysis will be performed for a microlens-based system and a pinhole-based system, respectively, followed by discussions and conclusions in Secs. 5 and 6.

## 2. Modeling target near-eye InIm-LFDs

#### 2.1 Specifications of target systems

Target microlens and pinhole-based near-eye InIm-LFDs with typical specifications are configured before modeling them using the imaging model, as listed in Table 1. To concentrate on the influences of defocusing and ray number without being disturbed by oblique visual fields and the polychromatic vision, we make the target point always orthogonally face the eye and adopt a single wavelength of 550 nm, as the pilot study [35] did. Besides, several important considerations are discussed below.

- (i) The microdisplay is set to have a pixel size of 7.8 µm (3256 ppi), in compliance with currently mainstream microdisplays.
- (ii) The lenslet/pinhole pitch is the sampling interval, which is variable from 0.5 to 2 mm. Under the fixed eye pupil aperture (4 mm, a typical value), when the lenslet/pinhole pitches are 0.5 and 2 mm, the sampled ray numbers are 45 and 1, respectively. Adjusting the pitch can acquire in-between sampled ray numbers.
- (iii) The typical eye-relief of 24 mm cannot guarantee always eliminating pseudo images while adjusting the lenslet/pinhole pitch [37]. However, the pseudo images will not influence the accommodative response to the target image point.
- (iv) The CDP is variable by moving the microdisplay to change the distance d_{2} (see Fig. 3(a)) between the microdisplay and the lens array. The microdisplay is always inside the focal point of the lenslets to form virtual images. For example, the distances d_{2} are 0.78 and 0.81 mm for CDPs of 4 and 2 D, respectively.
- (v) The pinhole aperture is determined by optimally balancing between diffraction and aberration. According to [38], it equals (2λd)^{0.5}, i.e., 74 µm, where λ is the working wavelength of 550 nm, and d is the distance of the pinhole array to the microdisplay.
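The figures quoted in items (i) and (v) can be cross-checked with a few lines of Python. This is only a sanity-check sketch: the function names are ours, and the pinhole-to-microdisplay distance of 5 mm is an assumption, since the text states only the resulting 74-µm aperture.

```python
import math

def pixels_per_inch(pixel_size_um):
    """Convert a pixel size in micrometers to pixels per inch."""
    return 25.4e3 / pixel_size_um  # 1 inch = 25.4 mm = 25400 um

def pinhole_aperture_um(wavelength_um, distance_um):
    """Pinhole aperture (2 * lambda * d) ** 0.5, the expression quoted from [38]."""
    return math.sqrt(2.0 * wavelength_um * distance_um)

ppi = pixels_per_inch(7.8)
# d = 5 mm is an ASSUMED distance, chosen because it reproduces the quoted 74 um.
aperture = pinhole_aperture_um(0.55, 5000.0)
print(f"{ppi:.0f} ppi, {aperture:.0f} um pinhole")
```

Under that assumed distance, both quoted values (3256 ppi and 74 µm) are reproduced.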

Note also that the analysis in this study concerns the accommodative response to a single reconstructed point, a fundamental case of InIm-LFDs. The accommodative response may slightly vary with image content containing various spatial frequencies [28,29]. Thus, to extend this study to a specific scene, the following analysis should be repeated with the retinal image of a single point replaced by that of the scene.

#### 2.2 Modeling the target systems

Our imaging model of InIm-LFDs with high accuracy was proposed in [21,22] and then successfully adopted for accommodative response analysis in [35], a pilot study of this one. Exact details of the model can be found in [21], and the effectiveness and accuracy of the model were also experimentally verified in [21]. As follows, we will review the modeling method so that the model, a core tool of this study, can be understood clearly.

The model is constructed in Zemax OpticStudio and contains a microdisplay with emanating pixels, a microlens or pinhole array, and an Arizona eye model with adjustable accommodation [39], as Fig. 3(a) and Fig. 3(b) show. In Fig. 3(a), the microlens array is modeled as real plano-convex lenses made of PMMA with a fixed lens thickness d_{1} and a curvature radius R (see the enlarged view), for a fixed paraxial focal length of 5 mm. By doing so, high accuracy is guaranteed for imaging with non-negligible aberration. The distance d_{2} between the microdisplay and the microlens array is variable by moving the microdisplay to vary the CDP.

To calculate the wave propagation from emanating pixels to the eye model’s retina while incorporating all factors affecting the image formation, including diffraction, aberration, defocusing, and pixel size, the model uses the fundamental Rayleigh-Sommerfeld diffraction. The diffraction is sequentially calculated from a surface in the model to the next, and the mathematical formulation of the diffraction is given by Eq. (1) and Fig. 4. Zemax numerically performs the diffraction integral through its “*Huygens Integral*” function with the “*Force Spherical*” option on to account for non-planar wavefronts in the system.

$$E_{p}(\mathbf{r}) = \frac{1}{i\lambda}\iint_{\Sigma} E(\mathbf{r'})\,\frac{\exp\left(ik\left|\mathbf{r}-\mathbf{r'}\right|\right)}{\left|\mathbf{r}-\mathbf{r'}\right|}\cos\theta_{d}\,\mathrm{d}\Sigma \tag{1}$$

where E_{p}(**r**) and E(**r’**) are complex amplitudes on the observation surface T and the pupil surface ∑, respectively, determined by vectors **r** and **r’**; λ and k are the wavelength and the wave number. For a spherical wave originating from S, θ_{d} represents the angle between the vector **r** - **r’** and the normal of the wavelet on ∑. For example, while calculating from the planar surface of a lenslet to the convex surface, the convex surface is T, and the planar surface is ∑; E_{p}(**r**) and E(**r’**) are then the complex amplitudes on the convex surface and the planar surface, respectively.
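To make the surface-to-surface calculation concrete, the following is a minimal direct-sum sketch of a Rayleigh-Sommerfeld integral in plain Python. It is not the Zemax “*Huygens Integral*” implementation used in this study: it assumes a planar source surface with its normal along z (so the obliquity factor is simply z divided by the propagation distance), a uniform sampling grid, and a single observation point. The function name and the test aperture are ours.

```python
import cmath
import math

def rs_propagate(field, xs, ys, obs_point, wavelength):
    """Direct-sum approximation of the Rayleigh-Sommerfeld integral.

    field[j][i] is the complex amplitude E(r') sampled at (xs[i], ys[j], 0)
    on a planar surface; obs_point = (x, y, z) with z > 0 is one point r on
    the observation surface. All lengths share one unit (mm here).
    """
    k = 2.0 * math.pi / wavelength            # wave number
    dx = xs[1] - xs[0]
    dy = ys[1] - ys[0]
    x0, y0, z0 = obs_point
    total = 0j
    for j, y in enumerate(ys):
        for i, x in enumerate(xs):
            dist = math.sqrt((x0 - x) ** 2 + (y0 - y) ** 2 + z0 ** 2)
            cos_theta = z0 / dist             # obliquity factor for a planar surface
            total += field[j][i] * cmath.exp(1j * k * dist) / dist * cos_theta
    return total * dx * dy / (1j * wavelength)

# Uniformly illuminated 10-um square aperture, sampled at cell midpoints.
n, side, wavelength = 21, 0.01, 550e-6        # samples per axis, mm, mm
step = side / n
coords = [-side / 2 + (i + 0.5) * step for i in range(n)]
aperture_field = [[1.0 + 0j] * n for _ in range(n)]
e_axis = rs_propagate(aperture_field, coords, coords, (0.0, 0.0, 10.0), wavelength)
```

For such a small aperture observed far away, the on-axis amplitude should approach the aperture area divided by λz, which gives a quick check that the prefactor and the sampling are consistent. Curved surfaces such as the convex side of a lenslet would need the local wavelet normal instead of the fixed z axis.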

In addition to the diffraction integral calculated by Zemax, other non-optical calculations such as sampled ray generation and retinal image integral are performed by Matlab. Data are transmitted between Zemax and Matlab through ZOS-API.

Taking one case as an example (microlens-based system, lenslet pitch p = 0.5 mm, CDP = RDP = 4 D, and accommodation = 4 D), the modeling method contains the following steps.

- (i) Sampled rays are determined by tracing from the target point on the designated RDP to all lenslets or pinholes. Here, the lenslet is treated as a thick lens with two nodal points, and the pinhole is treated as an ideal point. After the tracing, rays that can reach the 4-mm pupil are selected according to the illustration in Fig. 1(c). In this example, there are 45 sampled rays. The intersections between the sampled rays and the microdisplay are determined to be emanating pixels’ center positions. Note that, by performing this step, the RDP is generated through content rendering, and this step is done in Matlab.
As Fig. 5(a) shows, an ideal sampled ray cannot always hit the center of a pixel on the microdisplay due to the finite pixel size, called the “sampling error,” causing ideal sampled rays to deviate from actual rays. In this manner, by respectively using ideal and actual sampled rays, the following calculations contain two groups: ideal and actual results. The sampling error is difficult to avoid in practice but will add some irregularity to the accommodative response; thus, the following analysis will mainly adopt ideal rays to analyze from a theoretical perspective. If a practical system is of interest, the subsequent analysis can be similarly performed based on actual rays. For example, results based on actual rays will be discussed and compared with ideal rays in Sec. 3.3.

- (iii) For each sampled ray, its retinal point spread function (PSF) is obtained through the diffraction integral. Next, the retinal PSF of each pixel’s sampled ray is convolved with the region enclosed by the boundary of the pixel’s retinal image to get the retinal image of each emanating pixel. Here, the retinal image’s boundary is obtained through raytracing in Zemax.
- (iv) Finally, the retinal image of the target point on the RDP is obtained by integrating retinal images of all emanating pixels with weightings determined by the Stiles-Crawford effect of the first kind [40]: W = 10^{-0.05r^{2}}, where r (in millimeters) is the distance between the point where a sampled ray hits the pupil and the pupil center. Figure 5(d) and Fig. 5(e) respectively show ideal and actual retinal images of the target point to be reconstructed. The actual one is slightly more blurred because of the sampling error.
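The weighted integration in the last step can be sketched as follows. This is an illustrative sketch only: the function names are ours, retinal images are represented as small nested lists of intensities, and the per-pixel images are assumed to be already computed and aligned on a common retinal grid.

```python
def stiles_crawford_weight(r_mm, rho=0.05):
    """Weight W = 10 ** (-rho * r ** 2) for a ray entering the pupil r_mm off center."""
    return 10.0 ** (-rho * r_mm ** 2)

def integrate_retinal_images(pixel_images, pupil_radii_mm):
    """Weighted sum of per-pixel retinal intensity images (2D lists of equal size)."""
    rows, cols = len(pixel_images[0]), len(pixel_images[0][0])
    total = [[0.0] * cols for _ in range(rows)]
    for image, r_mm in zip(pixel_images, pupil_radii_mm):
        weight = stiles_crawford_weight(r_mm)
        for a in range(rows):
            for b in range(cols):
                total[a][b] += weight * image[a][b]
    return total

# Two toy 1x2 "images": one ray through the pupil center, one at the 2-mm pupil edge.
combined = integrate_retinal_images([[[1.0, 0.0]], [[0.0, 1.0]]], [0.0, 2.0])
```

A ray through the pupil center keeps full weight, whereas a ray at the edge of the 4-mm pupil is weighted by 10^{-0.2}, i.e., about 0.63.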

#### 2.3 Determining the accommodative response from retinal images

The imaging model can provide retinal images produced by various accommodations while we do not know which one is the correct accommodative response. Thus, we need a quantitative criterion to find the most “focused” retinal image. To this purpose, early studies on multi-focal-plane 3D displays investigated the mechanism that drives an eye to accommodate and established three criteria, as below [41]. The criteria were experimentally verified and adopted for analyzing the accommodative response in previous studies [28,29,35,36]. Thus, in this study, we will directly choose from these reliable criteria.

- (i) The maximum retinal contrast; (ii) the maximum retinal contrast gradient; (iii) the minimum degradation of a retinal image to its real-world image, where the degradation is characterized by the Strehl ratio (i.e., the ratio of a retinal image’s peak value to its diffraction-limited form).

Our pilot study [35] adopted the third criterion because the imaging model directly operates in the spatial domain, enabling convenient calculation of the Strehl ratio. Therefore, we adopt this criterion again. For example, based on the above case (p = 0.5 mm and CDP = RDP = 4 D), retinal images produced by accommodations of 4 to 0 D are shown in Fig. 6(a). Because these retinal images have the same diffraction-limited form, i.e., the ideal conjugate image of the target point on the RDP, the maximum Strehl ratio can be conveniently found by obtaining the peak intensities of all retinal images and selecting the maximum one. Here, we call those peak values normalized to the maximum one the “relative Strehl ratio.” The accommodation whose relative Strehl ratio equals one is taken as the accommodative response.
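The selection rule described above amounts to a normalization followed by an argmax. A minimal sketch, with function names of our own choosing and placeholder peak intensities that are not results from this study:

```python
def relative_strehl(peaks_by_accommodation):
    """Normalize peak retinal intensities to the largest one."""
    max_peak = max(peaks_by_accommodation.values())
    return {acc: peak / max_peak for acc, peak in peaks_by_accommodation.items()}

def accommodative_response(peaks_by_accommodation):
    """Accommodation (in D) whose relative Strehl ratio equals one."""
    return max(peaks_by_accommodation, key=peaks_by_accommodation.get)

# Placeholder peak intensities keyed by accommodation in diopters (NOT paper data).
peaks = {4.0: 0.92, 3.0: 0.61, 2.0: 0.35, 1.0: 0.22, 0.0: 0.15}
ratios = relative_strehl(peaks)
response = accommodative_response(peaks)
```

In practice the dictionary keys would follow the accommodation step used in the analysis, and the peak values would come from the simulated retinal images.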

Note that, in Fig. 6(a) and similar figures where several retinal images are compared, each retinal image’s colormap is scaled to its own maximum value rather than to the maximum over all retinal images; otherwise, the blurred images would appear too dim. The difference in maximum value among the retinal images can be identified through the relative Strehl ratio, which is proportional to the maximum value of each retinal image.

In Fig. 6(a), the accommodative response is determined to be 4 D. The result is expected because the retinal footprint is the most converged at the RDP of 4 D, and the retinal image of each emanating pixel is also the most focused at the CDP of 4 D. When the RDP and the CDP are both set to 2 D, an accommodative response matching the RDP is also observed, as shown in Fig. 6(b).

## 3. Accommodative response analysis: microlens-based system

#### 3.1 Influence of defocusing

As discussed above, the accommodative response matches the RDP when the RDP equals the CDP. However, if the RDP is moved away from the CDP, which adds defocusing to sampled rays, it is unclear whether this rule still holds. Hence, we again adopt the CDPs of 4 and 2 D with the lenslet pitch p of 0.5 mm (i.e., 45 sampled rays) but move the RDP away.

First, under the CDP of 4 D, RDPs of 2 and 0 D are adopted by considering 3D images are usually reconstructed farther than the CDP as the DoF in the background is larger than that in the foreground. As a result, retinal footprints and retinal images produced by different accommodations are shown in Fig. 7(a) and Fig. 7(b). Here, the step for finding the accommodative response is 0.25 D because the monocular accommodative resolution is usually worse than this value [42,43]. In either case, the retinal footprint is still the most converged when the accommodation coincides with the RDP. The retinal footprint becomes more dispersed when the accommodation deviates more from the RDP because the retinal footprint is solely determined by the relationship between the RDP and the accommodation. However, the accommodative response with the largest relative Strehl ratio shifts from the presumed 2 and 0 D to 2.75 and 1.75 D, respectively. Similarly, under the CDP of 2 D and the RDP of 0 D, the accommodative response drifts from 0 to 0.5 D.

The above accommodative shift arises because, although sampled rays (i.e., the retinal footprint) are the most converged at the accommodation equal to the RDP, each emanating pixel’s retinal image is the most focused at the accommodation equal to the CDP [35]. As a result, the combined effect from the defocusing and the retinal footprint brings about an accommodative response falling between the RDP and the CDP. By adopting the typical cases (CDP = 4 D; RDP = 4, 2, and 0 D), the influence of the defocusing under a fixed sampled ray number is plotted as a simple curve, as Fig. 8 shows.
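The two competing effects can be illustrated with a purely geometric estimate. The sketch below is not the diffraction-based model of this study: it uses the small-angle rule that angular blur (in mrad) is approximately aperture (in mm) times defocus (in D), and it assumes that the footprint spread is governed by the 4-mm pupil while each light bundle is about as wide as the 0.5-mm lenslet pitch. Both function names and the bundle-width assumption are ours.

```python
def angular_blur_mrad(aperture_mm, defocus_d):
    """Small-angle geometric blur: aperture in mm times defocus in D gives mrad."""
    return aperture_mm * abs(defocus_d)

def footprint_and_pixel_blur(accommodation_d, rdp_d, cdp_d,
                             pupil_mm=4.0, bundle_mm=0.5):
    """Spread of the ray footprint vs. blur of each pixel image, both in mrad."""
    footprint = angular_blur_mrad(pupil_mm, accommodation_d - rdp_d)
    pixel_blur = angular_blur_mrad(bundle_mm, accommodation_d - cdp_d)
    return footprint, pixel_blur

# CDP = 4 D and RDP = 2 D: scan the accommodation between the two planes.
for acc in (2.0, 2.5, 3.0, 3.5, 4.0):
    print(acc, footprint_and_pixel_blur(acc, rdp_d=2.0, cdp_d=4.0))
```

At an accommodation of 2 D the footprint is fully converged but each pixel image is blurred, and at 4 D the opposite holds. The estimate ignores diffraction, aberration, and the Stiles-Crawford weighting, so it only motivates why the response falls between the two planes and does not predict where.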

According to Fig. 8, which shows the system with a CDP of 4 D, the maximum possible depth range of the InIm-LFD is reduced to 4 to 1.75 D due to the defocusing-induced accommodative shift. Note that, in addition to the accommodative shift, the defocusing is well known to reduce the resolution of images not on the CDP and thus bring about a limited DoF [28]. Therefore, the defocusing limits the DoF in two different manners, and the actual DoF is the smaller of the two resulting ranges. The DoF will be further discussed in Sec. 5.1.

#### 3.2 Interaction between defocusing and sampled ray number

In the above study, the accommodative response was regarded as a function of RDP position (i.e., defocusing level). Previous studies [28,29,34] revealed insufficient sampled rays would also cause an accommodative shift. Nevertheless, the interactive relationship between RDP position and ray number has been little discussed. Therefore, in this Section, the accommodative response under different combinations of RDP position and ray number will be derived, and the interaction between the two factors will be quantitatively discussed.

First, a defocusing-free case, i.e., CDP = RDP = 4 D, is considered. The lens pitch is set to 0.5, 0.6, 0.65, 0.75, 0.9, 1.5, and 2 mm to get near-uniformly distributed ray numbers of 45, 37, 29, 21, 13, 5, and 1, respectively. By obtaining retinal images produced by different accommodations, as the method used above, the accommodative response is always at the CDP of 4 D regardless of the ray number. In particular, when there is only one sampled ray, the system becomes a simple eyepiece where the RDP is absent. Hence, the accommodative response is directly at the microlens array’s native image plane, i.e., the CDP of 4 D.
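The listed pitch-to-ray-number pairs are consistent with a simple lattice count, sketched below. This is a simplified geometric assumption of ours, not the nodal-point ray tracing of Sec. 2.2: the rays are assumed to hit the pupil on a square lattice whose spacing equals the lenslet pitch, with one ray at the pupil center, and rays landing exactly on the pupil boundary are excluded.

```python
def count_sampled_rays(pitch_mm, pupil_diameter_mm=4.0):
    """Count square-lattice points (spacing = pitch) strictly inside the pupil."""
    radius_sq = (pupil_diameter_mm / 2.0) ** 2
    n_max = int(pupil_diameter_mm / (2.0 * pitch_mm)) + 1
    count = 0
    for i in range(-n_max, n_max + 1):
        for j in range(-n_max, n_max + 1):
            # Squared distance of lattice point (i, j) from the pupil center.
            if (i * i + j * j) * pitch_mm * pitch_mm < radius_sq:
                count += 1
    return count

pitches = [0.5, 0.6, 0.65, 0.75, 0.9, 1.5, 2.0]
ray_numbers = [count_sampled_rays(p) for p in pitches]
print(ray_numbers)
```

Under these assumptions the count reproduces the sequence 45, 37, 29, 21, 13, 5, and 1 quoted above, which also explains why only certain ray numbers are attainable by adjusting the pitch.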

Next, the RDP is moved to 2 D to introduce defocusing. In the above Section, 45 sampled rays correspond to a shifted accommodative response of 2.75 D. When the ray number is reduced, the accommodative response is observed to approach the CDP. For example, Fig. 9 shows detailed analyses for the ray numbers of 29 and 13, where the accommodative response shifts to 3 and 3.5 D, respectively. In Fig. 9, although the accommodative response is successfully determined through the maximum relative Strehl ratio, the difference across accommodations appears weaker than that in Fig. 7 with more rays. The reason is that when the ray number becomes smaller (i.e., the lenslet aperture becomes larger), the aberration, which is nearly constant with the accommodation, grows and contributes more to the reconstructed light field image. Consequently, the overall difference across accommodations appears weaker.

Figure 10(a) comprehensively shows the accommodative response as a function of the ray number. As seen, fewer rays make the accommodative response deviate from the RDP and approach the CDP more, namely a stronger accommodative shift. Similarly, two other cases, i.e., (i) CDP = 4 D and RDP = 0 D, and (ii) CDP = 2 D and RDP = 0 D, are revealed in Fig. 10(b) and Fig. 10(c). Similar variations are observed.

Next, the accommodative responses as a function of RDP position corresponding to different ray numbers are plotted in Fig. 11(a). Here, more data points are added through similar simulations to obtain an RDP step of one diopter. Also, the accommodative response is drawn versus RDP position and ray number in a 3D plot, as Fig. 11(b) shows.

The 3D plot can help observe the variation of accommodative response conveniently. Still, a numerical equation of accommodative response is needed to demonstrate the interaction between RDP position and ray number quantitatively. To this end, an x-y polynomial model is adopted (x: RDP position; y: ray number); however, its order is undetermined because the degrees of influence of RDP position and ray number are not given. To determine the order, Table 2 shows R-squared values corresponding to different orders for the polynomial fitting. As a result, the optimal order of x (RDP position) is two, and that of y (ray number) is three by finding the highest R-squared. Higher orders are not adopted because higher orders cause overfitting. In this manner, the accommodative response is explicitly given in Eq. (2) and plotted as a smooth surface in Fig. 11(c).

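$$A(x, y) = p_{00} + p_{10}x + p_{01}y + p_{20}x^{2} + p_{11}xy + p_{02}y^{2} + p_{21}x^{2}y + p_{12}xy^{2} + p_{03}y^{3} \tag{2}$$

where A(x, y) is the accommodative response, and the coefficients p_{00} to p_{03}, in the order of the terms in Eq. (2), are 4.26 m^{-1}, 4.86×10^{−2}, -1.45×10^{−1} m^{-1}, -2.99×10^{−2} m, 3.10×10^{−2}, 2.56×10^{−3} m^{-1}, 9.11×10^{−4} m, -4.58×10^{−4}, and -1.37×10^{−5} m^{-1}, respectively. We set the dimensions for the coefficients to make the dimensions add up, while the equation is mainly to demonstrate the degrees of influence of the two quantities as well as their interaction.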
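The fitted surface can be evaluated directly from the nine coefficients. The sketch below assumes that they are listed in the term order p_{00}, p_{10}, p_{01}, p_{20}, p_{11}, p_{02}, p_{21}, p_{12}, p_{03}, with p_{ij} multiplying x^{i}y^{j} (second order in x, third order in y); this ordering is our reading of the coefficient list and their dimensions. The function name is ours, and units are omitted.

```python
# Coefficients p_ij of Eq. (2), keyed by (i, j); ordering is an ASSUMPTION (see text).
P = {
    (0, 0): 4.26,     (1, 0): 4.86e-2,  (0, 1): -1.45e-1,
    (2, 0): -2.99e-2, (1, 1): 3.10e-2,  (0, 2): 2.56e-3,
    (2, 1): 9.11e-4,  (1, 2): -4.58e-4, (0, 3): -1.37e-5,
}

def accommodative_response_fit(rdp_d, ray_number):
    """Sum of p_ij * x**i * y**j with x = RDP position (D) and y = ray number."""
    return sum(p * rdp_d ** i * ray_number ** j for (i, j), p in P.items())

# Slice of the fitted surface at 45 sampled rays (system with a CDP of 4 D).
for rdp in (4, 3, 2, 1, 0):
    print(rdp, round(accommodative_response_fit(rdp, 45), 2))
```

With 45 rays, this evaluation gives roughly 3.9, 3.3, 2.7, 2.2, and 1.7 D for RDPs of 4, 3, 2, 1, and 0 D, close to the simulated responses of 4, 3.25, 2.75, 2, and 1.75 D, which supports the assumed term ordering. The fit should not be extrapolated beyond the simulated ranges of RDP position and ray number.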

In Eq. (2), by setting y as a constant, the accommodative response as a function of x (RDP position) is a quadratic curve, matching our previous study [35], where the accommodative response as a function of RDP position appeared to follow a quadratic equation. By setting x as a constant, the accommodative response as a function of y (ray number) is a cubic curve, also roughly matching the finding in [28] and [29], where the ray number’s influence was investigated. Thus, Eq. (2) generally unifies the previous studies [28,29,35].

More importantly, the ray number’s influence significantly depends on the RDP position. For example, in Fig. 10(a) and Fig. 10(b) with RDPs of 2 and 0 D, the accommodative responses as a function of ray number appear to be two different curves but not a curve simply translated from the other one, as depicted in Fig. 12. Correspondingly, Eq. (2) must have interactive terms (*xy*, *x*^{2}*y*, and *xy*^{2}) for *y*, and their coefficients are numerically comparable with independent terms with the same orders (*x*^{2}, *y*^{2}, and *y*^{3}). By doing so, the physical fact that an RDP farther from the CDP causes a more severe accommodative shift is effectively reflected.

Similarly, as the curves in Fig. 11(a) show, the RDP position’s influence also significantly depends on the ray number. Accordingly, the interactive terms (*xy*, *x*^{2}*y*, and *xy*^{2}) for *x* quantitatively reflect the physical fact that fewer rays cause a stronger accommodative shift towards the CDP through different quadratic curves.
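One way to read this interaction is through the partial derivatives of the fitted surface. The short derivation below assumes that Eq. (2) is the nine-term polynomial in which each coefficient p_{ij} multiplies x^{i}y^{j}; the symbol A for the accommodative response is our notation.

```latex
\begin{aligned}
\frac{\partial A}{\partial y} &= \left(p_{01} + p_{11}x + p_{21}x^{2}\right)
  + 2\left(p_{02} + p_{12}x\right)y + 3p_{03}y^{2},\\
\frac{\partial A}{\partial x} &= \left(p_{10} + p_{11}y + p_{12}y^{2}\right)
  + 2\left(p_{20} + p_{21}y\right)x .
\end{aligned}
```

If the interactive coefficients p_{11}, p_{21}, and p_{12} were zero, the slope with respect to ray number would not depend on the RDP position, and the curves for different RDPs would be mere translations of one another. Their non-zero values are what allow the slope along one variable to change with the other.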

In essence, the capacity of a microlens-based near-eye LFD to render a 3D image comes from the dense ray sampling of a real 3D object. Accordingly, fewer sampled rays mean a more significant difference from the real 3D object. The defocusing, which turns an ideal sampled ray into a defocused light bundle, further intensifies the difference. In the presence of defocusing, when the ray number is no greater than five, the near-eye LFD completely loses the 3D display capacity and always induces an accommodative response right at the CDP. That is, the fewer the rays, the more likely the eye is to accommodate to the CDP.

#### 3.3 Incorporation of sampling error

The microdisplay’s pixel positions are discrete in a practical system, so sampled rays cannot be generated from any microdisplay position but only from the pixel centers. The error between ideal sampled rays and those from the pixel centers is called the sampling error, as illustrated in Fig. 5(a). The sampling error changes ideal sampled rays’ directions and consequently changes the retinal image. Thus, a practical system’s accommodative response may be different from a system using ideal rays. To demonstrate this, the three systems previously investigated in Fig. 10, i.e., (i) CDP = 4 D and RDP = 2 D, (ii) CDP = 4 D and RDP = 0 D, and (iii) CDP = 2 D and RDP = 0 D, are re-analyzed. All specifications, including the 7.8-µm microdisplay pixel, the microlens array, the eye model, and their placements, are unchanged except that ideal sampled rays are replaced by rays originating from actual pixel centers while performing the raytracing and the retinal PSF calculation.
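The magnitude of the sampling error can be estimated with a short sketch. It assumes that pixel centers lie at integer multiples of the pixel size along one axis and that the ray deviation is approximately the lateral error divided by a 5-mm distance to the lenslet, a value we take from the 5-mm paraxial focal length as a rough stand-in. The function names are ours.

```python
import math

def snap_to_pixel_center(position_um, pixel_um=7.8):
    """Nearest pixel center, assuming centers at integer multiples of the pixel size."""
    index = math.floor(position_um / pixel_um + 0.5)
    return index * pixel_um

def sampling_error(position_um, pixel_um=7.8, lens_distance_um=5000.0):
    """Lateral error (um) and the resulting small-angle ray deviation (mrad)."""
    error_um = snap_to_pixel_center(position_um, pixel_um) - position_um
    return error_um, 1e3 * error_um / lens_distance_um

# An ideal ray hitting the microdisplay 10 um from the origin pixel center.
error_um, deviation_mrad = sampling_error(10.0)
print(error_um, deviation_mrad)
```

The lateral error never exceeds half a pixel, i.e., 3.9 µm for the 7.8-µm pixel (about 0.78 mrad under the assumed distance), and this bound doubles when the pixel size is doubled.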

As a result, the yellow curves in Fig. 13 indicate that the accommodative response roughly matches the result based on ideal rays, while irregular differences of around 0.25 D can be found. Because the accommodative resolution is usually worse than 0.25 D [42,43], the sampling error does not meaningfully affect the accommodative response. However, if the pixel size is doubled (i.e., halved panel resolution) to introduce larger sampling errors, the cyan curves in Fig. 13 demonstrate stronger deviations from ideal rays. Some of the deviations are larger than 0.5 D; in addition, as Fig. 13(b) shows, the accommodative shift becomes weaker when the ray number is reduced from 45 to 21, which even changes the way the defocusing and the ray number interact under certain circumstances. Therefore, a practical system should be specifically analyzed regarding the sampling error, especially when the panel resolution is not very high. Besides, we discussed how to suppress the sampling error by recombining subpixels from different elemental images in our previous work [22].

## 4. Accommodative response analysis: pinhole-based system

A pinhole array can also implement an InIm-LFD by replacing the microlens array [38]. Unlike a lenslet, a pinhole has no CDP, so its nature eliminates the influence of defocusing; in other words, the light bundles corresponding to sampled rays are free of defocusing regardless of the RDP position.

The accommodative response is analyzed by setting the RDP at 4 to 0 D with a series of sampled ray numbers. For example, Fig. 14 shows retinal images produced by different accommodations under the RDP of 4 D and ray numbers of 45, 21, and 5, and Fig. 15 comprehensively shows the accommodative response as a function of ray number. As seen, when the ray number is beyond approximately five, the accommodative response is always at the RDP. Note that no analysis is performed for the ray number of one because it makes little sense in a pinhole-based system. In contrast, one ray was considered in the above analysis for microlens-based systems by regarding the one-ray case as a simple eyepiece.

When the ray number is five, the accommodative response slightly shifts under some of the RDPs. For example, in Fig. 14(c) with the RDP of 4 D, a subtle difference between the accommodations of 4 and 3.75 D drifts the accommodative response to 3.75 D. The slight accommodative shift under the low ray number agrees with previous studies [27,28,34,36]. These studies reported that when the ray number was below approximately 2 by 2, an accommodative shift of a few tenths of a diopter might occur. It should be noted that the geometric layout of sampled rays may also affect the accommodative response under a low ray number, as discussed in [36] with vertically and horizontally distributed rays analyzed. Thus, the accommodative response under very few sampled rays, which are usual in real-image InIm-LFDs, deserves a separate careful study. In contrast, this study merely provides a preliminary result with a specific cruciform ray layout (see Fig. 14(c)).

## 5. Discussions

#### 5.1 Accommodative shift vs. DoF

As discussed before, the accommodative shift may reduce the depth range a near-eye LFD can render. For example, with a CDP of 4 D and 45 sampled rays (see Fig. 8), the maximum possible depth range is 4 to 1.75 D, whereas the intended range is 4 to 0 D. Nevertheless, the actual DoF should be further determined by considering a specific criterion of resolution, which is strongly associated with the resolution decay also caused by sampled rays’ defocusing. Thus, this study demonstrates that the DoF of such an LFD may be limited by both the resolution decay [44] and the accommodative shift, particularly when the sampled ray number is low. The limitation of DoF also supports the value of multi-CDP microlens-based LFDs [45–47] in not only maintaining the resolution around the multiple CDPs but also suppressing the defocusing-induced accommodative shift.
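As a small worked example of this range reduction, the sketch below takes the RDP positions and accommodative responses reported in this paper for a CDP of 4 D with 45 rays and computes the intended and reachable dioptric ranges, together with the corresponding viewing distances. The helper names are hypothetical.

```python
# Values reported in this paper for CDP = 4 D and 45 sampled rays.
RDP_D = [4.0, 3.0, 2.0, 1.0, 0.0]
RESPONSE_45_RAYS_D = [4.0, 3.25, 2.75, 2.0, 1.75]


def dioptric_range(values_d):
    """Span between the nearest and farthest plane, in diopters."""
    return max(values_d) - min(values_d)


def diopters_to_meters(power_d):
    """Convert a dioptric distance to meters; 0 D is optical infinity."""
    return float("inf") if power_d == 0 else 1.0 / power_d


intended = dioptric_range(RDP_D)                 # 4.0 D
reachable = dioptric_range(RESPONSE_45_RAYS_D)   # 2.25 D

print("intended range (D):", intended)
print("reachable range (D):", reachable)
print("farthest intended distance (m):", diopters_to_meters(min(RDP_D)))
print("farthest reachable distance (m):",
      diopters_to_meters(min(RESPONSE_45_RAYS_D)))
```

In this example, the farthest plane the eye actually accommodates to moves from optical infinity to roughly 0.57 m.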

In a pinhole-based near-eye LFD, the accommodative shift does not occur when the sampled rays are more than approximately five, suggesting the accommodative shift problem can be ignored for most pinhole-based near-eye systems, although the pinhole-based system has other disadvantages such as low efficiency and low resolution.

#### 5.2 Considerations in practical use

The analysis in this study is for microlens-based and pinhole-based near-eye LFDs with a few typical CDP-RDP combinations. In practice, if other CDP-RDP combinations are adopted, we suggest similarly analyzing the accommodative response using the method in this study. In some practical designs, an aspheric or curved microlens array is used instead of an ordinary microlens array to reduce geometric aberration [48,49]; nevertheless, this cannot help eliminate the accommodative shift because the nature of CDP and RDP remains. The modified system needs to be precisely modeled and analyzed based on retinal images.

Another practical consideration is chromatic imaging, whereas this study adopts a single wavelength of 550 nm to exclude the influence of polychromatic vision. In fact, the imaging model can be extended to chromatic LFDs. First, a pixel in the model can be constructed as a triad pixel, and wavelengths of interest can be separately calculated and integrated to acquire a chromatic retinal image. For example, by re-adopting the case in Fig. 5 (p = 0.5 mm and CDP = RDP = 4 D) but splitting a pixel into three stripe-arranged subpixels, Fig. 16(a) shows the chromatic retinal image (see [22]). Next, the chromatic signal is converted to a luminance signal with the color matching function to analyze the accommodative response, as Fig. 16(b) shows. However, when subpixels are considered, it is unclear how the potential color separation caused by the low resolution [50] affects the accommodative response. The accommodative response analysis for chromatic LFDs needs more work.
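The conversion from per-channel retinal images to a luminance signal can be sketched as a weighted sum over channels. The Python sketch below is only illustrative: the function name `to_luminance` is hypothetical, and the weights are placeholders (the Rec. 709 relative-luminance coefficients for linear RGB, which presume Rec. 709 primaries). A real panel would need weights derived from its own subpixel spectra and the color matching function.

```python
def to_luminance(channel_images, weights):
    """Collapse per-channel retinal images into one luminance image.

    channel_images: list of equally sized 2D lists, one per subpixel color.
    weights: one luminance weight per channel (placeholders in this sketch).
    """
    if len(channel_images) != len(weights):
        raise ValueError("one weight per channel is required")
    rows, cols = len(channel_images[0]), len(channel_images[0][0])
    return [
        [sum(w * img[r][c] for w, img in zip(weights, channel_images))
         for c in range(cols)]
        for r in range(rows)
    ]


# Placeholder weights: Rec. 709 relative luminance for linear R, G, B.
REC709_WEIGHTS = (0.2126, 0.7152, 0.0722)

# Toy 2x2 retinal images, each lighting a different location.
red = [[1.0, 0.0], [0.0, 0.0]]
green = [[0.0, 1.0], [0.0, 0.0]]
blue = [[0.0, 0.0], [1.0, 0.0]]

luminance = to_luminance([red, green, blue], REC709_WEIGHTS)
print(luminance)
```

The toy images show why color separation may matter: spatially separated subpixel images of equal intensity end up with very different luminance contributions.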

#### 5.3 Comparison with real-image InIm-LFDs

This study is aimed at near-eye LFDs generating virtual images, whereas LFDs generating real images are also widely used in scenarios such as table-top 3D displays [51]. Nevertheless, the DoF in real-image systems is much shorter (usually a few tenths of a diopter) due to rapid resolution decay with respect to the CDP, so the accommodative shift issue is much less severe and receives less attention in this study. To simply demonstrate the DoFs of virtual-image and real-image systems, as Fig. 17(a) and Fig. 17(b) show, by assuming the distances of the CDP to the microlens array and the RDP to be L and ΔL, the visual angle subtending a reconstructed voxel on the RDP is ideally given by Eqs. (3) and (4) for the virtual-image and real-image systems, respectively. Because ΔL contributes positively in both the denominator and the numerator in Eq. (3) but contributes oppositely in the denominator and the numerator in Eq. (4), the real-image system’s resolution (i.e., roughly the reciprocal of the visual angle) drops much more quickly with increasing ΔL than that of the virtual-image system.

where $\theta_{virtual}$ and $\theta_{real}$ are the visual angles subtending a reconstructed voxel on the RDP in the virtual-image and real-image systems, respectively, and p is the lenslet pitch.
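The qualitative argument about ΔL can be illustrated numerically. The forms below are not Eqs. (3) and (4); they are assumed stand-ins that only encode the described sign behavior, namely ΔL appearing with the same sign in the numerator and denominator (virtual image) versus opposite signs (real image). The function names and the value of L are hypothetical.

```python
def virtual_factor(L, dL):
    """Assumed stand-in: dL enters numerator and denominator positively."""
    return dL / (L + dL)


def real_factor(L, dL):
    """Assumed stand-in: dL raises the numerator but lowers the denominator."""
    if dL >= L:
        raise ValueError("this stand-in is only meaningful for dL < L")
    return dL / (L - dL)


L_MM = 100.0  # hypothetical CDP distance, for illustration only

for dL in (10.0, 20.0, 40.0):
    v = virtual_factor(L_MM, dL)
    r = real_factor(L_MM, dL)
    print(f"dL = {dL:5.1f} mm  virtual = {v:.3f}  real = {r:.3f}  "
          f"ratio = {r / v:.2f}")
```

Under these assumed forms, the real-image factor exceeds the virtual-image factor for every ΔL, and the gap widens as ΔL grows, which matches the stated faster resolution decay in the real-image case.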

## 6. Conclusions

The InIm-LFD enables true-3D display by sampling a real 3D object’s continuous angular information with a few light bundles. The discreteness and defocusing of the light bundles may cause the accommodative response to deviate from the presumed RDP. By adopting a typical microlens-based near-eye LFD with a proven imaging model, this study analyzed the accommodative response jointly influenced by RDP position (i.e., rays’ defocusing) and ray number, and in particular their interaction. As a result, the accommodative response shifted from the RDP towards the CDP to a certain extent and shifted further towards the CDP with fewer rays. For example, when the CDP was at 4 D with 45 rays, under RDPs of 4, 3, 2, 1, and 0 D, an eye accommodated to 4, 3.25, 2.75, 2, and 1.75 D, respectively. When the ray number decreased to 13, the accommodative response further shifted to 4, 3.75, 3.5, 3, and 2.5 D, respectively. When there were fewer than five rays, all accommodative responses were at the CDP. A polynomial model containing three interactive terms was finally provided. The influence of the accommodative shift on DoF reduction was also discussed. In comparison, in a pinhole-based system without the CDP, the accommodative response always matched the RDP when there were more than five rays. When the ray number was five or lower, very slight irregular accommodative shifts were found.
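One way to read these numbers is to normalize each shift by the CDP-RDP separation, so that 0 means the eye accommodates to the RDP and 1 means it accommodates to the CDP. The sketch below applies this normalization to the values reported above; the normalization itself and the name `shift_fraction` are illustrative choices, not a metric defined in this study.

```python
CDP_D = 4.0
# RDP = 4 D is omitted because it coincides with the CDP (no shift defined).
RDP_D = (3.0, 2.0, 1.0, 0.0)
# Accommodative responses reported in this paper, keyed by ray number.
RESPONSE_D = {
    45: (3.25, 2.75, 2.0, 1.75),
    13: (3.75, 3.5, 3.0, 2.5),
}


def shift_fraction(rdp_d, response_d, cdp_d=CDP_D):
    """Fraction of the RDP-to-CDP gap covered by the accommodative shift."""
    return (response_d - rdp_d) / (cdp_d - rdp_d)


fractions = {
    rays: [shift_fraction(rdp, resp) for rdp, resp in zip(RDP_D, responses)]
    for rays, responses in RESPONSE_D.items()
}

for rays, values in fractions.items():
    print(rays, [round(v, 3) for v in values])
```

For every RDP, the 13-ray fraction is larger than the 45-ray fraction, and the size of that gap varies with the RDP, which is consistent with the interaction described above.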

In light field displays not based on integral imaging, e.g., the Tensor Display using stacked liquid crystal displays [52], the accommodative response may also shift from the RDP because the angular information is also sampled by discrete light bundles. Thus, the accommodative response also needs to be quantitatively analyzed for those light field displays.

## Funding

Special Project for Research and Development in Key areas of Guangdong Province (2019B010934001); Fundamental Research Funds for the Central Universities (19lgzd12); Science and Technology Planning Project of Guangdong Province (2020B1212060030).

## Acknowledgement

We thank Dr. Jui-Yi Wu, Dr. Ping-Yen Chou, and Mr. Cheng-Ting Huang for their fruitful discussion and considerable contribution to pilot studies.

## Disclosures

The authors declare that there are no conflicts of interest related to this article.

## References

**1.** G. Tan, Y.-H. Lee, T. Zhan, J. Yang, S. Liu, D. Zhao, and S.-T. Wu, “Foveated imaging for near-eye displays,” Opt. Express **26**(19), 25076–25085 (2018).

**2.** T. Zhan, J. Xiong, G. Tan, Y.-H. Lee, J. Yang, S. Liu, and S.-T. Wu, “Improving near-eye display resolution by polarization multiplexing,” Opt. Express **27**(11), 15327–15334 (2019).

**3.** T. Zhan, Y.-H. Lee, and S.-T. Wu, “High-resolution additive light field near-eye display by switchable Pancharatnam–Berry phase lenses,” Opt. Express **26**(4), 4863–4872 (2018).

**4.** H. Hua and B. Javidi, “A 3D integral imaging optical see-through head-mounted display,” Opt. Express **22**(11), 13484–13491 (2014).

**5.** H. Huang and H. Hua, “High-performance integral-imaging-based light field augmented reality display using freeform optics,” Opt. Express **26**(13), 17578–17590 (2018).

**6.** D. Cheng, Y. Wang, C. Xu, W. Song, and G. Jin, “Design of an ultra-thin near-eye display with geometrical waveguide and freeform optics,” Opt. Express **22**(17), 20705–20719 (2014).

**7.** Y. Weng, Y. Zhang, J. Cui, A. Liu, Z. Shen, X. Li, and B. Wang, “Liquid-crystal-based polarization volume grating applied for full-color waveguide displays,” Opt. Lett. **43**(23), 5773–5776 (2018).

**8.** C. Yoo, K. Bang, C. Jang, D. Kim, C.-K. Lee, G. Sung, H.-S. Lee, and B. Lee, “Dual-focal waveguide see-through near-eye display with polarization-dependent lenses,” Opt. Lett. **44**(8), 1920–1923 (2019).

**9.** C. Jang, K. Bang, S. Moon, J. Kim, S. Lee, and B. Lee, “Retinal 3D: augmented reality near-eye display via pupil-tracked light field projection on retina,” ACM Trans. Graph. **36**(6), 1–13 (2017).

**10.** J. S. Lee, Y. K. Kim, M. Y. Lee, and Y. H. Won, “Enhanced see-through near-eye display using time-division multiplexing of a Maxwellian-view and holographic display,” Opt. Express **27**(2), 689–701 (2019).

**11.** C. Jang, K. Bang, G. Li, and B. Lee, “Holographic near-eye display with expanded eye-box,” ACM Trans. Graph. **37**(6), 1–14 (2019).

**12.** L. Shi, F.-C. Huang, W. Lopes, W. Matusik, and D. Luebke, “Near-eye light field holographic rendering with spherical waves for wide field of view interactive 3D computer graphics,” ACM Trans. Graph. **36**(6), 1–17 (2017).

**13.** M. Martínez-Corral and B. Javidi, “Fundamentals of 3D imaging and displays: A tutorial on integral imaging, light-field, and plenoptic systems,” Adv. Opt. Photonics **10**(3), 512–566 (2018).

**14.** Z.-B. Fan, H.-Y. Qiu, H.-L. Zhang, X.-N. Pang, L.-D. Zhou, L. Liu, H. Ren, Q.-H. Wang, and J.-W. Dong, “A broadband achromatic metalens array for integral imaging in the visible,” Light: Sci. Appl. **8**(1), 67 (2019).

**15.** G.-Y. Lee, J.-Y. Hong, S. Hwang, S. Moon, H. Kang, S. Jeon, H. Kim, J.-H. Jeong, and B. Lee, “Metasurface eyepiece for augmented reality,” Nat. Commun. **9**(1), 4562 (2018).

**16.** D. Dunn, C. Tippets, K. Torell, P. Kellnhofer, K. Akşit, P. Didyk, K. Myszkowski, D. Luebke, and H. Fuchs, “Wide field of view varifocal near-eye display using see-through deformable membrane mirrors,” IEEE Trans. Visual. Comput. Graphics **23**(4), 1322–1331 (2017).

**17.** D. Lanman and D. Luebke, “Near-eye light field displays,” ACM Trans. Graph. **32**(6), 1–10 (2013).

**18.** F.-C. Huang, K. Chen, and G. Wetzstein, “The light field stereoscope: immersive computer graphics via factored near-eye light field displays with focus cues,” ACM Trans. Graph. **34**(4), 1–12 (2015).

**19.** C. Yao, D. Cheng, T. Yang, and Y. Wang, “Design of an optical see-through light-field near-eye display using a discrete lenslet array,” Opt. Express **26**(14), 18292–18301 (2018).

**20.** P.-Y. Chou, J.-Y. Wu, S.-H. Huang, C.-P. Wang, Z. Qin, C.-T. Huang, P.-Y. Hsieh, H.-H. Lee, T.-H. Lin, and Y.-P. Huang, “Hybrid light field head-mounted display using time-multiplexed liquid crystal lens array for resolution enhancement,” Opt. Express **27**(2), 1164–1177 (2019).

**21.** Z. Qin, P.-Y. Chou, J.-Y. Wu, Y.-T. Chen, C.-T. Huang, N. Balram, and Y.-P. Huang, “Image formation modeling and analysis of near-eye light field displays,” J. Soc. Inf. Disp. **27**(4), 238–250 (2019).

**22.** Z. Qin, P.-Y. Chou, J.-Y. Wu, C.-T. Huang, and Y.-P. Huang, “Resolution-enhanced light field displays by recombining subpixels across elemental images,” Opt. Lett. **44**(10), 2438–2441 (2019).

**23.** Y. Xing, Q.-H. Wang, L. Luo, H. Ren, and H. Deng, “High-performance dual-view 3-D display system based on integral imaging,” IEEE Photonics J. **11**(1), 1–12 (2019).

**24.** W. Song, Q. Cheng, P. Surman, Y. Liu, Y. Zheng, Z. Lin, and Y. Wang, “Design of a light-field near-eye display using random pinholes,” Opt. Express **27**(17), 23763–23774 (2019).

**25.** W. Zhang, X. Sang, X. Gao, X. Yu, B. Yan, and C. Yu, “Wavefront aberration correction for integral imaging with the pre-filtering function array,” Opt. Express **26**(21), 27064–27075 (2018).

**26.** M. Liu, C. Lu, H. Li, and X. Liu, “Bifocal computational near eye light field displays and structure parameters determination scheme for bifocal computational display,” Opt. Express **26**(4), 4060–4074 (2018).

**27.** H. Hua, “Enabling focus cues in head-mounted displays,” Proc. IEEE **105**(5), 805–824 (2017).

**28.** H. Huang and H. Hua, “Systematic characterization and optimization of 3D light field displays,” Opt. Express **25**(16), 18508–18525 (2017).

**29.** H. Huang and H. Hua, “Effects of ray position sampling on the visual responses of 3D light field displays,” Opt. Express **27**(7), 9343–9360 (2019).

**30.** H. Nagatani and Y. Hirayama, “Evaluation of the influence on the human body of the autostereoscopic display based on the integral imaging method,” Proc. SPIE **6803**, 68030E (2008).

**31.** Y. Kim, J. Kim, K. Hong, H. K. Yang, J.-H. Jung, H. Choi, S.-W. Min, J.-M. Seo, J.-M. Hwang, and B. Lee, “Accommodative response of integral imaging in near distance,” J. Disp. Technol. **8**(2), 70–78 (2012).

**32.** H. Hiura, K. Komine, J. Arai, and T. Mishina, “Measurement of static convergence and accommodative responses to images of integral photography and binocular stereoscopy,” Opt. Express **25**(4), 3454–3468 (2017).

**33.** C. Chen, H. Deng, Q.-H. Wang, and Y.-T. Song, “Measurement and analysis on the accommodative responses to real-mode, virtual-mode, and focused-mode integral imaging display,” J. Soc. Inf. Disp. **27**(7), 427–433 (2019).

**34.** C. Chen, H. Deng, F.-Y. Zhong, Q.-L. Ji, and L. Qiang, “Effect of viewpoints on the accommodative response in integral imaging 3D display,” IEEE Photonics J. **12**(3), 1–14 (2020).

**35.** Z. Qin, J.-Y. Wu, P.-Y. Chou, Y.-T. Chen, C.-T. Huang, N. Balram, and Y.-P. Huang, “Revelation and addressing of accommodative shifts in microlens array-based 3D near-eye light field displays,” Opt. Lett. **45**(1), 228–231 (2020).

**36.** J. Zhao, J. Xia, Q. Ma, and J. Wu, “Spatial loss factor for the analysis of accommodation depth cue on near-eye light field displays,” Opt. Express **27**(24), 34582–34592 (2019).

**37.** H.-L. Zhang, H. Deng, H. Ren, X. Yang, Y. Xing, D.-H. Li, and Q.-H. Wang, “Method to eliminate pseudoscopic issue in an integral imaging 3D display by using a transmissive mirror device and light filter,” Opt. Lett. **45**(2), 351–354 (2020).

**38.** K. Akşit, J. Kautz, and D. Luebke, “Slim near-eye display using pinhole aperture arrays,” Appl. Opt. **54**(11), 3422–3427 (2015).

**39.** J. Schwiegerling, *Field guide to visual and ophthalmic optics* (SPIE, 2004).

**40.** G. Westheimer, “Directional sensitivity of the retina: 75 years of Stiles-Crawford effect,” Proc. R. Soc. London, Ser. B **275**(1653), 2777–2786 (2008).

**41.** S. Ravikumar, K. Akeley, and M. S. Banks, “Creating effective focus cues in multi-plane 3D displays,” Opt. Express **19**(21), 20940–20952 (2011).

**42.** W. N. Charman, “Visual Optics,” in *Contact Lens Practice (Third Edition)*, N. Efron, ed. (Elsevier, 2018).

**43.** A. Aghasi, B. Heshmat, L. Wei, M. Tian, and S. A. Cholewiak, “Optimal allocation of quantized human eye depth perception for light field display design,” arXiv preprint arXiv:2010.06382 (2020).

**44.** C.-G. Luo, X. Xiao, M. Martínez-Corral, C.-W. Chen, B. Javidi, and Q.-H. Wang, “Analysis of the depth of field of integral imaging displays based on wave optics,” Opt. Express **21**(25), 31263–31273 (2013).

**45.** X. Shen, Y.-J. Wang, H.-S. Chen, X. Xiao, Y.-H. Lin, and B. Javidi, “Extended depth-of-focus 3D micro integral imaging display using a bifocal liquid crystal lens,” Opt. Lett. **40**(4), 538–541 (2015).

**46.** J. Zhao, J. Xia, Q. Ma, J. Wu, B. Du, and H. Zhang, “Hybrid computational near-eye light field display,” IEEE Photonics J. **11**(1), 1–10 (2019).

**47.** X. Shen and B. Javidi, “Large depth of focus dynamic micro integral imaging for optical see-through augmented reality display using a focus-tunable lens,” Appl. Opt. **57**(7), B184–B189 (2018).

**48.** J.-Y. Wu, Y.-C. Cheng, Y.-P. Huang, H.-H. Lo, C.-C. Chang, and F.-M. Chuang, “P-94: Free-form micro-optical design for enhancing image quality (MTF) at large FOV in light field near eye display,” SID Symp. Dig. Tech. Papers **49**(1), 1534–1537 (2018).

**49.** Z. Zhao, J. Liu, Z. Zhang, and L. Xu, “Bionic-compound-eye structure for realizing a compact integral imaging 3D display in a cell phone with enhanced performance,” Opt. Lett. **45**(6), 1491–1494 (2020).

**50.** C. Jang, J. Kim, J. Yeom, and B. Lee, “Analysis of color separation reduction through the gap control method in integral imaging,” J. Inform. Display **15**(2), 81–89 (2014).

**51.** J.-H. Park, S.-W. Min, S. Jung, and B. Lee, “Analysis of viewing parameters for two display methods based on integral photography,” Appl. Opt. **40**(29), 5217–5232 (2001).

**52.** G. Wetzstein, D. Lanman, M. Hirsch, and R. Raskar, “Tensor Displays: Compressive light field synthesis using multilayer displays with directional backlighting,” ACM Trans. Graph. **31**(4), 1–11 (2012).