In this paper, we propose a novel method to construct an optical see-through light-field near-eye display (OST LF-NED) by using a discrete lenslet array (DLA). The DLA is used as a spatial light modulator (SLM) to generate a dense light field of three-dimensional (3-D) scenes inside the eyebox of the system and to provide correct focus cues to the user. A corresponding light-field image rendering method is also proposed and demonstrated. The light emitted from real objects passes successively through the transparent region of the display panel and the planar area of the DLA without redirection, so the user has a clear view of the real scene as well as the virtual information. The stray light that may degrade the image quality is analyzed in detail. The experimental results show that the proposed method provides correct depth perception of the virtual information in augmented reality (AR) applications.
© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
18 July 2018: A typographical correction was made to the caption of Fig. 5.
1. Introduction
With the rapid development of virtual reality (VR) and augmented reality (AR) technology and the boom in commercial products in recent years, near-eye display (NED) technology has been widely studied. The most essential advantage of an NED is its capacity for stereoscopic display. However, most of these technologies rely only on binocular parallax, while the consistency of vergence and accommodation is often left unresolved. The discrepancy between the vergence and accommodation of the eyes, known as the vergence-accommodation conflict (VAC) in NEDs, leads to incorrect focus cues. Generally speaking, incorrect focus cues may lead to two commonly recognized issues: distorted depth perception and visual discomfort. Examples of discomfort include diplopic vision, visual fatigue, and degradation in oculomotor responses, especially after viewing such displays for an extended period of time. In particular, in optical see-through (OST) AR displays, the VAC problem degrades the effectiveness of combining the virtual object with the real-world scene, resulting in visual confusion.
Several methods have been proposed to solve or relieve the VAC problem. According to the classification by Dr. Hong Hua [1], these techniques can be categorized into five general types: Maxwellian view displays [2, 3], vari-focal plane displays [4–6], multifocal plane (MFP) displays [7–9], computational multilayer displays [10–13], and integral-imaging (InI)-based displays [14–19]. Among these state-of-the-art techniques, the Maxwellian view display and the vari-focal plane display have simple optical structures but are not able to produce natural retinal blur cues. The MFP, integral-imaging, and multilayer approaches are commonly referred to as light-field displays, which render a true 3-D scene by sampling either the projections of the scene at different depths or the directions of the light rays apparently emitted by the scene and viewed from different eye positions.
Among the light-field methods, the InI-based approach has the simplest structure and has been regarded as an effective way to reconstruct the light field. It is also widely studied in the fields of light-field photography [20, 21] and eyewear-free autostereoscopic displays. It typically consists of a screen and a 2-D array acting as the SLM, which could be a lenslet array [14, 15] or a pinhole array [16–19]. The rendered image is displayed on the screen, and the rays emitted from each pixel intersect the SLM. The SLM angularly samples the directions of these rays and thereby integrally creates the perception of a 3-D scene.
To utilize light-field technology in AR, the OST capacity of InI-based displays has also been studied in recent years. The most common approach is to combine the LF display with OST optics [18, 23]. A lenslet array or a pinhole array is commonly placed at the image plane of the OST element to reproduce the light field. The problem with these approaches is that, because the OST element acts as a magnifying lens, the non-linear relationship between the object distance and the image distance causes lateral and axial distortion in the image space. This could be resolved by pitch scaling and depth scaling processes. However, the computational burden on the display engine also increases. Some other approaches have been proposed to realize OST, such as the pinlight display, the three-layered lenslet array approach, and polymer-stabilized liquid crystal shutters. However, these methods either suffer from low image toning or have a limited field of view (FOV).
In this paper, we develop an OST LF-NED system that has the capacity of generating the light field of the virtual scene by using a DLA and a transparent microdisplay. The light from the real world passes through the transparent areas on the microdisplay panel and the gaps on the DLA panel while the light from the screen is angularly selected by the lenslets to form the light field. The stray light is mainly caused by the screen light that leaks into the gaps between the lenslets. In order to improve the viewing effect, several steps are implemented to analyze and minimize the stray light. An OLED-based prototype and a film-based prototype are developed to demonstrate the correct focus cues and the OST capacity.
2. Principle of the novel OST LF-NED system
In this proposal, a DLA is placed at a certain distance from the eye position and a microdisplay panel is placed at the focal plane of the lenslets, as shown in Fig. 1. The image displayed on the microdisplay panel is rendered as a light-field image, which is segmented into elemental images. Each elemental image and its corresponding lenslet act as an independent magnifier, synthesizing an off-axis perspective projection of the virtual image plane at a relatively far distance. Each perspective spans the eyebox and has a center of projection coincident with the lens center. Thus, the magnifier array forms the light field in the eyebox, as shown in Fig. 1(a). Since the lenslets are discretely arranged, the light emitted from the real world passes directly through the transparent areas on the microdisplay panel and the gaps among the lenslets. This real-world light also spans an eyebox, which is typically smaller than the former one in practical conditions, as shown in Fig. 1(b). In the straylight-free condition, these two eyeboxes coincide exactly, so there is no space between them in which stray light could exist. Taking the requirements of eye relief and compactness into consideration, the rear surface of the DLA is placed 20 to 40 mm from the eye and the diameter of the eyebox is set larger than 6 mm. The design process is based on these settings.
There are several variables in the proposed system: the diameter of each lenslet DL, the eye relief LE, the gap between two lenslets t, and the focal length of each lenslet f. The optical parameters of the system, including the size of the eyebox and the display effect, follow directly from these variables. The following sections illustrate the calculation for a straylight-free system and for a system with stray light.
2.1. Design of a straylight-free OST LF-NED system
In a straylight-free system, the light from the screen should fall exactly on the lens regions of the DLA panel, and the real-world light should pass exactly through the gaps among the lenslets. The two types of light travel along two separate paths, so that no stray light exists. In this case, the geometric relationship of these parameters is illustrated in Fig. 2(a). The dashed lines depict the boundaries for the screen light. These boundaries are defined by connecting the edges of the elemental images and those of the corresponding lenslets, which ensures that no screen light leaks into the gaps among the lenslets within the region of the eyebox. In practical situations, a small amount of screen light inevitably leaks into the gaps, as shown in Fig. 2(b), which will be discussed in subsection 2.2.
To make sure that all the pixels on the screen are visible through the corresponding lenslets, the marginal ray of the marginal field should cross exactly the edge of the eyebox. For example, the ray emitted from the lower edge of an elemental image and passing through the lower edge of the corresponding lenslet should be refracted and then pass through the upper edge of the eyebox. Unlike in a system using a pinhole array, the size of a lenslet is not negligible, so the eyebox of the screen light is no longer defined by the chief ray of the marginal field but by its marginal ray. As shown in Fig. 2(a), the thick red line depicts the lower edge of a chief-ray-defined eyebox. In that case, light from outside the screen area, shown as thin red lines, would pass through the edge of the lenslet and enter the eyebox, forming stray light. Thus, compared to a pinhole-based approach, the eyebox in our lenslet-based approach is narrower by DL. The main feature of the straylight-free system is that the diameter of the eyebox DE depends on the diameter DL and the focal length f of each lenslet. The relation between them is expressed as:

$$D_E = \frac{D_L L_E}{2f} \tag{1}$$

Taking Eq. (1) into consideration, Eq. (3) could be further expressed as:

According to Eq. (3), the number of effective pixels in each elemental image in the straylight-free system is always positive. The duty cycle of the effective pixels in the 1-D direction is calculated as:
Above are the geometric relations of the variables in a straylight-free system. As can be seen from these equations, more than one type of pixel arrangement is possible on a 2-D panel, such as an orthogonal array, a hexagonal array, or a diagonal array. It can be concluded from the equations that the transmittance of the real-world light is affected by the duty cycle of the gaps of the DLA, and that the ratio between the eye relief and the diameter of the eyebox is twice the F-number of the lenslets. For example, if the F-number of each lenslet is set to 3 and the diameter of the eyebox to 10 mm, then the eye relief should be 60 mm, which is too large for an NED. Moreover, the F-number of an ordinary lenslet is often larger than what is desired in this proposal. Reducing the F-number would increase the complexity of design and manufacturing and decrease the imaging quality. A catadioptric system may solve this problem, but it still requires a relatively large display panel to obtain a large FOV. Here we make a tradeoff between the eyebox and the viewing effect by manually enlarging the eyebox while allowing a small amount of stray light to exist.
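The stated relation, that the ratio of eye relief to eyebox diameter equals twice the lenslet F-number, can be checked with a few lines of arithmetic. The following is a minimal sketch; the function names are illustrative and not taken from the paper, and the second example uses hypothetical values.

```python
def straylight_free_eye_relief(f_number, eyebox_mm):
    """Eye relief (mm) implied by eye_relief / eyebox = 2 * F-number."""
    return 2.0 * f_number * eyebox_mm


def straylight_free_eyebox(lenslet_diameter_mm, focal_length_mm, eye_relief_mm):
    """Straylight-free eyebox (mm) for a given lenslet and eye relief."""
    f_number = focal_length_mm / lenslet_diameter_mm
    return eye_relief_mm / (2.0 * f_number)


# Example from the text: F/3 lenslets and a 10 mm eyebox
print(straylight_free_eye_relief(3.0, 10.0))

# Hypothetical case: the same F/3 lenslet (1 mm diameter, 3 mm focal length)
# at a more wearable 30 mm eye relief
print(straylight_free_eyebox(1.0, 3.0, 30.0))
```

The second call illustrates the tradeoff discussed above: at a wearable eye relief, the straylight-free eyebox shrinks in proportion.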
2.2. Practical design of an OST LF-NED system with stray light
In a straylight-free system, the size of the eyebox depends entirely on the F-number of each lenslet. However, because of the self-adjusting ability of human eyes, a small amount of stray light at the edge of the eyebox does not have a serious impact on the viewing effect. Moreover, stray light can never be totally eliminated, but it can often be reduced to a tolerable level. Thus, the eyebox could be manually enlarged, which means the restriction in Eq. (1) could be relaxed. In this condition, the eyebox is described by two quantities: the largest eyebox DE spanned by each view of the elemental image, which contains the complete light field, and the clear area DE', defined by the edges of each elemental image and the corresponding lenslet, where no screen light leaks into the gaps among the lenslets, as shown in Fig. 2(b). In the region between DE' and DE, marked with red shading, some portion of the screen light could enter the eyebox without being refracted by the lenslets and form stray light, which is the only type of stray light in this proposal.
In such a system with stray light, the region of the effective pixels Ie on the screen is still calculated as Eq. (3), while some of the pixels generate the stray light. The region of these pixels (on one side) is calculated as:
The distribution of the stray light can be calculated through numerical simulation and is illustrated in Fig. 3. The diameter of each lenslet is set to 0.8 mm, the focal length to 5 mm, and the width of the gaps to approximately 0.35 mm. The colors correspond to the proportion of stray-light intensity in the region of the eyebox. As the figure shows, stray light exists in all of the listed situations. It increases with the size of the eyebox and decreases with the eye relief. This type of stray light is emitted from the screen close to the eye and is not collimated by the lenslets, so its main effect is to form bright spots on the retina, which decreases the contrast of the image. According to Eq. (6) to Eq. (9), there is a tradeoff between ergonomic factors (i.e., eyebox and eye relief) and optical factors (i.e., stray light and viewing effect). The viewing effect can be improved by relaxing the restriction on the eye relief or the eyebox. Here, a combination of a 6 mm eyebox and a 35 mm eye relief is regarded as a tolerable and balanced design, because the stray light is mainly distributed at the edge of the eyebox and little of it exists in the central portion. Even with stray light, a relatively clear view of the virtual objects and the real scene can be obtained, as confirmed in the experiments. Meanwhile, both the eyebox size and the eye relief are acceptable for wearable devices. These parameters are used in the rest of the paper and in the experiments.
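As a numeric check of the marginal-ray eyebox definition from Section 2.1 with the parameters chosen here, the sketch below traces paraxial rays through one ideal thin lenslet. It is an illustration under stated assumptions (thin lens, paraxial rays, an on-axis lenslet), not the paper's simulation; the function name and the expression for the elemental-image half-width are assumptions derived from that marginal-ray condition.

```python
def trace_to_eye_plane(pixel_height, lens_height, focal_length, eye_relief):
    """Paraxial trace: pixel on the focal plane -> thin lenslet -> eye plane.

    Heights are measured from the lenslet axis, in mm.
    """
    slope_in = (lens_height - pixel_height) / focal_length
    slope_out = slope_in - lens_height / focal_length  # thin-lens refraction
    return lens_height + slope_out * eye_relief


focal_length = 5.0       # mm, from the text
lenslet_diameter = 0.8   # mm, from the text
eye_relief = 35.0        # mm, from the text
eyebox = 6.0             # mm, from the text

# Assumption: the elemental-image half-width is chosen so that the marginal
# ray of the edge pixel lands exactly on the eyebox edge.
half_image = focal_length * (eyebox + lenslet_diameter) / (2.0 * eye_relief)

# Lower edge pixel through the lower lenslet edge (marginal ray)
marginal = trace_to_eye_plane(-half_image, -lenslet_diameter / 2.0,
                              focal_length, eye_relief)
# Same pixel through the lenslet center (chief ray)
chief = trace_to_eye_plane(-half_image, 0.0, focal_length, eye_relief)

# Straylight-free eyebox implied by eye_relief / eyebox = 2 * F-number
straylight_free = lenslet_diameter * eye_relief / (2.0 * focal_length)

print(f"marginal-ray eyebox: {2.0 * marginal:.2f} mm")
print(f"chief-ray eyebox:    {2.0 * chief:.2f} mm")
print(f"straylight-free eyebox at this eye relief: {straylight_free:.2f} mm")
```

Under these assumptions the marginal-ray eyebox comes out narrower than the chief-ray eyebox by one lenslet diameter, and the chosen 6 mm eyebox is well beyond the straylight-free value, which is consistent with stray light appearing toward the eyebox edge.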
2.3. Light field image rendering
Unlike conventional NEDs, the image source in an LF-NED system should be pre-rendered as a light-field image. The rendering process is the inverse of the imaging process of a light-field camera. It could be implemented using a 3-D game engine, such as Unity or Unreal. The DLA could be regarded as a set of cameras with perspective projection. The axes of these projections are defined by connecting the center of the eyebox with the center of each lenslet, and the center of projection of each camera is located at the center of the corresponding lenslet, as shown in Fig. 4. Based on the definition of the projection matrix in OpenGL, the view planes, or near clipping planes, are located at the microdisplay plane, and the far clipping planes are located at infinity. Thus, the projection matrix of each camera is defined as
Based on the projection matrix, the camera array model could be built in Unity, a well-known 3-D game engine, along with two virtual models located about 0.3 m and 5 m away from the camera array, respectively, as shown in Fig. 5. The ineffective pixels are eliminated during post-processing, which means these regions are shown as transparent on a suitable display. The background of the virtual scene is set to black so that only the virtual objects are visible.
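The per-lenslet camera described above can be sketched as a standard OpenGL-style off-axis frustum with the far plane taken to infinity. This is an illustrative sketch, not the paper's matrix: the function names, the square elemental image, the 0.97 mm image width, and the coordinate convention (camera at the lenslet center looking along -z, away from the eye) are all assumptions.

```python
def lenslet_projection_matrix(lens_center_xy, focal_length, eye_relief, image_width):
    """Off-axis OpenGL-style frustum (row-major) with the far plane at infinity.

    lens_center_xy: lenslet center relative to the eyebox axis, in mm.
    The near plane sits at the microdisplay plane, one focal length away.
    """
    cx, cy = lens_center_xy
    near = focal_length
    # Assumption: the elemental image is shifted so that the projection axis
    # lies on the line joining the eyebox center and the lenslet center.
    ox = cx * focal_length / eye_relief
    oy = cy * focal_length / eye_relief
    half = image_width / 2.0
    left, right = ox - half, ox + half
    bottom, top = oy - half, oy + half
    return [
        [2.0 * near / (right - left), 0.0, (right + left) / (right - left), 0.0],
        [0.0, 2.0 * near / (top - bottom), (top + bottom) / (top - bottom), 0.0],
        [0.0, 0.0, -1.0, -2.0 * near],  # limit of the usual terms as far -> infinity
        [0.0, 0.0, -1.0, 0.0],
    ]


def project(matrix, point_xyz):
    """Apply the projection and perspective divide; returns NDC (x, y, z)."""
    x, y, z = point_xyz
    clip = [sum(m * v for m, v in zip(row, (x, y, z, 1.0))) for row in matrix]
    return [c / clip[3] for c in clip[:3]]


# Hypothetical lenslet 1.6 mm off-axis, 5 mm focal length, 35 mm eye relief
matrix = lenslet_projection_matrix((1.6, 0.0), 5.0, 35.0, 0.97)

# A scene point on the line through the eyebox center and the lenslet center
on_axis_ndc = project(matrix, (1.6 * 10.0, 0.0, -35.0 * 10.0))
print(on_axis_ndc)
```

With this convention, a scene point on the line through the eyebox center and the lenslet center projects to the middle of the elemental image, and points on the microdisplay plane map to the near limit of the depth range.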
3. Experiment and discussion
Two LF-NEDs were implemented: the first prototype is based on a micro-OLED and the second is a static prototype using a transparent film, as shown in Fig. 6. In the OLED-based prototype, a non-transparent 0.7” Sony micro-OLED screen with a resolution of 1920 × 1080 was placed in front of a 20 mm × 20 mm DLA made of PMMA. The focal length of each lenslet is 5 mm and the diameter is 0.8 mm. The horizontal distance between the centers of two lenslets is 1.6 mm. Thus, the nearest distance between two lenslets is about 0.35 mm. The real-time dynamic image shown in Fig. 5(b) was rendered at 15–70 fps by a PC with a 2.6 GHz Intel Core i7, 8 GB of RAM, and an NVIDIA GeForce GTX 960M graphics card. A camera was placed in front of the DLA, with its entrance pupil set within the eyebox of the system. Figure 7 shows the images seen through the OLED-based prototype. When the camera focused on objects at different positions, the out-of-focus objects appeared blurry, which indicates that the system provides correct focus cues.
The film-based prototype is similar to the OLED-based one, except that the display is made of a 14 mm × 21 mm transparent film. As shown in Fig. 8(a), the effective pixels for the virtual objects lie within the black circles, each of which acts as the image source for its corresponding lenslet. The light-field images were exposed on the films so that the bright areas are transparent and the dark areas remain non-transparent. In Fig. 8(b), to illustrate the see-through light path, the virtual scene is not presented: all the effective pixels in the circles are set to black (non-transparent), that is, the film is exposed as a white (transparent) background with discrete black circles. The intensity of the real-world light is slightly reduced and a dark pattern appears because of the uneven duty cycles in different directions. A hexagonally arranged DLA may relieve the dark pattern, but at the cost of more difficult manufacturing. The transparent film with two virtual objects shown in Fig. 8(a) was used to illustrate the combination of the see-through light path and the virtual-scene light path. As shown in Figs. 8(c) and 8(d), both OST capacity and correct focus cues were achieved. The camera can focus on the nearer object and the foreground, located at about 0.3 m, or on the farther object and the background, about 5 m away from the camera. The image quality of the virtual objects is not as high as that of the OLED-based prototype, for two reasons. One is the manufacturing error of the film. The other is that the light-field image is lit entirely by ambient light instead of a dedicated backlight. This indicates the necessity of a dedicated backlight or a self-emissive image source to increase the contrast.
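For a rough sense of scale, the pixel pitch of the micro-OLED and the angle one pixel subtends through a lenslet can be estimated from the stated parameters. This is a back-of-the-envelope sketch, assuming the 0.7” figure is the diagonal of the active area, the pixels are square, and each lenslet behaves as an ideal magnifier with the screen at its focal plane; the function names are illustrative.

```python
import math


def pixel_pitch_mm(diagonal_inch, res_x, res_y):
    """Pixel pitch (mm), assuming square pixels and an active-area diagonal."""
    diagonal_mm = diagonal_inch * 25.4
    return diagonal_mm / math.hypot(res_x, res_y)


def pixel_angle_arcmin(pitch_mm, focal_length_mm):
    """Angle subtended by one pixel seen through an ideal magnifier lenslet."""
    return math.degrees(math.atan(pitch_mm / focal_length_mm)) * 60.0


pitch = pixel_pitch_mm(0.7, 1920, 1080)   # screen parameters from the text
angle = pixel_angle_arcmin(pitch, 5.0)    # 5 mm lenslet focal length

print(f"pixel pitch: {pitch * 1000.0:.2f} um")
print(f"angle per pixel: {angle:.2f} arcmin")
```

Under these assumptions the pitch is roughly 8 µm and one pixel subtends about 5.5 arcminutes, which gives a feel for the resolution loss discussed later in this section.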
From the experiments, it can be concluded that the main advantages of this system are the OST capacity and the correct focus cues. Along with these advantages, some problems exist in the experiments. Apart from the dark pattern, one of the most obvious problems is the resolution loss, a typical shortcoming of light-field displays. The plane resolution of the virtual image is calculated as
Another problem is the imaging performance of the lenslets. Since all the lenslets are identical plano-convex lenses, the image quality is limited, especially for the marginal field. In future work, the surface of each lenslet could be designed separately for its corresponding field to improve the imaging performance; extra apertures could be introduced into the system to further reduce the stray light; and other arrangements of the lenslets on the DLA will also be studied and implemented to improve the uniformity of the image.
Moreover, the most critical problem is that a large portion of the screen is always kept transparent. If an ordinary transparent screen is used, more than half of the pixels will never be used, which wastes a large number of pixels. Thus, to implement this proposal, a screen with the special pixel arrangement described in Section 2.1 is required. Only the effective portions are covered with pixels and made non-transparent, while the rest of the screen is transparent and contains no pixels, so that no pixels are wasted.
4. Conclusion
We propose an OST LF-NED using a DLA and an ideal transparent microdisplay. The relations between the optical parameters and the structure parameters are derived and a straylight-free model is given. To make the system more comfortable for users, the eyebox is enlarged manually, and the resulting structure parameters and stray light are calculated. Experiments are carried out on an OLED-based prototype and a film-based prototype, although the brightness and the contrast of the virtual scene viewed through the latter are not as good as through the former. The experiments demonstrate the OST capacity and the correct focus cues of the proposal.
Funding
National Key Research and Development Program of China (2016YFB1001502); National Natural Science Foundation of China (61727808).
Acknowledgments
We would like to thank Synopsys for providing the educational license of CODE V. We would also like to acknowledge Mrs. Yang Wang and Mr. Weihong Hou at Beijing NED + AR Display Technology Corporation for fruitful discussions and providing the microdisplay module, and Mr. Jianxun Zhang at Xiangshiboan Technology Corporation for providing support of the fabrication of transparent films.
References and links
1. H. Hua, “Enabling focus cues in head-mounted displays,” Proc. IEEE 105(5), 805–824 (2017).
3. H. Takahashi and S. Hirooka, “Stereoscopic see-through retinal projection head-mounted display,” Proc. SPIE 6803, 68031N (2008).
4. S. Shiwa, K. Omura, and F. Kishino, “Proposal for a 3-D display with accommodative compensation: 3DDAC,” J. Soc. Inf. Disp. 4(4), 255–261 (1996).
6. S. Liu, H. Hua, and D. Cheng, “A novel prototype for an optical see-through head-mounted display with addressable focus cues,” IEEE Trans. Vis. Comput. Graph. 16(3), 381–393 (2010).
9. B. T. Schowengerdt, M. Murari, and E. J. Seibel, “Volumetric display using scanned fiber array,” SID Symp. Dig. Tech. Pap. 41(1), 653–656 (2010).
10. H. Gotoda, “A multilayer liquid crystal display for autostereoscopic 3D viewing,” Proc. SPIE 7524, 75240P, 1–8 (2010).
11. G. Wetzstein, D. Lanman, W. Heidrich, and R. Raskar, “Layered 3D: tomographic image synthesis for attenuation-based light field and high dynamic range displays,” ACM Trans. Graph. 30(4), 95 (2011).
12. G. Wetzstein, D. R. Lanman, M. W. Hirsch, and R. Raskar, “Tensor displays: compressive light field synthesis using multilayer displays with directional backlighting,” ACM Trans. Graph. 31(4), 1–11 (2012).
13. A. Maimone and H. Fuchs, “Computational augmented reality eyeglasses,” in Proceedings of the 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (2013), pp. 29–38.
14. D. Lanman and D. Luebke, “Near-eye light field displays,” ACM Trans. Graph. 32(6), 1–10 (2013).
16. A. Maimone, D. Lanman, K. Rathinavel, K. Keller, D. Luebke, and H. Fuchs, “Pinlight displays: wide field of view augmented reality eyeglasses using defocused point light sources,” ACM Trans. Graph. 33(4), 89 (2014).
17. W. Song, Y. Wang, D. Cheng, and Y. Liu, “Light field head-mounted display with correct focus cue using micro structure array,” Chin. Opt. Lett. 12(6), 060010 (2014).
20. R. Ng, M. Levoy, G. Duval, M. Horowitz, and P. Hanrahan, “Light field photography with a hand-held plenoptic camera,” Stanf. Tech Rep. CTSR 2005–02, Dept. of Computer Science, Stanford Univ. (2005).
21. D. G. Dansereau, G. Schuster, J. Ford, and G. Wetzstein, “A wide-field-of-view monocentric light field camera,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), pp. 5048–5057.
22. Q. H. Wang, C. C. Ji, L. Li, and H. Deng, “Dual-view integral imaging 3D display by using orthogonal polarizer array and polarization switcher,” Opt. Express 24(1), 9–16 (2016).
23. D. Cheng, Y. Wang, H. Hua, and M. M. Talha, “Design of an optical see-through head-mounted display with a low f-number and large field of view using a freeform prism,” Appl. Opt. 48(14), 2655–2668 (2009).
24. H. Deng, Q. Wang, Z. Xiong, H. Zhang, and Y. Xing, “Magnified augmented reality 3D display based on integral imaging,” Optik (Stuttg.) 127(10), 4250–4253 (2016).
25. S. Liu, Y. Li, P. Zhou, X. Li, N. Rong, S. Huang, W. Lu, and Y. Su, “A multi-plane optical see-through head mounted display design for augmented reality applications,” J. Soc. Inf. Disp. 24(4), 246–251 (2016).