
Spatial perception in stereoscopic augmented reality based on multifocus sensing

Open Access

Abstract

In many areas ranging from medical imaging to visual entertainment, 3D information acquisition and display is a key task. In multifocus computational imaging, stacks of images of a 3D scene are acquired under different focus configurations and are later combined by means of post-capture algorithms based on an image formation model in order to synthesize images with novel viewpoints of the scene. Stereoscopic augmented reality devices, through which it is possible to simultaneously visualize the three-dimensional real world along with an overlaid digital stereoscopic image pair, could benefit from the binocular content allowed by multifocus computational imaging. Spatial perception of the displayed stereo pairs can be controlled by synthesizing the desired point of view of each image of the stereo pair along with their parallax setting. The proposed method has the potential to alleviate the accommodation-convergence conflict and make augmented reality stereoscopic devices less vulnerable to visual fatigue.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Since Wheatstone's stereoscope [1] and, later on, integral photography proposed by Lippmann [2], 3D scene information codified in multiview images has aroused interest, leading to the development of modern integral imaging [3], light-field imaging [4,5], perspective shifts from multifocus imaging [6,7], augmented reality [8,9] and 3D visualization in microscopy [10,11]. The relevance of 3D scene sensing and visualization in modern image technology is undeniable. In this regard, image acquisition through optical means combined with post-processing algorithms allows for the reorganization of the information in the acquired images and the synthesis of novel points of view of a given scene. Augmented reality devices, and stereoscopic ones in particular, can benefit from the binocular content allowed by the aforementioned synthesis capabilities.

Binocular vision is based on the fact that 3D objects are perceived from two different perspectives due to the horizontal separation between the left and right eyes, resulting in slightly different left and right images of the object on our retinas. The disparity between the left and right points of view gives rise to binocular parallax, that is, a relative displacement of an object viewed along two lines of sight. Parallax measures the difference in the relative position of the image of a given object between the right eye image and the left eye image of a stereoscopic pair. This parallax, or retinal disparity, in the acquired images codifies depth information about the scene. Left and right images are then fused by our brain in a process called stereopsis that gives us the perception of depth of the scene. Stereopsis acts as a strong depth cue, in particular at short distances, and is one of the most investigated depth perception mechanisms [12,13].

In stereoscopic displays, left and right view images are displayed separately to the left and right eyes. Both perspectives of the same 3D scene are then fused by the brain to give the perception of depth. When a given object occupies the same position on the screen for both the left and right eyes it has what is referred to as zero disparity or zero parallax.

These displays have become important for many applications [14,15], for example in medicine, such as medical diagnosis, pre-operative planning, image-guided precision surgery, or medical training [16–18]. Among stereoscopic devices, those based on Augmented Reality (AR) allow for the simultaneous visualization of a real-world 3D scene while optically overlaying digital information with the use of see-through architectures [8,19,20], with promising applications [21,22] in surgery [23] and ophthalmology [24,25], among others. In ophthalmology in particular, they have proved useful as a low vision aid [26] for patients navigating a 3D real-world scene, owing to their capability of superimposing cues that can enhance obstacle visibility [27,28].

For correct display in AR stereoscopic devices, the challenge of minimizing visual discomfort still remains to be solved. When visualizing a natural 3D scene our eyes converge and accommodate (focus) on a certain object. Convergence (vergence) involves the visual-motor system, i.e., the muscles that control the eyes' movements: when looking at a certain object, both eyes aim precisely at that object [15]. On the other hand, accommodation involves the muscles that control changes in the focal length of the lenses of our eyes in order to focus on the object. Looking at infinity is comfortable and relaxing, while looking at a closer object implies a muscular effort. Under natural viewing conditions accommodation and vergence work closely together, since our eyes converge and focus at the same point of an object and the object is brought within the limits of fusion. On the other hand, when we see a stereoscopic image we are accommodating on the screen (located at a fixed distance from the eye) where the stereoscopic pair is displayed, while we may be converging somewhere else (in front of or behind the screen plane) depending on the perceived depth due to horizontal parallax. This decoupling between accommodation and convergence is not natural for the visual-motor system and may cause discomfort, in what is usually referred to as the "Vergence Accommodation Conflict" (VAC) [29].

Several studies have been performed regarding the VAC, and the results show that changes in parallax can increase or mitigate the VAC and the fatigue caused by the display [30]. Disparity-shifting methods are effective in reducing excessive screen parallax [31]. They can mitigate the VAC with a low computational cost by adjusting the camera baseline and then scaling the depth range of the scene. However, operating on individual focus slices of the scene is not easily addressed, and control is usually restricted to a relative displacement between the images of the stereo pair.

In the present paper (see Fig. 1), we introduce a new method to reconstruct a pair of stereoscopic images of a 3D scene from a multifocus image stack captured with a monocular focus-tunable camera. We set the basis for control of the perceived depth as well as of the convergence on the scene, not only by the usual Horizontal Image Translation of a stereoscopic pair of images but also by the synthesis and control of the stereo pair itself. By using the image formation model and operating in the Fourier domain of the stack, images with arbitrary viewpoints, and hence stereoscopic pairs with different parallaxes, can be obtained. The proposal has the potential to alleviate the accommodation-convergence discrepancy and make AR stereoscopic devices less vulnerable to visual fatigue.

Fig. 1. Capture (left) and display (right) in our proposal. A monocular, finite-size-aperture, focus-tunable lens camera captures a center-view stack of multifocus images. The multifocus stack is processed in the Fourier domain in order to obtain stereoscopic pairs to be displayed by means of an AR stereoscopic device. The stereo pair is rendered at screen distance $V$ while augmented information is perceived at a different distance. By synthesizing the stereoscopic pair as needed, the perceived depth for each object can be changed, while parallax can be tuned by means of HIT.

The paper is organized as follows. The basics of all-in-focus point of view synthesis are presented in Section 2. In Section 3 the synthesis of a stereo pair is addressed, while Section 4 is devoted to display and depth perception along with AR applications and convergence control. Results are presented in Section 5, while conclusions are drawn in Section 6.

2. Multifocus sensing and point of view formation model

A multifocus stack can be acquired, for example, by introducing an electrically focus-tunable lens based capture system in the AR device [9], where a given focal plane of interest can be electrically tuned without mechanical movements. Considering the dependency of depth of field on the focal length and aperture of the optical system, a multifocus image stack is acquired by performing a discrete focal sweep of the 3D scene along the optical axis. In each image of the multifocus stack, points at the in-focus plane of the capture system are imaged as points on the camera sensor, while points outside the depth of field associated with a certain focusing distance are imaged as blur circles and contribute to the out-of-focus part of that image of the stack. Synthesis of images with novel characteristics from a multifocus stack is accomplished considering the following model.

Let $i_k$ be the intensity distribution of the $k$-th image of the stack ($k=1,\ldots,N$, for color images in $RGB$ space $i_k=\left (i_k^R,i_k^G,i_k^B\right )$) which corresponds to the capture system focusing at a distance $z=z_k$:

$$i_k(x,y)=f_k(x,y)+\sum_{k'\neq k}h_{kk'}(x,y)\ast f_{k'}(x,y),$$
where $f_k$ is the in-focus region of $i_k$. Out-of-focus contributions in $i_k$ come from the 2D convolution between $f_{k'}$ (in-focus part of the $k'$-th image of the stack: $i_{k'}$) and the 2D intensity PSF $h_{kk'}(x,y)$ associated with the focusing distances $z_k$ and $z_{k'}$:
$$h_{kk'}(x,y)=\frac{1}{\pi r_{kk'}^2}circ\left(\frac{\sqrt{x^2+y^2}}{r_{kk'}}\right).$$
where:
$$\frac{r_{kk'}}{p}=R_0\left| \frac{1}{z_k}-\frac{1}{z_{k'}} \right|$$
and parameter $R_0$ is given by:
$$R_0= \frac{Rd}{p},$$
where $R$ is the radius of the aperture of the optical system, $d$ is the distance between the optical system and the sensor, and $p$ is the pixel pitch of the camera sensor.
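As a numerical illustration of Eqs. (2)–(4), the following NumPy sketch builds the set of optical transfer functions $H_{kk'}(u,v)$ that will be arranged into the matrix $H$ introduced below. The function name otf_matrix and its arguments are illustrative choices of ours (not from the paper), and the pillbox PSF is sampled directly on the pixel grid, which is only an approximation of Eq. (2).

import numpy as np

def otf_matrix(z, R0, shape):
    # Build the N x N set of OTFs H_kk'(u, v) on a pixel grid (illustrative sketch).
    # z     : sequence of N focusing distances z_k
    # R0    : R*d/p as defined in Eq. (4)
    # shape : (rows, cols) of the images in the stack
    N = len(z)
    rows, cols = shape
    yy, xx = np.mgrid[:rows, :cols]
    rho = np.hypot(xx - cols // 2, yy - rows // 2)      # radial pixel coordinate
    H = np.ones((N, N, rows, cols), dtype=complex)      # H_kk = 1 (in-focus slice)
    for k in range(N):
        for kp in range(k + 1, N):
            r = R0 * abs(1.0 / z[k] - 1.0 / z[kp])      # blur radius in pixels, Eq. (3)
            pill = (rho <= r).astype(float)             # circ(.) pillbox, Eq. (2)
            pill /= pill.sum()                          # unit-volume normalization
            otf = np.fft.fft2(np.fft.ifftshift(pill))   # OTF = F{PSF}
            H[k, kp] = H[kp, k] = otf                   # symmetric matrix
    return H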

Let us now consider the Fourier Transform ($\mathcal {F}$) of the set of $N$ coupled equations in Eq. (1) and put them in the following vector form:

$$\vec{I}(u,v)=H(u,v)\vec{F}(u,v),$$
where $(u,v)$ are spatial frequencies and $N$-element column vectors $\vec {I}$ , $\vec {F}$ and $N\times N$ symmetric matrix $H$ are given by [32]:
$$\begin{array}{c} \vec{I}(u,v) = \left( \begin{array}{c} I_1(u,v)\\ I_2(u,v)\\ \vdots\\ I_N(u,v) \end{array} \right),\quad \vec{F}(u,v) = \left( \begin{array}{c} F_1(u,v)\\ F_2(u,v)\\ \vdots\\ F_N(u,v) \end{array} \right),\\ \\ H(u,v) = \left( \begin{array}{cccc} 1 & H_{12}(u,v) & \ldots & H_{1N}(u,v)\\ H_{12}(u,v) & 1 & & \vdots\\ \vdots & & \ddots & H_{N-1 N}(u,v)\\ H_{1N}(u,v) & \ldots & H_{N-1 N}(u,v) & 1 \end{array} \right) \end{array}$$
where each element $H_{kk'}$ of the symmetric matrix $H$ is an Optical Transfer Function: $H_{kk'}=\mathcal {F}\{h_{kk'}\}$, while $I_k=\mathcal {F}\{i_{k}\}$ and $F_k=\mathcal {F}\{f_{k}\}$. For the case that $H(u,v)$ is invertible, the solution to the linear system given by Eq. (5) is $\vec {F}(u,v)=H^{-1}(u,v)\vec {I}(u,v)$, while when $H(u,v)$ is not invertible (as is the case for the D.C. component) a solution can be found through the Moore-Penrose pseudo-inverse $H^{\dagger }$ [33]. The Moore-Penrose pseudo-inverse provides the set of vectors that minimize the Euclidean norm $\left \| H(u,v)\vec {F}(u,v)- \vec {I}(u,v)\right \|$ in the least squares sense. The minimal norm vector estimated by the Moore-Penrose pseudo-inverse is given by
$$\vec{F}(u,v)=H^\dagger(u,v)\vec{I}(u,v).$$
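Continuing the same sketch, the Moore-Penrose estimate of Eq. (7) can be evaluated independently at every spatial frequency; numpy.linalg.pinv broadcasts over stacked matrices, so one call solves all the $N\times N$ systems. The function name and tensor layout below are our own illustrative choices.

import numpy as np

def solve_focus_slices(stack, H):
    # stack : (N, rows, cols) multifocus images i_k(x, y)
    # H     : (N, N, rows, cols) OTFs from otf_matrix()
    # Returns the Fourier-domain in-focus slices F_k(u, v), Eq. (7).
    I = np.fft.fft2(stack, axes=(-2, -1))               # I_k(u, v)
    Hm = np.moveaxis(H, (2, 3), (0, 1))                 # (rows, cols, N, N): one system per (u, v)
    Im = np.moveaxis(I, 0, -1)[..., None]               # (rows, cols, N, 1)
    F = np.linalg.pinv(Hm) @ Im                         # Moore-Penrose solution per frequency
    return np.moveaxis(F[..., 0], -1, 0)                # back to (N, rows, cols)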

Reconstruction of an arbitrary horizontal viewpoint of the scene is accomplished by simulating the displacement of a pinhole camera in the horizontal direction with respect to the center of the original pupil. We consider the disparity (parallax) between the images of a given point as seen by the sensor of a centered pinhole camera and by a pinhole camera displaced to the left. As can be seen by triangle similarity in Fig. 2, the horizontal disparity is given by the following equation:

$$x_{shift}-x_0 =\frac{b_x d}{z}$$


Fig. 2. Horizontal shift and disparity for a two parallel pinhole camera configuration separated by a baseline $b_x$. A point in 3D space is imaged at $x_0$ in the sensor of one pinhole camera (the one in the center of the original pupil) while it is imaged at $x_{shift}$ by the other.


If we consider a piecewise-planar approximation of the 3D scene, a horizontally shifted viewpoint $s_{b_x}(x,y)$ is obtained by shifting each focus slice $f_k(x,y)$ by an amount according to the disparity associated with the focusing distance $z_k$ and the baseline displacement $b_x$ of the camera,

$$s_{b_x}(x,y)=\sum_{k=1}^{N}f_k\left(x-\frac{d}{z_k}b_x,y\right).$$
$s_{0}(x,y)$ in particular recovers the image of the scene as captured with a pinhole camera in the center of the original circular aperture (that is extended-depth-of-field reconstruction of the scene [32]).

By means of the Fourier transform shift theorem, which states that translation in the space domain introduces a linear phase shift in the frequency domain [34], and by using Eq. (7) for $\vec {F}(u,v)$, we obtain the Fourier transform of Eq. (9):

$$S_{b_x}(u,v)=\sum_{k=1}^{N} e^{{-}j\frac{2\pi d}{z_k}\left(b_x u\right)}\left(H^\dagger(u,v)\vec{I}(u,v)\right)_k.$$

Finally, by Fourier inverse transforming the previous equation we obtain the new scene perspective as seen from a pinhole camera, translated $b_x$ to the left of the center of the original circular pupil:

$$s_{b_x}=\mathcal{F}^{{-}1}\{S_{b_x}(u,v)\}.$$

It is worth noting that the synthesis of a new perspective of the scene following Eq. (11) is achieved without segmentation of the focused regions from each image of the stack or any depth map estimation, which usually introduce some inaccuracies in the reconstruction.
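A minimal sketch of Eqs. (10)–(11), under the same conventions as above: each Fourier-domain slice is multiplied by a linear phase ramp proportional to its disparity and the results are summed. The name synthesize_view and the pixel-pitch argument used to express the shift $d\,b_x/z_k$ in pixel units are our own assumptions.

import numpy as np

def synthesize_view(F, z, d, bx, pitch=1.0):
    # F  : (N, rows, cols) Fourier-domain slices from solve_focus_slices()
    # z  : focusing distances z_k;  d : lens-to-sensor distance;  bx : pinhole displacement
    # pitch : pixel pitch, so that the shift d*bx/z_k is expressed in pixels
    N, rows, cols = F.shape
    u = np.fft.fftfreq(cols)[None, :]                   # horizontal spatial frequency (cycles/pixel)
    S = np.zeros((rows, cols), dtype=complex)
    for k in range(N):
        shift_px = d * bx / (z[k] * pitch)              # per-slice disparity, Eq. (8)
        S += np.exp(-2j * np.pi * shift_px * u) * F[k]  # Fourier shift theorem, Eq. (10)
    return np.real(np.fft.ifft2(S))                     # s_bx(x, y), Eq. (11)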

3. Stereoscopic pair synthesis

In the proposed method, $x_0$ corresponds to the image captured through a centered pinhole (all-in-focus), while $x_{shift}$ corresponds to the left and right images for $b_x=B/2$ and $b_x=-B/2$, respectively (see Section 2). This corresponds to a baseline (separation) of $B$ between the left and right synthetic pinhole cameras, since we are simulating a stereo camera in parallel configuration (i.e., the optical axes of both cameras are parallel). The pair of stereoscopic images can then be generated by considering a synthetic stereo camera formed by a left pinhole camera displaced to the left of the center of the original pupil, $b_x=B/2$, and a right pinhole camera displaced to the right of the center of the original pupil, $b_x=-B/2$.

Then, it is straightforward to reconstruct the left and right views according to Eq. (10):

$$I_{L}(u,v)=S_{B/2}(u,v)=\sum_{k=1}^{N} e^{{-}j\frac{2\pi d}{z_k}\frac{B}{2}u}\left(H^\dagger(u,v)\vec{I}(u,v)\right)_k$$
$$I_{R}(u,v)=S_{{-}B/2}(u,v)=\sum_{k=1}^{N} e^{{+}j\frac{2\pi d}{z_k}\frac{B}{2}u}\left(H^\dagger(u,v)\vec{I}(u,v)\right)_k$$
and finally, analogously to Eq. (11):
$$i_{L}(x,y)=\mathcal{F}^{{-}1}\{I_{L}(u,v) \}$$
$$i_{R}(x,y)=\mathcal{F}^{{-}1}\{I_{R}(u,v) \}$$
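Under the same assumptions, Eqs. (12)–(15) amount to two calls of the synthesize_view sketch above with opposite displacements, e.g.:

def stereo_pair(F, z, d, B, pitch=1.0):
    # Left/right all-in-focus views for a synthetic parallel rig of baseline B,
    # Eqs. (12)-(15): left camera at b_x = +B/2, right camera at b_x = -B/2.
    i_left = synthesize_view(F, z, d, +B / 2, pitch)
    i_right = synthesize_view(F, z, d, -B / 2, pitch)
    return i_left, i_right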

4. Stereoscopic pair display

4.1 Depth perception

As previously mentioned, in a stereoscopic pair of images each eye perceives a slightly different perspective of the 3D scene. For a given point, the distance between its images in a stereo pair is called the parallax of that point, and it is related to the depth perception of that point. In the parallel camera configuration, the zero parallax plane at capture is at infinity and corresponds to the screen plane in display mode [35].

Depending on parallax, an object can be perceived on the screen plane, in front of it, or behind it. Zero parallax means the object is perceived on the screen plane. If the left eye's view of an object is shifted to the left on the screen relative to the right view, the disparity is uncrossed (positive parallax) and the object is perceived behind the screen. Conversely, if the left view is shifted to the right, the disparity is crossed (negative parallax) and the object is perceived in front of the screen. See Fig. 3.


Fig. 3. Parallax ($P$) and depth perception. For clarity, the left eye image is shown in red while the right eye image is shown in blue. (a) Uncrossed or positive parallax: the object is perceived behind the screen plane, (b) zero parallax: the object is perceived on the screen plane and (c) crossed or negative parallax: the object is perceived in front of the screen plane (pop-out of the screen).


When displaying the stereoscopic pair (see Fig. 4), the parallax on the screen, $P=X'_{SR}-X'_{SL}$, is related to the disparity between the captured (in this case reconstructed) right and left images, $X_{CR}-X_{CL}$ [36]. Here, $X'_{SR}$ and $X'_{SL}$ are the x-coordinates of the image of the object as perceived, respectively, by the right and left eye of an observer when displayed on the screen, while $X_{CR}$ and $X_{CL}$ correspond to the x-coordinates of the image of the object as captured by the stereo camera.


Fig. 4. Perceived depth and parallax. $V$ is the viewing distance (distance between the viewer and the screen), $e$ is the inter-ocular distance and $P=X'_{SR}-X'_{SL}$ is the horizontal parallax. For the configuration shown in this figure, parallax is positive, accommodation ($A$) occurs at distance $V$ (viewing screen) while convergence ($C$) occurs behind the screen.


In our approach the reconstruction of the stereo pair of images is equivalent to the capture of the scene by a stereo camera in parallel configuration. Under this configuration, the two views always produce a negative horizontal disparity between the captured images (recall that, under a parallel camera configuration, zero disparity corresponds to a point at infinite distance).

Let $M$ be the magnification between acquisition and displaying, i.e.:

$$M=\dfrac{\textit{screen width}}{\textit{sensor width}},$$
then the parallax on the viewing screen is related to the disparity between the acquired images [35] by:
$$P=X'_{SR}-X'_{SL}=M\left(X_{CR}-X_{CL}\right)$$

In our case, this leads to a negative parallax on the screen:

$$P=X'_{SR}-X'_{SL}=M\left(X_{CR}-X_{CL}\right)=M\left(\frac{-b_x d}{z}-\frac{b_x d}{z}\right)={-}M \left(\frac{2b_x d}{z}\right)={-}M \frac{B d}{z}.$$

When the stereo pair is displayed on the screen (see Fig. 4) we obtain from similar triangles the following:

$$\frac{P}{e} =\frac{z'-V}{z'},$$
where $V$ is the viewing distance (distance between the viewer and the screen), $e$ is the inter-ocular distance and $z'$ is the perceived depth. $z'$ is then related to parallax in the screen according to:
$$z' =\frac{e}{e-P}V,$$
which for our case, by means of Eq. (18) takes the following form:
$$z' =\frac{e}{e+M \frac{B d}{z}}V.$$

Then, when considering parallel cameras for acquisition, infinity in the real world ($z=\infty$) is placed at the screen level ($z'=V$), while everything else in the scene is perceived in front of the screen ($z'<V$ for $P<0$). The objects appear to float between the observer and the screen plane (pop-out image). The amount of parallax can in turn be adjusted by selecting the value of $B$ through an appropriate value of $b_x$.

It is worth noting that the expression in Eq. (21) can be further simplified by considering the approximation $M \frac {B d}{z} \gg e$, which is valid under the assumption of sufficiently close objects or $M\gg 1$ (which is the case of AR displays):

$$z' \approx \frac{eV}{M Bd}z,$$
so perceived depths are proportional to real-world depths and scenes might be displayed without depth distortion.
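As a small numeric illustration of Eqs. (18) and (20)–(22) (the viewing parameters $e=65$ mm and $V=2$ m below are assumed for the example, not taken from the paper):

def perceived_depth(z, B, d, M, e=0.065, V=2.0):
    # Perceived depth z' of Eq. (21) for an object at real-world depth z,
    # together with the linear approximation of Eq. (22). SI units assumed.
    P = -M * B * d / z                          # on-screen parallax, Eq. (18) (P < 0)
    z_exact = e * V / (e - P)                   # Eq. (20): z' = e V / (e - P)
    z_approx = e * V * z / (M * B * d)          # Eq. (22), valid when M*B*d/z >> e
    return z_exact, z_approx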

4.2 Augmented reality superimposed cues

For AR applications, and in particular when using AR as a low vision aid, 3D navigation can be enhanced by superimposing cues for obstacle avoidance or grasping tasks [27]. Multifocus sensing combined with stereo pair synthesis allows an object of the 3D scene to be displayed and superimposed over the original (without the need to render a color-coded wire mesh) by exploiting the relation between real-world and display depths.

From Eq. (22) and imposing $z'=z$ (displayed image perceived over the real world object), the corresponding value of $B$ is:

$$B \approx \frac{eV}{Md},$$
so $B$ can be adjusted in order to render augmented information over the real-world scene at the same depth. Since the whole scene is perceived in front of the screen, and given that the effective screen of AR devices can be placed several meters in front of the user, the baseline adjustment can serve to enhance objects of interest to the user for close and medium range activities.
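For instance, with the same assumed viewing parameters and purely illustrative values of the remaining quantities, Eq. (23) gives a baseline of a few centimeters:

# Illustrative numbers only (not reported in the paper):
e, V = 0.065, 2.0          # inter-ocular distance and viewing distance [m]
M, d = 500.0, 0.006        # screen-to-sensor magnification and lens-to-sensor distance [m]
B = e * V / (M * d)        # Eq. (23): baseline for z' ~ z
print(f"B = {B * 1e3:.1f} mm")   # -> B = 43.3 mm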

4.3 Convergence control

It is also possible to set convergence after acquisition, i.e., to change the simulated camera configuration from parallel to toed-in. Horizontal Image Translation (HIT) is the conventional procedure to accomplish this change and consists of applying horizontal shifts of equal magnitude and opposite sign to the left and right images of the pair. The position of a particular object in the 3D scene can then be perceived at the screen plane or even behind it [35,36].

Let us consider convergence tuning after acquisition by modifying the zero parallax plane [37] for a particular object originally placed at plane $k=k*$ ($z_k=z_{k*}$). A horizontal image translation of $-\frac{dB}{2z_{k*}}$ needs to be performed on the left image, while a translation of $+\frac{dB}{2z_{k*}}$ is to be introduced to the right image. We can account for each of these translations by introducing a global phase factor in the left and right view Fourier-space images given by Eqs. (12) and (13), respectively:

$$I^*_{L}(u,v)=e^{{+}j\frac{2\pi d}{z_{k*}}\frac{B}{2}u}S_{B/2}(u,v)=\sum_{k=1}^{N} e^{{-}j2\pi d\left(\frac{1}{z_k}-\frac{1}{z_{k*}}\right)\frac{B}{2}u}\left(H^\dagger(u,v)\vec{I}(u,v)\right)_k$$
$$I^*_{R}(u,v)=e^{{-}j\frac{2\pi d}{z_{k*}}\frac{B}{2}u}S_{{-}B/2}(u,v)=\sum_{k=1}^{N} e^{{+}j2\pi d\left(\frac{1}{z_k}-\frac{1}{z_{k*}}\right)\frac{B}{2}u}\left(H^\dagger(u,v)\vec{I}(u,v)\right)_k$$
so after Fourier inverse transforming:
$$i^*_{L}(x,y)=\mathcal{F}^{{-}1}\{I^*_{L}(u,v) \}$$
$$i^*_{R}(x,y)=\mathcal{F}^{{-}1}\{I^*_{R}(u,v) \}$$
the left and right images of the pair will show no disparity for the in-focus slice corresponding to the plane $k=k*$ ($z_k=z_{k*}$). Objects placed at that plane will then be displayed with zero parallax over the screen.
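A sketch of Eqs. (24)–(27) under the same conventions as the earlier snippets: each slice receives a phase proportional to its disparity relative to the reference plane $z_{k*}$, so that slice $k*$ ends up with zero disparity. Names and the pixel-pitch handling are, again, our own assumptions.

import numpy as np

def stereo_pair_converged(F, z, d, B, k_star, pitch=1.0):
    # Stereo pair with the zero-parallax plane moved to slice k_star, Eqs. (24)-(27).
    N, rows, cols = F.shape
    u = np.fft.fftfreq(cols)[None, :]
    IL = np.zeros((rows, cols), dtype=complex)
    IR = np.zeros((rows, cols), dtype=complex)
    for k in range(N):
        dz = 1.0 / z[k] - 1.0 / z[k_star]               # (1/z_k - 1/z_k*)
        phase = 2.0 * np.pi * d * dz * (B / 2) * u / pitch
        IL += np.exp(-1j * phase) * F[k]                # Eq. (24)
        IR += np.exp(+1j * phase) * F[k]                # Eq. (25)
    return np.real(np.fft.ifft2(IL)), np.real(np.fft.ifft2(IR))   # Eqs. (26)-(27)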

5. Results

As a proof of principle of our proposal we first consider the synthetic scene in Fig. 5(a) and its corresponding multifocus stack shown in Figs. 5($b_{1,2}$). A possible stereo pair synthesized from this stack is shown in Figs. 5($c_{1,2}$).


Fig. 5. Multifocus stack and stereoscopic pairs (prepared for cross-eyed free viewing) showing depth perception flexibility. (a) 3D scene. (b1) and (b2) are the images of the multifocus stack when focusing at $z_1$ (house) and $z_2$ (tree), respectively. (c1) and (c2) are the right (R) and left (L) images of the stereoscopic pair, respectively (here both objects are perceived in front of the screen). (d1) and (d2) are the stereoscopic pair after parallax manipulation to set zero parallax for the house (the house is perceived on the screen while the tree is perceived behind it). (e1) and (e2) are the stereoscopic pair after parallax manipulation to set zero parallax for the tree (the tree is perceived on the screen while the house is perceived in front of it, closer to the observer).


Although our proposal is mainly aimed at AR devices working under a stereoscopic configuration, once the stereoscopic pair is generated by means of Eqs. (14)–(15) the left and right images can be displayed in different ways, such as cross-eyed viewing, parallax barriers, shutter glasses or polarized displays, among others [38]. We present here the right and left images of the stereo pair so that the fused image can be perceived by deliberately crossing one's eyes until the two images come together. For this stereo pair, both objects are perceived in front of the screen. We can then tune parallax by means of HIT to set zero parallax for a certain object of interest. Figures 5($d_{1,2}$) show the result of applying horizontal translation by means of Eqs. (26) and (27) for $z_{k*}=z_1$, where no disparity between left and right images can be observed for the house. As a result, the house is perceived on the screen ($P=0$) while the tree is perceived behind the screen ($P>0$). On the other hand, by setting $z_{k*}=z_2$ the tree can be perceived on the screen while the house is perceived in front of the screen ($P<0$). The VAC can then be alleviated for an object of interest, which can be perceived on the screen by adequate parallax adjustment.

In a second series of experiments we consider the multifocus sensing of a scene close to the AR device (Fig. 6(a)) consisting of three numbers at $5$, $10$, and $12$cm and a background at $16$cm from the sensing device. The optical imaging system for multifocus sensing consists of a doublet formed by an Electrically Focus-Tunable Lens (Optotune EL-10-30) directly attached in front of the fixed focal length lens (F 1.6, focal length $6.0$mm) of a camera (Basler series, CMOS sensor $1920\times 1080$px) focusing at infinity. Figures 6($b_{1-4}$) show the captured images after registration for the above-mentioned distances. Stereoscopic pairs as obtained from Eqs. (14)–(15) are shown in Fig. 7 for baselines $B=0.2$, $0.4$ and $0.6$mm and can be visualized following the cross-eye method.


Fig. 6. (a) Scene to be visualized. ($b_{1-4}$) Registered images with the system focusing at $z_1=5$cm, $z_2=10$cm, $z_3=12$cm and $z_4=16$cm, respectively.



Fig. 7. Stereo pairs synthesized from the stack in Fig. 6 for $B=0.2$mm (top), $0.4$mm (middle) and $0.6$mm (bottom), presented for cross-eye viewing with increasing negative parallax. Right column: depth maps estimated from each stereo pair by optical flow [39] using DMAG9b software (lighter areas correspond to objects closer to the observer, while darker areas correspond to objects further away).


It can be clearly seen that the synthesis of new points of view for each image (left and right) of the pair allows different depth cues, for example occlusion, to vary from one stereo pair to another. Besides, flexible parallax also allows for visualization of the scene with a tunable depth range (as reflected in the differences between the depth maps shown in the right column of Fig. 7) while preserving its structure.

6. Conclusion

We have shown through a series of proof-of-concept experiments that a multifocus stack, which can be obtained from the central point of view of a focus-tunable, finite-aperture camera, allows for the synthesis of stereoscopic pairs with different parallaxes by means of an algorithm operating in the Fourier domain of the stack. Perceived depth as well as convergence are then tuned by synthesizing the adequate pair of images with the desired parallax.

Besides, by selecting the appropriate perceived depth for a given object so that it matches the original object in the real world, superimposition of depth cues in augmented reality devices might also be achieved, with the consequent enhancement of visualization, in particular for low vision users (e.g., by increasing intensity or contrast in the AR device).

The proposed method could be useful to alleviate visual fatigue due to the accommodation-convergence conflict in stereo devices. Other possible applications could concern binocular treatment of the amblyopic eye (lazy eye), strabismus, etc., in therapy devices.

As a future line of work, the ability to adjust the parallax values for depth perception flexibility from a multifocus stack described here can be generalized to reconstruct arrays of multiview images with both horizontal and vertical parallax, allowing the method to be extended to autostereoscopic displays, for example see-through integral imaging displays [8], which can have interesting applications in many areas of research from augmented reality to 3D visualization in microscopy.

Acknowledgments

J. R. Alonso and A. Fernández acknowledge support by Comisión Sectorial de Investigación Científica (Uruguay). Portions of this work were presented at 3D Image Acquisition and Display: Technology, Perception and Applications in 2022 [40] and 2023 [41].

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. C. Wheatstone, “XVIII. Contributions to the physiology of vision. Part the first. On some remarkable, and hitherto unobserved, phenomena of binocular vision,” Philosophical Transactions of the Royal Society of London, pp. 371–394 (1838).

2. G. Lippmann, “Epreuves reversibles donnant la sensation du relief,” J. Phys. Theor. Appl. 7(1), 821–825 (1908).

3. H. Navarro, R. Martínez-Cuenca, G. Saavedra, et al., “3D integral imaging display by smart pseudoscopic-to-orthoscopic conversion (SPOC),” Opt. Express 18(25), 25573–25583 (2010).

4. M. Levoy and P. Hanrahan, “Light field rendering,” in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (1996), pp. 31–42.

5. M. Martínez-Corral and B. Javidi, “Fundamentals of 3D imaging and displays: a tutorial on integral imaging, light-field, and plenoptic systems,” Adv. Opt. Photonics 10(3), 512–566 (2018).

6. J. R. Alonso, A. Fernández, and J. A. Ferrari, “Reconstruction of perspective shifts and refocusing of a three-dimensional scene from a multi-focus image stack,” Appl. Opt. 55(9), 2380–2386 (2016).

7. A. Kubota, K. Kodama, D. Tamura, et al., “Filter bank for perfect reconstruction of light field from its focal stack,” IEICE Trans. Inf. & Syst. E106.D(10), 1650–1660 (2023).

8. H. Hua and B. Javidi, “A 3D integral imaging optical see-through head-mounted display,” Opt. Express 22(11), 13484–13491 (2014).

9. J. R. Alonso, A. Fernández, and B. Javidi, “Augmented reality three-dimensional visualization with multifocus sensing,” Opt. Continuum 1(2), 355–365 (2022).

10. L. Tian and L. Waller, “3D intensity and phase imaging from light field measurements in an LED array microscope,” Optica 2(2), 104–111 (2015).

11. J. R. Alonso, A. Silva, A. Fernández, et al., “Computational multifocus fluorescence microscopy for three-dimensional visualization of multicellular tumor spheroids,” J. Biomed. Opt. 27(06), 066501 (2022).

12. I. P. Howard and B. J. Rogers, Binocular Vision and Stereopsis (Oxford University Press, USA, 1995).

13. M. Lambooij, M. Fortuin, I. Heynderickx, et al., “Visual discomfort and visual fatigue of stereoscopic displays: a review,” J. Imaging Sci. Technol. 53(3), 30201-1–30201-14 (2009).

14. J. Schild, J. LaViola, and M. Masuch, “Understanding user experience in stereoscopic 3D games,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (ACM, 2012), pp. 89–98.

15. B. Mendiburu, 3D Movie Making: Stereoscopic Digital Cinema from Script to Screen (CRC Press, 2012).

16. Y. Fong, P. C. Giulianotti, J. Lewis, et al., Imaging and Visualization in The Modern Operating Room: A Comprehensive Guide for Physicians (Springer, 2015).

17. M. H. van Beurden, W. IJsselsteijn, and J. Juola, “Effectiveness of stereoscopic displays in medicine: a review,” 3D Res. 3(1), 3–13 (2012).

18. R. Ferdig, J. Blank, A. Kratcoski, et al., “Using stereoscopy to teach complex biological concepts,” Adv. Phys. Education 39(3), 205–208 (2015).

19. X. Shen and B. Javidi, “Large depth of focus dynamic micro integral imaging for optical see-through augmented reality display using a focus-tunable lens,” Appl. Opt. 57(7), B184–B189 (2018).

20. Z. He, X. Sui, G. Jin, et al., “Progress in virtual reality and augmented reality based on holographic display,” Appl. Opt. 58(5), A74–A81 (2019).

21. M. Billinghurst, A. Clark, and G. Lee, “A survey of augmented reality,” Found. Trends Human-Computer Interact. 8(2-3), 73–272 (2015).

22. D. Van Krevelen and R. Poelman, “A survey of augmented reality technologies, applications and limitations,” J. Virtual Reality 9(2), 1–20 (2010).

23. T. Sielhorst, M. Feuerstein, and N. Navab, “Advanced medical displays: a literature review of augmented reality,” J. Disp. Technol. 4(4), 451–467 (2008).

24. G. Aydındoğan, K. Kavaklı, A. Şahin, et al., “Applications of augmented reality in ophthalmology,” Biomed. Opt. Express 12(1), 511–538 (2021).

25. D. M. Levi, “Applications and implications for extended reality to improve binocular vision and stereopsis,” J. Vis. 23(1), 14 (2023).

26. V. Bansal, K. Balasubramanian, and P. Natarajan, “Obstacle avoidance using stereo vision and depth maps for visual aid devices,” SN Appl. Sci. 2(6), 1131 (2020).

27. A. N. Angelopoulos, H. Ameri, D. Mitra, et al., “Enhanced depth navigation through augmented reality depth mapping in patients with low vision,” Sci. Rep. 9(1), 11230 (2019).

28. D. R. Fox, A. Ahmadzada, C. T. Wang, et al., “Using augmented reality to cue obstacles for people with low vision,” Opt. Express 31(4), 6827–6848 (2023).

29. D. M. Hoffman, A. R. Girshick, K. Akeley, et al., “Vergence–accommodation conflicts hinder visual performance and cause visual fatigue,” J. Vis. 8(3), 33 (2008).

30. C. J. Lin and S. Canny, “Effects of virtual target size, position, and parallax on vergence-accommodation conflict as estimated by actual gaze,” Sci. Rep. 12(1), 20100 (2022).

31. H. G. Kim, M. Park, S. Lee, et al., “Visual comfort aware-reinforcement learning for depth adjustment of stereoscopic 3D images,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35 (2021), pp. 1762–1770.

32. J. R. Alonso, A. Fernández, G. A. Ayubi, et al., “All-in-focus image reconstruction under severe defocus,” Opt. Lett. 40(8), 1671–1674 (2015).

33. A. Ben-Israel and T. N. Greville, Generalized Inverses: Theory and Applications, vol. 15 (Springer Science & Business Media, 2003).

34. J. W. Goodman, Introduction to Fourier Optics (Roberts and Company Publishers, 1996).

35. M. D. Smith and B. T. Collar, “Perception of size and shape in stereoscopic 3D imagery,” in Stereoscopic Displays and Applications XXIII, vol. 8288 (SPIE, 2012), pp. 582–612.

36. D. K. Broberg, “Guidance for horizontal image translation (HIT) of high definition stereoscopic video production,” in IS&T/SPIE Electronic Imaging (International Society for Optics and Photonics, 2011), p. 78632F.

37. L. Lipton, Foundations of the Stereoscopic Cinema: A Study in Depth (Van Nostrand Reinhold, 1982).

38. K. Iizuka, Engineering Optics, vol. 35 (Springer Science & Business Media, 2013).

39. U. Capeto, “Depth map generation using optical flow,” (2017).

40. J. R. Alonso, “Manipulation of the parallax values for depth perception flexibility from multifocus stacks,” in 3D Image Acquisition and Display: Technology, Perception and Applications (Optica Publishing Group, 2022), pp. 3F3A–8.

41. J. R. Alonso, “Multi-focus imaging in fluorescence microscopy and augmented reality,” in 3D Image Acquisition and Display: Technology, Perception and Applications (Optica Publishing Group, 2023), pp. DTu5A–5.
