Abstract
We present an implantable metaverse featuring retinal prostheses in association with bionic vision processing. Unlike conventional retinal prostheses, whose electrodes are spaced equidistantly, our solution rearranges the electrodes to match the distribution of ganglion cells. To naturally imitate human vision, a scheme of bionic vision processing is developed. On top of a three-dimensional eye model, our bionic vision processing is able to visualize the monocular images, binocular image fusion, and parallax-induced depth map.
© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
1. Introduction
In his 1992 sci-fi novel Snow Crash [1], Neal Stephenson coined the term “metaverse”, which is now becoming a worldwide phenomenon. To realize a fully immersive metaverse, augmented/virtual reality (AR/VR) headsets are deemed indispensable hardware. Technically, AR/VR headsets are wearable devices. Depending on the form factor, AR/VR headsets may adopt different optical architectures or near-eye displays (NEDs). The most successful architecture is arguably the magnifier-based NED [2], which is suitable for helmet-like headsets, e.g., Oculus Rift (Meta/Facebook). For eyeglasses-like headsets, waveguide-based NEDs [3–11] are the most promising architecture, e.g., HoloLens (Microsoft). Alternatively, some NEDs can get even closer to the eyes by building all or part of the device on top of contact lenses [12–14].
Following in the footsteps of Neal Stephenson, whose version of the metaverse is still set in the 21st century, let us move the timeline a bit further, to a time when a cyborg is no longer science fiction. Futuristic or scary as it may sound, this is happening now. In the dictionary, a cyborg is usually defined as a being with both organic and biomechatronic or bionic body parts [15]. By this definition, someone with an artificial cochlea or a cardiac pacemaker is more or less a cyborg. As our life expectancy keeps increasing, it might not be a bad idea to replace dysfunctional body parts with artificial ones. Inspired by this scenario, we hereby introduce an implantable metaverse/AR/VR, which consists of two major components: retinal prostheses and bionic vision processing.
2. Retinal prostheses
Retinal prostheses, also known as bionic eyes, are implantable electronic devices that can restore the sensation of vision for individuals with retinal diseases such as retinitis pigmentosa or age-related macular degeneration [16]. To compensate for the loss of photoreceptors in the outer layer of the retina, i.e., rod cells (rods) and cone cells (cones), retinal prostheses employ arrays of microelectrodes or photodiodes to electrically stimulate the remaining cells. As the retina is composed of as many as 10 distinct layers [17], retinal prostheses can be mounted at different locations. In most clinical trials, retinal prostheses are placed epiretinally (on the retina) or subretinally (behind the retina) [18]. We prefer the former for two main reasons. First, epiretinal placement allows the prosthesis to bypass all other retinal layers and directly stimulate the ganglion cells. Second, the number of ganglion cells or ganglions (0.7 to 1.5 million) is far smaller than that of both rods (92 million) and cones (4.6 million) [19]. Similar to rods and cones, ganglions are unevenly distributed throughout the retina. In the fovea, the ratio of photoreceptors to ganglions is as small as 5; in the periphery, this ratio can go up to hundreds. By leveraging this property, we arrange the electrodes to proportionally match the density of ganglions, forming a foveated pattern, as shown in Fig. 1. As opposed to photoreceptor-based foveation techniques [20–22], ganglion-based foveation can significantly reduce the number of pixels, translating into fewer electrodes and lower power consumption.
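As a rough illustration of the ganglion-based foveation above, the sketch below lays out electrodes on concentric rings whose local density follows an assumed exponential falloff from a ~90x foveal-to-peripheral ratio. The ring count, maximum radius, and falloff shape are illustrative assumptions, not design values from this work.

```python
import numpy as np

def foveated_electrode_layout(n_rings=8, r_max=20.0, density_ratio=90.0):
    """Sketch of a ganglion-matched electrode layout: concentric rings whose
    local electrode density tracks an assumed exponential falloff from
    density_ratio (fovea) down to ~1 (periphery). All defaults here are
    illustrative, not taken from the paper."""
    def rel_density(r):
        # Assumed smooth falloff; the text cites a ~90x fovea-to-periphery ratio
        return 1.0 + (density_ratio - 1.0) * np.exp(-4.0 * r / r_max)
    ring_r = np.linspace(r_max / n_rings, r_max, n_rings)
    # Electrodes per ring ~ circumference x linear density (sqrt of areal density)
    counts = np.maximum(6, np.round(
        2.0 * np.pi * ring_r * np.sqrt(rel_density(ring_r)))).astype(int)
    return list(zip(ring_r.tolist(), counts.tolist()))
```

Any realistic layout would instead sample the fitted ganglion distribution function [30] directly; this sketch only shows the spacing-versus-density bookkeeping.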
3. Bionic vision processing
3.1 First principle
To explain how human vision works, a diagram of the simplified visual pathway [23] is drawn in Fig. 2. The whole process of vision can be condensed into four steps. Step 1. Formation of monocular images of both left and right visual fields or fields of vision (FOVs). As distinguished by blue and red colors, the left/right half of the left/right FOV is projected onto the right/left hemiretina of the left/right eye. Step 2. Transmission of visual information or images via the optic nerve, optic chiasm, optic tract, lateral geniculate nucleus, and optic radiation. Step 3. Fusion of monocular images into a binocular image on the visual cortex. Step 4. Derivation of the depth of the binocular field, i.e., the overlap of the left and right FOVs. The purpose of our bionic vision processing is to emulate and visualize the above steps. In order for computer-generated images to be correctly, or better, interpreted by the brain, it is necessary to process the images in a way analogous to the innate visual processing. Otherwise, physiological rejection, or VR sickness [24], may occur. Motivated by Dai Vernon [25] and his quote “What the eyes see, the heart must believe.”, the key to bionic vision processing is to “fool” the brain. To meet this principle, the images must be natural.
3.2 Image plane adjustment
For human vision, consider an object defined by a plane ABCD; its image projected onto the retina will be a spherical surface A'B'C'D', as shown in Fig. 3, where the eyeball is approximated as a sphere, and, for the sake of symmetry, the object, image, pupil, and fovea are center aligned. Note that the orientation of the image is opposite to that of the object. For bionic vision, as all optical components preceding the retinal prosthesis are bypassed, both the object and the image need to be generated by the computer. In order for the computer to simulate the imaging more efficiently, a three-dimensional (3D) object is first projected onto the plane ABCD, and then the image plane is shifted from the surface A'B'C'D' to the plane A''B''C''D'', which is tangent to the eyeball. Otherwise, the region of the surface A'B'C'D' would vary with the object distance.
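The image-plane shift can be illustrated with a minimal pinhole projection onto the tangent plane, assuming the center-aligned geometry of Fig. 3 and taking the 24 mm eye diameter as the pupil-to-plane distance (both assumptions for illustration only):

```python
def project_to_tangent_plane(x, y, z, eye_diameter=24.0):
    """Pinhole projection of an object-plane point (x, y) at distance z in
    front of the pupil onto the plane tangent to the back of the eyeball.
    The pupil is the centre of projection; the image is inverted, so both
    coordinates flip sign. eye_diameter is an assumed stand-in for the
    pupil-to-plane distance, in mm."""
    scale = eye_diameter / z
    return (-x * scale, -y * scale)
```

For example, a point 100 mm off-axis at 1 m distance lands 2.4 mm off-axis on the tangent plane, on the opposite side.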
3.3 Field of vision
As shown in Fig. 4, where the angle is measured in degrees, in the horizontal direction, the monocular FOV extends to 60° nasally (towards the nose) and to 100° temporally (towards the temple). In the vertical direction, the monocular FOV extends to 60° superiorly (towards the forehead) and to 75° inferiorly (towards the chin) [26–28]. The binocular FOV, i.e., the overlap between the left and right monocular FOVs, is 120° (horizontal) by 134° (vertical). The blind spot or scotoma, 5.5° (horizontal) by 7.5° (vertical), is located 12–15° temporally and 1.5° inferiorly [29]. By transforming the FOV from polar coordinates to image coordinates, as shown in Fig. 5, masks for image manipulations can be obtained.
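As a sketch of such a polar-to-image-coordinate mask, the snippet below rasterizes one monocular FOV as a quadrant-wise ellipse from the angular extents above and carves out the blind spot. The pixel sampling rate and the elliptical boundary shape are assumptions for illustration:

```python
import numpy as np

def monocular_fov_mask(w, h, eye="left", deg_per_px=0.5):
    """Boolean visual-field mask for one eye in image coordinates: a
    quadrant-wise ellipse built from the angular extents in the text
    (60 deg nasal, 100 deg temporal, 60 deg superior, 75 deg inferior),
    minus the blind spot (5.5 x 7.5 deg at ~13.5 deg temporal, 1.5 deg
    inferior)."""
    xs = (np.arange(w) - w / 2) * deg_per_px      # + = viewer's right
    ys = (np.arange(h) - h / 2) * deg_per_px      # + = down (inferior)
    ax, ay = np.meshgrid(xs, ys)
    temporal = ax < 0 if eye == "left" else ax > 0
    ext_h = np.where(temporal, 100.0, 60.0)       # horizontal half-extents
    ext_v = np.where(ay > 0, 75.0, 60.0)          # inferior vs superior
    mask = (ax / ext_h) ** 2 + (ay / ext_v) ** 2 <= 1.0
    bx = -13.5 if eye == "left" else 13.5         # blind-spot centre, temporal side
    blind = ((ax - bx) / 2.75) ** 2 + ((ay - 1.5) / 3.75) ** 2 <= 1.0
    return mask & ~blind
```

The binocular mask is then simply the logical AND of the left-eye and right-eye masks.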
3.4 Foveated blurring
To demonstrate foveated blurring with respect to the densities of ganglions, different kernel sizes for convolutions are adopted, as listed in Table 1. According to the fitted ganglion distribution function [30], the average ganglion density at the fovea is about 90 times higher than that at the periphery. Pursuant to the rule that the kernel size is inversely proportional to the ganglion density, the foveal region uses a 1 × 1 convolution kernel, i.e., the original pixels are kept intact, while the outermost region uses a 9 × 9 convolution kernel, i.e., each pixel is replaced by the mean of a 9 × 9 neighborhood. Figure 6 shows the image with four regions blurred with 1 × 1, 3 × 3, 5 × 5, and 9 × 9 convolution kernels, respectively.
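The region-wise blurring can be sketched with plain NumPy box filters, assuming circular regions around the fovea (the region radii here are arbitrary placeholders):

```python
import numpy as np

def box_blur(img, k):
    """Mean-filter a 2D image with a k x k kernel (k odd), edge-padded."""
    if k == 1:
        return img.copy()
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k))
    return windows.mean(axis=(-2, -1))

def foveated_blur(img, center, radii, kernels):
    """Blur concentric regions around 'center' with growing kernel sizes.
    'radii' are the outer radii of the inner regions (placeholder values
    in practice); 'kernels' are the kernel sizes from fovea outwards,
    e.g. [1, 3, 5, 9] as in Table 1."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - center[0], xx - center[1])
    out = box_blur(img, kernels[-1])                    # outermost region
    for radius, k in zip(radii[::-1], kernels[-2::-1]):  # overwrite inwards
        out = np.where(r <= radius, box_blur(img, k), out)
    return out
```

A production version would blend region boundaries to avoid visible seams; this sketch keeps the regions hard-edged, as in Fig. 6.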
3.5 Spherical distortion
When projecting the retinal image to the target imaging plane, the FOV obviously cannot approach 180°, which would make the image infinitely large. To avoid this situation, the maximum horizontal/vertical FOV is limited to 164°, which is wide enough to cover the entire FOV of the eye. From the geometries shown in Fig. 7 and Fig. 8, the coordinate transformation can be easily calculated. As shown in Fig. 7, A is the pupil, O is the center of the eye, and B is the fovea. Assume that a point C' lies on the eyeball and that C is its corresponding point in the horizontal plane, and let DC and C'F be perpendicular to the yBz plane; then C' has coordinates (|C'F|, |GF|, |BG|) and C has coordinates (|CD|, |BD|, 0). For convenience, we write the coordinates of C' as (x, y, z), and let u = |CD| and v = |BD|, so that the coordinates of C can be converted to C(u, v) in the UV coordinate system. It should be emphasized that each coordinate component takes a negative sign when it lies on the negative half of its axis. From the geometric relationship, we could have
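Although the exact transformation depends on the labeled geometry of Fig. 7, the core sphere-to-plane mapping can be sketched numerically, assuming a central projection from the eye center O onto the plane tangent at the fovea B, a geometry consistent with the observation that a 180° FOV would map to an infinitely large image:

```python
import numpy as np

def sphere_to_plane(points, radius=12.0):
    """Central projection of points on the eyeball surface onto the plane
    tangent at the fovea B. Coordinates are assumed with the eye centre O
    at the origin and B at (0, 0, -radius); each ray from O through a
    surface point is scaled until it meets the plane z = -radius. As a
    point approaches 90 deg off-axis (z -> 0), its image runs to infinity,
    hence the 164 deg cap in the text."""
    p = np.asarray(points, dtype=float)
    scale = -radius / p[..., 2]                 # z must be negative (front half)
    return np.stack([p[..., 0] * scale, p[..., 1] * scale], axis=-1)
```

At 45° off-axis the mapped radius equals the eyeball radius times tan 45°, i.e., one radius from B.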
and where FOVh and FOVv represent the maximum horizontal and vertical FOVs of the image, respectively. In Python, PyVista is employed to build a 3D eyeball model and to accurately project the planar or flat image onto its spherical surface.

3.6 Visual field transition
Revisiting the visual pathway, as shown in Fig. 2, we shall mainly focus on how the visual fields or images evolve from the retina to the cortex. On the retina, the image is reversed to be opposite to its original orientation, i.e., upside down and left-to-right. In addition, the monocular visual field is split into two hemifields, which are separated by the blind spot. On the cortex, the image is re-reversed back to its original orientation, which is what we actually perceive. Interestingly, since the retina of each eye is connected to both the left and right cerebral hemispheres, there are two types of so-called binocular images. Type A refers to the unilateral binocular image, which is handled by one hemisphere and comprises the two hemifields of the same side (left or right) of both eyes. Type B refers to the bilateral binocular image, which is coordinated by both hemispheres and comprises the two full monocular visual fields of both eyes. Although the two types of binocular images coexist inside the brain, we can only see the latter unless the visual pathway to one hemisphere is completely damaged [23].
3.7 Binocular disparity
The depth perception of human vision can be attributed to a variety of monocular and binocular cues [31]. Among the binocular cues, the binocular disparity or parallax is supposed to be the most decisive one. The depth induced by the binocular disparity can be deduced from the epipolar geometry [32], as shown in Fig. 9, where P denotes the point of interest, OL the center of the left retinal plane, and OR the center of the right retinal plane. When the left and right retinal planes are rectified, the depth d of the point P can be written as

d = bf/(x1 − x2)

where b is the center distance between the left and right retinal planes, f is the focal length of the eye, and x1 and x2 are the x-coordinates of the projections of P onto the left and right retinal planes, respectively. In the case of parallel eyes or lines of sight, b becomes the interpupillary distance and f the diameter of the eyeball.

4. Results and discussion
4.1 Input images
To create input images for the retinal prostheses, our bionic vision processing is developed with Unity (Unity Software Inc.). Two physical cameras are used to mimic the eyes viewing distant objects. The focal length, i.e., the distance between the camera lens and the sensor, is 24 mm, which approximates the diameter of the eyeball [33]. The sensor size is 179 mm (width) by 179 mm (height). Hence, the camera's FOV is 150° (horizontal) by 150° (vertical). Considering the angular span of the monocular FOV as discussed earlier, the bisector of the camera's horizontal FOV shall be roughly center-aligned with the blind spot. The distance between the two cameras is set as 63 mm, on par with the average interpupillary distance of an adult [34]. In analogy to the eyes, the Unity cameras capture the images in 3D scenes and flatten them for display, as shown in Fig. 10.
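The quoted 150° FOV follows from the pinhole relation between sensor size and focal length; a quick check:

```python
import math

def camera_fov_deg(sensor_mm, focal_mm):
    """Pinhole-camera angular field of view: 2 * atan(sensor / (2 * focal))."""
    return math.degrees(2.0 * math.atan(sensor_mm / (2.0 * focal_mm)))

# 179 mm sensor with a 24 mm focal length gives ~150 degrees
print(round(camera_fov_deg(179.0, 24.0)))  # -> 150
```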
4.2 3D eye model
On the spherical surface of a 3D eye model are rendered the retinal images, which factor in the monocular FOV, foveated blurring, and spherical distortion. Since the two eyes are modeled identically, only the left eye is shown in Fig. 11, where the dark spot and the empty area of the eye model represent the blind spot and the pupil, respectively. Compared to traditional eye models [34], this 3D eye model is rotatable and scalable for a 360° view of the retinal image (see Visualization 1).
4.3 Binocular image fusion
As shown in Fig. 12, the unilateral image fusion is performed by splitting the monocular images into the left and right hemiretinal images according to the location of blind spot (Fig. 12(a) and Fig. 12(b)), rectifying hemiretinal images (Fig. 12(c) and Fig. 12(d)), and merging the hemiretinal images for each side of hemisphere (Fig. 12(e) and Fig. 12(f)). As shown in Fig. 13, the bilateral image fusion is achieved by merging the above two unilateral images with one-sided hemifields into one single image. For the overlap of left and right visual fields, the ocular-dominance-based image fusion can be applied, which is done by overriding the binocular overlap by the left/right-eye image. Incidentally, approximately 70% of the population are right-eye dominant and 29% left-eye dominant [35]. That being said, ocular dominance may vary with the viewing or gaze direction due to image size changes on the retinas.
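The hemifield splitting and the ocular-dominance-based override described above reduce to array slicing and a masked selection; a minimal sketch (the split column and the overlap mask are assumed inputs):

```python
import numpy as np

def split_hemifields(retinal_img, blind_spot_col):
    """Split a monocular retinal image into two hemiretinal images at the
    blind-spot column (an assumed split line), as in Figs. 12(a)-12(b)."""
    return retinal_img[:, :blind_spot_col], retinal_img[:, blind_spot_col:]

def dominance_override(fused_img, dominant_eye_img, overlap_mask):
    """Ocular-dominance-based fusion: within the binocular overlap, the
    dominant eye's pixels override the fused image; elsewhere the fused
    image is left untouched. All inputs are assumed pre-rectified and
    pixel-aligned."""
    return np.where(overlap_mask, dominant_eye_img, fused_img)
```

With a right-dominant viewer (the ~70% majority case cited above), `dominant_eye_img` would be the rectified right-eye image.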
4.4 Parallax-induced depth map
Since depth perception is color independent and generated in the cortex, the calculation of depth is carried out with monochromatic upright images. Firstly, both the left-eye (Fig. 14(a)) and right-eye (Fig. 14(b)) images are rectified with the epipolar corrections. Secondly, the binocular overlap of the rectified left-eye (Fig. 14(c)) and rectified right-eye (Fig. 14(d)) images is obtained, as shown in Fig. 14(e). Finally, for the case of parallel eyes when viewing distant objects, Fig. 14(f) shows the parallax-induced depth map, which is based on the foregoing binocular disparity. It should be mentioned that the false matching points in our results could be reduced not only by optimizing the matching algorithms, but also by tweaking the Unity cameras for more realistic images.
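Given a disparity map, the parallax-induced depth follows directly from the rectified-stereo relation of Section 3.7; a minimal sketch (the pixel-unit focal length is an assumed parameter):

```python
import numpy as np

def depth_from_disparity(disparity_px, baseline_mm=63.0, focal_px=1000.0):
    """Depth from binocular disparity, d = b * f / (x1 - x2): baseline_mm is
    the interpupillary distance used in the text; focal_px is an assumed
    focal length in pixel units. Zero or negative disparity (unmatched or
    distant points) maps to infinite depth."""
    disp = np.asarray(disparity_px, dtype=float)
    depth = np.full(disp.shape, np.inf)
    valid = disp > 0
    depth[valid] = baseline_mm * focal_px / disp[valid]
    return depth
```

In practice the disparity map itself would come from a stereo-matching step, which is where the false matches mentioned above originate.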
5. Conclusions
A design of retinal prostheses in conjunction with a bionic vision processing scheme has been conceptually studied. The main contributions of this work are summarized as follows. Contribution 1. A paradigm shift in the field of metaverse/AR/VR, ushering in an implantable device. Contribution 2. Ganglion-based foveation by patterning the electrodes with respect to the density of ganglions rather than photoreceptors. Contribution 3. Bionic vision processing, which is capable of visualizing the spherical distortion, foveated blurring, binocular image fusion, and parallax-induced depth. Contribution 4. An interactive 3D eye model with the retinal images rendered on the eyeball. Admittedly, retinal prostheses and other implantable devices might not be popularized in our time, but we believe this technology will become a reality someday.
Funding
National Natural Science Foundation of China (61901264, 61831015); Science and Technology Commission of Shanghai Municipality (19ZR1427200); Natural Science Foundation of Chongqing (cstc2021jcyj-msxmX1136).
Disclosures
The authors declare no conflicts of interest.
Data availability
Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.
References
1. N. Stephenson, Snow Crash (Bantam Books, 1992).
2. J. E. Melzer and K. Moffitt, Head-Mounted Displays: Designing for the User (McGraw-Hill, 1997).
3. T. Levola, “Diffractive optics for virtual reality displays,” J. Soc. Inf. Disp. 14(5), 467–475 (2006). [CrossRef]
4. H. Mukawa, K. Akutsu, I. Matsumura, S. Nakano, T. Yoshida, M. Kuwahara, and K. Aiki, “A full-color eyewear display using planar waveguides with reflection volume holograms,” J. Soc. Inf. Disp. 17(3), 185–193 (2009). [CrossRef]
5. T. Vallius and J. Tervo, “Waveguides with extended field of view,” US Patent 9,791,703 B1 (2016).
6. Y. Wu, C. P. Chen, L. Zhou, Y. Li, B. Yu, and H. Jin, “Design of see-through near-eye display for presbyopia,” Opt. Express 25(8), 8937–8949 (2017). [CrossRef]
7. D. Grey and S. Talukdar, “Exit pupil expanding diffractive optical waveguiding device,” US Patent 10,359,635 B2 (2018).
8. W. Zhang, C. P. Chen, H. Ding, L. Mi, J. Chen, Y. Liu, and C. Zhu, “See-through near-eye display with built-in prescription and two-dimensional exit pupil expansion,” Appl. Sci. 10(11), 3901 (2020). [CrossRef]
9. C. P. Chen, L. Mi, W. Zhang, J. Ye, and G. Li, “Waveguide-based near-eye display with dual-channel exit pupil expander,” Displays 67, 101998 (2021). [CrossRef]
10. C. P. Chen, L. Mi, W. Zhang, J. Ye, and G. Li, “Wide-field-of-view near-eye display with dual-channel waveguide,” Photonics 8(12), 557 (2021). [CrossRef]
11. C. P. Chen, Y. Cui, Y. Chen, S. Meng, Y. Sun, C. Mao, and Q. Chu, “Near-eye display with a triple-channel waveguide for metaverse,” Opt. Express 30(17), 31256–31266 (2022). [CrossRef]
12. Y. Wu, C. P. Chen, L. Mi, W. Zhang, J. Zhao, Y. Lu, W. Guo, B. Yu, Y. Li, and N. Maitlo, “Design of retinal-projection-based near-eye display with contact lens,” Opt. Express 26(9), 11553–11567 (2018). [CrossRef]
13. S. Lan, X. Zhang, M. Taghinejad, S. Rodrigues, K.-T. Lee, Z. Liu, and W. Cai, “Metasurfaces for near-eye augmented reality,” ACS Photonics 6(4), 864–870 (2019). [CrossRef]
14. J. Chen, L. Mi, C. P. Chen, H. Liu, J. Jiang, and W. Zhang, “Design of foveated contact lens display for augmented reality,” Opt. Express 27(26), 38204–38219 (2019). [CrossRef]
15. Wikipedia, “Cyborg,” https://en.wikipedia.org/wiki/Cyborg.
16. A. Makawi, V. A. Shah, P. H. Tang, L. A. Kim, J. D. Akkara, and S. R. Montezuma, “Retina prosthesis,” https://eyewiki.aao.org/Retina_Prosthesis.
17. Wikipedia, “Retina,” https://en.wikipedia.org/wiki/Retina.
18. Wikipedia, “Retinal implant,” https://en.wikipedia.org/wiki/Retinal_implant.
19. Wikipedia, “Retinal ganglion cell,” https://en.wikipedia.org/wiki/Retinal_ganglion_cell.
20. L. Mi, C. P. Chen, Y. Lu, W. Zhang, J. Chen, and N. Maitlo, “Design of lensless retinal scanning display with diffractive optical element,” Opt. Express 27(15), 20493–20507 (2019). [CrossRef]
21. J. Chen, L. Mi, C. P. Chen, H. Liu, J. Jiang, W. Zhang, and Y. Liu, “A foveated contact lens display for augmented reality,” Proc. SPIE 11310, 1131004 (2020). [CrossRef]
22. P. Chakravarthula, Z. Zhang, O. Tursun, P. Didyk, Q. Sun, and H. Fuchs, “Gaze-contingent retinal speckle suppression for perceptually-matched foveated holographic displays,” IEEE Trans. Vis. Comput. Graph. 27(11), 4194–4203 (2021). [CrossRef]
23. V. Dragoi and C. Tsuchitani, “Chapter 15: visual processing: cortical pathways,” https://nba.uth.tmc.edu/neuroscience/s2/chapter15.html.
24. G. Li, M. McGill, S. Brewster, C. P. Chen, J. A. Anguera, A. Gazzaley, and F. Pollick, “Multimodal biosensing for vestibular network-based cybersickness detection,” IEEE J. Biomed. Health Inform. 26(6), 2469–2480 (2022). [CrossRef]
25. Wikipedia, “Dai Vernon,” https://en.wikipedia.org/wiki/Dai_Vernon.
26. S. Bhartiya, M. Ariga, G. V. Puthuran, and R. George, Practical Perimetry (JP Medical, 2016), Chap. 1.
27. Wikipedia, “Field of view,” https://en.wikipedia.org/wiki/Field_of_view.
28. Wikipedia, “Visual field,” https://en.wikipedia.org/wiki/Visual_field.
29. Wikipedia, “Blind spot (vision),” https://en.wikipedia.org/wiki/Blind_spot_(vision).
30. A. B. Watson, “A formula for human retinal ganglion cell receptive field density as a function of visual field location,” J. Vision 14(7), 15 (2014). [CrossRef]
31. Wikipedia, “Depth perception,” https://en.wikipedia.org/wiki/Depth_perception.
32. Wikipedia, “Epipolar geometry,” https://en.wikipedia.org/wiki/Epipolar_geometry.
33. Wikipedia, “Human eye,” https://en.wikipedia.org/wiki/Human_eye.
34. M. Bass, C. DeCusatis, J. Enoch, V. Lakshminarayanan, G. Li, C. MacDonald, V. Mahajan, and E. Van Stryland, Handbook of Optics Volume III: Vision and Vision Optics (3rd ed.) (McGraw-Hill, 2009).
35. Wikipedia, “Ocular dominance,” https://en.wikipedia.org/wiki/Ocular_dominance.