
Depth of field expansion method for integral imaging based on diffractive optical element and CNN

Open Access

Abstract

In lens-based display systems, lens aberrations and the depth of field (DoF) limitation often lead to blurring and distortion of reconstructed images. Meanwhile, expanding the display DoF entails a trade-off between lateral resolution and axial resolution, restricting the realization of high-resolution, large-DoF three-dimensional (3D) displays. To overcome these constraints and enhance the DoF and resolution of reconstructed scenes, we propose a DoF expansion method based on diffractive optical element (DOE) optimization and image pre-correction through a convolutional neural network (CNN). The method replaces the conventional lens with a DOE and optimizes the DOE phase distribution using the Adam algorithm, achieving a depth-invariant and concentrated point spread function (PSF) distribution throughout the entire DoF range. Simultaneously, we use a CNN to pre-correct the original images and compensate for the image quality reduction introduced by the DOE. Applying the proposed method to a practical integral imaging system, we extend the DoF of the DOE to 400 mm, leading to a high-resolution 3D display in multiple depth planes. Numerical simulations and optical experiments validate the effectiveness and practicality of the proposed method.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Three-dimensional (3D) display technology includes holography [1,2], volumetric 3D display [3], and light field 3D display [4,5]. Integral imaging is a light field display technique that achieves full-parallax naked-eye 3D display through a micro-lens array. As illustrated in Fig. 1, multi-view stereoscopic display through integral imaging requires two essential steps: the acquisition and the reconstruction of elemental images [6,7]. When recording the elemental images, the image of the 3D scene observed at each micro-lens position is captured, and these images are synthesized into an elemental image array that contains information about different views of the 3D scene. During the subsequent reconstruction process, the elemental image array is loaded onto the display screen; according to the principle of the reversible optical path, the observer can view the real 3D scene from different angles through the micro-lens array, leading to an immersive and multi-view stereoscopic experience [8].

Fig. 1. Integral imaging elemental images acquisition and reconstruction process.

The limitation of lens imaging performance is one of the main obstacles to the widespread application of lens-based display systems, including lens-based integral imaging systems [9]. Such systems struggle to achieve clear 3D reconstructed images across a large depth of field (DoF) range, as depicted in Fig. 1. Although a lens-based integral imaging system can image clearly at the central depth plane (CDP), aberrations grow as the imaging position defocuses from the CDP, resulting in image blurring and distortion that limit the DoF of the system and reduce the imaging resolution. Overcoming these challenges is crucial and represents an important research area.

Improving the DoF of integral imaging has attracted considerable interest in academic research. Many works have expanded the DoF by constructing multiple CDPs through optical path control or multi-layer display modules [10–12]. Replacing micro-lens arrays with multifocal imaging elements, such as bifocal liquid crystal lenses, bifocal holographic optical elements, zoom liquid lenses, and multifocal micro-lens arrays, has also been widely adopted to expand the DoF [13–19]. Such hardware optimization can improve the DoF, but it either requires spatiotemporal multiplexing or degrades imaging quality, leading to drawbacks such as resolution reduction and aliasing effects. Computational optimization algorithms have also been used extensively in the display field to improve imaging quality, and image processing of the elemental images, such as selective depth deconvolution filtering and multi-focus elemental image fusion, can likewise extend the DoF [20,21], but the improvement in DoF and image quality is limited. Notably, X. Yu et al. combined a convolutional neural network (CNN) to end-to-end optimize a diffractive optical element (DOE) and pre-filter the elemental images, expanding the DoF to 30 mm [22,23]. This approach combines the design of optical elements with image processing, offering a promising solution for high-resolution, large-DoF integral imaging display.

However, jointly optimizing optical elements and a CNN end to end can limit the optimization accuracy. On the one hand, when simulating the reconstructed image by calculating the point spread function (PSF), the sampling interval of the PSF needs to match the actual voxel size of the image plane. Display screens have pixel densities of approximately 200-800 ppi, with pixel sizes ranging from tens to hundreds of micrometers, and after magnification the voxel size becomes even larger. This poses a challenge to the optimization: a PSF sampled at the voxel size may be too sparse to precisely describe the imaging quality, while smaller sampling intervals can lead to overcorrection of the images. On the other hand, to simplify calculations, only the PSF in the central field of view (FOV) is used to simulate the reconstructed image and pre-correct the elemental image. The PSF in the edge FOV is deformed compared with the PSF in the central FOV, and, because of the system magnification, only the portion of the elemental image corresponding to the viewing angle is seen during display. Using the central-FOV PSF to pre-correct the entire elemental image therefore may not accurately represent the complete imaging characteristics of the system.

To overcome the above constraints and enhance both the DoF and the resolution of the reconstructed scene, we propose a high-accuracy DoF extension method for integral imaging. The method combines DOE phase distribution design using the Adam algorithm with image pre-correction through a CNN to achieve a large-DoF, high-resolution display. During the optimization of the DOE, a smaller sampling interval is used to calculate the PSF distribution of the DOE; together with an RMS radius merit function, this allows a more precise optimization of the DOE phase distribution, yielding a depth-invariant PSF and improved imaging quality within the depth range. When training the CNN parameters, we choose a sampling interval that matches the voxel size of the image plane and pre-correct the elemental image segments corresponding to different FOVs individually, thereby compensating for the image blur introduced by the DOE and elevating the image resolution. By incorporating a Feature loss into the training, the pre-corrected elemental images align more closely with the demands of the actual display, leading to enhanced visual quality.

2. Method

In this work, we propose a DoF expansion method that optimizes the DOE and pre-corrects the original images through the Adam algorithm and a CNN. As applied to integral imaging, the proposed method consists of two parts: the design of the DOE phase distribution based on the Adam algorithm and the pre-correction of the elemental images based on the CNN.

2.1 Algorithm framework

In the design of the DOE phase distribution, the thickness distribution determines the phase modulation, so the DOE thickness parameter represents the phase distribution and serves as the optimization variable. Through optimization, we obtain a DOE with a depth-invariant PSF and low sensitivity to defocus, achieving clear and uniform imaging over the entire DoF range. The optimization of the DOE phase distribution comprises several steps, as shown in Fig. 2. First, the PSF distribution of the DOE in the central FOV is calculated at different depth image planes; the original image is then convolved with the PSF to simulate the reconstructed images; finally, the loss between the reconstructed images and the original image is calculated, and the Adam algorithm updates the DOE phase distribution parameters according to the loss value.
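For concreteness, a minimal PyTorch sketch of this optimization loop is given below. It assumes a simplified single-FFT Fresnel forward model on a small 2D grid and an illustrative material index; the paper instead optimizes a 1D radial thickness profile and initializes the DOE as an 8 cm lens, and names such as simulate_psf are ours, not the authors' code.

import torch

torch.manual_seed(0)
wavelength = 532e-9                       # design wavelength [m]
n_refr = 1.46                             # assumed DOE material refractive index
k = 2 * torch.pi / wavelength
N, pitch = 256, 3.74e-6                   # reduced grid size; DOE feature size [m]
d0, depths = 0.1, [0.2, 0.4, 0.6]         # screen-DOE and DOE-image distances [m]

# optimization variable: DOE thickness map (2D here for brevity; the paper uses
# a 1D radial profile of a rotationally symmetric DOE)
thickness = torch.zeros(N, N, requires_grad=True)

x = (torch.arange(N) - N / 2) * pitch
X, Y = torch.meshgrid(x, x, indexing="xy")
r2 = X ** 2 + Y ** 2

def simulate_psf(thickness, d1):
    """Propagate an on-axis point source through the DOE to depth plane d1."""
    phi_in = k * torch.sqrt(r2 + d0 ** 2)          # incident spherical wave, Eq. (2)
    phi_doe = k * (n_refr - 1) * thickness         # DOE phase modulation, Eq. (3)
    field = torch.exp(1j * (phi_in + phi_doe + k / (2 * d1) * r2))
    u = torch.fft.fftshift(torch.fft.fft2(field))  # single-FFT Fresnel step, Eq. (5)
    psf = u.abs() ** 2                             # Eq. (6)
    return psf / psf.sum()

image = torch.rand(N, N)                           # stand-in training image
optimizer = torch.optim.Adam([thickness], lr=1e-8)

for step in range(100):
    optimizer.zero_grad()
    loss = 0.0
    for d1 in depths:                              # average the loss over depth planes
        psf = simulate_psf(thickness, d1)
        recon = torch.fft.ifft2(torch.fft.fft2(image) *
                                torch.fft.fft2(torch.fft.ifftshift(psf))).real
        loss = loss + (recon - image).abs().mean() # MAE between simulated and ideal
    (loss / len(depths)).backward()
    optimizer.step()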

Fig. 2. DOE phase distribution optimization process through Adam algorithm.

Although the optimized DOE enhances the imaging performance within the entire DoF compared with conventional lens-based imaging systems, some aberrations are difficult to eliminate and still degrade the imaging quality, so a gap remains between the reconstructed images and the ideal image. The elemental image array loaded on the screen is therefore pre-corrected through a CNN to improve the resolution and clarity of the integral imaging display. The pre-correction process is shown in Fig. 3. Based on the optimized DOE phase distribution, we first calculate its PSF distribution at different depth image planes and different FOVs, and then input the original image into the CNN to obtain the pre-corrected output image. The output image is divided into sub-regions corresponding to different FOVs and convolved with the corresponding PSFs to simulate the reconstructed images. The simulated images are then compared with the ideal image, and a loss function is calculated to quantify the discrepancies. Through iterative optimization, the CNN parameters are updated by minimizing the loss value. To some extent, the pre-correction compensates for the residual blur introduced by the DOE throughout the entire DoF range, improving the display resolution of the integral imaging system.

Fig. 3. Elemental image pre-correction process based on CNN.

2.2 DOE phase distribution optimization

To achieve a large-DoF display, we establish a rotationally symmetric DOE model as the imaging element. The methodology involves selecting multiple depth image planes within the DoF range, simulating the reconstructed images at these planes, calculating the average loss between the reconstructed images and the original image, and optimizing the DOE parameters according to this average loss. When simulating the reconstructed image, we calculate the PSF distribution of the DOE in the central FOV, which describes the response of the DOE to a point light source on the display screen. Convolving the original image with the PSF yields the reconstructed image at the depth image plane.

The amplitude of the spherical wave emitted by point light source $A$ on the screen and incident on the DOE can be expressed as

$$A(x_1,y_1)=\frac{A_0}{\sqrt{(x_1-x_0)^2+(y_1-y_0)^2+d_0^2}}$$
where $d_0$ is the distance from the screen to the DOE array, $d_1$ is the distance from the DOE array to the depth image plane, $(x_0,y_0)$ are the coordinates of point light source $A$ on the display screen, $A_0$ is the intensity of point light source $A$, $\varphi_0$ is its initial phase, and $(x_1,y_1)$ are the coordinates in the DOE plane.

With the wave number $k = \frac{2\pi}{\lambda}$, the phase of the incident spherical wave can be expressed as

$$\varphi(x_1,y_1)=k\sqrt{(x_1-x_0)^2+(y_1-y_0)^2+d_0^2}+\varphi_0$$

$D(x_1,y_1)$ is the amplitude distribution of the DOE, $h(x_1,y_1)$ is the thickness parameter of the DOE, and $n$ is the refractive index of the DOE material; the phase distribution of the DOE can be expressed as

$$\Phi(x_1,y_1)=k(n-1)\,h(x_1,y_1)$$

The complex amplitude distribution of the light field emerging from the DOE can be represented as

$$P(x_1,y_1)=D(x_1,y_1)\,A(x_1,y_1)\,\mathrm{e}^{\mathrm{i}[\Phi(x_1,y_1)+\varphi(x_1,y_1)]}$$

The spatial coordinates of the depth image plane are denoted as $(x,y)$, and the complex amplitude distribution of the emergent light field at the depth image plane after Fresnel diffraction is

$$U(x,y)=\frac{\mathrm{e}^{\mathrm{i}kd_1}}{\mathrm{i}\lambda d_1}\,\mathrm{e}^{\frac{\mathrm{i}k}{2d_1}(x^2+y^2)}\iint P(x_1,y_1)\,\mathrm{e}^{\frac{\mathrm{i}k}{2d_1}(x_1^2+y_1^2)}\,\mathrm{e}^{-\frac{\mathrm{i}k}{d_1}(x_1x+y_1y)}\,\mathrm{d}x_1\,\mathrm{d}y_1$$

The PSF at the depth image plane can be expressed as

$$PSF(x,y)=|U(x,y)|^2$$

Due to the rotational symmetry of the DOE model, the DOE phase distribution can be represented by a one-dimensional (1D) thickness profile along the DOE radius. During optimization, only the 1D parameters of the DOE need to be iterated, which greatly reduces the computational complexity. The 1D PSF distribution is calculated from the 1D DOE thickness profile, with each PSF represented by the 1D first-order Bessel function of the first kind [24]. The 1D PSF distribution is then converted into a two-dimensional PSF distribution by linear interpolation.
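A short sketch of this 1D-to-2D conversion by linear interpolation is shown below, assuming the 1D profile is sampled uniformly along the radius; the function name and the Gaussian test profile are illustrative.

import numpy as np

def radial_to_2d(psf_1d, sample_interval, out_size):
    """Expand a radial profile psf_1d[i] = PSF(i * sample_interval) into a 2D PSF."""
    r_axis = np.arange(len(psf_1d)) * sample_interval
    c = (out_size - 1) / 2.0
    yy, xx = np.meshgrid(np.arange(out_size), np.arange(out_size), indexing="ij")
    r = np.hypot(xx - c, yy - c) * sample_interval        # radius of every pixel
    psf_2d = np.interp(r, r_axis, psf_1d, right=0.0)      # linear interpolation
    return psf_2d / psf_2d.sum()

# example: a 512-sample radial profile mapped onto a 1023 x 1023 grid,
# matching the sizes quoted in Section 3.1
psf_1d = np.exp(-np.linspace(0.0, 8.0, 512) ** 2)
psf_2d = radial_to_2d(psf_1d, sample_interval=15.5e-6, out_size=1023)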

We set $l$ depth image planes within the entire DoF. The coordinates of the $i$-th $(i=1,2,\ldots,l)$ depth image plane are denoted as $(x_i,y_i)$, the corresponding PSF distribution as $PSF_i(x_i,y_i)$, and the intensity distribution of the elemental image as $I(x_i,y_i)$; the simulated reconstructed image intensity distribution $RI_i(x_i,y_i)$ can be represented as

$$RI_i(x_i,y_i)=I(x_i,y_i)\ast PSF_i(x_i,y_i)$$

The loss between each reconstructed image and the ideal image is calculated to update the DOE parameters. The choice of loss function is an important factor in Adam optimization, and different loss functions lead to different results. The most commonly used loss functions include MAE and MSE, which quantify the per-pixel error between two images. The structural similarity index (SSIM) compares the brightness, contrast, and structure of the two images; compared with per-pixel loss functions, it aligns better with human visual judgments of image similarity. The multi-scale structural similarity (MS-SSIM) computes the structural similarity at different image scales; building on SSIM, it accounts for factors such as viewing distance and pixel information density.

The above loss functions evaluate the quality of the reconstructed image. Notably, parallels exist between the design of a DOE and of conventional optical systems, so merit functions used in optical design can also be used to optimize the DOE parameters. In this work, we incorporate a loss function that computes the root mean square (RMS) radius of the PSF distribution. The RMS radius reflects the spot size and quantifies the dispersion of the PSF distribution. The PSF calculation yields a 1D distribution $PSF(r)$, where $r$ denotes the distance from the sampling point on the PSF radius to the reference center point; the RMS radius can be calculated as

$$RMS=\sqrt{\frac{\sum r^3\cdot PSF(r)}{\sum r\cdot PSF(r)}}$$

Incorporating the RMS radius as a loss function in the DOE phase optimization reduces stray light and leads to a more concentrated PSF distribution at the depth image plane.
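A minimal sketch of this RMS-radius loss for a 1D radial PSF is given below, written for PyTorch so it can be added directly to the Adam objective; the small constant in the denominator is our addition for numerical safety.

import torch

def rms_radius(psf_1d, sample_interval):
    """RMS radius of Eq. (8); psf_1d[i] is the PSF value at radius i * sample_interval."""
    r = torch.arange(psf_1d.shape[0], dtype=psf_1d.dtype) * sample_interval
    return torch.sqrt((r ** 3 * psf_1d).sum() / ((r * psf_1d).sum() + 1e-12))

# usage: penalize the spread of each depth plane's PSF during DOE optimization
loss_rms = rms_radius(torch.rand(512), sample_interval=15.5e-6)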

We optimize the DOE parameters using various combinations of these loss functions; numerical and experimental results for the different loss functions are given in Section 3.
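As an illustration of such a combination, the sketch below sums MAE, MS-SSIM, and RMS-radius terms with illustrative weights; it assumes the third-party pytorch_msssim package for MS-SSIM and batched (N, C, H, W) image tensors.

import torch
from pytorch_msssim import ms_ssim   # assumed third-party package for MS-SSIM

def combined_loss(recon, ideal, psf_1d, r, w_mae=1.0, w_msssim=1.0, w_rms=1.0):
    mae = (recon - ideal).abs().mean()
    msssim = 1.0 - ms_ssim(recon, ideal, data_range=1.0)   # 1 - similarity as a loss
    rms = torch.sqrt((r ** 3 * psf_1d).sum() / ((r * psf_1d).sum() + 1e-12))
    return w_mae * mae + w_msssim * msssim + w_rms * rms

# usage with placeholder tensors
recon, ideal = torch.rand(1, 3, 240, 240), torch.rand(1, 3, 240, 240)
psf_1d, r = torch.rand(512), torch.arange(512.0) * 15.5e-6
loss = combined_loss(recon, ideal, psf_1d, r)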

2.3 Elemental image pre-correction based on CNN

After the DOE optimization, the CNN is trained, based on the PSF distributions of the DOE at different depth image planes and different FOVs, to accurately pre-correct the elemental image and compensate for the aberrations introduced by the DOE over the entire DoF range.

Because the PSF in the edge FOV differs from that in the central FOV, precise pre-correction is applied to the elemental image regions belonging to different FOVs. When simulating the reconstructed image, the original image $I(x_0,y_0)$ is segmented into $m \times n$ chunked images $C_{jk}(x_0,y_0)$, where $j=1,2,\ldots,m$ and $k=1,2,\ldots,n$. We calculate the PSF distribution $PSF_{ijk}(x_i,y_i)$ of the spherical wave emitted by the point light source at the center of each chunked image as it propagates through the DOE to the $i$-th $(i=1,2,\ldots,l)$ depth image plane; the PSF formula is the same as Eq. (6) in Section 2.2. The difference is that the off-axis PSF distribution is no longer centrally symmetric, so we use the Fourier transform to calculate the 2D PSF directly. To align the PSF model with the integral imaging display, the PSF distribution is interpolated according to the voxel size of each depth image plane. We then convolve the chunked image $C_{jk}(x_0,y_0)$ with the PSF distribution $PSF_{ijk}(x_i,y_i)$ to obtain the reconstructed image of the segmented image at the $i$-th depth image plane

$$RC_{ijk}(x_i,y_i)=C_{jk}(x_0,y_0)\ast PSF_{ijk}(x_i,y_i)$$

The reconstructed image $RI_i(x_i,y_i)$ of the $i$-th depth image plane, obtained by stitching the segmented reconstructed images $RC_{ijk}(x_i,y_i)$, is as follows

$$RI_i(x_i,y_i)=\left(\begin{array}{ccc} RC_{i11}(x_i,y_i) & \cdots & RC_{im1}(x_i,y_i)\\ \vdots & \ddots & \vdots \\ RC_{i1n}(x_i,y_i) & \cdots & RC_{imn}(x_i,y_i) \end{array}\right)$$
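The chunked convolution and stitching of Eqs. (9) and (10) can be sketched as below, with placeholder Gaussian PSFs standing in for the computed FOV-dependent PSFs; the chunk-grid indexing is illustrative.

import numpy as np
from scipy.signal import fftconvolve

def reconstruct_depth_plane(elemental_image, psf_grid):
    """psf_grid[j][k] holds the 2D PSF of chunk (j, k) at one depth plane."""
    m, n = len(psf_grid), len(psf_grid[0])
    h, w = elemental_image.shape
    ch, cw = h // m, w // n
    rows = []
    for j in range(m):
        blocks = []
        for kk in range(n):
            chunk = elemental_image[j * ch:(j + 1) * ch, kk * cw:(kk + 1) * cw]
            blocks.append(fftconvolve(chunk, psf_grid[j][kk], mode="same"))  # Eq. (9)
        rows.append(np.concatenate(blocks, axis=1))
    return np.concatenate(rows, axis=0)                                      # Eq. (10)

# usage with a 5 x 5 chunk grid and placeholder Gaussian PSFs
g = np.exp(-np.linspace(-3, 3, 31) ** 2)
psf = np.outer(g, g); psf /= psf.sum()
psf_grid = [[psf for _ in range(5)] for _ in range(5)]
recon = reconstruct_depth_plane(np.random.rand(240, 240), psf_grid)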

The elemental image pre-correction process is illustrated in Fig. 3: the elemental image is input into the CNN to obtain the pre-corrected image, the reconstructed images of the pre-corrected image at each depth plane are calculated by Eq. (10), and the CNN parameters are then updated according to the loss function. A U-Net model structure is used to pre-correct the elemental image. As shown in Fig. 4, the network architecture can be divided into two segments, an encoder and a decoder. The encoder processes the input image with a series of down-sampling modules; each consists of a convolution layer and an activation function (LeakyReLU) and gradually reduces the spatial resolution of the image, increases the number of channels, and extracts high-level feature information such as texture, shape, and edges, which represents abstract features of the image. The decoder applies a series of up-sampling modules to the feature information output by the encoder; each consists of an up-sampling layer, a convolution layer, and an activation function (LeakyReLU) and gradually restores the low-resolution feature map to the size of the input image.

Fig. 4. Elemental image pre-correction process based on CNN.

The overall pre-correction CNN has a U shape, with the feature map size reaching its minimum at the center of the network. In addition, skip connections link the output of each down-sampling module in the encoder to the input of the corresponding up-sampling module in the decoder; the concatenated features, with a higher channel count, are fed into a concatenation module composed of a convolution layer and an activation function (LeakyReLU). Skip connections retain both high-level and low-level feature information of the input image and allow the shape, texture, and other detailed information of the pre-corrected image to be reconstructed more accurately.
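A compact sketch of such a U-shaped pre-correction network is given below: strided-convolution down-sampling modules with LeakyReLU, bilinear up-sampling modules, skip connections via concatenation, and no batch normalization; the channel widths and network depth are illustrative, not the authors' exact architecture.

import torch
import torch.nn as nn

def down(in_ch, out_ch):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                         nn.LeakyReLU(0.2, inplace=True))

def up(in_ch, out_ch):
    return nn.Sequential(nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                         nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.LeakyReLU(0.2, inplace=True))

class PreCorrectionUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.d1, self.d2, self.d3 = down(3, 32), down(32, 64), down(64, 128)
        self.u1, self.u2, self.u3 = up(128, 64), up(64 + 64, 32), up(32 + 32, 32)
        self.head = nn.Conv2d(32, 3, 3, padding=1)

    def forward(self, x):
        s1 = self.d1(x)                           # 1/2 resolution, 32 channels
        s2 = self.d2(s1)                          # 1/4 resolution, 64 channels
        b = self.d3(s2)                           # 1/8 resolution (bottleneck)
        y = self.u1(b)
        y = self.u2(torch.cat([y, s2], dim=1))    # skip connection from encoder
        y = self.u3(torch.cat([y, s1], dim=1))    # skip connection from encoder
        return torch.sigmoid(self.head(y))        # pre-corrected image in [0, 1]

# usage: pre-correct one 240 x 240 elemental image (240 is divisible by 8)
pre_corrected = PreCorrectionUNet()(torch.rand(1, 3, 240, 240))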

To enhance the training capacity, in addition to the loss functions mentioned in Section 2.2, we incorporate the VGG16 feature extraction network as a feature extractor to compute a Feature loss [25]. The VGG16 network is widely used for image feature extraction; this approach reduces the error of high-dimensional features and generates reconstructed images that better match human visual perception. As shown in Fig. 4, the original image $I_1$ is input into the VGG16 feature extractor to obtain five feature layers $I_2$-$I_6$, and the reconstructed image $R_1$ is input into the VGG16 feature extractor to obtain five feature layers $R_2$-$R_6$. The Feature loss is calculated as the weighted MSE loss between the corresponding layers of the original image and the reconstructed image, where $w_j$ is a weight parameter

$$loss=\sum_{j=1}^{6} w_j\,L_{MSE}(I_j,R_j)$$

We train the CNN parameters with different combinations of MAE, SSIM, MS-SSIM, and Feature loss, and present the training results for the different loss functions in Section 3.
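The Feature loss of Eq. (11) can be sketched as below, assuming torchvision's pretrained VGG16 as the frozen feature extractor; the layer cut points and uniform weights are illustrative.

import torch
import torch.nn.functional as F
from torchvision.models import vgg16

class VGGFeatureLoss(torch.nn.Module):
    def __init__(self, cuts=(4, 9, 16, 23, 30), weights=(1.0,) * 6):
        super().__init__()
        feats = vgg16(weights="DEFAULT").features.eval()   # pretrained ImageNet weights
        for p in feats.parameters():
            p.requires_grad_(False)                        # extractor stays frozen
        self.stages = torch.nn.ModuleList(
            feats[a:b] for a, b in zip((0,) + cuts[:-1], cuts))  # five VGG16 stages
        self.weights = weights

    def forward(self, recon, ideal):
        loss = self.weights[0] * F.mse_loss(recon, ideal)  # pixel-level term
        x, y = recon, ideal
        for w, stage in zip(self.weights[1:], self.stages):
            x, y = stage(x), stage(y)
            loss = loss + w * F.mse_loss(x, y)             # feature-level terms
        return loss

# usage with placeholder tensors
criterion = VGGFeatureLoss()
feature_loss = criterion(torch.rand(1, 3, 240, 240), torch.rand(1, 3, 240, 240))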

3. Numerical simulation and experimental result

To verify the effectiveness of the proposed method in improving the DoF and resolution of reconstructed images, we conduct numerical simulations and optical experiments and compare the results with the reference method from prior work.

3.1 Numerical simulation

To enhance the resolution of the integral imaging display, a Sony 5.5-inch 4K screen is selected as the display screen, with a pixel density of up to 807 ppi and a pixel size of approximately 31.5 µm. The DOE array contains 16 × 9 tightly arranged square DOEs with a pitch of 7.55 mm. The size of the elemental image used in the simulation matches that of the DOE, and each DOE corresponds to an elemental image of 240 × 240 pixels. The distance from the display screen to the DOE array is 100 mm. We set 9 depth image planes across the entire depth range, located 200 mm, 250 mm, 300 mm, 350 mm, 400 mm, 450 mm, 500 mm, 550 mm, and 600 mm from the DOE array, with magnifications of 2-6. The voxel sizes of these depth image planes are 63 µm, 78.75 µm, 94.5 µm, 110.25 µm, 126 µm, 141.75 µm, 157.5 µm, 173.25 µm, and 189 µm, respectively. The optimization wavelengths are 639 nm, 532 nm, and 473 nm.
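As a quick check of the quoted voxel sizes, the magnification of each depth plane is d1/d0 and the voxel size is the screen pixel size multiplied by that magnification:

# pure arithmetic based on the numbers above (31.5 um pixels, d0 = 100 mm)
pixel_size_um, d0_mm = 31.5, 100
for d1_mm in range(200, 601, 50):
    magnification = d1_mm / d0_mm                 # 2x at 200 mm ... 6x at 600 mm
    print(f"{d1_mm} mm plane: voxel = {pixel_size_um * magnification:.2f} um")
# prints 63.00, 78.75, 94.50, ..., 189.00 um, matching the list above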

When optimizing the phase distribution of the DOE with the Adam algorithm, the DOE feature size is 3.74 µm, with a total pixel count of 2018 × 2018. The 2D phase distribution is simplified to a 1D thickness profile along the radius; the maximum radius of a square DOE lies at the diagonal, with a maximum value of approximately 5.3386 mm, so a total of 1427 thickness parameters are optimized. The PSF sampling interval used to simulate the reconstructed image at each depth image plane is 15.5 µm. The training dataset is built on DIV2K: by randomly cropping 512 × 512 areas from the high-resolution images in DIV2K, we generate 2000 training images and 200 validation images. To comprehensively simulate the reconstructed image, the 1D PSF size is set to 512 during calculation, and a 2D PSF of 1023 × 1023 is obtained by linear interpolation. To reduce the amount of computation, the central 511 × 511 region of the PSF, covering 7.936 × 7.936 mm and thus the elemental image size, is taken as the convolution kernel to simulate the reconstructed image. Regions outside this central area do not participate in the reconstructed image simulation and therefore cannot be optimized by the image quality evaluation loss functions; we thus add the RMS loss described in Section 2.2 to keep the PSF distribution as concentrated as possible. The initial DOE structure is a single lens with a focal length of 8 cm, corresponding to a central depth plane of 400 mm. The initial learning rate is set to 1e-8, the batch size is set to 2, and training runs for 4 epochs.

As shown in Fig. 5, we first assess the 1D PSF distributions of the lens and of three optimized DOEs at nine depth image planes and three wavelengths: (1) the proposed DOE optimized without RMS loss; (2) the proposed DOE optimized with RMS loss; (3) the reference DOE optimized following the method described in prior work with the same design parameters [22]. It is evident that, after optimization, the PSF distribution is significantly more concentrated than that of the lens. As shown in Fig. 5(f)-(h), compared with the reference DOE, the central part of the 1D PSF is more concentrated, indicating that the proposed DOE is optimized more precisely and achieves better imaging quality. As demonstrated in Fig. 5(d), without RMS loss, numerous rays extend beyond the imaging area of the elemental image (the second half of the PSF distribution), affecting the reconstruction of adjacent elemental images; introducing the RMS loss substantially reduces these stray-light effects. As illustrated in Fig. 6, we compare the simulated reconstructed images of the proposed DOE optimized using different loss functions with those of the conventional lens-based method and the reference DOE. The averaged reconstructed image error values and RMS values of the optimized DOEs over 200 validation images are shown in Fig. 7. Compared with the reference method in prior work, shown in Fig. 6(b), the proposed method better reflects the DOE imaging performance, and the DOE optimized with the proposed method using the combination of MAE, MS-SSIM, Feature loss, and RMS loss achieves the best results. The proposed DOE can image clearly within a 40 cm DoF range, which verifies the feasibility of optimizing the DOE phase distribution to expand the DoF.

Fig. 5. 1D PSF distribution of lens and DOE optimized with different methods. (a) Proposed DOE optimized without RMS loss. (b) A single lens with a focal length of 8 cm. (c) Proposed DOE optimized with RMS loss. (d) Enlarged image of the second half of the PSF distribution. (e) Reference DOE. (f)-(h) Enlarged image of the center part of the 1D PSF.

Fig. 6. Reconstructed images at five depth planes using different methods. (a) Reconstructed images of a single lens with a focal length of 8 cm. (b) Reference DOE. (c) Proposed DOE optimized using MAE + SSIM. (d) Proposed DOE optimized using MAE + MS-SSIM + Feature loss. (e) Proposed DOE optimized using MAE + MS-SSIM + Feature loss + RMS loss.

Fig. 7. Averaged Feature Loss, MAE, RMS, SSIM, MS-SSIM, PSNR over 200 validation images using different methods and loss functions.

When pre-correcting the elemental image through the CNN, the original elemental image is divided into 5 × 5 equal-sized segmented images, as detailed in Section 2.3. The PSF distribution for each depth image plane and FOV is computed based on the voxel size. The training dataset is composed of 2000 images from DIV2K with a resolution of 240 × 240 and 500 elemental images rendered in Blender, and the validation dataset is composed of 100 images from DIV2K and 100 elemental images. Throughout the training process, the batch size is set to 4, the initial learning rate is 1e-4, and the CNN model is trained for a total of 4 epochs. The simulated reconstructed images at each depth image plane obtained with the different methods are depicted in Fig. 8. Using a conventional CNN model with batch normalization (BN) layers as the pre-correction network causes artifacts in the output pre-corrected image, as shown in Fig. 8(d). Training the network with Feature loss suppresses these artifacts but cannot completely eliminate them; after removing the BN layers, the artifacts disappear. Compared with the reference method in prior work and with pre-correction networks trained by other loss functions, elemental images pre-corrected with the proposed method trained by the combination of MAE, MS-SSIM, and Feature loss lead to better results. The addition of Feature loss contributes to clearer texture, better retention of detailed information, and images that are more in line with human perception. Figure 9 compares the reconstructed images at 200 mm and 300 mm and their enlarged views; the image quality of the reconstructed images in both the central FOV and the edge FOV is higher with the proposed method. After adding Feature loss to the training, the window structure in the reconstructed image becomes significantly more distinct. The averaged loss values of the reconstructed images over 200 validation images are shown in Fig. 10; the combined optimization of MAE, MS-SSIM, and Feature loss minimizes the overall error.

Fig. 8. Reconstructed images of different methods at five depth planes. (a) Reconstructed images of a single lens with a focal length of 8 cm without pre-correction. (b) Reconstructed images of proposed DOE without pre-correction. (c) Reconstructed images of reference DOE with reference pre-correction. (d)-(f) Reconstructed images of proposed DOE with images pre-corrected using: (d) MAE + MS-SSIM with BN. (e) MAE + MS-SSIM without BN. (f) MAE + MS-SSIM + Feature loss without BN.

Fig. 9. Comparison of the reconstructed images of the elemental image at 200 mm and 300 mm.

Fig. 10. Averaged Feature Loss, MAE, MSE, SSIM, MS-SSIM, PSNR of reconstructed images of pre-corrected EI over 200 validation images using different loss functions.

We simulate the integral imaging display results of the proposed method and the lens-based method. We construct a 3D scene with a DoF of 40 cm in Blender, in which 3D models of a bird, a red panda, a tiger, a crocodile, and a cat are placed at depths ranging from 20 to 60 cm. During reconstruction, each model is focused at a different depth image plane. To demonstrate motion parallax, the integral imaging reconstruction results at different perspectives for the 200 mm depth image plane are shown in Fig. 11. The integral imaging display results for the central FOV at the 300-600 mm depth image planes are shown in Fig. 12; except at the 400 mm depth plane, the reconstructed images of the proposed method clearly have better image quality than those of the lens-based method.

Fig. 11. Integral imaging display results from different perspectives using different methods at 200 mm. SSIM between the reconstructed image and the ideal image is marked.

Fig. 12. Integral imaging display result using different methods at 200 mm. The enlarged image in the middle row is the reconstructed image of the 3D object focused on each depth imaging plane, SSIM between the reconstructed image and the ideal image is marked.

On a single NVIDIA GeForce RTX 3060 GPU, the optimization process of the reference method takes 123 h 23 m 44 s, whereas the proposed method takes a total of 36 h 44 m 37 s, with the DOE phase optimization taking 32 h 11 m 25 s and the CNN training 6 h 33 m 12 s. The computational complexity of gradient backpropagation in the reference method is extremely high, and the proposed method significantly compresses the optimization time: for the same amount of optimized data, the proposed method requires 29.8% of the time of the reference method in prior work [22]. Over the 200 images in the validation dataset, the average SSIM and MS-SSIM values of the reconstructed images based on the lens, the reference method, the proposed optimized DOE, and the proposed method are shown in Fig. 13. After the DOE optimization, the SSIM and MS-SSIM values increase by 0.12 and 0.06, respectively, compared with the reference method; after pre-correction, the SSIM and MS-SSIM values increase by a further 0.06. The numerical simulations verify the feasibility and effectiveness of the proposed method in expanding the DoF and improving the resolution and clarity of the integral imaging display.

Fig. 13. Averaged SSIM, MS-SSIM of reconstructed images using the lens-based method, reference method, proposed optimized DOE, and proposed method.

3.2 Optical experiment

To further verify the accuracy of the proposed method, we conduct an optical experiment to measure the PSF distribution of the optimized DOE and the reconstructed images at each depth image plane. A Jasper Display 4K liquid crystal spatial light modulator (SLM) with a pixel size of 3.74 µm is used to load a square DOE phase distribution designed for the 532 nm wavelength. First, we measure the PSF distribution of the DOE: a laser with a wavelength of 532 nm serves as the light source, and the emitted beam passes through a pinhole with a diameter of 10 µm, so that the emergent light is equivalent to a spherical wave originating from a point light source on the display screen. The wave acquires the DOE phase distribution after passing through the SLM. The PSF distribution within the DoF is measured on the depth image planes 200-600 mm from the SLM and captured by a CCD with a sensor pixel size of 9 µm. We compare the PSF distributions of the lens, the proposed DOE optimized without RMS loss, the proposed DOE optimized with RMS loss, and the reference DOE, as shown in Fig. 14. The RMS loss imposes a greater penalty on rays far from the center point; once the RMS loss participates in the optimization, the energy of the halo around the central point of the PSF is weaker and the central spot is closer to a circle, leading to a more concentrated PSF distribution and a better optimization effect.

Fig. 14. Experimental result of point spread function distribution of lens, proposed optimized DOE and reference DOE at five depth planes.

After verifying the PSF distribution, we measure the reconstructed images on the depth image planes after the light passes through the optimized DOE. White light emitted by a strong flashlight is used as the light source to illuminate a black cardboard with a hollow pattern, placed 100 mm from the SLM, as the original image; the lens phase distribution, the proposed DOE phase distribution, and the reference DOE phase distribution for 532 nm are loaded on the SLM in turn. The captured reconstructed images are shown in Fig. 15. The reconstructed image of the optimized DOE shows significantly less dispersion than the lens-based reconstructed images. In the lens-based reconstruction, the red light is focused near the 200 mm imaging position while the green light diverges more strongly; conversely, near the 600 mm imaging position, the green light is well focused while the red light diverges. Compared with the lens, the reconstructed image of the optimized DOE has smaller dispersion, and the edges of the reconstructed image are clearer across the visible band. Comparing the proposed DOE with the reference DOE, the contrast between the pattern and the background in the reconstructed image is higher and there is less scattered light around the pattern. This experiment demonstrates the feasibility of the proposed method for low-dispersion, large-DoF imaging, leading to a high-resolution 3D display with a large DoF.

Fig. 15. Experimental result of the reconstructed image of the lens, proposed optimized DOE, and reference DOE.

4. Conclusion

In this work, we combine hardware optimization with an image pre-correction process and propose a depth of field expansion method to achieve a large-DoF, high-resolution display: the Adam algorithm optimizes the phase distribution of the DOE, and a CNN pre-corrects the original images to compensate for the image quality degradation. We address the optimization accuracy limitations of end-to-end DOE and CNN optimization; the DoF of the optimized DOE reaches 400 mm, realizing a 134-403 ppi high-resolution 3D display across multiple depth planes. Together with the RMS loss and the Feature loss, we further improve the DOE imaging performance and obtain reconstructed images with clearer texture, better retention of detail, and closer agreement with human perception. We verify the effectiveness of the proposed method by numerical simulation and further prove its feasibility by measuring the PSF of the optimized DOE and the reconstructed images in optical experiments.

Although the proposed method can significantly expand the DOE imaging DoF, some drawbacks remain, such as the reduced brightness of the reconstructed image and the difficulty and cost of DOE fabrication. In this work, DOE prototypes and integral imaging systems were not actually fabricated; the experiments were conducted with a spatial light modulator. We will attempt to apply the method to practical integral imaging displays in further research.

By offering a potential solution to the DoF limitations of conventional lens-based systems, the proposed DoF extension method can be widely applied to DoF improvement in integral imaging displays, light field displays, AR/VR head-mounted displays, and so on. This combination of hardware design and image processing also has broad prospects in field-of-view expansion, resolution enhancement, and chromatic aberration elimination.

Funding

National Key Research and Development Program of China (2021YFB2802300); National Natural Science Foundation of China (61975014, 62035003, U22A2079); Beijing Municipal Science and Technology Commission, Administrative Commission of Zhongguancun Science Park (Z211100004821012).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. D. Pi, J. Liu, and Y. Wang, “Review of computer-generated hologram algorithms for color dynamic holographic three-dimensional display,” Light: Sci. Appl. 11(1), 231 (2022). [CrossRef]  

2. C. Wei, R. Zhou, H. Ma, D. Pi, J. Wei, Y. Wang, and J. Liu, “Holographic display using layered computer-generated volume hologram,” Opt. Express 31(15), 25153–25164 (2023). [CrossRef]  

3. D. Miyazaki, N. Hirano, Y. Maeda, S. Yamamoto, T. Mukai, and S. Maekawa, “Floating volumetric image formation using a dihedral corner reflector array device,” Appl. Opt. 52(1), A281–A289 (2013). [CrossRef]  

4. J. Geng, “Three-dimensional display technologies,” Adv. Opt. Photon. 5(4), 456–535 (2013). [CrossRef]  

5. X. Xiao, B. Javidi, M. Martinez-Corral, and A. Stern, “Advances in three-dimensional integral imaging: sensing, display, and applications [Invited],” Appl. Opt. 52(4), 546–560 (2013). [CrossRef]  

6. J. Park, K. Hong, and B. Lee, “Recent progress in three-dimensional information processing based on integral imaging,” Appl. Opt. 48(34), H77–H94 (2009). [CrossRef]  

7. C. Li, H. Ma, J. Li, S. Cao, and J. Liu, “Viewing angle enhancement for integral imaging display using two overlapped panels,” Opt. Express 31(13), 21772–21783 (2023). [CrossRef]  

8. N.-Q. Zhao, J. Liu, and Z.-F. Zhao, “High performance integral imaging 3D display using quarter-overlapped microlens arrays,” Opt. Lett. 46(17), 4240–4243 (2021). [CrossRef]  

9. X. Ma, H. Zhang, R. Yuan, T. Wang, M. He, Y. Xing, and Q. Wang, “Depth of field and resolution-enhanced integral imaging display system,” Opt. Express 30(25), 44580–44593 (2022). [CrossRef]  

10. J. Jang and B. Javidi, “Large depth-of-focus time-multiplexed three-dimensional integral imaging by use of lenslets with nonuniform focal lengths and aperture sizes,” Opt. Lett. 28(20), 1924–1926 (2003). [CrossRef]  

11. H. Choi, Y. Kim, J. Park, J. Kim, S. Cho, and B. Lee, “Layered-panel integral imaging without the translucent problem,” Opt. Express 13(15), 5769–5776 (2005). [CrossRef]  

12. J. Hong, J. Park, S. Jung, and B. Lee, “Depth-enhanced integral imaging by use of optical path control,” Opt. Lett. 29(15), 1790–1792 (2004). [CrossRef]  

13. X. Wang and H. Hua, “Design of a digitally switchable multifocal microlens array for integral imaging systems,” Opt. Express 29(21), 33771–33784 (2021). [CrossRef]  

14. D. Shin, C. Kim, G. Koo, and Y. Hyub Won, “Depth plane adaptive integral imaging system using a vari-focal liquid lens array for realizing augmented reality,” Opt. Express 28(4), 5602–5616 (2020). [CrossRef]  

15. X. Shen, Y. Wang, H. Chen, X. Xiao, Y. Lin, and B. Javidi, “Extended depth-of-focus 3D micro integral imaging display using a bifocal liquid crystal lens,” Opt. Lett. 40(4), 538–541 (2015). [CrossRef]  

16. K. Kwon, Y. Lim, C. Shin, M. Erdenebat, J. Hwang, and N. Kim, “Enhanced depth-of-field of an integral imaging microscope using a bifocal holographic optical element-micro lens array,” Opt. Lett. 42(16), 3209–3212 (2017). [CrossRef]  

17. X. Wang and H. Hua, “Depth-enhanced head-mounted light field displays based on integral imaging,” Opt. Lett. 46(5), 985–988 (2021). [CrossRef]  

18. Z. Lv, J. Li, Y. Yang, and J. Liu, “3D head-up display with a multiple extended depth of field based on integral imaging and holographic optical elements,” Opt. Express 31(2), 964–975 (2023). [CrossRef]  

19. S. Pinilla, S. Miri Rostami, I. Shevkunov, V. Katkovnik, and K. Egiazarian, “Hybrid diffractive optics design via hardware-in-the-loop methodology for achromatic extended-depth-of-field imaging,” Opt. Express 30(18), 32633–32649 (2022). [CrossRef]  

20. M. Zhang, C. Wei, Y. Piao, and J. Liu, “Depth-of-field extension in integral imaging using multi-focus elemental images,” Appl. Opt. 56(22), 6059–6064 (2017). [CrossRef]  

21. H. Navarro, G. Saavedra, M. Martínez-Corral, M. Sjöström, and R. Olsson, “Depth-of-field enhancement in integral imaging by selective depth-deconvolution,” J. Display Technol. 10(3), 182–188 (2014). [CrossRef]

22. X. Xie, X. Yu, X. Gao, X. Pei, Y. Wang, X. Sang, and B. Yan, “Extended depth of field method with a designed diffraction optical element based on multi-depth fusion and end-to-end optimization,” Opt. Commun. 517, 128317 (2022). [CrossRef]  

23. X. Pei, X. Yu, X. Gao, X. Xie, Y. Wang, X. Sang, and B. Yan, “End-to-end optimization of a diffractive optical element and aberration correction for integral imaging,” Chin. Opt. Lett. 20(12), 121101 (2022). [CrossRef]  

24. X. Dun, H. Ikoma, G. Wetzstein, Z. Wang, X. Cheng, and Y. Peng, “Learned rotationally symmetric diffractive achromat for full-spectrum computational imaging,” Optica 7(8), 913–922 (2020). [CrossRef]  

25. Y. Liu, C. Zhang, T. Kou, Y. Li, and J. Shen, “End-to-end computational optics with a singlet lens for large depth-of-field imaging,” Opt. Express 29(18), 28530–28548 (2021). [CrossRef]  
