## Abstract

Depth from defocus (DFD) obtains depth information using two defocused images, making it possible to obtain a depth map with high resolution equal to that of the RGB image. However, it is difficult to change the focus mechanically in real-time applications, and the depth range is narrow because it is inversely proportional to the depth accuracy. This paper presents a compact DFD system based on a liquid lens that uses chromatic aberration for real-time application and depth accuracy improvement. The electrical focus changing of a liquid lens greatly shortens the image-capturing time, making it suitable for real-time applications as well as helping with compact lens design. Depth accuracy can be improved by dividing the depth range into three channels using chromatic aberration. This work demonstrated the improvement of depth accuracy through theory and simulation and verified it through DFD system design and depth measurement experiments of real 3D objects. Our depth measurement system showed a root mean square error (RMSE) of 0.7 mm to 4.98 mm compared to 2.275 mm to 12.3 mm in the conventional method, for the depth measurement range of 30 cm to 70 cm. Only three lenses are required in the total optical system. The response time of changing focus by the liquid lens is 10 ms, so two defocused images for DFD can be acquired within a single frame period of real-time operations. Lens design and image processing were conducted using Zemax and MATLAB, respectively.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Recently, as cognitive research on AI and robots has progressed along with the 4th industrial revolution, the demand for RGB-D cameras that provide both RGB images and depth information has been increasing. Real-world recognition, i.e. 3D object detection, has become a very important problem in many robotics and computer vision applications [1]. Any object has both semantic and geometric features, which play a critical role in real-world recognition [2]. Semantic features correspond to RGB images rich in texture, edge, and intensity information, and geometric features correspond to depth maps containing distance information from the camera. These RGB images and depth maps provide complementary information, so RGB-D cameras are essential for 3D object detection.

RGB-D cameras generally consist of a color camera and a depth sensor [3]. Active depth sensors, such as time-of-flight (TOF) [4] and structured-light (SL) sensors [5], use their own light source and can provide depth information quickly. However, the depth-map resolution is significantly lower than that of the RGB image [6]. In addition, they are bulky, heavy, expensive, and sensitive to external light [7,8]. They are also separated from the RGB sensor; therefore, the obtained depth map has an optical axis different from that of the RGB image [9], and this difference must be taken into account when matching the depth map to the RGB image. Passive methods, on the other hand, include the stereo method, depth from focus (DFF), and depth from defocus (DFD). The stereo method obtains a depth map by calculating the disparity between two images [10]. Its correspondence matching is computationally expensive, the system is bulky, and the depth of occluded areas cannot be obtained. DFF uses dozens of differently focused images to find the best focus and then calculates the depth by the lens equation [11]. Because focus changing is usually done mechanically, it takes a few seconds to acquire the whole set of focused images, which is unsuitable for real-time applications. In addition, changes in camera/object position and illumination that occur during image capture can cause serious depth errors [12]. DFD requires only one or two defocused images [12–14]. Single-image DFD is texture-variant and suffers from the dead zone and depth ambiguity problems [15]. Two-image DFD calculates the depth from the difference in defocus between two defocused images [16,17]; thus, it is free from the dead zone and depth ambiguity problems. Among DFD methods, rational filter-based DFD uses a far-focused image and a near-focused image [12,18]. The depth obtained by this method is texture-invariant and is computed with high spatial resolution, efficiency, and accuracy. However, because this method also generally acquires defocused images by mechanically changing the focus setting (>100 ms), it is not suitable for real-time applications. Acquiring a pair of defocused images simultaneously with a beam splitter has been suggested, but this makes the system bulky [19]. In addition, a telecentric system is essential to minimize the magnification change between the two defocused images, which makes the system bulky and complex [20]. Moreover, the depth range of DFD is smaller than that of DFF because the depth sensitivity and the texture frequency range permissible for depth extraction decrease as the depth range increases, which adversely affects the depth accuracy.

In this paper, we propose a depth measurement system that applies two solutions to the problems mentioned above. First, the proposed system exploits a liquid lens to obtain two defocused images. Because of their compact structure, low cost, and fast non-mechanical operation, liquid lenses have been studied extensively [21–23]. We use a commercial electrowetting liquid lens. Its focus-change response time is as fast as 10 ms [24], allowing real-time applications. In addition, the magnification remains constant while the focus of the liquid lens changes, so a telecentric system is not required, and the system can be simplified. Second, we use chromatic aberration to increase the depth accuracy. There have been studies on applying chromatic aberration to single-image DFD, but those studies focused only on solving the two fundamental drawbacks of single-image DFD, namely, the dead zone and ambiguity problems [15,25]. Using chromatic aberration for two-image DFD increases the depth sensitivity and the permissible frequency range; thus, accuracy can be increased. This work demonstrates the improvement of depth accuracy through theory and simulation and verifies it through depth-measurement experiments on real 3D objects.

## 2. Theory of the rational filter-based depth from defocus

In this section, we describe the principle of the rational filter-based DFD exploited in this work [18,26]. The DFD optical model considered in this work is shown in Fig. 1, where defocused image pairs are obtained through the focus changing of an electrowetting liquid lens.

In Fig. 1, *u* is the distance between the object point and the lens, ${v_0}$ is the distance between the lens and the image sensor, and *a* is the radius of the lens aperture. Also, ${v_{1,2}}$ is the distance between the image plane of the object point and the lens when the synthetic diopter of the lenses is at ${D_{max,min}}$, and ${R_{1,2}}$ is the size of the resulting defocus radius. If the synthetic diopter of the lens causing the object point to focus is *D* and $\nabla D$ is defined as (${D_{max}} - {D_{min}}$)/2, the following relationship is established:

*u* can be obtained through a lens equation. To obtain $\alpha$, a normalized ratio M/P is introduced, which is the ratio of the amplitude difference to the amplitude sum for each frequency component of the two defocused images. In general, many studies have used the pill-box model or the Gaussian model as the defocus function [27,28]. In this work, however, a geometrical point spread function (PSF) was used to account for spherical aberration. If a geometrical PSF is adopted as the defocus model, M/P can be expressed as

The range of $\alpha$ considered is from $s - 1$ to $s + 1$ when ${f_r}$ is less than ${f_r}_{max}$. Here, *s* is not zero because the axial position at which the defocus is minimized shifts due to spherical aberration. Equation (2) can be modeled with a polynomial model in closed form. Modelling only needs to be done for $\alpha$ in the range from $s - 1$ to $s + 1$, using a rational expression of two linear combinations of basis functions. We modeled M/P using six basis functions because the geometrical PSF-based M/P is not only zero-crossing shifted by *s*; it is also not point symmetric. After the modelling is done, the coefficient functions become rational operators through the inverse Fourier transform, and $\alpha$ can be obtained by several convolution operations and the Newton-Raphson method.
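As a rough illustration of the normalized ratio and its inversion, the sketch below uses the Gaussian defocus model mentioned above instead of the geometrical PSF adopted in this work, so spherical aberration and the shift *s* are ignored. The blur widths proportional to (1 + α) and (1 − α), the lumped constant `k`, the sign convention, and all function names are assumptions made for this example only.

```python
import math

def normalized_ratio(alpha, k):
    """M/P at one frequency under an assumed Gaussian defocus model.

    The two images are assumed to have blur widths proportional to
    (1 + alpha) and (1 - alpha); k lumps together the spatial frequency
    and the maximum blur (it grows with frequency squared).
    """
    h1 = math.exp(-k * (1.0 + alpha) ** 2)  # OTF amplitude of image 1
    h2 = math.exp(-k * (1.0 - alpha) ** 2)  # OTF amplitude of image 2
    return (h1 - h2) / (h1 + h2)            # amplitude difference over sum

def solve_alpha(ratio, k, alpha0=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson inversion of normalized_ratio for alpha."""
    alpha = alpha0
    h = 1e-6  # step for the central-difference derivative
    for _ in range(max_iter):
        f = normalized_ratio(alpha, k) - ratio
        df = (normalized_ratio(alpha + h, k) - normalized_ratio(alpha - h, k)) / (2 * h)
        step = f / df
        alpha -= step
        if abs(step) < tol:
            break
    return alpha

true_alpha = 0.4
measured = normalized_ratio(true_alpha, k=0.5)
recovered = solve_alpha(measured, k=0.5)
print(round(recovered, 6))
```

Under these assumptions the ratio reduces to −tanh(2kα), which is monotonic in α and therefore invertible. The geometrical PSF model of this work is neither centered on zero nor point symmetric, which is why it needs a richer rational model.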

## 3. Accuracy improvement using chromatic aberration

In this section, we describe the principle of improving depth accuracy using chromatic aberration. Chromatic aberration is an aberration caused by a difference in the refractive index of a material according to wavelength. We analyzed the effect of chromatic aberration in terms of depth sensitivity and permissible frequency range, and we verified the effect through the simulation of various virtual objects.

#### 3.1 Increased depth sensitivity

In rational filter-based DFD, the normalized depth $\alpha $ is calculated first and the actual distance, that is, depth *u*, is calculated using the lens equation. Thus, using Eq. (1), the depth sensitivity can be expressed as

The depth sensitivity is inversely proportional to the product of $\nabla D$ and the square of depth *u*. If the depth sensitivity is small, even very small errors in α lead to large depth errors. We use chromatic aberration to reduce $\nabla D$ and consequently increase the depth sensitivity. Figure 2 shows the scheme of the proposed DFD using chromatic aberration.

Here, ${D_{min/max}}_C$ is the synthetic diopter of the entire lens for channel C, and the corresponding distance to each focal plane is ${L_{{D_{min/max}}_C}}$. If there is no chromatic aberration, the images of all channels have the same PSF information. Therefore, the color and saturation information is usually removed and only the intensity information is extracted [29], or the scene is photographed with a monochrome camera, to calculate depth. In contrast, we design the lenses so that the depth ranges of neighboring channels are adjacent to each other (${L_{{D_{max}}_R}} = {L_{{D_{min}}_G}},\; {L_{{D_{max}}_G}} = {L_{{D_{min}}_B}}$), as shown in Fig. 2. The depth is measured by dividing the total depth range into three smaller ranges, one for each of the R, G, and B channels. Thus, $\nabla {D_C} = \left( {{D_{max}}_C - {D_{min}}_C} \right)/2$ is three times smaller than $\nabla D$, and the depth sensitivity is increased by three times.
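The threefold gain can be checked numerically. The sketch below assumes that the normalized depth has the form α = (D − D_mid)/∇D with D = 1/u + 1/v_0. This form is chosen here only because it reproduces the stated dependence of the sensitivity on 1/(∇D·u²); it is not taken from Eq. (1). The 30 cm to 60 cm design range and v_0 = 18.85 mm are the values quoted in Section 4.2, and the function names are hypothetical.

```python
V0 = 18.85e-3            # lens-to-sensor distance in metres (Section 4.2)
NEAR, FAR = 0.30, 0.60   # design depth range in metres (Section 4.2)

def half_range_diopter(near_m, far_m, channels=1):
    """Half of the diopter span covered by one channel."""
    span = 1.0 / near_m - 1.0 / far_m
    return span / channels / 2.0

def depth_sensitivity(u_m, nabla_d):
    """|d(alpha)/du| under the assumed alpha = (D - D_mid) / nabla_d."""
    return 1.0 / (nabla_d * u_m ** 2)

def alpha_to_depth(alpha, d_mid, nabla_d, v0_m):
    """Invert the assumed model: D = d_mid + alpha * nabla_d, u = 1 / (D - 1/v0)."""
    d = d_mid + alpha * nabla_d
    return 1.0 / (d - 1.0 / v0_m)

nabla_full = half_range_diopter(NEAR, FAR)     # single channel over the whole range
nabla_c = half_range_diopter(NEAR, FAR, 3)     # one of three color channels
u = 0.40                                       # evaluation depth in metres
gain = depth_sensitivity(u, nabla_c) / depth_sensitivity(u, nabla_full)

# Centre of the G range (middle third) expressed as a total diopter.
d_mid_g = 1.0 / V0 + (1.0 / NEAR + 1.0 / FAR) / 2.0
g_far = alpha_to_depth(-1.0, d_mid_g, nabla_c, V0)
g_near = alpha_to_depth(+1.0, d_mid_g, nabla_c, V0)
print(gain, g_near, g_far)
```

With these design values the G channel would cover roughly 0.36 m to 0.45 m, and the same error in α maps to a depth error three times smaller than in the single-channel case.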

#### 3.2 Expanded permissible frequency range

The rational filter-based DFD is based on the frequency domain, whose accuracy is dependent on the texture of an object. The range of ${f_r}$, where M/P is a monotonic function, is 0 to ${f_r}_{max}$, which is the same as the width of the main lobe of the optical transfer function (OTF) when the object is defocused to the maximum as follows [12,26]:

Here, ${f_r}_{max}$ refers to the range of texture frequency components of an object that can be used for depth extraction. When an object is defocused to the maximum, textures with a frequency component higher than ${f_r}_{max}$ are removed and thus cannot be used for depth extraction. If ${f_r}_{max}$ is large, more texture frequency information is available for depth extraction, so depth accuracy can be improved. As the radius of the lens aperture *a* decreases, ${f_r}_{max}$ increases, but the f-number also increases, resulting in large noise due to diffraction [30]. Reducing ${v_0}$ is another option, but it is limited because it inevitably increases the diopter of the lens, which increases the aberration and makes the operation of the liquid lens very sensitive. In the previous section, we described the use of chromatic aberration to increase the depth sensitivity. Similarly, ${f_r}_{max}$ can be increased by reducing $\nabla D$ using chromatic aberration. As seen in Eq. (4), as $\nabla D$ decreases by three times to $\nabla {D_C}$, the maximum defocus decreases, so the main lobe width of the OTF increases, and ${f_r}_{max}$ increases three times to ${f_r}_{maxC}$.
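To give a feel for the magnitudes involved, the following sketch estimates the main-lobe width under a simplified pill-box model, whose OTF has its first zero near 0.61/R for a blur radius R. The geometric blur radius R = a·v_0·ΔD for a diopter mismatch ΔD, the use of the 2.5 mm liquid-lens aperture as the stop, and the constant 0.61 are assumptions of this example; the geometrical PSF with spherical aberration used in this work would give different absolute values.

```python
FIRST_ZERO = 0.61  # first zero of a pill-box OTF lies near f = 0.61 / R

def max_blur_radius(a_m, v0_m, nabla_d):
    """Geometric blur radius when the defocus spans the full 2 * nabla_d."""
    return a_m * v0_m * 2.0 * nabla_d

def f_r_max(a_m, v0_m, nabla_d):
    """Main-lobe width of the assumed pill-box OTF at maximum defocus (cycles/m)."""
    return FIRST_ZERO / max_blur_radius(a_m, v0_m, nabla_d)

a = 1.25e-3     # aperture radius in metres (assumed from the 2.5 mm liquid lens)
v0 = 18.85e-3   # lens-to-sensor distance in metres (Section 4.2)
nabla_full = (1.0 / 0.30 - 1.0 / 0.60) / 2.0   # whole 30-60 cm design range
nabla_c = nabla_full / 3.0                     # one color channel

f_full = f_r_max(a, v0, nabla_full)
f_channel = f_r_max(a, v0, nabla_c)
print(f_full / 1000.0, f_channel / 1000.0)     # cycles/mm on the sensor
```

Because the estimate scales as 1/(a·v_0·∇D), the threefold reduction of ∇D triples the permissible frequency range regardless of the constant used.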

#### 3.3 Verification through simulation

The proposed DFD reduces $\nabla D$ to $\nabla {D_C}$ by using chromatic aberration while maintaining parameters such as the lens diameter and the sensor distance. In this process, the depth sensitivity and the permissible frequency range increase together, so higher accuracy can be expected. We simulated the effectiveness of the proposed method by measuring the depth of a virtual flat pattern object at a distance of 600 mm. Image processing was carried out in MATLAB. First, the sum and difference of the far- and near-focused images are obtained, and a prefilter is applied to each, leaving only the frequency components useful for depth extraction. Then, after the sum and difference of the two images are convolved with the rational filters described in Section 2, the normalized depth α is calculated using the Newton-Raphson method. Finally, the depth value is calculated by Eq. (1). The simulation was conducted in two ways. The first is depth measurement using virtual image pairs with a specific defocus applied in MATLAB, and the second is depth measurement using virtual image pairs rendered by Zemax (the optical system is described in Section 4.2). In the former case, there are no aberrations other than defocus, but in the latter case, all aberrations including defocus exist. Figure 3 shows the simulation results for the accuracy improvement when both the depth sensitivity and the permissible frequency range were increased. The results obtained when only the depth sensitivity or only the permissible frequency range was increased are also shown. In these cases, the size of the lens aperture was changed to keep the other parameter unchanged. The power spectral densities (PSD) of several images are also shown.

In the conventional DFD, an RMSE of several millimeters was obtained because of the small depth sensitivity and the narrow permissible frequency range. In the proposed DFD, on the other hand, the depth was measured with a low RMSE of less than 1 mm owing to both the high depth sensitivity and the expanded permissible frequency range. Because of the high depth sensitivity, a small error in α did not lead to a large depth error. In addition, because ${f_r}_{max}$ increased and more texture frequency information became available for depth extraction, accurate depth measurement was possible. Even when only one of the depth sensitivity and the permissible frequency range was increased, the depth accuracy improved in comparison with the conventional DFD. However, for the ‘wall’ images, expanding the permissible frequency range brought no significant improvement in accuracy because the texture frequency components were rich even in a narrow frequency range, as in the black box in Fig. 3(g). In contrast, for the ‘tree ring’ image, just expanding the frequency range greatly improved the depth accuracy. As in the black box in Fig. 3(h), it was difficult to measure depth accurately with the conventional DFD because the image had a sparse frequency distribution in the narrow frequency range. However, as the permissible frequency range was expanded, as shown in the red box in Fig. 3(h), the texture frequency components available for depth extraction increased, and the depth error was greatly reduced. Figure 3(i) shows the depth results using image pairs rendered by Zemax. In this case, aberrations other than defocus are also present. Therefore, although the RMSE values were slightly higher than those of the corresponding simulation results using the images defocused in MATLAB, the accuracy was still improved with the proposed method. These simulation results show that increasing the depth sensitivity and the permissible frequency range using chromatic aberration, as in the proposed DFD, is a reasonable way to improve depth accuracy.

To get the final depth map, the accurate depth maps measured for the three channels must be combined into one depth map. The depth of the G channel can be divided into three areas according to the measured depth value. If the depth of a specific area of the G channel is within the depth range of the G channel, it is regarded as the accurate depth. If it is higher or lower than that range, it is replaced with the depth of the R or B channel, respectively. To validate this method, simulations were conducted to measure the depth of a virtual staircase with a sine pattern of a single frequency. A total of 35 steps spanned the whole depth range from ${L_{{D_{max}}_B}}$ to ${L_{{D_{min}}_R}}$, and the frequency of the sine pattern was set to various values within the permissible frequency range. Figure 4(a) presents the simulation process, and Fig. 4(b) shows the results of depth measurements for the G channel over the entire depth range.

As shown in Fig. 4(b), when the depth of the entire depth range was measured with only the G channel, the depth was accurately extracted only in the depth range of the G channel (green box), but the depth values of other ranges were inaccurate. Depth values outside the corresponding depth range were not accurate, but appeared monotonic. This is because even though M/P was modeled only within the corresponding range, the modeled M/P followed actual M/P somewhat similarly even outside the corresponding range.
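The selection rule described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: depth maps are plain nested lists, the function name and the G-range limits in the usage example are hypothetical, and the rule relies on the G-channel depth remaining monotonic outside its own range.

```python
def merge_channels(depth_r, depth_g, depth_b, g_near, g_far):
    """Combine per-channel depth maps using the G-channel depth as the selector.

    depth_r, depth_g, depth_b are equally sized 2D lists of depths in the
    same unit as g_near and g_far, the limits of the G-channel depth range.
    """
    merged = []
    for row_r, row_g, row_b in zip(depth_r, depth_g, depth_b):
        out_row = []
        for d_r, d_g, d_b in zip(row_r, row_g, row_b):
            if d_g > g_far:       # farther than the G range: take the R channel
                out_row.append(d_r)
            elif d_g < g_near:    # nearer than the G range: take the B channel
                out_row.append(d_b)
            else:                 # inside the G range: keep the G depth
                out_row.append(d_g)
        merged.append(out_row)
    return merged

# Hypothetical one-row maps in millimetres with an assumed G range of 360-450 mm.
r_map = [[500.0, 500.0, 500.0]]
g_map = [[470.0, 400.0, 340.0]]
b_map = [[320.0, 320.0, 320.0]]
merged_map = merge_channels(r_map, g_map, b_map, g_near=360.0, g_far=450.0)
print(merged_map)
```

Only the G-channel map needs to be inspected to decide which channel to trust at each pixel, which keeps the merge a single pass over the image.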

## 4. Fast image capturing for real-time operation using an electrowetting liquid lens

#### 4.1 Electrical focus variation in an electrowetting liquid lens

We use an electrowetting liquid lens for focus changing in our proposed DFD system. Electrowetting is a phenomenon in which an applied voltage modifies the surface tension of a conducting liquid and thereby changes the contact angle *θ* of the droplet on the surface of the solid substrate. In this way, the focus can be changed by controlling the curvature of the liquid interface with the magnitude of the voltage, as shown in Fig. 5. Because no mechanical movement is involved in the focus variation, the lens is easy to miniaturize and, above all, fast operation is possible. Although it differs according to the amount and type of liquid, the response time of the electrowetting liquid lens is 10 ms, and with overshoot, it can be as fast as 7.5 ms [24].

As the aperture of the liquid lens increases, the shape of the lens can be distorted by gravity, and the optical aberrations increase. However, the gravitational deformation can be avoided if the densities of the two liquids constituting the liquid lens are the same [31]. In this paper, we used a Corning Varioptic liquid lens, which is composed of two liquids of the same density, so that deformation by gravity does not occur [32]. The aperture of the liquid lens can be set relatively freely as long as gravitational deformation is avoided through density matching. As the aperture of the liquid lens increases, the brightness increases, but the defocus increases as well; the permissible frequency range used for depth extraction therefore decreases, and so does the accuracy. The response time of the liquid lens also increases [33]. A liquid lens of the appropriate size should therefore be chosen for the application. A 2.5 mm aperture liquid lens was used in this paper, but larger or smaller lenses could be used.

We present a fast and compact DFD system using this electrowetting liquid lens. The traditional method of changing the focus setting by adjusting the distance between the lens and the sensor has a slow operation speed (>100 ms) [34], so a beam splitter is required for real-time application [19]. Also, in such a method, a telecentric system is required for a fixed magnification between defocused images, which makes the system complex and bulky [20]. However, all of the above problems can be solved with a liquid lens. Figure 6 shows the operation of an electrowetting liquid lens to acquire defocused images in DFD.

To obtain two defocused images, the curvature of the liquid lens is controlled by voltage. When the voltage is low, a far-focused image is captured, and when the voltage is high, a near-focused image is obtained. The fast focus variation of the liquid lens makes it suitable for real-time applications [35]. Moreover, because the distance ${v_0}$ between the liquid lens and the sensor does not change while the two defocused images are being acquired, the magnification of each object is kept constant. There is no need for a telecentric system to maintain a constant magnification between the two defocused images. Thus, the optical system can be made compact, and off-axis aberration is also easier to suppress.

#### 4.2 Lens design and performance

In this section, we describe the lens design process for our proposed chromatic DFD. We needed to select a glass with an appropriate Abbe number so that the depth ranges of neighboring channels would be adjacent to each other. To select the glass type of a lens, the structure of the lenses and the synthetic diopter had to be determined in advance. We also use an electrowetting liquid lens to change the focus; however, the liquid lens was designed so that its diopter would change around 0 ${\textrm{m}^{ - 1}}$ for convenience, so we considered only the design of the solid lenses first. We adopted a periscopic lens design, which consists of two convex meniscus lenses [36]. This design is compact, and its symmetric structure suppresses off-axis aberrations. In addition, there is an air space between the two symmetrical lenses, into which a liquid lens can be inserted. Figure 7 shows the structure of the periscopic lens design with an image sensor.

Here, ${\emptyset _C}$ is the synthetic diopter of the two lenses for channel C, $\emptyset {'_C}$ is the diopter of each solid lens for channel C, *t* is the gap between the two lenses, ${V_d}$ is the Abbe number of the lens, and ${v_0}$ is the distance from the principal plane to the image sensor. We designed the solid lenses for the far-focused mode. Therefore, ${\emptyset _C}$ is the same as ${D_{min}}_C$. Then $\nabla D$, ${\emptyset _C}$, and $d\emptyset$ are given by

We set the entire depth range from 30 cm to 60 cm, and ${v_0}$ is 18.85 mm. The resulting ${\emptyset _G}$ is 55.27 ${\textrm{m}^{ - 1}}$ and $d\emptyset$ is 1.11 ${\textrm{m}^{ - 1}}$. Because ${\emptyset _G}$ is the synthetic diopter of the two identical lenses for the G channel, it is given as

$${\emptyset _G} = 2{\emptyset '_G} - t\,{\emptyset '_G}^{2}\quad (7)$$

Taking the derivative of Eq. (7) and substituting $d\emptyset '$ with $\emptyset {^{\prime}_G}/{V_d}$, ${V_d}$ becomes

$${V_d} = \frac{2{\emptyset '_G}\left( 1 - t\,{\emptyset '_G} \right)}{d\emptyset}\quad (8)$$

To insert the liquid lens in the air space, we set the distance *t* to 6 mm. The resulting $\emptyset {^{\prime}_G}$ was calculated as 30.41 ${\textrm{m}^{ - 1}}$. As a result, the Abbe number suitable for the target depth range was calculated as 44.7953. Thus, we selected the glass type E-LASF09, with ${n_d} = 1.816$ and ${V_d} = 46.62$. We performed a pre-design with the parameters obtained in the calculation so far and optimized the pre-designed lenses using Zemax. First, symmetrical biconvex lenses with suitable thickness are arranged at a 6 mm interval, and the curvatures are optimized for on-axis aberrations. Next, while the field angle is increased, the lens curvature and gap are adjusted to reduce off-axis aberrations. Finally, the symmetry of the lenses is slightly broken to further improve the performance. Then, the electrowetting liquid lens is inserted into the gap. Figure 8 shows the final layout of the designed optical system.
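The quoted design values can be reproduced with a short calculation. The sketch below rests on assumptions that are not spelled out in the surviving text: the two solid lenses are treated as identical thin lenses separated by *t*, the far-focus power of the G channel is taken as 1/v_0 plus the far limit of the middle third of the diopter range, and the power difference between the B and R channels is taken as two channel spans. The function name is hypothetical.

```python
import math

def design_parameters(near_m, far_m, v0_m, gap_m):
    """Return (phi_g, d_phi, phi1, abbe) for a three-channel chromatic design."""
    channel_span = (1.0 / near_m - 1.0 / far_m) / 3.0  # diopter span per channel
    # Far-focus power for G: sensor term plus far limit of the middle third.
    phi_g = 1.0 / v0_m + 1.0 / far_m + channel_span
    # Power difference between the B and R far-focus states (two channel spans).
    d_phi = 2.0 * channel_span
    # Two identical thin lenses separated by t: phi_g = 2*phi1 - t*phi1**2.
    phi1 = (1.0 - math.sqrt(1.0 - gap_m * phi_g)) / gap_m
    # Differentiate the relation above and use d(phi1) = phi1 / V_d.
    abbe = 2.0 * phi1 * (1.0 - gap_m * phi1) / d_phi
    return phi_g, d_phi, phi1, abbe

phi_g, d_phi, phi1, abbe = design_parameters(0.30, 0.60, 18.85e-3, 6e-3)
print(round(phi_g, 2), round(d_phi, 2), round(phi1, 2), round(abbe, 2))
```

Under these assumptions the synthetic power, the chromatic power difference, and the single-lens power agree with the quoted 55.27, 1.11, and 30.41 m⁻¹. The Abbe number comes out slightly below the quoted 44.7953 because the text appears to use the rounded intermediate values 30.41 and 1.11.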

The electrowetting liquid lens is the A-25H0 from Corning Varioptic. The aperture sizes of the liquid lens and the solid lenses are 2.5 mm and 8 mm, respectively. Figure 9 and Table 1 show the optical performance and aberrations of the optical system.

As shown in Fig. 9(a), the image planes of the three channels are separated at almost equal intervals, and the focal shifts can be seen in detail in Fig. 9(b). The maximum focal shift is 372 µm, which is very close to the calculated value of 364 µm ($df = d\emptyset /{\emptyset _G}^2$). Distortion is less than 0.01%, and the maximum lateral color is 1.35 µm, which is within 1 pixel. Because the two solid lenses are designed symmetrically, most off-axis aberrations are strongly suppressed, but spherical aberration and field curvature exist to some extent, as shown in Table 1. In our proposed system, however, M/P is modeled with a geometrical PSF that accounts for spherical aberration. Also, the depth error due to field curvature can be corrected by the post-processing algorithms presented in [37,38]. Therefore, the depth can be extracted accurately.

## 5. Depth-measurement experiment

#### 5.1 Experiment setting

Using the optimized lens data, two solid lenses were custom made by Sun Optical, and a liquid lens was purchased from Corning Varioptic. A lens housing was manufactured with a 3D printer (3DWOX 1, Sindoh). The CCD camera used in the experiment was a Basler ace acA2500-14uc with a pixel size of 2.2 µm and a resolution of 2590×1942. Figure 10 shows the fabricated lens module and the whole camera system.

In this system, only two diopter states are required for depth measurement. We originally designed the liquid lens to operate in the far-focused mode at 0 ${\textrm{m}^{ - 1}}$. However, because the position of the minimum spot size was pulled toward the camera as a result of spherical aberration, the operating diopter of the liquid lens was slightly decreased. As a result, the operating diopter range of the liquid lens is [−1 ${\textrm{m}^{ - 1}}$, −0.25 ${\textrm{m}^{ - 1}}$], and the measurable depth range is approximately 30 cm to 70 cm. This detectable range is determined by the distance at which the R channel generates the minimum beam spot in the far-focused mode and the distance at which the B channel generates the minimum beam spot in the near-focused mode. For an object outside this range, the defocus is so severe that the texture information of the object disappears, making it impossible to measure its depth.

#### 5.2 Depth measurement result

This section presents depth measurement results obtained with real objects. We first evaluated the depth accuracy of the system. Depth accuracy is defined as the RMSE value between the ground truth and the measured depth value [39]. In order to analyze the depth error of the real 3D object's depth, we need to know the ground truth of the 3D object to be measured. However, since the ground truth of the real 3D object is unknown, the depth error for the 3D object cannot be analyzed [40]. In many studies, not 3D objects, but planar objects with known ground truth are used to analyze depth errors [15,18,40]. In this paper, we analyzed depth error by measuring the depth of a flat pattern object from 30 cm to 65 cm at 5 cm intervals. An experiment was also conducted on conventional DFD without chromatic aberration (only one channel used for the whole depth range). A cork sheet, office partition wall and KIMTECH wiper were used as flat objects with random patterns. Figure 11 shows the pattern of the cork sheet, office partition, KIMTECH wiper, and RMSE of the depth measurement result.

As seen in Fig. 11, the proposed DFD showed a lower RMSE than the conventional DFD over the whole depth range. The depth measurement results of the conventional DFD showed RMSE values of 2.275 mm to 12.3 mm, which is an error range of 0.75% to 2.73% relative to the measured distance. In contrast, the RMSE of the proposed DFD was a minimum of 0.7 mm and a maximum of 4.98 mm, corresponding to errors of 0.23% and 0.76% for the minimum and maximum depths, respectively. The results confirm that the proposed DFD provides higher accuracy than the conventional DFD over the whole depth range, even for real objects. With both methods, the RMSE tends to increase as the depth increases because the sensitivity decreases with distance. In the case of the conventional DFD, however, the RMSE does not increase monotonically. Even with the same object, the distribution of the frequency components of the captured image varies depending on the distance from the camera. The poor RMSE of the conventional DFD at a distance of 450 mm is due to some local depth errors, that is, artifacts. Even though the texture is rich to the naked eye, there may be some locally low-texture regions, and the depth errors generated in these regions are amplified by the low depth sensitivity. Since an object is magnified more at short distances than at large distances, there are more low-texture areas at short distances, resulting in depth errors. Conversely, as the distance increases, the object is captured smaller, so the low-texture area is relatively reduced, which reduces the depth error. Therefore, the conventional RMSE approaches the proposed RMSE at a large distance in Fig. 11(d) because the frequency components available for depth measurement become abundant as the captured objects become smaller at large measurement distances.
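The accuracy metric used above can be sketched as follows for a flat target at a known distance. The sample depth values are hypothetical and serve only to exercise the function; the relative errors are computed from the RMSE values and distances quoted in the text.

```python
import math

def rmse(measured_mm, ground_truth_mm):
    """Root mean square error between measured depths and a known flat-target distance."""
    n = len(measured_mm)
    return math.sqrt(sum((m - ground_truth_mm) ** 2 for m in measured_mm) / n)

def relative_error_percent(rmse_mm, distance_mm):
    """RMSE expressed as a percentage of the measurement distance."""
    return 100.0 * rmse_mm / distance_mm

# Hypothetical depth samples (mm) from a flat target placed at 300 mm.
samples = [299.0, 300.5, 301.0, 299.5]
sample_rmse = rmse(samples, 300.0)

# Relative errors for the proposed DFD at its nearest and farthest test distances.
near_rel = relative_error_percent(0.7, 300.0)
far_rel = relative_error_percent(4.98, 650.0)
print(sample_rmse, near_rel, far_rel)
```

Expressing the RMSE relative to the distance makes results at 30 cm and 65 cm comparable, since the absolute error of a passive method grows with depth.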

This system measures the depth based on the function of defocus ratio of two images (far and near focused images), called normalized ratio M/P, according to the distance. In addition to the depth sensitivity and the permissible frequency range described in this paper, how well the normalized ratio M/P, which is a continuous function, is modeled and how well the image sensor represents the defocused intensity are factors that determine the resolving power of depth measurement. In this system, the smallest interval increases from 0.7 mm at the nearest distance to 4.98 mm at the farthest distance. The phenomenon that the smallest interval increases with distance is a characteristic of the passive depth measurement method. However, the smallest interval can be further reduced by improving the accuracy of the normalized ratio modelling and the sensitivity of the image sensor.

Next, we measured the depth of various real 3D objects. There was an office partition as the background, and one object was placed in each small depth range of three channels, so the depth maps of the three channels were obtained in parallel. Figure 12 shows the synthesis process of the three channel depth maps targeting 3D objects and the result.

To make the maps easier to read, small depths are shown in warm colors and large depths in cold colors, and the color scale ranges from 30 cm to 70 cm. As seen in Fig. 12, the depth map obtained by the proposed DFD is smoother than the depth map obtained by the conventional DFD, and the shapes of the objects are also clearer. This is a result of the increased sensitivity and the expanded permissible frequency range. Additional results of depth measurements of real 3D objects are shown in Fig. 13, together with the corresponding results of the conventional DFD for comparison.

In the depth map obtained by the conventional DFD, the shape of the farthest object is not clear, and severe noise is generated. On the other hand, the depth map obtained by the proposed DFD is smooth and clear over the entire depth area. The bottom row shows that the proposed DFD can provide detailed depth information of objects. Figure 14 shows zoomed versions of partial areas of the depth map in the bottom row of Fig. 13. In Fig. 14(c), the scale of the color was also changed for clarity.

As shown in Fig. 14(c) and (d), the depths of the cat doll’s paws, ears (orange box), and face (red box) appear in detail. Also, it can be seen that the depth ranges of each channel are naturally synthesized without any artifacts. This is because the use of the electrowetting liquid lens with focus changing makes the optical system a symmetrical structure, and lateral color aberration is suppressed. In addition, real-time depth measurement is possible due to the fast focal switching of the liquid lens. Figure 15 shows the operating sequence of the lens, image sensor, and depth map generation.

For real-time applications, the output depth map must be generated at a frame rate of 30 fps or higher. Two defocused images must therefore be acquired within a single frame period of 33 ms, which leaves about 16 ms per image, so the focus variation time must be less than 16 ms. The liquid lens used in this system, the A-25H0 from Corning Varioptic, meets this requirement with a focus variation time (${\textrm{t}_\textrm{r}}$) of 10 ms. The image-processing time (${\textrm{t}_\textrm{d}}$) of this DFD is only 14 ms if parallel processing is performed by a field-programmable gate array (FPGA) with a high data rate (80 ns), as presented in [41]. Therefore, real-time depth measurement is possible with the proposed DFD system, provided that the sensor supports a sufficiently high frame rate (${\textrm{t}_\textrm{i}} < 33\;\textrm{ms}$).
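The timing argument above can be checked with a short calculation. The sketch below is illustrative only: the function names `timing_budget` and `is_real_time` are hypothetical, and the condition that the processing time must fit within one frame period assumes that depth-map generation is pipelined with the acquisition of the next image pair, which is an assumption rather than a statement about the actual operating sequence.

```python
def timing_budget(frame_rate_fps=30.0, images_per_frame=2):
    """Return (frame period, per-image budget) in milliseconds."""
    frame_period_ms = 1000.0 / frame_rate_fps
    return frame_period_ms, frame_period_ms / images_per_frame


def is_real_time(t_r_ms, t_d_ms, frame_rate_fps=30.0):
    """Check the two timing conditions discussed in the text.

    t_r_ms: focus variation time of the liquid lens.
    t_d_ms: image-processing time per depth map.
    Assumption: processing is pipelined, so it only has to finish
    within one frame period.
    """
    frame_period_ms, per_image_ms = timing_budget(frame_rate_fps)
    return t_r_ms < per_image_ms and t_d_ms < frame_period_ms


if __name__ == "__main__":
    frame_ms, image_ms = timing_budget(30.0)
    print(f"frame period: {frame_ms:.1f} ms, per-image budget: {image_ms:.1f} ms")
    # Values quoted in the text: t_r = 10 ms, t_d = 14 ms.
    print("real-time at 30 fps:", is_real_time(10.0, 14.0, 30.0))
```

With the values quoted in the text (10 ms focus variation, 14 ms processing), both conditions hold at 30 fps. Under the same assumptions they would not hold at 60 fps, because the per-image budget drops below 10 ms.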

## 6. Conclusion

In this paper, we presented a depth measurement system based on an electrowetting lens that uses chromatic aberration to improve depth accuracy. The DFD algorithm used is rational filter-based DFD, which requires two defocused images. A geometric PSF that accounts for spherical aberration was used to model the defocus function. By dividing the depth range into three channels, the depth sensitivity and the permissible frequency range were each improved threefold. Our simulation results demonstrated that the depth accuracy improves with increased permissible frequency range and depth sensitivity. The depth maps of the three channels were generated in parallel, and the accurate depth areas of these channels were combined to create a high-accuracy depth map. Our depth measurement system showed an RMSE of 0.7 mm to 4.98 mm, compared to 2.275 mm to 12.3 mm for the conventional method, over the depth measurement range of 30 cm to 70 cm. Experimental results showed that the proposed DFD using chromatic aberration generates a smooth and detailed depth map with higher accuracy than the conventional DFD. In addition, the electrical focus change of the liquid lens enables the acquisition of defocused image pairs within 33 ms, making the DFD system applicable to real-time operation and simplifying the optical system configuration. Because the proposed method focused on improving depth accuracy using on-axis chromatic aberration and on a compact system with fast image capture using a liquid lens, the optical system was designed only to reduce off-axis aberrations that cannot be post-processed (coma, astigmatism, distortion). Further studies will therefore be needed on reducing field curvature aberration using Petzval’s scheme.

## Acknowledgments

This work was supported by the BK21 FOUR program.

## Disclosures

The authors declare no conflicts of interest.

## Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

## References

1. A. S. Huang, A. Bachrach, P. Henry, M. Krainin, D. Maturana, D. Fox, and N. Roy, “Visual odometry and mapping for autonomous flight using an RGB-D camera,” in Proceedings of the International Symposium on Robotics Research (ISRR, 2017), pp. 235–252.

2. C. R. Qi, X. Chen, O. Litany, and L. J. Guibas, “Imvotenet: Boosting 3d object detection in point clouds with image votes,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2020), pp. 4404–4413.

3. Z. Zhang, “Microsoft kinect sensor and its effect,” IEEE Multimedia **19**(2), 4–10 (2012).

4. M. J. Sun, M. P. Edgar, G. M. Gibson, B. Sun, N. Radwell, R. Lamb, and M. J. Padgett, “Single-pixel three-dimensional imaging with time-based depth resolution,” Nat. Commun. **7**(1), 12010 (2016).

5. J. Geng, “Structured-light 3D surface imaging: a tutorial,” Adv. Opt. Photonics **3**(2), 128–160 (2011).

6. F. Li, H. Chen, A. Pediredla, C. Yeh, K. He, A. Veeraraghavan, and O. Cossairt, “CS-ToF: High-resolution compressive time-of-flight imaging,” Opt. Express **25**(25), 31096–31110 (2017).

7. S. Achar, J. R. Bartels, W. L. R. Whittaker, K. N. Kutulakos, and S. G. Narasimhan, “Epipolar time-of-flight imaging,” ACM Trans. Graphics **36**(4), 1–8 (2017).

8. M. Gupta, Q. Yin, and S. K. Nayar, “Structured light in sunlight,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 2020), pp. 545–552.

9. F. Shinmura, D. Deguchi, I. Ide, H. Murase, and H. Fujiyoshi, “Estimation of Human Orientation using Coaxial RGB-Depth Images,” in Proceedings of International Conference on Computer Vision Theory and Applications (VISAPP, 2015), pp. 113–120.

10. S. Chaudhuri and A. N. Rajagopalan, *Depth from defocus: a real aperture imaging approach* (Springer Science & Business Media, 2012).

11. P. Grossmann, “Depth from focus,” Pattern Recognit. Lett. **5**(1), 63–69 (1987).

12. M. Watanabe and S. K. Nayar, “Rational filters for passive depth from defocus,” Int. J. Comput. Vision **27**(3), 203–225 (1998).

13. M. Subbarao and G. Surya, “Depth from defocus: A spatial domain approach,” Int. J. Comput. Vision **13**(3), 271–294 (1994).

14. P. Favaro, A. Mennucci, and S. Soatto, “Observing shape from defocused images,” Int. J. Comput. Vision **52**(1), 25–43 (2003).

15. P. Trouvé, F. Champagnat, G. Le Besnerais, J. Sabater, T. Avignon, and J. Idier, “Passive depth estimation using chromatic aberration and a depth from defocus approach,” Appl. Opt. **52**(29), 7152–7164 (2013).

16. J. Ens and P. Lawrence, “An investigation of methods for determining depth from defocus,” IEEE Trans. Pattern Anal. Mach. Intell. **15**(2), 97–108 (1993).

17. D. Ziou and F. Deschenes, “Depth from defocus-estimation in spatial domain,” Comput. Vision Image Understanding **81**(2), 143–165 (2001).

18. A. N. J. Raj and R. C. Staunton, “Rational filter design for depth from defocus,” Pattern Recognit. **45**(1), 198–207 (2012).

19. A. P. Pentland, “A new sense for depth of field,” IEEE Trans. Pattern Anal. Mach. Intell. **PAMI-9**(4), 523–531 (1987).

20. M. Watanabe and S. K. Nayar, “Telecentric optics for constant magnification imaging,” IEEE Trans. Pattern Anal. Mach. Intell. **19**(12), 1360–1365 (1997).

21. K. Yin, Z. He, and S. T. Wu, “Reflective Polarization Volume Lens with Small f-Number and Large Diffraction Angle,” Adv. Opt. Mater. **8**(11), 2000170 (2020).

22. H. Ren and S. T. Wu, “Variable-focus liquid lens by changing aperture,” Appl. Phys. Lett. **86**(21), 211107 (2005).

23. H. Zhang, H. Ren, S. Xu, and S. T. Wu, “Temperature effects on dielectric liquid lenses,” Opt. Express **22**(2), 1930–1939 (2014).

24. https://www.corning.com/kr/ko/innovation/corning-emerging-innovations/corning-varioptic-lenses.html

25. P. Trouvé-Peloux, J. Sabater, A. Bernard-Brunel, F. Champagnat, G. Le Besnerais, and T. Avignon, “Turning a conventional camera into a 3D camera with an add-on,” Appl. Opt. **57**(10), 2553–2563 (2018).

26. M. Ye, X. Chen, Q. Li, J. Zeng, and S. Yu, “Depth from defocus measurement method based on liquid crystal lens,” Opt. Express **26**(22), 28413–28420 (2018).

27. C. D. Claxton and R. C. Staunton, “Measurement of the point-spread function of a noisy imaging system,” J. Opt. Soc. Am. A **25**(1), 159–170 (2008).

28. F. Mannan and M. S. Langer, “What is a good model for depth from defocus?” in Proceedings of the IEEE Conference on Computer and Robot Vision (IEEE, 2016), pp. 273–280.

29. https://www.w3.org/Graphics/Color/sRGB

30. M. Subbarao, “Parallel Depth Recovery by Changing Camera Parameters,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 1988), pp. 149–155.

31. H. Ren, S. Xu, and S. T. Wu, “Effects of gravity on the shape of liquid droplets,” Opt. Commun. **283**(17), 3255–3258 (2010).

32. J. Fuentes-Fernández, S. Cuevas, L. C. Álvarez-Nuñez, and A. Watson, “Tests and evaluation of a variable focus liquid lens for curvature wavefront sensors in astronomy,” Appl. Opt. **52**(30), 7256–7264 (2013).

33. J. Hong, Y. K. Kim, K. H. Kang, J. M. Oh, and I. S. Kang, “Effects of drop size and viscosity on spreading dynamics in DC electrowetting,” Langmuir **29**(29), 9118–9125 (2013).

34. H. Oku and M. Ishikawa, “High-speed liquid lens for computer vision,” in Proceedings of the IEEE International Conference on Robotics and Automation (IEEE, 2010), pp. 2643–2648.

35. S. Liu and H. Hua, “Time-multiplexed dual-focal plane head-mounted display with a liquid lens,” Opt. Lett. **34**(11), 1642–1644 (2009).

36. J. M. Geary, *Introduction to lens design: with practical ZEMAX examples* (Willmann-Bell, 2002).

37. A. Li, T. Tjahjadi, and R. Staunton, “Adaptive deformation correction of depth from defocus for object reconstruction,” J. Opt. Soc. Am. A **31**(12), 2694–2702 (2014).

38. G. Blahusch, W. Eckstein, and C. Steger, “Calibration of curvature of field for depth from focus,” Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. **34**(3/W8), 173–180 (2003).

39. C. Timen, C. M. Speksnijder, F. Van Der Heijden, C. H. Beurskens, K. J. Ingels, and T. J. Maal, “Depth accuracy of the RealSense F200: Low-cost 4D facial imaging,” Sci. Rep. **7**(1), 1–8 (2017).

40. K. Takemura and T. Yoshida, “Depth from Defocus Technique Based on Cross Reblurring,” IEICE Trans. Inf. Syst. **E102.D**(11), 2083–2092 (2019).

41. A. N. J. Raj and R. C. Staunton, “Video-rate calculation of depth from defocus on a FPGA,” J. Real-Time Image Process. **14**(2), 469–480 (2018).