High-resolution depth imaging with a small-scale SPAD array based on the temporal-spatial filter and intensity image guidance

Open Access

Abstract

Single-photon avalanche diode (SPAD) arrays currently suffer from small pixel counts, which makes it difficult to achieve high-resolution 3D imaging with them directly. We established a CCD camera-assisted SPAD array depth imaging system. Based on an illumination laser lattice generated by a diffractive optical element (DOE), registration of the low-resolution depth image gathered by the SPAD array and the high-resolution intensity image gathered by the CCD is realized. The intensity information is used to guide the reconstruction of a resolution-enhanced depth image through a proposed method consisting of total generalized variation (TGV) regularization and a temporal-spatial (T-S) filtering algorithm. Experimental results show that a 4 × 4-fold increase in native depth image resolution is achieved and that depth imaging quality is also improved by the proposed method.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

With the development of single-photon detector technology, especially single-photon avalanche diode (SPAD) arrays with independent timing circuits for each pixel [1–5], three-dimensional (3D) imaging based on SPAD arrays has attracted wide attention from researchers [6–25]. Since the time-correlated single-photon counting (TCSPC) electronics for each individual pixel are integrated directly onto the sensor chip, timing resolutions on the order of tens of picoseconds are possible. A SPAD array-based imaging system can therefore obtain the 3D profile of a target with single-photon sensitivity and high depth resolution. It has huge application prospects in 3D imaging under ultra-weak illumination or at long detection range, in fields such as biological science, planetary remote sensing, and unmanned aerial vehicle autopilots [26–32].

However, current SPAD arrays still suffer from relatively small pixel counts. For example, the latest TCSPC-mode SPAD array reported by Lincoln Lab has 256 × 256 pixels [33], which makes it difficult to achieve high-resolution 3D imaging directly with such sensors. Although Morimoto et al. [3] recently demonstrated a ground-breaking 1 Megapixel (1024 × 1024) SPAD array with timing capabilities, the sensor lacked TCSPC electronics; instead, it gained temporal information by scanning a timing gate and identifying the rising edge of the response, which leads to relatively low photon utilization efficiency [3]. Besides, SPAD arrays also have the drawbacks of low pixel fill-factor and poor uniformity of pixel performance (e.g., the hot pixel problem) [34]. As a result, 3D images produced directly by these small-scale SPAD arrays have low spatial resolution and poor image quality. To address these problems, a variety of works have been devoted to improving the performance of SPAD array-based 3D imaging systems. Shin et al. [6] used a 32 × 32 SPAD array to perform two-dimensional scanning on the image plane and obtained a depth image of 384 × 384 pixels; high-quality depth images from ∼1 detected signal photon per pixel were reconstructed by an array-specific algorithm exploiting both the transverse smoothness and longitudinal sparsity of natural scenes. Henriksson et al. [35] proposed a continuous-scanning panoramic imaging method, obtaining a depth image of 128 × 768 pixels by placing a 128 × 32 SPAD array-based imaging lidar on a rotating platform. However, enhancing resolution through scanning results in longer acquisition times. Sun et al. [36,37] proposed a neural network architecture named SPADnet, which uses a monocular depth estimation algorithm together with a denoising and sensor fusion strategy. Ruget et al. [38] developed a multi-feature fusion deep network for a dual-mode SPAD array that captures alternate low-resolution depth and high-resolution intensity images; the network uses the intensity images and multiple features extracted from photon counting histograms to guide the up-sampling of the depth image. The limitation of neural network methods is the need for large training data sets in the depth domain, which are currently small. Chan et al. [39] performed a regularization-based depth data filling method that accounts for correlations between the SPAD array's depth data and the CCD's intensity data. Xie et al. [40] combined active and passive intensity data acquired by a SPAD array to guide the data filling of depth images. However, these methods only perform compensatory depth image restoration for weakly reflective areas of the target without enhancing the transverse resolution of the depth image. Gyongy et al. [41] reported a high-speed 3D imaging system with a SPAD array used in a hybrid imaging mode, in which high-resolution intensity images and low-resolution depth images are captured in an interleaved fashion, enabling guided upscaling of depth data from a native resolution of 64 × 32 to 256 × 128. Callenberg et al. [42] implemented an iterative optimization scheme to increase the spatial resolution of SPAD array-based depth imaging using a sensor fusion approach, experimentally demonstrating depth imaging with an up-sampling factor of 3 × 3 on SPAD array depth data.

In this paper, we develop a resolution-enhanced depth imaging method for a SPAD array-based imaging system, guided by the intensity image of an assisting CCD. Our method uses a diffractive optical element (DOE) to generate an illumination laser lattice matched to the field of view (FOV) of a low fill-factor SPAD array, in contrast to the uniform flood illumination mode of Ref. [42]. Besides the high utilization efficiency of the laser illumination energy, the laser lattice also serves as a set of marks to help register the SPAD depth image and the CCD intensity image. Using a proposed depth image reconstruction method consisting of total generalized variation (TGV) regularization and a temporal-spatial (T-S) noise filter, a resolution-enhanced, high-quality depth image is reconstructed.

2. Imaging system set-up

The imaging system shown schematically in Fig. 1 was established to collect the depth and intensity data. A summary of the main system parameters is listed in Table 1. Our imaging system consists of a SPAD array-based active imaging subsystem and an assisting CCD camera. The illumination source is a pulsed laser (LDH-D-TA-530, PicoQuant, Germany) with a wavelength of 532 nm and a tunable repetition rate. The SPAD array (PF32, Photon Force, UK) has 32 × 32 pixels over a 1.6 mm × 1.6 mm area; each pixel has an active area of φ7 µm and a pitch of 50 µm, corresponding to a pixel fill-factor as small as 1.5%. With conventional flood illumination, this small fill-factor would waste most of the illuminating laser energy, since the echo photons would largely be imaged onto the non-active areas between pixels. In order to improve the energy utilization efficiency of the illumination laser, and also to provide a reference for registering the SPAD's depth image with the CCD's intensity image, a diffractive optical element is placed in the outgoing laser path. The DOE (MS-693-Q-Y-A, Holo/Or, Israel) diffracts a single incident beam into 32 × 32 beamlets arranged in a matrix. The divergence full angle and separation angle of the beamlets output from the DOE are 59.86 mrad × 59.86 mrad and 1.93 mrad × 1.93 mrad, respectively. Through a ×1.875 beam expander consisting of lenses L1 and L2, the divergence full angle and separation angle of the 32 × 32 beamlets are adjusted to about 32 mrad and 1 mrad, respectively. This divergence full angle matches the FOV determined by the 50 mm focal length of OBJ1 and the 1.6 × 1.6 mm² SPAD array size (see the check below). Although the divergence angle of one beamlet is larger than the 0.14 mrad sub-FOV determined by the 7 µm pixel diameter, the laser energy utilization efficiency is still greatly improved compared with conventional flood illumination. A polarization beam splitter (PBS) combined with a quarter-wave plate (QWP) constructs a transceiver switch for the coaxial optical system. To mitigate internal reflections in the coaxial system, we rotated the PBS by a small angle so that most internally reflected photons cannot enter the SPAD sensor. Before the PBS, a rotatable half-wave plate (HWP) is used to adjust the emitted laser power. An off-the-shelf objective lens OBJ1 (Sony 18-70 mm f/3.5-5.6 DT), with its focal length set to 50 mm, collects the scattered return photons and images the target onto the SPAD array sensor; a band-pass filter (BPF) with 10 nm bandwidth filters out ambient light noise. A Tektronix AFG3252 signal generator (SG) produces a 20 MHz pulse synchronization signal to trigger the pulsed laser and the SPAD array, so that our system works in TCSPC data acquisition mode. It is worth noting that the mirror M3 is controlled by actuators (Z812B, Thorlabs, USA) to steer the laser beamlets and the FOV of the SPAD array imager simultaneously. Through micro-scan stitching we can therefore obtain a ground-truth high-resolution depth image of the target, so as to objectively evaluate the quality of the high-resolution depth image obtained by intensity image-guided reconstruction. Parallel to the SPAD array-based active depth imaging system, we placed a digital camera (Lw575M) with its pixel resolution set to 1280 × 1024 to obtain intensity data with high spatial resolution. The focal length of the objective lens OBJ2 (Canon EF 50 mm f/2.5 II) of the Lw575M is 50 mm. The high-resolution intensity image is resized and cropped to 128 × 128 pixels to match the FOV of the SPAD array; the detailed cropping process is described in the data processing section. Finally, the low-spatial- but high-temporal-resolution depth image gathered by the SPAD array and the high-spatial-resolution intensity image (with no temporal information) gathered by the CCD are transferred to a computer for processing.
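As a quick plausibility check of this geometric matching, the following minimal Python sketch (not part of the original system code; all values are quoted in the text above) reproduces the stated angles:

```python
# Minimal sketch (not the authors' code; all values are quoted in the text)
# checking that the expanded DOE lattice matches the SPAD receiver FOV.

EXPANDER_MAG = 1.875      # x1.875 beam expander (lenses L1, L2)
DOE_FULL     = 59.86e-3   # rad, lattice divergence full angle from the DOE
DOE_SEP      = 1.93e-3    # rad, beamlet separation angle from the DOE
F_OBJ1       = 50e-3      # m, focal length of OBJ1
SENSOR_SIDE  = 1.6e-3     # m, SPAD array side length (32 x 32 pixels)
PIXEL_DIAM   = 7e-6       # m, active-area diameter of one pixel

# The expander reduces angles by the magnification factor.
full_angle = DOE_FULL / EXPANDER_MAG      # ~31.9 mrad, quoted as ~32 mrad
separation = DOE_SEP / EXPANDER_MAG       # ~1.03 mrad, quoted as ~1 mrad

# Receiver-side angles from the small-angle approximation.
fov = SENSOR_SIDE / F_OBJ1                # 32 mrad full FOV
sub_fov = PIXEL_DIAM / F_OBJ1             # 0.14 mrad per-pixel sub-FOV

print(f"lattice {full_angle*1e3:.1f} mrad vs FOV {fov*1e3:.1f} mrad")
print(f"separation {separation*1e3:.2f} mrad, sub-FOV {sub_fov*1e3:.2f} mrad")
```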

Fig. 1. Schematic of the imaging system.

Table 1. Summary of the system parameters

3. Reconstruction procedure of high-resolution depth image

In this section, we demonstrate the procedure for reconstructing the resolution-enhanced depth image with intensity image guidance. The procedure can be divided into three steps: (A) data preprocessing, including initial depth image generation and the registration of the depth and intensity images; (B) censoring abnormal depth values caused by background noise and the dark counts of the SPAD array; (C) recovering the resolution-enhanced depth image via a TGV regularization reconstruction framework with intensity image guidance.

3.1 Data preprocessing

The raw photon arrival data of the SPAD array form a cube ${\textbf S}$ of $N_r \times N_c$ pixels ${\times}\, N_f$ frames. The element $S_{i,j,k}$ of the cube denotes the time-of-flight (TOF) of the photon detected in pixel $(i,j)$ during the $k$th frame. According to the TCSPC principle, after multi-frame cumulative measurement each pixel yields a TOF-photon counting histogram. We then estimate the TOF value from the histogram using a cross-correlation algorithm [43]: the time bin corresponding to the peak of the cross-correlation curve of each pixel is used as the depth value, as sketched below. We also found in our experiments that there are fixed timing offsets among the individual pixels of the SPAD array [44], so a bias correction is also performed on the depth image. Finally, we obtain an initial depth image ${\textbf D}_{(N_r \times N_c)}$.
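A minimal sketch of this per-pixel estimate (an assumed implementation, not the authors' code; the per-pixel timing-offset correction is omitted here):

```python
# Per-pixel cross-correlation depth estimate: histogram the photon TOFs,
# cross-correlate with the calibrated IRF h, and take the peak bin.
import numpy as np

def initial_depth_image(S, n_bins, h):
    """S: (Nr, Nc, Nf) int array of per-frame TOF bin indices (-1 = no photon).
    h: 1D calibrated system response (IRF). Returns (Nr, Nc) depth map in bins."""
    Nr, Nc, _ = S.shape
    D = np.zeros((Nr, Nc))
    for i in range(Nr):
        for j in range(Nc):
            tofs = S[i, j][S[i, j] >= 0]
            hist = np.bincount(tofs, minlength=n_bins)   # TOF histogram
            xcorr = np.correlate(hist, h, mode="same")   # cross-correlation
            D[i, j] = np.argmax(xcorr)                   # peak bin = depth
    return D
```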

Since the low-resolution depth image and the high-resolution intensity image stem from different cameras, a mapping between them needs to be established. Fortunately, we employ a coaxial depth imaging method based on laser lattice illumination; that is, the illumination laser lattice and the FOV of the depth imaging system are matched. Since the laser lattice can also be captured by the digital camera, we can easily register the depth and intensity images using the lattice as a reference. As shown in Fig. 2, the intensity image of the target is acquired by the digital camera together with the laser illumination lattice on the target. The intensity image ${\textbf I}_{\textrm{H}}$ registered with the depth image is obtained by cropping along the edges of the lattice, as sketched below.
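A sketch of this registration step; detecting the lattice spots by simple thresholding is our assumption for illustration, whereas the paper crops along the edges of the imaged lattice:

```python
# Crop the CCD frame to the bounding box of the laser lattice, then resize
# to the 128 x 128 guide image I_H used in the reconstruction.
import numpy as np
from scipy.ndimage import zoom

def register_intensity(ccd_frame, spot_threshold, out_size=128):
    """Crop the CCD frame to the lattice bounding box and resize to I_H."""
    ys, xs = np.nonzero(ccd_frame > spot_threshold)      # bright lattice spots
    crop = ccd_frame[ys.min():ys.max() + 1, xs.min():xs.max() + 1].astype(float)
    return zoom(crop, (out_size / crop.shape[0], out_size / crop.shape[1]))
```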

Fig. 2. Intensity image of target and illumination laser lattice captured by the digital camera.

3.2 Censoring abnormal depth values

To obtain a tractable depth image reconstruction problem, abnormal depth values should be censored out of the initial depth image ${\textbf D}_{(N_r \times N_c)}$. These abnormal values are mostly caused by ambient noise or dark count noise. In Ref. [6], a noise filtering method exploiting the longitudinal sparsity of natural scenes was proposed: based on the observation that the signal photons returned from the target concentrate at a few depth positions (i.e., depth clusters), an optimization problem is designed to pre-estimate the sparsely distributed depth values, which are then combined with the system response bandwidth to filter out noise detections. The full width at half maximum (FWHM) of the instrument response function (IRF) of the imaging system is taken as the system bandwidth. However, Ref. [6] fixed the number of depth clusters at 2, since their scene of interest consisted of a mannequin and a flower. When the scene contains more targets at different, unknown depth positions, an improved way of determining the number of depth clusters is required. Here we propose an improved noise filtering method for more complex target scenes.

Firstly, we estimate the sparsely distributed depth values of the target by accumulating the TOF-photon counting histograms of all pixels except hot pixels. For hot pixel identification, we placed the SPAD array in a dark room with its shutter closed and flagged pixels with a dark count rate > 100 Hz, since a standard pixel of the PF32 SPAD array has a dark count rate < 100 Hz; a total of 185 hot pixels were identified. Accumulating 10,000 frames yields the all-pixel photon counting histogram shown in Fig. 3. It is a noisy observation $\tilde{{\textbf z}}$ of the depth distribution of the target scene. Several peaks can be seen intuitively in the all-pixel histogram; they correspond to the scene shown in Fig. 2, where a handmade 'house', 'flower', 'flower branch', 'tree' and 'backboard' sit at different depth positions.
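The hot-pixel calibration reduces to a simple thresholding of per-pixel dark count rates; a hedged sketch (the function name and dark-frame format are our assumptions):

```python
# With the shutter closed, flag any pixel whose dark count rate exceeds 100 Hz.
import numpy as np

def find_hot_pixels(dark_counts, exposure_s, rate_threshold_hz=100.0):
    """dark_counts: (Nr, Nc) counts recorded in darkness over exposure_s seconds.
    Returns a boolean hot-pixel mask."""
    return dark_counts / exposure_s > rate_threshold_hz
```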

Fig. 3. All-pixel photon counting histogram.

Actually, $\tilde{{\textbf z}}$ is a noisy convolution of the true depth vector ${\textbf z}$ with the system response $h$, which can be obtained through a calibration process [43]. The optimal estimate $\hat{{\textbf z}}$ is obtained by solving the following optimization problem [6]:

$$\hat{{\textbf z}} = \mathop{\textrm{argmin}}\limits_{{\textbf z}} \; \sum\limits_k \left\| \tilde{{\textbf z}} - (h \ast {\textbf z}) \right\|_2^2, \quad \textrm{subject to}\;\; \|{\textbf z}\|_0 = K, \;\; {\textbf z} \ge 0 \tag{1}$$
where ${\ast}$ denotes the discrete convolution operator and $K$ is the number of non-zero elements of the reconstructed $\hat{{\textbf z}}$, which represent the centers of the estimated depth clusters. Eq. (1) is a discrete sparse deconvolution problem, which can be solved by a modified orthogonal matching pursuit algorithm [6]; a sketch is given below. Unlike Ref. [6], we first set a relatively large initial value for $K$. In our experiment we choose $K = 20$ and obtain a weight distribution of potential depth clusters by solving Eq. (1). The result is shown in Fig. 4: all 20 potential depth clusters have weights greater than 0. We then need to select the reliable depth clusters from these candidates. From Fig. 3 and Fig. 4, we observe that the depth weights at target positions are large, while those at non-target positions are small and nearly equal. Therefore, we use a threshold method to distinguish the reliable depth clusters.
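A minimal sketch of one possible greedy solver for Eq. (1); the exact "modified OMP" of Ref. [6] may differ, and this variant enforces ${\textbf z} \ge 0$ with a non-negative least-squares refit at each step:

```python
# Greedy sparse deconvolution: the dictionary columns are shifted copies of
# the IRF h; K atoms are selected by maximum correlation with the residual.
import numpy as np
from scipy.optimize import nnls

def sparse_deconvolve(z_tilde, h, K=20):
    """z_tilde: all-pixel histogram (length Nz); h: calibrated IRF.
    Returns z_hat with at most K non-zero depth-cluster weights."""
    Nz = len(z_tilde)
    # Dictionary: column k is the IRF centered at bin k.
    A = np.column_stack([np.convolve(np.eye(Nz)[k], h, mode="same")
                         for k in range(Nz)])
    residual, support = z_tilde.astype(float), []
    for _ in range(K):
        corr = A.T @ residual
        corr[support] = -np.inf                 # never reselect an atom
        support.append(int(np.argmax(corr)))
        w, _ = nnls(A[:, support], z_tilde.astype(float))  # non-negative refit
        residual = z_tilde - A[:, support] @ w
    z_hat = np.zeros(Nz)
    z_hat[support] = w
    return z_hat
```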

Fig. 4. Weights of the potential depth clusters.

In order to set a suitable threshold, we sort the depth weights in descending order and then take the gradient of the sorted weights. The sorting and gradient curves are shown in Fig. 5(a) and Fig. 5(b). As can be seen from Fig. 5, the sorted weights change in two stages: an obvious changing stage and a stable stage. A clear demarcation point appears in the gradient curve: from the 7th cluster on, the gradient of the sorted depth weights is almost zero. We therefore choose a gradient threshold $Th_G$ of 0.01 and set the discrimination rule as follows: if the gradient is less than 0.01 more than two consecutive times, the stable stage has been reached. The first few depth clusters with larger weights are then retained as reliable depth clusters, as sketched below.
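A sketch of this selection rule (our assumed implementation of the text's description):

```python
# Sort cluster weights in descending order, take the gradient of the sorted
# curve, and cut where the gradient stays below Th_G more than twice in a row.
import numpy as np

def select_reliable_clusters(weights, th_g=0.01):
    """weights: weights of the K potential depth clusters (Fig. 4).
    Returns the indices of the reliable clusters (changing stage only)."""
    order = np.argsort(weights)[::-1]           # descending sort (Fig. 5a)
    grad = np.abs(np.gradient(weights[order]))  # gradient magnitude (Fig. 5b)
    run = 0
    for n, is_flat in enumerate(grad < th_g):
        run = run + 1 if is_flat else 0
        if run > 2:                             # flat more than twice in a row
            return order[: n - run + 1]         # clusters before stable stage
    return order                                # no stable stage found
```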

Fig. 5. Sorted weights of the potential depth clusters and the gradient of the sorted weights.

Secondly, an abnormal-depth judgment rule is designed according to the estimated depth vector $\hat{{\textbf z}}$ and the system response bandwidth, expressed as the FWHM of the IRF of the imaging system. Let $\textrm{supp}(\hat{{\textbf z}}) = \{ k : \hat{z}_k \ne 0 \;\&\; Th_G(\hat{z}_k) > 0.01,\; k = 1,2,\ldots,N_z \}$ be the support of the solution to Eq. (1) after reliable depth cluster selection. The set of reliable depth values can then be expressed as $S_z = \{ k : |k - k^{\prime}| \le 2T_{\textrm{IRF}} \;\textrm{for some}\; k^{\prime} \in \textrm{supp}(\hat{{\textbf z}}) \}$. $S_z$ behaves like a time window: when the depth clusters are relatively dense, $S_z$ is likely a single time window; when they are scattered, $S_z$ consists of several discrete time windows.
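A sketch of building the reliable window mask $S_z$ (time quantities in bins; names are our assumptions):

```python
# A bin belongs to S_z when it lies within 2*T_IRF (FWHM of the IRF, in
# bins) of some reliable cluster center.
import numpy as np

def reliable_window_mask(cluster_centers, n_bins, t_irf_bins):
    """Boolean mask over time bins: True where the bin belongs to S_z."""
    bins = np.arange(n_bins)
    mask = np.zeros(n_bins, dtype=bool)
    for k in cluster_centers:
        mask |= np.abs(bins - k) <= 2 * t_irf_bins
    return mask

# Temporal censoring of the judgment rule Eq. (2) below: pixel (i, j)
# survives iff mask[D[i, j]] is True.
```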

Finally, we obtain an optimized depth image $\hat{{\textbf D}}_{(N_r \times N_c)}$ with outliers censored from the initial depth image ${\textbf D}_{(N_r \times N_c)}$ using the following judgment rule, Eq. (2):

$$\begin{cases} \textrm{if}\;\; D_{i,j} \in S_z, & \textrm{then}\;\; D_{i,j}\;\; \textrm{is not censored},\\ \textrm{if}\;\; D_{i,j} \notin S_z, & \textrm{then}\;\; D_{i,j}\;\; \textrm{is censored}. \end{cases}\tag{2}$$

The aforesaid censoring procedure exploits the longitudinal sparsity of natural scenes, and the longitudinal features of the target are characterized by the temporal information of the arriving photons; it is therefore a form of temporal filtering. However, when the target scene occupies a large longitudinal space containing many sub-targets at different depth positions, uniformly distributed noise detections are more likely to be retained within the time windows.

Further, in order to censor depth outliers more precisely, we also consider the transverse spatial characteristics of the target and propose an enhanced temporal-spatial filtering method exploiting both temporal and spatial features. The flowchart of the proposed T-S filter is shown in Fig. 6. The temporal noise filtering part in the upper half has been described above; the spatial filtering in the lower half proceeds as follows. After the temporal filtering stage, the depth values that were not censored in $\hat{{\textbf D}}_{(N_r \times N_c)}$ are replaced by their nearest depth cluster centers to generate a depth cluster map ${\textbf D}_{cl}$, so that a loose spatial filtering can be performed. We use a classical spatial filtering method based on the rank-ordered absolute differences (ROAD) statistic to censor depth outliers [45,46]. At each transverse location of ${\textbf D}_{cl}$, the ROAD statistic is computed from the depth values of the nearest transverse neighbors; here we choose a 5 × 5 neighborhood. The absolute depth differences between the current pixel and its neighbors are computed and sorted in ascending order, and the ROAD statistic is the sum of the first two absolute differences in this sorted collection. Then, the judgment rule Eq. (3) below is used to classify the depth value at $(i,j)$ in $\hat{{\textbf D}}_{(N_r \times N_c)}$ as an outlier or not, as sketched after Eq. (3).

$$\begin{cases} \textrm{if}\;\; \textrm{ROAD}(i,j) = 0, & \textrm{then}\;\; \hat{D}_{i,j}\;\; \textrm{is not censored},\\ \textrm{if}\;\; \textrm{ROAD}(i,j) \ne 0, & \textrm{then}\;\; \hat{D}_{i,j}\;\; \textrm{is censored}. \end{cases}\tag{3}$$

Our approach differs from the ROAD filtering method of Ref. [46], which derives a detection threshold for the ROAD statistic from a calculation involving reflectivity estimation; that threshold depends on the link between the reflectivity and depth estimates, so its denoising performance is affected by the reflectivity reconstruction accuracy [8]. Here we directly set the threshold to zero: because the depth values have already been snapped to reliable depth clusters, any non-zero ROAD statistic indicates that a pixel has little spatial correlation with its neighbors and can be identified as an outlier. A sketch of this spatial stage, including the cluster-map generation, is given below.
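A sketch (our assumed implementation) of the spatial stage of the T-S filter:

```python
# Snap surviving depths to the nearest reliable cluster center (D_cl), then
# censor pixels whose ROAD statistic over a 5x5 neighborhood is non-zero.
import numpy as np

def snap_to_clusters(D, valid, centers):
    """Replace each uncensored depth by its nearest cluster center."""
    centers = np.asarray(centers)
    idx = np.abs(D[..., None] - centers).argmin(axis=-1)
    return np.where(valid, centers[idx], 0)

def road_censor(D_cl, valid, half=2, m=2):
    """Censor pixels whose ROAD statistic (sum of the m smallest absolute
    depth differences to valid neighbors) is non-zero. Edge pixels use the
    available, truncated neighborhood."""
    Nr, Nc = D_cl.shape
    keep = valid.copy()
    for i in range(Nr):
        for j in range(Nc):
            if not valid[i, j]:
                continue
            r0, r1 = max(0, i - half), min(Nr, i + half + 1)
            c0, c1 = max(0, j - half), min(Nc, j + half + 1)
            nb = D_cl[r0:r1, c0:c1][valid[r0:r1, c0:c1]]
            diffs = np.sort(np.abs(nb - D_cl[i, j]))[1:]  # drop the self term
            if diffs[:m].sum() != 0:    # zero threshold, as argued in the text
                keep[i, j] = False
    return keep
```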

Fig. 6. Flowchart of the proposed temporal-spatial filtering method.

3.3 Reconstruction of high-resolution depth image

After the abnormal depth values are filtered out, the reconstruction of the resolution-enhanced depth image under the guidance of the intensity image is carried out. Based on the observation that textural edges are more likely to appear at high depth discontinuities, whereas homogeneously textured regions correspond to homogeneous surface parts [47], we use the high-resolution intensity image to guide the reconstruction of the depth image. The total generalized variation regularization method [47] is introduced to reconstruct the high-resolution image ${\textbf D}_{\textrm{HR}}$ by solving the following optimization problem:

$${\textbf D}_{\textrm{HR}} = \mathop{\arg\min}\limits_u \left\{ G(u) + TGV_\alpha^2(u) \right\}, \tag{4}$$
where $G(u)$ denotes the data term that measures the fidelity of the argument $u$ to the input depth measurements $D_S$. It can be written as
$$G(u) = \sum\limits_{p \in M} |u(p) - D_S(p)|^2, \tag{5}$$
where $p$ denotes the pixel index and $M$ the pixel set of the depth image; the data term penalizes deviations of the resulting depth from the measured depth. Following [47], $TGV_\alpha^2(u)$ denotes a second-order TGV regularization term, as shown in Eq. (6),
$$TGV_\alpha^2(u) = \mathop{\min}\limits_v \left\{ \alpha_0 \sum\limits_{p \in M} |\nabla v| + \alpha_1 \sum\limits_{p \in M} \left| T^{1/2}\left(\nabla u(p) - v\right) \right| \right\}, \tag{6}$$
where the regularization parameters $\alpha = (\alpha_0, \alpha_1)$ weight each order. $T^{1/2}$ denotes the anisotropic diffusion tensor, $T^{1/2} = \textrm{exp}(-\beta |\nabla {\textbf I}_{\textrm{H}}|^{\gamma})\, n n^T + n^{\perp} n^{\perp T}$, where $n = \nabla {\textbf I}_{\textrm{H}} / |\nabla {\textbf I}_{\textrm{H}}|$ is the normalized gradient direction of the intensity image ${\textbf I}_{\textrm{H}}$, $n^{\perp}$ is the vector normal to the gradient, and the scalars $\beta$ and $\gamma$ adjust the magnitude and the sharpness of the tensor. Including the intensity image ${\textbf I}_{\textrm{H}}$ in the TGV model lets us penalize high depth discontinuities in homogeneous regions while allowing sharp depth edges at corresponding texture differences.
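A sketch (our assumed discretization) of the intensity-derived diffusion tensor that steers the TGV prior; the default $\beta$ and $\gamma$ follow the values used in the experiments:

```python
# T^{1/2} = exp(-beta |grad I_H|^gamma) n n^T + n_perp n_perp^T, computed
# per pixel from the high-resolution guide image I_H.
import numpy as np

def diffusion_tensor(I_h, beta=0.1, gamma=0.1, eps=1e-8):
    """Returns an (H, W, 2, 2) field of 2x2 tensors T^{1/2}."""
    gy, gx = np.gradient(I_h.astype(float))
    mag = np.sqrt(gx**2 + gy**2) + eps                 # |grad I_H| (regularized)
    n = np.stack([gx, gy], axis=-1) / mag[..., None]   # gradient direction
    n_perp = np.stack([-n[..., 1], n[..., 0]], axis=-1)
    w = np.exp(-beta * mag**gamma)                     # small across strong edges
    T = (w[..., None, None] * n[..., :, None] * n[..., None, :]
         + n_perp[..., :, None] * n_perp[..., None, :])
    return T
```

The tensor damps depth smoothing across intensity edges (where the guide image has a strong gradient) while leaving it isotropic in homogeneous regions.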

4. Results and discussions

Figure 7(a) shows our experimental setup, including the prototype system and the target scene. The scene used for the imaging experiment is detailed in Fig. 7(b): a handmade 'house', 'flower', 'tree', and 'backboard'. Figure 7(c) shows their longitudinal spatial relationship; the front surfaces of the 'tree', 'flower' and 'house' are about 7 cm, 10 cm, and 16 cm from the backboard, respectively. The FOV at the object plane, nearly 3 m away, is 10 cm × 10 cm. The imaging experiments were carried out indoors with the lamps on.

Fig. 7. (a) Experiment setup, (b) photograph of the target, (c) side view of the target scene.

Figure 8(a), (c), (e), (g) show the TOF values of arriving photons collected by four different pixels, (2,2), (2,4), (2,20) and (2,18), in the second row of the SPAD array. A total of 10,000 frames were accumulated with the frame time set to 10 µs. We selected a timing gate of 180 bins (180 × 55 ps = 9.9 ns) to reduce the amount of computation. Figure 8(b), (d), (f), (h) show the corresponding TOF-photon counting histograms and cross-correlation estimation curves. These four pixels exhibit very different noise levels. Since the echo photons on the second row of pixels are all scattered from the same backboard target in Fig. 2 and Fig. 7(b), the level of ambient noise should be essentially the same; the different noise levels therefore result from the different dark count rates of the pixels. In pixels (2,2), (2,4) and (2,20), although there are more noise photons than signal photons, accurate TOF values can be extracted by performing a cross-correlation operation on the photon counting histogram: the time bins corresponding to the peaks of the cross-correlation curves are 123, 122 and 122 (marked with green dashed lines in Fig. 8), respectively, basically consistent with the ground truth given later in Fig. 11. In pixel (2,18), an accurate TOF value cannot be extracted because of severe noise. In fact, pixels (2,2), (2,18) and (2,20) had been identified as hot pixels during calibration; their dark count rates of about 14.1 kHz, 63.5 kHz and 13.6 kHz are all much larger than the 100 Hz rate of standard pixels. Here, we adopt the strategy of first estimating depth and then censoring depth outliers. Specifically, when building the initial depth image we preserve and use the photon arrival data from the hot pixels: although hot pixels rarely yield accurate depth estimates when the photon accumulation count is small, most of them do when the photon accumulation count is large.

Fig. 8. Influence of ambient noise and dark count noise.

Fig. 9. Initial depth images and resulting depth images by different filtering methods.

Fig. 10. High-resolution depth image reconstruction by different methods.

Fig. 11. (a) Initial depth image of 32 × 32 pixels with 50,000 frames, (b) reconstructed depth image of 128 × 128 pixels with 50,000 frames, (c) ground truth depth image of 128 × 128 pixels with 700,000 frames by stitching 4 × 4 micro-scanning.

Figure 9 shows the depth-outlier filtering results of the existing ROAD filtering method, our improved temporal filtering method and the proposed T-S filtering method under different frame numbers: 1000, 2000, 5000, 10,000 and 50,000. Figure 9(a1)–(a5) show the initial depth images obtained by the cross-correlation estimation method, where the time bin corresponding to the peak of the cross-correlation curve of each pixel is used as the depth value. Figure 9(b1)–(b5) show the depth images after applying ROAD filtering to the initial depth images, Fig. 9(c1)–(c5) the results of our improved temporal filtering method, and Fig. 9(d1)–(d5) the results of the proposed T-S filtering method. As can be seen from the initial depth images in column Fig. 9(a), the outliers decrease as the frame number increases; that is, the influence of ambient noise and dark count noise on the cross-correlation depth estimation gradually diminishes. The existing ROAD filtering method shows two disadvantages compared with the methods in columns Fig. 9(c) and (d). One is that noise at the edges of the image is difficult to filter out owing to the lack of sufficient neighborhood information. The other is that when noise is severe, some valid depth values are filtered out because of noisy neighborhood pixels; for example, many valid depth values of the 'house' in Fig. 9(b1)–(b4) are mistakenly removed. The improved temporal filtering method preserves the valid depth values well but cannot completely filter out the noise inside the selected time windows; more unfiltered noise can be observed in Fig. 9(c1)–(c3). Compared with these two methods, Fig. 9(d1)–(d4) show that the proposed T-S filtering method not only retains more valid depth values but also filters noise better, since it makes full use of the longitudinal and transverse distribution characteristics of natural scenes. This precise noise filtering is critical for the subsequent regularized image reconstruction [6], especially since we are using a SPAD array with a very small pixel count. Below, we analyze the influence of the different noise filtering results on the reconstruction of high-resolution depth images.

Figure 10 shows the experimental results of resolution-enhanced depth image reconstruction guided by the intensity image (Fig. 2) acquired by the assisting digital camera. Column Fig. 10(a) shows the initial low-resolution depth images of 32 × 32 pixels. Columns Fig. 10(b), (c) and (d) show the high-resolution depth images of 128 × 128 pixels reconstructed from the initial depth images by three methods: TGV regularization combined with the ROAD filter, with the improved temporal filter, and with the proposed T-S filter, respectively. For a fair comparison, the TGV regularization parameters (α0, α1) and the tensor parameters (β, γ) were empirically fixed at (5, 0.002) and (0.1, 0.1), respectively. The comparison results were obtained under different cumulative frame numbers: 1000, 2000, 5000, 10,000 and 50,000. When the frame number is as high as 50,000, the initial depth image contains little noise and the resolution-enhanced depth images obtained by the three methods are all very clear: the high-resolution depth image can distinguish the six branches of the 'tree' that are indistinguishable in the initial depth image, and the shape features of the 'house' and 'flower' are clearly reconstructed. As the frame number decreases and the noise of the initial depth image grows, the resolution advantage of the three methods remains obvious, but the image quality degrades to different degrees. For example, the depth of the house's roof in Fig. 10(b2) cannot be accurately reconstructed by the ROAD filter combined with TGV, while the improved temporal filter and the proposed T-S filter combined with TGV reconstruct the house correctly, as shown in Fig. 10(c2), (d2). When the noise is even stronger, as in Fig. 10(a1) with a frame number of 1000, the depth of the house's wall can still be reconstructed by the proposed T-S filter combined with TGV, whereas the other two methods lose the ability to reconstruct an accurate depth image of the house, as shown in Fig. 10(b1), (c1).

In order to evaluate the reconstruction quality more objectively, we calculated the root mean square error (RMSE) of each result image; a lower RMSE means the reconstruction is closer to the ground truth. The ground truth depth image was obtained by stitching 4 × 4 micro-scans under low ambient light with all lab lamps off, accumulating a long time of 7 s (700,000 frames) for each of the 16 sub-depth images to guarantee accuracy. Figure 11(c) shows the ground truth depth image in 3D coordinates; for comparison, the low-resolution initial depth image of Fig. 10(a5) and the resolution-enhanced depth image of Fig. 10(d5) are shown in 3D coordinates in Fig. 11(a) and Fig. 11(b). The positions of the sub-targets reconstructed in Fig. 11(b) are consistent with the ground truth in Fig. 11(c): the 'house', 'flower', 'tree' and 'backboard' are located at time bins 142, 135, 132 and 123, respectively. Knowing the speed of light (3 × 10⁸ m/s) and the bin size (55 ps), the distances of the 'house', 'flower' and 'tree' from the 'backboard' can be calculated as 15.7 cm, 9.9 cm and 7.4 cm, respectively (see the check below), basically the same as the meter-stick measurements in Fig. 7. Although a depth image with enhanced transverse resolution and basic fidelity is obtained by the proposed method, the reconstruction of edge regions is not ideal: as shown in Fig. 11(b), originally sharp edges are reconstructed as gradual transitions. This issue deserves further research.
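A quick arithmetic check of these bin-to-distance conversions (round-trip time, hence the factor of 2):

```python
# distance = (bin difference) * bin width * c / 2
C_LIGHT = 3e8          # m/s
BIN_S = 55e-12         # s per time bin

def bins_to_cm(delta_bins):
    return delta_bins * BIN_S * C_LIGHT / 2 * 100

print(bins_to_cm(142 - 123))   # 'house'  vs 'backboard' -> ~15.7 cm
print(bins_to_cm(135 - 123))   # 'flower' vs 'backboard' -> ~9.9 cm
print(bins_to_cm(132 - 123))   # 'tree'   vs 'backboard' -> ~7.4 cm
```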

The resulting RMSEs under different frame numbers are shown in Fig. 12 and Table 2. We also give the acquisition times and the number of collected signal photons per pixel (SPP) corresponding to the different frame numbers, to characterize the experimental imaging conditions more intuitively. With a single-frame acquisition time of 10 µs, the total acquisition times in Table 2 can be calculated; they are all less than 1 s. From the all-pixel cumulative photon count histogram (including hot pixels), by counting the total number of photons and the number of noise photons (estimated from the average noise level) [6,43], we calculated the SPPs for the different frame numbers as shown in Table 2. The signal-to-background-noise ratio of the five data sets is about 0.3 within the 180-bin gate. The time consumption of the different methods is recorded in Table 3; a computer with an Intel Core i7-10700 CPU at 2.9 GHz and 16 GB of RAM was used to run the MATLAB code. The processing time is dominated by the TGV reconstruction, which takes about 16 s. The temporal and T-S filters take longer than the conventional ROAD filter, but all three run on the order of milliseconds.

Fig. 12. RMSE curves of the different methods.

Table 2. Result RMSEs of different methods with different frame numbers

Table 3. Time cost of different methods

Consistent with the subjective visual results above, the RMSE values of the three methods decrease as the frame number increases. When the frame number reaches 50,000 (SPP = 113.2), the RMSE values of the three methods are basically equal, because the interference of depth outliers is then very small. As the number of accumulated frames decreases, the depth outliers take effect and the RMSEs of the three methods diverge. Especially when the frame number is less than 5000 (SPP = 11.4), the RMSEs of our improved temporal filter and the proposed T-S filter are much smaller than that of the existing ROAD filter; when the frame number drops below 2000 (SPP = 4.6), our proposed method outperforms the conventional ROAD filtering method by more than a factor of two in RMSE. An interesting phenomenon is that the improved temporal filter exhibits a slight RMSE advantage over the T-S filter when the SPP is as small as 2.4. In fact, Fig. 10(c1), (d1) show that this slight advantage comes from reconstructing the shape of the 'house', whose depth profile is nevertheless biased, as can be observed from the colors representing depth; the reason should be that the depth outliers are not filtered thoroughly enough. The same reason leads to the worst RMSE of the temporal filter at 10,000 frames, where much improperly handled noise can be observed in Fig. 10(c4). In comparison, our proposed T-S filter combined with TGV exhibits very good noise robustness. In addition, from the perspective of time efficiency, our method achieves an up-sampling factor of 4 × 4 for the depth image, corresponding to a 16-fold saving of acquisition time for a depth image of the same transverse resolution. Consequently, the proposed method has large potential to improve the quality and resolution of SPAD array depth images within a very short acquisition time; in particular, the transverse resolution of the depth image is enhanced in a non-scanning manner.

To further test the robustness of the T-S filter combined with TGV at low frame numbers, we generated the absolute depth error maps of the reconstructed depth images shown in Fig. 13. Rows (a), (b) and (c) show the depth error maps of TGV regularization combined with the ROAD filter, the improved temporal filter, and the proposed T-S filter, respectively; the color represents the magnitude of the depth error. All three methods show obvious depth errors at edge textures and in low-reflectivity regions such as the 'house' area. As the frame number decreases, the depth errors at edge textures barely change, but those in the low-reflectivity areas are significantly affected. When the frame number drops to 5000 and 2000, the depth error of the conventional ROAD + TGV method becomes more significant than that of the other two methods in low-reflectivity areas (such as the 'roof' area). As the frame number drops further to 1000, the depth errors of the temporal and T-S filter methods in the 'roof' area also begin to increase, but they remain significantly smaller than those of the ROAD filter method, with the temporal filter performing better. It is worth noting that while the depth errors in the 'house's wall' area obtained by the ROAD and temporal filter methods increase at 1000 frames, the T-S filter method still maintains a small depth error.

Fig. 13. Depth error maps obtained by taking the absolute difference between reconstructed depth and ground-truth depth images.

To quantitatively evaluate the depth error, we calculated the mean absolute error (MAE) of depth in area1 and area2 marked in Fig. 13, corresponding to the 'house's roof' and 'house's wall' areas of 6 rows × 31 columns and 15 rows × 31 columns, respectively. The MAE results are listed in Table 4 and Table 5, with the optimal result for each frame number in bold. Consistent with the visual evaluation, our T-S filter method has the optimal MAE in both area1 and area2 at 2000–50,000 frames; at 1000 frames, it has the suboptimal MAE in area1 and the optimal MAE in area2. This proves that the T-S filter combined with TGV has a certain degree of robustness at low frame numbers. However, a continued decrease in the frame number leads to a sparse-photon situation (SPP less than 1), and high-spatial-resolution depth imaging in that regime requires further work.

Table 4. Mean absolute errors of depth in area1

Table 5. Mean absolute errors of depth in area2

5. Conclusions

In summary, we have demonstrated a high-resolution depth imaging system using a small-scale SPAD array, together with a proposed depth image reconstruction method consisting of TGV regularization and a T-S filtering algorithm. Based on laser lattice illumination generated by a DOE, registration of the low-resolution depth image gathered by the SPAD array and the high-resolution intensity image gathered by the CCD is realized. Abnormal depth values due to the SPAD's dark counts and background noise are filtered out by the proposed T-S filter, and the TGV regularization algorithm reconstructs high-resolution depth images under the guidance of the intensity image. Experimental results show that the proposed method increases the depth imaging resolution of the SPAD array by 4 × 4 times, and the imaging quality is improved two-fold compared with the conventional ROAD filter. Our method could enable high-resolution, noise-tolerant 3D imaging using small-scale SPAD arrays in a non-scanning manner, with potential applications ranging from biophotonics to remote sensing and autonomous navigation. For future work, it is of interest to optimize the reconstruction quality at abrupt depth edges.

Funding

National Natural Science Foundation of China (62001473, 62171443); Key Research and Development Projects of Shaanxi Province (2022GY-009); Youth Talents Promotion Program of Xi'an (095920211305); State Key Laboratory of Transient Optics and Photonics.

Acknowledgments

Y. Kang thanks Research Fund from Youth Talents Promotion Program of Xi'an and State Key Laboratory of Transient Optics and Photonics.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. C. Bruschini, H. Homulle, I. M. Antolovic, S. Burri, and E. Charbon, “Single-photon avalanche diode imagers in biophotonics: review and outlook,” Light: Sci. Appl. 8(1), 87 (2019). [CrossRef]  

2. A. Gulinatti, F. Ceccarelli, M. Ghioni, and I. Rech, “Custom silicon technology for SPAD-arrays with red-enhanced sensitivity and low timing jitter,” Opt. Express 29(3), 4559–4581 (2021). [CrossRef]  

3. K. Morimoto, A. Ardelean, M. L. Wu, A. C. Ulku, I. M. Antolovic, C. Bruschini, and E. Charbon, “Megapixel time-gated SPAD image sensor for 2D and 3D imaging applications,” Optica 7(4), 346–354 (2020). [CrossRef]  

4. B. Aull, “Geiger-Mode avalanche photodiode arrays integrated to all-digital CMOS circuits,” Sensors 16(4), 495 (2016). [CrossRef]  

5. C. Liu, H. Ye, and Y. Shi, “Advances in near-infrared avalanche diode single-photon detectors,” Chip 1(1), 100005 (2022). [CrossRef]  

6. D. Shin, F.-H. Xu, D. Venkatraman, R. Lussana, F. Villa, F. Zappa, V. K. Goyal, F. N. C. Wong, and J. H. Shapiro, “Photon-efficient imaging with a single-photon camera,” Nat. Commun. 7(1), 12046 (2016). [CrossRef]  

7. Z.-P. Li, J.-T. Ye, X. Huang, P.-Y. Jiang, Y. Cao, Y. Hong, C. Yu, J. Zhang, Q. Zhang, C.-Z. Peng, F.-H. Xu, and J.-W. Pan, “Single-photon imaging over 200 km,” Optica 8(3), 344–349 (2021). [CrossRef]  

8. Y. Cheng, X.-Y. Zhao, L.-J. Li, and M.-J. Sun, “First-photon imaging with independent depth reconstruction,” APL Photonics 7(3), 036103 (2022). [CrossRef]  

9. J. Tachella, Y. Altmann, N. Mellado, A. McCarthy, R. Tobin, G. S. Buller, J. Tourneret, and S. McLaughlin, “Real-time 3D reconstruction from single-photon lidar data using plug-and-play point cloud denoisers,” Nat. Commun. 10(1), 4984 (2019). [CrossRef]  

10. M.-J. Sun, M. P. Edgar, G. M. Gibson, B. Sun, N. Radwell, R. Lamb, and M. J. Padgett, “Single-pixel three-dimensional imaging with time-based depth resolution,” Nat. Commun. 7(1), 12010 (2016). [CrossRef]  

11. A. Halimi, A. Maccarone, R. A. Lamb, G. S. Buller, and S. McLaughlin, “Robust and guided bayesian reconstruction of single-photon 3D lidar data: application to multispectral and underwater imaging,” IEEE Trans. on Comput. Imaging 7, 961–974 (2021). [CrossRef]  

12. X.-C. Zhao, X.-D. Jiang, A.-J. Han, T.-Y. Mao, W.-J. He, and Q. Chen, “Photon-efficient 3D reconstruction employing a edge enhancement method,” Opt. Express 30(2), 1555–1569 (2022). [CrossRef]  

13. S.-M. Chen, A. Halimi, X.-M. Ren, A. McCarthy, X.-Q. Su, S. McLaughlin, and G. S. Buller, “Learning non-local spatial correlations to restore sparse 3D single-photon data,” IEEE Trans. on Image Process. 29, 3119–3131 (2020). [CrossRef]  

14. L. Xu, Y. Zhang, Y. Zhang, C.-H. Yang, X. Yang, and Y. Zhao, “Restraint of range walk error in a Geiger-mode avalanche photodiode lidar to acquire high-precision depth and intensity information,” Appl. Opt. 55(7), 1683–1687 (2016). [CrossRef]  

15. X. Peng, X.-Y. Zhao, L.-J. Li, and M.-J. Sun, “First-photon imaging via a hybrid penalty,” Photonics Res. 8(3), 325–330 (2020). [CrossRef]  

16. Z.-P. Li, X. Huang, P.-Y. Jiang, Y. Hong, C. Yu, Y. Cao, J. Zhang, F.-H. Xu, and J.-W. Pan, “Super-resolution single-photon imaging at 8.2 kilometers,” Opt. Express 28(3), 4076–4087 (2020). [CrossRef]  

17. Y. Duan, C. Yang, and H. Li, “PCA-based real-time single-photon 3D imaging method,” Opt. Commun. 508, 127777 (2022). [CrossRef]  

18. B. Wang, M.-Y. Zheng, J.-J. Han, X. Huang, X.-P. Xie, F.-H. Xu, Q. Zhang, and J.-W. Pan, “Non-Line-of-Sight Imaging with Picosecond Temporal Resolution,” Phys. Rev. Lett. 127(5), 053602 (2021). [CrossRef]  

19. X.-M. Ren, P. W. R. Connolly, A. Halimi, Y. Altmann, S. McLaughlin, I. Gyongy, R. K. Henderson, and G. S. Buller, “High-resolution depth profiling using a range-gated CMOS SPAD quanta image sensor,” Opt. Express 26(5), 5541–5557 (2018). [CrossRef]  

20. L. Ye, G.-H. Gu, W.-J. He, H.-D. Dai, T.-Y. Mao, and Q. Chen, “A reconstruction method for restraining range walk error in photon counting lidar via dual detection,” J. Opt. 21(4), 045703 (2019). [CrossRef]  

21. Z.-H. Li, E. Wu, C.-K. Pang, B.-C. Du, Y.-L. Tao, H. Peng, H.-P. Zeng, and G. Wu, “Multi-beam single-photon-counting three-dimensional imaging lidar,” Opt. Express 25(9), 10189–10195 (2017). [CrossRef]  

22. A. Halimi, A. Maccarone, A. McCarthy, S. McLaughlin, and G. S. Buller, “Object depth profile and reflectivity restoration from sparse single-photon data acquired in underwater environments,” IEEE Trans. Comput. Imaging 3(3), 472–484 (2017). [CrossRef]  

23. F. Heide, S. Diamond, D. B. Lindell, and G. Wetzstein, “Sub-picosecond photon-efficient 3D imaging using single-photon sensors,” Sci. Rep. 8(1), 17726 (2018). [CrossRef]  

24. Y. Zheng, M.-J. Sun, Z.-G. Wang, and D. Faccio, “Computational 4D imaging of light-in-flight with relativistic effects,” Photonics Res. 8(7), 1072–1078 (2020). [CrossRef]  

25. A. Lyons, F. Tonolini, A. Boccolini, A. Repetti, R. Henderson, Y. Wiaux, and D. Faccio, “Computational time-of-flight diffuse optical tomography,” Nat. Photonics 13(8), 575–579 (2019). [CrossRef]  

26. L. Xu, X. Yang, L. Wu, C.-F. Jin, and Y.-J. Zhang, “Dual Gm-APD polarization lidar to acquire the depth image of shallow semitransparent media with a wide laser pulse,” IEEE Photonics J. 12(5), 1–10 (2020). [CrossRef]  

27. A. Maccarone, F. Mattioli Della Rocca, A. McCarthy, R. Henderson, and G. S. Buller, “Three-dimensional imaging of stationary and moving targets in turbid underwater environments using a single-photon detector array,” Opt. Express 27(20), 28437–28456 (2019). [CrossRef]  

28. M. Buttafava, F. Villa, M. Castello, G. Tortarolo, E. Conca, M. Sanzaro, S. Piazza, P. Bianchini, A. Diaspro, F. Zappa, G. Vicidomini, and A. Tosi, “SPAD-based asynchronous-readout array detectors for image-scanning microscopy,” Optica 7(7), 755–765 (2020). [CrossRef]  

29. H. A. R. Homulle, F. Powolny, P. L. Stegehuis, J. Dijkstra, D. U. Li, K. Homicsko, D. Rimoldi, K. Muehlethaler, J. O. Prior, R. Sinisi, E. Dubikovskaya, E. Charbon, and C. Bruschini, “Compact solid-state CMOS single-photon detector array for in vivo NIR fluorescence lifetime oncology measurements,” Biomed. Opt. Express 7(5), 1797–1814 (2016). [CrossRef]  

30. J.-K. Guo, S. H. Hong, H. J. Yoon, G. Babakhanova, O. D. Lavrentovich, and J. K. Song, “Laser-induced nanodroplet injection and reconfigurable double emulsions with designed inner structures,” Adv. Sci. 6(17), 1900785 (2019). [CrossRef]  

31. J. Stoker, Q. Abdullah, A. Nayegandhi, and J. Winehouse, “Evaluation of single photon and Geiger mode lidar for the 3D elevation program,” Remote Sens. 8(9), 767 (2016). [CrossRef]  

32. J. Rapp, J. Tachella, Y. Altmann, S. McLaughlin, and V. K. Goyal, “Advances in single-photon lidar for autonomous vehicles: working principles, challenges, and recent advances,” IEEE Signal Proc. Mag. 37(4), 62–71 (2020). [CrossRef]  

33. B. F. Aull, E. K. Duerr, J. P. Frechette, K. A. McIntosh, D. R. Schuette, V. Suntharalingam, R. D. Younger, O. Mitrofanov, C. H. Tan, J. L. Pau Vizcaíno, and M. Razeghi, “Large-format image sensors based on custom Geiger-mode avalanche photodiode arrays,” Proc. SPIE 10729, 9 (2018). [CrossRef]  

34. R.-K. Xue, Y. Kang, T.-Y. Zhang, L.-F. Li, and W. Zhao, “Sub-pixel scanning high-resolution panoramic 3D imaging based on a SPAD array,” IEEE Photonics J. 13(4), 1–6 (2021). [CrossRef]  

35. M. Henriksson and P. Jonsson, “Photon-counting panoramic three-dimensional imaging using a Geiger-mode avalanche photodiode array,” Opt. Eng. 57(09), 1 (2018). [CrossRef]  

36. Z. Sun, D. B. Lindell, O. Solgaard, and G. Wetzstein, “SPADnet: deep RGB-SPAD sensor fusion assisted by monocular depth estimation,” Opt. Express 28(10), 14948–14962 (2020). [CrossRef]  

37. D. B. Lindell, M. O’Toole, and G. Wetzstein, “Single-photon 3D imaging with deep sensor fusion,” ACM Trans. Graph. 37(4), 1–12 (2018). [CrossRef]  

38. A. Ruget, S. McLaughlin, R. K. Henderson, I. Gyongy, A. Halimi, and J. Leach, “Robust super-resolution depth imaging via a multi-feature fusion deep network,” Opt. Express 29(8), 11917–11937 (2021). [CrossRef]  

39. S. Chan, A. Halimi, F. Zhu, I. Gyongy, R. K. Henderson, R. Bowman, S. McLaughlin, G. S. Buller, and J. Leach, “Long-range depth imaging using a single-photon detector array and non-local data fusion,” Sci. Rep. 9(1), 8075 (2019). [CrossRef]  

40. J.-H. Xie, Z.-J. Zhang, F. Jia, J.-H. Li, M.-W. Huang, and Y. Zhao, “Improved single-photon active imaging through ambient noise guided missing data filling,” Opt. Commun. 508, 127747 (2022). [CrossRef]  

41. I. Gyongy, S. W. Hutchings, A. Halimi, M. Tyler, S. Chan, F. Zhu, S. McLaughlin, R. K. Henderson, and J. Leach, “High-speed 3D sensing via hybrid-mode imaging and guided upsampling,” Optica 7(10), 1253–1260 (2020). [CrossRef]  

42. C. Callenberg, A. Lyons, D. D. Brok, A. Fatima, A. Turpin, V. Zickus, L. Machesky, J. Whitelaw, D. Faccio, and M. B. Hullin, “Super-resolution time-resolved imaging using computational sensor fusion,” Sci. Rep. 11(1), 1689 (2021). [CrossRef]  

43. Y. Kang, L.-F. Li, D.-J. Li, D.-W. Liu, T.-Y. Zhang, and W. Zhao, “Performance analysis of different pixel-wise processing methods for depth imaging with single photon detection data,” J. Mod. Opt. 66(9), 976–985 (2019). [CrossRef]

44. Y. Kang, R.-K. Xue, L.-F. Li, T.-Y. Zhang, and Q. Gao, “Coaxial scanning three-dimensional imaging based on SPAD array,” Laser & Optoelectronics Progress 58(10), 1011024 (2021). [CrossRef]  

45. R. Garnett, T. Huegerich, C. Chui, and W. He, “A universal noise removal algorithm with an impulse detector,” IEEE Trans. on Image Process. 14(11), 1747–1754 (2005). [CrossRef]

46. A. Kirmani, D. Venkatraman, D. Shin, A. Colaco, F. N. C. Wong, J. H. Shapiro, and V. K. Goyal, “First-photon imaging,” Science 343(6166), 58–61 (2014). [CrossRef]  

47. D. Ferstl, C. Reinbacher, R. Ranftl, M. Ruether, and H. Bischof, “Image guided depth upsampling using anisotropic total generalized variation,” in Proceedings of IEEE International Conference on Computer Vision (IEEE, 2013).
