
Image-free target identification using a single-point single-photon LiDAR

Open Access

Abstract

Single-photon light detection and ranging (LiDAR) — offering single-photon sensitivity and picosecond temporal resolution — has become one of the most promising technologies for 3D imaging and target detection. Generally, target detection and identification require the construction of an image, performed by a raster-scanned or an array-based LiDAR system. In contrast, we demonstrate an image-free target identification approach based on a single-point single-photon LiDAR. The idea is to identify the object from the temporal data with the aid of an efficient neural network. Specifically, the target is flood-illuminated by a pulsed laser and a single-point single-photon detector is used to record the time-of-flight (ToF) of back-scattered photons. A deep-learning method is then employed to analyze the ToF data and perform the identification task. Simulations, together with indoor and outdoor experiments, show that our approach can identify the class and pose of the target with high accuracy. Importantly, we construct a compact single-point single-photon LiDAR system and demonstrate the practical capability to identify the types and poses of drones in outdoor environments over hundreds of meters. We believe our approach will be useful in applications for sensing dynamic targets with low-power optical detection.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The ability to detect and recognize objects, referred to as target identification, has a wide range of applications spanning autonomous driving [1], air defense [2], and remote sensing [3]. Single-photon light detection and ranging (LiDAR) [4–6] has emerged as a promising technology for target identification [7–12]. A traditional identification procedure with single-photon LiDAR requires image formation, followed by a classifier that extracts and analyzes the features [13–15]. For example, in [15], 2D images of intensity and depth are utilized to identify the type, orientation, and different parts of drones. In practice, however, it is difficult to obtain an image in several scenarios. For example, due to the diffraction limit, the transverse spatial resolution degrades significantly with distance, which greatly reduces the image quality for long-range targets [16]. Also, for fast dynamic targets [17–19], the limited acquisition time results in weak signal levels and the image will be severely degraded [7–9].

Our interest is to identify targets without imaging them. A single-point single-photon LiDAR can capture high-resolution time-of-flight (ToF) data that contains rich information for general applications. Remarkably, such 1D temporal data has been exploited to successfully reconstruct a spatial image with the aid of deep learning and prior background information [20]. Inspired by this pioneering work, we consider that the 1D temporal data is sufficient to identify the class and the pose of a target. Furthermore, existing deep-learning-based methods have successfully extracted information from 1D signals [21–25], and they can be further explored for the task of target identification using 1D temporal data.

In this work, we demonstrate an image-free target identification approach based on a single-point single-photon LiDAR. In our approach, the target is flood-illuminated by a pulsed laser, and the returned light is collected by a single-point single-photon avalanche diode (SPAD) detector, which records the arrival times of the returned photons in the form of a temporal histogram. Note that our system does not have any scanning component, array-based detector, or structured illumination/detection. Hence, no spatial structure is imprinted and the temporal histogram contains no transverse spatial information. Instead, an end-to-end deep learning method is used to process the 1D temporal data and predict the class and the pose of the target. Different from Ref. [20], our approach performs image-free target identification and does not require prior background information. Also, instead of stacking fully connected layers, we design an efficient 1D convolutional neural network (CNN) based on the UNet architecture [26]. This architecture better perceives both local and global features [27], enabling spatial information to be extracted from the temporal measurements. Simulation, indoor, and outdoor experimental results show that our proposed approach achieves high prediction accuracy on both synthetic and real-world data. More importantly, we construct a compact single-point single-photon LiDAR system and demonstrate the capability to identify the types and poses of drones in outdoor environments over hundreds of meters. These results prove the potential of our approach for real-life applications.

2. Approach description

In our proposed approach, the temporal data of the target is first acquired by a single-point single-photon LiDAR. A CNN is then used to make predictions from the temporal data, thereby achieving target identification.

Our single-point single-photon LiDAR is illustrated in Fig. 1(a). A pulsed laser flood-illuminates the target, while the SPAD’s field of view (FoV) covers the whole target. The SPAD is synchronized with the pulsed laser to record the time-of-flight (ToF) of each returned photon, forming a temporal histogram that encodes information about the target.

Fig. 1. An illustration of our proposed approach. (a) Our single-point single-photon LiDAR for target identification. A pulsed laser flood-illuminates the target. The reflected photons are detected by a single-point SPAD and a temporal histogram is obtained. (b) A CNN is adopted to identify the class and the pose of the target, according to the temporal histogram.

The temporal histogram obtained by the system encodes the depth distribution of the target. Different targets, and the same target in different poses, have different depth distributions and thus lead to different temporal histograms. Here we apply a data-driven method to analyze the temporal histograms. As shown in Fig. 1(b), we adopt a CNN to extract the spatial information from the histograms and predict the class and the pose of the target.

3. Method

3.1 Forward model and synthetic data generation

Based on Fig. 1(a), we describe the forward model as follows. Consider a light source that periodically emits a laser pulse with temporal shape $g(t)$. The target with surface $S$ reflects light back to a SPAD, which counts the number of photon arrivals over bins of duration $\Delta t$. The number of signal photons $\tau(n)$ detected in time bin $n$ is

$$\tau(n) = \iint_{S}\int_{n\Delta t}^{(n+1)\Delta t} \alpha(x, y) \cdot v(x, y) \cdot (g*f) \left(t - \frac{2z(x, y)}{c}\right)dt\,dx\,dy,$$
where $f$ models the temporal uncertainty of the SPAD, $c$ is the speed of light, and $(x, y)$ is a pixel on the surface of the target. $\alpha (x, y)$ is a factor that includes the number of pulses, the efficiency of the SPAD, the reflectivity, and the radial falloff effect at pixel $(x, y)$. We assume that the target is far from the transceiver and that the transmitter and the receiver are close to each other. A quantity $v(x, y) \in \{0, 1\}$ models the visibility of the path between $(x, y)$ and the receiver/transmitter, while $z(x, y)$ denotes the distance between them.

We generate synthetic datasets based on the discrete form of Eq. (1). To discretize it, $(g*f)(t)$ can be rewritten as $w(n) = \int _{n\Delta t}^{(n+1)\Delta t} (g*f)(t)\,dt$, which can be regarded as the impulse response function (IRF) of the system. The 3D target is projected onto the plane perpendicular to the viewing direction of the transceiver to form $N \times N$ 2D maps of reflectivity $\beta (i, j)$ and depth $z(i, j)$, where $\beta (i, j)$ is simply set to $1$ when pixel $(i, j)$ is on the target and $0$ otherwise. Correspondingly, when pixel $(i, j)$ is on the target, $z(i, j)$ is the distance from the receiver/transmitter to the front surface of the target along the viewing direction; when pixel $(i, j)$ is off the target, $z(i, j)$ can take any value, since such pixels make no contribution to $\tau (n)$ because $\beta (i, j) = 0$. An example of projecting a 3D object onto 2D maps of reflectivity and depth is illustrated in Fig. 2.
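This projection step can be sketched in Python. The snippet below is only a minimal illustration, assuming the target is represented as a point cloud viewed along the y-axis; the function name `project_to_maps`, the grid size, and the field-of-view extent are hypothetical choices, not the exact rendering pipeline used to build the datasets.

```python
import numpy as np

def project_to_maps(points, N=64, extent=0.5):
    """Project a 3D point cloud to N x N reflectivity and depth maps.

    points : (M, 3) array of (x, y, z) coordinates; the viewing direction is
             taken as +y, so (x, z) index the image plane and y is the depth.
    extent : half-width of the square field of view (same units as points).
    """
    beta = np.zeros((N, N))              # reflectivity: 1 on target, 0 off target
    depth = np.full((N, N), np.inf)      # depth of the front surface per pixel

    # transverse coordinates -> pixel indices
    i = np.round((points[:, 0] + extent) / (2 * extent) * (N - 1)).astype(int)
    j = np.round((points[:, 2] + extent) / (2 * extent) * (N - 1)).astype(int)
    valid = (i >= 0) & (i < N) & (j >= 0) & (j < N)

    # z-buffer: keep the closest point per pixel, which also handles
    # self-occlusion, so the visibility term v(x, y) can later be dropped
    for ii, jj, d in zip(i[valid], j[valid], points[valid, 1]):
        if d < depth[ii, jj]:
            depth[ii, jj] = d
            beta[ii, jj] = 1.0
    return beta, depth
```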

Fig. 2. An example of projecting a 3D target. To generate synthetic data, the 3D object is projected to 2D maps of reflectivity and depth according to the view direction of the receiver/transmitter. Note that the values in the depth map represent relative depths.

The discrete form of Eq. (1) is thereby written as:

$$\tau(n) = \sum_{i, j = 1}^{N}C\beta(i, j)\cdot w\left(\left[n - \frac{2z(i, j)}{c\cdot\Delta t}\right]\right),$$
where $[\cdot ]$ denotes the rounding operation. Note that after the projection of the 3D object, the visibility term $v(x, y)$ can be neglected, since the self-occlusion effect has already been accounted for during projection. Moreover, we assume that the discretized form of $\alpha (x, y)$, $\alpha (i, j)$, equals $C\beta (i, j)$, where $C$ is a constant, based on the following considerations: 1. the number of pulses during collection and the efficiency of the SPAD are constant for all pixels on the target; 2. we consider the radial falloff effect to be the same for all pixels on the target, which is reasonable when the target is distant.

Further considering dark counts, ambient light, and the detection model of the SPAD, the temporal histogram measured by the SPAD is represented as follows:

$$\begin{aligned} h(n) &\sim \mathcal{P}(\tau(n) + b) \\ &= \mathcal{P}\left(C\cdot\sum_{i, j = 1}^{N}\beta(i, j)\cdot w\left(\left[n - \frac{2z(i, j)}{c\cdot\Delta t}\right]\right) + b\right), \end{aligned}$$
where $b$ includes the dark counts of the SPAD and the background noise, and $\mathcal {P}$ stands for the Poisson process that models the detection process of the SPAD. According to Eq. (3), we generate synthetic temporal histograms to form a training set. The length of the temporal histograms is set to 1024.
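The generation of one synthetic histogram can be summarized by the short sketch below. It assumes the reflectivity and depth maps and a sampled IRF $w(n)$ are already available; the bin width, the target photon count, and the flat-background noise model are illustrative parameters rather than the exact settings used for our datasets.

```python
import numpy as np

def simulate_histogram(beta, depth, irf, n_bins=1024, dt=16e-12,
                       signal_level=5000, snr=2.0, rng=np.random.default_rng()):
    """Generate one noisy temporal histogram following Eqs. (2)-(3).

    beta, depth : N x N reflectivity and depth maps (depth in metres on target pixels)
    irf         : 1D array w(n), the system impulse response sampled at dt
    """
    c = 3e8
    tau = np.zeros(n_bins)
    on = beta > 0

    # Eq. (2): every on-target pixel adds a copy of the IRF shifted by its ToF
    for d, b in zip(depth[on], beta[on]):
        start = int(round(2 * d / (c * dt)))
        seg = irf[: max(0, n_bins - start)]
        tau[start:start + len(seg)] += b * seg

    # scale to the desired total signal photon count, then add a flat
    # background chosen to give the desired SNR (signal counts / noise counts)
    tau *= signal_level / tau.sum()
    bg = signal_level / (snr * n_bins)

    # Eq. (3): Poisson detection model of the SPAD
    return rng.poisson(tau + bg)
```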

3.2 Data preprocessing

Before being fed into the network, the temporal histogram should be preprocessed to ensure its intensity invariance and shifting invariance. The invariances of the temporal histogram are illustrated in Fig. 3.

Fig. 3. An illustration of the invariances of the temporal histogram. The blue histogram and the yellow histogram should be equivalent for target identification. Preprocessing is needed to ensure these invariances. (a) Intensity invariance: the overall intensity of the histogram should not affect the classification result. (b) Shifting invariance: the absolute distance between the target and the single-point single-photon LiDAR, i.e., the position of the signal peak, should not affect the classification result.

Intensity invariance: An input 1D histogram $h$ should have the same classification result as $C' h$ where $C'$ is a constant, since the overall intensity, which varies with factors such as laser power and optical efficiency, does not affect the classification result (Fig. 3 (a)). To ensure this intensity invariance, a normalization operation is first performed:

$$h_{norm}(n) = \frac{h(n)}{\sum_{i = 0}^{1023}h(i)}.$$

Shifting invariance: The absolute distance of the target, which corresponds to the position of the signal peak in the temporal histogram, does not affect the classification result (Fig. 3 (b)). Therefore, to ensure this shifting invariance, we perform a cyclic shift operation on the normalized histogram $h_{norm}(n)$ as:

$$\hat{h}(n) = h_{norm}\left((n-512+K)\bmod 1024\right),$$
where $K$ satisfies:
$$\sum_{i=0}^{K-1}h_{norm}(i)< 0.5\leq\sum_{i=0}^{K}h_{norm}(i).$$

The cyclic shift operation described by Eqs. (5) and (6) shifts the "peak" (index $K$) to the center of the histogram, resulting in $\sum _{i=0}^{511}\hat {h}(i)\approx \sum _{i=512}^{1023}\hat {h}(i)\approx 0.5$, thus ensuring the shifting invariance.
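The two preprocessing steps can be written compactly. The sketch below follows Eqs. (4)-(6) directly; the 1024-bin length and the center bin 512 come from the text, while the function name is only for illustration.

```python
import numpy as np

def preprocess(h):
    """Normalize a raw histogram (Eq. 4) and centre it by a cyclic shift (Eqs. 5-6)."""
    h = np.asarray(h, dtype=np.float64)
    h_norm = h / h.sum()                              # intensity invariance, Eq. (4)

    # K is the first index whose cumulative sum reaches 0.5, Eq. (6)
    K = int(np.searchsorted(np.cumsum(h_norm), 0.5))

    # cyclic shift so that index K lands at the centre bin 512, Eq. (5)
    n = np.arange(h_norm.size)
    return h_norm[(n - 512 + K) % h_norm.size]
```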

3.3 Deep learning method

Currently, CNNs are widely used for 1D signal processing in many fields, such as electroencephalogram (EEG) signal identification [21,22], classification of electrocardiogram (ECG) signals [23], and vibration-based structural damage detection [24]. In our study, a 1D-UNet model, a 1D version of the well-known UNet for 2D pixel-wise classification [26], is proposed to extract features from temporal histograms at different scales. The preprocessed 1D temporal histogram of length 1024 is fed directly into the network to perform the classification task.

As shown in Fig. 4, our 1D-UNet consists of an encoding path and a decoding path. The encoding path consists of the repeated application of 1D convolutions, each followed by a rectified linear unit (ReLU), and 1D max pooling operations. The decoding path applies a sequence of 1D up-convolutions and concatenations with the features from the encoding path, followed by 1D convolutions and ReLUs. Finally, three fully connected layers map each 1024-component feature vector to $N_{C}$ classes.
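Since the exact layer widths are secondary to the idea, the following PyTorch sketch shows one possible 1D-UNet of this kind: an encoder of 1D convolutions with ReLUs and max pooling, a decoder of up-convolutions with skip concatenations, and three fully connected layers mapping a 1024-component vector to $N_{C}$ classes. The channel counts and the number of scales are assumptions for illustration; Fig. 4 shows the actual architecture.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv1d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

class UNet1D(nn.Module):
    def __init__(self, n_classes, base_ch=16):
        super().__init__()
        # encoding path: conv blocks followed by 1D max pooling
        self.enc1 = conv_block(1, base_ch)
        self.enc2 = conv_block(base_ch, 2 * base_ch)
        self.enc3 = conv_block(2 * base_ch, 4 * base_ch)
        self.pool = nn.MaxPool1d(2)
        # decoding path: up-convolutions, concatenation with encoder features, conv blocks
        self.up2 = nn.ConvTranspose1d(4 * base_ch, 2 * base_ch, kernel_size=2, stride=2)
        self.dec2 = conv_block(4 * base_ch, 2 * base_ch)
        self.up1 = nn.ConvTranspose1d(2 * base_ch, base_ch, kernel_size=2, stride=2)
        self.dec1 = conv_block(2 * base_ch, base_ch)
        self.head = nn.Conv1d(base_ch, 1, kernel_size=1)
        # three fully connected layers mapping the 1024-component vector to N_C classes
        self.fc = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 64), nn.ReLU(inplace=True),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):                                       # x: (B, 1, 1024)
        e1 = self.enc1(x)                                       # (B, C, 1024)
        e2 = self.enc2(self.pool(e1))                           # (B, 2C, 512)
        e3 = self.enc3(self.pool(e2))                           # (B, 4C, 256)
        d2 = self.dec2(torch.cat([self.up2(e3), e2], dim=1))    # (B, 2C, 512)
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))    # (B, C, 1024)
        return self.fc(self.head(d1).flatten(1))                # (B, N_C) logits
```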

Fig. 4. The architecture of our 1D-UNet. The input is a one-channel 1D histogram with a length of 1024, and the length of the output vector is $N_{C}$, where $N_{C}$ is the desired number of classes.

A 1D fully connected network (abbreviated as Fc) is used for comparison. It consists of two hidden layers with 2048 and 256 nodes, respectively, while the input layer has 1024 nodes and the output layer has $N_{C}$ nodes. Each layer except the output layer is followed by a leaky ReLU activation function.
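A minimal PyTorch sketch of this baseline, following the layer sizes given above (the factory-function name is only for illustration):

```python
import torch.nn as nn

# Fc baseline: 1024 -> 2048 -> 256 -> N_C, leaky ReLU after each layer except the output
def make_fc(n_classes):
    return nn.Sequential(
        nn.Linear(1024, 2048), nn.LeakyReLU(),
        nn.Linear(2048, 256), nn.LeakyReLU(),
        nn.Linear(256, n_classes),
    )
```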

To measure the error between the output of the network and the true label, we choose the cross-entropy as the loss function:

$$L(p, q) ={-}\sum_{i=0}^{N_{C}-1}p_{i}\cdot \log\frac{\exp(q_{i})}{\sum_{j = 0}^{N_{C}-1}\exp(q_{j})},$$
where $q$ is the output of the network and $p$ is the target vector corresponding to the true label, the $T$th class:
$$p_{i} = \begin{cases} 1 & i = T \\ 0 & \text{otherwise} \end{cases}, \qquad i = 0, 1,\ldots, N_{C}-1.$$
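For a one-hot target $p$, Eqs. (7)-(8) reduce to the standard softmax cross-entropy, which in PyTorch can be applied directly to the raw logits $q$ and the integer label $T$; the batch size and class count below are arbitrary examples.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()        # implements Eq. (7) with a one-hot target

logits = torch.randn(8, 45)              # q: raw network outputs for a batch, N_C = 45
labels = torch.randint(0, 45, (8,))      # T: true class index for each sample
loss = criterion(logits, labels)
```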

We consider two different application scenarios, the classification of multiple targets and of a single target in different poses, and generate two synthetic datasets for training. For the former scenario, we generate a synthetic dataset containing 45 classes (42 models from ShapeNet [28] and 3 others from the open internet), each including 1000 samples with different signal levels and signal-to-noise ratios (SNRs). Here the signal level is the total signal photon count across the 1024 time bins; defining the noise level as the total noise photon count across the 1024 time bins, the SNR is the ratio between the signal level and the noise level. For the latter scenario, we generate a dataset of the target "bunny" in 24 different view angles, with 4 angles along the x-axis and 6 angles along the z-axis. Note that rotations about the viewing direction (y-axis) are not distinguished, since they result in the same temporal histograms. Each class contains 1000 samples with different signal levels and SNRs. For both scenarios, the signal level ranges from 100 to 20000 and the SNR ranges from 0.3 to 10.0. The dataset is split into training, validation, and test sets with a ratio of 8:1:1.
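The 8:1:1 split can be done, for example, with a random split over the full set of preprocessed histograms; the placeholder tensors below stand in for the actual synthetic data and are not part of our pipeline.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# placeholder data: in practice these are the preprocessed synthetic histograms and labels
hists = torch.rand(45 * 1000, 1024)
labels = torch.randint(0, 45, (45 * 1000,))

dataset = TensorDataset(hists.unsqueeze(1), labels)          # items: (1, 1024) histogram, class index
n_train, n_val = int(0.8 * len(dataset)), int(0.1 * len(dataset))
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, len(dataset) - n_train - n_val])
```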

We initialize the network randomly and use the ADAM solver [29] with $\beta _{1} = 0.9$, $\beta _{2} = 0.999$, and a learning rate of $10^{-3}$ that decays by a factor of 0.98 after each epoch. We implement our method in PyTorch, and the training is conducted on an NVIDIA 1080Ti GPU, taking a few hours for the network to converge.
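This training setup translates into a few lines of PyTorch, sketched below; it reuses the `UNet1D` and `train_set` from the previous sketches, and the batch size and number of epochs are assumptions rather than reported settings.

```python
import torch
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"
model = UNet1D(n_classes=45).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.98)
criterion = torch.nn.CrossEntropyLoss()
loader = DataLoader(train_set, batch_size=128, shuffle=True)

for epoch in range(100):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x.to(device)), y.to(device))
        loss.backward()
        optimizer.step()
    scheduler.step()                     # learning rate decays by 0.98 after each epoch
```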

4. Results

We conduct simulations and real-world experiments on classifying multiple targets and multiple poses of one target. These two tasks represent two typical application scenarios of the proposed target identification approach.

We first conduct simulations to verify the feasibility of our approach. We choose 45 different targets and successfully classify them; at the same time, we choose 24 different poses of the target "bunny" and perform the classification task with high accuracy. Furthermore, we build an indoor experimental system to perform indoor experiments. We generalize from simulations to real data and further validate the proposed approach under different SNR conditions. Finally, a compact outdoor system is built for outdoor experiments. We classify two types of drones at 200 m and recognize 15 different poses of one drone. The high-accuracy results illustrate the promising applications of the proposed approach.

4.1 Simulations

For the classification of multiple targets, we train the networks on a synthetic dataset to classify 45 different targets from temporal histograms. The test result on synthetic data, shown as a normalized confusion matrix in Fig. 5, gives a classification accuracy over all 45 classes of 99.92% for Fc and 99.88% for 1D-UNet. Both networks achieve high accuracy on the synthetic dataset, demonstrating the feasibility of this deep-learning-based method.

Fig. 5. Normalized confusion matrix for the classification of 45 different targets, on (a) Fc and (b) 1D-UNet, tested on synthetic dataset.

For the classification of different poses of one target, we train the networks on a synthetic dataset to classify the target "bunny" in 24 poses. As shown in Fig. 6, we choose 4 angles along the x-axis ($\theta _{x}$) and 6 angles along the z-axis ($\theta _{z}$), resulting in a total of 24 different poses. To mitigate the influence of slight pose mismatches between the simulation and the real-world experiment, we randomly add a variation of −5°~5° to both $\theta _{x}$ and $\theta _{z}$ when generating the synthetic dataset. The classification accuracy on synthetic data is 97.42% for Fc and 98.35% for 1D-UNet. Both networks achieve similarly high accuracy on the synthetic dataset, demonstrating their feasibility.

Fig. 6. The chosen 24 different poses of target "bunny", shown in 2D depth maps. The networks are trained to perform a classification task on these 24 poses.

4.2 Indoor experiments

To validate our proposed approach on real-world data, we build an indoor experimental system. A photo of the setup is shown in Fig. 7. A collimated pulsed laser (532 nm) with a repetition rate of 11 MHz is scattered by a diffuser and illuminates the target (~0.35 m in size) about 3 m away, and the reflected photons are directly received by a single-mode fiber whose FoV covers the whole target. The SPAD is synchronized with the laser, and the temporal resolution of the time-to-digital converter (TDC) is 16 ps.

Fig. 7. The indoor experiment setup. (a) The collimated pulsed laser is divided into two paths after passing through the beam splitter (BS): one passes through the diffuser to flood-illuminate the target, and the other is coupled into a fiber and triggers the photodiode (PD) for synchronization. (b) A bare single-mode fiber is used as the receiver, whose field of view (FoV) covers the whole target.

In the indoor experiments on classifying different targets, we 3D-print two targets ("bunny" and "jet") to validate our approach on real data. The size of the two targets is ~35 cm, consistent with the synthetic dataset. For both targets, we use the system to collect the echo signal over a relatively long acquisition time (~15 s) and randomly split it into shorter acquisition times (10 ms, 50 ms, 100 ms), finally forming real-world datasets at different signal levels. Besides, we change the background lighting in the experiments to obtain real-world data with different noise levels.
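One simple way to emulate such shorter acquisition times from a long accumulated histogram is binomial thinning, sketched below; this keeps each recorded photon with probability proportional to the desired acquisition time and is only an approximation of splitting the actual photon stream.

```python
import numpy as np

def subsample_histogram(h_long, t_long=15.0, t_short=0.1, rng=np.random.default_rng()):
    """Emulate a t_short acquisition by thinning a histogram accumulated over t_long.

    Each photon recorded over t_long seconds is kept independently with
    probability t_short / t_long.
    """
    return rng.binomial(np.asarray(h_long).astype(int), t_short / t_long)
```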

The classification performance of the two networks on real-world data at different signal levels and SNRs is shown in Table 1 and Table 2 for the targets "bunny" and "jet", respectively. Note that we sum the signal counts and the noise counts across all 1024 time bins to give a "signal : noise" ratio. The results show that 1D-UNet generalizes well to real-world data, giving accurate predictions over all 45 classes, and achieves much higher accuracy than Fc, especially at weak signal levels and low SNRs.

Table 1. Classification accuracy for "bunny".

Table 2. Classification accuracy for "jet".

In the indoor experiments on classifying multiple poses of one target, we fix the 3D-printed target "bunny" on a two-axis rotation stage so that we can rotate the target to the desired angles shown in Fig. 6. As previously described, we obtain a real-world dataset with different signal levels and SNRs by randomly splitting long-acquisition-time data and changing the background lighting.

The classification results for different poses of the target on real-world data are shown in Table 3. The results indicate that our proposed 1D-UNet generalizes well to real-world data and achieves higher accuracy than Fc, especially at weak signal levels and low SNRs.

Table 3. Classification accuracy for 24 poses of target "bunny".

The high performance of our proposed approach on real data, especially under weak signal levels and strong noise, demonstrates its potential for applications such as identifying distant targets under high background noise.

4.3 Outdoor experiments

We conduct outdoor experiments to further illustrate the applicability of our approach in outdoor environments, including distinguishing two different types of drones and identifying different poses of one drone. Different from the indoor experiments, the 3D models of the drones are unknown, making it difficult to generate a synthetic dataset to train the network; the same problem may occur in practical applications. To address this issue, we expand the limited amount of real data for network training. The results show that our approach performs well even when trained on real data with only 1 s of acquisition time.

A compact system is built to obtain the temporal histograms of drones hovering in the air. The setup is shown in Fig. 8(a),(b). The system employs a laser operating at a wavelength of 1560 nm and a power of 110 mW, which generates pulses with a width of 1 ps at a repetition rate of 30 MHz. Two collimators are used for transmitting and receiving, respectively, sharing the same divergence of 2.25 mrad. During the experiment, the drone hovers in the air about 200 m away from the system, where the FoV is about 0.45 m and can cover the drone. A compact homemade free-running InGaAs/InP SPAD [30] serves as the detector in our system. We use a Time Tagger 20 as the TDC, with a time resolution set to 10 ps in our experiments. The timing jitter of the entire system is about 200 ps.

Fig. 8. Outdoor experiments. (a) Two collimators are used as the transmitter and the receiver, respectively, and a visible-band camera helps the system align with the target. (b) The overall photo of the system. (c) The photo captured by the visible-band camera during the experiments, after aiming the system at the drone 200 m away. (d) DJI Phantom 3 Professional and (e) DJI Mavic 3 are the two drones used as the targets in the experiments.

Without knowing the 3D models of the drones, we train the network using real data obtained from our system. Considering that the target's temporal histogram cannot be obtained over a long acquisition time in practical applications, we expand the real data obtained within only 1 s for training. Specifically, we randomly select photons from a single temporal histogram and randomly add noise, eventually expanding the temporal histogram obtained within 1 s into 1000 temporal histograms.
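A minimal sketch of this expansion is given below, assuming the random photon selection is modelled as binomial thinning of the 1 s histogram and the added noise as a flat Poisson background; the fraction ranges are illustrative, not the exact augmentation parameters.

```python
import numpy as np

def expand_histogram(h_1s, n_samples=1000, keep_frac=(0.3, 0.9),
                     noise_frac=(0.0, 0.5), rng=np.random.default_rng()):
    """Expand one measured histogram into many training samples.

    Each sample randomly keeps a fraction of the recorded photons (binomial
    thinning) and adds a random amount of flat Poisson background noise.
    """
    h_1s = np.asarray(h_1s).astype(int)
    out = np.empty((n_samples, h_1s.size), dtype=int)
    for k in range(n_samples):
        p = rng.uniform(*keep_frac)                              # random signal level
        bg = rng.uniform(*noise_frac) * h_1s.sum() / h_1s.size   # random noise level per bin
        out[k] = rng.binomial(h_1s, p) + rng.poisson(bg, h_1s.size)
    return out
```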

In the outdoor experiment of distinguishing the two types of drones (Fig. 8(d),(e)), we use the system to obtain the temporal histograms of the two drones with an acquisition time of 1 s. Together with temporal histograms simulated from 8 other types of aerial targets, the real data are expanded for training. We further obtain multiple temporal histograms of the two drones at different times for testing, and the network is able to classify the two drones among all 10 targets with an accuracy of 89%.

In the outdoor experiment of identifying different poses of one drone, we steer the drone (DJI Mavic 3) into 15 different poses and use our system to obtain its temporal histograms, as illustrated in Fig. 9(a). For each pose, we expand the temporal histogram collected within 1 s for training, while other temporal histograms obtained at different times are used for testing. The trained network gives a prediction accuracy of 100% and 96.13% on the training and validation sets, respectively, and identifies the 15 different poses of the drone with an accuracy of 79.20% on the real-world test set; the confusion matrix is shown in Fig. 9(b). These results indicate the applicability of our method in outdoor environments and further illustrate the practical application prospects of our proposed approach.

Fig. 9. Experiment of identifying 15 different poses of the drone. (a) Close-up photos of the drone (DJI Mavic 3) in 15 different poses, and the corresponding examples of temporal histograms acquired by our system within 1 second. (b) The performance of classifying 15 different poses of the drone, shown as a confusion matrix.

5. Conclusion and future work

We propose a target identification approach using a single-point single-photon LiDAR, without scanning procedures or array detectors. Specifically, we use a deep-learning-based approach to identify targets from the temporal histograms obtained by a single-point SPAD, including the classification of multiple targets and the distinction of different poses of a target. Simulations are performed to illustrate the feasibility of this approach, while both indoor and outdoor experiments further validate it. The results show that our proposed approach achieves high identification accuracy even at low SNR and is promising for practical applications. In the future, experiments can be performed on long-range dynamic airborne targets at distances of several kilometers or even tens of kilometers, bringing the proposed approach into practical use.

Funding

Innovation Program for Quantum Science and Technology (2021ZD0300300); National Natural Science Foundation of China (62031024); Shanghai Municipal Science and Technology Major Project (2019SHZDZX01); Shanghai Academic/Technology Research Leader (21XD1403800); Shanghai Science and Technology Development Funds (22JC1402900); Key-Area Research and Development Program of Guangdong Province (2020B0303020001).

Acknowledgments

The authors acknowledge the helpful discussions with Yuan Cao, Wenwen Li and Zheng-Ping Li.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. K. Patel, K. Rambach, T. Visentin, D. Rusev, M. Pfeiffer, and B. Yang, “Deep learning-based object classification on automotive radar spectra,” in 2019 IEEE Radar Conference (RadarConf), (IEEE, 2019), pp. 1–6.

2. A. Zyweck and R. E. Bogner, “Radar target classification of commercial aircraft,” IEEE Trans. Aerosp. Electron. Syst. 32(2), 598–606 (1996). [CrossRef]  

3. S. Chen, H. Wang, F. Xu, and Y.-Q. Jin, “Target classification using the deep convolutional networks for SAR images,” IEEE Trans. Geosci. Remote Sens. 54(8), 4806–4817 (2016). [CrossRef]

4. J. J. Degnan, “Unified approach to photon-counting microlaser rangers, transponders, and altimeters,” Surv. Geophys. 22(5/6), 431–447 (2001). [CrossRef]  

5. J. J. Degnan, “Photon-counting multikilohertz microlaser altimeters for airborne and spaceborne topographic measurements,” J. Geodyn. 34(3-4), 503–549 (2002). [CrossRef]  

6. R. M. Marino and W. R. Davis, “Jigsaw: a foliage-penetrating 3D imaging laser radar system,” Lincoln Laboratory J. 15, 23–36 (2005).

7. A. M. Pawlikowska, A. Halimi, R. A. Lamb, and G. S. Buller, “Single-photon three-dimensional imaging at up to 10 kilometers range,” Opt. Express 25(10), 11919–11931 (2017). [CrossRef]  

8. Z.-P. Li, X. Huang, Y. Cao, B. Wang, Y.-H. Li, W. Jin, C. Yu, J. Zhang, Q. Zhang, C.-Z. Peng, F. Xu, and J.-W. Pan, “Single-photon computational 3D imaging at 45 km,” Photonics Res. 8(9), 1532–1540 (2020). [CrossRef]

9. Z.-P. Li, J.-T. Ye, X. Huang, P.-Y. Jiang, Y. Cao, Y. Hong, C. Yu, J. Zhang, Q. Zhang, C.-Z. Peng, F. Xu, and J.-W. Pan, “Single-photon imaging over 200 km,” Optica 8(3), 344–349 (2021). [CrossRef]  

10. M. O’Toole, D. B. Lindell, and G. Wetzstein, “Confocal non-line-of-sight imaging based on the light-cone transform,” Nature 555(7696), 338–341 (2018). [CrossRef]  

11. X. Liu, I. Guillén, M. La Manna, J. H. Nam, S. A. Reza, T. Huu Le, A. Jarabo, D. Gutierrez, and A. Velten, “Non-line-of-sight imaging using phasor-field virtual wave optics,” Nature 572(7771), 620–623 (2019). [CrossRef]  

12. D. B. Lindell, G. Wetzstein, and M. O’Toole, “Wave-based non-line-of-sight imaging using fast f–k migration,” ACM Trans. Graph. 38(4), 1–13 (2019). [CrossRef]

13. D. Lu and Q. Weng, “A survey of image classification methods and techniques for improving classification performance,” Int. J. Remote Sens. 28(5), 823–870 (2007). [CrossRef]

14. W. Rawat and Z. Wang, “Deep convolutional neural networks for image classification: A comprehensive review,” Neural Comput. 29(9), 2352–2449 (2017). [CrossRef]  

15. S. Scholes, A. Ruget, G. Mora-Martín, F. Zhu, I. Gyongy, and J. Leach, “DroneSense: the identification, segmentation, and orientation detection of drones via neural networks,” IEEE Access 10, 38154–38164 (2022). [CrossRef]

16. Z.-P. Li, X. Huang, P.-Y. Jiang, Y. Hong, C. Yu, Y. Cao, J. Zhang, F. Xu, and J.-W. Pan, “Super-resolution single-photon imaging at 8.2 kilometers,” Opt. Express 28(3), 4076–4087 (2020). [CrossRef]  

17. A. Maccarone, F. M. Della Rocca, A. McCarthy, R. Henderson, and G. S. Buller, “Three-dimensional imaging of stationary and moving targets in turbid underwater environments using a single-photon detector array,” Opt. Express 27(20), 28437–28456 (2019). [CrossRef]  

18. J. Tachella, Y. Altmann, N. Mellado, A. McCarthy, R. Tobin, G. S. Buller, J.-Y. Tourneret, and S. McLaughlin, “Real-time 3D reconstruction from single-photon lidar data using plug-and-play point cloud denoisers,” Nat. Commun. 10(1), 4984–4986 (2019). [CrossRef]

19. I. Gyongy, S. W. Hutchings, A. Halimi, M. Tyler, S. Chan, F. Zhu, S. McLaughlin, R. K. Henderson, and J. Leach, “High-speed 3D sensing via hybrid-mode imaging and guided upsampling,” Optica 7(10), 1253–1260 (2020). [CrossRef]

20. L. Turpin, G. Musarra, V. Kapitany, F. Tonolini, A. Lyons, I. Starshynov, F. Villa, E. Conca, F. Fioranelli, R. Murray-Smith, and D. Faccio, “Spatial images from temporal data,” Optica 7(8), 900–905 (2020). [CrossRef]  

21. Ö. Yıldırım, U. B. Baloglu, and U. R. Acharya, “A deep convolutional neural network model for automated identification of abnormal EEG signals,” Neural Comput. Appl. 32(20), 15857–15868 (2020). [CrossRef]

22. S. Khessiba, A. G. Blaiech, K. Ben Khalifa, A. Ben Abdallah, and M. H. Bedoui, “Innovative deep learning models for EEG-based vigilance detection,” Neural Comput. Appl. 33(12), 6921–6937 (2021). [CrossRef]

23. D. Li, J. Zhang, Q. Zhang, and X. Wei, “Classification of ECG signals based on 1D convolution neural network,” in 2017 IEEE 19th International Conference on e-Health Networking, Applications and Services (Healthcom), (IEEE, 2017), pp. 1–6.

24. O. Abdeljaber, O. Avci, S. Kiranyaz, M. Gabbouj, and D. J. Inman, “Real-time vibration-based structural damage detection using one-dimensional convolutional neural networks,” J. Sound Vib. 388, 154–170 (2017). [CrossRef]  

25. L. Peng, S. Xie, T. Qin, L. Cao, and L. Bian, “Image-free single-pixel object detection,” Opt. Lett. 48(10), 2527–2530 (2023). [CrossRef]  

26. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), pp. 234–241.

27. H. Cao, Y. Wang, J. Chen, D. Jiang, X. Zhang, Q. Tian, and M. Wang, “Swin-Unet: Unet-like pure transformer for medical image segmentation,” in European Conference on Computer Vision, (Springer, 2022), pp. 205–218.

28. A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu, “Shapenet: An information-rich 3d model repository,” arXiv, arXiv:1512.03012 (2015). [CrossRef]  

29. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv, arXiv:1412.6980 (2014). [CrossRef]  

30. C. Yu, J. Qiu, H. Xia, X. Dou, J. Zhang, and J.-W. Pan, “Compact and lightweight 1.5 μm lidar with a multi-mode fiber coupling free-running InGaAs/InP single-photon detector,” Rev. Sci. Instrum. 89(10), 103106 (2018). [CrossRef]
