Abstract
Detection of objects outside the line of sight remains a challenge in many practical applications. Various studies have realized 2D or 3D imaging of static hidden objects, with the aim of improving the resolution of the reconstructed images. When it comes to tracking continuously moving objects, however, the speed of imaging and the accuracy of positioning become the priorities to optimize. Previous works have achieved centimeter-level or even higher positioning precision by marking coordinates at intervals of 3 seconds to tens of milliseconds. Here a deep learning framework is proposed to realize the imaging and dynamic tracking of targets simultaneously using a standard RGB camera. Through simulation experiments, we first use the designed neural network to position a 3D mannequin with sub-centimeter accuracy (relative error of about 1.8%), at a cost of only 3 milliseconds per estimation on average. Furthermore, we apply the system to a physical scene and successfully recover the video signal of the moving target, intuitively revealing its trajectory. We demonstrate an efficient and inexpensive approach that can present the movement of objects around the corner in real time. Because the NLOS scene is imaged, it is also possible to identify the hidden target. This technique can be applied to security surveillance, military reconnaissance, autonomous driving and other fields.
© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
1. Introduction
The advent of imaging equipment with high sensitivity and high resolution, along with the rapid development of computational imaging, makes it possible to ‘see around corners’. This technique is called non-line-of-sight (NLOS) imaging. It was first proposed in 2008 by Raskar et al. from MIT, and has been widely applied to detect objects outside the field of view (FOV) of imaging equipment [1]. Because it overcomes the limits of direct viewing, NLOS imaging has broad applications in fields such as disaster relief, military counter-terrorism, medical imaging, and assisted driving [2].
According to the temporal resolution of the imaging equipment, NLOS imaging can be divided into transient imaging and steady-state imaging. Early research mainly focused on transient imaging, which uses a high-frequency modulated pulsed laser as the light source to illuminate NLOS scenes, and expensive imaging equipment such as a streak camera or a single-photon avalanche diode (SPAD) to detect the reflected laser light. Thanks to the ultra-high temporal resolution of these photodetectors, the time-of-flight (ToF) information of the laser can be captured. By exploiting a ToF ranging principle similar to that of LiDAR, a light transmission model can be established, and the image can then be reconstructed through various optimization algorithms [3–5]. In contrast, steady-state imaging usually employs a standard CMOS camera, and the speed of light is assumed to be infinite so that the propagation time of diffuse light in the scene can be ignored. Reconstruction is accomplished by using the total light intensity received by the sensor during the exposure time [6–8] or the optical memory effect of speckle coherence [9,10].
1.1 Deep learning for non-line-of-sight imaging
Whether transient or steady-state, traditional techniques need to establish a light transmission model from the hidden object to the imaging device, and then use the reflected light to inversely calculate the actual information of the target through optimization algorithms. This is an ill-posed inverse problem that involves a large number of high-dimensional matrix operations, and it is even more difficult to obtain an accurate model that accounts for environmental uncertainties such as noise and aberration in the actual imaging process [11]. Deep learning has been applied to NLOS imaging in recent years. Trained beforehand on a large number of data samples, neural networks can learn the complex nonlinear association between physical measurements and the information of targets, and thereby achieve rapid reconstruction [12].
1.2 Tracking moving objects around the corner
Previous works have delved into reconstructing images of NLOS scenes. However, in certain applications, such as detecting vehicles and pedestrians approaching from blind areas in autonomous driving, or detecting suspects who cannot be observed directly in criminal investigation, capturing the location of the target is required in addition to imaging. In related research, Gariepy et al. and Chan et al. each used a laser and a SPAD to obtain the time-photon count histogram of each pixel. After Gaussian fitting, the joint probability distribution of the target location is calculated from the measurements of all pixels [13,14]. Smith et al. used speckle correlation to achieve high-precision tracking of two objects [15].
All the above studies present the position of the target only in the form of coordinates, while our goal is to achieve intuitive, real-time tracking of moving objects through rapid imaging. Based on these requirements, a deep learning approach that needs no additional active light source is designed. We select a generative neural network and a convolutional neural network, both suitable for image processing, to complete the reconstruction work. First, a simulated scene is built and the proposed neural network is used to reconstruct the classic MNIST digits dataset, proving the imaging ability of the network when applied to an NLOS scene. Second, the coordinates of a non-self-luminous 3D mannequin in an NLOS room are recovered. Moreover, to verify the performance of our system in practical application, a physical scene simulating a conventional interior lighting environment is set up, and the mannequin undergoing continuous motion is detected by an RGB camera placed outside the scene. This experiment combines NLOS imaging and tracking: the image of the mannequin in the reconstructed video changes with the real target in real time, accurately reflecting its trajectory. The system provides a more intuitive way to image and track moving objects around corners.
2. Experimental procedure
2.1 Experimental setup
Because deep learning is essentially a ‘data-driven’ approach, a large amount of sample data needs to be obtained in advance to train the network before the system is put into practical application. Although this makes considerable preparatory work necessary whenever the NLOS scene changes, it is not a problem for relatively fixed scenes such as security monitoring areas. In experimental operations, collecting training data is a time-consuming task. To improve efficiency, simulation experiments are first conducted in software. Here we select the physically based 3D graphics software Blender, with the Cycles renderer, to build the NLOS scene.
Following the basic scene commonly used in previous studies, the camera cannot directly image the target because of an obstruction (e.g. a wall), but it can capture the diffuse light reflected by the relay wall to indirectly obtain information about the target (Fig. 1). We build two NLOS scenes to study the feasibility of the system in NLOS imaging and positioning, respectively.
Scene No. 1 is shown in Fig. 2(a), representing the concept of ‘Accidental Pinhole Cameras’ proposed by Torralba et al. in 2012 [16]. The object is located indoors, and the light it emits is projected through a window opening in the wall onto a diffuse surface outdoors, then captured by a camera facing that surface. The window opening acts as an occluder, which forms penumbra information of the target on the relay wall. The targets are images from the MNIST dataset with a resolution of 28 × 28 pixels. In Blender, a flat-panel display is simulated by assigning a luminescent material to a plane of the same size and mapping the numeral images to it as material nodes (Fig. 2(b)). We write a script that continuously switches the numeral image mapped to the plane; each time, the software generates the diffuse image by rendering (Fig. 2(c)), yielding a training set in which the numeral images and the diffuse images are in one-to-one correspondence.
In the above scene, the target is a self-luminous plane. However, most objects that need to be detected in real life are 3D (such as the human body) and do not emit light. They can only transmit information by reflecting ambient light. In general, the light is weakened considerably after multiple reflections, so very little information can be captured.
To study the positioning ability of the neural network in such a case, the following ‘L-shaped’ scene is constructed (Fig. 3). The target to be observed is a mannequin, and its surface material is set to Diffuse Reflection BSDF (roughness: 0.0) according to the albedo characteristics of human skin. To record the position (x and y coordinates in Blender) of the target more precisely, we write a script to control its random movement in an enclosed space (simulating an indoor scene). A chair is placed between the target and the relay wall as an occluder, simulating the partial occlusion of reflected light by furniture or other objects in a real indoor environment. An ordinary lamp that emits incoherent light is installed at the top of the room. Its light falls on the mannequin, and the reflected light, partly blocked by the chair-shaped occluder, projects penumbra information onto the relay wall. Diffuse images taken by the camera (placed in the ‘corridor’) are generated by rendering.
2.2 Deep convolutional inverse graphics network for image reconstruction
2.2.1 Network structure
Among various deep neural networks, generative neural networks are often used for image processing. In this paper we design a generative network called the Deep Convolutional Inverse Graphics Network (DCIGN). Its basic structure is shown in Fig. 4(a) [17]; it mainly consists of the encoder before the latent variables z and the decoder after them. The encoder is a convolutional neural network (CNN). Its function is to extract and abstract features from the input image by down-sampling through convolutional and pooling layers. The latent variables z are described by a set of means and variances and are sampled through the reparameterization trick. We want the distribution of z to be as close as possible to the characteristic probability distribution of the input data. The decoder is a deconvolutional neural network (DNN), which takes a random sample from each feature distribution and uses transposed convolutional layers to up-sample it, increasing the dimension to generate the output image (Fig. 4(b)).
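The sampling step in the latent layer can be illustrated with a minimal sketch. It assumes that the encoder outputs a mean and a log-variance for each latent dimension (a common convention that is not stated in the text); the function name `reparameterize` and the example values are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension.

    Assumption: the encoder emits log-variances, so sigma = exp(0.5 * log_var).
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
mu = [0.5, -1.0, 2.0]           # hypothetical encoder means
log_var = [0.0, -2.0, -4.0]     # hypothetical encoder log-variances
z = reparameterize(mu, log_var, rng)
print(z)
```

Because the randomness enters only through the noise term, the sample remains differentiable with respect to the mean and variance, which is what allows the encoder to be trained by backpropagation.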
2.2.2 Selection of loss function
The loss function evaluates the difference between network outputs and real data. According to the characteristics of DCIGN, the loss function used in this paper is divided into two parts.
The first part is the reconstruction loss, which measures the difference between the images generated by the decoder and the labels (real images) in the dataset. We select the sigmoid cross-entropy function, which is frequently used in image processing, to evaluate the reconstruction loss. The pixel values of the output images are first scaled to the interval (0,1) through the sigmoid function, and the cross entropy with the label images is then calculated.
The second part is the KL divergence, which evaluates the error between the probability distribution of the latent variables z and the actual feature distribution of real images. It is usually assumed that the actual feature distribution $p(z)$ obeys the standard normal distribution $N(0,1)$ and that the probability distribution $p(z|x)$ obtained from training follows the normal distribution $N(\mu ,{\sigma ^2})$; the KL divergence between the two then has a closed form.
To diminish the error as much as possible, both the KL divergence and the cross-entropy function should reach their minimum. The training of DCIGN is therefore a process of minimizing the loss function through gradient backpropagation.
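In its usual form, with $x_i$ the decoder output (logit) and $y_i \in [0,1]$ the label value at pixel $i$, this loss is $L_{rec} = -\sum_i [{y_i}\ln s({x_i}) + (1 - {y_i})\ln (1 - s({x_i}))]$, where $s(x) = 1/(1 + {e^{ - x}})$. The sketch below is a plain-Python illustration of that standard definition, not the TensorFlow code used in the experiments; the function name and sample values are hypothetical.

```python
import math


def sigmoid_cross_entropy(logits, labels):
    """Sum over pixels of -[y*ln(s(x)) + (1-y)*ln(1-s(x))], s being the sigmoid.

    Uses the algebraically equivalent form max(x, 0) - x*y + ln(1 + exp(-|x|)),
    which avoids overflow for large |x|.
    """
    total = 0.0
    for x, y in zip(logits, labels):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total


logits = [2.0, -3.0, 0.0]   # hypothetical decoder outputs before the sigmoid
labels = [1.0, 0.0, 1.0]    # hypothetical ground-truth pixel values
loss = sigmoid_cross_entropy(logits, labels)
print(loss)
```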
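For a one-dimensional Gaussian $N(\mu ,{\sigma ^2})$ measured against $N(0,1)$, the standard closed form is $\frac{1}{2}({\mu ^2} + {\sigma ^2} - \ln {\sigma ^2} - 1)$, summed over the latent dimensions when they are treated as independent. A minimal sketch under that assumption follows; the function name is hypothetical.

```python
import math


def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over independent latent dimensions."""
    return sum(0.5 * (m * m + s * s - math.log(s * s) - 1.0)
               for m, s in zip(mu, sigma))


print(kl_to_standard_normal([0.0], [1.0]))   # identical distributions give zero
print(kl_to_standard_normal([1.0], [1.0]))   # shifting the mean by 1 costs 0.5
```

The value is zero only when the learned distribution coincides with the standard normal and grows as the mean drifts from 0 or the variance from 1.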
2.2.3 Selection of optimizer
The optimizer selected in this paper is Adam. It combines the advantages of Stochastic Gradient Descent (SGD) with Momentum and of RMSprop, and performs very well in parameter optimization [18].
The basic principle of SGD with Momentum is to construct a velocity $V$ during backpropagation, add the gradient of the loss function to the velocity, and then use the velocity to update the parameters.
Here $\rho $ represents friction, normally 0.9 to 0.99; $w$ represents the parameters of the network; $\alpha $ is the learning rate; $f$ stands for the loss function and $\nabla {f_{{w_t}}}$ is the gradient of the loss function with respect to the parameter ${w_t}$. Because the defined velocity is used to update the parameters, even if the gradient becomes zero at a saddle point or a local optimum, the parameter updating does not stop, since the velocity is still nonzero. In addition, SGD with Momentum smooths drastically changing gradients into a gentler transition, which gives a smoother descent and eventually accelerates gradient descent.
RMSprop calculates the exponentially weighted average of the square of the gradient and uses it to scale the update.
In a direction with a smaller gradient, the denominator $(\nabla f_{{w_t}}^2 + \varepsilon )$ of the fraction decreases as the parameters are updated, which makes the update stride larger, while in a direction with a larger gradient, the denominator increases as the parameters are updated, which makes the update stride smaller. This effectively prevents excessive oscillation of the parameter updates during gradient descent, so that the optimal parameters are approached faster.
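One common way to write the update described above is ${v_{t + 1}} = \rho {v_t} + \nabla f({w_t})$ followed by ${w_{t + 1}} = {w_t} - \alpha {v_{t + 1}}$. The sketch below applies it to a toy loss $f(w) = {w^2}$; the toy loss, the starting point and the step count are assumptions chosen only for illustration.

```python
def sgd_momentum_step(w, v, grad, rho=0.9, alpha=0.01):
    """One SGD-with-Momentum update: accumulate the gradient into the velocity,
    then move the parameter along the velocity."""
    v_new = rho * v + grad
    w_new = w - alpha * v_new
    return w_new, v_new


def minimize_quadratic(w0=5.0, steps=500):
    """Minimize the toy loss f(w) = w**2, whose gradient is 2 * w."""
    w, v = w0, 0.0
    for _ in range(steps):
        w, v = sgd_momentum_step(w, v, grad=2.0 * w)
    return w


w_final = minimize_quadratic()
print(w_final)

# With a zero gradient the parameter still moves, because the velocity is nonzero.
w_next, v_next = sgd_momentum_step(w=1.0, v=1.0, grad=0.0)
print(w_next, v_next)
```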
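A minimal sketch of RMSprop, and of Adam as the combination of a momentum-style average with the RMSprop scaling, is given below. It uses the commonly published forms (square root of the running average in the denominator, bias correction in Adam) and typical default decay rates; these details are assumptions, since the text does not list them.

```python
import math


def rmsprop_step(w, s, grad, alpha=0.001, beta=0.9, eps=1e-8):
    """RMSprop: divide the step by the root of a running average of grad**2."""
    s = beta * s + (1.0 - beta) * grad * grad
    w = w - alpha * grad / (math.sqrt(s) + eps)
    return w, s


def adam_step(w, m, s, grad, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum-style average m and RMSprop-style average s, bias-corrected.

    t is the 1-based step counter used for the bias correction.
    """
    m = beta1 * m + (1.0 - beta1) * grad
    s = beta2 * s + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)
    s_hat = s / (1.0 - beta2 ** t)
    w = w - alpha * m_hat / (math.sqrt(s_hat) + eps)
    return w, m, s


# Large and small gradients give nearly the same first step size under RMSprop.
w_big, _ = rmsprop_step(0.0, 0.0, grad=10.0)
w_small, _ = rmsprop_step(0.0, 0.0, grad=0.1)
print(w_big, w_small)

# The first Adam step has a magnitude close to the learning rate alpha.
w1, m1, s1 = adam_step(1.0, 0.0, 0.0, grad=2.0, t=1)
print(w1)
```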
2.3 Convolutional neural network for positioning
Since only two scalar coordinates are needed to estimate the location of the object, the structure of the neural network can be greatly simplified compared with the generative network used to recover images. Here, only a Convolutional Neural Network (CNN) is needed. According to the characteristics of the diffuse images obtained in scene No.2, the number of convolutional layers and the size of the convolutional kernels are adjusted. The network structure is shown in Fig. 5: five convolutional layers (interleaved with batch normalization and max-pooling layers) compress the features, and three fully connected layers then extract the location information and output the x and y coordinates. The Cross Entropy loss function and Adam optimizer are also used to train this network.
3. Results
3.1 Image reconstruction with self-luminous plane
For scene No.1 described in Section 2.1, 10000 MNIST handwritten numeral images are randomly sampled and the corresponding diffuse images are rendered in Blender; 8000 of them are used as the training set, 1000 as the validation set and 1000 as the test set. The Regions of Interest (ROI, 400 pixels by 400 pixels) containing effective diffuse information are first extracted, and the perspective transformation function in OpenCV, a computer vision library, is used to transform the ROI images from an oblique view into a frontal view. The size of each processed diffuse image is 180 pixels by 200 pixels. Since all networks used in this paper are built on TensorFlow, a machine learning framework for tensor computation, the image data must also be converted into tensors and normalized. After all of these preprocessing steps, the data are sent into the neural network.
As for the hyper-parameters, the learning rate is set to 0.001, the number of epochs is set to 200, and each batch contains 50 images. The ReLU function is chosen as the activation function. Unlike the previously widely used Sigmoid function, the gradient of the ReLU function stays at 1 over the positive range of x, so it effectively avoids gradient saturation during back-propagation. As a result, ReLU is generally used as the activation function for large-scale deep neural networks. The dimension of each layer is shown in Fig. 6. A GTX 1080Ti graphics card is used for training, which takes about 6 hours in total.
Figure 7 shows the results of network training. The network contains 96,909 parameters altogether, of which 96,881 are trainable. The loss curve (Fig. 7(a)) shows that as the number of iterations increases, the value of the loss function drops rapidly and then flattens out, indicating that training has proceeded as expected. Figure 7(b) presents some of the reconstructed images from the test set. After only 200 epochs, DCIGN reconstructs the handwritten numeral images very accurately, which means the reconstruction performance of the trained network on the test set is satisfactory. In addition, the entire reconstruction process runs very quickly, taking only about 3 ms per image. This experiment confirms the feasibility of applying the proposed DCIGN to the NLOS scene for imaging self-luminous planar objects.
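The 8000/1000/1000 partition can be sketched as below. The fixed seed and the use of integer indices in place of the rendered image pairs are assumptions made only for illustration.

```python
import random


def split_indices(n_total=10000, n_train=8000, n_val=1000, seed=0):
    """Shuffle sample indices and cut them into train, validation and test parts."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices()
print(len(train_idx), len(val_idx), len(test_idx))  # 8000 1000 1000
```

Shuffling before cutting keeps the three subsets disjoint, so no test sample is seen during training.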
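The saturation argument can be checked numerically. The sketch below compares the derivative of the sigmoid, $s(x)(1 - s(x))$, with that of ReLU at a few positive inputs; the sample inputs are arbitrary.

```python
import math


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)), at most 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


for x in (0.5, 2.0, 10.0):
    print(x, sigmoid_grad(x), relu_grad(x))
```

The sigmoid derivative never exceeds 0.25 and shrinks towards zero as the input grows, while the ReLU derivative stays at 1 for every positive input.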
3.2 Positioning with non-self-luminous 3D model
For scene No.2, which is used to explore the location-recovery capacity of the neural network, the diffuse images are preprocessed in the same way as in the simulation experiment on numeral images explained in Section 3.1: they are converted into tensor data after perspective transformation. The learning rate is set to 0.001, and the batch size and the number of epochs are both set to 500. The diffuse images and the ground-truth coordinates are sent into the CNN as inputs and labels for training. 1000 groups of data are randomly selected in advance to test the generalization performance of the network. Partial results are shown in Table 1. The formula $|{a - \hat{a}} |/a$ is applied to calculate the relative error between the output $\hat{a}$ and the ground truth $a$. Over the 1000 test data, the average error is about 1.831%, which shows that the CNN can use diffuse images to achieve swift and accurate positioning at the sub-centimeter level.
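The error metric can be written out as a short sketch. The coordinate values below are hypothetical and are not taken from Table 1; the absolute value in the denominator is an added safeguard for negative coordinates.

```python
def relative_error(truth, estimate):
    """|a - a_hat| / |a| for one coordinate; truth must be nonzero."""
    return abs(truth - estimate) / abs(truth)


def mean_relative_error(truths, estimates):
    """Average relative error over paired ground-truth and estimated values."""
    errors = [relative_error(a, a_hat) for a, a_hat in zip(truths, estimates)]
    return sum(errors) / len(errors)


truths = [50.0, 20.0]       # hypothetical ground-truth coordinates
estimates = [49.0, 20.5]    # hypothetical network outputs
print(mean_relative_error(truths, estimates))
```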
4. Application in physical scene
The above section confirms the reliability of deep learning for NLOS imaging and positioning. Moreover, compared with traditional methods, it achieves rapid recovery (at the millisecond level) because there is no need to establish a complex optical transmission model. This advantage offers a new way to track objects moving continuously in NLOS scenes. However, the above experiments are designed separately, and the first numerical study is conducted under ideal conditions where the targets are static and there is no ambient light in the scene. Tancik et al. from MIT first realized dynamic tracking of 3D geometries with a CNN in a simulation experiment [19]. In their work, objects trace an infinity sign and a circular path. Every frame of video is sent into the CNN to estimate the position, and the trajectory of the target is eventually presented on a grid. Cao et al. also designed a CNN-based framework, NLOS-LUCAI, which can carry out ambient compensation and precisely position a 3D mannequin at different positions in a grid region under the interference of changing ambient illumination [20]. Prior to this, several studies had used different devices and methods to track objects around the corner, but all of them only recovered coordinates [13–15,21]. Besides, former approaches such as those of Tancik et al. and Cao et al. all require an additional light source to actively illuminate the NLOS scene. This may bring more information about the hidden target to the imaging device, but it also disturbs the scene. In certain applications such as criminal investigation, any disturbance could incur danger. To explore a more covert and intuitive approach to tracking objects in NLOS scenes, the following experiment is conducted to verify the rapid imaging ability of DCIGN for non-self-luminous moving targets.
We use PVC expansion sheets to build a scene similar to scene No.2 (Fig. 8), in which the target (a 2D flat mannequin) can move freely in the NLOS space. A desk lamp emitting white incoherent light is placed on top to illuminate the NLOS scene. In the experiment, two standard H264-encoded camera modules produced by Rayvision are used. Each uses a SONY IMX322 CMOS sensor, with a focal length of 4 mm and a viewing angle of about 70°. One (camera 1), located outside the scene, photographs the diffuse surface. The other (camera 2) is fastened to the top of the diffuse surface to image the mannequin directly; its pictures are used as labels in the training set.
Since the light emitted by the lamp can also illuminate the diffuse surface without reaching the target, light that contains no effective target information is also reflected into camera 1. Using these images directly degrades the reconstruction, even to the point where the target cannot be distinguished. Measures are therefore needed to eliminate the influence of ambient light and improve the signal-to-noise ratio (SNR). Here background subtraction is employed. Before the experiment, camera 1 takes a picture of the diffuse surface when there is no target in the scene; this picture is used as the background image. The difference between the background image and each diffuse image is computed, and the sigmoid function is then applied to amplify the difference. After these procedures, the processed images are sent into the network.
The proposed DCIGN is again used for this experiment. During the collection of training data, the position of the mannequin shifts randomly and continuously, and camera 1 and camera 2 are controlled by code to simultaneously take 10000 groups of diffuse images and reference images, which serve as the training data. After training, camera 1 shoots a 15-second video of the moving mannequin, which is sent to DCIGN for reconstruction frame by frame. Part of the results are shown in Fig. 9. Figures 9(a) and 9(b) show the real diffuse images and the corresponding images after background subtraction, respectively. As the mannequin moves, the changes in reflection on the diffuse surface are hard to distinguish with the naked eye, but after differential processing the variation of the penumbra information is clearly presented. The network output clearly shows the contour and posture of the mannequin (Fig. 9(d)).
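The background subtraction and sigmoid amplification can be sketched on a tiny grayscale image. The sign of the difference (frame minus background), the pixel range [0, 1], the `gain` factor and the 2 × 2 sample values are all assumptions for illustration; the text does not specify them.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def subtract_and_amplify(frame, background, gain=10.0):
    """Pixel-wise difference to the background image, stretched by a sigmoid.

    Pixels equal to the background map to 0.5; brighter pixels move towards 1
    and darker pixels towards 0. The gain value is hypothetical.
    """
    return [[sigmoid(gain * (f - b)) for f, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


background = [[0.50, 0.50], [0.50, 0.50]]   # diffuse surface without the target
frame = [[0.50, 0.52], [0.45, 0.70]]        # diffuse surface with the target present
processed = subtract_and_amplify(frame, background)
for row in processed:
    print(row)
```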
4.1 Test of generalization
The above experiment successfully reconstructs images of a continuously moving mannequin under the interference of ambient light. To further verify the generalization of the proposed DCIGN, three more mannequins with different postures are applied to the system. The mannequins are numbered No.1 to No.3. Mannequins No.1 and No.2 are used to train the network, so as to further improve its reconstruction capacity, and mannequin No.3 is used to test the generalization. The shape of each one is shown in Fig. 10.
To ensure consistency, 10000 groups of training data are captured for each training mannequin (No.1 and No.2). After a unified preprocessing procedure (perspective transformation, background subtraction and so on), all data are sent to the DCIGN and trained for 500 epochs. Partial reconstruction results are presented in Fig. 11(a).
After training, 400 diffuse images of mannequin No.3 shot by camera 1 are used for testing. The reconstructed images shown in Fig. 11(b) reveal that even though no images of mannequin No.3 are included in the training data, DCIGN can still reconstruct a rough image of it.
Together with the first simulation experiment, these results confirm the imaging ability of the proposed DCIGN for both static planar objects and moving objects. Besides, DCIGN is fast: each reconstruction takes only around 3 ms, much less than the duration of one frame of regular video (about 16.7 ms per frame for a 60 fps video). This reconstruction speed fully meets the requirement of real-time presentation in the form of video.
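The timing comparison is simple arithmetic, written out below. It counts only the network reconstruction time quoted above and assumes that capture and preprocessing add no further delay.

```python
def frame_period_ms(fps):
    """Duration of one video frame in milliseconds."""
    return 1000.0 / fps


reconstruction_ms = 3.0                  # per-image reconstruction time from the text
budget_ms = frame_period_ms(60)          # time available per frame at 60 fps
max_fps = 1000.0 / reconstruction_ms     # frame rate the network alone could sustain

print(round(budget_ms, 1))               # 16.7
print(reconstruction_ms < budget_ms)     # True
print(round(max_fps))                    # 333
```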
5. Discussion
In this paper, a deep-learning-based framework for non-line-of-sight imaging and positioning is proposed. Simulation experiments verify that this ‘data-driven’ method achieves very fast reconstruction, which provides a better solution for dynamic tracking of hidden objects. The deep convolutional inverse graphics network we designed realizes real-time tracking of a non-self-luminous model moving in a blind area. Table 2 compares our method with previous works. The deep learning method guarantees high positioning accuracy (sub-cm level) while dramatically reducing the time required for a single reconstruction to only a few milliseconds, which is well below the temporal resolution limit of the human eye. Benefiting from this, our method can intuitively demonstrate the trajectory of targets in the form of video through continuous rapid imaging. Meanwhile, since the video directly shows the recovered image of the scene, it is also possible to identify the target.
In addition, the powerful capability of neural networks in information extraction and nonlinear fitting makes it unnecessary to establish a light transmission model, which considerably simplifies the imaging system. Only a standard RGB camera and an ordinary light source (if the target object is not self-luminous and the scene is too dark) are required, so the cost is greatly reduced. The proposed deep learning method provides an effective approach for NLOS imaging and tracking of moving targets in practical applications. In certain cases, such as criminal investigation and security monitoring, where there may be no direct access to the scene but the whereabouts of the target in the scene still have to be observed, this technology will be of great significance.
Despite this potential, our approach still has a long way to go. Figures 9 and 11 show that the reconstructed images are sufficiently recognizable for users to identify the overall shape and trajectory of the hidden targets, but they are still rough. Moreover, the reconstruction deteriorates dramatically when the mannequin is too far from the diffuse wall (over 100 cm), mainly because too little ambient light is reflected off the mannequin. Improving the performance of the network is the main objective of our subsequent research.
Funding
Basic Research Program of Jiangsu Province (BK20212006); National Natural Science Foundation of China (6210031456); Fundamental Research Funds for the Central Universities (2242021K1G005).
Disclosures
The authors declare no conflicts of interest.
Data Availability
Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.
References
1. R. Raskar and J. Davis, “5d time-light transport matrix: What can we reason about scene properties?” Tech. Rep. (Massachusetts Institute of Technology, 2008).
2. T. Maeda, G. Satat, T. Swedish, L. Sinha, and R. Raskar, “Recent advances in imaging around corners,” arXiv:1910.05613 (2019).
3. A. Velten, T. Willwacher, O. Gupta, A. Veeraraghavan, M. G. Bawendi, and R. Raskar, “Recovering three-dimensional shape around a corner using ultrafast time-of-flight imaging,” Nat. Commun. 3(1), 745 (2012).
4. J. Rapp, C. Saunders, J. Tachella, J. Murray-Bruce, Y. Altmann, J.-Y. Tourneret, S. McLaughlin, R. M. A. Dawson, F. N. C. Wong, and V. K. Goyal, “Seeing around corners with edge-resolved transient imaging,” Nat. Commun. 11(1), 5929 (2020).
5. F. Heide, L. Xiao, W. Heidrich, and M. B. Hullin, “Diffuse Mirrors: 3D Reconstruction from Diffuse Indirect Illumination Using Inexpensive Time-of-Flight Sensors,” in 32nd Computer Vision and Pattern Recognition (CVPR) (2014), pp. 3222.
6. C. Saunders, J. Murray-Bruce, and V. K. Goyal, “Computational periscopy with an ordinary digital camera,” Nature 565(7740), 472–475 (2019).
7. T. Maeda, Y. Wang, R. Raskar, and A. Kadambi, “Thermal Non-Line-of-Sight Imaging,” in 11th Computational Photography (ICCP) (2019), pp. 1–11.
8. M. Tancik, G. Satat, and R. Raskar, “Flash photography for data-driven hidden scene recovery,” arXiv:1810.11710 (2018).
9. M. Batarseh, S. Sukhov, Z. Shen, H. Gemar, R. Rezvani, and A. Dogariu, “Passive sensing around the corner using spatial coherence,” Nat. Commun. 9(1), 3629 (2018).
10. S. Divitt, D. Gardner, and A. Watnik, “Imaging around corners in the mid-infrared using speckle correlations,” Opt. Express 28(8), 11051–11064 (2020).
11. D. Faccio, A. Velten, and G. Wetzstein, “Non-line-of-sight imaging,” Nat. Rev. Phys. 2(6), 318–327 (2020).
12. S. Li, M. Deng, J. Lee, A. Sinha, and G. Barbastathis, “Imaging through glass diffusers using densely connected convolutional networks,” Optica 5(7), 803–813 (2018).
13. G. Gariepy, F. Tonolini, R. Henderson, J. Leach, and D. Faccio, “Detection and tracking of moving objects hidden from view,” Nat. Photonics 10(1), 23–26 (2016).
14. S. Chan, R. Warburton, G. Gariepy, J. Leach, and D. Faccio, “Non-line-of-sight tracking of people at long range,” Opt. Express 25(9), 10109–10117 (2017).
15. B. M. Smith, M. O’Toole, and M. Gupta, “Tracking Multiple Objects Outside the Line of Sight Using Speckle Imaging,” in 36th Computer Vision and Pattern Recognition (CVPR) (2018), pp. 6258–6266.
16. A. Torralba and W. T. Freeman, “Accidental pinhole and pinspeck cameras: Revealing the scene outside the picture,” in 30th Computer Vision and Pattern Recognition (CVPR) (2012), pp. 374–381.
17. D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv:1312.6114 (2013).
18. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (Massachusetts Institute of Technology, 2016), Chap. 8.
19. M. Tancik, G. Satat, and R. Raskar, “Flash photography for data-driven hidden scene recovery,” arXiv:1810.11710 (2018).
20. Y. Cao, R. Liang, J. Yang, Y. Cao, Z. He, J. Chen, and X. Li, “Computational framework for steady-state NLOS localization under changing ambient illumination conditions,” Opt. Express 30(2), 2438–2452 (2022).
21. J. Klein, C. Peters, J. Martín, M. Laurenzis, and M. B. Hullin, “Tracking objects outside the line of sight using 2D intensity images,” Sci. Rep. 6(1), 32491 (2016).