## Abstract

Real-time 3-D tracking of a fast-moving object has found important applications in industry, traffic control, sports, biomedicine, defense, etc. However, typical image-based tracking systems are difficult to apply to fast-moving objects in real time and over long durations: reliable and robust image processing and analysis algorithms are often computationally expensive, and limited storage and bandwidth can hardly meet the heavy data demands of high-speed photography. Here we report an image-free 3-D tracking approach that uses only two single-pixel detectors and a high-speed spatial light modulator for data acquisition. By illuminating the moving target with six single-period Fourier basis patterns, the approach analytically calculates the position of the object from the corresponding single-pixel measurements. The approach is low-cost as well as data- and computation-efficient. We experimentally demonstrate that it can detect and track a fast-moving object at a frame rate of 1666 frames per second using a 10,000 Hz digital micromirror device. Benefiting from the wide working spectrum of single-pixel detectors, the reported approach might be applicable to tracking hidden fast-moving objects.

© 2020 Optical Society of America

Real-time tracking of a fast-moving object in 3-D space has found many applications in various fields, such as particle tracking in biomedicine, vehicle navigation in public transport, and aircraft monitoring in defense.

Compared with RADAR systems [1], image-based object tracking systems are relatively low-cost and applicable to a wider range of applications. With advances in image sensors and other imaging components, image-based systems can track objects at multiple scales. For example, monocular vision [2,3] and stereoscopic vision [4] are applicable to macroscopic object tracking, while light field microscopy [5], digital holographic microscopy [6,7], and focus scanning microscopy [8,9] are applicable to both micro- and nano-object tracking. In these systems, object tracking is achieved in two steps. First, a sequence of images of the moving target is captured. Second, the position of the target is extracted from the captured images through image processing and analysis.

Blur in the captured images is the primary factor affecting tracking accuracy. To deal with motion blur, an imaging system with high temporal resolution and advanced image processing or analysis algorithms are desired. Although high-speed photography offers much higher temporal resolution than conventional photography, it is resource-consuming. Specifically, it generates a huge amount of image data even in a short period, which demands a large data storage capacity and a broad data transfer bandwidth. Consequently, it is challenging to apply high-speed photography over a long duration. On the other hand, advanced image processing and analysis algorithms are typically computationally expensive, which makes them difficult to apply in real-time object tracking. As a result, real-time and long-duration detection and tracking of high-speed moving objects with an image-based system remains a challenge.

In essence, object tracking is the calculation of the position of the target object, or, mathematically, the determination of three coordinates ($x$, $y$, and $z$). Image-based systems generate, store, transfer, and process images, each typically consisting of millions of pixels, merely as an intermediate carrier of the spatial information needed to determine these three coordinates. This is clearly data-inefficient. To tackle this problem, Shi *et al.* proposed a method [10] based on single-pixel imaging [11–15]. Using the “slice theory” of the Hadamard transform, the method can track an object in a 2-D plane from two 1-D projections without acquiring a complete image. Z. Zhang *et al.* recently proposed an image-free approach allowing for real-time tracking of fast-moving objects [16]. However, this approach is only applicable to 2-D tracking. Additionally, it requires a reference frame to calculate the absolute position of the target object, which limits its practical use.

In this Letter, we propose an image-free approach to 3-D tracking of fast-moving objects. The proposed approach removes the need for image acquisition and analysis. Unlike existing image-free systems (such as RADAR or LiDAR [17,18]), our approach uses only two cost-effective single-pixel detectors and a spatial light modulator for data acquisition. The approach is both data-efficient and computation-efficient, allowing for real-time and long-duration object detection and tracking in 3-D space. As single-pixel detectors can work at invisible wavebands, the approach might enable hidden object tracking.

The key to our approach is to calculate the position of the object in two orthogonal 2-D projection planes and synthesize the 3-D position. As illustrated by Fig. 1(a), we can derive the 3-D position of the object, $({{x_0},{y_0},{z_0}})$, if the two 2-D projection positions of the object, $({{x_0},{z_0}})$ and $({{y_0},{z_0}})$, in two perpendicular planes, $x- O- z$ and $y- O- z$, are known.

To calculate the position of an object in a 2-D projection plane, we propose illuminating the scene with the two sets of three-step phase-shifting Fourier basis patterns shown in Fig. 1(b) and recording the resulting six single-pixel measurements. Considering the 2-D projection plane $x- O- z$, the structured patterns can be expressed as

$$P_i(x,z) = a + b\cos[2\pi(f_x x + f_z z) + \varphi_i], \tag{1}$$

where $(x,z)$ denotes the spatial coordinate, $f_x$ and $f_z$ are the spatial frequencies in the $x$ and $z$ directions, respectively, $a$ is the average intensity, $b$ is the contrast, and the initial phase is $\varphi_i = 2\pi(i-1)/3$ with $i = 1,2,3$. As shown in Fig. 1(b), the spatial frequency of the first set of patterns (top row) is $(f_x = 1, f_z = 0)$, and that of the second set (bottom row) is $(f_x = 0, f_z = 1)$. Here the spatial frequency is defined as the number of periods across the scene, i.e., $x$ and $z$ are normalized to the scene size. By illuminating the scene $O(x,z)$ with a pattern $P_i(x,z)$, the resulting single-pixel measurement can be expressed as

$$D_i = \iint O(x,z)\,P_i(x,z)\,\mathrm{d}x\,\mathrm{d}z. \tag{2}$$

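Assuming that the target object is far smaller than the scene, $O(x,z)$ can be replaced by an impulse function $\delta(x - x_0, z - z_0)$ located at the object position $(x_0, z_0)$. Eq. (2) can then be further simplified as follows:

$$D_i = a + b\cos[2\pi(f_x x_0 + f_z z_0) + \varphi_i]. \tag{3}$$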

According to Eq. (3), the magnitude of the single-pixel measurements depends on the position of the target object. To calculate the absolute position of the object, we use the single-period patterns shown in Fig. 1(b) for structured illumination. With the single-pixel measurements associated with the first set of patterns, $x_0$ can be derived. Specifically, substituting $(f_x = 1, f_z = 0)$ into Eq. (3), we have

$$D_i = a + b\cos(2\pi x_0 + \varphi_i), \quad i = 1,2,3. \tag{4}$$

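According to trigonometry, $2D_1 - D_2 - D_3 = 3b\cos(2\pi x_0)$ and $\sqrt{3}(D_3 - D_2) = 3b\sin(2\pi x_0)$, so the desired $x_0$ can be calculated through

$$x_0 = \frac{1}{2\pi}\arg\left[(2D_1 - D_2 - D_3) + \mathrm{j}\sqrt{3}(D_3 - D_2)\right], \tag{5}$$

where $\mathrm{j}$ is the imaginary unit and $\arg$ takes values in $[0, 2\pi)$. Likewise, $z_0$ is obtained from the measurements associated with the second set of patterns.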

By cycling through the six basis patterns for structured illumination, we can continuously derive the position of the target in a 2-D projection plane, thereby achieving 2-D tracking. For 3-D tracking, we propose to conduct 2-D tracking simultaneously in two orthogonal planes, as Fig. 1(a) shows. This can be done by illuminating the object from two orthogonal directions and collecting the resulting transmitted light with two single-pixel detectors. The final 3-D tracking result is derived by synthesizing the two 2-D tracking results.
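
The 2-D positioning step can be sketched numerically. The following Python/NumPy code is a minimal sketch, not the authors' implementation: it assumes a noise-free point object, unit detector gain, a scene stored as a 2-D intensity array, and coordinates normalized to $[0, 1)$ across the scene. It generates the six single-period three-step phase-shifting patterns, simulates each single-pixel measurement as the discretized inner product of pattern and scene, and recovers the projection position with the three-step phase formula $\arg[(2D_1 - D_2 - D_3) + \mathrm{j}\sqrt{3}(D_3 - D_2)]/(2\pi)$. The function names (`fourier_patterns`, `single_pixel_measurements`, `phase_to_coord`, `locate_2d`) are hypothetical.

```python
import numpy as np


def fourier_patterns(nx, nz, a=0.5, b=0.5):
    """Six single-period, three-step phase-shifting Fourier basis patterns.

    Coordinates are normalized to [0, 1), so a frequency of 1 means one
    period across the scene. Output shape is (6, nz, nx): patterns 0-2 use
    (fx, fz) = (1, 0) and patterns 3-5 use (fx, fz) = (0, 1).
    """
    x = np.arange(nx) / nx
    z = np.arange(nz) / nz
    X, Z = np.meshgrid(x, z)                     # X[k, n] = x_n, Z[k, n] = z_k
    phis = 2 * np.pi * np.arange(3) / 3          # phi_i = 2*pi*(i - 1)/3
    return np.stack([a + b * np.cos(2 * np.pi * (fx * X + fz * Z) + phi)
                     for fx, fz in [(1, 0), (0, 1)] for phi in phis])


def single_pixel_measurements(patterns, scene):
    """D_i: sum over the scene of pattern_i * scene (discretized integral)."""
    return np.tensordot(patterns, scene, axes=([1, 2], [0, 1]))


def phase_to_coord(d1, d2, d3):
    """Normalized coordinate in [0, 1) from three phase-shifted measurements."""
    # (2*D1 - D2 - D3) = 3b cos(theta) and sqrt(3)*(D3 - D2) = 3b sin(theta)
    theta = np.angle((2 * d1 - d2 - d3) + 1j * np.sqrt(3) * (d3 - d2))
    return np.mod(theta, 2 * np.pi) / (2 * np.pi)


def locate_2d(d):
    """(x0, z0) from six measurements: set 1 gives x0, set 2 gives z0."""
    return phase_to_coord(*d[:3]), phase_to_coord(*d[3:])


# Usage: a single bright pixel (a point object) in a 256 x 512 scene.
nz, nx = 256, 512
scene = np.zeros((nz, nx))
scene[100, 300] = 1.0                            # z index 100, x index 300
d = single_pixel_measurements(fourier_patterns(nx, nz), scene)
x0, z0 = locate_2d(d)
print(round(x0 * nx, 6), round(z0 * nz, 6))      # 300.0 100.0
```

Because the average intensity $a$ cancels in $2D_1 - D_2 - D_3$ and in $D_3 - D_2$, the recovered position does not depend on the overall illumination level in this idealized model.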

We note that the position calculation described above is based on the null-background assumption, which is difficult to guarantee in most cases. If there are other objects in the scene, the tracking result will inevitably be affected. Therefore, background removal is necessary. Fortunately, background removal in the proposed method is simple. Considering a scene containing multiple objects of which only one is in motion, we can illuminate the scene with the six Fourier basis patterns and record the corresponding single-pixel measurements before the moving object enters the scene. Denoting these measurements by $D_i^\prime$, the background can be effectively eliminated by replacing $D_i$ with $D_i - D_i^\prime$ in Eq. (5).
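
The background subtraction can be illustrated with a short simulation. This is a sketch under the same simplifying assumptions as an idealized model (noise-free measurements, normalized coordinates, hypothetical function names), with a static rectangular object added to the scene. Note one modeling assumption: here the moving object adds light at the detector. If it instead blocks light, as an opaque object in transmission might, the difference changes sign, and in this sketch $D_i^\prime - D_i$ would have to be used; otherwise the recovered phase is offset by $\pi$ (half the scene).

```python
import numpy as np


def fourier_patterns(nx, nz, a=0.5, b=0.5):
    # Six single-period patterns: (fx, fz) = (1, 0) then (0, 1), phases 0, 2pi/3, 4pi/3.
    X, Z = np.meshgrid(np.arange(nx) / nx, np.arange(nz) / nz)
    phis = 2 * np.pi * np.arange(3) / 3
    return np.stack([a + b * np.cos(2 * np.pi * (fx * X + fz * Z) + phi)
                     for fx, fz in [(1, 0), (0, 1)] for phi in phis])


def measure(patterns, scene):
    # One single-pixel value per pattern: inner product of pattern and scene.
    return np.tensordot(patterns, scene, axes=([1, 2], [0, 1]))


def locate_2d(d):
    # Three-step phase retrieval on each set of three measurements.
    def coord(d1, d2, d3):
        theta = np.angle((2 * d1 - d2 - d3) + 1j * np.sqrt(3) * (d3 - d2))
        return np.mod(theta, 2 * np.pi) / (2 * np.pi)
    return coord(*d[:3]), coord(*d[3:])


nz, nx = 256, 512
P = fourier_patterns(nx, nz)

background = np.zeros((nz, nx))
background[20:60, 400:480] = 1.0        # a static object elsewhere in the scene

moving = np.zeros((nz, nx))
moving[150, 100] = 1.0                  # the moving target (point-like)

d_bg = measure(P, background)           # D'_i, recorded before the target enters
d = measure(P, background + moving)     # D_i, recorded with the target present

x_raw, z_raw = locate_2d(d)             # dominated by the static object
x0, z0 = locate_2d(d - d_bg)            # background removed: D_i - D'_i
print(round(x0 * nx, 6), round(z0 * nz, 6))   # 100.0 150.0
```

Without subtraction, the static object (3200 pixels) dominates the measurements and the estimate lands near it rather than at the target.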

We also note that the derivation of Eqs. (3)–(5) assumes that the target object is an infinitesimally small point. As will be demonstrated, the result is still accurate even when the size of the object is not negligible. Here an accurate result means a calculated position that falls within the boundary of the object. If the 2-D projection of the object in a certain plane is symmetric, the position calculated through Eq. (5) is the center of the object. Otherwise, the calculated position deviates from the object center but still falls within the boundary of the object. Mathematically, the upper and lower limits of the integration in Eq. (2) are the boundary of the target object. By substituting Eq. (2) into Eq. (5), one finds that the resulting $x_0$, $y_0$, or $z_0$ would not exceed the boundary of the object.
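
The behavior for extended objects can be checked numerically. The sketch below (idealized, noise-free, normalized coordinates, hypothetical names) applies the same three-step calculation to a symmetric disk of radius 20 pixels and to an asymmetric right triangle. For the disk, the odd (sine) terms cancel, so the recovered position is exactly the center; for the triangle, the result lies within the object's extent in each coordinate, because a positively weighted sum of phasors spanning less than half a period has an argument inside that span.

```python
import numpy as np


def fourier_patterns(nx, nz, a=0.5, b=0.5):
    X, Z = np.meshgrid(np.arange(nx) / nx, np.arange(nz) / nz)
    phis = 2 * np.pi * np.arange(3) / 3
    return np.stack([a + b * np.cos(2 * np.pi * (fx * X + fz * Z) + phi)
                     for fx, fz in [(1, 0), (0, 1)] for phi in phis])


def locate_2d(patterns, scene):
    # Single-pixel measurements followed by three-step phase retrieval.
    d = np.tensordot(patterns, scene, axes=([1, 2], [0, 1]))

    def coord(d1, d2, d3):
        theta = np.angle((2 * d1 - d2 - d3) + 1j * np.sqrt(3) * (d3 - d2))
        return np.mod(theta, 2 * np.pi) / (2 * np.pi)
    return coord(*d[:3]), coord(*d[3:])


nz, nx = 256, 512
P = fourier_patterns(nx, nz)
k, n = np.mgrid[0:nz, 0:nx]              # k: z index, n: x index

# Symmetric object: a disk of radius 20 pixels centred at (n, k) = (200, 120).
disk = ((n - 200) ** 2 + (k - 120) ** 2 <= 20 ** 2).astype(float)
xd, zd = locate_2d(P, disk)
print(round(xd * nx, 6), round(zd * nz, 6))   # 200.0 120.0 (the centre)

# Asymmetric object: a right triangle with legs of 40 pixels.
tri = ((n >= 300) & (k >= 50) & ((n - 300) + (k - 50) <= 40)).astype(float)
xt, zt = locate_2d(P, tri)
print(xt * nx, zt * nz)                   # lies within the triangle
```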

In addition to tracking, the proposed method is capable of detecting moving objects. As the Fourier transform is a global-to-point transformation, in which each coefficient depends on the entire scene, any change in the scene is reflected in the resulting single-pixel measurements. Here we define one calculation of the object position as a tracking frame. The averaged single-pixel measurement in a tracking frame, $\bar D$, refers to the average of the single-pixel measurements corresponding to the six patterns. If the target moving object is not in the scene, or the target object is not in motion, $\bar D$ should be invariant with time. Otherwise, remarkable fluctuations of $\bar D$ should be observed. Thus, the stationarity of $\bar D$ can be used to determine whether a moving object is present. Specifically, we can conclude that there is a moving object in the scene if the variation of $\bar D$ (quantified, for example, by the standard deviation) over the last few tracking frames is higher than some threshold, and that there is none otherwise.
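
The stationarity test can be sketched as a sliding-window standard deviation. In this minimal sketch, the window length (10 frames) and the threshold are hypothetical values chosen for the synthetic example, not parameters from the experiment; in practice they would be tuned to the detector noise and the expected motion.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def detect_motion(d_bar, window=10, threshold=0.01):
    """Flag tracking frames at which a moving object is judged to be present.

    d_bar:     1-D array, the averaged measurement D-bar of each tracking frame.
    window:    number of most recent frames examined (hypothetical default).
    threshold: standard-deviation threshold (hypothetical; tune per setup).
    Frame t is flagged if std(d_bar[t - window + 1 : t + 1]) > threshold;
    the first window - 1 frames lack a full history and are not flagged.
    """
    d_bar = np.asarray(d_bar, dtype=float)
    flags = np.zeros(d_bar.shape, dtype=bool)
    if d_bar.size >= window:
        stds = sliding_window_view(d_bar, window).std(axis=1)
        flags[window - 1:] = stds > threshold
    return flags


# Usage with synthetic data: static for 100 frames, moving for 100, static again.
rng = np.random.default_rng(0)
t = np.arange(300)
d_bar = 1.0 + 1e-3 * rng.standard_normal(t.size)   # small detector noise
d_bar[100:200] += 0.2 * np.sin(0.5 * t[100:200])   # fluctuation while moving
flags = detect_motion(d_bar)
print(flags[50], flags[150], flags[290])          # False True False
```

Because each window looks backward, the flag switches off only `window - 1` frames after the fluctuations stop, which sets a small latency in judging that the object has left.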

We first validate the proposed method through a numerical simulation. A sphere with a radius of 20 pixels moves along a spiral path in a 3-D scene of $512 \times 512 \times 768$ pixels. As Fig. 2(a) shows, the 3-D trajectory calculated by the proposed method closely coincides with the real one, even though the object moves at a high speed [evidenced by Fig. 2(b)]. As Fig. 2(c) shows, the resulting absolute 3-D tracking error is small compared with the object radius (20 pixels). The tracking error is affected by both the speed and the acceleration of the object: in general, the higher the speed or the larger the acceleration, the larger the error. We also note that the error is remarkably large when the object is entering or exiting the scene.

We further demonstrate the proposed method with a proof-of-concept experiment. The schematic diagram of our experimental setup is shown in Fig. 3(a). A digital micromirror device (DMD) (ViALUX V4395) is used as the spatial light modulator to generate the Fourier basis patterns for structured illumination. The DMD has $1920 \times 1080$ micromirrors with a pitch of 7.6 µm. It operates at a refresh rate of 10 kHz and cycles through the two sets of pre-generated patterns in sequence. As shown in Fig. 1(b), the patterns are $1920 \times 1080$ pixels in size and dithered [19]. The patterns are dithered to take advantage of the high-speed binary pattern generation of the DMD. A 12 W white LED is used as the light source. The patterns displayed on the DMD are projected onto the target object through the projection lens. To illuminate the object from two orthogonal directions, we split the illumination beam using a polarizing beam splitter. The two resulting beams are directed onto the object by mirrors M1 and M2. Two photodiodes, PDA1 (Thorlabs PDA100A-EC) and PDA2 (Thorlabs PDA100A2), are used as the single-pixel detectors to collect the transmitted light. We employ orthogonal linear polarizations to eliminate crosstalk between the two beams: two polarizers (P1 and P2) are placed in front of PDA1 and PDA2, respectively. The output signals of the PDAs are digitized by a data acquisition board (DAQ) (National Instruments USB-6366 BNC) operating at a sampling rate of 1 MHz.
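
Dithering converts each grayscale Fourier pattern into a binary one so that the DMD can display it at its full binary rate. As an illustration only, the sketch below uses Floyd–Steinberg error diffusion, one common dithering algorithm; we do not assume it is exactly the variant used in [19] or in this experiment. The pure-Python loop is slow for full $1920 \times 1080$ patterns, which is acceptable here only because the patterns are pre-generated offline; the example uses a small $64 \times 128$ pattern.

```python
import numpy as np


def floyd_steinberg(gray):
    """Binarize a grayscale image with values in [0, 1] by error diffusion."""
    img = np.asarray(gray, dtype=float).copy()
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for r in range(h):
        for c in range(w):
            old = img[r, c]
            new = 1.0 if old >= 0.5 else 0.0
            out[r, c] = int(new)
            err = old - new
            # Push the quantization error onto not-yet-processed neighbours.
            if c + 1 < w:
                img[r, c + 1] += err * 7 / 16
            if r + 1 < h:
                if c > 0:
                    img[r + 1, c - 1] += err * 3 / 16
                img[r + 1, c] += err * 5 / 16
                if c + 1 < w:
                    img[r + 1, c + 1] += err * 1 / 16
    return out


# Usage: dither one grayscale single-period pattern (small size for speed).
h, w = 64, 128
x = np.arange(w) / w
gray = np.tile(0.5 + 0.5 * np.cos(2 * np.pi * x), (h, 1))   # fx = 1, phi = 0
binary = floyd_steinberg(gray)
print(binary.dtype, sorted(np.unique(binary).tolist()))     # uint8 [0, 1]
print(abs(binary.mean() - gray.mean()) < 0.01)              # True
```

Error diffusion preserves local average intensity, so after the low-pass effect of the projection optics the binary pattern approximates the grayscale one; the column averages of `binary` follow the cosine profile of `gray`.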

In this proof-of-concept demonstration, we simulate a fast-moving object with a metallic hollow ball threaded on the bent metallic wire shown in Fig. 3(b). When released, the ball slides down along the bent wire under gravity. The target object is placed in the image plane of the DMD. We start structured illumination before the ball is released and stop it after the ball leaves the illumination area.

In Fig. 4(a), we show the single-pixel measurements obtained by the two detectors for the middle 1500 tracking frames, which cover the entire motion of the target object. As the figure shows, $\bar D$ is stationary for the first 0.27 s and the final 0.17 s, implying that there is no moving object in the scene during these two periods. We therefore conclude that the target object enters the scene at the 458th frame and exits at the 1215th frame. As soon as the target object enters, the magnitude of the single-pixel measurements fluctuates significantly until the object exits. The target moving object stays in the scene for 0.45 s.

Once the target moving object is detected, object tracking starts. The averaged single-pixel measurements of the first 10 tracking frames are taken as the background measurements $D_i^\prime$. With background removal applied, we use the single-pixel measurements recorded by PDA1 to calculate the object position in the plane $x- O- z$, deriving $x$ and $z$, as Fig. 4(b) shows. Likewise, we use the measurements recorded by PDA2 to calculate the object position in the plane $y- O- z$, deriving $y$ and $z$, as Fig. 4(c) shows. The final 3-D tracking result is derived by synthesizing the two 2-D tracking results; as $z$ appears in both 2-D results, the 3-D result uses the average of the two $z$ values. As shown in Figs. 4(d) and 4(e), the 3-D tracking result (also see Visualization 1) coincides with the bent wire shown in Fig. 3(b), which validates the tracking results. The achievable tracking frame rate is 1666 frames/s, because the refresh rate of the DMD is 10 kHz and each 3-D position requires six structured patterns. In addition, the proposed method allows us to measure the speed of the object. With a pixel size of 24.3 µm in the image plane, the measured average speed is 0.366 m/s. Given that the field of view is only $47\; {\rm mm} \times 26\; {\rm mm}$, this is a high speed.
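
The synthesis and speed calculation can be sketched as follows. The constants are taken from the text; everything else is an assumption for illustration: coordinates are already converted to image-plane pixels, the function names are hypothetical, and the average speed is taken as path length divided by elapsed time (the text does not state exactly how it was computed).

```python
import numpy as np

DMD_RATE_HZ = 10_000        # DMD refresh rate (from the text)
PATTERNS_PER_FRAME = 6      # patterns per 3-D position (from the text)
PIXEL_SIZE_M = 24.3e-6      # pixel size in the image plane (from the text)
FRAME_RATE = DMD_RATE_HZ / PATTERNS_PER_FRAME   # about 1666.7 frames/s


def synthesize_3d(xz, yz):
    """Merge (x, z) from PDA1 and (y, z) from PDA2 into (x, y, z) per frame.

    z is measured in both planes, so the two estimates are averaged.
    """
    xz = np.asarray(xz, dtype=float)
    yz = np.asarray(yz, dtype=float)
    return np.column_stack([xz[:, 0], yz[:, 0], 0.5 * (xz[:, 1] + yz[:, 1])])


def average_speed(traj_px, pixel_size=PIXEL_SIZE_M, frame_rate=FRAME_RATE):
    """Average speed in m/s: path length divided by elapsed time.

    traj_px holds one (x, y, z) row per tracking frame, in image-plane pixels.
    """
    steps = np.linalg.norm(np.diff(traj_px, axis=0), axis=1)   # pixels per frame
    elapsed = (len(traj_px) - 1) / frame_rate
    return steps.sum() * pixel_size / elapsed


# Usage with a synthetic straight path: 10 pixels per frame along x.
frames = np.arange(50)
xz = np.column_stack([100 + 10 * frames, np.full(50, 300.0)])
yz = np.column_stack([np.full(50, 200.0), np.full(50, 302.0)])
traj = synthesize_3d(xz, yz)
print(traj[0])                        # [100. 200. 301.]
print(round(average_speed(traj), 3))  # 0.405
```

With the numbers in the text, 0.366 m/s corresponds to roughly 0.366 / (1666.7 × 24.3 µm) ≈ 9 pixels, or about 0.22 mm, of motion per tracking frame.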

The proposed method is computation-efficient, allowing for real-time object tracking. In our test, calculating the object position $10^6$ times takes only 0.69 s (Intel Core i7-8700K 3.7 GHz CPU, 24 GB RAM, and MATLAB 2014a). In other words, the average computation time per tracking frame is 0.69 µs, far shorter than the data acquisition time of 600 µs (six patterns at 10 kHz). As the refresh rate of the DMD is the bottleneck, using a higher-speed spatial light modulator (such as a high-speed LED matrix [20,21]) might further improve the tracking frame rate.

The proposed method is also data-efficient, allowing for long-duration object tracking. It uses only 12 single-pixel measurements (six for each plane) per 3-D position calculation, which is far more data-efficient than image-based approaches. Such data efficiency effectively avoids data accumulation, which makes long-duration object tracking possible. The proposed method could potentially be combined with the method recently reported by Jiang *et al.* [22] to achieve high-speed imaging of moving objects even when the trajectory of the object is unknown. The use of single-period Fourier basis patterns for structured illumination is the key to absolute positioning. As indicated by Eq. (5), the argument operator would cause fringe-order ambiguity (also known as $2\pi$ ambiguity) if the patterns contained more than one period.
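
The fringe-order ambiguity can be illustrated numerically. The sketch below assumes a noise-free point object, coordinates normalized to $[0, 1)$, and three-step measurements $D_i = a + b\cos(2\pi f_x x_0 + \varphi_i)$; the function names are hypothetical. It compares single-period ($f_x = 1$) and two-period ($f_x = 2$) illumination: with two periods, points at $x_0 = 0.2$ and $x_0 = 0.7$ produce identical measurements, so the wrapped phase cannot distinguish them.

```python
import numpy as np


def measurements(x0, fx, a=0.5, b=0.5):
    """Three-step measurements of a point at normalized x0 under frequency fx."""
    phis = 2 * np.pi * np.arange(3) / 3
    return a + b * np.cos(2 * np.pi * fx * x0 + phis)


def estimate_x0(d, fx):
    """Naive inversion: wrapped phase divided by 2*pi*fx (no unwrapping)."""
    d1, d2, d3 = d
    theta = np.angle((2 * d1 - d2 - d3) + 1j * np.sqrt(3) * (d3 - d2))
    return np.mod(theta, 2 * np.pi) / (2 * np.pi * fx)


for fx in (1, 2):
    est = [round(float(estimate_x0(measurements(x0, fx), fx)), 6) for x0 in (0.2, 0.7)]
    print(fx, est)
# 1 [0.2, 0.7]   single period: both positions recovered
# 2 [0.2, 0.2]   two periods: x0 = 0.7 is mistaken for 0.2
```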

We acknowledge that the proposed method, at its current stage, is limited to single-object tracking. Our future work will therefore extend the method to multiple-object tracking.

In conclusion, we propose an image-free 3-D tracking method. The key to the method is to acquire the 3-D spatial information of the target object in two orthogonal 2-D projection planes. The method is low-cost, working with only a spatial light modulator and two single-pixel detectors. To calculate the position of the object, it uses only six single-period Fourier basis patterns and takes six dual-pixel measurements. The method is both data- and computation-efficient, allowing for real-time and long-duration object tracking. Given the wide working spectral range of single-pixel detectors, the proposed method might find applications in hidden object tracking.

## Funding

National Natural Science Foundation of China (61875074, 61905098); Fundamental Research Funds for the Central Universities (21618307).

## Disclosures

The authors declare no conflicts of interest.

## REFERENCES

**1.** S. T. Park and J. G. Lee, IEEE Trans. Aerosp. Electron. Syst. **37**, 727 (2001).

**2.** A. Mauri, R. Khemmar, B. Decoux, N. Ragot, R. Rossi, R. Trabelsi, R. Boutteau, J. Ertaud, and X. Savatier, Sensors **20**, 532 (2020).

**3.** S. Scheidegger, J. Benjaminsson, E. Rosenberg, A. Krishnan, and K. Granström, in *IEEE Intelligent Vehicles Symposium (IV)* (IEEE, 2018), p. 433.

**4.** R. Muñoz-Salinas, E. Aguirre, and M. García-Silvente, Image Vision Comput. **25**, 995 (2007).

**5.** N. Cohen, S. Yang, A. Andalman, M. Broxton, L. Grosenick, K. Deisseroth, M. Horowitz, and M. Levoy, Opt. Express **22**, 24817 (2014).

**6.** P. Memmolo, L. Miccio, M. Paturzo, G. Di Caprio, G. Coppola, P. A. Netti, and P. Ferraro, Adv. Opt. Photonics **7**, 713 (2015).

**7.** X. Yu, J. Hong, C. Liu, and M. K. Kim, Opt. Eng. **53**, 112306 (2014).

**8.** P. Annibale, A. Dvornikov, and E. Gratton, Biomed. Opt. Express **6**, 2181 (2015).

**9.** F. O. Fahrbach, F. F. Voigt, B. Schmid, F. Helmchen, and J. Huisken, Opt. Express **21**, 21010 (2013).

**10.** D. Shi, K. Yin, J. Huang, K. Yuan, W. Zhu, C. Xie, D. Liu, and Y. Wang, Opt. Commun. **440**, 155 (2019).

**11.** M. P. Edgar, G. M. Gibson, and M. J. Padgett, Nat. Photonics **13**, 13 (2019).

**12.** M. Sun and J. Zhang, Sensors **19**, 732 (2019).

**13.** Z. Zhang, X. Ma, and J. Zhong, Nat. Commun. **6**, 6225 (2015).

**14.** Z. Zhang, X. Wang, G. Zheng, and J. Zhong, Opt. Express **25**, 19619 (2017).

**15.** Z. Zhang, S. Liu, J. Peng, M. Yao, G. Zheng, and J. Zhong, Optica **5**, 315 (2018).

**16.** Z. Zhang, J. Ye, Q. Deng, and J. Zhong, Opt. Express **27**, 35394 (2019).

**17.** U. Wandinger, in *Lidar* (Springer, 2005), p. 1.

**18.** P. Morton, B. Douillard, and J. Underwood, in *Australasian Conference on Robotics & Automation (ACRA)* (2011).

**19.** Z. Zhang, X. Wang, G. Zheng, and J. Zhong, Sci. Rep. **7**, 12029 (2017).

**20.** Z. H. Xu, W. Chen, J. Penuelas, M. Padgett, and M. J. Sun, Opt. Express **26**, 2427 (2018).

**21.** E. Balaguer, P. Carmona, C. Chabert, F. Pla, J. Lancis, and E. Tajahuerce, Opt. Express **26**, 15623 (2018).

**22.** W. Jiang, X. Li, X. Peng, and B. Sun, Opt. Express **28**, 7889 (2020).