## Abstract

As an accurate and efficient shape measurement method, fringe-projection-based three-dimensional (3D) reconstruction has been extensively studied. However, patchwise point cloud registration without extra assistance is still a challenging task. We present a flexible and robust self-registration shape measurement method based on fringe projection and structure from motion (SfM). Unlike ordinary structured-light measurement devices, in which the camera and the projector are rigidly connected, the camera and the projector in our method can be moved independently. An image-capturing scheme and an underlying image-matching strategy are proposed. By selectively using sparse correspondence points across the fringe images as virtual markers, the global positions of the camera and the projector corresponding to each image are calculated and optimized under the framework of SfM. Dense global 3D points all over the object surface are finally calculated via forward intersection. Experimental results on different objects demonstrate that the proposed method can obtain a self-registered 3D point cloud with accuracy comparable to that of state-of-the-art techniques by using only one camera and one projector, requiring no post-registration procedures and no auxiliary markers.

© 2020 Optical Society of America

## 1. INTRODUCTION

Accurate three-dimensional (3D) reconstruction from images is widely applied in computer graphics, reverse engineering, industrial inspection, medical engineering, and many other fields. Fringe-projection-based 3D reconstruction has been successfully employed in 3D measurement owing to its high accuracy, non-contact setup, and low cost [1]. A typical fringe projection measurement system consists of a computer, a camera, and a projector. The projector, rigidly connected to the camera, projects the fringe patterns onto the object. Before the measurement, a calibration procedure is needed to establish the correspondence between the image planes of the projector and the camera via the phase information encoded in the fringe patterns [2,3]. During the measurement, the camera captures the modulated fringe patterns projected onto the object surface, while the computer decodes the phase of the captured images and produces dense 3D points.

A single measurement of the ordinary fringe-projection-based system usually can only acquire 3D points on part of the object surface due to the limitation of the system-working volume and the self-occlusion of the object [4]. Therefore, the projector-camera device should be moved to different positions relative to the object to perform multiple measurements. However, the multiple patches of point clouds are in different coordinate frames and need to be correctly registered into a uniform coordinate system to generate an integral 3D shape.

Although it has been extensively studied, point cloud registration is still not a trivial task. A commonly used method in practice is to attach visual markers on or near the object surface [5]. By aligning the 3D coordinates of three or more shared visual markers, two adjacent patches of point clouds can be easily registered. However, as the number of local point clouds increases, registration errors accumulate. In response to this problem, Reich *et al.* [6] reconstruct the 3D coordinates of all the visual markers in a global coordinate frame in advance under the structure from motion (SfM) [7,8] framework. The reconstructed coordinates of these visual markers serve as global references to reduce registration error. This method requires an extra camera to take images of the markers and a prior procedure to establish the global reference points. At present, almost all commercial structured-light devices resort to cooperative markers for registration, which is a strong limitation in practice. Besides the preparation work before the measurement and the tedious marker removal afterward, another drawback of using markers is that the areas covered by the markers cannot be measured, leaving undesirable holes in the reconstructed point clouds. Even if the data loss caused by the markers can be alleviated to some extent by interpolation, the detailed geometry under the markers can hardly be recovered. This is unacceptable for the measurement of objects with rich details. Moreover, placing markers on the object is infeasible for in-process industrial measurements, since placing and removing markers inevitably interrupts the production process. For some special objects, such as cultural relics, placing markers (particularly adhesive ones) on the surface could be prohibited.

Some researchers have developed point cloud registration algorithms that do not use markers [9,10]. Most of these post-processing algorithms are based on the detection of common features in the overlapping region of adjacent point clouds. The basic framework of these methods usually consists of two steps. First, a rough estimate of the coordinate transformation between the underlying point sets is calculated from the common features. Then, the result is fine-tuned by using the iterative closest point (ICP) algorithm [11] or its variants. The registration quality of this type of method strongly depends on the existence of detectable common features. However, many industrial parts do not meet this prerequisite. Therefore, this registration strategy is rarely used in industrial applications. Visual simultaneous localization and mapping (visual SLAM) can potentially help in those situations by locating the trajectory of the camera and reconstructing the 3D structure of the scene from image sequences [12]. However, visual SLAM emphasizes real-time performance rather than high-precision reconstruction. In addition, the performance of visual SLAM is poor with textureless scenes.

To avoid the post-processing registration procedure, there have been some efforts to develop multi-view structured-light systems for 3D reconstruction. Ordinary multi-view structured-light systems consist of multiple cameras and projectors, each placed in a fixed position. A prior calibration procedure is required to form a complete measurement field [13,14]. This kind of solution is limited by the number of cameras and projectors; therefore, it is not suitable for the measurement of large and complex objects. In Refs. [15,16], the flexibility of the multi-view structured-light system is improved by partly allowing the cameras or the projectors to be moved. However, in these methods either the cameras or the projectors must remain fixed during the entire measurement process. Therefore, they can hardly be used to measure complex objects either, especially objects that must be measured over a full 360 deg.

In this work, we propose a flexible self-registration shape measurement method that can directly produce dense and accurate 3D points all over the object surface in a uniform coordinate system. Only one camera and one projector are needed, and no post-processing procedure or cooperative markers are required for the registration. Unlike regular fringe-projection-based shape measurement devices or existing multi-view structured-light systems, both the camera and the projector in our method can be independently moved to acquire a series of modulated fringe images that jointly cover the whole surface to be measured. The crucial problem involved is the establishment of correspondences among the fringe images, since the phase information of the same point on the object surface changes when the projector is moved. To address this problem, an effective image-capturing scheme and a novel image-matching strategy are proposed. After decoding the phase information encoded in the fringe patterns, we define some virtual markers in the images. Based on the correspondence points of the virtual markers, the global positions of the camera and the projector corresponding to each of the images are calculated and optimized under the framework of SfM. Finally, dense 3D points all over the object surface in the global coordinate frame are reconstructed via forward intersection. To the best of our knowledge, no similar ideas or enabling techniques are available at present.

## 2. FRINGE PROJECTION THEORY

A digital projector can be regarded as an inverse camera [17]. The projection chip (digital micro-mirror device, DMD) takes the role of an image plane, just like the CCD or complementary metal oxide semiconductor (CMOS) in a camera. Establishing the correspondence between camera pixels and projector pixels is critical to the proposed method. We use the phase-shift technique to complete this task. Since the image is a two-dimensional (2D) signal, vertical and horizontal fringe patterns are utilized. The intensity of the fringe patterns can be represented by

where $(x,y)$ is the pixel coordinate, $A$ is the average intensity, $B$ is the intensity magnitude, ${I_n}$ is the recorded intensity, $N$ is the total number of phase-shift steps, $n$ is the phase-shift index, and $\varphi (x,y)$ is the desired relative phase [18], which can be computed as

A multi-frequency phase-shifting technique [19], which stems from the heterodyne principle, is applied for phase unwrapping in our implementation.
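The symbols defined above match the common $N$-step phase-shifting model. As a minimal sketch, assume ${I_n} = A + B\cos (\varphi + 2\pi n/N)$ with $n = 0, \ldots ,N - 1$; this index and sign convention is an assumption and may differ from the one used in the paper. Under that model the wrapped phase is $\varphi = {\rm atan2}( - \sum\nolimits_n {I_n}\sin (2\pi n/N),\sum\nolimits_n {I_n}\cos (2\pi n/N))$. The second function illustrates a single two-frequency heterodyne step; the function names, the fringe periods, and the restriction to two frequencies are all illustrative choices, not the authors' implementation.

```python
import math

TWO_PI = 2.0 * math.pi


def wrapped_phase(intensities):
    """Wrapped phase in (-pi, pi] from N >= 3 equally shifted samples.

    Assumed model: I_n = A + B*cos(phi + 2*pi*n/N), n = 0..N-1.
    """
    n_steps = len(intensities)
    if n_steps < 3:
        raise ValueError("need at least three phase-shift steps")
    s = sum(i * math.sin(TWO_PI * n / n_steps) for n, i in enumerate(intensities))
    c = sum(i * math.cos(TWO_PI * n / n_steps) for n, i in enumerate(intensities))
    return math.atan2(-s, c)


def simulate_pixel(phi, n_steps=4, a=128.0, b=100.0):
    """Synthetic intensities for one pixel under the assumed model."""
    return [a + b * math.cos(phi + TWO_PI * n / n_steps) for n in range(n_steps)]


def heterodyne_unwrap(phi_1, phi_2, period_1, period_2):
    """Absolute phase of the finer fringe (period_1 < period_2, in pixels).

    The beat of the two wrapped phases is unambiguous over the equivalent
    period T12 = T1*T2/(T2 - T1); it selects the fringe order of phi_1.
    Valid only for positions inside one equivalent period.
    """
    phi_1 %= TWO_PI
    phi_2 %= TWO_PI
    beat = (phi_1 - phi_2) % TWO_PI
    beat_period = period_1 * period_2 / (period_2 - period_1)
    fringe_order = round((beat * beat_period / period_1 - phi_1) / TWO_PI)
    return phi_1 + TWO_PI * fringe_order
```

With three frequencies, as used in the experiments, the same step would be applied hierarchically; that extension is omitted here.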

## 3. METHODOLOGY

#### A. Method Overview

The setup of the proposed method consists of a camera, a projector, and the static object to be measured, as shown in Fig. 1. The relative positions of the camera, the projector, and the object are adjusted so that the camera and the projector “view” a common region of the object surface. Then, a set of fringe patterns is projected onto the object surface, while the camera captures the modulated fringe images. After that, either the camera or the projector can be moved to another place around the measured object to enlarge the measurement region, while the other one maintains its position. The new relative position should also ensure that the camera and the projector have an overlapping field of view. A new set of modulated fringe images is then captured. These procedures are repeated until the whole surface of the object is scanned.

The projector in the $i$th position is denoted by ${P_i}(i = 1,2, \ldots ,{N_P})$, where ${N_P}$ represents the total number of projector positions. Suppose the camera moves ${N_i}$ times when the projector is in the $i$th position; we then denote the camera in the ${N_i}$ positions as $C_i^j(j = 1,2, \ldots ,{N_i})$. The modulated fringe images captured by the camera $C_i^j$ are denoted as ${\mathbb C}_i^j$. The fringe patterns projected by the projector can be regarded as a set of unmodulated images and are denoted as ${\mathbb P}$. Since the modulated fringe patterns on the object surface remain unchanged while the projector is fixed, no matter how many times the camera moves, we assign ${P_i}$ and $C_i^j$ to the $i$th group, as shown in Fig. 1. In this image-capturing scheme, there are ${N_P}$ projector positions and ${N_c} = \sum\nolimits_{i = 1}^{{N_P}} {{N_i}}$ camera positions in total, which need to be accurately determined in the following 3D reconstruction process. It is worth noting that image sets ${\mathbb C}_i^{N_i}$ and ${\mathbb C}_{i + 1}^{1}$ correspond to the same camera position.
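The bookkeeping of this image-capturing scheme can be sketched as follows. The function names and the list `cams_per_group` (holding ${N_1}, \ldots ,{N_{N_P}}$) are hypothetical; the sketch only enumerates the labels $C_i^j$ and the pairs of image sets that are taken from the same physical camera position before and after a projector move.

```python
def capture_plan(cams_per_group):
    """Labels (i, j) for every image set C_i^j, in capture order."""
    return [
        (i, j)
        for i, n_i in enumerate(cams_per_group, start=1)
        for j in range(1, n_i + 1)
    ]


def shared_camera_pairs(cams_per_group):
    """Pairs (C_i^{N_i}, C_{i+1}^1) captured from the same camera position."""
    return [
        ((i, n_i), (i + 1, 1))
        for i, n_i in enumerate(cams_per_group[:-1], start=1)
    ]
```

For example, `capture_plan([2, 3, 2])` yields ${N_c} = 7$ image sets, of which two pairs share a camera position, so only five physically distinct camera positions are involved.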

Taking the ${N_c}$ sets of fringe images as input, the proposed method is composed of three main components, namely phase decoding, self-registration of multiple views, and dense 3D point reconstruction, as illustrated in Fig. 2. The phase information is extracted from the modulated fringe image sets to assist in establishing image correspondence points among ${P_i}$ and $C_i^j$. Sparse virtual markers are sampled and matched to construct geometric constraints of the camera and the projector in all positions. In the self-registration stage, the global positions of ${P_i}$ and $C_i^j$ are estimated under the framework of SfM by utilizing the virtual markers. Finally, the estimated global positions of the camera and the projector, as well as the dense image correspondence points, are used to automatically reconstruct a complete point dataset of the entire object surface in a unified coordinate system.

#### B. In-Group Dense Matching

The core of 3D reconstruction is how to accurately match the image locations of the same spatial point across different images [20]. In our solution, the projector joins in the global 3D reconstruction just like a camera. For each group of image sets ${\mathbb C}_i^{j}$, a dense in-group correspondence map is established in the following way.

For an arbitrary pixel $({u^c},{v^c})$ on the image plane of $C_i^j$, its horizontal and vertical absolute phases, denoted as $({\phi ^h},{\phi ^v})$, are first calculated from the modulated images ${\mathbb C}_i^{j}$. The corresponding location on the image plane of ${P_i}$ is directly computed by
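As a hedged illustration of such a phase-to-pixel mapping, the sketch below assumes that the absolute phase grows linearly across the projector image, by $2\pi$ per fringe period, starting from zero at pixel 0. It further assumes that the phase of the vertical fringes encodes the horizontal coordinate ${u^p}$ and the phase of the horizontal fringes encodes ${v^p}$. The function name and the period parameters are hypothetical, and the paper's own formula may differ.

```python
import math


def phase_to_projector_pixel(phi_v, phi_h, period_u, period_v):
    """Map absolute phases to a (subpixel) projector location.

    Assumption: phase = 2*pi * coordinate / fringe period, zero at pixel 0.
    """
    u_p = phi_v * period_u / (2.0 * math.pi)
    v_p = phi_h * period_v / (2.0 * math.pi)
    return u_p, v_p
```

Because every camera pixel with a valid phase pair maps to one projector location, applying this function to the whole image yields the dense in-group correspondence map.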

#### C. In-Group Sparse Virtual Markers Matching

To establish a fully constrained image network for the global 3D reconstruction, correspondence regions across all the image planes of $C_i^j$ and ${P_i}$ are indispensable. Within each group, matched pixels must share the same phase information. For arbitrary $C_i^j$ and $C_i^l$ in the $i$th group, where $j,l \in \{1,2, \ldots ,{N_i}\}$ and $j \ne l$, the correspondence $C_i^j \to C_i^l$ could be directly obtained by searching for pixels that have the same absolute phases in both the horizontal and vertical directions; however, this 2D search is relatively time-consuming.

Unlike the sparse feature points used in SfM, such as scale invariant feature transform (SIFT) and speeded-up robust features (SURF), fringe projection usually yields a large number of densely matched pixels, which is detrimental to the efficiency of SfM. Therefore, we sample these densely matched pixels to obtain sparse pixels, called virtual markers. These virtual markers, instead of feature points, are employed in the SfM framework. Specifically, as shown in Fig. 3, the virtual markers are the cross points of a regular image grid with a spacing of $d$ pixels in our method. The position of an arbitrary virtual marker on the projector image plane is denoted by $(u_m^p,v_m^p)$, whose absolute phase calculated from the image set ${\mathbb P} $ is denoted by $(\phi _m^v,\phi _m^h)$. In general, one cannot find an integer pixel on the image plane of $C_i^j$ with exactly the same absolute phase $(\phi _m^v,\phi _m^h)$. Instead, $(u_m^p,v_m^p)$ corresponds to a subpixel location, if one is available. We search for multiple integer pixels in $C_i^j$ whose absolute phases are close enough to $(\phi _m^v,\phi _m^h)$. Then, the subpixel correspondence locations are determined using quadratic interpolation. In this way, we build up the correspondence points of virtual markers among all the in-group image planes.
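The two ingredients of this step can be sketched as follows. The first function lists grid cross points with spacing $d$; starting the grid at pixel 0 is an assumption. The second is a one-dimensional simplification of the subpixel search: it fits a quadratic through three neighboring (phase, coordinate) samples by Lagrange interpolation and evaluates it at the marker phase. The paper does not specify its interpolation scheme beyond "quadratic", so the function names and the 1D form are illustrative only.

```python
def virtual_marker_grid(width, height, d):
    """Grid cross points (u, v) with spacing d pixels, starting at pixel 0."""
    return [(u, v) for v in range(0, height, d) for u in range(0, width, d)]


def inverse_quadratic(samples, target_phase):
    """Subpixel coordinate at which the phase equals target_phase.

    samples: three (phase, coordinate) pairs with distinct phases, taken
    from integer pixels whose phases bracket the target.
    """
    (p0, x0), (p1, x1), (p2, x2) = samples
    l0 = (target_phase - p1) * (target_phase - p2) / ((p0 - p1) * (p0 - p2))
    l1 = (target_phase - p0) * (target_phase - p2) / ((p1 - p0) * (p1 - p2))
    l2 = (target_phase - p0) * (target_phase - p1) / ((p2 - p0) * (p2 - p1))
    return l0 * x0 + l1 * x1 + l2 * x2
```

In a full implementation the interpolation would be applied in both image directions using the two phase components.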

#### D. Across-Group Matching

As shown in Fig. 4, when the projector is moved from ${P_i}$ to ${P_{i + 1}}$, the fringe images cast on the same surface change. Therefore, the absolute phase of any given point on the object also changes, which causes phase-based image matching across groups to fail. To represent the next group in a unified coordinate system, it is necessary to match pixels between ${P_i}$ and ${P_{i + 1}}$. This is achieved by recording the phase changes with the camera. As shown in Fig. 4, the camera captures two sequences of fringe patterns at the same position, namely $C_i^{{N_i}}$ (before the movement of the projector) and $C_{i + 1}^1$ (after the movement of the projector).

Taking the ${N_c}$ sequential modulated image sets as the only input, our algorithm needs to determine when the projector has been moved. In other words, our algorithm first needs to group the images. Without loss of generality, suppose the camera in the $n$th position, namely ${C^n}$, has been categorized in the $i$th group. We temporarily assume that the camera in the next position ${C^{n + 1}}$ is also in the $i$th group. Then, the correspondence regions between ${C^n}$ and ${C^{n + 1}}$ can be obtained according to the method introduced in Section 3.C. The number of correspondence regions is denoted by $l$. The matched virtual markers are used to estimate the fundamental matrix $F$ between ${C^n}$ and ${C^{n + 1}}$ based on the eight-point algorithm [21] and random sample consensus (RANSAC) [22]. The number of outlier correspondences that do not satisfy the epipolar geometry constraint [23] is counted and denoted as $m$. Next, the ratio of the outliers to the total number of correspondence points is defined as $\delta = m/l$. If $\delta \lt \lambda$, where $\lambda$ is a threshold, the assumption is valid, and ${C^{n + 1}}$ does belong to the $i$th group. Otherwise, ${C^{n + 1}}$ belongs to the ($i + 1$)th group. We set $\lambda = 0.2$ in our experiments, because a small number of outlier points exist even when the two positions belong to the same group. When the camera and the projector are moved in a controlled pattern, in which the camera takes two image sets whenever the projector moves, the image-grouping step is not required.
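The grouping test can be sketched as follows, assuming that $F$ has already been estimated (the eight-point and RANSAC steps are not implemented here). An outlier is taken to be a match whose point lies farther than `tol` pixels from its epipolar line; this tolerance and the function names are hypothetical, since the paper does not state how the epipolar constraint is thresholded. Only $\lambda = 0.2$ comes from the text.

```python
import math


def epipolar_distance(F, x1, x2):
    """Distance in pixels from x2 to the epipolar line F @ (x1, 1)."""
    a = F[0][0] * x1[0] + F[0][1] * x1[1] + F[0][2]
    b = F[1][0] * x1[0] + F[1][1] * x1[1] + F[1][2]
    c = F[2][0] * x1[0] + F[2][1] * x1[1] + F[2][2]
    norm = math.hypot(a, b)
    if norm == 0.0:
        return math.inf  # degenerate line: treat the match as an outlier
    return abs(a * x2[0] + b * x2[1] + c) / norm


def same_group(F, matches, lam=0.2, tol=1.0):
    """Return (decision, delta) where delta = m / l is the outlier ratio."""
    l = len(matches)
    if l == 0:
        raise ValueError("no correspondences to test")
    m = sum(1 for x1, x2 in matches if epipolar_distance(F, x1, x2) > tol)
    delta = m / l
    return delta < lam, delta
```

If the projector moved between the two camera positions, the phase-based matches are wrong, few of them agree with any single epipolar geometry, and $\delta$ rises above $\lambda$.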

Having grouped the modulated images, we exploit the image sets ${\mathbb C}_i^{N_i}$ and ${\mathbb C}_{i + 1}^{1}$ to establish correspondence across adjacent groups. As we can see in Fig. 1, the camera positions of $C_i^{{N_i}}$ and $C_{i + 1}^1$ coincide. Therefore, the image pixel $({u^{{c_i}}},{v^{{c_i}}})$ of $C_i^{{N_i}}$ and the same image pixel $({u^{{c_{i + 1}}}},{v^{{c_{i + 1}}}})$ of $C_{i + 1}^1$ correspond to the same spatial point $X$, as shown in Fig. 4. This means the bidirectional correspondence map $C_i^{{N_i}} \leftrightarrow C_{i + 1}^1$ is available without any computation. The corresponding projector pixel $({u^{{p_i}}},{v^{{p_i}}})$ of ${P_i}$, as well as $({u^{{p_{i + 1}}}},{v^{{p_{i + 1}}}})$ of ${P_{i + 1}}$, can be directly computed by Eq. (3). The computed $({u^{{p_i}}},{v^{{p_i}}})$ and $({u^{{p_{i + 1}}}},{v^{{p_{i + 1}}}})$ are directly matched, though their phase values are distinct. Hence, the correspondences $C_i^{{N_i}} \to {P_i}$, $C_{i + 1}^1 \to {P_{i + 1}}$, and ${P_i} \to {P_{i + 1}}$ can be built up. By jointly using the in-group matching in Sections 3.B and 3.C, we eventually construct the geometric constraints among $C_i^j$ and ${P_i}$, in and across the groups.
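The across-group matching can be sketched as below. The two dictionaries hold, for each camera pixel, the absolute phase pair decoded before and after the projector move, with the camera kept at the same position. The phase-to-pixel mapping again assumes a linear phase with a common fringe period in both directions, which is an illustrative assumption rather than the paper's Eq. (3); all names are hypothetical.

```python
import math


def projector_pixel(phase_pair, period):
    """Assumed linear mapping from (phi_v, phi_h) to a projector location."""
    phi_v, phi_h = phase_pair
    scale = period / (2.0 * math.pi)
    return (phi_v * scale, phi_h * scale)


def match_projectors(phases_before, phases_after, period):
    """Match P_i and P_{i+1} locations through shared camera pixels.

    phases_before / phases_after: dict mapping a camera pixel (u, v) to the
    absolute phases (phi_v, phi_h) decoded from the image sets of
    C_i^{N_i} and C_{i+1}^1, respectively.
    """
    shared = sorted(phases_before.keys() & phases_after.keys())
    return [
        (projector_pixel(phases_before[px], period),
         projector_pixel(phases_after[px], period))
        for px in shared
    ]
```

Only camera pixels that were successfully decoded under both projector positions contribute a match, which is why the two projector positions must illuminate a common surface region.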

#### E. Self-Registration via SfM

To achieve global 3D reconstruction, all the positions of the camera and the projector need to be determined in a global coordinate frame. This process is called self-registration in this paper. We assume that the intrinsic parameters of the camera and the projector, including distortion coefficients, have been calibrated in advance. The readers are referred to Refs. [24,25] for detailed methods of camera and projector calibration. We estimate the projector positions and camera positions in a global coordinate system under the framework of SfM.

First, the initial two-view geometry is established between the projector in position ${P_1}$ and the camera in position $C_1^1$. The fundamental matrix $F$ and essential matrix $E$ between the two image planes are estimated based on the virtual markers available in $C_1^1$. The coordinate system of ${P_1}$ is taken as the global coordinate system, and the coordinate transformation matrix between $C_1^1$ and ${P_1}$ is recovered by decomposing the essential matrix $E$ [26]. The 3D coordinates of the virtual markers available in both $C_1^1$ and ${P_1}$ are then reconstructed via triangulation. The remaining $C_i^j$ and ${P_i}$ are incrementally registered by solving the perspective-n-point (PnP) problem [27]. The image correspondence points are obtained based on the phase information in the structured-light patterns. Because the image correspondences are precise and the matched pixels are numerous, this incremental estimate provides a good initial value for the positions of the camera and the projector.

The positions of the camera and the projector incrementally obtained in the previously mentioned procedure are not accurate enough. Therefore, they are just used as the initial values, and a bundle adjustment refinement [28] is then employed to iteratively adjust camera parameters, projector parameters, and 3D points simultaneously, which minimizes the following reprojection errors:

Finally, the minimization is solved by the Levenberg–Marquardt (LM) algorithm [29]. A sparse Jacobian matrix is designed to speed up numerical derivative computation. In our implementation, a scale bar of known length is placed in the scene during the image-capturing process to restore the absolute scale of the entire geometric structure, as required for an SfM application.
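For orientation, a bundle adjustment objective typically has the form $\sum\nolimits_k \sum\nolimits_m {\| {{x_{km}} - \pi ({K_k},{R_k},{t_k},{X_m})} \|^2}$, where ${x_{km}}$ is the observed image location of marker $m$ in view $k$ and $\pi$ is the projection function; this generic form and its symbols are assumptions, not the paper's exact equation. The sketch below only evaluates such a cost for a distortion-free pinhole model, treating camera and projector views identically. It does not perform the LM minimization, and all names are hypothetical.

```python
def project(point, R, t, fx, fy, cx, cy):
    """Pinhole projection of a 3D point (no lens distortion)."""
    xc = [sum(R[r][k] * point[k] for k in range(3)) + t[r] for r in range(3)]
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)


def reprojection_cost(observations, views, points):
    """Sum of squared reprojection errors.

    observations: list of (view index, point index, (u, v)).
    views: list of (R, t, fx, fy, cx, cy); camera and projector poses alike.
    points: list of 3D virtual-marker coordinates.
    """
    cost = 0.0
    for view_idx, point_idx, (u, v) in observations:
        pu, pv = project(points[point_idx], *views[view_idx])
        cost += (u - pu) ** 2 + (v - pv) ** 2
    return cost
```

A solver such as LM would adjust the entries of `views` and `points` to reduce this value, starting from the incremental SfM estimate.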

#### F. Dense 3D Reconstruction

So far, we have obtained the global camera positions $C_i^j$, the projector positions ${P_i}$, and the refined camera and projector parameters. Each position of the camera can be combined with the projector to form a binocular system. The dense correspondences between the camera image and the projector image are established based on the phase information, as described in Section 3.B. Using these correspondences and the estimated positions, the world coordinates are obtained by triangulation. Finally, the dense point cloud is acquired in a unified coordinate system.
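One simple way to intersect a camera ray with a projector ray is the midpoint method, sketched below: it finds the closest points on the two rays and returns their midpoint. The paper does not state which triangulation variant it uses, so this choice and the function names are assumptions. Ray origins are the optical centers in the global frame, and directions are the back-projected pixel rays.

```python
def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]


def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [c1[k] + s * d1[k] for k in range(3)]
    p2 = [c2[k] + t * d2[k] for k in range(3)]
    return [(p1[k] + p2[k]) / 2.0 for k in range(3)]
```

Repeating this for every dense correspondence of every camera-projector pair yields points that are already expressed in the common global frame, so no separate registration step is needed.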

## 4. EXPERIMENT AND RESULTS

#### A. Experiment Settings

In this section, two types of experiments were conducted to verify the performance of the proposed method. A digital light processing (DLP) projector (DLP LightCrafter 4500, Texas Instruments) with a resolution of $1140 \times 912 $, a CMOS camera (AVT, Germany) with an image resolution of $2040 \times 2040$, and a 35 mm lens (Schneider, Germany) were used in the experiments, as shown in Fig. 5. The non-linear gamma effect of the projector had been pre-corrected by the manufacturer. A total of 24 fringe patterns (three frequencies and four phase shifts in each of the horizontal and vertical directions) were projected onto the object for each position. The open source Ceres-solver [30] was used to solve the bundle adjustment optimization. The software for the whole experimental process ran on a personal computer with a 3.1 GHz central processing unit and 8 GB of RAM.
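The pattern count follows from 3 frequencies × 4 phase shifts × 2 directions = 24. A minimal sketch of such a projection schedule is given below; the fringe periods passed in the example are hypothetical placeholders, since the paper does not list the periods it used, and the cosine model with intensities normalized to $[0,1]$ is likewise an assumption.

```python
import math
from itertools import product


def pattern_schedule(periods, n_shifts):
    """All (direction, period, shift index) combinations to project."""
    return list(product(("vertical", "horizontal"), periods, range(n_shifts)))


def fringe_row(length, period, n, n_shifts, a=0.5, b=0.5):
    """One intensity profile across the fringes, values in [0, 1]."""
    return [
        a + b * math.cos(2 * math.pi * x / period + 2 * math.pi * n / n_shifts)
        for x in range(length)
    ]
```

For instance, `pattern_schedule((16, 18, 21), 4)` lists 24 entries, one per projected pattern.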

#### B. Accuracy Evaluation Experiment

The first experiment was conducted to verify the accuracy of the proposed method. Figure 6(a) shows the first measured object, which contains five finely ground stepped planes, denoted as planes 1 to 5. The object sizes in the three orthogonal directions were $210 \times 230 \times 40 \; {\rm mm}$. In this experiment, the projector and the camera were placed at arbitrary positions to capture the modulated fringe images of the five planes. The top and side views of the final reconstructed point cloud are shown in Fig. 6(b). For a fair comparison with a traditional projector-camera system, the camera and the projector were then kept in fixed positions and calibrated using the method in Ref. [24]. The same group of modulated fringe images was used to reconstruct the planes with the calibrated parameters.

The distance between two planes was used to assess the accuracy of the method (as shown in Table 1, where ${d_{\textit{ij}}}$ represents the distance between plane $i$ and plane $j$). The reference values were acquired with a coordinate measuring machine. The mean error of the proposed method was 0.046 mm, similar to the 0.049 mm of the traditional method. The comparison demonstrates that the proposed self-registration method can achieve accuracy comparable to that of the traditional method.

#### C. Complex Curved Shape Reconstruction

In the second experiment, we verified the proposed method by reconstructing curved surfaces without any markers. The object to be reconstructed was a plaster vase, as shown in Fig. 5(a). Since the vase is a closed 360 deg shape, a multi-view measurement was necessary to acquire the whole surface. With our method, the camera and the projector were independently moved around the markerless vase to take multiple groups of modulated fringe images. More specifically, the position of the projector was changed six times, and the position of the camera was also changed six times. Thus, 11 groups of modulated fringe images were captured. With these acquired images, our program first established all the global positions of the camera and the projector. The result of the self-registration is shown in Fig. 7.

To demonstrate the quality of the reconstructed 3D points, a commercial binocular structured-light scanner, namely ATOS, was employed to scan the same vase. In this experiment, ATOS was moved 12 times, following the same scanning trajectory as closely as possible. The 12 point cloud patches generated by the ATOS system had to be registered into a single complete model with the aid of cooperative markers attached to the vase surface, as shown in Fig. 8(b). For comparison, the model directly output by our self-registration method is illustrated in Fig. 8(a), while the marker-aided registration result output by ATOS is illustrated in Fig. 8(b). We focused on comparing the completeness of the point cloud data by showing three typical views of the reconstructed vase. It can be observed that our method directly output a complete point cloud model, and the geometric details on the surface were well presented. From Fig. 8(b), we can see the undesired holes in the model acquired by ATOS. The first reason was that the regions covered by the markers could not be reconstructed; the second was the difficulty of viewing the same point with three devices (two cameras and one projector).

In practice, such holes are usually filled by interpolation. As shown in the close-up of Fig. 8(b), we filled the hole using Geomagic software to see whether the data loss caused by the markers can be well recovered by post-interpolation. The close-up of Fig. 8(a) for the same region is shown in the red rectangle. From the comparison, we can observe that the detailed geometry under the markers cannot be recovered by post-processing, while the point cloud directly output by our method preserves much better details.

To quantitatively evaluate the efficiency of our self-registration method, we listed the time for each step of the vase reconstruction. As shown in Table 2, our method produced the final reconstruction in about 29.06 s, of which approximately 70% was consumed by phase decoding. Phase decoding and dense reconstruction took a fixed amount of time, while the time for self-registration was variable. The time for self-registration depended on the number of virtual markers, which was determined by the pixel spacing $d$. In this experiment, $d$ was set to 7, and there were a total of 11373 virtual markers.

Further, we systematically evaluated the effect of the pixel spacing $d$ on the time for self-registration. Specifically, we selected different values of the pixel spacing and recorded the self-registration time. The result is shown in Fig. 9. The time decreased as $d$ increased, especially when $d$ increased from 5 to 7. The main reason was the reduction in the number of virtual markers, which resulted in faster convergence in the self-registration process. When $d$ exceeded 13, self-registration failed in this experiment due to an insufficient number of correspondences.
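The trend can be understood from how the number of candidate grid points scales with $d$: roughly as $1/{d^2}$. The sketch below counts the grid cross points on a single projector image of the resolution stated in Section 4.A, assuming the grid starts at pixel 0. It gives only the number of candidates per projector image; the number of virtual markers actually used in the experiment depends on which candidates find matches and is not reproduced by this count.

```python
def grid_marker_count(width, height, d):
    """Number of grid cross points with spacing d, starting at pixel 0."""
    return len(range(0, width, d)) * len(range(0, height, d))


def counts_for_spacings(width, height, spacings):
    """Candidate marker count for each spacing value."""
    return {d: grid_marker_count(width, height, d) for d in spacings}
```

Calling `counts_for_spacings(1140, 912, range(5, 14, 2))` shows the steep drop in candidates between $d = 5$ and $d = 7$ and the flatter decline afterward, which is consistent with the qualitative behavior described for Fig. 9.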

## 5. DISCUSSION

The experimental results in Section 4 showed that the proposed self-registration 3D reconstruction framework had the following advantages compared with traditional structured-light devices.

Traditional systems often required special objects, e.g., checkerboards, for the calibration stage. The proposed method used a camera and a projector with an arbitrary relative location and achieved accuracy comparable to that of a carefully calibrated projector-camera system. However, accuracy was not the highlight of our method. The main contribution of the proposed method lies in its ability to directly obtain a self-registered 3D point cloud, requiring no post-registration procedures or auxiliary markers. This is critical in practice.

Compared with regular structured-light devices, our method improved flexibility by allowing the projector and the camera to be moved independently. Owing to this flexibility, our method can obtain more complete data than rigid scanning systems when the scanning trajectory is the same. Taking the vase measurement in Section 4.C as an example, the fixed binocular structured-light system required more movements to obtain the complete vase model.

After the required images were captured, the method output point clouds over the entire surface in a uniform coordinate frame automatically, without any user intervention. Since markers were not needed, there was no loss of data. Moreover, the size of the optimization problem for self-registration was greatly reduced by utilizing the sparse virtual markers.

Bidirectional fringe projection was mandatory in our method, as the feature correspondence requires orthogonal phases. Compared with a regular camera-projector 3D imaging system that uses only unidirectional fringe projection, our method must capture and decode fringe images in two directions rather than one. Calibrated camera-projector pairs arranged in a rigid structure can exploit epipolar constraints during 3D reconstruction, whereas a system allowing free position changes between the camera and the projector has no such constraints. This was the cost of the proposed method for achieving the capability of self-registration. In fact, other structured-light patterns that can establish image correspondence can also be used in the proposed method, such as 2D grid patterns [31], Fourier transform phase demodulation [32], and four fringe patterns [33]. When structured-light patterns that require fewer images are used, the image acquisition time is reduced. In addition, projector hardware can now project fringe patterns at a very high frame rate. The higher the frame rate, the less time it takes to project the same number of structured-light patterns, which reduces the image acquisition time. According to the projection frame rate reported in Ref. [33], 24 fringe images (with three frequencies and four phase shifts) can be automatically projected within 20 ms.

It should be noted that the proposed method cannot reconstruct dynamic or non-rigid objects, because the objects must be static during the image acquisition process.

## 6. CONCLUSIONS AND FUTURE WORK

A self-registration shape measurement method based on fringe projection and SfM is proposed. Only an off-the-shelf DMD projector and an off-the-shelf digital camera are required. The camera and the projector can be independently moved for projecting and capturing fringe images on the entire object surface. Owing to the effective image-capturing scheme and the carefully designed image-matching strategy, accurate dense points all over the object surface can be directly reconstructed in a uniform coordinate frame without using any physical markers or post-registration procedures. Since all the external parameters of the camera and the projector, as well as the 3D coordinates of the virtual markers, are optimized together, the reconstruction errors are well balanced. The experimental results show that the proposed method can achieve stable accuracy and performs well in the 360 deg measurement of the 3D shape of an object. In general, the proposed method can be used in situations where placing cooperative markers on the object for registration is not acceptable.

At present, each two-view correspondence $C_i^j \to {P_i}$ is used separately for the final dense 3D reconstruction. However, the same 3D point on the object generally appears in multiple views, so forward intersection of the rays from all of these views is a promising way to improve accuracy. This will be one of our future studies. A potential application scenario, which we also intend to investigate in the future, is to mount the camera and the projector on separate robotic arms that carry them to predefined positions, so as to automate in-process shape tolerance verification. In this scenario, the registration accuracy of our method does not depend on the movement accuracy of the robotic arms at all, which is an advantage over existing registration methods that depend on mechanical assist devices.
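To make the idea of multi-view forward intersection concrete, the following is a minimal sketch of one common formulation: finding the 3D point that minimizes the sum of squared perpendicular distances to a set of rays. It is not the authors' implementation, which is left to future work; the function names are hypothetical, and the ray origins and directions are assumed to be already expressed in the global coordinate frame (for example, device centers and back-projected pixel directions obtained from the optimized external parameters).

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("rays are (nearly) parallel; intersection is ill-defined")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = s / M[i][i]
    return x


def intersect_rays(origins, directions):
    """Least-squares point closest to all rays (origin c_k, direction d_k).

    Solves  sum_k (I - d_k d_k^T) X = sum_k (I - d_k d_k^T) c_k.
    """
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for c, d in zip(origins, directions):
        d = normalize(d)
        for i in range(3):
            for j in range(3):
                # Projector onto the plane orthogonal to the ray direction.
                p = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += p
                b[i] += p * c[j]
    return solve3(A, b)


# Synthetic check: three rays from different origins that all pass through one point.
target = [1.0, 2.0, 3.0]
origins = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 5.0, 1.0]]
directions = [[t - o for t, o in zip(target, origin)] for origin in origins]
point = intersect_rays(origins, directions)
print(point)
```

With noisy rays that no longer meet exactly, the same function returns the point that best balances the distances to all rays, which is the sense in which using more than two views could improve accuracy.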

## Funding

National Natural Science Foundation of China (51575276).

## Disclosures

The authors declare no conflicts of interest.

## REFERENCES

**1.** M. Ha, C. Xiao, D. Pham, and J. Ge, “Complete grid pattern decoding method for a one-shot structured light system,” Appl. Opt. **59**, 2674–2685 (2020).

**2.** B. Breuckmann and W. Thieme, “Computer-aided analysis of holographic interferograms using the phase-shift method,” Appl. Opt. **24**, 2145–2149 (1985).

**3.** R. Chen, J. Xu, and S. Zhang, “A self-recalibration method based on scale-invariant registration for structured light measurement systems,” Opt. Lasers Eng. **88**, 75–81 (2017).

**4.** T. H. Lin, “Automatic 3D color shape measurement system based on a stereo camera,” Appl. Opt. **59**, 2086–2096 (2020).

**5.** M. Franaszek, G. S. Cheok, and C. Witzgall, “Fast automatic registration of range images from 3D imaging systems using sphere targets,” Autom. Constr. **18**, 265–274 (2009).

**6.** C. Reich, R. Ritter, and J. Thesing, “3-D shape measurement of complex objects by combining photogrammetry and fringe projection,” Opt. Eng. **39**, 224 (2000).

**7.** J. L. Schonberger and J. M. Frahm, “Structure-from-motion revisited,” in *IEEE Conference on Computer Vision and Pattern Recognition* (2016), pp. 4104–4113.

**8.** N. Snavely, S. M. Seitz, and R. Szeliski, “Photo tourism: exploring photo collections in 3D,” ACM Trans. Graph. **25**, 835–846 (2006).

**9.** R. B. Rusu, N. Blodow, and M. Beetz, “Fast point feature histograms (FPFH) for 3D registration,” in *IEEE Conference on Robotics and Automation* (2009), pp. 3212–3217.

**10.** Z. Xie, S. Xu, and X. Li, “A high-accuracy method for fine registration of overlapping point clouds,” Image Vis. Comput. **28**, 563–570 (2010).

**11.** P. J. Besl and N. D. McKay, “Method for registration of 3-D shapes,” Proc. SPIE **1611**, 586–606 (1992).

**12.** K. Yousif, A. Bab-Hadiashar, and R. Hoseinnezhad, “An overview to visual odometry and visual SLAM: applications to mobile robotics,” Intell. Ind. Syst. **1**, 289–311 (2015).

**13.** F. J. M. Cuevas, R. M. Salinas, and M. Jimenez, “Simultaneous reconstruction and calibration for multi-view structured light scanning,” J. Vis. Commun. Image Represent. **39**, 120–131 (2016).

**14.** R. Garcia and A. Zakhor, “Geometric calibration for a multi-camera-projector system,” in *IEEE Workshop on Applications of Computer Vision* (2013), pp. 467–474.

**15.** D. G. Aliaga and Y. Xu, “A self-calibrating method for photogeometric acquisition of 3D objects,” IEEE Trans. Pattern Anal. Mach. Intell. **32**, 747–754 (2010).

**16.** Ž. Santoši, I. Budak, V. Stojaković, M. Šokac, and Đ. Vukelić, “Evaluation of synthetically generated patterns for image-based 3D reconstruction of texture-less objects,” Measurement **147**, 106883 (2019).

**17.** S. Zhang and P. S. Huang, “Novel method for structured light system calibration,” Opt. Eng. **45**, 083601 (2006).

**18.** T. Chen, H. P. Seidel, and H. P. Lensch, “Modulated phase-shifting for 3D scanning,” in *IEEE Conference on Computer Vision and Pattern Recognition* (2008), pp. 1–8.

**19.** Y. Wang and S. Zhang, “Superfast multifrequency phase-shifting technique with optimal pulse width modulation,” Opt. Express **19**, 5149–5155 (2011).

**20.** T. Kanade and M. Okutomi, “A stereo matching algorithm with an adaptive window: theory and experiment,” IEEE Trans. Pattern Anal. Mach. Intell. **16**, 920–932 (1994).

**21.** R. I. Hartley, “In defense of the eight-point algorithm,” IEEE Trans. Pattern Anal. Mach. Intell. **19**, 580–593 (1997).

**22.** M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Commun. ACM **24**, 381–395 (1981).

**23.** Z. Zhang, “Determining the epipolar geometry and its uncertainty: a review,” Int. J. Comput. Vis. **27**, 161–195 (1998).

**24.** Z. Huang, J. Xi, and Y. Yu, “Accurate projector calibration based on a new point-to-point mapping relationship between the camera and projector images,” Appl. Opt. **54**, 347–356 (2015).

**25.** Z. Zhang, “A flexible new technique for camera calibration,” IEEE Trans. Pattern Anal. Mach. Intell. **22**, 1330–1334 (2000).

**26.** R. Wu, D. Zhang, and Q. Yu, “Health monitoring of wind turbine blades in operation using three-dimensional digital image correlation,” Mech. Syst. Sig. Process. **130**, 470–483 (2019).

**27.** V. Lepetit, F. Moreno-Noguer, and P. Fua, “EPnP: an accurate O(n) solution to the PnP problem,” Int. J. Comput. Vis. **81**, 155–166 (2009).

**28.** B. Triggs, P. F. McLauchlan, and R. I. Hartley, “Bundle adjustment—a modern synthesis,” in *International Workshop on Vision Algorithms* (1999), pp. 298–372.

**29.** J. J. Moré, “The Levenberg-Marquardt algorithm: implementation and theory,” in *Numerical Analysis* (Springer, 1978), Vol. 630, pp. 105–116.

**30.** S. Agarwal and K. Mierle, “Ceres solver,” http://ceres-solver.org.

**31.** G. Wu, Y. Wu, L. Li, and F. Liu, “High-resolution few-pattern method for 3D optical measurement,” Opt. Lett. **44**, 3602–3605 (2019).

**32.** J. S. Hyun, B. Li, and S. Zhang, “High-speed high-accuracy three-dimensional shape measurement using digital binary defocusing method versus sinusoidal method,” Opt. Eng. **56**, 074102 (2017).

**33.** B. Li, Y. An, and S. Zhang, “Single-shot absolute 3D shape measurement with Fourier transform profilometry,” Appl. Opt. **55**, 5219–5225 (2016).