
Efficient and accurate registration with BWPH descriptor for low-quality point clouds


Abstract

Point cloud registration based on local descriptors plays a crucial role in 3D computer vision applications. However, existing methods often suffer from limitations such as low accuracy, a large memory footprint, and slow speed, particularly when dealing with 3D point clouds from low-cost sensors. To overcome these challenges, we propose an efficient local descriptor called Binary Weighted Projection-point Height (BWPH) for point cloud registration. The core idea behind the BWPH descriptor is the integration of Gaussian kernel density estimation with weighted height characteristics and binarization components to encode distinctive information for the local surface. Through extensive experiments and rigorous comparisons with state-of-the-art methods, we demonstrate that the BWPH descriptor achieves high matching accuracy, strong compactness, and feasibility across contexts. Moreover, the proposed BWPH-based point cloud registration successfully registers real datasets acquired by low-cost sensors with small errors, enabling accurate initial alignment positions.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Point cloud registration is a critical part of 3D computer vision and remote sensing, with many significant applications including shape retrieval [1], object recognition [2], and surface alignment [3]. The primary objective of point cloud registration is to establish a common coordinate system by aligning data obtained from multiple viewpoints, enabling optimal transformation for model reconstruction or pose estimation. Several issues remain, however, including sensitivity to noise [3], low accuracy, high time consumption [4], and substantial memory requirements [2]. These challenges become particularly prominent when dealing with data acquired from low-cost sensors like Microsoft Kinect, LiDAR, and Intel RealSense. Consequently, further research is imperative to enhance and refine existing registration methods.

Generally, the iterative closest point (ICP) algorithm is used for point cloud registration, and many researchers have adjusted and optimized it. Rusinkiewicz [5] proposed establishing point correspondences with a projection-based algorithm. Chen [6] proposed the Hong-Tan-based ICP automatic registration algorithm (HTICP) for partially overlapping point clouds. Sun [7] achieved fine registration by leveraging point cloud curvature and direction vectors. These methods may be ineffective and prone to local minima [4,8], so proper initialization is necessary for satisfactory registration results. Consequently, coarse registration is the fundamental challenge of point cloud registration [1]: it provides an initial value for fine registration by reducing the rotations and translations between point clouds. Here we discuss coarse registration techniques, broadly divided into global feature-based [9] and local feature-based [10] methods. Global descriptors may not be entirely contained inside the overlapped region, which limits their discrimination ability. For aligning partially overlapped point clouds, local features are more appropriate, so we focus on local feature-based registration methods for point clouds.

Over the past few decades, numerous 3D local descriptors have been proposed for coarse registration. They can be classified into non-learning-based and learning-based methods. Learning-based methods have shown significant promise for point cloud registration. However, some, such as GeDi [11] and SpinNet [12], require GPU acceleration, while others, such as FCGF [13] and D3Feat [14], exhibit low generalization. A practical issue is that adequate point cloud datasets are lacking for some real-world applications, and generalization across domains does not always succeed. To this end, this article focuses on non-learning-based descriptors for practical applications. These descriptors fall into two main types: real-valued descriptors and binary local descriptors [15]. Real-valued 3D local descriptors include well-known techniques such as Triple Orthogonal Local Depth Images (TOLDI) [16], which projects the local point cloud onto the three orthogonal planes of the Local Reference Frame (LRF) and generates depth images by discretizing each projection and keeping the minimum depth value. The Rotational Contour Signature (RCS) descriptor [17] represents contour features in multiple views, encoding the largest distance between projected points within each local surface bin as the characteristic. The weighted height image (WHI) [18] simplifies the LRF to eliminate computational redundancy and encodes the local surface as a weighted height image. Due to their high memory requirements, these techniques may not be suitable for lightweight platforms. The first binary descriptor, Binary Signature of Histograms (B-SHOT) [19], divides the SHOT descriptor [20] into four-tuples and binarizes each according to five different scenarios. B-RCS [17], another binary extension, employs thresholding, quantization, and geometrical binary encoding to build a binary representation of RCS. However, these conversion operations may lead to information loss, resulting in reduced robustness. Local Voxelized Structure (LoVS) [8] divides a neighborhood into multiple voxels, which are then labeled 0 or 1 depending on whether any points fall within each voxel; it is, however, sensitive to density variations and boundary effects. A common issue with most binary descriptors is that, while they excel in compactness and efficiency, they often exhibit lower alignment accuracy than real-valued descriptors.

Taking these considerations into account, we propose a novel descriptor called BWPH. Specifically, we first compute the Local Reference Frame (LRF) at the feature point and align the local neighboring points accordingly. Next, the local surface is projected onto the x-y plane and divided into multiple bins. To effectively describe the shape and ensure robustness, we utilize Gaussian kernel density estimation and weighted height characteristics to encode the projection features. By employing threshold comparison tests to transform the attributes of each bin into binary features, the BWPH descriptor is constructed. In our experiments, we compare the BWPH descriptor with state-of-the-art algorithms using public datasets. The results demonstrate that our descriptor achieves high matching accuracy, exhibits strong robustness, and requires a minimal memory footprint. Furthermore, we showcase the effectiveness of the registration method based on the BWPH descriptor when dealing with low-quality point clouds. This paper is an extension of our conference paper [21] and aims to improve the expression ability of feature space and optimize the representation of local surface height information. Overall, this work makes the following contributions:

  • 1. The proposed BWPH descriptor achieves high efficiency, compactness, and feasibility across low-cost 3D sensors, representing a significant advance over existing methods.
  • 2. We introduce a novel point cloud registration method based on the BWPH descriptor, which enables accurate and robust alignment of low-quality point clouds.

The remaining sections of this paper are organized as follows: Section 2 introduces the BWPH descriptor method. Section 3 presents the evaluation results of the BWPH descriptor, focusing on its matching accuracy, robustness, compactness, and efficiency. Section 4 presents the point cloud registration algorithm based on the BWPH descriptor. Section 5 summarizes the conclusions.

2. Local feature description method

In this section, we provide a comprehensive overview of the proposed feature descriptor. Construction of the descriptor consists of several key phases: LRF construction, transformed local surface projection, weighted projection-point height calculation, binarization, and analysis of the BWPH generation parameters. An illustration of the overall process is provided in Fig. 1.

Fig. 1. An overview of the BWPH descriptor.

2.1 Local reference frame construction

The construction of most local feature descriptors relies on a unique and reliable LRF [22]. Existing methods for LRF construction fall broadly into two categories: Covariance Analysis (CA) [23] and Geometric Attribute (GA) [22]. Here, we adopt a CA-based method similar to the approach used in the recent VBBD method [24], which has demonstrated stability. To enhance the repeatability of the LRF in the presence of noise and low-quality data, we employ the following procedure for constructing the LRF.

The covariance matrix M is determined using the feature point p and its neighboring points within its support radius r. To optimize efficiency, we designate p as the center point. The corresponding M is

$$M = \frac{1}{{\sum\limits_{i:{d_i} \le r} {{{(r - {d_i})}^2}} }}\sum\limits_{i:{d_i} \le r} {{{(r - {d_i})}^2}({q_i} - p){{({q_i} - p)}^T}}$$
where ${d_i} = {||{{q_i} - p} ||_2}$ is the distance between the keypoint p and its neighbor qi. The weight is chosen as a quadratic function of distance so that neighboring points situated closer to the feature point contribute more.

The z-axis is determined as the eigenvector associated with the smallest eigenvalue of M, while the eigenvector with the largest eigenvalue is selected as the x-axis. To reduce ambiguity in the LRF, the x-axis and z-axis are aligned with the direction of the majority of neighbor vectors [9]. The y-axis is then obtained as the cross-product of the z-axis and the x-axis.
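To make these steps concrete, the following NumPy sketch assembles the weighted covariance matrix of Eq. (1) and extracts the axes. The function name and array layout are our own, and the sign disambiguation is reduced to a simple majority vote over neighbor vectors (the paper cites [9] for the full procedure).

```python
import numpy as np

def build_lrf(p, neighbors, r):
    """Sketch of the weighted-covariance LRF (Eq. (1)); sign disambiguation
    is simplified to a majority vote over the neighbor vectors."""
    d = np.linalg.norm(neighbors - p, axis=1)
    diff = neighbors[d <= r] - p                   # q_i - p within the support radius
    w = (r - d[d <= r]) ** 2                       # quadratic distance weights
    M = np.einsum('n,ni,nj->ij', w, diff, diff) / w.sum()
    _, eigvec = np.linalg.eigh(M)                  # eigenvalues in ascending order
    z = eigvec[:, 0]                               # smallest eigenvalue -> z-axis
    x = eigvec[:, 2]                               # largest eigenvalue -> x-axis
    if np.sum(diff @ z) < 0: z = -z                # align z with most neighbor vectors
    if np.sum(diff @ x) < 0: x = -x                # likewise for x
    y = np.cross(z, x)                             # y completes a right-handed frame
    return np.stack([x, y, z])                     # rows are the LRF axes
```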

2.2 Weighted projection-point height feature calculation

To efficiently describe the 3D local shape while minimizing information loss, we generate the BWPH descriptor. Specifically, we first transform the local surface into the LRF to achieve rotational and translational invariance. Afterwards, the transformed point cloud ${P_t} = \{{q_0^t,q_1^t,q_2^t,\ldots,q_N^t} \}$ is projected onto the x-y plane, yielding the projected point cloud $P_t^p = \{{q_0^{tp},q_1^{tp},q_2^{tp},\ldots,q_N^{tp}} \}$, as shown in Fig. 1(c). It is noteworthy that, unlike multi-view methods [17], only the x-y plane is used for the BWPH description. The projected points are then divided into $m \times m$ bins, each with an edge length of $2 \times {r / m}$, as shown in Fig. 1(d). Moreover, the center coordinates $(x,y)$ of each bin are determined from the index (i, j) of the mesh cell as follows:

$$\left\{ \begin{array}{l} x = {p_c}.x + (i - m/2) \times l\\ y = {p_c}.y + (j - m/2) \times l \end{array} \right.$$
where ${p_c}$ refers to the transformed coordinates of p on the current local surface, and l denotes the edge length of each bin.
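Continuing the sketch above, transforming the neighborhood into the LRF and laying out the bin grid of Eq. (2) might look as follows. Since p is taken as the origin of the LRF here, its transformed coordinates $p_c$ reduce to (0, 0); the function name is ours.

```python
import numpy as np

def project_and_bin_centers(points, p, lrf, m, r):
    """Transform neighbors into the LRF, project onto the x-y plane, and
    lay out the m x m grid of bin centers from Eq. (2)."""
    pts_t = (points - p) @ lrf.T                   # transformed cloud P_t
    proj = pts_t[:, :2]                            # projected cloud P_t^p
    l = 2.0 * r / m                                # bin edge length l = 2r/m
    i, j = np.meshgrid(np.arange(m), np.arange(m), indexing='ij')
    centers = np.stack([(i - m / 2) * l, (j - m / 2) * l], axis=-1)
    return pts_t, proj, centers                    # centers has shape (m, m, 2)
```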

The key idea of the BWPH descriptor is to enhance the robustness of the projection feature calculation with Gaussian kernel density estimation [22] and to preserve the spatial information of the object with local height characteristics. In addition, to maximize efficiency, the method computes the weighted projection-point height feature per bin rather than per point. Figure 2 illustrates the weighted feature calculation using Gaussian kernel density estimation. Specifically, each bin’s region is established with the current bin’s center as the origin and the bandwidth h as the radius, as shown in Fig. 2(a). The relationship between the weight and the distance from the center point of the current bin to its neighbors is shown in Fig. 2(b). The weighted projection-point height of each bin is calculated as follows:

$$f(i,j) = h(i,j)\frac{1}{m_d}\sum\limits_{n = 1}^{m_d} \frac{1}{\sqrt{2\pi}\,h} \exp \left( -\frac{\|t_n - b(i,j)\|^2}{2h^2} \right),\quad \textrm{s.t.}\ \|t_n - b(i,j)\| < 3h$$
where $b(i,j)$ is the center point of the current bin, ${t_n}$ denotes the neighbors of $b(i,j)$ within the bin’s region, h is the bandwidth (standard deviation) of the Gaussian kernel, ${m_d}$ is the number of neighboring points of $b(i,j)$ within the cutoff distance, and $h(i,j)$ denotes the average weighted height of the current bin’s region. Since a feature space in which each dimension carries a comparable amount of information is more descriptive [18], we estimate the height characteristic of each bin with a weighted coding function that expresses the local surface depth. The formula is as follows:
$$h(i,j) = \frac{1}{m_d}\sum\limits_{n = 1}^{m_d} \left( \lambda + (1 - \lambda)\frac{r - e_n}{r} \right) \times h_n,\quad (0 < \lambda \le 1)$$
where ${e_n}$ is the Euclidean distance between p and ${t_n}$, ${h_n}$ is the local height, defined as the distance from the neighboring point to the projection plane, and λ, which controls the weighting range, is set to 0.3.
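A per-bin sketch of Eqs. (3) and (4) is given below. The 3h cutoff comes directly from Eq. (3); the height array (the |z| coordinates of the transformed neighbors) is our reading of the definition of $h_n$, and the function and parameter names are assumptions.

```python
import numpy as np

def weighted_height_feature(center, proj, heights, e_n, r, h, lam=0.3):
    """One bin's weighted projection-point height f(i,j) from Eqs. (3)-(4).
    center: bin center (2,); proj: projected neighbors (n, 2);
    heights: |z| of each neighbor in the LRF; e_n: distances to p."""
    d = np.linalg.norm(proj - center, axis=1)
    mask = d < 3 * h                               # cutoff from Eq. (3)
    if not mask.any():
        return 0.0
    # Eq. (4): average height, weighted toward neighbors nearer the feature point
    h_bar = np.mean((lam + (1 - lam) * (r - e_n[mask]) / r) * heights[mask])
    # Eq. (3): mean Gaussian kernel response scaled by the weighted height
    kde = np.mean(np.exp(-d[mask] ** 2 / (2 * h ** 2)) / (np.sqrt(2 * np.pi) * h))
    return h_bar * kde
```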

Fig. 2. Illustration of the weighted feature calculation in the current bin’s region.

2.3 Binarization

The average weighted projection-point height ${f_{ave}}$ over all bins on the local surface is calculated as Eq. (5). To improve the matching efficiency of the BWPH descriptor, we transform the attribute of each bin into a binary feature. Specifically, the current bin is assigned a label of 1 if its $f(i,j)$ value exceeds ${f_{ave}}$; otherwise, it is labeled 0.

$${f_{ave}} = \frac{1}{m \times m}\sum\limits_{i,j} {f(i,j)}$$

Finally, by combining these binary strings of $m \times m$ bins into a feature vector, the BWPH descriptor is generated.
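The binarization step then reduces to a threshold test and bit packing, as in this minimal sketch; with $m = 19$, the 361 resulting bits pack into 46 bytes, which is the source of the descriptor’s compactness.

```python
import numpy as np

def binarize_bwph(f):
    """Eq. (5): label each bin 1 if its feature exceeds the mean over all
    m x m bins, then pack the bits into a compact byte string."""
    bits = (f > f.mean()).astype(np.uint8)         # f has shape (m, m)
    return np.packbits(bits.ravel())               # e.g., m = 19 -> 361 bits -> 46 bytes
```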

2.4 BWPH generation parameters analysis

The BWPH descriptor incorporates three key parameters: the support radius r, the number of partition bins m, and the bandwidth of the Gaussian kernel h. To evaluate the performance of the BWPH descriptor under different parameter settings, we conduct tests on the B3R dataset used for tuning. Specifically, test scenes are created by 1/4 mesh decimation and Gaussian noise with a standard deviation of 0.3 mesh resolution (mr). For evaluation, the recall versus 1-precision curve (RPC) [25,26] is used, as described in Section 3.

The support radius r determines the scale of a feature descriptor. A moderate r is essential for 3D registration: a large r makes the descriptor more sensitive to missing border regions, clutter, and occlusion, whereas a small r makes it less distinctive. Figure 3(a) shows the BWPH descriptor’s performance under different support radii. Matching performance improves significantly as the support radius increases from 5mr to 10mr, since a radius of 5mr does not contain enough discriminating information, and continues to improve as r increases from 10mr to 15mr. As r grows further, the gains slow down due to boundary effects. To strike a balance between descriptiveness, robustness, and time efficiency, we set r to 15mr.

Fig. 3. Parameters analysis of the BWPH descriptor.

The number of partition bins m in the BWPH descriptor determines the granularity of the 2D image patch division, which directly influences the descriptiveness and robustness of the descriptor. To evaluate the impact of different m values, we conduct experiments on the tuning dataset while keeping the bandwidth h set to 3g, with $g = r/m$. Note that each bin’s weighted projection-point height is calculated by aggregating the features of the neighboring points around the bin’s center, which is essential to mitigating boundary effects; to further reduce such effects, we set m to an odd number. Figure 3(b) illustrates the experimental results. Notably, performance with $m = 11$ is considerably worse than the others, and some abnormal data points are observed, because too small an m loses many details of the local shape. The BWPH descriptor performs better as m increases from 11 to 19, indicating that encoding more detail about the local surface point distribution enhances its performance. However, the improvement slows down as m increases to 31, because excessively large m values reduce the robustness of the descriptor against disturbances. Hence, we set $m = 19$ to strike a balance between robustness and computational efficiency.

The bandwidth of the Gaussian kernel h is another significant parameter. We evaluate the descriptor’s performance with h varying from 1g to 5g, as shown in Fig. 3(c). The results show that the recall of the BWPH descriptor improves as h increases from 1g to 2g, meaning that a larger bandwidth better captures local surface details. However, as the bandwidth increases beyond 2g, performance degrades: a larger bandwidth makes the descriptor less susceptible to disturbances but also leads to over-smoothing. Therefore, we choose $h = 2g$ as the optimal value.

3. Experiments and analysis

This section presents an overview of the experiment datasets, experimental setup, benchmark parameters, and evaluation criteria. Afterward, we compare the proposed descriptor to several state-of-the-art algorithms to assess its performance.

3.1 Datasets and experimental setup

To provide a comprehensive evaluation of the proposed method, we assess the performance of the BWPH descriptor on five publicly available datasets: the Bologna 3D Retrieval (B3R) dataset [27], the Kinect dataset [28], the 3DMatch dataset [29], the UWAOR dataset [30], and the Space Time dataset [28]. These datasets contain medium- to low-quality point clouds acquired with different modalities. Examples are illustrated in Fig. 4. All experiments in this research were conducted on a PC equipped with an Intel Core i7-8550U 1.99 GHz CPU and 8 GB of RAM.

Fig. 4. Experimental datasets.

Several state-of-the-art 3D features are selected for performance comparison and evaluation. The representative descriptors include TOLDI [16], RCS [17], and WHI [18], which are real-valued, and B-RCS [17] and LoVS [8], which are binary. Benchmark parameters for all selected descriptors are listed in Table 1.

Table 1. Parameter settings for selected descriptors

3.2 Evaluation criteria

The recall versus 1-precision curve (RPC) is used to evaluate the performance of the selected descriptors. The evaluation proceeds as follows. First, feature points are extracted from the model and scene point clouds. Each model feature is matched against all scene features, and the closest and second-closest features are identified. A match is accepted if the ratio between the closest and second-closest distances is below a predefined threshold τ. Distances are measured with the L2 Euclidean distance for real-valued descriptors and the Hamming distance for binary descriptors [15]. Additionally, to be counted as correct, the matched features must lie within half the support radius (r/2) of each other. Unless specifically mentioned, 1000 feature points are randomly selected from each model, and their corresponding points in the scene are identified by the ground-truth transformation [31,32].

$$\textrm{Recall} = \frac{{\textrm{number of correct matches}}}{{\textrm{number of corresponding features}}}$$
$$1\textrm{ - Precision = }\frac{{\textrm{number of false matches}}}{{\textrm{number of total matches}}}$$

By varying the threshold τ, the RPC can be generated for each descriptor. In general, if the RPC lies within the top left corner of the plot, it indicates that the descriptor has excellent matching accuracy. To quantitatively assess the performance of the RPC, we utilize the Area Under Curve (AUCpr) metric [18]. A higher AUCpr value indicates better performance, reflecting a higher ability to achieve both high recall and precision simultaneously.
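For illustration, a minimal sketch of this evaluation for binary descriptors is given below, assuming the packed byte codes produced earlier; `gt_pairs`, the set of ground-truth correspondences within r/2, and the function names are our own.

```python
import numpy as np

def nndr_matches(model_codes, scene_codes, tau):
    """NNDR matching with Hamming distance on packed binary codes; a pair
    is kept when (closest distance) / (second-closest distance) < tau."""
    matches = []
    for i, f in enumerate(model_codes):
        d = np.array([np.unpackbits(f ^ g).sum() for g in scene_codes])
        j1, j2 = np.argsort(d)[:2]
        if d[j1] < tau * max(d[j2], 1):            # guard against a zero denominator
            matches.append((i, j1))
    return matches

def rpc_point(matches, gt_pairs):
    """One (recall, 1-precision) point per threshold, following Eqs. (6)-(7)."""
    n_correct = sum(m in gt_pairs for m in matches)
    recall = n_correct / max(len(gt_pairs), 1)
    one_minus_precision = (len(matches) - n_correct) / max(len(matches), 1)
    return recall, one_minus_precision
```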

In our experiments, we utilize the uniform-sampling (US) detector [15] to detect feature points. It is worth noting that the US detector is intentionally chosen for its relatively lower performance than other feature detectors such as ISS [33] and Harris3D [34]. By employing the US detector, which has lower detection capabilities, we impose higher requirements on the distinctiveness and discriminative power of the descriptors being evaluated.

3.3 Accuracy of feature descriptor matching

In practical applications, low-cost sensors do not always produce satisfactory point clouds; acquisition is often disturbed by noise, varying mesh resolutions, clutter, and occlusion. For a comprehensive test, we perform experiments on three benchmark datasets to validate the accuracy of the BWPH descriptor: the B3R dataset (shape retrieval) [27], the Space Time dataset (3D object recognition) [28], and the 7-Scenes dataset (partial view matching) [29]. Lastly, we calculate the AUCpr as a measure of overall accuracy.

The B3R dataset serves to evaluate the resilience of descriptors to noise and mesh decimation. To measure robustness to noise, we add Gaussian noise with standard deviations of 0.3mr and 0.5mr to the scenes. The RPC for this evaluation is depicted in Fig. 5(a, b). The BWPH descriptor demonstrates the highest recall in noisy conditions, followed by the BWPD and WHI descriptors. LoVS and TOLDI perform comparably across noise levels, whereas RCS and B-RCS perform significantly worse. Furthermore, we assess robustness to varying mesh resolutions by testing on simplified scenes; Fig. 5(c, d) presents the RPC results. As the simplification rate decreases, indicating more severe mesh decimation, all descriptors deteriorate. In the challenging scenarios with 1/4 or 1/8 mesh decimation, BWPH achieves the highest performance, followed by the BWPD and LoVS descriptors. By contrast, B-RCS behaves similarly to RCS but is more vulnerable to mesh decimation, as it loses information during the quantization step. Moreover, BWPH is superior to BWPD on this dataset, because the weighted height information in BWPH lets each dimension contribute comparably to the distance measurement, improving the expression of the feature space. Overall, these evaluations demonstrate the robustness of the BWPH descriptor against noise and mesh decimation, showcasing its superior performance in challenging conditions compared with the other evaluated descriptors.

Fig. 5. Feature matching performance on the B3R dataset.

The Space Time dataset was captured with a Space Time Stereo camera, resulting in low-quality point clouds with partial occlusion and clutter. The RPC results in Fig. 6 showcase the superior performance of the proposed BWPH descriptor. The WHI descriptor ranks second in matching accuracy, suggesting that depth-based information is effective in distinguishing complex local patches; notably, the BWPH descriptor achieves this while consuming minimal memory compared with WHI. In contrast to its ranking on the B3R dataset, LoVS ranks second to last here due to clutter and occlusion, and the B-RCS descriptor receives the lowest score. These results show that the BWPH descriptor encodes enough distinctive information to distinguish local surfaces.

Fig. 6. Feature matching performance on the Space Time dataset.

The 7-Scenes dataset contains multiple indoor 2.5D scenes captured with a Kinect camera. It is characterized by a wide range of scenes with high levels of noise, clutter, occlusion, and partial overlap, and the quality of its point clouds is relatively low compared with the other datasets. Consequently, the recall rate of all descriptors is low on this challenging dataset. The RPC results are displayed in Fig. 7. The BWPH and TOLDI descriptors stand out with excellent performance, while WHI drops significantly below all other descriptors. This drop can be attributed to WHI relying solely on the height image, making it more susceptible to missing data.

Fig. 7. Feature matching performance on the 7-Scenes dataset.

These phenomena illustrate the challenges encountered in cross-dataset experiments. Nevertheless, our BWPH descriptor demonstrates high stability across all datasets, distinguishing it from the other descriptors. Overall accuracy is measured by the average AUCpr; Table 2 lists the AUCpr values for the aforementioned challenging datasets. Clearly, the BWPH descriptor achieves higher accuracy than the other descriptors. Collectively, these experimental findings provide compelling evidence of the effectiveness of the proposed method.

Table 2. The average AUCpr on challenging datasets. The four best performances are reported in bold face.

3.4 Compactness

Different descriptors have different dimensions, which affect matching time and memory footprint [26]. The compactness of a feature descriptor is defined as its descriptiveness per unit length. Here, we assess compactness as the ratio of the average AUCpr from Table 2 to the length of each descriptor in Table 1. The length of a descriptor is measured in bytes: for real-valued descriptors, it is the storage size of a floating-point value multiplied by the dimension [2]; for binary descriptors, it is the dimension divided by 8. This compactness metric provides insight into the trade-off between descriptor length and accuracy.

$$\textrm{compactness = }\frac{{\textrm{Average value of the AU}{\textrm{C}_{pr}}}}{{\textrm{Length of the descriptor}}}$$
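A small sketch of this computation is given below; the dimensions in the comment are illustrative placeholders, not the values from Table 1.

```python
import math

def descriptor_length_bytes(dim, binary):
    """Descriptor length in bytes: 4 bytes per float for real-valued
    descriptors, ceil(dim / 8) for binary ones."""
    return math.ceil(dim / 8) if binary else 4 * dim

def compactness(avg_auc_pr, dim, binary):
    """Eq. (8): descriptiveness per byte of storage."""
    return avg_auc_pr / descriptor_length_bytes(dim, binary)

# Illustration with hypothetical numbers: a 361-bit binary code (m = 19)
# occupies 46 bytes, while a 1024-float descriptor occupies 4096 bytes.
```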

Figure 8 shows that the BWPH descriptor exhibits the highest compactness among all descriptors: on average, it is about twice as compact as the other binary descriptors and about 17 times more compact than the real-valued descriptors. While WHI and TOLDI achieve high accuracy, as indicated in Table 2, they score poorly on compactness, suggesting that their dimensions severely limit them. In contrast, the BWPH descriptor stands out with fewer dimensions and higher accuracy, a significant advance over existing methods. This implies that improved compactness need not compromise matching accuracy, even as dimensionality decreases. Consequently, lightweight features can strike an appropriate balance between storage occupancy and accuracy, providing practical advantages in various applications.

Fig. 8. Compactness of the selected descriptors.

3.5 Efficiency

Datasets acquired from 3D sensors usually contain numerous points, making descriptor efficiency critical, especially for real-time applications. To assess the computational time of feature description and surface matching, we present experimental results on the Space Time dataset. As shown in Fig. 9, binary descriptors are more efficient than real-valued descriptors. Among the selected descriptors, BWPH and B-RCS are the most efficient, followed by the LoVS descriptor; moreover, BWPH has the advantage of superior matching accuracy over B-RCS and LoVS. TOLDI is the most time-consuming because it combines local depth data from three view planes to form high-dimensional features that occupy a large amount of memory.

Fig. 9. Efficiency of the selected descriptors.

4. Point cloud registration performance

In this section, we focus on the registration application of local feature descriptors and present both qualitative and quantitative results. The objective is to validate the registration accuracy of the BWPH descriptor on public and real-world datasets. First, feature points are detected with the US detector on both the model and scene point clouds, and the BWPH descriptor is used to describe them. Next, the nearest neighbor distance ratio (NNDR) matching technique [35] is used to establish feature correspondences, with the threshold set to 0.8. Then, using the random sample consensus (RANSAC) algorithm [36], we remove outlier correspondences and estimate a coarse model-to-scene transformation. The overall framework is shown in Fig. 10. These steps collectively align the model and scene point clouds; the qualitative and quantitative results below assess the accuracy and effectiveness of the BWPH descriptor on various datasets.
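The RANSAC stage can be sketched as follows with a standard Kabsch/SVD rigid-transform fit; the iteration count and inlier tolerance are illustrative placeholders, not the paper’s settings.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rigid transform (Kabsch/SVD) mapping src onto dst."""
    cs, cd = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                       # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def ransac_coarse(model_pts, scene_pts, matches, n_iter=1000, tol=0.01):
    """Sample three correspondences, fit a rigid transform, and keep the
    hypothesis with the most inliers (residual below tol after warping)."""
    rng = np.random.default_rng(0)
    src = model_pts[[i for i, _ in matches]]
    dst = scene_pts[[j for _, j in matches]]
    best, best_inliers = (np.eye(3), np.zeros(3)), 0
    for _ in range(n_iter):
        idx = rng.choice(len(matches), size=3, replace=False)
        R, t = rigid_transform(src[idx], dst[idx])
        inliers = np.sum(np.linalg.norm(src @ R.T + t - dst, axis=1) < tol)
        if inliers > best_inliers:
            best, best_inliers = (R, t), inliers
    return best                                    # coarse (R, t) to initialize fine ICP
```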

Fig. 10. BWPH descriptor-based point cloud registration.

4.1 Qualitative experiments

In Fig. 11, the registration results of six shape pairs with different data modalities are depicted. These shape pairs are sourced from public and real-world datasets: the Stanford 3D Scanning Repository acquired via LiDAR [27], the public Bologna Mesh Registration (BMR) dataset acquired via the Kinect [28], and the 3DMatch dataset acquired via the Kinect [29]. Figure 11(a) shows the 3D surface matching results of two view pairs from the Bunny and Armadillo models in the Stanford Repository, which are affected by noise and mesh decimation. Figure 11(b) shows two view pairs from the Mario and Frog data in the BMR dataset, where missing regions are present. As shown in Fig. 11(c), two sets of indoor scenes from the 3DMatch dataset, which partially overlap and contain real noise, are registered. Notably, registration with the BWPH descriptor achieves visually accurate alignment and successfully brings the data into a unified coordinate system. These visual results provide compelling evidence of the effectiveness and accuracy of the BWPH descriptor in registering shape pairs with various data modalities.

Fig. 11. Visual results of six pairs of 3D object and scene point clouds using the BWPH method. From left to right, initial position, correspondence between points, and registration results for each pair of point clouds.

To evaluate registration performance on a specified object within scenes, the UWAOR and Space Time datasets [28,30] are considered. These datasets consist of scenes with multiple objects, occlusion, and clutter. Figure 12 shows 3D object registration results. It is evident from the figure that the proposed BWPH descriptor successfully detects corresponding point pairs. These point pairs serve as reliable input for the RANSAC algorithm, which enables coarse transformation estimation. This transformation estimation plays a crucial role in determining the presence of an object in the scene and solving the spatial pose of the object. The registration results demonstrate the efficacy of the BWPH descriptor in detecting and matching corresponding points. This enables accurate and reliable object registration in complex scenes with occlusion and clutter.

Fig. 12. Object registration based on the BWPH descriptor. For each pair of scenes, the left column displays the scenes, the middle column shows the visual correspondences, and the right column illustrates the registration results.

4.2 Quantitative experiments

Registration accuracy can be determined by assessing rotational and translational errors in three directions [1]. Smaller values indicate better matching performance. Table 3 lists rotation and translation errors along the x-, y-, and z-directions for selected datasets. Among these datasets, 7-Scenes, UWAOR and Space Time are considered the more challenging, with relatively lower descriptor performance than B3R and BMR. Across all datasets, our method demonstrates very small registration errors. This highlights the effectiveness of our BWPH descriptor in achieving excellent registration results. Moreover, BWPH-based point cloud registration provides an accurate initial pose, which proves beneficial when conducting fine point cloud registration, particularly when utilizing the ICP algorithm [4]. With an accurate initial pose, fewer iterations are required, accelerating the convergence of the ICP algorithm.
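The paper does not specify an angle convention; one common way to obtain per-axis errors is to decompose the residual rotation into x-y-z Euler angles, as in this sketch.

```python
import numpy as np

def registration_errors(R_est, t_est, R_gt, t_gt):
    """Per-axis errors: Euler angles (x-y-z, in degrees) of the residual
    rotation, and absolute translation differences along x, y, z."""
    dR = R_est @ R_gt.T                            # residual rotation matrix
    rx = np.degrees(np.arctan2(dR[2, 1], dR[2, 2]))
    ry = np.degrees(np.arcsin(np.clip(-dR[2, 0], -1.0, 1.0)))
    rz = np.degrees(np.arctan2(dR[1, 0], dR[0, 0]))
    return np.array([rx, ry, rz]), np.abs(t_est - t_gt)
```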

Table 3. Registration error with BWPH descriptor on the selected datasets

To further illustrate the performance of the BWPH descriptor for point cloud registration, we perform 3D registration with the selected descriptors on the B3R and Space Time datasets, using the same experimental procedure for all descriptors. The registration errors of the six methods are listed in Tables 4 and 5. On the Space Time dataset, all registration errors are higher than on the B3R dataset. Moreover, our method yields the smallest registration errors on both datasets, consistent with the results in Table 3, which further demonstrates the effectiveness of our algorithm.

Table 4. Registration error with selected descriptors on the B3R dataset

Table 5. Registration error with selected descriptors on the Space Time dataset

5. Conclusions

In this paper, we introduce a novel descriptor, BWPH, for 3D local surfaces. Our method comprises several key components. First, we employ Gaussian kernel density estimation to enhance the robustness of the descriptor and the weighted projection-point height to enhance its distinctiveness. Next, we transform the description of 3D point clouds into a series of binarizations, improving both the efficiency and compactness of the descriptor. Then, we establish point-to-point correspondences based on the BWPH descriptor and employ the RANSAC algorithm to achieve accurate initial alignment. Finally, we conduct extensive experiments on several datasets acquired from low-cost sensors to evaluate the BWPH descriptor’s performance. Experimental results demonstrate that our method achieves high registration accuracy, exhibits strong robustness against various challenges, and delivers fast matching speeds compared with state-of-the-art methods. Overall, this work presents a significant advancement in 3D local surface descriptors and demonstrates its effectiveness and applicability in various registration scenarios.

Acknowledgments

We sincerely thank the editors and anonymous reviewers for their contributions to this paper.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. L. Zhao, Z. Xiang, M. Chen, et al., “Establishment and Extension of a Fast Descriptor for Point Cloud Registration,” Remote Sens. 14(17), 4346 (2022). [CrossRef]  

2. Y. Zou, X. Wang, T. Zhang, et al., “BRoPH: An efficient and compact binary descriptor for 3D point clouds,” Pattern Recogn. 76(4), 522–536 (2018). [CrossRef]  

3. J. Yang, Z. Cao, and Q. Zhang, “A fast and robust local descriptor for 3D point cloud registration,” Inf. Sci. 346-347, 163–179 (2016). [CrossRef]  

4. Y. He, J. Yang, X. Hou, et al., “ICP registration with DCA descriptor for 3D point clouds,” Opt. Express 29(13), 20423–20439 (2021). [CrossRef]  

5. S. Rusinkiewicz and M. Levoy, “Efficient variants of the ICP algorithm,” Proceedings of the 3rd International Conference on 3-D Digital Imaging and Modeling, Quebec, Canada, 145–152 (2002).

6. J. Chen, X. Wu, M. Wang, et al., “3D shape modeling using a self-developed hand-held 3D laser scanner and an efficient HT-ICP point cloud registration algorithm,” Opt. Laser Technol. 45, 414–423 (2013). [CrossRef]  

7. J. Sun, Z. Yang, F. Li, et al., “Projected feature assisted coarse to fine point cloud registration method for large-size 3D measurement,” Opt. Express 31(11), 18379–18398 (2023). [CrossRef]  

8. S. Quan, J. Ma, F. Hu, et al., “Local voxelized structure for 3D binary feature representation and robust registration of point clouds from low-cost sensors,” Inf. Sci. 444, 153–171 (2018). [CrossRef]  

9. L. Hao, X. Yang, K. Xu, et al., “Rotational Voxels Statistics Histogram for both real-valued and binary feature representations of 3D local shape,” J. Vis. Commun. Image R. 93, 103817 (2023). [CrossRef]  

10. X. Liu, A. Li, J. Sun, et al., “Trigonometric projection statistics histograms for 3D local feature representation and shape description,” Pattern Recogn. 143, 109727 (2023). [CrossRef]  

11. F. Poiesi and D. Boscaini, “Learning General and Distinctive 3D Local Deep Descriptors for Point Cloud Registration,” IEEE Trans. Pattern Anal. Mach. Intell. 45(3), 1 (2022). [CrossRef]  

12. S. Ao, Y. Guo, Q. Hu, et al., “You only train once: Learning general and distinctive 3D local descriptors,” IEEE Trans. Pattern Anal. Mach. Intell. 45(3), 1–18 (2022). [CrossRef]  

13. C. Choy, J. Park, and V. Koltun, “Fully convolutional geometric features,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Seoul, South Korea, 8957–8965 (2019).

14. B. Zhao, J. Yue, Z. Tang, et al., “A Novel Local Feature Descriptor and an Accurate Transformation Estimation Method for 3-D Point Cloud Registration,” IEEE Trans. Instrum. Meas. 72, 1–15 (2023). [CrossRef]  

15. L. Hao and H. Wang, “Geometric feature statistics histogram for both real-valued and binary feature representations of 3D local shape,” Image Vis. Comput. 117, 104339 (2022). [CrossRef]  

16. J. Yang, Q. Zhang, X. Yang, et al., “TOLDI: An effective and robust approach for 3D local shape description,” Pattern Recogn. 65, 175–187 (2017). [CrossRef]  

17. J. Yang, Y. Zhang, X. Ke, et al., “Rotational contour signatures for both real-valued and binary feature representations of 3d local shape,” Comput. Vis. Image Understand 160, 133–147 (2017). [CrossRef]  

18. T. Sun, G. Liu, S. Liu, et al., “An efficient and compact 3D local descriptor based on the weighted height image,” Inf. Sci. 520, 209–231 (2020). [CrossRef]  

19. S. M. Prakhya, B. Liu, and W. Lin, “B-SHOT: A binary feature descriptor for fast and efficient keypoint matching on 3D point clouds,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 1929–1934 (2015).

20. S. Salti, F. Tombar, and L. D. Stefano, “SHOT: Unique signatures of histograms for surface and texture description,” Comput. Vis. Image Understand 125, 251–264 (2014). [CrossRef]  

21. Z. Du, Y. Zuo, X. Song, et al., “A Novel Binary Descriptor for 3D Registration of Point Clouds from Low-cost Sensors,” Asia Communications and Photonics Conference (ACP), Shenzhen, China, 1974–1977 (2022).

22. Z. Dong, B. Yang, Y. Liu, et al., “A novel binary shape context for 3D local surface description,” ISPRS J. Photogram. Remote Sens. 130, 431–452 (2017). [CrossRef]  

23. Y. Zhang, C. Li, B. Guo, et al., “KDD: A kernel density based descriptor for 3D point clouds,” Pattern Recogn. 111(2), 107691 (2021). [CrossRef]  

24. R. Zhou, X. Li, and W. Jiang, “3D Surface Matching by a Voxel-Based Buffer-Weighted Binary Descriptor,” IEEE Access 7(99), 86635–86650 (2019). [CrossRef]  

25. D. Bibissi, J. Yang, S. Quan, et al., “Dual spin-image: A bi-directional spin-image variant using multi-scale radii for 3D local shape description,” Comput Graph. 103, 180–191 (2022). [CrossRef]  

26. Z. Du, Y. Zuo, J. Qiu, et al., “MDCS with fully encoding the information of local shape description for 3D Rigid Data matching,” Image Vis. Comput. 121, 104421 (2022). [CrossRef]  

27. B. Curless and M. Levoy, “A volumetric method for building complex models from range images,” in Proc. Annu. Conf. Comput. Graph. Interact. Tech. 303–312 (1996).

28. A. E. Johnson and M. Hebert, “Using spin images for efficient object recognition in cluttered 3D scenes,” IEEE Trans. Pattern Anal. Machine Intell. 21(5), 433–449 (1999). [CrossRef]  

29. A. Zeng, S. Song, M. Nießner, et al., “3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions,” Proceedings of the IEEE conference on computer vision and pattern recognition, Hawaii, USA, 1802–1811 (2017).

30. A. S. Mian, M. Bennamoun, and R. Owens, “Three dimensional model-based object recognition and segmentation in cluttered scenes,” IEEE Trans. Pattern Anal. Mach. Intell. 28(10), 1584–1601 (2006). [CrossRef]  

31. J. Yang, Q. Zhang, and Z. Cao, “Multi-attribute statistics histograms for accurate and robust pairwise registration of range images,” Neurocomputing 251, 54–67 (2017). [CrossRef]  

32. W. Tao, X. Hua, K. Yu, et al., “A Pipeline for 3-D Object Recognition Based on Local Shape Description in Cluttered Scenes,” IEEE Trans. Geosci. Remote Sensing 59(1), 801–816 (2021). [CrossRef]  

33. F. Tombari, S. Salti, and Luigi Di Stefano, “Unique signatures of histograms for local surface description,” Proceedings of 11th Eur. Conference Computer Vision, Heraklion, Crete, 356–369 (2010).

34. I. Sipiran and B. Bustos, “Harris 3d: a robust extension of the Harris operator for interest point detection on 3d meshes,” Vis. Comput. 27(11), 963–976 (2011). [CrossRef]  

35. K. Mikolajczyk and C. Schmid, “A Performance Evaluation of Local Descriptors,” IEEE Trans. Pattern Anal. Machine Intell. 27(10), 1615–1630 (2005). [CrossRef]  

36. F. Tombari, S. Salti, and L. Di Stefano, “Performance evaluation of 3d keypoint detectors,” Int. J. Comput. Vis. 102(1-3), 198–220 (2013). [CrossRef]  
