## Abstract

Bundle adjustment (BA) is a common estimation algorithm that is widely used in machine vision as the last step in a feature-based three-dimensional (3D) reconstruction algorithm. BA is essentially a non-convex non-linear least-squares problem that simultaneously solves for the 3D coordinates of all the feature points describing the scene geometry, as well as the parameters of the camera. Conventional BA takes a parameter either as a fixed value or as an unconstrained variable, based on whether the parameter is known. When the known parameters are inaccurate but constrained within a range, conventional BA produces an incorrect 3D reconstruction by using these parameters as fixed values. Alternatively, these inaccurate parameters can be treated as unknown variables, but this does not exploit the knowledge of the constraints, and the resulting reconstruction can be erroneous because the BA optimization, due to its non-convexity, halts at a dramatically incorrect local minimum. In many practical 3D reconstruction applications, range constraints on unknown variables are usually available, such as a measurement with a range of uncertainty or a bounded estimate. Thus, to better utilize these pre-known, constrained, but inaccurate parameters, a bound constrained bundle adjustment (BCBA) algorithm is proposed, developed and tested in this study. A scanning fiber endoscope (the camera) is used to capture a sequence of images above a surgery phantom (the object) of known geometry. 3D virtual models are reconstructed from these images and then compared with the ground truth. The experimental results demonstrate that BCBA can achieve a more reliable, rapid, and accurate 3D reconstruction than conventional bundle adjustment.

© 2015 Optical Society of America

## 1. Introduction

3D reconstruction is a process that retrieves the geometry and appearance of real objects or scenes, which can be achieved by methods in two main categories: active and passive [1]. A common active method is to use an artificial light source, such as projecting structured light of a known pattern onto an object and then recovering the depth map from the reflectance image [2–5]. Passive methods require multiple overlapping images of a normally illuminated object without any interfering artificial light, such as stereo vision [6, 7] and structure from motion (SfM) [8–10]. Typically in SfM, multiple images captured during relative motion of camera and object are used for feature-based 3D reconstruction of the object or scene.

Bundle Adjustment (BA) is an optimization technique that simultaneously refines the camera parameters (focal length, center pixel, distortion, position and/or orientation) and the 3D coordinates of all the feature points describing the object [11]. A feature point is a specific structure in the image data, such as a corner. BA is often used as the last step in feature-based 3D reconstruction, following the prior steps of feature detection and feature matching. The cost function of BA is the *L2* norm of the reprojection error of each feature point in every image where it is visible. The reprojection error is the pixel distance between the observed position of each 2D feature and the calculated reprojection of its reconstructed corresponding 3D point, based on the current estimate of the camera and object parameters [11]. This optimization can be expressed as a large-scale, non-convex, non-linear, real-valued least-squares problem [12], which usually consumes a significant portion of the computation time of the whole 3D reconstruction process. In the past decade, many approaches have been developed to make BA as efficient as possible [13–16]. Moreover, the non-convexity of BA causes multiple minima to exist, so it is difficult to find the global minimum without good initialization.

Constraints are logical conditions that bound the estimation with allowable error, reflecting the real-world restriction of tolerancing. Such bound constraints can be used as direct prior knowledge in a 3D reconstruction algorithm. In most practical cases, general constraints on the camera and object parameters are retrievable. For example, the camera should be inside the hollow body of the reconstructed model in the case of defect detection on a pipeline's internal surface (location constraints); the shape of a football is oval-like (shape constraints); the size of a cell should be on the order of *μ*m, not cm (size constraints); and other constraints that are obvious in real-world applications. Even for 3D reconstructions without any case-specific information, there are still constraints that can be utilized but have been ignored: the focal length of a camera is positive; all object points should be in front of the camera rather than behind it; the field of view (FOV) angle is positive but less than 180 degrees. Moreover, in cases that use a well-calibrated camera, the camera parameters are treated as known values without variation. In reality these calibrated parameters are not exact but lie within a range of uncertainty [17]. Taking these parameters as fixed values may cause an inaccurate 3D reconstruction. Considering that the uncertainty can be bracketed by lower and upper values, called a bound constraint, these constraints can be utilized as prior knowledge to improve the 3D reconstruction algorithm and generate more reliable and accurate models.

Previous work on constraining BA to achieve a more robust and realistic solution has been limited, and did not include the more general approach of a bound constraint condition. Wong and Chang constrained the distance between camera and object to be the same value for all frames, which was fixed to a single parameter to reduce the number of unknown variables in the BA algorithm [18]. With a similar concept, Albl and Pajdla constrained the camera positions to a circular trajectory for cases with a revolving camera [19]. These two constrained BA approaches can both achieve more accurate 3D reconstruction with higher efficiency. However, they require that either the object or the camera rotate around a fixed axis while the other is stationary, which limits the generalizability of these two constrained BA algorithms [18, 19]. For the 3D reconstruction of planar structures such as floors and walls, planarity or multi-planarity constraints were applied to produce an accurate model [20–22], but again these applications are limited to the particular constraint of planarity. Cohen *et al.* improved the accuracy and efficiency of 3D reconstruction of symmetric and repetitive urban scenes by applying symmetric structure constraints [23], but the applications were limited to symmetric geometries. For the case of 3D reconstruction of the internal surface of an organ from medical images containing insufficient features, Soper *et al.* constrained the geometric model of the bladder to a spherical shape in the initial step and then relaxed this shape constraint for subsequent model refinement [24]. This proper initialization reduces the influence of non-convexity on finding the correct minimum and generates an accurate 3D reconstruction. It is essentially an approach to reach the global minimum by choosing a good initial guess that is closer to the global minimum. However, it does not guarantee that this initial guess is good enough to eliminate the deleterious effects of nearby local minima.

None of the constrained BA algorithms mentioned above are generally applicable beyond the strict requirements imposed by their specific applications. They achieved higher accuracy and efficiency essentially by reducing the number of variables for each of their specific cases.

The conventional BA algorithm takes a parameter either as known (fixed value) or as an unknown variable (unconstrained). The former case may result in an incorrect 3D reconstruction when the known parameters are inaccurate [25]. The latter case may cause erroneous estimation of the camera and object parameters since the optimization gets trapped in a dramatically wrong local minimum. In this study, a new approach is proposed to solve this 3D reconstruction problem with bound constrained parameters. We call it the Bound Constrained Bundle Adjustment (BCBA) algorithm. The principal significance of this new algorithm is that it achieves a reliable and accurate 3D reconstruction efficiently by taking advantage of known but inaccurate parameters, such as the intrinsic or extrinsic parameters of the camera, or the general shape of the object. Figure 1 shows the schematic diagram of an applicable case of BCBA. A sequence of images was captured by inaccurate cameras around a static object. The term “inaccurate” refers to cameras whose intrinsic parameters, positions and/or orientations are not exactly known but lie within a known range.

The motivation for solving this problem is to reconstruct an accurate 3D model of the surgical field for an autonomous image-guided surgical robotic system [25]. In our case, an endoscope (camera) was moved around by an ill-calibrated robot arm to capture images of the surgical field. The camera position and orientation could be obtained approximately from the robotic system; they were not accurate, but lay within a measurement range. Using the inaccurate information directly with conventional BA would result in an erroneous 3D model, which can be avoided by utilizing the suitable bounds with BCBA.

In this study, each step of the conventional BA algorithm is presented, and the modifications made for the use of bound constrained parameters are listed and discussed. Following the presentation of the proposed constrained algorithm, an experiment is conducted comparing the conventional BA algorithm to the proposed BCBA algorithm for reconstructing the surgery phantom.

## 2. Methodology

Our BCBA algorithm proposed in this letter is an extension of traditional BA, which is essentially a non-linear real-valued least-squares problem. The *Levenberg-Marquardt* (LM) algorithm is widely used as the standard tool to solve BA due to its efficient damping strategy and ease of implementation [26]. Instead of directly tackling the nonlinear problem, the LM algorithm iteratively solves a sequence of *linear* least-squares problems. From the optimization perspective, it can be viewed as a combination of the *Gauss-Newton method* and the *steepest descent method*: the closer the current iterate approaches the local minimum, the more the LM algorithm behaves like the Gauss-Newton method. This adjusting strategy allows LM to achieve a better tradeoff between the stability of the steepest descent method and the faster convergence of the Gauss-Newton method. Our BCBA algorithm inherits this advantage from the LM algorithm.

Given an object with a set of *n* 3D feature points *Q* = {*Q*_{1}, *Q*_{2}, …, *Q _{n}*} in world coordinates, a sequence of *m* images {*I*_{1}, *I*_{2}, …, *I _{m}*} were captured by a camera at different locations and orientations. Let *R _{j}* represent the rotation matrix and *t _{j}* denote the translation vector of the camera at *I _{j}*. Then the 3D position *P _{ij}* = [*X _{ij}*, *Y _{ij}*, *Z _{ij}*]^{T} of a feature point *Q _{i}* with respect to the *I _{j}* camera coordinates is given as *P _{ij}* = *R _{j}Q _{i}* + *t _{j}*. The calculated pixel location of its projection on *I _{j}* is *p _{ij}* = [*x _{ij}*, *y _{ij}*]^{T}, see Fig. 1, which can be calculated by

$$\omega \begin{bmatrix} x_{ij} \\ y_{ij} \\ 1 \end{bmatrix} = K \begin{bmatrix} d_{ij} X_{ij} \\ d_{ij} Y_{ij} \\ Z_{ij} \end{bmatrix}, \qquad d_{ij} = 1 + k_1 \rho_{ij}^2 + k_2 \rho_{ij}^4, \qquad (1)$$

where *ω* = *Z _{ij}*, *ρ _{ij}* is the radial distance of the normalized image point, and *K* is the camera calibration matrix containing the focal lengths (*f _{x}*, *f _{y}*) and the center pixel (*c _{x}*, *c _{y}*). The coefficients (*k*_{1}, *k*_{2}) are the first- and second-order radial distortion of the real lens. Since a single camera was used in this study, the calibration matrix *K* and the distortion coefficients (*k*_{1}, *k*_{2}) are the same for all images.
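As a concrete illustration of this projection model, the following minimal Python sketch (our own helper function, not from the paper) projects one world point through *R*, *t*, *K* and the radial distortion (*k*_{1}, *k*_{2}), applying the distortion to the normalized image coordinates:

```python
import numpy as np

def project_point(Q, R, t, fx, fy, cx, cy, k1, k2):
    """Project a 3D world point Q into pixel coordinates for one camera.

    P = R Q + t transforms the point into camera coordinates; the normalized
    point is then distorted radially with (k1, k2) and mapped through K.
    """
    P = R @ Q + t                        # camera-frame coordinates [X, Y, Z]
    x_n, y_n = P[0] / P[2], P[1] / P[2]  # normalized coordinates (omega = Z)
    r2 = x_n**2 + y_n**2                 # squared normalized radial distance
    d = 1.0 + k1 * r2 + k2 * r2**2      # first/second-order radial distortion
    return np.array([fx * d * x_n + cx, fy * d * y_n + cy])

# Identity camera at the origin with no distortion: a point on the optical
# axis projects to the center pixel.
p = project_point(np.array([0.0, 0.0, 10.0]), np.eye(3),
                  np.zeros(3), 500.0, 500.0, 304.0, 304.0, 0.0, 0.0)
# p -> [304.0, 304.0]
```

The focal lengths and center pixel above are illustrative values only (304 matches the center of the 608 × 608 SFE frames used later, but the true calibration is not given here).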

Let the position of the observed feature be
${\widehat{p}}_{ij}={[{\widehat{x}}_{ij},{\widehat{y}}_{ij}]}^{T}$; then the reprojection error
$r=p-\widehat{p}$ is calculated and rearranged as a column vector of size *M*. The value of *M* depends on the number of views in which the feature is visible. The reprojection error *r* is essentially a function of *N* = 3*n*+6*m*+6 variables: the 3D coordinates of each feature point (3*n*), the camera positions and orientations (6*m*), and the camera intrinsic parameters (6). Denote all *N* variables as *s*. As mentioned before, information on the lower and upper bounds of *s* is often available. We would like to exploit this additional bound information to better minimize the total reprojection error.

Given the bound constraints on the parameters, the BCBA optimization problem can be written as

$$\underset{s}{\text{minimize}} \;\; f(s) = \| r(s) \|_2^2 \quad \text{subject to} \quad l \le s \le u. \qquad (2)$$

In most cases of feature-based 3D reconstruction, *N* is potentially several thousand or larger. *l* and *u* are vectors of the same size as *s*, representing the lower and upper bounds of the constraint on *s*, respectively. For variables with no bound defined, we set −∞ and +∞ as the lower and upper bound, respectively.

#### 2.1. Bundle adjustment

The optimization problem of Eq. (2) without the constraints is the conventional BA problem. Its objective function *f*(*s*) is non-linear and can be approximated by a local linear model [11], such as a quadratic Taylor series expansion of *f*(*s*). For a small step *δ*,

$$f(s + \delta) \approx f(s) + g(s)^T \delta + \tfrac{1}{2}\, \delta^T H(s)\, \delta, \qquad (3)$$

where *g*(*s*) and *H*(*s*) are the gradient vector and Hessian matrix of *f*(*s*), respectively. The approximation model of Eq. (3) is a simple quadratic with a unique global minimum that can be calculated explicitly.

The Gauss-Newton method solves for the step *δ* in the following linear system repeatedly:

$$\overline{H}(s)\, \delta = -g(s). \qquad (4)$$

The *N* × *N* matrix $\overline{H}(s)$ is the Gauss-Newton approximation of the Hessian matrix *H*(*s*) of *f*(*s*). At each iteration, *s* is updated as *s ← s* + *δ* if the cost *f*(*s* + *δ*) < *f*(*s*). Let *J*(*s*) denote the *M* × *N* Jacobian matrix of *r*(*s*), given by *J _{ij}* = *∂r _{i}*/*∂s _{j}*. Then the gradient *g*(*s*) can be computed as *g*(*s*) = *J*(*s*)^{T}*r*(*s*), and $\overline{H}(s)$ can be calculated as $\overline{H}(s)=J{(s)}^{T}J(s)$ by ignoring the second-derivative terms. It is easy to see that $\overline{H}(s)$ is a symmetric matrix. For now, assume that $\overline{H}(s)$ is positive definite.

To solve for the step *δ* in the Gauss-Newton method of Eq. (4), the inverse of
$\overline{H}(s)$ needs to be calculated. Since *N* could be very large, computing the inverse of
$\overline{H}(s)$ is often prohibitively expensive. Therefore, it is desirable to reduce the size of the linear system first. Among all potential methods, the *Schur complement* method is typically used in the BA algorithm to split the original linear system into two smaller linear systems by Gaussian elimination.

Denote the parameter column vector *s* = [*s _{c}*, *s _{p}*], where *s _{c}* and *s _{p}* are the camera and point parameters, respectively. Similarly, the subscripts “*c*” and “*p*” are used to denote camera and point for the other notations: *J*(*s*), *g*(*s*) and *δ*. Then $\overline{H}(s)$ can be expressed as a block matrix as follows:

$$\overline{H}(s) = \begin{bmatrix} \overline{H}_{cc} & \overline{H}_{cp} \\ \overline{H}_{pc} & \overline{H}_{pp} \end{bmatrix}. \qquad (5)$$

Since
$\overline{H}(s)$ is positive definite,
${\overline{H}}_{cc}$ and
${\overline{H}}_{pp}$ are also positive definite, since they are *principal submatrices* of
$\overline{H}(s)$ [27]. Equation (4) can then be rewritten as:

$$\begin{bmatrix} \overline{H}_{cc} & \overline{H}_{cp} \\ \overline{H}_{pc} & \overline{H}_{pp} \end{bmatrix} \begin{bmatrix} \delta_c \\ \delta_p \end{bmatrix} = -\begin{bmatrix} g_c \\ g_p \end{bmatrix}. \qquad (6)$$

By separating the camera and point parameters using the *Schur complement*, the solution *δ* = [*δ _{c}*, *δ _{p}*] of Eq. (4) can be calculated in the form of the reduced camera system:

$$\left(\overline{H}_{cc} - \overline{H}_{cp}\overline{H}_{pp}^{-1}\overline{H}_{pc}\right)\delta_c = -g_c + \overline{H}_{cp}\overline{H}_{pp}^{-1}g_p, \qquad \delta_p = -\overline{H}_{pp}^{-1}\left(g_p + \overline{H}_{pc}\,\delta_c\right). \qquad (7)$$

Note that *δ _{c}* is chosen to be computed first because the number of camera parameters is usually smaller than the number of point parameters. Once *δ* = [*δ _{c}*, *δ _{p}*] is solved, the parameter vector is updated if it leads to a smaller cost *f*(*s*).
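The reduced camera system can be sketched numerically. The following Python fragment (illustrative block sizes and random data, not the paper's problem) checks that the Schur-complement solution of Eq. (7) satisfies the full linear system of Eq. (4):

```python
import numpy as np

rng = np.random.default_rng(0)
nc, npt = 4, 12                       # camera-block and point-block sizes (toy values)
J = rng.standard_normal((40, nc + npt))
H = J.T @ J                           # Gauss-Newton Hessian J^T J (positive definite here)
g = rng.standard_normal(nc + npt)

# Block partition into camera (c) and point (p) parameters.
Hcc, Hcp = H[:nc, :nc], H[:nc, nc:]
Hpc, Hpp = H[nc:, :nc], H[nc:, nc:]
gc, gp = g[:nc], g[nc:]

# Reduced camera system via the Schur complement: solve for delta_c first,
# then back-substitute for delta_p, as in Eq. (7).
S = Hcc - Hcp @ np.linalg.solve(Hpp, Hpc)
rhs = -gc + Hcp @ np.linalg.solve(Hpp, gp)
delta_c = np.linalg.solve(S, rhs)
delta_p = -np.linalg.solve(Hpp, gp + Hpc @ delta_c)
delta = np.concatenate([delta_c, delta_p])

# delta solves the full system H delta = -g of Eq. (4).
```

In a real BA problem $\overline{H}_{pp}$ is block-diagonal (3 × 3 blocks, one per point), so its inverse is cheap, which is what makes the reduction worthwhile.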

The LM algorithm can be considered as an interpolation between the Gauss-Newton method and the steepest descent method. The steepest descent method is given by *D*′*δ* = −*g*(*s*), where *D*′ is a diagonal matrix of positive constant values, of size *N* × *N*. Thus, LM can be written as

$$B\,\delta = -g(s), \qquad B = \overline{H}(s) + \lambda D, \qquad (8)$$

where *D* is the diagonal matrix of $\overline{H}$. The damping factor *λ* is a non-negative value, adjusted at each iteration. A larger *λ* brings the algorithm closer to the steepest descent method, while the algorithm behaves more like the Gauss-Newton method as *λ* becomes smaller.

Similarly, to calculate the step at each iteration of the LM algorithm, the *Schur complement* is applied to Eq. (8) by splitting *B* into four sub-matrices corresponding to the camera and point parameters. *δ* can then be calculated by Eq. (7). Without considering the bound constraints in the optimization problem of Eq. (2), *s ← s* + *δ* is accepted if the updated *s* vector leads to a smaller *f*(*s*) value.

The non-negative damping factor *λ* is adjusted in each iteration. The initial value of *λ* in the first iteration is pre-defined as *λ*_{0}. If the current update of *s* leads to a cost reduction, a smaller *λ* is used to bring the algorithm closer to the Gauss-Newton method, which exhibits faster convergence. Whereas if an iteration does not reduce the cost, a larger *λ* is chosen, giving a step closer to the gradient descent direction, which is slow but guaranteed to converge [26]. Once *λ* exceeds the pre-defined threshold *λ _{max}*, which is usually very large, or the cost reduction of an accepted update is less than a threshold ∆*f _{min}*, the local minimum is considered found. Conventional bundle adjustment is achieved by repeating the updates of *s* and *λ* of Eq. (8) until reaching the local minimum.
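A minimal unconstrained LM loop following this damping schedule might look as follows (an illustrative Python sketch on a toy least-squares problem; the update factors of 10 and the threshold values are our assumptions, not values from the paper):

```python
import numpy as np

def lm_step(H, g, lam):
    """One LM trial step: solve (H + lam*D) delta = -g, where D is the
    diagonal of H, as in Eq. (8) without bounds."""
    D = np.diag(np.diag(H))
    return np.linalg.solve(H + lam * D, -g)

def levenberg_marquardt(r, J, s, lam0=1e-3, lam_max=1e10, df_min=1e-12):
    """Minimal unconstrained LM loop for f(s) = ||r(s)||^2.
    r(s): residual vector; J(s): Jacobian. Illustrative only."""
    lam = lam0
    f = lambda x: float(r(x) @ r(x))
    while lam < lam_max:
        Js = J(s)
        H, g = Js.T @ Js, Js.T @ r(s)       # Gauss-Newton Hessian and gradient
        delta = lm_step(H, g, lam)
        if f(s + delta) < f(s):
            if f(s) - f(s + delta) < df_min:  # negligible improvement: stop
                s = s + delta
                break
            s, lam = s + delta, lam / 10    # success: move toward Gauss-Newton
        else:
            lam *= 10                       # failure: move toward gradient descent
    return s

# Toy problem: minimize ||s - b||^2 (the Jacobian is the identity).
b = np.array([1.0, -2.0, 3.0])
s_hat = levenberg_marquardt(lambda s: s - b, lambda s: np.eye(3), np.zeros(3))
# s_hat converges to b
```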

#### 2.2. Bound constrained bundle adjustment (BCBA)

BA can refine the camera and 3D structure parameters simultaneously in an efficient and stable way by utilizing the LM algorithm. However, it considers a parameter either as known data (fixed value) or as an unknown variable (unconstrained). Conventional BA cannot take advantage of the known but inaccurate information, which gives rise to the bound constraints that are commonly available in practical applications. We now address this problem by modifying the BA algorithm described above. Before the discussion, some preliminary tools are needed.

Firstly, we define the projection of the parameter vector *s* onto the feasible set with [*l*, *u*] bounds as a function:

$$proj(s) = \min\{\max\{s, l\}, u\}, \qquad (9)$$

where *max*{*a*, *b*} is a vector whose *i*-th entry is *max*{*a _{i}*, *b _{i}*}, and *min*{*a*, *b*} is a vector whose *i*-th entry is *min*{*a _{i}*, *b _{i}*}.
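In NumPy, the projection of Eq. (9) is a one-liner; unconstrained entries simply carry ±∞ bounds and pass through unchanged (the numeric values below are illustrative):

```python
import numpy as np

# Entrywise projection onto the feasible box [l, u], as in Eq. (9).
proj = lambda s, l, u: np.minimum(np.maximum(s, l), u)

l = np.array([0.0, -np.inf, 15.0])   # -inf marks an unbounded entry
u = np.array([1.0,  np.inf, 25.0])
s = np.array([1.7,  -3.2,   14.0])
s_feasible = proj(s, l, u)           # -> [1.0, -3.2, 15.0]
```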

Secondly, an active set *A*(*s*) for *s ∈* [*l*, *u*] is defined. *A*(*s*) is the set of parameters at which either the upper or the lower bound is tight and the update direction drives the parameter outward, away from the bounds. Consider the following cases:

- Case 1: *s _{i}* ∈ (*l _{i}*, *u _{i}*). The computation of *g _{i}* is not constrained by the bounds, and the same holds for *J _{i}*(*s*). The calculation of *δ* can be treated as an unconstrained problem.
- Case 2: *s _{i}* < *l _{i}* or *s _{i}* > *u _{i}*. The value of *s _{i}* is projected onto the bounds of the feasible set by Eq. (9).
- Case 3: *s _{i}* = *l _{i}*. If *g _{i}* > 0, the update would drive *s _{i}* below *l _{i}*, so it is constrained by the bounds; if *g _{i}* ≤ 0, the updated *s _{i}* will be larger than *l _{i}*, falling in the feasible set, so it is not affected by the constraints.
- Case 4: *s _{i}* = *u _{i}*. We have the opposite observation to Case 3: the constraints are active if *g _{i}* < 0, but inactive if *g _{i}* ≥ 0.

Finally, the active set for the bound constrained problem *s ∈* [*l*, *u*] is given by

$$A(s) = \left\{\, i : \left(s_i = l_i \;\text{and}\; g_i > 0\right) \;\text{or}\; \left(s_i = u_i \;\text{and}\; g_i < 0\right) \,\right\}. \qquad (10)$$

The complementary set of *A*(*s*) is called the inactive set and is denoted by *I*(*s*).

The *gradient projection method* [28] is then applied in this algorithm, which is a common method for solving constrained optimization problems. The projected gradient of *f*(*s*) in the feasible set is a vector of size *N*, denoted by ĝ(*s*):

$$\widehat{g}_i(s) = \begin{cases} g_i(s), & i \in I(s) \\ 0, & i \in A(s). \end{cases} \qquad (11)$$

The projected gradient is the same as the unconstrained gradient for those parameters whose constraints are inactive. For the parameters whose constraints are active, the projected gradient is forced to zero to keep the parameters at one of the bounds.
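The active set of Eq. (10) and the projected gradient of Eq. (11) can be sketched as follows (a minimal Python illustration with hypothetical values; the tolerance for detecting a tight bound is our own choice):

```python
import numpy as np

def active_set(s, l, u, g, tol=1e-12):
    """Boolean mask of active indices (Eq. (10)): a bound is tight and the
    gradient would push the parameter out of the feasible box."""
    at_lower = np.isclose(s, l, atol=tol) & (g > 0)
    at_upper = np.isclose(s, u, atol=tol) & (g < 0)
    return at_lower | at_upper

def projected_gradient(s, l, u, g):
    """Projected gradient (Eq. (11)): zero on the active set, unchanged
    on the inactive set."""
    g_hat = g.copy()
    g_hat[active_set(s, l, u, g)] = 0.0
    return g_hat

s = np.array([0.0, 0.5, 1.0])        # first entry at lower bound, last at upper
l = np.array([0.0, 0.0, 0.0])
u = np.array([1.0, 1.0, 1.0])
g = np.array([2.0, 2.0, -2.0])
g_hat = projected_gradient(s, l, u, g)   # -> [0.0, 2.0, 0.0]
```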

One more component that we need to modify in Eq. (8) for the BCBA algorithm is the approximation of the Hessian matrix
$\overline{H}(s)$, which we define as the *reduced Hessian* matrix. The reduced Hessian is designed as follows: since the parameters in the active set *A*(*s*) are fixed, *∂f*/*∂s _{i}* = 0, and the projected gradient of the active parameters is set to zero. We therefore zero out all rows and columns of the Hessian matrix
$\overline{H}$ corresponding to the active parameters, that is, we set
${\overline{H}}_{ij}=0$ if either *i* or *j* is in *A*(*s*). However, we add the diagonal entries ${\overline{H}}_{ii}$, *i ∈ A*(*s*), back so that all diagonal entries of $\overline{H}$ are retained in the reduced Hessian. It can be easily verified that this does not change the gradient update, but it will later make it easier to control the positive definiteness of the reduced Hessian matrix. Finally, we define the reduced Hessian matrix *Ĥ* to be

$$\widehat{H}_{ij}(s) = \begin{cases} \overline{H}_{ij}(s), & i, j \in I(s) \;\text{or}\; i = j \\ 0, & \text{otherwise.} \end{cases} \qquad (12)$$
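A small sketch of the reduced-Hessian construction of Eq. (12) (illustrative Python; the 3 × 3 matrix and active mask are hypothetical):

```python
import numpy as np

def reduced_hessian(H, active):
    """Reduced Hessian (Eq. (12)): zero out the rows and columns of the
    active parameters, but retain every diagonal entry of H."""
    H_hat = H.copy()
    H_hat[active, :] = 0.0
    H_hat[:, active] = 0.0
    H_hat[np.diag_indices_from(H_hat)] = np.diag(H)  # restore all diagonals
    return H_hat

H = np.array([[4.0, 1.0, 2.0],
              [1.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])
active = np.array([False, True, False])   # parameter 1 is at a tight bound
H_hat = reduced_hessian(H, active)
# Row/column 1 is zeroed except for its diagonal entry 5.0.
```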

Once we have the reduced Hessian matrix and the projected gradient vector, Eq. (8) in the LM algorithm is modified as below for the bound constraints:

$$\widehat{B}\,\delta = -\widehat{g}(s), \qquad \widehat{B} = \widehat{H}(s) + \lambda \widehat{D}, \qquad (13)$$

where $\widehat{D}$ is the diagonal matrix of *Ĥ*(*s*), which is also equal to the diagonal matrix *D*. Similar to the process of Eq. (4), the *Schur complement* is also used to calculate the step *δ* in Eq. (13). The update *s ← proj*(*s* + *δ*) is accepted once it leads to a reduction in the cost function *f*(*s*).

Similar to the BA algorithm, the non-negative damping factor *λ* in BCBA is also adjusted in each iteration. A smaller *λ* value is chosen if the current iteration reduces the cost; a larger *λ* value can be used to guarantee convergence to a local minimum under the bound constraints.
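Putting the pieces together, one BCBA trial step of Eq. (13) can be sketched as follows (an illustrative Python fragment; for simplicity the usage example sets *λ* = 0 and uses a toy two-variable problem of our own, not the paper's):

```python
import numpy as np

def bcba_step(s, l, u, J, r, lam):
    """One BCBA trial step: solve (H_hat + lam*D_hat) delta = -g_hat
    (Eq. (13)) and project the update back into the feasible box."""
    g = J.T @ r
    H = J.T @ J
    # Active set of Eq. (10): bound tight and gradient pointing outward.
    active = (np.isclose(s, l) & (g > 0)) | (np.isclose(s, u) & (g < 0))
    g_hat = np.where(active, 0.0, g)          # projected gradient, Eq. (11)
    H_hat = H.copy()                          # reduced Hessian, Eq. (12)
    H_hat[active, :] = 0.0
    H_hat[:, active] = 0.0
    H_hat[np.diag_indices_from(H_hat)] = np.diag(H)
    D_hat = np.diag(np.diag(H_hat))
    delta = np.linalg.solve(H_hat + lam * D_hat, -g_hat)
    return np.minimum(np.maximum(s + delta, l), u)   # s <- proj(s + delta)

# Toy problem: minimize ||s - b||^2 with b outside the box [l, u];
# the step is clipped to the nearest feasible point.
b = np.array([2.0, -1.0])
s = np.array([0.5, 0.5])
l, u = np.array([0.0, 0.0]), np.array([1.0, 5.0])
s_new = bcba_step(s, l, u, np.eye(2), s - b, lam=0.0)   # -> [1.0, 0.0]
```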

#### 2.3. Convergence of BCBA

Because we have assumed that
$\overline{H}(s)=J{(s)}^{T}J(s)$ is positive definite, it is easy to see that *Ĥ*(*s*) is also a positive-definite matrix based on the modification in Eq. (12) [27]. Since
$\widehat{D}$ is the diagonal matrix of *Ĥ*(*s*), it is also positive definite. Thus
$\widehat{B}$ in Eq. (13) is positive definite regardless of how the damping factor *λ* changes. Notice that the right-hand side of Eq. (13) is the steepest descent direction within the feasible set. Hence, our proposed algorithm generates a sequence of parameter vectors *s* that converges to a local minimum.

## 3. Experiment and result

To demonstrate the feasibility of this constrained algorithm, an experiment was performed to compare the 3D reconstruction results of conventional BA and BCBA. In this study, a Scanning Fiber Endoscope (SFE) [29] was used to capture a sequence of images above a surgery phantom (the object) of known geometry. The SFE is a thin flexible endoscope of only 1.6 mm outer diameter, see Fig. 2(a), which makes it suitable for Minimally Invasive Surgery (MIS). It generates 30 Hz high-resolution color video with a wide FOV by scanning a single beam of red (635 nm), green (532 nm), and blue (444 nm) laser light. To validate and compare the two 3D reconstruction algorithms, an object with pre-known geometry was required. In this study, a 3D model was designed and 3D-printed to mimic the surgical field [30], see Figs. 2(b) and 2(c). This reconstruction of the surgical field aims to guide an autonomous surgical robot by obtaining accurate 3D coordinates of residual tumor within the surgical field [31].

In this experiment, the SFE was attached to a micro-positioning stage, from which the position of the camera was obtained. The orientation of the camera was set to point perpendicularly downwards and was kept the same throughout the experiment. Figure 2 shows the SFE, the CAD design, the printed phantom and the experimental setup. The distance between the SFE and the top surface of the phantom was about 15 mm. Note that the BCBA algorithm is independent of the choice of camera and object; it can be applied as long as bound constraints exist.

In the experiment, 16 images (of size 608 × 608 pixels) were captured above the phantom using the SFE. The position of the SFE was read from the micro-positioning stage with high accuracy, and can be considered ground truth. To simulate known but inaccurate information, noise uniformly distributed on the interval [−0.25, 0.25] mm was added to the X, Y, and Z coordinates of the camera positions; the corresponding intervals served as the pre-known bound constraints when using the BCBA algorithm. Other constraints were also pre-known: a rough camera calibration concluded that the FOV was in the range of [50, 56] degrees; the surface geometry of the object was 10 mm in depth; and the distance from the camera to the object was 15 mm. Thus, the depth of the object points should be in the range of [15, 25] mm. However, for this experiment, we only used the constraints on the camera position for BCBA. The other pre-known parameters were used to evaluate the accuracy and reliability of the 3D reconstruction result.
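The way a measurement tolerance turns into box constraints can be sketched as follows (illustrative Python with synthetic camera positions; the numbers mimic, but are not, the experiment's data):

```python
import numpy as np

rng = np.random.default_rng(42)
# 16 hypothetical ground-truth camera positions (XYZ, in mm).
true_positions = rng.uniform(0.0, 30.0, size=(16, 3))

# Simulate "known but inaccurate" readings: uniform noise in [-0.25, 0.25] mm.
noise = rng.uniform(-0.25, 0.25, size=true_positions.shape)
measured = true_positions + noise

# The measurement tolerance directly supplies the BCBA box constraints.
lower = measured - 0.25
upper = measured + 0.25

# By construction the ground truth always lies inside the bounds.
assert np.all((true_positions >= lower) & (true_positions <= upper))
```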

Our method was implemented in MATLAB, running on a Dell Precision M4700 workstation with a 2.7 GHz Intel i7-3740QM CPU and 20.0 GB of memory under a 64-bit Windows operating system.

In the same manner as conventional BA, features were detected in each image and matched with their correspondences in the other images. In this study, the scale-invariant feature transform (SIFT) algorithm [32] was applied to find feature points in each SFE image. Figure 3 shows two endoscopic frames with image overlap, where the matched features are connected by colored lines. This experiment generated 2,250 feature points and 5,689 point-to-image reprojections in total from the 16 frames.

After adding the same set of random uniform noise (∈ [−0.25, 0.25] mm) to the accurate camera positions, both the BA and BCBA optimizations were performed with the same initialization. The BA process showed the reconstructed 3D model shrinking from an initial spherical-like shape to a flat surface in the final optimization iteration, see Figs. 4(a)–4(d), whereas the BCBA process showed the reconstructed points remaining bounded within a range and quickly stabilizing to a spherical shape, Figs. 4(e)–4(h). Note that the reconstructed point clouds at the first iteration of BA and BCBA are the same, Figs. 4(a) and 4(e), since the initial guess of the parameter vector was identical for both algorithms. Figures 4(d) and 4(h) show the final reconstructed 3D points of BA and BCBA, respectively.

Figure 5(a) shows that BA resulted in a flat surface that is very different from our real object, which has a spherical concave surface. In addition, the reconstructed 3D points are located around a depth of 10 mm, which deviates dramatically from the experimental setup distance of 15 mm. In contrast, the BCBA algorithm achieves a much better 3D reconstruction, with most points distributed as a spherical concave dome shape, see Fig. 5(d). Furthermore, 97.56% of the reconstructed object points are located in the range of [15, 25] mm, which is the range obtained from the experimental setup. Once the 3D point cloud was obtained, a thin-plate spline algorithm was applied to generate a smooth 3D surface by fitting the reconstructed 3D points [24]. Figures 5(b) and 5(e) show the reconstructed surfaces of BA and BCBA, respectively, and their corresponding depth maps are shown in Figs. 5(c) and 5(f).

To quantify the reconstruction results, the Iterative Closest Point (ICP) algorithm [33] was employed to align and compare the reconstructed virtual models to the CAD-designed model (ground truth, see Fig. 2(b)). Figure 6 gives the comparison of the ICP error analysis of BA and BCBA. The alignment of the two point clouds is shown in Figs. 6(a) and 6(d), where the blue point cloud represents the CAD-designed model for each reconstructed surface. Qualitatively, the BA algorithm produces a radically different 3D surface that does not match the CAD model, while the BCBA result matches the CAD model very well. The distance maps between the aligned models are shown in Figs. 6(b) and 6(e), and the histograms of the distances in Figs. 6(c) and 6(f), which show the distribution of the ICP error of each reconstructed point to the CAD model. The quantitative error of BA's result was very large, and was diminished greatly by using the BCBA algorithm.

To further understand the significance of our proposed BCBA algorithm, a table was generated with further comparison details of BA and BCBA. Table 1 shows the number of iterations, the time spent, the minimum error achieved, the estimated camera parameter (FOV), the estimated depth of the 3D points *Q _{z}*, and the ICP error. With the bound constraints on the camera position, the BCBA optimization reaches the local minimum faster than BA, and the accurate 3D reconstruction result of BCBA has a much smaller ICP error. Moreover, since only the pre-known constraints on the camera position were used in this study, the estimated FOV and *Q _{z}* can be compared with the pre-known values to demonstrate the reliability of our constrained algorithm. For example, the estimated FOV using BA was 177.05°, which is far away from the calibration value ∈ [50°, 56°]. In stark contrast, the BCBA algorithm estimates a FOV of 54.69°, which is within our experimentally determined range (Table 1).

Table 1 also shows that BA can result in a smaller cost value of the objective function of Eq. (2). The reason is that the corresponding local minimum is located outside the feasible region: BA can reach that local minimum, while BCBA is restricted by the bound constraints, which results in a larger cost. Note that all the values in Table 1 might change slightly with a different set of random noise added to the accurate camera position data used to simulate motion within a constrained range of uncertainty.

In order to validate the reliability of the BCBA algorithm, the same comparison shown above was performed an additional ten times with different sets of uniform noise (∈ [−0.25, 0.25] mm). Since accuracy and efficiency are the most important criteria for the potential applications of this proposed algorithm, we list the comparison of ICP error between BA and BCBA in Table 2, as well as the computation time. The ICP error values of BA were in the range of [1.8776, 1.9072] mm, with an average of 1.8903 mm and a standard deviation of 0.011 mm; the ICP error values of BCBA were within [0.4521, 0.5343] mm, with an average of 0.4955 mm and a standard deviation of 0.0306 mm. These results show that BCBA can generate more reliable and accurate 3D models than BA, with very small deviation across the ten data sets simulating real-world measurements. The computation time of both BA and BCBA varied across these ten cases: BA spent [49.27, 67.34] sec, with an average of 60.61 sec and a deviation of 6.88 sec, whereas BCBA spent much less time, in the range of [3.56, 30.61] sec, with an average of 14.40 sec and a deviation of 9.56 sec.

From the comparison results of these ten tests in Table 2, we see that the ICP error was very stable with a small deviation. Overall, the accuracy of BCBA was around 3.8× higher than that of the conventional BA algorithm. The time spent by both BA and BCBA was less stable, being highly dependent on the added noise. On average, the efficiency of BCBA was roughly 4× higher than that of BA for these ten tests.

## 4. Discussion

This paper proposed a bound constrained BA algorithm to take advantage of the known but inaccurate information about the camera position parameters. A theoretical development was provided on the convergence property of the proposed BCBA algorithm. Moreover, the experimental results demonstrated its feasibility and reliability for a practical problem. For this specific test (Table 1), BCBA achieved a 3.8× improvement in accuracy (ICP error) and about a 4× improvement in efficiency (computation time) over the conventional BA algorithm. This test simulated the application of a robotically-driven camera whose motion is defined within a range of accuracy (tolerance). In this case, only the constraints on the camera positions were used for the 3D reconstruction, while the pre-known information of the camera FOV from calibration and the object-to-camera distance *Q_{z}* from the experimental setup were used to validate the estimation of the unknown parameters. To better evaluate the value and impact of BCBA, we instead used the constraints on FOV and *Q_{z}* for the reconstruction, relaxed the constraints on the camera positions, and replicated the analysis of Table 1. By bounding the FOV ∈ [50°, 56°] and *Q_{z}* ∈ [15, 25] mm, the comparison between the conventional BA and BCBA demonstrates the higher accuracy (3.8× less ICP error) and improved efficiency of BCBA, as shown in Table 3. Importantly, the estimation of camera positions compared to ground truth (micro-positioning stage values) shows that BCBA performed significantly better than BA. This additional test demonstrates the feasibility and versatility of BCBA for more applications.

Compared with the conventional BA algorithm, which considers a parameter either as a known fixed value or as an unconstrained variable, BCBA achieves fewer optimization iterations, less computation time, a smaller ICP error, and more reliable parameter estimation. Furthermore, BCBA also achieved a good estimation of the parameters that have no constraints, such as the camera FOV in Table 1 and the camera positions in Table 3.
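The mechanism that separates the bound constrained update from the unconstrained one can be summarized in a few lines: after each descent step, every constrained parameter is projected back onto its [lower, upper] interval, in the spirit of the projected gradient methods cited in [28]. The following is an illustrative sketch of that projection step, not the paper's BCBA implementation; the gradient, step size, and bounds are placeholder values.

```python
# Illustrative sketch (NOT the paper's BCBA solver) of the box-projection
# step that keeps known-but-inaccurate parameters feasible.

def project_onto_box(params, lower, upper):
    """Clamp each parameter into its feasible interval [lower_i, upper_i]."""
    return [min(max(p, lo), hi) for p, lo, hi in zip(params, lower, upper)]

def projected_gradient_step(params, grad, step, lower, upper):
    """One projected-gradient update: take a descent step, then project."""
    trial = [p - step * g for p, g in zip(params, grad)]
    return project_onto_box(trial, lower, upper)

# Example: a camera x-position known to lie in [9.5, 10.5] mm cannot be
# pushed outside that range even by a large gradient step.
p = projected_gradient_step([10.0], [50.0], 0.1, [9.5], [10.5])
print(p)  # the raw step 10.0 - 5.0 = 5.0 is projected back to [9.5]
```

This projection is what prevents the optimizer from wandering to the dramatically incorrect local minima that unconstrained BA can reach.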

As an extension of our previous research [25, 30], these advantages of BCBA can be utilized for intraoperatively reconstructing accurate 3D virtual models of the surgical field to extract tumor coordinates, which requires highly efficient processing. These advantages can also be utilized to provide reliable estimation of camera poses for 3D image-guided surgery [31] and MIS. Not limited to biomedical research, the BCBA algorithm may also be utilized in other fields, such as the 3D reconstruction of Google Street View scenes [34]. The position of the street-view car is provided by GPS; it is inaccurate, but its error range is pre-known. Such bound constraints can be utilized to improve the accuracy and efficiency of the 3D reconstruction of the street scene, similar to the experiment performed in this paper with bound constrained camera positions. Another application of 3D reconstruction with bound constrained parameters could be the 3D metrology of internal threads for the quality control of automobile engine parts, which requires high reconstruction accuracy. In a preliminary study, high-accuracy robot motion was mandatory to reconstruct a dense 3D point cloud [7]. In practical applications, BCBA could be utilized to reduce the deleterious effect of the inaccuracy of industrial robots.

Previous approaches to constraining the BA algorithm cannot be generalized to the examples of cameras involved in robotic MIS and 3D street-view reconstruction. Several constrained BA algorithms can improve reconstruction accuracy and efficiency [18–23], but their applications are limited to specific cases, such as the reconstruction of planar, symmetric, or self-rotating objects. Soper *et al*.'s method achieved reliable 3D reconstruction, but was limited to spherical-like internal surfaces for a specific medical application [24]. In contrast to these constrained algorithms, our proposed BCBA is applicable to these specific cases as well as to more general cases, as long as the bound constraints of the parameters are known.

Notice, however, that the constrained algorithms [18–24] may work more efficiently and/or accurately than BCBA for their particular cases, since they use constraints among different parameters, such as the X, Y, Z coordinates of the object being subject to specific mathematical equations of a plane, circle, or sphere. However, these authors assumed the particular objects are mathematically perfect without any manufacturing tolerance, which cannot be true for real-world applications. To improve the 3D reconstruction for such cases, future work is to extend BCBA with geometric constraints. For example, for the 3D reconstruction of bladder phantoms [24], we could replace the sphericity constraint in BA with a spherical-like shape constraint that contains lower/upper bounds, once the minimum and maximum radii of the bladder are known from human anatomical data.
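The spherical-like shape constraint suggested above can be sketched as a radial box bound: each surface point is expressed relative to an estimated center, and only its radius is required to lie in [r_min, r_max] rather than on an exact sphere. This is a hypothetical parameterization for illustration, not the method of [24]; the center and radius bounds are assumed values.

```python
# Sketch of a "spherical-like" bound: project a 3D point onto the shell
# {r_min <= |point - center| <= r_max}. The center and radii are
# HYPOTHETICAL illustration values, not anatomical data.
import math

def clamp_radius(point, center, r_min, r_max):
    """Return the point, pulled radially into the shell if it lies outside."""
    d = [p - c for p, c in zip(point, center)]
    r = math.sqrt(sum(x * x for x in d))
    if r_min <= r <= r_max or r == 0.0:
        return point  # already feasible (or degenerate at the center)
    r_new = min(max(r, r_min), r_max)
    return [c + x * r_new / r for c, x in zip(center, d)]

# A point 60 mm from the center is pulled back to the 50 mm outer bound.
print(clamp_radius([60.0, 0.0, 0.0], [0.0, 0.0, 0.0], 30.0, 50.0))
```

Unlike an exact-sphere constraint, any point within the shell is left untouched, so manufacturing or anatomical tolerance is respected.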

Similar to conventional bundle adjustment, BCBA is also subject to halting at a local minimum. Even within the bounds, different initial guesses of the parameter vector *s* produce different optimization results, as shown in Table 2. Nonetheless, in BCBA the pre-known bounds prevent the reconstructed model from drifting to an unreasonable shape. Future work will be finding the global minimum within the feasible set, for which the computation could be dramatically reduced compared with the unconstrained case.
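One simple way to exploit a bounded feasible set when searching for the global minimum is multi-start: sample several initial guesses uniformly inside the bounds, run a local optimizer from each, and keep the best result. The sketch below uses a toy 1-D non-convex objective and a toy projected coordinate-descent optimizer as placeholders; neither is the BCBA cost function or solver.

```python
# Multi-start within a feasible box, sketched on a TOY 1-D problem.
# The objective f and the local optimizer are placeholders, not BCBA.
import random

def local_minimize(f, x, lo, hi, step=0.01, iters=2000):
    """Toy projected coordinate descent: greedy +/- steps, clamped to [lo, hi]."""
    for _ in range(iters):
        for cand in (x - step, x + step):
            cand = min(max(cand, lo), hi)
            if f(cand) < f(x):
                x = cand
    return x

def multi_start(f, lo, hi, starts=10, seed=0):
    """Run the local optimizer from several in-bounds starts; keep the best."""
    rng = random.Random(seed)
    return min((local_minimize(f, rng.uniform(lo, hi), lo, hi)
                for _ in range(starts)), key=f)

# Non-convex objective with two local minima (near x = -1 and x = +1);
# multi-start escapes the shallower basin and reports the deeper one.
f = lambda x: (x * x - 1.0) ** 2 + 0.3 * x
print(round(multi_start(f, -2.0, 2.0), 2))
```

Because every start is drawn from inside the bounds, the search region is far smaller than in the unconstrained case, which is the source of the anticipated computational savings.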

## Acknowledgments

Funding was provided by NIH NIBIB R01 EB016457 “NRI-Small: Advanced biophotonics for image-guided robotic surgery”, PI: Dr. Eric Seibel and Dr. Blake Hannaford (co-I). The authors appreciate Professor Maryam Fazel at University of Washington for her comments, reviewers for their helpful critiques, and Richard Johnston and David Melville at Human Photonics Lab, University of Washington for their technical support.

## References and links

**1. **G. Bianco, A. Gallo, F. Bruno, and M. Muzzupappa, “A comparison between active and passive techniques for underwater 3D applications,” in International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences **38**(16) (2011), pp. 357–363.

**2. **Y. Gong and S. Zhang, “Improving 4-D shape measurement by using projector defocusing,” Proc. SPIE **7790**, 77901A (2010). [CrossRef]

**3. **J. Geng, “Structured-light 3D surface imaging: a tutorial,” Adv. Opt. Photon. **3**(2), 128–160 (2011). [CrossRef]

**4. **Y. Gong and S. Zhang, “Ultrafast 3-D shape measurement with an off-the-shelf DLP projector,” Opt. Express **18**(19), 19743–19754 (2010). [CrossRef] [PubMed]

**5. **Z. Zhang, “Microsoft kinect sensor and its effect,” MultiMedia IEEE **19**(2), 4–10 (2012). [CrossRef]

**6. **S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski, “A comparison and evaluation of multi-view stereo reconstruction algorithms,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2006), pp. 519–528.

**7. **Y. Gong, R. S. Johnston, C. D. Melville, and E. J. Seibel, “Axial-stereo 3D optical metrology of internally machined parts using high-quality imaging from a scanning laser endoscope,” in International Symposium on Optomechatronic Technologies (ISOT), Seattle, USA, November 5–7, (2014).

**8. **S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski, “Building Rome in a day,” in IEEE 12th International Conference on Computer Vision (IEEE, 2011), pp. 105–112.

**9. **C. Wu, “Towards linear-time incremental structure from motion,” in International Conference on 3D Vision (IEEE, 2013), pp. 127–134.

**10. **D. Crandall, A. Owens, N. Snavely, and D. Huttenlocher, “Discrete-continuous optimization for large-scale structure from motion,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2011), pp. 3001–3008.

**11. **B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon, “Bundle adjustment–a modern synthesis,” In *Vision Algorithms: Theory and Practice*, B. Triggs, A. Zisserman, and R. Szeliski, eds. (Springer Berlin Heidelberg, 2000), pp. 298–372. [CrossRef]

**12. **K. Mitra and R. Chellappa, “A Scalable Projective Bundle Adjustment Algorithm using the L infinity Norm,” in Sixth Indian Conference on Computer Vision, Graphics and Image Processing (IEEE, 2008), pp. 79–86.

**13. **C. Engels, H. Stewénius, and D. Nistér, “Bundle adjustment rules,” in Photogrammetric computer vision, (2006).

**14. **M. I. A. Lourakis and A. A. Argyros, “SBA: A software package for generic sparse bundle adjustment,” ACM Trans. Math. Software **36**(1), 1–30 (2009). [CrossRef]

**15. **C. Wu, S. Agarwal, B. Curless, and S. M. Seitz, “Multicore bundle adjustment,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2011), pp. 3057–3064.

**16. **Y. Jeong, D. Nister, D. Steedly, R. Szeliski, and I. S. Kweon, “Pushing the envelope of modern methods for bundle adjustment,” IEEE Trans. Pattern Anal. Mach. Intell. **34**(8), 1605–1617 (2012). [CrossRef] [PubMed]

**17. **J. Salvi, X. Armangu, and J. Batlle, “A comparative review of camera calibrating methods with accuracy evaluation,” Pattern Recognit. **35**(7), 1617–1635 (2002). [CrossRef]

**18. **K. H. Wong and M. M. Y. Chang, “3D model reconstruction by constrained bundle adjustment,” in Proceedings of the 17th International Conference on Pattern Recognition (IEEE, 2004), pp. 902–905.

**19. **C. Albl and T. Pajdla, “Constrained Bundle Adjustment for Panoramic Cameras,” in 18th Computer Vision Winter Workshop, Hernstein, Austria, 4–6 February 2013.

**20. **Y. Zhang, K. Hu, and R. Huang, “Bundle adjustment with additional constraints applied to imagery of the Dunhuang wall paintings,” ISPRS J. Photogramm. Remote Sens. **72**, 113–120 (2012). [CrossRef]

**21. **R. Szeliski and P. H. Torr, “Geometrically constrained structure from motion: Points on planes,” In *3D Structure from Multiple Images of Large-Scale Environments*, R. Koch and L. V. Gool, eds. (Springer Berlin Heidelberg, 1998), pp. 171–186. [CrossRef]

**22. **A. Bartoli and P. Sturm, “Constrained structure and motion from multiple uncalibrated views of a piecewise planar scene,” Int. J. Comput. Vision **52**(1), 45–64 (2003). [CrossRef]

**23. **A. Cohen, C. Zach, S. N. Sinha, and M. Pollefeys, “Discovering and exploiting 3D symmetries in structure from motion,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2012), pp. 1514–1521.

**24. **T. D. Soper, M. P. Porter, and E. J. Seibel, “Surface mosaics of the bladder reconstructed from endoscopic video for automated surveillance,” IEEE Trans. Biomed. Eng. **59**(6), 1670–1680 (2012). [CrossRef] [PubMed]

**25. **Y. Gong, T. D. Soper, V. W. Hou, D. Hu, B. Hannaford, and E. J. Seibel, “Mapping surgical fields by moving a laser-scanning multimodal scope attached to a robot arm,” Proc. SPIE **9036**, 90362S (2014). [CrossRef]

**26. **M. I. A. Lourakis and A. A. Argyros, “Is Levenberg-Marquardt the most efficient optimization algorithm for implementing bundle adjustment?,” in Proceedings of IEEE Conference on Computer Vision (IEEE, 2005), pp. 1526–1531.

**27. **A. Berman and N. Shaked-Monderer, *Completely positive matrices* (World Scientific, 2003), Chap. 1.

**28. **P. H. Calamai and J. J. Moré, “Projected gradient methods for linearly constrained problems,” Math. Program. **39**(1), 93–116 (1987). [CrossRef]

**29. **C. M. Lee, C. J. Engelbrecht, T. D. Soper, F. Helmchen, and E. J. Seibel, “Scanning fiber endoscopy with highly flexible, 1 mm catheterscopes for wide-field, full-color imaging,” J. Biophotonics **3**(5-6), 385–407 (2010). [CrossRef]

**30. **Y. Gong, D. Hu, E. J. Seibel, and B. Hannaford, “Accurate 3D virtual reconstruction of surgical field using calibrated trajectories of an image-guided medical robot,” J. Med. Imag. **1**(3), 035002 (2014). [CrossRef]

**31. **Y. Gong, D. Hu, B. Hannaford, and E. J. Seibel, “Toward real-time endoscopically-guided robotic navigation based on a 3D virtual surgical field model,” Proc. SPIE **9415**, 94150C (2015).

**32. **D. G. Lowe, “Object recognition from local scale-invariant features,” in Proceedings of IEEE Conference on Computer Vision (IEEE, 1999), pp. 1150–1157.

**33. **P. J. Besl and N. D. McKay, “A method for registration of 3-D shapes,” IEEE Trans. Pattern Anal. Machine Intell. **14**(2), 239–256 (1992). [CrossRef]

**34. **B. Klingner, D. Martin, and J. Roseborough, “Street view motion-from-structure-from-motion,” in Proceedings of IEEE Conference on Computer Vision (IEEE, 2013), pp. 953–960.