## Abstract

This paper presents a new linear framework for obtaining 3D scene reconstruction and camera calibration simultaneously from uncalibrated images using scene geometry. Our strategy uses the constraints of parallelism, coplanarity, collinearity, and orthogonality, which are frequently available in general man-made scenes. This approach gives more stable results with fewer images and obtains the results with only linear operations. In this paper, it is shown that all the geometric constraints used independently in previous works can be implemented easily in the proposed linear method. Situations that cannot be handled by the previous approaches are also studied, and it is shown that the proposed method, which is able to handle these cases, is more flexible in use. The proposed method uses a stratified approach, in which affine reconstruction is performed first and then metric reconstruction. In this procedure, the additional constraints newly extracted in this paper play an important role in affine reconstruction in practical situations.

© 2013 OSA

## 1. Introduction

Developing a method that obtains camera calibration and 3D scene reconstruction simultaneously from images is one of the important topics in the field of computer vision, and many approaches to these problems have been proposed.

First, classical approaches utilize known 3D positions of points, usually printed on an accurately manufactured calibration rig [1–3]. However, such position information can be obtained only through specific acquisition systems for large 3D scenes and is rarely available in general situations. Second, camera calibration and scene reconstruction can be acquired simultaneously from the image sequences alone using only the constraints on the camera intrinsic parameters. This approach is known as self-calibration and provides great flexibility [4, 5]. However, to acquire high-quality results, a lengthy image sequence and a set of many accurate feature correspondences are necessary. Third, the information from restricted camera motion can be used to calibrate cameras [6–8]. Finally, there have been many methods that use the properties of scene geometry or special objects [9–19]. As these methods do not require a calibration rig but utilize the geometry of a scene or the shapes of objects as a calibration rig, accurate calibration results can be acquired without a lengthy image sequence. The proposed algorithm falls into this last category.

In this paper, we propose a linear stratified approach that can use full geometric constraints. The method can obtain affine structure from parallelograms in general position, and all available geometric information, including constraints from partial knowledge of the camera parameters, can be utilized to upgrade the result to a metric one. The parallelograms need not be part of a parallelepiped. To cope with general positions of parallelograms in practical situations, we extract additional constraints that come from parallelograms and play an important role in the affine reconstruction step. It can easily be validated that all of the constraints used in the previous related works can be implemented in the proposed method. Detailed comparisons between the proposed method and the previous ones are also presented in this paper.

## 2. Related works

Many methods using geometric constraints of a scene have been introduced. The work most closely related to that of the present paper was presented by Wilczkowiak et al., who suggested an elegant linear formulation for camera calibration and 3D modeling using parallelepipeds in a scene [13]. A non-linear optimization method using *differential evolution* can also be applied to a cuboid, which is a special case of a parallelepiped [16]. For these methods, at least six vertices of a parallelepiped must be visible in all views to obtain affine or metric reconstruction. However, a parallelepiped is not the only primitive that can give full affine information. Parallelograms are the most common primitives in man-made environments such as architecture, but pairs of parallelograms do not always form two faces of a parallelepiped in a scene.

It is also possible to obtain projective reconstruction linearly from at least one plane visible in all views [12]. However, to obtain metric reconstruction linearly, the method requires that the plane be the plane at infinity, determined from three orthogonal vanishing points as in [9]. Given calibration and rotation parameters, a linear estimation method using the metric information of the scene geometry was also suggested in [14]. Quadratic constraints from coplanar parallelograms were discovered for camera calibration in [15]; these constraints can be converted to linear constraints with the metric information of the parallelograms. There have also been many methods using other constraints of scene geometry: symmetric polyhedra [17], ellipses [18], and spheres [19]. If there are only symmetric polyhedra and no parallelograms in a scene, the method described in [17] is a good alternative to our method. However, this method assumes the simplest camera model, where only the focal length is unknown, and needs a nonlinear optimization approach initialized by solving quadratic equations.

As mentioned above, in-depth comparisons between the proposed method and the previous ones are presented in the related sections that follow.

## 3. The infinity homography from parallelism and coplanarity

#### 3.1. Preliminaries and motivations

In general, it is possible to estimate the infinite homography, **H**_{∞}, with at least four vanishing point correspondences in two views by solving the relevant linear equations of

$${\mathbf{v}}^{\prime} \cong {\mathbf{H}}_{\infty}\mathbf{v}, \qquad (1)$$

where **v** and **v**′ are homogeneous coordinates of the corresponding vanishing points in two views and '≅' indicates equality up to scale. The vanishing point can be estimated by the intersection of imaged lines parallel in 3D space. One of the vanishing points can be replaced by the epipole related to the two views, as it is mapped between the views by the infinite homography [20].
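As a concrete illustration, this vanishing-point route can be sketched in a few lines of NumPy: each correspondence **v** ↔ **v**′ contributes the linear equations **v**′ × (**H**_{∞}**v**) = **0**, and the stacked system is solved by SVD. The sketch below is an assumption-laden toy with synthetic stand-in data, not the paper's implementation.

```python
import numpy as np

def skew(p):
    """Cross-product matrix [p]_x, so that skew(p) @ q = p x q."""
    x, y, z = p
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def infinite_homography(vps1, vps2):
    """Estimate H_inf from >= 4 vanishing-point correspondences by
    solving v' x (H v) = 0 linearly (DLT) with an SVD.  The coefficient
    of H_{lj} in ([v']_x H v)_i is skew(v')[i, l] * v[j]."""
    A = np.vstack([np.kron(skew(v2), v1.reshape(1, 3))
                   for v1, v2 in zip(vps1, vps2)])
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)   # null vector, reshaped row-major

# Synthetic check: fabricate an H_inf and four vanishing points.
rng = np.random.default_rng(1)
H_true = rng.standard_normal((3, 3))
vps1 = rng.standard_normal((4, 3))
vps2 = [3.0 * (H_true @ v) for v in vps1]   # arbitrary projective scales
H_est = infinite_homography(vps1, vps2)
```

The estimate is defined only up to scale (and sign), so any comparison with a ground-truth matrix must normalize both first.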

Two sets of parallel lines with different directions construct an imaged parallelogram in a camera image. If the two sets of parallel lines lie on the same supporting plane in 3D, the imaged parallelogram corresponds to an actual parallelogram existing on the plane in 3D (see Fig. 1(a)). This is common in typical man-made scenes. However, an imaged parallelogram consisting of two sets of parallel lines on different supporting planes can exist, as in Figs. 1(b) and 1(c). In this case, the parallelogram does not actually exist in 3D. The proposed framework for computing **H**_{∞} utilizes the coplanarity constraint related to the former case.

#### 3.2. Image rectification for a novel framework

The rectification homographies **H**_{ri}, for *i* = 1, 2, are defined here for two views. The rectification homography is defined so that it maps each imaged parallelogram to a rectangle that has edges parallel to the vertical and horizontal axes of the image plane (see Fig. 2). The position and the shape of the rectangle can be chosen arbitrarily. A rectification homography is allocated to each imaged parallelogram in each view and is easily computed by using four vertex correspondences.
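Such a rectification homography can be computed with a standard four-point DLT. The sketch below is a minimal NumPy illustration; the corner ordering, the made-up pixel coordinates, and the chosen unit-width rectangle are arbitrary assumptions, as noted above.

```python
import numpy as np

def homography_dlt(src, dst):
    """Exact 3x3 homography from 4 point correspondences via DLT:
    each pair contributes two rows of dst x (H src) = 0."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        p = [x, y, 1.0]
        rows.append([0.0] * 3 + [-c for c in p] + [v * c for c in p])
        rows.append(p + [0.0] * 3 + [-u * c for c in p])
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 3)

def rectification_homography(quad, aspect=1.0):
    """Map 4 imaged parallelogram vertices (ordered around the boundary)
    to the rectangle (0,0),(1,0),(1,a),(0,a); the rectangle's position
    and shape are free choices."""
    rect = [(0.0, 0.0), (1.0, 0.0), (1.0, aspect), (0.0, aspect)]
    return homography_dlt(quad, rect)

# Example with made-up image coordinates of a parallelogram.
quad = [(100.0, 110.0), (420.0, 150.0), (400.0, 360.0), (90.0, 300.0)]
H_r = rectification_homography(quad, aspect=0.5)
```

Applying `H_r` to the four vertices and dehomogenizing should reproduce the chosen rectangle corners exactly.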

The image transformed by rectification homography can be considered to be an image on a new camera image plane. A *rectified camera* is defined as a camera having this new image. This camera can be considered as a virtual camera acquired by rotating the original camera about its optical center and varying the intrinsic parameters. It is worthwhile to note that the image plane of a rectified camera is parallel to the related parallelogram in 3D.

In the images transformed using **H**_{ri}, it is clear that the two vanishing points deduced from the transformed parallelogram, which is a rectangle, are ${\left[1,0,0\right]}^{T}$ and ${\left[0,1,0\right]}^{T}$ in the homogeneous coordinate system without any calculation, because the parallel lines intersect at infinity (see Fig. 2). Since the newly deduced vanishing points from the rectangle are also a transformation of the original vanishing points, it is possible to conclude that the infinite homography, ${\mathbf{H}}_{\infty}^{r}$, between the rectified cameras satisfies the following equations:

$${\mathbf{H}}_{\infty}^{r}{\left[1,0,0\right]}^{T} = {\lambda}_{x}{\left[1,0,0\right]}^{T}, \qquad (2)$$

$${\mathbf{H}}_{\infty}^{r}{\left[0,1,0\right]}^{T} = {\lambda}_{y}{\left[0,1,0\right]}^{T}, \qquad (3)$$

where ${\lambda}_{x}$ and ${\lambda}_{y}$ are arbitrary scale factors. From Eqs. (2) and (3), ${\mathbf{H}}_{\infty}^{r}$ has the following form:

$${\mathbf{H}}_{\infty}^{r} = \begin{bmatrix} {\lambda}_{x} & 0 & u \\ 0 & {\lambda}_{y} & v \\ 0 & 0 & w \end{bmatrix}, \qquad (4)$$

where *u*, *v*, and *w* are arbitrary scalar values.

From Fig. 2, the relationship between the original **H**_{∞} and the newly defined ${\mathbf{H}}_{\infty}^{r}$ is as follows:

$${\mathbf{H}}_{\infty}^{r}\,{\mathbf{H}}_{r1} \cong {\mathbf{H}}_{r2}\,{\mathbf{H}}_{\infty}, \qquad (5)$$

where **H**_{r1} and **H**_{r2} are the rectification homographies relating **H**_{∞} and ${\mathbf{H}}_{\infty}^{r}$.

As Eq. (5) can be obtained for each parallelogram (two sets of parallel lines), the number of equations constraining the variables is 9*m* + 1 (the 1 is for scale factor determination), where *m* is the number of parallelograms. Since the number of variables is 9 for **H**_{∞} and 5*m* for
${\mathbf{H}}_{\infty}^{{r}_{k}}$ (*k* = 1,⋯ ,*m*), 9*m* + 1 ≥ 9 + 5*m* is required for the estimation to be possible. Thus, the required number of parallelograms is *m* ≥ 2. It is worthwhile to note that this requirement is also identical for the method that uses vanishing point correspondences, because two parallelograms can give four vanishing points. It can be concluded that, in the new framework, there is no change in the number of constraints. Accordingly, although Eq. (5) is used to compute **H**_{∞}, no advantage is gained without using the additional constraint depicted in Section 3.1.

#### 3.3. An additional constraint from an actual parallelogram

This section verifies that one additional constraint applies to ${\mathbf{H}}_{\infty}^{r}$ when all lines included in the two sets of parallel lines related to the imaged parallelogram lie on the same plane in 3D.

First, the camera calibration matrix of a rectified camera is deduced. The camera calibration matrix is

$$\mathbf{K} = \begin{bmatrix} {f}_{u} & s & {u}_{0} \\ 0 & {f}_{v} & {v}_{0} \\ 0 & 0 & 1 \end{bmatrix},$$

where ${f}_{u}$ and ${f}_{v}$ denote the focal length expressed in horizontal and vertical pixel dimensions, respectively, *s* is the skew parameter, and (${u}_{0}$, ${v}_{0}$) are the pixel coordinates of the principal point.

The *aspect ratio* of a parallelogram is defined as the ratio of the height to the length of the bottom side. Let the four vertices of a parallelogram in 3D be ${\left[0,0,1\right]}^{T}$, ${\left[1,0,1\right]}^{T}$, ${\left[r\cot\theta + 1, r, 1\right]}^{T}$, and ${\left[r\cot\theta, r, 1\right]}^{T}$, where *r* is the aspect ratio of the parallelogram and *θ* is the angle between the two sides. ${\left[0,0,1\right]}^{T}$, ${\left[1,0,1\right]}^{T}$, ${\left[1,{a}_{i},1\right]}^{T}$, and ${\left[0,{a}_{i},1\right]}^{T}$, for *i* = 1, 2, are chosen as the four vertices of the two rectangles in each view of the two rectified cameras, where ${a}_{i}$, for *i* = 1, 2, are the aspect ratios of the two rectangles.

In fact, it will be shown that the variables *r* and *θ* disappear from the equation for the additional constraint; thus, the metric information of the parallelogram is not necessary.

The homography mapping between the vertices of the parallelogram and the rectangle of the *i*th camera can then be computed as

$${\mathbf{H}}_{i} = \begin{bmatrix} 1 & -\cot\theta & 0 \\ 0 & {a}_{i}/r & 0 \\ 0 & 0 & 1 \end{bmatrix}. \qquad (6)$$

Let ${\omega}_{ri}$ be the image of the absolute conic (IAC) on the rectified image plane of the *i*th camera. It is then possible to obtain two equations that are linear in ${\omega}_{ri}$ [10], namely ${\mathbf{h}}_{1}^{T}{\omega}_{ri}{\mathbf{h}}_{2}=0$ and ${\mathbf{h}}_{1}^{T}{\omega}_{ri}{\mathbf{h}}_{1}={\mathbf{h}}_{2}^{T}{\omega}_{ri}{\mathbf{h}}_{2}$, where ${\mathbf{h}}_{1}$ and ${\mathbf{h}}_{2}$ are the first two columns of ${\mathbf{H}}_{i}$. Therefore, ${\omega}_{ri}$ has the form of

$${\omega}_{ri} \cong \begin{bmatrix} 1 & (r/{a}_{i})\cot\theta & \alpha \\ (r/{a}_{i})\cot\theta & {r}^{2}/\left({a}_{i}^{2}{\sin}^{2}\theta\right) & \beta \\ \alpha & \beta & \gamma \end{bmatrix}, \qquad (7)$$

where *α*, *β*, and *γ* are arbitrary scalar values.

If **K**_{ri} represents the camera calibration matrix of the *i*th rectified camera, then, because ${\omega}_{ri}={\mathbf{K}}_{ri}^{-T}{\mathbf{K}}_{ri}^{-1}$ [20], after some manipulations it is possible to obtain

$${\mathbf{K}}_{ri} = \begin{bmatrix} {\gamma}^{\prime} & -{\gamma}^{\prime}\cot\theta & {\alpha}^{\prime} \\ 0 & \left({a}_{i}/r\right){\gamma}^{\prime} & {\beta}^{\prime} \\ 0 & 0 & 1 \end{bmatrix}, \qquad (8)$$

where *α*′, *β*′, and *γ*′ are arbitrary scalar values.

As the circular points are fixed under a similarity transformation, the aforementioned selection of the vertices of the parallelogram and the rectangle does not lose generality.

**Theorem 1** *The rotation matrix between two rectified cameras related to the same parallelogram is* **I** *or* diag(−1, −1, 1).

*Proof:* Without loss of generality, it is assumed that the plane supporting the parallelogram is on *Z* = 0 of the world coordinate system and that [**r**_{1}, **r**_{2}, **r**_{3}, **t**] is the rotation and translation which relate the world coordinate system to the camera coordinate system. It can then be seen that

$$\mu\,{\mathbf{H}}_{i} = {\mathbf{K}}_{ri}\left[{\mathbf{r}}_{1},{\mathbf{r}}_{2},\mathbf{t}\right], \qquad (9)$$

where **H**_{i} corresponds to Eq. (6), **K**_{ri} corresponds to Eq. (8), and *μ* is an arbitrary scale factor.

By substituting Eq. (6) and Eq. (8) into Eq. (9), the result is

$$\left[{\mathbf{r}}_{1},{\mathbf{r}}_{2},\mathbf{t}\right] = {\mu}^{\prime}\begin{bmatrix} 1 & 0 & {\alpha}^{\prime\prime} \\ 0 & 1 & {\beta}^{\prime\prime} \\ 0 & 0 & {\gamma}^{\prime\prime} \end{bmatrix}, \qquad (10)$$

where *α*″, *β*″, and *γ*″ are arbitrary scalar values.

Considering that ||**r**_{1}|| = ||**r**_{2}|| = 1 and **r**_{3} = **r**_{1} × **r**_{2}, it is given that [**r**_{1}, **r**_{2}, **r**_{3}] = **R**_{I} or **R**_{Ī}, where ${\mathbf{R}}_{I} = \mathbf{I}$ and ${\mathbf{R}}_{\overline{I}} = \text{diag}(-1,-1,1)$. Since the rotation matrix between two rectified cameras is one of ${\mathbf{R}}_{I}^{T}{\mathbf{R}}_{I}$, ${\mathbf{R}}_{\overline{I}}^{T}{\mathbf{R}}_{\overline{I}}$, ${\mathbf{R}}_{I}^{T}{\mathbf{R}}_{\overline{I}}$, and ${\mathbf{R}}_{\overline{I}}^{T}{\mathbf{R}}_{I}$, it is **I** or diag(−1, −1, 1). □

**Theorem 2** *Assume that the parallelograms imaged in two views correspond to the same parallelogram which actually exists in 3D, and that the rectified parallelograms, which are rectangles, in each view have the aspect ratios* ${a}_{1}$ *and* ${a}_{2}$*, respectively; then* ${\lambda}_{y} = \left({a}_{2}/{a}_{1}\right){\lambda}_{x}$ *in Eq. (4).*

*Proof:* Let **K**_{r1} and **K**_{r2} be the camera calibration matrices of the two rectified cameras. Assume **R**_{r} is the rotation matrix between the two rectified cameras. From Theorem 1 and Eq. (8), ${\mathbf{H}}_{\infty}^{r}$ can be computed as follows [21]:

$${\mathbf{H}}_{\infty}^{r} = {\mathbf{K}}_{r2}\,{\mathbf{R}}_{r}\,{\mathbf{K}}_{r1}^{-1} \cong \begin{bmatrix} 1 & 0 & {u}^{\prime} \\ 0 & {a}_{2}/{a}_{1} & {v}^{\prime} \\ 0 & 0 & {w}^{\prime} \end{bmatrix}, \qquad (11)$$

where *u*′, *v*′, and *w*′ are arbitrary scalar values. Comparing Eq. (11) with Eq. (4) gives ${\lambda}_{y} = \left({a}_{2}/{a}_{1}\right){\lambda}_{x}$. □

As mentioned above, it is worthwhile to note that ${\mathbf{H}}_{\infty}^{r}$ is independent of the angle between the two sides of the parallelogram, *θ*, and of the aspect ratio of the parallelogram, *r*. From Theorem 2, if the two cameras observe the same parallelogram which actually exists in 3D, the additional linear constraint ${\lambda}_{y} = \left({a}_{2}/{a}_{1}\right){\lambda}_{x}$ can be inserted in Eq. (5), because the aspect ratios ${a}_{1}$ and ${a}_{2}$ are values that can be chosen arbitrarily. If a parallelogram does not actually exist in 3D, it is not possible to assume that *r* is identical for each camera due to the parallax of the two cameras (refer to Figs. 1(b) and 1(c)), and the constraint cannot be used.
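Theorem 2 can be checked numerically. The sketch below fabricates an actual (planar) parallelogram and two cameras with assumed intrinsics and poses, computes the rectification homographies by DLT, and verifies that the rectified infinite homography has the sparse form of Eq. (4) with ${\lambda}_{y}/{\lambda}_{x} = {a}_{2}/{a}_{1}$. All numeric values here are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def homography_dlt(src, dst):
    """Exact homography from 4 correspondences (DLT)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        p = [x, y, 1.0]
        rows.append([0.0] * 3 + [-c for c in p] + [v * c for c in p])
        rows.append(p + [0.0] * 3 + [-u * c for c in p])
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 3)

def project(K, R, C, X):
    """Pinhole projection x = K R (X - C), returned inhomogeneous."""
    x = K @ (R @ (X - C))
    return x[:2] / x[2]

# An actual parallelogram on the plane Z = 0 (theta, r_p arbitrary).
theta, r_p = np.deg2rad(60.0), 0.7
d = r_p * np.array([np.cos(theta), np.sin(theta), 0.0])
P3d = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]),
       np.array([1.0, 0.0, 0.0]) + d, d]

# Two cameras with assumed intrinsics and poses.
K1 = np.array([[1200.0, 0, 512], [0, 1000.0, 384], [0, 0, 1.0]])
K2 = np.array([[900.0, 0, 500], [0, 950.0, 400], [0, 0, 1.0]])
c10, s10 = np.cos(np.deg2rad(10)), np.sin(np.deg2rad(10))
R1, C1 = np.eye(3), np.array([0.5, 0.3, -5.0])
R2 = np.array([[c10, 0, s10], [0, 1, 0], [-s10, 0, c10]])
C2 = np.array([1.5, 0.2, -5.0])

# Rectify each view to rectangles with freely chosen aspect ratios.
a1, a2 = 0.8, 1.3
rect = lambda a: [(0.0, 0.0), (1.0, 0.0), (1.0, a), (0.0, a)]
Hr1 = homography_dlt([project(K1, R1, C1, X) for X in P3d], rect(a1))
Hr2 = homography_dlt([project(K2, R2, C2, X) for X in P3d], rect(a2))

H_inf = K2 @ R2 @ R1.T @ np.linalg.inv(K1)   # true infinite homography
M = Hr2 @ H_inf @ np.linalg.inv(Hr1)         # rectified H_inf, cf. Eq. (5)
M = M / M[0, 0]                              # normalize so lambda_x = 1
```

After normalization, the off-diagonal entries of the first two columns should vanish and `M[1, 1]` should equal the aspect-ratio quotient `a2 / a1`.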

Assume that *m* parallelograms are viewed by two cameras and that ${m}_{r}$ (≤ *m*) is the number of actual parallelograms. From Eq. (5) and the additional constraint, the infinite homography, **H**_{∞}, can be computed by solving the set of homogeneous equations

$$\mathbf{A}\,{\left[{H}_{11},{H}_{12},\cdots,{H}_{33},{\lambda}_{x}^{1},\left({\lambda}_{y}^{1}\right),{u}^{1},{v}^{1},{w}^{1},\cdots,{\lambda}_{x}^{m},\left({\lambda}_{y}^{m}\right),{u}^{m},{v}^{m},{w}^{m}\right]}^{T} = \mathbf{0}, \qquad (12)$$

where **A** is a 9*m* × (9 + 5*m* − ${m}_{r}$) matrix composed of the elements of the rectification homographies, ${\mathbf{H}}_{r1}^{k}$ and ${\mathbf{H}}_{r2}^{k}$, and ${H}_{ij}$ is the (*i*, *j*) element of **H**_{∞}. The parentheses in Eq. (12) indicate that if the *k*th parallelogram is an actual parallelogram, the variable ${\lambda}_{y}^{k}$ is not necessary due to the additional constraint.
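A sketch of assembling and solving this stacked system in NumPy is given below. The column layout mirrors Eq. (12): `ratios[k]` plays the role of ${a}_{2}/{a}_{1}$ for an actual parallelogram and eliminates the corresponding ${\lambda}_{y}^{k}$. The synthetic test harness is an assumption for illustration, not the paper's experiments.

```python
import numpy as np

def solve_H_inf(pairs, ratios):
    """Solve the stacked homogeneous system of Eq. (12): for each
    parallelogram k, (H_rect^k @ Hr1_k) - (Hr2_k @ H) = 0, where
    H_rect^k = [[lx,0,u],[0,ly,v],[0,0,w]] as in Eq. (4).  If ratios[k]
    is not None, it is a2/a1 and ly is replaced by ratios[k] * lx."""
    cols = 9 + sum(4 if r is not None else 5 for r in ratios)
    A = np.zeros((9 * len(pairs), cols))
    base, row = 9, 0
    for (Hr1, Hr2), ratio in zip(pairs, ratios):
        con = ratio is not None
        i_lx = base
        if con:
            i_ly, i_u, i_v, i_w = None, base + 1, base + 2, base + 3
        else:
            i_ly, i_u, i_v, i_w = base + 1, base + 2, base + 3, base + 4
        for i in range(3):
            for j in range(3):
                A[row, [j, 3 + j, 6 + j]] = Hr2[i, :]  # +(Hr2 @ H)_ij
                if i == 0:        # -(lx * Hr1[0,j] + u * Hr1[2,j])
                    A[row, i_lx] -= Hr1[0, j]
                    A[row, i_u] -= Hr1[2, j]
                elif i == 1:      # -(ly * Hr1[1,j] + v * Hr1[2,j])
                    if con:
                        A[row, i_lx] -= ratio * Hr1[1, j]
                    else:
                        A[row, i_ly] -= Hr1[1, j]
                    A[row, i_v] -= Hr1[2, j]
                else:             # -(w * Hr1[2,j])
                    A[row, i_w] -= Hr1[2, j]
                row += 1
        base += 4 if con else 5
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1][:9].reshape(3, 3)

# Synthetic data: two actual parallelograms (m = 2, both constrained).
rng = np.random.default_rng(3)
H_true = rng.standard_normal((3, 3))
pairs, ratios = [], []
for _ in range(2):
    Hr1 = rng.standard_normal((3, 3))
    lx, u, v, w = rng.standard_normal(4)
    ratio = 1.0 + rng.random()          # a2/a1 for this parallelogram
    H_rect = np.array([[lx, 0, u], [0, ratio * lx, v], [0, 0, w]])
    pairs.append((Hr1, H_rect @ Hr1 @ np.linalg.inv(H_true)))
    ratios.append(ratio)
H_est = solve_H_inf(pairs, ratios)
```

With both parallelograms constrained, two parallelograms already determine **H**_{∞} up to scale, which is the point of the additional constraint.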

#### 3.4. Comparison with related works

As mentioned in Section 3.1, four vanishing points, or three vanishing points together with the fundamental matrix, are required to obtain the infinite homography. However, it is difficult to acquire more than three vanishing points giving independent constraints in general man-made scenes, as in Fig. 3, and it is also cumbersome to obtain many point correspondences between all view pairs for the fundamental matrix estimation. Even in that case, the proposed method can obtain the infinite homography due to the additional constraints arising from the actual parallelograms. A degenerate case of the proposed method occurs only when all parallelograms are on planes parallel to each other. Moreover, even if there are four vanishing points, the additional constraints can improve the accuracy of camera calibration when metric constraints are not sufficient. This issue is explored further in Sections 7.1.2 and 7.2.1.

However, these results can also be acquired from the method of [13], because a parallelepiped also gives only three vanishing points. This is due to the fact that coplanarity constraints are merged into the canonical projection matrix suggested in [13]. However, as mentioned previously, the proposed method can deal with general positions of parallelograms in practical situations due to the additional constraints and, consequently, is free from the restriction that the parallelograms should be the faces of a parallelepiped.

Additional quadratic constraints can also be derived from parallelograms and used to calibrate camera parameters directly [15]. However, the metric information of a parallelogram is needed to convert these non-linear constraint equations into linear ones; for example, the parallelogram should be a diamond or a rectangle. In fact, those linear constraint equations can be derived in the proposed method as well: they are equivalent to the constraint equations in Section 5.1 when a parallelogram's two sides are equal in length or orthogonal, which means that the parallelogram is a diamond or a rectangle.

## 4. Reconstruction up to affine transformation

Once the infinite homographies, ${\mathbf{H}}_{\infty}^{0i}$, between one reference view and the *i*th view are computed, the camera matrices of an affine reconstruction are **P**_{0} = [**I**|**0**] and ${\mathbf{P}}_{i}=\left[{\mathbf{H}}_{\infty}^{0i}|{\mathbf{t}}_{i}\right]$ [20]. If the *i*th view does not share images of two or more parallelograms with the reference view, some manipulation may be needed using an intermediate view. If the infinite homographies ${\mathbf{H}}_{\infty}^{ji}$ and ${\mathbf{H}}_{\infty}^{0j}$ are given, ${\mathbf{H}}_{\infty}^{0i}$ can be computed as follows:

$${\mathbf{H}}_{\infty}^{0i} \cong {\mathbf{H}}_{\infty}^{ji}\,{\mathbf{H}}_{\infty}^{0j}. \qquad (13)$$

The remaining unknowns of the affine reconstruction are the translations **t** and the 3D points **X̃**. The equation for point projection is

$${\mathbf{x}}_{ij} \times \left({\mathbf{H}}_{\infty}^{0i}\,{\tilde{\mathbf{X}}}_{j} + {\mathbf{t}}_{i}\right) = \mathbf{0}, \qquad (14)$$

where ${\mathbf{x}}_{ij}$ is the homogeneous coordinate of the *j*th point imaged in the *i*th view. A set of 3*nm* equations in 3*n* + 3(*m* − 1) unknowns is generated with *m* views and *n* 3D points. It is worthwhile to note that **t** of the reference view **P**_{0} is assumed to be **0** without loss of generality. These equations can be solved linearly to obtain **t** and **X̃** of an affine reconstruction. It is worthwhile to note that all 3D points need not be visible in all views in this process.
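The linear solve for **t** and **X̃** can be sketched as follows. The cross-product form of the projection equations follows the description above; the synthetic affine cameras and points are illustrative assumptions.

```python
import numpy as np

def skew(p):
    """Cross-product matrix [p]_x."""
    x, y, z = p
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def affine_reconstruct(obs, Hs):
    """Linear affine reconstruction with P_0 = [I|0], P_i = [H_i | t_i]:
    stack x_ij x (H_i X_j + t_i) = 0 over all views and points and take
    the SVD null vector.  Hs[0] must be the identity; t_0 = 0 fixes the
    reference view.  obs[i][j] is the homogeneous image point (any scale)."""
    m, n = len(Hs), len(obs[0])
    A = np.zeros((3 * m * n, 3 * n + 3 * (m - 1)))
    r = 0
    for i in range(m):
        for j in range(n):
            S = skew(obs[i][j])
            A[r:r + 3, 3 * j:3 * j + 3] = S @ Hs[i]   # coefficients of X_j
            if i > 0:
                c = 3 * n + 3 * (i - 1)
                A[r:r + 3, c:c + 3] = S               # coefficients of t_i
            r += 3
    _, _, Vt = np.linalg.svd(A)
    x = Vt[-1]
    return x[:3 * n].reshape(n, 3), x[3 * n:].reshape(m - 1, 3)

# Synthetic scene: 6 points, 3 views (H_0 = I, t_0 = 0 by convention).
rng = np.random.default_rng(4)
X_true = rng.standard_normal((6, 3))
Hs = [np.eye(3)] + [rng.standard_normal((3, 3)) for _ in range(2)]
ts = [np.zeros(3)] + [rng.standard_normal(3) for _ in range(2)]
obs = [[Hs[i] @ X + ts[i] for X in X_true] for i in range(3)]
X_est, t_est = affine_reconstruct(obs, Hs)
```

The solution is determined only up to one global scale, matching the counting of 3*nm* equations in 3*n* + 3(*m* − 1) unknowns.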

#### 4.1. Parameter reduction using affine invariance

If there are parallelograms or parallelepipeds in the views, the number of variables can be reduced with geometric constraints. In the affine reconstruction process, all corner points of parallelograms or parallelepipeds are represented by one reference point and two or three vectors. Moreover, when using the above affine reconstruction formulation, the coordinates of a vanishing point in the first view can be considered as a direction vector that is parallel to the line segments corresponding to this vanishing point. By representing the points on the edges and surfaces of parallelograms and parallelepipeds as weighted sums of these vectors, the coplanarity and collinearity constraints can be satisfied. If the length ratios between line segments parallel to each other are given, these constraints can be implemented as well using ratios of the scalar weights. All these constraints are affine invariant.

Let **D** be a column vector containing the scalar weights for the above direction vectors and the coordinates of the 3D points which cannot be represented by a weighted sum of these vectors. Then **X̃** can be represented by

$$\tilde{\mathbf{X}} = \mathbf{G}\,\mathbf{D}, \qquad (15)$$

where **G** is a known matrix built from the direction vectors. Once **D** is estimated, **X̃** can be obtained by Eq. (15). Due to this parameterization, the reconstructed vertices constitute parallelograms and parallelepipeds exactly, and the coplanarity and collinearity constraints for the points on the edges and surfaces of parallelograms and parallelepipeds are exactly satisfied.

If non-linear optimization is performed to refine the linear estimation results, this parameter reduction helps satisfy the geometric constraints described in this subsection. If these constraints are not embedded through the reduction, extra complex terms describing the geometric constraints between the 3D points would have to be added to the cost function.

#### 4.2. Comments on linear reconstruction formula

The linear reconstruction formula described above is similar to the method presented in [12]. In that work, a reference plane visible in all views is needed for linear projective reconstruction. If three orthogonal vanishing points, which span the plane at infinity, are detected in all views, the internal parameters and a metric reconstruction can be obtained linearly. It was assumed that the camera had zero skew and a known aspect ratio, or fixed internal parameters. However, the assumptions about the aspect ratio and fixed internal parameters are not valid for arbitrary archival images or images gathered from the internet while modelling a scene.

A problem of the methods utilizing orthogonal vanishing points, as in [12] and [9], is the degenerate case where one or two of the vanishing points are at infinity. This case often occurs when the parallel lines are nearly parallel to the image plane. However, this problem does not occur in the proposed method while estimating the infinite homography and reconstructing the affine structure from parallelism. Moreover, orthogonality can still be used without omission in the process of the metric reconstruction described in the next section.

The structure parameter reduction is similar to that of [11], which presents a single-view-based approach, and is also suggested in the linear step of [14]. However, in [14], given that the calibration and rotation parameters are estimated previously by other calibration methods, the parameterization using the scene geometry is done in metric space, not in affine space. It is also straightforward that the regularity and symmetry information used in [14] can be implemented during the overall process of the proposed method by using the constraints of length ratios in Section 4.1 and orthogonality in Section 5.1. This means that, in the proposed method, full geometric information contributes to the camera calibration and reconstruction simultaneously because the calibration results are obtained at the last step.

## 5. Upgrade to metric reconstruction

Let ${\mathbf{H}}_{E}^{A}$ be the projective transformation from metric to affine space, which has the following form:

$${\mathbf{H}}_{E}^{A} = \begin{bmatrix} \mathbf{A} & \mathbf{0} \\ {\mathbf{0}}^{T} & 1 \end{bmatrix}. \qquad (16)$$

The goal is to obtain constraints on **A**. This is equivalent to obtaining constraints on the absolute conic ${\mathrm{\Omega}}_{\infty}^{A}$ in affine space: **A** can be obtained from ${\mathrm{\Omega}}_{\infty}^{A}\left(={\mathbf{A}}^{-T}{\mathbf{A}}^{-1}\right)$ by Cholesky factorization. The constraints described in the following subsections can be used to estimate the matrix ${\mathrm{\Omega}}_{\infty}^{A}$.

#### 5.1. Constraints from scene geometry

Assume that there are four points, ${\tilde{\mathbf{X}}}_{Ei}$, for *i* = 1, ⋯ ,4, in metric space and two vectors, ${\mathbf{d}}_{E1} = {\tilde{\mathbf{X}}}_{E2} - {\tilde{\mathbf{X}}}_{E1}$ and ${\mathbf{d}}_{E2} = {\tilde{\mathbf{X}}}_{E4} - {\tilde{\mathbf{X}}}_{E3}$. If the four points in affine space corresponding to the above four points are ${\tilde{\mathbf{X}}}_{Ai}$, for *i* = 1, ⋯ ,4, and ${\mathbf{d}}_{A1} = {\tilde{\mathbf{X}}}_{A2} - {\tilde{\mathbf{X}}}_{A1}$ and ${\mathbf{d}}_{A2} = {\tilde{\mathbf{X}}}_{A4} - {\tilde{\mathbf{X}}}_{A3}$, then

$${\mathbf{d}}_{E1} = {\mathbf{A}}^{-1}{\mathbf{d}}_{A1}, \quad {\mathbf{d}}_{E2} = {\mathbf{A}}^{-1}{\mathbf{d}}_{A2}. \qquad (17)$$

Assume that ${\mathbf{d}}_{E1}$ and ${\mathbf{d}}_{E2}$ are orthogonal to each other in metric space. Then, since ${\mathbf{d}}_{E1}^{T}{\mathbf{d}}_{E2}=0$, we can obtain the following constraint:

$${\mathbf{d}}_{A1}^{T}\,{\mathrm{\Omega}}_{\infty}^{A}\,{\mathbf{d}}_{A2} = 0. \qquad (18)$$

If we know the ratio of $\|{\mathbf{d}}_{E1}\|$ to $\|{\mathbf{d}}_{E2}\|$ as *r*, then

$${\mathbf{d}}_{A1}^{T}\,{\mathrm{\Omega}}_{\infty}^{A}\,{\mathbf{d}}_{A1} = {r}^{2}\,{\mathbf{d}}_{A2}^{T}\,{\mathrm{\Omega}}_{\infty}^{A}\,{\mathbf{d}}_{A2}. \qquad (19)$$

If we know the angle between ${\mathbf{d}}_{E1}$ and ${\mathbf{d}}_{E2}$ as *θ*, then

$${\left({\mathbf{d}}_{A1}^{T}\,{\mathrm{\Omega}}_{\infty}^{A}\,{\mathbf{d}}_{A2}\right)}^{2} = {\cos}^{2}\theta\,\left({\mathbf{d}}_{A1}^{T}\,{\mathrm{\Omega}}_{\infty}^{A}\,{\mathbf{d}}_{A1}\right)\left({\mathbf{d}}_{A2}^{T}\,{\mathrm{\Omega}}_{\infty}^{A}\,{\mathbf{d}}_{A2}\right). \qquad (20)$$
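Stacking such constraints gives a small homogeneous system in the six parameters of the symmetric matrix ${\mathrm{\Omega}}_{\infty}^{A}$. The NumPy sketch below uses only orthogonality constraints on synthetic data (the ground-truth **A** and the orthogonal direction pairs are fabricated assumptions) and then recovers **A** by Cholesky factorization, up to scale and rotation.

```python
import numpy as np

def omega_row(d1, d2):
    """Coefficients of d1^T Omega d2 in the 6 parameters of the
    symmetric matrix Omega = [[w1,w2,w3],[w2,w4,w5],[w3,w5,w6]]."""
    a, b, c = d1
    p, q, s = d2
    return np.array([a * p, a * q + b * p, a * s + c * p,
                     b * q, b * s + c * q, c * s])

def unpack(w):
    return np.array([[w[0], w[1], w[2]],
                     [w[1], w[3], w[4]],
                     [w[2], w[4], w[5]]])

# Ground-truth affine-to-metric A (an assumption for this synthetic test).
rng = np.random.default_rng(5)
A_true = rng.standard_normal((3, 3))
rows = []
for _ in range(5):                              # 5 orthogonality constraints
    e1 = rng.standard_normal(3)
    e2 = np.cross(e1, rng.standard_normal(3))   # e2 orthogonal to e1 (metric)
    d1, d2 = A_true @ e1, A_true @ e2           # corresponding affine directions
    rows.append(omega_row(d1, d2))              # orthogonality constraint of Sec. 5.1
_, _, Vt = np.linalg.svd(np.array(rows))
Omega = unpack(Vt[-1])
Omega *= np.sign(Omega[0, 0])                   # fix sign: Omega positive definite
A_rec = np.linalg.cholesky(np.linalg.inv(Omega))  # A up to scale and rotation
```

Five independent constraints suffice because ${\mathrm{\Omega}}_{\infty}^{A}$ has six parameters defined up to scale; the recovered **A** differs from the true one by a similarity, as expected of a metric reconstruction.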

#### 5.2. Constraints from partial knowledge of camera parameters

The absolute dual quadric (ADQ), ${\mathbf{Q}}_{\infty}^{*A}$, in affine space can be computed from ${\mathbf{H}}_{E}^{A}{\mathbf{Q}}_{\infty}^{*E}{{\mathbf{H}}_{E}^{A}}^{T}$ [20], where ${\mathbf{Q}}_{\infty}^{*E}$ is the ADQ in metric space, and has the following form:

$${\mathbf{Q}}_{\infty}^{*A} = \begin{bmatrix} \mathbf{A}{\mathbf{A}}^{T} & \mathbf{0} \\ {\mathbf{0}}^{T} & 0 \end{bmatrix}. \qquad (21)$$

The IAC, ${\omega}_{i}$, in the *i*th view can be related to ${\mathbf{Q}}_{\infty}^{*A}$ as follows [20]:

$${\omega}_{i}^{-1} \cong {\mathbf{P}}_{i}\,{\mathbf{Q}}_{\infty}^{*A}\,{\mathbf{P}}_{i}^{T} = {\mathbf{H}}_{\infty}^{0i}\,\mathbf{A}{\mathbf{A}}^{T}\,{{\mathbf{H}}_{\infty}^{0i}}^{T}. \qquad (22)$$

If the internal camera parameters are static, we can set all {${\omega}_{i}$} to be *ω* in Eq. (22), and then *ω* can be eliminated using the relation $\omega ={\mathrm{\Omega}}_{\infty}^{A}$ in the reference view. This results in linear equations on ${\mathrm{\Omega}}_{\infty}^{A}$ provided ${\mathbf{H}}_{\infty}^{0i}$ is normalized as $\text{det}\left({\mathbf{H}}_{\infty}^{0i}\right)=1$.

Knowing that the pixels are rectangular (zero camera skew), that the principal point is at the origin, and that the aspect ratio is known as *r*, the following linear equations on ${\mathrm{\Omega}}_{\infty}^{A}$ can be obtained from Eq. (22):

$${\left({\mathbf{M}}_{i}\right)}_{12} = 0, \quad {\left({\mathbf{M}}_{i}\right)}_{13} = 0, \quad {\left({\mathbf{M}}_{i}\right)}_{23} = 0, \quad {\left({\mathbf{M}}_{i}\right)}_{11} = {r}^{2}{\left({\mathbf{M}}_{i}\right)}_{22}, \qquad (23)$$

where ${\mathbf{M}}_{i} = {{\mathbf{H}}_{\infty}^{0i}}^{-T}\,{\mathrm{\Omega}}_{\infty}^{A}\,{{\mathbf{H}}_{\infty}^{0i}}^{-1}$.

#### 5.3. Comparison with related works

The constraints in Sections 5.1 and 5.2 can be derived from basic knowledge of projective geometry. These constraints are essentially equivalent to the constraints in [13], which are extracted from prior knowledge about the intrinsic parameters of cameras and parallelepipeds. The difference is that, in the proposed method, the 3D points used to derive the constraints are not only the vertices of parallelograms or parallelepipeds but also any 3D points in a scene.

## 6. Outline of the algorithm

In summary, the complete algorithm is composed of the following steps:

1. Compute the rectification homographies for all parallelograms commonly viewed by the cameras.
2. Establish a linear equation system on **H**_{∞} and the ${\mathbf{H}}_{\infty}^{{r}_{k}}$'s for camera pairs and solve it.
3. Compute the infinite homographies between one reference view and the other views.
4. Establish a linear equation system on **D** and **t** based on the affine-invariant geometric constraints related to parallelograms and parallelepipeds and solve it.
5. Construct a linear equation system on ${\mathrm{\Omega}}_{\infty}^{A}$ based on prior knowledge of the scene geometry and the intrinsic camera parameters and solve it.
6. Extract **A** from ${\mathrm{\Omega}}_{\infty}^{A}$ using Cholesky factorization.
7. Upgrade the affine reconstruction results obtained from step 4 to metric ones with ${\mathbf{H}}_{E}^{A}$.

## 7. Experimental results

#### 7.1. Results on synthetic data

##### 7.1.1. Performance evaluation

Simulated experiments were performed in order to carefully analyze the performance of the algorithm under various noise magnitudes, parallelogram sizes, and singular configurations. The scenario is shown in Fig. 4. Tests were performed with synthetic 1024×768 images, taken by three cameras with the following intrinsic parameters: (${f}_{u}$, ${f}_{v}$, *s*, ${u}_{0}$, ${v}_{0}$) = (1200, 1000, 0, 512, 384). Two parallelograms were placed in front of the three cameras and arranged so that there are only three vanishing points. The distance from the cameras to the parallelograms was about 8 m. The cameras were placed horizontally and had arbitrary roll angles. The distances between neighboring cameras were about 6 m. The constraints used in this experiment were the orthogonality of the edges of the parallelograms and the zero skew of the cameras.

The data normalization described in [22] was used for all experiments in this paper. For each parameter setting, 1,000 independent trials were performed and the results shown are the averages. The estimated results are relative quantities and determined up to scale. To obtain the position error, the estimated positions of the vertices of the parallelograms were fitted to the ground truth using the method of [23].

First, the performance was evaluated with respect to the noise magnitude. Zero-mean uniformly distributed noise over the [−*n*, *n*] pixel interval was added to the image projections of the vertices of the parallelograms, where *n* = 0.0, 0.25, 0.5,⋯ ,2. The edge size of the parallelograms and the angle between the planes of the two parallelograms were set to 1.5 m and 120°, respectively. The results are shown in Figs. 5(a) and 5(d). We can see that the errors increase linearly with the noise magnitude.

Second, we tested the performance while varying the edge size of the parallelograms from 0.5 m to 2.0 m. In these experiments, the angle between the planes of the two parallelograms was set to 120° and zero-mean uniformly-distributed noise over the interval [−0.5, 0.5] pixel was added. The results are shown in Figs. 5(b) and 5(e). We can see that the algorithm acquires reasonable results for edge sizes over 1.0 m.

Third, we tested the performance while varying the angle between the planes of the two parallelograms from 50° to 170°. In these experiments, the edge size of the parallelograms was set to 1.5 m and zero-mean uniformly-distributed noise over the interval [−0.5, 0.5] pixel was added. The results are shown in Figs. 5(c) and 5(f). We can see that the results are unstable when the angle is near 100° or 170°. In the case of 100°, the parallelograms are nearly perpendicular to the image planes of the side cameras. In the case of 170°, the two parallelograms are almost parallel.

##### 7.1.2. Effect of additional constraints

As mentioned in Section 3.4, the additional constraints can contribute to obtaining the infinite homography even in the case of three vanishing points. In these simulated experiments, the accuracy gained from the additional constraints is evaluated: the estimated camera parameters are assessed when there is no geometric information except for two parallelograms in a scene. The prior knowledge used is that the cameras have static intrinsic parameters.

Comparisons were made with the following algorithms:

- **Corr**: The method using vanishing point correspondences.
- **NAC**: The method using the proposed framework without the additional constraints.
- **AC**: The proposed method including the additional constraints.
- **F**: Using the infinite homography obtained through the fundamental matrix estimation and the projective reconstruction of the cameras and the plane at infinity [20].
- **Zhang**: The classical method of Zhang [2].

The simulated camera had the following parameters: ${f}_{u}$ = 1,200, ${f}_{v}$ = 1,000, *s* = 0, ${u}_{0}$ = 512, and ${v}_{0}$ = 384. The resolution of the simulated image was 1024 × 768. Two parallelograms of side 1 m were placed around the origin. The angle between the planes containing the parallelograms was 90°.

Zero-mean uniformly distributed noise over the [−*n*, *n*] pixel interval was added to the image projections of the vertices of the parallelograms, where *n* = 0.25, 0.5, 0.75,⋯ ,2. For each method and noise level, 1,000 independent trials were performed and the results shown are the averages. For each trial, three images were taken from the cameras randomly placed 5 m ahead of the origin and pointing towards the origin so that the faces of the parallelograms could be viewed. The number of features for the fundamental matrix estimation of **F** was 20, and the features were randomly distributed over the 3D region containing the two parallelograms. For **Zhang**, it was assumed that the metric information of the vertices of the parallelograms was given. The estimation results of these experiments are shown in Fig. 6. Since the performances are highly similar both for ${f}_{u}$ and ${f}_{v}$ and for ${u}_{0}$ and ${v}_{0}$, the estimation results for ${f}_{v}$ and ${v}_{0}$ are not depicted here.
The results with four vanishing point correspondences are shown in Figs. 6(a), 6(b), and 6(c). If none of the sides of the parallelograms are parallel to the plane containing the other parallelogram, there are four vanishing point correspondences, and three of these are not collinear. It is clear that the performance of **AC** is superior to that of **Corr** and **NAC**. It is worthwhile to note that the performances of **Corr** and **NAC** are similar, as expected in Section 3.2. The results of **AC** are similar to those of **F**, which requires the fundamental matrix estimation from many feature correspondences.

If one side of one of the parallelograms is parallel to one side of the other parallelogram, as in Fig. 4, only three vanishing points exist. A comparison between the performances of the methods when the number of vanishing points is three or four is shown in Figs. 6(d), 6(e), and 6(f). Since the case of three vanishing points is a degenerate case for **Corr** and **NAC**, no results are depicted for these methods. However, **AC** and **F** can provide a solution even in the case of three vanishing points. It was found that the estimation results of **AC** are nearly identical for the two cases. The estimation results of **F** with three vanishing points are degraded compared to the case of four vanishing points. From these results, it can be concluded that **AC** is superior to **F** in practical situations in which there are only three vanishing points.

The camera parameters from **AC** are comparable to those of **Zhang**. However, it is worthwhile to note that **AC** uses only parallelism and does not require the metric information that is necessary when using **Zhang**.

### 7.2. Results on real images

Various experiments with real images were also performed to test the algorithms. All line segments in the images were extracted automatically [24] (see Fig. 7), and the line segments corresponding to the parallelograms’ sides were selected manually. The vertices of the parallelograms were extracted from the intersections of these lines.
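Computing a vertex as the intersection of two side lines is convenient in homogeneous coordinates, where both the line through two points and the intersection of two lines reduce to cross products. A minimal sketch (the coordinates and helper names are illustrative, not from the paper):

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersect(l1, l2):
    """Intersection of two homogeneous lines as an (x, y) point
    (assumes the lines are not parallel in the image)."""
    x = np.cross(l1, l2)
    return x[:2] / x[2]

# Vertex of a parallelogram from two of its (extended) sides
side1 = line_through((0.0, 0.0), (10.0, 0.0))  # horizontal side
side2 = line_through((5.0, -3.0), (5.0, 7.0))  # near-vertical side
print(intersect(side1, side2))  # → [5. 0.]
```

Working with homogeneous lines also degrades gracefully when segments only approximately meet, since the intersection of the fitted infinite lines is used rather than the segment endpoints.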

#### 7.2.1. Tower scene

The resolution of the images was 1024 × 768. The images used in this experiment are shown in Fig. 8. The two parallelograms denoted in Fig. 8(a) with white dotted lines were used as the input for the algorithms. The prior information used by the algorithm was that the cameras have static intrinsic parameters. Fig. 8(d) shows the reconstructed model and the camera poses. Rendered new views of the reconstructed model are shown in Figs. 8(e) and 8(f).

Since no ground truth or other reference for the reconstruction results is available in this experiment, the reconstruction accuracy for known geometry that is not used by the algorithm is a useful measure of performance. The line *a* and the plane containing the two lines *b* and *c* depicted in Fig. 8(c) were reconstructed. The angle between the line *a* and the normal to the plane is known to be zero. This angle, as measured with the proposed method, was 2.22°. In this experiment, there were four vanishing points, so the method using vanishing point correspondences could also be applied; with that method, the measurement was 16.81°. From these results, it can be seen that the proposed method gives more accurate calibration results owing to the additional constraints, even in the case of four vanishing points, when only the static camera constraint is used. Had the geometric constraints been sufficient in this case, the results of the two methods would have been comparable.
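As a sketch of how such an evaluation angle can be computed from the reconstruction (the direction vectors below are made up for illustration): the plane normal is the cross product of the directions of *b* and *c*, and the angle between the direction of *a* and this normal follows from the dot product.

```python
import numpy as np

def plane_normal(d1, d2):
    """Unit normal of the plane spanned by two direction vectors."""
    n = np.cross(d1, d2)
    return n / np.linalg.norm(n)

def angle_deg(u, v):
    """Unsigned angle in degrees between two 3D direction vectors."""
    c = abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))

# Hypothetical reconstructed directions for line a and for lines b, c
a = np.array([0.01, 0.02, 1.0])  # direction of line a
b = np.array([1.0, 0.0, 0.0])    # direction of line b (in the plane)
c = np.array([0.0, 1.0, 0.0])    # direction of line c (in the plane)
print(angle_deg(a, plane_normal(b, c)))  # small: a is nearly normal to the plane
```

An accurate reconstruction would drive this angle towards the known value of zero.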

#### 7.2.2. Plant scene

The resolution of the images was 1024 × 768 and the camera parameters were not static while the images were captured. Four captured images are shown in Fig. 9. *Image 0* corresponds to the reference camera. The infinite homographies
${\mathbf{H}}_{\infty}^{01}$,
${\mathbf{H}}_{\infty}^{12}$, and
${\mathbf{H}}_{\infty}^{23}$ are computed in this experiment.

${\mathbf{H}}_{\infty}^{01}$ can be obtained from the two parallelograms corresponding to the frontal face and the right face *B* of the building. However, one parallelogram is insufficient between *Image 1* and *Image 2*, because at least two parallelograms that are not parallel to each other are required to obtain sufficient constraints. In this case, the right and left faces (*B* and *C*) of the building can be treated as the same parallelogram, because the proposed method computes the infinite homography independently of the translational motion of the parallelograms. Thus, faces *A*, *B*, and *C* are used to compute
${\mathbf{H}}_{\infty}^{12}$. In contrast to the method described in [13], the proposed method can use parallelograms and partially overlapped images, so it is more flexible in use. For
${\mathbf{H}}_{\infty}^{23}$, faces *A* and *C* are used. The metric constraints used in this experiment were: orthogonality of the edges of the building, a right angle between line segments *a* and *b*, and the length ratio of *c* to *d*. Fig. 10 shows the reconstructed model and the camera poses from new view positions.

#### 7.2.3. Scene of the *Bank of China*


Five images of the *Bank of China* in Hong Kong were gathered from the internet (see Fig. 11). The resolutions of the images varied from 419 × 783 to 1536 × 2048.

${\mathbf{H}}_{\infty}^{01}$,
${\mathbf{H}}_{\infty}^{02}$,
${\mathbf{H}}_{\infty}^{23}$, and
${\mathbf{H}}_{\infty}^{24}$ are computed using {*A*, *C*}, {*A*, *B*, *D*}, {*D*, *E*}, and {*A*, *D*, *F*, *G*}, respectively. It is worthwhile to note that there are no parallelepipeds of which at least six vertices are visible across *Image 0* and *Image 2*, so the method of [13] cannot be applied; however, common parallelograms can be found. The same holds for the pair of *Image 2* and *Image 3*. {*A*, *G*} and {*D*, *F*} can be considered as the same parallelograms, respectively, in the estimation of
${\mathbf{H}}_{\infty}^{24}$, as explained in Section 7.2.2. The metric constraints used in this experiment were: orthogonality of the edges of the building and orthogonality of the crossing diagonal lines. The reconstructed model and the camera poses are shown in Fig. 12.

#### 7.2.4. Scene of the *Casa da Música*


In this section, the reconstruction of the *Casa da Música* in Porto is presented. Ten images were collected from the internet (see Fig. 13). The resolutions of the images varied from 640 × 480 to 2313 × 2736. It is also noted that there are no parallelepipeds of which at least six vertices are visible across the images, so the method of [13] cannot be applied.

The infinite homographies
${\mathbf{H}}_{\infty}^{0i}$ (*i* = 1,⋯ ,5) were computed using the faces *A*, *B*, and *C*. The infinite homographies
${\mathbf{H}}_{\infty}^{67}$,
${\mathbf{H}}_{\infty}^{78}$, and
${\mathbf{H}}_{\infty}^{89}$ were based on the faces *D*, *E*, *F*, and *G*. In this experiment, since two parallelograms commonly viewed across the image groups {0,⋯ ,5} and {6,⋯ ,9} could not be found, the reconstruction process was applied separately to each image group. The two reconstruction results were then merged so that the four vertices *a*, *b*, *c*, and *d* were aligned (see the points depicted in Figs. 13(a), 13(c), 13(h), and 13(i)).
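Merging two metric reconstructions from corresponding points amounts to estimating a similarity transform (scale, rotation, translation); [23] gives a closed-form absolute-orientation solution using orthonormal matrices. The sketch below uses an SVD-based variant of that problem (the point coordinates are illustrative, not the actual vertices *a*–*d*):

```python
import numpy as np

def similarity_align(src, dst):
    """Least-squares similarity (s, R, t) with dst ≈ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding 3D points, N >= 3,
    not all collinear. SVD-based solution of absolute orientation.
    """
    n = src.shape[0]
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    S, D = src - mu_s, dst - mu_d
    cov = D.T @ S / n                       # cross-covariance matrix
    U, sig, Vt = np.linalg.svd(cov)
    d = np.ones(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        d[-1] = -1.0                        # prevent a reflection
    R = U @ np.diag(d) @ Vt
    s = (sig * d).sum() / ((S ** 2).sum() / n)
    t = mu_d - s * R @ mu_s
    return s, R, t

# Four corresponding vertices related by a known similarity transform
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
src = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
dst = 2.0 * src @ R_true.T + np.array([1.0, 2.0, 3.0])
s, R, t = similarity_align(src, dst)
print(round(s, 6))  # → 2.0
```

With four aligned vertices, as in this experiment, the transform is overdetermined and the least-squares fit averages out small reconstruction errors.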

Metric constraints used in this experiment were: right angles for the parallelograms and zero camera skew. The reconstructed model and the camera poses are shown in Fig. 14.

## 8. Conclusions

In this paper, a novel framework was introduced to reconstruct a 3D scene using uncalibrated views and geometric information of the scene. The proposed method is a stratified and linear method. It was shown that all of the constraints derived in the previous works can be implemented in the proposed method.

The proposed method uses the infinite homography estimated together with the additional constraints. The additional constraints arising from the parallelograms become apparent when a novel framework for estimating the infinite homography via rectification of parallelograms is developed. It was shown that the novel approach including the additional constraints has two advantages. First, camera parameters with greater accuracy can be acquired when the geometric constraints are not sufficient. Second, even if only three vanishing points exist in the views, the infinite homography can be computed without the need to estimate the fundamental matrix.

The proposed method was tested with both simulated data and real images to show its advantages. A study of the situations that cannot be dealt with by the previous approaches was presented, and it was shown that the proposed method is more flexible in that it can handle those cases.

## Acknowledgments

This work was supported by the strategic technology development program of MCST/MKE/KEIT [KI001798, Development of Full 3D Reconstruction Technology for Broadcasting Communication Fusion].

## References and links

**1. **R. Tsai, “A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf tv cameras and lenses,” IEEE Trans. Robot. Autom. **3**, 323–344 (1987). [CrossRef]

**2. **Z. Zhang, “A flexible new technique for camera calibration,” IEEE Trans. Pattern Anal. Mach. Intell. **22**, 1330–1334 (2000). [CrossRef]

**3. **J.-H. Kim and B.-K. Koo, “Convenient calibration method for unsynchronized camera networks using an inaccurate small reference object,” Opt. Express **20**, 25292–25310 (2012). [CrossRef] [PubMed]

**4. **M. Pollefeys and L. V. Gool, “Stratified self-calibration with the modulus constraint,” IEEE Trans. Pattern Anal. Mach. Intell. **21**, 707–724 (1999). [CrossRef]

**5. **M. Pollefeys, L. V. Gool, M. Vergauwen, F. Verbiest, K. Cornelis, J. Tops, and R. Koch, “Visual modeling with a hand-held camera,” Int. J. Comput. Vision **59**, 207–232 (2004). [CrossRef]

**6. **T. Moons, L. V. Gool, M. Proesmans, and E. Pauwels, “Affine reconstruction from perspective image pairs with a relative object-camera translation in between,” IEEE Trans. Pattern Anal. Mach. Intell. **18**, 77–83 (1996). [CrossRef]

**7. **P. Hammarstedt, F. Kahl, and A. Heyden, “Affine reconstruction from translational motion under various auto-calibration constraints,” J. Math. Imaging Vis. **24**, 245–257 (2006). [CrossRef]

**8. **L. Agapito, E. Hayman, and I. Reid, “Self-calibration of rotating and zooming cameras,” Int. J. Comput. Vision **45**, 107–127 (2001). [CrossRef]

**9. **R. Cipolla, T. Drummond, and D. P. Robertson, “Camera calibration from vanishing points in images of architectural scenes,” in “*Proc. British Machine Vision Conference*,” (Nottingham, England, 1999), pp. 382–391.

**10. **D. Liebowitz and A. Zisserman, “Combining scene and auto-calibration constraints,” in “*Proc. IEEE International Conference on Computer Vision*,” (Kerkyra, Greece, 1999), pp. 293–300. [CrossRef]

**11. **D. Jelinek and C. J. Taylor, “Reconstruction of linearly parameterized models from single images with a camera of unknown focal length,” IEEE Trans. Pattern Anal. Mach. Intell. **23**, 767–773 (2001). [CrossRef]

**12. **C. Rother and S. Carlsson, “Linear multi view reconstruction and camera recovery using a reference plane,” Int. J. Comput. Vision **49**, 117–141 (2002). [CrossRef]

**13. **M. Wilczkowiak, P. Sturm, and E. Boyer, “Using geometric constraints through parallelepipeds for calibration and 3D modelling,” IEEE Trans. Pattern Anal. Mach. Intell. **27**, 194–207 (2005). [CrossRef] [PubMed]

**14. **E. Grossmann and J. Santos-Victor, “Least-squares 3D reconstruction from one or more views and geometric clues,” Comput. Vis. Image Und. **99**, 151–174 (2005). [CrossRef]

**15. **F. C. Wu, F. Q. Duan, and Z. Y. Hu, “An affine invariant of parallelograms and its application to camera calibration and 3D reconstruction,” in “Proc. European Conference on Computer Vision,” (2006), pp. 191–204.

**16. **L. G. de la Fraga and O. Schutze, “Direct calibration by fitting of cuboids to a single image using differential evolution,” Int. J. Comput. Vision **80**, 119–127 (2009). [CrossRef]

**17. **N. Jiang, P. Tan, and L.-F. Cheong, “Symmetric architecture modeling with a single image,” ACM T. Graphic. (Proc. SIGGRAPH Asia) **28** (2009). [CrossRef]

**18. **F. Mai, Y. S. Hung, and G. Chesi, “Projective reconstruction of ellipses from multiple images,” Pattern Recogn. **43**, 545–556 (2010). [CrossRef]

**19. **K.-Y. K. Wong, G. Zhang, and Z. Chen, “A stratified approach for camera calibration using spheres,” IEEE Trans. Image Process. **20**, 305–316 (2011). [CrossRef]

**20. **R. Hartley and A. Zisserman, *Multiple View Geometry in Computer Vision*, Second Edition (Cambridge University Press, 2003).

**21. **Q.-T. Luong and T. Viéville, “Canonical representations for the geometries of multiple perspective views,” Comput. Vis. Image Und. **64**, 193–229 (1996). [CrossRef]

**22. **R. Hartley, “In defence of the 8-point algorithm,” in “Proc. International Conference on Computer Vision,” (Sendai, Japan, 1995), pp. 1064–1070.

**23. **B. K. P. Horn, H. M. Hilden, and S. Negahdaripour, “Closed-form solution of absolute orientation using orthonormal matrices,” J. Opt. Soc. Am. A **5**, 1127–1135 (1988). [CrossRef]

**24. **D. A. Forsyth and J. Ponce, *Computer Vision: A Modern Approach* (Prentice Hall, 2003).