## Abstract

Stereoscopic imagers are widely used in machine vision for three-dimensional (3D) visualization as well as in non-destructive testing for quantitative characterization of cracks, delamination and other defects. Measurement capability in these systems is provided by a proper combination of optical parameters and data processing techniques. The conventional approach to their design consists of two sequential stages: optical system design and optimization of calibration and image processing algorithms. Such a two-stage procedure often complicates both the hardware and the software, and results in a time-ineffective design procedure and a cost-ineffective solution. We demonstrate a more effective approach and propose to estimate the errors of 3D measurements at the early optical design stage. We show that computer simulation using optical design software allows not only optimizing the optical parameters of the imager but also choosing the most effective mathematical model of the system and the equipment necessary for calibration. We tested the proposed approach on the design of a miniature prism-based stereoscopic system and analyzed the impact of various factors (aberrations, tolerances, etc.) on the image quality as well as on the calibration quality and 3D measurement accuracy. The proposed joint design approach may be highly effective for various measurement systems and applications when both the optical parameters and the image processing algorithms are not defined in advance and need to be optimized.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Stereoscopic systems are widely used in machine vision and non-destructive testing for three-dimensional (3D) imaging and quantitative characterization of cracks, delamination and other defects. As well as other computational imaging systems, they implement a number of digital image processing algorithms including image enhancement and rectification, stereo matching, calibration and 3D reconstruction [1–3]. Therefore, 3D structure reconstruction and measurement capabilities in these systems are provided by a proper combination of the optical parameters and data processing techniques.

Conventional methodology of computational imaging system design includes two sequential stages dedicated to the optimization of optical system and image processing algorithms using different merit functions. Such two-stage procedure often complicates both the hardware and the software, and results in a time-ineffective design procedure and non-optimal cost-ineffective solution [2,4,5]. As shown in [4,6–8], the joint optimization of the optical system parameters and the image processing algorithm may be implemented using optical design software and leads to higher image quality under a fixed design budget compared to conventional component-wise approach.

The joint design concept for stereoscopic systems implies that the evaluation of the entire system performance should be based on 3D measurement errors. It is especially important for complex stereoscopic imagers such as catadioptric [9–11], depth-from-refraction [12,13] and prism-based [14–17] systems, because these systems include optical elements whose parameters directly affect both the image quality and the 3D measurement accuracy via the stereo baseline distance. Therefore, the data processing algorithms and the geometrical model of image formation (the so-called “camera model” [3,18]) should be considered at the stage of optical system design and should be optimized together. Up to now, the application of the joint approach to stereoscopic imager design has not been reported or demonstrated experimentally.

The optimal choice of camera model is often a key condition for precise measurements. It is a compromise between a sufficient description of the ray tracing through the optical system (which requires the model to have more parameters) and reliability (fewer parameters are better). On the one hand, the optical system may be designed with significant residual distortion and reduced cost if the camera model properly accounts for this distortion and allows its digital correction. On the other hand, if the residual distortion is large, its mathematical description may require many parameters, which leads to a complicated calibration procedure, expensive calibration equipment and lower reliability. The choice of an optimal camera model for a specific optical system at the initial design stage is an open problem, which is especially challenging for catadioptric and prism-based stereoscopic systems, optical systems with significant pupil aberrations [19,20] and narrow-band imaging systems [21]. As shown in [22,23], the optical design software Zemax together with MATLAB may be used for the computer simulation of the camera calibration procedure. In particular, this simulation was applied to assess the impact of temperature variations and tolerancing on the pinhole camera model parameters. Optical design software was also used to simulate camera calibration experiments in [20,24].

In this paper, we apply the joint design approach to stereoscopic measurement systems and demonstrate that the computer simulation using the optical design software allows estimating systematic and random errors of 3D measurements and evaluating appropriateness of camera models and calibration equipment. This simulation is necessary to implement the joint design approach for optical system, calibration equipment and software as shown in Fig. 1.

When the conventional approach to stereoscopic imager design is applied, insufficient accuracy or poor reliability of the chosen camera model or calibration procedure may be revealed only after the prototype of the entire system has been assembled and tested. Moreover, experimental studies using a single prototype cannot ensure the appropriateness of the hardware and software for mass production. The joint design approach using computer simulation allows estimating the impact of tolerances of the imaging system and the calibration equipment, temperature variations, image noise and other factors on measurement errors. It may be used to compare calibration procedures, for example, arbitrary displacements and rotations of the calibration target [25] vs. pure translation of the target with a defined step [18], and to optimize the calibration equipment. If the results of the computer simulation are unsatisfactory, it is possible to return to any design stage and change the design parameters of the optical system as well as the camera model.

We apply this strategy to design a miniature prism-based stereoscopic imager that obtains two images of the object from different viewpoints on a single sensor. Due to their simplicity and small dimensions, systems based on this scheme are widely used, for instance, in stereo endoscopes for the inspection of hard-to-reach objects [26,27] (Fig. 2). The prism-based optical system exhibits strong image distortion and pupil aberrations, which cannot be completely corrected by lenses [28] and should be properly accounted for by a camera model. These peculiarities make this system a good example to illustrate the proposed joint design approach.

The next section of the paper gives a summary of camera models, which may be used for calibration of prism-based stereoscopic systems. Additionally, we propose a new model to combine ray tracing through the prism with a radial polynomial for pupil aberrations of the lens. In Section 3, we describe the computer simulation of calibration and measurements using optical design software and utilize it to evaluate camera models and analyze the influence of tolerances and resistance to image noise. By comparing two variants of the optical system, we demonstrate that the optimal choice of camera model depends on lens design parameters. In Section 4, we confirm the results of the computer simulation by the experiments carried out using a prototype of the designed prism-based stereoscopic system.

## 2. Camera models and 3D reconstruction

A camera model is necessary for the 3D reconstruction and image processing algorithms to enable the measurement capabilities of a stereoscopic system. To reconstruct 3D points from their 2D projections on stereo images, this model has to provide the correspondence between 2D image points and 3D rays in object space [3,18]. First, we consider the general camera model for 3D reconstruction in multicamera systems to introduce all necessary notations, and then specify the model for prism-based stereoscopic systems.

The forward mapping for the *i*-th camera can be written as ${p}_{i}={P}_{i}\circ {E}_{i}({l}_{w,i})$, where **l**_{w,i} is the vector of 3D line coordinates in the world coordinate system (WCS), **p**_{i} is the vector of 2D image coordinates in pixels, *i* = 1…*i*_{max}, and *i*_{max} is the number of cameras. The notation *E*_{i} is used for the Euclidean transformation ${x}_{i}={E}_{i}({x}_{w})={R}_{i}{x}_{w}+{t}_{i}$ from WCS to the 3D coordinate system associated with the *i*-th camera (*i*-th camera CS). Here **R**_{i} is the rotation matrix, **t**_{i} is the translation vector, and **x**_{w} and **x**_{i} are 3D point coordinates in WCS and the *i*-th camera CS, respectively. We specify the vector of 3D line coordinates as ${l}_{w}={({c}_{w}^{T},{v}_{w}^{T})}^{T}$, combining the 3D coordinates of the base point **c**_{w} and the unit direction vector **v**_{w}. The Euclidean mapping is applied to the line coordinates in the following way: ${l}_{i}={E}_{i}({l}_{w})={({c}_{i}^{T},{v}_{i}^{T})}^{T}$, where ${c}_{i}={R}_{i}{c}_{w}+{t}_{i}$ and ${v}_{i}={R}_{i}{v}_{w}$. The transformation *P*_{i} is the camera internal transformation ${p}_{i}={P}_{i}({l}_{i})$. The notation $\circ$ denotes a composition of transformations, i.e. ${P}_{i}\circ {E}_{i}({l}_{w,i})\equiv {P}_{i}({E}_{i}({l}_{w,i}))$. Similarly, one can write the inverse mapping ${l}_{w,i}={E}_{i}^{-1}\circ {P}_{i}^{-1}({p}_{i})$, which allows finding the 3D line coordinates **l**_{w,i} for each image point **p**_{i} of the *i*-th camera. The Euclidean transformation is always invertible, but the same is not always true for the camera internal transformation. For example, if *P*_{i} is a high-order polynomial, then ${P}_{i}^{-1}$ requires an iterative solver to find the root of this polynomial. Camera models can therefore be divided into groups depending on whether the forward or the inverse transformation (or both) can be written in explicit form. This also affects the 3D reconstruction algorithm, because it commonly uses the transformation in explicit form to reduce computational complexity. The set of transformations *E*_{i} and *P*_{i} is usually described by a vector of parameters **k** whose elements are determined as a result of a calibration procedure. Some camera models are parameter-free, for example, the models using look-up tables and interpolation [15,18].
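As an illustration, the Euclidean mapping of line coordinates defined above can be sketched in a few lines of numpy; the rotation and translation values here are arbitrary, chosen only for the example:

```python
import numpy as np

def euclidean_line(R, t, l_w):
    """Apply the Euclidean mapping E(l_w) = ((R c_w + t)^T, (R v_w)^T)^T
    to 3D line coordinates l_w = (c_w^T, v_w^T)^T."""
    c_w, v_w = l_w[:3], l_w[3:]
    c_i = R @ c_w + t          # transform the base point
    v_i = R @ v_w              # rotate the unit direction vector
    return np.concatenate([c_i, v_i])

# Example: rotate a line by 90 degrees about z and shift it along x.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
t = np.array([1.0, 0.0, 0.0])
l_w = np.array([0.0, 0.0, 0.0,   # base point c_w
                0.0, 0.0, 1.0])  # unit direction v_w
l_i = euclidean_line(Rz, t, l_w)
```

Note that the direction vector is only rotated, so it stays a unit vector under any Euclidean mapping.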

The image acquired by a prism-based system consists of *i*_{max} parts obtained via the *i*_{max} parts of the prism; we further refer to them as the “*i*-th image part” and the “*i*-th prism part”. Thus, all notations considered above can be applied if we assume that the *i*-th image part of the prism-based system corresponds to the *i*-th camera. Though the description of camera models in this paper is given for a biprism-based stereoscopic system with *i*_{max} = 2 [14–17], it can be easily generalized for systems using trinocular or multiocular prisms [14] with *i*_{max} > 2.

#### 2.1 Pinhole camera models

The pinhole camera model is widely used in computer vision [3,10,15,18,23,25,29]. It assumes that all rays **l**_{i} for the *i*-th camera pass through the same central point. Hence, it is possible to use one point **x**_{i} to determine each ray and to replace **l**_{i} by **x**_{i} in the equations for the forward transformation. The pinhole projection is usually composed with a 2D transformation that describes the lens geometric distortion. Using the same approach as [30], we consider the forward projection as the combination of a few simple transformations: *F* is the central projection onto the unit plane ${x}_{i}^{\text{'}}=F({x}_{i})$, *D*_{i} is the 2D distortion transformation ${x}_{i}^{\text{'}\text{'}}={D}_{i}({x}_{i}^{\text{'}})$, and *A*_{i} is the 2D affine transformation to the image coordinates in pixels ${p}_{i}={A}_{i}({x}_{i}^{\text{'}\text{'}})$. The 2D distortion transformation is often represented as the combination of a radial and a tangential part as follows [18,23].

If we use the forward distortion transformation specified in Eq. (2), the camera model does not have the inverse transformation in explicit form. We can write the inverse transformation similar to Eq. (1) in the following form.

In order to get the inverse transformation ${D}_{i}^{-1}$ in explicit form, one can rewrite Eq. (2) in the inverse form, swapping ${x}^{\prime}$ and ${x}^{\prime\prime}$.

The parameters of this camera model are the focal lengths *f*_{u}, *f*_{v}, the coordinates of the principal point *u*_{0}, *v*_{0} and the distortion parameters ${\rho}_{1},\mathrm{...},{\rho}_{5}$. Thus, the vector of parameters contains 9 internal parameters describing *P*_{i} and 6 external parameters describing *E*_{i} for each camera. If the CS of the first camera is assumed to coincide with WCS, the total number of parameters for *i*_{max} = 2 is equal to 24. The pinhole camera model with the inverse transformation ${D}_{i}^{-1}$ is shown in Fig. 3(a).

#### 2.2 Introducing pupil aberrations

In the pinhole model, the inverse transformation *F*^{−1} is expanded as follows.

In order to account for pupil aberrations, we should discard the assumption that all rays pass through the central point. Thus, we replace the coordinates of the central point ${\left(0,0,0\right)}^{T}$ by a point ${\left({x}_{\text{p}i},{y}_{\text{p}i},0\right)}^{T}$ whose coordinates depend on ${x}^{\prime}$ or ${x}^{\prime\prime}$ as shown in Fig. 3(b). For these coordinates, we use Eq. (4) with different coefficients.

To keep the notations used in Eq. (3), we modify ${D}_{i}^{-1}$ and *F*^{−1} as follows. The ray is then defined by two points (at *z* = 0 and *z* = 1), and the model is similar to the one used in [20]. It is important to note that this model is non-central and does not provide the forward transformation in explicit form. The vector of parameters contains 5 additional coefficients ${\rho}_{6},\mathrm{...},{\rho}_{10}$ describing the pupil aberrations for each camera.

#### 2.3 Expanded polynomial distortion

We can replace Eqs. (4), (5) and (7) by an expanded polynomial as follows, where *m*_{max}, *n*_{max}, *m*_{p,max} and *n*_{p,max} are the maximum degrees of the polynomials. Terms with $\left(m+n\right)<2$ should not be used because they correlate with the parameters of other transformations (${A}_{i}^{-1}$ and ${E}_{i}^{-1}$). For example, ${\alpha}_{1,0}$ is an equivalent of the focal length and ${\xi}_{0,0}$ acts as the *x* coordinate of the translation vector. In this work, we limit the maximum degree in Eq. (9) by $2<\left(m+n\right)<5$ for ${\left({x}_{\text{p}i},{y}_{\text{p}i}\right)}^{T}$ and $2<\left(m+n\right)<7$ for ${\left({{x}^{\prime}}_{i},{{y}^{\prime}}_{i}\right)}^{T}$. In contrast to [20], where the optical system is considered to be rotationally symmetric, the polynomial in Eq. (9) is better suited for describing the distortion and pupil aberrations of prism-based systems, which lack this symmetry. The vector of parameters contains the coefficients ${\alpha}_{m,n}$, ${\beta}_{m,n}$ instead of ${\rho}_{1},\mathrm{...},{\rho}_{5}$ and ${\xi}_{m,n}$, ${\eta}_{m,n}$ instead of ${\rho}_{6},\mathrm{...},{\rho}_{10}$.
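The degree window on the expanded polynomial can be expressed compactly; the sketch below is purely illustrative, with `coeffs` a hypothetical sparse set of coefficients ${\alpha}_{m,n}$:

```python
def poly2d(x, y, coeffs, lo=2, hi=7):
    """Evaluate the sum of alpha_{m,n} * x^m * y^n over terms with
    lo < (m + n) < hi; coeffs maps (m, n) -> alpha_{m,n}. Terms outside
    the degree window are ignored, mirroring the limits used for Eq. (9)."""
    return sum(a * x**m * y**n
               for (m, n), a in coeffs.items()
               if lo < m + n < hi)

# A cubic cross term (m + n = 3) falls inside the window,
# while a linear term (m + n = 1) correlates with A_i, E_i and is skipped.
inside = poly2d(2.0, 3.0, {(2, 1): 0.5})   # 0.5 * 2^2 * 3
skipped = poly2d(2.0, 3.0, {(1, 0): 1.0})
```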

#### 2.4 Ray tracing models

The ray tracing models provide the inverse transformation, expanding Eq. (3) by additional ray transformations such as reflections for catadioptric systems or refractions for prism-based systems. We consider a prism-based system and use ray tracing from the image plane to the object space similar to [16,17]. The transformation ${l}_{q,i}={S}_{q,i}\left({l}_{q-1,i}\right)$ describes the refraction of each ray **l**_{q,i} at the *q*-th surface of the prism for the *i*-th image part. Each transformation *S*_{q,i} is derived from the vector form of Snell’s law. To find the ray coordinates **l**_{2,i} after the prism for each image point **p**_{i}, the pinhole model with radial distortion for the main lens is applied to define the ray coordinates **l**_{0,i}, with subsequent transformations *S*_{1,i} and *S*_{2,i} to trace this ray through the prism [17], see Fig. 3(c). As a result, the inverse transformation for the ray tracing model is represented in the following form, where *E* is the same for all image parts due to the choice of a single CS for all parts of the prism. If the camera CS and the WCS are assumed to coincide, *E* can be omitted. For a biprism located in air, 9 parameters for the 3 faces of the prism and one more parameter for the refractive index are sufficient to describe it.
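A single surface refraction in the vector form of Snell’s law, the building block of each *S*_{q,i}, can be sketched as follows (a minimal numpy illustration, not the authors’ implementation):

```python
import numpy as np

def refract(d, normal, n1, n2):
    """Refract a unit direction d at a surface with unit normal pointing
    against the incident ray, using the vector form of Snell's law
    (medium n1 -> n2). Returns None on total internal reflection."""
    mu = n1 / n2
    cos_i = -np.dot(normal, d)                 # cosine of the incidence angle
    sin2_t = mu * mu * (1.0 - cos_i * cos_i)   # squared sine of the refraction angle
    if sin2_t > 1.0:
        return None                            # total internal reflection
    cos_t = np.sqrt(1.0 - sin2_t)
    return mu * d + (mu * cos_i - cos_t) * normal

# A ray along the surface normal passes through undeviated,
# while an oblique ray bends according to sin(t) = (n1/n2) sin(i).
d_normal = refract(np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, -1.0]), 1.0, 1.5)
d_oblique = refract(np.array([np.sqrt(0.5), 0.0, np.sqrt(0.5)]),
                    np.array([0.0, 0.0, -1.0]), 1.0, 1.5)
```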

The model used in [17] takes into account the distortion and pupil aberrations introduced by the prism using ray tracing. It contains the transformation *D*^{−1} described by Eq. (4) without the tangential part, so it accounts for the radial distortion of the lenses. However, it does not include parameters describing the pupil aberrations of the lenses. It is known [12,13] that a plane-parallel glass plate introduces a lateral displacement of the passing ray, which is proportional to the plate thickness and depends on the incidence angle of the ray. Thus, the plate introduces pupil aberrations, which may be described by a radial polynomial similar to the first term of Eq. (7). Hence, the pupil aberrations of the lenses can be partially accounted for in this model by changing the thickness of the prism, i.e. by adding a virtual plane-parallel plate to it. This can be insufficient if these aberrations are significant.
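The lateral displacement mentioned above follows the standard plane-parallel plate formula $d=t\sin i\left(1-\cos i/\sqrt{{n}^{2}-{\sin}^{2}i}\right)$; a small Python check (the 1.7 mm thickness matches the nominal prism, the 30° angle is an arbitrary example):

```python
import math

def plate_lateral_shift(t, n, inc_deg):
    """Lateral displacement of a ray passing through a plane-parallel
    plate of thickness t and refractive index n in air, at the
    incidence angle inc_deg (degrees)."""
    i = math.radians(inc_deg)
    return t * math.sin(i) * (1.0 - math.cos(i) / math.sqrt(n * n - math.sin(i) ** 2))

# At normal incidence there is no displacement;
# at oblique incidence the shift grows with plate thickness.
shift_normal = plate_lateral_shift(1.7, 1.5, 0.0)
shift_oblique = plate_lateral_shift(1.7, 1.5, 30.0)   # roughly 0.33 mm
```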

In this paper, we modify this model by replacing Eq. (4) with Eq. (5) and adding pupil aberrations as considered in Eq. (7). Since the main lens is designed as a rotationally symmetric optical system, we exclude the tangential part from Eqs. (4) and (5). Another reason for this approximation is that the parameters of the tangential part correlate with the parameters describing the position of the prism faces, which can lead to calibration instability. We also fix the thickness of the prism because it correlates with the parameters in Eq. (7) as explained above. Thus, the vector of parameters for the ray tracing model includes 4 intrinsic camera parameters (*f*_{u}, *f*_{v}, *u*_{0}, *v*_{0}), 3 or 6 distortion parameters (${\rho}_{1},{\rho}_{2},{\rho}_{3}$ and ${\rho}_{6},{\rho}_{7},{\rho}_{8}$) and 10 prism parameters (if ${\rho}_{6},{\rho}_{7},{\rho}_{8}$ are enabled, the prism thickness is fixed).

#### 2.5 Overview of considered camera models

Summarized information about the camera models considered in this paper is given in Table 1.

The notations presented in the left column are used to identify each model in the following sections. The total number of parameters for each model is given for the prism-based stereoscopic system, i.e. *i*_{max} = 2. We will further use the terms “forward camera model” and “inverse camera model” to indicate which type of transformation may be written in explicit form.

#### 2.6 3D reconstruction

After point matching has been finished, the reconstruction of the 3D point coordinates **x**_{w} from *i*_{max} corresponding image points **p**_{i} is generally considered as an optimization problem. The cost function *C* depends on the type of camera model, a priori data about the statistics of coordinate measurement errors and other factors. For forward camera models, the optimal cost function is based on the Mahalanobis distance in the image plane [29]. For simplicity, we assume that a priori information is not available and assign equal weights to all points and coordinates. Thus, the cost function may be written as follows.

The minimization problem in Eq. (11) can be solved by iterative techniques such as the Levenberg-Marquardt method [29]. For inverse camera models, we use the cost function based on the sum of squared distances *d*^{2} from the point **x**_{w} to the rays **l**_{w,i}. For *i*_{max} = 2, the minimization comes down to determining the midpoint of the common perpendicular and does not require iterations.
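For the two-ray case, the midpoint of the common perpendicular has a closed-form solution; a numpy sketch with arbitrary example rays:

```python
import numpy as np

def triangulate_midpoint(c1, v1, c2, v2):
    """Midpoint of the common perpendicular between two rays given as
    (base point c, unit direction v). Closed form, no iterations."""
    w = c1 - c2
    a, b, c = v1 @ v1, v1 @ v2, v2 @ v2
    d, e = v1 @ w, v2 @ w
    denom = a * c - b * b              # zero only for parallel rays
    s = (b * e - c * d) / denom        # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom        # parameter of the closest point on ray 2
    return 0.5 * ((c1 + s * v1) + (c2 + t * v2))

# Two rays that intersect exactly at (0, 0, 2).
v1 = np.array([1.0, 0.0, 2.0]) / np.sqrt(5.0)
v2 = np.array([-1.0, 0.0, 2.0]) / np.sqrt(5.0)
x_w = triangulate_midpoint(np.array([-1.0, 0.0, 0.0]), v1,
                           np.array([1.0, 0.0, 0.0]), v2)
```

For noisy rays that do not intersect, the returned point minimizes the sum of squared distances to both rays, which is exactly the cost in Eq. (12).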

## 3. Computer simulation

In this section, we describe the computer simulation of calibration and 3D measurements using the design of miniature prism-based stereoscopic system as an example. The optimization of the optical system design and the computer simulation were performed using optical design software Zemax.

#### 3.1 Optical system design

Remote visual inspection using video endoscopes requires a probe diameter of less than 6 mm. Due to the small dimensions of the distal end and the interchangeable measurement tips, placing two separate lenses is barely feasible. In this case, prism-based stereo tips are the most popular solution for 3D measurements [26–28]. Distortion and chromatic aberrations for systems of this type are significant. A detailed analysis of image quality for a prism-based stereo imager may be found in [27].

The optical system has been designed for a 1/6” 1920 × 1080 color sensor and has a field of view of 40° × 45° for each channel, F/11 and an effective focal length of 2.36 mm. The design layout of the optical system is shown in Fig. 4. The range of working distances in object space is 5–40 mm measured from the first surface (from S1 to S0). The overall length of the system is 13.7 mm (from S1 to the image plane). We compare two variants of the optical system design (Variant 1 and Variant 2) which differ in the configuration of the first negative optical element (S5 and S6 in Fig. 4). The prism and the other optical elements have the same parameters in both variants.

Variant 2 has smaller pupil aberrations of the lens compared to Variant 1. The pupil aberrations of the lens can be clearly seen in the plot showing the rays after the prism: the chief rays from the bottom part of the object do not pass through the center of the entrance pupil of the lens for Variant 1. The entrance pupil of the lens shown in this plot is the paraxial image of the aperture stop through the front part of the lens (S5–S8). However, Variant 1 has a smaller pupil aberration of the whole optical system including the prism, as the aberrations of the prism are partially compensated by the lens aberrations (see the chief rays from the bottom part of the object after S0 in Fig. 4). The entrance pupil of the whole system is calculated as the image of the aperture stop through S1–S8 using the chief rays from an object point near the optical axis for the top and bottom parts of the prism separately. Thus, the entrance pupil consists of two laterally shifted parts. The distance between the centers of these parts is in fact the base distance of the stereo pair, which has the main impact on the uncertainty of 3D measurements [3,10,13].

For Variant 2, the distortion is mainly introduced by the prism, the impact of the lens is not significant. Thus, the distortion is stronger for the central part of the combined image. In contrast, the lens in Variant 1 introduces large radial distortion, which can be seen in Fig. 4 (bottom) as the “barrel” for both sub-images.

In this paper, we concentrate on the connection between optical system design and the calibration procedure. Therefore, we discuss in detail only distortion and pupil aberrations, as these aberrations are included in the mathematical models necessary for calibration. This allows estimating the systematic error of 3D measurements caused mainly by a non-optimal camera model. The analysis of other aberrations affecting image quality for both variants of the optical system is not considered in this paper because it is less relevant for camera calibration and the comparison of camera models.

#### 3.2 Computer simulation of calibration and measurements

The aim of the calibration procedure is to determine the parameter vector **k** from the images of a calibration target. These targets may be manufactured as flat, step-like or cube objects with many contrast markers whose centers can be easily determined on the image (chessboard pattern, circles, line grid or other geometric patterns). Similar to [10,11,13,20,22,23,25], we consider that the calibration target is flat and that the 3D coordinates of the marker points **x**_{t,j}, *j* = 1…*j*_{max}, are known with a certain accuracy in the CS of the calibration target. The target is placed at *k*_{max} different positions to acquire images. The image coordinates **p**_{i,j,k}, *k* = 1…*k*_{max}, are calculated for each point and each position using image processing. We should notice that for the *i*-th camera (i.e. for the *i*-th image part) and the *k*-th position, only a part of the *j*_{max} points will be visible. The complete projection transformation for these points can be written as ${p}_{i,j,k}={P}_{i}\circ {E}_{i}\circ {E}_{k}^{\text{'}}({x}_{tj})$, where ${E}_{k}^{\text{'}}$ stands for the Euclidean mapping from the CS of the calibration target to the WCS. In addition to the parameter vector **k** for the transformations *P*_{i} and *E*_{i}, we also need to find the vector **k**_{t} describing the transformations ${E}_{k}^{\text{'}}$ for each *k*. If we introduce the composite vector **x**_{t} of all 3D points and the composite vector **p** of all projections, we can write the optimization problem for calibration as

Following the same reasoning as for the 3D reconstruction algorithm above, the most appropriate choice of the cost function *C* for forward camera models is the Mahalanobis distance in the image plane, where *P*_{i}, *E*_{i} and ${{E}^{\prime}}_{k}$ correspond to the current estimates of **k** and **k**_{t}. The sum over all *i*, *j*, *k* actually means the sum over all visible points whose image coordinates **p**_{i,j,k} can be calculated. As before, we assume that a priori information is not available and assign equal weights to all points and coordinates.

The cost function for inverse camera models should use the sum of squared distances from the 3D points to the back-projected rays **l**_{t,i,j,k}: ${d}^{2}\left({x}_{\text{t}j},{{E}^{\prime}}_{k}^{-1}\circ {E}_{i}{}^{-1}\circ {P}_{i}{}^{-1}\left({p}_{i,j,k}\right)\right)$. Since we use a flat calibration target, we can replace the rays **l**_{t,i,j,k} by the points **x**_{t,i,j,k} of the ray intersections with the XOY plane in the CS of the target and write the cost function as follows.

In this paper, we use the Levenberg-Marquardt method [29] to solve the minimization problem in Eq. (14). We have implemented the calibration and 3D reconstruction algorithms for all considered camera models using C++ and Matlab.
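The structure of a Levenberg-Marquardt solver such as the one used for Eq. (14) can be illustrated on a toy problem; the sketch below (pure numpy, not the authors’ C++/Matlab code) recovers the two parameters of a hypothetical 1D pinhole camera from synthetic projections:

```python
import numpy as np

# Toy calibration: recover (f, u0) of a 1D pinhole camera from
# synthetic projections by a minimal Levenberg-Marquardt iteration.
rng = np.random.default_rng(0)
x = rng.uniform(-0.3, 0.3, 50)           # unit-plane coordinates
p_obs = 800.0 * x + 320.0                # ideal projections (f = 800, u0 = 320)

def residuals(k):
    f, u0 = k
    return f * x + u0 - p_obs            # reprojection residuals

def jacobian(k):
    return np.stack([x, np.ones_like(x)], axis=1)

k = np.array([500.0, 0.0])               # initial guess
lam = 1e-3                               # damping parameter
for _ in range(20):
    r, J = residuals(k), jacobian(k)
    H = J.T @ J + lam * np.eye(2)        # damped normal equations
    step = np.linalg.solve(H, -J.T @ r)
    if np.linalg.norm(residuals(k + step)) < np.linalg.norm(r):
        k, lam = k + step, lam * 0.5     # accept the step, reduce damping
    else:
        lam *= 10.0                      # reject the step, increase damping
```

The real calibration problem differs only in scale: the residual vector stacks all visible points over *i*, *j*, *k*, and the parameter vector combines **k** and **k**_{t}.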

The stages of the computer simulation are shown in Fig. 5. The input data is the “true” 3D coordinates of the points ${\overline{x}}_{\text{w}j,k}$ in WCS. For the simulation of calibration, these coordinates are defined according to the current position of the calibration target: ${\overline{x}}_{wj,k}={E}_{k}^{\text{'}}({x}_{tj})$. We assume that WCS coincides with the coordinate system used to define the positions of the optical elements, as shown in Fig. 4. The coordinates of the “true” image points ${\overline{p}}_{i,j,k}$ are calculated by forward tracing of the chief ray from each point of the target using Zemax for both parts of the prism. Since the analyzed optical system has significant pupil aberration, the “ray aiming” feature should be switched on [31]. Next, the point coordinates **p**_{i,j,k} in the “noisy” image are calculated by adding random values with the standard deviation ${\sigma}_{\text{p}}$. This simulates the overall impact of discretization, sensor noise and aberrations on the uncertainty of the image coordinates. The obtained coordinates **p**_{i,j,k} for all positions of the calibration targets are transferred to the calibration algorithm for the chosen camera model, which calculates the vector **k**. To simulate the measurements, the coordinates **p**_{i,j,k} are calculated for the points of a 3D grid ${\overline{x}}_{\text{w}j,k}$ in the same way. Together with the vector **k**, they are used to find the 3D point coordinates ${\widehat{x}}_{\text{w}j,k}$ using the 3D reconstruction algorithm. Thus, we obtain a distorted 3D grid which can be used to assess the error of the 3D coordinate measurements across the working volume: ${\widehat{x}}_{\text{w}j,k}-{\overline{x}}_{\text{w}j,k}$. We also estimate the error for typical measurements conducted by stereoscopic imagers, such as segment length and triangle area. For example, two neighboring points of the 3D grid, ${\widehat{x}}_{\text{w}j,k}$ and ${\widehat{x}}_{\text{w}j+1,k}$, are used to calculate the segment length ${\widehat{r}}_{j,k}$ (the segment takes the indices *j*, *k* of its first point). The “true” value ${\overline{r}}_{j,k}$ for the measurement is defined by ${\overline{x}}_{\text{w}j,k}$ and ${\overline{x}}_{\text{w}j+1,k}$, and the error is ${\widehat{r}}_{j,k}-{\overline{r}}_{j,k}$.
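The error metrics at the end of this pipeline reduce to simple array operations; a sketch with a made-up 3-point grid and hand-picked perturbations standing in for reconstruction errors:

```python
import numpy as np

# "True" grid points and their perturbed reconstructions (values invented
# purely for illustration of the error metrics).
x_true = np.array([[0.00, 0.0, 10.00],
                   [1.00, 0.0, 10.00],
                   [2.00, 0.0, 10.00]])
x_hat = x_true + np.array([[ 0.01, 0.00, -0.02],
                           [ 0.00, 0.01,  0.00],
                           [-0.01, 0.00,  0.02]])

# RMS of the distance ||x_hat - x_true|| over all grid points
rms_3d = np.sqrt(np.mean(np.sum((x_hat - x_true) ** 2, axis=1)))

# Segment lengths between neighboring grid points and their errors
r_hat = np.linalg.norm(np.diff(x_hat, axis=0), axis=1)
r_true = np.linalg.norm(np.diff(x_true, axis=0), axis=1)
seg_err = r_hat - r_true
```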

#### 3.3 Comparison of camera models by estimation of systematic errors

The computer simulation of the geometrical calibration should reproduce the real calibration procedure in maximum detail. We use a flat calibration target with a chessboard pattern of 24 × 24 squares, represented by a 25 × 25 point grid. For the designed miniature prism-based system, three calibration targets with 0.5 mm (small-sized), 1 mm (middle-sized) and 2 mm (large-sized) chessboard squares are necessary due to the wide range of distances and magnifications [17,21]. Each target is placed at 6 positions, including ones perpendicular to the optical axis of the lens (*z*-axis) and ones rotated by 30° around the transverse axes (*x*, *y*), to cover the total range of distances from 10 to 35 mm as shown in Fig. 6(a). The coordinates of the image points for all positions of the calibration targets were calculated using Zemax as described above, without adding noise, for Variants 1 and 2 at the wavelength 587.6 nm. We deleted some points in the areas near the image edges and in the central area where the two sub-images overlap. The overlay of the image points for Variant 1 is shown in Fig. 6(c) as an example. We used the collected data to calculate the parameters of all considered camera models.

To simulate the acquisition of test sequences for measurements [17], the calibration target was placed perpendicular to the *z*-axis and shifted along it with a 1 mm step (see Fig. 6(b)). The series contained 9 positions for the small-sized target and 13 positions for the middle-sized one in the range from 10 to 26 mm. The coordinates of the image points were used to calculate the 3D coordinates using each of the considered camera models. As an example, the distorted 3D grids obtained using the models ‘Usual_IR’ and ‘Prism_IR’ are shown in Fig. 7 for Variant 1. One can see that using ‘Usual_IR’ leads to severe errors.

The 3D error ${\widehat{x}}_{\text{w}j,k}-{\overline{x}}_{\text{w}j,k}$ is hard to visualize, so we calculated the RMS of the distance $\Vert {\widehat{x}}_{\text{w}j,k}-{\overline{x}}_{\text{w}j,k}\Vert $ over all *j*, *k*. The results are presented in Table 2.

We have calculated the uncertainty of the measured segment length as stated above. We divided the obtained 3D data into zones according to the distance along the *z*-axis and calculated the mean and standard deviation (STD) of the segment length along the *x*, *y* and *z* axes for every zone. These values depend only on the distance *z* to the calibration target and are easier to illustrate than full 3D error distributions. The results correspond to the systematic error caused by the insufficient description of the optical system provided by the considered camera models. The STD values actually show the differences in the measurement of segment length at the specified distance across the field of view. The results are shown in Fig. 8 for Variant 1 and in Fig. 9 for Variant 2. The bottom rows of the figures present the models which use ray tracing through the prism, while the top rows present the other ones. ‘Poly_IP’ is duplicated in the bottom rows because it provides performance similar to the ray tracing models.

One can see that all models which do not use ray tracing through the prism lead to significant systematic errors, except ‘Poly_IP’. The reason is that a proper description of the distortion and pupil aberrations induced by the prism requires high-order polynomial cross terms which are not present in the other models. Although ‘Prism_I’ and ‘Prism_IR’ do not have parameters to account for the pupil aberration of the lens separately, they compensate these aberrations by changing the thickness of the prism (the calibrated thickness reaches 3...4 mm, while the nominal value is 1.7 mm). For Variant 2, both ‘Prism_IR’ and ‘Prism_IRP’ provide the same negligible systematic errors. However, adding parameters to account for the pupil aberration is necessary for Variant 1, because its lens has larger distortion and pupil aberrations. Thus, ‘Prism_IRP’ improves the result compared to ‘Prism_IR’ for Variant 1. As a result, we can use ‘Poly_IP’ and ‘Prism_IRP’ for Variant 1; ‘Poly_IP’, ‘Prism_IP’ and ‘Prism_IRP’ are suitable for Variant 2. Although the systematic errors for Variant 2 are lower than for Variant 1, the difference is insignificant compared to the random measurement errors, as will be shown in Section 3.6. Variant 1 provides better image quality, so we further use it as the base variant.

Since using ‘Prism_IRP’ requires a fixed value of the prism thickness, we should estimate the impact of this parameter on the calibration accuracy. We repeated the computer simulation of calibration, varying the thickness in the range 1...3 mm (the nominal value is 1.7 mm), and concluded that this did not lead to significant changes (< 10%) in the mean and STD values.

#### 3.4 Analysis of resistance to image noise

To analyze the robustness of the calibration for different camera models, we performed the computer simulation with non-zero values of ${\sigma}_{\text{p}}$ for the calibration sequence. The standard deviation was the same for all coordinates and all points; all other conditions did not change. The 3D coordinates were calculated using the noise-free test sequence. The results of 50 trials using the two best models (‘Poly_IP’ and ‘Prism_IRP’) for Variant 1 with ${\sigma}_{\text{p}}=1$ are shown in Fig. 10.
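The trial loop can be sketched as below. The `calibrate` and `measure` callables are placeholders for the actual calibration and 3D measurement code (which this paper does not list), so the sketch only shows the Monte Carlo structure: Gaussian noise of standard deviation ${\sigma}_{\text{p}}$ is added to the image coordinates of the calibration sequence, while the test sequence stays noise-free.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducible trials

def noisy_trials(calibrate, measure, image_pts, sigma_p=1.0, n_trials=50):
    """Repeat calibration with Gaussian noise added to the image coordinates.

    calibrate(pts) -> camera model parameters (placeholder);
    measure(params) -> error metric on a noise-free test sequence (placeholder).
    Returns the metric for each trial.
    """
    results = []
    for _ in range(n_trials):
        noisy = image_pts + rng.normal(0.0, sigma_p, image_pts.shape)
        params = calibrate(noisy)        # fit the camera model to noisy points
        results.append(measure(params))  # evaluate on noise-free test data
    return np.asarray(results)
```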

We can see that adding image noise significantly increases the STD values for ‘Poly_IP’, so ‘Prism_IRP’ is more resistant to image noise. This is not surprising, because ‘Poly_IP’ has many more parameters, which apparently leads to overfitting.

#### 3.5 Tolerance analysis

We performed the tolerance analysis for Variant 1 of the designed system using Zemax. In particular, we considered the following non-strict tolerances: ± 3 rings for surface radii; ± 0.3 rings for surface irregularity; ± 0.07 mm for thicknesses of optical elements and air spaces; ± 0.04 mm for element decentering; ± 0.04 degrees for element tilt; ± 0.2 degrees for prism rotation about the *z*-axis; ± 0.04 mm for decentering of spherical surfaces; ± 0.12 degrees for tilt of flat surfaces (including the base angle of the prism); ± 0.05 degrees for the pyramidal angle of the prism; ± 0.0003 for refractive indices; ± 0.3% for Abbe numbers. These tolerances were used to generate 50 optical systems with random variations of the design parameters. Next, we repeated the computer simulation of calibration and measurements for each of these systems. The results of the 50 trials are shown in Fig. 11.

Both ‘Prism_IRP’ and ‘Poly_IP’ demonstrate approximately the same stability. Thus, although ‘Prism_IRP’ includes only radial distortion and cannot describe lens decentering, its performance is only slightly worse than that of ‘Poly_IP’, which does not have this limitation.

#### 3.6 Summary of computer simulations for different factors

Though the dependence of the mean and STD on the distance *z* helps to analyze error trends and the measurement capabilities of the system, it is not well suited as a merit function. To summarize and compare the results for different factors, we calculated the mean value and the 95% confidence interval of the length measurement error for 1 mm segments over the whole measurement volume of the designed system (Variant 1). The boundaries of the confidence intervals were defined as the quantiles of the simulated data set for the cumulative probabilities 0.025 and 0.975. The results obtained by noise-free simulation of calibration and measurements for each camera model are shown in Fig. 12 as ‘nominal’ (blue). In the same way, the confidence intervals were calculated for the tolerance analysis (‘tolerances’, green) and for the “noisy” image coordinates with ${\sigma}_{\text{p}}=1$ in the calibration sequence (‘noise-calib.’, yellow). In both cases, the results are shown for all 50 trials and for the worst trial.
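The quantile-based interval described above reduces to a one-liner; a minimal sketch (the function name is our assumption):

```python
import numpy as np

def length_error_ci(errors, level=0.95):
    """Mean and quantile-based confidence interval of length measurement errors.

    errors: 1D array of length measurement errors pooled over the measurement volume.
    Returns (mean, lower, upper), with bounds at the (1-level)/2 and 1-(1-level)/2
    cumulative probabilities (0.025 and 0.975 for a 95% interval).
    """
    q = (1.0 - level) / 2.0
    lo, hi = np.quantile(errors, [q, 1.0 - q])
    return errors.mean(), lo, hi
```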

The impact of systematic errors induced by non-ideal camera models and other factors should be compared with random measurement errors. The confidence interval for random errors caused by image coordinate uncertainty can be estimated using Monte Carlo simulation or other techniques [32]. We calculated these intervals for the length measurement errors by means of the proposed computer simulation with a noise-free calibration sequence and “noisy” test sequences. The results obtained with ${\sigma}_{\text{p}}=0.1$ and ${\sigma}_{\text{p}}=0.2$ are shown in Fig. 12 for ‘Prism_IRP’ and ‘Poly_IP’ (‘noise-meas.’, red). If the systematic errors for the best of the analyzed camera models are low, there is no need to calculate random errors for the other models; in Fig. 12, we have done so for illustrative purposes.

To summarize the analysis, we conclude that Variant 1 is an appropriate design of the miniature prism-based stereoscopic system. Although it has significant distortion and pupil aberrations, they are properly taken into account by the ‘Prism_IRP’ and ‘Poly_IP’ camera models. The analysis of the other factors has confirmed that the parameters of these models can be successfully calculated using the standard calibration procedure and that the developed system can provide the desired measurement accuracy.

The mean value of the measurement error and the boundaries of its confidence interval can be used to define a merit function for evaluating the appropriateness of camera models, calibration equipment and procedures. These values can be estimated for 3D coordinates, segment length, area or other required measurements by the proposed computer simulation technique using optical design software once the initial optical system design has been completed. Estimating the intervals for systematic and random measurement errors caused by tolerances, image noise and other factors enables the analysis and optimization required for the joint design approach.

## 4. Experimental verification

A prototype of the designed prism-based stereoscopic system (Variant 1) was manufactured and used in experiments to verify the computer simulation results. Three calibration targets with a chessboard pattern were produced by chrome etching on glass with an inaccuracy of about 1 μm. The implemented calibration method does not require precise positioning of the targets or any specific equipment for it, but we tried to reproduce the positions used in the computer simulation. For this purpose, we used a rotation table and a linear translation stage to provide the 30° rotation around the transverse axes and to set approximately the same distances from the protective glass to the calibration target. To capture the test sequences, the linear translation stage was used to shift the target along the *z*-axis. Figure 13 shows the calibration setup and a few images of the test chart used for calibration.

After obtaining the calibration and test sequences, we processed them to calculate the image coordinates of each chessboard node. Our image processing software sets the initial correspondence between image points and points on the calibration target using two specific white-square marks (see Fig. 13(b)) and then refines the image coordinates using the technique described in [33]. Since the matching between image points and target points was done for both image parts, the stereo correspondence between the left and right half-images was obtained automatically. The results of calibration and segment length measurements for all considered models are presented in Fig. 14.

We should note that the calculated 3D coordinates were originally obtained in the camera CS. Its origin is purely virtual and is defined as a result of calibration. Thus, this CS is different from the WCS used to define the *z*-coordinates in Figs. 8–11, whose origin was set at the center of the aperture stop. The transformation *E*^{−1} from the camera CS to the WCS is unknown for the experimental data. To make the *z*-coordinates in Fig. 14 correspond to the previous figures, we applied an approximate transformation based on the measured distance from the calibration target to the protective glass and the nominal (design) distance from the protective glass to the aperture stop. We chose the orientation of the calibration targets so that the grid lines were approximately parallel to the *x*- and *y*-axes of the stand and used the distance between chessboard nodes as the *x*- and *y*-segments. Points to measure the segments along the *z*-axis were taken either from the previous or from the next image of the series. The measured segments are not exactly aligned with the *x*, *y* and *z* axes in any CS, but we consider these data sufficient to analyze the measurement uncertainty for different orientations of the segments and to compare it with the results of the computer simulation.
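Under the assumption that the camera CS and WCS axes are approximately aligned, the approximate transformation reduces to a shift along the optical axis. This hypothetical helper (not the paper's code; the names and the pure-translation assumption are ours) illustrates the idea:

```python
import numpy as np

def camera_to_wcs_approx(x_cam, d_target_glass, d_glass_stop):
    """Approximate mapping of 3D points from the camera CS to the WCS.

    Assumes aligned axes, so the transform is a z-offset equal to the
    measured target-to-glass distance plus the nominal (design)
    glass-to-aperture-stop distance. Illustrative assumption only.
    """
    x_w = np.asarray(x_cam, dtype=float).copy()
    x_w[:, 2] += d_target_glass + d_glass_stop  # shift along the optical axis
    return x_w
```

A full rigid transform *E*^{−1} would also include a rotation, but as noted above it is unknown for the experimental data.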

The experimental results demonstrate the same trends as the computer simulation (for Variant 1). ‘Prism_IRP’ and ‘Poly_IP’ provide better measurement accuracy than the other models. The agreement between the results is better for shorter distances, where the impact of random errors is lower and the measurement accuracy is mainly determined by systematic errors.

For a quantitative comparison of the simulation and experimental results, we calculated the mean value and the 95% confidence interval of the length measurement error for 1 mm segments over the whole measurement volume, as before (see Fig. 15). Comparing Figs. 12 and 15, we can conclude that the optimal camera model chosen by means of the computer simulation coincides with that chosen experimentally. The measurement accuracy provided by the prototype of the developed system is acceptable.

## 5. Conclusion

In this paper, we have shown that computer simulation using optical design software is a powerful tool for the development of stereoscopic measurement systems. Instead of two sequential stages of optical hardware and image processing software development, it allows simultaneous optimization of both the optical design and the algorithms. We have demonstrated that the proposed approach allows estimating the systematic errors of 3D measurements and evaluating the appropriateness of camera models and calibration equipment for the designed optical system.

The developed algorithm may be applied in Monte Carlo simulations to estimate the impact of tolerances, thermal and other deformations, and many other factors on the stereo image quality and calibration correctness, as well as to calculate the random errors of 3D measurements caused by errors in marker position detection, imperfections of the test object, etc. This approach may also be used for the comparison of calibration procedures, modeling of multi-spectral calibration for narrow-band stereoscopic imagers, and other tasks.

To show the practicability of the proposed approach, we discussed the design of a specific miniature prism-based stereo system of the kind widely used in industrial video endoscopes for remote visual inspection and 3D measurements. After comparing several camera models, we have shown that the ray-tracing model modified to account for lens pupil aberrations provides the best accuracy and robustness. We have also shown that changing the optical system design parameters might lead to the choice of another camera model.

Though we have demonstrated the effectiveness of the proposed approach for this specific imager, similar simulations can be carried out for many stereoscopic and other measurement systems.

The final step in completing the implementation of this approach would be applying a technique to estimate the uncertainty of 2D image point coordinates based on the optical system aberrations. This would make it possible to optimize the optical system parameters using a merit function directly based on 3D measurement errors.

## Funding

Russian Science Foundation (project 17-19-01355).

## References

**1. **C. Zhou and S. K. Nayar, “Computational cameras: convergence of optics and processing,” IEEE Trans. Image Process. **20**(12), 3322–3340 (2011). [CrossRef] [PubMed]

**2. **J. N. Mait, G. W. Euliss, and R. A. Athale, “Computational imaging,” Adv. Opt. Photonics **10**(2), 409–483 (2018). [CrossRef]

**3. **N. Pears, Y. Liu, and P. Bunting, *3D imaging, analysis and applications* (Springer-Verlag, 2012).

**4. **D. G. Stork and M. D. Robinson, “Theoretical foundations for joint digital-optical analysis of electro-optical imaging systems,” Appl. Opt. **47**(10), B64–B75 (2008). [CrossRef] [PubMed]

**5. **D. G. Stork, “Toward a signal-processing foundation for computational sensing and imaging: electrooptical basis and merit functions,” APSIPA Trans. Signal. Inf. Process. **6**, E8 (2017). [CrossRef]

**6. **K. Kubala, E. Dowski, and W. Cathey, “Reducing complexity in computational imaging systems,” Opt. Express **11**(18), 2102–2108 (2003). [CrossRef] [PubMed]

**7. **M. D. Robinson and D. G. Stork, “Joint digital-optical design of superresolution multiframe imaging systems,” Appl. Opt. **47**(10), B11–B20 (2008). [CrossRef] [PubMed]

**8. **O. S. Cossairt, D. Miau, and S. K. Nayar, “Scaling law for computational imaging using spherical optics,” J. Opt. Soc. Am. A **28**(12), 2540–2553 (2011). [CrossRef] [PubMed]

**9. **A. Agrawal, Y. Taguchi, and S. Ramalingam, “Beyond Alhazen’s problem: analytical projection model for non-central catadioptric cameras with quadric mirrors,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2011), pp. 2993–3000. [CrossRef]

**10. **L. Yu and B. Pan, “Structure parameter analysis and uncertainty evaluation for single-camera stereo-digital image correlation with a four-mirror adapter,” Appl. Opt. **55**(25), 6936–6946 (2016). [CrossRef] [PubMed]

**11. **S. Barone, P. Neri, A. Paoli, and A. V. Razionale, “Catadioptric stereo-vision system using a spherical mirror,” Procedia Structural Integrity **8**, 83–91 (2018). [CrossRef]

**12. **Z. Chen, K.-Y. K. Wong, Y. Matsushita, and X. Zhu, “Depth from refraction using a transparent medium with unknown pose and refractive index,” Int. J. Comput. Vis. **102**(1–3), 3–17 (2013). [CrossRef]

**13. **S.-H. Baek and M. H. Kim, “Stereo fusion: combining refractive and binocular disparity,” Comput. Vis. Image Underst. **146**, 52–66 (2016). [CrossRef]

**14. **X. Cui, K. B. Lim, Q. Guo, and D. Wang, “Accurate geometrical optics model for single-lens stereovision system using a prism,” J. Opt. Soc. Am. A **29**(9), 1828–1837 (2012). [CrossRef] [PubMed]

**15. **K. Genovese, L. Casaletto, J. Rayas, V. Flores, and A. Martinez, “Stereo-digital image correlation (DIC) measurements with a single camera using a biprism,” Opt. Lasers Eng. **51**(3), 278–285 (2013). [CrossRef]

**16. **L. Wu, J. Zhu, and H. Xie, “Single-lens 3D digital image correlation system based on a bilateral telecentric lens and a bi-prism: validation and application,” Appl. Opt. **54**(26), 7842–7850 (2015). [CrossRef] [PubMed]

**17. **A. V. Gorevoy and A. S. Machikhin, “Optimal calibration of a prism-based videoendoscopic system for precise 3D measurements,” Comput. Opt. **41**(4), 535–545 (2017). [CrossRef]

**18. **P. Sturm, S. Ramalingam, T.-P. Tardif, S. Gasparini, and J. Barreto, “Camera models and fundamental concepts used in geometric computer vision,” Found. Trends Comput. Graph. Vis. **6**(1-2), 1–183 (2010). [CrossRef]

**19. **Z. Huang, J. Bai, and X. Y. Hou, “Design of panoramic stereo imaging with single optical system,” Opt. Express **20**(6), 6085–6096 (2012). [CrossRef] [PubMed]

**20. **T. Matsuzawa, “Camera calibration based on the principal rays model of imaging optical systems,” J. Opt. Soc. Am. A **34**(4), 624–639 (2017). [CrossRef] [PubMed]

**21. **A. S. Machikhin, A. V. Gorevoy, D. D. Khokhlov, and A. O. Kuznetsov, “Modification of calibration and image processing procedures for precise 3D measurements in arbitrary spectral bands by means of a stereoscopic prism-based imager,” Opt. Eng. **58**(3), 033104 (2019). [CrossRef]

**22. **A.-S. Poulin-Girard, X. Dallaire, S. Thibault, and D. Laurendeau, “Virtual camera calibration using optical design software,” Appl. Opt. **53**(13), 2822–2827 (2014). [CrossRef] [PubMed]

**23. **A.-S. Poulin-Girard, S. Thibault, and D. Laurendeau, “Influence of camera calibration conditions on the accuracy of 3D reconstruction,” Opt. Express **24**(3), 2678–2686 (2016). [CrossRef] [PubMed]

**24. **J. Li, F. Xing, D. Chu, and Z. Liu, “High-accuracy self-calibration for smart, optical orbiting payloads integrated with attitude and position determination,” Sensors (Basel) **16**(8), 1176 (2016). [CrossRef] [PubMed]

**25. **Z. Zhang, “Flexible camera calibration by viewing a plane from unknown orientations,” in Proc. International Conference on Computer Vision (IEEE, 1999), pp. 666–673. [CrossRef]

**26. **J. Geng and J. Xie, “Review of 3-D endoscopic surface imaging techniques,” IEEE Sens. J. **14**(4), 945–960 (2014). [CrossRef]

**27. **A. S. Machikhin, V. I. Batshev, A. V. Gorevoy, D. D. Khokhlov, A. A. Naumov, and A. O. Kuznetsov, “Compact stereoscopic prism-based optical system with an improved accuracy of 3-D geometrical measurements,” Optik (Stuttg.) **185**, 1172–1181 (2019). [CrossRef]

**28. **V. Batshev, A. Machikhin, and Y. Kachurin, “Stereoscopic tip for a video endoscope: problems in design,” Proc. SPIE **10466**, 104664D (2017).

**29. **R. I. Hartley and A. Zisserman, *Multiple view geometry*, 2nd ed. (Cambridge University, 2004).

**30. **J. Kannala and S. S. Brandt, “A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses,” IEEE Trans. Pattern Anal. Mach. Intell. **28**(8), 1335–1340 (2006). [CrossRef] [PubMed]

**31. **H. Sun, *Lens design: a practical guide* (CRC, 2016).

**32. **A. V. Gorevoy, V. Y. Kolyuchkin, and A. S. Machikhin, “Estimation of the geometrical measurement error at the stage of stereoscopic system design,” Comput. Opt. **42**(6), 985–997 (2018). [CrossRef]

**33. **J. Mallon and P. F. Whelan, “Which pattern? Biasing aspects of planar calibration patterns and detection methods,” Pattern Recognit. Lett. **28**(8), 921–930 (2007). [CrossRef]