Low-cost miniature wide-angle imaging for self-motion estimation

Christel-Loic Tisse

doi:10.1364/OPEX.13.006061

1. Introduction

Over the past five years or so there has been considerable interest in the design and development of fast imaging sensors with wide field-of-view (FOV). Such omnidirectional imaging systems are useful in a variety of military and civilian applications including surveillance, security, video conferencing and robotics. There are typically two methods for increasing the FOV of imaging systems [1]: (i) stopping down the entrance pupil and thus increase the relative aperture of the system and (ii) adding multiple optical elements to balance off-axis aberrations out to edge of the field which results in higher complexity. For small-scale lens systems, the space bandwidth product (i.e., the number of transported pixels) is also critical [2]. Currently, there exists a growing need for small, lightweight, task-specific wide-angle imaging system to be embedded on mobile application platforms. Although the processes of widening the FOV and downscaling the overall size of the optics are real challenges for the optical designer, their difficulty depends on the relationship between specific tasks, the information-processing algorithms and the type of eye to be used (i.e., task-specific sensing paradigm).

In this letter, the principle of task-specific sensing is applied to the task of self-motion (or egomotion) estimation in micro-unmanned-aerial-vehicles (micro-UAVs). Micro-UAVs are autonomous flying drone with a typical wingspan on the order of tens of centimeters that could benefit by using such a miniature, wide-FOV visual sensor for guidance, flight control and stabilization. In micro-UAVs, the concept of joint optimization of sensor hardware and signal-processing algorithms is required to meet stringent requirements in terms of reduced payload (typically a few grams), limited size, low power budget, trade-off between motion sensing accuracy and available computational power. We present here the design of a low-cost, compact, hemispherical eye sensor for egomotion estimation. This design results from a nontraditional, top-down approach: the methodology for processing the optical information has guided the way of sensing it.

The rest of this letter is organized as follows. Section-2 defines the relevant visual information required to solve the self-motion estimation task and its relationship with the camera design. Section-3 outlines first prior work on wide FOV imaging system. We describe then the design of our hemispherical eye sensor that integrates field-widened pinhole optics and last CMOS imager technology. Preliminary performance simulations and laboratory results are also presented. We conclude in Section-4 with a few observations and a discussion of future work.

2. Motion estimation from spherical images

The relationship between camera design and the problem of recovering egomotion have been studied [3]. There are two principles relating camera design to performance in egomotion estimation: the stability of the estimation and its linearity. The former improves as the number m of rays captured from each scene point increases; it relates scene dependence of the motion estimation. The latter improves as the FOV gets wider. As noted in [3], polydioptric spherical eyes are very well suited for motion sensing and they offer a number of advantages over conventional single-pinhole cameras with a narrow FOV. The reason is that a polydioptric spherical eye incorporates the typical properties of insects’ apposition compound eyes such as omni-directional distribution of discrete light receptors in a spherical FOV and overlapping local receptive fields of the individual receptor units (cf m>1). Prototypes of polydioptric spherical eyes have been reported in the literature based on: one photoreceptor per view direction [4,5]; assemblies of glass fiber bundles [6]; two dimensional arrays of curved gradient index (GRIN) lenses [7]; and microlens array with insertion a lattice of vertical walls [8]. However, despite the recent advances in the fabrication of artificial ommatidia [9], the imaging units of insects’ compound eyes, the construction of a small, wide-FOV artificial compound eye remains a significant challenge. Moreover, since compound eyes create a mosaic of partially overlapping unit images, the degree of complexity of image interpretation is often increased substantially.

Single-viewpoint imaging systems have a unique center of projection which permits the generation of geometrically correct perspective images. The single-viewpoint constraint means that every pixel in the sensed images measures the irradiance of the light passing though the viewpoint in one particular direction. When the imaging system is spherical in its FOV, a single effective viewpoint allows the construction of geometrically correct spherical images. Although the computation of egomotion estimation from a single-viewpoint spherical camera is inherently scene-dependent, it is still well-understood, stable and robust [10,11]. The theory underlying the viewing geometry of the spherical camera is reviewed below.

Consider that the M×N photoreceptors of a single-viewpoint spherical camera are arranged onto a unit sphere (see Fig. 1(a)), such that each pixel defines a viewing direction d_i (i=1,2,…M×N). For the spherical camera moving rigidly with respect to its 3D static environment with translational and angular velocities, respectively T=(T_x,T_y,T_z ) and R=(R_x,R_y,R_z ), the equation that relates the spherical motion field p_i (motion parallax) to the ego-motion parameters is given by [12]

p_{i} = \frac{\partial d_{i}}{\partial t} = - \frac{1}{D_{i}} (T - (T \cdot d_{i}) d_{i}) - R \times d_{i}

where D_i is the distance between the unit sphere and the object pointed by the camera in the direction d_i . Although p_i , is a 3D vector (tangential to sensor surface), it is constrained to be orthogonal to the unit vector d_i and can be described as a 2D tangent vector in a local coordinate system after projective transformation (i.e., the tangential optical flow field p_i =[∂θ/∂t;∂ϕ/∂t]^T_{d_i}, =x_i .u+y_i .v).

In Eq. (1) we can notice that p_i , is the sum of one rotational flow and one translational flow. The translational component is inversely proportional to the range of a scene point D_i . Therefore only the direction of translation t=T/|T| (also known as the focus of expansion) can be estimated. By replacing the translational velocity vector in Eq. (1) by t and eliminating the range D_i (since t×d_i is perpendicular to t-d_i ), one obtains the epipolar constraint Eq. (2) as follows.

(t \times d_{i}) \cdot (p_{i} + R \times d_{i}) = (t \times d_{i}) \cdot (- \frac{1}{D_{i}} (t - (t \cdot d_{i}) d_{i})) = 0 .

A set of optic flow vectors can then be used to estimate egomotion either by minimizing deviation from the epipolar constraint Eq. (2) (e.g., iterative method described in [13]) or through the simplified linear model of fly tangential neurons described in [14] using Eq. (1).

Fig. 1. Spherical Sensor model: illustration of the image formation and the measurements of the optic flow p_i induced by the sensor motion. θ and ϕ are respectively azimuthal and polar angles.

Download Full Size | PDF

2.1 Influence of the field-of-view

A narrow FOV makes the motion estimation difficult because rotation and translation may induce similar flows (cf. aperture problem in planar camera). This confusion in determining the motion parameters due to the FOV, easy to understand intuitively, has been theoretically investigated [15,16,17]. Geometric interpretations of this ambiguity for planar cameras are known as the line and the orthogonality constraints. These constraints described in [16] are presented below.

In the case of a spherical sensor, assuming that the intensity I_i(t) of the image point in the direction d_i(θ,ϕ) keeps the same during dt, we obtain the image brightness constancy constraint

- \frac{\partial I_{i}}{\partial t} = \frac{\partial I_{i}}{\partial d_{i}} \cdot p_{i} .

By inserting the epipolar constraint Eq. (2) in Eq. (3), we get

\frac{\partial I_{i}}{\partial t} = \frac{\partial I_{i}}{\partial d_{i}} \cdot \frac{t}{D_{i}} + (d_{i} \times \frac{\partial I_{i}}{\partial d_{i}}) \cdot R .

First, since the motion field projection onto the gradients ∂I_i/∂t lies on the plane (u,v) tangent to the spherical sensor, for a narrow FOV (i.e,. little variations of d_i ), the component (t∙d_i)d_i does not contribute to the first term (∂I_i/∂d_i)∙(t/D_i) into Eq. (4). Therefore the component of the direction of translation t parallel to d_i cannot be accurately recovered. Secondly, we see that the set of vectors {∂I_i/∂d_i, d_i×∂I_i/∂d_i } forms an orthogonal basis. Therefore, for a narrow FOV, a rotational error R_e =(R_xe,R_ye,R_ze) in the estimation of the rotation vector R can compensate a translational error t_e =(θ_e,ϕ_e,0) in the estimation of t without violating the constraint in Eq. (4). This is true as long as the projections of R_e and t_e onto the tangent plane (u,v) at d_i are orthogonal.

If we now increase the FOV, all these ambiguities disappear because (i) projections of the translation vector t are available in highly different directions (e.g., θ∈ [-π/2;π/2] and ϕ∈ [0;π] for a hemisphere of viewing direction) and (ii) the orthogonality constraint on the rotational and translational errors cannot be satisfied in all the directions. Thus, for a very large FOV, distinct motion fields can be expected even in the presence of noise.

2.2 Robustness to noise

At the beginning of this section, we reviewed how computing self-motion from a set of motion field vectors on a sphere (or hemisphere). Solving the self-motion estimation task requires a prior step: the computation of optic flow. As noted in [10], dense and accurate optic flow estimates (e.g., a motion vector for each pixel) from pure perspective images (i.e., generated through an optical system with single centre of projection) are desirable.

Traditional omnidirectional image sensors make use of planar imager, which allows the camera to capture wide-angle images with non-linear radial distortions as illustrated in Fig. 2. That is why a spherical lens model cannot be used to characterize the optical unit surface and describe the perspective transformation. Nevertheless the formation of such non-linear image distortions are usually governed by two postulates: (i) all distortions inherent in the optical unit are radially symmetric and (ii) the relationship between the azimuth θ and the image coordinates (x,y) is invariant (i.e., θ=arctan(y/x)). These assumptions allow mapping the optic flow vectors pi’ computed on the image plane to a sphere without introducing artifacts, using the following transformation [10]

P_{i} = [\begin{matrix} \frac{\partial θ}{\partial x} & \frac{\partial θ}{\partial y} \\ \frac{\partial ϕ}{\partial x} & \frac{\partial ϕ}{\partial y} \end{matrix}] [\begin{matrix} \frac{\partial x}{\partial t} \\ \frac{\partial y}{\partial t} \end{matrix}] = [\begin{matrix} - \frac{y}{x^{2} + y^{2}} & \frac{x}{x^{2} + y^{2}} \\ \frac{\partial ϕ}{\partial x} & \frac{\partial ϕ}{\partial y} \end{matrix}] p_{i}'

where the partial derivatives ∂ϕ/∂x and ∂ϕ/∂y in the Jacobian matrix depends on the projection function f(ϕ);ϕ:ℜ ₊ → r: ℜ ₊ given by the geometry of the imaging device. As a matter of fact, the primary source of errors in the computation of egomotion comes from the accuracy of optic flow techniques and their sensitivity to noise.

Fig. 2. Perspective transformation model of a wide-angle imaging system; radial image distortions are non-linear; the projection function f(ϕ);ϕ:ℜ ₊ → r: ℜ ₊ can be derived from the optical unit geometry; for imaging devices that have a locus of projection centers (e.g. fisheye lens), f is characterized by calibration.

Download Full Size | PDF

The image brightness constancy constraint Eq. (3) forms the basis of gradients-based optic flow methods. These differential methods have replaced the well-known Elementary Motion Detector (cf. Reichardt correlator) because they perform well when considering the robustness to noise [18,19], provide a dense optic flow field and permit “real-time” operation. For instance, by regularizing the ill-conditioned Eq. (3) using a linear combination of a set of overlapped basis functions to model the optic flow instead of a global cost function based on local and/or local field smoothness factor(s), S. Srinivasan et al. [18] have shown that it is still possible to compute accurately the optical flow on a pixel-wise basis (100% dense flow field) at signal-to-noise ratio (SNR) of 20dB. Unlike conventional techniques, S. Srinivasan’s et al. approach has no recourse to iterative search or multi-resolution technique. Eq. (3) is basically linearised by modeling the motion fields x_i and y_i in terms of a weighted sum of basis functions (e.g., cosine window). The least-squares solution (in terms of its optimality quantified by the bias and covariance of the estimates) of the resulting linear system gives an accurate estimation of the weights that constitute the model parameters. At noise level of 20db, the mean value of the flow vector error angle (over all image points and measured on real data) varies typically between 2.62° and 4.28°.

3. Low-cost hemispherical pinhole camera

In the previous section we have discussed the type of visual information necessary to solve the egomotion estimation problem. It is highly desirable that the imaging system has a single viewpoint and a wide FOV. Existing optical solutions to increase the camera’s FOV include: traditional fisheye lens and catadioptric imaging system [22]; distributed camera systems [20]; liquid crystal spatial light modulators [21]; and spatial superposition of erect images created by adjacent imaging channels [2]. However, these designs do not allow either further miniaturization to meet payload requirements of micro-UAVs, or a simultaneous acquisition of a true hemispherical image (i.e., near 180° azimuth and elevation or greater) in order to process the dense optic flow on the surrounding visual fields. The feasibility of providing micro-UAVs with a visual microsensor for self-motion estimation remains consequently an open research question.

We examine below successively (i) the sensitivity of currently available CMOS sensors, (ii) the capabilities of pinhole imaging, and (iii) how we can build a miniature hemispherical eye by combining these two technologies.

3.1 Photon shot noise limit in CMOS imager

To date, CMOS (Complementary Metal Oxide Semiconductor) image sensors have been unable to match CCDs (Charge-Coupled Devices) in resolution and image quality. However, recent improvements in CMOS imaging technology allow the production of high-quality image capture devices that consume much less power than CCDs; a particularly important advantage for mobile applications such as in micro-UAVs. In addition, with CMOS technology, signal processing can be integrated directly on the chip which results in a more compact system that decreases defects, increases reliability and reduces the cost of assembly.

In CMOS imagers, potential noise sources are present from the photodiodes through the read-out circuitry [23]. CMOS imagers’ performance relies on noise suppression processes and can be specified in terms of the lowest light level at which the sensor produces an image with a minimally acceptable signal-to-noise ratio (SNR). Assuming conditions in which photon shot noise dominates over dark current, fixed pattern, and read-out noises, the absolute limit for noise reduction is given by the Poisson statistics of the incoming light (i.e., random process of photons detection):

SNR \approx \sqrt{\bar{n} η}

where n̄ is the average arrival rate of photons reaching a pixel and η is the external quantum efficiency. η relates the fraction of the incident photon flux that contributes to the photocurrent in the pixel as a function of wavelength; it comprises both internal quantum efficiency (silicon substrate capability to convert the light energy to electrical energy) and optical efficiency (light sensitivity depends on the geometric arrangement of the photo-detector within an image sensor pixel and the pixel location with respect to the imaging optics axis). The signal-to-quantization noise ratio gives another upper limit of approximately 6×N decibels for an N-bit resolution imaging system (cf. SQNR =20∙ log (2^N/0.5) ≈ 6.02 N dB).

From Eq. (6), given a minimum expected SNR, the minimum number of photons n that reach a single pixel within the exposure time Δt determines the minimal required illuminance I onto the image plane. This amount of light I incident on the photocells of the sensor is also affected by the light gathering capability of the imaging lens. For instance, by assuming that the lens transmittance is perfect, then the light throughput is inversely proportional to the square of the relative aperture F _/# [24].

3.2 Pinhole optics design and performance

Consideration of optimum performance of pinhole optics has been discussed in [25,26]. The diffraction limit depends on the pinhole aperture; the precise diameter Ø that yields best image resolution is obtained for [26]

Ø = \sqrt{β \times \frac{λ d f}{d + f}} \approx \sqrt{β λ f} in the limit d >> f .

where λ is the wavelength, d the object distance, f the focal length (or image distance), and β the Petzval constant (β =2). Therefore, we approximate the relative aperture (ratio of the focal length f to the diameter Ø for lensless imaging) by

{(F / #)}^{2} \approx \frac{f}{β λ} \approx \frac{r}{β λ \tan (θ_{max})}

where θ_max is the maximum angle of incidence of light rays (i.e. half the FOV angle) and r defines the size of projection onto the sensor. Given this optimum configuration in the sense of diffraction limit minimization and using Eq. (6), the required level of scene illuminance I_o can be calculated through the equation I₀ = $F_{/ #}^{2}$ ×I = $F_{/ #}^{2}$ ×(α n h c)/(z ² Δt λ), where α is a conversion factor (α≈6.68×10¹⁴ for conversion between J/μm²/s to Lux), h the Planck’s constant, c the celerity, z the pixel size (in microns), Δt the integration time (or exposure time) of the CMOS sensor. The number of photons n is itself a function of the SNR, Δt, and η. The relative $F_{/ #}^{2}$ aperture depends on the parameters θ_max , r, and λ.

We present in Fig. 3 a prediction of the required level of scene illuminance I_o as a function of the exposure time Δt of the CMOS sensor. This estimation assumes that all photons in the visible range have roughly the same energy (e.g., λ ≈550nm), a typical pixel size of z ²=7.5×7.5μm² (i.e., 1/3” sensor with 640*480 pixels resolution), an average external quantum efficiency η=37% [27] and a attenuation of the optical efficiency as a function of the angle θ of rays’ incidence, like observed in [28]. We observe in Fig. 3 that, unlike the minimum number of photons per pixel per exposure that increases as a function of the angle θ, the minimal illuminance I_o of objects to be observed in a scene decreases monotonically as the angle Ө_max increases, until a minimum is reached at Ө_max = 45 degrees where f equals r. Nevertheless the maximum acceptable angular field may depend on the use of integrated micro-lens arrays and the sensor’s ability to tolerate or correct a large variation in exposure between the centre and the edge of the of the image plane. Figure 3 shows also that it is possible to capture high-contrasted images in an indoor environment (cf. typical level of illuminance in living room is within the range of 50 to 200 Lux) with SNR greater than 24dB by increasing the exposure time (e.g., Δt constrained to 20ms to avoid motion blur in the image).

Fig. 3. Minimum level of scene illuminance I₀ versus the exposure time Δt. This estimation is respectively plotted for (a) different rays incidence angle θ_max with a SNR fixed to 38dB and (b) for different expected SNRs at θ_ma =45°. Otherwise for all plots: λ≈550nm, z=7.5μm, r=1.8mm (height of 1/3” sensor), η=37% and optical efficiency attenuation slope of -1.5%/deg.

Download Full Size | PDF

Other features of pinhole camera are the virtually infinite depth-of-field (DOF) and the hemispherical response (2π acceptance angle) which is restricted in practice by the size of the projection r. A simple technique to overcome this limitation consists of an inverted glass hemisphere attached underneath the pinhole [28]. In a preferred embodiment shown in Fig. 4, an additional miniature plano-convex (PCX) spherical lens is cemented on top the half-ball lens which result in lower ratio of the energy reflected to incident light and lower radial distortion at the edge of field.

Fig. 4. Field-widened pinhole camera; It consists of only three optical elements (half-ball + aperture disc + plano-convex lens) cemented together to form a compact optical system.

Download Full Size | PDF

Neither the hemisphere’s radius nor the PCX lens’ one is critical, which allows a further miniaturization. Indeed, using the notation of Fig. 4, we obtain the three following equations:

\sin θ_{a} = \frac{(\sqrt{1 - x^{2} + \cot^{2} θ} - x \times \cot θ)}{(1 + \cot^{2} θ)}

FOV = 2 (θ_{a} + θ_{IN})

θ = θ_{a} + θ_{OUT} .

where x ∈ [0; 1] is related to the ratio of the PCX lens’ centre thickness T to its radius R such that x=1-(T/R); Ө_a is the angle of the normal to the PCX lens’ surface relative to the central axis of the lens; ӨIN and ӨOUT are respectively the angles of incidence and refraction of incoming light rays; Ө is the angle of incidence of light rays at the sensor surface.

From Eq. (9), Eq. (10) and Eq. (11), the FOV and the reflectance of the field-widened pinhole camera are estimated as a function of x and Ө, as illustrated in Fig. 5(a) and Fig. 5(b). These plots assume same index of refraction N for both optical lenses and θ_max =45°. For each glass material represented, we choose x such that the reflectance does not exceed ~20%. Then we notice in Fig. 5(c) that the best FOV is nearly hemispherical. From the plots of Fig. 5(c), we can also estimate the field angle ϕ versus the ratio d/f where d is the distance between an image point projected onto the sensor and the principal point at the centre of the imager (d/f≤ 1 since f≈r, and d = f×tanθ). This is plotted in Fig. 5(d). We note the non-linear relationship between ϕ and d/f. The amount of radial distortion δϕ, the undistorted field angle ϕ₀ , and the estimated field angle ϕ are related by the equation ϕ =ϕ₀ +δϕ = k₁ ×(d/f)+δϕ, where k₁ ∈ ℜ ₊ (in degrees). The radial distortion is given by the ratio σ_radial=|ϕ-ϕ₀ |/ϕ₀ . The maximal radial distortion at the edge of field (d/f=1) is about σ_radial ≈10%. Finally, the plots in Fig. 6 show that using optical elements of different refractive indexes (PCX lens’ index of refraction N ₁ higher than half-ball’s one N ₂) to compensate radial distortion results in a narrower FOV which is not suitable for the self-motion estimation process. Symmetric aspherical microlens design has not been considered because this would require costly custom made optical elements.

Fig. 5. (a) Half-FOV as a function of x; (b) Reflectance as a function of x; (c) Half-FOV as a function of θ, with different PCX lens geometries such that the reflectance at the edge of field does not exceed ~20%; (d) Field angle ϕ as a function of the ratio d/f. These estimations are plotted for several refractive indexes (N).

Download Full Size | PDF

Fig. 6. Half-FOV as a function of θ when using a half-ball with a lower index of refraction than the PCX lens’ one.

Download Full Size | PDF

3.3 Limits to motion estimation

We have demonstrated that is possible to increase the FOV of a miniature imaging system by simply adding two micro lenses in a pinhole optical design at the expense of slower optics with non negligible radial distortion. According to our hypothesis about optic-flow-based motion estimator capable of performing an optimal and noiseless extraction of motion information from noisy images (cf., discussion in Section 2), we can say that the limitations of the proposed artificial eye to the reliability of visual information encoding are imposed by the diffraction resolution limits. It is relevant to roughly estimate the precision and range of motion measurements we can expect from our electro-optical system.

The performance of the proposed hemispherical pinhole eye is mainly affected by five features. Two of them are structural: (i) the optical diffraction limit δx ≈ λ.(f/Ø) which creates a blur of angular width δϕ_diff ≈ ½FOV.(δx/r), and (ii) the angular resolution δϕ_res ≈ ½FOV.(z/r), where z is the pixel pitch. The three other features are environmental: (iii) the amount of light available to the receptors (already previously discussed), (iv) the contrast of stimulus, and (v) the micro-UAV’s motion. In our design, an optimal pixel size is not guaranteed since the two angular limits can be slightly different. An approximation of the combined effect of δϕ_diff and δϕ_res (assumed Gaussian in profile: Point Spread Function, PSF) is obtained by calculating the acceptance angle δρ² = ${δ ϕ}_{diff}^{2}$ + ${δ ϕ}_{res}^{2}$ .

Now suppose we look at a contrast edge with intensity stepping from I=I₀ ×(1+C) to I=I₀ ×(1-C), where I₀ is the mean irradiance and C the contrast parameter. If we express photocell gain as the transduction from contrast to voltage, then a single photodetector element (with its optical axis aligned to both PSF and edge) that moves by an angle δӨ will see a change in contrast of roughly [31]

δ C^{2} = {(\frac{δI}{I_{0}})}^{2} \approx \frac{2 \cdot C^{2}}{\prod} \cdot {(\frac{δθ}{δρ})}^{2} .

Therefore, a lower limit to the angular displacement is obtained by determining δӨ at which the contrast signal to contrast noise ratio (SN_cR) for M≈∏.(r/δx)² transported pixels crosses ~24dB (cf. results in Fig. 3) using the approximation [32]

S N_{c} R = M \cdot \frac{δ C^{2}}{δN c^{2}} \approx M \cdot \frac{2 \cdot C^{2}}{\prod} \cdot {(\frac{δθ}{δρ})}^{2} \cdot \bar{n} Δ t .

where n̄ is the average arrival of photons to be transduced per photocell (similar to quantum bumps rate). This rough calculation gives δӨ_min ≈ 0.01° for conditions typical of realistic experiment: λ≈550nm, f= r = 1.8mm, z = 7.5μm, C = 0.1, M≈ 10⁵ and n̄×Δt≈ 2130 photons (room light conditions and Δt=20ms). This estimate leads to detectable angular velocity as low as δӨ/δt ≈0.5 °/s. However, due to the long integration time Δt of photoreceptors, the proposed eye is expected to be sensitive to temporal modulations up to 50Hz only. As a consequence, motion blur will occur in the digital image at angular velocities greater than δϕ_res /Δt ≈ 19 °/s. This upper limit shows that the only way that a micro-UAV achieves higher speed maneuvers using our hemispherical eye for self-motion estimation is to adapt image resolution according to angular velocity.

3.4 Experiments

In this section we present some first results obtained from the fabrication of our near hemispherical FOV visual microsensor using off-the-shelf components. The CMOS imager is the OV7411 single camera chip from OmniVision [30] which features 4.69×3.54mm² image area, 510×492 pixels resolution, 9.2 ×7.2μm² pixel size and 16ms maximal exposure time. The specifications of the optical components are listed below in Table 1. The precision optical aperture disc is placed at approximately 1.8mm away from the image plane.

Table 1. Specifications of the off-the-shelf components used for the realization of our optical sensor system.

View Table

In our first experiment, captured images are presented. Figure 7 shows a limited view of circular images characteristic of our wide-viewing angle imaging system. The measured FOV is about 165 degrees and the radial distortion at the edge of field is 17%. In our second experiment we have computed optic flow field for real-world sequences of images acquired in our laboratory (office room shown in Fig. 7). The depicted color plots in Fig. 8 are obtained for different single camera motions and by estimating optic flow with overlapped basis function [18] (software available at ftp.cfar.umd.edu/pub/shridhar/Software). In these sequences, the camera is translating parallel to the ground (forward or on the left) or rotating on a turntable (counter clockwise). The motion field induced by the motion of the camera includes velocities less than 3.5 pixels per frame. We can observe that our miniature imaging system is capable of producing dense measurements of 2D velocities vectors that qualitatively match the expected optic flow field of relatively simple motion. Undoubtedly, the near hemispherical FOV is still suboptimal, although in our experiments it gives acceptable results to remove ambiguities between translation T_y along Y-axis and rotation R_z about Z-axis. Indeed, the magnitude of the motion fields computed for R_z decreases slowly as the elevation angle ϕ increases, whereas the one computed for T_y is more homogenously distributed over the entire FOV.

Fig. 7. From left to right: A field-widened pinhole image acquired in our office (illuminance ~ 150 Lux), another image of a testing target (straight lines of chessboard grid are severely distorted), external view of the box-shaped testing target (chessboard pattern on every side).

Download Full Size | PDF

Fig. 8. Sample of computed optic flow field for four different single camera motions: (a) color code, where the color represents the direction ψ _pi=arctan(y_i /x_i ) of a flow vector p_i =[x_i ;y_i ]^T (e.g. red for ψ _pi=0°, yellow for ψ _pi=90°, green for ψ _pi=+/-180°, blue for ψ _pi=-90°) and the brightness shows its magnitude (i.e. darker as |p_i | diminishes); (b) translation along Y-axis; (c) rotation about Z-axis; (d) translation along X-axis; (e) rotation about X-axis (cf. reference frame in Fig. 6); (f), (g), (h) and (i) theoretical projections of the motion field for the corresponding cameras motions. This simple example illustrates that the measured 2D velocities vectors qualitatively match the expected optic flow field of simple motion.

Download Full Size | PDF

4. Conclusion and future work

We have shown the relationship between large FOV and computation of observer motion. We have seen that recent research in optic flow computation and motion estimation provide robust-to-noise algorithms to solve the egomotion problem. We have presented the theoretical design of a novel miniature optical sensor for egomotion estimation based on an advanced pinhole imaging system. This design enables to acquire near-hemispherical images with SNR>20dB in low-light conditions and with distortion at the edge of field not exceeding 10%. The proposed technique for increasing the FOV has allowed us to fabricate a low-cost prototype using inexpensive off-the-shelf micro optical components. The resulting thin, light imaging optics leads to a high degree of integration with the CMOS sensor, which meets all physical requirements of micro-UAVs. However, adaptive mechanisms need to be implemented to compensate the slow response of this egomotion sensor. This will result in a variable spatial resolution as a function of the self-motion velocities. We have applied noise-resilient differential techniques for the computation of optical flow on image sequences captured with the developed visual sensor. Our preliminary results indicate that such miniature field-widened pinhole cameras can yield promising performance for self-motion estimation on a micro flyer. Further research is now required to analyze the capabilities of a sensory-motor control system that integrates this technology.

Acknowledgments

This work is supported by the ARC Centre of Excellence programme, funded by the Australian Research Council (ARC) and the New South Wales state government.

References and links-

1. D.V. Wick, T. Martinez, S.R. Restaino, and B.R. Stone, “Foveated imaging demonstration,” Opt. Express 10, 60–65 (2002). [PubMed]

2. R. Volkel, M. Eisner, and K.J. Weible, “Miniaturized imaging system,” J. Microelectronic Engineering, Elsevier Science , 67-68, 461–472 (2003). [CrossRef]

3. J. Neumann, C. Fermuller, and Y. Aloimonos, “Eyes from eyes: new cameras for structure from motion,” in IEEE Proceedings of Third Workshop on Omnidirectional Vision (Copenhagen, Denmark, 2002), 19–26. [CrossRef]

4. T. Netter and N. Franceschini, “A robotic aircraft that follows terrain using a neuro-morphic eye,” in IEEE Proceedings of Conference on Intelligent Robots and Systems (Lausanne, Switzerland, 2002), 129–134. [CrossRef]

5. K. Hoshino, F. Mura, H. Morii, K. Suematsu, and I. Shimoyama, “A small-sized panoramic scanning visual sensor inspired by the fly’s compound eye,” in IEEE Proceedings of Conference on Robotics and Automation (ICRA, Leuven, Belgium, 1998), 1641–1646.

6. R. Hornsey, P. Thomas, W. Wong, S. Pepic, K. Yip, and R. Krishnasamy, “Electronic compound eye image sensor: construction and calibration,” in Sensors and Camera Systems for Scientific, Industrial, and Digital Photography Applications V, M. MBlouke, N. Sampat, and R. Motta, eds., Proc. SPIE 5301, 13–24 (San Jose, US Calif., 2004). [CrossRef]

7. J. Neumann, C. Fermuller, Y. Aloimonos, and V. Brajovic, “Compound eye sensor for 3D ego motion estimation,” in IEEE Proceedings of Conference on Intelligent Robots and Systems (Sendai, Japan, 2004).

8. J. Tanida, T. Kumagai, K. Yamada, S. Miyatake, K. Ishida, T. Morimoto, N. Kondou, D. Miyazaki, and Y. Ichioka, “Thin observation module by bound optics (TOMBO): concept and experimental verification,” Appl. Opt. 40 (11), 1806–1813 (2001). [CrossRef]

9. J. Kim, K.H. Jeong, and L.P. Lee, “Artificial ommatidia by self-aligned microlenses and waveguides,” Opt. Express 30, 5–7 (2005)

10. J. Gluckman and S.K. Nayar, “Egomotion and omnidirectional cameras,” in IEEE Proceedings of Conference on Computer Vision and Pattern Recognition (Bombay, India, 1998), 999–1005.

11. P. Baker, R. Pless, C. Fermuller, and Y. Aloimonos, “New eyes for shape and motion estimation,” in IEEE Proceedings of the first international Workshop on Biologically Motivated Computer Vision, Lectures Notes in Computer Science 1811, Springer-Verlag eds. (2000), 118–128. [CrossRef]

12. J.J. Koenderink and A.J. Van Doorn, “Facts on optic flow,” J. Biol. Cybern. 56, Springer-Verlag eds. (1987), 247–254. [CrossRef]

13. T. Tian, C. Tomasi, and D. Heeger, “Comparison of approaches to egomotion estimation,” in IEEE Proceedings of Conference on Computer Vision and Pattern Recognition (San Francisco, US Calif., 1996), 315–320.

14. M. Franz, J. Chahl, and H. Krapp, “Insect-inspired estimation of egomotion,” J. Neural Computation 16, 2245–2260 (2004). [CrossRef]

15. G. Adiv, “Inherent ambiguities in recovering 3D motion and structure from a noisy field,” IEEE Trans. Pattern Anal. Mach. Intell. 11 (5), 477–489 (1989). [CrossRef]

16. C. Fermuller and Y. Aloimonos, “Observability of 3D motion,” J. Computer Vision, Springer Science eds., 37 (1), 46–63 (2000)..

17. J. Neumann, “Computer vision in the space if light rays: plenoptic video geometry and polydioptric camera design,” Dissertation for the degree of Doctor of Philosophy, Department of Computer Science, University of Maryland (2004).

18. S. Srinivasan and R. Chellappa, “Noise-resilient estimation of optical flow by use of overlapped basis functions,” J. Opt. Soc. Am. A 16, 493–507 (1999). [CrossRef]

19. A. Bruhn, J. Weickert, and C. Schnorr, “Combining the advantages of local and global optic flow methods,” in DAGM Proceedings of Symposium on Pattern Recognition, (2002), 457–462

20. C. Fermuller, Y. Aloimonos, P. Baker, R. Pless, J. Neumann, and B. Stuart, “Multi-camera networks: eyes from eyes,” in IEEE Proceedings of Workshop on Omnidirectional Vision, (2000), 11–18.

21. T. Martinez, D.V. Wick, and S.R. Restaino, “Foveated, wide field-of-view imaging system using a liquid crystal spatial light modulator,” Opt. Express 8 (10), 555–560 (2001). [CrossRef] [PubMed]

22. J. Beckstead and S. Nordhauser, “360 degree/forward view integral imaging system,” US Patent 6028719, InterScience Inc., 22 February 2000.

23. R. Constantini and S. Susstrunk, “Virtual sensor design,” in Sensors and Camera Systems for Scientific -Industrial and Digital Photography Applications, Proc. SPIE 5301, 408–419 (2004). [CrossRef]

24. T.H. Nilsson, “Incident photometry: specifying stimuli for vision and light detectors,” Appl. Opt. 22, 3457–3464 (1983). [CrossRef] [PubMed]

25. M. Young, “Pinhole optics,” Appl. Opt. 10, 2763–2767 (1971). [CrossRef] [PubMed]

26. K.D. Mielenz, “On the diffraction limit for lensless imaging,” J. Nat. Inst. Stand. Tech. 104, (1999).

27. B. Fowler, A.E. Gamal, D. Yang, and H. Tian, “A method for estimating quantum efficiency for CMOS image sensors,” in Solid State Sensor Arrays - Development and Applications, Proc. SPIE 3301, 178–185 (1998). [CrossRef]

28. P.B. Catrysse and B.A. Wandell, “Optical efficiency of image sensor pixels,” J. Opt. Soc. Am. A 19, (2002). [CrossRef]

29. J.M. Franke, “Field-widened pinhole camera,” Appl. Opt. 18, 1979. [CrossRef] [PubMed]

30. http://www.ovt.com

31. R. Steveninck and W. Bialek, “Timing and counting precision in the blowfly visual system,” in Methods in Neural Networks IV, J.Van Hemmen, J.D. Cowan, and E. Domany, ed., (Heidelberg; Springer-Verlag, 2001), 313–371.

32. W. Bialek. “Thinking about the brain,” in Les Houches Lectures on Physics of bio-molecules and cells, H. Flyvbjerg, F. Jülicher, P. Ormos, and F. David, ed., (Les Ulis, France; Springer-Verlag, 2002), 485–577.

Components	Diam. (mm)	Thickness (mm)		Material	Weight (g)
Components	Diam. (mm)	Centre	Edge	Material	Weight (g)
PCX lens	2.50	0.80	0.47	LaSFN9 (N≈ 1.850)	<0.10
Half-ball lens	2.00	1.00	-	BK7 (N≈ 1.517)	<0.05
500μm aperture	9.53	<0.01		Blackened 302 stainless steel	<0..05

Low-cost miniature wide-angle imaging for self-motion estimation

Abstract

1. Introduction

2. Motion estimation from spherical images

2.1 Influence of the field-of-view

2.2 Robustness to noise

3. Low-cost hemispherical pinhole camera

3.1 Photon shot noise limit in CMOS imager

3.2 Pinhole optics design and performance

3.3 Limits to motion estimation

3.4 Experiments

4. Conclusion and future work

Acknowledgments

References and links-

Cited By

Figures (8)

Tables (1)

Equations (13)

Optics Express