Abstract
Light field sampling (LFS) theory can properly reduce minimum sampling rate while ensuring that novel views are not distorted for image-based rendering (IBR). The minimum sampling rate is determined by spectral support of light field. The spectral support of light field has studied the influence of the following factors: the minimum depth and the maximum depth, non-Lambertian reflections, whether the scene surfaces are flat, maximum frequency of painted signals. In this paper, we further perfect the light field spectrum analysis from the quantitative description of scene texture information based on the existing spectrum analysis theory. The quantification of texture information can be interactively refined via detected regional entropy. Thus, we can derive a spectral analytical function of light field with respect to texture information. The new function allows the spectral support of light field to be analyzed and estimated for different texture information associated with scene objects. In this way, we limit the spectral analysis problems of light field to those of a simpler signal. We show that this spectral analysis approach can be easily extended to arbitrary scene complexity levels, as we simplify the LFS of complex scenes to a plane. Additionally, the spectral support of light field broadens as the plane texture information becomes more complex. We present experimental results to demonstrate the performance of LFS with texture information, verify our theoretical analysis, and extend our conclusions on the optimal minimum sampling rate.
© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
1. Introduction
The technology of computing novel views from a set of existing captured images is usually referred to as image-based rendering (IBR) [1,2,3,4]. IBR does not require detailed scene information and can generate a novel view from a set of captured images, so it is widely used in virtual reality (VR), free-viewpoint TV (FTV) and 3DTV [5,6,7]. For instance, we can use a limited number of cameras, or one camera at a limited number of positions, to capture a scene, as shown in Fig. 1(a), and then render arbitrary novel views from those captured images, as in Fig. 1(b). Obviously, to prevent distortion of the rendered novel views, a sufficient number of multi-view images must be captured [8,9]. This sufficient number can be determined by light field sampling theory, i.e., the minimum sampling rate for alias-free rendering in IBR.
The minimum sampling rate of IBR is mainly determined by the scene attributes and the camera configuration (i.e., direction and position). The complexity of scene information and the multidimensional nature of the signals make the sampling rate difficult to determine; in particular, scene complexity makes the problem extremely hard [10]. To determine the sampling rate, the dimensionality of the stereo signals is usually reduced, or only one scene attribute is considered. The previous literature shows that depth [11], surface reflection [12], smoothness [13,14] and geometry [15] have a profound impact on the sampling rate of IBR. These theories rest on ideal assumptions, each deducing the sampling rate from one aspect. Such sampling theories are not fully realized in practice: some aliasing is always present, and it degrades the rendered output images [16,17]. Therefore, the impact of scene attributes on the sampling rate must be studied to derive a comprehensive and practical theory. We observe that scene textures have a profound impact on the sampling rate of IBR. The difficulty is that texture information is hard to quantify because its variations are random and irregular. Therefore, in this paper we propose a sampling theory of IBR based on texture information (STTI) quantification using light wave signals.
Our theory is motivated by the insight that the information captured by a camera is composed of regular light rays. Thus, we can analyze the textures in the light field and derive the minimum theoretical sampling rate. Because other attributes of the scene, such as geometric shape, also affect the sampling of IBR, we map the texture of a complex scene onto a plane to isolate the impact of texture information on the sampling. We can then study the spectral properties of the IBR of a plane to obtain the spectral properties of an irregularly shaped scene. The key to this result is an achievable scheme, spectral analysis of IBR, that is tied to the properties of the scene. We demonstrate the application of our formulations to actual scenes and synthetic scenes. Some preliminary results were presented in [15], where we studied the effect of geometric changes in the scene on the plenoptic spectrum. Building on that work, we here study the influence of scene texture changes on the plenoptic spectrum. In summary, our main contributions are as follows:
$\bullet$ We are the first to study the regularity of texture information and the various related IBR phenomena in terms of mathematical quantization, and we propose a quantization model based on color variables.
$\bullet$ Compared with existing theories, STTI studies the influence of texture information on IBR in detail. STTI can effectively describe how texture information changes with camera position, thus improving the captured information and the rendering quality of a scene.
$\bullet$ Our results can be applied to determine the minimum sampling rate for IBR. Our proposed method is more accurate than the existing methods described in the literature. For a complex scene, such as one with strong color variations, STTI improves the richness of the captured scene information.
The outline of this paper is as follows. Related work is presented in Section 2. Section 3 outlines the proposed light field sampling algorithm. Section 4 presents the light field parameterization and the signal model of the scene. The spectral analysis of the light field is presented in Section 5, and the sampling theory of the light field in Section 6. The evaluation on different data sets and a comparison with the state of the art are presented in Section 7. Finally, the work is concluded in Section 8.
2. Related work
2.1 Scene signal quantization
To study the sampling of IBR data, one must represent the captured information of a 3D scene, i.e., describe the relationship between cameras and scenes. As the position and direction of the camera change, the scene information captured by the camera also changes significantly, which complicates the quantization of scene signals. Generally, the captured scene information can be regarded as a set of light rays. For light ray parameterization, the best-known approach is the plenoptic function introduced by Adelson and McMillan [18,19]. Thus, the IBR problem can be treated as an application of sampling theory to the plenoptic function. The plenoptic function parameterizes each light ray's captured position, viewing direction, wavelength and time, and can be denoted by seven dimensions, $P = {P_7}\left ( {\theta ,\phi ,\lambda ,\tau ,{V_x},{V_y},{V_z}} \right )$. Since seven dimensions are generally very difficult to handle, certain assumptions are made to reduce the dimensionality. For example, with the wavelength and time fixed, two parallel planes, the camera plane $\left ( {t,\xi } \right )$ and the image plane $\left ( {v,u} \right )$, can represent a light ray's position and direction, as shown in the 4D light field in Fig. 2 [20,21]. Thus, the sampling problem of IBR can be regarded as light field sampling (LFS).
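As an illustration of the two-plane parameterization, the following sketch (our own, not taken from the cited works) intersects a ray with a camera plane and an image plane to obtain its 4D coordinates; the plane positions `z_cam` and `z_img` are arbitrary choices for the example:

```python
import numpy as np

def ray_to_4d_coords(origin, direction, z_cam=0.0, z_img=1.0):
    """Intersect a ray with the camera plane (z = z_cam) and the image
    plane (z = z_img) to obtain its two-plane light field coordinates.

    origin, direction: 3-vectors; the ray is origin + s * direction.
    Returns ((t, xi), (v, u)), the (x, y) intersections with each plane.
    """
    origin = np.asarray(origin, float)
    direction = np.asarray(direction, float)
    s_cam = (z_cam - origin[2]) / direction[2]  # parameter at camera plane
    s_img = (z_img - origin[2]) / direction[2]  # parameter at image plane
    t, xi = (origin + s_cam * direction)[:2]
    v, u = (origin + s_img * direction)[:2]
    return (t, xi), (v, u)
```

With the wavelength and time fixed, these four numbers fully identify the ray, which is exactly the dimensionality reduction described above.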
2.2 Light field sampling
LFS is an information theoretic treatment of IBR in which novel views are directly synthesized from a set of captured multi-view images [1]. The traditional approach chooses a sampling interval small enough to avoid aliasing effects, where the minimum sampling rate is determined by the spectral support of the sampled light field [2,3,11]. Recent work in LFS theory has shown that the spectral support is bounded by the scene depths and surface properties under certain assumptions [12,13,14,15]. Unfortunately, for more complicated scenes, e.g., with non-Lambertian reflections and occlusions, the plenoptic spectral support is unknown. In real-world complex scenes, some aliasing of synthesized views always exists [6,10]. Therefore, at a coarse level, the above spectral analysis results and their information theoretic basis cannot be used directly in practice.
Depth information has been shown to reduce the lower bound of the sampling rate: the necessary minimum sampling rate can be reduced and the rendering quality improved when depth information is utilized [11]. In this case, depth extraction is a critical step for LFS. Layered depth [22,23] is a well-known method. In [16], Pearson et al. proposed an automatic layer-based method for LFS in which depth layers are placed nonuniformly to match the distribution of the scene surface geometry. To avoid sparse depth information, Chaurasia et al. [24] applied silhouette-aware warping, selecting important depths along silhouette edges to enable depth-preserving variational warping. Similarly, Zhu et al. presented a simpler structure model to quantify the pivotal geometric information of a complex scene. This approach does not depend on dense, accurate geometric reconstruction; instead, the authors compensate for sparse 3D information by using single salient points [25].
LFS rates have also been studied from other angles. For instance, a signal-processing framework based on frequency analysis of light transport is presented in [26], giving closed-form expressions for the radiance of light rays. Light ray expressions are altered by phenomena such as shading and occlusion, and these alterations can be studied through the frequency analysis of light transport; the resulting expressions can be used to determine the LFS. In [27], the authors mathematically derive a frequency estimation function of light field signals using autocorrelation theory in the spatial domain, which simplifies the frequency estimation of the light field. Learning-based methods have also been applied to light field capture [28], and some work has reduced the sampling rate of light fields by studying sparse light fields [29,30]. In contrast to these prior works, we examine the spectral properties of the light field based on the texture information of a 3D scene.
2.3 Light field reconstruction
Generally, a continuous signal can be reconstructed by filter interpolation from a set of captured images. In [31], Stewart et al. used a reconstruction filter to reduce the sampling rate, presenting a linear, spatially invariant filter for reconstructing the plenoptic function. Similarly, Hoshino et al. [32] designed an appropriate prefilter for multiview image acquisition to limit the aliasing of reconstructed images. In [33,34], Wu et al. took advantage of the clear texture structure of the epipolar plane image (EPI) in light field data and modeled light field reconstruction from a sparse set of views as CNN-based angular detail restoration on EPIs. Vagharshakyan et al. [35] used the shearlet transform to study the sampling and reconstruction of light fields, exploiting a sparse representation of the EPI [36] in a directionally sensitive transform domain obtained from an adapted discrete shearlet transform. In [37], Shi et al. used sparsity in the continuous Fourier domain to study light field reconstruction. Liu et al. proposed light field reconstruction based on projection modeling of the focal stack [38]. Jin et al. studied light field reconstruction via approximation and blind estimation to improve the rendering quality. In this paper, we study the minimum sampling rate of light fields in order to improve the quality of viewpoint reconstruction.
3. Proposed light field sampling algorithm
Our light field sampling algorithm includes sampling and reconstruction, as shown in Fig. 3. First, a light field is used to describe a 3D scene signal, that is, the position and direction from which scene information is captured. Then, we quantify the texture information of the scene. The Fourier transform is applied to the light field with texture information, and the spectral structure of the light field under the influence of the texture information is analyzed in the frequency domain. Based on this analysis of the spectral structure, we derive the light field sampling theory based on texture information and a well-designed reconstruction filter.
4. Signal model of 3D scene
For clarity, we list the important notations and the associated definitions used throughout the paper in Table 1.
4.1 Light field parameterization
In Fig. 2, the variables of the 4D light field contain two parts: position coordinates $t, \xi$ and angular coordinates $u,v$. To simplify the theoretical derivation and the explanation of the light field spectrum, the 4D light field can be reduced to 2D by fixing $\xi$ and $u$, because the two position dimensions (and likewise the two angular dimensions) behave identically; results obtained for $t, v$ extend directly to $\xi , u$. The 2D light field can thus be represented as a function of $t$ and $v$, as shown in Fig. 4(a). The 2D light field can be conveniently represented using the concept of the EPI, which describes the mapping between the camera position $t$, the angular coordinate $v$ and the scene object $z\left ( x \right )$, as shown in Figs. 4(a) and (b). According to the trigonometric mapping relationship shown in Fig. 4, the EPI mapping can be expressed as
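The EPI geometry can be illustrated numerically. The sketch below assumes the common pinhole relation $v = f\,(x - t)/z$ (a standard convention, not necessarily the exact parameterization of Fig. 4): each scene point traces a straight line in the $(t, v)$ plane whose slope is governed by its depth, so deeper points give flatter EPI lines:

```python
import numpy as np

def epi_line(x_point, z_point, t_positions, f=1.0):
    """Image coordinate v at which a scene point (x, z) is seen from
    each camera position t, under the pinhole relation v = f*(x - t)/z.
    The EPI trace of one point is a straight line with slope dv/dt = -f/z,
    so deeper points produce flatter (smaller-magnitude-slope) lines."""
    return f * (x_point - t_positions) / z_point

t = np.linspace(-1.0, 1.0, 5)       # camera positions along the t-axis
v_near = epi_line(0.0, 1.0, t)      # near point: steep EPI line
v_far = epi_line(0.0, 4.0, t)       # far point: flatter EPI line
```

This is exactly the depth-to-slope relationship that the spectral analysis in Section 5 exploits.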
4.2 Plane surface representation
To describe the texture information, the scene surface must be quantified. Generally, we use two curvilinear surface coordinates $\left ( {s,r}\right )$ to parameterize the plane surface $S\left ( {s,r}\right )$ [14]. Additionally, the variables ${\left [ {x,y,z} \right ]^T}$ are defined to represent a point’s Cartesian coordinates [39]. Then, using the function $S\left ( {s,r} \right ) = {\left [ {x\left ( {s,r} \right ),y\left ( {s,r} \right ),z\left ( {s,r} \right )} \right ]^T}$, we define the location of a point on the plane surface, as shown in Fig. 2. Here, we provide an example of the expression of a plane for a 2D space. The plane surface $S\left ( {s}\right )$ can be expressed as
The point of origin of a light ray is represented by its point of origin on the plane surface, $S\left ( {s}\right )$. Furthermore, the direction of the light ray is defined by the viewing angle $\left ( {{\theta }} \right )$. Therefore, we can use the function $l\left ( {s,{\theta }} \right )$ to represent the intensity of the light ray emitted from a point $\left ( {s} \right )$ on the scene surface in viewing direction $\left ( {{\theta }} \right )$, as shown in Fig. 2. Then, we substitute (2) into (1) to obtain a new expression for the EPI as
4.3 Texture information quantification
In image processing and computational vision, texture is used to characterize a scene object's surface or region [40]. Many texture analysis algorithms have been developed over the past few decades [41,42]. For example, in [43], Galloway used gray-level run lengths to analyze texture information. In this paper, texture information refers only to color variations. An additional feature is needed to describe the amount of texture information; however, quantifying the amount of texture information on an object surface is very complicated. When the texture information of the plane surface is known, the regional entropy measure [44] in the spatial frequency domain is applied to measure the amount of texture information
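A minimal sketch of such a regional entropy measure is given below; the block size and histogram bin count are illustrative choices, not parameters prescribed by [44]:

```python
import numpy as np

def regional_entropy(image, block=8, bins=16):
    """Shannon entropy of the gray-level histogram of each block x block
    region of an image with values in [0, 1]; higher entropy indicates
    richer texture information in that region."""
    h, w = image.shape
    out = np.zeros((h // block, w // block))
    for i in range(h // block):
        for j in range(w // block):
            patch = image[i * block:(i + 1) * block,
                          j * block:(j + 1) * block]
            hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
            p = hist / hist.sum()
            p = p[p > 0]                      # drop empty bins
            out[i, j] = -np.sum(p * np.log2(p))
    return out
```

A flat, untextured region yields zero entropy, while a region of strong color variation yields a high value, which is the property used here to grade texture complexity.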
A captured image contains different texture colors, and it is very difficult to quantify the changes in these texture colors mathematically because the variation is irregular, as shown in Fig. 5(a). Fortunately, an arbitrary color is composed of a red component $R$, a green component $G$ and a blue component $B$, so different texture colors can be approximately described by light waves of different wavelengths. Although the change in texture color is not discrete, we still approximate each color by a light wave signal of a specific wavelength. We assume that a unit area $s$ of the object surface corresponds to one texture signal, and let ${\lambda _s}$ denote its wavelength. Then, a captured image $I$ is composed of multiple texture signals ${\Lambda _i}\left ( {s,{\lambda _s}} \right )$, as follows
Therefore, we can use texture signals of different wavelengths to quantify the different texture colors, as shown in Fig. 5(c). Furthermore, a texture signal can be decomposed into a set of sine or cosine functions, as shown in Fig. 5(d); the texture signal per unit area of the object surface for a continuous texture color can thus be expressed by a set of light waves. When light flows in a scene, phenomena such as transport in free space, occlusion, and texture variation each modify the local light field in a characteristic fashion [26]. In these cases, the light field can be multiplied by the corresponding expression for each phenomenon. Object surface texture creates different, even high, frequency components in the light field function. The light field is multiplied by the binary texture function of the texture information:
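The decomposition of a texture signal into sinusoidal components (Fig. 5(d)) can be sketched with a discrete Fourier transform; the two-component signal below is a synthetic example, not data from the paper:

```python
import numpy as np

def dominant_frequencies(signal, n=3):
    """Return the n strongest positive frequencies (cycles/sample) of a
    1-D texture signal, obtained by a discrete Fourier decomposition
    into sinusoidal components."""
    spectrum = np.fft.rfft(signal - signal.mean())   # remove DC first
    freqs = np.fft.rfftfreq(len(signal))
    order = np.argsort(np.abs(spectrum))[::-1]       # strongest first
    return freqs[order[:n]]

s = np.arange(256)
# synthetic texture signal: two sinusoidal color-variation components
texture = np.sin(2 * np.pi * 0.0625 * s) + 0.5 * np.sin(2 * np.pi * 0.1875 * s)
```

The recovered frequencies are the "wavelengths" in the quantification above: more complex texture puts energy at higher frequencies, which is what later broadens the light field spectrum.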
5. Spectral analysis of the light field signal
To analyze the spectral support of the light field signal, we derive the spectral support expression with respect to texture information. From the results in [11], the Fourier transform of the light field is expressed as
5.1 Spectral models of the light field
The spectral expression of the light field is constructed for a line in space, such as the camera position and direction. An interesting aspect of this construction is that the spectrum is broadened in a real-world scene, mainly by the scene's characteristics: irregular shape, occlusion effects, non-Lambertian reflection, and texture information variations [12]. Nevertheless, in most cases, we can select only certain characteristics to study the spectrum of the light field. Therefore, assuming that the surface of the object is flat and without occlusions [13,14], based on (9), the light field spectrum of (10) with respect to the texture information can be modified as follows:
Theorem 1: When the light field signal is replaced by $l\left ( {s,\theta ,{\lambda _s}} \right )$, the Fourier transform of the light field is given by the light field spectrum function:
Proof: We derive the expression of the light field spectrum with the texture information using the surface plenoptic function (SPF). In combining the plane surface expression and SPF $l\left ( {s,\theta ,{\lambda _s}} \right )$, (11) can be substituted for (3) to obtain a new expression for the light field signal with $s$ and $\theta$ as
Theorem 1 shows that the spectrum depends on the light ray $l\left ( {s,\theta ,{\lambda _s}} \right )$, the object plane surface $s,{\lambda _s}$, the angle of the object plane $\phi$, and the depths of the scene. Moreover, the spectral support consists of ${\Omega _t} = \frac {{{\Omega _v}f}}{{\Lambda \left ( {s,\lambda } \right )s \cdot \sin \left ( \phi \right ) + {z_0}}}$ and ${\Omega _t} = 0$. Using this spectral characteristic, the light field spectrum with respect to the texture information can be analyzed via the corresponding mathematical quantification model.
5.2 Spectral analysis with texture information
First, we consider the reflection characteristics of the plane surface. In special cases, surfaces are often assumed to be Lambertian [12–15]. The Lambertian surface can be expressed as $l\left ( {s,\theta ,{\lambda _s}} \right ) = l\left ( {s,{\lambda _s}} \right ) = l\left ( s \right ) \cdot \Lambda \left ( {s,{\lambda _s}} \right )$ for all $\theta$. Thus, it is reasonable to assume that $l\left ( {s,\theta ,{\lambda _s}} \right )$ is a band-limited function of the variable $\theta$. Under the assumption of the Lambertian surface, we now state the light field spectrum with respect to the texture information in the following theorem.
Theorem 2: For a Lambertian scene object surface, when $\left | {{\Omega _t}} \right | \le \max \left ( {\frac {{{\omega _s}}}{{\Lambda \left ( {s,{\lambda _s}} \right )s \cdot \sin \left ( \phi \right ) + {z_0} - \left | {\cos \left ( \phi \right )} \right |\frac {{{v_0}}}{f}}}} \right )$, the expression of the light field spectrum $P\left ( {{\Omega _t},{\Omega _v}} \right )$ can be written as
Proof: By applying (12) and a Lambertian surface assumption, the spectral expression of the plenoptic function can be rewritten using $l\left ( s \right )$ as
Consider now the spectrum of the light field given by (17): $P\left ( {{\Omega _t},{\Omega _v}} \right )$ depends on the depth $z\left ( x \right )$, the texture information ${\Lambda \left ( {s,{\lambda _s}} \right )}$ and the plane surface $S$. Using (17), the spectral support with the texture information of a plane can be obtained. To analyze the influence of texture information on the light field spectrum, Fig. 7 shows light fields and the corresponding light field spectra; the spectrum in Fig. 7(d) is broadened relative to that in Fig. 7(b). Using this spectral characteristic, a theorem for the sampling rate of the light field is derived in the next section.
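The qualitative behavior described here can be observed numerically. In the simplified setting below (a fronto-parallel Lambertian plane with a single sinusoidal texture, not the exact configuration of Fig. 7), the 2D spectrum of the EPI concentrates on a line through the origin, and the peak location moves outward with the texture frequency:

```python
import numpy as np

def epi_spectrum(depth, tex_freq, f=1.0, n=64):
    """Magnitude spectrum of the EPI of a fronto-parallel Lambertian
    plane at a given depth carrying a sinusoidal texture.  For
    l(t, v) = g(t + v*depth/f), the spectral energy lies along a line
    through the origin of the (Omega_t, Omega_v) plane."""
    t = np.linspace(-1.0, 1.0, n, endpoint=False)[:, None]
    v = np.linspace(-1.0, 1.0, n, endpoint=False)[None, :]
    epi = np.sin(2.0 * np.pi * tex_freq * (t + v * depth / f))
    return np.abs(np.fft.fftshift(np.fft.fft2(epi)))
```

Replacing the single sinusoid with a richer texture spreads energy over more such peaks, which is the broadening of the spectral support discussed above.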
6. Sampling theory of the light field
As IBR involves the use of a set of images, the sampling rate (i.e., the number of cameras) is a key factor affecting the quality of reconstructions. LFS methods are examined to fully characterize the spectral support of the light field. Nevertheless, in most cases, the spectral support is not band-limited along the spatial frequency axis due to scene object features; the light field signal is band-limited only in particular cases (e.g., when an object surface is flat, Lambertian and without occlusions) [13]. The exactly band-limited results are therefore of interest mainly from a theoretical point of view. Within the scope of this paper, we assume that the light field is almost band-limited, i.e., that object surfaces are flat, Lambertian and occlusion-free.
In the previous sections, the influence of the texture information of a plane surface on the light field spectrum has been formulated. In consideration of the spectral support, the frequency estimation for the light field is presented before generalizing for LFS. The sampling theorem of the light field with texture information is also presented in this section.
6.1 Spectral support analysis
In this subsection, we aim to mathematically derive the light field spectral support. Consider the spectrum of the light field signal given by (17). We can model the spectral support with the plane surface and texture information, as shown in Fig. 8(a) for the light field spectrum of a 3D plot. To facilitate the analysis of the spectral structure, we cross-sectionally cut the spectrogram of Fig. 8(a) to obtain the spectrogram shown in Fig. 8(b). Fig. 8(b) shows that the model of the light field spectral support of the 2D plot comprises two regions, ${\Psi _{{R_1}}}$ and ${\Psi _{{R_2}}}$, which contain most of the spectral energy. First, according to (12), the relationship between ${{\Omega _t}}$ and ${{\Omega _v}}$ can be expressed as
Furthermore, the second region consists of two diagonal lines and two parallel lines, as shown in Fig. 8. Considering (22) and the spectral support levels, the second region, ${\Psi _{{R_2}}}$ , can be expressed as
Similarly, the estimated bandwidth of the light field spectrum for the ${{\Omega _v }}$-axis is
Based on the above analysis, as the texture information becomes more complex, the camera resolution decreases. In addition, according to (29), as the camera resolution decreases, ${\Omega _{v,1}}$ and ${\Omega _{v,2}}$ increase. Thus, the area of the second region, ${\Psi _{{R_2}}}$, broadens as ${\Omega _{v,1}}$ and ${\Omega _{v,2}}$ increase, according to (27) and Fig. 8. In other words, as the texture information becomes more complex, the spectral support broadens. Therefore, the spectral support of the light field becomes more pronounced as the texture information becomes more complex.
6.2 Sampling theorem with texture information
Based on the light field spectral support, we can study the sampling theorem of the light field, i.e., how to determine the number of cameras and the spacing between them. The purpose is to obtain the best balance between the minimum sampling rate and the rendering quality. As shown in Fig. 9, a set of cameras is uniformly placed along the camera plane $\left ( {t} \right )$; we mainly study the camera spacing $\Delta t$ for the rectangular sampling pattern. Generally, the sampling rate is obtained from the frequency ${{\Omega _t}}$, which is given by the spectral support. From (24), ${{\Omega _t}}$ is related to ${{\Omega _v}}$, the plane surface $s$, the depth ${z_0}$, and the focal length. Furthermore, ${{\Omega _v}}$ is related to the texture information and the camera resolution; through ${{\Omega _v}}$, the texture information also influences the frequency ${{\Omega _t}}$. Combining (12), (24) and (29), the frequency estimate for ${{\Omega _t}}$ can be written as
By combining (32) with Shannon's uniform-sampling theorem [45], we can determine the sampling rate of the light field. We define a variable ${\Omega _c}$ to represent the sampling frequency along the $t$-axis; the relationship between the sampling frequency ${\Omega _c}$ and the camera spacing ${\Delta t}$ is ${\Omega _c} = {{2\pi } \mathord {\left / {\vphantom {{2\pi } {\Delta t}}} \right. } {\Delta t}}$. To avoid aliasing in IBR, based on (32), the camera spacing can be determined by
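As an illustrative numerical sketch of this type of bound, the function below follows the classical plenoptic-sampling form in which the spectral support is bounded by the lines corresponding to the minimum and maximum depths [11]; the expression in (32) additionally involves the texture term, so this is a simplification:

```python
import numpy as np

def max_camera_spacing(z_min, z_max, f, omega_v_max):
    """Classical plenoptic-sampling style bound on camera spacing.
    The spectral support lies between the lines Omega_t = -(f/z) Omega_v
    for z in [z_min, z_max]; sampling along t replicates the spectrum at
    multiples of Omega_c = 2*pi/dt, so the replicas remain disjoint when
        dt <= 2*pi / (f * omega_v_max * (1/z_min - 1/z_max)).
    """
    omega_c = f * omega_v_max * (1.0 / z_min - 1.0 / z_max)
    return 2.0 * np.pi / omega_c
```

Note how the admissible spacing shrinks both as the depth range widens and as the maximum texture-induced frequency `omega_v_max` grows, consistent with the texture analysis above.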
6.3 Filtering and reconstruction
To eliminate interference caused by texture information variation and remove unwanted spectral replicas, we can design an adaptive filter. Given the essential bandwidth frequencies ${B_t}$ and ${B_v}$, we now attempt to reconstruct scenes with complex, varying textures using the adaptive filter. Based on (25) and (30), the adaptive filter is expressed as
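A minimal frequency-domain sketch of such a band-limiting reconstruction filter is shown below; it applies an ideal separable low-pass with cutoffs standing in for ${B_t}$ and ${B_v}$ (the adaptive shape of the filter in this paper is not reproduced here):

```python
import numpy as np

def lowpass_reconstruction_filter(epi, b_t, b_v):
    """Ideal separable low-pass applied in the frequency domain: keep
    only the base-band |Omega_t| <= b_t, |Omega_v| <= b_v (normalized
    frequencies, cycles/sample) and suppress the replicated spectra."""
    n_t, n_v = epi.shape
    F = np.fft.fft2(epi)
    ft = np.fft.fftfreq(n_t)[:, None]
    fv = np.fft.fftfreq(n_v)[None, :]
    mask = (np.abs(ft) <= b_t) & (np.abs(fv) <= b_v)
    return np.real(np.fft.ifft2(F * mask))
```

Choosing the cutoffs from the estimated bandwidths keeps the base-band spectrum intact while rejecting the replicas that cause aliasing in the rendered views.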
7. Experimental results and analysis
In this section, we evaluate the performance of the proposed light field spectrum of texture information. Additionally, we demonstrate the performance of the reconstruction algorithm with respect to different sampling rates.
7.1 The light field spectrum with texture information
To evaluate the effect of texture information on the light field spectrum, we measure the spectra of three synthetic scenes, all rendered with 3ds Max. As shown in Figs. 10(a1)–(f1), to study the influence of texture on the light field spectrum while avoiding interference from other factors, we map the texture information onto a plane for the experiments.
In the presented simulations, the spectra of different scenes with different texture information are measured. Figures 10(a2)–(f2) show that the depths of the wall surfaces vary, which leads to differences in the slopes of the EPIs of each wall. For example, the EPI in Fig. 10(a2) comprises a set of tilted lines with different slopes; the slope of an EPI line increases as the depth becomes larger. Moreover, when the texture information of the wall surfaces changes, the texture information of the corresponding EPIs also changes. For example, the slopes of the EPIs in Figs. 10(a2), (b2), and (c2) are the same, but their texture information differs because the texture information of the walls differs. The texture information of the EPI becomes more complicated as the texture information of the wall becomes more complicated, as also shown in Figs. 10(d2), (e2), and (f2). Therefore, the EPI depends on both the depth and the texture information of the object surface.
Additionally, the corresponding spectra of the walls are shown in Figs. 10(a3)–(f3). Fig. 10(a3) shows that the spectral support consists of three intersecting lines; in particular, the area between the two diagonal lines is unfilled. As in Fig. 10(a3), the area between the two diagonal lines is also unfilled in Fig. 10(d3). Although the slant angles of the first wall and the fourth wall differ in Figs. 10(a1) and (d1), their spectral supports are the same, as shown in Figs. 10(a3) and (d3). The reason is that the minimum and maximum depths are the same for the two walls, and the two diagonal lines of the spectrum are related only to the minimum and maximum depths of the wall.
Furthermore, the area between the two diagonal lines of the spectral support becomes filled as the texture information of the wall plane becomes more complicated, as shown in Figs. 10(b3), (c3), (e3) and (f3). Figures 10(b3) and (c3) show that the padding between the two diagonal lines of the spectrum is more complicated and wider than that shown in Fig. 10(a3). For example, the labeled area near the red rectangles in Fig. 10(b3) is wider than the corresponding area in Fig. 10(a3), because the texture information of the second wall is more complicated than that of the first wall. Similarly, the two spectra in Figs. 10(e3) and (f3) are more complicated and wider than that in Fig. 10(d3), because the texture information of the fifth and sixth walls is more complicated than that of the fourth wall.
The above examples provide a good idea of what the light field spectrum is and how it is influenced by texture information. We can consider that the light field spectrum support is broadened as the texture information becomes more complicated. Based on these characteristics, we can determine the relationship between the texture information and spectrum to derive the LFS theorem.
7.2 Light field spectrum for actual scenes
The light field spectral supports were measured using three different planes, as presented in Figs. 11(a1)–(c1). The color of the first plane was red, as shown in Fig. 11(a1). The color of the second plane varied, as shown in Fig. 11(b1). Two pictures were fixed on the third plane surface, as shown in Fig. 11(c1). In the real experiments, a camera was used to capture each plane, as shown in Fig. 12. For each plane, 175 images were captured along a straight line, with a separation of 0.5 cm between adjacent capture positions. This parameter setting was arbitrary and does not affect the experimental results.
Echoing the simulation results presented in Figs. 10(a2)–(f2), Fig. 11(a2) shows that the slopes of the EPIs differ as the depths of the plane vary; the slope of the EPI increases with depth. Additionally, the texture information of the EPIs of the three planes varies because the texture information of the three planes differs. For example, when the texture information of a plane surface varies, as shown in Figs. 11(b1) and (c1), the texture information of the corresponding EPI varies, as shown in Figs. 11(b2) and (c2).
Finally, the corresponding spectral supports for the three planes are shown in Figs. 11(a3)–(c3). Similar to the spectral support patterns shown in Fig. 10(a3), that shown in Fig. 11(a3) includes two regions with three intersecting lines. Additionally, the area between the minimum and maximum spectrum lines is not padded in Fig. 11(a3), as the texture information of the first plane remains unchanged. This area becomes filled as the plane texture information becomes more complex, as shown in Figs. 11(b3) and (c3). Figures 11(b3) and (c3) show that the padding between the two diagonal lines of the spectrum is wider and more complex than that shown in Fig. 11(a3), as the texture information of the second and third planes is more complex than that of the first plane. Therefore, the experimental results agree with the simulation results.
7.3 EPI reconstructions for novel rendered views
We use an experimental method to obtain the minimum sampling rate of the light field and compare it with the rate obtained by theoretical calculation; this comparison verifies the correctness of the theoretical result. We now analyze reconstructions of EPI volumes generated from different numbers of captured images for the three synthetic scenes shown in Figs. 10(a1), (b1), and (c1). The experiments were carried out in a static scene, with images captured at different positions along a line. The number of captured images was varied in steps of 25 within the range [10, 425] to study the influence of the sampling rate on the rendering quality. The reconstruction method is based on the interpolation of nearby images [46].
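As a rough sketch of this reconstruction step, the following blends the two captured views nearest to a target camera position with linear weights. This is a simplified stand-in for the interpolation of nearby images in the spirit of [46]; the function name, array layout, and toy data are our assumptions:

```python
import numpy as np

def reconstruct_view(captured, positions, target):
    """Linearly interpolate a virtual view at `target` from the two
    nearest captured views (simplified stand-in for the filter of [46])."""
    idx = np.searchsorted(positions, target)   # positions must be sorted
    idx = np.clip(idx, 1, len(positions) - 1)
    p0, p1 = positions[idx - 1], positions[idx]
    w = (target - p0) / (p1 - p0)              # blending weight in [0, 1]
    return (1 - w) * captured[idx - 1] + w * captured[idx]

# Toy EPI volume: each "view" is a constant 1-D scanline whose value
# equals its capture position, so interpolation is easy to verify.
positions = np.linspace(0.0, 1.0, 11)          # 11 capture positions
captured = np.array([np.full(4, p) for p in positions])
view = reconstruct_view(captured, positions, 0.55)
```

With too few capture positions the blended views exhibit the ghosting and aliasing analyzed below; increasing the number of views narrows the gap each interpolation must bridge.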
In particular, the rendered EPI volumes for the corresponding walls are shown in Figs. 13(a)–(c). Figure 13 shows serious aliasing patterns in the rendered EPI volumes when the sampling rate is only 25. Furthermore, at this sampling rate, the aliasing of the second and third walls shown in Figs. 13(b) and (c) is more serious than that of the first wall shown in Fig. 13(a). The same behavior can be observed in the rendered EPI volumes at sampling rates of 75 and 125, because the texture information of the second and third walls is more complex than that of the first wall. When the number of captured images is too low, the rendered results are also blurry. These experimental results show that rendering quality declines as the texture information becomes more complex.
However, aliasing is reduced as the sampling rate increases. For example, aliasing can still be observed when the sampling rate is 125, such as in the red rectangles marked in Fig. 13(a) for the first wall. Aliasing can also be observed in Fig. 14(a1), which is one of the corresponding 450 rendered virtual views; the rendered view is aliased because the sampling rate is too low. For the first wall, minimal aliasing remains when the number of captured images is 300, as shown in Fig. 13(a) and Fig. 14(a2), and this residual aliasing is barely visible. When the number of captured images is 325, no aliasing is observed, as shown in Fig. 13(a) and Fig. 14(a3). Additionally, the PSNR increases with the sampling rate, as shown in Fig. 15; nevertheless, the variation in the PSNR is very small when the number of captured images exceeds 325. For the second and third walls, the phenomena are the same as those shown in Figs. 13–15, but the required sampling rate differs among the walls because of the texture information of the scene surfaces.
More interestingly, the PSNR improves dramatically as the sampling rate increases but improves very slowly once the sampling rate exceeds 275. The reason is that, beyond a certain point, further increases in the sampling rate have little influence on the rendering quality. However, the variation tendency differs among scenes. The variation in the PSNR with the sampling rate is gentler for the first wall than for the second and third walls; for example, the PSNR curve in Fig. 15 for the first wall is gentler than that for the second wall because the texture information of the second wall is more complicated than that of the first wall. Based on these experimental results, the required number of captured images increases as the texture information of the wall surface becomes more complex.
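For reference, the PSNR values reported here follow the standard definition over the mean squared error between a ground-truth view and its rendered counterpart (the 8-bit peak value of 255 is our assumption about the image format):

```python
import numpy as np

def psnr(reference, rendered, peak=255.0):
    """Peak signal-to-noise ratio in dB between a ground-truth view
    and a rendered view (peak value assumed 255 for 8-bit images)."""
    mse = np.mean((reference.astype(np.float64)
                   - rendered.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")           # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

A flattening PSNR curve, as observed above 275 captured images, indicates that the remaining error is dominated by factors other than the view sampling rate.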
7.4 Comparison results for datasets
To evaluate the proposed STTI, we compared results on three publicly available datasets: Fish [47] and Cones and Teddy [48]. View reconstruction uses bilinear interpolation under two conditions: considering texture information (i.e., STTI, following (35)) and not considering texture information. For the latter condition, we compared the accuracy achieved by an alternative solution, a closed-form expression of the light field spectrum based on the assumption that a scene surface consists of a set of slanted planes (SCSS) [14]. The reconstruction filter presented by Chai et al. [11], which uses only the minimum and maximum depths (USMM), is also selected. Additionally, a light field reconstruction filter that considers geometric shapes (RFGS) [15] is included in the comparison.
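The bilinear interpolation used for view reconstruction can be sketched as a per-pixel sampling step. This is a generic building block rather than the paper's exact implementation; the function name and the clipping-based boundary handling are our simplifications:

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Bilinearly interpolate an image value at fractional (x, y).

    Weights the four surrounding pixels by their distance to (x, y);
    out-of-range neighbors are clipped to the image border (assumed
    boundary policy).
    """
    h, w = img.shape[:2]
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bot = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bot
```

Each pixel of a novel view is produced by sampling the warped source views at fractional coordinates with such a kernel; the comparison below varies only the spectral model (STTI, SCSS, USMM, RFGS) that decides how the views are combined.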
The results for the Fish scene are presented in Figs. 16(a1)–(a5). Clearly, the view reconstructed by STTI is better than those reconstructed by SCSS, USMM, and RFGS, as shown in the top row of the figure. When texture information is not considered, there is obvious distortion in the rendered views of Figs. 16(a3)–(a5) (the red boxes indicate the more distorted areas). The PSNRs of Figs. 16(a2)–(a5) are 32.97 dB, 31.09 dB, 29.87 dB, and 30.09 dB, respectively. The same phenomenon can be seen in the corresponding reconstructed images shown in Figs. 16(b3)–(b5) and (c3)–(c5) for the Cones and Teddy scenes; the PSNRs are presented in Table 2. Based on the experimental results for these two datasets, we also find that the quality of the reconstructed viewpoints is significantly improved when texture information is considered.
7.5 Comparison of actual scenes
To evaluate the proposed STTI, rendering experiments were also carried out using three actual scenes, depicted in Figs. 17(a1)–(c1) and named car, inflatable castle, and statue, respectively. The three scenes were captured using a camera moving along a track. The rendering quality of the novel views is measured for STTI, SCSS, USMM, and RFGS.
For the rendering results, Fig. 17 shows that the reconstructed views contain obvious ghosting and aliasing in the car, inflatable castle, and statue scenes. Clearly, the rendering quality of Fig. 17(a2) is better than that of Figs. 17(a3)–(a5). Ghosting occurs because complex scene elements, such as foliage and irregular shapes, are challenging to reconstruct accurately. However, with the proposed STTI, the ghosting and aliasing in the rendered views decrease. The same phenomena occur in Figs. 17(b2)–(b5) for the inflatable castle scene and in Figs. 17(c2)–(c5) for the statue scene. The PSNRs of the corresponding rendered views are presented in Table 3. These results are consistent with those obtained on the public datasets. Therefore, our STTI can effectively improve the quality of view rendering.
8. Conclusion
In this paper, we have presented a spectral analysis framework for sampling the light field signal using the Fourier transform, with the purpose of solving the plenoptic sampling problem. The main feature of the method is the analysis of the influence of texture information on the spectrum of the light field signal. We have shown that variations in the texture information of a plane surface broaden the spectrum of the light field signal. Finally, the framework can be used to determine the required sampling rate of the light field signal.
Funding
Natural Science Foundation of Guangxi Province (2018GXNSFAA281195, 2019AC20121); National Natural Science Foundation of China (61961005); Scientific Research and Technology Development Program of Guangxi (KY2015YB367).
Disclosures
The authors declare no conflicts of interest.
References
1. H.-Y. Shum, S.-C. Chan, and S. B. Kang, Image-Based Rendering (Springer, New York, NY, USA, 2007).
2. H.-Y. Shum, S. Kang, and S.-C. Chan, “Survey of image-based representations and compression techniques,” IEEE Trans. Circuits Syst. Video Technol. 13(11), 1020–1037 (2003). [CrossRef]
3. C. Zhang and T. Chen, “A survey on image-based rendering-representation, sampling and compression,” EURASIP Signal Process. Image Commun. 19(1), 1–28 (2004). [CrossRef]
4. M. Liu, C. Lu, H. Li, and X. Liu, “Bifocal computational near eye light field displays and structure parameters determination scheme for bifocal computational display,” Opt. Express 26(4), 4060–4074 (2018). [CrossRef]
5. A. Kubota, A. Smolic, M. Magnor, M. Tanimoto, T. Chen, and C. Zhang, “Multiview imaging and 3dtv,” IEEE Signal Process. Mag. 24(6), 10–21 (2007). [CrossRef]
6. M. Levoy, “Light fields and computational imaging,” IEEE Comput. 39(8), 46–55 (2006). [CrossRef]
7. Y. Ma, J. Zheng, and J. Xie, “Foldover-free mesh warping for constrained texture mapping,” IEEE Trans. Visual. Comput. Graphics 21(3), 375–388 (2015). [CrossRef]
8. J. Zhang, Z. Fan, D. Sun, and H. Liao, “Unified mathematical model for multilayer-multiframe compressive light field displays using lcds,” IEEE Trans. Visual. Comput. Graphics 25(3), 1603–1614 (2019). [CrossRef]
9. C. Koniaris, M. Kosek, D. Sinclair, and K. Mitchell, “Compressed animated light fields with real-time view-dependent reconstruction,” IEEE Trans. Visual. Comput. Graphics 25(4), 1666–1680 (2019). [CrossRef]
10. J. Berent and P. L. Dragotti, “Plenoptic manifolds,” IEEE Signal Process. Mag. 24(6), 34–44 (2007). [CrossRef]
11. J.-X. Chai, X. Tong, S.-C. Chan, and H.-Y. Shum, “Plenoptic sampling,” in Proc. SIGGRAPH., (2000), pp. 307–318.
12. C. Zhang and T. Chen, “Spectral analysis for sampling image-based rendering data,” IEEE Trans. Circuits Syst. Video Technol. 13(11), 1038–1050 (2003). [CrossRef]
13. M. N. Do, D. Marchand-Maillet, and M. Vetterli, “On the bandwidth of the plenoptic function,” IEEE Trans. on Image Process. 21(2), 708–717 (2012). [CrossRef]
14. C. Gilliam, P. Dragotti, and M. Brookes, “On the spectrum of the plenoptic function,” IEEE Trans. on Image Process. 23(2), 502–516 (2014). [CrossRef]
15. C. J. Zhu and L. Yu, “Spectral analysis of image-based rendering data with scene geometry,” Multimed. Syst. 23(5), 627–644 (2017). [CrossRef]
16. J. Pearson, M. Brookes, and P. L. Dragotti, “Plenoptic layer-based modeling for image based rendering,” IEEE Trans. on Image Process. 22(9), 3405–3419 (2013). [CrossRef]
17. O.-H. Kwon, C. Muelder, K. Lee, and K.-L. Ma, “A study of layout, rendering, and interaction methods for immersive graph visualization,” IEEE Trans. Visual. Comput. Graphics 22(7), 1802–1815 (2016). [CrossRef]
18. E. H. Adelson and J. R. Bergen, “The plenoptic function and the elements of early vision,” in Computational Models of Visual Processing. MIT Press, Cambridge, MA, USA, pp. 3–20 (1991).
19. L. McMillan and G. Bishop, “Plenoptic modeling: An image-based rendering system,” in Computer Graphics (SIGGRAPH’95), (1995), pp. 39–46.
20. M. Levoy and P. Hanrahan, “Light field rendering,” in Proc. SIGGRAPH., (1996), pp. 31–40.
21. S. Gortler, R. Grzeszczuk, R. Szeliski, and M. Cohen, “The lumigraph,” in Proc. SIGGRAPH., (1996), pp. 43–54.
22. J. Shade, S. Gortler, L. W. He, and R. Szeliski, “Layered depth images,” in Proc. SIGGRAPH., (1998), pp. 231–242.
23. M. L. Pendu, C. Guillemot, and A. Smolic, “A fourier disparity layer representation for light fields,” IEEE Trans. on Image Process. 28(11), 5740–5753 (2019). [CrossRef]
24. M. L. Pendu, C. Guillemot, and A. Smolic, “Silhouette-aware warping for image-based rendering,” Proc. of Computer Graphics Forum 30(4), 1223–1232 (2011). [CrossRef]
25. C. Zhu, H. Zhang, and L. Yu, “Structure models for image-assisted geometry measurement in plenoptic sampling,” IEEE Trans. Instrum. Meas. 67(1), 150–166 (2018). [CrossRef]
26. F. Durand, N. Holzschuch, C. Soler, E. Chan, and F. X. Sillion, “A frequency analysis of light transport,” ACM Trans. on Graphics (TOG) 24(3), 1115–1126 (2005). [CrossRef]
27. C. Zhu, L. Yu, Z. Yan, and S. Xiang, “Frequency estimation of the plenoptic function using the autocorrelation theorem,” IEEE Trans. Comput. Imaging 3(4), 966–981 (2017). [CrossRef]
28. T. C. Wang, J. Y. Zhu, N. K. Kalantari, A. A. Efros, and R. Ramamoorthi, “Light field video capture using a learning-based hybrid imaging system,” ACM Trans. Graph. 36(4), 1–13 (2017). [CrossRef]
29. X. Cao, Z. Geng, T. Li, M. Zhang, and Z. Zhang, “Accelerating decomposition of light field video for compressive multi-layer display,” Opt. Express 23(26), 34007–34022 (2015). [CrossRef]
30. M. Volino, A. Mustafa, J. Y. Guillemaut, and A. Hilton, “Light field compression using eigen textures,” in 2019 International Conference on 3D Vision (3DV), (Sept. 2019), pp. 16–19.
31. J. Stewart, J. Yu, S. J. Gortler, and L. McMillan, “A new reconstruction filter for undersampled light fields,” in ACM International Conference Proceeding Series, (2003), pp. 150–156.
32. H. Hoshino, F. Okano, and I. Yuyama, “A study on resolution and aliasing for multi-viewpoint image acquisition,” IEEE Trans. Circuits Syst. Video Technol. 10(3), 366–375 (2000). [CrossRef]
33. G. Wu, M. Zhao, L. Wang, Q. Dai, T. Chai, and Y. Liu, “Light field reconstruction using deep convolutional network on epi,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), pp. 1638–1646.
34. G. Wu, Y. Liu, L. Fang, Q. Dai, and T. Chai, “Light field reconstruction using convolutional network on epi and extended applications,” IEEE Trans. Pattern Anal. Mach. Intell. 41(7), 1681–1694 (2019). [CrossRef]
35. S. Vagharshakyan, R. Bregovic, and A. Gotchev, “Light field reconstruction using shearlet transform,” IEEE Trans. Pattern Anal. Mach. Intell. 40(1), 133–147 (2018). [CrossRef]
36. R. Bolles, H. Baker, and D. Marimont, “Epipolar-plane image analysis: An approach to determining structure from motion,” Int. J. Comput. Vis. 1(1), 7–55 (1987). [CrossRef]
37. L. Shi, H. Hassanieh, A. Davis, D. Katabi, and F. Durand, “Light field reconstruction using sparsity in the continuous fourier domain,” ACM Trans. Graph. 34(1), 1–13 (2014). [CrossRef]
38. C. Liu, J. Qiu, and M. Jiang, “Light field reconstruction from projection modeling of focal stack,” Opt. Express 25(10), 11377–11388 (2017). [CrossRef]
39. H. T. Nguyen and M. N. Do, “Error analysis for image-based rendering with depth information,” IEEE Trans. Image Process 18(4), 703–716 (2009). [CrossRef]
40. X. Tang, “Texture information in run-length matrices,” IEEE Trans. on Image Processing 7(11), 1602–1609 (1998). [CrossRef]
41. M. M. Galloway, “Texture analysis using gray level run lengths,” Comput. Graphics Image Process. 4(2), 172–179 (1975). [CrossRef]
42. M. Unser and M. Eden, “Multiresolution feature extraction and selection for texture segmentation,” IEEE Trans. Pattern Anal. Machine Intell. 11(7), 717–728 (1989). [CrossRef]
43. R. M. Haralick, K. S. Shanmugam, and I. Dinstein, “Textural features for image classification,” IEEE Trans. Syst., Man, Cybern. SMC-3(6), 610–621 (1973). [CrossRef]
44. M. E. Jernigan and F. D’Astous, “Entropy-based texture analysis in the spatial frequency domain,” IEEE Trans. Pattern Anal. Machine Intell. PAMI-6(2), 237–243 (1984). [CrossRef]
45. C. E. Shannon, “Communication in the presence of noise,” Proc. IRE 37(1), 10–21 (1949).
46. P. P. Vaidyanathan, Multirate Systems and Filter Banks (Prentice-Hall, Englewood Cliffs, NJ, 1992).
47. S. Toyohiro, “Nagoya university multi-view sequences,” http://www.fujii.nuee.nagoya-u.ac.jp/multiview-data (2020).
48. http://www.fujii.nuee.nagoya-u.ac.jp/multiview-data (2020).