Abstract
Light field sampling (LFS) theory can properly reduce minimum sampling rate while ensuring that novel views are not distorted for image-based rendering (IBR). The minimum sampling rate is determined by spectral support of light field. The spectral support of light field has studied the influence of the following factors: the minimum depth and the maximum depth, non-Lambertian reflections, whether the scene surfaces are flat, maximum frequency of painted signals. In this paper, we further perfect the light field spectrum analysis from the quantitative description of scene texture information based on the existing spectrum analysis theory. The quantification of texture information can be interactively refined via detected regional entropy. Thus, we can derive a spectral analytical function of light field with respect to texture information. The new function allows the spectral support of light field to be analyzed and estimated for different texture information associated with scene objects. In this way, we limit the spectral analysis problems of light field to those of a simpler signal. We show that this spectral analysis approach can be easily extended to arbitrary scene complexity levels, as we simplify the LFS of complex scenes to a plane. Additionally, the spectral support of light field broadens as the plane texture information becomes more complex. We present experimental results to demonstrate the performance of LFS with texture information, verify our theoretical analysis, and extend our conclusions on the optimal minimum sampling rate.
© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
1. Introduction
The technology of computing novel views from a set of existing captured images is usually referred to as image-based rendering (IBR) [1,2,3,4]. IBR does not require detailed scene information and can generate a novel view from a set of captured images, so it is widely used in virtual reality (VR), free-viewpoint TV (FTV) and 3DTV [5,6,7]. For instance, we can use a limited number of cameras, or one camera at a limited number of positions, to capture a scene, as shown in Fig. 1(a), and then render arbitrary novel views from those captured images, as in Fig. 1(b). Obviously, to prevent distortion of the rendered novel views, a sufficient number of multi-view images must be captured [8,9]. This sufficient number can be determined by light field sampling theory, i.e., the minimum sampling rate for alias-free rendering in IBR.
The minimum sampling rate of IBR is mainly determined by the scene attributes and the camera configuration (i.e., direction and position). The complexity of scene information and the multidimensional nature of the signals make the sampling rate difficult to determine; in particular, scene complexity makes the problem extremely hard [10]. To determine the sampling rate, the dimensionality of the stereo signals is usually reduced, or only one scene attribute is considered. The previous literature shows that depth [11], surface reflection [12], smoothness [13,14] and geometry [15] have a profound impact on the sampling rate of IBR. These theories rest on ideal assumptions, each deducing the sampling rate from one aspect. Such sampling theories are not fully realized in practice: some aliasing is always present, and it degrades the rendered output images [16,17]. Therefore, the impact of scene attributes on the sampling rate must be studied to derive a comprehensive and practical theory. We observe that scene textures have a profound impact on the sampling rate of IBR. The difficulty is that texture information is hard to quantify because its variations are random and irregular. Therefore, in this paper we propose a sampling theory of IBR based on texture information (STTI) quantification using light wave signals.
Our theory is motivated by the insight that the information captured by a camera is composed of regular light rays. Thus, we can analyze the textures in the light field and derive the minimum theoretical sampling rate. Because other attributes of the scene, such as geometric shape, also affect the sampling of IBR, we map the texture of a complex scene onto a plane to isolate the impact of texture information on the sampling. We can then study the spectral properties of the IBR of a plane to obtain the spectral properties of an irregularly shaped scene. The key to this result is an achievable scheme, spectral analysis of IBR, that is tied to the properties of the scene. We demonstrate the application of our formulations to actual scenes and synthetic scenes. Some preliminary results were presented in [15], where we studied the effect of geometric changes in the scene on the plenoptic spectrum. Building on that work, we here study the influence of scene texture changes on the plenoptic spectrum. In summary, our main contributions are as follows:
$\bullet$ We are the first to study the regularity of texture information and the various related IBR phenomena in terms of mathematical quantization, and we propose a quantization model based on color variables.
$\bullet$ Compared with existing theories, STTI studies the influence of texture information on IBR in detail. STTI can effectively describe how texture information changes with camera position, thus improving the captured information and the rendering quality of a scene.
$\bullet$ Our results can be applied to determine the minimum sampling rate for IBR. Our proposed method is more accurate than the existing methods described in the literature. For a complex scene, such as one with strong color variations, STTI improves the richness of the captured scene information.
The outline of this paper is as follows. Related work is presented in Section 2. Section 3 outlines the proposed light field sampling algorithm. Section 4 presents the light field parameterization and the signal model of the scene. The spectral analysis of the light field is presented in Section 5, and the sampling theory of the light field in Section 6. The evaluation on different data sets and a comparison with the state of the art are presented in Section 7. Finally, the work is concluded in Section 8.
2. Related work
2.1 Scene signal quantization
To study the sampling of IBR data, one must represent the captured information of a 3D scene, i.e., describe the relationship between cameras and scenes. As the position and direction of the camera change, the scene information captured by the camera also changes significantly, which complicates the quantization of scene signals. Generally, the captured scene information can be regarded as a set of light rays. For light ray parameterization, the best-known approach is the plenoptic function introduced by Adelson and McMillan [18,19]. Thus, the IBR problem can be treated as an application of sampling theory to the plenoptic function. The plenoptic function parameterizes each light ray's captured position, viewing direction, wavelength and time, and can be denoted by seven dimensions, $P = {P_7}\left ( {\theta ,\phi ,\lambda ,\tau ,{V_x},{V_y},{V_z}} \right )$. Since seven dimensions are generally very difficult to handle, certain assumptions are made to reduce the dimensionality. For example, with the wavelength and time fixed, two parallel planes, the camera plane $\left ( {t,\xi } \right )$ and the image plane $\left ( {v,u} \right )$, can represent a light ray's position and direction, as shown in the 4D light field in Fig. 2 [20,21]. Thus, the sampling problem of IBR can be regarded as light field sampling (LFS).
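As an illustration of the two-plane parameterization, the following sketch (our own, not taken from the cited works) intersects a ray with a camera plane and an image plane to obtain its 4D coordinates; the plane positions `z_cam` and `z_img` are arbitrary choices for the example:

```python
import numpy as np

def ray_to_4d_coords(origin, direction, z_cam=0.0, z_img=1.0):
    """Intersect a ray with the camera plane (z = z_cam) and the image
    plane (z = z_img) to obtain its two-plane light field coordinates.

    origin, direction: 3-vectors; the ray is origin + s * direction.
    Returns ((t, xi), (v, u)), the (x, y) intersections with each plane.
    """
    origin = np.asarray(origin, float)
    direction = np.asarray(direction, float)
    s_cam = (z_cam - origin[2]) / direction[2]  # parameter at camera plane
    s_img = (z_img - origin[2]) / direction[2]  # parameter at image plane
    t, xi = (origin + s_cam * direction)[:2]
    v, u = (origin + s_img * direction)[:2]
    return (t, xi), (v, u)
```

With the wavelength and time fixed, these four numbers fully identify the ray, which is exactly the dimensionality reduction described above.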
2.2 Light field sampling
LFS is an information theoretic treatment of IBR in which novel views are directly synthesized from a set of captured multi-view images [1]. The traditional approach chooses a sampling interval small enough to avoid aliasing effects, where the minimum sampling rate is determined by the spectral support of the sampled light field [2,3,11]. Recent work in LFS theory has shown that the spectral support is bounded by the scene depths and surface properties under certain assumptions [12,13,14,15]. Unfortunately, for more complicated scenes, e.g., with non-Lambertian reflections and occlusions, the plenoptic spectral support is unknown. In real-world complex scenes, some aliasing of synthesized views always exists [6,10]. Therefore, at a coarse level, the above spectral analysis results and their information theoretic basis cannot be used directly in practice.
Depth information has been shown to reduce the lower bound of the sampling rate: the necessary minimum sampling rate can be reduced and the rendering quality improved when depth information is utilized [11]. In this case, depth extraction is a critical step for LFS. Layered depth [22,23] is a well-known method. In [16], Pearson et al. proposed an automatic layer-based method for LFS in which depth layers are placed nonuniformly to match the distribution of the scene surface geometry. To avoid sparse depth information, Chaurasia et al. [24] applied silhouette-aware warping, selecting important depths along silhouette edges to enable depth-preserving variational warping. Similarly, Zhu et al. presented a simpler structure model to quantify the pivotal geometric information of a complex scene. This approach does not depend on dense, accurate geometric reconstruction; instead, the authors compensate for sparse 3D information by using single salient points [25].
LFS rates have also been studied from other angles. For instance, a signal-processing framework based on frequency analysis of light transport is presented in [26], giving closed-form expressions for the radiance of light rays. Light ray expressions are altered by phenomena such as shading and occlusion, and these alterations can be studied through the frequency analysis of light transport; the resulting expressions can be used to determine the LFS. In [27], the authors mathematically derive a frequency estimation function of light field signals using autocorrelation theory in the spatial domain, which simplifies the frequency estimation of the light field. Learning-based methods have also been applied to light field capture [28], and some work has reduced the sampling rate of light fields by studying sparse light fields [29,30]. In contrast to these prior works, we examine the spectral properties of the light field based on the texture information of a 3D scene.
2.3 Light field reconstruction
Generally, a continuous signal can be reconstructed by filter interpolation from a set of captured images. In [31], Stewart et al. used a reconstruction filter to reduce the sampling rate, presenting a linear, spatially invariant filter for reconstructing the plenoptic function. Similarly, Hoshino et al. [32] designed an appropriate prefilter for multiview image acquisition to limit the aliasing of reconstructed images. In [33,34], Wu et al. took advantage of the clear texture structure of the epipolar plane image (EPI) in light field data and modeled light field reconstruction from a sparse set of views as CNN-based angular detail restoration on EPIs. Vagharshakyan et al. [35] used the shearlet transform to study the sampling and reconstruction of light fields, exploiting a sparse representation of the EPI [36] in a directionally sensitive transform domain obtained from an adapted discrete shearlet transform. In [37], Shi et al. used sparsity in the continuous Fourier domain to study light field reconstruction. Liu et al. proposed light field reconstruction based on projection modeling of the focal stack [38]. Jin et al. studied light field reconstruction via approximation and blind estimation to improve the rendering quality. In this paper, we study the minimum sampling rate of light fields in order to improve the quality of viewpoint reconstruction.
3. Proposed light field sampling algorithm
Our light field sampling algorithm includes sampling and reconstruction, as shown in Fig. 3. First, a light field is used to describe a 3D scene signal, that is, the position and direction from which scene information is captured. Then, we quantify the texture information of the scene. The Fourier transform is applied to the light field with texture information, and the spectral structure of the light field under the influence of the texture information is analyzed in the frequency domain. Based on this analysis of the spectral structure, we derive the light field sampling theory based on texture information and a well-designed reconstruction filter.
4. Signal model of 3D scene
For clarity, we list the important notations and the associated definitions used throughout the paper in Table 1.
4.1 Light field parameterization
In Fig. 2, the variables of the 4D light field contain two parts: position coordinates $t, \xi$ and angular coordinates $u,v$. To simplify the theoretical derivation and the explanation of the light field spectrum, the 4D light field can be reduced to 2D by fixing $\xi$ and $u$, because the two position dimensions (and likewise the two angular dimensions) behave identically; results obtained for $t, v$ extend directly to $\xi , u$. The 2D light field can thus be represented as a function of $t$ and $v$, as shown in Fig. 4(a). The 2D light field can be conveniently represented using the concept of the EPI, which describes the mapping between the camera position $t$, the angular coordinate $v$ and the scene object $z\left ( x \right )$, as shown in Figs. 4(a) and (b). According to the trigonometric mapping relationship shown in Fig. 4, the EPI mapping can be expressed as
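The EPI geometry can be illustrated numerically. The sketch below assumes the common pinhole relation $v = f\,(x - t)/z$ (a standard convention, not necessarily the exact parameterization of Fig. 4): each scene point traces a straight line in the $(t, v)$ plane whose slope is governed by its depth, so deeper points give flatter EPI lines:

```python
import numpy as np

def epi_line(x_point, z_point, t_positions, f=1.0):
    """Image coordinate v at which a scene point (x, z) is seen from
    each camera position t, under the pinhole relation v = f*(x - t)/z.
    The EPI trace of one point is a straight line with slope dv/dt = -f/z,
    so deeper points produce flatter (smaller-magnitude-slope) lines."""
    return f * (x_point - t_positions) / z_point

t = np.linspace(-1.0, 1.0, 5)       # camera positions along the t-axis
v_near = epi_line(0.0, 1.0, t)      # near point: steep EPI line
v_far = epi_line(0.0, 4.0, t)       # far point: flatter EPI line
```

This is exactly the depth-to-slope relationship that the spectral analysis in Section 5 exploits.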
4.2 Plane surface representation
To describe the texture information, the scene surface must be quantified. Generally, we use two curvilinear surface coordinates $\left ( {s,r}\right )$ to parameterize the plane surface $S\left ( {s,r}\right )$ [14]. Additionally, the variables ${\left [ {x,y,z} \right ]^T}$ are defined to represent a point’s Cartesian coordinates [39]. Then, using the function $S\left ( {s,r} \right ) = {\left [ {x\left ( {s,r} \right ),y\left ( {s,r} \right ),z\left ( {s,r} \right )} \right ]^T}$, we define the location of a point on the plane surface, as shown in Fig. 2. Here, we provide an example of the expression of a plane for a 2D space. The plane surface $S\left ( {s}\right )$ can be expressed as
The point of origin of a light ray is represented by its point of origin on the plane surface, $S\left ( {s}\right )$. Furthermore, the direction of the light ray is defined by the viewing angle $\left ( {{\theta }} \right )$. Therefore, we can use the function $l\left ( {s,{\theta }} \right )$ to represent the intensity of the light ray emitted from a point $\left ( {s} \right )$ on the scene surface in viewing direction $\left ( {{\theta }} \right )$, as shown in Fig. 2. Then, we substitute (2) into (1) to obtain a new expression for the EPI as
4.3 Texture information quantification
In image processing and computational vision, texture is used to characterize a scene object's surface or region [40]. Many texture analysis algorithms have been developed over the past few decades [41,42]. For example, in [43], Galloway used gray-level run lengths to analyze texture information. In this paper, texture information refers only to color variations. An additional feature is needed to describe the amount of texture information; however, quantifying the amount of texture information on an object surface is very complicated. When the texture information of the plane surface is known, the regional entropy measure [44] in the spatial frequency domain is applied to measure the amount of texture information
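A minimal sketch of such a regional entropy measure is given below; the block size and histogram bin count are illustrative choices, not parameters prescribed by [44]:

```python
import numpy as np

def regional_entropy(image, block=8, bins=16):
    """Shannon entropy of the gray-level histogram of each block x block
    region of an image with values in [0, 1]; higher entropy indicates
    richer texture information in that region."""
    h, w = image.shape
    out = np.zeros((h // block, w // block))
    for i in range(h // block):
        for j in range(w // block):
            patch = image[i * block:(i + 1) * block,
                          j * block:(j + 1) * block]
            hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
            p = hist / hist.sum()
            p = p[p > 0]                      # drop empty bins
            out[i, j] = -np.sum(p * np.log2(p))
    return out
```

A flat, untextured region yields zero entropy, while a region of strong color variation yields a high value, which is the property used here to grade texture complexity.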
A captured image contains different texture colors, and it is very difficult to quantify the changes in these texture colors mathematically because the variation is irregular, as shown in Fig. 5(a). Fortunately, an arbitrary color is composed of a red component $R$, a green component $G$ and a blue component $B$, so different texture colors can be approximately described by light waves of different wavelengths. Although the change in texture color is not discrete, we still approximate each color by a light wave signal of a specific wavelength. We assume that a unit area $s$ of the object surface corresponds to one texture signal, and let ${\lambda _s}$ denote its wavelength. Then, a captured image $I$ is composed of multiple texture signals ${\Lambda _i}\left ( {s,{\lambda _s}} \right )$, as follows
Therefore, we can use texture signals of different wavelengths to quantify the different texture colors, as shown in Fig. 5(c). Furthermore, a texture signal can be decomposed into a set of sine or cosine functions, as shown in Fig. 5(d); the texture signal per unit area of the object surface for a continuous texture color can thus be expressed by a set of light waves. When light flows in a scene, phenomena such as transport in free space, occlusion, and texture variation each modify the local light field in a characteristic fashion [26]. In these cases, the light field can be multiplied by the corresponding expression for each phenomenon. Object surface texture creates different, even high, frequency components in the light field function. The light field is multiplied by the binary texture function of the texture information:
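The decomposition of a texture signal into sinusoidal components (Fig. 5(d)) can be sketched with a discrete Fourier transform; the two-component signal below is a synthetic example, not data from the paper:

```python
import numpy as np

def dominant_frequencies(signal, n=3):
    """Return the n strongest positive frequencies (cycles/sample) of a
    1-D texture signal, obtained by a discrete Fourier decomposition
    into sinusoidal components."""
    spectrum = np.fft.rfft(signal - signal.mean())   # remove DC first
    freqs = np.fft.rfftfreq(len(signal))
    order = np.argsort(np.abs(spectrum))[::-1]       # strongest first
    return freqs[order[:n]]

s = np.arange(256)
# synthetic texture signal: two sinusoidal color-variation components
texture = np.sin(2 * np.pi * 0.0625 * s) + 0.5 * np.sin(2 * np.pi * 0.1875 * s)
```

The recovered frequencies are the "wavelengths" in the quantification above: more complex texture puts energy at higher frequencies, which is what later broadens the light field spectrum.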
5. Spectral analysis of the light field signal
To analyze the spectral support of the light field signal, we derive the spectral support expression with respect to texture information. From the results in [11], the Fourier transform of the light field is expressed as
5.1 Spectral models of the light field
The spectral expression of the light field is constructed for a line in space, such as the camera position and direction. An interesting aspect of this construction is that the spectrum is broadened in a real-world scene, mainly by the scene's characteristics: irregular shape, occlusion effects, non-Lambertian reflection, and texture information variations [12]. Nevertheless, in most cases, we can select only certain characteristics to study the spectrum of the light field. Therefore, assuming that the surface of the object is flat and without occlusions [13,14], based on (9), the light field spectrum of (10) with respect to the texture information can be modified as follows:
Theorem 1: When the light field signal is replaced by $l\left ( {s,\theta ,{\lambda _s}} \right )$, the Fourier transform of the light field is given by the light field spectrum function:
Proof: We derive the expression of the light field spectrum with the texture information using the surface plenoptic function (SPF). In combining the plane surface expression and SPF $l\left ( {s,\theta ,{\lambda _s}} \right )$, (11) can be substituted for (3) to obtain a new expression for the light field signal with $s$ and $\theta$ as
Theorem 1 shows that the spectrum depends on the light ray $l\left ( {s,\theta ,{\lambda _s}} \right )$, the object plane surface $s,{\lambda _s}$, the angle of the object plane $\phi$, and the depths of the scene. Moreover, the spectral support consists of ${\Omega _t} = \frac {{{\Omega _v}f}}{{\Lambda \left ( {s,\lambda } \right )s \cdot \sin \left ( \phi \right ) + {z_0}}}$ and ${\Omega _t} = 0$. Using this spectral characteristic, the light field spectrum with respect to the texture information can be analyzed via the corresponding mathematical quantification model.
5.2 Spectral analysis with texture information
First, we consider the reflection characteristics of the plane surface. In special cases, surfaces are often assumed to be Lambertian [12–15]. The Lambertian surface can be expressed as $l\left ( {s,\theta ,{\lambda _s}} \right ) = l\left ( {s,{\lambda _s}} \right ) = l\left ( s \right ) \cdot \Lambda \left ( {s,{\lambda _s}} \right )$ for all $\theta$. Thus, it is reasonable to assume that $l\left ( {s,\theta ,{\lambda _s}} \right )$ is a band-limited function of the variable $\theta$. Under the assumption of the Lambertian surface, we now state the light field spectrum with respect to the texture information in the following theorem.
Theorem 2: For a Lambertian scene object surface, when $\left | {{\Omega _t}} \right | \le \max \left ( {\frac {{{\omega _s}}}{{\Lambda \left ( {s,{\lambda _s}} \right )s \cdot \sin \left ( \phi \right ) + {z_0} - \left | {\cos \left ( \phi \right )} \right |\frac {{{v_0}}}{f}}}} \right )$, the expression of the light field spectrum $P\left ( {{\Omega _t},{\Omega _v}} \right )$ can be written as
Proof: By applying (12) and a Lambertian surface assumption, the spectral expression of the plenoptic function can be rewritten using $l\left ( s \right )$ as
Consider now the spectrum of the light field given by (17): $P\left ( {{\Omega _t},{\Omega _v}} \right )$ depends on the depth $z\left ( x \right )$, the texture information ${\Lambda \left ( {s,{\lambda _s}} \right )}$ and the plane surface $S$. Using (17), the spectral support with the texture information of a plane can be obtained. To analyze the influence of texture information on the light field spectrum, Fig. 7 shows light fields and the corresponding light field spectra; the spectrum in Fig. 7(d) is broadened relative to that in Fig. 7(b). Using this spectral characteristic, a theorem for the sampling rate of the light field is derived in the next section.
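The qualitative behavior described here can be observed numerically. In the simplified setting below (a fronto-parallel Lambertian plane with a single sinusoidal texture, not the exact configuration of Fig. 7), the 2D spectrum of the EPI concentrates on a line through the origin, and the peak location moves outward with the texture frequency:

```python
import numpy as np

def epi_spectrum(depth, tex_freq, f=1.0, n=64):
    """Magnitude spectrum of the EPI of a fronto-parallel Lambertian
    plane at a given depth carrying a sinusoidal texture.  For
    l(t, v) = g(t + v*depth/f), the spectral energy lies along a line
    through the origin of the (Omega_t, Omega_v) plane."""
    t = np.linspace(-1.0, 1.0, n, endpoint=False)[:, None]
    v = np.linspace(-1.0, 1.0, n, endpoint=False)[None, :]
    epi = np.sin(2.0 * np.pi * tex_freq * (t + v * depth / f))
    return np.abs(np.fft.fftshift(np.fft.fft2(epi)))
```

Replacing the single sinusoid with a richer texture spreads energy over more such peaks, which is the broadening of the spectral support discussed above.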
6. Sampling theory of the light field
As IBR involves the use of a set of images, the sampling rate (i.e., the number of cameras) is a key factor affecting the quality of reconstructions. LFS methods are examined to fully characterize the spectral support of the light field. Nevertheless, in most cases, the spectral support is not band-limited along the spatial frequency axis due to scene object features; the light field signal is band-limited only in particular cases (e.g., when an object surface is flat, Lambertian and without occlusions) [13]. The exactly band-limited results are therefore of interest mainly from a theoretical point of view. Within the scope of this paper, we assume that the light field is almost band-limited, i.e., that object surfaces are flat, Lambertian and occlusion-free.
In the previous sections, the influence of the texture information of a plane surface on the light field spectrum has been formulated. In consideration of the spectral support, the frequency estimation for the light field is presented before generalizing for LFS. The sampling theorem of the light field with texture information is also presented in this section.
6.1 Spectral support analysis
In this subsection, we aim to mathematically derive the light field spectral support. Consider the spectrum of the light field signal given by (17). We can model the spectral support with the plane surface and texture information, as shown in Fig. 8(a) for the light field spectrum of a 3D plot. To facilitate the analysis of the spectral structure, we cross-sectionally cut the spectrogram of Fig. 8(a) to obtain the spectrogram shown in Fig. 8(b). Fig. 8(b) shows that the model of the light field spectral support of the 2D plot comprises two regions, ${\Psi _{{R_1}}}$ and ${\Psi _{{R_2}}}$, which contain most of the spectral energy. First, according to (12), the relationship between ${{\Omega _t}}$ and ${{\Omega _v}}$ can be expressed as
Furthermore, the second region consists of two diagonal lines and two parallel lines, as shown in Fig. 8. Considering (22) and the spectral support levels, the second region, ${\Psi _{{R_2}}}$ , can be expressed as
Similarly, the estimated bandwidth of the light field spectrum for the ${{\Omega _v }}$-axis is
Based on the above analysis, as the texture information becomes more complex, the camera resolution decreases. In addition, according to (29), as the camera resolution decreases, ${\Omega _{v,1}}$ and ${\Omega _{v,2}}$ increase. Thus, the area of the second region, ${\Psi _{{R_2}}}$, broadens as ${\Omega _{v,1}}$ and ${\Omega _{v,2}}$ increase, according to (27) and Fig. 8. In other words, as the texture information becomes more complex, the spectral support broadens. Therefore, the spectral support of the light field becomes more pronounced as the texture information becomes more complex.
6.2 Sampling theorem with texture information
Based on the light field spectral support, we can study the sampling theorem of the light field, i.e., how to determine the number of cameras and the spacing between them. The purpose is to obtain the best balance between the minimum sampling rate and the rendering quality. As shown in Fig. 9, a set of cameras is uniformly placed along the camera plane $\left ( {t} \right )$; we mainly study the camera spacing $\Delta t$ for the rectangular sampling pattern. Generally, the sampling rate is obtained from the frequency ${{\Omega _t}}$, which is given by the spectral support. From (24), ${{\Omega _t}}$ is related to ${{\Omega _v}}$, the plane surface $s$, the depth ${z_0}$, and the focal length. Furthermore, ${{\Omega _v}}$ is related to the texture information and the camera resolution; through ${{\Omega _v}}$, the texture information also influences the frequency ${{\Omega _t}}$. Combining (12), (24) and (29), the frequency estimate for ${{\Omega _t}}$ can be written as
By combining (32) with Shannon's uniform-sampling theorem [45], we can determine the sampling rate of the light field. We define a variable ${\Omega _c}$ to represent the sampling frequency along the $t$-axis; the relationship between the sampling frequency ${\Omega _c}$ and the camera spacing ${\Delta t}$ is ${\Omega _c} = {{2\pi } \mathord {\left / {\vphantom {{2\pi } {\Delta t}}} \right. } {\Delta t}}$. To avoid aliasing in IBR, based on (32), the camera spacing can be determined by
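As an illustrative numerical sketch of this type of bound, the function below follows the classical plenoptic-sampling form in which the spectral support is bounded by the lines corresponding to the minimum and maximum depths [11]; the expression in (32) additionally involves the texture term, so this is a simplification:

```python
import numpy as np

def max_camera_spacing(z_min, z_max, f, omega_v_max):
    """Classical plenoptic-sampling style bound on camera spacing.
    The spectral support lies between the lines Omega_t = -(f/z) Omega_v
    for z in [z_min, z_max]; sampling along t replicates the spectrum at
    multiples of Omega_c = 2*pi/dt, so the replicas remain disjoint when
        dt <= 2*pi / (f * omega_v_max * (1/z_min - 1/z_max)).
    """
    omega_c = f * omega_v_max * (1.0 / z_min - 1.0 / z_max)
    return 2.0 * np.pi / omega_c
```

Note how the admissible spacing shrinks both as the depth range widens and as the maximum texture-induced frequency `omega_v_max` grows, consistent with the texture analysis above.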
6.3 Filtering and reconstruction
To eliminate interference caused by texture information variation and remove unwanted spectral replicas, we can design an adaptive filter. Given the essential bandwidth frequencies ${B_t}$ and ${B_v}$, we now attempt to reconstruct scenes with complex, varying textures using the adaptive filter. Based on (25) and (30), the adaptive filter is expressed as
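A minimal frequency-domain sketch of such a band-limiting reconstruction filter is shown below; it applies an ideal separable low-pass with cutoffs standing in for ${B_t}$ and ${B_v}$ (the adaptive shape of the filter in this paper is not reproduced here):

```python
import numpy as np

def lowpass_reconstruction_filter(epi, b_t, b_v):
    """Ideal separable low-pass applied in the frequency domain: keep
    only the base-band |Omega_t| <= b_t, |Omega_v| <= b_v (normalized
    frequencies, cycles/sample) and suppress the replicated spectra."""
    n_t, n_v = epi.shape
    F = np.fft.fft2(epi)
    ft = np.fft.fftfreq(n_t)[:, None]
    fv = np.fft.fftfreq(n_v)[None, :]
    mask = (np.abs(ft) <= b_t) & (np.abs(fv) <= b_v)
    return np.real(np.fft.ifft2(F * mask))
```

Choosing the cutoffs from the estimated bandwidths keeps the base-band spectrum intact while rejecting the replicas that cause aliasing in the rendered views.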
7. Experimental results and analysis
In this section, we evaluate the performance of the proposed light field spectrum of texture information. Additionally, we demonstrate the performance of the reconstruction algorithm with respect to different sampling rates.
7.1 The light field spectrum with texture information
To evaluate the effect of texture information on the light field spectrum, we measure the spectra of three synthetic scenes, all rendered with 3ds Max. As shown in Figs. 10(a1)–(f1), to study the influence of texture on the light field spectrum while avoiding interference from other factors, we map the texture information onto a plane for the experiments.
In the presented simulations, the spectra of different scenes with different texture information are measured. Figures 10(a2)–(f2) show that the depths of the wall surfaces vary, which leads to differences in the slopes of the EPIs of each wall. For example, the EPI in Fig. 10(a2) comprises a set of tilted lines with different slopes; the slope of an EPI line increases as the depth becomes larger. Moreover, when the texture information of the wall surfaces changes, the texture information of the corresponding EPIs also changes. For example, the slopes of the EPIs in Figs. 10(a2), (b2), and (c2) are the same, but their texture information differs because the texture information of the walls differs. The texture information of the EPI becomes more complicated as the texture information of the wall becomes more complicated, as also shown in Figs. 10(d2), (e2), and (f2). Therefore, the EPI depends on both the depth and the texture information of the object surface.
Additionally, the corresponding spectra of the walls are shown in Figs. 10(a3)–(f3). Fig. 10(a3) shows that the spectral support consists of three intersecting lines; in particular, the area between the two diagonal lines is unfilled. As in Fig. 10(a3), the area between the two diagonal lines is also unfilled in Fig. 10(d3). Although the slant angles of the first wall and the fourth wall differ in Figs. 10(a1) and (d1), their spectral supports are the same, as shown in Figs. 10(a3) and (d3). The reason is that the minimum and maximum depths are the same for the two walls, and the two diagonal lines of the spectrum are related only to the minimum and maximum depths of the wall.
Furthermore, the area between the two diagonal lines of the spectral support becomes filled as the texture information of the wall plane becomes more complicated, as shown in Figs. 10(b3), (c3), (e3) and (f3). Figures 10(b3) and (c3) show that the padding between the two diagonal lines of the spectrum is more complicated and wider than that shown in Fig. 10(a3). For example, the labeled area near the red rectangles in Fig. 10(b3) is wider than the corresponding area in Fig. 10(a3), because the texture information of the second wall is more complicated than that of the first wall. Similarly, the two spectra in Figs. 10(e3) and (f3) are more complicated and wider than that in Fig. 10(d3), because the texture information of the fifth and sixth walls is more complicated than that of the fourth wall.
The above examples provide a good idea of what the light field spectrum is and how it is influenced by texture information. We can consider that the light field spectrum support is broadened as the texture information becomes more complicated. Based on these characteristics, we can determine the relationship between the texture information and spectrum to derive the LFS theorem.
7.2 Light field spectrum for actual scenes
The light field spectral supports were measured using three different planes, as presented in Figs. 11(a1)–(c1). The color of the first plane was red, as shown in Fig. 11(a1). The color of the second plane varied, as shown in Fig. 11(b1). Two pictures were fixed on the third plane surface, as shown in Fig. 11(c1). In the real experiments, a camera was used to capture each plane, as shown in Fig. 12. For each plane, 175 images were captured along a straight line, with a separation of 0.5 cm between adjacent capture positions. This parameter setting was arbitrary and does not affect the experimental results.
Echoing the simulation results presented in Figs. 10(a2)–(f2), Fig. 11(a2) shows that the slopes of the EPIs differ as the depths of the plane vary; the slope of the EPI increases with depth. Additionally, the texture information of the EPIs of the three planes varies because the texture information of the three planes differs. For example, when the texture information of a plane surface varies, as shown in Figs. 11(b1) and (c1), the texture information of the corresponding EPI varies, as shown in Figs. 11(b2) and (c2).
Finally, the corresponding spectral supports for the three planes are shown in Figs. 11(a3)–(c3). Similar to the spectral support patterns shown in Fig. 10(a3), that shown in Fig. 11(a3) includes two regions with three intersecting lines. Additionally, the area between the minimum and maximum spectrum lines is not padded in Fig. 11(a3), as the texture information of the first plane remains unchanged. This area becomes filled as the plane texture information becomes more complex, as shown in Figs. 11(b3) and (c3). Figures 11(b3) and (c3) show that the padding between the two diagonal lines of the spectrum is wider and more complex than that shown in Fig. 11(a3), as the texture information of the second and third planes is more complex than that of the first plane. Therefore, the experimental results agree with the simulation results.
7.3 EPI reconstructions for novel rendered views
We use an experimental method to obtain the minimum sampling rate of the light field and compare it with the rate obtained by theoretical calculation; this comparison verifies the correctness of the theoretical result. We now analyze reconstructions of EPI volumes generated from different numbers of captured images for the three synthetic scenes shown in Figs. 10(a1), (b1), and (c1). The experiments were carried out in a static scene, with images captured at different positions along a line. The number of captured images was varied in steps of 25 within the range [10, 425] to study the influence of the sampling rate on the rendering quality. The reconstruction method is based on the interpolation of nearby images [46].
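As a rough sketch of this reconstruction step, the following blends the two captured views nearest to a target camera position with linear weights. This is a simplified stand-in for the interpolation of nearby images in the spirit of [46]; the function name, array layout, and toy data are our assumptions:

```python
import numpy as np

def reconstruct_view(captured, positions, target):
    """Linearly interpolate a virtual view at `target` from the two
    nearest captured views (simplified stand-in for the filter of [46])."""
    idx = np.searchsorted(positions, target)   # positions must be sorted
    idx = np.clip(idx, 1, len(positions) - 1)
    p0, p1 = positions[idx - 1], positions[idx]
    w = (target - p0) / (p1 - p0)              # blending weight in [0, 1]
    return (1 - w) * captured[idx - 1] + w * captured[idx]

# Toy EPI volume: each "view" is a constant 1-D scanline whose value
# equals its capture position, so interpolation is easy to verify.
positions = np.linspace(0.0, 1.0, 11)          # 11 capture positions
captured = np.array([np.full(4, p) for p in positions])
view = reconstruct_view(captured, positions, 0.55)
```

With too few capture positions the blended views exhibit the ghosting and aliasing analyzed below; increasing the number of views narrows the gap each interpolation must bridge.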
In particular, the rendered EPI volumes for the corresponding walls are shown in Figs. 13(a)–(c). Figure 13 shows serious aliasing patterns in the rendered EPI volumes when the sampling rate is only 25. Furthermore, at this sampling rate, the aliasing of the second and third walls shown in Figs. 13(b) and (c) is more serious than that of the first wall shown in Fig. 13(a). The same behavior can be observed in the rendered EPI volumes at sampling rates of 75 and 125, because the texture information of the second and third walls is more complex than that of the first wall. When the number of captured images is too low, the rendered results are also blurry. These experimental results show that rendering quality declines as the texture information becomes more complex.
However, aliasing is reduced as the sampling rate increases. For example, aliasing can still be observed when the sampling rate is 125, such as in the red rectangles marked in Fig. 13(a) for the first wall. Aliasing can also be observed in Fig. 14(a1), which is one of the corresponding 450 rendered virtual views; the rendered view is aliased because the sampling rate is too low. For the first wall, minimal aliasing remains when the number of captured images is 300, as shown in Fig. 13(a) and Fig. 14(a2), and this residual aliasing is barely visible. When the number of captured images is 325, no aliasing is observed, as shown in Fig. 13(a) and Fig. 14(a3). Additionally, the PSNR increases with the sampling rate, as shown in Fig. 15; nevertheless, the variation in the PSNR is very small when the number of captured images exceeds 325. For the second and third walls, the phenomena are the same as those shown in Figs. 13–15, but the required sampling rate differs among the walls because of the texture information of the scene surfaces.
More interestingly, the PSNR improves dramatically as the sampling rate increases but improves very slowly once the sampling rate exceeds 275. The reason is that, beyond a certain point, further increases in the sampling rate have little influence on the rendering quality. However, the variation tendency differs among scenes. The variation in the PSNR with the sampling rate is gentler for the first wall than for the second and third walls; for example, the PSNR curve in Fig. 15 for the first wall is gentler than that for the second wall because the texture information of the second wall is more complicated than that of the first wall. Based on these experimental results, the required number of captured images increases as the texture information of the wall surface becomes more complex.
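For reference, the PSNR values reported here follow the standard definition over the mean squared error between a ground-truth view and its rendered counterpart (the 8-bit peak value of 255 is our assumption about the image format):

```python
import numpy as np

def psnr(reference, rendered, peak=255.0):
    """Peak signal-to-noise ratio in dB between a ground-truth view
    and a rendered view (peak value assumed 255 for 8-bit images)."""
    mse = np.mean((reference.astype(np.float64)
                   - rendered.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")           # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

A flattening PSNR curve, as observed above 275 captured images, indicates that the remaining error is dominated by factors other than the view sampling rate.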
7.4 Comparison results for datasets
To evaluate the proposed STTI, we compared results on three publicly available datasets: Fish [47] and Cones and Teddy [48]. View reconstruction uses bilinear interpolation under two conditions: considering texture information (i.e., STTI, following (35)) and not considering texture information. For the latter condition, we compared the accuracy achieved by an alternative solution, a closed-form expression of the light field spectrum based on the assumption that a scene surface consists of a set of slanted planes (SCSS) [14]. The reconstruction filter presented by Chai et al. [11], which uses only the minimum and maximum depths (USMM), is also selected. Additionally, a light field reconstruction filter that considers geometric shapes (RFGS) [15] is included in the comparison.
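The bilinear interpolation used for view reconstruction can be sketched as a per-pixel sampling step. This is a generic building block rather than the paper's exact implementation; the function name and the clipping-based boundary handling are our simplifications:

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Bilinearly interpolate an image value at fractional (x, y).

    Weights the four surrounding pixels by their distance to (x, y);
    out-of-range neighbors are clipped to the image border (assumed
    boundary policy).
    """
    h, w = img.shape[:2]
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x1]
    bot = (1 - fx) * img[y1, x0] + fx * img[y1, x1]
    return (1 - fy) * top + fy * bot
```

Each pixel of a novel view is produced by sampling the warped source views at fractional coordinates with such a kernel; the comparison below varies only the spectral model (STTI, SCSS, USMM, RFGS) that decides how the views are combined.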
The results for the Fish scene are presented in Figs. 16(a1)–(a5). Clearly, the view reconstructed by STTI is better than those reconstructed by SCSS, USMM, and RFGS, as shown in the top row of the figure. When texture information is not considered, there is obvious distortion in the rendered views of Figs. 16(a3)–(a5) (the red boxes indicate the more distorted areas). The PSNRs of Figs. 16(a2)–(a5) are 32.97 dB, 31.09 dB, 29.87 dB, and 30.09 dB, respectively. The same phenomenon can be seen in the corresponding reconstructed images shown in Figs. 16(b3)–(b5) and (c3)–(c5) for the Cones and Teddy scenes; the PSNRs are presented in Table 2. Based on the experimental results for these two datasets, we also find that the quality of the reconstructed viewpoints is significantly improved when texture information is considered.
7.5 Comparison of actual scenes
To evaluate the proposed STTI, rendering experiments were also carried out using three actual scenes, depicted in Figs. 17(a1)–(c1) and named car, inflatable castle, and statue, respectively. The three scenes were captured using a camera moving along a track. The rendering quality of the novel views is measured for STTI, SCSS, USMM, and RFGS.
For the rendering results, Fig. 17 shows that the reconstructed views contain obvious ghosting and aliasing in the car, inflatable castle, and statue scenes. Clearly, the rendering quality of Fig. 17(a2) is better than that of Figs. 17(a3)–(a5). Ghosting occurs because complex scene elements, such as foliage and irregular shapes, are challenging to reconstruct accurately. However, with the proposed STTI, the ghosting and aliasing in the rendered views decrease. The same phenomena occur in Figs. 17(b2)–(b5) for the inflatable castle scene and in Figs. 17(c2)–(c5) for the statue scene. The PSNRs of the corresponding rendered views are presented in Table 3. These results are consistent with those obtained on the public datasets. Therefore, our STTI can effectively improve the quality of view rendering.
8. Conclusion
In this paper, we have presented a spectral analysis framework for sampling the light field signal using the Fourier transform, with the purpose of solving the plenoptic sampling problem. The main feature of the method is the analysis of the influence of texture information on the spectrum of the light field signal. We have shown that variations in the texture information of a plane surface broaden the spectrum of the light field signal. Finally, the framework can be used to determine the required sampling rate of the light field signal.
Funding
Natural Science Foundation of Guangxi Province (2018GXNSFAA281195, 2019AC20121); National Natural Science Foundation of China (61961005); Scientific Research and Technology Development Program of Guangxi (KY2015YB367).
Disclosures
The authors declare no conflicts of interest.
References
1. H.-Y. Shum, S.-C. Chan, and S. B. Kang, Image-Based Rendering (Springer, New York, NY, USA, 2007).
2. H.-Y. Shum, S. Kang, and S.-C. Chan, “Survey of image-based representations and compression techniques,” IEEE Trans. Circuits Syst. Video Technol. 13(11), 1020–1037 (2003). [CrossRef]
3. C. Zhang and T. Chen, “A survey on image-based rendering-representation, sampling and compression,” EURASIP Signal Process. Image Commun. 19(1), 1–28 (2004). [CrossRef]
4. M. Liu, C. Lu, H. Li, and X. Liu, “Bifocal computational near eye light field displays and structure parameters determination scheme for bifocal computational display,” Opt. Express 26(4), 4060–4074 (2018). [CrossRef]
5. A. Kubota, A. Smolic, M. Magnor, M. Tanimoto, T. Chen, and C. Zhang, “Multiview imaging and 3dtv,” IEEE Signal Process. Mag. 24(6), 10–21 (2007). [CrossRef]
6. M. Levoy, “Light fields and computational imaging,” IEEE Comput. 39(8), 46–55 (2006). [CrossRef]
7. Y. Ma, J. Zheng, and J. Xie, “Foldover-free mesh warping for constrained texture mapping,” IEEE Trans. Visual. Comput. Graphics 21(3), 375–388 (2015). [CrossRef]
8. J. Zhang, Z. Fan, D. Sun, and H. Liao, “Unified mathematical model for multilayer-multiframe compressive light field displays using lcds,” IEEE Trans. Visual. Comput. Graphics 25(3), 1603–1614 (2019). [CrossRef]
9. C. Koniaris, M. Kosek, D. Sinclair, and K. Mitchell, “Compressed animated light fields with real-time view-dependent reconstruction,” IEEE Trans. Visual. Comput. Graphics 25(4), 1666–1680 (2019). [CrossRef]
10. J. Berent and P. L. Dragotti, “Plenoptic manifolds,” IEEE Signal Process. Mag. 24(6), 34–44 (2007). [CrossRef]
11. J.-X. Chai, X. Tong, S.-C. Chan, and H.-Y. Shum, “Plenoptic sampling,” in Proc. SIGGRAPH., (2000), pp. 307–318.
12. C. Zhang and T. Chen, “Spectral analysis for sampling image-based rendering data,” IEEE Trans. Circuits Syst. Video Technol. 13(11), 1038–1050 (2003). [CrossRef]
13. M. N. Do, D. Marchand-Maillet, and M. Vetterli, “On the bandwidth of the plenoptic function,” IEEE Trans. on Image Process. 21(2), 708–717 (2012). [CrossRef]
14. C. Gilliam, P. Dragotti, and M. Brookes, “On the spectrum of the plenoptic function,” IEEE Trans. on Image Process. 23(2), 502–516 (2014). [CrossRef]
15. C. J. Zhu and L. Yu, “Spectral analysis of image-based rendering data with scene geometry,” Multimed. Syst. 23(5), 627–644 (2017). [CrossRef]
16. J. Pearson, M. Brookes, and P. L. Dragotti, “Plenoptic layer-based modeling for image based rendering,” IEEE Trans. on Image Process. 22(9), 3405–3419 (2013). [CrossRef]
17. O.-H. Kwon, C. Muelder, K. Lee, and K.-L. Ma, “A study of layout, rendering, and interaction methods for immersive graph visualization,” IEEE Trans. Visual. Comput. Graphics 22(7), 1802–1815 (2016). [CrossRef]
18. E. H. Adelson and J. R. Bergen, “The plenoptic function and the elements of early vision,” in Computational Models of Visual Processing. MIT Press, Cambridge, MA, USA, pp. 3–20 (1991).
19. L. McMillan and G. Bishop, “Plenoptic modeling: An image-based rendering system,” in Computer Graphics (SIGGRAPH’95), (1995), pp. 39–46.
20. M. Levoy and P. Hanrahan, “Light field rendering,” in Proc. SIGGRAPH., (1996), pp. 31–40.
21. S. Gortler, R. Grzeszczuk, R. Szeliski, and M. Cohen, “The lumigraph,” in Proc. SIGGRAPH., (1996), pp. 43–54.
22. J. Shade, S. Gortler, L. W. He, and R. Szeliski, “Layered depth images,” in Proc. SIGGRAPH., (1998), pp. 231–242.
23. M. L. Pendu, C. Guillemot, and A. Smolic, “A fourier disparity layer representation for light fields,” IEEE Trans. on Image Process. 28(11), 5740–5753 (2019). [CrossRef]
24. M. L. Pendu, C. Guillemot, and A. Smolic, “Silhouette-aware warping for image-based rendering,” Proc. of Computer Graphics Forum 30(4), 1223–1232 (2011). [CrossRef]
25. C. Zhu, H. Zhang, and L. Yu, “Structure models for image-assisted geometry measurement in plenoptic sampling,” IEEE Trans. Instrum. Meas. 67(1), 150–166 (2018). [CrossRef]
26. F. Durand, N. Holzschuch, C. Soler, E. Chan, and F. X. Sillion, “A frequency analysis of light transport,” ACM Trans. on Graphics (TOG) 24(3), 1115–1126 (2005). [CrossRef]
27. C. Zhu, L. Yu, Z. Yan, and S. Xiang, “Frequency estimation of the plenoptic function using the autocorrelation theorem,” IEEE Trans. Comput. Imaging 3(4), 966–981 (2017). [CrossRef]
28. T. C. Wang, J. Y. Zhu, N. K. Kalantari, A. A. Efros, and R. Ramamoorthi, “Light field video capture using a learning-based hybrid imaging system,” ACM Trans. Graph. 36(4), 1–13 (2017). [CrossRef]
29. X. Cao, Z. Geng, T. Li, M. Zhang, and Z. Zhang, “Accelerating decomposition of light field video for compressive multi-layer display,” Opt. Express 23(26), 34007–34022 (2015). [CrossRef]
30. M. Volino, A. Mustafa, J. Y. Guillemaut, and A. Hilton, “Light field compression using eigen textures,” in 2019 International Conference on 3D Vision (3DV), (Sept. 2019), pp. 16–19.
31. J. Stewart, J. Yu, S. J. Gortler, and L. McMillan, “A new reconstruction filter for undersampled light fields,” in ACM International Conference Proceeding Series, (2003), pp. 150–156.
32. H. Hoshino, F. Okano, and I. Yuyama, “A study on resolution and aliasing for multi-viewpoint image acquisition,” IEEE Trans. Circuits Syst. Video Technol. 10(3), 366–375 (2000). [CrossRef]
33. G. Wu, M. Zhao, L. Wang, Q. Dai, T. Chai, and Y. Liu, “Light field reconstruction using deep convolutional network on epi,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), pp. 1638–1646.
34. G. Wu, Y. Liu, L. Fang, Q. Dai, and T. Chai, “Light field reconstruction using convolutional network on epi and extended applications,” IEEE Trans. Pattern Anal. Mach. Intell. 41(7), 1681–1694 (2019). [CrossRef]
35. S. Vagharshakyan, R. Bregovic, and A. Gotchev, “Light field reconstruction using shearlet transform,” IEEE Trans. Pattern Anal. Mach. Intell. 40(1), 133–147 (2018). [CrossRef]
36. R. Bolles, H. Baker, and D. Marimont, “Epipolar-plane image analysis: An approach to determining structure from motion,” Int. J. Comput. Vis. 1(1), 7–55 (1987). [CrossRef]
37. L. Shi, H. Hassanieh, A. Davis, D. Katabi, and F. Durand, “Light field reconstruction using sparsity in the continuous fourier domain,” ACM Trans. Graph. 34(1), 1–13 (2014). [CrossRef]
38. C. Liu, J. Qiu, and M. Jiang, “Light field reconstruction from projection modeling of focal stack,” Opt. Express 25(10), 11377–11388 (2017). [CrossRef]
39. H. T. Nguyen and M. N. Do, “Error analysis for image-based rendering with depth information,” IEEE Trans. Image Process 18(4), 703–716 (2009). [CrossRef]
40. X. Tang, “Texture information in run-length matrices,” IEEE Trans. on Image Processing 7(11), 1602–1609 (1998). [CrossRef]
41. M. M. Galloway, “Texture analysis using gray level run lengths,” Comput. Graphics Image Process. 4(2), 172–179 (1975). [CrossRef]
42. M. Unser and M. Eden, “Multiresolution feature extraction and selection for texture segmentation,” IEEE Trans. Pattern Anal. Machine Intell. 11(7), 717–728 (1989). [CrossRef]
43. R. M. Haralick, K. S. Shanmugam, and I. Dinstein, “Textural features for image classification,” IEEE Trans. Syst., Man, Cybern. SMC-3(6), 610–621 (1973). [CrossRef]
44. M. E. Jernigan and F. D’Astous, “Entropy-based texture analysis in the spatial frequency domain,” IEEE Trans. Pattern Anal. Machine Intell. PAMI-6(2), 237–243 (1984). [CrossRef]
45. C. E. Shannon, “Communication in the presence of noise,” Proc. IRE 37(1), 10–21 (1949).
46. P. P. Vaidyanathan, Multirate Systems and Filter Banks (Prentice-Hall, Englewood Cliffs, NJ, 1992).
47. S. Toyohiro, “Nagoya university multi-view sequences,” http://www.fujii.nuee.nagoya-u.ac.jp/multiview-data (2020).
48. http://www.fujii.nuee.nagoya-u.ac.jp/multiview-data (2020).