The article deals with a method of calculation of off-axis light propagation between parallel planes using discretization of the Rayleigh-Sommerfeld integral and its implementation by fast convolution. It analyses zero-padding in case of different plane sizes. In case of memory restrictions, it suggests splitting the calculation into tiles and shows that splitting leads to a faster calculation when plane sizes are a lot different. Next, it suggests how to calculate propagation in case of different sampling rates by splitting planes into interleaved tiles and shows this to be faster than zero-padding and direct calculation. Neither the speedup nor memory-saving method decreases accuracy; the aim of the proposed method is to provide reference data that can be compared to the results of faster and less precise methods.
©2010 Optical Society of America
A fundamental tool of digital holography, or computer generated holography, is a numerical simulation of coherent light propagating in free space. We will use, as usual, scalar approximation of the vectorial nature of light, and will not consider time-dependent behavior of the light . One of the most usual tasks is to calculate light propagation between two parallel planes; the Rayleigh-Sommerfeld integral , its mathematical equivalent angular spectrum [1,2] or their various approximations are used most often.
Those equations need to be discretized for numerical calculation, i.e. one has to both sample and spatially restrict optical fields involved. The discretization itself leads to various errors well described in literature [3,4]. The calculation tries to avoid those errors while trying to work as fast as possible.
It is, however, often hard to decide what is an error of discretization itself, what is an inherent error of a given fast method and what is just an error of a particular implementation. The goal of this article is to describe the reference numerical calculation of light propagation between parallel planes that has just one “error”, the discretization. Any other method can be then compared to this one. As we will assume fine sampling that does not lead to aliasing errors, we have to deal with a large amount of data. We will try to handle it as fast as possible while retaining the accuracy of the calculation.
The proposed method focuses on off-axis light propagation between two rectangular areas that share neither size nor sampling, or, between a spatial light modulator (SLM) and a camera sensor. Reference calculation is described, e.g., in , off-axis propagation, e.g., in [6,7]; different samplings are treated in  by coordinate system change, in  by shifted convolution kernel, in  by scaled Fourier transform; different sizes of SLM and sensor is solved in [10,11] using tiling while decreasing memory demands. This article solves all the requirements in a unified way by using the convolution approach and tiling; in contrast to references given, it deals with optimization too.
2. One-dimensional case
We are going to show all the principles in a one-dimensional case before showing the full 2D version. This means that we will calculate light propagation between two line segments instead of two rectangular areas. We will call them Source and Target.
In linear optics it is assumed that light sources do not influence each other and that every single point of Target (e.g. a camera sensor) is affected by the shining of all points of Source (e.g. a SLM, see Fig. 1a ). As mentioned in the introduction, we will assume scalar approximation of coherent light, i.e. the light source can be completely described by amplitude A and phase ϕ, or complex amplitude , where .
Light changes both amplitude and phase by propagation. This change can by described by multiplication with some complex number p. Light propagation is space invariant; therefore, it is just the mutual position of points on Source and Target that matters, not their absolute positions. It follows that the calculation will have a convolution form:
We can discretize the equation by equidistant splitting of Source and Target into M and N basic elements () and (), i.e. by uniform sampling. Therefore, the element receives from the element light with complex amplitude , see Fig. 1b. Equation (1) can be written in discrete form:
The calculation of all elements is most often done by rewriting Eq. (2) as a cyclic convolution and subsequent use of the discrete Fourier transform. The cyclic convolution has a form:12]. Choosing a suitable C is important because the actual calculation time is highly sensitive to its character.
Let us assume that the sampling rate of Target is twice as fine as the sampling rate of Source. Then the subset of even samples from Target has the same sampling rate as Source. Consequently, we can easily calculate the propagation of Source to even samples of Target. Obviously, we can do the same with odd samples. It means we can split the calculation of light propagation into two calculations; they are denoted in Fig. 1c by black and magenta arrows. It follows that the same principle can be applied when the sampling rate of Target is τ-times finer than the sampling rate of Source, where τ is a natural number; we have to split Target into τ “interleaved tiles”, i.e. the calculation has to be split into τ calculations.
The same situation appears when Source has σ-times finer sampling than Target. The idea can be generalized: if the sampling rates of Source and Target are in a ratio , where σ and τ are coprime natural numbers (i.e. σ elements of Source have the same size as τ elements of Target), we can split the calculation into independent calculations. More precisely, we have to σ-times calculate the propagation for the sampling rate ratio and sum the results. Usually we do not care about the exact value of the sampling rate; therefore, we can choose such σ and τ that approximate the desired sampling fairly well.
It follows from Eq. (3) that the vectors s and t have to be padded to size C. This means that for , approximately 50% of elements are held in memory uselessly; in 2D convolution, it is as much as 75%. Therefore, for big M and N we can expect a lack of memory very soon, especially when using special hardware for calculation, e.g. GPU. For example a naive approach to propagation of a microscopic Source to an extended detector Target could require hundreds of gigabytes.
We need two memory spaces of size for the calculation of Eq. (4); in practice a restriction may appear: that just two spaces of size are available. Let us assume, for example, that , and . We cannot make the calculation directly because ; but we can split Target in the middle into two parts (let us call them “common tiles”) with elements and make two calculations. For them, we need only two spaces of size at least , and therefore we are not limited by . The same idea would apply if , and : we would calculate the light propagation of both small parts of Source to Target and sum the results. As in the “different sampling” case, even this idea can be generalized: Source can be split into S parts, Target into T parts and then we have to calculate propagations.
3. Two-dimensional case
The extension of equations from section 2 to the 2D case is straightforward: instead of sums we just put double sums there. For calculation of light propagation of a part of Source to a part of Target, it is necessary to carefully calculate the convolution kernel, i.e. 2D array p. A practical aid is the fact that its element describes light propagation from element of a particular part of Source to element of a particular part of Target. The equations for sampling rates, samples counts and offsets are simple but technically demanding, so we will not show them here.
We should, however, mention the calculation of convolution kernel values. The Rayleigh-Sommerfeld equation  for the light propagation from the plane to point is:
The simplest method of convolution kernel discretization (the expression on the right side of the convolution operator in Eq. (5)) is a plain sampling, i.e. its calculation for a particular x, y (z is a constant). Alternatively, we can assume that a sample of Source in fact expresses – using some pixel spread function – the shining of a particular non-zero-area element of Source and alter the kernel accordingly. If we take non-zero area of Target (sensor) elements into account, we can pre-filter the kernel. If Source is a mathematical model of a real display, we can even measure the kernel. The proposed method therefore does not depend on particular features of the kernel; its only assumption is the description of light propagation as a convolution. In our implementation, we did not deal with advanced methods of kernel calculation, however, and we have chosen plain sampling.
4. Theoretical time of computation
The calculation works with arrays of size samples. It consists of three 's (more precisely, two forward and one backward), calculation of the convolution kernel p with complexity and the Hadamard product of the same complexity. For a large C, only times spent on 's matter. The time of calculation using the fast Fourier transform is therefore proportional  toEq. (6), we get time of calculationFigure 2a shows the graph for ; for different N it does not change very much.
A more interesting result arises when Source and Target have different sampling rates. Let us assume that Target has τ-times finer sampling than Source, i.e. . For direct calculation, Source must have the same sampling rate as Target, i.e. it must have samples too. We can increase number of samples by interpolation or just by interleaving current ones by a suitable amount of zeros. By substituting into Eq. (6), we get time of calculationEq. (6), we get time of calculationFig. 2b. Again, a different M leads to a similar graph.
One can see in Fig. 2 that measured speedups are scattered around theoretical curves. A discussion of this fact is contained in the following section.
Until now we have assumed that Target has more samples than Source. That led to saving nearly one-third of all 's. In the opposite situation this does not hold any more; however, the results of the analysis are similar, although not so outstanding. The analysis of general situations, where the ratio of sampling rates or sizes is a general fraction, leads to the following results: speedup is greater when the nominator and denominator of the sampling rate ratio are big too; splitting of Source and Target of the same sampling rate into common tiles in a complicated ratio leads rather to slowdown if we do not mind memory restrictions.
5. Implementation notes
The implementation of the proposed method is simple, although due to a number of variables quite demanding. The first step is to estimate the ratio of Source and Target sampling rates, and to split them into interleaved tiles of common sampling. In case we have enough memory to calculate light propagations between these tiles, we can calculate them (but see later). In another case, we are facing a still unresolved problem.
Let us assume that two memory spaces of size C are available. The interleaved tiles created by splitting Source and Target due to the requirement of common sampling have to be split into common tiles of size , and common tiles of size (due to integer division one row and one column may be smaller) so that:Fig. 2. One can see in Fig. 2b that measured speedups are bigger than theoretical ones. This behavior starts to appear for in our implementation. We expect this behavior appears due to a following fact. When calculating a propagation without any tiling, we have to increase Source side size τ-times, so DFT has to work with a big array. Using interleaved tiling, DFT works with smaller arrays that fit into cache memory more easily, and therefore bigger speedup can be expected.
Implemented heuristic suggests splitting in this way. First, we have measured calculation times of one-dimensional 's of array sizes i. From these times we have picked “friendly-size” ones that satisfy condition for all measured . Two-dimensional is separable, so it is expected that a two-dimensional array of friendly-size side sizes will be “friendly” too. Next, we have measured calculation times for those arrays. The propagation itself is then calculated in 2D “friendly-size” arrays. In case we need to split into common tiles due to memory restrictions, we choose tiles of maximum width (tiles are then in fact horizontal stripes); height of Source and Target tiles is chosen to be approximately equal and as big as possible. We have compared this heuristic to the optimal solution found using brute force; it suggests at most approximately worse tiling.
A topic that has not been discussed yet is the precision of the proposed algorithm compared to the precision of direct calculation without any tiling. We have tried to propagate the Source to the Target of the same size (for several different sizes) using different tiling schemes. We have found that the relative difference of the calculated complex amplitudes is in the order of 10−10 or smaller; the difference was calculated as the absolute value of complex amplitudes differences. This difference is so small that we have not tried to find where it comes from.
A method for reference calculation of light propagation between two rectangular areas of parallel planes has been presented. These areas do not have to have either the same sampling rate or the same size, see Fig. 3 . The calculation can be split into tiles to meet memory restrictions given; on the other hand, saving memory often leads to worse calculation times. We have shown that in the case of a simple sampling rates ratio, the proposed method actually speeds up the calculation up to by splitting the areas into interleaved tiles compared to zero-padding and direct calculation. We have shown as well that if one area is much bigger than the other, it is faster to split the bigger one into common tiles. A still unresolved problem is how to split the calculation into common tiles (due to memory restrictions) to get the fastest calculation. We have, however, proposed suboptimal heuristic to address this problem.
This work has been supported by the Ministry of Education, Youth and Sports of the Czech Republic under the research program LC-06008 (Center for Computer Graphics). The author would like to thank Ivo Hanák, Václav Skala and the reviewers for their valuable comments.
References and links
1. J. W. Goodman, Introduction to Fourier Optics (Roberts & Company Publishers, 2004), 3rd ed.
2. E. Lalor, “Conditions for the validity of the angular spectrum of plane waves,” J. Opt. Soc. Am. 58(9), 1235–1237 (1968). [CrossRef]
3. L. Onural, “Exact analysis of the effects of sampling of the scalar diffraction field,” J. Opt. Soc. Am. A 24(2), 359–367 (2007). [CrossRef]
4. L. Onural, A. Gotchev, H. Ozaktas, and E. Stoykova, “A survey of signal processing problems and tools in holographic three-dimensional television,” IEEE Trans. Circ. Syst. Video Tech. 17(11), 1631–1646 (2007). [CrossRef]
5. V. Katkovnik, J. Astola, and K. Egiazarian, “Discrete diffraction transform for propagation, reconstruction, and design of wavefield distributions,” Appl. Opt. 47(19), 3481–3493 (2008). [CrossRef] [PubMed]
6. N. Delen and B. Hooker, “Free-space beam propagation between arbitrarily oriented planes based on full diffraction theory: a fast Fourier transform approach,” J. Opt. Soc. Am. A 15(4), 857–867 (1998). [CrossRef]
7. J.-L. Kaiser, E. Quertemont, and R. Chevallier, “Light propagation in the pseudo-paraxial fresnel approximation,” Opt. Commun. 233(4-6), 261–269 (2004). [CrossRef]
8. E. Sziklas and A. Siegman, “Diffraction calculations using fast Fourier transform methods,” Proc. IEEE 62(3), 410–412 (1974). [CrossRef]
12. M. Frigo and S. G. Johnson, “The design and implementation of FFTW3,” Proc. IEEE 93(2), 216–231 (2005) (Special issue on “Program Generation, Optimization, and Platform Adaptation”). [CrossRef]