Abstract

In this paper, we show fast signal reconstruction for compressive holography using a graphics processing unit (GPU). We implemented a fast iterative shrinkage-thresholding algorithm on a GPU to solve the 1 and total variation (TV) regularized problems that are typically used in compressive holography. Since the algorithm is highly parallel, GPUs can compute it efficiently by data-parallel computing. For better performance, our implementation exploits the structure of the measurement matrix to compute the matrix multiplications. The results show that GPU-based implementation is about 20 times faster than CPU-based implementation.

© 2016 Optical Society of America

1. Introduction

Compressive holography is an application of compressive sensing (CS) [1–4] to holography, which can reconstruct 3D scattering density from a single 2D hologram [5–9]. In holographic 3D imaging, tomographic images obtained from a conventional backpropagation technique suffer from out-of-focus objects, twin images, and squared terms, while compressive holography can remove these unwanted terms.

A problem in compressive holography is that signal reconstruction in a CS framework is computationally intensive. This is because the sizes of reconstructed signals in compressive holography (i.e., 3D scattering density) are large. For example, assuming that we have a hologram with 1000 × 1000 pixels and try to reconstruct a 3D scattering density with 10 different depths from it, the reconstructed signal has 10 million variables. Since CS reconstruction is performed through an optimization problem (e.g., 1 norm minimization), we must solve a large-scale optimization problem that has 10 million variables.

To address this problem, we propose fast CS reconstruction for compressive holography using a graphics processing unit (GPU). A GPU is a massively parallel computing device and suitable for data-parallel computing; the same program is executed on many data elements in parallel [10–12]. Since CS reconstruction can be computed using a data-parallel model, GPUs can compute the reconstruction very efficiently. GPU-accelerated CS reconstruction has been studied in many fields (e.g., magnetic resonance imaging [13, 14], optical coherence tomography [15], and hyperspectral imaging [16]), and, to the best of our knowledge, this study is the first to show that GPU-accelerated computing works efficiently for compressive holography. We have implemented a GPU solver for 1 and total variation (TV) [17, 18] regularized problems that are typically used in compressive holography (Section 3). Our implementation is based on a fast iterative shrinkage-thresholding algorithm (FISTA) [19, 20] (Section 3.1). In FISTA, we have accelerated the matrix computations by exploiting the structure of the measurement matrix of compressive holography (Section 3.2). We compared the computation times of GPU-based implementation and CPU-based implementation, and the results show that GPU-based implementation is about 20 times faster than CPU-based implementation (Section 4).

2. Compressive holography

We briefly explain compressive holography. Figure 1 illustrates a Gabor holography setup. The 3D object with the scattering density is illuminated by a plane wave, and then the Gabor hologram is recorded by an image sensor. Assuming that Nx × Ny is the number of lateral samples, Nz is the number of axial samples, and the object of size Nx × Ny × Nz is written as the vector f=[f1Tf2TfNzT]TNxyz where fi ∈ ℂNxy is the i-th depth slice of size Nx × Ny = Nxy and Nx × Ny × Nz = Nxyz, the diffracted field on the image sensor u ∈ ℂNxy is formed as

u=Hf=F1QBf,
Q=[P1P2PNz],
B=bdiag(F,F,,F),
where F ∈ ℂNxy×Nxy is the 2D Fourier transform matrix, Q ∈ ℂNxy×Nxyz consists of diagonal matrices Pi ∈ ℂNxy×Nxy expressing the transfer function of wave propagation [21], and B ∈ ℂNxyz×Nxyz is the block diagonal matrix of F where bdiag(·) denotes a block diagonal matrix. The Gabor hologram g ∈ ℝNxyz is the intensity of the sum of the diffracted field u and the reference beam r described as
gi=2Re(riui)+|ri|2+|ui|2,i=1,,Nxy,
where Re(x) is the operator to take the real part of x. For simplicity, we assume that (1) the reference light has a constant value on the measurement plane so that ri = 1 and (2) the DC term of the reference light |ri|2 = 1 can be eliminated. This assumption simplifies Eq. (4) as
gi=2Re(ui)+|ui|2=2Re(ui)+ei,i=1,,Nxy,
where the nonlinear term |ui|2 can be regarded as a model error ei and may be eliminated by the reconstruction process [5]. Substituting Eq. (1) with Eq. (5), we obtain the linear system to express the measurement of the Gabor hologram:
g=2Re(Hf)+e.

 figure: Fig. 1

Fig. 1 Setup for Gabor holography.

Download Full Size | PPT Slide | PDF

The linear system Eq. (6) is an ill-posed inverse problem (i.e., Nxy < Nxyz) and does not have a unique solution. In the CS framework, we exploit the sparsity of desired solutions in a certain space to find solutions from underdetermined systems [4]. In this paper, we infer the solution f by solving the following unconstrained optimization problem

minf{(f)+τ(f)}:=minf{2Re(Hf)g22+τ(f)},
where ‖ · ‖2 is the 2 norm, (f) expresses the fidelity between the measurement g and the estimated value f, (f) is a regularizer that promotes sparseness of the estimated value, and τ is the regularization parameter of (f). In this study, we used the 1 norm and total variation (TV) [17, 18] as regularizers, which are defined by
f1:=i=1Nxyz|fi|,
TV(f):=j=1Nzi=1NxyDifj2,
where Di ∈ ℂNxy is a 2D local finite difference operator for the i-th element of the slice fj.

3. GPU-accelerated CS reconstruction

3.1. Fast iterative shrinkage-thresholding algorithm

For reconstruction through the optimization problem Eq. (7), we used FISTA, which is a simple and fast iterative algorithm [19, 20]. FISTA is based on the iterative shrinkage-thresholding algorithm and improves it by using the last two iterations at each iteration step. We used it due to its low computational cost, high scalability, and simplicity.

In FISTA, taking an initial guess f0 = 0, h1 = f0 and a stepsize 1/L, we perform iterative update of the approximate solution at the k-th step fk by

fk=proxτ/L(hk1L(hk)),
tk+1=1+1+4tk22,
hk+1=fk+(tk1tk+1)(fkfk1),
where ∇ is the gradient of and proxτℛ/L is the proximal map [22] associated with the regularizer defined by
proxτ/L(h):=argminf{τL(f)+12fh22}.
From Eq. (6), the gradient ∇ can be written explicitly by [23]
(f)=2HH(2Re(Hf)g),
where HH is adjoint to H. For (f) = ‖f1, the proximal map is simplified to
proxτ/L(h)=𝒯τ/L(h),
where 𝒯κ is the shrinkage operator (soft-thresholding) defined by
𝒯κ(a):=(1κ|a|)+a,
where (a)+ = max{0, a}. For (f) = TV(f), since the proximal map cannot be written explicitly, we must solve Eq. (13) as the sub-problem at each iteration. This sub-problem can be solved by iterative schemes like Chambolle’s dual-based approach [18, 20], and we used the gradient projection (GP) algorithm in [20]. The GP requires internal iterations at each iteration of FISTA (in this paper, we set the number of iterations of the GP to 10 as described in Section 4); thus, TV regularization is more computationally intensive than 1 regularization.

3.2. GPU implementation

Figure 2 illustrates a schematic diagram of our GPU-accelerated compressive holography. We used an NVIDIA GTX 980 and CUDA 7.5. GTX 980 has 2048 cores at 1126 MHz and 4 GB of GDDR5 RAM. For better performance of the GTX 980, we used single-precision data types. CUDA is a parallel computing platform and programming model invented by NVIDIA, which allows developers to use a CUDA-enabled GPU for general-purpose processing. In the CUDA programming model, the system consists of a host that is a CPU and one or more devices that are GPUs. We write a function called kernel that describes the work of a single thread on a device, and invoke it with a large number of threads from a host. These threads are executed in parallel on thousands of cores on a GPU. This programming model naturally fits data-parallel computing and supports developers in using GPUs as a massively parallel computing device. Our GPU-accelerated compressive holography shown in Fig. 2 first sends hologram data from the host to the device and invokes a kernel to compute the transfer functions. Then kernels for FISTA are invoked as many times as the number of required iterations. After the FISTA step, we obtain the solution (i.e., a scattering density) by transferring it from the global device memory to the host. Basically kernels for the transfer functions and FISTA are easily parallelized, because most operations in those kernels are vector operations where each element can be processed independently by a single thread.

 figure: Fig. 2

Fig. 2 Schematic diagram of the implementation of GPU-accelerated compressive holography.

Download Full Size | PPT Slide | PDF

In general, multiplying the measurement matrix H (and its adjoint HH) in the gradient computation Eq. (14) is the most compute-intensive in FISTA. For compressive holography, multiplying H can be computed efficiently by exploiting its structure; multiplied H reduced to 2D fast Fourier transforms (FFTs), multiplications of the transfer functions, and the sum of the depth slices fi. Figure 3 illustrates the multiplication of H using its structure. We used the cuFFT library to execute 2D-FFTs on the GPU. As shown in Fig. 3, multiplying B and Q can be computed in parallel with depth slices fi due to its separability with respect to fi. Note that this block separability is an advantage when the program is executed on a multi-GPU system.

 figure: Fig. 3

Fig. 3 Multiplication of the measurement matrix H using its structure.

Download Full Size | PPT Slide | PDF

4. Results

We evaluated the performance of our GPU-based implementation by comparing it with CPU-based implementation. Our CPU-based implementation uses gcc 4.8.3 for a C++ compiler, FFTW for an FFT library, and OpenMP for multi-threading, and is run on an Intel Core i7-4790K with four cores at 4 GHz using eight threads. Table 1 shows the computation times of the GPU-based and CPU-based implementations for 1 and TV regularizations. The computation times are the average of 20 runs on the GPU and 10 runs on the CPU. We set the number of iterations of FISTA to 600, which produced moderate reconstruction results in our cases. It should be noted that the total computation times of our implementations grew linearly with the number of iterations. For TV regularization, we set the number of iterations of the GP to 10. We changed the size of the reconstructed signals to the following two situations: (1) the hologram size Nxy = Nx × Ny was changed and the number of depths Nz was fixed and (2) the hologram size Nxy = Nx × Ny was fixed and the number of depths Nz was changed. In both cases, GPU-based implementation is about 20 times faster than CPU-based implementation. Figure 4 illustrates the relationship between the computation time and the data size. As shown in Fig. 4 (a), the computation time of GPU-based implementation grows almost linearly with the data size, which indicates that data-parallel computing by GPU-based implementation works efficiently. Figure 4 (b) indicates that the slight degradation of the speed-up rate in the case of Nx × Ny × Nz = 512×512×40 in Table 1 is due to the performance improvement of CPU-based implementation.

Tables Icon

Table 1. Computation time of compressive holography using FISTA with 600 iterations.

 figure: Fig. 4

Fig. 4 Computation time of compressive holography using FISTA with 600 iterations on (a) the GPU (GTX 980) and (b) the CPU (Core i7-4790K).

Download Full Size | PPT Slide | PDF

We also evaluated each calculation step in the GPU-based implementation in the case of Nx × Ny × Nz = 1024 × 1024 × 10 as shown in Fig. 5. Note that in Fig. 5, Data transfer (to device), Data transfer (to host), and Transfer function are executed only once in the GPU-based compressive holography, while Gradient computation, Proximal map (L1, soft), Proximal map (TV, 1 GP), Update FISTA variables (tk+1, hk+1, fk−1) are executed as many times as the number of iterations of FISTA, and those computation times grow linearly with the number of iterations of FISTA. Moreover, in a single FISTA iteration, the proximal map for TV is executed iteratively, and its computation time also increases linearly with the number of iterations of GP. When the setting is the same as in Table 1 (600 iterations of FISTA and 10 GP), the gradient computation, proximal map for 1, proximal map for TV, and updating FISTA variables take 5.53 sec, 0.884 sec, 27.7 sec, and 1.20 sec, respectively. In this case, the total time of data transfers (14.7 ms) would be small enough to be negligible, and the most time-consuming step in 1 case is the gradient computation, and in TV case is the proximal map through GP.

 figure: Fig. 5

Fig. 5 Computation time of each step in the GPU-accelerated compressive holography where Nx × Ny × Nz = 1024 × 1024 × 10.

Download Full Size | PPT Slide | PDF

We identified bottlenecks of our implementation by profiling the kernels of the time-consuming steps: the gradient computation and the proximal map for TV (GP). For the gradient computation, the average compute resource utilization and memory bandwidth utilization were about 20% and 75%, respectively. For the GP, the average compute resource utilization and memory bandwidth utilization were about 26% and 75%, respectively. These results are low compute resource utilization and high bandwidth utilization case, and indicate that the kernels were memory bound and the bottleneck was the memory bandwidth between multiprocessors and device global memory in Fig. 2. The memory bandwidths of the CPU and GPU system in our evaluation environment were 12.8 GB/sec (DDR3-1600) and 224 GB/sec (GDDR5), respectively. If we fully used these bandwidths, we could achieve ×17.5 speed-up on the GPU. According to this memory bandwidth discussion, we assess the performance of our GPU-based implementation (about ×20 speed-up) as a reasonable result. This discussion also indicates that efficient memory access is important for the performance improvement. It should be noted that our GPU-based implementation has not been fully optimized. For example, a part of memory access in GP is not fully coalesced. This would be improved by using texture cache in GPUs, which is our future work. It should also be noted that when the data size is not sufficiently large, memory latency would be a bottleneck instead of the memory bandwidth. For example, when we tested the kernels of GP with Nx × Ny × Nz = 64 × 64 × 5 data, the memory bandwidth utilization decreased to about 25%.

To check our GPU-based implementation, we show an illustrative example of compressive holography by the simulation. Figure 6 (a) shows the 3D object used in the simulation. The 3D object has Nx × Ny × Nz = 1024 × 1024 × 5 samples with 9.0 μm lateral pitch and 2.0 mm axial pitch. It consists of 100 randomly distributed points with 10 × 10 pixels and four images with 256 × 256 pixels at different distances along z axis. The sensor has 1024 × 1024 pixels with 9.0 μm pixel pitch, and is located z = 100 [mm] away from the 3D object. A Gabor hologram where the DC term was eliminated as shown in Eq. (6) was measured by illuminating the 3D object with a plane wave with a 633 nm wavelength. Figures 6 (b) and 6 (c) show reconstructed images by backpropagation and compressive holography using our GPU-based implementation, respectively. While the reconstructed images by backpropagation were superposed with out-of-focus images, conjugate images, and squared fields, compressive holography suppressed these unwanted images from the reconstructed images. Figure 6 (c) was obtained through TV regularization with τ = 0.01. We ran FISTA with 1000 iterations and 10 GP iterations, which took 29 sec on the GPU and 597 sec on the CPU.

 figure: Fig. 6

Fig. 6 (a) 3D object and reconstructed images by (b) backpropagation and (c) compressive holography.

Download Full Size | PPT Slide | PDF

5. Conclusion

In conclusion, we developed GPU-accelerated CS reconstruction for compressive holography. Our implementation uses FISTA to solve the 1 and TV regularized problems that are typically used in compressive holography. In FISTA, multiplication of the measurement matrix is a compute-intensive task. This multiplication can be calculated efficiently by exploiting the structure of the measurement matrix; the measurement matrix in compressive holography consists of block 2D-FFTs, multiplications of the transfer functions, and the sum of the depth slices. We show the computation times of our GPU-based implementation and CPU-based implementation. The results show that GPU-based implementation is about 20 times faster than CPU-based implementation. We evaluated our GPU-based implementation and identified its bottleneck. This evaluation indicates that the most time-consuming step in 1 case is the gradient computation and in TV case is the proximal map through GP. A bottleneck of both calculation steps was memory bandwidth, and efficient memory access would improve the performances.

The proposed GPU-acceleration scheme is applicable to other CS applications especially in digital holography [7, 24], for example, single-exposure in-line holography [25], compressive Fresnel holography [26], and multi-dimensional imaging [27].

Acknowledgments

This work was partially supported by the Japan Society for the Promotion of Science Grant-in-Aid No. 25240015.

References and links

1. D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006). [CrossRef]  

2. E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory 52(2), 489–509 (2006). [CrossRef]  

3. E. J. Candes and T. Tao, “Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?” IEEE Trans. Inf. Theory 52(12), 5406–5425 (2006). [CrossRef]  

4. E. J. Candes and M. Wakin, “An Introduction To Compressive Sampling,” IEEE Signal Proc. Mag. 25(2), 21–30 (2008). [CrossRef]  

5. D. J. Brady, K. Choi, D. L. Marks, R. Horisaki, and S. Lim, “Compressive Holography,” Opt. Express 17(15), 13040–13049 (2009). [CrossRef]   [PubMed]  

6. J. Hahn, S. Lim, K. Choi, R. Horisaki, and D. J. Brady, “Video-rate compressive holographic microscopic tomography,” Opt. Express 19(8), 7289–7298 (2011). [CrossRef]   [PubMed]  

7. S. Lim, D. L. Marks, and D. J. Brady, “Sampling and processing for compressive holography [Invited],” Appl. Opt. 50(34), H75–H86 (2011). [CrossRef]   [PubMed]  

8. Y. Rivenson, A. Stern, and J. Rosen, “Reconstruction guarantees for compressive tomographic holography,” Opt. Lett. 38(14), 2509–2511 (2013). [CrossRef]   [PubMed]  

9. K. Choi, R. Horisaki, J. Hahn, S. Lim, D. L. Marks, T. J. Schulz, and D. J. Brady, “Compressive holography of diffuse objects,” Appl. Opt. 49(34), H1–H10 (2010). [CrossRef]   [PubMed]  

10. J. Nickolls, I. Buck, M. Garland, and K. Skadron, “Scalable parallel programming with CUDA,” Queue 6(2), 40–53 (2008). [CrossRef]  

11. J. Nickolls and W. J. Dally, “The GPU Computing Era,” IEEE Micro 30(2), 56–69 (2010). [CrossRef]  

12. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU Computing,” Proc. IEEE 96(5), 879–899 (2008). [CrossRef]  

13. D. S. Smith, J. C. Gore, T. E. Yankeelov, and E. B. Welch, “Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods,” Int. J. Biomed. Imaging 2012, 864827 (2012). [CrossRef]   [PubMed]  

14. Ç. Bilen, Y. Wang, and I. W. Selesnick, “High-Speed Compressed Sensing Reconstruction in Dynamic Parallel MRI Using Augmented Lagrangian and Parallel Processing,” IEEE J. Emerg. Sel. Topics Circuits Syst. 2(3), 370–379 (2012). [CrossRef]  

15. D. Xu, Y. Huang, and J. U. Kang, “GPU-accelerated non-uniform fast Fourier transform-based compressive sensing spectral domain optical coherence tomography,” Opt. Express 22(12), 14871–14884 (2014). [CrossRef]   [PubMed]  

16. S. Bernabe, G. Martin, J. M. P. Nascimento, J. M. Bioucas-Dias, A. Plaza, and V. Silva, “GPU implementation of a hyperspectral coded aperture algorithm for compressive sensing,” in Proceedings of IEEE International Geoscience and Remote Sensing Symposium (IEEE, 2015), pp. 521–524.

17. L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D 60(1–4), 259–268 (1992). [CrossRef]  

18. A. Chambolle, “An Algorithm for Total Variation Minimization and Applications,” J. Math. Imaging Vis. 20(1), 89–97 (2004). [CrossRef]  

19. A. Beck and M. Teboulle, “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems,” SIAM J. Imag. Sci. 2(1), 183–202 (2009). [CrossRef]  

20. A. Beck and M. Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems,” IEEE Trans. Image Process. 18(11), 2419–2434 (2009). [CrossRef]   [PubMed]  

21. J. W. Goodman, Introduction to Fourier Optics, 3 Edition, (Roberts & Company, 2005).

22. N. Parikh and S. Boyd, “Proximal Algorithms,” Foundations and Trends in Optimization 1(3), 127–239 (2014). [CrossRef]  

23. D. H. Brandwood, “A complex gradient operator and its application in adaptive array theory,” IEE Proc. H Microwaves Opt. Antennas 130(1), 11–16 (1983). [CrossRef]  

24. Y. Rivenson, A. Stern, and B. Javidi, “Overview of compressive sensing techniques applied in holography [Invited],” Appl. Opt. 52(1), A423–A432 (2013). [CrossRef]   [PubMed]  

25. Y. Rivenson, A. Stern, and B. Javidi, “Improved depth resolution by single-exposure in-line compressive holography,” Appl. Opt. 52(1), A223–A231 (2013). [CrossRef]   [PubMed]  

26. Y. Rivenson, A. Stern, and B. Javidi, “Compressive Fresnel Holography,” J. Disp. Technol. 6(10), 506–509 (2010). [CrossRef]  

27. R. Horisaki, J. Tanida, A. Stern, and B. Javidi, “Multidimensional imaging using compressive Fresnel holography,” Opt. Lett. 37(11), 2013–2015 (2012). [CrossRef]   [PubMed]  

References

  • View by:

  1. D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006).
    [Crossref]
  2. E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory 52(2), 489–509 (2006).
    [Crossref]
  3. E. J. Candes and T. Tao, “Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?” IEEE Trans. Inf. Theory 52(12), 5406–5425 (2006).
    [Crossref]
  4. E. J. Candes and M. Wakin, “An Introduction To Compressive Sampling,” IEEE Signal Proc. Mag. 25(2), 21–30 (2008).
    [Crossref]
  5. D. J. Brady, K. Choi, D. L. Marks, R. Horisaki, and S. Lim, “Compressive Holography,” Opt. Express 17(15), 13040–13049 (2009).
    [Crossref] [PubMed]
  6. J. Hahn, S. Lim, K. Choi, R. Horisaki, and D. J. Brady, “Video-rate compressive holographic microscopic tomography,” Opt. Express 19(8), 7289–7298 (2011).
    [Crossref] [PubMed]
  7. S. Lim, D. L. Marks, and D. J. Brady, “Sampling and processing for compressive holography [Invited],” Appl. Opt. 50(34), H75–H86 (2011).
    [Crossref] [PubMed]
  8. Y. Rivenson, A. Stern, and J. Rosen, “Reconstruction guarantees for compressive tomographic holography,” Opt. Lett. 38(14), 2509–2511 (2013).
    [Crossref] [PubMed]
  9. K. Choi, R. Horisaki, J. Hahn, S. Lim, D. L. Marks, T. J. Schulz, and D. J. Brady, “Compressive holography of diffuse objects,” Appl. Opt. 49(34), H1–H10 (2010).
    [Crossref] [PubMed]
  10. J. Nickolls, I. Buck, M. Garland, and K. Skadron, “Scalable parallel programming with CUDA,” Queue 6(2), 40–53 (2008).
    [Crossref]
  11. J. Nickolls and W. J. Dally, “The GPU Computing Era,” IEEE Micro 30(2), 56–69 (2010).
    [Crossref]
  12. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU Computing,” Proc. IEEE 96(5), 879–899 (2008).
    [Crossref]
  13. D. S. Smith, J. C. Gore, T. E. Yankeelov, and E. B. Welch, “Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods,” Int. J. Biomed. Imaging 2012, 864827 (2012).
    [Crossref] [PubMed]
  14. Ç. Bilen, Y. Wang, and I. W. Selesnick, “High-Speed Compressed Sensing Reconstruction in Dynamic Parallel MRI Using Augmented Lagrangian and Parallel Processing,” IEEE J. Emerg. Sel. Topics Circuits Syst. 2(3), 370–379 (2012).
    [Crossref]
  15. D. Xu, Y. Huang, and J. U. Kang, “GPU-accelerated non-uniform fast Fourier transform-based compressive sensing spectral domain optical coherence tomography,” Opt. Express 22(12), 14871–14884 (2014).
    [Crossref] [PubMed]
  16. S. Bernabe, G. Martin, J. M. P. Nascimento, J. M. Bioucas-Dias, A. Plaza, and V. Silva, “GPU implementation of a hyperspectral coded aperture algorithm for compressive sensing,” in Proceedings of IEEE International Geoscience and Remote Sensing Symposium (IEEE, 2015), pp. 521–524.
  17. L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D 60(1–4), 259–268 (1992).
    [Crossref]
  18. A. Chambolle, “An Algorithm for Total Variation Minimization and Applications,” J. Math. Imaging Vis. 20(1), 89–97 (2004).
    [Crossref]
  19. A. Beck and M. Teboulle, “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems,” SIAM J. Imag. Sci. 2(1), 183–202 (2009).
    [Crossref]
  20. A. Beck and M. Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems,” IEEE Trans. Image Process. 18(11), 2419–2434 (2009).
    [Crossref] [PubMed]
  21. J. W. Goodman, Introduction to Fourier Optics, 3 Edition, (Roberts & Company, 2005).
  22. N. Parikh and S. Boyd, “Proximal Algorithms,” Foundations and Trends in Optimization 1(3), 127–239 (2014).
    [Crossref]
  23. D. H. Brandwood, “A complex gradient operator and its application in adaptive array theory,” IEE Proc. H Microwaves Opt. Antennas 130(1), 11–16 (1983).
    [Crossref]
  24. Y. Rivenson, A. Stern, and B. Javidi, “Overview of compressive sensing techniques applied in holography [Invited],” Appl. Opt. 52(1), A423–A432 (2013).
    [Crossref] [PubMed]
  25. Y. Rivenson, A. Stern, and B. Javidi, “Improved depth resolution by single-exposure in-line compressive holography,” Appl. Opt. 52(1), A223–A231 (2013).
    [Crossref] [PubMed]
  26. Y. Rivenson, A. Stern, and B. Javidi, “Compressive Fresnel Holography,” J. Disp. Technol. 6(10), 506–509 (2010).
    [Crossref]
  27. R. Horisaki, J. Tanida, A. Stern, and B. Javidi, “Multidimensional imaging using compressive Fresnel holography,” Opt. Lett. 37(11), 2013–2015 (2012).
    [Crossref] [PubMed]

2014 (2)

2013 (3)

2012 (3)

D. S. Smith, J. C. Gore, T. E. Yankeelov, and E. B. Welch, “Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods,” Int. J. Biomed. Imaging 2012, 864827 (2012).
[Crossref] [PubMed]

Ç. Bilen, Y. Wang, and I. W. Selesnick, “High-Speed Compressed Sensing Reconstruction in Dynamic Parallel MRI Using Augmented Lagrangian and Parallel Processing,” IEEE J. Emerg. Sel. Topics Circuits Syst. 2(3), 370–379 (2012).
[Crossref]

R. Horisaki, J. Tanida, A. Stern, and B. Javidi, “Multidimensional imaging using compressive Fresnel holography,” Opt. Lett. 37(11), 2013–2015 (2012).
[Crossref] [PubMed]

2011 (2)

2010 (3)

K. Choi, R. Horisaki, J. Hahn, S. Lim, D. L. Marks, T. J. Schulz, and D. J. Brady, “Compressive holography of diffuse objects,” Appl. Opt. 49(34), H1–H10 (2010).
[Crossref] [PubMed]

J. Nickolls and W. J. Dally, “The GPU Computing Era,” IEEE Micro 30(2), 56–69 (2010).
[Crossref]

Y. Rivenson, A. Stern, and B. Javidi, “Compressive Fresnel Holography,” J. Disp. Technol. 6(10), 506–509 (2010).
[Crossref]

2009 (3)

A. Beck and M. Teboulle, “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems,” SIAM J. Imag. Sci. 2(1), 183–202 (2009).
[Crossref]

A. Beck and M. Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems,” IEEE Trans. Image Process. 18(11), 2419–2434 (2009).
[Crossref] [PubMed]

D. J. Brady, K. Choi, D. L. Marks, R. Horisaki, and S. Lim, “Compressive Holography,” Opt. Express 17(15), 13040–13049 (2009).
[Crossref] [PubMed]

2008 (3)

E. J. Candes and M. Wakin, “An Introduction To Compressive Sampling,” IEEE Signal Proc. Mag. 25(2), 21–30 (2008).
[Crossref]

J. Nickolls, I. Buck, M. Garland, and K. Skadron, “Scalable parallel programming with CUDA,” Queue 6(2), 40–53 (2008).
[Crossref]

J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU Computing,” Proc. IEEE 96(5), 879–899 (2008).
[Crossref]

2006 (3)

D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006).
[Crossref]

E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory 52(2), 489–509 (2006).
[Crossref]

E. J. Candes and T. Tao, “Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?” IEEE Trans. Inf. Theory 52(12), 5406–5425 (2006).
[Crossref]

2004 (1)

A. Chambolle, “An Algorithm for Total Variation Minimization and Applications,” J. Math. Imaging Vis. 20(1), 89–97 (2004).
[Crossref]

1992 (1)

L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D 60(1–4), 259–268 (1992).
[Crossref]

1983 (1)

D. H. Brandwood, “A complex gradient operator and its application in adaptive array theory,” IEE Proc. H Microwaves Opt. Antennas 130(1), 11–16 (1983).
[Crossref]

Beck, A.

A. Beck and M. Teboulle, “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems,” SIAM J. Imag. Sci. 2(1), 183–202 (2009).
[Crossref]

A. Beck and M. Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems,” IEEE Trans. Image Process. 18(11), 2419–2434 (2009).
[Crossref] [PubMed]

Bernabe, S.

S. Bernabe, G. Martin, J. M. P. Nascimento, J. M. Bioucas-Dias, A. Plaza, and V. Silva, “GPU implementation of a hyperspectral coded aperture algorithm for compressive sensing,” in Proceedings of IEEE International Geoscience and Remote Sensing Symposium (IEEE, 2015), pp. 521–524.

Bilen, Ç.

Ç. Bilen, Y. Wang, and I. W. Selesnick, “High-Speed Compressed Sensing Reconstruction in Dynamic Parallel MRI Using Augmented Lagrangian and Parallel Processing,” IEEE J. Emerg. Sel. Topics Circuits Syst. 2(3), 370–379 (2012).
[Crossref]

Bioucas-Dias, J. M.

S. Bernabe, G. Martin, J. M. P. Nascimento, J. M. Bioucas-Dias, A. Plaza, and V. Silva, “GPU implementation of a hyperspectral coded aperture algorithm for compressive sensing,” in Proceedings of IEEE International Geoscience and Remote Sensing Symposium (IEEE, 2015), pp. 521–524.

Boyd, S.

N. Parikh and S. Boyd, “Proximal Algorithms,” Foundations and Trends in Optimization 1(3), 127–239 (2014).
[Crossref]

Brady, D. J.

Brandwood, D. H.

D. H. Brandwood, “A complex gradient operator and its application in adaptive array theory,” IEE Proc. H Microwaves Opt. Antennas 130(1), 11–16 (1983).
[Crossref]

Buck, I.

J. Nickolls, I. Buck, M. Garland, and K. Skadron, “Scalable parallel programming with CUDA,” Queue 6(2), 40–53 (2008).
[Crossref]

Candes, E. J.

E. J. Candes and M. Wakin, “An Introduction To Compressive Sampling,” IEEE Signal Proc. Mag. 25(2), 21–30 (2008).
[Crossref]

E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory 52(2), 489–509 (2006).
[Crossref]

E. J. Candes and T. Tao, “Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?” IEEE Trans. Inf. Theory 52(12), 5406–5425 (2006).
[Crossref]

Chambolle, A.

A. Chambolle, “An Algorithm for Total Variation Minimization and Applications,” J. Math. Imaging Vis. 20(1), 89–97 (2004).
[Crossref]

Choi, K.

Dally, W. J.

J. Nickolls and W. J. Dally, “The GPU Computing Era,” IEEE Micro 30(2), 56–69 (2010).
[Crossref]

Donoho, D. L.

D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006).
[Crossref]

Fatemi, E.

L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D 60(1–4), 259–268 (1992).
[Crossref]

Garland, M.

J. Nickolls, I. Buck, M. Garland, and K. Skadron, “Scalable parallel programming with CUDA,” Queue 6(2), 40–53 (2008).
[Crossref]

Goodman, J. W.

J. W. Goodman, Introduction to Fourier Optics, 3 Edition, (Roberts & Company, 2005).

Gore, J. C.

D. S. Smith, J. C. Gore, T. E. Yankeelov, and E. B. Welch, “Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods,” Int. J. Biomed. Imaging 2012, 864827 (2012).
[Crossref] [PubMed]

Green, S.

J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU Computing,” Proc. IEEE 96(5), 879–899 (2008).
[Crossref]

Hahn, J.

Horisaki, R.

Houston, M.

J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU Computing,” Proc. IEEE 96(5), 879–899 (2008).
[Crossref]

Huang, Y.

Javidi, B.

Kang, J. U.

Lim, S.

Luebke, D.

J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU Computing,” Proc. IEEE 96(5), 879–899 (2008).
[Crossref]

Marks, D. L.

Martin, G.

S. Bernabe, G. Martin, J. M. P. Nascimento, J. M. Bioucas-Dias, A. Plaza, and V. Silva, “GPU implementation of a hyperspectral coded aperture algorithm for compressive sensing,” in Proceedings of IEEE International Geoscience and Remote Sensing Symposium (IEEE, 2015), pp. 521–524.

Nascimento, J. M. P.

S. Bernabe, G. Martin, J. M. P. Nascimento, J. M. Bioucas-Dias, A. Plaza, and V. Silva, “GPU implementation of a hyperspectral coded aperture algorithm for compressive sensing,” in Proceedings of IEEE International Geoscience and Remote Sensing Symposium (IEEE, 2015), pp. 521–524.

Nickolls, J.

J. Nickolls and W. J. Dally, “The GPU Computing Era,” IEEE Micro 30(2), 56–69 (2010).
[Crossref]

J. Nickolls, I. Buck, M. Garland, and K. Skadron, “Scalable parallel programming with CUDA,” Queue 6(2), 40–53 (2008).
[Crossref]

Osher, S.

L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D 60(1–4), 259–268 (1992).
[Crossref]

Owens, J. D.

J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU Computing,” Proc. IEEE 96(5), 879–899 (2008).
[Crossref]

Parikh, N.

N. Parikh and S. Boyd, “Proximal Algorithms,” Foundations and Trends in Optimization 1(3), 127–239 (2014).
[Crossref]

Phillips, J. C.

J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU Computing,” Proc. IEEE 96(5), 879–899 (2008).
[Crossref]

Plaza, A.

S. Bernabe, G. Martin, J. M. P. Nascimento, J. M. Bioucas-Dias, A. Plaza, and V. Silva, “GPU implementation of a hyperspectral coded aperture algorithm for compressive sensing,” in Proceedings of IEEE International Geoscience and Remote Sensing Symposium (IEEE, 2015), pp. 521–524.

Rivenson, Y.

Romberg, J.

E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory 52(2), 489–509 (2006).
[Crossref]

Rosen, J.

Rudin, L. I.

L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D 60(1–4), 259–268 (1992).
[Crossref]

Schulz, T. J.

Selesnick, I. W.

Ç. Bilen, Y. Wang, and I. W. Selesnick, “High-Speed Compressed Sensing Reconstruction in Dynamic Parallel MRI Using Augmented Lagrangian and Parallel Processing,” IEEE J. Emerg. Sel. Topics Circuits Syst. 2(3), 370–379 (2012).
[Crossref]

Silva, V.

S. Bernabe, G. Martin, J. M. P. Nascimento, J. M. Bioucas-Dias, A. Plaza, and V. Silva, “GPU implementation of a hyperspectral coded aperture algorithm for compressive sensing,” in Proceedings of IEEE International Geoscience and Remote Sensing Symposium (IEEE, 2015), pp. 521–524.

Skadron, K.

J. Nickolls, I. Buck, M. Garland, and K. Skadron, “Scalable parallel programming with CUDA,” Queue 6(2), 40–53 (2008).
[Crossref]

Smith, D. S.

D. S. Smith, J. C. Gore, T. E. Yankeelov, and E. B. Welch, “Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods,” Int. J. Biomed. Imaging 2012, 864827 (2012).
[Crossref] [PubMed]

Stern, A.

Stone, J. E.

J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU Computing,” Proc. IEEE 96(5), 879–899 (2008).
[Crossref]

Tanida, J.

Tao, T.

E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory 52(2), 489–509 (2006).
[Crossref]

E. J. Candes and T. Tao, “Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?” IEEE Trans. Inf. Theory 52(12), 5406–5425 (2006).
[Crossref]

Teboulle, M.

A. Beck and M. Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems,” IEEE Trans. Image Process. 18(11), 2419–2434 (2009).
[Crossref] [PubMed]

A. Beck and M. Teboulle, “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems,” SIAM J. Imag. Sci. 2(1), 183–202 (2009).
[Crossref]

Wakin, M.

E. J. Candes and M. Wakin, “An Introduction To Compressive Sampling,” IEEE Signal Proc. Mag. 25(2), 21–30 (2008).
[Crossref]

Wang, Y.

Ç. Bilen, Y. Wang, and I. W. Selesnick, “High-Speed Compressed Sensing Reconstruction in Dynamic Parallel MRI Using Augmented Lagrangian and Parallel Processing,” IEEE J. Emerg. Sel. Topics Circuits Syst. 2(3), 370–379 (2012).
[Crossref]

Welch, E. B.

D. S. Smith, J. C. Gore, T. E. Yankeelov, and E. B. Welch, “Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods,” Int. J. Biomed. Imaging 2012, 864827 (2012).
[Crossref] [PubMed]

Xu, D.

Yankeelov, T. E.

D. S. Smith, J. C. Gore, T. E. Yankeelov, and E. B. Welch, “Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods,” Int. J. Biomed. Imaging 2012, 864827 (2012).
[Crossref] [PubMed]

Appl. Opt. (4)

Foundations and Trends in Optimization (1)

N. Parikh and S. Boyd, “Proximal Algorithms,” Foundations and Trends in Optimization 1(3), 127–239 (2014).
[Crossref]

IEE Proc. H Microwaves Opt. Antennas (1)

D. H. Brandwood, “A complex gradient operator and its application in adaptive array theory,” IEE Proc. H Microwaves Opt. Antennas 130(1), 11–16 (1983).
[Crossref]

IEEE J. Emerg. Sel. Topics Circuits Syst. (1)

Ç. Bilen, Y. Wang, and I. W. Selesnick, “High-Speed Compressed Sensing Reconstruction in Dynamic Parallel MRI Using Augmented Lagrangian and Parallel Processing,” IEEE J. Emerg. Sel. Topics Circuits Syst. 2(3), 370–379 (2012).
[Crossref]

IEEE Micro (1)

J. Nickolls and W. J. Dally, “The GPU Computing Era,” IEEE Micro 30(2), 56–69 (2010).
[Crossref]

IEEE Signal Proc. Mag. (1)

E. J. Candes and M. Wakin, “An Introduction To Compressive Sampling,” IEEE Signal Proc. Mag. 25(2), 21–30 (2008).
[Crossref]

IEEE Trans. Image Process. (1)

A. Beck and M. Teboulle, “Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems,” IEEE Trans. Image Process. 18(11), 2419–2434 (2009).
[Crossref] [PubMed]

IEEE Trans. Inf. Theory (3)

D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006).
[Crossref]

E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory 52(2), 489–509 (2006).
[Crossref]

E. J. Candes and T. Tao, “Near-Optimal Signal Recovery From Random Projections: Universal Encoding Strategies?” IEEE Trans. Inf. Theory 52(12), 5406–5425 (2006).
[Crossref]

Int. J. Biomed. Imaging (1)

D. S. Smith, J. C. Gore, T. E. Yankeelov, and E. B. Welch, “Real-Time Compressive Sensing MRI Reconstruction Using GPU Computing and Split Bregman Methods,” Int. J. Biomed. Imaging 2012, 864827 (2012).
[Crossref] [PubMed]

J. Disp. Technol. (1)

Y. Rivenson, A. Stern, and B. Javidi, “Compressive Fresnel Holography,” J. Disp. Technol. 6(10), 506–509 (2010).
[Crossref]

J. Math. Imaging Vis. (1)

A. Chambolle, “An Algorithm for Total Variation Minimization and Applications,” J. Math. Imaging Vis. 20(1), 89–97 (2004).
[Crossref]

Opt. Express (3)

Opt. Lett. (2)

Phys. D (1)

L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Phys. D 60(1–4), 259–268 (1992).
[Crossref]

Proc. IEEE (1)

J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU Computing,” Proc. IEEE 96(5), 879–899 (2008).
[Crossref]

Queue (1)

J. Nickolls, I. Buck, M. Garland, and K. Skadron, “Scalable parallel programming with CUDA,” Queue 6(2), 40–53 (2008).
[Crossref]

SIAM J. Imag. Sci. (1)

A. Beck and M. Teboulle, “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems,” SIAM J. Imag. Sci. 2(1), 183–202 (2009).
[Crossref]

Other (2)

S. Bernabe, G. Martin, J. M. P. Nascimento, J. M. Bioucas-Dias, A. Plaza, and V. Silva, “GPU implementation of a hyperspectral coded aperture algorithm for compressive sensing,” in Proceedings of IEEE International Geoscience and Remote Sensing Symposium (IEEE, 2015), pp. 521–524.

J. W. Goodman, Introduction to Fourier Optics, 3 Edition, (Roberts & Company, 2005).

Cited By

Optica participates in Crossref's Cited-By Linking service. Citing articles from Optica Publishing Group journals and other participating publishers are listed here.

Alert me when this article is cited.


Figures (6)

Fig. 1
Fig. 1 Setup for Gabor holography.
Fig. 2
Fig. 2 Schematic diagram of the implementation of GPU-accelerated compressive holography.
Fig. 3
Fig. 3 Multiplication of the measurement matrix H using its structure.
Fig. 4
Fig. 4 Computation time of compressive holography using FISTA with 600 iterations on (a) the GPU (GTX 980) and (b) the CPU (Core i7-4790K).
Fig. 5
Fig. 5 Computation time of each step in the GPU-accelerated compressive holography where Nx × Ny × Nz = 1024 × 1024 × 10.
Fig. 6
Fig. 6 (a) 3D object and reconstructed images by (b) backpropagation and (c) compressive holography.

Tables (1)

Tables Icon

Table 1 Computation time of compressive holography using FISTA with 600 iterations.

Equations (16)

Equations on this page are rendered with MathJax. Learn more.

u = Hf = F 1 QBf ,
Q = [ P 1 P 2 P N z ] ,
B = bdiag ( F , F , , F ) ,
g i = 2 Re ( r i u i ) + | r i | 2 + | u i | 2 , i = 1 , , N x y ,
g i = 2 Re ( u i ) + | u i | 2 = 2 Re ( u i ) + e i , i = 1 , , N x y ,
g = 2 Re ( Hf ) + e .
min f { ( f ) + τ ( f ) } : = min f { 2 Re ( Hf ) g 2 2 + τ ( f ) } ,
f 1 : = i = 1 N x y z | f i | ,
TV ( f ) : = j = 1 N z i = 1 N x y D i f j 2 ,
f k = prox τ / L ( h k 1 L ( h k ) ) ,
t k + 1 = 1 + 1 + 4 t k 2 2 ,
h k + 1 = f k + ( t k 1 t k + 1 ) ( f k f k 1 ) ,
prox τ / L ( h ) : = argmin f { τ L ( f ) + 1 2 f h 2 2 } .
( f ) = 2 H H ( 2 Re ( Hf ) g ) ,
prox τ / L ( h ) = 𝒯 τ / L ( h ) ,
𝒯 κ ( a ) : = ( 1 κ | a | ) + a ,

Metrics