## Abstract

We demonstrate how to efficiently implement extremely high-dimensional compressive imaging of a bi-photon probability distribution. Our method uses fast-Hadamard-transform Kronecker-based compressive sensing to acquire the joint space distribution. We list, in detail, the operations necessary to enable fast-transform-based matrix-vector operations in the joint space to reconstruct a 16.8 million-dimensional image in less than 10 minutes. Within a subspace of that image exists a 3.2 million-dimensional bi-photon probability distribution. In addition, we demonstrate how the marginal distributions can aid in the accuracy of joint space distribution reconstructions.

© 2015 Optical Society of America

## 1. Introduction

Characterizing high-dimensional joint systems is a difficult problem due to experimental impracticalities such as long measurement times, low flux, or insufficient computing resources. One example of such a characterization is of continuous-variable entangled states – a resource gaining ground in quantum technologies [1–6]. A widely used source of continuous-variable entangled states is Spontaneous Parametric Down-Conversion (SPDC) in a nonlinear crystal [7]. Depending on the configuration, the resulting bi-photon state may be entangled in the transverse degrees of freedom [8–10]. To determine if the system is entangled, both the bi-photon joint position and joint momentum probability distributions composed of signal and idler photons must be measured through correlation measurements.

Much work has been done recently to characterize high-dimensional position-momentum entanglement with discrete measurements [8,11–16]. Characterizations of position or momentum distributions resulting from SPDC are done by measuring signal and idler pixel correlations in either an image plane of the crystal (constituting a position measurement) or a Fourier-transform plane of the crystal (constituting a momentum measurement) through coincidence counting.

For a bi-photon probability distribution, measurements are typically done by measuring correlations via raster scanning through the individual signal- and idler-photon probability distributions. The time required to complete a raster scan with single-photon detectors quickly becomes impractical for certain scans. Imaging these distributions with a camera has been shown in [17, 18], yet cameras often introduce far more noise than a single-photon detector.

Recently, Compressive Sensing (CS) [19, 20] techniques have been introduced as an alternative to raster scanning for characterizing a high-dimensional entangled system, dropping the measurement time from months to hours [21,22]. While the data-acquisition time is drastically reduced, it comes at the cost of computational complexity, requiring a computational reconstruction of the signal. Performing CS on high-dimensional signals is not a new problem, and several clever solutions exist for utilizing separable compressive sensing matrices combined by a Kronecker product [23, 24]. However, these methods are ill suited for sampling the correlations in a joint space.

In this article we propose the use of fast Hadamard transforms for high-dimensional joint space reconstructions. Specifically, we show how the Kronecker-product-based recursion relations of Sylvester-type Hadamard matrices can combine single-particle sensing matrices. This, in turn, enables the use of fast Hadamard transforms in the joint space as they have been shown to drastically reduce CS reconstruction times [25]. Using the randomization techniques outlined in [26, 27], sensing matrices composed of randomized Hadamard matrices offer tremendous speed enhancements in many reconstruction algorithms.

Additionally we show that by using the individual signal and idler marginal distributions in our reconstruction of the joint space distribution, we can more accurately acquire transverse spatial correlations as measured by the mutual information. To the best of our knowledge, this is the first time the single particle information has been used in reconstructing the joint space distribution.

To demonstrate the effectiveness of our method, reconstruct a 16.8 × 10^{6} dimensional joint-space distribution in only a few minutes. Within this joint-space lives a 3.2 million-dimensional bi-photon position probability distribution from which we measure the degree of transverse correlations between signal and idler photons using the mutual information.

The experimental realization here is closely related to the work performed in [21]. The experiment in this article is merely meant to demonstrate how structured randomness enables efficient reconstructions of the joint space distribution at even higher dimensions. In theory, this increase in resolution allows for an increase in the amount of measurable mutual information.

## 2. Compressive sensing

CS helps to overcome unreasonable data-acquisition times associated with sampling signals with limited resources. CS requires that the signal of interest **x̃** has a sparse representation **x** via a basis transform. Limiting the discussion to real signals for simplicity, **x̃** ∈ ℝ* ^{N}*, then there exists a basis transform Ψ in which

*k*<

*N*components of

**x**= Ψ ·

**x̃**are nonzero;

**x**is a

*k*-sparse representation of

**x̃**. CS posits that if

**x**is approximately

*k*-sparse, then only

*M*= 𝒪(

*k*log(

*N/k*)) projections of

**x̃**are needed to accurately sample

**x̃**. A typical CS technique involves taking

*M*≪

*N*projections of a signal

**x̃**with a random sensing matrix

**A**∈ ℝ

^{M×N}to form a measurement vector

**y**=

**A**·

**x̃**where

**y**∈ ℝ

*. Assuming the signal representation*

^{M}**x**is

*k*-sparse,

**x**may be recovered by solving

*ε*and is defined as the

*l*-norm. The function

_{p}*g*(

**x**) is a function to be minimized depending on the assumed sparsity;

*g*(

**x**) is often chosen to be the

*l*

_{1}-norm ${\Vert \mathbf{x}\Vert}_{1}^{2}$, the total variation as defined by the gradient ‖∇

**x̃**‖

_{1}, or a combination of functions where

**x**is known to be sparse.

*τ*is simply a weighting term that weights the sparsity, favoring either the least-squares solution or the sparsity. Because

**A**∈ ℝ

^{M×N}where

*M*≪

*N*, there are an infinite number of solutions confined to the least-squares term. The function

*g*(

**x**) picks out the sparsest of these solutions which should correspond to our signal

**x**. An overview of compressive sensing and its applications may be found in [28].

## 3. Measuring a non-separable joint system

We apply CS to measure the joint position probability distribution of the down-converted signal and idler photons from SPDC. Quantum mechanics tells us that the bi-photon state exists in a Hilbert space composed of the tensor product of the individual signal and idler photon Hilbert spaces. After representing the bi-photon state in a discrete basis, operators on these states are represented as matrices. In order make a measurement, we approximate the state as living in a finite dimensional Hilbert space. We can therefore represent a bi-photon operator matrix in terms of Kronecker products of individual signal and idler photon operator matrices. For CS, we can let these operators be projection operators and manipulate them such that they form the rows of a sensing matrix **A**. The sets of projection operators for the signal and idler spaces are designated by a a set of *M* patterns where each pattern is in ℝ* ^{N}*;

**P**

*and*

_{S}**P**

*∈ ℝ*

_{I}^{M×N}. In this manner, the sensing matrix is written as

*i*∈ 1...

*M*where

**P**[

*i*] represents the

*i*row of

^{th}**P**.

The Kronecker product ⊗ in this article operates on matrices and vectors such that if **a** is of dimension *m* × *n* and **b** is of dimension *p* × *q*, then their Kronecker is of dimension *mp* × *nq* represented as

An experimental diagram for measuring the joint space bi-photon position probability distribution is presented in Fig. 1 where each particle’s space is depicted as two-dimensional. Yet, to simplify the CS formalism, we represent the signal and idler spaces as one-dimensional living in ℝ* ^{N}* and the joint space distribution as a vector

**x**∈ ℝ

^{N2}. As outlined in [21], compressive sensing is experimentally accomplished by taking random projections with patterns composed of $\left[\sqrt{N}\times \sqrt{N}\right]$ pixels within each subspace of the signal and idler systems and then measuring the resulting correlations in photon counts.

During reconstruction, the bi-photon probability distribution is already sparse in the pixel basis due to the tight pixel correlations resulting from energy and momentum conservation. This eliminates the need to define a sparse-basis such that Ψ becomes the identity operator Ψ = 𝟙 in the formalism above.

Random binary matrices were used in [21], yet we wish to use *structured* random binary matrices for reconstruction purposes. Using properties of Kronecker products enables relatively efficient computations of the reconstruction operations **A** · **x** and **A*** ^{T}* ·

**y**, where

**A**

*is the transpose of*

^{T}**A**, because

**A**never needs to be computed explicitly [29]. However, structured randomness can enable the use of fast transforms which are even more efficient. Our contribution is to demonstrate in the following section how randomized Sylvester-Hadamard matrices enable the use of fast Hadamard transforms in joint space CS reconstructions.

## 4. Fast Hadamard transform based sensing matrices

#### 4.1. Randomly sampled & permuted Hadamard sensing matrices

Sylvester-Hadamard matrices have a structure that is particularly advantageous to the CS framework. These matrices are generated from a simple recursion relation defined by a Kronecker product.

*k*> 1. Because of this structure, Sylvester-Hadamard matrices are restricted to powers of two but can be used to build patterns and a sensing matrix that utilizes the speed and efficiency of a fast Hadamard transform ℋ[...]. We use a normally-ordered fast transform in this work. Its algorithm is similar to that of a fast Fourier transform, but it consists of only additions and subtractions. Hence, it performs the reconstruction operations

**A**·

**x**and

**A**

*·*

^{T}**y**in 𝒪 (

*N*

^{2}log

*N*) time – significantly faster than an explicit matrix-vector multiplication of 𝒪 (

*N*

^{4}) time. A thorough overview of Hadamard matrices, fast-Hadamard transforms, and their applications to signal and image processing may be found in [30].

To construct **P*** _{S}*,

**P**

*, and*

_{I}**A**from Hadamard matrices, the Hadamard matrices must be randomized in both their rows and columns. The sensing matrices must be both incoherent with the image yet span the space in which the signal resides. In other words, the sensing matrices must adequately sample the basis components of the signal. Random sensing matrices perform this task well, yet Hadamard matrices naturally contain much structure. To begin, each sensing matrix must be formed by taking specific rows from a Hadamard matrix with the correct dimensions.

**A**is constructed from

**H**

_{N2}while

**P**

*and*

_{S}**P**

*are constructed from*

_{I}**H**

*. Because of the relation in Eq. (2), the rows of*

_{N}**A**will be determined by the rows of

**P**

*and*

_{S}**P**

*.*

_{I}The randomization of the Hadamard matrix rows is accomplished by defining two vectors **r*** _{S}* and

**r**

*∈ ℝ*

_{I}*for each signal and idler system composed of*

^{M}*M*randomly chosen integers on the interval [2,N]. The values in

**r**state which rows should be extracted from

**H**

*when constructing*

_{N}**P**

*and*

_{S}**P**

*. Note that the interval begins at 2 because the first row of a Hadamard matrix is composed entirely of ones. The interval may begin at 1 if the total photon flux on a detector is desired. Also, note that*

_{I}**A**∈ ℝ

^{M×N2}where

*M*<<

*N*

^{2}. This condition allows for scenarios where

**P**

*,*

_{S}**P**

*∈ ℝ*

_{I}^{M×N}such that

*M*>

*N*in the individual subspaces, meaning rows of

**H**

*may be repeated when constructing*

_{N}**P**

*and*

_{S}**P**

*.*

_{I}The randomization of the Hadamard columns is accomplished by defining permutation vectors **p*** _{S}* and

**p**

*∈ ℝ*

_{I}*that randomly permute the*

^{N}*N*columns of

**H**

*. Once*

_{N}**r**and

**p**have been defined for both the signal and idler subspaces, patterns are constructed by the following equations:

*y*and

*x*components of

**H**[

*y*,

*x*] refer to the rows and columns of

**H**respectively.

Although these operations are defined for the signal and idler subspaces, they combine in a particular way according to Eq. (2) to construct a Hadamard-based sensing matrix **A** that enables fast transform operations in the joint space. The manner in which they combine to manipulate a Hadamard matrix **H**_{N2} that spans the joint space is detailed in the next section.

#### 4.2. Joint space Sylvester-Hadamard sensing matrices

Once **r** and **p** have been defined for the individual signal and idler subspaces, they may be used to construct the corresponding joint space row-selection and permutation vectors, **r*** _{SI}* and

**p**

*. Consider the construction of*

_{SI}**r**

*first. By Eq. (4), the complete joint space sensing matrix*

_{SI}**A**is simply formed by the row-wise Kronecker product of the subspace sensing matrices

**P**

*and*

_{S}**P**

*. As*

_{I}**r**

*and*

_{S}**r**

*determine the ordering of the Hadamard rows within these patterns,*

_{I}**r**

*must also be a subset of a Kronecker product of*

_{SI}**r**

*and*

_{S}**r**

*. Knowing that the Kronecker product of*

_{I}**P**

*and*

_{S}**P**

*will form “blocks” of size [*

_{I}*M*×

*N*], it is straightforward to show that

*i*∈ 1...

*M*where

**r**[

*i*] represents the

*i*component of

^{th}**r**. Note that element-wise counting in this article starts at 1 and not 0.

Because **r*** _{S}* and

**r**

*are chosen at random and will often be over-complete,*

_{I}*M*>

*N*, and

**r**

*will probably have repeating units, and a row within*

_{SI}**A**will appear more than once. This is equivalent to taking the same projection more than once, offering no additional information. To prevent this, we simply compare each of the values within

**r**

*and eliminate repeating*

_{SI}*i*values. If

^{th}**r**

*[*

_{SI}*i*] is a repeated value, we eliminate

**r**

*[*

_{SI}*i*] along with the components

**r**

*[*

_{S}*i*] and

**r**

*[*

_{I}*i*]. In this way, the number of samples

*M*will decrease yet contain the same amount of information.

The formation of **p*** _{SI}* follows a similar form as

**r**

*, yet it will be of length*

_{SI}*N*

^{2}. Although it is not a simple Kronecker product, it does follow from the structure in Eq. (2). The structure of

**p**

*takes the form*

_{SI}*i*∈ 1...

*N*and

*j*∈ 1...

*N*. Generating randomized Hadamard matrices using

**r**and

**p**for each signal, idler, and joint space are summarized below:

*y*and

*x*components of

**H**[

*y*,

*x*] refer to the rows and columns of

**H**respectively. The construction of

**A**presented in Eq. (8) allows us to use fast transforms as explained in the next section.

#### 4.3. Joint space fast Hadamard transform operations

Keeping track of the randomization operations allows the use of fast Hadamard transforms when computing **A** · **x** and **A*** ^{T}* ·

**y**. This is accomplished by reordering either

**x**or

**y**according to

**p**, taking the fast Hadamard transform, and then picking specific elements from the final result according to

**r**. The manner in which they are rearranged and picked depends upon the operation

**A**·

**x**or

**A**

*·*

^{T}**y**in either the data acquisition or reconstruction processes.

Starting with the data-taking procedure **y** = **A** · **x**, projections are taken in each signal and idler system by first constructing individual patterns. Pattern construction is done by fast Hadamard transforming basis vectors and then permuting them. Because of the symmetric nature of a Hadamard matrix (**H** = **H*** ^{T}*), a fast Hadamard transform of a basis vector

*α*[

*i*], in which the

*i*component is equal to one and the rest zeros, is equal to the

^{th}*i*row of the Hadamard matrix

^{th}**H**[

*i*, :]. In short, ℋ[

*α*[

*i*]] =

**H**[

*i*]. Hence, every

*i*pattern

^{th}**P**[

*i*, :] can be built according to a fast transform by

*i*∈ 1...

*M*where

*α*[

**r**[

*i*]], a basis vector whose

**r**[

*i*]

*component is equal to 1, is fast-Hadamard transformed and then permuted according to*

^{th}**p**.

To take experimental data, many SLM’s, such as digital micromirror devices, are operated in a binary fashion of on or off – transmitting light either to or away from a detector. If only using one detector per subspace, at any given moment a pattern may only be composed of 0’s or 1’s while Hadamard matrices are composed of 1’s and −1’s. To display the full Hadamard pattern with one detector per subspace, the data-taking operations must be split into positive and negative operations. **H**_{N2} may be decomposed into a sum of four Kronecker products of both positive **H*** _{N}*, represented by
${\mathbf{H}}_{N}^{+}$ which is composed of 0’s and 1’s, and negative

**H**

*, represented by ${\mathbf{H}}_{N}^{-}$ which is composed of −1’s and 0’s.*

_{N}**y**will require four coincidence measurements. Even though 4

*M*coincidence measurements are required when using one detector per subspace, the drastic sampling performance gained through CS methods is such that 4

*M*≪

*N*

^{2}. Alternatively, if two detectors are used in each subspace (one detector to collect projections from the positive components and one detector to collect projections from the negative components), the detection process could be streamlined to measure each of the four correlations in Eq. (10).

In reconstruction, fast-Hadamard transforms may be utilized by CS reconstruction algorithms to perform the operations **A** · **x** and **A*** ^{T}* ·

**y**. The operation

**A**·

**x**first requires that

**x**be inverse-permuted, fast Hadamard transformed, and then have the correct

*M*elements extracted from the final result. The inverse-permutation is done by defining an inverse permutation vector

**q**as

*i*elements in

**p**. Hence,

**A**·

**x**is realized with the following operations

The operation **A*** ^{T}* ·

**y**requires that a vector

*β*composed of

*N*

^{2}zeros be filled with the elements of

**y**according to

**r**

*, fast Hadamard transformed, and then permuted according to*

_{SI}**q**as follows:

Again, these operations work because Hadamard matrices are symmetric. However, it should be noted that the true inverse operation is
${\mathbf{H}}_{N}^{-1}={\mathbf{H}}_{N}^{T}/N$. When taking the fast transform operation in Eq. (13), we are explicitly taking the forward fast transform and neglecting the normalization term. Because of this structure, the operations **A** · **x** and **A*** ^{T}* ·

**y**can be utilized by most reconstruction algorithms to operate more efficiently.

The methods outlined in this article can also be applied to joint space signals that are not sparse in the “pixel-basis” where Ψ ≠ 𝟙. Sparse forward and inverse transform operations, Ψ[...] and Ψ[...]^{−1}, need to be applied to **x** in an appropriate order to bring **x** and **y** back into the pixel-basis before fast-Hadamard transforming. Hence, the operations **A** · **x** and **A*** ^{T}* ·

**y**become

**A**· Ψ[

**x**]

^{−1}and Ψ[

**A**

*·*

^{T}**y**].

To reiterate, the novelty presented in this section is in how Eqns. (6) and (7) enable the use of fast Hadamard transforms for calculating **A** · Ψ[**x**]^{−1} and Ψ[**A**^{T}*·* **y**] as summarized below.

**y**=**A**· Ψ[**x**]^{−1}- If Ψ = 𝟙, neglect this step. Otherwise, inverse transform
**x**out of the sparse basis using the inverse transform to obtain**x′**= Ψ[**x**]^{−1}. - Inverse permute
**x′**using**q**such that_{SI}**x″**=**x′**[**q**]._{SI} - Fast Hadamard transform
**x″**such that**y′**= ℋ[**x″**]. - Extract
*M*elements from**y′**using**r**to obtain_{SI}**y**=**y′**[**r**]._{SI}

**x**= Ψ[**A**·^{T}**y**]- Construct a null-vector
*β*∈ ℝ^{N2}. - Place the components of
**y**into*β*using**r**to assign the locations for the elements of_{SI}**y**such that*β*[**r**] =_{SI}**y**. - Fast Hadamard transform
*β*such that*β′*= ℋ[*β*]. - Permute the elements of
*β′*using**p**to obtain_{SI}**x***′*=*β′*[**p**]._{SI} - If Ψ = 𝟙, neglect this step and let
**x**=**x′**. Otherwise, transform**x′**into the sparse basis to obtain**x**= Ψ[**x′**].

## 5. Compressive measurement in a 16.8 × 10^{6}-dimensional correlated space

#### 5.1. Mutual information

To demonstrate the practicality of the previous results, we compressively measure and quickly reconstruct a 16.8 × 10^{6} dimensional joint space probability distribution. Up to this point, the reason why these joint space measurements are useful has not been explained in detail other than to inform the reader of the characterization of correlated systems. When attempting to use down-converted photons for information transfer, an important question to ask is, How much uncertainty about the position of the signal photon is removed upon knowing the position of the idler photon? This quantity is effectively answered by the Shannon mutual information between the position statistics of the signal and idler photons *I*(*X _{S}*,

*X*) [31]. The mutual information quantifies the classical channel capacity and is easily found by first measuring the joint space probability distribution. Given the discrete random variables’ distributions for signal

_{I}*X*and idler

_{S}*X*, and their allowed set of random values,

_{I}*x*and

_{S}*x*respectively, the mutual information is defined as:

_{I}*p*(

*x*,

_{S}*x*) is the joint space probability distribution while

_{I}*p*(

*x*) and

_{S}*p*(

*x*) are the marginal probability distributions. In terms of our CS formalism,

_{I}*p*(

*x*,

_{S}*x*) =

_{I}**x**. Once

**x**has been procured from

**y**, the marginal distributions

*p*(

*x*) and

_{S}*p*(

*x*) are then found summing over appropriate values of

_{I}*p*(

*x*,

_{S}*x*). Since we may approximate the joint distribution as a double Gaussian, we may also say 2

_{I}^{I(XS, XI)}is equal to the Schmidt number of the state which is a measure of the number of entangled modes [32, 33]. In our case, 2

^{I(XS, XI)}is equal to the number of distinct channel inputs.

#### 5.2. Theoretical expectations

Before reporting our experimental results, it is useful to first estimate the theoretical maximum amount of possible mutual information as derived from first principles based on the crystal and the pump-laser specifications. That result should then be compared with the maximum possible information we could measure given our SLM resolution. A thorough calculation characterizing degenerate SPDC is done in [10] in one transverse spatial dimension. Assuming a double Gaussian bi-photon state, the mutual information in the position domain between down-converted photons (*X _{S}* and

*X*) is

_{I}*L*,

_{z}*λ*, and

_{p}*σ*represent the length of the nonlinear crystal, the pump-laser wavelength, and the standard deviation of the Gaussian intensity pump-laser profile, respectively. We use a 325 nm pump laser and a 1 mm length nonlinear crystal. The maximum 1/

_{p}*e*

^{2}pump diameter is listed as 1.2 mm resulting in a

*σ*that is four times smaller, i.e.

_{p}*σ*= 3 × 10

_{p}^{−4}m. The experiment uses approximately degenerate down-converted light because of the paraxial nature of beam propagation and the use of narrow-band filters. Our pump laser is approximately Gaussian in two dimensions resulting in a mutual information twice as large as reported in Eq. (15). We obtain a theoretical maximum mutual information between signal and idler photons of 10.9 bits. However, when moving from an infinite-dimensional Hilbert space to a finite dimensional space dictated by the resolution of the SLM and its pixel size, the resulting measurable mutual information must be less than or equal to to the continuous variable case of 10.9 bits.

#### 5.3. Reconstruction algorithm: maximizing the mutual information

Given that a motivation for these reconstructions is to infer the mutual information shared between signal and idler photons, it is reasonable to design a reconstruction algorithm that attempts to maximize the mutual information via the suppression of noise through soft or hard thresholding; hard thresholding implies setting components less than a threshold to zero while soft thresholding implies first hard thresholding and then decreasing the amplitude of every remaining signal component by the threshold’s amplitude. For practical applications, only the largest measurable signal components of **x** should be used to infer the mutual information because of the presence of background noise. With this in mind, we use an iterative thresholding algorithm [34] that computes the mutual information at each iteration. The program exits when the mutual information no longer increases with thresholding.

The algorithm we use is summarized as follows:

**c**is a vector composed entirely of ones times a non-zero constant and min(

**x**

*) is a vector composed entirely of ones times the smallest element in*

_{t}**x**

*. Note that at each iteration we take a projection of the current result with the term*

_{t}*η̂*

_{1}[

**A**

*· (*

^{T}**y**−

**A**·

**x**

*)]. During the first iteration,*

_{t}**A**·

**x**

_{0}= 0 because

**x**

_{0}=

**c**. Note that

**A**·

**c**= 0 only because we chose to neglect the first row of the Hadamard matrices. The first iteration results in computing

**A**

*·*

^{T}**y**as in standard iterative style algorithms.

*η̂*

_{1}[...] is an operator that performs soft thresholding on everything within its brackets using a biorthogonal 4.4 wavelet transform with a two-level decomposition [35–37]. Soft threshold in the wavelet basis, often referred to as wavelet shrinkage, is performed using the universal threshold of Donoho and Johnstone [38, 39]. The filtered signal is then inverse transformed back to the pixel basis.

*η̂*

_{2}[...] then performs a hard thresholding on everything within its brackets operating in the pixel basis and renormalizes the final result to be a probability distribution. The threshold of

*η̂*

_{2}gradually increases with each iteration. As

**x**

*becomes less and less noisy through filtering and converges to the true solution,*

_{t}**y**−

**A**·

**x**

*approaches zero. However, it never truly reaches zero and results in a small noise term. To prevent injecting random noise into each filtered iteration, we take the projection of the noisy term with the current clean solution. We then add this projection back to the current solution to prevent discarding current signal components. After hard thresholding the first iteration, min(*

_{t}**x**

_{t>0}) =

**0**where

**0**is a null vector.

#### 5.4. Experimental results

As an experimental demonstration, we compare how the measured mutual information from compressive measurements compares to the theoretical mutual information of 10.9 bits. Knowing that information from CS resides in the standard deviation of the signal from the different projections, this standard deviation must be greater than the shot noise. Otherwise, the signal is obscured by the noise. With binary patterns on each SLM we measured coincidence counts at a rate of 4 × 10^{3} counts per second. Since each SLM reduces the incoming flux by approximately 50%, the total number of coincidences Φ was approximately four times larger, or Φ = 1.6 × 10^{4} coincidences per second. We chose the integration time such that all four projections per **y**-element required a total of 8 seconds due to power constraints. The resulting ratio of the standard deviation from the projections relative to the shot noise was 2.4. In addition, the sparsity *k* should be of order *N* = 4096 due to tight pixel correlations resulting from position and momentum correlations, meaning *M* = 𝒪(*N*log(*N*)) = 𝒪(10^{4}) measurements. We chose the number of measurements to be 2 × 10^{4} (*M/N*^{2} ≈ .001) as a reasonable compromise between the total integration time and reconstruction quality. The resulting scan took just over 44 hours. To compare these values to a raster scan, the signal-to-noise ratio (*SNR*) goes as
$\sqrt{\mathrm{\Phi}t/N}$, assuming perfect pixel correlations and uniform illumination. Here, *t* is the integration time per pixel. When raster scanning in an *N*^{2} dimensional joint space, the total integration time goes as *N*^{2}*t* = *N*^{3} *SNR*^{2}/Φ for shot noise limited signals. Therefore, a raster scan operating under perfect conditions, again considering perfect pixel correlations and uniform illumination, would require 50 days to achieve a *SNR* = 1 for *N* = 4096. Hence, 50 days represents the bare minimum integration time for raster scans. The reconstructed joint space probability distribution is presented in in Fig. 2.

From the same set of data, a comparison of the recovered mutual information versus the number of samples *M* was also conducted. The results state that the resulting mutual information is *highly* dependent on the noise. As the algorithm seeks to find the optimal threshold to discern the largest number of distinguishable modes, random noise is thresholded into sparse speckle patterns when the signal is not large enough to distinguish from the noise. These speckle patterns incorrectly state that there exist tight pixel correlations. To stress the severity of this flaw, we consistently recover about 8 bits of mutual information with *M* = 10 projections. Even for large *SNR* and large *M*, these speckle patterns may still be present along with the true signal for many reconstruction algorithms. Hence, noise will artificially increase the mutual information.

Reconstruction errors that artificially increase the mutual information bring into question the validity of CS methods for these types of characterizations. However, there exists information in the signal and idler marginal probability distributions that can reduce this error. On the right hand side of Fig. 2, the marginal distribution for only the signal is shown since the idler distribution is similar in appearance. The signal’s measured 64 × 64 pixel marginal distribution was recovered from the 2 × 10^{4} projections using photon counts from only that detector. The beam is Gaussian; Fig. 2 plots everything within the 1/*e*^{2} intensity beam diameter while thresholding the background intensity to zero. With the idler’s marginal distribution taking a similar form, these marginal distributions indicate that there are regions where correlations should not exist. This information can be used to reduce the error from reconstructions by only allowing the algorithm to admit correlations where the marginal distributions say that correlations may exist. These results are summarized in Fig. 3.

It is evident from Fig. 3 that background noise significantly alters the results, fabricating a larger mutual information. When neglecting marginal information, the recovered mutual information values appear to asymptotically decrease as *M* grows larger, supposedly approaching an accurate value. However, including information from the marginals decreases these errors, even for small *M*, by systematically reducing background noise. Note that Fig. 3 plots the mutual information in bits. This means that for low sampling percentages when including the marginal information, just under 4 bits of mutual information, presumably from noise, is calculated. However, these reconstructions are in a regime where *M* ≪ *k* log(*N/k*) and the signal is dominated by noise.

Using the reconstructed joint space distribution, the marginal probability distributions can be calculated and compared to the measured marginal distributions. When not including the measured marginal information, the joint space distribution becomes riddled with a low-level noise that drastically warps the resulting reconstructed marginal distributions. However, including the marginal information results in a clean joint space distribution with reconstructed marginal distributions that are similar in appearance to the measured values. The reconstructed distributions in Fig. 2 were recovered while including the marginal distribution information. Using CS and additional information in the marginal distributions, we measure 7.21 bits of mutual information, corresponding to 148 correlated modes. The reason we fall short of 10.9 bits of measured mutual information is likely due to the low resolution imposed by the beam covering a small part of the sample space. This resulted in a discretization of the marginal distributions that was not detailed enough to measure all of the available correlations. Another issue is the misalignment in the overlap of signal and idler pixels. This causes the appearance of a second correlation band as seen in Fig. 2(a). Finally, a misalignment in the focus will also result in a single pixel in one arm sharing correlations with several pixels in the other arm.

While we compressively measured the correlations existing between 4096 signal and 4096 idler SLM pixels, illumination profiles on each SLM suggest that there exist pixels where no correlations are even possible. To determine the actual joint space distribution from which we could measure possible correlations, it is reasonable to consider the 1/*e*^{2} diameter of each Gaussian beam profile and consider the total number of correlations that could exist between those pixels alone. The product of these areas, relative to pixel size, then defines the effective joint space dimensionality. From these enclosed areas, we calculate a joint space probability distribution of 3.23 × 10^{6} dimensions. Notice the measured and recovered marginal distributions presented in Fig. 2. The marginal distribution recovered from the reconstructed joint space distribution is smaller than the 1/*e*^{2} thresholded distribution displayed immediately above. These regions should be identical in the limit of infinite *SNR* and perfect image-plane optical alignment. Again, the difference suggests that either the *SNR* may not have been sufficient to extract all of the existing correlations or that the alignment was imprecise.

Reconstructions in [21] were limited to a maximum resolution of 10^{6} pixels on a desktop computer with 32 GBytes of memory and required several hours to reconstruct **x** before the image quality was sufficient to exit the reconstruction program. However using a laptop with 8 GBytes of memory, we perform reconstructions of a 16.8 × 10^{6} dimensional joint space in under ten minutes with satisfactory results using fast Hadamard transforms with the algorithm presented in Eq. (16). We ran the reconstruction algorithm 10 times per sample point to ensure that the results are consistent. The resulting error bars were four orders of magnitude smaller than the scale allows and are not shown. It should be noted that the reported mutual information values are the result of extracting the largest correlations from the data and are not the result of inferring the mutual information from a best Gaussian fit, assuming the resulting SPDC follows a commonly approximated double-Gaussian wave function [32, 33]. While a Gaussian fit would probably characterize the system better and result in a higher mutual information by reporting correlations in the tails of the Gaussian, it is not indicative of the actual available channel capacity.

## 6. Conclusion

This article describes in detail the methods necessary to efficiently perform high-dimensional compressive Kronecker imaging of a joint system. By randomizing Hadamard matrices and utilizing fast Hadamard transforms, CS is performed in the 16.8 million-dimensional Kronecker space while containing a 3.23 million-dimensional bi-photon probability distribution. The experiment that would require over 50 days to do a raster scan under perfect conditions is instead performed in just under two days and only requires several minutes to reconstruct the data. As an example, we compressively measured 7.21 out of 10.9 bits of mutual information while also demonstrating how to improve the accuracy of these results utilizing information contained within the marginal distributions. We believe these methods will prove to be an invaluable tool in measuring the distribution functions of correlated systems as well as other correlation-based CS implementations.

## Acknowledgments

We would like the thank Gregory A. Howland for the use of his computer code to verify our results and James Schneeloch for his contributions, theoretical discussions, and careful editing to make this a coherent article. This work was sponsored by the Air Force grant AFOSR Grant No. FA9550-13-1-0019.

## References and links

**1. **G. Masada, K. Miyata, A. Politi, T. Hashimoto, J. L. O’Brien, and A. Furusawa, “Continuous-variable entanglement on a chip,” Nat. Photonics **9**, 316–319 (2015). [CrossRef]

**2. **J. L. O’Brien, A. Furusawa, and J. Vučković, “Photonic quantum technologies,” Nat. Photonics **3**, 687–695 (2009). [CrossRef]

**3. **S. L. Braunstein and P. Van Loock, “Quantum information with continuous variables,” Rev. Mod. Phys. **77**, 513 (2005). [CrossRef]

**4. **A. Steane, “Quantum computing,” Rep. Prog. Phys. **61**, 117 (1998). [CrossRef]

**5. **S. D. Huver, C. F. Wildfeuer, and J. P. Dowling, “Entangled fock states for robust quantum optical metrology, imaging, and sensing,” Phys. Rev. A **78**, 063828 (2008). [CrossRef]

**6. **V. Giovannetti, S. Lloyd, and L. Maccone, “Advances in quantum metrology,” Nat. Photonics **5**, 222–229 (2011). [CrossRef]

**7. **M. H. Rubin, “Transverse correlation in optical spontaneous parametric down-conversion,” Phys. Rev. A **54**, 5349 (1996). [CrossRef] [PubMed]

**8. **J. C. Howell, R. S. Bennink, S. J. Bentley, and R. W. Boyd, “Realization of the Einstein-Podolsky-Rosen paradox using momentum-and position-entangled photons from spontaneous parametric down conversion,” Phys. Rev. Lett. **92**, 210403 (2004). [CrossRef]

**9. **T. E. Keller and M. H. Rubin, “Theory of two-photon entanglement for spontaneous parametric down-conversion driven by a narrow pump pulse,” Phys. Rev. A **56**, 1534 (1997). [CrossRef]

**10. **J. Schneeloch and J. C. Howell, “Introduction to the transverse spatial correlations in spontaneous parametric down-conversion through the biphoton birth zone,” http://arxiv.org/abs/1502.06996.

**11. **J. Schneeloch, P. B. Dixon, G. A. Howland, C. J. Broadbent, and J. C. Howell, “Violation of continuous-variable Einstein-Podolsky-Rosen steering with discrete measurements,” Phys. Rev. Lett. **110**, 130407 (2013). [CrossRef] [PubMed]

**12. **J. Schneeloch, S. H. Knarr, G. A. Howland, and J. C. Howell, “Demonstrating continuous variable Einstein-Podolsky-Rosen steering in spite of finite experimental capabilities using Fano steering bounds,” J. Opt. Soc. Am. B **32**(4), A8–A14 (2015). [CrossRef]

**13. **A. I. Lvovsky and M. G. Raymer, “Continuous-variable optical quantum-state tomography,” Rev. Mod. Phys. **81**, 299 (2009). [CrossRef]

**14. **D. Giovannini, J. Romero, J. Leach, A. Dudley, A. Forbes, and M. J. Padgett, “Characterization of high-dimensional entangled systems via mutually unbiased measurements,” Phys. Rev. Lett. **110**, 143601 (2013). [CrossRef] [PubMed]

**15. **E. Bolduc, G. Gariepy, and J. Leach, “Direct measurement of large-scale quantum states,” http://arxiv.org/abs/1506.00851.

**16. **A. Shabani, R. L. Kosut, M. Mohseni, H. Rabitz, M. A. Broome, M. P. Almeida, A. Fedrizzi, and A. G. White, “Efficient measurement of quantum dynamics via compressive sensing,” Phys. Rev. Lett. **106**, 100401 (2011). [CrossRef] [PubMed]

**17. **M. P. Edgar, D. S. Tasca, F. Izdebski, R. E. Warburton, J. Leach, M. Agnew, Gerald S. Buller, Robert W. Boyd, and Miles J. Padgett, “Imaging high-dimensional spatial entanglement with a camera,” Nat. Commun. **3**, 984 (2012). [CrossRef] [PubMed]

**18. **R. Fickler, M. Krenn, R. Lapkiewicz, S. Ramelow, and A. Zeilinger, “Real-time imaging of quantum entanglement,” Sci. Rep. **3**, 1914 (2013). [CrossRef] [PubMed]

**19. **D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory **52**, 1289–1306 (2006). [CrossRef]

**20. **M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. E. Kelly, and R. G. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Signal Process Mag. **25**, 83 (2008). [CrossRef]

**21. **G. A. Howland and J. C. Howell, “Efficient high-dimensional entanglement imaging with a compressive-sensing double-pixel camera,” Phys. Rev. X **3**, 011013 (2013).

**22. **F. Tonolini, S. Chan, M. Agnew, A. Lindsay, and J. Leach, “Reconstructing high-dimensional two-photon entangled states via compressive sensing,” Sci. Rep. **4**, 6542 (2014). [CrossRef] [PubMed]

**23. **Y. Rivenson and A. Stern, “Practical compressive sensing of large images,” in Proceedings of IEEE Conference on Digital Signal Processing (IEEE, 2009), pp. 1–8.

**24. **M. F. Duarte and R. G. Baraniuk, “Kronecker compressive sensing,” IEEE Trans. Image Process. **21**, 494–504 (2012). [CrossRef]

**25. **S. L. Shishkin, “Fast and Robust Compressive Sensing Method Using Mixed Hadamard Sensing Matrix,” IEEE J. Emerging Sel. Top. Circuits Syst. **2**, 353–361 (2012). [CrossRef]

**26. **C. Li, “Compressive Sensing for 3D Data Processing Tasks: Applications, Models and Algorithms,” Ph.D. thesis, Rice University (2011).

**27. **C. Li, W. Yin, and Y. Zhang, “Users Guide for TVAL3: TV Minimization by Augmented Lagrangian and Alternating Direction Algorithms,” (2009).

**28. **E. J. Candès and M. B. Wakin, “An introduction to compressive sampling,” IEEE Signal Process Mag. **25**, 21–30 (2008). [CrossRef]

**29. **R. A. Horn and C. R. Johnson, *Topics in Matrix Analysis* (Cambridge University Press, 1991.) [CrossRef]

**30. **R. K. Yarlagadda and R. R. Yarlagadda, *Hadamard Matrix Analysis and Synthesis* (Kluwer Academic Publishers, 1997). [CrossRef]

**31. **P. B. Dixon, G. A. Howland, J. Schneeloch, and J. C. Howell, “Quantum mutual information capacity for high-dimensional entangled states,” Phys. Rev. Lett. **108**, 143603 (2012). [CrossRef] [PubMed]

**32. **C. K. Law and J. H. Eberly, “Analysis and interpretation of high transverse entanglement in optical parametric down conversion,” Phys. Rev. Lett. **92**, 127903 (2004). [CrossRef] [PubMed]

**33. **M. V. Fedorov, Y. M. Mikhailova, and P. A. Volkov, “Gaussian modelling and Schmidt modes of SPDC biphoton states,” J. Phys. B: At. Mol. Opt. Phys. **42**, 175503 (2009). [CrossRef]

**34. **A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imag. Sci. **2**, 183–202 (2009). [CrossRef]

**35. **I. W. Selesnick, R. G. Baraniuk, and N. C. Kingsbury, “The dual-tree complex wavelet transform,” IEEE Signal Process Mag. **22**, 123–151 (2005). [CrossRef]

**36. **I. Daubechies, “The wavelet transform, time-frequency localization and signal analysis,” IEEE Trans. Inf. Theory **36**, 961–1005 (1990). [CrossRef]

**37. **M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, “Image coding using wavelet transform,” IEEE Trans. Image Process. **1**, 205–220 (1992). [CrossRef] [PubMed]

**38. **D. L. Donoho and J. M. Johnstone, “Ideal spatial adaptation by wavelet shrinkage,” Biometrika **81**, 4255 (1994). [CrossRef]

**39. **D.L. Donoho, “De-noising by soft-thresholding,” IEEE Trans. Inf. Theory **41**, 613–627 (1995). [CrossRef]