Information theoretic analysis of hyperspectral imaging systems with applications to fluorescence microscopy

Sripad Ram

doi:10.1364/BOE.10.003380

1. Introduction

Hyperspectral imaging represents a broad class of techniques that capture spectral and spatial data from the object of interest. Fluorescence microscopy is a powerful tool to study microscopic objects such as biological cells at high spatial and temporal resolution. Fluorescence imaging supports the visualization of multiple targets within the object of interest, which is facilitated by labeling the targets with fluorescent labels that have distinct excitation and emission spectra (Fig. 1(A)). Consequently, the need for hyperspectral imaging arises in such applications when the emission spectra of the fluorescent labels significantly overlap (Fig. 1(B)). There exist a variety of approaches to carry out hyperspectral imaging of a fluorescently labeled microscopic sample. For instance, hyperspectral imaging can be carried out either with a confocal [1–3] or a linescanning [4] microscope that has spectral detection capability. Alternately, it can also be carried out on a widefield microscope by using either narrowband excitation and emission filters [5–8] or by using an electronically controlled liquid crystal tunable filter [9, 10]. Common to all these techniques is the underlying hyperspectral data which consists of a sequence of 2D images acquired at different spectral windows. Thus given the hyperspectral data, the goal is then to estimate the relative abundance (typically represented in photon counts) of the different labels at each pixel, and this is referred to as the spectral unmixing problem (Fig. 1(C)).

Fig. 1 Hyperspectral imaging in fluorescence microscopy. Panel A shows a typical configuration of a fluorescence microscope for a multicolor imaging application. The sample is selectively illuminated at distinct excitation passbands (excitation filter wheel) and the fluorescence signal is then collected at matched emission passbands (emission filter wheel) on an imaging detector. Panel B shows the emission spectra of fluorescent labels exhibiting significant spectral overlap. Panel C illustrates the spectral unmixing problem where given a hyperspectral data set, i.e. the input image cube, which consists of $N_{ch}$ spectral images each with $N_{p}$ pixels, the goal is to obtain an estimate of the output image cube that represents the relative abundance $θ_{k}$ of $N_{s}$ different fluorescent labels at each pixel in the sample. Here $I_{θ, k}$ is a random vector with probability density $p_{θ, k}$ that is a function of $ν_{θ, k}$, which describes the signal at the $k^{th}$ pixel block in the input image cube (see Section 2.2 for details).


A common approach to solving the spectral unmixing problem assumes that the number of labels present in the sample is known. Spectral unmixing approaches that make use of this strategy include the least squares estimator [11–13], the maximum likelihood estimator [14, 15], and phasor based approaches [3, 16]. In fluorescence imaging applications, we typically assume a linear mixing model, i.e., $ν_{θ, k} = A θ_{k}$, where $ν_{θ, k}$, which describes the signal detected at the $k^{th}$ pixel in the input image cube (see Fig. 1(C)), is a linear combination of the relative abundance $θ_{k}$ of the different fluorescent labels present at that pixel in the sample. Here $A$ denotes the mixing matrix that specifies the relative contribution of each fluorescent label in every spectral channel. The mixing matrix depends on the optical configuration of the hyperspectral microscope and the spectral properties of the fluorescent dye. The mixing matrix can be computed either theoretically or experimentally by performing a calibration experiment with single color control samples.

A fundamental question in hyperspectral imaging applications concerns the performance limit, i.e., the best possible accuracy with which the relative abundance of the different labels can be determined. Knowledge of the performance limit is important as it provides a metric to design and optimize a hyperspectral imaging system. Moreover, it also acts as a benchmark to compare the performance of different spectral unmixing algorithms for a given hyperspectral dataset. In this paper, we present results to compute the performance limits of a hyperspectral imaging system. We adopt a stochastic framework to model the acquired data and derive the Fisher information matrix for the spectral unmixing problem. The Fisher information matrix plays a central role in the theory concerning parameter estimation problems. Through the Cramer-Rao inequality [17], the inverse Fisher information matrix provides a lower bound to the variance of any unbiased estimator $\hat{θ}$ of an unknown parameter θ. This means that the inverse Fisher information matrix provides the best possible accuracy with which the unknown parameter can be estimated by an unbiased estimator. Here, for hyperspectral imaging we consider a linear mixing model and define a linear unmixing performance (LUP) bound in terms of the inverse Fisher information matrix for the underlying spectral estimation problem. By definition, the LUP bound provides the best possible accuracy with which the relative abundance of the different labels can be estimated from the input image cube.

In the past, several groups have investigated the spectral unmixing problem. In [18, 19], the authors calculated the Fisher information matrix for a deterministic data model that is corrupted by white noise (Gaussian data model). In [15, 20], the authors derived the Fisher information matrix for a data model that is corrupted by shot noise (Poisson data model). The present manuscript provides a broad stochastic framework that is applicable to several different data models that are typically used to describe data from an imaging detector. Specifically, in addition to the above data models our results are also applicable for the Poisson + Gaussian data model and for the stochastic Poisson + Gaussian data model which are typically used to describe imaging data acquired from sCMOS/CCD cameras and EMCCD cameras, respectively.

The paper is organized as follows. In Section 2 we present the problem formulation, introduce the different data models, and derive the general expression of the Fisher information matrix for the spectral unmixing problem. We then investigate conditions under which the Fisher information matrix is block diagonal, as block diagonality has several implications. Next, we introduce the linear mixing model and derive the Fisher information matrix for the same. We also derive analytical expressions of the Fisher information matrix that pertain to the different data models. Further, we derive results to address questions regarding the design of hyperspectral imaging systems, such as the addition or splitting of spectral channels and its impact on spectral unmixing.

In Section 4, we illustrate the results derived in Section 2 by considering different hyperspectral imaging configurations and fluorescent label pairs. We show that for a pair of regular fluorescent labels there exists a spectral resolution limit, which predicts how spectrally close the emission spectra of the two labels can be and still be accurately spectrally resolved. We also show that by exploiting the phenomenon of anti-Stokes shift fluorescence, the spectral resolution limit can be overcome and that the emission spectra of the fluorescent labels can be arbitrarily close to each other. We also illustrate how photon statistics and the addition or splitting of spectral channels can affect the performance bound for different data models. In Section 5, we compare the performance of two spectral unmixing algorithms, namely the least squares estimator and the maximum likelihood estimator, on hyperspectral data. Our results show that for the imaging conditions tested here, both estimators are unbiased and their performance comes close to the theoretical performance bound, with the maximum likelihood estimator being consistently closer to the performance bound than the least squares estimator.

2. Theory

2.1. Problem formulation

We consider a generic model of a hyperspectral imaging system wherein the sample is illuminated by a light source and the light that is transmitted or emitted by the sample is collected into spectral channels and recorded as a sequence of 2D spectral images, which we refer to as the input image cube. Without loss of generality, we assume that the size of every spectral image is the same. Let N_p denote the total number of pixels in a spectral image and N_ch denote the number of spectral images. We define a pixel block as a sequence of N_ch pixels with the same pixel index, say k for $k = 1, \dots, N_{p}$ , across N_ch different spectral images. We assume that the object of interest contains N_s distinct but spectrally overlapping labels. Due to the presence of multiple labels, the signal at the $k^{t h}$ pixel block can be described as a superposition of the relative abundance or photons detected from the individual labels that are present at that pixel in the sample. The spectral unmixing problem can then be stated as follows (Fig. 1(C)) : given an input image cube which is a $N_{p} \times N_{c h}$ dimensional dataset, the goal is to estimate an output (or spectrally unmixed) image cube θ which is a $N_{p} \times N_{s}$ dimensional dataset that contains the photon counts of N_s labels at each pixel in the sample.

2.2. Image formation model

Let $Θ \subseteq ℝ^{N_{p} \times N_{s}}$ be the parameter space and $θ \in Θ$ be the unknown parameter vector. For $k = 1, \dots, N_{p}$ , $j = 1, \dots, N_{c h}$ and $C \subseteq ℝ$ , let $P_{θ}$ denote a family of probability densities $p_{θ, k}^{j}$ on $C$ that is parameterized by θ. The signal detected in the input image cube is modeled as a sequence of random vectors given by

$$I_{θ} = \{I_{θ, 1}, I_{θ, 2}, \dots, I_{θ, N_{p}}\}, \quad θ \in Θ, \tag{1}$$

where $I_{θ, k} := \{I_{θ, k}^{j};\ j = 1, \dots, N_{ch}\}$ for $k = 1, \dots, N_{p}$, and $I_{θ, k}^{j}$ denotes an independent random variable with probability density $p_{θ, k}^{j}$ that models the detected photons at the $k^{th}$ pixel in the $j^{th}$ spectral image. Here we assume that for $θ \in Θ$, $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$,

A1 $p_{θ, k}^{j}$ satisfies the following regularity conditions [17]

$\frac{\partial p_{θ, k}^{j} (r)}{\partial θ}$ exists for $r \in C$ and $θ \in Θ$ ,
$\int_{C} | \frac{\partial p_{θ, k}^{j} (r)}{\partial θ} | d r < \infty$ for $θ \in Θ$ , and
the integral $\int_{C} \frac{1}{p_{θ, k}^{j} (r)} \frac{\partial p_{θ, k}^{j} (r)}{\partial θ_{k}} \frac{\partial p_{θ, k}^{j} (r)}{\partial θ_{m}} d r$ exists and is finite for $k, m = 1, \dots, N_{p}$ and $θ \in Θ$ .

A2 $p_{θ, k}^{j}$ depends on θ through a non-negative function $ν_{θ, k}^{j}$ that is differentiable with respect to θ, i.e., $p_{θ, k}^{j} \equiv p_{ν_{θ, k}^{j}, k}^{j}$ .

As we will see, the term $ν_{θ, k}^{j}$ will typically describe the expected number of detected photons at the $k^{t h}$ pixel in the $j^{t h}$ spectral image for $θ \in Θ$ , $j = 1, \dots, N_{c h}$ and $k = 1, \dots, N_{p}$ . We note that $ν_{θ, k}^{j}$ in turn can be expressed as a function of θ which we wish to estimate.

We note that in our image formation model, (Eq. (1)), we have assumed that the photons detected at each pixel in the image are mutually independent of one another. This assumption is typically used in statistical modeling of imaging data acquired with point/imaging detectors [21–23] including fluorescence microscopy data [24].

2.3. Specific data models

In this section we consider specific data models that are used to describe the imaging data under different experimental conditions. We also discuss how the assumptions made in Section 2.2 regarding the probability density function $p_{θ, k}^{j}$ are applicable to these data models and thereby cover a wide variety of imaging conditions.

Gaussian data model: Here we consider the case where the photons detected at the $k^{th}$ pixel in the $j^{th}$ spectral image are modeled as a deterministic signal $ν_{θ, k}^{j}$ that is corrupted by additive Gaussian noise. Hence $I_{θ, k}^{j}$ is given by $I_{θ, k}^{j} = ν_{θ, k}^{j} + W_{k}^{j}$, where $W_{k}^{j}$ is an independent Gaussian random variable with mean $η_{k}^{j}$ and standard deviation $σ_{k}^{j}$ for $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$. Then the probability density function of $I_{θ, k}^{j}$ is given by

$$p_{θ, k}^{j}(z) = \frac{1}{\sqrt{2π}\, σ_{k}^{j}} \exp\left(-\frac{\left(z - (ν_{θ, k}^{j} + η_{k}^{j})\right)^{2}}{2 (σ_{k}^{j})^{2}}\right), \quad z \in ℝ, \tag{2}$$

for $θ \in Θ$, $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$. For the above equation, it is straightforward to verify that assumptions A1 and A2 are satisfied. This data model can be used to describe the photon-unlimited imaging scenario where the detected photons can be modeled as a deterministic signal.

Poisson data model: Here we assume that the detected photons at each pixel are Poisson distributed. Thus we have $I_{θ, k}^{j} = S_{θ, k}^{j}$ , where $S_{θ, k}^{j}$ is an independent Poisson random variable with mean $ν_{θ, k}^{j}$ for $θ \in Θ$ , $j = 1, \dots, N_{c h}$ and $k = 1, \dots, N_{p}$ . The probability density function of $I_{θ, k}^{j}$ is given by

$$p_{θ, k}^{j}(z) = \frac{e^{-ν_{θ, k}^{j}} (ν_{θ, k}^{j})^{z}}{z!}, \quad z = 0, 1, 2, \dots, \tag{3}$$

for $θ \in Θ$, $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$. It immediately follows that assumptions A1 and A2 are satisfied. For instance, this specific data model would be applicable for imaging systems such as a confocal or a multiphoton microscope, where the main source of randomness in the data is attributed to shot noise statistics [24].

Poisson + Gaussian data model: Here we consider $I_{θ, k}^{j}$ to be given by $I_{θ, k}^{j} = S_{θ, k}^{j} + W_{k}^{j}$ , for $θ \in Θ$ , $j = 1, \dots, N_{c h}$ , and $k = 1, \dots, N_{p}$ . Here, $S_{θ, k}^{j}$ is a Poisson random variable with mean $ν_{θ, k}^{j}$ that models the detected photons at the $k^{t h}$ pixel in the $j^{t h}$ spectral image, and $W_{k}^{j}$ is an independent Gaussian random variable with mean $η_{k}^{j}$ and standard deviation $σ_{k}^{j}$ that models the measurement noise at the $k^{t h}$ pixel in the $j^{t h}$ spectral image, for $θ \in Θ$ , $j = 1, \dots, N_{c h}$ and $k = 1, \dots, N_{p}$ . Here we assume that $W_{k}^{j}$ is independent of θ for $θ \in Θ$ , $j = 1, \dots, N_{c h}$ and $k = 1, \dots, N_{p}$ . Then the probability density function of $I_{θ, k}^{j}$ is given by [25]

$$p_{θ, k}^{j}(z) = \frac{1}{\sqrt{2π}\, σ_{k}^{j}} \sum_{l = 0}^{\infty} \frac{e^{-ν_{θ, k}^{j}} (ν_{θ, k}^{j})^{l}}{l!}\, e^{-\frac{1}{2} \left(\frac{z - l - η_{k}^{j}}{σ_{k}^{j}}\right)^{2}}, \tag{4}$$

for $z \in ℝ$, $θ \in Θ$, $k = 1, \dots, N_{p}$ and $j = 1, \dots, N_{ch}$. From the above expression it immediately follows that assumption A2 is satisfied. Moreover, it can also be shown that $p_{θ, k}^{j}$ satisfies the regularity conditions [26], thereby satisfying assumption A1. This data model is applicable to a fluorescence microscope configuration in which the images are acquired with either a CCD or a sCMOS camera. Here, in addition to the shot noise statistics, the detected signal is also corrupted by the measurement noise of the detector.
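The three data models introduced so far can be compared with a short simulation. The following is a minimal sketch, not taken from the paper: the expected counts `nu`, the noise mean `eta` and the noise standard deviation `sigma` are hypothetical values chosen only for illustration. It draws many realizations of one pixel block under each model and computes the sample statistics, which should approach $ν + η$ and $σ^{2}$ (Gaussian), $ν$ and $ν$ (Poisson), and $ν + η$ and $ν + σ^{2}$ (Poisson + Gaussian).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expected photon counts nu^j for one pixel block (N_ch = 4).
nu = np.array([50.0, 120.0, 80.0, 20.0])
eta = 100.0   # assumed mean of the Gaussian noise (detector offset)
sigma = 6.0   # assumed standard deviation of the Gaussian noise
n_rep = 20000  # number of simulated realizations

def sample_gaussian(nu, eta, sigma, n, rng):
    """Gaussian model: deterministic signal nu plus Gaussian noise W."""
    return nu + rng.normal(eta, sigma, size=(n, nu.size))

def sample_poisson(nu, n, rng):
    """Poisson model: shot noise only, S ~ Poisson(nu)."""
    return rng.poisson(nu, size=(n, nu.size)).astype(float)

def sample_poisson_gaussian(nu, eta, sigma, n, rng):
    """Poisson + Gaussian model: S ~ Poisson(nu) plus independent W."""
    return sample_poisson(nu, n, rng) + rng.normal(eta, sigma, size=(n, nu.size))

g = sample_gaussian(nu, eta, sigma, n_rep, rng)
p = sample_poisson(nu, n_rep, rng)
pg = sample_poisson_gaussian(nu, eta, sigma, n_rep, rng)

for name, data in (("Gaussian", g), ("Poisson", p), ("Poisson+Gaussian", pg)):
    print(name, "mean:", data.mean(axis=0).round(1), "var:", data.var(axis=0).round(1))
```

Only in the Gaussian model is the variance independent of the signal level; in the other two models it grows with $ν_{θ, k}^{j}$.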

Stochastic-Poisson + Gaussian data model: For completeness, we also consider another data model where, in addition to shot noise statistics and measurement noise of the detector, the model takes into account stochastic signal amplification, which, for example, occurs in an electron multiplying CCD camera. Specifically, the detected photon count at each pixel, which is Poisson distributed with mean $ν_{θ, k}^{j}$, is amplified by a random function $M$ that is independent of θ, and the amplified signal is further corrupted by additive Gaussian noise [27], for $θ \in Θ$, $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$. $M$ is typically modeled as a branching process [28] in which the initial particle count is Poisson distributed (with mean $ν_{θ, k}^{j}$) and the individual offspring count follows a zero modified geometric distribution. For the above data model, the probability density function can be written as (see [27] for details)

$$p_{θ, k}^{j}(z) = \frac{e^{-ν_{θ, k}^{j} \frac{A}{B}}}{\sqrt{2π}\, σ_{k}^{j}} \left[ e^{-\left(\frac{z - η_{k}^{j}}{\sqrt{2}\, σ_{k}^{j}}\right)^{2}} + \sum_{l = 1}^{\infty} e^{-\left(\frac{z - l - η_{k}^{j}}{\sqrt{2}\, σ_{k}^{j}}\right)^{2}} \sum_{h = 0}^{l - 1} \frac{\binom{l - 1}{h} C^{l - 1 - h} (D ν_{θ, k}^{j})^{h + 1}}{(h + 1)!\, B^{h + l + 1}} \right], \quad z \in ℝ,$$

where $A$, $B$, $C$ and $D$ are constants that are independent of θ for $θ \in Θ$, $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$. From the above equation we see that assumption A2 is satisfied. Further, it can be shown that the above equation satisfies assumption A1 [27].

2.4. General expression for the Fisher information matrix

In this section, we derive a general expression of the Fisher information matrix for the output image cube θ using the image formation model described in Section 2.2. We also investigate conditions under which the Fisher information matrix $I (θ)$ is block diagonal, as block diagonality of $I (θ)$ has several implications. In the present context, since θ is a vector with $N_{p} \times N_{s}$ elements, the resulting Fisher information matrix will be a $(N_{p} \times N_{s}) \times (N_{p} \times N_{s})$ matrix, which can be very large. Thus a block diagonal $I (θ)$ makes its calculation more tractable. In general, for an n-dimensional parameter vector $θ = (θ_{1}, \dots, θ_{n}) \in Θ$, if $I (θ)$ is diagonal, then the limit of the accuracy of $θ_{i}$ is independent of the other components of θ.

Here, the parameter vector is given by

$$θ = (\underbrace{θ_{1, 1}, θ_{1, 2}, \dots, θ_{1, N_{s}}}_{θ_{1}^{T}}, \underbrace{θ_{2, 1}, θ_{2, 2}, \dots, θ_{2, N_{s}}}_{θ_{2}^{T}}, \dots, \underbrace{θ_{N_{p}, 1}, θ_{N_{p}, 2}, \dots, θ_{N_{p}, N_{s}}}_{θ_{N_{p}}^{T}}),$$

where $θ_{k} = (θ_{k, 1}, θ_{k, 2}, \dots, θ_{k, N_{s}})^{T}$ denotes the unknown photon counts of the $N_{s}$ different labels at the $k^{th}$ pixel in the output image cube for $k = 1, \dots, N_{p}$. Further we assume that for $j = 1, \dots, N_{ch}$, $k \neq m$ and $k, m = 1, \dots, N_{p}$, $\frac{\partial ν_{θ, k}^{j}}{\partial θ_{m}} = 0$, $θ \in Θ$, where $0$ denotes a $1 \times N_{s}$ row vector with all elements equal to zero. This relies on the prior assumption of spatial independence between the detected photons in different pixels (Section 2.2). Its relevance will become evident in the next section where we consider an explicit relationship between $ν_{θ, k}$ and the components of $θ_{k}$ for the linear mixing model. As we will see, the above assumption results in the Fisher information matrix being block diagonal, where the diagonal entries pertain to the Fisher information matrix of $θ_{k}$ for $k = 1, \dots, N_{p}$.

In the following theorem, we state two results. The first result is a general expression for the Fisher information matrix pertaining to the input image cube $I_{θ}$ , which is analogous to a previously published result [27, 29]. The second result investigates a condition for block diagonality of the Fisher information matrix.

Theorem 2.1. Let $Θ \subseteq ℝ^{N_{p} \times N_{s}}$ denote the parameter space. For $θ \in Θ$, let $I_{θ} = \{I_{θ, k}^{j} \mid k = 1, \dots, N_{p},\ j = 1, \dots, N_{ch}\}$ denote the input image cube that is defined in Eq. (1). Assume that conditions A1–A2 are satisfied (see Section 2.2).

1. For $θ \in Θ$ , the Fisher information matrix for the output image cube θ is given by

$$I(θ) = \sum_{k = 1}^{N_{p}} \sum_{j = 1}^{N_{ch}} α_{θ, k}^{j} \left(\frac{\partial ν_{θ, k}^{j}}{\partial θ}\right)^{T} \frac{\partial ν_{θ, k}^{j}}{\partial θ}, \quad θ \in Θ,$$

where for $θ \in Θ$, $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$,

$$α_{θ, k}^{j} := E\left[\left(\frac{\partial \ln\left(p_{θ, k}^{j}(z_{k}^{j})\right)}{\partial ν_{θ, k}^{j}}\right)^{2}\right] \tag{6}$$

and $z_{k}^{j}$ denotes a realization of $I_{θ, k}^{j}$.

2. Assume that for $k \neq m$ , $k, m = 1, \dots, N_{p}$ and $j = 1, \dots, N_{c h}$ , $\frac{\partial ν_{θ, k}^{j}}{\partial θ_{m}} = 0$ , $θ \in Θ$ . Then the Fisher information matrix given in result 1 of this Theorem can be written as

$$I(θ) = \mathrm{Diag}\left[I_{1}(θ), I_{2}(θ), \dots, I_{N_{p}}(θ)\right], \quad θ \in Θ,$$

where for $θ \in Θ$ and $k = 1, \dots, N_{p}$,

$$I_{k}(θ) := \sum_{j = 1}^{N_{ch}} α_{θ, k}^{j} \left(\frac{\partial ν_{θ, k}^{j}}{\partial θ_{k}}\right)^{T} \frac{\partial ν_{θ, k}^{j}}{\partial θ_{k}}. \tag{7}$$

Proof:

1. See [27, 29] for the proof.

2. Substituting $\frac{\partial ν_{θ, k}^{j}}{\partial θ_{m}} = 0$ for $k \neq m$ and $θ \in Θ$ in result 1 of this Theorem and simplifying, the result follows.

•

From result 2 of the above theorem we see that the block diagonal representation reduces the computational complexity to calculate the Fisher information matrix. Specifically, for $k = 1, \dots, N_{p}$ the $k^{t h}$ block matrix $I_{k} (θ)$ is a $N_{s} \times N_{s}$ square matrix, which is relatively straightforward to compute.
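The computational benefit of the block diagonal form can be illustrated numerically. The following is a minimal sketch under stated assumptions: the per-pixel weights `alphas` (playing the role of $α_{θ, k}^{j}$) and the Jacobians `jacs` (row $j$ holding $\partial ν_{θ, k}^{j} / \partial θ_{k}$) are random placeholder values, and the helper names `fim_block` and `block_diag` are hypothetical. It assembles the full matrix from the blocks $I_{k}(θ)$ of Eq. (7) and checks that inverting each $N_{s} \times N_{s}$ block separately gives the same diagonal of the inverse as inverting the full matrix.

```python
import numpy as np

def fim_block(alpha_k, jac_k):
    """I_k = sum_j alpha^j (d nu^j / d theta_k)^T (d nu^j / d theta_k).

    alpha_k: shape (N_ch,); jac_k: shape (N_ch, N_s), row j is d nu^j / d theta_k.
    """
    return jac_k.T @ (alpha_k[:, None] * jac_k)

def block_diag(blocks):
    """Place square blocks along the diagonal of an otherwise zero matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        m = b.shape[0]
        out[i:i + m, i:i + m] = b
        i += m
    return out

rng = np.random.default_rng(1)
N_p, N_ch, N_s = 5, 4, 2
alphas = rng.uniform(0.01, 0.1, size=(N_p, N_ch))   # placeholder alpha values
jacs = rng.uniform(0.0, 1.0, size=(N_p, N_ch, N_s))  # placeholder Jacobians

blocks = [fim_block(alphas[k], jacs[k]) for k in range(N_p)]
full = block_diag(blocks)                             # (N_p*N_s) x (N_p*N_s)

# Diagonal of the inverse, computed from the full matrix and block by block.
crlb_full = np.diag(np.linalg.inv(full))
crlb_blocks = np.concatenate([np.diag(np.linalg.inv(b)) for b in blocks])
print(np.allclose(crlb_full, crlb_blocks))
```

For a realistic image with many pixels, only the block-by-block route is practical, since the full matrix has $(N_{p} N_{s})^{2}$ entries.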

2.5. Non negativity constraint on θ

The results of the above theorem pertain to the unconstrained Fisher information matrix where the parameter space Θ is a subset of $ℝ^{N_{p} \times N_{s}}$. This implies that the unknown parameter vector $θ \in Θ$ can take positive and negative values. In many spectral imaging applications, including fluorescence microscopy, the unknown parameter vector is constrained to take non-negative values. We note that the results of Theorem 2.1 will also hold for a parameter estimation problem with an inequality constraint on θ of the form $G_{θ} \geq 0$, $θ \in Θ$, where $G_{θ} : ℝ^{N_{p} \times N_{s}} \to ℝ^{N_{p} \times N_{s}}$ is a vector valued function that is continuously differentiable with respect to θ for $θ \in Θ$ (see [30]).

2.6. Linear mixing model and the mixing matrix

In the previous theorem we derived a general expression of the Fisher information matrix for the output image cube θ pertaining to a general hyperspectral imaging system. We next consider the case when the non-negative function $ν_{θ, k}^{j}$ is a linear superposition of the components of the unknown parameter vector θ for $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$. For $θ \in Θ$, define $ν_{θ, k} := [ν_{θ, k}^{1}, ν_{θ, k}^{2}, \dots, ν_{θ, k}^{N_{ch}}]^{T}$, $k = 1, \dots, N_{p}$. Then for the linear mixing model we have

$$ν_{θ, k} := A θ_{k}, \quad k = 1, \dots, N_{p}, \quad θ \in Θ, \tag{8}$$

where $A$ is an $N_{ch} \times N_{s}$ dimensional matrix with elements $\{a_{ij};\ i = 1, \dots, N_{ch};\ j = 1, \dots, N_{s}\}$ known as the mixing matrix and $θ_{k} = [θ_{k, 1}, θ_{k, 2}, \dots, θ_{k, N_{s}}]^{T}$, $k = 1, \dots, N_{p}$. The columns of $A$ pertain to the different labels, the rows of $A$ pertain to the different spectral channels, and the element $a_{ij}$ denotes the contribution of the $j^{th}$ label in the $i^{th}$ channel. As we will see, the mixing matrix will play an important role in governing the behavior of the Fisher information matrix for the linear mixing model. We note that the linear mixing model is used to model hyperspectral data in many applications including fluorescence microscopy.
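As a concrete illustration of the linear mixing model, the sketch below applies a mixing matrix to a small output image cube. The matrix `A` and the abundances `theta` are hypothetical values chosen for illustration, with each column of `A` summing to one so that all photons of a label are collected across the channels. In the absence of noise, an ordinary least squares solve recovers the abundances exactly.

```python
import numpy as np

# Hypothetical mixing matrix: rows are channels (N_ch = 3), columns are labels (N_s = 2).
A = np.array([[0.6, 0.1],
              [0.3, 0.3],
              [0.1, 0.6]])

# Hypothetical output image cube: N_p = 4 pixels, photon counts of N_s = 2 labels.
theta = np.array([[1000.0,    0.0],
                  [   0.0, 1000.0],
                  [ 500.0,  500.0],
                  [ 200.0,  800.0]])

# Row k of nu is nu_{theta,k} = A theta_k, so the input cube has shape (N_p, N_ch).
nu = theta @ A.T
print(nu)

# Noise-free unmixing by least squares (one solve for all pixels at once).
theta_hat = np.linalg.lstsq(A, nu.T, rcond=None)[0].T
print(np.allclose(theta_hat, theta))
```

With noisy data the least squares solution is only an estimate, and the Fisher information matrix derived next quantifies how accurate any unbiased estimate can be.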

In the following Theorem we derive the Fisher information matrix for the linear mixing model. We also consider a special case where $ν_{θ, k} = θ_{k}$ for $θ \in Θ$ , $k = 1, \dots, N_{p}$ and $N_{c h} = N_{s}$ . This special case pertains to an ideal imaging configuration where the signal from a given label can be collected without being corrupted by signal from other labels at every pixel block in the input image cube (see Section 2.7).

Theorem 2.2. For $θ \in Θ$ , let $I (θ)$ denote the Fisher information matrix of the output image cube θ that is given in Theorem 2.1.

1. For $θ \in Θ$ and $k = 1, \dots, N_{p}$ let $ν_{θ, k}$ be given by Eq. (8). Then the Fisher information matrix of the output image cube θ for the linear mixing model is given by

$$I(θ) = \mathrm{Diag}\left[A^{T} G_{1}(θ) A,\ A^{T} G_{2}(θ) A,\ \dots,\ A^{T} G_{N_{p}}(θ) A\right], \quad θ \in Θ,$$

where $A$ denotes the mixing matrix,

$$G_{k}(θ) := \mathrm{Diag}\left(α_{θ, k}^{1}, α_{θ, k}^{2}, \dots, α_{θ, k}^{N_{ch}}\right), \tag{9}$$

and $α_{θ, k}^{j}$ is given by Eq. (6) for $j = 1, \dots, N_{ch}$, $k = 1, \dots, N_{p}$ and $θ \in Θ$.

2. For $θ \in Θ$ and $k = 1, \dots, N_{p}$ , let $ν_{θ, k} = θ_{k}$ . Then the Fisher information matrix for the output image cube θ is given by

$$I_{bc}(θ) = \mathrm{Diag}\left[G_{1}(θ), G_{2}(θ), \dots, G_{N_{p}}(θ)\right], \quad θ \in Θ,$$

where $G_{k}(θ)$ is given by Eq. (9) for $θ \in Θ$ and $k = 1, \dots, N_{p}$.

Proof:

1. By definition of $ν_{θ, k}$, we have for $θ \in Θ$, $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$,

$$ν_{θ, k}^{j} = \sum_{l = 1}^{N_{s}} a_{jl} θ_{k, l} = a_{j1} θ_{k, 1} + a_{j2} θ_{k, 2} + \dots + a_{jN_{s}} θ_{k, N_{s}}. \tag{10}$$

From the above equation, we have for $k \neq m$ and $k, m = 1, \dots, N_{p}$,

$$\frac{\partial ν_{θ, k}^{j}}{\partial θ_{m}} = \left[\frac{\partial ν_{θ, k}^{j}}{\partial θ_{m, 1}}\ \ \frac{\partial ν_{θ, k}^{j}}{\partial θ_{m, 2}}\ \ \dots\ \ \frac{\partial ν_{θ, k}^{j}}{\partial θ_{m, N_{s}}}\right] = 0, \quad θ \in Θ,$$

where $j = 1, \dots, N_{ch}$, and $0$ denotes a $1 \times N_{s}$ vector with all elements equal to zero. Hence the Fisher information matrix will be given by result 2 of Theorem 2.1. To prove the desired result, it is sufficient to show that $I_{k}(θ) = A^{T} G_{k}(θ) A$, where $I_{k}(θ)$ is defined by Eq. (7). Substituting Eq. (10) in Eq. (7), we have

$$I_{k}(θ) = \sum_{j = 1}^{N_{ch}} α_{θ, k}^{j} \left(\frac{\partial ν_{θ, k}^{j}}{\partial θ_{k}}\right)^{T} \frac{\partial ν_{θ, k}^{j}}{\partial θ_{k}} = \sum_{j = 1}^{N_{ch}} α_{θ, k}^{j} \begin{pmatrix} a_{j1} \\ a_{j2} \\ \vdots \\ a_{jN_{s}} \end{pmatrix} \begin{pmatrix} a_{j1} & a_{j2} & \dots & a_{jN_{s}} \end{pmatrix}$$

$$= \sum_{j = 1}^{N_{ch}} α_{θ, k}^{j} \begin{pmatrix} a_{j1}^{2} & a_{j1} a_{j2} & \dots & a_{j1} a_{jN_{s}} \\ a_{j2} a_{j1} & a_{j2}^{2} & \dots & a_{j2} a_{jN_{s}} \\ \vdots & \vdots & \ddots & \vdots \\ a_{jN_{s}} a_{j1} & a_{jN_{s}} a_{j2} & \dots & a_{jN_{s}}^{2} \end{pmatrix} = A^{T} G_{k}(θ) A, \quad θ \in Θ, \quad k = 1, \dots, N_{p}.$$

2. Substituting $ν_{θ, k} = θ_{k}$ for $θ \in Θ$ and $k = 1, \dots, N_{p}$ in Eq. (7) and simplifying, the result follows.

•

From result 1 of the above theorem we see that the Fisher information matrix for the linear mixing model depends on the mixing matrix $A$ and the term $α_{θ, k}^{j}$, which is given by Eq. (6). Note that the analytical expression of $α_{θ, k}^{j}$ depends on the probability density function of the data model that describes the input image cube. In Corollary 2.1 we derive analytical expressions for three of the data models that we discussed in Section 2.3.
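The key step in the proof of Theorem 2.2, namely that the weighted sum of outer products of the rows of $A$ equals $A^{T} G_{k}(θ) A$, is easy to check numerically. The sketch below uses a random placeholder mixing matrix `A` and random positive weights `alpha` standing in for $α_{θ, k}^{j}$; neither comes from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
N_ch, N_s = 5, 3
A = rng.uniform(0.0, 1.0, size=(N_ch, N_s))   # placeholder mixing matrix
alpha = rng.uniform(0.01, 1.0, size=N_ch)      # placeholder alpha^j for one pixel k

# Sum form: sum_j alpha^j a_j^T a_j, where a_j is the j-th row of A.
I_sum = sum(alpha[j] * np.outer(A[j], A[j]) for j in range(N_ch))

# Matrix form: A^T G_k A with G_k = Diag(alpha^1, ..., alpha^{N_ch}).
I_mat = A.T @ np.diag(alpha) @ A

print(np.allclose(I_sum, I_mat))
```

The matrix form is the convenient one in practice, since it needs only the mixing matrix and one vector of weights per pixel.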

2.7. Best case imaging scenario

From result 2 of Theorem 2.2, the Fisher information matrix $I_{bc}(θ)$ can be considered as a special case of result 1 with the mixing matrix being equal to an $N_{s} \times N_{s}$ identity matrix. As we will see here, this pertains to the best case imaging scenario in that it provides the best possible limit to the accuracy with which photons from a label can be estimated in the output image cube. To illustrate this, consider a simple fluorescence microscope imaging configuration with two spectral channels ($N_{ch} = 2$) and two fluorescent labels ($N_{s} = 2$). Substituting this in Eq. (8), we have for $θ \in Θ$,

$$ν_{θ, k}^{1} = a_{11} θ_{k, 1} + a_{12} θ_{k, 2}, \quad ν_{θ, k}^{2} = a_{21} θ_{k, 1} + a_{22} θ_{k, 2}, \quad k = 1, \dots, N_{p},$$

where $ν_{θ, k}^{1}$ and $ν_{θ, k}^{2}$ denote the expected number of detected photons at the $k^{th}$ pixel in the input image cube pertaining to spectral channels 1 and 2, respectively. A logical choice in designing the pass bands of the spectral channels would be to choose a bandwidth that captures the peak of the emission spectra of the fluorescent labels. Then for spectral channel 1, $θ_{k, 1}$ and $θ_{k, 2}$, which denote the expected number of detected photons from labels 1 and 2 at the $k^{th}$ pixel, can be considered as the signal of interest and background, respectively. Similarly, for spectral channel 2, $θ_{k, 2}$ and $θ_{k, 1}$ can be considered as the signal of interest and background, respectively. Hence the spectral unmixing problem becomes equivalent to estimating the signal of interest in each spectral channel in the presence of a background, i.e. the spectral bleed through from the other label(s). Thus the best case scenario is realized when there is no background contribution, i.e. $a_{12} = a_{21} = 0$ (no spectral bleed through), and all of the signal from the fluorescent labels is captured in their respective spectral channels, i.e. $a_{11} = a_{22} = 1$. This would reduce the mixing matrix to an identity matrix. We note that a hyperspectral imaging system in which $A$ is the identity matrix may not be practically realizable, for example, if there is significant spectral overlap between the fluorescent labels. Nevertheless, this provides an important benchmark to assess the effect of spectral overlap on spectral unmixing accuracy.
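The two-channel, two-label example can be made quantitative under the Poisson data model, for which $α_{θ, k}^{j} = 1 / ν_{θ, k}^{j}$. The following is a minimal sketch: the photon counts `theta_k` and the bleed-through values in `A_mixed` are hypothetical, and the helper name `poisson_crlb` is introduced here only for illustration. It compares the diagonal of the inverse Fisher information matrix for the identity mixing matrix with that for a mixing matrix with 20% bleed through.

```python
import numpy as np

def poisson_crlb(A, theta_k):
    """Diagonal of the inverse of A^T G A with G = Diag(1 / nu), nu = A theta_k."""
    nu = A @ theta_k
    fim = A.T @ np.diag(1.0 / nu) @ A
    return np.diag(np.linalg.inv(fim))

theta_k = np.array([1000.0, 1000.0])   # assumed photon counts of the two labels

A_best = np.eye(2)                     # no bleed through: a12 = a21 = 0
A_mixed = np.array([[0.8, 0.2],        # assumed 20% bleed through
                    [0.2, 0.8]])

crlb_best = poisson_crlb(A_best, theta_k)
crlb_mixed = poisson_crlb(A_mixed, theta_k)

print("best case  sqrt(CRLB):", np.sqrt(crlb_best))
print("with bleed sqrt(CRLB):", np.sqrt(crlb_mixed))
```

In the best case the bound on the variance equals the photon count itself, the familiar shot noise limit. In this particular example the bleed through raises the variance bound by a factor of $0.68 / 0.36 \approx 1.89$.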

Corollary 2.1. For $θ \in Θ$ , let $I (θ)$ denote the Fisher information matrix given by result 1 of Theorem 2.2, and for $k = 1, \dots, N_{p}$ let $ν_{θ, k}$ be given by Eq. (8).

1. Gaussian data model. For $θ \in Θ$ , $j = 1, \dots, N_{c h}$ and $k = 1, \dots, N_{p}$ , let $p_{θ, k}^{j}$ be given by Eq. (2). Then for $θ \in Θ$

$$I(θ) = \mathrm{Diag}\left[A^{T} G_{1}^{g} A,\ A^{T} G_{2}^{g} A,\ \dots,\ A^{T} G_{N_{p}}^{g} A\right],$$

where for $k = 1, \dots, N_{p}$ and $θ \in Θ$,

$$G_{k}^{g} := \mathrm{Diag}\left(\frac{1}{(σ_{k}^{1})^{2}}, \frac{1}{(σ_{k}^{2})^{2}}, \dots, \frac{1}{(σ_{k}^{N_{ch}})^{2}}\right)$$

and $σ_{k}^{j}$ denotes the standard deviation of the Gaussian noise for $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$.

2. Poisson data model. For $θ \in Θ$ , $j = 1, \dots, N_{c h}$ and $k = 1, \dots, N_{p}$ , let $p_{θ, k}^{j}$ be given by Eq. (3). Then for $θ \in Θ$

$$I(θ) = \mathrm{Diag}\left[A^{T} G_{1}^{p}(θ) A,\ A^{T} G_{2}^{p}(θ) A,\ \dots,\ A^{T} G_{N_{p}}^{p}(θ) A\right],$$

where for $k = 1, \dots, N_{p}$ and $θ \in Θ$,

$$G_{k}^{p}(θ) := \mathrm{Diag}\left(\frac{1}{ν_{θ, k}^{1}}, \frac{1}{ν_{θ, k}^{2}}, \dots, \frac{1}{ν_{θ, k}^{N_{ch}}}\right).$$

3. Poisson + Gaussian data model. For $θ \in Θ$ , $j = 1, \dots, N_{c h}$ and $k = 1, \dots, N_{p}$ , let $p_{θ, k}^{j}$ be given by Eq. (4). Then

$$I(θ) = \mathrm{Diag}\left[A^{T} G_{1}^{pg}(θ) A,\ A^{T} G_{2}^{pg}(θ) A,\ \dots,\ A^{T} G_{N_{p}}^{pg}(θ) A\right], \quad θ \in Θ,$$

where for $θ \in Θ$, $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$,

$$G_{k}^{pg}(θ) := \mathrm{Diag}\left(Γ_{θ, k}^{1}, Γ_{θ, k}^{2}, \dots, Γ_{θ, k}^{N_{ch}}\right),$$

$$Γ_{θ, k}^{j} := \int_{ℝ} \frac{\left(ζ_{θ, k}^{j}(z)\right)^{2}}{p_{θ, k}^{j}(z)}\, dz - 1,$$

$p_{θ, k}^{j}$ is given by Eq. (4) and

$$ζ_{θ, k}^{j}(z) := \sum_{l = 1}^{\infty} \frac{[ν_{θ, k}^{j}]^{l - 1} e^{-ν_{θ, k}^{j}}}{(l - 1)!} \frac{1}{\sqrt{2π}\, σ_{k}^{j}}\, e^{-\frac{1}{2} \left(\frac{z - l - η_{k}^{j}}{σ_{k}^{j}}\right)^{2}}, \quad z \in ℝ.$$

Proof: 1. Substituting for $p_{θ, k}^{j}$ (Eq. (2)) in Eq. (6), we have

$$α_{θ, k}^{j} = E\left[\left(\frac{\partial \ln\left(p_{θ, k}^{j}(z_{k}^{j})\right)}{\partial ν_{θ, k}^{j}}\right)^{2}\right] = \frac{1}{(σ_{k}^{j})^{4}} E\left[(z_{k}^{j})^{2} - 2 z_{k}^{j} (η_{k}^{j} + ν_{θ, k}^{j}) + (η_{k}^{j} + ν_{θ, k}^{j})^{2}\right] = \frac{1}{(σ_{k}^{j})^{2}},$$

where $θ \in Θ$, $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$. In deriving the above result we have made use of the fact that the random variables $I_{θ, k}^{j}$ are mutually independent of each other, $E[(z_{k}^{j})^{2}] = \mathrm{Var}(z_{k}^{j}) + (E[z_{k}^{j}])^{2}$ and $E[z_{k}^{j}] = η_{k}^{j} + ν_{θ, k}^{j}$ for $θ \in Θ$, $k = 1, \dots, N_{p}$ and $j = 1, \dots, N_{ch}$. Substituting the above result in Eq. (9), the result follows.

2. For an independent Poisson random variable with mean $ν_{θ, k}^{j}$ , where $θ \in Θ$ , $j = 1, \dots, N_{c h}$ and $k = 1, \dots, N_{p}$ , we have

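$$α_{θ, k}^{j} = E\left[{\left(\frac{\partial \ln (p_{θ, k}^{j}(z_{k}^{j}))}{\partial ν_{θ, k}^{j}}\right)}^{2}\right] = E\left[{\left(\frac{z_{k}^{j}}{ν_{θ, k}^{j}} - 1\right)}^{2}\right] = \frac{E[(z_{k}^{j})^{2}]}{{(ν_{θ, k}^{j})}^{2}} - 1 = \frac{1}{ν_{θ, k}^{j}},$$

where $θ \in Θ$, $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$. The last step uses the fact that $E[(z_{k}^{j})^{2}] = ν_{θ, k}^{j} + {(ν_{θ, k}^{j})}^{2}$ for a Poisson random variable with mean $ν_{θ, k}^{j}$. Substituting the above expression in Eq. (9), the result follows.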

3. Substituting Eq. (4) in Eq. (6) and simplifying (see [25] for details), we have

$$α_{θ, k}^{j} = Γ_{θ, k}^{j} = \int_{ℝ} \frac{{\left(ζ_{θ, k}^{j}(z)\right)}^{2}}{p_{θ, k}^{j}(z)}\, dz - 1,$$

for $θ \in Θ$, $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$. Substituting the above expression in Eq. (9), the result follows.

•

From the above Corollary we see that the Fisher information matrix for the Gaussian data model is independent of θ and depends only on the mixing matrix A and the variance ${(σ_{k}^{j})}^{2}$ of the Gaussian noise for $θ \in Θ$ , $j = 1, \dots, N_{c h}$ and $k = 1, \dots, N_{p}$ . This is in contrast to the Poisson and the Poisson + Gaussian data models where the Fisher information matrix depends on θ through $ν_{θ, k}^{j}$ that describes the expected number of detected photons at the $k^{t h}$ pixel in the $j^{t h}$ channel for $θ \in Θ$ , $j = 1, \dots, N_{c h}$ and $k = 1, \dots, N_{p}$ . An implication of this result is that if the underlying data model is indeed Gaussian, then the Fisher information matrix of the output image cube is solely governed by the optical configuration of the imaging system and is independent of the photon budget. We note that the above result for the Gaussian data model is consistent with a previously published result [19].

From result 2 of Corollary 2.1, we see that the Fisher information matrix for the Poisson data model shows an inverse dependence on the expected photon count. Consequently, the inverse Fisher information matrix will show a linear dependence on the expected photon count. According to the Cramer-Rao inequality, for any unbiased estimator $\hat{θ}$ of an n-dimensional vector parameter θ, we have $\mathrm{Var}({\hat{θ}}_{i}) \geq {[I^{-1}(θ)]}_{ii}$, where ${[I^{-1}(θ)]}_{ii}$ denotes the Cramer-Rao lower bound of ${\hat{θ}}_{i}$ for $i = 1, \dots, n$. An implication of this result is that the Cramer-Rao lower bound of the photon count estimate for the labels will increase with increasing expected photon count. Since the square root of the Cramer-Rao lower bound provides a lower bound on the standard deviation of the photon count estimate, we will use the ratio of the square root of the Cramer-Rao lower bound of the photon count to the expected photon count as a performance measure for the spectral unmixing problem (see Definition 4.2 in the Results section).
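The dependence on photon count can be made explicit with a short scaling argument. The sketch below assumes the noise-free linear mixing relation $ν_{θ, k} = A θ_{k}$ with no background term, and a common scale factor $c > 0$ applied to all label abundances at the $k^{th}$ pixel. Both the scale factor $c$ and the absence of a background term are simplifying assumptions introduced here for illustration only.

```latex
\begin{aligned}
θ_{k} \to c\,θ_{k} \;&\Rightarrow\; ν_{θ, k}^{j} \to c\,ν_{θ, k}^{j},
  \qquad G_{k}^{p}(θ) \to \tfrac{1}{c}\, G_{k}^{p}(θ), \\
I_{k}(θ) = A^{T} G_{k}^{p}(θ) A \;&\to\; \tfrac{1}{c}\, I_{k}(θ),
  \qquad {[I_{k}^{-1}(θ)]}_{ii} \to c\, {[I_{k}^{-1}(θ)]}_{ii}, \\
\frac{\sqrt{{[I_{k}^{-1}(θ)]}_{ii}}}{θ_{k, i}} \;&\to\;
  \frac{\sqrt{c}\,\sqrt{{[I_{k}^{-1}(θ)]}_{ii}}}{c\,θ_{k, i}}
  = \frac{1}{\sqrt{c}}\,\frac{\sqrt{{[I_{k}^{-1}(θ)]}_{ii}}}{θ_{k, i}}.
\end{aligned}
```

Under these assumptions the Cramer-Rao lower bound grows linearly with $c$, whereas the ratio of its square root to the expected photon count falls as $1/\sqrt{c}$.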

2.8. Channel addition

An important question that arises in the design of hyperspectral imaging systems concerns the number of spectral channels that are required to capture the signal from the label of interest in order to achieve optimal spectral unmixing. Specifically, given an input image cube with N_ch spectral channels, the question arises as to whether adding an extra spectral channel will provide any benefit for spectral unmixing. In the next Corollary we show that for the linear mixing model with N_s labels, the Fisher information matrix for the output image cube pertaining to an $N_{ch} + 1$ channel hyperspectral imaging system is greater than or equal to the Fisher information matrix pertaining to an N_ch channel hyperspectral imaging system, in the sense that their difference is positive semi-definite.

Corollary 2.2. For $θ \in Θ$, let $I(θ | N_{ch})$ denote the Fisher information matrix given by result 1 of Theorem 2.2 pertaining to an input image cube with N_ch spectral channels. Then $I(θ | N_{ch} + 1) - I(θ | N_{ch})$ is positive semi-definite for $θ \in Θ$.

Proof: By definition of $I(θ | N_{ch})$, it is sufficient to show that $I_{k}(θ | N_{ch} + 1) \geq I_{k}(θ | N_{ch})$ for $θ \in Θ$ and $k = 1, \dots, N_{p}$, where $I_{k}(θ | N_{ch}) = A^{T} G_{k}(θ) A$ for $θ \in Θ$, A is an $N_{ch} \times N_{s}$ matrix and $G_{k}(θ)$ is an $N_{ch} \times N_{ch}$ diagonal matrix, for $k = 1, \dots, N_{p}$, that is given by Eq. (9). In the case of an $N_{ch} + 1$ channel hyperspectral imaging system, we have for $θ \in Θ$ and $k = 1, \dots, N_{p}$

$$I_{k}(θ | N_{ch} + 1) = {\tilde{A}}^{T} {\tilde{G}}_{k}(θ) \tilde{A},$$

where $\tilde{A}$ is an $(N_{ch} + 1) \times N_{s}$ matrix and ${\tilde{G}}_{k}(θ)$ is an $(N_{ch} + 1) \times (N_{ch} + 1)$ diagonal matrix, for $k = 1, \dots, N_{p}$. Since the number of labels is the same, we rearrange the mixing matrix $\tilde{A}$ such that the addition of an extra spectral channel results in the addition of a row at the bottom of this matrix. Hence $\tilde{A}$ can be written as

$$\tilde{A} = \begin{pmatrix} A \\ R \end{pmatrix},$$

where R is a $1 \times N_{s}$ vector that pertains to the $(N_{ch} + 1)$th channel. Similarly, the matrix ${\tilde{G}}_{k}(θ)$ can be written as

$${\tilde{G}}_{k}(θ) = \begin{pmatrix} G_{k}(θ) & 0 \\ 0^{T} & α_{θ, k}^{N_{ch} + 1} \end{pmatrix}, \quad θ \in Θ, \; k = 1, \dots, N_{p},$$

where 0 denotes an $N_{ch} \times 1$ zero vector and $α_{θ, k}^{N_{ch} + 1}$ is a scalar term given by Eq. (6) and pertains to the $(N_{ch} + 1)$th diagonal entry of ${\tilde{G}}_{k}(θ)$. Substituting the above expressions of ${\tilde{G}}_{k}(θ)$ and $\tilde{A}$ in Eq. (11), we have

$$\begin{aligned} I_{k}(θ | N_{ch} + 1) &= \begin{pmatrix} A^{T} & R^{T} \end{pmatrix} \begin{pmatrix} G_{k}(θ) & 0 \\ 0^{T} & α_{θ, k}^{N_{ch} + 1} \end{pmatrix} \begin{pmatrix} A \\ R \end{pmatrix} \\ &= \begin{pmatrix} A^{T} & R^{T} \end{pmatrix} \begin{pmatrix} G_{k}(θ) A \\ α_{θ, k}^{N_{ch} + 1} R \end{pmatrix} = I_{k}(θ | N_{ch}) + α_{θ, k}^{N_{ch} + 1} R^{T} R, \end{aligned}$$

where $θ \in Θ$ and $k = 1, \dots, N_{p}$. By definition, $α_{θ, k}^{N_{ch} + 1} \geq 0$ for $θ \in Θ$ and $k = 1, \dots, N_{p}$, and $R^{T} R \geq 0$. From this the result follows.

•

In deriving the above result, we made no specific assumptions about the underlying data model that describes the input image cube. Therefore the above result is applicable to a wide variety of imaging configurations for the linear mixing model. An immediate implication of the above result is that the limit of the accuracy of the photon count for a given label cannot deteriorate, and in general improves, when extra spectral channels are added to detect the signal from that label.
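The identity at the heart of the proof can be checked numerically. The following is a minimal sketch for the Poisson data model, assuming $ν_{θ, k} = A θ_{k}$; the mixing matrix, the extra row `R` and the photon counts are randomly generated, hypothetical values rather than quantities from this study.

```python
import numpy as np

def fisher_block(A, alpha):
    """Per-pixel Fisher information A^T G A with G = Diag(alpha)."""
    return A.T @ np.diag(alpha) @ A

rng = np.random.default_rng(0)
n_ch, n_s = 3, 2
A = rng.uniform(0.1, 1.0, size=(n_ch, n_s))   # hypothetical mixing matrix
theta = np.array([500.0, 500.0])              # hypothetical photon counts
R = rng.uniform(0.1, 1.0, size=(1, n_s))      # row of the extra channel
A_tilde = np.vstack([A, R])                   # extra channel appended at the bottom

# Poisson data model: alpha^j = 1 / nu^j, with nu = A theta (assumed here).
I_old = fisher_block(A, 1.0 / (A @ theta))
I_new = fisher_block(A_tilde, 1.0 / (A_tilde @ theta))
alpha_extra = 1.0 / (R @ theta).item()

diff = I_new - I_old                          # should equal alpha_extra * R^T R
eigenvalues = np.linalg.eigvalsh(diff)        # all non-negative up to round-off
print(np.allclose(diff, alpha_extra * R.T @ R), eigenvalues)
```

Because the difference is a non-negative multiple of the rank-one matrix $R^{T} R$, its eigenvalues are non-negative up to floating-point round-off.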

2.9. Channel splitting and photon partitioning

In the previous section, we saw how adding an extra channel to a hyperspectral imaging system can improve the Fisher information matrix. In many practical situations, channel addition may not be feasible since the overall spectral bandwidth to collect the signal from a given label cannot be increased. In such cases, a question then arises as to whether spectral subsampling within the passband would be beneficial. Specifically, if we have a hyperspectral imaging system with N_ch spectral channels and one of the spectral channels is split into two channels of smaller spectral bandwidth, then under what conditions is channel splitting beneficial? In the next corollary we state a result that shows channel splitting is beneficial for the Poisson data model.

Corollary 2.3. For $θ \in Θ$, let $I(θ | N_{ch})$ denote the Fisher information matrix given by result 2 of Theorem 2.1, where N_ch denotes the number of spectral channels. Assume that for $j = 1, \dots, N_{ch}$ there exists a partition function $0 < γ_{θ}^{j} < 1$ with $\frac{\partial γ_{θ}^{j}}{\partial θ} \neq 0$ for $θ \in Θ$ and $j = 1, \dots, N_{ch}$ such that the addition of an extra channel splits the expected photon counts into two fractions given by $γ_{θ}^{j} ν_{θ, k}^{j}$ and $(1 - γ_{θ}^{j}) ν_{θ, k}^{j}$, where $ν_{θ, k}^{j}$ denotes the expected photon count of the $k^{th}$ pixel in the $j^{th}$ spectral image, for $θ \in Θ$, $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$. If we consider a Poisson data model, then

$$I(θ | N_{ch} + 1) > I(θ | N_{ch}), \quad θ \in Θ.$$

Proof: See ref. [20] for proof. $•$

Note that unlike Corollary 2.2, which used the analytical expression for the Fisher information matrix given in Theorem 2.2, the above corollary used the more general expression for the Fisher information matrix given by result 2 of Theorem 2.1. The reason is that the partition function $γ_{θ}^{j}$ depends on θ for $j = 1, \dots, N_{ch}$ and $θ \in Θ$. As we will show in Section 4.5, the above result holds only for the Poisson data model. In fact, for the Poisson + Gaussian data model we will show that the limit of the accuracy of the photon count exhibits complex behavior with channel splitting and eventually deteriorates as the number of partitioned channels increases.
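The effect of channel splitting under the Poisson data model can be illustrated numerically. The sketch below is not the proof given in ref. [20]; it assumes the linear relation $ν_{θ, k} = A θ_{k}$ and models the split by dividing one row of A between two sub-channels using per-label fractions `g`. The matrix, the photon counts and the fractions are hypothetical values. Because the fractions differ between labels, the fraction of the total expected count that falls in each sub-channel depends on θ, as required by the corollary.

```python
import numpy as np

def poisson_fisher(A, theta):
    """Per-pixel Fisher information A^T Diag(1/nu) A with nu = A theta."""
    nu = A @ theta
    return A.T @ np.diag(1.0 / nu) @ A

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])               # hypothetical 2-channel mixing matrix
theta = np.array([3000.0, 3000.0])       # hypothetical photon counts
g = np.array([0.7, 0.4])                 # hypothetical per-label split fractions

row = A[0]
# Split channel 1 into two sub-channels; the photons of each label are conserved.
A_split = np.vstack([g * row, (1.0 - g) * row, A[1:]])

I_before = poisson_fisher(A, theta)
I_after = poisson_fisher(A_split, theta)
gain = np.linalg.eigvalsh(I_after - I_before)     # non-negative up to round-off

bound_before = np.sqrt(np.diag(np.linalg.inv(I_before)))
bound_after = np.sqrt(np.diag(np.linalg.inv(I_after)))
print(gain, bound_before, bound_after)
```

In this two-label example the difference of the Fisher information matrices has one strictly positive eigenvalue and one that is zero up to round-off, and the square root of the Cramer-Rao lower bound does not increase for either label.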

3. Methods

3.1. Computing the Fisher information matrix and the mixing matrix A

The Fisher information matrix was calculated using the FandPLimitTool [31], which is a MATLAB-based software package that contains an extensive suite of tools to calculate the Fisher information matrix for a wide variety of parameter estimation problems in fluorescence microscopy. For the linear spectral unmixing problem, the Fisher information matrix depends on the mixing matrix A, which needs to be computed for illustrating the results. We note that in a practical situation, A can be experimentally determined by imaging single color control samples under imaging conditions identical to those of the sample that contains N_s different labels. Here we present a simplified approach to theoretically calculate A by using relevant reference data such as the excitation and emission spectra of the fluorescent labels, the spectral emissivity of the fluorescent light source and the passbands of the excitation and emission filters. The elements of the mixing matrix A can be written as

$$a_{ij} = ξ_{ex}(i, j)\, ξ_{em}(i, j), \quad i = 1, \dots, N_{ch}, \; j = 1, \dots, N_{s},$$

where, for $i = 1, \dots, N_{ch}$ and $j = 1, \dots, N_{s}$,

$$ξ_{ex}(i, j) := \frac{b(i, j)}{\max (b(1, j), b(2, j), \dots, b(N_{ch}, j))},$$

$$b(i, j) := \int_{λ_{i, min}^{ex}}^{λ_{i, max}^{ex}} t_{j, ex}(λ)\, E_{LS}(λ)\, dλ,$$

$$ξ_{em}(i, j) := \int_{λ_{i, min}^{em}}^{λ_{i, max}^{em}} t_{j, em}(λ)\, t_{opt}(λ)\, t_{cam}(λ)\, dλ.$$

In the above equations, $[λ_{i, min}^{ex}, λ_{i, max}^{ex}]$ and $[λ_{i, min}^{em}, λ_{i, max}^{em}]$ denote the excitation and emission passbands, respectively, corresponding to the $i^{th}$ channel, $t_{j, ex}$ and $t_{j, em}$ denote the fluorescence excitation and emission spectra, respectively, of the $j^{th}$ dye, $j = 1, \dots, N_{s}$, $E_{LS}$ denotes the spectral emissivity of the fluorescent light source, $t_{opt}$ denotes the spectral transmittivity of the optical components (e.g., objective and tube lens) in the emission light path, and $t_{cam}$ denotes the spectral sensitivity of the detector. For simplicity, we set $t_{opt}(λ) = 1$ and $t_{cam}(λ) = 1$, and also assume that the transmission efficiency of the excitation and emission filters is equal to 1 in the passband and is equal to zero outside the passband. The excitation and emission spectra of the fluorescent labels and the spectral emissivity of the fluorescent light source were obtained from the Pubspectra database [32] and were modified such that their sum equals 1. The above integrals were numerically evaluated using the trapezoidal rule.
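The calculation of A described above can be sketched in a few lines. The example below is an illustration only: it replaces the Pubspectra reference data with hypothetical Gaussian-shaped spectra, uses a flat light source, sets $t_{opt} = t_{cam} = 1$, and treats the filters as ideal passbands. All function names, wavelengths and passbands are assumptions made for this sketch.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def band_integral(spectrum, wl, band):
    """Integrate a sampled spectrum over an ideal (0/1) passband [lo, hi]."""
    lo, hi = band
    inside = (wl >= lo) & (wl <= hi)
    return trapezoid(spectrum[inside], wl[inside])

def mixing_matrix(wl, ex_spectra, em_spectra, source, ex_bands, em_bands):
    """a_ij = xi_ex(i, j) * xi_em(i, j), assuming t_opt = t_cam = 1."""
    n_ch, n_s = len(ex_bands), len(ex_spectra)
    b = np.zeros((n_ch, n_s))
    xi_em = np.zeros((n_ch, n_s))
    for i in range(n_ch):
        for j in range(n_s):
            b[i, j] = band_integral(ex_spectra[j] * source, wl, ex_bands[i])
            xi_em[i, j] = band_integral(em_spectra[j], wl, em_bands[i])
    xi_ex = b / b.max(axis=0, keepdims=True)   # normalize per label over channels
    return xi_ex * xi_em

def bump(wl, center, width=15.0):
    """Hypothetical Gaussian-shaped spectrum, scaled so its samples sum to 1."""
    s = np.exp(-0.5 * ((wl - center) / width) ** 2)
    return s / s.sum()

wl = np.arange(400.0, 801.0, 1.0)              # wavelength grid in nm
source = np.ones_like(wl) / wl.size            # flat hypothetical light source
ex_spectra = [bump(wl, 550.0), bump(wl, 580.0)]
em_spectra = [bump(wl, 570.0), bump(wl, 600.0)]
ex_bands = [(540.0, 560.0), (570.0, 590.0)]
em_bands = [(560.0, 580.0), (590.0, 610.0)]

A = mixing_matrix(wl, ex_spectra, em_spectra, source, ex_bands, em_bands)
print(A)
```

With these hypothetical spectra the diagonal entries of A dominate, and the off-diagonal entries quantify the spectral cross talk between the two labels.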

3.2. Hyperspectral data simulation

To simulate hyperspectral data, we use the above approach to compute the mixing matrix A and use Eq. (8) to compute the expected photon counts of the fluorescent labels $ν_{θ, k}$ at the $k^{th}$ pixel in the input image cube. For the Poisson + Gaussian data model, we first create a Poisson realization of $ν_{θ, k}$ and then add Gaussian noise with mean $η_{k}^{j} = 0$ e⁻/pixel and standard deviation $σ_{w, k}^{j} = 8$ e⁻/pixel to the Poisson data.
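The two-step simulation can be sketched as follows. This is a minimal illustration that assumes $ν_{θ, k} = A θ_{k}$; the function name, the mixing matrix and the photon counts are hypothetical, while the noise mean and standard deviation follow the values stated above.

```python
import numpy as np

def simulate_pixel(A, theta_k, sigma=8.0, eta=0.0, n_realizations=1, rng=None):
    """Simulate Poisson + Gaussian data for one pixel block.

    Returns an array of shape (n_realizations, N_ch).
    """
    rng = np.random.default_rng() if rng is None else rng
    nu = A @ theta_k                                        # expected counts per channel
    photons = rng.poisson(nu, size=(n_realizations, nu.size))
    readout = rng.normal(eta, sigma, size=photons.shape)    # additive Gaussian noise
    return photons + readout

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])               # hypothetical mixing matrix
theta_k = np.array([500.0, 500.0])       # hypothetical photon counts
z = simulate_pixel(A, theta_k, n_realizations=20000, rng=np.random.default_rng(1))
print(z.mean(axis=0), z.var(axis=0))
```

Over many realizations the sample mean of each channel approaches $ν_{θ, k}^{j}$ and the sample variance approaches $ν_{θ, k}^{j}$ plus the variance of the Gaussian noise, which offers a quick sanity check on the simulated data.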

4. Results

In this section, we illustrate the results obtained in the previous section concerning the linear mixing model. We first define a few entities that we will use throughout this section.

Definition 4.1. For $θ \in Θ$, $i = 1, \dots, N_{s}$ and $k = 1, \dots, N_{p}$, the Linear Unmixing Performance (LUP) bound for the $i^{th}$ label at the $k^{th}$ pixel in the output image cube is defined as $δ_{k, i} = \sqrt{{[I_{k}^{-1}(θ)]}_{ii}}$, $θ \in Θ$, and the LUP bound for the best case scenario is defined as $δ_{k, i}^{bc} = \sqrt{{[G_{k}^{-1}(θ)]}_{ii}}$, where $I_{k}(θ) = A^{T} G_{k}(θ) A$, A denotes the mixing matrix and $G_{k}(θ)$ is given by Eq. (9), for $θ \in Θ$ and $k = 1, \dots, N_{p}$.

Definition 4.2. For $θ \in Θ$, $i = 1, \dots, N_{s}$ and $k = 1, \dots, N_{p}$, the normalized LUP (nLUP) bound for the $i^{th}$ label at the $k^{th}$ pixel in the output image cube is defined as $\frac{δ_{k, i}}{θ_{k, i}}$, and the nLUP bound for the best case scenario is defined as $\frac{δ_{k, i}^{bc}}{θ_{k, i}}$, where $δ_{k, i}$ and $δ_{k, i}^{bc}$ are defined in Definition 4.1.
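The two definitions translate directly into a short computation. The sketch below evaluates them for the Poisson data model with $G_{k}(θ) = \mathrm{Diag}(1/ν_{θ, k}^{j})$ and $ν_{θ, k} = A θ_{k}$. The function name and numerical values are hypothetical, and the best case line follows Definition 4.1 literally under the added assumption that $N_{ch} = N_{s}$ with channel $i$ matched to label $i$.

```python
import numpy as np

def lup_bounds(A, g_diag, theta_k):
    """Return (LUP, best-case LUP, nLUP, best-case nLUP) for one pixel."""
    A = np.asarray(A, dtype=float)
    g_diag = np.asarray(g_diag, dtype=float)
    theta_k = np.asarray(theta_k, dtype=float)
    I_k = A.T @ np.diag(g_diag) @ A                     # I_k = A^T G_k A
    lup = np.sqrt(np.diag(np.linalg.inv(I_k)))          # sqrt([I_k^{-1}]_ii)
    # Assumption: channel i is the matched channel of label i (N_ch = N_s).
    lup_bc = np.sqrt(1.0 / g_diag[: theta_k.size])      # sqrt([G_k^{-1}]_ii)
    return lup, lup_bc, lup / theta_k, lup_bc / theta_k

theta_k = np.array([400.0, 900.0])        # hypothetical photon counts
A_ideal = np.eye(2)                       # no spectral cross talk
lup, lup_bc, nlup, nlup_bc = lup_bounds(A_ideal, 1.0 / (A_ideal @ theta_k), theta_k)

A_mixed = np.array([[0.8, 0.3],
                    [0.2, 0.7]])          # hypothetical cross talk
lup_mixed, _, nlup_mixed, _ = lup_bounds(A_mixed, 1.0 / (A_mixed @ theta_k), theta_k)
print(lup, nlup, lup_mixed, nlup_mixed)
```

For the diagonal mixing matrix the LUP bound reduces to the square root of the photon count of each label and coincides with the best case value, whereas the off-diagonal terms in the second matrix change the bound.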

4.1. Spectral resolution for a pair of fluorophores

An important question that arises in hyperspectral imaging applications concerns how accurately two spectrally overlapping fluorescent labels can be discerned from the acquired data. Here we address this problem by considering a pair of fluorescent labels whose emission maxima are separated by a distance d_s (see Fig. 2(A)). We assume an imaging configuration where the sample is illuminated (sequentially or simultaneously) to excite the labels such that the excitation passband for a given label overlaps with its corresponding excitation maximum. The fluorescence signal is then detected (sequentially or simultaneously) in two distinct spectral channels, where the emission passband of each spectral channel covers the corresponding emission maximum of a particular label. We then ask how decreasing the spectral distance d_s affects the spectral resolution of the two fluorophores. For the current discussion, we consider fluorescent labels with traditional Stokes shift fluorescence signal [33], where the peak of the emission spectrum for the label occurs at a longer wavelength (lower energy) than the peak of its corresponding excitation spectrum.

Figures 2(B) and 2(C) show the nLUP bounds for labels 1 and 2, respectively, for different data models as a function of the spectral distance d_s. The figures also show the nLUP bounds for labels 1 and 2 for the best case scenario, which pertains to the ideal imaging configuration wherein the signal collected from a given label is not corrupted by photons from other labels present in the sample. Here, to simulate the decrease in d_s, we shift the excitation and emission spectra of label 2 and also the corresponding passbands of the excitation and emission filters towards label 1 while leaving the excitation and emission spectra for label 1 unchanged.

By definition, the nLUP bound is the ratio of the LUP bound to the expected photon count for a given label (Definition 4.2). A large numerical value of the nLUP bound predicts a relatively high level of error in estimating the photon count at a given pixel, while a small numerical value predicts a relatively low level of error. For example, if the nLUP bound = 0.01, then by definition the LUP bound is equal to 0.01 × expected photon count. This implies that the standard deviation of any unbiased estimate of the photon count can be no smaller than 1% of the true photon count value. On the other hand, if the nLUP bound = 1, then the standard deviation of any unbiased estimate of the photon count can be no smaller than 100% of the true photon count value.

From the figures we see that when the labels are spectrally well separated the nLUP bound is numerically close to the nLUP bound for the best case scenario. This is expected since there is no spectral crosstalk between the excitation or emission spectra of the two labels. Consistent with this, we see that the mixing matrix A is diagonal (see Fig. 2(C)). As the spectral distance decreases, the excitation/emission spectra of the labels start to overlap and the mixing matrix is no longer diagonal. Consequently, the numerical value of the nLUP bound becomes larger and starts to deviate from the nLUP bound for the best case scenario. When the spectral distance decreases further, the nLUP bound continues to deteriorate and we see that the numerical values of the off-diagonal terms of the A matrix, which account for the spectral cross talk between the labels, are comparable to the diagonal terms. Note that the nLUP bound for the Poisson + Gaussian data model is consistently higher than that for the Poisson data model, since in the former model the presence of additive Gaussian noise corrupts the detected photon counts at each pixel in the input image cube, which leads to higher uncertainty in estimating the photon counts at a given pixel.

We note that the difference between the nLUP bound and the nLUP bound for the best case scenario is referred to as the multiplexing disadvantage [15], which accounts for the deterioration in the limit of the accuracy of the photon count from the best case scenario when we consider the effect of detecting unwanted photons from other labels present in the sample. As we will see in the rest of this section, the nLUP bound for the best case scenario provides an important benchmark for studying the performance limits of hyperspectral imaging systems. In our present example the shapes of the emission spectra for labels 1 and 2 are dissimilar, and therefore even for a very small spectral distance of 1 nm, the nLUP bound remains finite for both labels.

Fig. 2 The effect of changing the spectral distance on spectral resolution. Panel A shows the normalized excitation (red lines) and emission spectra (blue lines) of two fluorescent labels that are spectrally separated by a distance d_s. The panel also shows the spectral emissivity of a metal halide broadband light source that is used to excite the two labels (black dotted line). Here, for both labels we consider the traditional Stokes shift for their fluorescence emission spectra. Panels B and C show the nLUP bound for labels 1 and 2 at a pixel, respectively, for the Poisson and the Poisson + Gaussian data models and also for the best case scenario pertaining to the Poisson data model. Panel C also shows the mixing matrix pertaining to different values of d_s. In panels B and C, the expected photon count for both labels is set to be 500 photons per pixel and for the Poisson + Gaussian data model we consider the mean and standard deviation of the Gaussian noise to be 0 e⁻/pixel and 8 e⁻/pixel, respectively.

4.2. Improving the spectral resolution by using anti-Stokes shift fluorescence

In the previous section we saw that with traditional fluorescent labels, which exhibit Stokes shift fluorescence, there is a limit to how spectrally close the two labels can be and still be accurately unmixed. In this section, we show that by replacing one of the labels with a fluorophore that exhibits anti-Stokes shift fluorescence we can overcome this spectral resolution limit. Figure 3 shows the excitation and emission spectra for such a pair of labels, where label 1 is a traditional fluorescent label as in Fig. 2 and label 2 exhibits anti-Stokes shift fluorescence [33], wherein the peak of its emission spectrum occurs at a shorter wavelength (higher energy) relative to the peak of its excitation spectrum. It should be pointed out that at present there is significant interest in developing such probes, especially for biological applications [34, 36]. For example, upconverting nanoparticles are one such class of fluorescent labels that typically exhibit strong absorption in the near infrared wavelength range (700 - 1000 nm) and emit fluorescence signal in the visible range (400 - 650 nm) [35, 36].

In Fig. 3 we have simulated the excitation and emission spectra for a hypothetical label that exhibits anti-Stokes shift fluorescence. We consider an imaging configuration where the labels are sequentially illuminated over two excitation passbands that cover their corresponding excitation maxima, and the fluorescence signal is sequentially collected over emission passbands that cover their emission maxima. To simulate the decrease in spectral distance, we shift the excitation and emission spectra of label 2 and their corresponding excitation and emission passbands towards label 1, and keep the excitation and emission spectra of label 1 fixed.

Figures 3(B) and 3(C) show the nLUP bounds for labels 1 and 2, respectively, pertaining to the Poisson and Poisson + Gaussian data models. The figures also show the nLUP bound for the best case scenario pertaining to the Poisson data model. From the figures we see that the nLUP bound is almost constant for all values of the spectral distance. Specifically, the mixing matrix A remains diagonal irrespective of the spectral distance of separation between the two labels. This can be attributed to anti-Stokes shift fluorescence, which largely eliminates the overlap of the excitation spectra between labels 1 and 2 when the spectral distance decreases. Therefore, for the above imaging configuration there is very limited spectral cross talk between the two labels, and consequently the nLUP bound remains nearly constant and is consistently close to the nLUP bound for the best case scenario. Note that, similar to Fig. 2, the nLUP bound for the Poisson + Gaussian data model is consistently higher than the nLUP bound for the Poisson data model.

Fig. 3 Improving the spectral resolution by using anti-Stokes shift fluorescence. Panel A shows the normalized excitation (red lines) and emission spectra (blue lines) of two fluorescent labels that are spectrally separated by a distance d_s. The panel also shows the spectral emissivity of a metal halide broadband light source that is used to excite the two labels (black dotted line). Here, label 1 is the same as that shown in Fig. 2, while label 2 is a fluorophore with anti-Stokes shift fluorescence emission, where the peak of its emission spectrum is at a shorter wavelength (higher energy) than the peak of its excitation spectrum. Panels B and C show the nLUP bounds for labels 1 and 2, respectively, at a pixel for the Poisson and the Poisson + Gaussian data models and also for the best case scenario pertaining to the Poisson data model. Panel C also shows the mixing matrix pertaining to different values of d_s. The numerical values used to generate the above plots are identical to those used in Fig. 2.

4.3. Effect of photon count

We next investigate how changing the expected photon count impacts the linear unmixing performance bound. Here, we consider two fluorescent labels, Cy3 and Cy3.5, which have significantly overlapping excitation and emission spectra (Figs. 4(A) and 4(B)). We assume an imaging configuration in which the fluorophores are sequentially excited at distinct excitation passbands and the corresponding fluorescence signal is then sequentially collected at the indicated emission passbands.

Figures 4(C) and 4(D) show the nLUP bound for the Poisson and the Poisson + Gaussian data models along with the nLUP bound for the best case scenario pertaining to the Poisson data model. We see that as the expected photon count per pixel increases, the numerical value of the nLUP bound becomes smaller and approaches the nLUP bound for the best case scenario. Note that the nLUP bound for the Poisson + Gaussian data model is consistently higher than that of the Poisson data model, and the difference diminishes as the expected photon count per pixel increases. Taken together, these results imply that increasing the photon/light budget would result in an overall improvement in the performance bound.

Fig. 4 Effect of photon count on the nLUP bound. Panels A and B show the normalized excitation (red lines) and emission (blue lines) spectra for Cy3.5 and Cy3 fluorescent labels, respectively, along with the spectral emissivity of a metal halide light source (black dotted line). The panels also show the corresponding excitation and emission passbands for each fluorescent label. Panels C and D show the behavior of the nLUP bound for Cy3.5 and Cy3, respectively, at a single pixel as a function of the expected photon count for the Poisson and the Poisson + Gaussian data models. As reference, the panels also show the nLUP bound for the best case scenario pertaining to the Poisson data model. Here, we assume the expected photon count to be the same for both fluorophores, and for the Poisson + Gaussian data model the mean and standard deviation of the readout noise are set to be 0 e⁻/pixel and 8 e⁻/pixel, respectively.

4.4. Channel addition

We next investigate the effect of adding extra spectral channels to collect the signal from a label on the spectral unmixing accuracy. Here again we consider Cy3 and Cy3.5 labels and calculate the LUP bound that is defined in Definition 4.1. Unlike the results given in Figs. 2–4, here we compute the LUP bound since the expected photon count is fixed for the fluorescent labels. We assume an imaging configuration such that the fluorescence signal from each label is sequentially collected in one or more spectral channels (i.e. different emission passbands) by repetitively exciting the sample at the same excitation passband. For example, for the Cy3.5 fluorophore, consider a 6 channel configuration with emission passbands in the range of 590 - 610 nm, 610 - 630 nm, 630 - 650 nm, 650 - 670 nm, 670 - 690 nm and 690 - 710 nm (see Fig. 5(A)). Here the sample will be sequentially excited 6 times at the excitation passband of 542 - 582 nm and each time the fluorescence signal will be collected in a different emission passband pertaining to Cy3.5. Note that in this configuration the number of detected photons from the fluorescent label will increase with increasing number of channels, assuming that there is negligible photobleaching, which typically diminishes the fluorescence intensity of the label with repeated excitation. We note that such an imaging configuration has been implemented, for example, using specialized emission filters that are placed before the imaging detector in the light path, in which the passband of the filter can be modified either electronically [9] or optomechanically by changing the relative angle of incidence of light on the emission filter [7].

Fig. 5 Effect of channel addition on the LUP bound. Panels A and B show the normalized excitation (red lines) and emission (blue lines) spectra for Cy3.5 and Cy3 fluorescent labels, respectively, along with the spectral emissivity of a metal halide light source (black dotted line). The panels also show the excitation passband and the emission passbands pertaining to the different spectral channels. Panel C shows the behavior of the LUP bound for Cy3.5 at a single pixel as a function of the number of channels after channel addition for different data models. Panel D shows the same for Cy3. In Panels C and D, the expected photon count is set to be 3000 for both labels, and the numerical values of the mean and standard deviation of the Gaussian noise component are identical to those used in Fig. 4.

Figures 5(C) and 5(D) show the LUP bound for Cy3.5 and Cy3 labels, respectively, as a function of the number of spectral channels for two different data models. From the figure we see that for both the Poisson and the Poisson + Gaussian data models the LUP bound improves with an increasing number of channels. This is consistent with Corollary 2.2, where we showed that the Fisher information matrix for an output image cube pertaining to an $N_{ch} + 1$ channel hyperspectral imaging system is greater than or equal to the Fisher information matrix for an N_ch channel hyperspectral imaging system. Note that for both labels, the LUP bound for the Poisson + Gaussian data model is consistently higher than that of the Poisson data model, which is consistent with the behavior that we observed in Fig. 4. We also observe that the addition of spectral channels beyond a certain wavelength range is not beneficial, as the LUP bound starts to plateau.

4.5. Channel splitting

In the previous section we investigated how the addition of spectral channels impacts the LUP bound for a given label. In many situations, the addition of spectral channels is not feasible. More specifically the emission passband for a given fluorescent label is typically fixed. In such cases a question arises as to whether splitting the available passband into narrower spectral channels will render any benefit. Here we investigate this question for Cy3 and Cy3.5 labels. For each fluorescent label, we consider a partitioning scheme wherein we begin with a single emission passband and then partition the passband into 2, 4, 8, 13, 20 and 40 smaller bands. For example, for Cy3.5 we start with the emission passband of 590 nm - 630 nm (see Fig. 4), which we split into two narrower channels, i.e. 590 nm - 610 nm and 610 nm - 630 nm, then into 4 narrower channels, i.e. 590 nm - 600 nm, 600 nm - 610 nm, 610 nm - 620 nm and 620 nm - 630 nm, and so on. A similar partitioning scheme is also adopted for Cy3 where we start with the emission passband of 560 - 600 nm.
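The partitioning scheme can be written compactly. The helper below is a hypothetical illustration that splits a passband into equal-width sub-bands, which matches the 2 and 4 channel examples given above; whether the finer partitions used in this study are also of equal width is an assumption of the sketch.

```python
def split_passband(lo_nm, hi_nm, n_channels):
    """Split [lo_nm, hi_nm] into n_channels contiguous, equal-width sub-bands."""
    width = (hi_nm - lo_nm) / n_channels
    return [(lo_nm + i * width, lo_nm + (i + 1) * width) for i in range(n_channels)]

# Cy3.5 emission passband of 590 nm - 630 nm partitioned as in the text.
for n in (1, 2, 4, 8, 13, 20, 40):
    bands = split_passband(590.0, 630.0, n)
    print(n, bands[0], bands[-1])
```

Each sub-band then defines one spectral channel, so the number of rows in the mixing matrix grows with the number of partitions while the overall passband stays fixed.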

We consider an imaging configuration where the labels are sequentially excited at a specific excitation passband followed by simultaneous detection of the emitted photons across the different spectral channels for that label. Unlike the imaging configuration that we considered in Section 4.4, here the total number of photons collected from a given label remains the same as we increase the number of partitioned channels. We note that the above imaging configuration can be realized in a spectral confocal microscope or in a hyperspectral line scanning microscope that is equipped with a dispersive optical element such as a prism or a monochromator, which spectrally disperses the incident light along a line and the dispersed signal is then captured on a linear imaging detector.

Fig. 6 The effect of channel splitting on the LUP bound. Panel A shows the behavior of the LUP bound for Cy3.5 at a single pixel as a function of the number of channels after channel splitting for the Poisson and the Poisson + Gaussian data models. Panel B shows the same for Cy3. In Panels A and B, the expected photon count is set to be 3000 for both labels, and for the Poisson + Gaussian data model we assume the mean of the Gaussian noise to be 0 e⁻/pixel and different values of standard deviation as indicated in the legend in panel A.

Figures 6(A) and 6(B) show the LUP bound for Cy3.5 and Cy3 labels at a given pixel as a function of the number of partitioned channels for different data models. For the Poisson data model we see that channel splitting improves the LUP bound for both labels. Specifically, the LUP bound decreases with increasing number of channels after partitioning. This is consistent with Corollary 2.3, where we showed that for the Poisson data model the Fisher information matrix for a hyperspectral imaging system with $N_{ch} + 1$ partitioned channels is greater than that with N_ch partitioned channels.

Figures 6(A) and 6(B) also show the LUP bound for the Poisson + Gaussian data model for different readout noise levels. Here we assume that the readout noise remains the same as the number of partitioned channels increases. Unlike the Poisson data model, we see that the LUP bound first improves but then progressively deteriorates with increasing number of partitioned channels. Specifically, when the number of channels initially increases, this provides additional spectral information, which in turn results in an improvement of the LUP bound. However, as the number of partitioned channels increases, the passband for each spectral channel becomes narrower and in turn the expected photon count per spectral channel decreases while the readout noise remains the same. As a result, the readout noise contributes an increasingly large share of the total noise in each channel, which eventually outweighs the benefit of the additional spectral information and causes the LUP bound to deteriorate.
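The loss of information per channel at low photon counts can be seen by evaluating the noise coefficient $Γ_{θ, k}^{j}$ of Corollary 2.1 numerically. The sketch below is not the FandPLimitTool implementation; it truncates the Poisson sum and integrates on a uniform grid with the trapezoidal rule, and the function name, grid spacing, truncation limits and photon counts are assumptions made for illustration.

```python
import math
import numpy as np

def poisson_gaussian_alpha(nu, sigma, eta=0.0):
    """Numerically evaluate Gamma = int zeta(z)^2 / p(z) dz - 1 for one channel."""
    l_max = int(nu + 10.0 * math.sqrt(nu) + 20.0)          # truncate the Poisson sum
    l = np.arange(l_max + 1)
    log_w = l * math.log(nu) - nu - np.array([math.lgamma(v + 1.0) for v in l])
    w = np.exp(log_w)                                       # Poisson weights nu^l e^-nu / l!
    step = sigma / 10.0
    z = np.arange(eta - 8.0 * sigma, eta + l_max + 8.0 * sigma, step)
    gauss = np.exp(-0.5 * ((z[None, :] - l[:, None] - eta) / sigma) ** 2)
    gauss /= math.sqrt(2.0 * math.pi) * sigma
    p = w @ gauss                                           # Poisson + Gaussian density
    zeta = w[:-1] @ gauss[1:]                               # weights shifted by one count
    ratio = np.divide(zeta, p, out=np.zeros_like(p), where=p > 0)
    integrand = ratio * zeta                                # zeta^2 / p without underflow
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1])) * step
    return float(integral - 1.0)

sigma = 8.0                                # readout noise in e-/pixel
results = {}
for nu in (400.0, 100.0, 40.0):            # hypothetical expected counts per channel
    alpha = poisson_gaussian_alpha(nu, sigma)
    results[nu] = alpha
    # alpha * nu is the information retained relative to the Poisson value 1 / nu.
    print(nu, alpha, alpha * nu)
```

The computed coefficient lies between $1/(ν + σ^{2})$ and the Poisson value $1/ν$, and the product of the coefficient and $ν$ shrinks as the expected count per channel decreases, which mirrors the deterioration described above.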

Note that the extent of this deterioration increases with the standard deviation of the readout noise. For example, for the Cy3.5 label, if the standard deviation of the readout noise is 8 e⁻/pixel, then the LUP bound varies from 160 photons to 210 photons when the number of channels after partitioning increases from 2 to 20. On the other hand, if the standard deviation of the readout noise is 2 e⁻/pixel, then the LUP bound varies from 149 photons to 152 photons when the number of channels after partitioning increases from 2 to 20. We note that a similar result has been reported for the localization accuracy problem in fluorescence microscopy, where the limit of the accuracy of the X/Y location coordinate shows an analogous behavior as a function of the pixel size of the imaging detector for the Poisson and the Poisson + Gaussian data models [37].
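The qualitative trend described above can be explored with a short numerical sketch. It is not the computation used for Fig. 6: the emission spectra are hypothetical Gaussians rather than measured Cy3/Cy3.5 spectra, the function names are illustrative, and the Poisson + Gaussian model is replaced by an assumed Gaussian approximation with variance $ν + σ^2$ per channel. Only the case σ = 0 corresponds to the Poisson Fisher information $A^T \mathrm{diag}(1/ν) A$ of the linear mixing model.

```python
import numpy as np

def split_mixing_matrix(n_ch, centers=(570.0, 595.0), width=20.0,
                        band=(550.0, 650.0), n_grid=6000):
    """Mixing matrix when the detection band is split into n_ch equal channels.

    The emission spectra are hypothetical Gaussians (not measured dye spectra).
    Entry (j, s) is the fraction of label s's photons detected in channel j,
    so every column sums to 1 regardless of n_ch.
    """
    wl = np.linspace(band[0], band[1], n_grid)
    spectra = np.stack(
        [np.exp(-0.5 * ((wl - c) / width) ** 2) for c in centers], axis=1
    )
    spectra /= spectra.sum(axis=0)            # normalize each label to unit total
    chunks = np.array_split(spectra, n_ch, axis=0)
    return np.stack([chunk.sum(axis=0) for chunk in chunks])


def unmixing_bound(A, theta, sigma=0.0):
    """Square root of the diagonal of the inverse Fisher information matrix.

    sigma = 0 gives the Poisson case, FIM = A^T diag(1 / nu) A with nu = A theta.
    sigma > 0 uses an ASSUMED Gaussian approximation with variance nu + sigma^2,
    keeping only the mean-dependent term of the Fisher information.
    """
    nu = A @ theta
    fim = A.T @ (A / (nu + sigma ** 2)[:, None])
    return np.sqrt(np.diag(np.linalg.inv(fim)))


theta = np.array([3000.0, 3000.0])            # expected photon count per label
for n_ch in (2, 4, 8, 16):
    A = split_mixing_matrix(n_ch)
    # Bound for the first label at readout noise 0, 2 and 8 e-/pixel.
    row = [unmixing_bound(A, theta, s)[0] for s in (0.0, 2.0, 8.0)]
    print(n_ch, np.round(row, 1))
```

Because the columns of the mixing matrix always sum to 1, the total photon count per label is conserved as the channels are split, which mirrors the imaging configuration considered in this section. The printed values depend entirely on the assumed spectra and should not be compared with the numbers quoted from Fig. 6.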

5. Assessment of spectral unmixing algorithms

In the previous sections we investigated the behavior of the linear unmixing performance bound for different experimental configurations and imaging conditions. A fundamental question remains whether there exists an unbiased estimator that will attain the performance bound. To address this question, we consider two algorithms for spectral unmixing, namely the least squares (LS) estimator and the maximum likelihood (ML) estimator, and evaluate their performance on simulated hyperspectral imaging data. Here we use the unconstrained least squares estimator, which does not impose any non-negativity constraint on the estimated result. Recall from Theorem 2.2 that the block diagonality of the Fisher information matrix implies that the accuracy with which the photon counts are estimated at the $k^{th}$ pixel in the output image cube is independent of the accuracy of the photon count estimates at the $m^{th}$ pixel when $k \neq m$. Hence for the current discussion it is sufficient to evaluate the performance of the spectral unmixing algorithms for hyperspectral data simulated at a single pixel. For $k = 1, \dots, N_{p}$, let $z_{k} := [z_{k}^{1} \; z_{k}^{2} \; \dots \; z_{k}^{N_{ch}}]^{T}$ denote the detected photon counts at the $k^{th}$ pixel block in the input image cube. The LS estimator of θ_k can be written as

$$\hat{θ}_{k}(LS) := (A^{T} A)^{-1} A^{T} z_{k}, \quad k = 1, \dots, N_{p},$$

where A denotes the mixing matrix. The ML estimator can be written as

$$\hat{θ}_{k}(ML) = \arg\max_{θ \in Θ} \left( \sum_{j = 1}^{N_{ch}} \ln \left( p_{θ, k}^{j}(z_{k}^{j}) \right) \right), \quad k = 1, \dots, N_{p},$$

where $p_{θ, k}^{j}$ denotes the probability density function of the photon counts detected at the $k^{th}$ pixel in the $j^{th}$ spectral channel, for $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$.
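The two estimators can be sketched in a few lines for a single pixel. The example below is illustrative only: it covers the Poisson data model alone, it maximizes the log-likelihood by Fisher scoring started from the LS estimate (the optimizer is an assumption, since the text only specifies the LS starting value), and the 6 × 2 mixing matrix `A` is a made-up example rather than one derived from real spectra.

```python
import numpy as np

def ls_estimate(A, z):
    """Unconstrained least squares estimate (A^T A)^{-1} A^T z."""
    return np.linalg.solve(A.T @ A, A.T @ z)


def ml_estimate_poisson(A, z, n_iter=50, tol=1e-9):
    """ML estimate for independent Poisson channels with mean nu = A theta.

    Uses Fisher scoring (an assumed choice of optimizer) from the LS start:
    theta <- theta + FIM^{-1} * score, with
    score = A^T (z / nu - 1) and FIM = A^T diag(1 / nu) A.
    No non-negativity constraint is imposed on theta.
    """
    theta = ls_estimate(A, z)
    for _ in range(n_iter):
        nu = A @ theta
        if np.any(nu <= 0):
            raise ValueError("non-positive expected count; log-likelihood undefined")
        score = A.T @ (z / nu - 1.0)
        fim = A.T @ (A / nu[:, None])
        step = np.linalg.solve(fim, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta


# Hypothetical mixing matrix: 6 spectral channels, 2 labels, columns sum to 1.
A = np.array([[0.30, 0.05],
              [0.30, 0.10],
              [0.20, 0.20],
              [0.10, 0.30],
              [0.07, 0.25],
              [0.03, 0.10]])
theta_true = np.array([3000.0, 3000.0])

rng = np.random.default_rng(0)
z = rng.poisson(A @ theta_true).astype(float)   # one simulated pixel block

print("LS:", ls_estimate(A, z))
print("ML:", ml_estimate_poisson(A, z))
```

When the number of channels equals the number of labels and `A` is invertible, both estimators reduce to solving `A theta = z`, so any difference between them arises only when there are more channels than labels.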

From the definition it is straightforward to see that for the Poisson data model (Eq. (3)) the LS estimator is unbiased, since

$$\begin{aligned} E[\hat{θ}_{k}(LS)] &= (A^{T} A)^{-1} A^{T} E[z_{k}] = (A^{T} A)^{-1} A^{T} ν_{θ, k} \\ &= (A^{T} A)^{-1} (A^{T} A) θ_{k} = θ_{k}, \quad θ \in Θ, \; k = 1, \dots, N_{p}, \end{aligned}$$

where we have used Eq. (8). However, if the hyperspectral data follows either the Gaussian data model (Eq. (2)) or the Poisson + Gaussian data model (Eq. (4)), then we have

$$\begin{aligned} E[\hat{θ}_{k}(LS)] &= (A^{T} A)^{-1} A^{T} E[z_{k}] = (A^{T} A)^{-1} A^{T} (ν_{θ, k} + η_{k}) \\ &= (A^{T} A)^{-1} A^{T} (A θ_{k} + η_{k}) = θ_{k} + (A^{T} A)^{-1} A^{T} η_{k}, \end{aligned}$$

where $η_{k} := [η_{k}^{1} \; η_{k}^{2} \; \dots \; η_{k}^{N_{ch}}]^{T}$ denotes the mean of the additive Gaussian noise at the $k^{th}$ pixel block for $k = 1, \dots, N_{p}$. From the above equation we see that the LS estimator will be unbiased only when the mean of the Gaussian noise component $η_{k}^{j}$ is equal to zero for $j = 1, \dots, N_{ch}$ and $k = 1, \dots, N_{p}$.
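The bias expression above can be checked numerically. The following Monte Carlo sketch assumes a made-up 6 × 2 mixing matrix, a noise mean of 100 e⁻/pixel in every channel and a readout standard deviation of 4 e⁻/pixel; none of these values come from the simulations reported here. It compares the empirical bias of the LS estimator with the predicted term $(A^{T} A)^{-1} A^{T} η_{k}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mixing matrix: 6 spectral channels, 2 labels, columns sum to 1.
A = np.array([[0.30, 0.05],
              [0.30, 0.10],
              [0.20, 0.20],
              [0.10, 0.30],
              [0.07, 0.25],
              [0.03, 0.10]])
theta = np.array([3000.0, 3000.0])   # expected photon count per label
eta = np.full(6, 100.0)              # assumed nonzero mean of the Gaussian noise
sigma = 4.0                          # assumed readout noise standard deviation
n_rep = 20000                        # number of simulated pixel blocks

# Poisson signal plus additive Gaussian noise with mean eta.
z = rng.poisson(A @ theta, size=(n_rep, 6)) + rng.normal(eta, sigma, size=(n_rep, 6))

# Rows of ls_operator apply (A^T A)^{-1} A^T to a data vector.
ls_operator = np.linalg.solve(A.T @ A, A.T)
theta_hat = z @ ls_operator.T        # one LS estimate per realization

empirical_bias = theta_hat.mean(axis=0) - theta
predicted_bias = ls_operator @ eta

print("empirical bias:", np.round(empirical_bias, 1))
print("predicted bias:", np.round(predicted_bias, 1))
```

Setting `eta` to zero in this sketch should bring the empirical bias to within sampling error of zero, in line with the unbiasedness condition stated above.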

Fig. 7 Performance of spectral unmixing algorithms on hyperspectral data. The figure shows the behavior of the LS and ML estimators on simulated hyperspectral data. Panels A and B show the % of relative error of the estimators for Cy3.5 and Cy3 labels, respectively, for different expected photon counts. For both algorithms, we do not impose any non-negativity constraint for the photon counts during estimation. Here we assume that the expected photon count is the same for both fluorescent labels. Panels C and D show the bias of the estimators for Cy3.5 and Cy3, respectively, for different expected photon counts.


We consider the hyperspectral imaging configuration illustrated in Fig. 5, where the fluorescent labels Cy3 and Cy3.5 are sequentially excited and their fluorescence signal is detected across 6 distinct spectral channels. We assume that the expected photon count for each label is the same and simulate the hyperspectral data as described in Section 3.2. For each value of the expected photon count, we create 10,000 realizations of the data and estimate the photon counts associated with each fluorescent label by using the different spectral unmixing algorithms. For the ML estimator, the starting value of the photon count estimate was obtained by applying the LS estimator to the data, and no non-negativity constraint was imposed on either estimator. To assess the performance of the unmixing algorithms, we calculate the bias and the relative error, which are given by

$$\begin{aligned} \text{Bias for } j^{th} \text{ label} &= \text{mean}(\hat{θ}_{j}) - θ_{j}, \\ \% \text{ of relative error for } j^{th} \text{ label} &= 100 \times \frac{\frac{\text{std}(\hat{θ}_{j})}{\text{mean}(\hat{θ}_{j})} - \frac{δ_{j}}{θ_{j}}}{\frac{δ_{j}}{θ_{j}}}, \end{aligned}$$

where $j = 1, 2$ and δ_j denotes the LUP bound of θ_j (see Definition 4.2).
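The two metrics translate directly into code. The helper below is a sketch with hypothetical names; it assumes the estimates are stored with one row per realization and one column per label, and it uses the sample standard deviation (`ddof=1`), a choice the text does not specify.

```python
import numpy as np

def bias_and_relative_error(theta_hat, theta_true, delta):
    """Bias and % of relative error for each label.

    theta_hat  : array (n_realizations, n_labels) of photon count estimates
    theta_true : array (n_labels,) of true expected photon counts
    delta      : array (n_labels,) of LUP bounds for the same labels
    """
    mean_hat = theta_hat.mean(axis=0)
    std_hat = theta_hat.std(axis=0, ddof=1)   # sample standard deviation (assumed)
    bias = mean_hat - theta_true
    bound_ratio = delta / theta_true          # bound normalized by the true count
    rel_err = 100.0 * (std_hat / mean_hat - bound_ratio) / bound_ratio
    return bias, rel_err


# Toy input with two realizations per label, for illustration only.
toy_estimates = np.array([[90.0, 190.0],
                          [110.0, 210.0]])
toy_bias, toy_rel_err = bias_and_relative_error(
    toy_estimates, np.array([100.0, 200.0]), np.array([10.0, 10.0])
)
print(toy_bias, toy_rel_err)
```

A relative error of 0% means the normalized spread of the estimates equals the normalized bound, and positive values measure how far the estimator is from attaining it.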

Figure 7 shows the performance of the LS and ML estimators for the hyperspectral data generated for different values of the expected photon counts. From the figure, we see that the performance of the LS and ML estimators is relatively close to the nLUP bound in that the % of relative error for both labels is within 8% of their corresponding nLUP bounds. Note that the performance of the ML estimator is consistently better than that of the LS estimator for a range of expected photon count values, since the % of relative error for the ML estimator is lower than that for the LS estimator. This is expected since we used the Poisson + Gaussian data model to simulate the hyperspectral data and used the appropriate probability density function to estimate the photon counts in the ML estimator. The figure also shows the bias for the LS and ML estimators. By definition the LS estimator is unbiased, and consistent with this we see that the bias estimates of the LS estimator for both labels are equally distributed between positive and negative values. In particular, the magnitude of the bias estimate is within ±1 photon of the true value of the photon count. An almost identical behavior is observed for the ML estimator, which suggests that the ML estimator is also unbiased for the imaging conditions tested here.

6. Conclusion

A general stochastic model for hyperspectral imaging data was presented and analytical expressions for the Fisher information matrix were derived. The model is based on relatively broad assumptions about the underlying probability density function that describes the hyperspectral data and allows for a wide variety of imaging conditions. As an application, we considered the linear mixing model and showed that the Fisher information matrix becomes block diagonal. Using the Cramer-Rao inequality, we introduced a linear unmixing performance bound and showed how this can be used to predict the spectral resolution limit for two spectrally overlapping fluorescent labels. We also showed how the spectral resolution limit can be surpassed in a standard microscope configuration by using the phenomenon of anti-Stokes shift fluorescence. Further, we investigated the effects of channel addition and channel splitting on the behavior of the performance bound and illustrated them through concrete examples. Finally, we evaluated the performance of the least squares and maximum likelihood estimators for spectral unmixing, and studied their bias and variance behaviors at different photon/light budgets. In conclusion, we note that the results and analysis presented here extend prior studies by providing a comprehensive framework to analyze the performance limits of a wide variety of hyperspectral imaging systems and spectral unmixing algorithms. Moreover, the information-theoretic analysis presented here has wider implications, for example, in D-optimum experimental design [38] and variational inference problems that make use of the Fisher information matrix [39].

Disclosures

The author declares that there are no conflicts of interest related to this article.

References

1. M. E. Dickinson, G. Bearman, S. Tille, R. Lansford, and S. E. Fraser, “Multispectral imaging and linear unmixing add a whole new dimension to laser scanning fluorescence microscopy,” Biotechniques 31, 1272–1278 (2001).

2. T. J. Fountaine, S. M. Wincovitch, D. H. Geho, S. H. Garfield, and S. Pittaluga, “Multispectral imaging of clinically relevant cellular targets in tonsil and lymphoid tissue using semiconductor quantum dots,” Mod. Pathol. 19, 1181–1191 (2006).

3. F. Cutrale, V. Trivedi, L. A. Trinh, C. Chiu, J. M. Choi, M. S. Artiga, and S. E. Fraser, “Hyperspectral phasor analysis enables multiplexed 5D in vivo imaging,” Nat. Methods 14, 149–152 (2017).

4. P. J. Cutler, M. D. Malik, S. Liu, J. M. Byars, D. S. Lidke, and K. A. Lidke, “Multicolor quantum dot tracking using a high-speed hyperspectral line-scanning microscope,” PLoS ONE 8, e64320 (2013).

5. Y. Garini, I. T. Young, and G. McNamara, “Spectral imaging: principles and applications,” Cytom. Part A 69A, 735–747 (2006).

6. T. Zimmermann, J. Marrison, K. Hogg, and P. O’Toole, “Clearing up the signal: spectral imaging and linear unmixing in fluorescence microscopy,” Methods Mol. Biol. 1075, 129–148 (2014).

7. P. Favreau, C. Hernandez, A. S. Lindsey, D. F. Alvarez, T. Rich, P. Prabhat, and S. J. Leavesley, “Thin-film tunable filters for hyperspectral fluorescence microscopy,” J. Biomed. Opt. 19, 011017, 1–11 (2014).

8. P. Favreau, C. Hernandez, T. Heaster, D. F. Alvarez, T. Rich, P. Prabhat, and S. J. Leavesley, “Excitation scanning hyperspectral-imaging microscope,” J. Biomed. Opt. 19, 046010, 1–11 (2014).

9. N. Gat, “Imaging spectroscopy using tunable filters: a review,” Proc. SPIE 4056, 50–64 (2000).

10. E. Stack, C. Wang, K. A. Roman, and C. C. Hoyt, “Multiplexed immunohistochemistry, imaging, and quantitation: A review, with an assessment of tyramide signal amplification, multispectral imaging and multiplex analysis,” Methods 70, 46–58 (2014).

11. D. C. Heinz and C.-I. Chang, “Fully constrained least squares linear spectral mixture analysis method for material quantification in hyperspectral imagery,” IEEE Transactions on Geosci. Remote. Sens. 39, 529–545 (2001).

12. E. Chouzenoux, M. Legendre, S. Moussaoui, and J. Idier, “Fast constrained least squares spectral unmixing using primal-dual interior-point optimization,” IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 7, 59–69 (2013).

13. J. R. Mansfield, C. Hoyt, and R. M. Levenson, “Visualization of microscopy-based spectral imaging data from multi-label tissue sections,” Curr. Protoc. Mol. Biol. 84, 14 (2008).

14. S. E. Bialkowski, “Overcoming the multiplex disadvantage by using maximum-likelihood inversion,” Appl. Spectrosc. 52, 591–598 (1998).

15. D. R. Fuhrmann, C. Preza, J. A. O’Sullivan, and D. L. Snyder, “Spectrum estimation from quantum limited interferograms,” IEEE Transactions on signal processing 52, 950–961 (2004).

16. F. Fereidouni, A. N. Bader, and H. C. Gerritsen, “Spectral phasor analysis allows rapid and reliable unmixing of fluorescence microscopy spectral images,” Opt. Express 20, 12729 (2012).

17. C. R. Rao, Linear statistical inference and its applications (Wiley, 1965).

18. R. Neher and E. Neher, “Optimizing imaging parameters for the separation of multiple labels in a fluorescence image,” J. Microsc. 213, 46–62 (2004).

19. M. Cubero-Castan, J. Chanussot, X. Briottet, M. Shimoni, and V. Achard, “An unmixing-based method for the analysis of thermal hyperspectral images,” in IEEE International Conference on Acoustics, Speech and Signal Processing, (IEEE, 2014), pp. 7859–7863.

20. A. Esposito, M. Popleteeva, and A. R. Venkitaraman, “Maximizing the biochemical resolving power of fluorescence microscopy,” PLoS ONE 8, e77392 (2013).

21. J. R. Janesick, Scientific charge-coupled devices (SPIE Press, 2000).

22. B. Saleh, Photoelectron statistics (Springer, 1978).

23. D. L. Snyder and M. I. Miller, Random point processes in time and space (Springer, 1991), 2nd ed.

24. J. B. Pawley, Handbook of biological confocal microscopy (Springer, 2006), 3rd ed.

25. S. Ram, E. S. Ward, and R. J. Ober, “A stochastic analysis of performance limits for optical microscopes,” Multidimens. Syst. Signal Process. 17, 27–58 (2006).

26. S. Ram, “Resolution and localization in single molecule microscopy,” Ph.D. thesis, University of Texas at Arlington/University of Texas Southwestern Medical Center at Dallas (2007).

27. J. Chao, E. S. Ward, and R. J. Ober, “Fisher information matrix for branching processes with application to electron-multiplying charge-coupled devices,” Multidimens. Syst. Signal Process. 23, 349–379 (2012).

28. M. Kimmel and D. E. Axelrod, Branching processes in biology (Springer, 2002).

29. J. Chao, S. Ram, E. S. Ward, and R. J. Ober, “Ultrahigh accuracy imaging modality for super-localization microscopy,” Nat. Methods 10, 335–338 (2013).

30. J. D. Gorman and A. O. Hero, “Lower bounds for parametric estimation with constraints,” IEEE Transactions on Inf. Theory 26, 1285–1301 (1990).

31. A. V. Abraham, S. Ram, J. Chao, E. S. Ward, and R. J. Ober, “Quantitative study of single molecule location estimation techniques,” Opt. Express 17, 23352–23373 (2009).

32. G. McNamara, A. Gupta, J. Reynaert, T. D. Coates, and C. Boswell, “Spectral imaging microscopy web sites and data,” Cytom. Part A 5, 121–132 (2006).

33. J. Lakowicz, Principles of fluorescence spectroscopy (Springer, 2006).

34. Y. Li, G. Dong, M. Peng, L. Wondraczek, and J. Qiu, “Anti-Stokes Fluorescent Probe with Incoherent Excitation,” Sci. Rep. 4, 4059 (2014).

35. A. Gnach and A. Bednarkiewicz, “Lanthanide-doped up-converting nanoparticles: merits and challenges,” Nano Today 47, 532–563 (2012).

36. S. Wilhelm, “Perspectives for upconverting nanoparticles,” ACS Nano, 2017, 10644–10653 (2017).

37. R. J. Ober, S. Ram, and E. S. Ward, “Localization accuracy in single molecule microscopy,” Biophys. J. 86, 1185–1200 (2004).

38. A. Atkinson, A. Donev, and R. Tobias, Optimum Experimental Design (Oxford University Press, 2007).

39. C. A. Taylan, C. Fevotte, and S. J. Godsill, “Variational and stochastic inference for Bayesian source separation,” Digit. Signal Process. 17, 891–913 (2007).


Biomedical Optics Express