## Abstract

A probability model for a 3-layer radiative transfer model (foreground layer, cloud layer, background layer, and an external source at the end of line of sight) has been developed. The 3-layer model is fundamentally important as the primary physical model in passive infrared remote sensing. The probability model is described by the Johnson family of distributions that are used as a fit for theoretically computed moments of the radiative transfer model. From the Johnson family we use the S* _{U}* distribution that can address a wide range of skewness and kurtosis values (in addition to addressing the first two moments, mean and variance). In the limit, S

*can also describe lognormal and normal distributions. With the probability model one can evaluate the potential for detecting a target (vapor cloud layer), the probability of observing thermal contrast, and evaluate performance (receiver operating characteristics curves) in clutter-noise limited scenarios. This is (to our knowledge) the first probability model for the 3-layer remote sensing geometry that treats all parameters as random variables and includes higher-order statistics.*

_{U}©2012 Optical Society of America

## Corrections

Avishai Ben-David and Charles E. Davidson, "Probability theory for 3-layer remote sensing radiative transfer model: errata," Opt. Express**21**, 11852-11852 (2013)

https://www.osapublishing.org/oe/abstract.cfm?uri=oe-21-10-11852

## 1. Introduction

The radiative transfer model with simple 3-layer geometry has been the primary physical model for long wave infrared (LWIR) or “thermal infrared” passive remote sensing for many years [1–7]. This model has been used for detecting vapor and aerosol targets in the atmosphere in conjunction with detection algorithms such as matched filters and anomaly detectors. A probability density function (*pdf*) model for the radiative transfer geometry is of great interest and importance for predicting the stochastic nature of the measured signals, evaluating performance, and for developing detection algorithms.

The traditional three-layer remote sensing model is based on the deterministic radiative transfer equation [8,9] where radiative transfer parameters are non-fluctuating. In 1991, Stephens et al. [10] gave a review of efforts to include stochastic components into the radiative transfer equation, and proposed a method for determining *pdf*s for the optical properties of stratiform clouds. Stephens’s model was based on the two-stream solution [8,9] to the radiative transfer equation for a single layer, where the optical depth and single scattering albedo were treated as random variables with empirically measured *pdf*s (the asymmetry parameter was fixed). Using transformation of variables and numerical techniques, Stephens obtained output distributions for the albedo, diffuse transmission, and absorption of the cloud. Other work [11,12] has incorporated fluctuations by characterizing a few radiative transfer parameters by a mean and variance (second-order statistics) while holding others fixed; these papers do not address the underlying probability distribution of the predicted spectral radiance. The goal of our work is to predict the *pdf* of spectral radiance not only due to a fluctuating vapor cloud, but also due to fluctuations in the atmosphere in front and behind the cloud, where all parameters are stochastic and we employ higher order statistics.

In the literature there is a seemingly related area of research regarding radiative transfer through random or stochastic media [13–15], sometimes referred to as *stochastic radiative transfer*. However, as Byrne states on page 386 of [14], “The common term ‘stochastic radiative transfer’ is, strictly, a misnomer. The situation we discuss is the problem of finding the *average* (emphasis added) radiative properties of an ‘ensemble’ (a generally infinite set) of problems, by means other than actually solving each problem and averaging results”. Thus, so-called stochastic radiative transfer does not provide information regarding the distribution of spectral radiance. In our paper when we use the term *stochastic radiative transfer* or *stochastic model*, we do not intend to refer to ensemble averages, but to complete probability distribution information.

In this work we develop a stochastic 3-layer radiative transfer model that consists of a foreground layer between the sensor and the target, a non-scattering target layer (e.g., a vapor cloud), a background layer at the far end of the target, and an external radiation source (e.g., a mountain, building, etc.) at the end of the line of sight (LOS). The external radiation source can also be taken as atmospheric radiation, in which case it can be thought of as a fourth layer. Each layer is characterized by a transmittance (which attenuates the radiation from previous layers) and serves as a source of radiation (due to its intrinsic temperature). Our main objective is to derive a *pdf* for the at-sensor spectral radiance (i.e., the spectral radiance—hereafter just *radiance*—input to a hypothetical sensor) due to fluctuations in transmittance and the radiation produced by the layers. These fluctuations contribute to variance in sensor measurements that is typically referred to as “clutter” (variation from the scene that potentially interferes with the task of detecting a particular signal of interest). Technological advancements have led to very low sensor noise, and typical remote sensing applications are limited by clutter noise. Thus, since it is of primary importance to understand the stochastic nature of the physical environment, and less important to model sensor noise, we neglect modeling sensor characteristics in this paper. At-sensor radiance statistics, when combined with specific sensor models and detection algorithms, can produce accurate end-to-end performance predictions. Predictions derived from the at-sensor radiance only (as in this paper) may be interpreted as upper bounds on observed performance from an actual sensor.

We focus mainly on modeling each of the three layers as homogeneously composed layers at thermal equilibrium (the target layer is always modeled this way), in which case the layer radiance is due to self emission, and the stochastic nature of the radiance is due to fluctuations in temperature and density (which in turn causes the layer’s transmittance and emissivity to be stochastic). This type of model is likely applicable to near-horizon LOS (e.g., *LOS*_{1} in Fig. 1
), but not up- or down-looking LOS (e.g., *LOS*_{2} in Fig. 1) since the vertical stratification of the atmosphere makes it less likely for a layer to be homogeneously composed (local thermal equilibrium is always assumed). Therefore, we also allow the layer radiance to be taken with an assumed *pdf* intended to capture all sources of fluctuations in a layer without assuming homogeneity. We will show that a lognormal distribution is a reasonable distribution to assume for the radiance from a layer (as well as for the transmission). In fact, normal and lognormal distributions play a central role in our model due to the central limit theorem and the role these distributions play in sampling from additive and multiplicative processes [16–21].

To derive the *pdf* of the at-sensor radiance, we allow any or all of the radiative transfer model inputs to become random variables (RVs), assume a particular distribution and population parameters for the variates in each layer (transmission and temperature), and assume that the layers are independently distributed. Then, the *pdf* of the at-sensor radiance is determined by finding population parameters for a distribution that matches the first four moments (mean, variance, skewness, and kurtosis, denoted *E*, *V*, *S*, and *K*, respectively) computed analytically through the 3-layer radiative transfer model equation. A *pdf* that matches *all* moments is the correct (true) analytical distribution, however, in practice one does not have access to reliable higher moments, and furthermore, fitting routines that include higher moments are extremely complex. In our statistical model we make use of the first 4 moments, only; in detection theory the sought-after information (the presence of the target) resides in the tail of the distribution [22–24], hence the importance to describe a *pdf* with arbitrary skewness and kurtosis values that may be used as an indicator for detection [25].

There are several systems of distributions designed to exactly match any combination of values for the first four moments, including Pearson distributions, Burr distributions, and systems based on logistic and Laplace distributions, respectively [18,26]. In our work we chose the Johnson system [18,26–28] for simplicity (it includes only four distributions), convenience (only the Johnson S* _{U}* distribution is typically needed for our model), and the central role the lognormal distribution plays in the system. The ability of the Johnson S

*distribution to model the at-sensor radiance will be established by comparing to stochastic simulations. The S*

_{U}*distribution is also useful to model intermediate terms in the model (such as the radiance incident on the back of the cloud due to the source and background layers), although sometimes less well.*

_{U}Our statistical model is a “forward model”. We do not address the general question of how to choose and determine the statistical properties of the radiative transfer variates (transmissions and temperatures) for a given physical scenario, nor do we address the “inverse problem” of estimating statistical properties of the radiative transfer variates from radiance measurements. These are both challenging problems deserving further research.

Our statistical model is for a single wavelength (i.e., the radiance is a scalar) hence, we develop a univariate *pdf* model. We reserve the extension from univariate to multivariate model (where radiance is specified at multiple wavelengths—and thus is a vector quantity—as is required for multi- or hyperspectral applications) for a future publication. Our model can be extended from 3 layers to *n* layers (though computing analytical moments for *n* layers is very cumbersome).

The remainder of the paper is organized as follows. In section 2 we present the radiative transfer model. In section 3 we present the details of the *pdf* model. In section 4 we detail the simulations that are used to demonstrate and investigate our statistical model. In the simulations we use turbulence theory to set the population parameters for the layer temperatures. In section 5 we show results of simulations and we discuss the performance of our model. We conclude with a summary. We include four appendices to aid the reader with different technical aspects of our presentation. Definitions and properties of the lognormal and Johnson S* _{U}* distributions (which play a key role in the stochastic model) appear in Appendices A and B, respectively. Moments and properties of the expectation operator appear in Appendices C and D.

## 2. Radiative transfer model

We use a standard layer model for passive LWIR remote sensing, using the following notation (for simplicity we omit the wavelength dependence of all variables in the model). Let *L _{f}*,

*L*,

_{b}*L*denote radiances originating from the foreground, background, and source layers, respectively; let

_{s}*t*,

_{f}*t*,

_{c}*t*denote transmissions of the foreground, cloud (target), and background layers, and let

_{b}*B*(

*T*) denote the Plank blackbody radiance at the cloud temperature,

_{c}*T*

_{c}. Cloud transmission may also be written in terms of optical depth, ${\tau}_{c}$, as ${t}_{c}=\mathrm{exp}(-{\tau}_{c})$, and due to local thermal equilibrium and Kirchoff’s law, the emissivity of the (non-scattering) cloud layer is ${\epsilon}_{c}=1-{t}_{c}$.

Using terminology from detection theory [29]—where *H*_{0} denotes the null hypothesis (target cloud is absent) and *H*_{1} is the alternative hypothesis (target is present)—the total at-sensor radiance given by the model in the presence of the target is denoted *M*|*H*_{1} or *M*^{(1)} and is given by

The total at-sensor radiance in the absence of the cloud, denoted *M*|*H*_{0} or *M*^{(0)}, can be obtained from Eq. (1) by setting *t _{c}* to 1, resulting in

Equations (1-2) are good for any LOS. However, for *LOS*_{1} or for when foreground and background layers are well-approximated as homogeneous layers at thermal equilibrium, the foreground and background layer radiances may be written as

The terms $1-{t}_{f}={\epsilon}_{f}$ and $1-{t}_{b}={\epsilon}_{b}$ are emissivities for the foreground and background layers, respectively (in the LWIR most of the attenuation is due to molecular absorption, and therefore scattering may oftentimes be neglected). In many cases one assumes ${T}_{b}={T}_{f}$. Figure 1 diagrams the radiative transfer model and physical parameters for each layer for both horizontal and slant-path lines of sight (*LOS*_{1} and *LOS*_{2}, respectively).

It is also convenient to define at-sensor radiance terms corresponding to each layer, i.e., layer-radiances after attenuation by subsequent layers, as follows: ${M}_{f}={L}_{f}$, ${M}_{c}={t}_{f}(1-{t}_{c})B({T}_{c})$, ${M}_{b}^{\left(1\right)}={t}_{f}{t}_{c}{L}_{b}$, ${M}_{s}^{\left(1\right)}={t}_{f}{t}_{c}{t}_{b}{L}_{s}$, ${M}_{b}^{\left(0\right)}={t}_{f}{L}_{b}$, and ${M}_{s}^{\left(0\right)}={t}_{f}{t}_{b}{L}_{s}$. Parenthetical superscript notation is necessary for *M _{b}* and

*M*to differentiate between

_{s}*H*

_{0}and

*H*

_{1}since the presence of the cloud affects them (whereas

*M*is unaffected by presence of the cloud and

_{f}*M*only exists under

_{c}*H*

_{1}, and thus does not need additional notation). Using these quantites, the total at sensor radiances in Eqs. (1-2) are just the sum of the appropriate individual at-sensor radiance terms: ${M}^{\left(1\right)}={M}_{f}+{M}_{c}^{(1)}+{M}_{b}^{(1)}+{M}_{s}^{(1)}$ and ${M}^{\left(0\right)}={M}_{f}+{M}_{b}^{(0)}+{M}_{s}^{(0)}$, respectively.

The information about the vapor cloud’s presence, Δ*M*, can be seen by subtracting Eq. (2) from Eq. (1) to obtain

*T*, known as thermal contrast [1,4], is the difference between the input radiance on the back of the cloud,

*L*, and the blackbody radiance of the cloud. If the thermal contrast is negative, $\Delta T<0$, the cloud is observed in emission: the cloud is “warmer” than the input radiance and adds photons to the LOS ($\Delta M>0$). If the thermal contrast is positive, $\Delta T>0$, the cloud is observed in absorption: the cloud is “colder” than the input radiance and removes photons from the LOS ($\Delta M<0$). A non-zero value of Δ

_{in}*T*is required for vapor detection, and thus is the “engine” behind thermal infrared remote sensing. It is sometimes convenient to present the thermal contrast in temperature units (Kelvin) aswhere ${B}^{-1}(\cdot )$ is the inverse of the blackbody function.

It is essential to note that Eqs. (1-4) describe the traditional deterministic model (henceforth called the *physical model*): they predict the value of the at-sensor radiance given constant (non-fluctuating) values of the physical parameters. The goal of this paper is to extend the physical model by allowing any (or all) of the model inputs to be RVs; the result is that model outputs (*M*^{(0)}, *M*^{(1)}, and Δ*M*, as well as intermediate quantities such as Δ*T*, *L _{in}*, etc.) are themselves RVs and characterized by appropriate

*T*to be less than zero is one—i.e., $\mathrm{Pr}\left(\Delta T<0\right)=1$—may appear in the measurements in the reverse mode with $\mathrm{Pr}\left(\Delta M<0\right)>0$. Only with a probability model can one study these phenomena, which we demonstrate and discuss further in Section 5.

## 3. Probability model

In this section we describe how initial distributions for the input RVs are chosen (in subsection 3.1), and detail how the *pdf*s of the output RVs are determined (subsection 3.2). Input RVs are assumed independent (this includes RVs from different layers, e.g., *T _{f}* and

*T*, as well as RVs within a single layer, e.g.,

_{c}*t*and

_{c}*T*). In general this choice is reasonable, however, in adiabatic conditions air temperature and density are not independent, causing correlation between temperature and transmission (proportional to density). This dependence could, in principle, be handled using the ideal gas law.

_{c}*H*_{0} and *H*_{1} radiances along a fixed LOS are (necessarily) measured at different times, and therefore we assume that all the population parameters in the radiative transfer model remain unchanged (stationary) during the measurement period and that the sampling is independent. For example, we assume that there is no drift in the population parameters or distribution of the foreground temperature, hence, the mean and variance (and all higher moments) for ${T}_{f}$ are fixed. In a recent paper [30] it was shown that for a short period ~3-5 s temperature drift is small and the stationary assumption for temperature is reasonable.

#### 3.1 Distribution of input terms

### 3.1.1 Temperature – Planck blackbody function

The Planck blackbody function$B(T)$is given by

where, for a given wavelength $\lambda $ in microns and radiance units*W*/

*cm*

^{2}/

*sr*/μ

*m*, ${k}_{1}=1.1910\text{\hspace{0.17em}}\times {10}^{4}{\lambda}^{-5}$ and ${k}_{2}=1.4388\text{\hspace{0.17em}}\times {10}^{4}{\lambda}^{-1}$; for

*λ*in wavenumbers (

*cm*

^{−1}) and radiance units

*W*/

*cm*

^{2}/

*sr*/

*cm*

^{−1}, ${k}_{1}=1.1910\text{\hspace{0.17em}}\times {10}^{-12}{\lambda}^{3}$ and ${k}_{2}=1.4388\text{\hspace{0.17em}}\lambda $.

Assume that the temperature, *T*, is a normally distributed random variable (a commonly made assumption, supported by the central limit theorem) with mean ${\mu}_{T}$ and variance ${\sigma}_{T}^{2}$, i.e., $T~N({\mu}_{T},{\sigma}_{T}^{2})$. Deriving the exact *pdf* of *B*(*T*) via transformation of variables is straight-forward, however exact moments of the distribution (required in our model) are not computable analytically. Instead, an extremely good approximation is that *B*(*T*) is lognormally distributed, $LN({\mu}_{B},{\sigma}_{B}^{2})$ (for which all moments are easily computed), which is shown as follows.

Typical terrestrial values of *μ _{T}* are on the order of 300 K with variances at least an order of magnitude smaller. Under these conditions, 1/

*T*is approximately linear (e.g., the points $1/({\mu}_{T}\pm 3\sigma )$ are close to equidistant from the median $1/{\mu}_{T}$ and therefore the mean is close to $1/{\mu}_{T}$). Therefore, a first-order Taylor expansion of $1/T$ around ${T}_{0}={\mu}_{T}$, giving $1/T\approx {\mu}_{T}^{-1}-{\mu}_{T}^{-2}\left(T-{\mu}_{T}\right)$, leads to the excellent approximation $1/T~N({\mu}_{T}^{-1},{\mu}_{T}^{-4}{\sigma}_{T}^{2})$. It follows (Appendix A) that $\mathrm{exp}\left({k}_{2}/T\right)~LN\left({k}_{2}{\mu}_{T}^{-1},{k}_{2}^{2}{\mu}_{T}^{-4}{\sigma}_{T}^{2}\right)$.

Were it not for the “-1” term in the denominator of (6), the inverse and multiplicative properties of the lognormal distribution (see Appendix A) could be used to show that $B(T)~LN\left(\mathrm{ln}{k}_{1}-{k}_{2}{\mu}_{T}^{-1},{k}_{2}^{2}{\mu}_{T}^{-4}{\sigma}_{T}^{2}\right)$. Although it is tempting to simply ignore the “minus one” term—on the grounds that typically $\mathrm{exp}\left({k}_{2}/T\right)>>1$ in the LWIR—it creates a bias (shift) in the distribution that while small, is significant compared to the variance of the distribution. A better method is to approximate the denominator as a standard 2-parameter lognormal distribution. That is, let $x=\mathrm{exp}({k}_{2}/T)-1$ be the denominator of (6). The goal is to find population parameters *μ _{x}* and ${\sigma}_{x}^{2}$ for a lognormal distribution that matches the mean and variance of

*x*without neglecting the minus one. The mean and variance of

*x*are computed (see Appendix A) as $E(x)=\mathrm{exp}\left({k}_{2}{\mu}_{T}^{-1}+0.5{k}_{2}^{2}{\mu}_{T}^{-4}{\sigma}_{T}^{2}\right)-1$ and $V(x)=\left[\mathrm{exp}\left({k}_{2}^{2}{\mu}_{T}^{-4}{\sigma}_{T}^{2}\right)-1\right]E{(x)}^{2}$, respectively, and population parameters are computed with Eq. (A2) from Appendix A. Finally, the inverse and multiplicative properties of the lognormal distribution can now be used to show that the blackbody radiance is lognormally distributed, $B(T)~LN\left({\mu}_{B},{\sigma}_{B}^{2}\right)$, where ${\mu}_{B}=\mathrm{ln}{k}_{1}-{\mu}_{x}$ and ${\sigma}_{B}^{2}={\sigma}_{x}^{2}$. The fit is so good that

*B*(

*T*) may essentially be considered an “input term” to the model.

### 3.1.2 Transmission

We choose to represent the transmission of a layer in the model as a lognormally-distributed variate, $t~LN({\mu}_{t},{\sigma}_{t}^{2})$. Partially this is due to the convenience of working with lognormal distributions. More importantly, however, is a physical argument based on the statistics of multiplicative processes.

Consider the transmission of monochromatic radiation through a general layer composed of *j* sub-layers. The total transmission will be a product of the sub-layer transmissions, $t={\displaystyle \prod {}_{i=1}^{j}{t}_{i}}$. Each sub-layer transmission is a non-negative random variable, $0\le {t}_{i}\le 1$. By the central limit theorem for multiplicative processes (see property (x) of Appendix A) *t* approaches a lognormal distribution, $t~LN({\mu}_{t},{\sigma}_{t}^{2})$. Note that values for *μ _{t}* and ${\sigma}_{t}^{2}$ for distributions generated in this manner will naturally enforce the constraint that the probability for $t>1$ must be zero, i.e., $\mathrm{Pr}\left(t>1\right)=0$. Note also that due to the multiplicative property of the lognormal distribution, if the transmissions of the sub-layers are themselves lognormally distributed, ${t}_{i}~LN\left({\mu}_{i},{\sigma}_{i}^{2}\right)$, then $t~LN({\mu}_{t},{\sigma}_{t}^{2})$ with ${\mu}_{t}={\displaystyle \sum {\mu}_{i}}$ and ${\sigma}_{t}^{2}={\displaystyle \sum {\sigma}_{i}^{2}}$.

The above central-limit argument is likely very appropriate for foreground and background layer transmissions, since these tend to be due to the ambient atmosphere along a path that may be long (e.g., a background layer for LOS to space). The transmission of the cloud is qualitatively different than that of the foreground or background layers in the sense that the cloud is localized and (in our remote sensing applications) it is usually due to an external source (e.g., a man-made cloud). An additional justification for the cloud transmission *t _{c}* to be lognormally distributed is based on the assumption of Gaussian statistics for concentration that is commonly used in plume dispersion models [31, Ch. 18]. Since the optical depth is proportional to concentration, this implies ${t}_{c}=\mathrm{exp}(-{\tau}_{c})=LN(-{\mu}_{\tau c},{\sigma}_{\tau c}^{2})$, where

*μ*and ${\sigma}_{\tau c}^{2}$ are the mean and variance of the normally-distributed optical depth.

_{τc}As already mentioned, a real (i.e., physical) transmission is always bounded $0\le t\le 1$. An arbitrary lognormal distribution will always be positive but may not have a non-zero probability for $t>1$. Because the user may choose population parameters for the layer transmissions where $\mathrm{Pr}\left(t>1\right)>0$, we actually represent all transmissions in the model as *truncated* lognormal distributions that enforce an upper bound at 1 (if the probability that $t>1$ is negligible—e.g., $0\le {\mu}_{t}-\eta {\sigma}_{t}$ where *η* is a factor large enough, for example $\eta =4$, to ensure a negligible probability for the normally-distributed optical depth to be negative—then truncation will have no effect). Moments of the truncated lognormal distribution are given in Appendix A, property (ix).

### 3.1.3 Radiance

We assume that the ambient atmospheric radiance $L~LN({\mu}_{L},{\sigma}_{L}^{2})$ is lognormally distributed (e.g., foreground and background layers for a slant-path LOS—*LOS*_{2} in Fig. 1—where the layers are modeled generically without assuming homogeneity). This choice is partially motivated by the fact that radiance values must be non-negative (the lognormal distribution naturally enforces this constraint) and that the lognormal distribution is easy to work with. More importantly, however, is that there is a physical argument to be made that is based on radiative transfer theory.

The formal solution of the classical radiative transfer differential equation [32] [Ch. 11, Eq. (3)] for an atmosphere in local thermodynamic equilibrium is given [32] [Ch. 11, Eq. (4)] as an integral for the multiplication of a blackbody function along a slant path with a slant path transmission to an observer. Approximating the integral as a sum shows that the radiance is given by a summation of *t* × *B*(*T*) terms. We already showed that the blackbody and transmission functions are both lognormally distributed. With the addition property of lognormals (see property (vi) in Appendix A), the result is lognormally distributed radiance.

If the source radiance, *L _{s}*, is an atmospheric radiance, it is reasonable to assume that it is lognormally distributed, as the above argument would apply. For source radiance that is due to emission from an opaque object, we assume a lognormal distribution also. If the object is a blackbody, then this is justifiable based on section 3.1.1. For graybody radiation, where ${L}_{s}={\epsilon}_{s}B({T}_{s})$, choosing a lognormal distribution for the radiance implies either (i)

*ε*is constant and property (iii), Apendix A, establishes ${L}_{s}~LN(\mathrm{ln}{\epsilon}_{s}+{\mu}_{B},{\sigma}_{B}^{2})$, or (ii) ${\epsilon}_{s}~LN({\mu}_{\epsilon s},{\sigma}_{\epsilon s}^{2})$ and property (v), Appendix A, establishes ${L}_{s}~LN({\mu}_{\epsilon s}+{\mu}_{B},{\sigma}_{\epsilon s}^{2}+{\sigma}_{B}^{2})$. In the latter case, it must be that the population parameters of

_{s}*ε*are such that Pr(

_{s}*ε*>1) = 0 for the quantity to be strictly physical.

_{s}*3.2. Distribution of output terms using Johnson* S_{U}

_{U}

The primary output terms of the *pdf* model are the total at-sensor radiances *M*^{(0)} and *M*^{(1)}, as well as the differential radiance *ΔM*; these are the terms that a sensor has access to directly (i.e., can measure). The thermal contrast, *ΔT*, is also fundamentally important due to the key role it plays in enabling LWIR detection. We model these terms with the Johnson S* _{U}* distribution. In this section we explain the process and the reasons why the S

*distribution is successful as a single distribution to describe these terms. We also discuss the use of the Johnson S*

_{U}*distribution to describe intermediate terms in the model—terms of secondary importance such as at-sensor radiance terms for the individual layers,*

_{U}*M*,

_{f}*M*,

_{c}*M*, and

_{b}*M*—which can be more challenging for the S

_{s}*distribution to represent.*

_{U}The previous section showed that it is reasonable to model all of the input RVs in Eq. (1) as lognormal distributions. While physically justifiable, this is also very convenient since a product of lognormal distributions is (analytically) lognormally distributed, and a sum of lognormal distributions is (approximately) lognormally distributed. Ignoring the presence of emission terms such as ${\epsilon}_{c}=1-{t}_{c}$ (e.g., assuming *t _{c}* was constant or that

*ε*was lognormally distributed) would lead to a lognormal distribution for the total at-sensor radiance (where correlation between terms in the sum—caused by

_{c}*t*,

_{f}*t*, and

_{c}*t*appearing in multiple places—can be handled via property (viii) in Appendix A). However, the problem is that the emissivity $\epsilon =1-t$ when $t~LN({\mu}_{t},{\sigma}_{t}^{2})$ does not follow the standard 2-parameter lognormal distribution, and is instead described by a lognormal distribution that has been reversed (reflected about zero) and shifted. While the moments of this distribution are easily computed in terms of the moments of

_{b}*t*(e.g., $E({\epsilon}^{2})=1-2E(t)+E({t}^{2})$), it is difficult to determine the distribution of the product $\epsilon \times B(t)$, for example. Equivalently, expanding terms such as ${M}_{c}={t}_{f}(1-{t}_{c})B({T}_{c})$ to become ${t}_{f}-{t}_{f}{t}_{c}B({T}_{c})$ shows that a

*difference*of (correlated) lognormally-distributed variates enters into the model. Theoretically, the standard 2-parameter lognormal distribution is a poor candidate to approximate a difference of lognormals since there is no guarantee that values would be positive. This fact, combined with the fact that transmissions in our model may actually be

*truncated*lognormal distributions (for which the nice multiplicative and additive properties no longer apply), motivated us to use the Johnson system to fit the moments of the model outputs. A secondary advantage to this is that one may relax the need to assume a specific distribution for the model inputs, and instead may simply specify the first four moments of the input variates.

The Johnson system contains three 4-parameter distributions, S* _{L}*, S

*, and S*

_{U}*, which, along with the normal distribution, allows fitting any combination of values for the set of moments, {*

_{B}*E*,

*V*,

*S*,

*K*}. All distributions in the system may be considered transformations of a normal variate. Typically distribution systems are characterized by the Pearsonian coefficients ${\beta}_{1}={S}^{2}$ and ${\beta}_{2}=K$, where each distribution in the system corresponds to a region, curve, or point in the

*β*

_{1},

*β*

_{2}plane. The S

*distribution is a lognormal distribution with the added flexibility of scale and location parameters (and thus the distribution may be shifted and the skewness may become negative); despite being defined as a four-parameter distribution only three of the parameters are identifiable. Both the Johnson S*

_{L}*and the standard two-parameter lognormal distributions are described by the same curve in the*

_{L}*β*

_{1},

*β*

_{2}plane (see Fig. 2 ). Any point below the lognormal curve is in the S

*region, which is an unbounded (and unimodal) distribution. Any point above the lognormal curve is in the S*

_{U}*region, a bounded distribution that may be unimodal or bimodal. The normal distribution is a point in Fig. 2 and is a limiting form of the S*

_{B}*(and S*

_{L}*and S*

_{U}*) distributions; the S*

_{B}*is a limiting form of the S*

_{L}*(and S*

_{U}*). The fact that the lognormal distribution appears as a special case of each distribution in the Johnson system is an important property given that (as stated above) the model predicts lognormally distributed radiance under certain conditions.*

_{B}The S* _{U}* distribution is particularly important because: (i) we observed that most of our simulations produced (

*β*

_{1},

*β*

_{2}) values that—if not on the lognormal curve—were below it, and thus in the S

*region; (ii) analytical moments are relatively easy to derive and there are good procedures for method-of-moments estimation (details are in Appendix B); (iii) the S*

_{U}*and the*

_{L}*LN*distributions are limiting cases of the S

*, and thus the S*

_{U}*distribution can represent these distributions with an error that is as small as desired—this means that the model outputs may always be considered to be S*

_{U}*thereby avoiding the complexity of having to switch to or from an S*

_{U}*representation.*

_{L}There are times when the moments of model outputs are above the lognormal curve in the S* _{B}* region (generally caused when the truncated lognormal distribution is necessary for transmission variates). However, dealing with the S

*distribution is difficult due to the complexity of the moments; estimation must be performed via quantiles or even frequencymoments [28]. Therefore, even when the distribution to be fit is in the S*

_{B}*region, we use an S*

_{B}*approximation (we project the point in the S*

_{U}*region to the closest point on the lognormal curve and then use the S*

_{B}*—see details in Appendix B, property (ii)—generally the Euclidean distance to the lognormal curve is small, and the S*

_{U}*fit is reasonably good). Therefore, the S*

_{U}*distribution is the only member of the Johnson system that we need to consider.*

_{U}The general procedure for determining the *pdf* of model outputs is to analytically compute the first four moments (*E*, *V*, *S*, and *K*) for each model output that we desire to fit with a *pdf*, say, *M*^{(1)} under *LOS*_{2}. *E*, *V*, *S*, and *K* are a function of the raw moments of *M*^{(1)}, which are in turn a function of the population parameters of the model inputs. The raw moments are found by, e.g., expanding the function ${({L}_{f}+{t}_{f}(1-{t}_{c})B({T}_{c})+{t}_{f}{t}_{c}{L}_{b}+{t}_{f}{t}_{c}{t}_{b}{L}_{s})}^{n}$ (where *n* is 1 through 4; obtaining 5 terms for $n=1$, 15 terms for $n=2$, 35 terms for $n=3$, and 70 terms for $n=4$) and taking the expectation of each expression, using properties of the expectation operator given in Appendix D. Each expression contains terms that are a product of up to six RVs (in this example) appearing to powers up to 4. Because of independence, the expectation operator distributes, e.g., $E({t}_{f}^{3}{t}_{c}^{2}{L}_{b})=E({t}_{f}^{3})E({t}_{c}^{2})E({L}_{b})$. *E*, *V*, *S*, and *K*, are then computed as a function of the raw moments using Appendix C. This procedure, though tedious, is nevertheless straight-forward and may be applied to all of the model outputs. Given the analytical moments, then the estimation procedure in Appendix B is used to determine the four population parameters (*a*, *b*, *c*, *d*) of the S* _{U}* distribution, e.g., ${M}^{(1)}~{S}_{U}\left(a,\text{\hspace{0.17em}}b,\text{\hspace{0.17em}}c,\text{\hspace{0.17em}}d\right)$. If the moments are located in the S

*region (or on the lognormal curve), then the corresponding moments of the S*

_{U}*fit (see Eq. (B4) in Appendix B) will match exactly. If the distribution to be fit is in the S*

_{U}*region, then the skewness and kurtosis will be slightly off, and only the first two moments will be fitted exactly.*

_{B}In the remainder of this section, we highlight several individual model outputs, how well they are modeled with the S* _{U}* distribution, and conditions for them to be lognormally distributed.

### 3.2.1 Homogeneous layer radiance and at-sensor radiances

As previously mentioned, the possibility of truncated lognormal distributions for the transmission terms and the presence of emission terms $\epsilon =1-t$ are the primary reasons for model outputs to deviate from the standard 2-parameter lognormal distribution. Therefore, terms such as *M _{c}*—or

*L*and

_{f}*L*using the homogeneity assumption (Eq. (3))—are potentially the most problematic since both transmission and emission terms appear. Our observations indicate that terms such as

_{b}*M*are the most likely to be poorly fit by the S

_{c}*distribution, i.e., it is possible for them to be in the S*

_{U}*region. However, even when the component at-sensor radiances are poorly fit, the total at-sensor radiance tends to be remarkably well fit by the S*

_{B}*, and usually*

_{U}*M*

^{(1)}is even better fit than

*M*

^{(0)}. A reason for this may stem from the fact that the S

*region (see Fig. 2) is characterized by smaller kurtosis values for a given skewness than the S*

_{B}*(or alternatively larger skewness for the same kurtosis).*

_{U}*M*

^{(0)}and

*M*

^{(1)}are a convolution (sum) of the component at-sensor radiances where convolution is a smoothing and broadening operation. We theorize that the convolution of terms helps increase the value of the kurtosis relative to the skewness, making it more likely for total at-sensor radiance terms to be in the S

*region.*

_{U}*M*

^{(1)}is a convolution of more terms, and thus would be more likely to be in the S

*region than*

_{U}*M*

^{(0)}. Regardless of the reason,

*M*

^{(0)}and

*M*

^{(1)}—the primary outputs of the model—are particularly well fit by the S

*distribution.*

_{U}The ability to approximate at-sensor radiances as lognormals is attractive as it eliminates the necessity of using a fitting routine for the S* _{U}* due to the elegance of lognormal distributions that are easily manipulated. Component at-sensor radiances such as

*M*(or

_{s}*M*under

_{b}*LOS*

_{2}when the homogeneity assumption is not used) will be lognormally distributed as long as the transmissions are not significantly truncated. On the other hand,

*M*will only be lognormal if the transmissions are not truncated

_{c}*and*if $1-{t}_{c}$ may be well approximated by a lognormal distribution (which occurs when

*t*is nearly symmetric), in which case ${M}_{c}={t}_{f}\left(1-{t}_{c}\right)B({T}_{c})$ is a product of three lognormal variates. If all of the component at-sensor radiances are lognormally distributed, then the total at-sensor radiance will also be (approximately) lognormally distributed (based on the additive property of

_{c}*correlated*lognormals in Appendix A—where the correlation is caused by the transmissions appearing in multiple terms).

We note that a further simplification for the total at-sensor radiance (or any other quantity) that is lognormally distributed is the normal distribution. The normal is the limiting form of the lognormal as ${\sigma}^{2}\to 0$. Thus, if the skewness and kurtosis of the lognormal distribution are not too far from 0 and 3, respectively, then a normal distribution may be used instead. For example, at wavenumber of 1000 *cm*^{−1}, ${\mu}_{T}=300K$ and ${\sigma}_{T}=5K$, the blackbody lognormal function $B(T)~LN({\mu}_{B},{\sigma}_{B}^{2})$ (see Section 3.1.1) is given with${\mu}_{B}=-11.5$ and ${\sigma}_{B}^{2}=0.0064$, for which the skewness and kurtosis (Appendix A) are 0.24 and 3.1, respectively. Hence, $B(T)$ can be well-approximated by a normal distribution $B(T)~N({\mu}_{B},{\sigma}_{B}^{2})$.

### 3.2.2 Differential radiance

The differential radiance, $\Delta M={M}^{(1)}-{M}^{(0)}$, is particularly well-suited to a Johnson S* _{U}* distribution. We can theoretically prove that a difference of independent S

*variates, $z=x-y$, always remains in the S*

_{U}*region. Although this does not establish that a difference of S*

_{U}*variates is*

_{U}*analytically*an S

*distribution, it does say that an S*

_{U}*fit will always be able to exactly match the first 4 moments. Additionally, even when one or both of*

_{U}*M*

^{(0)}and

*M*

^{(1)}are in the S

*region (and therefore outside of the S*

_{B}*), it is possible for*

_{U}*ΔM*to be in the S

*region. Thus, the Johnson S*

_{U}*is very appropriate for fitting*

_{U}*ΔM*.

In the physical model, the differential radiance given by Eq. (4) is the information in the measured radiance that is due to the cloud, i.e., it captures *only* those photons that interact with the cloud. In the stochastic model the interpretation of *ΔM* is not as straight-forward: $\Delta M={M}^{(1)}-{M}^{(0)}$ is a difference of RVs that are assumed to be (independently) sampled at different times, and fluctuations in the input RVs will cause a differential radiance between terms that do not involve the cloud. Thus, in the stochastic model *ΔM* captures both “informative” cloud-photons, as well as “uninformative” photons that are due to clutter fluctuations. For example, consider the foreground radiance, *L _{f}*, which is distributed identically and independently under

*H*

_{0}and

*H*

_{1}conditions. The mean $E\left({L}_{f}|{H}_{1}-{L}_{f}|{H}_{0}\right)=0$, but the variance $\mathrm{var}\left({L}_{f}|{H}_{1}-{L}_{f}|{H}_{0}\right)$ is the sum of the variances of

*L*|

_{f}*H*

_{0}and

*L*|

_{f}*H*

_{1}; therefore, the differential radiance will include some photons that originated in the foreground layer and never interacted with the cloud. Note that correlations between

*H*

_{0}and

*H*

_{1}quantities may be included in this analysis, but here we assume independent temporal sampling.

When *M*^{(0)} and *M*^{(1)} analytically follow the lognormal distribution, the differential radiance *ΔM* is a difference of independent lognormals. We can show analytically that a difference between two independent lognormal distributions is always in the S* _{U}* region (below the lognormal line in the

*β*

_{1},

*β*

_{2}plane), and in fact, the fit is so fantastic it is tempting to think that the S

*distribution might even be the analytical*

_{U}In the stochastic model, we use binary hypothesis testing [19] with respect to the radiance distributions *M*^{(0)} and *M*^{(1)} to determine the detection potential of the remote sensing scenario that is due to scene noise (e.g., the likelihood that at-sensor radiance may be attributed to the presence of a cloud along with the corresponding probability of false alarm). We are aware of the fact that detecting a presence of a cloud with only one wavelength is problematic as it is susceptible to drift, atmospheric disturbances, and interferences (using multiple wavelengths, these effects may be mitigated due to the wavelength dependence of the different events).

### 3.2.3 Thermal contrast

Determining the distribution of the thermal contrast in radiance units—defined in (4)—is qualitatively similar to determining the distribution of differential radiance, and raises no additional challenges. However, determining the distribution of *ΔT* in units of brightness temperature (Eq. (5)) is difficult because of the appearance of the inverse Planck blackbody function, *B*^{−1}(·), that appears in the expression $\Delta T={B}^{-1}\left({L}_{in}\right)-{T}_{c}$. If *L _{in}* is a lognormally distributed variate, then the steps in showing that

*B*(

*T*) is approximately lognormal when

*T*is normal can be reversed to show that ${T}_{in}={B}^{-1}\left({L}_{in}\right)$ is approximately normal, where

However, in general *L _{in}* will be described (or approximated) by an S

*distribution. Transformation of variables applied to Eq. (7) is too complicated when the input distribution is the Johnson S*

_{U}*(scaling and translation relationships for the S*

_{U}*distribution exist—see Appendix B—but more complicated transformations generally do not lead to densities that can be integrated); computing moments of*

_{U}*T*analytically from the moments of

_{in}*L*is also intractable due to the nonlinearity of the transform. However, moments can be computed easily when the inverse blackbody function is linearized (approximated) with a first order Taylor expansion. Therefore, we will describe two complementary methods to determine the

_{in}*ΔT*: (i) approximating

*L*as a lognormal distribution and using the exact functional form of

_{in}*B*

^{−1}; or (ii) propagating exact moments of

*L*using an approximated form of

_{in}*B*

^{−1}.

In the first method, ${k}_{1}/L~LN\left(\mathrm{ln}{k}_{1}-{\mu}_{L},{\sigma}_{L}^{2}\right)$, and $y={k}_{1}/L+1$ may be approximated very well as a 2-parameter lognormal distribution with population parameters {*μ _{y}*,

*σ*} found from fitting $E(y)={k}_{1}\mathrm{exp}\left(-{\mu}_{L}+{\sigma}_{L}^{2}/2\right)+1$ and $V(y)=\left[\mathrm{exp}\left({\sigma}_{L}^{2}\right)-1\right]E{(y)}^{2}$ using Eq. (A2). Then, $\mathrm{ln}y~N\left({\mu}_{y},{\sigma}_{y}^{2}\right)$ by definition. The previously-used first order Taylor approximation for the reciprocal of a normal variate (Section 3.1.1) establishes ${B}^{-1}\left(L\right)$ to be approximately $N\left({\mu}_{y}^{-1},{\mu}_{y}^{-4}{\sigma}_{y}^{2}\right)$.

_{y}*T*

_{c}is also normally distributed, and thus the distribution of

*ΔT*is approximated by $N\left({\mu}_{y}^{-1}-{\mu}_{Tc},{\mu}_{y}^{-4}{\sigma}_{y}^{2}+{\sigma}_{Tc}^{2}\right)$. These steps are simply the reverse of the steps in Section 3.1.1, and are appropriate whenever the S

*distribution for*

_{U}*L*is not “too far” from the lognormal line. However, as the kurtosis and skewness of the S

_{in}*gets further from the lognormal, the output distribution for*

_{U}*T*becomes skewed and the normal approximation (which is necessarily symmetric) will begin to provide a poor fit.

_{in}In the second method, we approximate Eq. (7) with a first order Taylor expansion around the mean of *L*. Let *L*_{0} denote *E*(*L*), then

The result is that ${B}^{-1}\left(L\right)$ now has a linear relationship with random variate *L*, ${B}^{-1}\left(L\right)\approx {\alpha}_{1}L+{\alpha}_{0}$, with constant gain *α*_{1} and offset *α*_{0} given by

The moments of ${B}^{-1}\left(L\right)$ and $\Delta T={B}^{-1}\left({L}_{in}\right)-{T}_{c}$ may now be easily computed using Appendices C and D, and fit with an S* _{U}* via property (ii) of Appendix B.

Note that if *L _{in}* analytically follows an S

*distribution, ${L}_{in}~{S}_{U}\left(a,\text{\hspace{0.17em}}b,\text{\hspace{0.17em}}c,\text{\hspace{0.17em}}d\right)$, then ${B}^{-1}({L}_{in})~{S}_{U}\left({\alpha}_{1}a+{\alpha}_{0},\text{\hspace{0.17em}}b{\alpha}_{1},\text{\hspace{0.17em}}c,\text{\hspace{0.17em}}d\right)$ via property (iii) in Appendix B, where the absolute value and signum functions are not needed since ${\alpha}_{1}>0$. Subsequently, moments of ${B}^{-1}\left(L\right)$ can be found from Eqs. (B3) and (B4) instead of using Appendices C and D. Additionally, if*

_{U}*T*is a constant, then neither Appendices C and D nor the S

_{c}*fitting are needed since Appendix B property (iii) establishes $\Delta T~{S}_{U}\left({\alpha}_{1}a+{\alpha}_{0}-{T}_{c},\text{\hspace{0.17em}}b{\alpha}_{1},\text{\hspace{0.17em}}c,\text{\hspace{0.17em}}d\right)$.*

_{U}The first order Taylor expansion is appropriate when the variance of *L _{in}* is small enough such that the inverse blackbody function is approximately linear over the domain of

*L*. Which method is better will be determined by the relative error in approximating

_{in}*L*as a lognormal distribution compared to the error in assuming the blackbody function is linear. In all simulations shown in Section 5, the blackbody function is close to linear, and so the second method works well. Thus, in the remainder of the paper we only focus on the second method.

_{in}## 4. Simulations

In this section we describe the details of simulations for the detection of triethyl phosphate (TEP) vapor, including how population parameters were chosen. All simulations are run at 1049.3 *cm*^{−1} (which corresponds to peak absorption for TEP) and radiance values are reported in units of $W/c{m}^{2}/sr/c{m}^{-1}$. With given population parameters $(\mu ,{\sigma}^{2})$ for the input variates to the model, we sample values from the appropriate random number generator (10^{6} samples of each variate, creating 10^{6} sets of values). For example, layer transmissions are sampled from (possibly truncated) lognormal distributions, temperatures are sampled from normal random number generators, and layer radiances (e.g., *L _{s}*) are sampled from the lognormal distribution. Samples for the intermediate model terms and the primary outputs are computed via applying the model equations to each set of random values, creating 10

^{6}values for each output term. We ensure that all RVs for

*H*

_{0}are sampled independently (i.e., sampled separately) from

*H*

_{1}, thus we have an additive signal model where

*H*

_{0}and

*H*

_{1}are uncorrelated (in field measurements

*H*

_{0}and

*H*

_{1}are measured at different times, and therefore are assumed independent). From the sampled RV data we compute histograms (sampled

We present two types of simulations. The first is a physical scenario of a sensor situated 1 *m* above the ground in a horizontal LOS geometry that is viewing a vapor cloud located at a horizontal distance of 1 *km*. At the end of the LOS (10*km* away from the sensor), there is a topographical target (a mountain). The second type of simulation is for a hypothetical scenario with a challenging set of parameters for the RVs (e.g., a combination of transmissions, temperatures and radiance values) in order to test our model’s accuracy and to explore its limitations. The parameters chosen for the hypothetical scenario can be associated with a physical scenario, but this is not our intention. For both types of simulations we chose horizontal LOS geometry, a geometry that is more challenging than a slant path due to the added complexity of the foreground and background emission terms (discussed in Section 3.2). Simulation results for slant path geometry (not shown) were very good. Table 1
summarizes population parameters as well as means and variances for all simulations.

#### 4.1 Physical simulation

The 3-layer model geometry in the physical simulation is similar to a previously used scenario [4]: a LOS near the horizon (1.7 degree up-look angle), foreground pathlength of 1 *km* at temperature of 288 K; a cloud at temperature of 288 K (i.e., at equilibrium with the ambient atmosphere); background layer of 9 *km* path at the cloud temperature; a mountain (that serves as an external blackbody source) with emissivity of 0.85 (typical of many clay soils in the IR) at temperature of 278 K (i.e., 10 K colder than the ambient air temperature, a scenario that can occur in the morning hours). Note that a 1.7 degree up-look angle is shallow enough that horizontal distances and slant-path distances are virtually identical. We use previous stochastic simulations [6] as guidelines for setting population parameters for transmission and temperature functions in this work.

The standard deviation of the transmission, $\sqrt{V(t)}$, for IR wavelengths and horizontal LOS (elevation angle < 5°) is between $2\cdot {10}^{-5}$ to $8\cdot {10}^{-5}$ for which the mean transmission values $E(t)$ are between 0.7 to 0.97 (see Figs. 4 and 5 in Reference [6]). Note that the transmission of the atmosphere is highly wavelength dependent. At wavelength 993 $c{m}^{-1}$, a relatively clear atmospheric transmission, the ground level extinction coefficient is 0.07 $k{m}^{-1}$; for 1053$c{m}^{-1}$, due to ozone absorption, the extinction coefficient is 0.12 $k{m}^{-1}$; for 853 $c{m}^{-1}$, due to water vapor absorption, the ground level extinction coefficient is 0.3 $k{m}^{-1}$.

In our “physical” simulation we choose the foreground transmission for horizontal path of 1 *km* to be lognormally distributed with $E({t}_{f})=0.9$ and $\sqrt{V({t}_{f})}=5\cdot {10}^{-5}$ for which the population parameters $({\mu}_{t},{\sigma}_{t}^{2})$ can be computed from $V(t),E(t)$ with Eq. (A2) to be ${\mu}_{tf}=-0.105$ and ${\sigma}_{tf}^{2}=3.09\cdot {10}^{-9}$. Thus, the foreground transmission is ${t}_{f}~LN({\mu}_{tf},{\sigma}_{tf}^{2})$ and the foreground optical depth is ${\tau}_{f}~N(-{\mu}_{tf},{\sigma}_{tf}^{2})$. The background layer is 9 times longer in pathlength than the foreground layer; population parameters are derived from the relation ${\tau}_{b}=9{\tau}_{f}$. Thus, ${\tau}_{b}~N(-{\mu}_{tb},{\sigma}_{tb}^{2})$ and ${t}_{b}=\mathrm{exp}(-{\tau}_{b})=LN({\mu}_{tb},{\sigma}_{tb}^{2})$ with ${\mu}_{tb}=9{\mu}_{tf}$ and ${\sigma}_{tb}^{2}=81{\sigma}_{tf}^{2}$. Using property (i) of Appendix A, the mean and standard deviation of the background transmission is $E({t}_{b})=0.387$ and $\sqrt{V({t}_{b})}=1.94\cdot {10}^{-4}$. Note that this produces a variablilty for the background transmission that is a little larger (factor of 2.5) than observed in [6]. Truncation is not required for any of the transmission variates in the physical simulation.

The variance of the atmospheric temperature is computed with a turbulence theory [6]. The variance of temperature due to turbulence between two points separated by distance *Δx* is $V(T)={C}_{T}^{2}\Delta {x}^{2/3}$, where ${C}_{T}^{2}$ is the temperature structure constant. ${C}_{T}^{2}$ is proportional to the refractive index structure constant, ${C}_{n}^{2}$ (Eq. (2) in [6] where a typo needs correction: the left-hand side should have been ${C}_{n}^{2}$ and the right-hand side should have been ${C}_{T}^{2}$). Both ${C}_{T}^{2}$ and ${C}_{n}^{2}$ are a function of elevation. The Tatarski model (see [33] Eq. (15).4.35) gives the height dependence of the refractive index structure constant as ${C}_{n}^{2}(h)={C}_{n}^{2}(1){h}^{-4/3}$ where $h>1\text{\hspace{0.17em}}m$ (supported by experiments for *h* up to at least 100*m*). We use a value of ${C}_{n}^{2}(1)={10}^{-13}\text{\hspace{0.17em}}{m}^{-2/3}$ which is characteristic of moderately strong turbulence; the corresponding value of the temperature structure constant (for the 1976 US standard atmosphere [34] with surface temperature of 288*K* and 46% relative humidity) is ${C}_{T}^{2}=0.088\text{\hspace{0.17em}}{K}^{2}{m}^{-2/3}$. Using the Tatarski height dependence and assuming (for convenience) that wind speed $u=1\text{\hspace{0.17em}}m{s}^{-1}$ and sampling rate $dt=1\text{\hspace{0.17em}}s$(and hence $\Delta {x}^{2/3}={(udt)}^{2/3}=1$), the variance of temperature over a slant-path extending from height ${h}_{1}$ above the ground to height ${h}_{2}$ is estimated by $V(T)={\displaystyle {\int}_{h1}^{h2}0.088{h}^{-4/3}dh}=0.264\left({h}_{1}^{-1/3}-{h}_{2}^{-1/3}\right)$. For LOS near the horizon at 1.7° elevation angle, the variance of the foreground layer (1*km* path from ${h}_{1}=1\text{\hspace{0.17em}}m$ to ${h}_{2}=30.7\text{\hspace{0.17em}}m$) is 0.180 *K*^{2} and for the background layer (9*km* path from ${h}_{1}=30.7\text{\hspace{0.17em}}m$ to ${h}_{2}=298\text{\hspace{0.17em}}m$) is 0.0448 *K*^{2}. The variance of the cloud temperature is simply set to 0.088 *K*^{2}. Temperatures in the model are normally distributed, and as explained in 3.1.1, corresponding distributions for the blackbody radiance are very close to lognormal distributions; Table 1 gives population parameters and moments for these quantities.

The radiance from the external source, *L _{s}*, is modeled as a lognormal distribution with $E({L}_{s})=5.15\cdot {10}^{-6}$ and $V({L}_{s})=1.79\cdot {10}^{-15}$. This choice is arbitrary but is consistent with graybody radiation ${L}_{s}={\epsilon}_{s}B({T}_{s})$ with ${T}_{s}~N(278,\text{\hspace{0.17em}}0.088)$ and ${\epsilon}_{s}$ distributed according to a lognormal distribution with $E({\epsilon}_{s})=0.85$ and ${\sigma}_{\epsilon s}^{2}={\sigma}_{BTs}^{2}$; population parameters and moments are given in Table 1.

For the vapor cloud population parameters we used field data from an experiment where a TEP vapor “cloud” was confined to a closed chamber [3] of size $3m\times 3m\times 6m$. In the experiment, concentration-pathlength values inferred from LWIR remote sensing measurements when 1 *cm*^{3} of TEP was vaporized in the chamber were fit with a normal distribution. We extract the mean and standard deviation of the TEP concentration-pathlength ($\rho =56.60$*mg*/*m*^{2} and $\sigma =2.762$ *mg*/*m*^{2}, respectively; see [3], Table 1, $i=4$) and convert the values to optical depth ($\tau =\alpha \rho $, where $\alpha =8.389\cdot {10}^{-4}\text{\hspace{0.17em}}{m}^{2}m{g}^{-1}$is the absorption coefficient at 1049.3 *cm*^{−1}). Thus, the mean optical depth is ${\mu}_{{\tau}_{c}}={\alpha}_{}\times 56.60=0.0475$ with variance ${\sigma}_{\tau c}^{2}={\alpha}^{2}\times {2.762}^{2}=5.369\cdot {10}^{-6}$, which means that the TEP cloud’s transmission in the physical simulation is distributed as ${t}_{c}~LN({\mu}_{tc},{\sigma}_{tc}^{2})$ where ${\mu}_{{t}_{c}}=-{\mu}_{{\tau}_{c}}$ and ${\sigma}_{tc}^{2}={\sigma}_{\tau c}^{2}$. Note that in our model the dimension of the cloud is immaterial, only the optical depth is important.

#### 4.2 Additional simulations

For the second type of simulations (intended to challenge our model) we increase the variance of the temperature of the three layers (${T}_{f},{T}_{c},{T}_{b}$) by ten-fold, ${\sigma}_{Tf}^{2}\to 10{\sigma}_{Tf}^{2}=1.80\text{\hspace{0.17em}}{K}^{2}$, ${\sigma}_{Tc}^{2}\to 10{\sigma}_{Tc}^{2}=0.88\text{\hspace{0.17em}}{K}^{2}$, and ${\sigma}_{Tb}^{2}\to 10{\sigma}_{Tb}^{2}=0.448\text{\hspace{0.17em}}{K}^{2}$. The source radiance is altered by increasing the variance of *T _{s}* by a factor of 100, from $0.088\text{\hspace{0.17em}}{K}^{2}\to 8.8\text{\hspace{0.17em}}{K}^{2}$. The variances of all layer transmissions are set to the large value of 0.01 (standard deviation of 0.1 transmission unit). For “Simulation 1”, the mean transmission value for all layers is set to 0.95 (i.e., optically thin layers), which results in a lognormal distribution with ~29% of values greater than 1 (and thus the effects of truncation are significant). For “Simulation 2”, the mean transmission values for all three layers are set to 0.1, such that each layer is closer to being optically thick. Although less than 0.1% of transmission values are greater than 1 for Simulation 2 (and thus truncation is not significant), the transmission

*prior*to truncation. The actual mean and variance of the

*truncated*lognormal distributions in Simulations 1 and 2 are $E(t)=0.900$, $V(t)=4.20\cdot {10}^{-3}$, and $E(t)=0.0991$, $V(t)=8.89\cdot {10}^{-3}$, respectively.

## 5. Results and discussion

We compare the sampled data histograms of the radiative transfer quantities (Eqs. (1-5)) to those predicted with the Johnson S* _{U}* distribution. To assess how close the two are, we compute a symmetric Kullback-Leibler (KL) distance [35], $KL=\left(KL(\alpha ,\beta )+KL(\beta ,\alpha )\right)/2$. The KL distances are noted in the figure captions. The smaller the distance the better the agreement. The computed value of the KL will depend on how the histograms are computed (e.g., the bin spacing used). To get an idea of the significance of the KL value, we computed the 99th percentile for KL distances between a histogram computed from 10

^{6}samples drawn from

*N*(0,1) and the theoretical density; the result of 0.0013 means that KL values must be greater than this to establish that the two densities are not identical. In Figs. 3 -4 we show the comparison for the physical simulations and in Figs. 5 -8 for the two hypothetical scenarios. In Table 2 we summarize the Johnson S

*population parameters, $\left\{a,b,c,d\right\}$, for the primary model outputs*

_{U}*M*

^{(0)},

*M*

^{(1)}, and $\Delta M$, as well as for $\Delta T$. The

In the first panel of Fig. 3 we show (top row) the layer emission radiances ${L}_{f}={M}_{f}$, ${L}_{b}$, and the cloud layer emission radiance ${L}_{c}=(1-{t}_{c})B({T}_{c})$. In the middle row we show the ${H}_{0}$ at-sensor radiances for the external source and background layers $({M}_{s}^{(0)},{M}_{b}^{(0)})$, and the cloud at-sensor radiance ${M}_{c}$; in the bottom row we show ${H}_{1}$ at-sensor radiance $({M}_{s}^{(1)},{M}_{b}^{(1)})$and the cloud blackbody radiance, $B({T}_{c})$. The layer radiances and at-sensor radiances are modeled with Johnson S* _{U}* statistics and the blackbody $B({T}_{c})$ is modeled with lognormal statistics. Figure 3 shows that the approximation of a blackbody with a lognormal

The agreement between our model and the data for the emission terms and the at-sensor radiances is excellent (KL distances for all RVs in the physical simulation are less than 0.002).

In Fig. 4 (top left panel) we show the *pdf* for the total at-sensor radiance under *H*_{0} and *H*_{1}. In the top right panel we show the *pdf* for the differential radiance $\Delta M$(Eq. (4)). The *pdf*s are in execellent agreement with the data (KL distances for *M*^{(0)}, *M*^{(1)}, and $\Delta M$ are less than 0.002). In the bottom left panel we show the *pdf* for the thermal contrast (in Kelvin) $\Delta T={B}^{-1}\left({L}_{in}\right)-{T}_{c}$ where ${L}_{in}={L}_{b}+{t}_{b}{L}_{s}$. The agreement with the data is excellent, KL distance of 0.0016. The negative $\Delta T$ indicates that the cloud is always in emission mode, ${\int}_{-\infty}^{0}{p}_{\Delta T}(dT)dT}=1$, and the probability for $\Delta T>0$ is zero. Nevertheless, the *pdf* for $\Delta M$ shows that the cumulative probability for $\Delta M<0$ is ${\int}_{-\infty}^{0}{p}_{\Delta M}(dM)dM}=0.13$. Thus, due to the fluctuations in the foreground layer, the cloud that is “physically” in emission mode will appear in the measurements with ${M}^{(1)}<{M}^{(0)}$, about 13% of the time, and the cloud may be erroneously interpreted as a cloud in “absorption” mode. This also has implications towards averaging spectra in order to increase the signal to noise ratio: fluctuations in the polarity of $\Delta M$ means that naively averaging instances of $\Delta M$measured by a sensor will decrease noise but will also decrease the mean (signal). In the bottom right panel we show the performance of detection, with the standard receiver operation characteristics (ROC) curves (the probability of detection, *P _{D}*, as a function of the probability of false alarm,

*P*). The ROC curve can be used to assess the theoretical performance of detecting the cloud due to scene fluctuations.

_{FA}In Figs. 5-8 we show the two hypothetical scenarios of Table 1 with challenging parameters. We first discuss the S* _{U}* fit for several of the intermediate terms in the model to illuminate the limitations of the S

*distribution, and then we show that in spite of these limitations, the S*

_{U}*is an excellent fit for the important quantities that determine performance, namely,*

_{U}*M*

^{(1)},

*M*

^{(0)},$\Delta M$, and $\Delta T$. In Fig. 5 (simulation 1) the KL distances for

*L*,

_{f}*L*,

_{b}*L*, ${M}_{b}^{(0)}$, ${M}_{b}^{(1)}$, and

_{c}*M*are large (~1.3), which indicates poor agreement with the S

_{c}*model. The histograms for these parameters all have a large probability for radiance values near zero, and a discontinuity at zero that reflects the fact that radiance values may not be negative (the shape of the histograms is caused by the severe truncation necessary to model the transmission terms on the interval 0 to 1; many transmission values are close to 1, meaning that the emission values for each layer $\epsilon =1-t$ are near zero). The S*

_{U}*distribution, however, is a continuous distribution defined on the interval $-\infty $ to $\infty $ with no restriction regarding non-negativity. Thus, the S*

_{U}*fits for these terms are poor, and include (nonphysical) negative radiances. At-sensor radiances for the source layer, on the other hand, are reasonably well-fit by the S*

_{U}*distribution (KL distances for ${M}_{s}^{(0)}$ and ${M}_{s}^{(1)}$ are 0.005 and 0.008, respectively). The source layer is not modeled with its own transmission term, and the effect of attenuating the source radiance through truncated lognormal transmission terms (where a majority of values are close to 1) is small.*

_{U}Despite the fact that intermediate model terms for simulation 1 pose problems for the Johnson S* _{U}* fit, the primary model outputs,

*M*

^{(1)},

*M*

^{(0)}, and $\Delta M$—as well as $\Delta T$—are very well fit by the S

*distribution, with KL distances around 0.0013. These quantities, as well as the ROC curve for simulation 1, are shown in Fig. 6 . The ROC curve shows the challenging detection scenario posed by this simulation; despite the large thermal contrast and the same vapor optical depth as the physical simulation, it is only modestly above the 45 degree line (equal probability of false alarm and detection—a “coin-flip” detection scenario). Thus, scene fluctuations are large compared to the difference in average radiance between*

_{U}*H*

_{1}and

*H*

_{0}. In simulation 1 (Fig. 6) the cloud is in emission mode

*nearly always*(the probability of $\Delta T>0$is negligible, around $1\cdot {10}^{-5}$), but nevertheless the cloud appears in the measurements as an “absorbing” cloud, $\Delta M<0$,with large (37%) probability, ${\int}_{-\infty}^{0}{p}_{\Delta M}(dM)dM}=0.37$.

In Fig. 7
(simulation 2), layer radiances *L _{f}*,

*L*, and

_{b}*L*show a reasonable S

_{c}*fit (KL values between 0.03 and 0.06). In each case, the long tail to the left of the S*

_{U}*peak causes a small probability of non-physical negative values (~0.02%). At-sensor radiances ${M}_{b}^{(0)}$ and ${M}_{c}$ are also reasonable (KL values around 0.75) but have a higher probability of negative radiances (3%). Fits for ${M}_{s}^{(0)}$, ${M}_{s}^{(1)}$, and ${M}_{b}^{(1)}$are poor with large KL distances (greater than 2) and significant probabilities of negative radiances (13%, 27%, and 14%, respectively). The amount of truncation for transmission variates is small (<0.1%), thus we do not think the main reason for the poor fits. Regardless of the cause, the S*

_{U}*distribution is just not appropriate to describe the shape (rise and fall) of the histograms. The shape of the S*

_{U}*distribution is controlled by the skewness and kurtosis of the distribution; as shown in Fig. 2 the S*

_{U}*distribution only covers a subset of the possible values. The histograms in Fig. 7 are all in the S*

_{U}*region:*

_{B}*L*,

_{f}*L*,

_{b}*L*, ${M}_{b}^{(0)}$, and ${M}_{c}$ are all relatively close to the lognormal line (~2 units or less) such that the S

_{c}*approximation is still reasonable; ${M}_{s}^{(0)}$, ${M}_{s}^{(1)}$, and ${M}_{b}^{(1)}$are significantly far from the lognormal line (10-100 units away) such that the S*

_{U}*approximations are poor. Evidently, certain types of very sharply peaked distributions that rise very quick onone side with a long tail on the other (as in Fig. 7) are difficult for the S*

_{U}*to represent. Despite the poor fit for some of the intermediate model outputs, Fig. 8 shows excellent agreement between the model and the histograms for*

_{U}*M*

^{(1)},

*M*

^{(0)}, and $\Delta M$. In fact, these histograms

*are actually in the*S

*and KL values are small (≤0.002). For $\Delta T$ the fit is good, though not perfect (KL of 0.008). The discrepancy between the model and the histogram for $\Delta T$ is not due to the validity of the linearization of the inverse blackbody function described in Section 3.2.3, but is due instead to the fact that the*

_{U}region*L*radiance is actually in the S

_{in}*region. The distance to the lognormal line is small, however, and the S*

_{B}*approximation for $\Delta T$ is reasonable.*

_{U}Figure 8 also shows the detection performance (ROC curve) for Simulation 2, which is even poorer than for Simulation 2. The larger optical depth of the three layers in Simulation 2 causes more emission from the cloud layer, but also more attenuation of cloud photons due to the foreground layer. Additionally, the thermal contrast is reduced (compare the magnitude of $\Delta T$ in Figs. 6 and 8) due to the fact that fewer photons from the mountain reach the cloud. In fact, in Simulation 2 the cloud will exhibit both radiation modes: 83% of the time in emission (${\int}_{-\infty}^{0}{p}_{\Delta T}(dT)dT}=0.83$, computed from the histogram), and 17% in absorption (in Simulation 1 the cloud was always in emission). In the measurements, however, the cloud in Simulation 2 will *appear* in emission mode with probability of 47% (${\int}_{-\infty}^{0}{p}_{\Delta M}(dM)dM}=0.47$). Thus, inferences made from $\Delta M$ regarding $\Delta T$ (that are valid in the *physical* model) can be misleading in practice, where variation in the radiative transfer terms causes a mismatch between $\Delta M$ and $\Delta T$.

## 5. Summary

A probability model for a 3-layer radiative transfer model (foreground layer, cloud layer, background layer, and an external source at the end of LOS) has been developed. The 3-layer model is fundamentally important as the primary physical model in passive LWIR remote sensing, but it has not been useful for predicting performance due to clutter (scene variations).

This is (to our knowledge) the first probability model for a 3-layer radiative transfer model where all quantities are allowed to be stochastic and higher order statistics are included. The model provides statistical characterization of at-sensor spectral radiance due to scene fluctuations, that is, it can predict the detection potential in clutter-noise limited scenarios. Note that the probability model could be combined with sensor models and detection algorithms to predict the performance of a particular sensor.

The model is based on computing analytical moments (mean, variance, skewness, and kurtosis) of the output variates based on the moments of the input variates. Our choices for the input statistics are made based on physical theory, and we expect that they can accurately model real stochastic quantities. However, any distribution that has four computable moments can be chosen. Moments for input variates could also be measured empirically.

We assumed Gaussian (normal) statistics for temperature fluctuations, $T~N({\mu}_{T},{\sigma}_{T}^{2})$, which results in the excellent approximation that the Planck blackbody radiance is lognormally distributed, $B(T)~LN({\mu}_{B},{\sigma}_{B}^{2})$. Layer radiances and transmission variates are modeled as lognormal variates, both of which are justifiable based on physical arguments. The lognormal distribution has the nice property that it describes non-negative random variables (it is defined on the domain 0 to ∞), and thus is naturally applicable to describe radiance values. Transmission values are non-negative but also less than or equal to 1; therefore it is sometimes necessary (depending on population parameters) to represent transmission variates by *truncated* lognormal distributions. Other choices are possible (for example, the beta distribution is naturally bounded between 0 and 1 and can describe a wide range of shapes), however, the lognormal distribution is convenient to work with since products and sums of lognormal distributions are themselves lognormally distributed (or approximately so).

In special cases the model predicts that the total at-sensor radiance is lognormally distributed, however the lognormal distribution is not flexible enough to represent the general case. Therefore, we use the Johnson system of distributions to fit the moments of the output variates. There are other systems of curves that may be used for distribution fitting; we chose the Johnson system due to central role the logarithmic distribution plays in the system and the small number of distributions that appear in the system. The Johnson S* _{U}* distribution is a particularly useful and flexible distribution that is appropriate for representing the at-sensor radiance under

*H*

_{0}and

*H*

_{1}hypotheses,

*M*

^{(0)}and

*M*

^{(1)}, respectively, and the differential radiance, $\Delta M={M}^{(1)}-{M}^{(0)}$. These are the primary quantities that a sensor would be able to measure. The thermal contrast, Δ

*T*(see Eqs. (4) and (5)), is also typically well-fit by the S

*distribution. The thermal contrast is of interest as the principle physical quantity that enables LWIR vapor detection. Other intermediate output terms of the model may be well fit with the S*

_{U}*distribution; at times they would be better modeled with the Johnson S*

_{U}*distribution, but method-of-moments estimation is intractable with the S*

_{B}*, which precludes its use in our model. This is a disadvantage to the Johnson system, but in practice it is of little impact since the pimary model outputs are fit so well by the S*

_{B}*distribution, even when the intermediate terms would require the S*

_{U}*to be “optimally” fit.*

_{B}In the traditional (non-stochastic) physical model, $\Delta M$ gives only information about the vapor cloud. In the stochastic model, due to scene fluctuations, Δ*M* includes photons that never interacted with the cloud.

Detection potential may be quantified by ROC curves derived from binary hypothesis testing on the *M*^{(0)} and *M*^{(1)} distributions. Fluctuations (in the temperature of the cloud and in the input radiance on the back of the cloud, *Lin*) can also cause the cloud to be in emission and in absorption at the “same time”, i.e., at a given moment there is a non-zero probability for the cloud to be in absorption and a non-zero probability for the cloud to be in emission. This has potential impacts towards processing spectral time-series data: it may be dangerous to attempt to increase the signal-to-noise ratio (SNR) via averaging spectra, since it would be possible for an emission and absorption spectrum to cancel. It is also possible that due to the fluctuations in the foreground layer, the polarity of $\Delta M$will be reversed, hence, a cloud that is in emission (i.e., the cloud output radiance into the foreground layer is greater than the incident radiance on its back side) may appear in the measurements with $\Delta M<0$. Only with a probability model for radiative transfer one can study these types of phenomena.

In the probability model we implicitly assume that the population parameters for the three layers (and the external source) are stationary (i.e., are fixed) during the measurement period. Our probability model is a univariate model and thus can only address the radiative transfer model at a single wavelength. This is the first step in developing a multivariate model for hyperspectral remote sensing where the covariance between wavelengths plays an important role. The challenge in extending our univariate model to a multivariate model is mainly in extending Johnson S* _{U}* to multivariate statistics where the covariance needs to be computed and estimated from Eq. (1) that is written for vector quantities. This research may lead to new detection algorithms tailored to exploit the Johnson S

*statistics. Our*

_{U}## Appendix A. The lognormal distribution

Let *x* be a normal distribution, $x~N(\mu ,{\sigma}^{2})$, with density

By transformation of variables, $y=\mathrm{exp}(x)$ follows the standard two-parameter lognormal distribution, $y~LN(\mu ,{\sigma}^{2})$, with density

The population parameters $(\mu ,{\sigma}^{2})$ are the mean and variance of the associated normal distribution. We will now present some useful properties of the lognormal distribution.

- (i) Statistical properties (moments)
The raw

*n*^{th}moments of the lognormal distribution are given by:from which the following moments may be derived (using Appendix C) to be

$$\begin{array}{l}E(y)=\mathrm{exp}(\mu +{\scriptscriptstyle \frac{1}{2}}{\sigma}^{2}),\\ V(y)=[\mathrm{exp}({\sigma}^{2})-1]E{(}^{y},\\ S(y)=[\mathrm{exp}({\sigma}^{2})+2]\sqrt{\mathrm{exp}({\sigma}^{2})-1,}\\ K(y)=\mathrm{exp}(4{\sigma}^{2})+2\mathrm{exp}(3{\sigma}^{2})+3\mathrm{exp}(2{\sigma}^{2})-3.\end{array}$$The median of the lognormal distribution is simply $median(y)=\mathrm{exp}(\mu )$, and the mode is given by $\text{mode}(y)=\mathrm{exp}(\mu -{\sigma}^{2})$. Defining Pearsonian coefficients ${\beta}_{1}=S{\left(y\right)}^{2}$ and ${\beta}_{2}=K\left(y\right)$, the equation for the lognormal curve in the

*β*_{1},*β*_{2}plane (see Fig. 2) is given by - (ii) Estimation using the method of moments
The population parameters for a lognormal distribution $y~LN(\mu ,{\sigma}^{2})$ that matches the mean $E(x)$ and variance $V(x)$ of a random variable

*x*, can be found by solving the simultaneous equations $\{E(y)=E(x),\text{\hspace{0.17em}}\text{\hspace{0.17em}}V(y)=V(x)\}$ to be$$\left\{\begin{array}{c}{\sigma}^{2}=\mathrm{ln}\left(1+\frac{V(x)}{E{(x)}^{2}}\right)\\ \mu =\mathrm{ln}\left[E(x)\right]-{\scriptscriptstyle \frac{1}{2}}{\sigma}^{2}\end{array}\right\}.$$Note that when attempting to fit a distribution to sampled data, other methods are usually preferred to the method of moments (maximum likelihood, for example) due to higher efficiencies [28]. Hybrid techniques may also be used, for example, replacing the mean by the median in the method of moments gives

$$\left\{\begin{array}{c}{\sigma}^{2}=\mathrm{ln}\left(\frac{1}{2}+\frac{\sqrt{median{(x)}^{2}+4V(x)}}{2\text{\hspace{0.17em}}median(x)}\right)\\ \mu =\mathrm{ln}\left[median(x)\right]\end{array}\right\}.$$However, for fitting theoretical distributions, where the moments $E(x)$ and $V(x)$ are known exactly, there are no concerns in using the method of moments.

In the properties below, let ${y}_{i}~LN({\mu}_{i},{\sigma}_{i}^{2})$ be independent lognormal variates (i.e., $E({y}_{i}^{n}{y}_{j}^{k})=E({y}_{i}^{n})E({y}_{j}^{k})$) and $c$ a positive constant. We have analytically the following properties:

- (iii) Scaling: $z=cy~LN(c+\mu ,{\sigma}^{2}),$
- (iv) Inverse: $z=1/y=\mathrm{exp}(-x)~LN(-\mu ,{\sigma}^{2}),$
- (v) Multiplication: $z={\displaystyle \prod {y}_{i}}~LN({\displaystyle \sum {\mu}_{i}},{\displaystyle \sum {\sigma}_{i}^{2}}),$
- (vi) Division: $z={y}_{1}/{y}_{2}~LN({\mu}_{1}-{\mu}_{2},{\sigma}_{1}^{2}+{\sigma}_{2}^{2}).$
The following two properties are approximately true, where in property (viii) the independence assumption is dropped:

- (vii) Addition (approximation with Fenton-Wilkinson [19], ):$$\begin{array}{c}z={\displaystyle \sum _{i}{y}_{i}}~LN({\mu}_{z},{\sigma}_{z}^{2}),\\ {\mu}_{z}=\mathrm{ln}\left({\displaystyle \sum _{i}E({y}_{i})}\right)-{\scriptscriptstyle \frac{1}{2}}{\sigma}_{z}^{2},\\ {\sigma}_{z}^{2}=\mathrm{ln}\left(1+\frac{{\displaystyle \sum _{i}V({y}_{i})}}{{\left({\displaystyle \sum _{i}E({y}_{i})}\right)}^{2}}\right).\end{array}$$
For two lognormal variables $z={y}_{1}+{y}_{2}$ it was stated [20,21] that Fenton-Wilkinson moment-matching method breaks down if ${\sigma}_{y}<4dB$ for either

*y*_{1}or*y*_{2}. In terms of signal to noise ratio, defined as $SN{R}_{y}=\sqrt{E{(y)}^{2}/V(y)}$, this restriction can be shown (by solving ${\sigma}_{y}=\mathrm{ln}(1+SN{R}_{y}^{-2})<4dB$) to give the condition that a minimum$SN{R}_{y}>0.3$ is required for the Fenton-Wilkinson approximation to hold. This property may be combined with (iii) and (v) to show that a linear combination of lognormal variates (with positive coefficients) is lognormally distributed, as long as the*SNR*restriction is met. - (viii) Addition of correlated lognormals
For two lognormals ${y}_{1}~LN({\mu}_{1},{\sigma}_{1}^{2})$ and ${y}_{2}~LN({\mu}_{2},{\sigma}_{2}^{2})$ that are correlated with correlation coefficient

$${\rho}_{12}=corr({y}_{1,}{y}_{2})=\frac{E({y}_{1}{y}_{2})-E({y}_{1})E({y}_{2})}{\sqrt{V({y}_{1})V({y}_{2})}},$$we follow the spirit of Fenton-Wilkinson and use the method of moments—property (ii)—to find $z={y}_{1}+{y}_{2}~LN({\mu}_{z},{\sigma}_{z}^{2})$ such that $E(z)=E({y}_{1})+E({y}_{2})$ and $V(z)=V({y}_{1})+V({y}_{2})+2{\rho}_{12}\sqrt{V({y}_{1})V({y}_{2})}$ are matched exactly. A sum of three correlated lognormals with correlations ${\rho}_{12}$, ${\rho}_{13}$, and ${\rho}_{23}$ is easily computed in the same manner:

$$\begin{array}{c}z={y}_{1}+{y}_{2}+{y}_{3}~LN({\mu}_{z},{\sigma}_{z}^{2}),\\ E(z)=E({y}_{1})+E({y}_{2})+E({y}_{3}),\end{array}$$and

- (ix) Moments of truncated lognormal
For a lognormal variate $y~LN(\mu ,{\sigma}^{2})$ that is truncated to be within $0\le y\le 1$ (e.g., when

*y*is a transmission that may not be greater than 1) the*n*^{th}raw moment may be written in terms of the untruncated moments:$$E({y}^{n})=\frac{erfc(\frac{\mu +n{\sigma}^{2}}{\sqrt{2{\sigma}^{2}}})}{erfc(\frac{\mu}{\sqrt{2{\sigma}^{2}}})}\mathrm{exp}(n\mu +{\scriptscriptstyle \frac{1}{2}}{n}^{2}{\sigma}^{2})$$where $erfc(\cdot )=1-erf(\cdot )$ is the complementary error function. If the associated normal parameters $(\mu ,{\sigma}^{2})$ are such that the probability Pr(y>1) is negligible (i.e., no truncation is required), the ratio

$$\frac{erfc(\frac{\mu +n{\sigma}^{2}}{\sqrt{2{\sigma}^{2}}})}{erfc(\frac{\mu}{\sqrt{2{\sigma}^{2}}})}\to 1,$$and the expression for untruncated raw moments is recovered.

- (x) Central limit theorem for multiplicative processes
Consider a product of independent non-negative variates, $y={\displaystyle \prod {z}_{i}}$. Taking the logarithm of this expression leads to a sum of (real-valued) independent variates, $x=\mathrm{ln}y={\displaystyle \sum \mathrm{ln}{z}_{i}}$. If, by the central limit theorem $x\to N(\mu ,{\sigma}^{2})$, then the relationship $y=\mathrm{exp}(x)$ immediately implies that $y\to LN(\mu ,{\sigma}^{2})$.

## Appendix B. The Johnson S_{U} distribution

_{U}

The S* _{U}* distribution is based on the following transformation of a normally distributed variate $x~N(0,1)$:

where the parameters *a* and $b>0$ are location and scale parameters, respectively, and *c* and $d>0$ are shape parameters. The density for $y~{S}_{U}(a,b,c,d)$ is given by

The S* _{U}* distribution can describe the lognormal and normal distributions as limiting cases: ${S}_{U}\to {S}_{L}$ when $c\to \pm \infty $, and ${S}_{U}\to N$ when $d\to \infty $. The S

*distribution is symmetric (skewness is zero) when $c=0$. Some useful properties of the S*

_{U}*distribution are presented below.*

_{U}- (i) Statistical properties (moments)
The

*n*^{th}moments around arbitrary location $\theta $ are given by$$E\left[{(y-\theta )}^{n}\right]={\displaystyle \sum _{j=0}^{n}\left(\begin{array}{c}n\\ j\end{array}\right){\left(-\frac{b}{2}\right)}^{j}{\left(a-\theta \right)}^{n-j}}{\displaystyle \sum _{r=0}^{j}\left(\begin{array}{c}j\\ r\end{array}\right){\left(-1\right)}^{r}{e}^{(j-2r)\Omega}{\omega}^{{(j-2r)}^{2}}}$$where $\omega =\mathrm{exp}({2}^{-1}{d}^{-2})$ and $\Omega =c{d}^{-1}$. Raw moments may be found from this equation by setting $\theta =0$, central moments are found by setting $\theta =E\left(y\right)$. Using Appendix C, the following moments may be derived:

$$\left\{\begin{array}{c}E=a-b\omega \mathrm{sinh}(\Omega ),\\ V={\scriptscriptstyle \frac{1}{2}}{b}^{2}({\omega}^{2}-1)(1+{\omega}^{2}\mathrm{cosh}2\Omega ),\\ S=-\omega {\left(\frac{{\omega}^{2}-1}{2}\right)}^{1/2}\frac{{\omega}^{2}({\omega}^{2}+2)\mathrm{sinh}3\Omega +3\mathrm{sinh}\Omega}{{\left(1+{\omega}^{2}\mathrm{cosh}2\Omega \right)}^{3/2}},\\ K={\scriptscriptstyle \frac{1}{2}}\frac{{\omega}^{4}({\omega}^{8}+2{\omega}^{6}+3{\omega}^{4}-3)\mathrm{cosh}4\Omega +4{\omega}^{4}({\omega}^{2}+2)\mathrm{cosh}2\Omega +3(2{\omega}^{2}+1)}{{\left(1+{\omega}^{2}\mathrm{cosh}2\Omega \right)}^{2}}\end{array}\right\}.$$Note that references [28, p42, Eq. (2).62] and [18, Eq. (6).74] both have errors in their expressions for moments of the S

distribution. In the_{U}*β*_{1},*β*_{2}plane, the Sdistribution defines an unbounded region “below” the lognormal curve (see Fig. 2)._{U} - (ii) Estimation using method of moments
We follow the iterative numerical procedure suggested by Elderton and Johnson [36] for finding the $(c,d)$ parameters that match the desired ${\beta}_{1}={S}^{2}$ and ${\beta}_{2}=K$ coordinates of the distribution to be fit (in our model these are computed analytically; for fitting sampled data alternative methods such as quantiles [37] may be preferred). For the S

distribution the sign of the skewness is determined by the sign of_{U}*c*($c<0$ means $S>0$), but because the fitting routine operates on*β*_{1}(thereby losing the sign of the skewness), the computed value of*c*will always be non-positive. Thus,*c*should be multiplied by $-1$ if the skewness to be fit was negative. Given values for*c*and*d*, it is then straightforward to determine the value of*b*that matches the variance, and then the value of*a*that matches the mean. This is the same procedure used in [38].If the distribution to be fit is in the S

region or on the lognormal curve, then the first four moments will be matched exactly by this fitting procedure. If the distribution to be fit lies in the S_{U}region (above the lognormal curve in Fig. 2) then the skewness and kurtosis will not match exactly. To ensure that the “best” possible S_{B}approximation is returned by the iterative procedure, we first find the point on the lognormal curve with the smallest Euclidean distance in the_{U}*β*_{1},*β*_{2}plane to the Spoint to be fit. This can be found using Newton’s method [39] to minimize the distance $D=\sqrt{{({\beta}_{1}-{S}^{2})}^{2}+{({\beta}_{2}-K)}^{2}}$ where_{B}*β*_{1}and*β*_{2}are given by Eq. (A1) and*S*^{2}and*K*are the coordinates of the point in the Sregion._{B}*D*is only a function of*σ*; the solution value*σ*is plugged back into Eq. (A1) to find the coordinates on the lognormal curve closest to the S_{min}point. Then, applying Elderton and Johnson’s routine to this point finds population parameters {_{B}*a*,*b*,*c*,*d*} corresponding to the optimal Sapproximation (and is numerically equivalent the best possible S_{U}approximation). This procedure is necessary to return the optimal S_{L}fit because the Elderton and Johnson procedure approaches the solution while preserving the correct value of_{U}*β*_{2}. Thus, it would return an Sapproximation that matched the mean, variance, and kurtosis, but not the skewness. The optimal solution accepts some error in the kurtosis in order to reduce the error in the skewness to achieve minimum total error._{U}Note that there is a non-uniqueness in approximating a point on the lognormal curve with the S

distribution. Since the lognormal is a limiting form of the S_{U}, the S_{U}can get indistinguishably close to the lognormal distribution without actually reaching it; for a given tolerance there will be a circular arc in the S_{U}region centered on the lognormal point in the_{U}*β*_{1},*β*_{2}plane defining particular Sdistributions that are equidistant to the lognormal point (and therefore have the same value for the total error and are thus “equally good” at approximating the lognormal point). In practice, however, for a fixed tolerance, using the iterative fitting routine will always produce the same set of S_{U}population parameters since the point is approached from a specific direction (along a line of constant_{U}*β*_{2}). - (iii) Translation and Scaling:
Let

*k*and*λ*be constants and $y~{S}_{U}(a,b,c,d)$. Then, it can be shown that $z=ky+\lambda ~{S}_{U}\left(ka+\lambda ,\text{\hspace{0.17em}}\left|k\right|b,\text{\hspace{0.17em}}\mathrm{sgn}(k)c,\text{\hspace{0.17em}}d\right)$. The absolute value and signum (sign) functions are needed because by convention,*b*and*d*are always nonnegative. In (B1), since*sinh*(·) is an odd function and*x*is a zero-mean symmetric distribution, the following transformations are all equivalent, $-\mathrm{sinh}(x-c)=\mathrm{sinh}(-x+c)=\mathrm{sinh}(x+c)$. Therefore there is an invariance in the Johnson Sdistribution to simultaneously flipping the sign of the_{U}*c*and*b*parameters, and no loss in generality in always taking*b*to be positive.

## Appendix C. [*E*, *V*, *S*, *K*] as a function of moments

Let *z* be a random variable whose raw moments $E({z}^{n})$ exist for *n* =1, 2, 3, 4. Then the mean, variance, skewness, and kurtosis may be written as a function of the raw moments as

respectively. These expressions may be derived using the properties of the expectation operator, which are summarized in Appendix D.

## Appendix D. Properties of the expectation operator

The definition of the expectation operator is

where *p _{x}*(

*x*) is the

*x*, and the integral is computed over the entire domain of

*x*. It follows directly from the definition that $E(kx+\lambda )=kE(x)+\lambda $ where

*k*and

*λ*are constants. It can also be established, regardless of independence between variates

*x*and

*y*, that $E[f(x)+g(y)]=E[f(x)]+E[g(y)]$. Additionally, if

*x*and

*y*are independent—such that the joint probability density ${p}_{x,y}(x,y)={p}_{x}(x){p}_{y}(y)$—then $E[f(x)g(y)]=E[f(x)]E[g(y)]$.

For convenience we give the moments of $z=x\pm y$ and of $z=xy$ (with *x* and *y* independent), which are frequently used in deriving moments of the model outputs. The moments of $z=x\pm y$ (for example, $\Delta M={M}^{(1)}-{M}^{(0)}$ and $\Delta T={B}^{-1}\left({L}_{in}\right)-{T}_{c}$) are given by

For emissivity terms $\epsilon =(1-t)$ that are of the form $z=1-y$, (D1) may be used with $E({x}^{n})=1$. The moments of $z=xy$ (with *x* and *y* independent) are given by

## References and links

**1. **M. L. Polak, J. L. Hall, and K. C. Herr, “Passive Fourier-transform infrared spectroscopy of chemical plumes: an algorithm for quantitative interpretation and real-time background removal,” Appl. Opt. **34**(24), 5406–5412 (1995). [CrossRef] [PubMed]

**2. **D. F. Flanigan, “Prediction of the limits of detection of hazardous vapors by passive infrared with the use of modtran,” Appl. Opt. **35**(30), 6090–6098 (1996). [CrossRef] [PubMed]

**3. **A. Ben-David and H. Ren, “Detection, identification, and estimation of biological aerosols and vapors with a Fourier-transform infrared spectrometer,” Appl. Opt. **42**(24), 4887–4900 (2003). [CrossRef] [PubMed]

**4. **A. Ben-David, C. E. Davidson, and J. F. Embury, “Radiative transfer model for aerosols at infrared wavelengths for passive remote sensing applications: revisited,” Appl. Opt. **47**(31), 5924–5937 (2008). [CrossRef] [PubMed]

**5. **D. Manolakis, “Signal processing algorithms for hyperspectral remote sensing of chemical plumes”, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1857 – 1860, Digital Object Identifier: 10.1109/ICASSP.2008.4517995 (2008). [CrossRef]

**6. **A. Ifarraguerri and A. Ben-David, “Impact of atmospheric boundary layer turbulent temperature fluctuations on remote detection of vapors by passive infrared spectroscopy,” Opt. Express **16**(22), 17366–17382 (2008). [CrossRef] [PubMed]

**7. **A. Ben-David, “Remote detection of biological aerosols at a distance of 3 km with a passive Fourier transform infrared (FTIR) sensor,” Opt. Express **11**(5), 418–429 (2003). [CrossRef] [PubMed]

**8. **H. C. van de Hulst, *Multiple Light Scattering Tables, Formulas and Application* (Academic, 1980).

**9. **K. N. Liou, *An Introduction to Atmospheric Radiation*, 2nd ed. (Academic, 2002).

**10. **G. L. Stephens, P. M. Gabriel, and S.-C. Tsay, “Statistical radiative transport in one-dimensional media and its application to the terrestrial atmosphere,” Transp. Theory Stat. Phys. **20**(2-3), 139–175 (1991). [CrossRef]

**11. **J. Kerekes and J. Baum, “Full spectrum spectral imaging system analytical model,” IEEE Trans. Geosci. Rem. Sens. **43**(3), 571–580 (2005). [CrossRef]

**12. **C. Quang, F. Dalaudier, A. Roblin, V. Rialland, and P. Chervet, “Statistical model for atmospheric limb radiance structure: application to airborne infrared surveillance systems,” Proc. SPIE **7108**, 710805, 710805-9 (2008). [CrossRef]

**13. **A. Ishimaru, *Wave Propagation and Scattering in Random Media* (Academic, 1978).

**14. **N. Byrne, “3D Radiative transfer in stochastic media,” in *3D Radiative Transfer in Cloudy Atmospheres*, A. Marshak and A. B. Davis, eds. (Springer, 2005).

**15. **D. E. Lane-Veron and C. J. Somerville, “Stochastic theory of radiative transfer through generalized cloud fields,” J. Geophys. Res. **109**(D18), D18113 (2004). [CrossRef]

**16. **J. Aitchison and J. A. C. Brown, *The Lognormal Distribution* (Cambridge University Press, 1957).

**17. **E. Limpert, W. A. Stahel, and M. Abbt, “Lognormal distributions across the sciences: keys and clues,” Bioscience **51**(5), 341–352 (2001). [CrossRef]

**18. **N. L. Johnson, S. Kotz, and N. Balakrishnan, *Continuous univariate distributions*, 2nd ed. (Wiley, 1994), Vol. 1.

**19. **L. F. Fenton, “The sum of lognormal probability distributions in scatter transmission systems,” IRE Trans. Commun. Syst. **CS-8**, 57–67 (1960).

**20. **G. L. Stuber, *Principles of Mobile Communications* (Kluwer Academic Publishers, 1996).

**21. **N. B. Mehta, J. Wu, A. F. Molisch, and J. Zhang, “Approximating a sum of random variables with a lognormal,” IEEE Trans. Wirel. Comm. **6**(7), 2690–2699 (2007). [CrossRef]

**22. **D. Manolakis, D. Marden, and G. A. Shaw, “Hyperspectral image processing for automatic target detection applications,” Lincoln Lab. J. **14**, 79–116 (2003).

**23. **A. Schaum, “Adapting to change: the CFAR problem in advanced hyperspectral detection,” in Proceedings of the 36th Applied Imagery Pattern Recognition Workshop (IEEE Computer Society Washington, DC, 2007), pp. 15–21.

**24. **D. Manolakis, R. Lockwood, T. Cooley, and J. Jacobson, “Is there a best hyperspectral detection algorithm?” Proc. SPIE **7334**, 733402, 733402-16 (2009). [CrossRef]

**25. **H. Ren, “Anomaly detection in hyperspectral imagery: second and higher-order statistics-based algorithms,” in *Recent Advances in Hyperspectral Signal And Image Processing*, C–I Chang, ed. (Transworld Research Network, 2006), Chap. 12.

**26. **A. Stuart and K. Ord, *Kendall’s Advanced Theory of Statistics: Distribution Theory,* 6th ed. (Oxford University, 2007), Vol. 1.

**27. **N. L. Johnson, “Systems of frequency curves generated by methods of translation,” Biometrika **36**(1-2), 149–176 (1949). [CrossRef] [PubMed]

**28. **J. K. Ord, *Families of Frequency Distributions* (Griffin, 1972), Chap. 2.

**29. **S. M. Kay, *Fundamentals of statistical signal processing: detection theory* (Prentice Hall PTR, 1998), Vol. II.

**30. **A. Ben-David, S. K. Holland, G. Laufer, and J. D. Baker, “Measurements of atmospheric brightness temperature fluctuations and their implications on passive remote sensing,” Opt. Express **13**(22), 8781–8800 (2005). [CrossRef] [PubMed]

**31. **J. H. Seinfeld and S. N. Pandis, *Atmospheric Chemistry and Physics* (Wiley, 1998).

**32. **S. Chandrasekhar, *Radiative Transfer* (Dover, 1960).

**33. **N. S. Kopeika, *A System Engineering Approach to Imaging* (SPIE Optical Engineering Press, 1998), Chap. 15.

**34. **A. Berk, G. P. Anderson, P. K. Acharya, L. S. Bernstein, L. Muratov, J. Lee, M. J. Fox, S. M. Adler-Golden, J. H. Chetwynd, M. L. Hoke, R. B. Lockwood, T. W. Cooley, and J. A. Gardner, “MODTRAN5: A Reformulated Atmospheric Band Model with Auxiliary Species and Practical Multiple Scattering Options,” Proc. SPIE **5655**, 88–95 (2005). [CrossRef]

**35. **R. O. Duda, P. E. Hart, and D. G. Stork, *Pattern Classification,* 2nd ed. (Wiley, 2001).

**36. **W. P. Elderton and N. L. Johnson, *Systems of Frequency Curves* (Cambridge University Press, 1969).

**37. **R. E. Wheeler, “Quantile estimators of Johnson curve parameters,” Biometrika **67**(3), 725–728 (1980). [CrossRef]

**38. **I. D. Hill, R. Hill, and R. L. Holder, “Algorithm AS99: fitting Johnson curves by moments,” J. R. Stat. Soc., Ser. C, Appl. Stat. **25**(2), 180–189 (1976).

**39. **F. S. Acton, *Numerical Methods That Work* (The Mathematical Association of America, 1990).