## Abstract

Electro-optical target acquisition models predict the probability that a human observer recognizes or identifies a target. To accurately model targeting performance, the impact of imager blur and noise on human vision must be quantified. In the most widely used target acquisition models, human vision is treated as a “black box” that is characterized by its signal transfer response and detection thresholds. This paper describes an engineering model of observer vision. Characteristics of the observer model are compared to psychophysical data. This paper also describes how to integrate the observer model into both reflected light and thermal sensor models.

© 2009 OSA

## 1. Introduction

In the most widely used target acquisition models, the observer is treated as a “black box.” The observer is represented by Fourier domain characteristics like a transfer response and frequency-dependent detection thresholds. The U.S. Army currently uses this type of target acquisition model [1,2]. Published material supports the ability of these models to predict target recognition and identification probabilities [3–9]. These models also accurately predict minimum resolvable temperature (MRT) and minimum resolvable contrast (MRC) [10]. However, previous publications do not compare the observer vision model used in [3–10] directly to psychophysical data. The validity of the vision model is inferred from the success of the target acquisition model. This paper compares predictions of the engineering vision model to vision data.

The observer model used in [3–10] represents a departure from previous assumptions about the effect of noise on visual detection thresholds. Other target acquisition models are based on the conclusions of van Meeteren and Valeton [11], whereas the current Army models are based on [12,13]. The two observer models are described and contrasted.

Integrating the observer model into the target acquisition models is also described. Imagers of reflected light are often modeled differently than thermal imagers. For reflected light imagers, signal and noise calculations are based on integrating the detector electron flux. For thermal imagers, noise calculations are often based on detectivity. That is, imager noise is specified by a noise equivalent power (NEP). Both treatments are valid and lead to equivalent performance predictions. However, predicting human performance when the observer is using an imager requires that internal eye noise be root sum squared (RSS) with imager noise. The different procedure for implementing the RSS in reflected light and thermal imagers is described.

Section 2 provides background for the psychophysical data discussion. This section describes the contrast threshold function (CTF). The inverse of CTF is called the contrast sensitivity function (CSF). The dependence of the target acquisition model on observer characteristics is also described in Section 2. Section 3 describes the observer model. Section 3 also contrasts the current observer model to widely used alternatives that are based on the conclusions of van Meeteren and Valeton [11]. Section 4 describes how the observer vision model is integrated into target acquisition models. Section 5 compares observer characteristics to psychophysical data. Conclusions are in Section 6.

## 2. Background

Before describing the observer vision model, a brief summary of the target acquisition model is presented. The summary illustrates the importance of the observer model in making accurate performance predictions. This section also describes naked eye CTF measurement. CTF is a common way of quantifying visual performance. The CTF concept is critical to understanding the performance model.

#### 2.1 Contrast Threshold Function

CTF and its inverse CSF quantify the spatial frequency response of human vision. A sine wave pattern is presented to an observer, and a response is solicited as to whether the sine wave is visible. In Fig. 1, the observer is viewing a sine-wave pattern. While holding average luminance to the eye constant, the contrast of the bar pattern is lowered until no longer visible to the observer. That is, the dark bars are lightened and the light bars darkened, holding the average constant, until the bar-space-bar pattern disappears. A decrease in contrast from left to right is shown at top right in the figure. The goal of the experiment is to measure the amplitude of the sine wave that is just visible to the observer.

Although experimental practice varies, one procedure is described in order to fully explain the CTF concept. CTF data is sometimes taken using two-alternative forced choice (2afc) experiments. In these experiments, the observer is shown one blank field and one with the sine wave. The observer must choose which field has the sine wave. These experiments measure the sine wave threshold where the observer chooses correctly half the time independent of chance. That is, the 2afc experiment provides the threshold which yields a 0.75 probability of correct choice. The procedure is repeated for various bar spacings, that is, for various spatial frequencies. See the bottom right of Fig. 1 for an illustration of spatial frequency; high spatial frequency is at the left, lower spatial frequency to the right. The curve of threshold contrast versus spatial frequency for each display luminance is called the CTF at that luminance.

Note that contrast threshold is the sine wave amplitude at which the observer is correct half of the time independent of chance. There is a finite probability of seeing the sine wave at reduced contrast, and there is some chance of not seeing the sine wave at contrasts above threshold. The function δ(C/CTF) describes the probability of seeing a signal with contrast C when eye threshold is CTF. The function δ is available from published data.
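The paper does not reproduce a closed form for δ in this excerpt. One common way to model such a psychometric function is a Weibull-style curve pinned so that the probability of seeing is 0.5 at threshold (C = CTF). The sketch below is illustrative only; the steepness `beta` is a hypothetical value, not a parameter from the paper.

```python
import math

def p_seen(c_over_ctf, beta=2.0):
    """Illustrative psychometric function delta(C/CTF).

    Probability of seeing a sine wave of contrast C given threshold CTF.
    Pinned so p_seen(1.0) == 0.5, matching the definition of threshold as
    the 50% point independent of chance.  beta is a hypothetical steepness,
    not a value from the paper.
    """
    if c_over_ctf <= 0.0:
        return 0.0
    # 1 - 2**(-(C/CTF)**beta) equals 0.5 exactly at C/CTF = 1.
    return 1.0 - 2.0 ** (-(c_over_ctf ** beta))

def p_correct_2afc(c_over_ctf, beta=2.0):
    """Probability of a correct 2afc choice: chance (0.5) plus half the
    probability the grating is actually seen."""
    return 0.5 + 0.5 * p_seen(c_over_ctf, beta)
```

Note that at threshold this chance-corrected form reproduces the 0.75 probability of correct choice described in Section 2.1.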

#### 2.2 Target acquisition model

The current Army target acquisition model is based on the Targeting Task Performance (TTP) image quality metric (IQM). The TTP is a member of a generic class of IQM called modulation transfer function (MTF) based metrics [14]. The values of different MTF based IQM are calculated by varying *n* and *r* in Eq. (1). The TTP IQM is based on finding the best *n* and *r* in Eqs. (1) and (2) to match predictions to measured probabilities from target identification experiments [3]. Experimental PID data are best matched by *n* equal to one and *r* equal to zero. Equation (1) with *n* equal to one and *r* equal to zero defines the TTP IQM.

- PID = probability of correct identification
- *Φ* = value of TTP metric
- *ξ* = horizontal spatial frequency in cycles per milliradian (mrad^{−1})
- *η* = vertical spatial frequency in mrad^{−1}
- *CTF _{sys}* = CTF when observer views sine waves through imager
- *C _{tgt}* = Fourier transform of target modulation contrast on the display
- *δ(C _{tgt}/CTF _{sys})* = probability of seeing contrast *C _{tgt}* given threshold *CTF _{sys}*
- *R _{ng}* = target range in kilometers
- *Φ84* = *Φ* needed to achieve PID of 0.84

The difficulty of identifying one member of a target set depends on how much that target looks like other members of the set. *Φ84* is the value of *Φ* that results in 0.84 probability of task performance. *Φ84* is determined empirically for each target set. *Φ84* is different for reflected light and thermal imagers. This is because the visual cues are different between those spectral bands. However, once *Φ84* is known for a target set, the PID for any imager operating in the same spectral band can be predicted. The PID is a function of the ratio *Φ* to *Φ84*.
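Equation (2) itself is not reproduced in this excerpt. A commonly published form of the target transfer probability function for the TTP metric is a logistic in the ratio of the metric to its 0.5-probability point; the sketch below adopts that form and rescales it so that *Φ* = *Φ84* yields PID = 0.84. The exponent expression and the rescaling are assumptions for illustration, not necessarily the paper's Eq. (2).

```python
def ttpf(v_over_v50):
    """Logistic target transfer probability function, in the form commonly
    published for the TTP metric (ratio referenced to the 0.5 point)."""
    e = 1.51 + 0.24 * v_over_v50
    x = v_over_v50 ** e
    return x / (1.0 + x)

def _ratio84():
    """Find the metric ratio that yields 0.84 probability, by bisection."""
    lo, hi = 1.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if ttpf(mid) < 0.84:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def pid(phi, phi84):
    """PID as a function of the ratio Phi/Phi84, pinned to 0.84 at Phi84."""
    return ttpf(phi / phi84 * _ratio84())
```

By construction, `pid(phi84, phi84)` returns 0.84, matching the empirical definition of *Φ84*.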

To predict PID versus range, the following procedure is used. *C _{tgt}* is found through the application of radiometric models. The naked eye CTF is degraded by imager blur and noise to establish *CTF _{sys}*. *Φ* is found at each range by a numerical integration corresponding to Eq. (1). PID is predicted using *Φ* and a known *Φ84* in Eq. (2).

To predict the probability of identifying a specific target, Eq. (1) is used with the Fourier transform of the targeted object. *C _{tgt}(ξ,η,range)* is range dependent because *ξ* and *η* describe angular frequencies at the imager. The target becomes angularly smaller as the range between target and imager increases. The Fourier transform of the target is used in the specific object (SO) form of the target acquisition model [4,5].

In the widely used and distributed detect, recognize, and identify (DRI) models, however, the assumption of constant target contrast is made. That is, in the frequency domain, *C _{tgt}* has the same amplitude at all spatial frequencies. Further, *C _{tgt}* is range independent except for the effect of atmosphere. The atmosphere makes *C _{tgt}* range and weather dependent. However, *C _{tgt}* is not associated with any target structure or size.

Using a constant *C _{tgt}* greatly simplifies the model while retaining the utility of the model for imager design purposes. Predicting the probability of identifying a specific target is not possible using the DRI model. However, the DRI model provides the following capabilities. (a) The DRI model predicts the average probability of identifying objects in a sufficiently diverse target set. (b) Since the “target” has frequency content at all spatial frequencies, good imager frequency transfer response is rewarded. Specifying good DRI performance is similar to specifying good optics MTF. A small optical blur and low imager noise results in improved DRI performance. (c) Most importantly, the DRI model provides a means of optimizing imager design for human viewing. The model quantifies the impact of imager design decisions on human targeting performance.

To further simplify the Army DRI models, separability in Cartesian coordinates is assumed. This is possible because the Fourier transform of the target is not used. Equation (1) becomes Eq. (3).

One problem with implementing Eq. (3) is that the CTF of the eye is not separable. Nonetheless, the separable model is pursued by suggesting that the geometric mean of horizontal and vertical eye CTF provides a reasonable representation of two dimensional performance.
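Under the separability assumption, the two-dimensional metric can be computed from one-dimensional integrals in *ξ* and *η*, combined by a geometric mean. The sketch below uses the square-root integrand published for the TTP metric, integrated over frequencies where target contrast exceeds threshold; how that integrand maps onto the *n*, *r* parameterization of Eq. (1) is not shown in this excerpt, and the sample CTF curves are hypothetical.

```python
import math

def ttp_1d(c_tgt, ctf_sys, freqs):
    """One-dimensional TTP-style integral: trapezoid-rule integration of
    sqrt(C_tgt/CTF_sys) over frequencies where contrast exceeds threshold."""
    vals = [math.sqrt(c_tgt / ctf_sys(f)) if c_tgt > ctf_sys(f) else 0.0
            for f in freqs]
    total = 0.0
    for i in range(1, len(freqs)):
        total += 0.5 * (vals[i - 1] + vals[i]) * (freqs[i] - freqs[i - 1])
    return total

# Hypothetical horizontal and vertical system CTFs (thresholds rise with
# frequency); real models substitute CTF_sys from Eq. (14).
def ctf_h(f):
    return 0.02 * math.exp(1.5 * f)

def ctf_v(f):
    return 0.02 * math.exp(1.8 * f)

freqs = [0.01 * k for k in range(1, 301)]   # 0.01 to 3 cycles/mrad
c_tgt = 0.2                                 # constant DRI target contrast

# Separable model: geometric mean of the horizontal and vertical integrals.
phi = math.sqrt(ttp_1d(c_tgt, ctf_h, freqs) * ttp_1d(c_tgt, ctf_v, freqs))
```

The geometric mean guarantees the combined metric lies between the horizontal and vertical one-dimensional values.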

Imager blur and noise are introduced into Eqs. (1) and (3) through CTF_{sys}. That is, blur and noise degrade human vision. Accurately quantifying the effect of blur and noise on CTF is a critical step in modeling targeting performance.

## 3. Observer Model

A simple, engineering model of the eye and visual cortex is shown in Fig. 2. This figure shows the MTF associated with the eyeball and visual cortex. The figure also shows points where noise is injected into the visual signal. Based on the experiments of Nagaraja [15] and others [16–19], the effect of noise can be explained by assuming the brain takes the RSS of display noise and some internal eye noise. Further, for display luminance above the de Vries-Rose law region and for foveated and fixated targets, the RSS of eye and display noises occurs in the visual cortex [19].

In Eq. (5), *n _{eye}* is cortical noise and *σ* is display noise filtered by the eyeball and visual cortex bandpass MTF. *σ* is also appropriately scaled in amplitude. Using Weber’s Law, eye noise is proportional to display luminance, as expressed in Eq. (6), where α is an empirically established calibration factor. Once α is established through experiments, all the parameters in Eq. (6) are known or measurable, and CTF _{sys} is calculated.

Although α is established empirically, the same value is used consistently for all types of imagers and to predict target acquisition probabilities and bar pattern thresholds. That is, the same value of α is used here as in [3–10,12,13]. It is true that psychophysical data like eyeball MTF and naked eye CTF vary between observers. Adjusting model predictions based on known observer characteristics is, of course, sensible. However, fitting model predictions to data based on assumed variations in the observer obscures all model errors.

Model calculation starts with measured naked eye thresholds and then estimates the threshold elevation that results from adding imager blur and noise. The target acquisition models use a numerical approximation to measured naked eye CTF provided by Barten [20]. The Barten numerical fit is selected based on Beaton’s comparison of several numerical approximations to experimental data [21]. The Barten numerical approximation to naked eye CTF data is given by Eqs. (7) through (9).

The independent variables are the luminance of the display *L* in fL and the square root of the angular display size *w* in degrees. In the Army models, *w* equals 15 degrees. This is chosen as a nominal display field of view (FOV) at the eye.
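Equations (7) through (9) are not reproduced in this excerpt. The widely cited simplified Barten approximation has the following commonly published form, with frequency in cycles per degree and luminance in cd/m²; treat the constants below as the standard published values and convert this paper's fL and mrad⁻¹ units before use.

```python
import math

def barten_ctf(u, L, w=15.0):
    """Commonly published Barten approximation to naked-eye CTF.

    u : spatial frequency in cycles per degree
    L : luminance in cd/m^2
    w : square root of angular display area in degrees (15 in the Army models)
    """
    a = (540.0 * (1.0 + 0.7 / L) ** -0.2) / (
        1.0 + 12.0 / (w * (1.0 + u / 3.0) ** 2))
    b = 0.3 * (1.0 + 100.0 / L) ** 0.15
    c = 0.06
    # CSF (sensitivity); CTF is its inverse.
    csf = a * u * math.exp(-b * u) * math.sqrt(1.0 + c * math.exp(b * u))
    return 1.0 / csf

# Unit conversions for this paper: 1 fL is about 3.426 cd/m^2,
# and 1 degree is about 17.45 mrad.
```

Thresholds rise at high spatial frequency and at low luminance, as in the CTF data discussed in Section 5.1.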

Eyeball MTF is also needed to predict the effect of noise on threshold. Formulas to predict eyeball MTF are taken from Stefanik’s distillation of the data in Overington [22,23]. Total eyeball MTF is predicted by multiplying optical, retina, and tremor MTF. Optical MTF depends on pupil diameter. Pupil diameter versus light level is given by Table 1. For each pupil diameter, the parameters i0 and f0 are given by Table 2. Equation (10) gives optics MTF. The total eyeball MTF is then the product of optics, retina, and tremor MTF.
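The multiplicative structure of Eqs. (10) through (12) can be sketched as follows. The exponential shapes follow the kind of forms described by Overington and Stefanik, but the retina and tremor coefficients below, and the i0, f0 values used in the test, are illustrative placeholders, not the published Table 1 and Table 2 numbers.

```python
import math

def eyeball_mtf(xi, i0, f0):
    """Total eyeball MTF as the product of optics, retina, and tremor MTF.

    xi     : spatial frequency in cycles/mrad
    i0, f0 : pupil-dependent optics parameters (Table 2 in the paper);
             callers must supply values, since the published table is
             not reproduced here.
    """
    optics = math.exp(-((xi / f0) ** i0))    # Eq. (10) shape
    retina = math.exp(-0.375 * xi ** 1.21)   # illustrative retina MTF shape
    tremor = math.exp(-0.4441 * xi ** 2)     # illustrative tremor MTF shape
    return optics * retina * tremor
```

As Fig. 10 indicates, the optics term dominates; the retina and tremor factors stay near one over the frequencies of interest.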

The visual cortex bandpass filters *B(ξ)* are taken from Barten [24], who created a numerical fit for the visual cortex filters by using psychophysical data. See Eq. (13). Again, this is a numerical fit and not a theoretical result. In Eq. (13), *ξ* is the frequency of the sine wave grating, and *ξ’* is a dummy variable used to integrate over noise bandwidth.

Equation (14) predicts CTF_{sys} for horizontal gratings viewed through an imager. A similar formula is used for vertical gratings.

- *α* = 169.6 root-Hertz mrad; see Section 4
- SMAG = system magnification
- *σ(ξ)* = noise affecting threshold at grating frequency ξ; see Eq. (15)
- *ρ(ξ,η)* = noise spectral density in fL second^{1/2} mrad
- *H _{sys}(ξ)* = system MTF from scene through display
- *H _{eye}(ξ)* = eyeball MTF
- *B(ξ)* = filters in the visual cortex
- *D(ξ)* = MTF of display blur

$$\sigma^{2}(\xi)=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}{\left|B(\xi'/\xi)\,D(\xi')\,H_{eye}(\xi')\right|}^{2}\,{\left|D(\eta)\,H_{eye}(\eta)\right|}^{2}\,\rho^{2}(\xi',\eta)\,d\xi'\,d\eta$$
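The noise variance integral of Eq. (15) can be evaluated by direct numerical quadrature. The sketch below uses simple Gaussian stand-ins for the display, eye, and cortical-filter MTFs and a white noise spectrum; all of these are hypothetical, and a real model substitutes Eqs. (10) through (13) and the measured noise spectral density.

```python
import math

def sigma_squared(xi, B, D, H_eye, rho, lim=3.0, n=121):
    """Perceived noise variance at grating frequency xi: display noise
    spectral density rho(xi', eta), filtered by cortex bandpass B, display
    MTF D, and eyeball MTF H_eye.  Trapezoid rule on a finite [-lim, lim]
    grid approximating the infinite double integral."""
    step = 2.0 * lim / (n - 1)
    total = 0.0
    for i in range(n):
        xp = -lim + i * step                         # xi' (dummy variable)
        wh = abs(B(xp / xi) * D(xp) * H_eye(xp)) ** 2
        for j in range(n):
            et = -lim + j * step                     # eta
            wv = abs(D(et) * H_eye(et)) ** 2
            wt = (0.5 if i in (0, n - 1) else 1.0) * \
                 (0.5 if j in (0, n - 1) else 1.0)   # trapezoid edge weights
            total += wt * wh * wv * rho(xp, et) ** 2
    return total * step * step

# Hypothetical Gaussian filters and unit-density white noise.
B = lambda r: math.exp(-((abs(r) - 1.0) ** 2))   # bandpass centered on xi
D = lambda f: math.exp(-(f ** 2))
H_eye = lambda f: math.exp(-0.5 * f ** 2)
rho = lambda fx, fy: 1.0
```

Narrowing any of the filters reduces the integrated noise power, which is the mechanism by which display blur filters noise as well as signal.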

The observer model pertains to cone vision of foveated targets. Cones mediate vision down to about 0.01 foot Lambert (fL). An observer at night might set his display as low as 0.1 fL. This low display luminance is a compromise between maintaining dark adaptation and effective viewing of display information. Therefore, the display luminance levels of interest here vary from about 0.1 fL to several hundred fL. However, there is little variation in visual thresholds above about 100 fL.

The visual system filters temporally as well as spatially. However, an explicit adjustment for the variation in temporal integration of the eye is not included in Eq. (14). We do not know a priori whether variations in temporal integration alter the relationship between luminance, display noise, and cortical noise. Although increasing temporal integration certainly applies additional filtering to display noise, it also increases the gain applied to both display signal and noise.

Further, no clear evidence exists that cone temporal integration is light level dependent. Cone temporal integration does not change with a variation of photopic light level down to about 10 fL [25]. At photopic light levels, adaptation is by pigment bleaching [26]. Data is not available on cone temporal integration below 10 fL. It is certainly true, however, that most of the observed increase in temporal integration at low luminance is due to the rod system. Rods begin to come out of saturation at 10 fL. Furthermore, regardless of what is actually happening physiologically, the match between the Eq. (14) model and I2 data suggests that a temporal adjustment is not needed in Eq. (14) [10].

This does not mean, however, that the eye treats static and dynamic noise equally. Certainly anecdotal experience with electro-optical systems suggests otherwise. The difference, however, is a change in the nature of the display noise, not a change in the visual system. For a non-framing imager, noise is integrated for a dwell time, not the cone integration time. For a framing imager with frame rate F_{R} second^{−1}, single frame noise is a factor of √(0.04F_{R}) more effective at masking signal than dynamic noise. The value 0.04 seconds for cone integration is taken from [25].
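The factor √(0.04F_R) follows from counting the independent noise samples the eye averages in one cone integration time: the eye averages t_eye·F_R frames of dynamic noise, reducing its RMS by the square root of that count.

```python
import math

T_EYE = 0.04   # cone integration time in seconds [25]

def static_noise_factor(frame_rate):
    """How much more effective single-frame (static) noise is at masking
    signal than dynamic noise: the eye averages T_EYE * frame_rate
    independent frames of dynamic noise, reducing its RMS by the square
    root of the frame count."""
    return math.sqrt(T_EYE * frame_rate)
```

For example, at a 100 Hz frame rate the eye averages four frames, so static noise is twice as effective as dynamic noise.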

#### 3.1 Alternative observer model

Note that Barten’s CTF theory [24] is not used in the Army models. That theory is partially based on the conclusions of van Meeteren and Valeton [11]. Based on an experiment at one photopic display luminance, they conclude that eye noise is approximately proportional to a fixed fraction of the measured, naked eye CTF. After appropriate scaling and filtering, noise modulation at the display is summed in quadrature with naked eye CTF as shown in Eq. (16). β is an empirically derived proportionality constant. Note that Eqs. (14) and (16) are not equivalent.

As discussed in [12,13], Eq. (16) fails to predict experimental image intensifier (I2) data. The I2 data includes a wide variation in both display luminance and spatial frequency, and the assumptions in [11] are not appropriate. The assumption that eye noise is a fixed fraction of measured eye CTF is discussed here to make clear that it is not used in the [3–10] models.

## 4. Integrating the observer model into target acquisition models

This section describes three different procedures commonly used to model imager signal and noise. In each case, display average luminance is set to *L* fL, and this alone establishes the magnitude of eye noise. The parameter *α* is a gain factor that scales display noise in order to RSS with cortical noise. This gain factor accounts for visual mechanisms and does not depend on any imager characteristics. However, any change in the assumed relationship between imager noise and display luminance changes the value of *α*.

The radiometric calculations used to find detector photo electrons are covered in many texts and are not described here. This section focuses on the relationship between detector noise and eye noise. Further, to simplify the discussion, assume that the noise *ρ(ξ)* is spectrally uniform with spectral density *ρ*. The noise filters and noise integration are represented by *ℑ(ξ)*. Neither *SMAG* nor *H _{sys}* contribute to the current discussion; assume that both equal one. Also, this section only discusses imagers with temporally varying noise. Equation (14) simplifies to Eq. (17).

The first modeling procedure is used in this paper to analyze psychophysical experiments. The images are computer generated. The standard deviation of the pixel-to-pixel noise in a single frame is *m* fL. *F _{R}* is display frame rate. If *F _{R}* is high, there are many frames in an eye integration time *t _{eye}* of 0.04 seconds. For a low *F _{R}*, there are fewer frames in a period *t _{eye}*. The standard deviation of spatial noise in an eye integration time is *m*√(*t _{eye}F _{R}*). The signal in an eye integration time is *t _{eye}F _{R}L*. *κ* is an empirically determined proportionality constant.
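For modeling procedure 1, the threshold elevation implied by the simplified Eq. (17) can be sketched as a quadrature (RSS) elevation of naked eye CTF. The quadrature structure below follows the commonly published NVESD form; the noise filters ℑ(ξ) are set to one, as in the text's simplified discussion, so this is a structural sketch rather than the full model.

```python
import math

ALPHA = 169.6   # root-Hertz calibration constant from the paper

def ctf_sys(ctf_naked, m, frame_rate, L):
    """Threshold elevation for modeling procedure 1 (simplified form).

    ctf_naked  : naked eye CTF at the grating frequency
    m          : single-frame pixel-to-pixel noise standard deviation, fL
    frame_rate : display frame rate F_R in Hz
    L          : average display luminance in fL

    System MTF and the noise filters are treated as unity here.
    """
    noise = m / math.sqrt(frame_rate)   # std deviation after frame averaging
    return ctf_naked * math.sqrt(1.0 + (ALPHA * noise / L) ** 2)
```

With zero display noise the threshold reduces to the naked eye CTF, and raising the frame rate lowers the elevation, consistent with frame averaging.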

Equation (19) is used to model experiments where *L* and *m* are provided by the experimenter. Imager noise *m*/√*F _{R}* is the standard deviation after averaging *F _{R}* frames. Average display luminance *L* is the mean value of the average frame. In Eq. (19), α equals *κ*/√*t _{eye}* and has a value of 169.6 root Hertz.

The second modeling procedure is used when modeling reflected light imagers like image intensifiers. *E _{photo}* is the electron flux per second in the detector. Signal is proportional to *t _{eye}E _{photo}*. Imager noise affecting the eye is √(*t _{eye}E _{photo}*).

For modeling procedure 2, α again equals *κ*/√*t _{eye}* and has a value of 169.6 root Hertz. Luminance is proportional to the electrons integrated in one second. Noise is the square root of signal electrons.

Model procedure 3, however, leads to a different value of *α*. In the U.S. Army thermal model, signal and noise are expressed in terms of radiant quantities. Let *S* represent the watts on the detector that raise display luminance from black to average. Let *Γ _{det}* be the NEP in watts. Using procedure 3, the typical way to express noise to signal ratio for an eye integration time is shown in Eq. (22). *Γ _{det}* is multiplied by the square root of bandwidth, which is 1/√*t _{eye}*. Equation (23) shows noise to signal ratio using model procedure 2. If *S* generates *E _{photo}* electrons per second in the detector, then Eqs. (22) and (23) give the same numerical answer for noise to signal ratio. But the physical models are not equivalent.

Equation (23) represents the noise to signal ratio established after an eye integration time. Equation (22) gives the same noise to signal ratio, but Eq. (22) is established after one second. As an example of the problem, the power *t _{eye}S* generates *t _{eye}E _{photo}* electrons after one second, not after an eye integration time. *S* produces an electron flux, whereas *E _{photo}* results from integrating electron flux over time. A parallel can be drawn between *E _{photo}* and joules but not between *E _{photo}* and watts.

Modeling procedure 3 requires a different approach. Calculate signal and noise terms at one second, not at *t _{eye}* seconds. *S* generates display luminance *L*, and the magnitude of eye noise is the same as for modeling procedures 1 and 2. However, now eye noise is summed over a second, not *t _{eye}* seconds. Since eye noise is random, the one second eye noise RMS is 1/√*t _{eye}* larger. CTF _{sys} for modeling procedure 3 is given by Eq. (25).
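The numerical agreement between Eqs. (22) and (23) can be checked directly. If the NEP is shot-noise limited so that Γ_det/S = 1/√E_photo (an assumption made here only for illustration, with hypothetical flux and power values), the two noise-to-signal expressions coincide:

```python
import math

T_EYE = 0.04        # eye integration time in seconds
E_PHOTO = 2.5e6     # hypothetical electron flux per second
S = 1.0e-12         # hypothetical signal power in watts

# Shot-noise-limited NEP consistent with E_PHOTO (illustrative assumption).
GAMMA_DET = S / math.sqrt(E_PHOTO)

# Eq. (22) structure: NEP times root bandwidth (1/sqrt(t_eye)), over signal.
ns_procedure3 = GAMMA_DET / (S * math.sqrt(T_EYE))

# Eq. (23) structure: root of integrated electrons over integrated electrons.
ns_procedure2 = math.sqrt(T_EYE * E_PHOTO) / (T_EYE * E_PHOTO)
```

Both reduce to 1/√(t_eye·E_photo), which is why the two procedures give the same noise-to-signal ratio even though, as the text explains, the physical models behind them differ.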

## 5. Comparing observer characteristics to psychophysical data

Section 5.1 compares the naked eye CTF numerical fit given by Eqs. (7) through (9) to empirical data. Section 5.2 discusses predicting eyeball MTF. Direct measurements of total eyeball MTF are not available. However, the experimental bases of the individual component MTF described by Eqs. (10) through (12) are discussed. Section 5.3 compares Eq. (14) predictions of the effect of non-white noise to empirical data. Barten compares his CTF model using Eq. (16) to the same data [24]. In all cases, the current comparisons use Eqs. (7) through (14) without changing calibration parameters to fit the model to measurements. That is, *L* and *ξ* are determined by the experimental setup, but *w* equals 15 degrees, and *α* equals 169.6 root Hertz.

#### 5.1 Comparison to contrast threshold function measurements

A great deal of CSF data exists, but the measurement conditions are not always germane to the observer model. CSF characterizes observer signal threshold versus spatial frequency. CSF represents the observer in a Fourier domain model. To the extent practical, the sine wave stimulus should represent a single frequency. Further, the CSF data selected for comparison to the observer model represents typical display conditions.

The CSF data are selected using the following criteria [27–30]. The FOV at the eye is 6 degrees or greater. The grating is presented statically; no temporal variation in intensity occurs during the presentation period. The length of the sine wave perpendicular to the bar-space-bar modulation is constant. Also, with one exception, the data are collected using natural vision. That is, the observer uses both eyes and no artificial pupil.

CSF data are often collected with monocular viewing and an artificial pupil of 2 to 2.5 millimeters [30–32]. Monocular viewing is modeled by dividing binocular CSF by the square root of two [Section 1.802 in 33]. However, the small artificial pupil lowers the effective luminance to the eye and improves eyeball MTF. At photopic luminances, decreased luminance has a small effect on CSF. The improved eyeball MTF, however, increases CSF at high spatial frequencies. Nonetheless, some of the data in [30] are used here because they provide CSF at luminance levels not available from other sources. Also, these particular data are widely referenced [Section 1.632 in 33].

Figures 3 through 5 compare the numerical CSF generated using Eqs. (7) through (9) to CSF data taken with natural vision. Figure 6 compares numerical CSF to the data of [30]. The data cover luminances from 0.03 to 300 fL. The data provide three luminances in the important range between 0.1 and 10 fL [30].

As shown in the figures, data from different experimenters using different procedures and observers do vary. However, the fit between the numerical CSF and data is good. As expected, the best fit is to the van Meeteren and Vos data. These data are used in creating the Eqs. (7) through (9) numerical fit. However, the fit to the remaining data is also quite adequate. The model is pessimistic at high frequencies compared to the Fig. 6 data. However, those data are collected with an artificial pupil, and the discrepancy at high frequencies is expected.

##### 5.1.1 The effect of field of view on contrast sensitivity

In the observer model described here, *w* in Eq. (8) is fixed at 15 degrees. However, in [20], the parameter *w* is set to the display FOV. The different approach results from different objectives. Based on Barten’s use of the [20] CSF fit in his own IQM [24], it is probable that he intended the [20] formula to encompass many visual factors. However, varying *w* in Eq. (8) is not consistent with the goal of the observer model. This section discusses the FOV parameter *w* and the reasons for maintaining a fixed value.

When FOV to the eye is small, the number of sine wave cycles presented to the observer is limited. For example, for a 2 degree FOV and a spatial frequency of 0.25 mrad^{−1}, only 8.7 sine wave cycles fit on the display. Measured CSF varies depending on the number of sine wave cycles presented to the observer [Section 1.631 in 33,34,35]. Figure 7 compares the data of Hoekstra, van der Goot, and van den Brink [33] to the data of Virsu and Rovamo [34]. The [33] data is taken at 7 fL whereas the [34] data is taken at 3 fL. CSF improves substantially up to 7 or 8 cycles with a small additional improvement to 10 or 11 cycles. This behavior occurs at all spatial frequencies. Regardless of spatial frequency, CSF does not change when ten or more cycles are displayed.

In [20], the dependence of CSF on *w* is based on the data of Carlson [35]. The data of Carlson is consistent with the data of [33] and [34] in two respects. Increasing the number of presented grating cycles improves CSF, and the relative improvement is consistent at all spatial frequencies. However, in the Carlson data, some improvement in CSF is seen when presenting tens and even hundreds of cycles. Carlson presents the sine waves against a dark surrounding field. That is, when the FOV is small, the observer views a small bright area in an otherwise dark room. As the number of sine wave cycles increases, the adapting luminance FOV also increases. This is not consistent with the procedure of [33] and [34], who present a constant adapting luminance FOV. In the Carlson experiment, the adaptive state of the eye improves as the number of presented sine wave cycles increases. CSF is affected by both the number of cycles and the adapting luminance FOV.

The Eqs. (7) through (9) numerical fit accurately predicts CSF when *w* equals 15 degrees. However, for our purposes, the numerical model is not accurate for smaller fields of view. Figure 8 compares the data of Campbell and Robson [32] to the CSF numerical fit. The CSF data for both 2 degree and 10 degree fields of view are equal at and above 0.25 mrad^{−1}. At this spatial frequency, there are 9 sine wave cycles in the 2 degree FOV. Based on the data in Fig. 7, all CSF at higher frequencies should be equal, and the CSF measurements in Fig. 8 bear this out. However, as seen in the figure, the Eqs. (7) through (9) numerical CSF predictions are different at higher spatial frequencies. Changing *w* in Eq. (8) does not accurately predict the effect of changing FOV.

In the observer model, CTF or CSF represents observer Fourier domain response. When sine wave patterns are used to measure frequency response, limiting the number of sine wave cycles results in an error. The proper use of Fig. 7 is to correct measurements made with a few sine wave cycles. Figure 7 is not used to degrade observer performance when the FOV is small.

#### 5.2 Comparison to eyeball modulation transfer function measurements

Overington discusses the various factors affecting eyeball MTF [23]. His goal, like ours, is to quantify all of the factors affecting natural vision. Most researchers, however, measure the MTF of the ocular optics from the cornea to the retinal surface.

Figure 9 compares Eq. (10) optical eyeball MTF to the predictions of various researchers [36–39]. A pupil size of 4 millimeters (mm) is chosen for two reasons. First, that size best represents pupil diameter for display luminances between 0.1 and 100 fL. Second, all researchers provide estimates for the 4 mm size. The [37–39] results are based on various physical measurement techniques. The [36] eyeball MTF predictions are based on a psychophysical technique that depends on a proposed theory of contrast detection. Unfortunately, the predictions from various researchers are widely spread.

However, Eq. (10) predictions are closest to [37] results. The [37] data are based on physical measurements of a large number of observers. Figure 10 shows Eq. (10) optical MTF, Eq. (11) retinal MTF, and Eq. (12) tremor MTF. The average data from [37] is also plotted in Fig. 10. All MTF are for a 4 mm pupil. Equation (10) provides a reasonable estimate for the actual data from [37]. As seen in Fig. 10, tremor and retina MTF are less important.

#### 5.3 Comparison to non-white noise measurements

In this section, predictions of Eq. (14) are compared to CSF data collected in the presence of non-white noise. Comparisons are made to [11] and [40]. Equation (14) relates to absolute sine wave threshold data, not increment thresholds or circular disks. Therefore, the data in [15] and [17] are not used. In [41], the sine wave grating stimulus is present 8.3 milliseconds (msec) out of a 283 msec frame. Although all of the experiments use framing displays, imagery presented at a 60 Hertz rate is perceived as static. Presenting the grating only once each 17 frames violates the assumption of a static stimulus. The data in [41] are therefore not used.

In the experiment of van Meeteren and Valeton [11], horizontal gratings are presented on a 180 by 180 pixel display with 29 fL average luminance. The display subtends a 1 by 1 degree FOV at the eye. The display has 8 bit quantization, and this presents problems. Quantizing the full dynamic range of the display into 256 gray levels provides a minimum observable contrast of 1/128. Further, at low contrast, the presented stimulus is not sinusoidal. Sufficient quantization levels do not exist to create a sine wave waveform with a contrast amplitude of 0.01. Nonetheless, the medium and coarse grain data are usable, because the observed contrasts permit two or more quantization levels.

The medium grain noise is generated by assigning random values to every fifth pixel horizontally and vertically. The intermediate pixel values are then interpolated. The coarse grain noise is generated by assigning random values to every 20th pixel and then interpolating the remaining values. Standard deviation of both medium and coarse noise is 0.22. The noise is static.
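The medium and coarse grain noise fields can be reproduced as follows. This is a sketch: the grid size, seed, and Gaussian anchor values are arbitrary choices, and bilinear interpolation between anchor pixels is assumed.

```python
import random

def grain_noise(size, step, sigma, seed=0):
    """Generate a noise field in the style of van Meeteren and Valeton:
    random values assigned every `step` pixels (step=5 for medium grain,
    step=20 for coarse grain), with linear interpolation in between."""
    rng = random.Random(seed)
    # Random values on the coarse grid of anchor points.
    n_anchor = size // step + 2
    grid = [[rng.gauss(0.0, sigma) for _ in range(n_anchor)]
            for _ in range(n_anchor)]

    def interp1(a, b, t):
        return a * (1.0 - t) + b * t

    field = [[0.0] * size for _ in range(size)]
    for y in range(size):
        gy, ty = divmod(y, step)
        for x in range(size):
            gx, tx = divmod(x, step)
            fx, fy = tx / step, ty / step
            # Bilinear interpolation between the four surrounding anchors.
            top = interp1(grid[gy][gx], grid[gy][gx + 1], fx)
            bot = interp1(grid[gy + 1][gx], grid[gy + 1][gx + 1], fx)
            field[y][x] = interp1(top, bot, fy)
    return field
```

Interpolating between sparse anchors concentrates the noise power at low spatial frequencies, which is what makes the medium and coarse grain noise non-white.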

Figures 11 and 12 compare model predictions to data with medium and coarse grain noise, respectively. In both figures, the abscissa is spatial frequency in mrad^{−1} and the ordinate is CTF_{sys}. Data for the two subjects are shown separately.

Stromeyer and Julesz [40] study the effect of bandlimited noise on CTF_{sys}. Noise with a flat spectrum up to 20 kilohertz (kHz) is low pass, high pass, or band pass filtered and then displayed as a horizontal sweep on an xy monitor. The analog filter rolls off at 42 decibels (dB) per octave. Display luminance is 5 fL. A high frequency signal applied to the vertical deflection creates vertical stripes with random intensity variation in the horizontal direction. The 8 msec sweeps are separated by 16 msec. A physical mask on the display limits the open screen area to 6.5 by 17 centimeters. Most observations are made from 4 meters.

Figures 13 through 16 compare model predictions to data for five band pass filters. Table 3 provides the band number, the 3 dB frequencies for the low pass and high pass cutoffs, and the figure number. In each figure, the abscissa is spatial frequency and the ordinate is CTF_{sys}/CTF minus one.

Figure 17 shows the model-to-data comparison for low pass filtered noise. The figure plots model predictions on the abscissa and measured data on the ordinate. The straight line represents the ideal case in which model predictions equal measured data. Data are shown for sine wave patterns of 0.14, 0.29, and 0.57 mrad^{−1}. For each sine wave frequency, four low pass noise cutoffs are used: the grating frequency and half octave steps below the grating frequency. The RMS noise is maintained at 0.15 modulation.
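For reference, the low pass cutoff frequencies follow directly from the half octave spacing (each half octave step divides the frequency by the square root of two):

```python
# Low pass noise cutoffs: the grating frequency and half octave steps
# below it (each step divides the frequency by sqrt(2)).
gratings = [0.14, 0.29, 0.57]   # grating frequencies, mrad^-1
cutoffs = {f: [f / 2.0 ** (0.5 * k) for k in range(4)] for f in gratings}
for f, c in cutoffs.items():
    print(f, [round(v, 3) for v in c])
```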

The [40] data are widely scattered between the two subjects. The [11] data show unexpected behavior at low spatial frequencies. An exact match between the observer model and the data is therefore not expected. However, the observer model does predict the effect of non-white noise on CSF. In all cases, the data trends are predicted by the model. The model accurately predicts the frequencies at which non-white noise affects CSF; the onset and cessation of CSF degradation are predicted correctly.

## 6. Conclusions

This paper describes an observer vision model where the dominant eye noise is cortical. Eye noise is proportional to display luminance. Previous observer models assumed eye noise to be a fixed fraction of naked eye CTF. The new observer model leads to success in predicting target identification probabilities and in predicting MRT and MRC experimental results. This paper compares predictions of the observer model to psychophysical data.

The match between model and data is excellent considering the experimental errors. CSF data vary considerably from experiment to experiment. Also, the experiments on the effect of non-white noise on CSF show a great deal of variability between observers. Nonetheless, the observer model accurately predicts experimental behavior. The model predicts the impact of display luminance on CSF. The model also predicts the relative impact of noise at one frequency on the CSF at a different frequency. In all cases, the model either accurately predicts the data or at least predicts the trends in observed behavior.

The choice of Barten’s numerical fit to CTF data is based on simplicity and the ease of obtaining the reference. However, the numerical fit is not accurate in predicting the effect of display FOV on CSF. In the observer model, the display FOV is therefore fixed at 15 degrees: maintaining the display FOV parameter *w* at a fixed value provides a more accurate observer model than varying *w*.

Further, the observer model provides MTF and detection thresholds to represent observer vision in Fourier domain models. Associating CSF variations caused by experimental limitations with observer vision is not consistent with the goal of the observer model. In the target acquisition models, a Fourier representation of the target is transduced by the imager for viewing by the observer. Variations in the target signature and limitations in the imager transfer response are handled separately from observer characteristics. The Barten numerical fit is used to represent measured CSF data. It is not used to model the effect of target variations or imager design changes on observer performance.

The approximation for eyeball MTF is reasonably accurate and difficult to replace. Most eyeball data are collected with the eye muscles paralyzed and using artificial pupils. Generally, the scientific objective is to understand the separate contributions of the cornea, lens, vitreous humor, and retina. This information is hard to generalize to a total MTF representing the eye with a natural pupil. The Overington compilation is still valuable.

This paper also describes how to integrate the observer model into different types of imager hardware models. Some models integrate detector photo electrons to calculate signal and noise. Other models represent both signal and noise in terms of radiant flux. The observer does not change. However, the different modeling assumptions do change the value of the empirical calibration constant α that relates internal observer eye noise to imager noise. The reason for changing the value of α is discussed, and values for both types of models are provided.
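The RSS combination described above can be sketched schematically. Where exactly α enters (scaling the imager noise term here) is an assumption of the sketch, and the function is illustrative only; in the target acquisition models the combination is applied per spatial frequency, and the value of α depends on whether the imager model works in photoelectrons or in radiant flux (NEP).

```python
import math

def perceived_noise(sigma_eye, sigma_imager, alpha):
    # Schematic sketch: internal eye noise combined in quadrature (RSS)
    # with imager noise scaled by the empirical calibration constant
    # alpha. Illustrative only; not the models' exact per-frequency form.
    return math.sqrt(sigma_eye ** 2 + (alpha * sigma_imager) ** 2)
```

With no imager noise the result reduces to the internal eye noise alone, as expected for naked-eye viewing.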

## References and links

**1. **U. S. Army, RDECOM, NVESD target acquisition models (1 June 2009), https://www.sensiac.org

**2. **J. A. Ratches, R. Vollmerhausen, and R. Driggers, “Target Acquisition Performance Modeling of Infrared Imaging Systems: Past, Present, and Future,” IEEE Sens. J. **1**(1), 31–40 (2001). [CrossRef]

**3. **R. H. Vollmerhausen, E. Jacobs, and R. Driggers, “New metric for predicting target acquisition performance,” Opt. Eng. **43**(11), 2806–2818 (2004). [CrossRef]

**4. **R. Vollmerhausen and A. L. Robinson, “Modeling target acquisition tasks associated with security and surveillance,” Appl. Opt. **46**(20), 4209–4221 (2007). [CrossRef] [PubMed]

**5. **R. H. Vollmerhausen, S. Moyer, K. Krapels, R. G. Driggers, J. G. Hixson, and A. L. Robinson, “Predicting the probability of facial identification using a specific object model,” Appl. Opt. **47**(6), 751–759 (2008). [CrossRef] [PubMed]

**6. **R. H. Vollmerhausen, R. G. Driggers, and D. L. Wilson, “Predicting range performance of sampled imagers by treating aliased signal as target-dependent noise,” J. Opt. Soc. Am. A **25**(8), 2055–2065 (2008). [CrossRef]

**7. **R. H. Vollmerhausen, E. Jacobs, J. Hixson, and M. Friedman, “The Targeting Task Performance (TTP) Metric; A New Model for Predicting Target Acquisition Performance,” Technical Report AMSEL-NV-TR-230, U.S. Army CERDEC, Fort Belvoir, VA 22060 (2005).

**8. **R. Driggers, R. Vollmerhausen, and K. Krapels, “Target Identification Performance as a Function of Temporal and Fixed Pattern Noise,” Opt. Eng. **40**(3), 443–447 (2001). [CrossRef]

**9. **N. M. Devitt, R. G. Driggers, R. H. Vollmerhausen, S. K. Moyer, K. A. Krapels, and J. D. O’Connor, “Target recognition performance as a function of sampling,” Proc. SPIE **4372**, 74–84 (2001). [CrossRef]

**10. **R. H. Vollmerhausen, “Predicting the effect of gain, level, and sampling on minimum resolvable temperature measurements,” Opt. Eng. (to be published).

**11. **A. van Meeteren and J. M. Valeton, “Effects of pictorial noise interfering with visual detection,” J. Opt. Soc. Am. A **5**(3), 438–444 (1988). [CrossRef] [PubMed]

**12. **R. Vollmerhausen, “Incorporating Display Limitations into Night Vision Performance Models,” IRIS Passive Sensors **2**, 11–31 (1995).

**13. **R. H. Vollmerhausen, “Modeling the Performance of Imaging Sensors,” in *Electro-Optical Imaging: System Performance and Modeling*, L. Biberman, ed. (SPIE Press, 2000), Chap. 12.

**14. **H. L. Snyder, “Image quality: measures and visual performance,” in *Flat-Panel Displays and CRTs*, L. E. Tannas, Jr., ed. (Van Nostrand Reinhold, 1985), Chap. 4.

**15. **N. S. Nagaraja, “Effect of Luminance Noise on Contrast Thresholds,” J. Opt. Soc. Am. **54**(7), 950–955 (1964). [CrossRef]

**16. **D. G. Pelli, “Effects of visual noise,” Doctoral dissertation at the Physiological Laboratory, Churchill College, Cambridge University, England, (1981). Available in PDF from denis.pelli@nyu.edu.

**17. **G. E. Legge, D. Kersten, and A. E. Burgess, “Contrast discrimination in noise,” J. Opt. Soc. Am. A **4**(2), 391–404 (1987). [CrossRef]

**18. **D. G. Pelli and B. Farell, “Why use noise?” J. Opt. Soc. Am. A **16**(3), 647 (1999). [CrossRef]

**19. **M. Raghavan, “Sources of visual noise,” Ph.D. dissertation (Syracuse Univ., Syracuse, New York, 1989).

**20. **P. G. J. Barten, “Formula for the contrast sensitivity of the human eye,” Proc. SPIE **5294**, 231–238 (2004) (Paper available on the Web at http://www.SPIE.org). [CrossRef]

**21. **R. J. Beaton, and W. W. Farley, “Comparative study of the MTFA, ICS, and SQRI image quality metrics for visual display systems,” Armstrong Lab., Air Force Systems Command, Wright-Patterson AFB, OH, Report AL-TR-1992–0001, DTIC ADA252116, (1991).

**22. **R. J. Stefanik, *Performance Modeling for Image Intensifier Systems*, Report NV-93-14, Night Vision and Electronic Sensors Directorate, U.S. Army Research, Development, and Engineering Command, Fort Belvoir, VA (1993).

**23. **I. Overington, *Vision and Acquisition* (Crane, Russak & Company, 1976), Chaps. 1, 2, and 4.

**24. **P. G. J. Barten, *Contrast Sensitivity of the Human Eye and Its Effect on Image Quality* (SPIE Press, 1999).

**25. **R. A. Moses and W. M. Hart, “The temporal responsiveness of vision,” in *Adler’s Physiology of the Eye: Clinical Application* (Mosby, 1987).

**26. **H. Davson, *Physiology of the Eye,* 5th ed., 221 & 271, (Macmillan Academic and Professional Ltd., 1990).

**27. **A. van Meeteren and J. J. Vos, “Resolution and contrast sensitivity at low luminances,” Vision Res. **12**(5), 825–833 (1972). [CrossRef] [PubMed]

**28. **J. J. DePalma and E. M. Lowry, “Sine wave response of the visual system. II. Sine wave and square wave contrast sensitivity,” J. Opt. Soc. Am. **52**(3), 328–335 (1962). [CrossRef]

**29. **A. Watanabe, T. Mori, S. Nagata, and K. Hiwatashi, “Spatial sine-wave responses of the human visual system,” Vision Res. **8**(9), 1245–1263 (1968). [CrossRef] [PubMed]

**30. **F. L. Van Nes and M. A. Bouman, “Spatial modulation transfer in the human eye,” J. Opt. Soc. Am. **57**(3), 401–406 (1967). [CrossRef]

**31. **A. S. Patel, “Spatial resolution by the human visual system. The effect of mean retinal illuminance,” J. Opt. Soc. Am. **56**(5), 689–694 (1966). [CrossRef] [PubMed]

**32. **F. W. Campbell and J. G. Robson, “Application of Fourier analysis to the visibility of gratings,” J. Physiol. **197**(3), 551–566 (1968). [PubMed]

**33. **K. R. Boff and J. E. Lincoln, *Engineering Data Compendium: Human Perception and Performance*, Vol. 1 (Harry G. Armstrong Aerospace Medical Research Laboratory, Wright-Patterson Air Force Base, Ohio, 1988).

**34. **V. Virsu and J. Rovamo, “Visual resolution, contrast sensitivity, and the cortical magnification factor,” Exp. Brain Res. **37**(3), 475–494 (1979). [CrossRef] [PubMed]

**35. **C. R. Carlson, “Sine-wave threshold contrast-sensitivity function: dependence on display size,” RCA Review **43**, 675–683 (1982).

**36. **J. Rovamo, H. Kukkonen, and J. Mustonen, “Foveal optical modulation transfer function of the human eye at various pupil sizes,” J. Opt. Soc. Am. A **15**(9), 2504 (1998). [CrossRef]

**37. **F. W. Campbell and R. W. Gubisch, “Optical quality of the human eye,” J. Physiol. **186**(3), 558–578 (1966). [PubMed]

**38. **A. van Meeteren, “Calculations of the optical modulation transfer function of the human eye for white light,” Opt. Acta (Lond.) **21**, 395–412 (1974). [CrossRef]

**39. **P. Artal and R. Navarro, “Monochromatic modulation transfer function of the human eye for different pupil diameters: an analytic expression,” J. Opt. Soc. Am. **11**(1), 246–249 (1994). [CrossRef]

**40. **C. F. Stromeyer 3rd and B. Julesz, “Spatial-frequency masking in vision: critical bands and spread of masking,” J. Opt. Soc. Am. **62**(10), 1221–1232 (1972). [CrossRef] [PubMed]

**41. **Z.-L. Lu and B. A. Dosher, “Characterizing the spatial-frequency sensitivity of perceptual templates,” J. Opt. Soc. Am. A **18**(9), 2041–2053 (2001). [CrossRef]