## Abstract

This Discussion Paper seeks to kill off probability summation, specifically the high-threshold assumption, as an explanatory idea in visual science. In combination with a Weibull function of a parameter of about 4, probability summation can accommodate, to within the limits of experimental error, the shape of the detectability function for contrast, the reduction in threshold that results from the combination of widely separated grating components, summation with respect to duration at threshold, and some instances, but not all, of spatial summation. But it has repeated difficulty with stimuli below threshold, because it denies the availability of input from such stimuli. All the phenomena listed above, and many more, can be accommodated equally accurately by signal-detection theory combined with an accelerated nonlinear transform of small, near-threshold, contrasts. This is illustrated with a transform that is the fourth power for the smallest contrasts, but tends to linear above threshold. Moreover, this particular transform can be derived from elementary properties of sensory neurons. Probability summation cannot be regarded as a special case of a more general theory, because it depends essentially on the 19th-century notion of a high fixed threshold. It is simply an obstruction to further progress.

© 2013 Optical Society of America

## 1. INTRODUCTION

Suppose a stimulus (a compound grating, for example) engages some number (e.g., three) of different threshold units. Suppose each unit sings if its respective input exceeds a fixed threshold value (with probabilities ${p}_{1}$, ${p}_{2}$, and ${p}_{3}$, respectively) and is otherwise silent. Then the probability of detecting the compound is

$$P=1-(1-{p}_{1})(1-{p}_{2})(1-{p}_{3}),\tag{1}$$

or, to put it another way, detection fails only if it fails individually for all three threshold units. Equation (1) shows “probability summation.”

This critique is concerned solely with Eq. (1) and its use as an explanatory idea in visual science. “Summation” is actually a misnomer here, because the probabilities are multiplied, but Eq. (1) is used as a substitute for true summation of sensory input (e.g., Bloch’s law, Ricco’s law). Probability summation itself says nothing about stimuli that exceed threshold beyond the fact of their detection. But observers are able to discriminate between supra-threshold stimuli and for flashes of light in darkness, at least, Weber’s law obtains [1]. So probability summation surreptitiously implies that whatever processes underlie supra-threshold discrimination (e.g., [2]) do not extrapolate below some fixed threshold value, but are, instead, truncated at that level [3]. The experiments reviewed below show this implication to be false; it constitutes the principal focus of this critique.
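The verbal rule above (detection fails only if every threshold unit fails) can be sketched numerically. This is a minimal illustration, not code from the paper; the function name `probability_summation` and the example probabilities are assumptions chosen for the sketch.

```python
from math import prod

def probability_summation(unit_probs):
    """Probability that at least one independent threshold unit detects.

    unit_probs holds each unit's probability of exceeding its threshold.
    Detection fails only if every unit fails, so the failures multiply.
    """
    return 1.0 - prod(1.0 - p for p in unit_probs)

# Three hypothetical units, each exceeding threshold on half the trials.
p_compound = probability_summation([0.5, 0.5, 0.5])
print(p_compound)  # 0.875
```

Note that nothing is added in this calculation: the only arithmetic is the product of the failure probabilities, which is the sense in which "summation" is a misnomer.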

“Probability summation” entered the analysis of grating detection with Sachs *et al.* [4–6]. Previously, however, Pirenne [7] had used Eq. (1) to account for the increased frequency of seeing at absolute threshold when flashes were presented to both eyes rather than one. Then Matin [8] and Collier [9] found the frequency of seeing two brief flashes, presented one to each eye within 100 ms, to exceed the prediction of Eq. (1). Quick [10] proposed that the detectability function for contrast $C$ should be represented by a cumulative Weibull distribution function, $1-\mathrm{exp}\{-a{C}^{k}\}$ (where $a$ aligns the theoretical function with the observed threshold), which, for a suitable value of the exponent $k$, is not too far removed from the normal integral. Rewriting Eq. (1),

$$P=1-\mathrm{exp}\{-a\sum _{i}{C}_{i}^{k}\},\tag{2}$$

where ${C}_{i}$ is the contrast presented to the $i$th threshold unit.

All this is illustrated in Fig. 1, which reproduces data for the detection of both single and compound gratings first published in [11]. Gratings of 1.2, 3.6, and $10.8\,\mathrm{c}/\mathrm{deg}$ and their compounds in sine and cosine phases filled an area ${5}^{\circ}\times {7.25}^{\circ}$ and were presented within a Gaussian temporal window of parameter ($\sigma $) 100 ms in a 2AFC detection task. In Fig. 1, contrast is expressed relative to its 75% threshold value at 0 dB (i.e., sensation level), which is different for different wavenumbers. In the sine and cosine compounds the contrasts of the single gratings were each scaled relative to their respective thresholds, so that the compounds can be thought of as combinations of three gratings of equal effective contrast. The continuous curve fitted to those data is the function

which is the Weibull detectability function adjusted for guessing when the stimulus fails to reach threshold. The parameter $k$ is 3.96 (least-squares estimate). The other continuous curve is the same function displaced $-2.8\,\mathrm{dB}$, which is approximately the advantage accruing to a compound of three gratings instead of one (${3}^{-1/3.96}\equiv -2.4\,\mathrm{dB}$). [At this stage the two dashed curves merely illustrate that there are other functions that also fit the data to within the limits of experimental error; see Eq. (A9) in Appendix A.]

The summation rule [Eq. (2)] also applies to temporal duration. Figure 2 displays measurements by Watson [12] of contrast thresholds for a $4\,\mathrm{c}/\mathrm{deg}$ grating set in a surround of equal mean luminance. The grating was presented in a Gaussian temporal envelope of various parameters ($\sigma $) ranging from 100 to 700 ms; in addition the contrast was itself modulated sinusoidally at either 3 or 8 Hz, this to create a stimulus that in the temporal domain was constrained within a narrow band of frequencies. Thresholds were determined by a yes/no staircase procedure. The data in Fig. 2 show that threshold decreased approximately as the $-1/4$ power of duration; that is to say, $\text{duration}\times {C}^{4}$ [cf. Eq. (2)] was approximately constant at threshold and this agrees well with the estimate of the exponent of the Weibull function in Fig. 1.
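The arithmetic behind these statements can be checked with a short sketch. It assumes a guessing-adjusted Weibull of the form $1-\tfrac{1}{2}\mathrm{exp}\{-a{C}^{k}\}$ for 2AFC, with $a=\mathrm{ln}\,2$ so that $C=1$ is the 75% threshold; that specific form, the function names, and the use of $20\,{\mathrm{log}}_{10}$ for contrast in decibels are assumptions of this illustration rather than details taken from the paper.

```python
import math

def weibull_2afc(contrast, k, a=math.log(2)):
    """Assumed guessing-adjusted Weibull: proportion correct in 2AFC.

    With a = ln 2, contrast = 1 corresponds to the 75% threshold.
    """
    return 1.0 - 0.5 * math.exp(-a * contrast ** k)

def pooling_shift_db(n_components, k):
    """Threshold shift in dB when n equally effective components are
    combined by a k-th power rule (threshold scales as n ** (-1 / k))."""
    return 20.0 * math.log10(n_components ** (-1.0 / k))

def duration_threshold_ratio(duration_ratio, k):
    """Factor by which threshold contrast changes when duration is
    multiplied by duration_ratio and duration * C**k stays constant."""
    return duration_ratio ** (-1.0 / k)

print(round(weibull_2afc(1.0, 3.96), 3))    # 0.75
print(round(pooling_shift_db(3, 3.96), 2))  # -2.41
print(duration_threshold_ratio(7.0, 4.0))   # 100 ms -> 700 ms envelope
```

The second line reproduces the quoted advantage of about $-2.4$ dB for three components with $k=3.96$; the third shows how little the threshold is expected to fall over a sevenfold increase in duration under a fourth-power rule.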

Scientists of all kinds are wont to imagine their theories realized “as is” in nature, but, in truth, a theory is no more than a description, as succinct a description as may be, of what has been observed. The experiments reviewed below are of limited accuracy, not the least because the frame of reference against which successive stimuli are compared is labile (e.g., [13–16]). In consequence, an experiment may commonly be modeled in several different ways, all of them within the bounds of acceptable experimental error.

For this reason it might appear that the choice of model is entirely at the author’s disposal, but there is another consideration. Because a theory does not reflect the state of nature exactly, its domain of applicability is limited. For example, probability summation, in combination with the Weibull function [Eq. (3)], gives an accurate account of both the detectability function of a sinusoidal grating and the manner in which the location of that function (i.e., threshold) varies both with the number of widely separated sinusoidal components (Fig. 1) and with temporal duration (Fig. 2), but has difficulty (below) with related phenomena outside this range. Experimental data analyzed in terms of probability summation cannot be reliably extrapolated outside this domain. That is the first message of this critique. It is desirable to work within a theoretical framework with as wide a domain as possible.

If probability summation were the only candidate, then that is what visual scientists would have to use. However, the fact that probability summation can model the range of phenomena listed above does not preclude other theories of comparable accuracy (cf. Fig. 1), nor does it mean that probability summation will deliver similar accuracy for related phenomena outside that range (see, for example, Fig. 13 below for threshold measurements of spatial summation). The second message of this critique is that all the phenomena presently accommodated by probability summation, and many more besides, can equally be accommodated by signal-detection theory combined with a suitable nonlinear transform of contrast. The limited accuracy of sensory experiments means that this could be accomplished by a variety of models. By way of example Fig. 1 shows a normal integral (dashed curves) with respect to a transform that is of the fourth power for the very smallest contrasts, but transits rapidly to a linear law for supra-threshold contrasts. Probability summation is widely accepted by visual scientists, so that an exposé of its limitations and defects is timely.

## 2. HIGH-THRESHOLD ASSUMPTION

It is implicit in probability summation that if a stimulus fails to exceed threshold, it is as though that stimulus had never been presented. This is the 19th-century view of the matter on which Fechner’s [17] work, and much subsequent work, was founded. In the 1960s other (lower) threshold models were proposed [18, Chap. 6] that are less obviously at variance with observation, but probability summation depends essentially on the high-threshold model. Those other threshold models will not be examined here.

The high-threshold assumption is known to be false. Figure 3 shows the results of an experiment by Swets *et al.* [19]. The stimulus to be detected was a flash of light, ${30}^{\prime}\,\mathrm{arc}$ diameter and 10 ms duration, presented in the middle of a large uniform background of 10 ft lamberts. On each trial there were four observation intervals with the flash presented in exactly one of them. The observer’s task was to report the observation interval that had contained the flash. If the flash exceeds threshold, the response will be correct (because threshold is rarely exceeded without a flash). So an incorrect response means that the flash failed to exceed threshold, in which case the observer must have been guessing. Any second response (conditional on a first error) must also be a guess with probability of $1/3$ of being correct. Swets *et al.* asked their observers to make both first and second responses on every trial. Figure 3 shows the proportions of correct second responses on those trials on which the first response had been wrong. Except for the lowest values of ${d}^{\prime}$, the proportion of correct second guesses is clearly above the chance level and increases with the increasing strength of the stimulus.
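The contrast between the two accounts can be illustrated with a small simulation. The sketch below assumes an equal-variance Gaussian signal-detection observer who ranks the four intervals by a graded internal response; that observer model, the function name, and the parameter values are assumptions made for illustration and are not the analysis reported in [19].

```python
import random

def second_choice_accuracy(d_prime, n_trials=20000, n_intervals=4, seed=1):
    """P(second response correct | first response wrong) for an observer
    who ranks the intervals by a graded internal response."""
    rng = random.Random(seed)
    first_wrong = 0
    second_right = 0
    for _ in range(n_trials):
        # Interval 0 holds the flash; the others contain noise only.
        responses = [rng.gauss(d_prime, 1.0)] + [
            rng.gauss(0.0, 1.0) for _ in range(n_intervals - 1)
        ]
        ranked = sorted(range(n_intervals),
                        key=lambda i: responses[i], reverse=True)
        if ranked[0] != 0:
            first_wrong += 1
            if ranked[1] == 0:
                second_right += 1
    return second_right / first_wrong

print(second_choice_accuracy(0.0))  # no signal: close to the chance level 1/3
print(second_choice_accuracy(1.5))  # graded information: well above 1/3
```

Under the high-threshold assumption the second figure would stay at $1/3$ whatever the stimulus strength; with graded sensory information it rises with ${d}^{\prime}$, which is the qualitative pattern described for Fig. 3.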

Ordinarily, there are some detections reported when no flash has been presented. On the basis of the high-threshold assumption such “detections” must be guesses, and there should be a similar proportion of guesses when the flash has in fact been presented, but has fallen short of threshold. This leads to a correction for guessing [19, Eq. (6), p. 317],

Swets *et al.* [19, Figs. 6 and 7] showed that this correction for guessing failed to align the detectability scores obtained with different proportions of false positives, and this result has been replicated by Nachmias [20] with a variety of different stimuli, including gratings.
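Why the correction fails can be seen in a short numerical sketch. It uses the standard form of the correction for guessing, (hits − false positives)/(1 − false positives), which is assumed here to correspond to the equation cited from [19], and it generates operating points from an equal-variance Gaussian model; both the model and the function names are assumptions of this illustration.

```python
from statistics import NormalDist

def correct_for_guessing(hit_rate, false_alarm_rate):
    """High-threshold correction: estimated proportion of true detections."""
    return (hit_rate - false_alarm_rate) / (1.0 - false_alarm_rate)

def gaussian_operating_point(d_prime, false_alarm_rate):
    """Hit rate on an equal-variance Gaussian operating characteristic
    for a given false-positive rate."""
    std_normal = NormalDist()
    criterion = std_normal.inv_cdf(1.0 - false_alarm_rate)
    return 1.0 - std_normal.cdf(criterion - d_prime)

# Same stimulus strength (d' = 1), two different response criteria.
for fa in (0.05, 0.30):
    hit = gaussian_operating_point(1.0, fa)
    print(fa, round(hit, 3), round(correct_for_guessing(hit, fa), 3))
```

If the high-threshold assumption held, the corrected score would be the same at both criteria. For a curved operating characteristic it is not, so scores obtained with different false-positive rates are not brought into alignment.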

The correction for guessing fails to align the detectability scores obtained with different proportions of false positives because the relation between the probabilities of detection and of a false positive is not linear. This relation is shown in Fig. 4 for several different strengths of signal and is illustrated with data from [21, Expt. 3]. Each operating characteristic links together the possible operating points [$\alpha =\text{probability of a false positive (abscissa)}$, $\beta =\text{probability of detection (ordinate)}$] conditional on a given strength of signal (measured by ${d}^{\prime}$) and thereby represents all that the observer knows about whether a signal has been presented. It can be represented by the parametric equations

A similar calculation is feasible at the level of the physical stimuli. The theory of signal detectability [22] is simply the mathematics of the detection of weak radar signals in a background of noise. It applies the classical theory of the optimum testing of statistical hypotheses based on the Neyman–Pearson lemma [23]. It is exactly the calculation required for auditory stimuli [24] and can be readily adapted for visual stimuli, substituting a Poisson process of density equal to the luminance (in the case that there is no additional external noise) for the mathematical representation of the random-noise background [25,26].

So the distributions of $\mathrm{ln}\lambda $ can be calculated as at the level of the physical stimuli and, at the same time, estimated at output from the responses of the observer. The two are connected via a set of theorems [27, pp. 18–22] that amount, in words, to “…the grouping, condensation, or transformation of observations by a statistic will in general result in a loss of information … There can be no gain of information by the statistical processing of data.” This is the statistical equivalent of the maxim, “You can’t get a quart out of a pint pot.” It provides a rigorous mathematical foundation for the study of sensory discrimination, independent of any particular model of the process. It parses the problem into two questions: first, what specific losses of information generate the pattern of results (operating characteristics, detectability/discriminability functions, tvi curves) that are observed in nature? This formulates the problem in terms of signal-detection theory; the human observer is then “ideal” (see [24, Chap. 6]), but with respect to the limited information transmitted through the visual pathway. Second, how are those specific losses of information realized in neural processing? This corresponds approximately to both of Marr’s [28] algorithmic and implementational levels.

The question nevertheless arises of whether there are circumstances in which probability summation might still be applicable. The critical issue is not whether the outputs from different units are statistically independent [as in Eq. (1)]—if the outputs from two units are not independent, the two are effectively a single unit—but whether the outputs are restricted to two values only, “detect” and “no detect.” Moreover, it is assumed that there is rarely a detection in the absence of a stimulus. This matter was intensively investigated in the 1960s [18, Chaps. 4–6], and the gist of that argument is relevant here.

Suppose that a particular stimulus has a probability of exceeding the threshold equivalent to the point A on the ordinate of Fig. 5. If the observer never reports a stimulus in the absence of a detection, this is the operating point that will be obtained. If, on the other hand, the observer always says “yes,” the operating point will be C in the top right-hand corner. If, however, the observer guesses “yes” on some trials, irrespective of whether threshold was exceeded, and on others reports only true detections, the averaged operating point will be a weighted compound of A and C. Depending on the mix of these two strategies, any point on the line AC may be obtained (see [18]). Guessing does not tell us anything new. There are still only two possible states of the observer: “detect” (with probability A) and “no detect” (with probability 1-A). This formulates probability summation in information-theoretic terms, and the information is equivalent to a Bernoulli variable.

If there are several different units (three in Fig. 1), then there will be an equivalent number of independent Bernoulli variables, and with sufficient multiplicity their sum will tend to the normal distribution in Fig. 4. But probability summation specifies that if any one threshold unit detects the stimulus, it is detected [Eq. (1) above; threshold is rarely exceeded in the absence of a stimulus]; that is to say, the information supplied by probability summation is still only Bernoulli and a multiplicity of units merely increases the probability of detection. The operating characteristic AC in Fig. 5 advances to BC. Probability summation therefore applies when the relation between $\alpha $ and $\beta $ [Eq. (5)] is linear, specifically Eq. (4). The gradient can take only two different values, infinity and ($1-p$).
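The linear operating characteristic described here can be sketched as follows. The function names and the example probability are assumptions for illustration; the relation itself follows from mixing a pure-guessing strategy (say "yes" with probability $\alpha $ regardless) with true detections of probability $p$.

```python
def high_threshold_roc(false_alarm_rate, p_detect):
    """Detection rate under the high-threshold model when the observer
    guesses 'yes' on a proportion false_alarm_rate of trials."""
    return p_detect + (1.0 - p_detect) * false_alarm_rate

def summed_detection(p_unit, n_units):
    """Detection probability for n independent, equally sensitive units."""
    return 1.0 - (1.0 - p_unit) ** n_units

p_single = 0.3                       # hypothetical point A
p_triple = summed_detection(0.3, 3)  # hypothetical point B, three units

for alpha in (0.0, 0.5, 1.0):
    print(alpha,
          round(high_threshold_roc(alpha, p_single), 3),
          round(high_threshold_roc(alpha, p_triple), 3))
```

Both characteristics are straight lines ending at the top right-hand corner; adding units raises the intercept but leaves the characteristic linear, with gradient $1-p$.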

Applicability of probability summation lends itself to a simple experimental test, which is illustrated in Fig. 5. The data points from [19] present the cumulative proportions of detections of the flash of light described above, in relation to false positives, by four observers who were instructed to express their confidence with respect to a six-point scale. It is patent that the relation between the data points, one for each boundary between adjacent categories, does not conform to Eq. (6), as probability summation requires. Probability summation is applicable when a procedure such as Swets *et al.*’s rating design shows that the observer has no further information about the likelihood of a signal beyond the simple report of detection. In the experiment in Fig. 5 the sensory information supplied on each trial was more detailed than could be expressed with a single Bernoulli variable.

Recently Koenig and Hofer [3] have published a study of absolute cone thresholds for a brief flash presented in darkness. Their operating characteristics [3, Table A1] are all asymmetric, after the pattern in Fig. 5. Koenig and Hofer [3, Fig. 1B] propose that this asymmetry arises because inputs below a certain threshold do not inform the observer’s decision, which is based on supra-threshold inputs only. But this suggestion is specific to absolute threshold (so will not explain other instances of asymmetric operating characteristics—e.g., Fig. 5) and invokes graded supra-threshold inputs in detection [so does not reinstate Eq. (1)].

## 3. PEDESTAL EXPERIMENTS

When a stimulus ($\mathrm{\Delta}C$) is superimposed on another, the detection threshold for the first is commonly raised. In auditory parlance the second stimulus masks the first. Exceptionally, the second stimulus makes the first easier to detect. This happens when the masker ($C$) is itself a weak stimulus, below detection threshold. The $C$ is then often thought of as a pedestal lifting the $\mathrm{\Delta}C$ above the noise background, following [29]. Since the threshold assumption is false (Fig. 3 shows above-chance detection of below-threshold stimuli), this is a circumstance where probability summation might be expected to fail.

Suppose a discrimination between contrasts $C$ and $C+\mathrm{\Delta}C$, where the $C$ is below detection threshold. Probability summation would detect the $C+\mathrm{\Delta}C$ (which has to be above threshold, or else there would be no experiment), but not the $C$, and this is not distinct from simply determining the detection threshold for $C+\mathrm{\Delta}C$. That threshold, $\mathrm{\Delta}{C}_{0}$, should be independent of $C$, and

$$C+\mathrm{\Delta}C=\mathrm{\Delta}{C}_{0},\tag{7}$$

whatever $C$ might happen to be. But when $C$ is raised above threshold, both stimuli are detected and probability summation says nothing about how the discrimination is then resolved. The empirical reality, however, is rather different.

Figure 6 reproduces data from [30]. The observer viewed a $3\,\mathrm{c}/\mathrm{deg}$ grating through a rectangular aperture, ${2.2}^{\circ}\times {3.2}^{\circ}$ in an 11.8° diameter surround, matched to the grating in color and mean luminance. On each trial there were two stimuli of 250 ms duration, separated by a further 250 ms of uniform luminance, presenting a simple discrimination between two contrasts $C$ and $C+\mathrm{\Delta}C$, presented in either order, at random. The observer attempted to identify the stimulus of higher contrast ($C+\mathrm{\Delta}C$), whereafter he/she was told which response was correct. The difference $\mathrm{\Delta}C$ was varied in steps of 0.1 log unit in a staircase procedure designed to converge on the value that would give 79.6% correct responses [31]. The three panels of Fig. 6 show data from three different observers who each provided two sets of thresholds.

The threshold value of $\mathrm{\Delta}C$ at first decreases as $C$ increases, reaching a minimum when $C$ is about equal to the detection threshold (indicated by the vertical arrows); thereafter $\mathrm{\Delta}C$ increases, as one would ordinarily expect. I emphasize that the combined value ($C+\mathrm{\Delta}C$) increases monotonically with $C$, contrary to Eq. (7); it is just that initially ($C+\mathrm{\Delta}C$) does not increase so fast as $C$. The dotted curves in Fig. 6 show the predictions of a nonlinear transform of contrast coupled with the assumption that Weber’s law applies exactly to the transformed values [Eq. (A15) in Appendix A]. These predictions agree with the data to within the limits of experimental error (which may be assessed from the repeated threshold measurements).
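How an accelerating transform produces facilitation can be sketched with a toy calculation. The transform below, ${C}^{4}/(1+{C}^{3})$, is a hypothetical stand-in that is fourth power for small contrasts and near linear for large ones; it is not Eq. (A8) or Eq. (A15), which are not reproduced in this section. The sketch also uses a constant-increment criterion on the transformed scale rather than Weber's law, so it shows only the initial dip and not the subsequent rise.

```python
def transform(c):
    """Hypothetical transform: ~c**4 for small c, ~linear for large c."""
    return c ** 4 / (1.0 + c ** 3)

def increment_threshold(pedestal, criterion=1.0):
    """Increment dC at which transform(C + dC) - transform(C) equals the
    criterion, found by bisection (the transform increases monotonically)."""
    lo, hi = 0.0, 10.0
    base = transform(pedestal)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if transform(pedestal + mid) - base < criterion:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Contrast is in arbitrary units of the hypothetical transform.
for pedestal in (0.0, 0.5, 1.0, 2.0, 4.0):
    d_c = increment_threshold(pedestal)
    print(pedestal, round(d_c, 3), round(pedestal + d_c, 3))
```

In this toy model the increment threshold is smallest for a pedestal near the detection threshold, while the combined value $C+\mathrm{\Delta}C$ grows steadily with $C$, in line with the pattern described for Fig. 6 and unlike the constant sum that probability summation requires.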

An elegant way to discover the shape of the transform that is required is demonstrated in Fig. 7, which reproduces data from [32]. Their observers viewed gratings of 0.5, 2, and $8\,\mathrm{c}/\mathrm{deg}$ filling a ${6}^{\circ}\times {6}^{\circ}$ field set within a larger surround, matched in luminance. On each trial there were two 100 ms presentations separated by 600 ms. The filled circles in Fig. 7 are normal deviate transforms of the proportions of correct detections, when a grating of contrast $\mathrm{\Delta}C$ was paired with a blank presentation. The open circles are estimates of ${d}^{\prime}$, augmented by 0.6745, for the discrimination between contrasts $C$ and $C+\mathrm{\Delta}C$, when $C$ was set equal to the 75% detection threshold. The quantity 0.6745 is the value of ${d}^{\prime}$ for discriminating the threshold $C$ from a blank presentation. The 0.6745 adjustment has the effect of continuing the empirical relation to supra-threshold values of $C$. The same nonlinear transform [Eq. (A8) in Appendix A] is fitted to each set of data, subject only to a different scale factor to allow for different sensitivities to gratings of different wavenumbers. It is the same function as in Figs. 1 and 6. It begins as a fourth power law at the smallest contrasts, thereby agreeing with the temporal summation data in Fig. 2, and transits progressively to a linear law at supra-threshold levels.

Of course, other similar transforms could have been used instead. Foley and Legge [32] fit power functions to their data (cf. Fig. 7) with estimated exponents ranging from 2.11 to 3.04. Watson and Ahumada [33] reported pooling exponents [$k$ in Eq. (2)] between 2 and 3. This is in no way incompatible with a transform that approximates the fourth power for the very smallest contrasts. In the first place, while a power function is a natural first choice to describe results such as those in Fig. 7, there is no reason why the transform existing in nature should be of that form, and, indeed, it is not.

Foley and Legge [32] reported that as contrast increases, the psychometric function for discriminating $C+\mathrm{\Delta}C$ from $C$ changes shape, from a normal integral with respect to some power of contrast when $C=0$ (estimates of the exponent ranged from 2.9 to 3.5 [32, Table 1]) to apparent linearity when $C$ is equal to the 75% detection threshold (see also [34], as reproduced in [35, Fig. 12.1]). They were able to fit the function for discriminating $C+\mathrm{\Delta}C$ from a threshold $C$ (Fig. 1) from an extrapolation of their power law fit to the detection data in Fig. 7; that is to say, the nonlinear transform in Fig. 7 accounts not only for the observed increase in the normal deviate transform, but also for the change in shape of the psychometric function. This change in shape has also been reported by Bird *et al.* [36, Fig. 1]. As pedestal contrast is increased, the gradient of the psychometric function, vis-à-vis log contrast, decreases; that is, the estimated exponent of a presumed underlying power transform decreases. The best fitting exponent to their four discrimination functions, supra-contrast threshold, was 0.83. In addition, Henning and Wichmann [37, Fig. 4], have plotted the 60%, 75%, and 90% thresholds for a range of pedestal contrasts. The gradient of their psychometric functions decreased rapidly up to the point (about equal to the detection threshold) at which facilitation is maximal; thereafter there is little further change. All this points to an underlying transform that is near fourth power for the smallest contrasts, but rapidly transits to some lesser value of the exponent, arguably to a linear law.

If, as I suggest, the transform is fourth power only for the smallest contrasts and transits rapidly to linear, the estimated exponent will depend on the range of contrast values represented in the data. It will be less than 4 and will be less for data that extend above the contrast threshold (Fig. 7) than for strictly threshold data (Fig. 1). Watson and Ahumada [33] reported pooling exponents for a standard set of stimuli, each contained within a uniform surround. Pooling was calculated over the entire stimulus field, including the surround. But the presence of that surround—the grating stimuli were confined within a Gaussian envelope—meant that contrast threshold was higher than for an equivalent grating filling the entire stimulus field. So Watson and Ahumada’s estimates likewise do not apply to the very lowest contrasts. Frankly, the estimated value of the power-law exponent is of no great moment; it depends on the range of contrast values represented in the data and may take any value between 4 and 1. What needs to be emphasized is that the data in Figs. 1 and 2, which can be accommodated by probability summation, can also be modeled with a nonlinear transform. Such a transform can, at the same time, accommodate other data (Figs. 6 and 7) that lie outside the domain of applicability of probability summation.
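The dependence of the estimated exponent on the range of contrasts can be illustrated with the log-log slope of a transform of this general kind. The function ${C}^{4}/(1+{C}^{3})$ used below is a hypothetical example chosen only because it is fourth power at small contrasts and linear at large ones; it is not the transform of Appendix A, and the contrast units are arbitrary.

```python
import math

def transform(c):
    """Hypothetical transform: ~c**4 for small c, ~linear for large c."""
    return c ** 4 / (1.0 + c ** 3)

def local_exponent(c, step=1e-4):
    """Log-log slope of the transform at contrast c (central difference),
    i.e. the exponent of the power law that matches it locally."""
    upper = math.log(transform(c * (1.0 + step)))
    lower = math.log(transform(c * (1.0 - step)))
    return (upper - lower) / (math.log(1.0 + step) - math.log(1.0 - step))

for c in (0.01, 0.5, 1.0, 2.0, 100.0):
    print(c, round(local_exponent(c), 2))
```

A power function fitted to data drawn from any one part of this range would return an exponent somewhere between 4 and 1, depending on where the contrasts fall.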

Henning and Wichmann [37] also found that in the presence of noise (random grating components at a wide range of wavenumbers) facilitation was greatly reduced. The noise [37, Figs. 5–7] raises the detection threshold; this is so whether the noise is broadband, notched (no noise components within 0.75 octaves of the grating wavenumber), high-pass, or low-pass. The visual pathway is unable to filter out the noise. The presence of the noise shifts the location of the facilitation to about the value of the raised threshold, because there can be no detection of anything until the $C+\mathrm{\Delta}C$ stimulus exceeds the current detection threshold. The channel tuned to the grating wavenumber ($4\,\mathrm{c}/\mathrm{deg}$) is therefore seeing a contrast well above the level that would be detectable without background noise. The transfer function (fourth power for the smallest contrasts, but tending to linear as contrast increases) is near linear at the level of contrast needed in the presence of noise, and the depth of the facilitation is reduced in consequence. (A linear function gives no facilitation at all.) This result has been replicated [38] in the frequency domain. Expanding the range of applicable phenomena in this way leads to a more reliable and comprehensive theory.

## 4. BINOCULAR SUMMATION

Figure 8 compares inverse contrast thresholds from Campbell and Green [39] for gratings seen by one eye and by two. The gratings filled a rectangular field 2° by 1.3° and were set in a uniform surround of the same space-average luminance ($80\,\mathrm{cd}/{\mathrm{m}}^{2}$). The subject’s pupils were dilated with atropine and the stimuli were viewed through 2.8 mm artificial pupils to preclude any change in pupil size affecting retinal illumination. In the monocular condition the nonviewing eye was covered with a piece of frosted glass and therefore viewed a uniform luminance.

Although the contrast threshold varies widely over the range of wavenumbers ($2\text{–}46\,\mathrm{c}/\mathrm{deg}$) tested, the binocular threshold is consistently lower (inverse threshold is greater) than both of the monocular thresholds by a factor that averages 1.44, very close to $\sqrt{2}$; that is to say, the combination rule for gratings viewed binocularly is square law, not fourth power as in Fig. 1. This result poses an obvious problem for probability summation: why is the combination of disjoint grating inputs sometimes fourth power and sometimes square law? If, however, the detection of grating stimuli is modeled by a suitable nonlinear transform, there is a simple and elegant answer, which is depicted in Fig. 9.
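The link between the size of the binocular advantage and the exponent of the combination rule can be made explicit. The sketch assumes that $n$ equal inputs combined by a $k$th power rule lower the threshold by a factor ${n}^{1/k}$; the function names are illustrative only.

```python
import math

def summation_ratio(n_inputs, exponent):
    """Sensitivity gain from n equal inputs under a k-th power rule."""
    return n_inputs ** (1.0 / exponent)

def implied_exponent(n_inputs, sensitivity_ratio):
    """Exponent k for which n equal inputs give the observed gain."""
    return math.log(n_inputs) / math.log(sensitivity_ratio)

print(round(summation_ratio(2, 2.0), 3))    # 1.414  (square law)
print(round(summation_ratio(2, 4.0), 3))    # 1.189  (fourth power)
print(round(implied_exponent(2, 1.44), 2))  # 1.9
```

A fourth-power rule would give a binocular advantage of only about 1.19, so the observed average of 1.44 is much closer to what a square law predicts.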

Suppose the transfer function in Fig. 7, initially fourth power for the smallest contrasts, but transiting to a linear law for supra-threshold contrasts, to be realized in two stages. Each stage implements the same transform, initially square law for the smallest contrasts, but transiting to a linear law [Eq. (A6) in Appendix A]. In cascade these two stages deliver the transfer function, initially fourth power, in Fig. 7. The first stage is specific to each eye; the second stage is common to both. Binocular fusion takes place after one stage only, and for that reason follows a square law (Fig. 8)—but detection is not resolved until after both stages of analysis and, for that reason, it is fourth power (Fig. 1). Figure 9 shows an obvious similarity to Legge’s [40, Fig. 4] diagram illustrating his binocular energy-detector model.
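The cascade argument can be checked numerically with a toy stage. The single-stage function ${x}^{2}/(1+x)$ below is a hypothetical example that is square law for small inputs and near linear for large ones; it is not Eq. (A6), and the helper names are assumptions of this sketch.

```python
import math

def stage(x):
    """Hypothetical single stage: ~x**2 for small x, ~linear for large x."""
    return x ** 2 / (1.0 + x)

def cascade(x):
    """Two identical stages in series (monocular stage, then common stage)."""
    return stage(stage(x))

def log_log_slope(func, x, step=1e-4):
    """Local power-law exponent of func at x (central difference)."""
    upper = math.log(func(x * (1.0 + step)))
    lower = math.log(func(x * (1.0 - step)))
    return (upper - lower) / (math.log(1.0 + step) - math.log(1.0 - step))

for x in (1e-3, 1e3):
    print(x,
          round(log_log_slope(stage, x), 3),
          round(log_log_slope(cascade, x), 3))
```

For small inputs one stage behaves as a square law and two stages in cascade as a fourth power, while both tend to linear for large inputs. That is the property the two-stage account relies on: a quantity read out after one stage follows a square law, and one read out after both follows a fourth power.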

## 5. MONOCULAR VERSUS DICHOPTIC MASKING

Legge [41] compared monocular with dichoptic masking. He presented a test grating for 200 ms to the right eye of each of his two observers and simultaneously a masking grating, either to the left eye (dichoptic masking), or to the right eye (monocular masking) with the left eye viewing a uniform field of the same space-average luminance. The gratings were vertically oriented and subtended 3.25° by 5°, with a space-average luminance of $200\,\mathrm{cd}/{\mathrm{m}}^{2}$. Thresholds for the detection of the test grating were determined by a 2AFC staircase procedure.

Masked thresholds for five test wavenumbers (indicated by the arrows) and a variety of masker wavenumbers, with a fixed masker contrast of 0.19, are shown in Fig. 10. When the mask is of the same wavenumber as the test grating, dichoptic masking is much more effective than monocular (note the difference in the scales of the ordinates). This poses an immediate problem for probability summation: if the threshold units are binocular, why is dichoptic masking so much greater than monocular? And if the units are monocular, why is there any dichoptic masking at all? But the model set out in Fig. 9 can again provide a solution.

If transmission up to the point of binocular fusion were linear, there would be no difference between dichoptic and monocular masking. So one (at least) of the three stimuli (of contrasts ${C}_{\text{mask}}$, $\mathrm{\Delta}{C}_{\text{dich}}$, and ${C}_{\text{mask}}+\mathrm{\Delta}{C}_{\text{mon}}$) must fall within the nonlinear (square-law) part of the transfer function in Fig. 9. A contrast of 0.19 is very much supra-threshold, so the contrast $\mathrm{\Delta}{C}_{\text{dich}}$ must be the component subject to nonlinear transfer. However, when test and masker wavenumbers are the same, a masking contrast of 0.19 raises the masked threshold to a level that is itself supra-unmasked threshold. There seems to be only one way in which the contrast $\mathrm{\Delta}{C}_{\text{dich}}$ could still be subject to nonlinear transfer.

It is widely accepted that the visual pathway consists of a range of wavenumber-specific channels (see, e.g., [42]). Wavenumber specificity of the maskers is apparent in Fig. 10. Suppose now that a masking grating masks by overloading/saturating those channels most nearly tuned to the masker wavenumber, forcing detection of the test grating to an off-center channel. (I make no attempt to incorporate saturation into any model here, but take it for granted that sensory neurons have a limited dynamic range. Above that range the neuron would cease to transmit information and, for the purposes of contrast discrimination, can be ignored.) Transmitted through an off-center channel, the test grating suffers an additional attenuation ${a}_{m}$ with respect to the attenuation ordinarily encountered in passage through the channel tuned to the test wavenumber. Comparing dichoptic with monocular masking for a test grating at the masker wavenumber,

The particular square-law transfer function depicted in Fig. 9 transits from square law to linear approximation at a normalized input of about 1.17 [35, p. 265], so that the normalized value of $\mathrm{\Delta}{C}_{\text{dich}}$ (in the square-law region of the transform) has to be less than 1.17. If, then, ${a}_{m}^{-1}>{(1.17)}^{2}$, $\mathrm{\Delta}{C}_{\text{dich}}>\mathrm{\Delta}{C}_{\text{mon}}$, and if ${a}_{m}^{-1}\gg {(1.17)}^{2}$, $\mathrm{\Delta}{C}_{\text{dich}}\gg \mathrm{\Delta}{C}_{\text{mon}}$, as in Fig. 10.

As the masker wavenumber is adjusted away from the test wavenumber, channels closer to the test wavenumber become available to support detection. The additional attenuation of the test grating with respect to these less-off-center channels is reduced, as also is the increase of $\mathrm{\Delta}{C}_{\text{dich}}$ over $\mathrm{\Delta}{C}_{\text{mon}}$. In consequence, the dichoptic masking patterns in Fig. 10 are more sharply tuned than the monocular.

This reading of the data in Fig. 10 is supported by the further results in Fig. 11. In this second experiment the test and masker had the same wavenumber and the same phase, so that it amounted to a study of contrast discrimination. As before, the masker contrast was presented either to the left eye (dichoptic masking) or to the right eye (monocular masking), with the left eye viewing a uniform field of the same space-average luminance. Thresholds were measured for four different wavenumbers and a range of masking contrasts, including masking contrasts below detection threshold, so that the extent of facilitation is apparent. All these data have been normalized with respect to their respective unmasked detection thresholds, so that the data points for the different wavenumbers cluster around a common empirical trend.

Looking first at the monocular data, the characteristic with long dashes has been copied from Fig. 6(a) (and appropriately rescaled); it represents the effect of two stages of nonlinear transform in cascade (see Fig. 9). The pecked curve is the corresponding characteristic representing one nonlinear stage only. It has been calculated with the same parameters as the two-stage characteristic and subjected to the same rescaling. These two characteristics therefore provide comparable indications of the effects of one and two stages of nonlinearity. Comparing these two curves with the data points, it can be seen that monocular facilitation is compatible with two stages of nonlinearity in cascade, but not with one only.

The same two characteristics have been copied onto the dichoptic diagram, with the same scaling, except that the square-law (single-stage) characteristic has been expanded twofold in the vertical direction. Since this characteristic describes the relation of $\mathrm{\Delta}{C}_{\text{mon}}$ to $C$ in the monocular diagram, but ${(\mathrm{\Delta}{C}_{\text{dich}})}^{2}$ to $C$ here, this expansion (relative to the logarithmic ordinate) correctly adjusts the characteristic to $\mathrm{\Delta}{C}_{\text{dich}}$. Comparing the data points with both characteristics, it can be seen that one stage of nonlinearity is sufficient to accommodate the facilitation obtained with a dichoptic masker.

Finally, the discrimination of contrast in Fig. 11 does not conform to Weber’s law, but approximates a power function with an exponent of about 0.7 (gradient of straight line in Fig. 11(a); [34,43]), except that the gradient in the dichoptic diagram [Fig. 11(b)] is steeper (0.85). As masking contrast increases, the masker will engage an increasing number of channels, tuned to an increasing range of wavenumbers. At the same time, detection of the test grating is forced to increasingly off-center channels, such that the attenuated mask, ${a}_{m}{C}_{\text{mask}}$, falls nicely within some channel’s dynamic range. Let ${c}_{0}$ be the attenuated contrast that falls in the middle of that range; i.e., ${a}_{m}{C}_{\text{mask}}={c}_{0}$. Then the discrimination threshold depends on the ratio

Now let the number of channels that transmit information effectively increase as $\nu (C)$. An increased number of channels generates increased precision and, statistically, the standard error of an estimate decreases as the square root of the number of independent observations. In view of that increasing number, a relation that, for other sensory attributes, would be

where $\mathrm{\Theta}$ is the Weber fraction, becomes, instead,

If $\nu (C)$ increases as ${C}^{0.6}$ [choosing the exponent to fit the data in Fig. 11(a)], then

Since ${a}_{m}={c}_{0}/{C}_{\text{mask}}$, $\mathrm{\Delta}{C}_{\text{dich}}$ increases as ${C}_{\text{mask}}{[\nu ({C}_{\text{mask}})]}^{-1/4}={C}_{\text{mask}}^{0.85}$ [Fig. 11(b)]. Note that both monocular and dichoptic discriminations are resolved at the same (second) stage of analysis, presumably by the same neural units. The difference between the masked thresholds arises because the dichoptic mask and test stimulus pass through different pathways (left and right eyes) up to the point of binocular convergence and $\mathrm{\Delta}{C}_{\text{dich}}$ is a small signal, subject to square-law suppression, in that first stage of analysis.
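The two exponents quoted above can be checked with a short scaling sketch. It rests on an assumption about the form of the relations: the Weber relation is taken to be divided by the square root of $\nu (C)$ in the monocular case and, following the factor ${[\nu ({C}_{\text{mask}})]}^{-1/4}$ in the text, by its fourth root in the dichoptic case. The constant $k$ is a hypothetical constant of proportionality introduced only for this illustration.

```latex
\begin{align*}
\nu(C) &= k\,C^{0.6} \\
\Delta C_{\text{mon}} &\propto \frac{\Theta\,C}{\sqrt{\nu(C)}} \propto C^{\,1-0.3} = C^{0.7} \\
\Delta C_{\text{dich}} &\propto C_{\text{mask}}\,[\nu(C_{\text{mask}})]^{-1/4} \propto C_{\text{mask}}^{\,1-0.15} = C_{\text{mask}}^{0.85}
\end{align*}
```

On this reading, both gradients follow from the single fitted exponent 0.6.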

The analysis above has relied on data from [41], whereas some authors [36,44–47] have reported contrast discrimination data that conform accurately to Weber’s law. The two papers by Kulikowski are instructive. In those experiments contrast was switched instantaneously between the values $C$ and $C+\mathrm{\Delta}C$ at 0.5 Hz and $\mathrm{\Delta}C$ was adjusted to be just detectable. So Weber’s law applies in those two cases to the detection of an increment in contrast. When, subsequently, Kulikowski [34] repeated the threshold measurements using a 2AFC method, he found the difference threshold to increase only as the 0.71 power of contrast [35, Fig. 1.11]. Speed and Ross [47] presented sinusoidal gratings that reversed in contrast, square wave modulated at 8.8 Hz, within a 2 s presentation. It is plausible that Weber’s law again relates to the detection of an increment in contrast, rather than simple discrimination between two separate contrasts. Bird *et al.* [36] presented their sinusoidal stimuli within a rectangular temporal window of 78.8 ms duration. It is arguable that in this case discrimination was dominated by the temporal transients, rather than the sinusoidal modulation. However, this cannot be the full story because Yang and Makous [48] also measured thresholds for increments in contrast and found the exponent to be less than 1 and, moreover, to vary with wavenumber.

## 6. EVALUATION

In combination with a Weibull function of parameter near 4 [Eq. (3)], probability summation can accommodate, to within the limits of experimental error, the shape of the detectability function for contrast, together with the reduction in threshold that comes from the combination of widely dispersed grating components (Fig. 1), summation with respect to duration at threshold (Fig. 2), and some, but not all, of the variation of threshold with the area of the grating (i.e., spatial summation; see Figs. 13 and 14 below). At the same time, all of these results, together with facilitation (Fig. 6), binocular summation (Fig. 8), and the comparison between dichoptic and monocular masking of contrast (Figs. 10 and 11) can be accommodated, to a similar degree of accuracy, by a nonlinear transform of contrast, fourth power for the smallest contrasts, but tending rapidly to linearity as contrast increases.

The high-threshold assumption is a philosophical speculation of the 19th century [49]. It is not observable and is in conflict with a number of reliable experimental findings. It is time to begin again with what is observable and well established. Figure 12 illustrates the assumptions that follow with respect to one spatial axis perpendicular to the bars of a grating stimulus.

- 1. The contrast stimulus [Fig. 12(a)] is a sinusoidal modulation of an otherwise uniform luminance, $L(u)={L}_{0}(1+C\,\mathrm{cos}\,2\pi gu)$ [Eq. (16)], where $u$ is a spatial coordinate and $g$ is the wavenumber. The transport of energy to the eye consists of a Poisson process of absorptions in the retina, of density proportional to the luminance. Figure 12(b), upper panel, displays a sample function.
- 2. A retinal ganglion cell receives both positive (excitatory) and negative (inhibitory) inputs within a limited area (receptive field). Envisage a matrix of receptive field units; since these units transmit in parallel, any theoretical treatment must focus on the matrix as a whole, not just on individual units. Figure 12(b) displays a pair of sample input functions, not to an individual cell, but to the matrix as a whole.
- 3. Under suitable conditions, observers are able to detect very small (0.0025) modulations of luminance (Fig. 6) and discriminate even finer differences in contrast (0.001; Fig. 6 again). Since all models of these phenomena invoke a nonlinear transform—a nonlinear transform, not of the luminance [i.e., $L(u)$ in Eq. (16)], but of the contrast $C$ by itself—the first step in the sensory analysis of such a stimulus must be to strip the modulation, ${L}_{0}C\text{\hspace{0.17em}}\mathrm{cos}\text{\hspace{0.17em}}2\pi gu$, from the mean, ${L}_{0}$. Mathematically, that is differentiation.
- 4. Differentiation is realized in a statistical balance between the positive and negative inputs to each receptive field. When recording the response of a retinal ganglion cell to a grating stimulus, it is necessary to drift the grating across the receptive field, or else one records only the cell’s response to quantal noise (maintained discharge; e.g., [50]). Figure 12(c) shows the resultant after cancellation between the positive and negative inputs. The mean inputs cancel (differentiation). The sinusoidal modulation does not entirely cancel, because the positive and negative inputs are differently disposed in space and impose different attenuations on the sinusoidal modulation (transmitted contrasts ${C}^{+}$ and ${C}^{-}$, respectively). The resultant contrast is (${C}^{+}-{C}^{-}$), abbreviated to $aC$. Neither does the quantal noise cancel, because the positive and negative inputs are statistically independent. Instead, the two noise components combine in square measure to produce a background noise of power proportional to the luminance. In the absence of modulation ($C=0$), this gives Weber’s law for luminance [2].
- 5. Retinal ganglion cells receive both positive and negative inputs, but transmit action potentials of one polarity only. In terms of radio engineering, this is half-wave rectification. Figure 12(d) shows the half-wave rectified transform of Fig. 12(c). If the depth of modulation is large enough for the quantal noise to be ignored, the output is simply the positive excursions of the modulation; the transfer function is linear. But if, at threshold, the half-wave rectified transform is averaged over space (or time), it generates a nonlinear transform. At this level the transform is of square law only, transiting to linear for large inputs (Fig. 9). It accounts for the many square-law relationships observed in threshold summation [35, Chap. 7] and generates an increment over the face of a grating stimulus proportional to the square of the contrast.
- 6. The receptive field organization seen in the retina is repeated at higher levels in the visual pathway. At this stage the increment over the face of a grating is detected by comparison with its uniform surround, in the same way as an increment in luminance. The Craik–Cornsweet illusion (see, e.g., [35, Fig. 5.3, p. 64]) provides a direct demonstration, and Kelly [51] has reported an analogous finding with flicker. That detection applies a further square-law transform in cascade, giving a resultant that is fourth power for the smallest contrasts, but tends rapidly to linearity as contrast increases (Fig. 7). The psychometric function in Fig. 1 is a normal integral with respect to the cascaded transform, and the facilitation plotted in Fig. 6 assumes simply that Weber’s law applies to that cascaded transform.

(I have speculated [35, pp. 226–227] that the operative second stage is realized in the complex cells described by Hubel and Wiesel [52]. I envisage that the visual pathway is, functionally speaking, a ladder of receptive field units repeatedly transforming the sensory process in the manner illustrated in Fig. 12. If the input is large relative to the noise level, the transform operates in its linear region and has no effect beyond attenuating the throughput of information. But if the input is small, it is subject to square-law suppression. The phenomena reviewed above indicate square-law suppression at two successive stages, of which the second is possibly the complex cells, while the first might be the simple cells of layer IVb in the striate cortex, rather than retinal ganglion cells.)

The mathematical arguments in Appendix A develop these observed properties into quantitative models for contrast detection and discrimination. All the experimental results reviewed here follow from these very elementary and general properties of sensory neural organization.

#### A. Spatial Summation of Gratings

The argument above says that a sub- or near-threshold grating creates a small increment proportional to the square of its contrast over its spatial extent and that increment is then detected by virtue of the perturbation around its periphery. That perturbation introduces a second square-law transform of sub- and near-threshold stimuli and is essential to the creation of the ultimate fourth-power nonlinearity. This is a very different story to that proposed by probability summation. Nevertheless, if a grating of area A is presented only briefly, the periphery that matters is its temporal onset and offset. That periphery is aggregated over the entire area, so that both theories then deliver a threshold that varies as ${A}^{-1/4}$. Experimental discrimination between the two requires continuous inspection of the grating, so that detection depends on spatial cues.
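The statement that probability summation also delivers ${A}^{-1/4}$ can be illustrated numerically. This is a minimal sketch, assuming that each of $n$ identical threshold units follows a Weibull function $1-\mathrm{exp}[-{(c/\alpha )}^{\beta}]$ with $\beta =4$ and that $n$ is proportional to the area $A$. The function names and the 50% detection criterion are illustrative choices, not taken from the paper.

```python
import math


def weibull(c, alpha=1.0, beta=4.0):
    """Assumed single-unit detection probability at contrast c."""
    return 1.0 - math.exp(-((c / alpha) ** beta))


def prob_summation(ps):
    """Detection fails only if every unit fails individually."""
    fail = 1.0
    for p in ps:
        fail *= 1.0 - p
    return 1.0 - fail


def threshold(n_units, target=0.5, beta=4.0):
    """Contrast at which n identical units jointly reach the target probability."""
    lo, hi = 1e-6, 10.0
    for _ in range(200):  # plain bisection; detection probability rises with contrast
        mid = 0.5 * (lo + hi)
        if prob_summation([weibull(mid, beta=beta)] * n_units) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Ratio of thresholds for a 16-fold increase in the number of units
ratio = threshold(16) / threshold(1)
```

Under these assumptions the ratio equals ${16}^{-1/4}=0.5$, the same dependence on area that a summed fourth-power transform would give for a brief presentation.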

Figure 13 presents data from Howell and Hess [53] for the detection of vertically oriented gratings presented for continuous inspection. The gratings were generated on an oscilloscope at a mean luminance of $100\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{cd}/{\mathrm{m}}^{2}$ and thresholds were determined by the method of adjustment. The data points for $0.1\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{cycles}/\mathrm{deg}$ (circles) relate to gratings 80° wide (and therefore containing eight complete cycles) and of various heights from 3° to 64°, the height being expressed in terms of the horizontal wavelength ($\lambda $). As the height increases, the threshold decreases, apparently reaching a lower limit at a height of $3.2\lambda $. In the case that the grating was surrounded by a field of equal mean luminance (open circles; the surround width was 150°), contrast threshold varies, over this range, as ${(\text{height})}^{-1/4}$; this is shown by the broken line. But in the case that the surround is dark (filled circles), the threshold varies as ${(\text{height})}^{-1/2}$, shown by the continuous line.

This comparison, between spatial summation with and without a matched surround, immediately involves the periphery of the grating in the production of the fourth-power transform. In the case of a surround of equal mean luminance, there is a near-threshold difference between the square-law increment created over the extent of the grating and the surround. Take the surround away, and that difference becomes supra-threshold and no longer generates any nonlinearity. Probability summation, on the other hand, says nothing about the role of the periphery in the detection of a grating and cannot simultaneously accommodate both square-law and fourth-power summation [54].

Figure 13 also presents data for other wavenumbers, from 0.5 to $20\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{c}/\mathrm{deg}$. These other gratings had exactly five cycles in the horizontal dimension (and therefore of angular width proportional to the wavelength) and of various heights (expressed as before in units of the wavelength, $\lambda $) presented within a matched surround of luminance equal to the space average of the grating. In each case the contrast threshold decreases approximately as ${(\text{height})}^{-1/4}$ up to a limit of about $10\lambda $ ($20\lambda $ at $20\text{\hspace{0.17em}}\text{\hspace{0.17em}}\mathrm{c}/\mathrm{deg}$). That limit poses another problem for probability summation. If summation depends on a physical aggregation of input over the area of the stimulus, then it is plausible that the area over which summation can be effected might be limited. The area of summation in Fig. 13 appears to be limited in proportion to the wavelength of the grating. But probability summation is different. A stimulus fails to be detected only if all of the threshold units within its spatial domain fail, and the probability of detection is the complement of that probability of failure. There is no aggregation of input. A limit to the area of spatial summation can occur only if further expansion of the stimulus fails to engage any additional threshold units, that is, if further expansion takes the stimulus into an area devoid of effective vision.

In Fig. 13 the perturbation around the periphery of a grating seems to be aggregated over a region proportional to its wavelength, and the results there are compatible with the idea that for gratings of only five cycles the boundary effects at the vertical sides of the grating effectively cover its entire width. Those boundary effects would need to be at least five cycles wide. A fourth-power transform summed over the height of each side then gives a threshold decreasing as ${(\text{height})}^{-1/4}$.

Figure 14 shows further data from [53] in which the height of the gratings was fixed at a number of wavelengths greater than the limit to summation in Fig. 13, and the number of cycles in the horizontal dimension varied as shown. As before, the gratings were presented in a matched surround of equal mean luminance. Over the range from 1 cycle up to 40, threshold decreases, initially as ${(\text{No. of cycles})}^{-1/4}$ (continuous lines) and subsequently at a lesser rate. The broken lines represent ${(\text{No. of cycles})}^{-1/8}$. Once the width of a grating exceeds the presumed limit to spatial summation, the scenario sketched above says that threshold should decrease as ${(\text{length of boundary})}^{-1/4}$. If the aspect ratio of the stimulus was held constant, this would equate to ${(\text{area})}^{-1/8}$. This is not quite the situation in Fig. 14, but lines of gradient $-1/8$ are shown for comparison. Rovamo *et al.* [55,56] have substantially replicated Howell and Hess’ results using gratings filling a square aperture and therefore of fixed aspect ratio.
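The step from boundary length to area can be written out explicitly. As a sketch, assume a rectangular stimulus of width $w$ and height $rw$ with fixed aspect ratio $r$ (symbols introduced here for illustration only):

```latex
\begin{align*}
\text{area} &= r\,w^{2}, \qquad \text{length of boundary} = 2(1+r)\,w \propto (\text{area})^{1/2} \\
\text{threshold} &\propto (\text{length of boundary})^{-1/4} \propto (\text{area})^{-1/8}
\end{align*}
```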

#### B. Is Probability Summation a Special Case of a More General Theory?

In combination with a Weibull function of parameter about 4, probability summation can accommodate the shape of the detectability function for contrast, the reduction in threshold that results from the combination of widely dispersed grating components, summation with respect to duration at threshold, and some instances, but not all, of spatial summation. A model based on a nonlinear transform of small, near-threshold contrasts can do all of this and much more. The example model in this article has been applied, in addition, to subthreshold facilitation, binocular summation, the comparison between monocular and dichoptic masking, and (less convincingly) spatial summation of grating stimuli. So can probability summation be regarded as a special, simplified, case of a more general theory?

In fact, the agreement of probability summation with data within its domain of applicability looks to be a numerical coincidence. The chief reason is its reliance on the “high-threshold” assumption, specifically, that stimuli that fail to exceed a fixed threshold have no effect at all. This assumption is known to be false (Fig. 3 above). Notwithstanding, Graham [57, Part III] has catalogued a very large number of models incorporating probability summation. While one might suppose initially that successful modeling of contrast detection and discrimination is a matter of getting the subsidiary assumptions right, in fact no model that denies input from subthreshold stimuli can accommodate the results from [19]. Probability summation cannot be modified to accommodate discrimination with respect to contrasts below threshold (Fig. 6).

The high-threshold assumption is not only wrong, but also quite unnecessary in the modeling of contrast detection and discrimination. A variety of accelerated nonlinear transforms of contrast will give a good account, not only of the psychometric function and threshold summation, but of facilitation as well. The particular model represented in the figures here can also accommodate binocular summation (Figs. 8 and 9) and the comparison between monocular and dichoptic masking (Figs. 10 and 11). It offers the wider range of applicability desirable for the extrapolation of experimental results.

The way to an understanding of visual sensitivity lies not in exploring an ever-increasing diversity of models, but in identifying those basic assumptions that any successful model must, of necessity, include. Sensitivity to subthreshold stimuli is one such assumption. One might suppose that the way to identify basic assumptions is by direct study of neural processes. The idea of divisive inhibition [58–60] might appear to be one such assumption. Except in the case that the objective is the modeling of neural discharge patterns, this is illusory. It is illusory because one does not know absolutely which neurons mediate the discrimination of present interest, nor which part of their operative range is relevant. Neuron discharges saturate. There is no substitute for first discovering the functional organization of visual sensitivity; differentiation (above) is the first step. Only then can one know which functions have to be realized in neural processing. The principle that, “There can be no gain of information by the statistical processing of data” (above) provides a rigorous foundation for such an analysis.

Finally, Pelli [61] has used the term “probability summation” without the high-threshold assumption. “Summation” now equates to selecting the maximum of some number ($M$) of independent signal-detection variables.

In their analysis of radar detection, Peterson *et al.* [22] considered the problem of detecting a signal when it was known only to be one of $M$ possible signals. The radar receiver must then cross correlate the received input with each of the $M$ possibilities. This delivers $M$ random variables, $M\text{-}1$ of which are samples of noise alone and one of which (it is not known which) might be a signal. The uncertainty (indexed by $M$) increases the signal strength needed for detection; Peterson *et al.* [22, Eq. (168)] give the approximate relation

Pelli ([61]; see also [57, pp. 301–304]) has applied a related model to the detection of contrast. The “received input” is now the visual stimulus, and the $M$ possible signals are represented by $M$ microanalyzers within the visual system. Looking at the maximum of the $M$ outputs (this does not differ much from the likelihood calculations in [22]), computer simulation (the model defies analytic treatment) with a suitable value of $M$ shows that this model will accommodate the increased steepness of the psychometric function, summation of grating stimuli with respect to space and time, the variation of yes–no performance according to the decision criterion employed [19,20], the relation between yes–no and 2AFC performance, and subthreshold facilitation. That is to say, replacement of the Bernoulli variable in Eq. (1) with a Gaussian substantially increases the range of detection/discrimination phenomena that can be accommodated. This is only to be expected in light of the results reproduced in Fig. 3.
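The effect of uncertainty in a maximum-of-$M$ rule can be sketched numerically. The code below is an illustration under stated assumptions, not a reproduction of Pelli’s simulation: it assumes a 2AFC task, $M$ independent unit-variance Gaussian channels in each interval, a signal that shifts the mean of exactly one channel by `d`, and an observer who chooses the interval containing the larger maximum. All function names are hypothetical.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal integral."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_correct(d, m, step=0.01):
    """P(max in signal interval > max in noise interval), by trapezoidal integration.

    P(error) is the integral of f_noise(x) * F_signal(x), where
    f_noise(x)  = m * Phi(x)**(m - 1) * phi(x)   (density of the noise maximum)
    F_signal(x) = Phi(x - d) * Phi(x)**(m - 1)   (distribution of the signal maximum)
    """
    lo, hi = -8.0, 8.0 + d
    n = int(round((hi - lo) / step))
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        base = Phi(x) ** (m - 1)
        f = m * base * phi(x) * Phi(x - d) * base
        total += (0.5 if i in (0, n) else 1.0) * f
    return 1.0 - total * h


def threshold_d(m, target=0.75):
    """Signal strength d giving the target proportion correct (bisection)."""
    lo, hi = 0.0, 10.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if prob_correct(mid, m) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Comparing `threshold_d(1)` with `threshold_d(100)` shows the signal strength needed for 75% correct rising with $M$ in this sketch; evaluating `prob_correct` over a range of `d` allows the gradients to be compared in the same way.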

There is one serious problem outstanding with Pelli’s model; it is exemplified in Table 1, which presents typical threshold values for different discrimination/detection tasks paired with the gradient of the corresponding psychometric function. The stimuli in these three experiments are presented to the same eye (the visual system not only detects gratings, but also increments and differences in luminance) and need to be accommodated by a common theory. It is therefore essential to Pelli’s model, as a candidate for that common theory, that an increase in uncertainty (the number $M$ of stimulus cues that need to be evaluated) increases not only the gradient of the psychometric function, but also the threshold [61, Fig. 5]. In nature (Table 1) the relationship between threshold and gradient of the psychometric function lies firmly in the contrary direction. The curvilinearity of the psychometric function for small contrasts actually results from selecting the most informative cues for detection, thereby filtering out much irrelevant noise. This delivers not only steeper psychometric functions, but also, at the same time, more sensitive detection.

In Fechner’s day the “high-threshold” assumption provided a scenario within which sensory discrimination could be studied and thresholds measured, but today it is simply an obstruction to further progress. “The difficulty lies, not in the new ideas, but in escaping from the old ones, which ramify … into every corner of our minds” [62, p. vii].

## APPENDIX A: DERIVATION OF THE FUNCTIONS DISPLAYED IN FIGS. 1, 6, 7, AND 9

In this appendix,

- $u$ is a spatial coordinate.
- $g$ is the wavenumber.
- $L(u)$ is the luminance as a function of $u$.
- ${L}_{0}$ is its mean.
- $C$ is the stimulus contrast.
- ${C}^{+}$ and ${C}^{-}$ are the contrast after attenuation in the positive and negative inputs, respectively.
- $a$ is the net attenuation after cancellation between the two inputs.
- $\mathrm{\Phi}(x)$ is the normal integral function.
- $z=\surd {L}_{0}aC$ is the normalized amplitude of modulation.
- $\mu (z)$ is the space-average rectified throughput.
- $h(z)$ is the normalized increment of $\mu (z)$ over its zero-contrast value.
- ${\mu}_{\text{bckgd}}$ is the zero-contrast output after two stages of differentiation.
- $\mathrm{\Theta}$ is the Weber fraction.
- $\mathrm{\Delta}{C}_{0}$ is the contrast detection threshold.

Let the grating stimulus have luminance

$$L(u)={L}_{0}(1+C\,\mathrm{cos}\,2\pi gu)\tag{A1}$$

where $u$ is a spatial coordinate and $g$ is the wavenumber [Fig. 12(a)]. This stimulus is properly a Poisson process of absorptions in the retina of density proportional to $L(u)$.

Retinal ganglion cells differentiate their input. This is realized when the stimulus is passed, half through the positive input of the receptive field, half through the negative [Fig. 12(b)]. The contrast is attenuated differently in the positive and negative inputs, because the two inputs are differently disposed in space. Let the respective contrasts, after attenuation, be ${C}^{+}$ and ${C}^{-}$. The mean input to the receptive field unit is then [Fig. 12(c)]

Retinal ganglion cells receive both positive and negative inputs, but transmit action potentials of one polarity only—half-wave rectification [Fig. 12(d)]. The mean of the half-wave rectified output is

The first term, $\surd ({L}_{0}/2\pi )$, is simply the mean half-wave rectified quantal noise when there is no contrast ($C=0$). The second term contributes nothing, because (${L}_{0}aC\text{\hspace{0.17em}}\mathrm{cos}\text{\hspace{0.17em}}2\pi gu/\surd {L}_{0}$) is antisymmetric (it averages zero over a complete cycle of $\mathrm{cos}\text{\hspace{0.17em}}2\pi gu$), while the third term adds a quantity ${L}_{0}{(aC)}^{2}\cdot \surd ({L}_{0}/2\pi )/4$ to the background noise $\surd ({L}_{0}/2\pi )$, an increment of relative size ${L}_{0}{(aC)}^{2}/4$.
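The size of the square-law increment can be checked numerically. The sketch below uses the standard result that a Gaussian variable of mean $m$ and unit variance, after half-wave rectification, has mean ${\mathrm{\Phi}}^{\prime}(m)+m\mathrm{\Phi}(m)$. It assumes that the quantal noise is adequately described as Gaussian and writes $z$ for $\surd {L}_{0}aC$; the function names are illustrative.

```python
import math


def phi(x):
    """Standard normal density, i.e. the derivative of the normal integral."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal integral."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rectified_mean(x):
    """Mean of max(0, X) for X distributed as N(x, 1)."""
    return phi(x) + x * Phi(x)


def space_average(z, n=4000):
    """Average of the rectified mean over one full cycle of the cosine (midpoint rule)."""
    total = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * (k + 0.5) / n
        total += rectified_mean(z * math.cos(theta))
    return total / n


def relative_increment(z):
    """Increment of the space-averaged output relative to its zero-contrast value."""
    return space_average(z) / phi(0.0) - 1.0
```

For small $z$ the relative increment approaches ${z}^{2}/4={L}_{0}{(aC)}^{2}/4$, and for large $z$ the space average approaches $z/\pi$, the mean of the positive half cycles of the modulation.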

We require the space average of Eq. (A3) over the full range of contrast, and a good approximation can be had as follows: writing $z$ for $\surd {L}_{0}aC$ as a matter of convenience, the mean rectified output [Eq. (A3)] can be expressed as

When this expression is averaged over a complete cycle of $\mathrm{cos}\text{\hspace{0.17em}}2\pi gu$, the contribution from the noise background, ${\mathrm{\Phi}}^{\prime}(0)$, is unchanged, while the term $z\text{\hspace{0.17em}}\mathrm{cos}\text{\hspace{0.17em}}2\pi gu\mathrm{\Phi}(z\text{\hspace{0.17em}}\mathrm{cos}\text{\hspace{0.17em}}2\pi gu)$ can be split into two parts, $z\text{\hspace{0.17em}}\mathrm{cos}\text{\hspace{0.17em}}2\pi gu\{\mathrm{\Phi}(z\text{\hspace{0.17em}}\mathrm{cos}\text{\hspace{0.17em}}2\pi gu)-1/2\}$ and $(z\text{\hspace{0.17em}}\mathrm{cos}\text{\hspace{0.17em}}2\pi gu)/2$, of which the first is symmetric and the second averages zero over a complete cycle of $\mathrm{cos}\text{\hspace{0.17em}}2\pi gu$. Deleting $(z\text{\hspace{0.17em}}\mathrm{cos}\text{\hspace{0.17em}}2\pi gu)/2$ and ${\mathrm{\Phi}}^{\prime}(0)$ leaves $\{z\mathrm{cos}2\pi gu\text{\hspace{0.17em}}(\mathrm{\Phi}(z\mathrm{cos}2\pi gu)-1/2)+\text{\hspace{0.17em}}({\mathrm{\Phi}}^{\prime}(\mathrm{cos}2\pi gu\text{\hspace{0.17em}}z)-{\mathrm{\Phi}}^{\prime}(0))\}$. This has a space average reduced by one half from its peak value (when $\mathrm{cos}\text{\hspace{0.17em}}2\pi gu=1$) at the smallest contrasts [space average of ${\mathrm{cos}}^{2}\text{\hspace{0.17em}}2\pi gu=1/2$ in Eq. (A5)], increasing to $2/\pi (=0.637)$ at the highest contrasts (space average of $|\mathrm{cos}\text{\hspace{0.17em}}2\pi gu|$), when the rectified output is simply the positive half cycles of the sinusoidal modulation. The mean rectified output averaged over a full cycle of $\mathrm{cos}\text{\hspace{0.17em}}2\pi gu$ is therefore approximately $\surd {L}_{0}\mu (z)$, where

equating the average increment to one half its peak throughout the contrast range. The average increment over the face of the grating is $\surd {L}_{0}\{\mu (z)-\mu (0)\}$, and numerical calculations show that $\{\mu (z)-\mu (0)\}$ is approximately ${z}^{2}/[4\surd (2\pi )]$ for $z<1.17$ and $z/4-1/\surd (8\pi )$ thereafter [35, pp. 263–266]. This is the function in Fig. 9.

The second stage of analysis in Fig. 9 sees an increment of size $\surd {L}_{0}\{\mu (z)-\mu (0)\}$, extending over the field occupied by the grating, added to a background of mean $\surd ({L}_{0}/2\pi )$. This increment is detected in the same way as ordinary (rectangular) increments of luminance, by virtue of the perturbation resulting from the difference vis-à-vis the background around its boundary or, in a brief presentation, at onset and offset [35, Fig. 7.10, p. 122]. The mean $\mu (z)$ is itself set in a background noise of power ${L}_{0}(1-1/\pi )/2$ (this is the variance of the output from half-wave rectification), and it is convenient to define a normalized increment, output from this first stage of transmission,

The second stage of analysis in Fig. 9 repeats the first to deliver a normalized quantity $h(h(z))$ for assessment in a signal-detection model. (This is the same function as in Laming [35, p. 150], although the derivation is slightly different.) The function plotted in Fig. 7 is

It is initially fourth power for very small contrasts (because $h(z)$ is square law) and transits to linear for supra-threshold contrasts (because $h(z)$ likewise tends to linearity). The detectability function in Fig. 1 is the normal integral of this transform,

When a contrast $C+\mathrm{\Delta}C$ has to be distinguished from contrast $C$, the signal-detection model compares two quantities, $h(h(aC\surd {L}_{0}))$ and $h(h(a(C+\mathrm{\Delta}C)\surd {L}_{0}))$, each superimposed on a background of mean

For supra-threshold contrasts

which is exactly proportional to $z$. Assuming that Weber’s law applies exactly to the transformed output (A11), $\mathrm{\Delta}C$ is determined by the relation

where $\mathrm{\Theta}$ is the Weber fraction (a reason has been suggested above why Weber’s law does not hold for contrast). When, however, the contrast $C$ is sub-detection-threshold, Eq. (A11) has to be replaced with

This is the model function in Fig. 6.
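The limiting behaviour claimed in this appendix can be checked numerically. The Python sketch below is illustrative only. It evaluates the bracketed term $\{x(\mathrm{\Phi}(x)-1/2)+({\mathrm{\Phi}}^{\prime}(x)-{\mathrm{\Phi}}^{\prime}(0))\}$ with $x=z\,\mathrm{cos}\,2\pi gu$, averages it over one cycle, and estimates the log-log slope of a two-stage transform. The function names and the normalizing constant `SIGMA` (the square root of the stated noise power with ${L}_{0}$ set to 1) are assumptions made for this sketch, not definitions taken from the paper.

```python
import math

PHI0 = 1.0 / math.sqrt(2.0 * math.pi)  # normal density at zero, Phi'(0)


def rectified_increment(x):
    """Bracketed term from the text: x(Phi(x) - 1/2) + (Phi'(x) - Phi'(0))."""
    cdf_minus_half = 0.5 * math.erf(x / math.sqrt(2.0))
    density_diff = PHI0 * math.expm1(-0.5 * x * x)  # stable for small x
    return x * cdf_minus_half + density_diff


def space_average(z, n=20000):
    """Midpoint-rule average of the bracketed term over one cycle of the cosine."""
    total = 0.0
    for k in range(n):
        phase = 2.0 * math.pi * (k + 0.5) / n
        total += rectified_increment(z * math.cos(phase))
    return total / n


def mu_increment(z):
    """mu(z) - mu(0), taking the average as one half of the peak (as in the text)."""
    return 0.5 * rectified_increment(z)


# ASSUMPTION: normalize by the s.d. of the half-wave-rectified noise, L0 = 1.
SIGMA = math.sqrt((1.0 - 1.0 / math.pi) / 2.0)


def h(z):
    """Hypothetical normalized increment; the paper's own definition may differ."""
    return mu_increment(z) / SIGMA


def log_log_slope(f, z, step=1.01):
    """Local exponent of f at z, estimated from a small multiplicative step."""
    return math.log(f(z * step) / f(z)) / math.log(step)


ratio_small = space_average(0.01) / rectified_increment(0.01)
ratio_large = space_average(1000.0) / rectified_increment(1000.0)
slope_small = log_log_slope(lambda z: h(h(z)), 0.01)
slope_large = log_log_slope(lambda z: h(h(z)), 1.0e4)

print("average/peak at small z:", ratio_small)
print("average/peak at large z:", ratio_large)
print("exponent of h(h(z)) at small z:", slope_small)
print("exponent of h(h(z)) at large z:", slope_large)
```

Under these assumptions the ratio of space average to peak should come out close to $1/2$ at small $z$ and close to $2/\pi$ at large $z$, and the local exponent of the two-stage transform should be close to 4 and to 1, respectively. The two limiting exponents do not depend on the value chosen for `SIGMA`.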

## ACKNOWLEDGMENTS

I thank John Robson for providing the raw data in Fig. 1, Jacob Nachmias for the data in Fig. 4, and John Foley for the data in Fig. 7. I also thank Bruce Henning and John Mollon for their comments on earlier drafts of this paper.

## REFERENCES AND NOTES

**1.** B. Leshowitz, H. B. Taub, and D. H. Raab, “Visual detection of signals in the presence of continuous and pulsed backgrounds,” Percept. Psychophys. **4**, 207–213 (1968).

**2.** D. Laming, “Fechner’s law: where does the log transform come from?” Seeing Perceiving **23**, 155–171 (2010).

**3.** D. Koenig and H. Hofer, “The absolute threshold of cone vision,” J. Vis. **11**(1):21, 1–24 (2011).

**4.** M. B. Sachs, J. Nachmias, and J. G. Robson, “Spatial-frequency channels in human vision,” J. Opt. Soc. Am. **61**, 1176–1186 (1971).

**5.** The argument by Sachs *et al.* ([4]; see their Fig. 4) looks compelling, except that they have assumed the psychometric function for detection of a grating to be normal with respect to contrast. They could alternatively have accommodated their experimental results by supposing the psychometric function to be normal with respect to some power of contrast (as in Fig. 1 here), with detectability depending on the summation of that power over all grating components. There would then have been no need for probability summation, nor for a fixed threshold. The evidence, as at that time, pointing to a power-law transform of small near-threshold stimuli had already been summarized by Nachmias and Kocher [6]. This article shows that the idea that Sachs *et al.* did not explore, that is, of detectability depending on the summation of power-law transforms over all grating components, provides a more comprehensive account of visual sensitivity.

**6.** J. Nachmias and E. C. Kocher, “Visual detection and discrimination of luminance increments,” J. Opt. Soc. Am. **60**, 382–389 (1970).

**7.** M. H. Pirenne, “Binocular and uniocular threshold of vision,” Nature **152**, 698–699 (1943).

**8.** L. Matin, “Binocular summation at the absolute threshold of peripheral vision,” J. Opt. Soc. Am. **52**, 1276–1286 (1962).

**9.** G. Collier, “Probability of response and interocular association as function of monocular and binocular stimulation,” J. Exp. Psychol. **47**, 75–83 (1954).

**10.** R. F. Quick, “A vector-magnitude model of contrast detection,” Kybernetik **16**, 65–67 (1974).

**11.** N. v. S. Graham, “Spatial-frequency channels in human vision: detecting edges without edge detectors,” in *Visual Coding and Adaptability*, C. S. Harris, ed. (Erlbaum, 1980), pp. 215–262.

**12.** A. B. Watson, “Probability summation over time,” Vis. Res. **19**, 515–522 (1979).

**13.** C. I. Howarth and M. G. Bulmer, “Non-random sequences in visual threshold experiments,” Q. J. Exp. Psychol. **8**, 163–171 (1956).

**14.** T. A. Tanner, J. A. Rauk, and R. C. Atkinson, “Signal recognition as influenced by information feedback,” J. Math. Psychol. **7**, 259–274 (1970).

**15.** D. Laming, *Human Judgment: The Eye of the Beholder* (Thomson Learning, 2004), p. 179.

**16.** N. Stewart, G. D. A. Brown, and N. Chater, “Absolute identification by relative judgment,” Psychol. Rev. **112**, 881–911 (2005).

**17.** G. T. Fechner, *Elemente der Psychophysik* (Breitkopf and Härtel, 1860).

**18.** D. Laming, *Mathematical Psychology* (Academic, 1973).

**19.** J. A. Swets, W. P. Tanner, and T. G. Birdsall, “Decision processes in perception,” Psychol. Rev. **68**, 301–340 (1961).

**20.** J. Nachmias, “On the psychometric function for contrast detection,” Vis. Res. **21**, 215–223 (1981).

**21.** J. Nachmias and R. M. Steinman, “Brightness and discriminability of light flashes,” Vis. Res. **5**, 545–557 (1965).

**22.** W. W. Peterson, T. G. Birdsall, and W. C. Fox, “The theory of signal detectability,” IEEE Trans. Inf. Theory **PGIT-4**, 171–212 (1954). This paper is the ultimate source of signal-detection theory. The authors conducted their research at the University of Michigan, where Tanner and Swets learnt about it well in advance of their own publication. Tanner and Swets (1954) list a precursor of the Peterson, Birdsall, and Fox (1954) paper among their references.

**23.** J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of statistical hypotheses,” Philos. Trans. R. Soc. Lond. A **231**, 289–337 (1933).

**24.** D. M. Green and J. A. Swets, *Signal Detection Theory and Psychophysics* (Wiley, 1966).

**25.** S. O. Rice, “Mathematical analysis of random noise,” Bell Syst. Tech. J. **23**, 282–332 (1944).

**26.** S. O. Rice, “Mathematical analysis of random noise,” Bell Syst. Tech. J. **24**, 46–156 (1945).

**27.** S. Kullback, *Information Theory and Statistics* (Wiley, 1959).

**28.** D. Marr, *Vision* (Freeman, 1982).

**29.** S. M. Pfafflin and M. V. Mathews, “Energy-detection model for monaural auditory detection,” J. Acoust. Soc. Am. **34**, 1842–1853 (1962).

**30.** J. Nachmias and R. V. Sansbury, “Grating contrast: discrimination may be better than detection,” Vis. Res. **14**, 1039–1042 (1974).

**31.** G. B. Wetherill and H. Levitt, “Sequential estimation of points on a psychometric function,” Brit. J. Math. Statist. Psychol. **18**, 1–9 (1965).

**32.** J. M. Foley and G. E. Legge, “Contrast detection and near-threshold discrimination in human vision,” Vis. Res. **21**, 1041–1053 (1981).

**33.** A. B. Watson and A. J. Ahumada Jr., “A standard model for foveal detection of spatial contrast,” J. Vis. **5**(9):6, 1–23 (2005).

**34.** J. J. Kulikowski, “Effective contrast constancy and linearity of contrast sensation,” Vis. Res. **16**, 1419–1431 (1976).

**35.** D. Laming, *Sensory Analysis* (Academic, 1986).

**36.** C. M. Bird, G. B. Henning, and F. A. Wichmann, “Contrast discrimination with sinusoidal gratings of different spatial frequency,” J. Opt. Soc. Am. A **19**, 1267–1273 (2002).

**37.** G. B. Henning and F. A. Wichmann, “Some observations on the pedestal effect,” J. Vis. **7**(1):3, 1–15 (2007).

**38.** H. E. Smithson, G. B. Henning, D. I. A. MacLeod, and A. Stockman, “The effect of notched noise on flicker detection and discrimination,” J. Vis. **9**(5):21, 1–18 (2009).

**39.** F. W. Campbell and D. G. Green, “Monocular versus binocular visual acuity,” Nature **208**, 191–192 (1965).

**40.** G. E. Legge, “Binocular contrast summation—II. Quadratic summation,” Vis. Res. **24**, 385–394 (1984).

**41.** G. E. Legge, “Spatial frequency masking in human vision: binocular interactions,” J. Opt. Soc. Am. **69**, 838–847 (1979).

**42.** D. Laming, “Spatial frequency channels,” in *Vision and Visual Dysfunction, Vol 5: Limits of Visual Perception*, J. J. Kulikowski, V. Walsh, and I. J. Murray, eds. (Macmillan, 1991), pp. 97–105.

**43.** G. E. Legge, “A power law for contrast discrimination,” Vis. Res. **21**, 457–467 (1981).

**44.** G. J. Burton, “Contrast discrimination by the human visual system,” Biol. Cybern. **40**, 27–38 (1981).

**45.** F. W. Campbell and J. J. Kulikowski, “Orientational selectivity of the human visual system,” J. Physiol. **187**, 437–445 (1966).

**46.** J. J. Kulikowski, “Limiting conditions of visual perception,” Prace Inst. Automat. PAN (Warsaw) **77**, 1–133 (1969). (English translation)

**47.** H. D. Speed and J. Ross, “Spatial frequency tuning of facilitation by masks,” Vis. Res. **32**, 1143–1148 (1992).

**48.** J. Yang and W. Makous, “Modeling pedestal experiments with amplitude instead of contrast,” Vis. Res. **35**, 1979–1989 (1995).

**49.** H. Lotze, *Metaphysik; drei Bücher der Ontologie, Kosmologie und Psychologie* (Hirzel, 1879), translated B. Bosanquet (Clarendon, 1884), p. 455.

**50.** C. Enroth-Cugell and J. G. Robson, “The contrast sensitivity of retinal ganglion cells of the cat,” J. Physiol. **187**, 517–552 (1966).

**51.** D. H. Kelly, “Flickering patterns and lateral inhibition,” J. Opt. Soc. Am. **59**, 1361–1370 (1969).

**52.** D. H. Hubel and T. N. Wiesel, “Functional architecture of macaque monkey visual cortex,” Proc. R. Soc. Lond. B **198**, 1–59 (1977).

**53.** E. R. Howell and R. F. Hess, “The functional area for summation to threshold for sinusoidal gratings,” Vis. Res. **18**, 369–374 (1978).

**54.** Detection of a grating in a dark surround under continuous inspection needs further comment. The only cue to detection of sinusoidal modulation in such a case is the variation in input to individual units as the eye moves laterally with respect to the bars of the grating. Temporal modulations generate a square-law perturbation [27, Fig. 7.4, p. 112], in this case over the face of the grating. Moreover, temporal sensitivity is maintained in the peripheral retina, so that the extreme extent of the grating in Fig. 11 does not matter. Ordinarily, comparison with a matched surround provides the more sensitive cue. However, thresholds for ($0.1\,\mathrm{c}/\mathrm{deg}$) gratings with and without a surround both reach a lower limit at a grating height of $3.2\lambda $, whereafter they do not noticeably differ. This suggests that the same limit to summation applies to both (as one should expect) and, at that limit, spatio-temporal modulations across the boundary with a matched surround are no more informative than modulations within the grating field.

**55.** J. Rovamo, O. Luntinen, and R. Naesaenen, “Modelling the dependence of contrast sensitivity on grating area and spatial frequency,” Vis. Res. **33**, 2773–2788 (1993).

**56.** J. Rovamo, J. Mustonen, and R. Naesaenen, “Modelling contrast sensitivity as a function of retinal illuminance and grating area,” Vis. Res. **34**, 1301–1314 (1994).

**57.** N. v. S. Graham, *Visual Pattern Analyzers* (Oxford, 1989).

**58.** J. M. Foley, “Human luminance pattern-vision mechanisms: masking experiments require a new model,” J. Opt. Soc. Am. A **11**, 1710–1719 (1994).

**59.** R. L. T. Goris, F. A. Wichmann, and G. B. Henning, “A neurophysiologically plausible population-code model for human contrast discrimination,” J. Vis. **9**(7):15, 1–22 (2009).

**60.** L. Itti, C. Koch, and J. Braun, “Revisiting spatial vision: toward a unifying model,” J. Opt. Soc. Am. A **17**, 1899–1917 (2000).

**61.** D. G. Pelli, “Uncertainty explains many aspects of visual contrast detection and discrimination,” J. Opt. Soc. Am. A **2**, 1508–1532 (1985).

**62.** J. M. Keynes, *The General Theory of Employment, Interest and Money* (Macmillan, 1936).