In a magnitude estimation experiment, twenty observers rated the brightness of several unrelated, self-luminous stimuli surrounded by a dark background. The ability of a number of existing vision models, color appearance models and models based on the concept of equivalent luminance to predict brightness has been investigated. Because of a severe underestimation of the Helmholtz-Kohlrausch effect, none of the models performed acceptably. Increasing the weight of the colorfulness contribution to the brightness attribute in the CAM97u model results in a very good correlation between the model predictions and the visually perceived brightness. Finally, the experimental results and the brightness predictions of the modified model, CAM97u,m, are verified through a matching experiment and a validation magnitude estimation experiment.
© 2014 Optical Society of America
Unrelated colors are colors perceived to belong to areas seen in isolation from any other colors [1, 2]. A self-luminous stimulus surrounded by a dark background, such as a traffic or marine signal light viewed on a dark night, is a typical example of an unrelated color. Several vision models have been developed to predict the brightness of this kind of stimulus:
- CAM97u, a color appearance model (abbreviated as CAM hereafter) for unrelated colors developed by Hunt [3], calculates the perceptual attributes brightness, hue, colorfulness and saturation under varying conditions. It does so by taking into account some of the physiological processes that occur in the human visual system.
- ATD01, a color vision model developed by Guth [4], predicts the brightness, saturation and hue of unrelated colors by transforming the XYZ tristimulus values of a stimulus into LMS cone responses. These cone responses are gain-controlled. They then undergo a second transformation that yields one achromatic (A) and two chromatic (T and D) signals, from which the perceptual attributes are calculated.
- LEq,Nay(VCC), a model developed by Nayatani [8], is based on the concept of equivalent luminance. Equivalent luminance is defined as the photopic luminance of a reference stimulus that matches the test stimulus in brightness. The model uses Nayatani's Variable-Chromatic-Color (VCC) method, in which the luminance of the colored stimulus is changed to match the achromatic reference.
- LEq,Nay(VAC), a similar model by Nayatani [8], in which the luminance of the achromatic reference is instead changed to match the colored stimulus (Variable-Achromatic-Color or VAC method).
- LEq,CIE, a model developed by the CIE [9], is also based on the concept of equivalent luminance and can be regarded as an internationally agreed supplementary system of photometry.
The Helmholtz-Kohlrausch effect (abbreviated as the H-K effect hereafter) describes the observation that highly saturated colors appear brighter than colors of low saturation, even when they are equal in luminance [8–12]. In a previous study [13], the six vision models described above were evaluated with respect to their ability to account for the H-K effect. However, none of the models was fully able to predict the perceived brightness of unrelated stimuli. The failure of CAM97u was especially notable, since it was specifically designed to deal with unrelated colors.
Note that the models based on the concept of equivalent luminance apply, in principle, only to related colors. However, because they take the H-K effect into account, they have been included in this study.
2. Brightness prediction in CAM97u
The input parameters of CAM97u are the XYZ tristimulus values of the stimulus and of the conditioning field. They also include the photopic and scotopic luminances of the stimulus, the adapting field and the conditioning field. As no conditioning field is present in our experimental setup, the equi-energy stimulus SE is used as the conditioning field, and its luminance is taken to be equal to that of the adapting field.
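According to CAM97u, the brightness is calculated using Eq. (1):

$$Q_{CAM97u} = A_{CAM97u} + w_M\,M_{CAM97u}, \qquad w_M = 0.01 \qquad (1)$$

Here ACAM97u is the achromatic signal, MCAM97u is the colorfulness and wM is the colorfulness weighting factor [3]. The inclusion of MCAM97u in the expression for brightness represents the H-K effect.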
A previous study [13] used unrelated, self-luminous, colored stimuli with a constant luminance of 51 cd/m2. In that study, a strong dependence of the observed brightness on the colorfulness MCAM97u suggested that increasing the colorfulness weighting factor wM in Eq. (1) might lead to a much better brightness prediction. However, too few stimuli were considered to determine an improved weighting factor. The present study describes a magnitude estimation experiment with two kinds of stimuli. The colored stimuli all have the same 10° luminance, L10 = 6.23 cd/m2, and the achromatic stimuli have luminances between 7.54 cd/m2 and 47.60 cd/m2. The results have been used to modify the brightness prediction of CAM97u by increasing the colorfulness weighting factor. To verify the modified model, a matching experiment and a validation magnitude estimation experiment have been performed. Both use random stimuli covering a wide range of chromaticity coordinates and luminance values.
3. Experimental setup
In this study, a viewing room 3 m wide, 5 m long and 3.5 m high was used, with black walls, a gray ceiling and a grayish-black floor carpet (see Fig. 1 (left)). A circular stimulus with a diameter of 37 cm was presented to the observers at a distance of 211 cm, subtending a field of view (FOV) of approximately 10°. Its color and luminance were varied by controlling the intensities of red, green and blue LEDs. The stimulus was surrounded by a dark background (see Fig. 1 (right)). All colorimetric and photometric quantities were calculated for the CIE 10° observer from spectral measurements made with a calibrated spectroradiometer (MS260i Oriel Instruments spectrograph). More details about this setup can be found in [13].
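The stated field of view can be checked directly. A disc of diameter d viewed on its axis from a distance D subtends a full angle of 2·arctan(d / 2D). A minimal sketch follows; the function name is illustrative only.

```python
import math


def field_of_view_deg(diameter_cm: float, distance_cm: float) -> float:
    """Full angle (in degrees) subtended by a disc viewed on its axis."""
    half_angle = math.atan((diameter_cm / 2.0) / distance_cm)
    return math.degrees(2.0 * half_angle)


if __name__ == "__main__":
    # Geometry from the setup description: a 37 cm disc viewed at 211 cm.
    print(f"FOV = {field_of_view_deg(37.0, 211.0):.2f} deg")
```

With these values the angle comes out at just over 10°, consistent with the quoted FOV.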
The aim is to improve the brightness prediction of CAM97u given by Eq. (1), in particular so that it accounts correctly for the H-K effect. For this purpose, colored stimuli that all have approximately the same value of ACAM97u in Eq. (1) are preferred, so that differences in their perceived brightness can be attributed to the colorfulness term. Therefore, a set of 58 colored stimuli with a FOV of 10° and an equal luminance of 6.23 cd/m2 (standard deviation 0.11 cd/m2) was selected. Their CIE 1976 u’10,v’10 chromaticity coordinates are shown in Fig. 2 (left). A set of 17 achromatic stimuli was also selected (see Fig. 2 (right)). These have luminances between 7.54 cd/m2 and 47.60 cd/m2 and chromaticities close to that of illuminant D65 (u’10,v’10 = 0.1979, 0.4695; mean ∆Eu’v’ = 0.005). The achromatic stimuli all have approximately the same colorfulness. They are needed to obtain a single CAM97u brightness scale that applies to both chromatic and achromatic stimuli.
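The closeness of the achromatic stimuli to D65 is expressed as ∆Eu’v’, the Euclidean distance in the CIE 1976 u'v' diagram. The sketch below shows that computation under stated assumptions:

- It converts CIE x, y chromaticity coordinates to u', v' with the standard CIE 1976 relations.
- It then measures distances from a reference chromaticity.
- The D65 10° chromaticity x10 = 0.31382, y10 = 0.33100 is the standard CIE value. It is used here only as a cross-check of the u’10,v’10 quoted above.
- The function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

UV = Tuple[float, float]


def xy_to_uv_prime(x: float, y: float) -> UV:
    """CIE 1976 UCS chromaticity (u', v') from CIE (x, y) chromaticity."""
    denom = -2.0 * x + 12.0 * y + 3.0
    return 4.0 * x / denom, 9.0 * y / denom


def delta_uv(a: UV, b: UV) -> float:
    """Euclidean distance between two points in the u'v' diagram."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def mean_delta_uv(points: Iterable[UV], ref: UV) -> float:
    """Mean distance of a set of chromaticities from a reference chromaticity."""
    dists = [delta_uv(p, ref) for p in points]
    return sum(dists) / len(dists)


# D65 for the CIE 1964 10-degree observer (standard x10, y10 values).
D65_UV10 = xy_to_uv_prime(0.31382, 0.33100)
print(f"D65 u'10 = {D65_UV10[0]:.4f}, v'10 = {D65_UV10[1]:.4f}")
```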
4. Visual tests
In a psychophysical experiment, the magnitude estimation method was used. Observers were asked to rate test stimuli relative to an achromatic reference stimulus, shown in temporal juxtaposition, to which a brightness value of 50 was attributed. The luminance of the reference stimulus, 6.38 cd/m2, was approximately equal to that of the colored stimuli, and its chromaticity was close to that of illuminant D65 (∆Eu’v’ = 0.006). The experiment started with the reference stimulus. After 5 seconds, a test stimulus was shown for 15 seconds. Immediately after the display switched back to the reference, again for 5 seconds, the observers were asked to rate the brightness of the test stimulus relative to the reference. Before the experiment, the observers adapted to the dark viewing conditions for at least 5 minutes.
The following instructions were given to each observer:
- You will see 90 test stimuli. First, a reference stimulus will be shown for 5 seconds. Each test stimulus is then presented for 15 seconds. Between successive test stimuli, the reference stimulus will be shown again for 5 seconds. Rate the brightness of the test stimulus immediately after it disappears, in comparison with the reference. The reference has a brightness value of 50. A value of zero represents a dark stimulus without any brightness. There is no upper limit to the value of brightness: a value of 100 represents a stimulus appearing twice as bright as the reference, a value of 25 is given to a stimulus appearing half as bright, etc.
In this magnitude estimation experiment, 90 test stimuli were presented: 5 ‘warm-up’ stimuli, the 75 stimuli described above (58 colored and 17 achromatic) and, finally, 10 repeated stimuli used to assess observer variability. The stimuli were arranged in two different random sequences, each evaluated by half of the observers, to avoid possible biases due to presentation order [14].
Twenty observers (10 male, 10 female), aged between 20 and 31 years (average 25), participated in the psychophysical experiment. All had normal color vision according to the Ishihara 24-plate test for color blindness. Six observers had already participated in previous experiments, while the others were naïve with respect to the purpose of the experiment. The naïve observers completed a 45-minute training session to become familiar with the magnitude estimation method. They did a straightforward exercise in which they rated the length of a line in comparison with a line of length 100. This exercise is similar to a method described in the ASTM International standard test method for unipolar magnitude estimation of sensory attributes [15]. In addition, they were shown a set of training stimuli with the same hues and luminances as those used in the experiment. This made them aware of the brightness range and familiar with the brightness rating technique. The six experienced observers started the magnitude estimation experiment with a set of 25 training stimuli. After a short break, the actual experiment started; it lasted approximately 30 minutes.
5.1 Observer variability
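The agreement between any two sets of data can be analyzed using the coefficient of variation (CV), Eq. (2) [16]:

$$\mathrm{CV} = \frac{100}{\bar{x}}\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - f\,y_i\right)^2}, \qquad f = \frac{\sum_{i=1}^{N} x_i\,y_i}{\sum_{i=1}^{N} y_i^2} \qquad (2)$$

Here xi and yi are the two data sets being compared, x̄ is the mean of the xi, N is the number of stimuli and f is a scale factor that brings both data sets to the same scale. For perfect agreement between two data sets, the CV equals zero. The inter-observer agreement was assessed by calculating the CV between each individual observer's results and the geometric mean of all the observers.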
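The CV computation can be sketched as follows. It assumes the form used in the color-difference literature [16]:

- CV = 100 · sqrt(Σ(xi − f·yi)² / N) / x̄.
- The scale factor defaults to the least-squares value f = Σxi·yi / Σyi², but can be fixed (for example f = 1 when both data sets share a scale).

Which data set plays the role of x (for example, an individual observer) and which plays y (for example, the geometric mean or a model prediction) is an assumption of this sketch. The ratings in the usage line are hypothetical.

```python
import math
from typing import Optional, Sequence


def coefficient_of_variation(
    x: Sequence[float], y: Sequence[float], f: Optional[float] = None
) -> float:
    """CV (in %) between two data sets x and y.

    CV = 100 * sqrt(sum((x_i - f*y_i)**2) / N) / mean(x)

    If f is None, the least-squares scale factor f = sum(x*y) / sum(y*y)
    is used; pass f=1.0 when both data sets already share the same scale.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    if f is None:
        f = sum(a * b for a, b in zip(x, y)) / sum(b * b for b in y)
    n = len(x)
    rms = math.sqrt(sum((a - f * b) ** 2 for a, b in zip(x, y)) / n)
    return 100.0 * rms / (sum(x) / n)


# Hypothetical ratings: one observer (x) versus the observer geometric mean (y).
observer = [50.0, 80.0, 35.0, 120.0]
geom_mean = [48.0, 85.0, 33.0, 110.0]
print(f"CV = {coefficient_of_variation(observer, geom_mean):.1f} %")
```

Fitting f removes any overall scale difference between the two data sets, so only differences in shape contribute to the CV.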
In addition to the agreement between observers, both short-term and long-term intra-observer agreement were assessed. For the short-term agreement, ten randomly selected stimuli were rated by each observer a second time at the end of the experiment. This was quantified by calculating the CV between each observer's ratings of these ten stimuli during the test and at the end of the test. The observers were not aware that these ten stimuli were presented twice. For the long-term agreement, three observers (2 male, 1 female) repeated the experiment three months later. This was quantified by calculating the CV between each observer's results in both sessions. These observers were not told that the experiment was identical.
The inter- and intra-observer agreement results are summarized in Table 1 in terms of CV values. The mean CV values for inter-observer, short-term intra-observer and long-term intra-observer agreement are 13%, 11% and 8%, respectively. The CV values are fairly low for all observers, indicating good agreement.
These CV values are much lower than the values of 29% reported by Fu et al. [5] and 40% reported by Koo and Kwak [17]. They are similar to the value of 11% reported by Withouck et al. [13]. All three studies scaled the brightness of unrelated colors under conditions similar to those used in this study. The values are also similar to the inter-observer value of 13% reported by Luo et al. [18] for the lightness of related colors. The short-term intra-observer CV of 11% could only be compared with the 15% ‘repeatability’ obtained by Fu et al. [5], as none of the other studies reported intra-observer agreement.
5.2 Brightness perception
As the same group of observers rated the brightness of every stimulus, a one-way repeated-measures ANOVA was performed on the ratings of all colored stimuli. The analysis showed that, although all colored stimuli have the same luminance, their perceived brightness differed significantly, F(1.824, 34.659) = 14.801, p < 0.001. In Fig. 3, Qgeom (“average observer”), the geometric mean of the observed brightness Qobs over all observers, is plotted against the saturation su’v’,10 of each stimulus. The saturation is calculated using Eq. (3):

$$s_{u'v',10} = 13\sqrt{\left(u'_{10} - u'_{n,10}\right)^2 + \left(v'_{10} - v'_{n,10}\right)^2} \qquad (3)$$

Here u’n,10 and v’n,10 are the chromaticity coordinates of the reference white.
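The two quantities plotted in Fig. 3 can be sketched as follows:

- The geometric mean is the usual way to average magnitude estimates, because the ratings are ratio judgments.
- The saturation uses the CIE 1976 u'v' definition, 13 times the distance from the white point.

The white point (u_n, v_n) is left as a parameter because this sketch does not assume which white was used. Zero ratings cannot enter a geometric mean. How the study handled them is not stated, so the sketch simply rejects them. The ratings in the usage line are hypothetical.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of strictly positive magnitude estimates.

    A rating of 0 ("no brightness") has no logarithm; this sketch rejects it
    rather than guessing how such ratings were treated.
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive ratings")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def saturation_uv(u: float, v: float, u_n: float, v_n: float) -> float:
    """CIE 1976 u'v' saturation: 13 times the distance from the white point."""
    return 13.0 * math.hypot(u - u_n, v - v_n)


# Hypothetical ratings of one stimulus by three observers (reference = 50).
print(geometric_mean([25.0, 50.0, 100.0]))
```

Ratings of 25, 50 and 100 (half, equal and twice the reference) average to the reference value. An arithmetic mean would give about 58.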
Figure 3 and Table 2 suggest that, for each hue series (all at the same luminance), the perceived brightness increases with saturation, illustrating the Helmholtz-Kohlrausch effect. In fact, the slopes of most hue series appear to coincide, suggesting that the effect of saturation on brightness is the same for all hue series except red and blue. A customized ANCOVA confirms this, with Qgeom as dependent variable, hue (11 levels) as fixed factor and su’v’,10 as covariate. It showed that the interaction between hue and su’v’,10 is significant, F(10,1058) = 2.155, p < 0.05. The same analysis with only 9 hues (excluding red and blue) shows no significant interaction, F(8,842) = 0.485, p = 0.867. This indicates that the regression slopes are homogeneous for all colors except red and blue. Some studies have reported that the H-K effect is different, or even absent, for yellow. In contrast, this extensive study suggests that the H-K effect, which is clearly visible, differs only for red and blue.
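The homogeneity-of-slopes test behind this ANCOVA compares two nested regression models:

- A full model, with a separate intercept and slope per hue.
- A reduced model, with separate intercepts but one common slope.

The F statistic for the hue × saturation interaction follows from their residual sums of squares. Below is a from-scratch sketch of that standard test. It is not the SPSS procedure used in the study. The function name and toy data are hypothetical. A p-value could be obtained from the returned statistic with, for example, `scipy.stats.f.sf(F, df_num, df_den)`.

```python
from collections import defaultdict
from typing import Hashable, Sequence, Tuple


def slope_homogeneity_f(
    groups: Sequence[Hashable], x: Sequence[float], y: Sequence[float]
) -> Tuple[float, int, int]:
    """F test for the group x covariate interaction (homogeneity of slopes).

    Full model: separate intercept and slope per group.
    Reduced model: separate intercepts, one pooled within-group slope.
    Returns (F, df_num, df_den).
    """
    data = defaultdict(list)
    for g, xi, yi in zip(groups, x, y):
        data[g].append((xi, yi))
    k, n = len(data), len(x)
    rss_full = sxx_pooled = sxy_pooled = 0.0
    centred = []  # (x - group mean x, y - group mean y) pairs
    for pts in data.values():
        mx = sum(p[0] for p in pts) / len(pts)
        my = sum(p[1] for p in pts) / len(pts)
        sxx = sum((p[0] - mx) ** 2 for p in pts)
        if sxx == 0:
            raise ValueError("each group needs at least two distinct x values")
        sxy = sum((p[0] - mx) * (p[1] - my) for p in pts)
        b_group = sxy / sxx  # this group's own least-squares slope
        rss_full += sum((p[1] - my - b_group * (p[0] - mx)) ** 2 for p in pts)
        sxx_pooled += sxx
        sxy_pooled += sxy
        centred.extend((p[0] - mx, p[1] - my) for p in pts)
    b_common = sxy_pooled / sxx_pooled
    rss_reduced = sum((dy - b_common * dx) ** 2 for dx, dy in centred)
    df_num, df_den = k - 1, n - 2 * k
    if df_den <= 0:
        raise ValueError("not enough observations for the full model")
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den
```

In a toy case where two groups share a slope, F is close to zero. If one group's slope is doubled, F becomes large.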
Remarkably, four of the twenty observers rated the red stimuli as less bright than the reference stimulus, even though they obtained good results in the Farnsworth-Munsell 100 Hue Test. These observers rated the other colors as brighter, and their CV values were normal. Nevertheless, all four indicated, independently of each other, that they had trouble rating the brightness of the red stimuli. However, after the brightness values were converted to z-scores using SPSS [19, 20], none of these observers' ratings of the red stimuli were identified as outliers. They were therefore not removed from the analysis.
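A z-score outlier screen of this kind can be sketched as follows. It makes these assumptions:

- Each rating of a stimulus is standardized with the sample mean and sample standard deviation.
- Values with |z| above a cut-off are flagged. The default of 3.29 is an assumption borrowed from common textbook practice, not a value reported in this study.

The function names and ratings are hypothetical.

```python
import statistics
from typing import List, Sequence


def z_scores(values: Sequence[float]) -> List[float]:
    """Standardize values using the sample mean and sample SD (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # identical ratings: nothing stands out
    return [(v - mean) / sd for v in values]


def flag_outliers(values: Sequence[float], threshold: float = 3.29) -> List[int]:
    """Indices whose |z| exceeds the (assumed) cut-off."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


# Hypothetical ratings of one red stimulus by 20 observers: four rate it low.
ratings = [60.0] * 16 + [40.0, 38.0, 42.0, 35.0]
print(flag_outliers(ratings))
```

In this toy example, the four lower ratings pull the mean and standard deviation towards themselves, so none of them exceeds the cut-off. This mirrors the situation described above.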
5.3 Model performances
The brightness predictions of the six vision models described before were compared with the ‘average observer’ brightness of the stimuli. In Fig. 4, the ‘average observer’ brightness, Qgeom, is plotted against the predicted brightness for each of these models. To assess how much of the variability in brightness perception each model explains, the coefficient of determination (R2) of the linear regression between the observed and predicted brightness was determined. An R2 close to 1 indicates a good prediction by the model. The relation between the observed and the predicted brightness is expected to be linear. Nevertheless, the Spearman rank correlation coefficient was also calculated. It ranges from −1 (perfect negative correlation) to +1 (perfect positive correlation) and, as a rank-order metric, is insensitive to any nonlinearity in the relation between the observed and predicted values. Table 3 summarizes the statistical results for each model, obtained with all stimuli and with the achromatic stimuli only.
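The two agreement metrics can be sketched in plain Python:

- For a simple linear regression with an intercept, R2 equals the square of the Pearson correlation between observed and predicted values.
- Spearman's coefficient is the Pearson correlation of the ranks, with tied values given their average rank.

Whether the study's regressions included an intercept is an assumption here. `scipy.stats.spearmanr` is a library alternative for the rank correlation.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of the least-squares line y = a + b*x (equal to r^2)."""
    return pearson_r(x, y) ** 2


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))
```

A monotonic but curved relation, such as y = x², gives a Spearman coefficient of exactly 1 while R2 stays below 1. This is why the two metrics are reported side by side.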
In Table 3 and Fig. 4, the low Spearman correlation coefficients and the low coefficients of determination for all stimuli are striking. None of the models described performs satisfactorily, in accordance with the conclusion of previous experiments. However, if only the achromatic stimuli are considered (see Table 3, achromatic stimuli), all models perform very well. This indicates that the low overall correlation is due to a severe underestimation of the H-K effect. This is also clearly visible in Fig. 4, which shows different slopes for the colored and the achromatic stimuli.
5.4 Modified CAM97u
As suggested before, the brightness prediction of CAM97u, Eq. (1), could be improved by increasing the colorfulness weighting factor wM, so that the H-K effect is better taken into account. To determine an optimized colorfulness weighting factor, the ‘average observer’ brightness Qgeom was first rescaled to the original CAM97u (wM = 0.01) brightness scale. Only the data of the achromatic stimuli, for which the CAM97u model works well, were used for this rescaling. This ‘rescaled observer’ brightness, Qr, was obtained by multiplying Qgeom by the slope of the linear regression of QCAM97u on Qgeom for all 17 achromatic stimuli.
The colorfulness weighting factor wM was optimized by minimizing the mean squared residual error between Qr and the brightness values calculated according to Eq. (1). This changed wM from its original value of 0.01 to 0.268. Analogous to Eq. (1), the modified brightness model, QCAM97u,m, is given by

$$Q_{CAM97u,m} = A_{CAM97u} + 0.268\,M_{CAM97u} \qquad (4)$$
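The two-step fit can be sketched under these assumptions:

- Eq. (1) has the form Q = A + wM·M, so the fitted model is linear in wM.
- ACAM97u and MCAM97u are already available per stimulus from a CAM97u implementation, which is not reproduced here.
- The rescaling regression includes an intercept, and its value is discarded. The text does not say whether the regression passed through the origin.

Because the model is linear in wM, the mean squared residual is minimized in closed form by wM = Σ M·(Qr − A) / Σ M². A numerical minimizer would reach the same value. All function names are hypothetical.

```python
from typing import List, Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def rescale_to_model(
    q_geom_all: Sequence[float],
    q_geom_achromatic: Sequence[float],
    q_cam_achromatic: Sequence[float],
) -> List[float]:
    """Q_r: all Q_geom values times the slope of Q_CAM97u regressed on Q_geom
    (slope fitted on the achromatic stimuli only)."""
    k = ols_slope(q_geom_achromatic, q_cam_achromatic)
    return [k * q for q in q_geom_all]


def brightness(a: Sequence[float], m: Sequence[float], w_m: float) -> List[float]:
    """Assumed Eq. (1) form: Q = A + w_M * M, per stimulus."""
    return [ai + w_m * mi for ai, mi in zip(a, m)]


def optimal_w_m(
    q_r: Sequence[float], a: Sequence[float], m: Sequence[float]
) -> float:
    """w_M minimizing mean((Q_r - A - w_M*M)**2); closed form (linear in w_M)."""
    num = sum(mi * (qi - ai) for qi, ai, mi in zip(q_r, a, m))
    den = sum(mi * mi for mi in m)
    return num / den
```

As a check, synthetic data generated with Q = A + 0.268·M are fitted back to wM = 0.268.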
A plot of the ‘average observer’ brightness against QCAM97u,m for all stimuli (Fig. 5) shows that the modified model clearly outperforms the original models. This is confirmed by the high Spearman correlation coefficient (0.961) and the high coefficient of determination (0.914).
A color appearance model can be considered successful when its prediction error is smaller than the observer accuracy, expressed as inter-observer agreement. The CV values between the observed brightness ratings and the brightness predicted by the color vision models were calculated using Eq. (2). These models include the modified model, QCAM97u,m. For the six original models, the CV values range from 14% to 59%, all higher than the inter-observer CV of 13%. The modified brightness model, however, yields a CV of only 6%, which further confirms its excellent performance.
6.1 Matching experiment
The performance of the modified CAM97u,m model was verified by a subsequent matching experiment, performed by the same observers with the same experimental setup. The matching experiment started after a ten-minute break following the magnitude estimation experiment. In this experiment, which lasted about 25 minutes, observers adjusted the luminance of the achromatic reference stimulus until its brightness matched that of the colored stimulus. Only the most saturated red (4), blue (3), yellow (3) and green (3) stimuli were used as colored stimuli (see Fig. 2 (left)). The reference stimulus was shown in temporal juxtaposition with the colored stimuli. Its initial luminance was set either high or low to avoid an initial (il)luminance bias [21, 22]. Observers could switch back and forth between the reference and the colored stimulus as often as they wanted until a satisfactory match was found. As in the magnitude estimation experiment, the colored stimulus was always shown for 15 seconds. Two groups of 10 observers viewed the same sequence of colored stimuli, but with opposite initial reference luminances.
The 10° luminance of the reference was measured after each match. An “average matched reference luminance” was then obtained for each colored stimulus by taking the mean of all observer matches. A high initial reference luminance mostly resulted in a higher matched reference luminance than a low initial luminance did (see Table 4). This effect accounts for a luminance difference of 22% between the two experimental conditions. However, both conditions contributed an equal number of matches, so averaging the results neutralizes this type of bias.
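The neutralizing effect of averaging depends on the design being counterbalanced. This means equal numbers of matches from a high and a low starting luminance, with a bias of roughly equal size and opposite sign in the two groups. A minimal sketch with hypothetical luminance values follows. If the bias were asymmetric, averaging would cancel it only partially.

```python
from typing import Sequence


def counterbalanced_mean(
    high_start: Sequence[float], low_start: Sequence[float]
) -> float:
    """Mean matched luminance over two equally sized starting-point groups."""
    if len(high_start) != len(low_start) or not high_start:
        raise ValueError("both starting conditions need the same, non-zero number of matches")
    return (sum(high_start) + sum(low_start)) / (2 * len(high_start))


# Hypothetical: an unbiased match of 10 cd/m2, shifted by +/-1 cd/m2 by the start level.
print(counterbalanced_mean([11.0, 11.0], [9.0, 9.0]))
```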
Fig. 6 (left) plots the average matched reference luminance against the saturation su’v’,10 of the stimuli to be matched, clearly illustrating the H-K effect. Fig. 6 (right) plots the modified CAM97u,m brightness of the matched reference against that of the corresponding colored stimuli. The figure indicates that the modified CAM97u,m brightness is capable of predicting the outcome of the matching experiment. The CV value between both brightness predictions was calculated using Eq. (2) with a scale factor f equal to one, because both data sets have the same scale. The low value of 7% confirms the excellent performance of CAM97u,m.
6.2 Magnitude estimation experiment
A further magnitude estimation experiment to validate the modified CAM97u,m brightness prediction was set up with 107 stimuli: 15 achromatic stimuli, 40 colored stimuli and 52 ‘random’ stimuli. The luminance of the achromatic stimuli ranged from 5.94 cd/m2 to 297.47 cd/m2 (see Fig. 7 (left)), with chromaticities close to that of illuminant D65 (mean ∆Eu’v’ = 0.002). The 40 colored stimuli consisted of the four primary hues, each at a low and a high saturation and at five luminance levels (see Fig. 7 (left)). The 52 ‘random’ stimuli had luminances randomly distributed between 6.48 and 57.60 cd/m2 (see Fig. 7 (middle)). They covered the whole chromaticity gamut of the experimental setup (see Fig. 7 (right)).
The experimental method used in this validation experiment was identical to that of the first magnitude estimation experiment, except that the reference stimulus had an intermediate luminance of 43.10 cd/m2. Twenty observers participated; all except two had also taken part in the first magnitude estimation and matching experiments. The mean CV values for inter-observer, short-term intra-observer and long-term intra-observer agreement in this validation experiment are 18%, 12% and 15%, respectively. These are comparable to the values obtained in the first experiment.
The geometric mean of the observers' ratings, Qgeom (“average observer”), is plotted against QCAM97u,m in Fig. 8. Three results support the modified model:

- a coefficient of determination of 0.807;
- a Spearman correlation coefficient of 0.899;
- a CV value of 11%, which is below the inter-observer CV of 18%.

Together they show that the modified CAM97u,m model, Eq. (4), gives an excellent brightness prediction for colored and achromatic unrelated, self-luminous stimuli covering a wide color gamut and range of luminance levels.
A magnitude estimation experiment with twenty observers investigated the brightness perception of two sets of unrelated, self-luminous stimuli. The colored stimuli had a constant luminance of 6.23 cd/m2, and the achromatic stimuli had luminances ranging from 7.54 cd/m2 to 47.60 cd/m2. The Helmholtz-Kohlrausch effect was found to contribute significantly to the observed brightness. The brightness predictions of six existing vision models were investigated, but none of the models performed satisfactorily. Adapting the CAM97u model by increasing the colorfulness contribution to the brightness attribute resulted in a modified model, CAM97u,m, which gives a substantially better brightness prediction.
The performance of the new model was confirmed by both a matching experiment and an extensive validation magnitude estimation experiment using a random sequence of stimuli within a wide chromaticity range, including achromatic ones, and within a wide range of luminance. The modified model CAM97u,m clearly outperformed the other existing vision models and was found to give a reliable brightness prediction for unrelated, self-luminous stimuli.
The authors would like to thank the Research Council of the KU Leuven for supporting this research project (STIM - OT/11/056 and OT/13/069). Author K.S. would also like to thank the Research Foundation Flanders for the support through a postdoctoral fellowship.
References and links
1. M. D. Fairchild, Color Appearance Models, Second ed., Wiley-IS&T Series in Imaging Science and Technology (John Wiley & Sons Ltd, 2005).
2. R. W. G. Hunt and M. R. Pointer, Measuring colour, Fourth ed., Wiley-IS&T Series in Imaging Science and Technology (John Wiley & Sons Ltd, 2011).
3. R. W. G. Hunt, Measuring colour, Third ed. (Fountain Press, 1998), pp. 239–246.
4. S. L. Guth, “ATD01 model for color appearances, color differences and chromatic adaptation,” in 9th Congress of the International Colour Association, Proc. SPIE 4421 (2002).
5. C. Fu, C. Li, G. Cui, M. R. Luo, R. W. G. Hunt, and M. R. Pointer, “An investigation of colour appearance for unrelated colours under photopic and mesopic vision,” Color Res. Appl. 37(4), 238–254 (2012).
6. CIE, A colour appearance model for colour management systems: CIECAM02, (CIE Central Bureau, Austria, 2004).
7. N. Moroney, M. D. Fairchild, R. W. G. Hunt, C. Li, M. R. Luo, and T. Newman, “The CIECAM02 Color Appearance Model,” in 10th Color Imaging Conference, IS&T and SID, (Scottsdale, Arizona, 2002).
8. Y. Nayatani, “Simple estimation methods for the Helmholtz—Kohlrausch effect,” Color Res. Appl. 22(6), 385–401 (1997).
9. CIE, Supplementary System of Photometry, (CIE Central Bureau, 2011).
11. G. Wyszecki and W. S. Stiles, Color Science, Second ed. (Wiley, 1982), pp. 410.
12. CIE, International Lighting Vocabulary, (CIE Central Bureau, 2011).
13. M. Withouck, K. A. G. Smet, W. R. Ryckaert, M. R. Pointer, G. Deconinck, J. Koenderink, and P. Hanselaer, “Brightness perception of unrelated self-luminous colors,” J. Opt. Soc. Am. A 30(6), 1248–1255 (2013).
14. S. A. Fotios and C. Cheal, “A comparison of simultaneous and sequential brightness judgements,” Lighting Res. Tech. 42(2), 183–197 (2010).
15. ASTM International, Standard Test Method for Unipolar Magnitude Estimation of Sensory Attributes, (ASTM International, 2012).
16. P. A. García, R. Huertas, M. Melgosa, and G. Cui, “Measurement of the relationship between perceived and computed color differences,” J. Opt. Soc. Am. A 24(7), 1823–1829 (2007).
17. B. Koo and Y. Kwak, “Color appearance and color connotation models for unrelated colors,” Color Res. Appl., (2013), doi: 10.1002/col.21857.
18. M. R. Luo, A. A. Clarke, P. A. Rhodes, A. Schappo, S. A. R. Scrivener, and C. J. Tait, “Quantifying colour appearance. Part I. Lutchi colour appearance data,” Color Res. Appl. 16(3), 166–180 (1991).
19. IBM SPSS Statistics for Windows, IBM Corp., Armonk, NY, 2012.
20. A. Field, Discovering statistics using SPSS, Third ed. (SAGE, 2009).
21. S. Fotios, K. Houser, and C. Cheal, “Counterbalancing needed to avoid bias in side-by-side brightness matching tasks,” J. Illum. Eng. Soc. 4, 207–223 (2008).