Over the past years there has been increasing evidence that the CIE color rendering index R a fails to correspond to the perceived color quality of many light sources, especially some light-emitting diodes (LEDs). Several proposals to update, complement or even replace the CIE R a have therefore been made. The performance of thirteen color quality metrics was evaluated by calculating the average correlation of the metric predictions with the visual scaling of the perceived color quality obtained in several psychophysical studies. Two aspects of perceived color quality were investigated: appreciation (preference or attractiveness) and naturalness. The memory color quality metric (S a) of Smet et al. was found to correlate highly with perceived appreciation (r = 0.88) and to be statistically better at predicting it (p < 0.0001) than all other metrics. The CIE R a performed the worst. A metric that combines the gamut area index (GAI) and the CIE R a using an arithmetic mean correlated highly with the perceived naturalness of a light source (r = 0.85). It was found to be statistically better at predicting naturalness than all other metrics (p < 0.0001). A negative correlation was found between the ability of a metric to predict appreciation and its ability to predict naturalness, indicating that a complete description of the color quality of a light source probably requires more than one metric.
©2011 Optical Society of America
For decades the impact of a light source on the color appearance of objects has been evaluated with the Color Rendering Index R a (CRI). The method was first specified in 1965 by Nickerson and Jerome , but standardized in a technical document by the Commission Internationale de l’Eclairage (CIE) in 1974 [2,3]. However, over the past years there has been increasing evidence from psychophysical experiments that the CIE color rendering metric correlates poorly with the visual appreciation of the light from many so-called “white” light-emitting diodes (LEDs) [4–12]. Several new metrics have therefore been proposed, especially in the framework of the CIE technical committee TC1-69 dealing with the color rendering of white light sources. Some of these metrics, like the CIE color rendering index, offer an objective measure of the color rendering properties of a light source. Others, like Judd’s flattery index , Thornton’s color preference index  and the memory color quality metric developed by Smet et al. [8,15], have focused on the more subjective aspect of lighting color quality.
Objective metrics that describe the shift in color appearance with respect to an “optimum” reference illuminant (a Planckian radiator or phase of daylight) are often required for many professional applications such as color reproduction, printing and quality control. Such metrics may however not be the best answer to the needs of lighting designers, architects and those in the shop and retail sector. Many times end users and consumers are more interested in the color quality of the lighting in terms of appreciation, i.e. how appealing objects look. A good example is the Neodymium incandescent lamps with a CIE R a value of approximately 77, but “which are popularly sold for twice the price of normal incandescent lamps having a perfect score (R a=100)” [16,17]. Practical cases such as this, as well as the results of several psychophysical studies [5,8,11], investigating the perceived color quality of light sources, clearly indicate that in addition to an objective metric, a metric that describes the more subjective characteristics such as attractiveness and preference would be useful.
Although there have been studies in the past that have investigated the performance of some color quality metrics, they were either limited to the comparison of the correlation of the CIE R a and one or two metrics with the visual results of one or two studies [5,6,18,19]; or to the cross-comparison of several metrics without any reference to the actual perceived color quality obtained in a psychophysical experiment . In this paper the performance of thirteen color quality metrics has been systematically assessed using the scalings of the perceived color quality obtained in several different psychophysical studies. Following a short introduction to each of the metrics and visual studies, the performance of the discussed metrics was evaluated by calculating the correlation of the metric predictions with the perceived color quality obtained in each study. The general performance of a metric was estimated by a weighted average correlation based on all studies. Finally, the average correlations of all metrics were cross-compared to determine the metric that has the best overall ability to predict the perceived color quality of a light source in terms of appreciation (preference and attractiveness) and naturalness.
2. Color quality metrics
2.1. CIE Color Rendering Index, Ra
The CIE color rendering index (CRI) is a color difference metric and is calculated using the Test Sample Method [2,3]. In this method, the color differences, ΔE i, between a set of 14 standard colored cards under the test source and under a reference illuminant are calculated in the obsolete 1964 U*V*W* color space. The reference illuminant is either a blackbody radiator (CCT ≤ 5000 K) or a daylight phase (CCT > 5000 K) of the same correlated color temperature (CCT) as the test source. Before calculating the color differences ΔE i, chromatic adaptation is accounted for with a Von Kries correction. Finally, the specific color rendering indices R i are calculated as:
$$R_i = 100 - 4.6\,\Delta E_i$$
The general color rendering index R a is the arithmetic mean of the specific indices of the first eight samples. An R a score of 100 means that there are no color differences between the test source and the reference illuminant for any of the first eight samples.
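As a minimal sketch of the last step of the Test Sample Method, the following Python fragment converts a list of color differences into specific indices and a general index, using the scale factor of 4.6 and the first eight samples as described above. The function names and the example color differences are hypothetical and serve only as an illustration; the calculation of the ΔE i values themselves (color space conversion and chromatic adaptation) is not covered.

```python
from statistics import mean


def special_cri(delta_e):
    """Specific color rendering index R_i for one sample: 100 - 4.6 * dE_i."""
    return 100.0 - 4.6 * delta_e


def general_cri(delta_es):
    """General index R_a: arithmetic mean of R_i over the first eight samples."""
    if len(delta_es) < 8:
        raise ValueError("color differences for the first eight samples are required")
    return mean(special_cri(de) for de in delta_es[:8])


# Hypothetical color differences for the 14 test samples (illustrative values only).
example_delta_e = [2.0, 1.5, 3.0, 2.5, 2.0, 1.0, 3.5, 4.5,
                   12.0, 5.0, 4.0, 6.0, 2.0, 1.5]

print("R_a =", round(general_cri(example_delta_e), 1))
```

Samples 9 to 14 contribute only to their own specific indices, so a large color difference on, for example, the saturated red sample leaves R a unchanged.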
2.2 CAM02UCS CRI with 35 color constant samples, Ra,cam02ucs35
Luo et al. developed a color rendering metric using the CAM02UCS  color space. The CRICAM02UCS  follows the same procedure as the CIE color rendering metric but it has been updated with a new color space and an associated chromatic adaptation transform. The colored test samples and the method for selecting the reference illuminants were kept unchanged.
To further improve the metric, Luo et al. have replaced the set of 14 Munsell samples with a set of 35 different samples. This sample set is composed of 32 specially selected color constant samples and 3 skin colors. Currently Luo et al. are working on an even more comprehensive sample set. In this paper the metric with 35 samples will be referred to as Ra,cam02ucs35 to distinguish it from the published version .
2.3. Color Quality Scale, CQSa,p,f
The Color Quality Scale (CQS a), developed by the National Institute of Standards and Technology (NIST), is a metric that addresses several points of critique on the CIE color rendering index in order to better correlate with the visual appreciation of a light source. First, the CQS uses a set of more saturated colored cards, because it is possible for a light source to perform well on non-saturated samples while performing poorly on saturated ones. The reverse was found not to be the case . Secondly, it uses an updated color space (CIELAB) and chromatic adaptation transform (CMCCAT2000 ). Thirdly, the CQS a does not penalize deviations from the reference illuminant that are more chromatic, because as stated by Davis et al.: “evidence suggests that increases in object chroma, as long as they are not excessive, are not detrimental to color quality and may even be beneficial” . The CQS metric kept the method for selecting the reference illuminants unchanged, i.e. blackbody radiators or daylight phases of the same correlated color temperature as the test light source. Fourthly, the arithmetic mean is replaced by a root-mean-square to ensure that even the poor rendering of only one or two samples has a significant impact on the general CQS a value. Fifthly, the CQS a values are rescaled to a 0-100 range to avoid undesired negative values. The rescaling is such that only values below 30 are noticeably affected, while high values are only minutely affected, keeping the linearity of the scale at high values intact. Finally, a CCT-factor is applied to penalize light sources with extremely low correlated color temperatures, as these light sources have small gamut areas and thus render fewer object colors.
Besides the general CQS a, which does not penalize increases in saturation, two other indices were developed that give other specific information about the color rendering properties of light sources: the color fidelity scale CQS f and the color preference scale CQS p, which respectively penalize and reward increases in saturation.
For a more in-depth description (including all the equations) of the color quality scale the interested reader is referred to an excellent paper by Davis et al. .
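The effect of replacing the arithmetic mean by a root-mean-square, as described above, can be illustrated with a small numerical sketch. The color differences below are hypothetical and the fragment does not reproduce the full CQS calculation (saturation handling, rescaling and CCT-factor are omitted); it only shows how a single poorly rendered sample weighs on each type of average.

```python
import math


def arithmetic_mean(values):
    """Plain average of the per-sample color differences."""
    return sum(values) / len(values)


def root_mean_square(values):
    """Root-mean-square average, which weights large differences more heavily."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical set: fourteen well rendered samples and one poorly rendered sample.
delta_e = [2.0] * 14 + [20.0]

print("arithmetic mean :", round(arithmetic_mean(delta_e), 2))
print("root-mean-square:", round(root_mean_square(delta_e), 2))
```

With these illustrative numbers the single outlier raises the root-mean-square considerably more than the arithmetic mean, which is the behavior the CQS relies on.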
2.4 Rank order Color Rendering Index, RCRI
The RCRI, developed by Bodrogi et al. , is a rank order based color rendering index. Based on a series of psychophysical experiments it predicts the visual rating of the perceived color differences, between a set of 17 colored samples illuminated by the test light and by a reference illuminant, on a five step ordinal rating scale R. The rating scale R ranges from ‘excellent’ (R = 1) to ‘very bad’ (R = 5). The reference illuminants are determined using the same method as that of the CIE color rendering metric. Color differences are calculated using the CAM02UCS  color difference formula.
Based on the number of samples predicted ‘excellent’ (N1) and ‘good’ (N2), the ordinal color rendering index (RCRI) is determined as follows:
A more thorough description can be found in a paper by Bodrogi et al. .
2.5 Feeling of Contrast color rendering Index, FCI
Hashimoto et al. developed a color rendering index based on the feeling of contrast (FCI) . The idea is that a light source that increases the feeling of contrast (also sometimes referred to as visual clarity or color discrimination) also increases the saturation of colored objects. Saturation or chroma enhancement is generally considered a positive trait in many lighting applications [7,23,26]. The FCI metric estimates the feeling of contrast as a function of the CIELAB gamut area of the corresponding colors of four specially selected highly chromatic samples (red: 5R4/12; yellow: 5Y8.2/10; green: 5.5G5/8 and blue: 4.5PB3.2/6) under the test light source (GA testsource) and the CIELAB gamut area under a D65 reference illuminant (GA D65):
2.6 Gamut Area Index, GAI
The Gamut Area Index (GAI) of Rea and Freyssinier  is based on the work by Thornton on color saturation and hue discrimination . The idea is basically the same as that of the FCI metric. An increase in the chroma of colored objects or an increase in the color discrimination generally has a positive impact on the perceived color quality [7,23,26]. Instead of using only four highly chromatic samples, the GAI uses the eight Munsell samples used in the calculation of the CIE R a value. The calculations are performed in the CIE 1976 u’,v’ color space and the equal energy stimulus (EES) is chosen as a reference illuminant. The GAI is defined as:
GA testsource and GA EES are respectively the gamut areas of the 8 samples under the test source and the EES reference illuminant.
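A gamut area of this kind can be computed as the area of the polygon spanned by the sample chromaticities. The sketch below uses the shoelace formula and assumes that the chromaticities are supplied in hue order, so that the polygon does not self-intersect. The normalization to 100 for the reference illuminant is an assumption made here for illustration, as are the function names and the example coordinates.

```python
def polygon_area(points):
    """Area of a simple polygon given (u', v') vertices in hue order (shoelace formula)."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def gamut_area_index(test_points, reference_points):
    """Gamut area of the test source relative to the reference.

    ASSUMPTION: the ratio is scaled so that the reference illuminant scores 100.
    """
    return 100.0 * polygon_area(test_points) / polygon_area(reference_points)


# Hypothetical chromaticities (four vertices only, to keep the example short).
reference = [(0.20, 0.45), (0.30, 0.45), (0.30, 0.55), (0.20, 0.55)]
test = [(0.20, 0.45), (0.28, 0.45), (0.28, 0.55), (0.20, 0.55)]

print("GAI =", round(gamut_area_index(test, reference), 1))
```

The same polygon area routine applies to the CIELAB gamut areas used by the FCI, with (a*, b*) coordinates in place of (u’, v’).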
Rea and Freyssinier suggest that the GAI be used together with the CIE R a and reported that light sources with GAI and CIE R a values between 80 and 100 should ensure a natural and vivid appearance of objects . However, no equation was suggested as to how they could be practically combined. In this paper, a combined metric GAI_R a was therefore defined as the arithmetic mean of the two:
$$\mathrm{GAI\_R_a} = \frac{\mathrm{GAI} + R_a}{2}$$
This metric rewards light sources that comply with the suggestions of Rea and Freyssinier.
2.7. Cone Surface Area, CSA
Another gamut area based index is the Cone Surface Area developed by Fotios . It combines the 1976 u’v’ gamut area of the first eight test samples of the CIE color rendering index with the u’v’ chromaticity of the light source. CSA is the surface area of a cone with a base area equal to the gamut area (GA) of the first eight CIE CRI test samples and a height equal to the 1976 w’ chromaticity coordinate (w’ = 1 − u’ − v’) of the light source. It is calculated as follows:
2.8. Memory Color Quality metric, Sa
The memory color quality metric was developed by the authors [8,15]. Color quality is assessed as the general degree of similarity between the color appearance of a set of ten familiar objects under the test light source and the memory colors of those objects. The similarity of each object under the test source with its memory color is calculated using the similarity distributions obtained in a series of psychophysical experiments . First, for all objects, the tristimulus values under the light source are calculated using the spectral reflectance of the objects and the CIE 10° standard observer. Secondly, the corresponding tristimulus values are calculated under D65, the IPT white point, with the CAT02 chromatic adaptation transform. The decrease in the degree of chromatic adaptation for more saturated sources is taken into account by using the Dm-factor proposed by Lee et al. . Thirdly, the corresponding tristimulus values are transformed to IPT chromaticity coordinates, X i = (P i,T i). Fourthly, the function values of the corresponding similarity distributions S i(X i) are calculated with the object chromaticities X i as input, resulting in a set of ten S i values describing the degree of similarity with each object’s memory color:
The model parameters a i,1-5 describe the location, shape, size and orientation of the similarity distribution S i(X i) . Finally, the general degree of memory color similarity S a, assumed to be a measure for color quality, is obtained by taking the geometric mean of the ten individual S i values.
An S a score of 1 means that the light source renders all familiar objects exactly as we expect them to look.
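The final two steps can be sketched in Python as follows. The geometric mean follows directly from the description above. The similarity function, however, is only an assumed form: a bivariate Gaussian in the IPT chromaticity plane in which a1 and a2 give the center and a3, a4 and a5 the shape, size and orientation. The exact parameterization and the fitted parameter values are not reproduced here, and all names are hypothetical.

```python
import math


def similarity(p, t, a1, a2, a3, a4, a5):
    """Similarity S_i of a chromaticity (p, t) to a memory color.

    ASSUMPTION: bivariate Gaussian with center (a1, a2) and quadratic-form
    coefficients a3, a4, a5; the value is 1 at the center and decays with distance.
    """
    dp, dt = p - a1, t - a2
    return math.exp(-0.5 * (a3 * dp * dp + a4 * dt * dt + 2.0 * a5 * dp * dt))


def memory_color_quality(similarities):
    """General similarity S_a: geometric mean of the individual S_i values."""
    if any(s < 0.0 for s in similarities):
        raise ValueError("similarity values must be non-negative")
    if any(s == 0.0 for s in similarities):
        return 0.0  # a single zero forces the geometric mean to zero
    return math.exp(sum(math.log(s) for s in similarities) / len(similarities))


# Hypothetical S_i values for ten familiar objects (illustrative only).
example_s = [0.9, 0.8, 0.95, 0.7, 0.85, 0.9, 0.6, 0.75, 0.8, 0.9]
print("S_a =", round(memory_color_quality(example_s), 3))
```

Because a geometric mean is used, one object with a very low similarity pulls S a down more strongly than it would with an arithmetic mean.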
2.9 Judd’s Flattery Index, Rf
In 1967, Judd proposed a flattery index to supplement the CIE color rendering index as proposed by Nickerson and Jerome , because of concerns that the CIE “color rendering index of a light source may correlate poorly with public preference of the source for general lighting purposes” . Judd based his flattery index on the work of Sanders  and Newhall  on preferred and memory colors. Unlike the memory color quality index of Smet et al. , Judd’s flattery index is a reference illuminant based metric and is, with a few exceptions, very similar to the CIE color rendering index. Judd calculated a preferred chromaticity shift for 10 of the 14 Munsell samples (samples 1 to 8 and samples 13, 14) to correct the chromaticity of the reference illuminant. The method for selecting the reference illuminant is the same as that used in the CIE color rendering index. Judd’s flattery index (Rf) is calculated as follows:
The average color difference is a weighted arithmetic mean of the chromaticity differences between the chromaticities of the ten samples under the test source and the chromaticities of the samples under the reference illuminant corrected by one fifth of the preferred chromaticity shift. Judd used only one fifth of the calculated preferred shift in order to assign the reference illuminant an Rf value of 90, while keeping the CIE color rendering scale factor of 4.6. A light source can therefore score higher than its reference, but not higher than 100. All chromaticities are calculated in the CIE 1960 uniform color space. A detailed description of the flattery index can be found in Judd’s 1967 paper .
2.10 Thornton’s Color Preference Index, CPI
Thornton’s color preference index (CPI)  is very similar to Judd’s flattery index, except for a few differences. First, only the first 8 Munsell samples are used. Second, the original magnitude of the preferred chromaticity shift calculated by Judd is retained. Third, all samples have the same weighting. Fourth, the maximum score of a light source is 156 and illuminant D65 is assigned a value of 100. The CPI is calculated as follows:
3. Psychophysical studies
In this section, several psychophysical studies that have investigated the perceived color quality in terms of appreciation (attractiveness or preference) or in terms of naturalness are introduced. Although studies other than the ones listed below are available in the literature, only those studies were used for which both the spectral power distributions and the visual scalings of the perceived color quality were available to the authors. As this is not a review, it is not the intent of this paper to describe all the intricate details of each of these studies. The short introduction of each study merely serves as a quick reference to the type and number of light sources investigated, the number of observers participating in the experiment and what was actually determined in the study in terms of color quality. The scalings of the perceived color quality of each of the light sources from all studies are summarized in Table 1 at the end of this section (see subsection 3.8). Based on these scalings the general performance of a color quality metric in predicting the color quality of a light source was evaluated.
3.1. Smet et al
The perceived color quality of six 2700 K light sources was determined in a paired comparison experiment . The six light sources, a mixture of conventional and solid-state sources, were a halogen (Ha), a Neodymium incandescent source (Nd), an F4 fluorescent lamp (F4), a Fortimo LED with a green filter (FG), an RGB LED (RGB) and a tetrachromatic LED cluster optimized for the memory color quality metric S a (LC). The chromaticities of the objects presented under the light sources spanned the entire hue circle. The illumination level at the object location was 248 lux ± 2%. A group of 92 color normal observers participated in the experiment. The observers were asked to rate, on a 7 point scale, the difference in color quality for each presented pair in terms of five quality descriptors: preference, fidelity, vividness, naturalness and attractiveness. Based on the observer ratings a scaling for the color quality of the light sources was obtained.
3.2 Jost-Boissard et al
3.2.1 Jost-Boissard et al.: 3000 K light sources
In the first series, nine light sources of approximately 3000 K were compared in a forced choice comparison experiment . The nine sources were a halogen (Ha), a fluorescent source (Fl) and eight LED clusters. The LED clusters were composed of combinations of two or more LEDs (White phosphor LED, Red, Amber, Green, Cyan and Blue). The chromaticities of the objects presented under the light sources did not span the entire hue circle. As shown by Smet et al. , object sets that do not take into account the entire hue circle can have an important impact on the perceived color quality, possibly even masking the correlation with metric predictions. Metric calculations were therefore adjusted to correct for the lack of blue and purple objects in the experiment, i.e. the metric test samples having a blue or purple hue were omitted when calculating the metric scores . The illumination level at the object location was 230 lux ± 3%. A group of 45 color normal observers participated in the experiment. The observers were shown all possible light source combinations in a random order using a double viewing booth setup. For each lighting combination observers had to assess which light source had the best lighting quality. Based on these data an interval scale for color quality in terms of attractiveness and naturalness was established.
3.2.2 Jost-Boissard et al.: 4000 K light sources
A similar experiment was performed with eight light sources of approximately 4000 K . The light sources were a fluorescent source (Fl) and seven LED clusters. A group of 36 color normal observers participated in this experiment. The illumination level at the object location was 210 lux ± 3%. An interval scale for color quality in terms of attractiveness and naturalness was again established.
3.3 TETRA project: “Retail Design and Light: from number to emotion.”
As part of a technology transfer (TETRA) project of the fund for Innovation through Science and Technology (IWT) two psychophysical experiments were conducted in which a group of 30 color normal observers had to rank six light sources in order of the perceived color quality in terms of attractiveness and naturalness. The six light sources, of approximately 3000 K, were a halogen (Ha), a cold white Fortimo LED adjusted with a filter to reduce the CCT to 3000 K (FoCW), a warm white Fortimo LED (FoWW), an RGB LED cluster, a cdm830 (CDM) and a Regent PAL LED module. The order of the light sources was switched between observers using a Latin square setup to avoid possible biasing effects. The illuminance level was 310 lux ± 3%. The averaged observer rankings for each light source provided a simple scaling for the color quality in terms of attractiveness and naturalness .
3.4 Schanda and Madár
A visual experiment was performed by Schanda and Madár to evaluate the color quality of thirteen light sources with respect to D65, which was taken as the preferred reference . The CCT of the light sources ranged from 2788 K to 9310 K. Instead of using real objects illuminated by a real light source, a colored scene was simulated on a calibrated CRT monitor. The simulated scene was calculated from the hyperspectral image and the spectral power distribution of the light source. A hyperspectral image is an image where the spectral reflectance is known for each pixel. The light sources simulated in this experiment were CIE illuminants D65, A, FL3.5, FL3.12, FL2, FL4, FL7 and FL11. Their CCTs ranged from 2856 K to 6505 K. In addition to these CIE illuminants the following illuminants were also simulated: FILamp, a warm white and cold white phosphor LED of approximately the same CCT and three RGB LEDs of the same CCT but decreasing CIE CRI Ra values. Unfortunately, the spectral power distributions of these sources were not available to the authors. The data set used in this paper is therefore composed of the eight CIE illuminants and their rankings as reported in the paper by Schanda and Madár .
3.5 Szabo and Schanda
In a paired comparison experiment the perceived color quality of five light sources was compared by Szabo and Schanda . The light sources used were a halogen (Ha), a white phosphor LED with a high R a (pLEDH), a white phosphor LED with a low R a (pLEDL), one RGB cluster (RGB) and one compact fluorescent source (CFL). All light sources had a CCT of approximately 3000 K. The luminance in the viewing plane was reported to be 170 cd/m² ± 5%, corresponding to an illuminance level of 530 lux. In a double booth setup a picture of a woman holding a glass was shown. In one booth the picture was illuminated by a halogen and in the other the picture was illuminated with one of the other four light sources. Observers were unaware which picture was illuminated with the halogen reference source. They were asked to quantify their preference for one or the other booth on a continuous scale, resulting in a relative scaling of the five light sources. The experiment was also repeated for naturalness, vividness and liveliness.
3.6 Narendran et al
Narendran et al. investigated the perceived color quality of five LED based aircraft reading lights in comparison with a halogen (Ha) or an incandescent (Inc.) reference source . The five LED sources were a high power LED (HP), an amber white LED (AW), a phosphor white LED (PW), an RGB cluster with low R a (RGBL) and an RGB cluster with high R a (RGBH). Thirty color normal observers participated in the experiment. The perceived color quality in terms of ‘general preference’ was determined in a paired comparison experimental setup with a halogen or incandescent source as a reference and in an absolute scaling experiment. In the latter experiment each light source was shown individually and the observers had to scale how much they preferred the perceived color quality on a −3 (‘strongly disliked’) to +3 (‘strongly liked’) scale. A rating of zero indicated the lighting was just acceptable for the aircraft reading application. The illuminance level was approximately 200 lux.
3.7 Rea and Freyssinier
In two series of visual experiments Rea and Freyssinier examined the perceived color quality of three cold white and three warm white LED sources in terms of naturalness, vividness and acceptability . The warm and cold white LEDs had CCT ranges of respectively 2800 K to 3200 K and 4500 K to 4800 K. For each CCT range, the sources were constructed by mixing light from among nine commercially available light sources such that one had a high R a (≥80) and low GAI (<65) (CW5 & WW5), one had a low R a and a high GAI (CW6 & WW6) and one had a high R a and GAI (CW7 & WW7). The illuminance level was approximately 355 lux. Eighteen color normal observers participated in the experiment. Observers had to rate the naturalness, vividness and acceptability on a −5 (‘strongly disagree’) to +5 (‘strongly agree’) scale. However, preference was not examined and acceptability was found to correlate strongly with naturalness.
3.8. Perceived color quality scalings of light sources in the individual studies
4. Metric performance analysis
In this paper, the performance of a metric in predicting the color quality of a light source was measured as the Spearman correlation between the metric predictions and the scalings of the perceived color quality obtained in a study. Although statistical significance testing and a cross comparison of metric performances could be done separately for each study, the limited number of light sources in most of the studies might make it difficult to single out any genuine effects. The overall performance of a metric was therefore evaluated by combining the correlations of all studies into a single average correlation. Two methods are commonly used in the literature : the fixed and random effects model of Hedges and Olkin  and Hedges and Vevea  and the method of Hunter and Schmidt . In this paper, the average correlation was calculated using the Hunter-Schmidt method, because it generally gives a better, although underestimated (especially for higher correlation sizes), estimate. Field reported underestimation values of 5 to 10 percent for medium to large correlation sizes . For the purpose of this paper, comparing the performances of different metrics and finding the best one, an underestimation of the ‘true’ correlation was preferred as it represented the worst case scenario.
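For reference, a Spearman correlation is a Pearson correlation computed on ranks. The self-contained sketch below assigns average ranks to tied values, which is the usual convention but an assumption with respect to the analysis described here; the function names and example numbers are hypothetical.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation between metric predictions x and visual scalings y."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical metric scores and visual scalings for five light sources.
metric_scores = [82.0, 75.0, 91.0, 88.0, 95.0]
visual_scaling = [0.1, 0.4, 0.3, 0.6, 0.9]
print("Spearman r =", round(spearman(metric_scores, visual_scaling), 2))
```

Because only rank order enters the calculation, the result is insensitive to the different scale ranges of the individual metrics.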
The analysis of the metric correlations was performed as follows. First, following the method of Hunter-Schmidt, the average correlation coefficient was calculated for each metric by weighting the individual correlations r i by the number of samples N i in each study:
Secondly, the standard error on the average correlation coefficient SEr, corrected for the sampling variance error, was calculated as follows:
Thirdly, the statistical significance of H 0: ρ = 0 (‘no correlation’) was estimated by calculating the p-value from the z-score of the average correlation coefficient, Zr:
Finally, the performances of the metrics, estimated by their average correlation coefficients and standard errors SEr, were then compared using the method of Meng, Rosenthal and Rubin for comparing correlated correlation coefficients . A null-hypothesis of ‘no difference between the two compared metrics’ was postulated, i.e. H 0: ρ metric1 = ρ metric2.
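The steps above can be sketched in Python under a few stated assumptions. The weighted average follows the description of the Hunter-Schmidt method given here. The z-score is assumed to be the average correlation divided by its standard error, and the standard error is taken as an input rather than computed. The comparison function implements the published Meng, Rosenthal and Rubin statistic for two correlations that share a dependent variable; how the number of observations n and the inter-metric correlation r12 were chosen for averaged correlations is not specified above, so both are left as parameters. All names and example values are hypothetical.

```python
import math


def weighted_mean_correlation(correlations, sample_sizes):
    """Average of the per-study correlations r_i, weighted by the sample sizes N_i."""
    total = sum(sample_sizes)
    return sum(n * r for r, n in zip(correlations, sample_sizes)) / total


def two_tailed_p(z):
    """Two-tailed p-value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def z_test_mean_correlation(r_bar, standard_error):
    """ASSUMPTION: z = r_bar / SE for the null hypothesis of zero correlation."""
    z = r_bar / standard_error
    return z, two_tailed_p(z)


def meng_rosenthal_rubin(r1, r2, r12, n):
    """Compare two correlated correlations r1 and r2 (both with |r| < 1).

    r12 is the correlation between the two predictors (here: two metrics) and
    n the number of observations. Returns the z statistic and two-tailed p-value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher z-transform
    mean_r_sq = (r1 * r1 + r2 * r2) / 2.0
    f = min((1.0 - r12) / (2.0 * (1.0 - mean_r_sq)), 1.0)
    h = (1.0 - f * mean_r_sq) / (1.0 - mean_r_sq)
    z = (z1 - z2) * math.sqrt((n - 3) / (2.0 * (1.0 - r12) * h))
    return z, two_tailed_p(z)


# Hypothetical per-study correlations and numbers of light sources.
r_bar = weighted_mean_correlation([0.9, 0.7, 0.8], [6, 9, 8])
print("average r =", round(r_bar, 3))
print("comparison:", meng_rosenthal_rubin(0.88, 0.17, 0.3, 50))
```

A small two-tailed p-value from the comparison function leads to rejection of the null hypothesis that the two metrics correlate equally well with the visual results.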
The performance of a metric was also assessed by looking at the number of transpositions (errors) between the visually determined rank order of the light sources and the rank order predicted by the metric. The number of transpositions was counted using the scoring system of the Farnsworth-Munsell 100-Hue test .
The Spearman correlation coefficients between the metric predictions and the perceived color quality of the light sources in each of the studies are shown in Table 2. The average correlation coefficients and the p-values are given in the last two columns of Table 2.
The two-tailed p-values for the cross comparisons between the average correlation coefficients of all the metrics, as calculated with the method of Meng, Rosenthal and Rubin, are shown in Table 3.
The performance of a metric, measured by counting the number of erroneous transpositions between the rank order found in the psychophysical study and the rank order predicted by the metric, is given in Table 4. The mean and the standard deviation σe of the number of erroneous transpositions of each metric are given in the last two columns.
The performance of the metrics is discussed in two different subsections, one for the color quality in terms of appreciation (preference or attractiveness) and one for the color quality in terms of naturalness.
6.1. Color quality: appreciation (preference / attractiveness)
From Table 2 it is clear that the 13 metrics showed large differences in performance. Of all the metrics, the CIE R a had the worst average correlation (r = 0.17, p = 0.086). Updating the CIE R a with a modern chromatic adaptation transform, a more uniform color space and a different set of test samples had only a small influence on the performance, as can be seen from the low correlations of the R a,cam02ucs,35 and CQS f metrics. The poor correlation for these strict color difference based metrics is probably due to the use of the non-optimal CIE reference illuminants. Strict color difference based metrics penalize any deviation from these CIE illuminants. It is however clear from Table 1 that many light sources did score visually better than their halogen reference sources. Not all deviations should therefore be considered bad. It has been known for years that chroma enhancement often has a favourable impact on the perceived color quality . The CQS a and CQS p metrics take this into account by respectively not penalizing and even rewarding chroma increases. It is clear from Table 2 that such approaches, especially the latter, had a positive impact on the performance of these metrics. Gamut area based metrics, like the FCI, the GAI, the GAI_R a and the CSA, also take this effect into account, because an increase in chroma of the test samples is often accompanied by an increase in the gamut area. Compared to the strict color difference metrics, the gamut area based metrics did indeed show an increased correlation with the perceived color quality. The RCRI metric, which predicts the color quality based on psychophysical experiments, showed only a slight, but non-significant, improvement compared to the color difference metrics. All other metrics performed better, but only for the S a and CQS p metrics were the results statistically significant.
Other metrics that are directly or indirectly based on psychophysical experiments are the memory color based metrics (S a, R f and CPI). The memory color quality metric developed by Smet et al. showed the highest correlation with the perceived color quality (r = 0.88, p = 0). The other two color memory based metrics (R f and CPI) showed a poor correlation (resp. 0.29 and 0.39). This can be explained by the fact that these metrics do not use memory colors directly, but rather a ‘preferred chromaticity shift’. The assumption that the corrected chromaticity of a Munsell sample is indeed the sample’s memory color is questionable, as colored cards generally do not have a memory color associated with them. Furthermore, once both metrics have established an optimal chromaticity they use a Euclidean color difference in a rather non-uniform color space. However, it is expected and shown by Smet et al. that color tolerances for hue and chroma are not alike. People are generally more tolerant of chroma differences than they are of hue differences . These tolerance differences are implicitly taken into account by the similarity distributions in the memory color quality metric of Smet et al. , resulting in a much better correlation with the visual appreciation of a light source.
The results in Table 4 confirmed the findings from the correlation analysis. The memory color quality metric S a had a very low average transposition error (te = 6.0, σ = 2.6), while the color difference based metrics had the largest (te≈18, σ≈12). Gamut area metrics, which take into account the possibly positive effect of chroma enhancement, had lower transposition errors (te≈14, σ≈9), although these were still more than twice the transposition error of the memory color quality metric S a.
Examining the p-values in Table 3, it is clear that the memory color quality metric was statistically significantly better (p < 0.0001) than all other investigated metrics at predicting the perceived appreciation of a light source. No statistical differences were found between the three strict color difference based metrics (CIE R a, R a,cam02ucs,35 and CQS f). The CQS a correlated significantly better with the perceived appreciation than the three strict color difference metrics. Only the CQS p and S a metrics were found to be statistically better than the CQS a. Despite its rather low correlation (r = 0.58), the CQS p was still found to be statistically better than the CIE R a, the CQS a, the CQS f, the RCRI, the FCI, the CSA and the R f metrics. The FCI was statistically better only than the R a,cam02ucs,35 and CSA metrics. The combined GAI_R a metric outperformed the CIE R a, the R a,cam02ucs,35 and the CQS f. Finally, Judd’s flattery index was found to be statistically better than the R a,cam02ucs,35 and the CQS f.
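Reference 40 in the list below (Meng, Rosenthal and Rubin) gives a Z-test for comparing two correlations that share a criterion variable, here the visual ratings. This section does not state whether the p-values in Table 3 were computed in exactly this way, so the following Python sketch only illustrates that kind of test. The function name, the correlation rx between the two metrics and the sample size n are assumptions invented for the example.

```python
import math

def meng_z_test(r1, r2, rx, n):
    """Z-test for two dependent correlations r1 and r2 with the same
    criterion, given the correlation rx between the two predictors
    (rx must be below 1) and the number of observations n."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher z-transform
    r2_mean = (r1 ** 2 + r2 ** 2) / 2.0              # mean squared correlation
    f = min((1.0 - rx) / (2.0 * (1.0 - r2_mean)), 1.0)  # f is capped at 1
    h = (1.0 - f * r2_mean) / (1.0 - r2_mean)
    z = (z1 - z2) * math.sqrt((n - 3) / (2.0 * (1.0 - rx) * h))
    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# r1 and r2 are the correlations quoted in the text for S a and CQS p;
# rx = 0.50 and n = 40 are hypothetical values, not taken from the paper.
z, p = meng_z_test(r1=0.88, r2=0.58, rx=0.50, n=40)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The test accounts for the fact that both metrics are evaluated against the same visual data, which an independent-samples comparison of the two correlations would ignore.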
To summarize, in terms of perceived appreciation, the S a metric was statistically the best at predicting the perceived color quality of a light source (r = 0.88), followed by the CQS p (r = 0.58). The color difference based metrics performed the worst (r≈0.20).
6.2. Color quality: naturalness
The same analysis was performed on the average correlations between the metric predictions and the perceived color quality in terms of naturalness.
From Table 2, it is clear that the combined metric GAI_R a had the best overall performance (r = 0.85, p < 0.0001). It was found to be statistically better at predicting the perceived naturalness of a light source than all other metrics, as can be seen from the p-values in Table 3. The color difference based metrics (CIE R a, R a,cam02ucs,35 and CQS f) had a medium correlation with the perceived naturalness (r≈0.65). They were statistically better at predicting naturalness than all gamut area metrics, except GAI_R a. They were also better than most memory color based metrics. No significant differences were found between the RCRI and the color difference based metrics. The RCRI was, however, statistically better than all gamut area based and memory color based metrics, except for the GAI_R a and R f metrics.
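The abstract describes GAI_R a as the arithmetic mean of the gamut area index (GAI) and the CIE R a. As a minimal sketch of that combination (the function name and the input values are invented for illustration and are not data from the study):

```python
def gai_ra(gai, ra):
    """Arithmetic mean of the gamut area index (GAI) and the CIE Ra."""
    return (gai + ra) / 2.0

# Invented values for two hypothetical lamps
print(gai_ra(gai=60.0, ra=90.0))   # high fidelity, small gamut
print(gai_ra(gai=110.0, ra=70.0))  # large gamut, lower fidelity
```

Because both terms carry equal weight, a source scores highly on the combined metric only if neither its fidelity nor its gamut area is poor.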
The results from Table 4 again confirmed the findings of the correlation analysis. The combined GAI_R a metric had the lowest transposition error (te = 5.9, σ = 7.3), although the standard deviation was large. It was followed by the color difference based metrics (te≈9, σ≈8) and then by the other gamut area based metrics (te≈13, σ≈13).
Finally, predictive performance in terms of naturalness was found to be negatively, though not significantly, correlated with predictive performance for preference (r = −0.44, p = 0.13). A metric that performs well for one aspect of color quality therefore tends not to perform well for the other. This is in line with the finding of Rea and Freyssinier that a complete description of the color quality of a light source will probably require more than one metric.
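The reported p-value can be checked with the usual t-test for a correlation coefficient, t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom. The Python sketch below assumes that the correlation was computed over the n = 13 metrics and that this standard test applies; neither is stated explicitly in this section. The function names are illustrative, and the t distribution is integrated numerically so that only the standard library is needed.

```python
import math

def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2.0) / (math.sqrt(df * math.pi) * math.gamma(df / 2.0))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2.0)

def correlation_p_value(r, n, steps=2000):
    """Return (|t|, two-sided p) for H0: no correlation, given r and sample size n.
    steps must be even (Simpson's rule)."""
    df = n - 2
    t = abs(r) * math.sqrt(df) / math.sqrt(1.0 - r * r)
    # Simpson's rule for the integral of the density from 0 to |t|
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3.0
    return t, 1.0 - 2.0 * central

# n = 13 (one point per metric) is an assumption, not stated in the text
t, p = correlation_p_value(r=-0.44, n=13)
print(f"|t| = {t:.2f}, p = {p:.2f}")   # prints: |t| = 1.63, p = 0.13
```

Under these assumptions the sketch reproduces the p = 0.13 quoted above, which shows that with only thirteen points a correlation of this size does not reach conventional significance.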
The performance of thirteen color quality metrics was evaluated in terms of their ability to predict the perceived color quality in terms of appreciation (attractiveness or preference) and in terms of naturalness. The overall performance of a metric was estimated by averaging the Spearman correlation coefficients between the metric predictions and the scalings of the perceived color quality obtained in several psychophysical studies. The S a memory color quality metric of Smet et al. showed the highest correlation with perceived appreciation (r = 0.88). It was significantly better than all other metrics (p < 0.0001). The CQS p metric performed second best, but with a substantially lower correlation (r = 0.58). Unsurprisingly, the CIE R a and the other two strict color difference based metrics (R a,cam02ucs,35 and CQS f) showed a poor correlation with the perceived appreciation (r≈0.20). These results suggest that the recently developed S a metric could be considered a good alternative for predicting the color quality of a light source in terms of attractiveness or preference.
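The averaging of Spearman correlations over several studies can be sketched as follows. This is an illustration and not the authors' implementation. This section does not say whether the average was weighted, so the sample-size weighting used here is an assumption, and all function names and data values are invented.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def weighted_mean_correlation(studies):
    """Sample-size weighted mean of the per-study Spearman correlations."""
    total_n = sum(len(pred) for pred, _ in studies)
    return sum(len(pred) * spearman(pred, vis) for pred, vis in studies) / total_n

# Invented data: (metric predictions, visual ratings) for two hypothetical studies
studies = [
    ([62, 75, 81, 90], [2.1, 3.0, 2.8, 4.2]),
    ([55, 70, 85], [1.5, 2.5, 3.9]),
]
print(round(weighted_mean_correlation(studies), 3))
```

Rank correlation only asks whether a metric orders the light sources in the same way as the observers did, so it does not depend on the rating scale used in each study.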
The naturalness aspect of color quality was best predicted by the combined GAI_R a metric (r = 0.85). It was found to be statistically better (p < 0.0001) than all other metrics at predicting naturalness. The worst at predicting naturalness was the FCI (r = −0.17), closely followed by the GAI (r = 0.06). The strict color difference based metrics showed only a moderate correlation (r≈0.65).
A moderate negative correlation (r = −0.44) was found between the predictive abilities of a metric for color quality in terms of appreciation and in terms of naturalness, in line with the finding of Rea and Freyssinier that a complete description of all aspects of the color quality of a light source will probably require more than one metric.
The contributions of the authors who have kindly shared the spectral power distributions of the light sources and the scalings of the perceived color quality from their psychophysical experiments are gratefully acknowledged.
References and links
1. D. Nickerson and C. W. Jerome, “Color rendering of light sources: CIE method of specification and its application,” Illum. Eng. 60, 262–271 (1965).
2. CIE, “Method of Measuring and Specifying Color Rendering Properties of Light Sources,” in CIE 13.2–1974 (CIE, Vienna, Austria, 1974).
3. CIE, “Method of Measuring and Specifying Color Rendering Properties of Light Sources,” in CIE 13.3–1995 (CIE, Vienna, Austria, 1995).
4. P. Bodrogi, P. Csuti, P. Hotváth, and J. Schanda, “Why does the CIE Color Rendering Index fail for White RGB LED Light Sources?” in CIE Expert Symposium on LED Light Sources: Physical Measurement and Visual and Photobiological Assessment (Tokyo, Japan, 2004).
5. S. Jost-Boissard, M. Fontoynont, and J. Blanc-Gonnet, “Perceived lighting quality of LED sources for the presentation of fruit and vegetables,” J. Mod. Opt. 56(13), 1420 (2009).
6. N. Narendran and L. Deng, “Color Rendering Properties of LED Light Sources,” in Solid State Lighting II: Proceedings of SPIE (2002).
7. M. S. Rea and J. P. Freyssinier-Nova, “Color rendering: A tale of two metrics,” Color Res. Appl. 33(3), 192–202 (2008).
8. K. A. G. Smet, W. R. Ryckaert, M. R. Pointer, G. Deconinck, and P. Hanselaer, “Memory colors and color quality evaluation of conventional and solid-state lamps,” Opt. Express 18(25), 26229–26244 (2010).
9. F. Szabó, J. Schanda, P. Bodrogi, and E. Radkov, “A Comparative Study of New Solid State Light Sources,” in CIE Session 2007 (2007).
10. T. Tarczali, P. Bodrogi, and J. Schanda, “Color Rendering Properties of LED Sources,” in CIE 2nd LED Measurement Symposium (Gaithersburg, 2001).
11. Y. Nakano, H. Tahara, K. Suehara, J. Kohda, and T. Yano, “Application of multispectral camera to color rendering simulator,” in Proc. of 10th Congress of the International Color Association (AIC05) (Grenada, Spain, 2005).
12. CIE, “TC 1-62: Color Rendering of White LED Light Sources,” in CIE 177:2007 (CIE, Vienna, Austria, 2007).
13. D. B. Judd, “A flattery index for artificial illuminants,” Illum. Eng. 62, 593–598 (1967).
14. W. A. Thornton, “A validation of the color preference index,” Illum. Eng. 62, 191–194 (1972).
15. K. Smet, W. R. Ryckaert, M. R. Pointer, G. Deconinck, and P. Hanselaer, “Color appearance rating of familiar real objects,” Color Res. Appl. 36(3), 192–200 (2011).
16. W. Davis and Y. Ohno, “Approaches to color rendering measurement,” J. Mod. Opt. 56(13), 1412–1419 (2009).
17. Y. Ohno and W. Davis, “Color Quality and Spectra,” in Photonics Spectra (2008).
18. J. Schanda and G. Madár, “Light source quality assessment,” in CIE 26th Session 2007 (CIE, Beijing, China, 2007), pp. D1-72–75.
19. F. Szabó, P. Csuti, and J. Schanda, “Color preference under different illuminants - new approach of light source colour quality,” in Light and Lighting Conference with Special Emphasis on LEDs and Solid State Lighting (CIE, Budapest, Hungary, 2009), pp. PWDAS-43.
20. X. Guo and K. W. Houser, “A review of color rendering indices and their application to commercial light sources,” Lighting Res. Tech. 36(3), 183–199 (2004).
21. M. R. Luo, G. Cui, and C. Li, “Uniform color spaces based on CIECAM02 color appearance model,” Color Res. Appl. 31(4), 320–330 (2006).
22. M. R. Luo, “The quality of light sources,” Color. Technol. 127, 75–87 (2011).
23. W. Davis and Y. Ohno, “Color quality scale,” Opt. Eng. 49(3), 033602 (2010).
24. C. Li, M. R. Luo, B. Rigg, and R. W. G. Hunt, “CMC 2000 chromatic adaptation transform: CMCCAT2000,” Color Res. Appl. 27(1), 49–58 (2002).
25. P. Bodrogi, S. Brückner, and T. Q. Khanh, “Ordinal scale based description of color rendering,” Color Res. Appl. n/a (2010).
26. K. Hashimoto, T. Yano, M. Shimizu, and Y. Nayatani, “New method for specifying color-rendering properties of light sources based on feeling of contrast,” Color Res. Appl. 32(5), 361–371 (2007).
27. J. P. Freyssinier-Nova and M. S. Rea, “A two-metric proposal to specify the color-rendering properties of light sources for retail lighting,” in Tenth International Conference of Solid-State Lighting, Proceedings of SPIE (San Diego, CA, 2010), p. 77840V.
29. S. A. Fotios, “The perception of light sources of different color properties,” (Univ. of Manchester, Inst. of Sci. and Technol., Manchester, UK, 1997).
30. S. H. Lee, M. H. Lee, and K. I. Sohng, “Factors of incomplete adaptation for color reproduction considering subjective white point shift for varying illuminant,” IEICE Trans. Fundamentals 91A, 1438–1442 (2008).
31. C. L. Sanders, “Color preferences for natural objects,” Illum. Eng. 54, 452–456 (1959).
32. S. M. Newhall, R. W. Burnham, and J. R. Clark, “Comparison of successive with simultaneous color matching,” J. Opt. Soc. Am. 47(1), 43–54 (1957).
33. S. Jost-Boissard, M. Fontoynont, and J. Blanc-Gonnet, Personal communication (2009).
34. J. Vanrie, Appendix 4: Technical report to the user committee of the IWT-TETRA project (80163): The effect of the spectral composition of a light source on the visual appreciation of a composite objectset (PHL, Diepenbeek, Belgium, 2009), p. 11.
35. M. S. Rea and J. P. Freyssinier, “Color rendering: Beyond pride and prejudice,” Color Res. Appl. 35(6), 401–409 (2010).
36. A. Field, Discovering Statistics Using SPSS (SAGE Publications Ltd, London, UK, 2009).
37. L. V. Hedges and I. Olkin, Statistical Methods for Meta-analysis (Academic Press, San Diego, CA, 1985).
38. L. V. Hedges and J. L. Vevea, “Fixed- and random-effects models in meta-analysis,” Psychol. Methods 3(4), 486–504 (1998).
39. J. E. Hunter and F. L. Schmidt, Methods of Meta-analysis: Correcting Error and Bias in Research Findings (Sage Publications, Inc., Newbury Park, CA, 2004).
40. X. L. Meng, R. Rosenthal, and D. B. Rubin, “Comparing correlated correlation coefficients,” Psychol. Bull. 111(1), 172–175 (1992).
41. D. Farnsworth, The Farnsworth-Munsell 100-Hue Test for the Examination of Color Discrimination (Munsell Color Company, Inc., 1957).
42. Y. Ohno, “Spectral design considerations for white LED color rendering,” Opt. Eng. 44(11), 111302 (2005).