Abstract
We compared the ability of three model observers (nonprewhitening matched filter with an eye filter, Hotelling and channelized Hotelling) in predicting the effect of JPEG and wavelet-Crewcode image compression on human visual detection of a simulated lesion in single frame digital x-ray coronary angiograms. All three model observers predicted the JPEG superiority present in human performance, although the nonprewhitening matched filter with an eye filter (NPWE) and the channelized Hotelling models were better predictors than the Hotelling model. The commonly used root mean square error and related peak signal to noise ratio metrics incorrectly predicted a JPEG inferiority. A particular image discrimination/perceptual difference model correctly predicted a JPEG advantage at low compression ratios but incorrectly predicted a JPEG inferiority at high compression ratios. In the second part of the paper, the NPWE model was used to perform automated simulated annealing optimization of the quantization matrix of the JPEG algorithm at 25:1 compression ratio. A subsequent psychophysical study resulted in improved human detection performance for images compressed with the NPWE optimized quantization matrix over the JPEG default quantization matrix. Together, our results show how model observers can be successfully used to perform automated evaluation and optimization of diagnostic performance in clinically relevant visual tasks using real anatomic backgrounds.
©2003 Optical Society of America
1. Introduction
A number of studies have evaluated the effect of JPEG image compression on diagnostic decisions in x-ray coronary angiograms. Rigolin et al. [1] examined the effect of compression on quantitative angiographic analysis of phantom coronary stenoses. Other studies have examined the effect of JPEG compression on human visual detection of morphologic features [2–6]. In general, these studies agree that JPEG compression ratios of up to 10:1–12:1 are acceptable, while they differ on whether 15:1 significantly degrades image quality [2–6]. Newer wavelet based compression algorithms have also been proposed as alternatives to the discrete cosine transform based JPEG algorithm. In a previous study [6], we compared the JPEG algorithm with a specific wavelet algorithm (Crewcode [7]) on their effect on the detectability of simulated filling lesions (thrombi, ulcerations, a bridging lumen and the classification of stenosis) embedded in real x-ray coronary angiograms. The study found a superiority of the standard JPEG algorithm for the tasks evaluated. The development and availability of new compression algorithms require new psychophysical studies to evaluate and compare them to standard algorithms. As the number of algorithms becomes large, thorough psychophysical evaluation of all algorithms and compression ratios becomes impractical. For this reason, researchers have sought to develop a computer metric of image quality that can be computed from the test images and that reliably reflects the degradation of diagnostic information in the compressed images for any arbitrary compression algorithm and ratio [8]. Such metrics could potentially be used for automated computer evaluation of image compression algorithms as well as optimization of parameters of the compression algorithms.
The root mean square error (RMSE) between an uncompressed and compressed image has been commonly used to measure image quality as a function of compression ratio [9–11]. A different approach to quantifying image quality associated with lossy compression is to use image discrimination models (i.e. perceptual differences models) developed in the field of human vision [12–16]. Such models attempt to quantify the human perceptual difference between an original image and a distorted version of the image. The models have been successfully used to predict human performance discriminating degraded images (through image compression) from the original images [15]. In addition, the models have also been used to optimize the quantization matrices of JPEG with respect to the image discrimination task [16]. One potential limitation of the use of RMSE and/or image discrimination/perceptual differences models in the current context is that it is unclear how predicting an observer’s ability to discriminate between an original image and its degraded version relates to performance detecting a low contrast signal within the image.
Other investigators have proposed a number of models of human visual detection in noise that generate explicit predictions about the detectability of a signal embedded in a noisy background [17–34]. These models focus on the relationship between the signal to be detected and the properties of the image noise. Previous investigators have compared these model observers’ performance to human performance in computer generated noise including white noise [17–21], filtered white noise [22], backgrounds with random inhomogeneities (lumpy backgrounds [23]), combinations of white and low-pass filtered noise (2-component noise) [24,25], and real anatomical backgrounds [26–28]. A very recent study investigated the effect of image compression on a model observer (non-prewhitening matched filter) in white noise [29]. The authors found that, for this model, performance degradation with increasing wavelet compression varied with the contrast and the size of the signal. However, model performance was not compared to human performance, and the model observer results might not generalize from white noise to real anatomic backgrounds.
The purpose of the present paper is to apply model observers in order to predict the effect of the JPEG and wavelet-Crewcode image compression algorithms on human performance. Our previous psychophysical studies have shown that human performance in a number of clinically relevant visual tasks degrades less with the JPEG algorithm than with the wavelet based Crewcode algorithm [6]. In this study, we use three different model observers: the nonprewhitening matched filter with an eye filter (NPWE), the Hotelling observer (HOT), and the channelized Hotelling model (CH-HOT). In addition, for comparison we use the root mean square error (RMSE) and a particular image discrimination model (DCTune 2.0). In the first part of the present paper, we evaluate the ability of these metrics of image quality (three model observers, the RMSE metric and an image discrimination model) to predict the effect of JPEG and wavelet-Crewcode compression on human performance visually detecting simulated filling defects (thrombi) within simulated arterial segments embedded in x-ray coronary angiograms. In the second part of the paper we attempt to perform automated computer optimization of the quantization matrix parameters of the JPEG compression algorithm using the visual detection performance of a model observer as the figure of merit.
In the next section, we present the theory behind the model observers used in the present paper. A comprehensive treatment of model observers for synthetic and real backgrounds can be found elsewhere [30, 31].
2. Theory of model observers for signal known exactly tasks
2.1 Description of models
In a signal known exactly (SKE) task, the signal does not vary from trial to trial and the observer knows a priori the signal profile and the possible signal locations [10]. Most model observers for SKE tasks are linear models. For a multiple alternative forced choice (MAFC) task, where the signal is present in one of M locations, the linear models compute the correlation between a template and the data at the different image locations:

λ_{m} = Σ_{x} Σ_{y} w(x,y) g_{m}(x,y)    (1)
where w(x,y) is the 2-dimensional template, g_{m}(x,y) is the data at the m^{th} location and λ_{m} is the model’s template scalar response associated with the m^{th} location.
Other authors [8] have found it useful to express the correlation as a matrix multiplication. In this context, the N × N image region and template are each represented as an N^{2} × 1 column vector. Equation 1 becomes:

λ_{m} = w^{t} g_{m}    (2)
where w ^{t} and g are vectors and the superscript t refers to the transpose.
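As a minimal sketch (in Python/NumPy, not code from the original paper), the template response of Eqs. 1 and 2 is a dot product between the vectorized template and the vectorized data at each candidate location; in the MAFC task the model chooses the location with the maximum response:

```python
import numpy as np

def template_responses(w, regions):
    """Compute the scalar response lambda_m = w^t g_m (Eqs. 1 and 2)
    for each of the M candidate signal locations.

    w       : (N, N) template
    regions : (M, N, N) image data at the M candidate locations
    """
    w_vec = w.reshape(-1)                         # N^2 column vector
    g_mat = regions.reshape(regions.shape[0], -1) # M x N^2 data matrix
    return g_mat @ w_vec                          # one scalar per location
```

A trial is scored correct when `np.argmax(template_responses(w, regions))` equals the true signal location.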
The various models that have been proposed diverge in the amount of prior knowledge about the signal and/or noise statistics used to define their associated templates. In addition, templates vary in the number of built-in components that attempt to reflect the physiological constraints imposed by the human visual system (e.g. contrast sensitivity function or spatial frequency and orientation tuned channels). In what follows, we describe in more detail the three different model observers used in the present paper.
2.1.1. Non-prewhitening matched (to the uncompressed signal) filter with an eye filter (NPWE)
The non-prewhitening matched filter model is perhaps one of the most well-known in medical imaging [32] and consists of a template that exactly matches the signal profile: w(x,y)=s(x,y). Ishida [33] and Burgess [34] have added a front filter to this model (an “eye filter”) to reflect the varying sensitivity of humans to different spatial frequencies due to the optical and neural properties of the visual system. The modified model is often referred to as the nonprewhitening matched filter with an eye filter (NPWE). The effective template used by the observer is given by:

w(x,y) = FFT^{-1}[E^{2}(u,v) s(u,v)]    (3)
where FFT^{-1} is the inverse fast Fourier transform, s(u,v) is the signal amplitude in the frequency domain, and E(u,v) is the contrast sensitivity function, given by:

E(f) = f^{γ} exp(-c f^{ρ})    (4)
where f=sqrt(u^{2}+v^{2}) is the radial spatial frequency in cycles per degree, c=0.98, γ=0.68 and ρ=1.5.
The eye filter differs from that used by Burgess. The current eye filter is based on contrast matching of sinusoids after adaptation to power law noise [35]. Also, in the current paper the NPWE model uses the profile of the signal in the uncompressed images for all compression levels. However, since compression does affect the signal profile, a different version of the NPWE model could use a signal template matched to the compressed version of the signal for each compression condition.^{1}
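A sketch of the NPWE template construction, assuming the effective template weights the signal spectrum by the squared eye filter (equivalent to filtering both signal and data once each through E); the `pix_per_deg` display calibration is an illustrative assumption, not a value from the paper:

```python
import numpy as np

def eye_filter(u, v, c=0.98, gamma=0.68, rho=1.5):
    """Eye filter E(f) = f**gamma * exp(-c * f**rho), f in cycles/degree."""
    f = np.sqrt(u**2 + v**2)
    return f**gamma * np.exp(-c * f**rho)

def npwe_template(signal, pix_per_deg=40.0):
    """NPWE effective template: inverse FFT of E^2(u,v) * s(u,v).

    pix_per_deg converts digital frequencies to cycles per degree and is
    a hypothetical viewing-geometry value for illustration only.
    """
    n = signal.shape[0]
    freqs = np.fft.fftfreq(n) * pix_per_deg      # cycles per degree
    u, v = np.meshgrid(freqs, freqs, indexing="ij")
    E = eye_filter(u, v)
    S = np.fft.fft2(signal)
    return np.fft.ifft2(S * E**2).real           # effective spatial template
```

The DC term is suppressed (E(0)=0), so the template is insensitive to the mean luminance of the region.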
2.1.2. Hotelling observer (with square spatial windowing)
The template of the Hotelling observer takes into account knowledge about not only the signal profile but also the background statistics [8, 23, 25, 27]. In correlated noise, the Hotelling observer derives a template that effectively decorrelates the noise prior to matched filtering. The statistical properties of the noise are characterized with the covariance matrix. When the noise is computer generated, the covariance matrix is known in advance and the template for the Hotelling observer, w _{h}, can be derived as follows [8]:

w_{h} = K^{-1} (<g_{s}> - <g_{b}>)    (5)
where K ^{-1} is the inverse of the image covariance matrix, a N^{2}×N^{2} matrix, <g _{s}> is the mean signal plus background vector, and <g _{b}> is the mean background vector.
When the noise backgrounds are real anatomical backgrounds rather than computer simulated, the covariance needs to be estimated directly from the images. However, estimating a covariance matrix for a large image (e.g., in our case, 512×512) becomes computationally intractable due to the large number of samples needed. One approach to reduce the dimensionality of the covariance matrix is to use square windows (in our case 12×12 pixels) centered at the possible signal locations. The calculation of the resulting 144×144 covariance matrix was based on 1200 samples of 12×12 pixel regions. The images used to derive the template were different from the images in the testing set. An alternative approach to reduce the dimensionality of the covariance matrix is to use a set of basis functions such as Laguerre-Gauss, but this was not investigated in this paper [36].
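An illustrative sketch (not the paper's code) of estimating the windowed Hotelling template of Eq. 5 from background samples; synthetic white-noise regions stand in here for the 1200 angiographic 12×12 training regions described above:

```python
import numpy as np

def hotelling_template(bg_samples, signal):
    """Windowed Hotelling template (Eq. 5) from sample statistics.

    bg_samples : (J, N, N) background-only training regions
    signal     : (N, N) known signal profile; for an additive signal,
                 <g_s> - <g_b> reduces to this profile.
    """
    J, N, _ = bg_samples.shape
    g = bg_samples.reshape(J, N * N)
    K = np.cov(g, rowvar=False)           # N^2 x N^2 sample covariance
    w = np.linalg.solve(K, signal.reshape(-1))
    return w.reshape(N, N)
```

Note that the sample covariance is invertible only when the number of training regions well exceeds N^{2} (1200 vs. 144 in the paper).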
2.1.3. Channelized Hotelling (with Gabor channels)
The channelized Hotelling observer [8, 24, 25, 26, 37, 38] consists of a set of spatial frequency channels that attempt to reflect the existence of spatial frequency tuned cells in the primary visual cortex. The detection process is constrained by a reduction in information content by the processing through the channels. The channelized Hotelling template is the optimal template that can be derived from a linear combination of the channels. The channelized Hotelling is identical to the Hotelling except that it acts on the outputs of the channels rather than directly on the image pixels. There are a number of different channelized Hotelling models in the literature including square channels [37], difference of Gaussians [24], difference of Mesa filters [25], and Gabor channels [26, 27]. The Gabor channel mechanism is unlike most other channel models because it is not rotationally symmetric and has different channels tuned to different orientations. Gabor functions have been used to model the response of cells in area V1 of the striate cortex [40,41]. The Gabor channels are given by:

V(x,y) = exp[-4 ln(2)(x^{2}+y^{2})/W_{s}^{2}] cos[2π f_{c}(x cos θ + y sin θ) + β]    (6)
where x and y are the spatial coordinates, f_{c} is the center spatial frequency, θ is the orientation, W_{s} is the width and β is the phase. The Fourier transform of a Gabor channel is a Gaussian function centered at the center frequency f_{c} and with a half-height full width given by W_{f}=0.8825/W_{s}. Another way of measuring the width of the Gabor channel in the frequency domain is the octave bandwidth (bw), defined as:

bw = log_{2}[(f_{c} + W_{f}/2)/(f_{c} - W_{f}/2)]    (7)
where f_{c} is the central frequency and W_{f} is as previously defined. In this paper we used a 50-channel model with 5 spatial frequencies (central frequencies of 32, 16, 8, 4 and 2 cycles per degree), 5 orientations (0°, 72°, 144°, 216° and 288°), and two phases (odd, 0, and even, π/2). The spatial frequency bandwidth of the channels was approximately 0.9 octaves.
The channel weights for the best linear combination of the outputs of the channels (Hotelling combination rule) are given by [24, 30, 37, 38]:

a = K_{V}^{-1} (<g_{V/s}> - <g_{V/b}>)    (8)
where a is a vector containing the optimal linear weights for each of the Gabor channels and K _{V} is an N×N matrix describing the covariance of the channel outputs to the images. For our particular implementation the covariance matrix was a 50×50 matrix. Also, <g _{V/s}> is a vector containing the mean signal plus background as seen by each channel and <g _{V/b}> is the mean background only as seen by each channel. The best linear template that can be obtained from the channel weights and the channel profiles is calculated as:

w(x,y) = Σ_{i} a_{i} V_{i}(x,y)    (9)
where V_{i}(x,y) is the two dimensional profile of the i^{th} channel as given by Eq. 6 and a_{i} is the weight for the i^{th} channel from the vector a in Eq. 8.
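The channel mechanism of Eqs. 6–9 can be sketched as follows. This is an illustration, not the paper's implementation: the channel set is reduced for brevity, and the `pix_per_deg` calibration and channel width `ws` are hypothetical values:

```python
import numpy as np

def gabor(n, fc, theta, beta, ws, pix_per_deg=40.0):
    """Gabor channel profile (Eq. 6) on an n x n grid.

    fc in cycles/degree, theta in radians, beta in radians, ws (spatial
    FWHM) in degrees. pix_per_deg is an illustrative assumption.
    """
    x = (np.arange(n) - n // 2) / pix_per_deg       # degrees
    X, Y = np.meshgrid(x, x)
    Xr = X * np.cos(theta) + Y * np.sin(theta)      # rotated axis
    env = np.exp(-4 * np.log(2) * (X**2 + Y**2) / ws**2)
    return env * np.cos(2 * np.pi * fc * Xr + beta)

def ch_hotelling_template(bg_samples, signal, channels):
    """Channel weights (Eq. 8) and resulting template (Eq. 9)."""
    V = np.stack([c.ravel() for c in channels])     # n_ch x N^2
    resp_bg = bg_samples.reshape(bg_samples.shape[0], -1) @ V.T
    Kv = np.cov(resp_bg, rowvar=False)              # channel-output covariance
    ds = V @ signal.ravel()                         # <g_V/s> - <g_V/b>
    a = np.linalg.solve(Kv, ds)                     # Eq. 8
    return (a[:, None] * V).sum(axis=0).reshape(signal.shape)  # Eq. 9
```

Because the covariance is only 50×50 (here 6×6), far fewer training samples are needed than for the full pixel-domain Hotelling observer.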
2.1.4. Internal noise
One difficulty in comparing model and human performance is that the models often result in a higher absolute performance level than the human. To quantitatively compare the variation of human and model performance as a function of image compression, we degrade model observer performance by injecting internal noise. In this paper, internal noise was injected into all model observers in order to match model and human performance in the uncompressed condition. The same magnitude of internal noise was then used for the model in all the compression conditions (for both JPEG and wavelet-Crewcode).
For the Hotelling and channelized Hotelling models the templates can be derived either taking into consideration or ignoring the internal noise. In our implementation the template derivation was performed taking into account both the external and internal noise. For the Hotelling model the total covariance is given by: K=K _{ext}+K _{int}, where the internal noise is assumed to be proportional to the diagonal elements of the external noise covariance matrix, K _{int}=αDiag (K _{ext}). The Diag function zeroes all off-diagonal elements of the covariance matrix and α is a proportionality constant. For the channelized Hotelling model the total covariance and internal noise are given by the same expressions as for the Hotelling model, but where the covariance matrix is with respect to the channel outputs, K _{v}. The templates are then derived from Eqs. 5 (Hotelling) and 8 (channelized Hotelling), respectively.
For the calculation of model performance, the internal noise was injected into the scalar decision variable as follows:

λ_{m} = λ_{m,e} + ε_{m}    (10)
where λ_{m} is the decision variable after injection of internal noise at the m^{th} location, λ_{m,e} is the decision variable due to the external noise (prior to internal noise injection; Eqs. 1 and 2) at the m^{th} location, and ε_{m} is a Gaussian random variable with zero mean and a variance proportional to the variance of the decision variable due to external noise (κσ_{λe}^{2}); ε_{m} is independent across locations and trials. The value of κ was iteratively adjusted so as to match human performance in the uncompressed condition and was then kept constant for all other conditions.
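A minimal sketch of the internal noise injection of Eq. 10, assuming the external-noise variance of the decision variable is estimated per location across trials (the calibration of κ against human data happens outside this function):

```python
import numpy as np

def inject_internal_noise(lam_ext, kappa, rng):
    """Add zero-mean Gaussian internal noise (Eq. 10).

    lam_ext : (trials, M) external-noise decision variables
    kappa   : internal-to-external variance ratio
    """
    var_ext = lam_ext.var(axis=0, keepdims=True)   # per-location variance
    eps = rng.normal(scale=np.sqrt(kappa * var_ext), size=lam_ext.shape)
    return lam_ext + eps
```

Because the internal noise is independent of the external responses, the total decision-variable variance grows by a factor of (1 + κ).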
2.2 Computing performance for model observers
One important property of the task in the present paper is that the M possible signal locations are within an image and not from independent samples of anatomic background. Given the low-pass characteristics of the x-ray coronary angiograms, if a model observer template is being applied to locations with correlated pixel values, it is plausible that the scalar response of the model will also be correlated. This is the case for the NPWE model for the task and types of backgrounds used in the present paper [43]. When the model scalar responses are correlated, computing performance using the standard index of detectability might lead to erroneous estimates of model performance and comparisons across models [43]. This error occurs because the index of detectability assumes that the model internal responses to the signal plus background location and the background only location are statistically independent.
One reliable method to estimate performance that does not assume statistical independence is simply to tally from the sample images the number of trials in which the model correctly identifies the signal location and divide it by the total number of trials. This can be mathematically expressed as:

Pc = (1/J) Σ_{j=1}^{J} step[λ_{s,j} - max_{i}(λ_{b,ij})]    (11)
where λ_{s,j} is the model response to the signal plus background in trial j, λ_{b,ij} is the model response to the background only at location i in trial j, and J is the total number of trials. The max function takes the maximum model response among the responses to the background-only locations. The step function takes the value of one when its argument is larger than zero and the value of zero when its argument is less than zero. Model performance can then be converted to a detectability metric with the use of the following equation [44]:

Pc = ∫_{-∞}^{+∞} φ(z - d_{mafc}) [Φ(z)]^{M-1} dz    (12)
where φ(z) = (1/√(2π)) exp(-z^{2}/2) is the standard Gaussian probability density, Φ(z) is the cumulative Gaussian, and M is the number of alternatives. In order to emphasize that this estimate of the index of detectability is a transformation of Pc (rather than being calculated from the mean and standard deviation of the decision variable, λ) it is referred to as d_{mafc}.
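As an illustration (not code from the paper), Eqs. 11 and 12 can be sketched in Python: Pc is tallied directly from the responses, and d_{mafc} is obtained by numerically inverting the MAFC integral. The quadrature grid and bisection bounds are implementation choices:

```python
import numpy as np
from math import erf

def percent_correct(lam_signal, lam_background):
    """Eq. 11: fraction of trials where the signal-location response
    exceeds every background-location response.

    lam_signal : (J,) responses at the signal location
    lam_background : (J, M-1) responses at the background locations
    """
    return float(np.mean(lam_signal > lam_background.max(axis=1)))

def pc_from_d(d, M):
    """MAFC percent correct for detectability d (the integral in Eq. 12),
    evaluated by trapezoidal quadrature on a fixed grid."""
    z = np.linspace(-8.0, 8.0 + max(d, 0.0), 4001)
    phi = np.exp(-0.5 * (z - d) ** 2) / np.sqrt(2 * np.pi)
    Phi = 0.5 * (1 + np.vectorize(erf)(z / np.sqrt(2)))
    vals = phi * Phi ** (M - 1)
    dz = z[1] - z[0]
    return float(dz * (vals.sum() - 0.5 * (vals[0] + vals[-1])))

def d_mafc(pc, M, lo=-5.0, hi=10.0):
    """Invert Eq. 12 by bisection (pc_from_d is monotone in d)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if pc_from_d(mid, M) < pc:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

At chance (d = 0) the integral evaluates to 1/M, as expected for an M-alternative forced choice.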
3. Other metrics of image quality for compression
3.1. Root Mean Square Error
Perhaps the most common metric of image quality for images that have undergone compression is the root mean square error between the original image and the image that underwent lossy compression [9–11]:

RMSE = sqrt{ (1/(XY)) Σ_{x=1}^{X} Σ_{y=1}^{Y} [I(x,y) - I_{c}(x,y)]^{2} }    (13)
where I(x,y) is the original image, I_{c}(x,y) is the image that underwent lossy compression, and X and Y are the dimensions of the image. The RMSE takes into account neither the properties of the human visual system nor the visual task. Also, note that the peak signal to noise ratio (PSNR) is often used as a metric; the PSNR is simply a monotonic transformation of the RMSE.
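A minimal sketch of Eq. 13 and the related PSNR; the peak value of 255 assumes 8-bit images and is an illustrative default:

```python
import numpy as np

def rmse(orig, comp):
    """Root mean square error between original and compressed image (Eq. 13)."""
    diff = orig.astype(float) - comp.astype(float)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(orig, comp, peak=255.0):
    """Peak signal-to-noise ratio: a monotonic transform of the RMSE."""
    return float(20.0 * np.log10(peak / rmse(orig, comp)))
```

Because PSNR is monotonic in RMSE, any ranking of compression algorithms by one metric is identical under the other.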
3.2. Perceptual image discrimination metrics
Image discrimination/perceptual differences models attempt to predict human performance visually discriminating an original image from a degraded (after undergoing image compression) version of that same image [14–16]. These models typically include a contrast sensitivity function, a set of spatial frequency channels, a compressive non-linearity in the response of each of the channels, inhibition across channels tuned to close spatial frequencies, and Minkowski error pooling across space, spatial frequency and orientation [12–16]. A comprehensive description of different types of image discrimination models can be found elsewhere [14–16]. In this paper we used a particular image discrimination model, DCTune 2.0, which is available on the web: http://vision.arc.nasa.gov/dctune/. The software takes as input the original image and the image that underwent compression and outputs a perceptual error number. The software was used to assess the perceptual error associated with the JPEG and wavelet-Crewcode algorithms for all our test-images.
4. Methods
4.1 Test-Images: Simulated arteries and lesions
The test-images used in this study consist of simulated signals embedded in real x-ray coronary angiographic backgrounds. We have previously used an algorithm to create the simulated arterial segments and lesions. The algorithm attempts to simulate the image generation process of x-ray coronary angiograms; components include exponential attenuation, focal spot and image receptor blur, scattering and veiling glare. Details of the algorithm used to generate the computer-simulated arteries and lesion (filling defect) are given elsewhere [6]. For our test-images, the projected simulated arteries consisted of 3-D right circular cylinders with a diameter of 12 pixels (3.6 mm), a sinusoidally modulated narrowing in diameter toward the center (minimum diameter of 8 pixels), and a length of 50 pixels (15.0 mm). Four simulated arteries were generated for each test-image and projected 32 pixels apart into 512×512 pixel images extracted from real patient digital x-ray coronary angiograms, acquired with a 7-in. image intensifier field size (Advantx/DXC, General Electric Medical Systems) and a resolution of 0.3 mm/pixel. The attenuation coefficient µ was set to 0.16 mm^{-1} to produce simulated arteries with the same projected intensity as real angiograms of coronary arteries of the same diameter. The signal to be detected was a simulated filling defect with a hemi-ellipsoidal shape and a diameter of 6 pixels (meant to simulate a thrombus), located at the vertical and horizontal center of one of the four simulated arteries. We simulated imaging system blur caused by the physical extent of the x-ray focal spot and image intensifier unsharpness by convolving the projected cylinders with an isotropic Gaussian point spread function with standard deviation of 1 pixel (0.3 mm).
4.2. JPEG compression of images
Images were compressed and decompressed with the 5^{th} public release of the Independent JPEG Group’s free JPEG software. We used six levels of JPEG compression: 7:1, 10:1, 15:1, 20:1, 30:1, and 45:1. The actual achieved compression ratios for the 424-image set were (mean compression across all images ± standard deviation): 6.86±0.14, 9.86±0.2, 14.9±0.39, 19.8±0.75, 29.8±0.93, and 44.6±0.56. Images were also compressed with the wavelet-Crewcode algorithm. The actual achieved mean compression ratios for the wavelet algorithm were 6.6:1±0.79, 19.0:1±4.2, 35.7:1±3.95, and 48.2:1±2.68. The inability to match the desired ratios and the larger variability of compression ratios for the wavelet-Crewcode algorithm were due to an inherent discontinuity in the function relating quality factor and compression ratio for this algorithm. Examples of the test images can be found in a previous publication [6].
4.3 Psychophysical studies
The observer’s task was to detect the filling defect (thrombus) at the vertical and horizontal center of one of four simulated arteries (4 alternative forced choice; 4 AFC). On each trial an image was randomly sampled from the 424-image database. There were six different compression conditions for the JPEG algorithm and four compression ratios for the wavelet-Crewcode condition. Three observers participated in the experiment. Two observers were non-physician observers (GR, CH) but with extensive training visually detecting simulated lesions in medical images. These two observers were also naïve to the goals of the study. The third observer (JH) is a cardiologist with experience reading x-ray coronary angiograms. Observers participated in 5 sessions of 100 trials per condition. Images were displayed on an Image Systems M17L monochrome monitor, which is manufactured for medical imaging applications. The mean luminance was 16.0 cd/m^{2}. The luminance vs. gray level relationship was the default non-linear curve that would be used by physicians in a clinic with this monitor. Observers viewed the images binocularly from a distance of 50 cm and had unlimited time to reach a decision. When a decision was reached they pressed the number 1, 2, 3 or 4 on the keyboard to indicate their choice for that trial.
4.4 Data Analysis of human performance
Accuracy for a given observer in a given experimental condition was quantified by computing the percent of trials (Pc) that the observer correctly detected the target. Pc was then transformed to an index of detectability (d_{mafc}) for a 4-alternative forced-choice given by Eq. 12.
5. Results and discussion for automated evaluation
Figure 1 shows performance for two naïve yet trained observers (GR left column; CH right column) as a function of image compression for both the JPEG (red squares) and wavelet-Crewcode (blue triangles). Figure 2 shows performance for physician observer JH. Due to the limited availability of JH, he did not participate in the two highest compression conditions. The different rows of Fig. 1 show performance for the three different model observers (NPWE, Hotelling and Gabor-Channelized Hotelling; empty squares connected by a continuous line) degraded with internal noise to approximately match human performance in the uncompressed condition. For the physician observer (JH) the three panels in Fig. 2 correspond to the different models. The results show that all three models can successfully assess the JPEG superiority over the wavelet-Crewcode algorithm and predict human performance as a function of compression ratio.
The ability of the models to predict human performance can be assessed using a Chi-square (χ^{2}) goodness of fit. Table 1 shows the χ^{2} goodness of fit for the three model observers for individual observers, compression algorithms and across both observers and algorithms. Table 1 shows that the NPWE and the channelized Hotelling model are better predictors (lower χ^{2}) for all three observers and both compression algorithms than the Hotelling model.
For comparison, Fig. 3 (left) shows the RMSE computed for all 424 test-images (standard errors are smaller than symbols). The results show that the RMSE is larger for the JPEG algorithm than the wavelet-Crewcode algorithm. Therefore, if an engineer used the RMSE to assess medical image quality, he or she would erroneously conclude that the JPEG algorithm degrades image quality more than the wavelet-Crewcode algorithm. The result emphasizes the danger of the RMSE as a metric of medical image quality. Note that the related PSNR metric (a monotonic transformation of the RMSE) would also lead to similar erroneous conclusions.
Finally, a specific image discrimination (perceptual difference) model, DCTune 2.0, was used to assess whether this metric could predict the JPEG superiority. Figure 3 (right graph) shows that the image discrimination metric does predict a JPEG advantage at low compression levels (lower perceptual error). This shows that a model that takes into account the properties of the human visual system is a better predictor of human performance than the plain RMSE metric. However, the image discrimination model incorrectly predicts a wavelet-Crewcode advantage at high compression ratios. It should be noted that a different use of image discrimination models is to assess the ability of the model to discriminate an image containing a signal from the same image without the signal [45]. This approach was not evaluated in this paper and might result in a better prediction of human diagnostic performance. However, a recent study found that one such image discrimination model does not predict the degradation in human performance detecting signals in power law backgrounds (1/f^{3}) with increasing tumor size [46].
Table 1. χ^{2} goodness of fit for the three model observers, for individual observers and across observers (ALL), for the wavelet-Crewcode algorithm (WAV) and for both compression algorithms combined (BOTH).

| Model | GR-WAV | CH-WAV | JH-WAV | ALL-WAV |
|---|---|---|---|---|
| NPWE | 1.89 | 1.44 | 1.18 | 1.145 |
| Channelized Hotelling (CH-HOT) | 2.30 | 3.01 | 2.57 | 2.38 |
| Hotelling (HOT) | 4.38 | 6.97 | 8.07 | 8.66 |

| Model | GR-BOTH | CH-BOTH | JH-BOTH | ALL-BOTH |
|---|---|---|---|---|
| NPWE | 2.47 | 2.81 | 1.32 | 2.53 |
| Channelized Hotelling (CH-HOT) | 1.873 | 2.81 | 2.99 | 2.84 |
| Hotelling (HOT) | 14.83 | 9.659 | 11.22 | 13.26 |
6. Optimization of the JPEG quantization matrix
Our results in the previous section show that model observers can be successfully used to perform automated evaluation of image compression. Here, we test whether model observers can be used to perform automated optimization of compression algorithms. For this purpose, we use the NPWE model because it has been shown to be a good predictor of human detection performance for our images and because it is computationally more economical for optimization than the CH-HOT model, which requires inversion of the covariance matrix. In this paper, we attempt to optimize the 64 parameters of the JPEG quantization matrix. Prior to a discussion of the optimization, we briefly describe the function of the quantization matrix within the JPEG algorithm for those readers not familiar with the JPEG compression standard.
6.1. Quantization matrix of JPEG algorithm
In the JPEG image compression standard [47] the image is divided into 8×8 pixel blocks and the discrete cosine transform (DCT) is applied to each individual block. The result is an 8×8 matrix containing the DCT coefficients corresponding to the different DCT frequencies. Each DCT coefficient is then divided by a quantization value and rounded off. The quantization values are arranged in an 8×8 block called the quantization matrix (Q-matrix), where each entry corresponds to the quantization value used to divide the DCT coefficient in the corresponding position of the 8×8 DCT transform of the block. The JPEG standard uses a default quantization matrix (Table 2). Values at the upper left corner correspond to low frequencies and values at the lower right corner correspond to high frequencies. The new quantized coefficient c′_{i,j} for the i,j entry of each DCT block can be expressed in terms of the original coefficient c_{i,j} and the quantization value q_{i,j}, where the subscripts refer to the position in the block:

c′_{i,j} = round(c_{i,j}/q_{i,j})    (14)
The expert group’s choice of the default quantization values attempts to reflect the contrast sensitivity of the human visual system to different spatial frequencies. For example, high spatial frequencies (lower right corner) are quantized more coarsely because of the lower sensitivity of the human visual system to those frequencies. However, the default quantization matrix does not take into account the noise in the image or the visual task. It is therefore possible that, for diagnostic performance in medical images, a different quantization matrix could lead to better task performance.
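To make the quantization step concrete, a sketch (not the reference JPEG implementation) of the DCT/quantization round trip for a single 8×8 block, with Eq. 14 as the lossy step:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis; the 2-D DCT of a block B is C @ B @ C.T."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n)) * np.sqrt(2.0 / n)
    C[0] /= np.sqrt(2.0)          # DC row normalization
    return C

def jpeg_block_roundtrip(block, q_matrix):
    """DCT -> quantize (Eq. 14) -> dequantize -> inverse DCT for one block."""
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T
    cq = np.round(coeffs / q_matrix)   # Eq. 14: the only lossy operation
    return C.T @ (cq * q_matrix) @ C
```

Larger entries in `q_matrix` discard more of the corresponding DCT frequency, which is exactly the degree of freedom the optimization in Section 6.2 exploits.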
6.2. Optimization method
In order to optimize the quantization matrix with respect to model performance we used a simulated annealing technique. The method used is a form of the Metropolis algorithm previously used by Smith [48]. The starting point of the method is the default Q-matrix and the associated model observer performance for this matrix. The 64 parameters of the compression Q-matrix were randomly perturbed with values sampled from a uniform distribution with zero mean and a standard deviation equal to 30% of the current quantization value. Quantization values were bounded to be no less than 1 (values of 0 would lead the compression algorithm to an error) and no larger than 250. All 424 test-images were then compressed (at our desired compression ratio, 25:1) and decompressed using the perturbed Q-matrix. The NPWE model observer template was then applied to each image to obtain performance (Pc) across all 424 test-images (Eq. 11), which was then converted to an index of detectability (Eq. 12). If model performance (d_{mafc}) improved over performance associated with the current Q-matrix, the perturbed Q-matrix was accepted and became the current Q-matrix. If model observer performance was worse than performance associated with the current Q-matrix, the perturbed Q-matrix was not necessarily rejected but was accepted with the Boltzmann probability P = exp(-Δd_{mafc}^{2}/T), where Δd_{mafc}^{2} is the change in the detectability index squared and T is the effective temperature of the system. Note that the larger the reduction in performance from the current to the perturbed Q-matrix, the lower the Boltzmann probability of accepting the perturbed Q-matrix. Also, the lower the temperature T, the lower the probability of accepting a perturbed Q-matrix that did not improve performance over the current Q-matrix.
$T$ is then lowered at each iteration according to what is known as the annealing schedule: $T_{i+1}=kT_{i}$ with $k<1$. Therefore, as the number of iterations grows, the probability that a perturbed Q-matrix is accepted (given that it did not improve performance over the current Q-matrix) becomes increasingly small. Eventually, after a large number of iterations, perturbed Q-matrices are no longer accepted, because they do not improve model performance and because the Boltzmann probability is too small. If the Q-matrix does not change for a set number of iterations (e.g., 50 iterations) the procedure stops. Figure 4 shows a schematic of the simulated annealing procedure.
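The loop above can be sketched as follows. This is a minimal sketch, not the study's implementation: the `score` argument stands in for the full pipeline (compress and decompress all 424 test-images with the candidate Q-matrix, apply the NPWE template, convert Pc to $d_{\text{mafc}}$), which is far more expensive than any toy objective.

```python
import math
import random

def perturb(q):
    # Perturb all 64 quantization values with zero-mean uniform noise whose
    # standard deviation is 30% of the current value (a uniform on [-a, a]
    # has SD a/sqrt(3)), clipped to the valid range [1, 250].
    half_width = lambda v: 0.3 * v * math.sqrt(3.0)
    return [min(250.0, max(1.0, v + random.uniform(-half_width(v), half_width(v))))
            for v in q]

def anneal(q0, score, t0=1.0, k=0.95, patience=50):
    """score(q) -> detectability index d_mafc for Q-matrix q (higher is better).
    Stops once the accepted Q-matrix has been unchanged for `patience` iterations.
    t0 and k are illustrative values; the paper does not report its schedule."""
    q, d = list(q0), score(q0)
    t, unchanged = t0, 0
    while unchanged < patience:
        cand = perturb(q)
        d_cand = score(cand)
        delta = d_cand - d
        # Accept improvements outright; accept degradations with the
        # Boltzmann probability P = exp(-delta_d^2 / T).
        if delta > 0 or random.random() < math.exp(-delta * delta / t):
            q, d, unchanged = cand, d_cand, 0
        else:
            unchanged += 1
        t *= k  # annealing schedule: T_{i+1} = k * T_i
    return q, d
```

Because every candidate requires recompressing the whole test set, the cost per iteration, not the annealing bookkeeping, dominates the runtime of the real optimization.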
7. Results and discussion for automated optimization
7.1. Model observer performance for the optimized vs. default quantization matrix
Figure 5 (top left) shows performance in the detection task for the NPWE model with the default JPEG quantization matrix and with the optimized quantization matrix at a 25:1 compression ratio. The improvement in the NPWE index of detectability from the default to the optimized quantization matrix is 13%. This improvement might seem modest. However, if the improvement is calculated with respect to the performance difference between the default 25:1 compressed images and the uncompressed images (which yielded an NPWE index of detectability of 2.3), then the optimized Q-matrix makes up approximately 35% of that performance difference. For comparison, we added a third experimental condition using a quantization matrix with uniform coefficients (referred to as flat). The “flat” quantization matrix resulted in a large degradation of NPWE model performance. One question of interest is whether a Q-matrix optimized for a specific compression ratio (e.g., 25:1) generalizes to an improvement over the default Q-matrix at other compression ratios. Figure 5 (top right) shows NPWE model performance for the default Q-matrix and the 25:1 optimized Q-matrix across a range of compression ratios. For the present task, the 25:1 optimized Q-matrix improves NPWE performance for compression ratios ranging from approximately 19:1 to 38:1. For compression ratios lower than 19:1 there does not seem to be an advantage for the 25:1 optimized Q-matrix over the default Q-matrix.
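The two ways of quoting the improvement are related by simple arithmetic. The paper reports the uncompressed detectability (2.3) and the two percentages; the default detectability of 1.68 below is a hypothetical value chosen only to show that the 13% and 35% figures are mutually consistent.

```python
d_uncompressed = 2.3            # NPWE detectability for uncompressed images (reported)
d_default = 1.68                # hypothetical default-Q-matrix value, for illustration
d_optimized = 1.13 * d_default  # 13% improvement over the default Q-matrix (reported)

# The same gain expressed as a fraction of the default-vs-uncompressed gap
relative_gain = (d_optimized - d_default) / (d_uncompressed - d_default)
print(round(relative_gain, 2))  # ~0.35: the optimization recovers about 35% of the gap
```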
7.2. Psychophysical validation of the optimized JPEG quantization matrix
We performed a 4 AFC detection task using the simulated thrombus and the images used for the NPWE-based JPEG optimization. We compared human performance detecting the lesion in the set of 424 test-images compressed with the default JPEG Q-matrix, the NPWE-optimized Q-matrix, and a flat Q-matrix. Each test-image was iteratively compressed with each quantization matrix so as to achieve a fixed 25:1 compression ratio. Observers participated in 700 trials per condition (see the methods section for other details).
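The paper states only that compression was iterated until the fixed 25:1 ratio was reached; one common way to do this is a bisection search on a global scale factor applied to the Q-matrix. The sketch below assumes a caller-supplied `compressed_size` function (a stand-in for actually JPEG-encoding the test image with the scaled Q-matrix) whose output decreases monotonically as the scale grows.

```python
def scale_to_ratio(q, compressed_size, original_size, target_ratio, tol=0.01):
    """Find a global scale for the Q-matrix q such that
    original_size / compressed_size(scaled q) ~= target_ratio.
    A sketch: the search method is assumed, not taken from the paper."""
    def scaled(f):
        # Apply the scale factor, keeping entries in the valid range [1, 250].
        return [min(250.0, max(1.0, v * f)) for v in q]

    lo, hi = 0.05, 20.0  # assumed bracket for the scale factor
    while hi - lo > 1e-6:
        mid = 0.5 * (lo + hi)
        ratio = original_size / compressed_size(scaled(mid))
        if abs(ratio - target_ratio) <= tol * target_ratio:
            return scaled(mid)
        if ratio < target_ratio:
            lo = mid  # not compressed enough: quantize more coarsely
        else:
            hi = mid
    return scaled(0.5 * (lo + hi))
```

In practice each bisection step costs one full encode per image, so hitting the target ratio to within 1% typically takes a handful of encodes per test-image.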
Figure 5 (bottom panels) shows human performance, as measured by the index of detectability, for a physician (DV) and a naïve observer (GR) in the three conditions. Human performance with the optimized JPEG Q-matrix improved by 8% over performance with the default Q-matrix. If the improvement is calculated as a percentage of the difference in human performance between the uncompressed images (index of detectability = 2.0; see Fig. 1 for GR) and the 25:1 default compressed images, then the improvement accounts for approximately 18% of that performance difference. In addition, performance with the “flat” quantization matrix is much poorer than with the default quantization matrix. Overall, there is good agreement between the NPWE model predictions and the human results: best performance for the optimized Q-matrix, followed by the default, and worst for the flat Q-matrix. However, the effects are quantitatively smaller for the human observers than for the model observer. One possible reason for this result is that the human observers have internal noise while the NPWE model used for the automated optimization did not. Adding internal noise to the NPWE model in the optimization might improve the predictive power of the model; however, it would make model performance stochastic and would therefore require a larger number of sample trials to perform a statistically reliable optimization.
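As a sketch of why internal noise makes the optimization more expensive: once the template response carries a stochastic internal component, Pc must be estimated by Monte Carlo over many simulated 4AFC trials rather than computed deterministically from the template outputs. The example below assumes unit-variance external response noise and independent zero-mean Gaussian internal noise; the paper's NPWE optimization used no internal noise.

```python
import random

def pc_4afc(d, sigma_int=0.0, trials=20000, seed=7):
    """Monte-Carlo proportion correct in a 4AFC task. The template response at
    the signal location has mean d; all four responses have unit external
    variance plus internal-noise variance sigma_int**2 (illustrative model)."""
    rng = random.Random(seed)
    sigma = (1.0 + sigma_int ** 2) ** 0.5
    correct = 0
    for _ in range(trials):
        signal_resp = rng.gauss(d, sigma)
        # A trial is correct when the signal response beats all three
        # noise-only alternatives.
        if all(signal_resp > rng.gauss(0.0, sigma) for _ in range(3)):
            correct += 1
    return correct / trials
```

Internal noise lowers the effective detectability (here by the factor $\sqrt{1+\sigma_{\text{int}}^{2}}$), and the Monte-Carlo estimate itself carries sampling error, which is why a statistically reliable optimization would need many more simulated trials per candidate Q-matrix.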
8. Summary and conclusions
Overall, the non-prewhitening matched filter with an eye filter and the channelized Hotelling model (with internal noise) were better predictors of human performance than the Hotelling observer. The RMSE metric incorrectly predicted a wavelet-Crewcode superiority over the JPEG algorithm, showing that such a metric can lead to an erroneous assessment of task-based image quality. In addition, a particular image discrimination model used to compute the perceptual error between the original and compressed images correctly predicted a JPEG superiority at low compression ratios but incorrectly predicted a wavelet-Crewcode superiority at high compression ratios. Finally, JPEG quantization matrix parameters optimized for model observer (NPWE) performance led to improved human performance. To conclude, our work shows how model observers can successfully be used for automated evaluation and optimization of task-based image quality using real anatomic backgrounds.
9. Limitations of present work
A limitation of the present work is that model and human performance were evaluated for a task in which the signal is known to the observer and has a fixed size and shape (signal known exactly). In clinical practice, however, lesions vary in size and shape from patient to patient, and the physician does not know a priori the size and shape of the lesion (signal known statistically). It is unknown whether model observer evaluation and/or optimization based on a fixed signal size and shape will generalize to the more clinically realistic signal-known-statistically tasks. Current work is investigating the relationship between these two tasks and developing model observers for the more complex tasks [49,50].
Acknowledgements
The authors thank Cedric Heath, George Ruan, Darko Vodopich and Joerg Hausleiter for participation as observers in the study. This work was supported by NIH RO1-HLB 53455. Parts of this research were presented at the Annual Meeting of SPIE Medical Imaging (1999, 2000) and the Optical Society of America Annual Meeting, 1999.
Footnotes
1. An NPWE model that used a template matched to the mean compressed signal-plus-background minus the mean compressed background was also investigated, but resulted in poor absolute performance and poor predictability of human performance.
References and links
1. V. H. Rigolin, P. A. Robiolio, L.A. Spero, B.P. Harrawood, K.G. Morris, D.F. Fortin, W.A. Baker, T.M. Bashore, and J.T. Cusma, “Compression of Digital Coronary Angiograms Does Not Affect Visual or Quantitative Assessment of Coronary Artery Stenosis Severity,” Am. J. of Card. 78, 131–135 (1996). [CrossRef]
2. W.A. Baker et al., “Lossy (15:1) JPEG compression of digital coronary angiograms does not limit detection of subtle morphological features,” Circ. 96, 1157–1164 (1997). [CrossRef]
3. J.S. Whiting, M.P. Eckstein, S. Einav, and N.L. Eigler, “Perceptual Evaluation of JPEG compression for medical image sequences,” in OSA Annual Meeting Tech. Dig. 23, 161 (1992).
4. S. Silber, R. Dorr, G. Zindler, H. Muhling, and T. Diebel, “Impact of various compression rates on interpretation of digital coronary angiograms,” Int. J. Cardiology 60, 195–200 (1997). [CrossRef]
5. R.A. Kerensky, J.T. Cusma, P. Kubilis, R. Simon, T.M. Bashore, J.W. Hirshfeld, D.R. Holmes Jr, C.J. Pepine, and S.E. Nissen, “American College of Cardiology/European Society of Cardiology International Study of Angiographic Data Compression Phase I: The effect of lossy data compression on recognition of diagnostic features in digital coronary angiography,” J. Am. College Cardiology 35, 1370–1379 (2000). [CrossRef]
6. C.A. Morioka, M.P. Eckstein, J.L. Bartroff, J. Hausleiter, and J.S. Whiting, “Observer performance for JPEG vs. wavelet image compression of x-ray coronary angiograms,” Opt. Express 5, 8–19 (1999), http://www.opticsexpress.org/abstract.cfm?URI=OPEX-5-1-8 [CrossRef] [PubMed]
7. A. Zandi, J. Allen, E.L. Schwartz, and M. Boliek, “Crewcode Lossless/Lossy Medical Image Compression,” IEEE Data Compression Conference, 212–221 (1995).
8. H.H. Barrett, J. Yao, J.P. Rolland, and K.J. Myers. “Model observers for assessment of image quality,” Proc. Natl. Acad. Sci. USA 90:9758–9765 (1993). [CrossRef] [PubMed]
9. S.C. Lo, E.L. Shen, and K.M. Seong, “An image splitting and remapping method for radiological image compression,” Medical Imaging IV: Image Capture and Display, Proc. SPIE 1232, 312–321 (1990). [CrossRef]
10. K.K. Chan, C.C. Lau, S.L. Lou, A. Hayrepatian, B.K.T. Ho, and H.K. Huang, “Three-dimensional Transform Compression of Image from Dynamic Studies,” Medical Imaging IV: Image Capture and Display, Proc. SPIE 1232, 322–326 (1990). [CrossRef]
11. M. Goldberg, S. Panchanathan, and L.A. Wang, “Comparison of Lossy Techniques for Digitized Radiographic Images,” Medical Imaging IV: Image Capture, Formatting and Display, Proc. SPIE 1987, 269–281 (1993).
12. J. Lubin, “The use of psychophysical data and models in the analysis of display system performance,” in Digital images and human vision, Ed. A.B. Watson, (MIT Press, 1993) 163–178.
13. S. Daly, “The visible differences predictor: an algorithm for the assessment of image fidelity,” in Digital images and Human Vision, A.B. Watson, ed. (MIT Press, Cambridge, Mass., 1993) 162–178.
14. A.B. Watson, A.P. Gale, J.A. Solomon, and A.J. Ahumada, “Visibility of DCT quantization noise: effects of display resolution,” Soc. for Information Display Digest of Tech. Papers, 697–700 (1995).
15. H.A. Peterson, A.J. Ahumada, and A.B. Watson, “The visibility of DCT quantization noise,” Soc. for Information Display Digest of Tech. Papers 24, 942–945 (1993).
16. A.B. Watson, “DCTune: A technique for visual optimization of DCT quantization matrices for individual images,” Soc. for Information Display Digest of Tech. Papers XXIV, 946–949 (1993).
17. A.E. Burgess, R.B. Wagner, R.J. Jennings, and H.B. Barlow, “Efficiency of human visual signal discrimination,” Science 214, 93–94 (1981). [CrossRef]
18. A.E. Burgess and H. Ghandeharian, “Visual signal detection. II. Signal location identification,” J. Opt. Soc. Am. A 1, 900–905 (1984). [CrossRef] [PubMed]
19. A.E. Burgess and B. Colborne. “Visual Signal Detection IV: Observer inconsistency,” J. Opt. Soc. Am. A 5, 617–627 (1988). [CrossRef] [PubMed]
20. P.F. Judy and R.G. Swensson. “Detection of small focal lesions in CT images: effects of reconstruction filters and visual display windows,” British Journal of Radiology 58, 137–145 (1985). [CrossRef] [PubMed]
21. R.G. Swensson and P.F. Judy. “Detection of noisy visual targets: model for the effects of spatial uncertainty and signal to noise ratio,” Percept. Psychophys. 29: 521–534 (1981). [CrossRef] [PubMed]
22. K.J. Myers, H.H. Barrett, M.C. Borgstrom, D.D. Patton, and G.W. Seeley. “Effect of noise correlation on detectability of disk signals in medical imaging,” J. Opt. Soc. Am. A 2, 1752–1759 (1985). [CrossRef] [PubMed]
23. J.P. Rolland and H.H. Barrett. “Effect of random inhomogeneity on observer detection performance,” J. Opt. Soc. Am. A 9, 649–658 (1992). [CrossRef] [PubMed]
24. C.K. Abbey and H.H. Barrett. “Human and model-observer performance in ramp-spectrum noise: effects of regularization and object variability,” J. Opt. Soc. of Am. A 18, 473–488 (2001). [CrossRef]
25. A.E. Burgess, X. Li, and C.K. Abbey. “Visual signal detectability with two noise components: anomalous masking effects,” J. Opt. Soc. Am. A 14, 2420–2442 (1997). [CrossRef]
26. M.P. Eckstein and J.S. Whiting. “Lesion detection in structured noise,” Academic Radiology 2, 249–253 (1995). [CrossRef] [PubMed]
27. M.P. Eckstein, C.A. Abbey, and J.S. Whiting. “Human vs model observers in anatomic backgrounds,” Proceedings SPIE Image Perception 3340, 15–26 (1998).
28. A.E. Burgess, F.L. Jacobson, and P.F. Judy. “Human observer detection experiments with mammograms and power-law noise,” Med. Phys. 28, 419–437 (2001). [CrossRef] [PubMed]
29. B. Zhao, L.H. Schwarz, and P.K. Kijewski. “Effect of lossy compression on lesion detection: Predictions of the nonprewhitening matched filter,” Med. Phys. 25, 1621–1624 (1998). [CrossRef] [PubMed]
30. M.P. Eckstein, C.K. Abbey, and F.O. Bochud, “Practical guide to model observers in synthetic and real noisy backgrounds,” in Handbook of Medical Imaging Vol. 1: Physics and Psychophysics, J. Beutel, H.L. Kundel, and R.L. Van Metter, eds. (SPIE Press, 2000), 593–628.
31. C.K. Abbey, H.H. Barrett, and M.P. Eckstein. “Practical issues and methodology in assessment of image quality using model observers,” in Medical Imaging, H. Roerhig, ed., Proc. SPIE, The physics of medical imaging, 3032: 182–194 (1997). [CrossRef]
32. R.F. Wagner and K.E. Weaver, “An assortment of image quality indices for radiographic film-screen combinations - can they be resolved?” in Application of Optical Instrumentation in Medicine I, P.L. Carson, W.H. Hendee, and W.C. Zarnstorff, eds., Proc. SPIE 35, 83–94 (1972). [CrossRef]
33. M. Ishida, K. Doi, L.N. Loo, C.E. Metz, and J.L. Lehr. “Digital image processing: effect of detectability of simulated low-contrast radiographic patterns,” Radiology 150, 569–575 (1984). [PubMed]
34. A.E. Burgess. “Statistically defined backgrounds: Performance of a modified nonprewhitening matched filter model,” J. Opt. Soc. Am. A 11,1237–42 (1994). [CrossRef]
35. M.A. Webster and E. Miyahara. “Contrast adaptation and the spatial structure of natural images,” J. Opt. Soc. Am. A 9, 2355–2366 (1997). [CrossRef]
36. H.H. Barrett, C.K. Abbey, B. Gallas, and M.P. Eckstein, “Stabilized estimates of Hotelling-observer detection performance in patient structured noise,” Proc. SPIE 3340 (1998). [CrossRef]
37. K. Myers and H.H. Barrett. “Addition of a channel mechanism to the ideal observer model,” J Opt. Soc. Am. A 4, 2447–2457 (1987). [CrossRef] [PubMed]
38. J. Yao and H.H. Barrett. “Predicting human performance by a channelized Hotelling observer model,” Math. Methods Med. Imaging, SPIE 1768:161–168 (1992).
39. C.K. Abbey, H.H. Barrett, and D.W. Wilson. “Observer signal to noise ratios for the ML-EM algorithm,” Proc. SPIE 2712:47–58 (1996). [CrossRef]
40. S. Marcelja. “Mathematical description of the responses of simple cortical cells,” J. Opt. Soc. Am. A 70, 1297–1300 (1980). [CrossRef]
41. A.B. Watson. Detection and recognition of simple spatial forms, in Physical and Biological Processing of Images, O.J. Bradick and A.C. Sleigh, Eds. (New York, Springer-Verlag, 1983).
42. F.O. Bochud, C.K. Abbey, and M.P. Eckstein. “Correlated human responses for visual detection in natural images; Annual Meeting of the Association for Research,” in Vision and Ophthalmology; Fort Lauderdale, USA; 40, 4; 350 (1999).
43. M.P. Eckstein, C.K. Abbey, and F.O. Bochud. “Visual signal detection in structured backgrounds IV. Figures of merit for model observers with internal response,” J. Opt. Soc. Am. 17, 2 206–217 (2000). [CrossRef]
44. D.M. Green and J.A. Swets. Signal Detection Theory and Psychophysics, (Wiley, NewYork, 1966).
45. A.J. Ahumada Jr., A.B. Watson, and A.M. Rohally, “Models of human image discrimination predict object detection in natural backgrounds,” in Human Vision, Visual Proc., and Digital Display VI, B. Rogowitz and J. Allebach, eds., Proc. SPIE 2411, 355–362 (1995). [CrossRef]
46. J.P. Johnson, J. Lubin, J. Nafziger, and D. Chakraborty. “Visual Discrimination Modeling of lesion discriminability,” Medical Imaging, Image Percep.and Performance, Ed. E.A. Krupinski and D.P. Chakraborty, Proc. SPIE, 4686, 248–255 (2002). [CrossRef]
47. W.B. Pennebaker and J.L. Mitchell, The JPEG still image data compression standard, (Van Nostrand Reinhold, New York, 1993).
48. W.E. Smith. “Simulated annealing and estimation theory in coded aperture imaging,” PhD Dissertation, University of Arizona (2002).
49. M.P. Eckstein and C.K. Abbey, “Model observers for signal known statistically tasks,” Proc. SPIE, Medical Imaging: Image Percep. and Performance, E.A. Krupinski and D.P. Chakraborty, eds., 4321, 91–102 (2001). [CrossRef]
50. M.P. Eckstein, C.K. Abbey, and B. Pham, “The effect of image compression on signal known statistically tasks,” Proc. SPIE, Medical Imaging: Image Percep. and Performance, E.A. Krupinski and D.P. Chakraborty, eds., 4686, 13–24 (2002). [CrossRef]