Abstract

Multireader multicase (MRMC) variance analysis is widely used to analyze observer studies in which the summary measure is the area under the receiver operating characteristic (ROC) curve (AUC). We extend MRMC variance analysis to binary data and to generic study designs in which not every reader interprets every case. Only a subset of the fundamental moments central to MRMC variance analysis of the AUC is found to be required. Through multiple simulation configurations, we compare our unbiased variance estimates to naïve estimates across a range of study designs, average percent correct (PC), and numbers of readers and cases.

© 2007 Optical Society of America

Figures (5)

Fig. 1

The graphic on the left shows the (transposed) layout of data from a binary-outcome experiment with multiple readers. Compared with a fully crossed data set, which would fill the entire matrix, much of the data is missing. The table on the right shows a simple example. The percent correct (PC) in the last row weights each reading equally. One estimate of the standard error of that average treats all the readings as iid; the result is 3.7. One could instead obtain an average PC by averaging the reader-specific PCs, resulting in 60. Continuing, one might estimate the standard error of this average, yielding 20.6. While both of the averages are valid, both variance estimates are wrong.
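The two naive calculations described in this caption can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function name `naive_summaries` and the toy matrices `s` (success scores) and `d` (design matrix, 1 where a reader read a case) are hypothetical, the data are not those of the table in Fig. 1, and centering the iid standard error on the pooled PC is an assumption about the caption's intent.

```python
import numpy as np

def naive_summaries(s, d):
    """Two average PCs and their naive standard errors.

    s, d: arrays of shape (cases, readers); d[i, r] = 1 if reader r read case i.
    """
    s = np.asarray(s, dtype=float)
    d = np.asarray(d, dtype=float)
    n_r = d.sum(axis=0)                    # readings per reader
    n_total = d.sum()                      # all readings
    p_r = (d * s).sum(axis=0) / n_r        # reader-specific PC
    pc_pooled = (d * s).sum() / n_total    # each reading weighted equally
    pc_reader = p_r.mean()                 # each reader weighted equally
    # Naive SE 1: treat every reading as an iid draw.
    se_iid = np.sqrt((d * (s - pc_pooled) ** 2).sum() / (n_total * (n_total - 1)))
    # Naive SE 2: treat the reader-specific PCs as iid draws.
    n_gamma = p_r.size
    se_reader = np.sqrt(((p_r - pc_reader) ** 2).sum() / (n_gamma * (n_gamma - 1)))
    return pc_pooled, se_iid, pc_reader, se_reader

# Hypothetical sparse design: 6 cases (rows), 3 readers (columns).
d = np.array([[1, 0, 1], [1, 0, 0], [1, 1, 0], [1, 1, 0], [0, 1, 0], [0, 1, 1]])
s = np.array([[1, 0, 1], [1, 0, 0], [1, 1, 0], [0, 0, 0], [0, 0, 0], [0, 1, 1]])
pc_pooled, se_iid, pc_reader, se_reader = naive_summaries(s, d)
print(pc_pooled, se_iid, pc_reader, se_reader)
```

Neither standard error accounts for the correlation that arises when readers share cases, which is the shortcoming the caption points out.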

Fig. 2

Population variances calculated using the integral expressions for the moments compared to those estimated from Monte Carlo (MC) simulation. A separate point is given for each combination of the 324 simulation configurations, the 6 types of study designs, and the two ways of weighting the individual reader PCs.
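The comparison in this caption can be illustrated on a small scale. The sketch below evaluates the moment integrals listed under Equations by Gauss-Hermite quadrature and checks them against a Monte Carlo draw from the random-effects score model. It is an illustration under stated assumptions, not the authors' procedure: the function names, the parameter values (`mu_t = 1.0` and variance components 0.3, 0.3, 0.4), the quadrature rule, and the sample size are all hypothetical choices. It uses the fact that a binary score satisfies s^2 = s, so M1 = M0.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def phi(z):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def population_moments(mu_t, var_r, var_c, var_rc, n_nodes=80):
    """M0, M1, M4, M5, M8 from the closed form and the two integrals."""
    t, w = np.polynomial.hermite.hermgauss(n_nodes)  # rule for weight exp(-t^2)

    def shared_moment(var_shared, var_other):
        x = 2.0 * math.sqrt(var_shared) * t          # x ~ N(0, 2 * var_shared)
        p = phi((mu_t + x) / math.sqrt(2.0 * var_other + 2.0 * var_rc))
        return float(np.sum(w * p ** 2) / math.sqrt(math.pi))

    m0 = float(phi(mu_t / math.sqrt(2.0)))
    return {"M0": m0, "M1": m0, "M4": shared_moment(var_r, var_c),
            "M5": shared_moment(var_c, var_r), "M8": m0 ** 2}

def monte_carlo_moments(mu_t, var_r, var_c, var_rc, n=200_000, seed=0):
    """Estimate M0, M4, M5 by simulating pairs of correlated readings."""
    rng = np.random.default_rng(seed)

    def diff(var):
        # Difference of two independent N(0, var) effects, e.g. [R]_1r - [R]_0r.
        return rng.normal(0.0, math.sqrt(2.0 * var), size=n)

    d_r = diff(var_r)                                # same reader, different cases
    s_a = mu_t + d_r + diff(var_c) + diff(var_rc) > 0
    s_b = mu_t + d_r + diff(var_c) + diff(var_rc) > 0
    d_c = diff(var_c)                                # same case, different readers
    s_c = mu_t + diff(var_r) + d_c + diff(var_rc) > 0
    s_d = mu_t + diff(var_r) + d_c + diff(var_rc) > 0
    return {"M0": float(np.mean(s_a)), "M4": float(np.mean(s_a & s_b)),
            "M5": float(np.mean(s_c & s_d))}

# Hypothetical configuration; the variance components sum to 1.
quad = population_moments(1.0, 0.3, 0.3, 0.4)
mc = monte_carlo_moments(1.0, 0.3, 0.3, 0.4)
print(quad, mc)
```

With these moments in hand, a population variance for a given study design follows from the coefficient-weighted sums in the equation list.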

Fig. 3

Population variances for all the high-PC (0.96) simulation configurations compared to the expected values (from MC averaging) of the naïve variance estimates. The expected values of our moment estimators are unbiased and thus equal the population variances. The naïve estimates bracket the true MRMC variances: $\hat{V}_{\text{naive}\_\gamma}$ is biased high (dotted curve) and $\hat{V}_{\text{naive}\_g}$ is biased low (dashed curve). In all but panel D, $V_g$ lies on top of $V_\gamma$ (notice some dots peeking out from behind the solid curve).

Fig. 4

RRMSE for the fully crossed study design: A, high PC (0.96); B, low PC (0.70).

Fig. 5

RRMSE for the broad uniform study design: A, high PC (0.96); B, low PC (0.70).

Tables (3)

Table 1 List of the Coefficients Needed to Appropriately Weight the Success Moments to Determine the Variance of $\hat{P}$ for a Fixed Study Design

Table 2 Parameters Investigated in the Simulations According to a Factorial Design, Yielding a Total of 3 × 3 × 3 × 3 × 2 × 2 = 324 Simulation Configurations

Table 3 One Example of the Six Distributions of Cases for Five Readers Reading 102 Cases on Average

Equations (52)

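$$\hat{p}_r = \frac{1}{N_{gr}} \sum_{i=1}^{N_g} d_{ir} s_{ir}.$$
$$\hat{P} = \sum_{r=1}^{N_\gamma} w_r \hat{p}_r.$$
$$\langle \hat{P} \mid D \rangle = \langle s(g,\gamma) \mid D \rangle = \langle s(g,\gamma) \rangle.$$
$$V_D = \mathrm{var}(\hat{P} \mid D) = c_1 \langle s(g,\gamma)^2 \rangle + c_4 \langle \langle s(g,\gamma) \mid \gamma \rangle^2 \rangle + c_5 \langle \langle s(g,\gamma) \mid g \rangle^2 \rangle + c_8 \langle s(g,\gamma) \rangle^2,$$
$$V_D = \sum_{r=1}^{N_\gamma} w_r^2 \, \mathrm{var}(\hat{p}_r \mid D) + \sum_{r=1}^{N_\gamma} \sum_{r' \neq r}^{N_\gamma} w_r w_{r'} \, \mathrm{cov}(\hat{p}_r, \hat{p}_{r'} \mid D).$$
$$\mathrm{var}(\hat{p}_r \mid N_{gr}) = \langle \hat{p}_r^2 \mid N_{gr} \rangle - \langle \hat{p}_r \rangle^2$$
$$= \left( \frac{1}{N_{gr}} M_1 + \frac{N_{gr} - 1}{N_{gr}} M_4 \right) - M_8.$$
$$\mathrm{cov}(\hat{p}_r, \hat{p}_{r'}) = \frac{1}{N_g} M_5 - \frac{1}{N_g} M_8,$$
$$\hat{V}_D = \underline{c}^{t} \, \underline{\hat{M}},$$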
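$$\hat{M}_1 = \sum_{r=1}^{N_\gamma} w_r \frac{\sum_{i=1}^{N_g} d_{ir}^2 s_{ir}^2}{\sum_{i^*=1}^{N_g} d_{i^* r}^2},$$
$$\hat{M}_4 = \sum_{r=1}^{N_\gamma} w_r \sum_{i=1}^{N_g} \frac{d_{ir} s_{ir}}{\sum_{i^*=1}^{N_g} d_{i^* r}} \, \frac{\sum_{i' \neq i}^{N_g} d_{i'r} s_{i'r}}{\sum_{i^* \neq i}^{N_g} d_{i^* r}},$$
$$\hat{M}_5 = \sum_{r=1}^{N_\gamma} w_r \sum_{r' \neq r}^{N_\gamma} \frac{w_{r'}}{1 - w_r} \, \frac{\sum_{i=1}^{N_g} d_{ir} d_{ir'} s_{ir} s_{ir'}}{\sum_{i^*=1}^{N_g} d_{i^* r} d_{i^* r'}},$$
$$\hat{M}_8 = \sum_{r=1}^{N_\gamma} w_r \sum_{i=1}^{N_g} \frac{d_{ir} s_{ir}}{\sum_{i^*=1}^{N_g} d_{i^* r}} \sum_{r' \neq r}^{N_\gamma} \frac{w_{r'}}{1 - w_r} \, \frac{\sum_{i' \neq i}^{N_g} d_{i'r'} s_{i'r'}}{\sum_{i^* \neq i}^{N_g} d_{i^* r'}}.$$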
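$$\hat{V}_{\text{naive}\_g} = \frac{1}{N_{\text{total}} (N_{\text{total}} - 1)} \sum_{r=1}^{N_\gamma} \sum_{i=1}^{N_g} d_{ir} (s_{ir} - \hat{P}_\gamma)^2,$$
$$\hat{V}_{\text{naive}\_\gamma} = \frac{1}{N_\gamma (N_\gamma - 1)} \sum_{r=1}^{N_\gamma} (\hat{p}_r - \hat{P}_\gamma)^2.$$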
$$s_{ir} = s(t_{1ir} - t_{0ir}) = \begin{cases} 1 & \text{if } t_{1ir} - t_{0ir} > 0 \\ 0 & \text{if } t_{1ir} - t_{0ir} < 0 \end{cases}.$$
$$t_{0ir} = 0 + [R]_{0r} + [C]_{0i} + [RC]_{0ir},$$
$$t_{1ir} = \mu_t + [R]_{1r} + [C]_{1i} + [RC]_{1ir}.$$
$$\sigma_R^2 + \sigma_C^2 + \sigma_{RC}^2 = 1.$$
$$p_r = \langle \hat{p}_r \mid r \rangle = \langle s(t_{1ir} - t_{0ir}) \mid r \rangle,$$
$$p_r = \Phi\left( \frac{\mu_t + [R]_{1r} - [R]_{0r}}{\sqrt{2\sigma_C^2 + 2\sigma_{RC}^2}} \right),$$
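$$\Pr(p_r \le \tau) = \int_{-\infty}^{\tau} dx \, \frac{\exp\left( -\frac{x^2}{4\sigma_R^2} \right)}{\sqrt{4\pi\sigma_R^2}} \, \Phi\left( \frac{\mu_t + x}{\sqrt{2\sigma_C^2 + 2\sigma_{RC}^2}} \right).$$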
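$$M_0 = \langle p_r \rangle = \langle s(t_{1ir} - t_{0ir}) \rangle = \Phi\left( \frac{\mu_t}{\sqrt{2}} \right),$$
$$M_4 = \int_{-\infty}^{\infty} dx \, \frac{\exp\left( -\frac{x^2}{4\sigma_R^2} \right)}{\sqrt{4\pi\sigma_R^2}} \left[ \Phi\left( \frac{\mu_t + x}{\sqrt{2\sigma_C^2 + 2\sigma_{RC}^2}} \right) \right]^2,$$
$$M_5 = \int_{-\infty}^{\infty} dx \, \frac{\exp\left( -\frac{x^2}{4\sigma_C^2} \right)}{\sqrt{4\pi\sigma_C^2}} \left[ \Phi\left( \frac{\mu_t + x}{\sqrt{2\sigma_R^2 + 2\sigma_{RC}^2}} \right) \right]^2.$$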
$$\mathrm{RRMSE} = \frac{1}{V} \left( \langle \hat{V} - V \rangle^2 + \mathrm{var}(\hat{V}) \right)^{1/2},$$
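$$\langle \hat{P}^2 \mid D \rangle = \left\langle \left( \sum_{r=1}^{N_\gamma} \frac{w_r}{N_{gr}} \sum_{i=1}^{N_g} d_{ir} s_{ir} \right)^2 \right\rangle.$$
$$\langle \hat{P}^2 \mid D \rangle = \sum_{r=1}^{N_\gamma} \frac{w_r^2}{N_{gr}^2} \sum_{i=1}^{N_g} d_{ir}^2 \langle s_{ir}^2 \rangle + \sum_{r=1}^{N_\gamma} \frac{w_r^2}{N_{gr}^2} \sum_{i=1}^{N_g} \sum_{i' \neq i}^{N_g} d_{ir} d_{i'r} \langle s_{ir} s_{i'r} \rangle + \sum_{r=1}^{N_\gamma} \sum_{r' \neq r}^{N_\gamma} \frac{w_r w_{r'}}{N_{gr} N_{gr'}} \sum_{i=1}^{N_g} d_{ir} d_{ir'} \langle s_{ir} s_{ir'} \rangle + \sum_{r=1}^{N_\gamma} \sum_{r' \neq r}^{N_\gamma} \frac{w_r w_{r'}}{N_{gr} N_{gr'}} \sum_{i=1}^{N_g} \sum_{i' \neq i}^{N_g} d_{ir} d_{i'r'} \langle s_{ir} s_{i'r'} \rangle.$$
$$M_1 = \langle s_{ir}^2 \rangle = \langle s(g,\gamma)^2 \rangle,$$
$$M_4 = \langle s_{ir} s_{i'r} \rangle = \langle \langle s(g,\gamma) \mid \gamma \rangle^2 \rangle,$$
$$M_5 = \langle s_{ir} s_{ir'} \rangle = \langle \langle s(g,\gamma) \mid g \rangle^2 \rangle,$$
$$M_8 = \langle s_{ir} s_{i'r'} \rangle = \langle s(g,\gamma) \rangle^2.$$
$$c_1 = \sum_{r=1}^{N_\gamma} \frac{w_r^2}{N_{gr}},$$
$$c_4 = \sum_{r=1}^{N_\gamma} w_r^2 - \sum_{r=1}^{N_\gamma} \frac{w_r^2}{N_{gr}},$$
$$c_5 = \sum_{r=1}^{N_\gamma} \sum_{r' \neq r}^{N_\gamma} \frac{w_r w_{r'}}{N_{gr} N_{gr'}} \sum_{i=1}^{N_g} d_{ir} d_{ir'},$$
$$c_8 = \sum_{r=1}^{N_\gamma} \sum_{r' \neq r}^{N_\gamma} w_r w_{r'} - \sum_{r=1}^{N_\gamma} \sum_{r' \neq r}^{N_\gamma} \frac{w_r w_{r'}}{N_{gr} N_{gr'}} \sum_{i=1}^{N_g} d_{ir} d_{ir'}.$$
$$\langle \hat{p}_r \hat{p}_{r'} \rangle = \left\langle \sum_{i=1}^{N_g} \frac{d_{ir} s_{ir}}{N_{gr}} \, \frac{d_{ir'} s_{ir'}}{N_{gr'}} \right\rangle + \left\langle \sum_{i=1}^{N_g} \sum_{i' \neq i}^{N_g} \frac{d_{ir} s_{ir}}{N_{gr}} \, \frac{d_{i'r'} s_{i'r'}}{N_{gr'}} \right\rangle.$$
$$\langle \hat{p}_r^2 \rangle = \frac{1}{N_{gr}} M_1 + \frac{N_{gr} - 1}{N_{gr}} M_4,$$
$$\langle \hat{p}_r \hat{p}_{r'} \mid \text{FC} \rangle = \frac{1}{N_g} M_5 + \frac{N_g - 1}{N_g} M_8,$$
$$\langle \hat{p}_r \hat{p}_{r'} \mid \text{Dr Pt} \rangle = M_8.$$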
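$$p_\gamma = \langle s(g,\gamma) \mid \gamma \rangle.$$
$$\mu_\gamma = \langle p_\gamma \rangle = \langle \langle s(g,\gamma) \mid \gamma \rangle \rangle = M_0,$$
$$\sigma_\gamma^2 = \langle \langle s(g,\gamma) \mid \gamma \rangle^2 \rangle - \langle s(g,\gamma) \rangle^2 = M_4 - M_8.$$
$$\mathrm{var}(\hat{p}_\gamma \mid N_{g\gamma}) = \mathrm{var}\left( \langle \hat{p}_\gamma \mid \gamma, N_{g\gamma} \rangle \right) + \left\langle \mathrm{var}(\hat{p}_\gamma \mid \gamma, N_{g\gamma}) \right\rangle$$
$$= \mathrm{var}(p_\gamma) + \frac{\langle p_\gamma (1 - p_\gamma) \rangle}{N_{g\gamma}}$$
$$= \sigma_\gamma^2 + \frac{\mu_\gamma (1 - \mu_\gamma) - \sigma_\gamma^2}{N_{g\gamma}}.$$
$$p_g = \langle s(g,\gamma) \mid g \rangle.$$
$$\mu_g = \langle p_g \rangle = \langle \langle s(g,\gamma) \mid g \rangle \rangle = M_0,$$
$$\sigma_g^2 = \langle \langle s(g,\gamma) \mid g \rangle^2 \rangle - \langle s(g,\gamma) \rangle^2 = M_5 - M_8.$$
$$\hat{p}_{G\gamma} = \bar{\beta} + \beta_\gamma + \beta_G + \beta_{G\gamma},$$
$$\mathrm{var}(\hat{P} \mid \text{FC}) = \frac{M_4 - M_8}{N_\gamma} + \frac{M_5 - M_8}{N_g} + \frac{M_1 - M_4 - M_5 + M_8}{N_\gamma N_g}.$$
$$\mathrm{var}(\hat{P}_{\text{flat}} \mid \text{Dr Pt}) = \frac{M_4 - M_8}{N_\gamma} + \frac{M_1 - M_4}{N_\gamma N_{gr}}.$$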
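The moment estimators and coefficients in the list above can be combined into a variance estimate as sketched below in Python. This is a sketch under stated assumptions rather than the authors' implementation. The function names are hypothetical. The weights `w` are assumed to sum to 1, every reader is assumed to read at least two cases, and every pair of readers is assumed to share at least one case. The coefficient $c_8$ is taken as the second-moment coefficient written in the list, so the squared mean ($M_8$) is subtracted explicitly to turn the second moment into a variance.

```python
import numpy as np

def moment_estimates(s, d, w):
    """Estimates of (M1, M4, M5, M8) from scores s and design d, shape (cases, readers)."""
    s, d, w = (np.asarray(a, dtype=float) for a in (s, d, w))
    n_g, n_rd = d.shape
    ds = d * s
    n_gr = d.sum(axis=0)          # cases read by each reader
    succ = ds.sum(axis=0)         # successes for each reader
    m1 = np.sum(w * (d ** 2 * s ** 2).sum(axis=0) / (d ** 2).sum(axis=0))
    m4 = m5 = m8 = 0.0
    for r in range(n_rd):
        for i in range(n_g):
            if d[i, r] == 0:
                continue
            first = w[r] * ds[i, r] / n_gr[r]
            # Same reader, different case.
            m4 += first * (succ[r] - ds[i, r]) / (n_gr[r] - d[i, r])
            # Different reader, different case.
            for q in range(n_rd):
                if q != r:
                    m8 += first * w[q] / (1 - w[r]) * (succ[q] - ds[i, q]) / (n_gr[q] - d[i, q])
        # Different reader, same case (requires overlapping cases).
        for q in range(n_rd):
            if q != r:
                overlap = (d[:, r] * d[:, q]).sum()
                m5 += w[r] * w[q] / (1 - w[r]) * (ds[:, r] * ds[:, q]).sum() / overlap
    return np.array([m1, m4, m5, m8])

def second_moment_coefficients(d, w):
    """Coefficients (c1, c4, c5, c8) of the second moment of P-hat for a fixed design."""
    d, w = np.asarray(d, dtype=float), np.asarray(w, dtype=float)
    n_gr = d.sum(axis=0)
    c1 = np.sum(w ** 2 / n_gr)
    c4 = np.sum(w ** 2) - c1
    ww = np.outer(w, w)
    np.fill_diagonal(ww, 0.0)                 # keep only r != r' pairs
    c5 = np.sum(ww * (d.T @ d) / np.outer(n_gr, n_gr))
    c8 = np.sum(ww) - c5
    return np.array([c1, c4, c5, c8])

def variance_estimate(s, d, w):
    """Second-moment estimate minus the squared-mean estimate (assumes sum(w) == 1)."""
    m = moment_estimates(s, d, w)
    return float(second_moment_coefficients(d, w) @ m - m[3])

# Hypothetical fully crossed example: 6 cases, 4 readers, equal weights.
rng = np.random.default_rng(1)
s = (rng.random((6, 4)) < 0.7).astype(float)
d = np.ones((6, 4))
w = np.full(4, 0.25)
m = moment_estimates(s, d, w)
c = second_moment_coefficients(d, w)
v_hat = variance_estimate(s, d, w)
print(m, c, v_hat)
```

For a fully crossed design with equal weights, `v_hat` reduces to the fully crossed variance expression in the list with the estimated moments substituted, which gives a quick consistency check on the coefficients.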
