Single‐number summary and decision analytic measures can happily coexist

M. Pencina, E. Steyerberg, R. D'Agostino

Published 2019 in Statistics in Medicine

ABSTRACT

In his commentary, Dr Vickers repeats some of his previous criticisms of the NRI and IDI measures and challenges us to find examples where NRI at event rate (NRI(p)) offers information over and above net benefit.1 However, this challenge makes little sense. NRI(p) is a single-number statistical summary measure. Net benefit should be presented across a range of thresholds; thus one should consider a net benefit curve, which is an important and useful contribution of Drs Vickers and Elkin.2 NRI(p) is a difference between two points on the standardized net benefit curves evaluated at event rate. A point on a curve cannot contain the same or more information than the curve itself. Instead, one might contrast the NRI(p) with the change in the area under the ROC curve (AUC), or more directly, the maximum standardized net benefit (the parent measure of NRI(p)) with the AUC. We have shown that the maximum standardized net benefit is a global measure that does not depend on the event rate itself.3 Moreover, it is proper and cannot be “fooled” by miscalibration. In these regards, it shares the properties of the AUC. Its connection with several other statistical summary measures affords a richer interpretation. Indeed, the measure can be interpreted as the Kolmogorov-Smirnov distance between the risk distributions among events and nonevents as well as the maximum relative utility or standardized net benefit. We argue that presenting standardized net benefit curves with maximum standardized net benefit as a companion single-number summary offers a more elegant and interpretable pairing than standardized net benefit and the AUC. 
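The relationship described above, with NRI(p) as the difference between two standardized net benefit curves read off at the event rate, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the toy risks are hypothetical, and the formula for standardized net benefit (net benefit divided by the event rate) is the usual textbook form rather than something spelled out in the abstract. At a threshold equal to the event rate the weighting factor is 1, so the measure reduces to TPR minus FPR.

```python
import numpy as np


def standardized_net_benefit(risk, y, threshold):
    """sNB(t) = TPR(t) - ((1 - p) / p) * (t / (1 - t)) * FPR(t), p = event rate."""
    risk = np.asarray(risk, dtype=float)
    y = np.asarray(y, dtype=bool)
    p = y.mean()
    flagged = risk > threshold            # classified as high risk
    tpr = flagged[y].mean()               # fraction of events flagged
    fpr = flagged[~y].mean()              # fraction of nonevents flagged
    weight = ((1 - p) / p) * (threshold / (1 - threshold))
    return tpr - weight * fpr


def nri_at_event_rate(risk_old, risk_new, y):
    """Difference in standardized net benefit at threshold = event rate."""
    p = np.asarray(y, dtype=bool).mean()
    return (standardized_net_benefit(risk_new, y, p)
            - standardized_net_benefit(risk_old, y, p))


def ks_distance(risk, y):
    """Largest gap between the risk distributions of events and nonevents."""
    risk = np.asarray(risk, dtype=float)
    y = np.asarray(y, dtype=bool)
    return max(abs((risk[y] > c).mean() - (risk[~y] > c).mean())
               for c in np.unique(risk))


# Hypothetical toy data: 4 events, 4 nonevents, so the event rate is 0.5.
y = [1, 1, 1, 1, 0, 0, 0, 0]
risk_old = [0.6, 0.6, 0.4, 0.4, 0.6, 0.4, 0.4, 0.4]
risk_new = [0.7, 0.6, 0.6, 0.4, 0.6, 0.4, 0.3, 0.3]

nri_p = nri_at_event_rate(risk_old, risk_new, y)
print(nri_p, ks_distance(risk_new, y))
```

In this toy data the standardized net benefit of the new model at the event rate happens to match its Kolmogorov-Smirnov distance; that agreement is a feature of the chosen numbers and should not be read as a general guarantee for arbitrary, possibly miscalibrated, risks.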
Moreover, Baker4 shows that the inverse of the NRI at event rate times the event rate can be interpreted as the summary test trade-off, ie, an approximate lower bound over all thresholds for the minimum number of tests for a new marker that needs to be traded for a true positive to yield a positive net benefit.5

We agree with Dr Vickers that a thoughtful discussion about a range of classification thresholds can be useful. However, identifying any threshold, or even a threshold range, can be arbitrary, is likely to vary from person to person, and is subject to change and debate. The thresholds used in primary prevention of cardiovascular disease are a good example. When introducing the NRI in 2008, we used 6% and 20%, consistent with the practice at the time.6 However, the 2013 American Heart Association/American College of Cardiology guidelines lowered the thresholds to 5% and 7.5% and, at the same time, expanded the definition of the outcome used in the risk prediction model.7 The subsequent US Preventive Services Task Force guideline raised the threshold to 10%,8 but future guidelines may lower the threshold again. The biomarker discovery process needs more grounding. Furthermore, some researchers argue that model and biomarker evaluation needs a continuous framework. That is why we need global measures of model performance. AUC, maximum standardized net benefit, and R-squared-type measures are examples.

We disagree with Dr Vickers that the requirement of good calibration is “highly problematic.”1 We note that poor calibration can either mask the usefulness of a promising new biomarker or exaggerate its true impact. As suggested by the TRIPOD statement,9 visual displays of calibration offer the best illustration of where we are. Of note, neither NRI(p) nor IDI, defined as a difference in rescaled Brier scores, requires “good” calibration.
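As a small worked illustration of the summary test trade-off attributed to Baker above, the sketch below reads "the inverse of the NRI at event rate times the event rate" as 1 / (NRI(p) × p). That reading is an assumption, as are the function name and the input values, which are made up for illustration and do not come from any study.

```python
def summary_test_tradeoff(nri_p, event_rate):
    """Approximate number of marker tests traded per additional true positive.

    Assumed reading of the text: 1 / (NRI(p) * event rate).
    """
    if not 0 < event_rate < 1:
        raise ValueError("event_rate must lie strictly between 0 and 1")
    if nri_p <= 0:
        raise ValueError("trade-off is only defined for a positive NRI(p)")
    return 1.0 / (nri_p * event_rate)


# Hypothetical inputs: NRI(p) of 0.05 in a population with a 10% event rate.
tradeoff = summary_test_tradeoff(nri_p=0.05, event_rate=0.10)
print(round(tradeoff))
```

Under these made-up inputs the trade-off comes out at roughly 200 tests per additional true positive; a larger NRI(p) or a higher event rate lowers that number.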
NRI and IDI measures were proposed to shift the focus of the reclassification discussion from overall movement in predicted risks to changes that are helpful and those that are not. These measures have been extensively used and misused in the 10 years since their original presentation.10 A lot of valuable research has been conducted to clarify their advantages and weak spots. We view this advancement as a necessary and desirable scientific process. So where are we after these 10 years?

1. Given the shortcomings of the continuous NRI and the variability and difficulty in selecting thresholds, the NRI(p) is the preferred member of the NRI family.
2. NRI(p) offers an attractive alternative to the change in AUC and is an appropriate single-number companion to the standardized net benefit (relative utility) curves for models with and without the new biomarker.
3. In situations where established thresholds exist, change in net benefit, standardized benefit, relative utility, or weighted NRI are the preferred single-number measures with the event and non-event components of the two-category NRI as
