Single‐number summary and decision analytic measures can happily coexist

Published 2019 in Statistics in Medicine

ABSTRACT

In his commentary, Dr Vickers repeats some of his previous criticisms of the NRI and IDI measures and challenges us to find examples where NRI at event rate (NRI(p)) offers information over and above net benefit [1]. However, this challenge makes little sense. NRI(p) is a single-number statistical summary measure. Net benefit should be presented across a range of thresholds; thus one should consider a net benefit curve, which is an important and useful contribution of Drs Vickers and Elkin [2]. NRI(p) is a difference between two points on the standardized net benefit curves evaluated at event rate. A point on a curve cannot contain the same or more information than the curve itself. Instead, one might contrast the NRI(p) with the change in the area under the ROC curve (AUC), or more directly, the maximum standardized net benefit (the parent measure of NRI(p)) with the AUC. We have shown that the maximum standardized net benefit is a global measure that does not depend on the event rate itself [3]. Moreover, it is proper and cannot be “fooled” by miscalibration. In these regards, it shares the properties of the AUC. Its connection with several other statistical summary measures affords a richer interpretation. Indeed, the measure can be interpreted as the Kolmogorov-Smirnov distance between the risk distributions among events and nonevents as well as the maximum relative utility or standardized net benefit. We argue that presenting standardized net benefit curves with maximum standardized net benefit as a companion single-number summary offers a more elegant and interpretable pairing than standardized net benefit and the AUC.
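The relationship described above, with NRI(p) as the gap between two standardized net benefit curves at the event rate, can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the toy risks are hypothetical, and the standardized net benefit formula used here (net benefit divided by the event rate) is the usual textbook form, assumed rather than quoted from the text. The toy risks are not calibrated, so the Kolmogorov-Smirnov distance is shown only as a separate quantity.

```python
def rates(risks, outcomes, threshold):
    """Return (TPR, FPR) when risk > threshold is classified as positive."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    nonevents = [r for r, y in zip(risks, outcomes) if y == 0]
    tpr = sum(r > threshold for r in events) / len(events)
    fpr = sum(r > threshold for r in nonevents) / len(nonevents)
    return tpr, fpr


def standardized_net_benefit(risks, outcomes, threshold):
    """Assumed form: sNB(t) = TPR(t) - FPR(t) * [(1 - p) / p] * [t / (1 - t)]."""
    p = sum(outcomes) / len(outcomes)  # observed event rate
    tpr, fpr = rates(risks, outcomes, threshold)
    weight = ((1 - p) / p) * (threshold / (1 - threshold))
    return tpr - fpr * weight


def nri_at_event_rate(old_risks, new_risks, outcomes):
    """Difference between the two sNB curves at threshold = event rate."""
    p = sum(outcomes) / len(outcomes)
    return (standardized_net_benefit(new_risks, outcomes, p)
            - standardized_net_benefit(old_risks, outcomes, p))


def ks_distance(risks, outcomes):
    """Kolmogorov-Smirnov distance between risks of events and nonevents."""
    return max(abs(tpr - fpr)
               for tpr, fpr in (rates(risks, outcomes, t) for t in set(risks)))


# Hypothetical toy data: 4 events followed by 6 nonevents (event rate 0.4).
outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
old_risks = [0.6, 0.5, 0.3, 0.2, 0.5, 0.45, 0.3, 0.2, 0.1, 0.1]
new_risks = [0.7, 0.6, 0.5, 0.3, 0.45, 0.3, 0.3, 0.2, 0.1, 0.1]

nri_p = nri_at_event_rate(old_risks, new_risks, outcomes)
print(f"NRI(p) = {nri_p:.4f}")
print(f"KS distance, new model = {ks_distance(new_risks, outcomes):.4f}")
```

At the threshold equal to the event rate the weight on the false positive rate is 1, so each curve value reduces to TPR minus FPR and NRI(p) is the change in that quantity between the two models.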
Moreover, Baker [4] shows that the inverse of the NRI at event rate times the event rate can be interpreted as the summary test trade-off, ie, an approximate lower bound over all thresholds for the minimum number of tests for a new marker that needs to be traded for a true positive to yield a positive net benefit [5].
We agree with Dr Vickers that a thoughtful discussion about a range of classification thresholds can be useful. However, identifying any threshold, or even a threshold range, can be arbitrary, is likely to vary from person to person, and is subject to change and debate. The thresholds used in primary prevention of cardiovascular disease are a good example. When introducing the NRI in 2008, we used 6% and 20%, consistent with the practice at the time [6]. However, the 2013 American Heart Association/American College of Cardiology guidelines lowered the thresholds to 5% and 7.5%, and at the same time, expanded the definition of the outcome used in the risk prediction model [7]. The subsequent US Preventive Services Task Force guideline raised the threshold to 10% [8], but future guidelines may lower the threshold again. The biomarker discovery process needs more grounding. Furthermore, some researchers argue that model and biomarker evaluation needs a continuous framework. That is why we need global measures of model performance. AUC, maximum standardized net benefit, and R-squared-type measures are the examples here.
We disagree with Dr Vickers that the requirement of good calibration is “highly problematic” [1]. We note that poor calibration can either mask the usefulness of a promising new biomarker or exaggerate its true impact. As suggested by the TRIPOD statement [9], visual displays of calibration offer the best illustration of where we are. Of note, neither NRI(p) nor IDI defined as a difference in rescaled Brier scores requires “good” calibration.
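The summary test trade-off attributed to Baker earlier in this passage is simple arithmetic: the inverse of NRI(p) multiplied by the event rate. The sketch below assumes that reading of the sentence; the function name and the input values are hypothetical and chosen only to make the calculation concrete.

```python
def summary_test_tradeoff(nri_p, event_rate):
    """Inverse of (NRI at event rate x event rate), as described in the text.

    Read as an approximate lower bound on the number of marker tests that
    must be traded for one true positive to yield a positive net benefit.
    """
    product = nri_p * event_rate
    if product <= 0:
        # A non-positive NRI(p) means no gain to trade against testing.
        raise ValueError("NRI(p) times event rate must be positive")
    return 1.0 / product


# Hypothetical inputs: NRI(p) of 0.05 in a population with a 10% event rate.
tradeoff = summary_test_tradeoff(nri_p=0.05, event_rate=0.10)
print(f"Summary test trade-off: about {tradeoff:.0f} tests per true positive")
```

With these illustrative inputs the trade-off is about 200 tests per true positive. A larger NRI(p) or a higher event rate lowers that number.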
NRI and IDI measures were proposed to focus the discussion on reclassification away from overall movement in predicted risks to changes that are helpful and those which are not. These measures have been extensively used and misused in the 10 years since their original presentation [10]. A lot of valuable research has been conducted to inform their advantages and weak spots. We view this advancement as a necessary and desirable scientific process. So where are we after these 10 years?
1. Given the shortcomings of the continuous NRI and the variability and difficulty in selecting thresholds, the NRI(p) is the preferred member of the NRI family.
2. NRI(p) offers an attractive alternative to the change in AUC and is an appropriate single-number companion to the standardized net benefit (relative utility) curves for models with and without the new biomarker.
3. In situations where established thresholds exist, change in net benefit, standardized benefit, relative utility, or weighted NRI are the preferred single-number measures with the event and non-event components of the two-category NRI as

PUBLICATION RECORD

Publication year
2019
Venue
Statistics in Medicine
Publication date
2019-01-04
Fields of study
Medicine, Psychology
Identifiers
DOI 10.1002/sim.8031 PMID 30609149
Source metadata
Semantic Scholar, PubMed

REFERENCES

Comments on “Net reclassification index at event rate: Properties and relationships” (2019)
Net reclassification index at event rate: properties and relationships (2017)
The summary test tradeoff: a new measure of the value of an additional risk prediction marker (2017)
Discrimination slope and integrated discrimination improvement – properties, relationships and impact of calibration (2017)
Authors' response to comments (2017)
Statin Use for the Primary Prevention of Cardiovascular Disease in Adults: US Preventive Services Task Force Recommendation Statement (2016)
Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): Explanation and Elaboration (2015)
2013 ACC/AHA Guideline on the Treatment of Blood Cholesterol to Reduce Atherosclerotic Cardiovascular Risk in Adults: A Report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (2014)
Net reclassification improvement: computation, interpretation, and controversies: a literature review and clinician's guide (2014)
Reprint: 2013 ACC/AHA Guideline on the Treatment of Blood Cholesterol to Reduce Atherosclerotic Cardiovascular Risk in Adults (2014)
Net Reclassification Improvement: Computation, Interpretation, and Controversies (2014)
2013 ACC/AHA guideline on the treatment of blood cholesterol to reduce atherosclerotic cardiovascular risk in adults: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (2014)
Coefficients of Determination in Logistic Regression Models—A New Proposal: The Coefficient of Discrimination (2009)
Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond (2008)
Decision Curve Analysis: A Novel Method for Evaluating Prediction Models (2006)

CITED BY

Three myths about risk thresholds for prediction models (2019)