Discussion of Schuemie et al: “A plea to stop using the case‐control design in retrospective database studies”

S. Schneeweiss, S. Suissa

Published 2019 in Statistics in Medicine

ABSTRACT

There is nothing wrong with case-control sampling designs

The case-control sampling design is an efficient analytic strategy within a cohort study. Fifty years ago, Miettinen showed that the odds ratio computed from a case-control study using a sampled risk-set or a random sample of person-moments estimates the rate ratio of the underlying cohort study.1,2 An inherent property of the case-control sampling design is that one would obtain the same rate ratio estimate as that from an analysis of the underlying cohort that gave rise to the cases and controls. A key characteristic when discussing case-control sampling in healthcare database studies is that the underlying cohort that gave rise to cases and controls is identifiable and enumerable. This is distinct from many community-based or hospital-based case-control studies where the true underlying source population remains unknown and cannot be enumerated.

There are several good reasons, even in database studies working with previously collected information, to apply case-control sampling, many of which were pointed out decades ago.3 First, it is at times necessary to collect data on a confounder not available in the database by reviewing source data, such as medical records or questionnaires. This is costly and time-consuming, so the efficiency of case-control sampling is welcome in such settings. This approach was essential in a case-control study of inhaled beta-agonist use and life-threatening asthma, where additional clinical information on asthma severity was obtained from hospital medical records and physician questionnaires to improve confounding control.4 Second, biomarkers observed before cohort entry can be used to stratify the study population.
Here again, such information is costly to retrieve and therefore a variation of case-control sampling, namely, case-cohort sampling, is applied.3 Third, there are drug safety surveillance programs that focus on specific endpoints, particularly if such endpoints require adjudication or special expertise in their assessment and classification, eg, a system focused on severe liver injury due to drug exposure. Such programs have established an elaborate system to identify and validate the outcomes of interest and screen a wide range of medications for the incidence of liver injury.5 Finally, a subtler point of case-control sampling designs is that they make it convenient to study the triggers of an acute event by flexibly modeling the exposure window at varying proximities to the event of interest.6,7

In earlier years, when data processing came with a considerable cost, the computational efficiency of case-control sampling within a cohort was mentioned as an advantage over the full cohort analysis, particularly in very large cohorts with time-varying exposures. Today, this is generally no longer considered an issue. Nevertheless, some efficiency can be gained, as in a study of the long-term effects of antihypertensive drugs on the incidence of cancer, which involved a cohort of over 1.1 million patients of whom over 40 000 developed cancer during 14 years of follow-up, a size that necessitated sampling within the cohort.8

A well-described limitation of case-control designs is the inability to directly estimate incidence rates. While that remains true in database studies, we are usually able to enumerate the underlying cohort in such data and can therefore determine the sampling fractions of cases and of controls, and indirectly establish incidence rates.
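
To make the indirect calculation concrete, the following is a minimal sketch and not the published method. It assumes that every case in the cohort is included (case sampling fraction of 1), that controls are a random sample of the cohort's person-moments, and that the total person-time of the enumerated cohort is known. The function name `rates_from_sampling`, the `QuasiCohortRates` container, and all counts are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class QuasiCohortRates:
    rate_exposed: float      # events per person-year among the exposed
    rate_unexposed: float    # events per person-year among the unexposed
    rate_ratio: float
    rate_difference: float


def rates_from_sampling(cases_exposed, cases_unexposed,
                        controls_exposed, controls_unexposed,
                        cohort_person_years):
    """Recover crude incidence rates from case-control counts.

    Assumptions (illustrative, see lead-in): all cases are included, and
    controls are a random sample of person-moments, so the exposure split
    among controls estimates the exposure split of cohort person-time.
    """
    n_controls = controls_exposed + controls_unexposed
    if controls_exposed <= 0 or controls_unexposed <= 0:
        raise ValueError("need controls in both exposure groups")

    # Apportion the known cohort person-time by the control exposure split.
    py_exposed = cohort_person_years * controls_exposed / n_controls
    py_unexposed = cohort_person_years * controls_unexposed / n_controls

    rate_exposed = cases_exposed / py_exposed
    rate_unexposed = cases_unexposed / py_unexposed
    return QuasiCohortRates(
        rate_exposed=rate_exposed,
        rate_unexposed=rate_unexposed,
        rate_ratio=rate_exposed / rate_unexposed,
        rate_difference=rate_exposed - rate_unexposed,
    )


# Hypothetical counts: 400 cases, 4000 sampled controls, 400 000 person-years.
example = rates_from_sampling(120, 280, 1000, 3000, 400_000)
print(example)
```

Under these assumptions, the ratio of the two recovered rates equals the exposure odds ratio from the case-control table, while the known person-time supplies the absolute scale that the odds ratio alone lacks.
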
This concept led to the quasi-cohort approach, which allows incidence rates, as well as corresponding crude and adjusted rate differences, to be derived from case-control sampling within a defined cohort.9 We have therefore noted earlier that, in situations where no additional data are collected, the analysis is limited to previously collected information, and the interest is in assessing the incidence and relative risk of health events, there is no reason to use a case-control sampling design. A cohort design will be more informative, is easier to communicate to readers, and is less prone to investigator errors.10

There is nothing wrong with case-control sampling designs, yet people make mistakes

Theory and practice show over and over that well-designed and well-executed case-control sampling provides valid estimates of rate ratios compared with cohort studies. For example, an illustrative study of the association between statins and lung cancer incidence found, using the full cohort analysis of all 365 467 subjects with 1786 incident cases of lung cancer occurring during follow-up, a rate ratio of 1.02 (95% CI: 0.90 to 1.17), while a nested case-control 1:10 sampling design within the cohort produced a corresponding rate ratio of 0.99 (95% CI: 0.85 to 1.16), well within sampling variation.11
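
The agreement between risk-set sampling and the full cohort analysis can also be checked by simulation. The sketch below is illustrative only and is unrelated to the statin study data. It assumes a binary, time-fixed exposure, constant hazards, administrative censoring, and 1:1 risk-set sampling, for which the matched odds ratio reduces to the ratio of discordant pairs. The function name and all parameter values are hypothetical.

```python
import random


def simulate_nested_case_control(n=50_000, true_rate_ratio=2.0, base_rate=0.01,
                                 follow_up=5.0, p_exposed=0.3, seed=1):
    """Return (cohort_rate_ratio, matched_odds_ratio) for one simulated cohort."""
    rng = random.Random(seed)

    subjects = []  # (exit_time, is_case, exposed)
    for _ in range(n):
        exposed = rng.random() < p_exposed
        rate = base_rate * (true_rate_ratio if exposed else 1.0)
        event_time = rng.expovariate(rate)
        is_case = event_time < follow_up
        subjects.append((min(event_time, follow_up), is_case, exposed))

    # After sorting by exit time, the risk set of the case at position i
    # (everyone still under follow-up at its event time) is positions i+1..n-1.
    subjects.sort(key=lambda s: s[0])

    events = {True: 0, False: 0}
    person_time = {True: 0.0, False: 0.0}
    case_only_exposed = 0     # pairs: case exposed, control unexposed
    control_only_exposed = 0  # pairs: case unexposed, control exposed

    for i, (exit_time, is_case, exposed) in enumerate(subjects):
        person_time[exposed] += exit_time
        if not is_case:
            continue
        events[exposed] += 1
        if i + 1 >= n:
            continue  # empty risk set, no control can be drawn
        # Draw one control at random from the case's risk set.
        control_exposed = subjects[rng.randrange(i + 1, n)][2]
        if exposed and not control_exposed:
            case_only_exposed += 1
        elif control_exposed and not exposed:
            control_only_exposed += 1

    if control_only_exposed == 0:
        raise ValueError("no discordant pairs with an exposed control")

    cohort_rate_ratio = ((events[True] / person_time[True])
                         / (events[False] / person_time[False]))
    matched_odds_ratio = case_only_exposed / control_only_exposed
    return cohort_rate_ratio, matched_odds_ratio


cohort_rr, nested_or = simulate_nested_case_control()
print(f"cohort rate ratio: {cohort_rr:.2f}, nested case-control OR: {nested_or:.2f}")
```

With one control per case the nested estimate is noticeably less precise than the cohort estimate. Drawing more controls per case, as in the 1:10 design cited above, narrows that gap but calls for a conditional logistic regression rather than the discordant-pair shortcut used here.
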
