Discussion of Schuemie et al: “A plea to stop using the case‐control design in retrospective database studies”

Published 2019 in Statistics in Medicine

ABSTRACT

There is nothing wrong with case-control sampling designs

The case-control sampling design is an efficient analytic strategy within a cohort study. Fifty years ago, Miettinen showed that the odds ratio computed from a case-control study using a sampled risk-set or a random sample of person-moments estimates the rate ratio of the underlying cohort study [1,2]. An inherent property of the case-control sampling design is that one would obtain the same rate ratio estimate as that from an analysis of the underlying cohort that gave rise to the cases and controls. A key characteristic when discussing case-control sampling in healthcare database studies is that the underlying cohort that gave rise to cases and controls is identifiable and enumerable. This is distinct from many community-based or hospital-based case-control studies, where the true underlying source population remains unknown and cannot be enumerated.

There are several good reasons, even in database studies working with previously collected information, to apply case-control sampling, many of which were pointed out decades ago [3]. First, it is at times necessary to collect data on a confounder not available in the database by reviewing source data, such as medical records or questionnaires. This is costly and time consuming, so the efficiency of case-control sampling is welcome in such settings. This approach was essential in a case-control study of inhaled beta-agonist use and life-threatening asthma, where additional clinical information on asthma severity was obtained from hospital medical records and physician questionnaires to improve confounding control [4]. Second, biomarkers observed before cohort entry can be used to stratify the study population.
Here again, such information is costly to retrieve, and therefore a variation of case-control sampling, namely case-cohort sampling, is applied [3]. Third, there are drug safety surveillance programs that focus on specific endpoints, particularly if such endpoints require adjudication or special expertise in their assessment and classification, e.g., a system focused on severe liver injury due to drug exposure. Such programs have established elaborate procedures to identify and validate the outcomes of interest and screen a wide range of medications for the incidence of liver injury [5]. Finally, a subtler point of case-control sampling designs is that they make it convenient to study the triggers of an acute event by flexibly modeling the exposure window at varying proximities to the event of interest [6,7].

In earlier years, when data processing came with a considerable cost, the computational efficiency of using case-control sampling within a cohort was mentioned as an advantage over the full cohort analysis, particularly in very large cohorts with time-varying exposures. Today, this is generally no longer considered an issue. Nevertheless, some efficiency can be gained, as in a study of the long-term effects of antihypertensive drugs on the incidence of cancer, which involved a cohort of over 1.1 million patients, of whom over 40 000 developed cancer during 14 years of follow-up, a size that necessitated sampling within the cohort [8].

A well-described limitation of case-control designs is the inability to directly estimate incidence rates. While that remains true in database studies, we are usually able to enumerate the underlying cohort in such data and therefore can determine the sampling fractions of cases and of controls, and indirectly establish incidence rates.
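
The arithmetic behind this indirect estimation can be sketched in a few lines. This is a minimal illustration only, not the quasi-cohort procedure itself: it assumes that all cases are retained, that controls are a simple random sample of the person-years of the enumerated cohort, and that exposure is binary and fixed. The function name, parameter names, and all counts are hypothetical.

```python
def incidence_from_sampling(cases_exposed, cases_unexposed,
                            controls_exposed, controls_unexposed,
                            case_fraction, control_fraction):
    """Back-calculate incidence rates from case-control counts.

    Assumption (hypothetical setup): controls are a random sample of
    person-years, so each control stands for 1 / control_fraction
    person-years of the underlying cohort.
    """
    # Scale sampled counts back up to the full cohort.
    events_exposed = cases_exposed / case_fraction
    events_unexposed = cases_unexposed / case_fraction
    person_years_exposed = controls_exposed / control_fraction
    person_years_unexposed = controls_unexposed / control_fraction

    rate_exposed = events_exposed / person_years_exposed
    rate_unexposed = events_unexposed / person_years_unexposed

    # The exposure odds ratio needs no sampling fractions at all.
    odds_ratio = (cases_exposed * controls_unexposed) / (
        cases_unexposed * controls_exposed)

    return {
        "rate_exposed": rate_exposed,
        "rate_unexposed": rate_unexposed,
        "rate_ratio": rate_exposed / rate_unexposed,
        "rate_difference": rate_exposed - rate_unexposed,
        "odds_ratio": odds_ratio,
    }


if __name__ == "__main__":
    # Hypothetical counts: all cases kept, 1% of person-years sampled.
    result = incidence_from_sampling(
        cases_exposed=60, cases_unexposed=70,
        controls_exposed=300, controls_unexposed=700,
        case_fraction=1.0, control_fraction=0.01)
    for name, value in result.items():
        print(f"{name}: {value:.4f}")
```

With these made-up counts the rates come to about 2 and 1 per 1000 person-years, and the resulting rate ratio of 2 agrees with the odds ratio (60 × 700) / (70 × 300) = 2, which is the point of the person-moment sampling argument above.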
This concept led to the quasi-cohort approach, which allows incidence rates, as well as corresponding crude and adjusted rate differences, to be derived from case-control sampling within a defined cohort [9]. We have therefore noted earlier that, in situations with no additional data collection, limited to previously collected information, and with an interest in assessing the incidence and relative risk of health events, there is no reason to conduct a case-control sampling design. A cohort design will be more informative, is easier to communicate to readers, and is less prone to investigator errors [10].

There is nothing wrong with case-control sampling designs, yet people make mistakes

Theory and practice show over and over that well-designed and well-executed case-control sampling provides valid estimates of rate ratios compared with cohort studies. For example, an illustrative study of the association between statins and lung cancer incidence found, using the full cohort analysis of all 365 467 subjects with 1786 incident cases of lung cancer occurring during follow-up, a rate ratio of 1.02 (95% CI: 0.90 to 1.17), while a nested case-control design with 1:10 sampling within the cohort produced a corresponding rate ratio of 0.99 (95% CI: 0.85 to 1.16), well within sampling variation [11].
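
The agreement between a full cohort analysis and risk-set sampling can be checked on simulated data. The sketch below is not a reconstruction of the statin study: it assumes a hypothetical cohort with a binary, time-fixed exposure, exponential event times, and administrative censoring, and it uses the Mantel-Haenszel odds ratio over matched sets as a simple stand-in for conditional logistic regression. All function names and parameter values are invented for illustration.

```python
import random


def simulate_cohort(n, base_rate, rate_ratio, p_exposed, follow_up, rng):
    """Return (exit_time, exposed, is_case) tuples sorted by exit time."""
    subjects = []
    for _ in range(n):
        exposed = rng.random() < p_exposed
        rate = base_rate * (rate_ratio if exposed else 1.0)
        event_time = rng.expovariate(rate)
        is_case = event_time < follow_up
        subjects.append((min(event_time, follow_up), exposed, is_case))
    subjects.sort(key=lambda s: s[0])
    return subjects


def cohort_rate_ratio(subjects):
    """Crude rate ratio: events per unit of person-time, exposed vs unexposed."""
    events = {True: 0, False: 0}
    person_time = {True: 0.0, False: 0.0}
    for exit_time, exposed, is_case in subjects:
        person_time[exposed] += exit_time
        events[exposed] += is_case
    return (events[True] / person_time[True]) / (
        events[False] / person_time[False])


def nested_case_control_or(subjects, controls_per_case, rng):
    """Mantel-Haenszel odds ratio from risk-set sampled matched sets.

    Because subjects are sorted by exit time, everyone listed after a case
    is still at risk at that case's event time (future cases included).
    """
    n = len(subjects)
    numerator = 0.0
    denominator = 0.0
    for i, (_, exposed, is_case) in enumerate(subjects):
        if not is_case or n - (i + 1) < controls_per_case:
            continue
        controls = rng.sample(range(i + 1, n), controls_per_case)
        exposed_controls = sum(subjects[j][1] for j in controls)
        if exposed:
            numerator += controls_per_case - exposed_controls
        else:
            denominator += exposed_controls
    return numerator / denominator


if __name__ == "__main__":
    rng = random.Random(2019)  # fixed seed so the run is repeatable
    cohort = simulate_cohort(n=50000, base_rate=0.01, rate_ratio=2.0,
                             p_exposed=0.3, follow_up=5.0, rng=rng)
    print("cohort rate ratio:", round(cohort_rate_ratio(cohort), 2))
    print("nested case-control OR (1:10):",
          round(nested_case_control_or(cohort, 10, rng), 2))
```

Under these assumptions both estimates should land close to the simulated rate ratio of 2 and differ from each other only by sampling variation, which mirrors the pattern reported for the cohort and nested case-control analyses above.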

PUBLICATION RECORD

Publication year: 2019
Venue: Statistics in Medicine
Publication date: 2019-09-05
Fields of study: Medicine
Identifiers: DOI 10.1002/sim.8320; PMID 31489683
Source metadata: Semantic Scholar, PubMed

REFERENCES

Principles of confounder selection (2019)
Author response for "Glucose‐lowering medications and the risk of cancer: a methodological review of studies based on real‐world data" (2019)
Glucose‐lowering medications and the risk of cancer: A methodological review of studies based on real‐world data (2019)
Graphical Depiction of Longitudinal Study Designs in Health Care Databases (2019)
Previous Drug Exposure in Patients Hospitalised for Acute Liver Injury: A Case-Population Study in the French National Healthcare Data System (2018)
Limitations of empirical calibration of p‐values using observational data (2016)
Persistent User Bias in Case-Crossover Studies in Pharmacoepidemiology (2016)
The Quasi-cohort approach in pharmacoepidemiology: upgrading the nested case-control (2015)
Incretin-Based Drugs and the Risk of Congestive Heart Failure (2014)
Advanced Approaches to Controlling Confounding in Pharmacoepidemiologic Studies (2013)
Long-Term Use of Angiotensin Receptor Blockers and the Risk of Cancer (2012)
Time-window bias in case-control studies: statins and lung cancer (2011)
A basic study design for expedited safety signal evaluation based on electronic healthcare data (2010)
The Multitime Case-control Design for Time-varying Exposures (2010)
Is the association between inhaled beta-agonist use and life-threatening asthma because of confounding by severity? (1993)
Selection of controls in case-control studies. I. Principles (1992)
Selection of controls in case-control studies. III. Design options (1992)
Practical considerations in choosing between the case-cohort and nested case-control designs (1991)
The "case-control" study: valid selection of subjects (1985)
Estimability and estimation in case-referent studies (1976)
Estimation of relative risk from individually matched series (1970)