Graphical Depiction of Longitudinal Study Designs in Health Care Databases

S. Schneeweiss, J. Rassen, Jeffrey S. Brown, Kenneth R. Rothman, L. Happe, P. Arlett, G. D. Dal Pan, W. Goettsch, W. Murk, Shirley V. Wang

Published 2019 in Annals of Internal Medicine

ABSTRACT

The pharmacoepidemiologic and pharmacoeconomic analysis of databases containing administrative claims and electronic health records has become a routine source of evidence to support regulatory (1) and reimbursement (2) decisions, as well as efficient management of health care organizations. When decision makers understand the study design and analytic choices of a nonrandomized database study and recognize those choices as valid, they have confidence in their decisions based on the study's evidence about the comparative effectiveness and safety of medical products (3, 4). Generally, they consider nonexperimental database studies more difficult to review than randomized trials and see the increased complexity, greater variability in design and analysis options, and lack of consistency in presentation of design choices as key barriers to using database evidence for high-stakes decisions.

Unfortunately, some poorly designed studies have led to negative generalizations about the entire field of health care database research rather than a refined view that distinguishes robust evidence from less reliable evidence (5). Confounding from treatment selection based on outcome risk is well known to cause bias (6). Time-related study design flaws can also introduce large biases, including immortal time bias (7), reverse causation (8, 9), adjustment for causal intermediates, immeasurable time bias (10), and depletion of susceptibles (11, 12).

The methods sections of study reports should describe the study design and analytic choices clearly enough to allow the reader to judge the validity of findings. However, convoluted prose often makes it difficult for most readers to understand what methods were implemented or to identify avoidable design flaws. Design diagrams provide key information that needs to be considered when evidence is interpreted from pharmacoepidemiologic and pharmacoeconomic studies done with health care databases.
Improving transparency in how these studies are designed and implemented will make it easier for reviewers and decision makers to distinguish the useful from the flawed or irrelevant (13). Graphical study design representations were recommended by the most recent guidance for reporting on database studies from the REporting of studies Conducted using Observational Routinely collected health Data statement for pharmacoepidemiology (RECORD-PE) (14), as well as recently published consensus papers by 2 leading professional societies (15, 16).

We propose a simple framework of graphical representations that will clarify critical design choices in database analyses of the effectiveness and safety of medical products. A recent consensus statement laid out a set of parameters that define decisions in database study implementation, which, if reported, would increase reproducibility of studies (16). Building on these parameters, we sought to develop a visualization framework that describes study design implementation in a comprehensive, unambiguous, and intuitive way; contains a level of detail that enables reproduction of key study design variables; and uses standardized structure and terminology to simplify review and communication to a broad audience of decision makers. Our multistakeholder group comprised international leaders with more than 75 years of combined experience in academia, regulatory decisions, health technology assessment, journal leadership, payer decision making, and analyses of distributed health care data networks.

The example figures and templates are covered by a Creative Commons license. The PowerPoint figures are free to download and adapt, with appropriate attribution, from www.repeatinitiative.org/projects.html.

Terminology

The terminology we suggest for temporal anchors is frequently used in descriptions of database studies and in textbooks (17), as well as in the recently published consensus statement (15, 16).
We define 3 categories of temporal anchors (Table): base anchors, first-order anchors, and second-order anchors. Base anchors are defined in calendar time and describe the source database, that is, the longitudinal streams of administrative or clinical health care data from which an analyzable study data set is derived. First-order anchors are defined in patient event time rather than calendar time and specify the study entry or index date. Second-order anchors are also measured in patient event time and are defined relative to the first-order anchor. We provide more detail on each temporal anchor in the following section.

Table. Temporal Anchors

Study Design Implementation in Health Care Databases

The Nature of Health Care Databases Relevant to Effectiveness Research

Health care databases are derived from transactional databases that record clinical and administrative information for delivering and administering health care. As encounters occur and services are provided, records are generated and tallied. Each addition to the database comes with a service date stamp and is attributed to the patient via a unique patient identification number, thus generating longitudinal patient records of increasing duration. There is substantial literature describing the details of data integration, cleaning, and normalization (18-20).

For each patient, all encounters with the health care system that are reimbursable by health insurance (or are captured by the provider's electronic health record system) can be sorted by the service date in calendar time (Figure 1). Each encounter is associated with information on medical services, diagnoses, procedures, and similar events, plus information on payments (in claims data) or charges (in electronic health record data). The rules and algorithms that stem from a specific study implementation will then be applied to each patient's longitudinal data stream.
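The step described above, attributing date-stamped encounter records to patients and sorting each patient's stream by service date, can be illustrated with a minimal Python sketch. The `Encounter` fields, the example codes, and the function name `build_longitudinal_records` are hypothetical and chosen only for illustration; real claims or electronic health record tables have far richer structure.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Encounter:
    patient_id: str      # unique patient identification number
    service_date: date   # service date stamp, in calendar time
    kind: str            # e.g. "Dx", "Rx", "Lab", "V" (labels as in Figure 1)
    code: str            # hypothetical diagnosis, drug, or test code


def build_longitudinal_records(encounters):
    """Group encounters by patient and sort each stream by service date."""
    records = defaultdict(list)
    for enc in encounters:
        records[enc.patient_id].append(enc)
    for stream in records.values():
        stream.sort(key=lambda e: e.service_date)
    return dict(records)


# Invented transactional records, arriving in no particular order.
raw = [
    Encounter("P1", date(2017, 5, 2), "Rx", "drug_A"),
    Encounter("P2", date(2016, 1, 15), "V", "office_visit"),
    Encounter("P1", date(2016, 11, 20), "Dx", "dx_X"),
    Encounter("P1", date(2018, 2, 9), "Lab", "lab_Y"),
]

records = build_longitudinal_records(raw)
for enc in records["P1"]:
    print(enc.service_date.isoformat(), enc.kind, enc.code)
```

Study-specific rules would then be applied to each sorted stream in `records`, one patient at a time.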
The study implementation is usually oriented around an event-based timeline anchored to a key event, in contrast to the calendar time arrangement of the raw data (Figure 1) (21).

Figure 1. From transactional data to study implementation. Individual patient data are documented as encounters from various sources and are arranged in calendar time. This work is licensed under CC BY, and the original versions can be found at www.repeatinitiative.org/projects.html. Dx = diagnosis; E = exposure; Lab = laboratory test; O = outcome; Rx = drug dispensing; V = visit.

Dates and Time Windows

Certain principles guide the design and implementation of studies in health care data streams. One of the most important is temporality. Unlike in primary data collection, many measurements in health care databases (for example, patients' baseline characteristics) are measured by reviewing information recorded during multiple health care encounters over time. In primary data collection, a study participant's health state is usually established when the patient is thoroughly interviewed or examined at a study visit. Health care databases have no defined interview date with the investigator team; rather, studies rely on the occurrence of routine visits and other health care encounters to collect information that was recorded during provision of care. Thus, information that may be conceptualized as characterizing a point in time, such as baseline patient characteristics before the start of exposure, is actually recorded during a time window through a series of encounters.

Anchors in Calendar Time

For a database study to be reproducible, temporal anchors must be defined to specify the underlying longitudinal data used to create a study population (Table). The data extraction date is particularly important to record when working with recent data that are still fluid.
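The point made under Dates and Time Windows, that a baseline characteristic is assembled from encounters recorded during a time window rather than at a single visit, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the 180-day window, the anchor date, the condition labels, and the function names are all hypothetical.

```python
from datetime import date


def days_from_anchor(service_date, anchor_date):
    """Patient event time: days relative to the anchor (negative = before)."""
    return (service_date - anchor_date).days


def recorded_in_window(encounters, anchor_date, code, start_day, end_day):
    """True if `code` is recorded on any encounter with start_day <= day <= end_day."""
    return any(
        enc_code == code
        and start_day <= days_from_anchor(service_date, anchor_date) <= end_day
        for service_date, enc_code in encounters
    )


# Hypothetical patient: (service date, recorded code) pairs.
encounters = [
    (date(2016, 3, 1), "diabetes"),
    (date(2016, 12, 10), "hypertension"),
    (date(2017, 1, 20), "drug_A"),  # anchor event, e.g. start of exposure
]
anchor = date(2017, 1, 20)

# Assumed baseline window: 180 days, ending the day before the anchor.
has_hypertension = recorded_in_window(encounters, anchor, "hypertension", -180, -1)
has_diabetes = recorded_in_window(encounters, anchor, "diabetes", -180, -1)

print(has_hypertension)  # recorded 41 days before the anchor
print(has_diabetes)      # recorded 325 days before the anchor, outside the window
```

In this invented example the diabetes record falls outside the assumed window, so the covariate would be classified as absent. This is why the window boundaries need to be reported explicitly.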
The dynamic data flow in a health care database is stabilized by extracting and physically or virtually setting aside requested data for research purposes. However, some administrative records may be corrected or amended retroactively for up to 6 months or longer (22). If the underlying database has data that are dynamically updated over time, a study using the most recently available data extracted today will probably not be exactly replicated using data covering the same period but extracted a year later.

The source data range reflects the calendar date boundaries beyond which encounter information is not captured for patients. Investigators must be clear about the lag between the most recent update to the data source and the calendar time boundaries for data included in their study (study period). For example, investigators may access a data source where the tables containing up-to-date information on patient health care contacts are extracted on 1 January 2019 (data extraction date). The source data range included in those tables covers 1 January 2003 to 31 December 2018. The investigators, however, choose a study period that focuses on time after market entry of a drug and does not use the most recent 6 months, a period during which the data may be more fluid. The data extraction date and source data range do not need to be included in visualization of study design, but reporting them and archiving extracted longitudinal data will make study implementation reproducible (16).

Anchors in Patient Event Time

When an effectiveness or safety study is implemented in a longitudinal database, the time scale shifts from calendar time to patient event time. Specific algorithms define events in the patient timeline. As in randomized controlled trials, where the randomization date is the anchor date, the cohort entry date (CED, also called the index date) is the primary anchor in a nonrandomized database study (Table).
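The calendar-time example given earlier in this section (extraction on 1 January 2019, source data covering 1 January 2003 to 31 December 2018, and a study period that omits the most recent 6 months) can be written down as a small Python record. This is a sketch under assumptions: the class name `CalendarAnchors` is invented, the market entry date used as the start of the study period is hypothetical because the text does not give one, and "the most recent 6 months" is read here as July through December 2018.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CalendarAnchors:
    data_extraction_date: date
    source_data_start: date
    source_data_end: date
    study_period_start: date
    study_period_end: date

    def validate(self):
        """Check that the anchors are ordered consistently in calendar time."""
        if self.source_data_end > self.data_extraction_date:
            raise ValueError("source data cannot extend past the extraction date")
        if not (self.source_data_start <= self.study_period_start
                <= self.study_period_end <= self.source_data_end):
            raise ValueError("study period must lie within the source data range")
        return self


anchors = CalendarAnchors(
    data_extraction_date=date(2019, 1, 1),   # from the example in the text
    source_data_start=date(2003, 1, 1),      # from the example in the text
    source_data_end=date(2018, 12, 31),      # from the example in the text
    study_period_start=date(2010, 1, 1),     # HYPOTHETICAL market entry date
    study_period_end=date(2018, 6, 30),      # assumed reading of "most recent 6 months"
).validate()

# Lag between data extraction and the end of the study period, in days.
lag_days = (anchors.data_extraction_date - anchors.study_period_end).days
print(lag_days)
```

Storing these values alongside the archived extract is one simple way to report the base anchors even when they do not appear in the design diagram.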
The CED is the date when patients enter the analytic study population. For some study designs, study entry can be defined by an event date (as described under Nested Case-Control Study and in Self-Controlled Study Design Visualization in the Appendix). The CED is considered a first-order anchor because most other anchors and parameters used in study implementation w

PUBLICATION RECORD

Publication year
2019
Venue
Annals of Internal Medicine
Publication date
2019-03-12
Fields of study
Medicine
Identifiers
DOI 10.7326/M18-3079 PMID 30856654
Source metadata
Semantic Scholar, PubMed

REFERENCES

2018. Methods for addressing “innocent bystanders” when evaluating safety of concomitant vaccines
2018. Automated data-adaptive analytics for electronic healthcare data to study causal treatment effects
2018. The reporting of studies conducted using observational routinely collected health data statement for pharmacoepidemiology (RECORD-PE)
2017. Updating the Evidence of the Interaction Between Clopidogrel and CYP2C19-Inhibiting Selective Serotonin Reuptake Inhibitors: A Cohort Study and Meta-Analysis
2017. Effects of expanding the look‐back period to all available data in the assessment of covariates
2017. Using Design Thinking to Differentiate Useful From Misleading Evidence in Observational Research.
2017. Reporting to Improve Reproducibility and Facilitate Validity Assessment for Healthcare Database Studies V1.0
2017. Real-World Evidence: Useful in the Real World of US Payer Decision Making? How? When? And What Studies?
2017. Good practices for real‐world data studies of treatment and/or comparative effectiveness: Recommendations from the joint ISPOR‐ISPE Special Task Force on real‐world evidence in health care decision making
2017. Cardiovascular Safety of Tocilizumab Versus Tumor Necrosis Factor Inhibitors in Patients With Rheumatoid Arthritis: A Multi‐Database Cohort Study
2017. Reporting to Improve Reproducibility and Facilitate Validity Assessment for Healthcare Database Studies V1.0.
2016. Policies for Use of Real-World Data in Health Technology Assessment (HTA): A Comparative Study of Six HTA Agencies.
2016. The FDA's sentinel initiative—A comprehensive approach to medical product surveillance
2016. Agreement of treatment effects for mortality from routinely collected data and subsequent randomized trials: meta-epidemiological survey
2016. Data Extraction and Management in Networks of Observational Health Care Databases for Scientific Research: A Comparison of EU-ADR, OMOP, Mini-Sentinel and MATRICE Strategies
2016. Design and analysis choices for safety surveillance evaluations need to be tuned to the specifics of the hypothesized drug–outcome association
2015. A vaccine study design selection framework for the postlicensure rapid immunization safety monitoring program.
2015. Counterpoint: the treatment decision design.
2014. Fidelity Assessment of a Clinical Practice Research Datalink Conversion to the OMOP Common Data Model
2014. Opioid prescribing by multiple providers in Medicare: retrospective observational study of insurance claims
2013. Estimation using all available covariate information versus a fixed look‐back window for dichotomous covariates
2013. Orlistat and the risk of acute liver injury: self controlled case series study in UK Clinical Practice Research Datalink
2013. Effect of statin use on acute kidney injury risk following coronary artery bypass grafting.
2012. Comparative risk for angioedema associated with the use of drugs that target the renin-angiotensin-aldosterone system.
2012. The use of pioglitazone and the risk of bladder cancer in people with type 2 diabetes: nested case-control study
2012. Beyond the intention-to-treat in comparative effectiveness research
2012. When should case‐only designs be used for safety monitoring of medical products?
2012. Design considerations, architecture, and use of the Mini‐Sentinel distributed data system
2011. Utilizing Medicare claims data for real‐time drug safety evaluations: is it feasible?
2011. A combined comorbidity score predicted mortality in elderly patients better than existing scores.
2010. A basic study design for expedited safety signal evaluation based on electronic healthcare data
2010. Comparative Risk for Angioedema Associated with the Use of Drugs That Target the Renin-angiotensin-aldosterone System
2009. Cardiovascular Outcomes and Mortality in Patients Using Clopidogrel With Proton Pump Inhibitors After Percutaneous Coronary Intervention or Acute Coronary Syndrome
2008. Immortal time bias in pharmaco-epidemiology.
2008. Immeasurable time bias in observational studies of drug effects on mortality.
2008. Observational Studies Analyzed Like Randomized Experiments: An Application to Postmenopausal Hormone Therapy and Coronary Heart Disease
2007. Increasing Levels of Restriction in Pharmacoepidemiologic Database Studies of Elderly and Comparison With Randomized Trial Results
2007. Immortal time bias in observational studies of drug effects
2006. Tutorial in biostatistics: the self‐controlled case series method
2001. Causation of Bias: The Episcope
1998. Acute respiratory-tract infections and risk of first-time acute myocardial infarction.
1996. Evidence of Depression Provoked by Cardiovascular Medication: A Prescription Sequence Symmetry Analysis
1994. Evidence of the depletion of susceptibles effect in non-experimental pharmacoepidemiologic research.
1993. Adapting a clinical comorbidity index for use with ICD-9-CM administrative data: differing perspectives.
1991. The case-crossover design: a method for studying transient effects on the risk of acute events.
1981. Induction and latent periods.
1971. [Modern epidemiology].