Graphical Depiction of Longitudinal Study Designs in Health Care Databases

S. Schneeweiss, J. Rassen, Jeffrey S. Brown, Kenneth R. Rothman, L. Happe, P. Arlett, G. D. Dal Pan, W. Goettsch, W. Murk, Shirley V. Wang

Published 2019 in Annals of Internal Medicine

ABSTRACT

The pharmacoepidemiologic and pharmacoeconomic analysis of databases containing administrative claims and electronic health records has become a routine source of evidence to support regulatory (1) and reimbursement (2) decisions, as well as efficient management of health care organizations. When decision makers understand the study design and analytic choices of a nonrandomized database study and recognize those choices as valid, they have confidence in their decisions based on the study's evidence about the comparative effectiveness and safety of medical products (3, 4). Generally, they consider nonexperimental database studies more difficult to review than randomized trials and see the increased complexity, greater variability in design and analysis options, and lack of consistency in presentation of design choices as key barriers to using database evidence for high-stakes decisions. Unfortunately, some poorly designed studies have led to negative generalizations about the entire field of health care database research rather than a refined view that distinguishes robust evidence from less reliable evidence (5). Confounding from treatment selection based on outcome risk is well known to cause bias (6). Time-related study design flaws can also introduce large biases, including immortal time bias (7), reverse causation (8, 9), adjustment for causal intermediates, unobservable time bias (10), and depletion of susceptibles (11, 12). The methods sections of study reports should describe the study design and analytic choices clearly enough to allow the reader to judge the validity of findings. However, convoluted prose often makes it difficult for most readers to understand what methods were implemented or identify avoidable design flaws. Design diagrams provide key information that needs to be considered when evidence is interpreted from pharmacoepidemiologic and pharmacoeconomic studies done with health care databases. 
Improving transparency in how these studies are designed and implemented will make it easier for reviewers and decision makers to distinguish the useful from the flawed or irrelevant (13). Graphical study design representations were recommended by the most recent guidance for reporting on database studies from the REporting of studies Conducted using Observational Routinely collected health Data statement for pharmacoepidemiology (RECORD-PE) (14), as well as recently published consensus papers by 2 leading professional societies (15, 16).

We propose a simple framework of graphical representations that will clarify critical design choices in database analyses of the effectiveness and safety of medical products. A recent consensus statement laid out a set of parameters that define decisions in database study implementation, which, if reported, would increase reproducibility of studies (16). Building on these parameters, we sought to develop a visualization framework that describes study design implementation in a comprehensive, unambiguous, and intuitive way; contains a level of detail that enables reproduction of key study design variables; and uses standardized structure and terminology to simplify review and communication to a broad audience of decision makers. Our multistakeholder group comprised international leaders with more than 75 years of combined experience in academia, regulatory decisions, health technology assessment, journal leadership, payer decision making, and analyses of distributed health care data networks. The example figures and templates are covered by a Creative Commons license. The PowerPoint figures are free to download and adapt, with appropriate attribution, from www.repeatinitiative.org/projects.html.

Terminology

The terminology we suggest for temporal anchors is frequently used in descriptions of database studies and in textbooks (17), as well as in the recently published consensus statement (15, 16).
We define 3 categories of temporal anchors (Table): base anchors, first-order anchors, and second-order anchors. Base anchors are defined in calendar time and describe the source database, that is, the longitudinal streams of administrative or clinical health care data from which an analyzable study data set is derived. First-order anchors are defined in patient event time rather than calendar time and specify the study entry or index date. Second-order anchors are also measured in patient event time and are defined relative to the first-order anchor. We provide more detail on each temporal anchor in the following section.

Table. Temporal Anchors

Study Design Implementation in Health Care Databases

The Nature of Health Care Databases Relevant to Effectiveness Research

Health care databases are derived from transactional databases that record clinical and administrative information for delivering and administering health care. As encounters occur and services are provided, records are generated and tallied. Each addition to the database comes with a service date stamp and is attributed to the patient via a unique patient identification number, thus generating longitudinal patient records of increasing duration. There is substantial literature describing the details of data integration, cleaning, and normalization (18-20). For each patient, all encounters with the health care system that are reimbursable by health insurance (or are captured by the provider's electronic health record system) can be sorted by the service date in calendar time (Figure 1). Each encounter is associated with information on medical services, diagnoses, procedures, and similar events, plus information on payments (in claims data) or charges (in electronic health record data). The rules and algorithms that stem from a specific study implementation will then be applied to each patient's longitudinal data stream.
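
The arrangement of encounters into per-patient longitudinal records described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the `Encounter` record, its field names, and the `build_patient_timelines` function are hypothetical, and real claims or electronic health record data would carry many more fields.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, NamedTuple


class Encounter(NamedTuple):
    """One transactional record (hypothetical, simplified layout)."""
    patient_id: str      # unique patient identification number
    service_date: date   # service date stamp, in calendar time
    kind: str            # e.g., "Dx", "Rx", "Lab", "V" as in Figure 1
    code: str            # diagnosis, drug, or procedure code


def build_patient_timelines(
    encounters: Iterable[Encounter],
) -> Dict[str, List[Encounter]]:
    """Group records by patient and sort each group by service date."""
    timelines: Dict[str, List[Encounter]] = defaultdict(list)
    for enc in encounters:
        timelines[enc.patient_id].append(enc)
    for records in timelines.values():
        # Calendar-time ordering within each patient's data stream.
        records.sort(key=lambda e: e.service_date)
    return dict(timelines)


if __name__ == "__main__":
    raw = [
        Encounter("p1", date(2015, 3, 10), "Rx", "drugA"),
        Encounter("p2", date(2015, 5, 1), "V", "office"),
        Encounter("p1", date(2015, 1, 5), "Dx", "dx1"),
    ]
    for pid, records in build_patient_timelines(raw).items():
        print(pid, [(r.service_date.isoformat(), r.kind) for r in records])
```

Study-specific rules and algorithms would then operate on each patient's sorted list rather than on the raw transactional table.
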
The study implementation is usually oriented around an event-based timeline anchored to a key event, in contrast to the calendar time arrangement of the raw data (Figure 1) (21).

Figure 1. From transactional data to study implementation. Individual patient data are documented as encounters from various sources and are arranged in calendar time. This work is licensed under CC BY, and the original versions can be found at www.repeatinitiative.org/projects.html. Dx = diagnosis; E = exposure; Lab = laboratory test; O = outcome; Rx = drug dispensing; V = visit.

Dates and Time Windows

Certain principles guide the design and implementation of studies in health care data streams. One of the most important is temporality. Unlike in primary data collection, many measurements in health care databases (for example, patients' baseline characteristics) are measured by reviewing information recorded during multiple health care encounters over time. In primary data collection, a study participant's health state is usually established when the patient is thoroughly interviewed or examined at a study visit. Health care databases have no defined interview date with the investigator team; rather, studies rely on the occurrence of routine visits and other health care encounters to collect information that was recorded during provision of care. Thus, information that may be conceptualized as characterizing a point in time, such as baseline patient characteristics before the start of exposure, is actually recorded during a time window through a series of encounters.

Anchors in Calendar Time

For a database study to be reproducible, temporal anchors must be defined to specify the underlying longitudinal data used to create a study population (Table). The data extraction date is particularly important to record when working with recent data that are still fluid.
The dynamic data flow in a health care database is stabilized by extracting and physically or virtually setting aside requested data for research purposes. However, some administrative records may be corrected or amended retroactively for up to 6 months or longer (22). If the underlying database has data that are dynamically updated over time, a study using the most recently available data extracted today will probably not be exactly replicated using data covering the same period but extracted a year later. The source data range reflects the calendar date boundaries beyond which encounter information is not captured for patients. Investigators must be clear about the lag between the most recent update to the data source and the calendar time boundaries for data included in their study (study period). For example, investigators may access a data source where the tables containing up-to-date information on patient health care contacts are extracted on 1 January 2019 (data extraction date). The source data range included in those tables covers 1 January 2003 to 31 December 2018. The investigators, however, choose a study period that focuses on time after market entry of a drug and does not use the most recent 6 months, a period during which the data may be more fluid. The data extraction date and source data range do not need to be included in visualization of study design, but reporting them and archiving extracted longitudinal data will make study implementation reproducible (16).

Anchors in Patient Event Time

When an effectiveness or safety study is implemented in a longitudinal database, the time scale shifts from calendar time to patient event time. Specific algorithms define events in the patient timeline. As in randomized controlled trials, where the randomization date is the anchor date, the cohort entry date (CED, also called the index date) is the primary anchor in a nonrandomized database study (Table).
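
The calendar-time example and the shift to patient event time described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: the function names are hypothetical, the market entry date of 1 January 2010 is invented for illustration (the text does not give one), and "the most recent 6 months" is interpreted here as 6 calendar months before the end of the source data range.

```python
import calendar
from datetime import date
from typing import Tuple


def subtract_months(d: date, months: int) -> date:
    """Return the date `months` calendar months before `d`.

    The day is clamped to the last day of the target month
    (for example, 31 December minus 6 months gives 30 June).
    """
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def study_period(source_start: date, source_end: date,
                 market_entry: date, lag_months: int) -> Tuple[date, date]:
    """Derive study period boundaries from the source data range.

    Starts at market entry (or the start of the source data, if later)
    and stops `lag_months` before the end of the source data range.
    """
    start = max(source_start, market_entry)
    end = subtract_months(source_end, lag_months)
    if start > end:
        raise ValueError("study period is empty")
    return start, end


def to_event_time(service_date: date, cohort_entry_date: date) -> int:
    """Express a calendar date as days relative to the CED (day 0)."""
    return (service_date - cohort_entry_date).days


if __name__ == "__main__":
    # Source data range from the example in the text; the market
    # entry date below is an assumption made only for illustration.
    start, end = study_period(
        source_start=date(2003, 1, 1),
        source_end=date(2018, 12, 31),
        market_entry=date(2010, 1, 1),
        lag_months=6,
    )
    print(start, end)  # 2010-01-01 2018-06-30

    # An encounter 14 days before cohort entry falls on day -14.
    print(to_event_time(date(2015, 3, 1), date(2015, 3, 15)))  # -14
```

Negative values of `to_event_time` lie before cohort entry and positive values after it, which is the scale on which anchors defined relative to the CED are expressed.
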
The CED is the date when patients enter the analytic study population. For some study designs, study entry can be defined by an event date (as described under Nested Case-Control Study and in Self-Controlled Study Design Visualization in the Appendix). The CED is considered a first-order anchor because most other anchors and parameters used in study implementation w
