smdi

An R package to perform structural missing data investigations for real-world evidence studies

Division of Pharmacoepidemiology and Pharmacoeconomics
Brigham and Women’s Hospital
Harvard Medical School

October 26, 2023

Disclosures

  • Janick Weberpals reports prior employment by Hoffmann-La Roche and previously held shares in Hoffmann-La Roche
  • This project was supported by Task Order 75F40119F19002 under Master Agreement 75F40119D10037 from the U.S. Food and Drug Administration (FDA)

Background

Administrative insurance claims databases are increasingly linked to electronic health records (EHR) to improve confounding adjustment for variables that cannot be measured in claims data alone

Examples:

  • Labs (HbA1c, LDL, etc.)
  • Vitals (Blood pressure, BMI, etc.)
  • Disease-specific data (cancer stage, biomarkers, etc.)
  • Physician assessments (ECOG, etc.)
  • Lifestyle factors (smoking, alcohol, etc.)

These covariates are often only partially observed, for various reasons:

  • Physician did not perform/order a certain test
  • Certain measurements are collected only for particularly sick patients
  • Information is ‘hiding’ in unstructured records, e.g. clinical notes

Knowledge gaps and objectives

Missing data in EHR-derived confounders are common

Two common missing data taxonomies

  • Mechanisms: Missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR)
  • Patterns: Monotone, Non-monotone
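
To make the two taxonomies concrete, the following is a minimal simulation sketch in Python (standard library only). It is not part of the smdi package: the function names `simulate` and `is_monotone`, the covariates `x` and `y`, and all coefficients are hypothetical choices made for illustration. Under MCAR the missingness probability is constant, under MAR it depends on the fully observed covariate `x`, and under MNAR it depends on the partially observed covariate `y` itself. A pattern is treated as monotone when the sets of rows with missing values are nested across columns.

```python
import math
import random


def logistic(z):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(mechanism, n=5000, seed=42):
    """Return n rows (x, y); y is None when missing (hypothetical setup)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)            # fully observed covariate
        y = 0.5 * x + rng.gauss(0.0, 1.0)  # partially observed covariate
        if mechanism == "MCAR":
            p_miss = 0.3                        # independent of all data
        elif mechanism == "MAR":
            p_miss = logistic(-1.0 + 1.5 * x)   # depends on observed x only
        elif mechanism == "MNAR":
            p_miss = logistic(-1.0 + 1.5 * y)   # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((x, None if rng.random() < p_miss else y))
    return rows


def is_monotone(rows):
    """True if the per-column sets of missing rows are nested."""
    n_cols = len(rows[0])
    missing = [
        {i for i, row in enumerate(rows) if row[j] is None}
        for j in range(n_cols)
    ]
    missing.sort(key=len)  # smallest missing set first
    return all(a <= b for a, b in zip(missing, missing[1:]))


if __name__ == "__main__":
    for mech in ("MCAR", "MAR", "MNAR"):
        rows = simulate(mech)
        share = sum(y is None for _, y in rows) / len(rows)
        print(mech, "share missing:", round(share, 3))
```

In this sketch the MAR and MNAR cases differ only in which variable drives the missingness. That is why the two are hard to tell apart from observed data alone.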

Unresolved challenges for causal inference:

  • In an empirical study, it is usually unclear which missing data patterns and mechanisms dominate.
  • Which relationships exist among covariates, and can partially observed covariates be recovered in high-dimensional covariate spaces (e.g., database linkages)?
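
One simple way to probe such covariate relationships is to compare a fully observed covariate between patients with and without an observed value of the partially observed covariate. The sketch below computes an absolute standardized mean difference for that comparison. It illustrates a common diagnostic and is not taken from the framework described here: the function names and the toy `age` and `hba1c` values are hypothetical.

```python
import math
from statistics import mean, variance


def abs_smd(group_a, group_b):
    """Absolute standardized mean difference with a pooled SD."""
    pooled_sd = math.sqrt((variance(group_a) + variance(group_b)) / 2.0)
    diff = abs(mean(group_a) - mean(group_b))
    if pooled_sd == 0:
        # No spread in either group: equal means give 0, otherwise infinity.
        return 0.0 if diff == 0 else math.inf
    return diff / pooled_sd


def smd_by_missingness(covariate, partial):
    """Compare `covariate` between rows where `partial` is observed vs. missing."""
    observed = [c for c, p in zip(covariate, partial) if p is not None]
    missing = [c for c, p in zip(covariate, partial) if p is None]
    return abs_smd(observed, missing)


if __name__ == "__main__":
    # Hypothetical toy data: HbA1c is missing for the older patients.
    age = [50, 55, 60, 70, 75, 80]
    hba1c = [6.1, 5.9, 6.4, None, None, None]
    print(smd_by_missingness(age, hba1c))
```

A large difference suggests that missingness is related to observed patient characteristics, which argues against MCAR. It cannot by itself separate MAR from MNAR.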

Objectives

Objectives of the Sentinel Innovation Center Causal Inference Workstream

  • Develop a framework and tools to assess the structure of missing data processes in EHR studies
  • Connect this with the most appropriate analytical approach, followed by sensitivity analyses
  • Develop an R package to implement the framework and run missing data investigations on a routine basis

Assumed missingness structures