# Advanced topics {#sec-advanced}
The previous sections describe propensity score analysis in the simplest case, that of a binary treatment administered at a single time point. In reality, research is often more complicated, and more complicated research questions require adjustments to this simple case. Below, we briefly outline some more complicated scenarios, including subgroup analysis, multi-category and continuous treatments, longitudinal/sequential treatments, and missing data. In practice, one should consult with a biostatistician specially trained in these methods rather than attempt them oneself.
## Subgroup analysis
Subgroup analysis is required to understand how treatments affect different types of patients and to be able to provide reasoned recommendations when information about patients is available (in contrast to the broad policy-based recommendations implied by the usual estimands). Subgroup analysis can be done simply by performing separate analyses within each subgroup [@greenExaminingModerationAnalyses2014], though in some cases it can be beneficial to share information (e.g., estimation of the propensity score or outcome model) across subgroups [@dongSubgroupBalancingPropensity2020].
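As an illustration of the first approach, below is a minimal sketch of subgroup-wise propensity score weighting using `WeightIt`; the dataset and variable names (`treat`, `x1`, `x2`, `subgroup`, `y`) are simulated stand-ins, not from any real study.

```r
library(WeightIt)

# Simulated data with hypothetical variable names: binary treatment `treat`,
# covariates `x1` and `x2`, outcome `y`, and a subgroup factor `subgroup`
set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                subgroup = factor(sample(c("A", "B"), n, replace = TRUE)))
d$treat <- rbinom(n, 1, plogis(d$x1 - d$x2))
d$y <- d$treat + d$x1 + rnorm(n)

# Separate propensity score weighting analysis within each subgroup
fits <- lapply(split(d, d$subgroup), function(sub) {
  w <- weightit(treat ~ x1 + x2, data = sub, method = "glm", estimand = "ATE")
  # Weighted outcome regression; the coefficient on `treat` estimates the
  # subgroup-specific ATE (robust standard errors should be used in practice)
  lm(y ~ treat, data = sub, weights = w$weights)
})
lapply(fits, coef)
```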
It is also important to remember that performing subgroup analysis does not allow one to make a causal claim about the effect of subgroup membership on the treatment effect unless additional work is done to remove confounding from subgroup membership. For example, one may be interested in a subgroup analysis stratified by hospital. The treatment effect in one hospital may differ from that in another, but that does not mean the choice of hospital causes the difference in treatment effects (e.g., because of differing quality of care); it may simply be that one hospital caters to patients for whom treatment is less effective (e.g., because of systemic issues that cause people both to suffer from comorbidities that change the treatment effect and to live closer to one hospital than another). This distinction between a scenario in which subgroup membership causes treatment effect heterogeneity and one in which subgroup membership is merely associated with treatment effect heterogeneity is described in detail by @vanderweeleDistinctionInteractionEffect2009.
## Multi-category and continuous treatments
Treatments do not have to be binary to be used with propensity score analysis; methods also exist for multi-category and continuous treatments. An example of a multi-category treatment might be drug type in a study comparing two drugs to each other and to a control. @strasserEstimatesSARSCoV2Omicron2022 treated virus variant as a multi-category exposure when examining the effect of COVID-19 subvariant (Delta, Omicron, and Omicron BA.2) on patient health outcomes. An example of a continuous treatment might be pollutant exposure in a study of its effect on mortality, as examined by @wuMatchingGeneralizedPropensity2022.
### Multi-category treatments
Estimating effects for a multi-category treatment involves adjusting the sample so that the distributions of covariates in all categories resemble each other and some target population corresponding to the estimand of interest. This can be done using matching [@lopezEstimationCausalEffects2017] or weighting [@mccaffreyTutorialPropensityScore2013]. Instead of a single-valued propensity score, each unit has a vector-valued "generalized" propensity score corresponding to the probability of receiving each level of treatment [@imbensRolePropensityScore2000]. For example, for a three-level treatment, an individual unit may have a generalized propensity score of $[.1, .4, .5]$. @mccaffreyTutorialPropensityScore2013 and @liPropensityScoreWeighting2019 describe how to use these generalized propensity scores to compute weights. Currently, weighting methods are better developed and easier to use than matching methods for multi-category treatments and are available in `WeightIt`.
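Below is a minimal sketch of weighting for a multi-category treatment with `WeightIt` and balance assessment with `cobalt`, assuming recent versions of both packages; the data and variable names are hypothetical.

```r
library(WeightIt)
library(cobalt)

# Simulated data with hypothetical names: a three-category treatment `drug`
set.seed(2)
n <- 600
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$drug <- factor(sample(c("control", "drugA", "drugB"), n, replace = TRUE))
d$y <- (d$drug == "drugA") + d$x1 + rnorm(n)

# With a multi-category treatment, method = "glm" fits a multinomial
# logistic regression, yielding a generalized propensity score: a vector
# of probabilities of receiving each treatment level
w <- weightit(drug ~ x1 + x2, data = d, method = "glm", estimand = "ATE")

# Assess covariate balance across treatment groups in the weighted sample
bal.tab(w)
```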
### Continuous treatments
The usual estimand for a continuous treatment is the average dose-response function (ADRF), which links the expected potential outcome (i.e., the average outcome if everyone was assigned to a single treatment value) to the corresponding treatment level. Propensity score analysis for continuous treatments involves adjusting the sample so that the treatment is independent of the covariates. This can be done using matching [@wuMatchingGeneralizedPropensity2022] or weighting [@robinsMarginalStructuralModels2000; @zhuBoostingAlgorithmEstimating2015; @hulingIndependenceWeightsCausal2023a] (available in `WeightIt`). The propensity score is instead represented as a single-valued generalized propensity score corresponding to the conditional density of treatment given the covariates (i.e., rather than the probability, which would be 0 for all values of a truly continuous treatment) [@hiranoPropensityScoreContinuous2005]. Balance is often assessed using the correlations between the treatment and each covariate in the adjusted sample [@austinAssessingCovariateBalance2019], though more holistic measures such as the distance covariance have also been developed [@hulingIndependenceWeightsCausal2023a]. To estimate the ADRF, one can fit a flexible model for the outcome given the treatment in the weighted sample.
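Here is a hedged sketch of this workflow with `WeightIt` and `cobalt`; the data, variable names, and the choice of a natural cubic spline for the outcome model are illustrative assumptions, not prescriptions.

```r
library(WeightIt)
library(cobalt)
library(splines)

# Simulated data with hypothetical names: continuous exposure `dose`
set.seed(3)
n <- 600
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$dose <- d$x1 + rnorm(n)
d$y <- d$dose + d$x1 + rnorm(n)

# Generalized propensity score weights based on the conditional density
# of treatment given the covariates (a normal density model by default)
w <- weightit(dose ~ x1 + x2, data = d, method = "glm")

# Balance: treatment-covariate correlations in the weighted sample
bal.tab(w, stats = "correlations")

# Flexible weighted outcome model; predictions over a grid of dose values
# trace out an estimate of the ADRF
fit <- lm(y ~ ns(dose, df = 4), data = d, weights = w$weights)
predict(fit, newdata = data.frame(dose = seq(-2, 2, by = 0.5)))
```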
## Longitudinal/sequential treatments
Methods have been developed for estimating the effect of a treatment that can occur at multiple time points. For example, @robinsMarginalStructuralModels2000 described methods for estimating the effect of zidovudine (AZT) treatment on mortality in HIV-infected patients, where treatment was defined each day since the start of follow-up as the dose of AZT received that day. These special methods must be used when confounding is time-varying, i.e., when confounders of subsequent treatment and the outcome are themselves affected by previous treatments. Simply adjusting for these time-varying confounders by regression adjustment or standard propensity score analysis causes the same problems that adjusting for any post-treatment variable does.
The methods used for adjusting for time-varying confounding are called "g-methods" and are described in @hernanCausalInferenceWhat2020. The simplest one is inverse probability weighting of marginal structural models, which essentially involves creating a propensity score weight at each time point and multiplying them together (available in `WeightIt`); ideally, this yields a scenario analogous to one in which treatment is randomized at each time point. @thoemmesPrimerInverseProbability2016 and @robinsMarginalStructuralModels2000 provide clear examples of the method.
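Below is a minimal simulated sketch of this approach using `WeightIt`'s `weightitMSM()`, which estimates one propensity score model per time point and multiplies the resulting weights; all variable names are hypothetical.

```r
library(WeightIt)

# Simulated two-period example (hypothetical names): treatment a1 affects
# the time-varying confounder x2, which affects treatment a2 and outcome y
set.seed(4)
n <- 500
d <- data.frame(x1 = rnorm(n))
d$a1 <- rbinom(n, 1, plogis(d$x1))
d$x2 <- d$x1 + d$a1 + rnorm(n)
d$a2 <- rbinom(n, 1, plogis(d$x2))
d$y  <- d$a1 + d$a2 + d$x1 + rnorm(n)

# One propensity score model per time point; weightitMSM() multiplies the
# inverse probability weights across time points
w <- weightitMSM(list(a1 ~ x1,
                      a2 ~ x1 + a1 + x2),
                 data = d, method = "glm")

# Marginal structural model for the joint effect of treatment at both
# time points (robust standard errors should be used in practice)
fit <- lm(y ~ a1 + a2, data = d, weights = w$weights)
coef(fit)
```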
## Missing data
Missing data is often present in the analysis of real datasets. There are a variety of reasons why data could be missing: an administrative error, loss to follow-up, or participant refusal to provide information are some examples. Handling missing data correctly is a serious topic that requires expertise, but there are mainstream methods that are compatible with propensity score analysis and can yield accurate results when certain assumptions about why the data are missing are met [@chamPropensityScoreAnalysis2016]. The most common methods for dealing with missing data in propensity score analysis are multiple imputation [@rubinMultipleImputationNonresponse2004] and censoring weights [@hernanCausalInferenceWhat2020, Ch 12.6].
Imputation involves making a guess about the true value of each missing value. This guess often comes from a predictive model that describes the relationships among variables in the data. Instead of making a single guess, multiple imputation involves making many guesses, each stored in a separate version of the dataset with the guesses filled in. The analysis occurs in each imputed dataset, and then the results are pooled across datasets to arrive at a final single estimate. Although there have been doubts about the best way to perform propensity score analysis with multiply imputed data, simulations frequently verify that the standard approach described above yields the most accurate results [@leyratPropensityScoreAnalysis2019]. The `MatchThem` package provides utilities for matching and weighting with multiply imputed data [@pishgarMatchThemMatchingWeighting2021a], and `cobalt` supports assessing balance across imputations [@greiferCobaltCovariateBalance2020].
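A rough sketch of this workflow using `mice` for imputation and `MatchThem` for weighting within each imputed dataset follows; the data are simulated, and details such as the imputation model and variance estimation deserve more care in a real analysis.

```r
library(mice)
library(MatchThem)
library(survey)

# Simulated data (hypothetical names); covariate x2 has missing values
set.seed(5)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$treat <- rbinom(n, 1, plogis(d$x1))
d$y <- d$treat + d$x1 + rnorm(n)
d$x2[sample(n, 50)] <- NA

imp <- mice(d, m = 5, printFlag = FALSE)  # multiple imputation

# Propensity score weighting performed within each imputed dataset
wts <- weightthem(treat ~ x1 + x2, datasets = imp,
                  approach = "within", method = "glm", estimand = "ATE")

# Fit a survey-weighted outcome model in each dataset, then pool the
# results across imputations with Rubin's rules
fits <- with(wts, svyglm(y ~ treat))
summary(pool(fits))
```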
Censoring weights are an alternative to imputation that are more commonly used when a single variable, e.g., the outcome, is missing for some units. Censoring weights discard any units with missing data and weight the remaining units to resemble the full sample (i.e., the original sample that included those with missing data) [@hernanCausalInferenceWhat2020, Ch 12.6]. Censoring weights are multiplied by propensity score weights when both are used to create a final set of weights that adjust for both confounding and censoring. Censoring weights are especially common with longitudinal treatments and in survival analysis.
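To make the mechanics concrete, here is a minimal base-R sketch of combining censoring weights with propensity score weights; the variable names (`obs` as an indicator of the outcome being observed, etc.) and the simple logistic models are illustrative assumptions.

```r
# Simulated data (hypothetical names); the outcome y is missing for some units
set.seed(6)
n <- 500
d <- data.frame(x1 = rnorm(n))
d$treat <- rbinom(n, 1, plogis(d$x1))
d$obs <- rbinom(n, 1, plogis(1 + d$x1))  # observation depends on x1
d$y <- ifelse(d$obs == 1, d$treat + d$x1 + rnorm(n), NA)

# Model the probability of being observed given covariates and treatment
cens_fit <- glm(obs ~ x1 + treat, data = d, family = binomial)
d$cens_w <- 1 / fitted(cens_fit)  # inverse probability of being observed

# Propensity score weights for confounding (simple ATE weights)
ps <- fitted(glm(treat ~ x1, data = d, family = binomial))
d$ps_w <- ifelse(d$treat == 1, 1 / ps, 1 / (1 - ps))

# Final weights: the product of the censoring and propensity score weights,
# applied to the observed (uncensored) units only
d$w <- d$cens_w * d$ps_w
fit <- lm(y ~ treat, data = d[d$obs == 1, ], weights = w)
coef(fit)
```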