# Reporting the Analysis {#sec-reporting}
Reporting the analysis accurately and completely is critical to ensuring the audience can correctly interpret the results of a study and replicate the methodology. Failure to report the main analytic strategy is commonly noted in systematic reviews of the use of propensity score analysis in medical research [@arguellesSystematicReviewPropensity2022; @zakrisonSystematicReviewPropensity2018]. The main aspects to report are the balancing method and its performance, the procedure for estimating the effect, the effect itself, and the limitations of the conclusions.
There are often constraints on how much space one can use to describe results and methods, so it is okay to summarize some of this information, though ideally it should be available in its entirety somewhere in the article, possibly in supplementary materials.
## Reporting the balancing method and results
The balancing method is central to the validity of the analysis, so its nature and performance must be reported for readers to be able to correctly interpret the results. Below are some of the details that should be reported:
- The conditioning method used (e.g., matching, weighting, or subclassification) and the specifics of the method. For example, for matching, was optimal or nearest neighbor matching used? Was matching done with or without replacement? Were calipers or exact matching constraints applied? What was the distance measure used to define close matches? For weighting, was propensity score weighting or entropy balancing used? Were the weights trimmed?
- How the propensity score, if any, was estimated. For example, was logistic regression used? Were any polynomial terms, interactions, or covariate transformations included in the model? Was generalized boosted modeling (GBM) used? How were the tuning parameters selected (e.g., optimizing cross-validation accuracy or a multivariate measure of balance)?
- The intended estimand and the estimand that resulted from the analysis. For example, perhaps the ATT was targeted using nearest neighbor matching, but the only way to achieve balance was with a caliper, which changes the estimand to one closer to the ATO. Or perhaps there was not enough overlap to estimate the ATE using entropy balancing, so ATO weights were used instead. Ideally this also includes a rationale for the choice of estimand, which should be reflected in the body of the paper (e.g., whether the paper implies a universal treatment policy or a policy applied only to a certain group of individuals).
- The covariate balance results of the conditioning procedure. Balance is often reported using a table or plot (e.g., a love/dot plot) that contains the balance statistics (e.g., the SMD and KS statistic) for each covariate. Balance should also be summarized in a way that provides a more complete story than the univariate statistics computed on the original variables can tell, e.g., by mentioning the worst balance across all squares, cubes, and fourth powers of the covariates and all two-way interactions (individual balance statistics on these do not need to be reported, but demonstrating that balance is achieved on these aspects of the covariate distribution lends more credibility to the results; see the sketch following this list). For example, entropy balancing guarantees SMDs of zero on all covariates and requested transformations thereof; simply mentioning this fact and that entropy balancing was successful provides just as much information as presenting SMDs for each covariate individually.
- The effective sample size (ESS) of the adjusted sample, reported for each treatment group separately. A nonsignificant result found in a sample with a low ESS would be interpreted exactly as it would be in any underpowered study, and it is important that readers understand the sample size context when judging the results of a study. The raw sample size does not provide this information; for example, weighting does not change the raw sample size but can dramatically reduce the effective sample size, and readers must be made aware of this.
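As a concrete illustration, `cobalt` can assemble most of these balance and ESS summaries at once. The sketch below is purely illustrative: it uses the `lalonde` dataset that ships with `cobalt`, and the weighting specification is for demonstration, not a recommendation.

```r
library(cobalt)
library(WeightIt)

data("lalonde", package = "cobalt")

# Illustrative propensity score weighting for the ATT (covariate
# choices are for demonstration only):
W <- weightit(treat ~ age + educ + race + married + re74,
              data = lalonde, method = "glm", estimand = "ATT")

# Balance table with SMDs and KS statistics for each covariate, including
# squares through fourth powers and all two-way interactions, along with
# effective sample sizes before and after weighting:
bal.tab(W, stats = c("m", "ks"), un = TRUE, poly = 4, int = TRUE)

# A love plot of absolute SMDs for the paper or supplement:
love.plot(W, stats = "m", abs = TRUE, thresholds = c(m = .1))
```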
Some of these aspects might be unfamiliar to a reader, so it is useful to include a short description of them, especially when a method is newer or would be confusing without proper context. For example, a sentence describing the use of entropy balancing might go as follows:
> To adjust for confounding by measured confounders, we used entropy balancing [@hainmuellerEntropyBalancingCausal2012], a version of propensity score weighting that guarantees exact balance on the covariate means while minimizing the variability of the weights without explicitly modeling a propensity score.
A footnote describing the ESS might go as follows:
> The effective sample size (ESS) is an estimate of the size of a hypothetical unweighted sample that carries the same precision as our weighted sample; it reflects the loss in precision due to weighting, analogous to discarding units when matching.
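For reference, the ESS described in this footnote is computed with Kish's formula, which is what `WeightIt` and `cobalt` report. A minimal sketch:

```r
# Kish's formula: the effective sample size of a weighted sample
ess <- function(w) sum(w)^2 / sum(w^2)

ess(rep(1, 100))        # with uniform weights, ESS equals n (here, 100)
ess(c(rep(1, 99), 50))  # one extreme weight sharply reduces the ESS (~8.5)
```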
Of course, more detail on the methods is always better, and word count restrictions should not be an excuse for incompletely reporting the most critical part of a study's methodology.
## Reporting the effect estimation procedure
It is also important to report the effect estimation procedure, as how the effect is estimated can affect its interpretation. The following elements should be reported:
- The outcome model used. Was it a linear, logistic, Cox, or Poisson regression model? Were covariates included? If so, how were they included (e.g., as main effects or fully interacted with treatment)? How were covariates selected for the model (e.g., all were included, only those thought to explain the most variability in the outcome, only those with some remaining imbalance, etc.)? One must also clarify how the weights were incorporated into the model (i.e., by fitting a weighted regression).
- The method of estimating the treatment effect from the model. Although in this document we have recommended g-computation, in some cases simply using the coefficient on treatment in the outcome model is sufficient for estimating the treatment effect (i.e., with linear outcome models or with models that lack any covariates). As previously mentioned, it is critical that the coefficient on treatment not be used as an effect estimate when covariates are included in the outcome model and the estimand is a marginal effect.
- The method of computing the SEs and CIs. Was it the delta method? Was a robust or cluster-robust standard error used? Was bootstrapping used, and if so, was it the traditional bootstrap or a variant? How were CIs extracted from the bootstrap procedure (e.g., using percentiles or bias-corrected and accelerated [BCa] CIs)? Remember that under no circumstances should the maximum likelihood estimates of the SEs that come from the models be used for inference; matching and weighting require special treatment of the estimates for them to be valid. A sketch combining these elements follows this list.
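To make these choices concrete, below is a minimal sketch of one such workflow: a weighted outcome regression with covariates fully interacted with treatment, g-computation via `marginaleffects`, and a robust SE in place of the model's maximum likelihood SEs. The data and model specification continue the illustrative `lalonde` example and are not a recommendation.

```r
library(WeightIt)
library(marginaleffects)

data("lalonde", package = "cobalt")
W <- weightit(treat ~ age + educ + race + married + re74,
              data = lalonde, method = "glm", estimand = "ATT")

# Weighted outcome model with covariates fully interacted with treatment:
fit <- glm(re78 ~ treat * (age + educ + race + married + re74),
           data = lalonde, weights = W$weights)

# G-computation for the ATT: contrasts averaged over the treated units,
# with a robust (HC3) SE rather than the model's maximum likelihood SEs:
avg_comparisons(fit, variables = "treat",
                newdata = subset(lalonde, treat == 1),
                wts = W$weights[lalonde$treat == 1],
                vcov = "HC3")
```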
## Reporting the effect estimate
Of course, one needs to report the effect estimate itself. Ideally the estimate and its CI are on a natural, interpretable scale (e.g., the risk ratio rather than the log odds ratio), though it may also be useful to include the scale on which inference is performed. For example, one may have computed the log risk ratio in order to compute its standard error and the p-value for the test that it is equal to 0 (i.e., that the risk ratio is equal to 1); these can be reported, but the critical, clinically useful measure is the risk ratio itself and its CI.
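A sketch of how this might look in code, continuing the previous sketch (reusing `W` and `lalonde`) with a hypothetical binary outcome (employment, coded here as `re78 > 0`): `comparison = "lnratioavg"` performs inference on the log risk ratio, and `transform = exp` returns the estimate and CI on the risk ratio scale, while the p-value still corresponds to the test that the log risk ratio is 0.

```r
# Weighted logistic model for a hypothetical binary outcome:
fit2 <- glm(I(re78 > 0) ~ treat * (age + educ + race + married + re74),
            data = lalonde, weights = W$weights, family = quasibinomial)

# Inference on the log risk ratio; results reported as a risk ratio:
avg_comparisons(fit2, variables = "treat",
                comparison = "lnratioavg", transform = exp,
                newdata = subset(lalonde, treat == 1),
                wts = W$weights[lalonde$treat == 1],
                vcov = "HC3")
```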
## Reporting software
In order to ensure results are replicable, one must report the specific software used and the version of that software. For example, one must specify that the analysis was run in R (including the specific version of R used) and must name and cite all R packages used in the analysis (and their versions). Instructions for citing a specific R package can be found by running `citation("pkg_name")`, e.g., `citation("MatchIt")`. In some cases, using a specific function in an R package has additional citation requirements. For example, when using optimal pair matching in `MatchIt`, in addition to citing `MatchIt`, one must also cite `optmatch`, which `MatchIt` uses under the hood. This is explained in the documentation for using optimal pair matching with `MatchIt`, which can be accessed by running `?method_optimal`.
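For example, the relevant citation and version information can be gathered directly in R:

```r
citation("MatchIt")        # how to cite the MatchIt package
packageVersion("MatchIt")  # the installed version, for the methods section

# R version and versions of all attached packages in one report,
# useful for a supplement:
sessionInfo()
```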
Often, the packages cited will include `MatchIt` or `WeightIt` for performing the matching or weighting, any package mentioned on the documentation page for the specific method used, `cobalt` for assessing balance, and `marginaleffects` for estimating the treatment effect. For survival outcomes, the `survival` package might be used. Citing packages is critical for ensuring the work done by package authors is correctly attributed. It also ensures someone attempting to replicate one's work can do so without having to infer the software used for the analysis.
## Reporting limitations
In order to provide context for the estimate and prevent misinterpretation, it is important to report the limitations of the study. Limitations often come in the form of assumptions that cannot be verified but that would change the interpretation of the results if false. For example, if it is impossible to verify that all confounders of the treatment and outcome have been collected and adjusted for, the estimate cannot be interpreted as causal and may be biased for the true causal effect, in which case the inability to make a definitive causal claim is a limitation. If the most useful or desired estimand was the ATE, but aspects of the sample and analysis required that the ATO be estimated instead to preserve precision or achieve balance, the inability to generalize the effect to the intended population would be a limitation.
These are in addition to the limitations one would report even in a clinical trial, e.g., with respect to measurement error in the treatment, covariates, or outcome, the timing of the outcome, treatment compliance, missing data, etc. The limitations section of a paper does not have to be long, but anticipating criticism of the paper by acknowledging its limitations can go a long way toward getting it accepted by reviewers.