
Panel Quality vs. Quantity Simulation

Research framework for comparing large sparse panels vs. small rich panels for advertising measurement.

Overview

This project implements a simulation study to answer: When measuring advertising effectiveness, is it better to invest in panel size (quantity) or data quality?

Key Trade-off:

  • Large Sparse (LS): N=50,000, few covariates, noisy measurements
  • Small Rich (SR): N=4,000, many covariates, clean measurements

Files

Core Files

  1. config.R: All parameters and settings
  2. population_generator.R: Generate UK-like reference population
  3. panel_generator.R: Sample LS and SR panels from population
  4. outcome_generator.R: Generate confounded treatment and outcomes
  5. advertising_model.stan: Bayesian logistic regression model
  6. estimation.R: Estimate treatment effects (frequentist & Bayesian)
  7. decision_analysis.R: Decision-theoretic evaluation
  8. visualization.R: Plotting functions
  9. main_simulation.R: Main simulation pipeline

Documentation

  • RESEARCH_DESIGN.md: Comprehensive research design document
  • README.md: This file

Installation

R Packages

install.packages(c(
  "tidyverse",
  "rstan",
  "furrr",
  "patchwork",
  "scales",
  "cli",
  "here"
))

Stan

Ensure Stan is properly installed. See: https://mc-stan.org/users/interfaces/rstan

Test installation:

library(rstan)
example(stan_model, package = "rstan", run.dontrun = TRUE)

Quick Start

1. Pilot Run (Fast, ~10 minutes)

Test the pipeline with a small number of scenarios:

source("main_simulation.R")
results <- main(pilot = TRUE, n_cores = 4)

This runs:

  • 10 replications per parameter combination
  • 3 confounding levels (1, 2, 5)
  • 2 effect sizes (0.18, 0.25)
  • Both panel types
  • Total: 120 simulation runs (3 × 2 × 2 = 12 parameter combinations × 10 replications)

2. Full Simulation (Slow, ~48 hours)

source("main_simulation.R")
results <- main(pilot = FALSE, n_cores = 8)

This executes all 60,000 simulation runs (120 parameter combinations × 500 replications).

3. View Results

# Load results
results <- readRDS("simulation_results/data/simulation_results.rds")

# View summary table (a CSV, so use read.csv rather than readRDS)
summary_table <- read.csv("simulation_results/tables/summary_table.csv")

# View plots
list.files("simulation_results/plots")

Usage Examples

Generate Population

source("config.R")
source("population_generator.R")

# Create 5M person reference population
ref_pop <- generate_reference_population(seed = 42)
validate_population(ref_pop)

# Save
saveRDS(ref_pop, "reference_population.rds")

Generate Panels

source("panel_generator.R")

# Large sparse panel (biased sampling, few covariates)
ls_panel <- generate_panel(ref_pop, "large_sparse")

# Small rich panel (representative, many covariates)
sr_panel <- generate_panel(ref_pop, "small_rich")

# Compare demographics
validate_panel(ls_panel)
validate_panel(sr_panel)

Generate Outcomes with Confounding

source("outcome_generator.R")

# Moderate confounding (women 2× more likely to be targeted AND purchase)
panel_with_outcomes <- generate_outcomes(
  ls_panel,
  true_effect = 0.18,              # ~20% relative uplift (exp(0.18) ≈ 1.20)
  confounding_strength = 2,         # Gender factor
  include_measurement_error = TRUE
)

# Check confounding
table(Treatment = panel_with_outcomes$treatment_true,
      Gender = panel_with_outcomes$gender_label)

Estimate Effects

source("estimation.R")

# Frequentist (fast)
est_freq <- estimate_both_methods(panel_with_outcomes, use_bayesian = FALSE)

cat("Unadjusted:", est_freq$unadjusted$estimate)
cat("Adjusted:", est_freq$adjusted$estimate)
cat("True effect: 0.18")

# Bayesian (slow but full posterior)
est_bayes <- estimate_both_methods(panel_with_outcomes, use_bayesian = TRUE)

cat("P(positive effect):", est_bayes$adjusted$prob_positive)
cat("P(breaks even):", est_bayes$adjusted$prob_breakeven)

Decision Analysis

source("decision_analysis.R")

# Compute expected utility
utility <- compute_decision_utility(est_bayes$adjusted$posterior)

cat("Expected profit: £", round(utility$expected_utility))
cat("P(profitable):", utility$prob_profitable)

# Make decision
decision <- make_decision(utility)
cat("Decision:", decision)

# Evaluate vs. truth
eval_result <- evaluate_decision(est_bayes$adjusted, true_effect = 0.18)
cat("Correct decision:", eval_result$decision_correct, "\n")
cat("Utility loss: £", round(eval_result$utility_loss), "\n")

Key Parameters (in config.R)

Panel Sizes

PANEL_SIZE_LARGE <- 50000   # Large sparse
PANEL_SIZE_SMALL <- 4000    # Small rich

Data Quality

MEASUREMENT_QUALITY <- list(
  large_sparse = list(
    treatment_accuracy = 0.85,  # 85% ad tracking
    outcome_linkage = 0.60      # 60% purchase linkage
  ),
  small_rich = list(
    treatment_accuracy = 0.98,  # 98% ad tracking
    outcome_linkage = 0.95      # 95% purchase linkage
  )
)
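
These parameters drive the measurement-error step in outcome_generator.R. As a rough sketch of the mechanism (the helper below is hypothetical, not the repo's function): treatment labels are misreported with probability 1 − treatment_accuracy, and purchase outcomes go unlinked with probability 1 − outcome_linkage.

# Hypothetical sketch: how the quality parameters could be applied to a panel
add_measurement_error <- function(panel, quality) {
  n <- nrow(panel)
  # Misreport treatment with probability 1 - treatment_accuracy
  flip <- runif(n) > quality$treatment_accuracy
  panel$treatment_obs <- ifelse(flip, 1 - panel$treatment_true, panel$treatment_true)
  # Fail to link the purchase outcome with probability 1 - outcome_linkage
  linked <- runif(n) < quality$outcome_linkage
  panel$outcome_obs <- ifelse(linked, panel$outcome_true, NA)
  panel
}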

Simulation Scenarios

TRUE_EFFECTS <- c(0.10, 0.15, 0.18, 0.25, 0.30)
CONFOUNDING_STRENGTHS <- c(1.0, 1.5, 2.0, 3.0, 5.0, 10.0)
N_SIMULATIONS <- 500

Decision Parameters

DECISION_PARAMS <- list(
  ad_cost = 100000,              # £100k
  revenue_per_conversion = 50,   # £50 AOV
  gross_margin = 0.50,           # 50%
  n_impressions = 1000000        # 1M impressions
)
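
Under these defaults, a back-of-envelope break-even check (assuming the simple profit model incremental conversions × AOV × margin − ad cost; the actual utility computation lives in decision_analysis.R):

# Break-even arithmetic under the default DECISION_PARAMS
margin_per_conversion <- 50 * 0.50                        # £25 contribution per conversion
breakeven_conversions <- 100000 / margin_per_conversion   # 4,000 incremental conversions
breakeven_rate <- breakeven_conversions / 1e6             # 0.004 = 0.4% of the 1M impressions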

Output Structure

simulation_results/
├── data/
│   ├── reference_population.rds
│   ├── simulation_results.rds
│   └── pilot_results.rds
├── tables/
│   ├── summary_table.csv
│   └── crossover_analysis.csv
├── plots/
│   ├── bias_variance_tradeoff.png
│   ├── confounding_adjustment.png
│   ├── decision_accuracy.png
│   ├── utility_loss.png
│   └── crossover_heatmap.png
└── config.rds
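
config.rds is saved alongside the outputs and records the settings used for the run; reload it to confirm which parameters produced a given set of results:

# Inspect the configuration snapshot saved with the results
run_config <- readRDS("simulation_results/config.rds")
str(run_config)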

EVSI Notes

The provided EVSI function uses a normal-approximation shortcut for speed. It is suitable for relative comparisons across panels but may not match a full Bayesian data-augmentation approach. Expose and tune its parameters (n_sims, prior) for sensitivity checks.
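
For intuition, here is a minimal sketch of a normal-approximation EVSI calculation. The function and its defaults are illustrative, not the repo's implementation: the prior is taken from PRIORS$treatment_effect, and the placeholder profit proxy treats the effect as an incremental conversion rate.

# Illustrative normal-approximation EVSI (not the repo's function).
# Prior on the effect: Normal(mu0, sd0); a panel yields estimate ~ Normal(theta, se).
evsi_normal <- function(se, mu0 = 0.18, sd0 = 0.3, n_sims = 10000,
                        utility = function(theta) theta * 1e6 * 25 - 1e5) {
  # Value of deciding now: run the campaign only if prior expected profit > 0
  prior_value <- max(0, utility(mu0))
  theta <- rnorm(n_sims, mu0, sd0)        # true effects drawn from the prior
  est   <- rnorm(n_sims, theta, se)       # panel estimates given each true effect
  w <- sd0^2 / (sd0^2 + se^2)             # conjugate normal shrinkage weight
  post_mean <- w * est + (1 - w) * mu0    # posterior mean after seeing the estimate
  # Value of deciding after the study, averaged over possible estimates
  post_value <- mean(pmax(0, utility(post_mean)))
  post_value - prior_value
}

Comparing, say, evsi_normal(se = 0.05) for a noisy LS estimate against evsi_normal(se = 0.02) for a cleaner SR estimate gives the kind of relative comparison the note above describes.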

Customization

Adding New Scenarios

Edit config.R:

# Test additional confounding levels
CONFOUNDING_STRENGTHS <- c(1.0, 1.5, 2.0, 3.0, 5.0, 7.5, 10.0, 15.0)

# Test different panel sizes
PANEL_SIZE_LARGE <- 100000  # Test with larger panel

Changing Priors

Edit config.R:

PRIORS <- list(
  baseline_intercept = list(mean = qlogis(0.01), sd = 1),  # More informative
  treatment_effect = list(mean = 0.18, sd = 0.3),          # Stronger prior
  covariate_effects = list(mean = 0, sd = 0.5)
)

Adding Covariates

Edit population_generator.R to add new latent variables, then update:

  • panel_generator.R: Include in small_rich panel
  • outcome_generator.R: Add effects in outcome model
  • estimation.R: Include in adjusted formulas (see the sketch below)
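
For example, if a household_income variable were added (the variable and formula below are hypothetical), the adjusted model formula in estimation.R would gain the corresponding term:

# Hypothetical: extend the adjusted formula with the new covariate
adjusted_formula <- outcome_obs ~ treatment_obs + age + gender + region +
  household_income   # new covariate introduced in population_generator.R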

Troubleshooting

Stan compilation errors

# Try recompiling the model with verbose output
library(rstan)
stan_model("advertising_model.stan", verbose = TRUE)

Memory issues

Reduce number of scenarios or use fewer replications:

# In config.R
N_SIMULATIONS <- 100  # Reduce from 500

Parallel processing issues

# furrr's parallelism is controlled via the future package
library(future)

# Fall back to sequential processing
plan(sequential)

# Or reduce the number of workers
plan(multisession, workers = 2)

Performance Tips

  1. Use pilot mode for testing: main(pilot = TRUE)
  2. Start with frequentist estimates (much faster than Bayesian)
  3. Save checkpoints: Results are saved after each scenario
  4. Use parallel processing: Set n_cores to number of available cores
  5. Monitor progress: Check simulation_results/data/ for checkpoint files
  6. Ablations: Control LS sampling bias and SR covariate richness via ABLATION_FLAGS in config.R
  7. Risk options: Use make_decision(..., risk_measure = 'CE'|'VaR', risk_param = ...) for risk aversion
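
For tip 7, a minimal usage sketch, reusing the utility object from the Decision Analysis example. The risk_param values below are illustrative, and their interpretation (risk-aversion coefficient for CE, tail probability for VaR) is an assumption to check against decision_analysis.R:

# Risk-averse variants of the decision rule (illustrative parameter values)
decision_ce  <- make_decision(utility, risk_measure = "CE",  risk_param = 0.5)
decision_var <- make_decision(utility, risk_measure = "VaR", risk_param = 0.05)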

Validation

Check confounding is working

panel <- generate_outcomes(panel, confounding_strength = 5)

# Should see different gender proportions in treatment/control
prop.table(table(panel$treatment_true, panel$gender), 1)

# Should see different purchase rates by gender
tapply(panel$outcome_true, panel$gender, mean)

Check measurement error

panel_noerror <- generate_outcomes(panel, include_measurement_error = FALSE)
panel_error <- generate_outcomes(panel, include_measurement_error = TRUE)

# Agreement rates should differ
mean(panel_noerror$treatment_obs == panel_noerror$treatment_true)  # 100%
mean(panel_error$treatment_obs == panel_error$treatment_true)      # 85% or 98%

Citation

If you use this code, please cite:

Panel Quality vs. Quantity for Advertising Measurement: A Simulation Study
[Authors]
2025

License

MIT License

Contact

For questions or issues, please open an issue on GitHub or contact [email].

References

  • Johnson, G. A. (2023). "Inferno: A guide to field experiments in online display advertising." Journal of Economics & Management Strategy.
  • Lewis, R. A., & Rao, J. M. (2015). "The unfavorable economics of measuring the returns to advertising." The Quarterly Journal of Economics.
