diff --git a/.gitignore b/.gitignore index 53b7e1523..8434e7077 100644 --- a/.gitignore +++ b/.gitignore @@ -48,6 +48,7 @@ test/*/Makefile # RStudio files .Rproj.user/ +COVIDScenarioPipeline.Rproj # produced vignettes vignettes/*.html @@ -72,3 +73,8 @@ packrat/lib*/ build/ dist/ SEIR.egg-info/ +Outcomes.egg-info/ +.Rproj.user + +# R package manuals +man/ diff --git a/R/pkgs/covidcommon/R/DataUtils.R b/R/pkgs/covidcommon/R/DataUtils.R index f128e59d6..06ab90d61 100755 --- a/R/pkgs/covidcommon/R/DataUtils.R +++ b/R/pkgs/covidcommon/R/DataUtils.R @@ -972,6 +972,12 @@ get_groundtruth_from_source <- function( run_parallel = FALSE) %>% dplyr::select(Update, FIPS, source, !!variables) + } else if((source == 'hhsCMU') && (scale == "US state")) { + warning("This code is almost certainly broken") + hosp <- get_hhsCMU_incidH_st_data() + hosp <- hosp %>% dplyr::select(-FIPS) + rc <- hosp + stop("This code is almost certainly broken") } else if(source == "file"){ rc <- get_gt_file_data(source_file) diff --git a/R/pkgs/hospitalization/DESCRIPTION b/R/pkgs/hospitalization/DESCRIPTION index 75fbd295b..283837f83 100644 --- a/R/pkgs/hospitalization/DESCRIPTION +++ b/R/pkgs/hospitalization/DESCRIPTION @@ -23,4 +23,4 @@ Description: Generate hospitalization scenarios corresponding to the infection s License: What license it uses Encoding: UTF-8 LazyData: true -RoxygenNote: 7.1.0 +RoxygenNote: 7.1.1 diff --git a/R/pkgs/inference/NAMESPACE b/R/pkgs/inference/NAMESPACE index 01a4d5302..5202c45bc 100644 --- a/R/pkgs/inference/NAMESPACE +++ b/R/pkgs/inference/NAMESPACE @@ -1,6 +1,8 @@ # Generated by roxygen2: do not edit by hand export(accept_reject_new_seeding_npis) +export(add_perturb_column_hnpi) +export(add_perturb_column_snpi) export(aggregate_and_calc_loc_likelihoods) export(calc_hierarchical_likadj) export(calc_prior_likadj) @@ -18,13 +20,16 @@ export(logLikStat) export(multi_hosp_run) export(multi_loc_inference_test) export(npis_dataframe) -export(perform_MCMC_step_copies) +export(perform_MCMC_step_copies_chimeric) +export(perform_MCMC_step_copies_global) export(periodAggregate) export(perturb_expand_npis) export(perturb_hnpi) +export(perturb_hnpi_from_file) export(perturb_hpar) export(perturb_seeding) export(perturb_snpi) +export(perturb_snpi_from_file) export(simulate_multi_epi) export(simulate_single_epi) export(single_hosp_run) diff --git a/R/pkgs/inference/R/filter_MC_runner_funcs.R b/R/pkgs/inference/R/filter_MC_runner_funcs.R index 3c8196cb1..aa5655eb6 100644 --- a/R/pkgs/inference/R/filter_MC_runner_funcs.R +++ b/R/pkgs/inference/R/filter_MC_runner_funcs.R @@ -3,19 +3,19 @@ ##'Function that performs aggregation and calculates likelihood data across all given locations. ##' -##' @param all_locations all of the locoations to calculate likelihood for +##' @param all_locations all of the locations to calculate likelihood for ##' @param modeled_outcome the hospital data for the simulations -##' @param obs_nodename the name of the column containg locations. -##' @param config the full configuraiton setup +##' @param obs_nodename the name of the column containing locations. +##' @param config the full configuration setup ##' @param obs the full observed data ##' @param ground_truth_data the data we are going to compare to aggregated to the right statistic ##' @param hosp_file the filename of the hosp file being used (unclear if needed in scope) ##' @param hierarchical_stats the hierarchical stats to use ##' @param defined_priors information on defined priors. ##' @param geodata the geographics data to help with hierarchies -##' @param snpi the file with the npi information for seir -##' @param hnpi the file with the npi information for outcomes -##' @param hpar data frame of hospitalization parameters +##' @param snpi the file with the npi information for seir, only used for heirarchical likelihoods +##' @param hnpi the file with the npi information for outcomes, only used for heirarchical likelihoods +##' @param hpar data frame of hospitalization parameters, only used for heirarchical likelihoods ##' ##' @return a data frame of likelihood data. ##' @@ -85,7 +85,10 @@ aggregate_and_calc_loc_likelihoods <- function( likelihood_data[[location]] <- dplyr::tibble( ll = this_location_log_likelihood, filename = hosp_file, - geoid = location + geoid = location, + accept = 0, # acceptance decision (0/1) . Will be updated later when accept/reject decisions made + accept_avg = 0, # running average acceptance decision + accept_prob = 0 # probability of acceptance of proposal ) names(likelihood_data)[names(likelihood_data) == 'geoid'] <- obs_nodename } @@ -93,7 +96,7 @@ aggregate_and_calc_loc_likelihoods <- function( #' @importFrom magrittr %>% likelihood_data <- likelihood_data %>% do.call(what = rbind) - ##Update liklihood data based on hierarchical_stats + ##Update likelihood data based on hierarchical_stats for (stat in names(hierarchical_stats)) { if (hierarchical_stats[[stat]]$module %in% c("seir_interventions", "seir")) { @@ -135,14 +138,14 @@ aggregate_and_calc_loc_likelihoods <- function( ##probably a more efficient what to do this, but unclear... - likelihood_data <- dplyr::left_join(likelihood_data, ll_adjs, by="geoid") %>% + likelihood_data <- dplyr::left_join(likelihood_data, ll_adjs, by = obs_nodename) %>% tidyr::replace_na(list(likadj = 0)) %>% ##avoid unmatched location problems dplyr::mutate(ll = ll + likadj) %>% dplyr::select(-likadj) } - ##Update lieklihoods based on priors + ##Update likelihoods based on priors for (prior in names(defined_priors)) { if (defined_priors[[prior]]$module %in% c("seir_interventions", "seir")) { #' @importFrom magrittr %>% @@ -181,12 +184,11 @@ aggregate_and_calc_loc_likelihoods <- function( } ##probably a more efficient what to do this, but unclear... - likelihood_data<- dplyr::left_join(likelihood_data, ll_adjs, by="geoid") %>% + likelihood_data<- dplyr::left_join(likelihood_data, ll_adjs, by = obs_nodename) %>% dplyr::mutate(ll = ll + likadj) %>% dplyr::select(-likadj) } - if(any(is.na(likelihood_data$ll))) { print("Full Likelihood") print(likelihood_data) @@ -204,19 +206,19 @@ aggregate_and_calc_loc_likelihoods <- function( ##' Function that performs the necessary file copies the end of an MCMC iteration of ##' filter_MC. ##' -##'@param current_index the current indec in the run -##'@param slot what is the current slot numbe +##'@param current_index the current index in the run +##'@param slot what is the current slot number ##'@param block what is the current block ##'@param run_id what is the id of this run ##'@param global_local_prefix the prefix to be put on both global and local runs. ##'@param gf_prefix the prefix for the directory containing the current globally accepted files. ##'@param global_block_prefix prefix that describes this block. ##' -##'@return TRUE if this succeded. +##'@return TRUE if this succeeded. ##' ##'@export ##' -perform_MCMC_step_copies <- function(current_index, +perform_MCMC_step_copies_global <- function(current_index, slot, block, run_id, @@ -224,10 +226,11 @@ perform_MCMC_step_copies <- function(current_index, gf_prefix, global_block_prefix) { + stop("This needs to be fixed") rc <- list() - if(current_index != 0){ + if(current_index != 0){ #move files from global/intermediate/slot.block.run to global/final/slot rc$seed_gf <- file.copy( covidcommon::create_file_name(run_id,global_local_prefix,current_index,'seed','csv'), covidcommon::create_file_name(run_id,gf_prefix,slot,'seed','csv'), @@ -275,7 +278,7 @@ perform_MCMC_step_copies <- function(current_index, covidcommon::create_file_name(run_id,gf_prefix,slot,'hpar','parquet'), overwrite = TRUE ) - + #move files from global/intermediate/slot.block.run to global/intermediate/slot rc$seed_block <- file.copy( covidcommon::create_file_name(run_id,global_local_prefix,current_index,'seed','csv'), covidcommon::create_file_name(run_id,global_block_prefix,block,'seed','csv') @@ -317,7 +320,7 @@ perform_MCMC_step_copies <- function(current_index, covidcommon::create_file_name(run_id,global_local_prefix,current_index,'hpar','parquet'), covidcommon::create_file_name(run_id,global_block_prefix,block,'hpar','parquet') ) - } else { + } else { #move files from global/intermediate/slot.(block-1) to global/intermediate/slot.block rc$seed_prevblk <- file.copy( covidcommon::create_file_name(run_id,global_block_prefix,block - 1 ,'seed','csv'), covidcommon::create_file_name(run_id,global_block_prefix,block,'seed','csv') @@ -337,6 +340,7 @@ perform_MCMC_step_copies <- function(current_index, covidcommon::create_file_name(run_id,global_block_prefix,block - 1,'llik','parquet'), covidcommon::create_file_name(run_id,global_block_prefix,block,'llik','parquet') ) + rc$snpi_prvblk <-file.copy( covidcommon::create_file_name(run_id,global_block_prefix,block - 1,'snpi','parquet'), @@ -364,6 +368,174 @@ perform_MCMC_step_copies <- function(current_index, } +##' +##' Function that performs the necessary file copies the end of an MCMC iteration of +##' filter_MC. +##' +##'@param current_index the current index in the run +##'@param slot what is the current slot number +##'@param block what is the current block +##'@param run_id what is the id of this run +##'@param chimeric_local_prefix the prefix to be put on both chimeric and local runs. +##'@param cf_prefix the prefix for the directory containing the current chimericly accepted files. +##'@param chimeric_block_prefix prefix that describes this block. +##' +##'@return TRUE if this succeeded. +##' +##'@export +##' +perform_MCMC_step_copies_chimeric <- function(current_index, + slot, + block, + run_id, + chimeric_local_prefix, + cf_prefix, + chimeric_block_prefix) { + + + rc <- list() + + if(current_index != 0){ #move files from chimeric/intermediate/slot.block.run to chimeric/final/slot + rc$seed_gf <- file.copy( + covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'seed','csv'), + covidcommon::create_file_name(run_id,cf_prefix,slot,'seed','csv'), + overwrite = TRUE + ) + + # No chimeric SEIR or HOSP files + + # rc$seir_gf <- file.copy( + # covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'seir','parquet'), + # covidcommon::create_file_name(run_id,cf_prefix,slot,'seir','parquet'), + # overwrite = TRUE + # ) + # + # rc$hosp_gf <- file.copy( + # covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'hosp','parquet'), + # covidcommon::create_file_name(run_id,cf_prefix,slot,'hosp','parquet'), + # overwrite = TRUE + # ) + + rc$llik_gf <- file.copy( + covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'llik','parquet'), + covidcommon::create_file_name(run_id,cf_prefix,slot,'llik','parquet'), + overwrite = TRUE + ) + + + rc$snpi_gf <- file.copy( + covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'snpi','parquet'), + covidcommon::create_file_name(run_id,cf_prefix,slot,'snpi','parquet'), + overwrite = TRUE + ) + + rc$hnpi_gf <- file.copy( + covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'hnpi','parquet'), + covidcommon::create_file_name(run_id,cf_prefix,slot,'hnpi','parquet'), + overwrite = TRUE + ) + + rc$spar_gf <-file.copy( + covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'spar','parquet'), + covidcommon::create_file_name(run_id,cf_prefix,slot,'spar','parquet'), + overwrite = TRUE + ) + + rc$hpar_gf <- file.copy( + covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'hpar','parquet'), + covidcommon::create_file_name(run_id,cf_prefix,slot,'hpar','parquet'), + overwrite = TRUE + ) + #move files from chimeric/intermediate/slot.block.run to chimeric/intermediate/slot + rc$seed_block <- file.copy( + covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'seed','csv'), + covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'seed','csv') + ) + + # no chimeric SEIR or HOSP files + + # rc$seir_block <- file.copy( + # covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'seir','parquet'), + # covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'seir','parquet') + # ) + # + # rc$hosp_block <- file.copy( + # covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'hosp','parquet'), + # covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'hosp','parquet') + # ) + + rc$llik_block <- file.copy( + covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'llik','parquet'), + covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'llik','parquet') + ) + + rc$snpi_block <- file.copy( + covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'snpi','parquet'), + covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'snpi','parquet') + ) + + rc$hnpi_block <- file.copy( + covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'hnpi','parquet'), + covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'hnpi','parquet') + ) + + rc$spar_block <- file.copy( + covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'spar','parquet'), + covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'spar','parquet') + ) + + + rc$hpar_block <- file.copy( + covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'hpar','parquet'), + covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'hpar','parquet') + ) + } else { #move files from chimeric/intermediate/slot.(block-1) to chimeric/intermediate/slot.block + rc$seed_prevblk <- file.copy( + covidcommon::create_file_name(run_id,chimeric_block_prefix,block - 1 ,'seed','csv'), + covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'seed','csv') + ) + + rc$seir_prevblk <- file.copy( + covidcommon::create_file_name(run_id,chimeric_block_prefix,block - 1 ,'seir','parquet'), + covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'seir','parquet') + ) + + rc$hosp_prevblk <- file.copy( + covidcommon::create_file_name(run_id,chimeric_block_prefix,block - 1 ,'hosp','parquet'), + covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'hosp','parquet') + ) + + rc$llik_prevblk <- file.copy( + covidcommon::create_file_name(run_id,chimeric_block_prefix,block - 1,'llik','parquet'), + covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'llik','parquet') + ) + + rc$snpi_prvblk <-file.copy( + covidcommon::create_file_name(run_id,chimeric_block_prefix,block - 1,'snpi','parquet'), + covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'snpi','parquet') + ) + + rc$hnpi_prvblk <-file.copy( + covidcommon::create_file_name(run_id,chimeric_block_prefix,block - 1,'hnpi','parquet'), + covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'hnpi','parquet') + ) + + rc$spar_prvblk <- file.copy( + covidcommon::create_file_name(run_id,chimeric_block_prefix,block - 1,'spar','parquet'), + covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'spar','parquet') + ) + + rc$hpar_prvblk <- file.copy( + covidcommon::create_file_name(run_id,chimeric_block_prefix,block - 1,'hpar','parquet'), + covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'hpar','parquet') + ) + } + + + return(rc) + +} + ## Create a list with a filename of each type/extension. A convenience function for consistency in file names #' @export create_filename_list <- function( @@ -389,7 +561,7 @@ create_filename_list <- function( ##'@name initialize_mcmc_first_block ##'@title initialize_mcmc_first_block -##'@param slot what is the current slot numbe +##'@param slot what is the current slot number ##'@param block what is the current block ##'@param run_id what is the id of this run ##'@param global_prefix the prefix to use for global files @@ -411,8 +583,8 @@ initialize_mcmc_first_block <- function( non_llik_types <- paste(c("seed", "seir", "snpi", "hnpi", "spar", "hosp", "hpar"), "filename", sep = "_") extensions <- c("csv", "parquet", "parquet", "parquet", "parquet", "parquet", "parquet", "parquet") - global_files <- create_filename_list(run_id, global_prefix, block - 1, types, extensions) - chimeric_files <- create_filename_list(run_id, chimeric_prefix, block - 1, types, extensions) + global_files <- create_filename_list(run_id, global_prefix, block - 1, types, extensions) # makes file names of the form variable/name/scenario/deathrate/run_id/global/intermediate/slot.(block-1).run_ID.variable.ext + chimeric_files <- create_filename_list(run_id, chimeric_prefix, block - 1, types, extensions) # makes file names of the form variable/name/scenario/deathrate/run_id/chimeric/intermediate/slot.(block-1).run_ID.variable.ext global_check <- sapply(global_files, file.exists) chimeric_check <- sapply(chimeric_files, file.exists) @@ -481,7 +653,7 @@ initialize_mcmc_first_block <- function( )) } - global_file_names <- names(global_files[!global_check]) + global_file_names <- names(global_files[!global_check]) # names are of the form "variable_filename", only files that DONT already exist will be in this list ## seed if ("seed_filename" %in% global_file_names) { @@ -504,6 +676,7 @@ initialize_mcmc_first_block <- function( ## seir, snpi, spar checked_par_files <- c("snpi_filename", "spar_filename", "hnpi_filename", "hpar_filename") checked_sim_files <- c("seir_filename", "hosp_filename") + # These functions save variables to files of the form variable/name/scenario/deathrate/run_id/global/intermediate/slot.(block-1),runID.variable.ext if (any(checked_par_files %in% global_file_names)) { if (!all(checked_par_files %in% global_file_names)) { stop("Provided some InferenceSimulator input, but not all") @@ -536,10 +709,13 @@ initialize_mcmc_first_block <- function( ## Refactor me later: global_likelihood_data <- likelihood_calculation_function(hosp_data) - arrow::write_parquet(global_likelihood_data, global_files[["llik_filename"]]) + arrow::write_parquet(global_likelihood_data, global_files[["llik_filename"]]) # save global likelihood data to file of the form llik/name/scenario/deathrate/run_id/global/intermediate/slot.(block-1).run_ID.llik.ext + + #print("from inside initialize_mcmc_first_block: column names of likelihood dataframe") + #print(colnames(global_likelihood_data)) for (type in names(chimeric_files)) { - file.copy(global_files[[type]], chimeric_files[[type]], overwrite = TRUE) + file.copy(global_files[[type]], chimeric_files[[type]], overwrite = TRUE) # copy files that were in global directory into chimeric directory } return() } diff --git a/R/pkgs/inference/R/functions.R b/R/pkgs/inference/R/functions.R index 110d00b34..c3cfeb8fd 100644 --- a/R/pkgs/inference/R/functions.R +++ b/R/pkgs/inference/R/functions.R @@ -319,7 +319,7 @@ compute_totals <- function(sim_hosp) { # MCMC stuff ------------------------------------------------------------------- -##' Fuction perturbs a seeding file based on a normal +##' Function perturbs a seeding file based on a normal ##' proposal on the start date and ##' a poisson on the number of cases. ##' @@ -328,7 +328,7 @@ compute_totals <- function(sim_hosp) { ##' @param amount_sd the standard deviation parameter of the normal distribution used to perturb amount ##' @param continuous Whether the seeding is passed to a continuous model or not ##' -##' @return a pertubed data frame +##' @return a perturbed data frame ##' ##' @export perturb_seeding <- function(seeding, date_sd, date_bounds, amount_sd = 1, continuous = FALSE) { @@ -354,20 +354,20 @@ perturb_seeding <- function(seeding, date_sd, date_bounds, amount_sd = 1, contin -##' Fuction perturbs an npi parameter file based on +##' Function perturbs an npi parameter file based on ##' user-specified distributions ##' ##' @param snpi the original npis. -##' @param intervention_settings a list of perturbation specificationss +##' @param intervention_settings a list of perturbation specifications ##' ##' -##' @return a pertubed data frame +##' @return a perturbed data frame ##' @export perturb_snpi <- function(snpi, intervention_settings) { ##Loop over all interventions for (intervention in names(intervention_settings)) { # consider doing unique(npis$npi_name) instead - ##Only perform pertubations on interventions where it is specified ot do so. + ##Only perform perturbations on interventions where it is specified to do so. if ('perturbation' %in% names(intervention_settings[[intervention]])){ @@ -380,7 +380,7 @@ perturb_snpi <- function(snpi, intervention_settings) { next } - ##add the pertubation...for now always parameterized in terms of a "reduction" + ##add the perturbation...for now always parameterized in terms of a "reduction" snpi_new <- snpi[["reduction"]][ind] + pert_dist(sum(ind)) ##check that this is in bounds (equivalent to having a positive probability) @@ -396,20 +396,20 @@ perturb_snpi <- function(snpi, intervention_settings) { } -##' Fuction perturbs an npi parameter file based on +##' Function perturbs an npi parameter file based on ##' user-specified distributions ##' ##' @param hnpi the original npis. ##' @param intervention_settings a list of perturbation specificationss ##' ##' -##' @return a pertubed data frame +##' @return a perturbed data frame ##' @export perturb_hnpi <- function(hnpi, intervention_settings) { ##Loop over all interventions for (intervention in names(intervention_settings)) { # consider doing unique(npis$npi_name) instead - ##Only perform pertubations on interventions where it is specified ot do so. + ##Only perform perturbations on interventions where it is specified to do so. if ('perturbation' %in% names(intervention_settings[[intervention]])){ @@ -422,7 +422,7 @@ perturb_hnpi <- function(hnpi, intervention_settings) { next } - ##add the pertubation...for now always parameterized in terms of a "reduction" + ##add the perturbation...for now always parameterized in terms of a "reduction" hnpi_new <- hnpi[["reduction"]][ind] + pert_dist(sum(ind)) ##check that this is in bounds (equivalent to having a positive probability) @@ -437,14 +437,14 @@ perturb_hnpi <- function(hnpi, intervention_settings) { return(hnpi) } -##' Fuction perturbs an npi parameter file based on +##' Fucction perturbs an outcomes parameter file based on ##' user-specified distributions ##' -##' @param hpar the original hospitalization parameters. +##' @param hpar the original hospitalization (outcomes) parameters. ##' @param intervention_settings a list of perturbation specifications ##' ##' -##' @return a pertubed data frame +##' @return a perturbed data frame ##' @export perturb_hpar <- function(hpar, intervention_settings) { ##Loop over all interventions @@ -487,7 +487,7 @@ perturb_hpar <- function(hpar, intervention_settings) { return(hpar) } -##' Function to go through to accept or reject seedings in a block manner based +##' Function to go through to accept or reject proposed parameters for each geoid based ##' on a geoid specific likelihood. ##' ##' @@ -526,7 +526,9 @@ accept_reject_new_seeding_npis <- function( accept <- ratio > runif(length(ratio), 0, 1) orig_lls$ll[accept] <- prop_lls$ll[accept] - + + orig_lls$accept <- as.numeric(accept) # added column for acceptance decision + orig_lls$accept_prob <- min(1,ratio) # added column for acceptance decision for (place in orig_lls$geoid[accept]) { rc_seeding[rc_seeding$place == place, ] <- seeding_prop[seeding_prop$place ==place, ] @@ -564,3 +566,195 @@ iterateAccept <- function(ll_ref, ll_new) { } return(FALSE) } + +# Extra functions for MCMC diagnostics and adaptation ------------------ + +##' Function adds a column to the npi parameter file to record the perturbation standard deviation, initially taken from the config file +##' +##' @param snpi the original npis. +##' @param intervention_settings a list of perturbation specifications +##' +##' +##' @return data frame with perturb_sd column added +##' @export +add_perturb_column_snpi <- function(snpi, intervention_settings) { + + snpi$perturb_sd <- 0 # create a column in the parameter data frame to hold the perturbation sd + + ##Loop over all interventions + for (intervention in names(intervention_settings)) { + ##Only perform perturbations on interventions where it is specified to do so. + + if ('perturbation' %in% names(intervention_settings[[intervention]])){ + + ##find the npi with this name + ind <- (snpi[["npi_name"]] == intervention) + if(!any(ind)){ + next + } + + if(!'sd' %in% names(intervention_settings[[intervention]][['perturbation']])){ + stop("Cannot add perturbation sd to column unless 'sd' values exists in config$interventions$settings$this_intervention$perturbation") + } + + pert_sd <-intervention_settings[[intervention]][['perturbation']][['sd']] + #print(paste0(intervention," initial perturbation sd is ",pert_sd)) + + snpi$perturb_sd[ind] <- pert_sd # update perturbation + + } + } + + return(snpi) +} + + + + +##' Function perturbs an npi parameter file based on the current perturbation standard deviation in the file, and also can update the perturbation value if the adaptive mode is turned on +##' +##' @param snpi the original npis. +##' @param intervention_settings a list of perturbation specifications +##' @param llik log likelihood values +##' +##' @return a perturbed data frame +##' @export +perturb_snpi_from_file <- function(snpi, intervention_settings, llik){ + + + ##Loop over all interventions + for (intervention in names(intervention_settings)) { + + ##Only perform perturbations on interventions where it is specified to do so. + + if ('perturbation' %in% names(intervention_settings[[intervention]])){ + + ##find all the npi with this name (might be one for each geoID) + ind <- (snpi[["npi_name"]] == intervention) + if(!any(ind)){ + next + } + + ## for each of them generate the perturbation and update their value + for (this_npi_ind in which(ind)){ # for each geoid that has this interventions + + this_geoid <- snpi[["geoid"]][this_npi_ind] + this_accept_avg <- llik$accept_avg[llik$geoid==this_geoid] + his_accept_prob <- llik$accept_prob[llik$geoid==this_geoid] + this_intervention_setting<- intervention_settings[[intervention]] + + ##get the random distribution from covidcommon package + pert_dist <- covidcommon::as_random_distribution(this_intervention_setting$perturbation) + + ##add the perturbation...for now always parameterized in terms of a "reduction" + snpi_new <- snpi[["reduction"]][this_npi_ind] + pert_dist(1) + + ##check that this is in bounds (equivalent to having a positive probability) + in_bounds_index <- covidcommon::as_density_distribution( + intervention_settings[[intervention]][['value']] + )(snpi_new) > 0 + + ## include this perturbed parameter if it is in bounds + snpi$reduction[this_npi_ind][in_bounds_index] <- snpi_new[in_bounds_index] + + } + } + } + + return(snpi) +} + +##' Function adds a column to the npi parameter file to record the perturbation standard deviation, initially taken from the config file +##' +##' @param hnpi the original npis. +##' @param intervention_settings a list of perturbation specifications +##' +##' +##' @return data frame with perturb_sd column added +##' @export +add_perturb_column_hnpi <- function(hnpi, intervention_settings) { + + hnpi$perturb_sd <- 0 # create a column in the parameter data frame to hold the perturbation sd + + ##Loop over all interventions + for (intervention in names(intervention_settings)) { + ##Only perform perturbations on interventions where it is specified to do so. + + if ('perturbation' %in% names(intervention_settings[[intervention]])){ + + ##find the npi with this name + ind <- (hnpi[["npi_name"]] == intervention) + if(!any(ind)){ + next + } + + if(!'sd' %in% names(intervention_settings[[intervention]][['perturbation']])){ + stop("Cannot add perturbation sd to column unless 'sd' values exists in config$interventions$settings$this_intervention$perturbation") + } + + pert_sd <-intervention_settings[[intervention]][['perturbation']][['sd']] + #print(paste0(intervention," initial perturbation sd is ",pert_sd)) + + hnpi$perturb_sd[ind] <- pert_sd # update perturbation + + } + } + + return(hnpi) +} + + + + +##' Function perturbs an npi parameter file based on the current perturbation standard deviation in the file, and also can update the perturbation value if the adaptive mode is turned on +##' +##' @param hnpi the original npis. +##' @param intervention_settings a list of perturbation specifications +##' @param llik log likelihood values +##' +##' @return a perturbed data frame +##' @export +perturb_hnpi_from_file <- function(hnpi, intervention_settings, llik){ + + + ##Loop over all interventions + for (intervention in names(intervention_settings)) { + + ##Only perform perturbations on interventions where it is specified to do so. + + if ('perturbation' %in% names(intervention_settings[[intervention]])){ + + ##find all the npi with this name (might be one for each geoID) + ind <- (hnpi[["npi_name"]] == intervention) + if(!any(ind)){ + next + } + + ## for each of them generate the perturbation and update their value + for (this_npi_ind in which(ind)){ # for each geoid that has this interventions + + this_geoid <- hnpi[["geoid"]][this_npi_ind] + this_accept_avg <- llik$accept_avg[llik$geoid==this_geoid] + this_intervention_setting<- intervention_settings[[intervention]] + + ##get the random distribution from covidcommon package + pert_dist <- covidcommon::as_random_distribution(this_intervention_setting$perturbation) + + ##add the perturbation...for now always parameterized in terms of a "reduction" + hnpi_new <- hnpi[["reduction"]][this_npi_ind] + pert_dist(1) + + ##check that this is in bounds (equivalent to having a positive probability) + in_bounds_index <- covidcommon::as_density_distribution( + intervention_settings[[intervention]][['value']] + )(hnpi_new) > 0 + + ## include this perturbed parameter if it is in bounds + hnpi$reduction[this_npi_ind][in_bounds_index] <- hnpi_new[in_bounds_index] + + } + } + } + + return(hnpi) +} + diff --git a/R/pkgs/inference/tests/testthat/test-aggregate_and_calc_loc_likelihoods.R b/R/pkgs/inference/tests/testthat/test-aggregate_and_calc_loc_likelihoods.R index 70aa92fa1..1af68cf83 100644 --- a/R/pkgs/inference/tests/testthat/test-aggregate_and_calc_loc_likelihoods.R +++ b/R/pkgs/inference/tests/testthat/test-aggregate_and_calc_loc_likelihoods.R @@ -195,7 +195,7 @@ test_that("aggregate_and_calc_loc_likelihoods returns a likelihood per location expect_that(nrow(tmp), equals(length(stuff$all_locations))) - expect_that(sort(colnames(tmp)), equals(sort(c("ll","filename",stuff$obs_nodename)))) + expect_that(sort(colnames(tmp)), equals(sort(c("ll","accept","accept_prob","accept_avg","filename",stuff$obs_nodename)))) }) diff --git a/R/pkgs/inference/tests/testthat/test-perform_MCMC_step_copies.R b/R/pkgs/inference/tests/testthat/test-perform_MCMC_step_copies.R index b1035220a..b26cf3472 100644 --- a/R/pkgs/inference/tests/testthat/test-perform_MCMC_step_copies.R +++ b/R/pkgs/inference/tests/testthat/test-perform_MCMC_step_copies.R @@ -2,7 +2,7 @@ context("perform_MCMC_step_copies") ##THESE TESTS CAN BE MADE MORE DETAILED...JUST MAKING PLACE HOLDERS -test_that("MCMC step copies are correctly performed when we are not at the start of a block", { +test_that("MCMC step copies (global) are correctly performed when we are not at the start of a block", { ##some information on our phantom runs current_index <- 2 slot <- 2 @@ -16,7 +16,7 @@ test_that("MCMC step copies are correctly performed when we are not at the start global_local_prefix <- covidcommon::create_prefix(prefix=global_block_prefix, slot=list(slot,"%09d"), sep='.', trailing_separator='.') - ##To be save make a direectory + ##To be save make a directory dir.create("MCMC_step_copy_test") setwd("MCMC_step_copy_test") ##get file names @@ -44,7 +44,7 @@ test_that("MCMC step copies are correctly performed when we are not at the start ##print(hosp_src) ##print(covidcommon::create_file_name(run_id,gf_prefix,slot,'hosp','parquet')) - res <- perform_MCMC_step_copies(current_index, + res <- perform_MCMC_step_copies_global(current_index, slot, block, run_id, @@ -63,7 +63,7 @@ test_that("MCMC step copies are correctly performed when we are not at the start }) -test_that("MCMC step copies are correctly performed when we are at the start of a block", { +test_that("MCMC step copies (global) are correctly performed when we are at the start of a block", { ##some information on our phantom runs current_index <- 0 slot <- 2 @@ -105,7 +105,7 @@ test_that("MCMC step copies are correctly performed when we are at the start of print(hosp_src) print(covidcommon::create_file_name(run_id,global_block_prefix,block,'hosp','parquet')) - res <- perform_MCMC_step_copies(current_index, + res <- perform_MCMC_step_copies_global(current_index, slot, block, run_id, @@ -122,3 +122,125 @@ test_that("MCMC step copies are correctly performed when we are at the start of }) + +test_that("MCMC step copies (chimeric) are correctly performed when we are not at the start of a block", { + ##some information on our phantom runs + current_index <- 2 + slot <- 2 + block <- 5 + run_id <- "TEST_RUN" + slot_prefix <- covidcommon::create_prefix("config","scenario","deathrate",run_id,sep='/',trailing_separator='/') + cf_prefix <- covidcommon::create_prefix(prefix=slot_prefix,'chimeric','final',sep='/',trailing_separator='/') + ci_prefix <- covidcommon::create_prefix(prefix=slot_prefix,'chimeric','intermediate',sep='/',trailing_separator='/') + chimeric_block_prefix <- covidcommon::create_prefix(prefix=ci_prefix, slot=list(slot,"%09d"), sep='.', + trailing_separator='.') + chimeric_local_prefix <- covidcommon::create_prefix(prefix=chimeric_block_prefix, slot=list(slot,"%09d"), sep='.', + trailing_separator='.') + + ##To be save make a directory + dir.create("MCMC_step_copy_test") + setwd("MCMC_step_copy_test") + ##get file names + seed_src <- covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'seed','csv') + seir_src <- covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'seir','parquet') + hosp_src <- covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'hosp','parquet') + llik_src <- covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'llik','parquet') + snpi_src <- covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'snpi','parquet') + spar_src <- covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'spar','parquet') + hnpi_src <- covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'hnpi','parquet') + hpar_src <- covidcommon::create_file_name(run_id,chimeric_local_prefix,current_index,'hpar','parquet') + + + + ##create the copy from files + arrow::write_parquet(data.frame(file="seed"), seed_src) + arrow::write_parquet(data.frame(file="seir"), seir_src) + arrow::write_parquet(data.frame(file="hosp"), hosp_src) + arrow::write_parquet(data.frame(file="llik"), llik_src) + arrow::write_parquet(data.frame(file="snpi"), snpi_src) + arrow::write_parquet(data.frame(file="spar"), spar_src) + arrow::write_parquet(data.frame(file="hnpi"), hnpi_src) + arrow::write_parquet(data.frame(file="hpar"), hpar_src) + + ##print(hosp_src) + ##print(covidcommon::create_file_name(run_id,cf_prefix,slot,'hosp','parquet')) + + res <- perform_MCMC_step_copies_chimeric(current_index, + slot, + block, + run_id, + chimeric_local_prefix, + cf_prefix, + chimeric_block_prefix) + + + expect_equal(prod(unlist(res)),1) + + ##clean up + setwd("..") + unlink("MCMC_step_copy_test", recursive=TRUE) + + +}) + + +test_that("MCMC step copies (chimeric) are correctly performed when we are at the start of a block", { + ##some information on our phantom runs + current_index <- 0 + slot <- 2 + block <- 5 + run_id <- "TEST_RUN" + slot_prefix <- covidcommon::create_prefix("config","scenario","deathrate",run_id,sep='/',trailing_separator='/') + cf_prefix <- covidcommon::create_prefix(prefix=slot_prefix,'chimeric','final',sep='/',trailing_separator='/') + ci_prefix <- covidcommon::create_prefix(prefix=slot_prefix,'chimeric','intermediate',sep='/',trailing_separator='/') + chimeric_block_prefix <- covidcommon::create_prefix(prefix=ci_prefix, slot=list(slot,"%09d"), sep='.', + trailing_separator='.') + chimeric_local_prefix <- covidcommon::create_prefix(prefix=chimeric_block_prefix, slot=list(slot,"%09d"), sep='.', + trailing_separator='.') + + ##To be save make a direectory + dir.create("MCMC_step_copy_test") + setwd("MCMC_step_copy_test") + ##get file names + seed_src <- covidcommon::create_file_name(run_id,chimeric_block_prefix,block-1,'seed','csv') + seir_src <- covidcommon::create_file_name(run_id,chimeric_block_prefix,block-1,'seir','parquet') + hosp_src <- covidcommon::create_file_name(run_id,chimeric_block_prefix,block-1,'hosp','parquet') + llik_src <- covidcommon::create_file_name(run_id,chimeric_block_prefix,block-1,'llik','parquet') + snpi_src <- covidcommon::create_file_name(run_id,chimeric_block_prefix,block-1,'snpi','parquet') + spar_src <- covidcommon::create_file_name(run_id,chimeric_block_prefix,block-1,'spar','parquet') + hnpi_src <- covidcommon::create_file_name(run_id,chimeric_block_prefix,block-1,'hnpi','parquet') + hpar_src <- covidcommon::create_file_name(run_id,chimeric_block_prefix,block-1,'hpar','parquet') + + + + ##create the copy from files + arrow::write_parquet(data.frame(file="seed"), seed_src) + arrow::write_parquet(data.frame(file="seir"), seir_src) + arrow::write_parquet(data.frame(file="hosp"), hosp_src) + arrow::write_parquet(data.frame(file="llik"), llik_src) + arrow::write_parquet(data.frame(file="snpi"), snpi_src) + arrow::write_parquet(data.frame(file="spar"), spar_src) + arrow::write_parquet(data.frame(file="hnpi"), hnpi_src) + arrow::write_parquet(data.frame(file="hpar"), hpar_src) + + print(hosp_src) + print(covidcommon::create_file_name(run_id,chimeric_block_prefix,block,'hosp','parquet')) + + res <- perform_MCMC_step_copies_chimeric(current_index, + slot, + block, + run_id, + chimeric_local_prefix, + cf_prefix, + chimeric_block_prefix) + + + expect_equal(prod(unlist(res)),1) + + ##clean up + setwd("..") + unlink("MCMC_step_copy_test", recursive=TRUE) + + +}) + diff --git a/R/pkgs/report.generation/NAMESPACE b/R/pkgs/report.generation/NAMESPACE index f841fa879..d242c427d 100644 --- a/R/pkgs/report.generation/NAMESPACE +++ b/R/pkgs/report.generation/NAMESPACE @@ -9,6 +9,8 @@ export(load_USAFacts_for_report) export(load_cum_hosp_geounit_date) export(load_cum_inf_geounit_dates) export(load_geodata_file) +export(load_hnpi_sims_filtered) +export(load_hnpi_sims_filtered_interm) export(load_hosp_county) export(load_hosp_geocombined_totals) export(load_hosp_geounit_peak_date) @@ -16,15 +18,22 @@ export(load_hosp_geounit_relative_to_threshold) export(load_hosp_geounit_threshold) export(load_hosp_sims_filtered) export(load_hpar_sims_filtered) +export(load_hpar_sims_filtered_interm) export(load_inf_geounit_peaks_date) export(load_jhu_csse_for_report) +export(load_llik_sims_filtered) +export(load_llik_sims_filtered_interm) export(load_npi_sims_filtered) export(load_r_daily_sims_filtered) export(load_r_sims_filtered) export(load_scenario_sims_filtered) +export(load_seed_sims_filtered_interm) +export(load_seir_sims_filtered) export(load_shape_file) export(load_snpi_sims_filtered) +export(load_snpi_sims_filtered_interm) export(load_spar_sims_filtered) +export(load_spar_sims_filtered_interm) export(load_ts_current_hosp_geounit) export(make_CI) export(make_excess_heatmap) @@ -36,27 +45,39 @@ export(make_scn_time_summary_table_withVent) export(make_sparkline_tab_intervention_effect) export(make_sparkline_tab_r) export(mtr_estimates) +export(plot_accept_by_location) +export(plot_accept_by_location_cumul) +export(plot_accept_by_location_rolling) export(plot_county_outcomes) export(plot_event_time_by_geoid) export(plot_geounit_attack_rate_map) export(plot_geounit_map) export(plot_hist_incidHosp_state) +export(plot_hnpi_by_location) export(plot_hosp_effec) +export(plot_hpar_by_location) export(plot_inference_r) export(plot_line_hospPeak_time_county) +export(plot_llik_by_location) +export(plot_llik_total) export(plot_model_vs_obs) export(plot_needs_relative_to_threshold_heatmap) export(plot_outcome_rate) export(plot_rt_ts) export(plot_scn_outcomes_ratio) +export(plot_snpi_by_location) +export(plot_spars) export(plot_truth_by_county) +export(plot_truth_by_location) export(plot_ts_hosp_state_sample) export(plot_ts_incid_ar_state) export(plot_ts_incid_death_state_sample_allPdeath) export(plot_ts_incid_inf_state_sample) +export(pretty_num) +export(pretty_prop) export(print_pretty_date) export(print_pretty_date_short) export(read_file_of_type) export(reference_chunk) export(rt_estimates) -importFrom(magrittr,"%>%") +export(setup_testing_environment) diff --git a/R/pkgs/report.generation/R/DataLoadFuncs.R b/R/pkgs/report.generation/R/DataLoadFuncs.R index d5ea7c03d..3f47ada58 100644 --- a/R/pkgs/report.generation/R/DataLoadFuncs.R +++ b/R/pkgs/report.generation/R/DataLoadFuncs.R @@ -1,4 +1,3 @@ - ##' Wrapper function for loading multiple hospitalization outcomes with open_dataset ##' ##' @param outcome_dir the subdirectory with all model outputs @@ -16,7 +15,7 @@ ##'@export load_hosp_sims_filtered <- function(outcome_dir, model_output = 'hosp', - partitions=c("location", "scenario", "pdeath", "runid", "lik_type", "is_final", "filename"), + partitions=c("location", "scenario", "pdeath", "runID", "lik_type", "is_final", "file_name"), pdeath_filter=c("high", "med", "low"), incl_geoids, pre_process=function(x) {x}, @@ -28,63 +27,41 @@ load_hosp_sims_filtered <- function(outcome_dir, require(tidyverse) if(inference){ - hosp<-arrow::open_dataset(file.path(outcome_dir,model_output), - partitioning = partitions) %>% + hosp<-arrow::open_dataset(file.path(outcome_dir,model_output), + partitioning = partitions) %>% dplyr::filter(!!as.symbol(partitions[5])=="global", !!as.symbol(partitions[6])=="final") %>% dplyr::filter(!!as.symbol(partitions[3]) %in% pdeath_filter) %>% dplyr::filter(geoid %in% incl_geoids) %>% pre_process(...)%>% - collect() + dplyr::collect() } else { if(length(partitions)==7){partitions <- partitions[-5:-6]} - - hosp<-arrow::open_dataset(file.path(outcome_dir,model_output), - partitioning = partitions) %>% + + hosp<-arrow::open_dataset(file.path(outcome_dir,model_output), + partitioning = partitions) %>% dplyr::filter(!!as.symbol(partitions[3]) %in% pdeath_filter) %>% dplyr::filter(geoid %in% incl_geoids) %>% pre_process(...)%>% - collect() + dplyr::collect() } hosp<-hosp %>% - mutate(sim_num=as.numeric(str_extract(max(partitions), "^\\d+")), - time=as.Date(time)) - - # fnames <- hosp$.data$files %>% - # stringr::str_split("\\/") %>% - # dplyr::tibble() %>% - # tidyr::unnest_wider(., col=`.`) - # - # fnames <- fnames[,(ncol(fnames)-length(partitions)):ncol(fnames)] - # - # colnames(fnames) <- c(partitions, "sim_num") - # - # fnames <- fnames %>% - # pre_process$partitions() %>% - # pull(sim_num) - # - # hosp<-hosp %>% - # pre_process$partitions() %>% - # pre_process$data() %>% - # collect() %>% - # group_by(geoid, time, !!as.symbol(partitions[2]), !!as.symbol(partitions[3])) %>% - # mutate(sim_num=as.numeric(stringr::str_remove(fnames, '\\..+$')), - # time=as.Date(time)) %>% - # dplyr::ungroup() - + dplyr::mutate(sim_num=as.numeric(stringr::str_extract(!!as.symbol(partitions[length(partitions)]), "^\\d+")), # removes everything after the first dot + time=as.Date(time)) + if(nrow(hosp)==0){stop("Nothing was loaded, confirm filtering values are correct.")} hosp <- hosp %>% - post_process(...) - + post_process(...) + message("Finished loading") return(hosp) } -##' Wrapper function for loading hpar files with open_dataset -##' +##' Wrapper function for loading final hpar files with open_dataset +##' ##' @param outcome_dir the subdirectory with all model outputs ##' @param model_output folder with hpar outcomes ##' @param partitions used by open_dataset @@ -99,54 +76,84 @@ load_hosp_sims_filtered <- function(outcome_dir, ##'@export load_hpar_sims_filtered <- function(outcome_dir, model_output = 'hpar', - partitions=c("location", "scenario", "pdeath", "runid", "lik_type", "is_final", "filename"), + partitions=c("location", "scenario", "pdeath", "runID", "lik_type", "is_final", "file_name"), pdeath_filter=c("high", "med", "low"), - pre_process=function(x){x}, + pre_process=function(x) {x}, incl_geoids, ... ) { require(tidyverse) - - hpar<-arrow::open_dataset(file.path(outcome_dir,model_output), - partitioning = partitions) %>% + + hpar<-arrow::open_dataset(file.path(outcome_dir,model_output), + partitioning = partitions) %>% dplyr::filter(!!as.symbol(partitions[5])=="global", !!as.symbol(partitions[6])=="final") %>% dplyr::filter(!!as.symbol(partitions[3]) %in% pdeath_filter) %>% dplyr::filter(geoid %in% incl_geoids) %>% pre_process(...) %>% - collect() %>% - mutate(sim_num=as.numeric(str_extract(max(partitions), "^\\d+"))) - - - # fnames <- hpar$.data$files %>% - # stringr::str_split("\\/") %>% - # dplyr::tibble() %>% - # tidyr::unnest_wider(., col=`.`) - # - # fnames <- fnames[,(ncol(fnames)-length(partitions)):ncol(fnames)] - # - # colnames(fnames) <- c(partitions, "sim_num") - # - # fnames <- fnames %>% - # pre_process$partitions() %>% - # pull(sim_num) - # - # hpar<-hpar %>% - # pre_process$partitions() %>% - # pre_process$data() %>% - # collect() %>% - # group_by(quantity, outcome, geoid, !!as.symbol(partitions[2]), !!as.symbol(partitions[3])) %>% - # mutate(sim_num=as.numeric(stringr::str_remove(fnames, '\\..+$'))) %>% - # dplyr::ungroup() - + dplyr::collect() %>% + dplyr::mutate(sim_num=as.numeric(stringr::str_extract(!!as.symbol(partitions[length(partitions)]), "^\\d+"))) # removes everything after the first dot + message("Finished loading") return(hpar) } -##' Wrapper function for loading spar files with open_dataset +##' Wrapper function for loading intermediate hpar files with open_dataset +##' +##' @param outcome_dir the subdirectory with all model outputs +##' @param partitions used by open_dataset +##' @param pdeath_filter string that indicates which pdeath to import from outcome_dir +##' @param incl_geoids character vector of geoids that are included in the report +##' +##' @return a combined data frame of all R simulations with filters applied pre merge. +##' - geoid +##' - start_date +##' - end_date +##' - npi_name +##' - parameter +##' - reduction +##' - location +##' - scenario +##' - pdeath +##' - date +##' - lik_type +##' - is_final +##' - file_name +##' ##' +##'@export +load_hpar_sims_filtered_interm <- function(outcome_dir, + partitions=c("location", "scenario", "pdeath", "runID", "lik_type", "is_final", "file_name"), + pdeath_filter=c("high", "med", "low"), + incl_geoids, + ... +) { + + require(tidyverse) + + hpar <- arrow::open_dataset(file.path(outcome_dir,'hpar'), + partitioning = partitions) %>% + #dplyr::filter(!!as.symbol(partitions[5])=="global") %>% + dplyr::filter(!!as.symbol(partitions[6])=="intermediate") %>% + dplyr::filter(!!as.symbol(partitions[3]) %in% pdeath_filter) %>% + dplyr::filter(geoid %in% incl_geoids) %>% + collect() + + hpar<-hpar %>% + dplyr::mutate(sim_num=stringr::str_remove(!!as.symbol(partitions[length(partitions)]),paste0(".",runID,'.hpar.parquet'))) %>% # remove runID component + tidyr::separate(sim_num,c('slot_num','block_num','iter_num'),sep="\\.",convert=TRUE,remove=TRUE) + + message("Finished loading intermediate Outcome parameters.") + + return(hpar) + +} + + +##' Wrapper function for loading final spar files with open_dataset +##' ##' @param outcome_dir the subdirectory with all model outputs ##' @param partitions used by open_dataset ##' @param pdeath_filter string that indicates which pdeath to import from outcome_dir @@ -161,12 +168,12 @@ load_hpar_sims_filtered <- function(outcome_dir, ##' - date ##' - lik_type ##' - is_final -##' - sim_num +##' - file_name ##' ##' ##'@export load_spar_sims_filtered <- function(outcome_dir, - partitions=c("location", "scenario", "pdeath", "date", "lik_type", "is_final", "filename"), + partitions=c("location", "scenario", "pdeath", "runID", "lik_type", "is_final", "file_name"), pdeath_filter=c("high", "med", "low"), pre_process=function(x) {x}, ... @@ -180,37 +187,65 @@ load_spar_sims_filtered <- function(outcome_dir, !!as.symbol(partitions[6])=="final") %>% dplyr::filter(!!as.symbol(partitions[3]) %in% pdeath_filter) %>% pre_process(...) %>% - collect() %>% - mutate(sim_num=as.numeric(str_extract(max(partitions), "^\\d+"))) - - # fnames <- spar$.data$files %>% - # stringr::str_split("\\/") %>% - # dplyr::tibble() %>% - # tidyr::unnest_wider(., col=`.`) - # - # fnames <- fnames[,(ncol(fnames)-length(partitions)):ncol(fnames)] - # - # colnames(fnames) <- c(partitions, "sim_num") - # - # fnames <- fnames %>% - # pre_process$partitions() %>% - # pull(sim_num) - # - # spar<-spar %>% - # pre_process$partitions() %>% - # pre_process$data() %>% - # collect() %>% - # group_by(parameter, !!as.symbol(partitions[2]), !!as.symbol(partitions[3])) %>% - # mutate(sim_num=as.numeric(stringr::str_remove(fnames, '\\..+$'))) %>% - # dplyr::ungroup() - + dplyr::collect() %>% + dplyr::mutate(sim_num=as.numeric(stringr::str_extract(!!as.symbol(partitions[length(partitions)]), "^\\d+"))) # removes everything after the first dot message("Finished loading.") return(spar) } -##' Wrapper function for loading spar files with open_dataset + +##' Wrapper function for loading intermediate spar files with open_dataset +##' +##' @param outcome_dir the subdirectory with all model outputs +##' @param partitions used by open_dataset +##' @param pdeath_filter string that indicates which pdeath to import from outcome_dir +##' +##' @return a combined data frame of all R simulations with filters applied pre merge. +##' - start_date +##' - end_date +##' - npi_name +##' - parameter +##' - reduction +##' - location +##' - scenario +##' - pdeath +##' - date +##' - lik_type +##' - is_final +##' - file_name ##' +##' +##'@export +load_spar_sims_filtered_interm <- function(outcome_dir, + partitions=c("location", "scenario", "pdeath", "runID", "lik_type", "is_final", "file_name"), + pdeath_filter=c("high", "med", "low"), + ... +) { + + require(tidyverse) + + spar <- arrow::open_dataset(file.path(outcome_dir,'spar'), + partitioning = partitions) %>% + #dplyr::filter(!!as.symbol(partitions[5])=="global") %>% + dplyr::filter(!!as.symbol(partitions[6])=="intermediate") %>% + dplyr::filter(!!as.symbol(partitions[3]) %in% pdeath_filter) %>% + collect() + + spar<-spar %>% + dplyr::mutate(sim_num=stringr::str_remove(!!as.symbol(partitions[length(partitions)]),paste0(".",runID,'.spar.parquet'))) %>% # remove runID component + tidyr::separate(sim_num,c('slot_num','block_num','iter_num'),sep="\\.",convert=TRUE,remove=TRUE) + + message("Finished loading intermediate SEIR parameters.") + + return(spar) + +} + + + +##' Wrapper function for loading final snpi files with open_dataset +##' ##' @param outcome_dir the subdirectory with all model outputs ##' @param partitions used by open_dataset ##' @param pdeath_filter string that indicates which pdeath to import from outcome_dir @@ -230,12 +265,12 @@ load_spar_sims_filtered <- function(outcome_dir, ##' - date ##' - lik_type ##' - is_final -##' - sim_num +##' - file_name ##' ##' ##'@export load_snpi_sims_filtered <- function(outcome_dir, - partitions=c("location", "scenario", "pdeath", "date", "lik_type", "is_final", "filename"), + partitions=c("location", "scenario", "pdeath", "runID", "lik_type", "is_final", "file_name"), pdeath_filter=c("high", "med", "low"), pre_process=function(x) {x}, incl_geoids, @@ -251,32 +286,395 @@ load_snpi_sims_filtered <- function(outcome_dir, dplyr::filter(!!as.symbol(partitions[3]) %in% pdeath_filter) %>% dplyr::filter(geoid %in% incl_geoids) %>% pre_process(...)%>% - collect() %>% - mutate(sim_num=as.numeric(str_extract(max(partitions), "^\\d+"))) - - # fnames <- snpi$.data$files %>% - # stringr::str_split("\\/") %>% - # dplyr::tibble() %>% - # tidyr::unnest_wider(., col=`.`) - # - # fnames <- fnames[,(ncol(fnames)-length(partitions)):ncol(fnames)] - # - # colnames(fnames) <- c(partitions, "sim_num") - # - # fnames <- fnames %>% - # pre_process$partitions() %>% - # pull(sim_num) - # - # snpi<-snpi %>% - # pre_process$partitions() %>% - # pre_process$data() %>% - # collect() %>% - # group_by(geoid, npi_name, !!as.symbol(partitions[2]), !!as.symbol(partitions[3])) %>% - # mutate(sim_num=as.numeric(stringr::str_remove(fnames, '\\..+$'))) %>% - # dplyr::ungroup() - + dplyr::collect() %>% + dplyr::mutate(sim_num=as.numeric(stringr::str_extract(!!as.symbol(partitions[length(partitions)]), "^\\d+"))) # removes everything after the first dot + message("Finished loading.") return(snpi) } + + +##' Wrapper function for loading intermediate snpi files with open_dataset +##' +##' @param outcome_dir the subdirectory with all model outputs +##' @param partitions used by open_dataset +##' @param pdeath_filter string that indicates which pdeath to import from outcome_dir +##' @param incl_geoids character vector of geoids that are included in the report +##' +##' @return a combined data frame of all R simulations with filters applied pre merge. +##' - geoid +##' - start_date +##' - end_date +##' - npi_name +##' - parameter +##' - reduction +##' - location +##' - scenario +##' - pdeath +##' - date +##' - lik_type +##' - is_final +##' - file_name +##' +##' +##'@export +load_snpi_sims_filtered_interm <- function(outcome_dir, + partitions=c("location", "scenario", "pdeath", "runID", "lik_type", "is_final", "file_name"), + pdeath_filter=c("high", "med", "low"), + incl_geoids, + ... +) { + + require(tidyverse) + + snpi <- arrow::open_dataset(file.path(outcome_dir,'snpi'), + partitioning = partitions) %>% + #dplyr::filter(!!as.symbol(partitions[5])=="global") %>% + dplyr::filter(!!as.symbol(partitions[6])=="intermediate") %>% + dplyr::filter(!!as.symbol(partitions[3]) %in% pdeath_filter) %>% + dplyr::filter(geoid %in% incl_geoids) %>% + collect() + + snpi<-snpi %>% + dplyr::mutate(sim_num=stringr::str_remove(!!as.symbol(partitions[length(partitions)]),paste0(".",runID,'.snpi.parquet'))) %>% # remove runID component + tidyr::separate(sim_num,c('slot_num','block_num','iter_num'),sep="\\.",convert=TRUE,remove=TRUE) + + message("Finished loading intermediate SNPI parameters.") + + return(snpi) + +} + + +##' Wrapper function for loading final hnpi files with open_dataset +##' +##' @param outcome_dir the subdirectory with all model outputs +##' @param partitions used by open_dataset +##' @param pdeath_filter string that indicates which pdeath to import from outcome_dir +##' @param pre_process function that does processing before collection +##' @param incl_geoids character vector of geoids that are included in the report +##' +##' @return a combined data frame of all R simulations with filters applied pre merge. +##' - geoid +##' - start_date +##' - end_date +##' - npi_name +##' - parameter +##' - reduction +##' - location +##' - scenario +##' - pdeath +##' - date +##' - lik_type +##' - is_final +##' - file_name +##' +##' +##'@export +load_hnpi_sims_filtered <- function(outcome_dir, + partitions=c("location", "scenario", "pdeath", "runID", "lik_type", "is_final", "file_name"), + pdeath_filter=c("high", "med", "low"), + pre_process=function(x) {x}, + incl_geoids, + + ... +) { + + require(tidyverse) + + hnpi<- arrow::open_dataset(file.path(outcome_dir,'hnpi'), + partitioning = partitions) %>% + dplyr::filter(!!as.symbol(partitions[5])=="global", + !!as.symbol(partitions[6])=="final") %>% + dplyr::filter(!!as.symbol(partitions[3]) %in% pdeath_filter) %>% + dplyr::filter(geoid %in% incl_geoids) %>% + pre_process(...)%>% + collect() %>% + dplyr::mutate(sim_num=as.numeric(stringr::str_extract(!!as.symbol(partitions[length(partitions)]), "^\\d+"))) # removes everything after the first dot + + message("Finished loading.") + + return(hnpi) + +} + + +##' Wrapper function for loading intermediate hnpi files with open_dataset +##' +##' @param outcome_dir the subdirectory with all model outputs +##' @param partitions used by open_dataset +##' @param pdeath_filter string that indicates which pdeath to import from outcome_dir +##' @param incl_geoids character vector of geoids that are included in the report +##' +##' @return a combined data frame of all R simulations with filters applied pre merge. +##' - geoid +##' - start_date +##' - end_date +##' - npi_name +##' - parameter +##' - reduction +##' - location +##' - scenario +##' - pdeath +##' - date +##' - lik_type +##' - is_final +##' - file_name +##' +##' +##'@export +load_hnpi_sims_filtered_interm <- function(outcome_dir, + partitions=c("location", "scenario", "pdeath", "runID", "lik_type", "is_final", "file_name"), + pdeath_filter=c("high", "med", "low"), + incl_geoids, + ... +) { + + require(tidyverse) + + hnpi <- arrow::open_dataset(file.path(outcome_dir,'hnpi'), + partitioning = partitions) %>% + #dplyr::filter(!!as.symbol(partitions[5])=="global") %>% + dplyr::filter(!!as.symbol(partitions[6])=="intermediate") %>% + dplyr::filter(!!as.symbol(partitions[3]) %in% pdeath_filter) %>% + dplyr::filter(geoid %in% incl_geoids) %>% + collect() + + hnpi<-hnpi %>% + dplyr::mutate(sim_num=stringr::str_remove(!!as.symbol(partitions[length(partitions)]),paste0(".",runID,'.hnpi.parquet'))) %>% # remove runID component + tidyr::separate(sim_num,c('slot_num','block_num','iter_num'),sep="\\.",convert=TRUE,remove=TRUE) + + message("Finished loading intermediate hnpi parameters.") + + return(hnpi) + +} + + +##' Wrapper function for loading final llik files with open_dataset +##' +##' @param outcome_dir the subdirectory with all model outputs +##' @param partitions used by open_dataset +##' @param pdeath_filter string that indicates which pdeath to import from outcome_dir +##' @param incl_geoids character vector of geoids that are included in the report +##' +##' @return a combined data frame of all R simulations with filters applied pre merge. +##' - geoid +##' - start_date +##' - end_date +##' - npi_name +##' - parameter +##' - reduction +##' - location +##' - scenario +##' - pdeath +##' - date +##' - lik_type +##' - is_final +##' - file_name +##' +##' +##'@export +load_llik_sims_filtered <- function(outcome_dir, + partitions=c("location", "scenario", "pdeath", "runID", "lik_type", "is_final", "file_name"), + pdeath_filter=c("high", "med", "low"), + incl_geoids, + ... +) { + + require(tidyverse) + + llik <- arrow::open_dataset(file.path(outcome_dir,'llik'), + partitioning = partitions) %>% + dplyr::filter(!!as.symbol(partitions[5])=="global", + !!as.symbol(partitions[6])=="final") %>% + dplyr::filter(!!as.symbol(partitions[3]) %in% pdeath_filter) %>% + dplyr::filter(geoid %in% incl_geoids) %>% + #pre_process(...)%>% + collect() + + llik<-llik %>% + mutate(sim_num=as.numeric(stringr::str_extract(!!as.symbol(partitions[length(partitions)]), "^\\d+"))) # removes everything after the first dot + + message("Finished loading final log likelihoods.") + + return(llik) + +} + +##' Wrapper function for loading intermediate llik files with open_dataset +##' +##' @param outcome_dir the subdirectory with all model outputs +##' @param partitions used by open_dataset +##' @param pdeath_filter string that indicates which pdeath to import from outcome_dir +##' @param incl_geoids character vector of geoids that are included in the report +##' +##' @return a combined data frame of all R simulations with filters applied pre merge. +##' - geoid +##' - start_date +##' - end_date +##' - npi_name +##' - parameter +##' - reduction +##' - location +##' - scenario +##' - pdeath +##' - date +##' - lik_type +##' - is_final +##' - file_name +##' +##' +##'@export +load_llik_sims_filtered_interm <- function(outcome_dir, + partitions=c("location", "scenario", "pdeath", "runID", "lik_type", "is_final", "file_name"), + pdeath_filter=c("high", "med", "low"), + incl_geoids, + ... +) { + + require(tidyverse) + + llik <- arrow::open_dataset(file.path(outcome_dir,'llik'), + partitioning = partitions) %>% + #dplyr::filter(!!as.symbol(partitions[5])=="global") %>% + dplyr::filter(!!as.symbol(partitions[6])=="intermediate") %>% + dplyr::filter(!!as.symbol(partitions[3]) %in% pdeath_filter) %>% + dplyr::filter(geoid %in% incl_geoids) %>% + collect() + + llik<-llik %>% + dplyr::mutate(sim_num=stringr::str_remove(!!as.symbol(partitions[length(partitions)]),paste0(".",runID,'.llik.parquet'))) %>% # remove runID component + tidyr::separate(sim_num,c('slot_num','block_num','iter_num'),sep="\\.",convert=TRUE,remove=TRUE) # split string + + message("Finished loading intermediate log likelihoods.") + + return(llik) + +} + +##' Wrapper function for loading final seir files with open_dataset +##' +##' @param outcome_dir the subdirectory with all model outputs +##' @param partitions used by open_dataset +##' @param pdeath_filter string that indicates which pdeath to import from outcome_dir +##' @param pre_process function that does processing before collection +##' @param incl_geoids character vector of geoids that are included in the report +##' +##' @return a combined data frame of all simulations with filters applied pre merge. +##' - comp +##' - pcom +##' - time +##' - location +##' - scenario +##' - pdeath +##' - runID +##' - sim_num +##' - geoid +##' - value +##' +##' +##'@export +load_seir_sims_filtered <- function(outcome_dir, + partitions=c("location", "scenario", "pdeath", "runID", "lik_type", "is_final", "file_name"), + pdeath_filter=c("high", "med", "low"), + pre_process=function(x) {x}, + post_process=function(x) {x}, + incl_geoids, + ... +) { + + require(tidyverse) + + seir<- arrow::open_dataset(file.path(outcome_dir,'seir'), + partitioning = partitions) %>% + dplyr::filter(!!as.symbol(partitions[5])=="global", + !!as.symbol(partitions[6])=="final") %>% + dplyr::filter(!!as.symbol(partitions[3]) %in% pdeath_filter) %>% + dplyr::select(comp, p_comp, time, tidyselect::all_of(partitions[c(1:4, 7)]), tidyselect::any_of(incl_geoids)) %>% + pre_process(...)%>% + dplyr::collect() %>% + dplyr::mutate(sim_num=as.numeric(stringr::str_extract(!!as.symbol(partitions[length(partitions)]), "^\\d+")), + time=as.Date(time)) %>% + dplyr::relocate(comp, p_comp, time, tidyselect::any_of(partitions), sim_num) %>% + dplyr::select(-partitions[length(partitions)]) + + geoids <- colnames(seir)[!colnames(seir) %in% c("comp", "p_comp", "time", partitions, "sim_num")] # turn geoID into a single column instead + temp <- list() + + for(i in 1:length(geoids)){ # loop due to memory issues with large datasets + temp[[i]] <- seir %>% + dplyr::select(comp, p_comp, time, tidyselect::any_of(partitions), sim_num, geoids[i]) %>% + tidyr::pivot_longer(cols=-comp:-sim_num, names_to="geoid") + } + + seir <- dplyr::bind_rows(temp) %>% + dplyr::group_by(across(c(geoid,any_of(partitions)))) %>% + dplyr::arrange(sim_num,time,comp,.by_group=TRUE) %>% + dplyr::ungroup() + + seir <- seir %>% + post_process() + + message("Finished loading seir output.") + + return(seir) + +} + +##' Wrapper function for loading intermediate seeding files from csv files +##' +##' @param outcome_dir the subdirectory with all model outputs +##' @param partitions used by open_dataset +##' @param pdeath_filter string that indicates which pdeath to import from outcome_dir +##' @param incl_geoids character vector of geoids that are included in the report +##' +##' @return a combined data frame of all seeding values for all simulations with filters applied pre merge. +##' - geoid +##' - amount +##' - scenario +##' - pdeath +##' - date +##' - lik_type +##' - is_final +##' - slot_num +##' - block_num +##' -iter_num +##' +##' +##'@export +load_seed_sims_filtered_interm <- function(outcome_dir, + partitions=c("location", "scenario", "pdeath", "runID", "lik_type", "is_final", "file_name"), + pdeath_filter=c("high", "med", "low"), + incl_geoids, + ... +) { + + require(tidyverse) + + # get list of all seeding file names + seeding.files <-list.files(file.path(outcome_dir,'seed'), full.names=TRUE, recursive = TRUE) + + # read in seeding files, get a list of dataframes + seeding.df.list <-setNames(lapply(seeding.files, read.csv, colClasses=c("place"="character")),seeding.files) + + # turn into a single dataframe + seeding_all <- bind_rows(seeding.df.list,.id="id") + + # for each file name in "id" column, separate phrase after each "/" and add it it as columns to the variable + + seeding_all <- seeding_all %>% tidyr::separate(id,into=c("runs_folder","file_type",partitions),sep="/",convert=TRUE,remove=TRUE)%>% + dplyr::mutate(sim_num=str_remove(file_name,paste0(".",runID,'.seed.csv'))) %>% # remove runID component + tidyr::separate(sim_num,c('slot_num','block_num','iter_num'),sep="\\.",convert=TRUE,remove=TRUE) %>% + dplyr::rename(geoid=place) %>% + dplyr::mutate(date=as.Date(date))%>% + dplyr::filter(!!as.symbol(partitions[3]) %in% pdeath_filter) %>% + dplyr::select(-X,-location,-runID,-file_name,-file_type,-runs_folder) + + message("Finished loading intermediate seeding parameters.") + + return(seeding_all) + +} diff --git a/R/pkgs/report.generation/R/ReportBuildUtils.R b/R/pkgs/report.generation/R/ReportBuildUtils.R index e5cbf14e1..32a035ad5 100644 --- a/R/pkgs/report.generation/R/ReportBuildUtils.R +++ b/R/pkgs/report.generation/R/ReportBuildUtils.R @@ -34,7 +34,7 @@ list_chunks <- function() { ##' @export ##' print_pretty_date <- function(date_string) { - + return(format(as.Date(date_string), "%B %d, %Y")) } @@ -47,11 +47,36 @@ print_pretty_date <- function(date_string) { ##' @export ##' print_pretty_date_short <- function(date_string) { - + return(format(as.Date(date_string), "%b %d")) } +##' +##' Pretty print proportion for text +##' +##' @return a function that formats proportions +##' +##' @export +##' +pretty_prop <- function(x, y, markdown = TRUE){ + temp <- paste0(round(x/y*100, 1), '% (n=', x, '/', y, ')') + if(markdown){ + temp <- paste0('**', temp, '**') + } + return(temp) +} + +##' +##' Pretty print numbers for text +##' +##' @return a function that formats numbers +##' +##' @export +##' +pretty_num <- function(x){ + paste0('**', x, '**') +} ## ##'Function to round cleanly ##' @@ -96,12 +121,12 @@ mtr_estimates <- function(rt_dat, n_periods=10){ mtr_start <- rt_dat %>% - dplyr::select(geoid, scenario, pdeath, starts_with("npi"), start_date) %>% + dplyr::select(geoid, scenario, pdeath, tidyselect::starts_with("npi"), start_date) %>% dplyr::distinct() %>% tidyr::separate(start_date, into = as.character(c(1:n_periods)), sep=",") mtr_end <- rt_dat %>% - dplyr::select(geoid, scenario, pdeath, starts_with("npi"), end_date) %>% + dplyr::select(geoid, scenario, pdeath, tidyselect::starts_with("npi"), end_date) %>% dplyr::distinct() %>% tidyr::separate(end_date, into = as.character(c(1:n_periods)), sep=",") @@ -110,14 +135,14 @@ mtr_estimates <- function(rt_dat, for(i in 1:n_periods){ xx<-mtr_start %>% - dplyr::select(geoid, scenario, pdeath, starts_with("npi"), start_date=as.symbol(i)) %>% + dplyr::select(geoid, scenario, pdeath, tidyselect::starts_with("npi"), start_date=as.symbol(i)) %>% dplyr::left_join(mtr_end %>% - dplyr::select(geoid, scenario, pdeath, starts_with("npi"), end_date=as.symbol(i))) %>% + dplyr::select(geoid, scenario, pdeath, tidyselect::starts_with("npi"), end_date=as.symbol(i))) %>% tidyr::drop_na() %>% dplyr::right_join(rt_dat%>% dplyr::select(-start_date, -end_date)) %>% tidyr::drop_na() %>% - dplyr::mutate(across(ends_with("date"), ~lubridate::ymd(.x)))%>% + dplyr::mutate(dplyr::across(tidyselect::any_of(c("start_date", "end_date")), ~lubridate::ymd(.x)))%>% dplyr::bind_rows(xx) } @@ -129,45 +154,75 @@ mtr_estimates <- function(rt_dat, ##'Function to generate summary estimates of daily rt ##' ##' @param r_dat df with daily r estimates by geoid/sim -##' @param county_dat df with geoid, scenario, pdeath, sim_num, time, location, and cum_inf columns -##' @param lo +##' @param seir_dat df with geoid-specific outputs from SEIR +##' @param geodat ##' @param hi +##' @param lo +##' @param pdeath_filter +##' @param overall ##' ##' ##' @export rt_estimates <- function(r_dat, - county_dat, + seir_dat, geodat, lo=0.025, - hi=0.975){ + hi=0.975, + pdeath_filter = "med", + overall = TRUE){ geodat<-geodat %>% - dplyr::rename(pop=starts_with("pop")) + dplyr::rename(pop=tidyselect::starts_with("pop")) + + + r_dat<-r_dat %>% + dplyr::filter(pdeath==pdeath_filter) %>% + dplyr::filter(geoid %in% geodat$geoid) - rc<-county_dat %>% - dplyr::left_join(geodata) %>% - dplyr::right_join(r_dat) %>% + susceptible <- seir_dat %>% + dplyr::filter(comp=="S") %>% + dplyr::left_join(geodat) %>% + dplyr::mutate(vacc = value/pop, + vacc = dplyr::case_when(p_comp==0 ~ vacc, + p_comp==1 ~ vacc*(1-0.5), + p_comp==2 ~ vacc*(1-0.9))) + + susceptible <- susceptible %>% + dplyr::group_by(time, scenario, sim_num, pdeath, geoid, pop) %>% + dplyr::summarize(vacc = sum(vacc)) + + rc <- r_dat %>% + dplyr::left_join(susceptible) %>% + dplyr::mutate(rt = rt*vacc) %>% dplyr::group_by(scenario, time, location) %>% - dplyr::mutate(rt=rt*(1-cum_inf/pop), - weight=pop/sum(pop)) - - rc_state<-rc%>% - dplyr::group_by(scenario, time, pdeath, location) %>% - dplyr::arrange(rt) %>% - dplyr::summarize(mean=Hmisc::wtd.mean(rt, weights = weight, normwt=TRUE), - median=Hmisc::wtd.quantile(rt, weights = weight, normwt=TRUE, probs=0.5), - ci_lo=Hmisc::wtd.quantile(rt, weights = weight, normwt=TRUE, probs=lo), - ci_hi=Hmisc::wtd.quantile(rt, weights = weight, normwt=TRUE, probs=hi)) %>% - dplyr::rename(geoid=location) - - rc<-rc %>% - dplyr::group_by(scenario, time, pdeath, geoid) %>% - dplyr::summarize(mean=mean(rt), - median=quantile(rt, probs=0.5), - ci_lo=quantile(rt, probs=lo), - ci_hi=quantile(rt, probs=hi)) %>% - dplyr::bind_rows(rc_state) + dplyr::mutate(weight=pop/sum(pop)) + + + if(overall){ + rc<-rc%>% + dplyr::group_by(scenario, date=as.Date(time), pdeath, location) %>% + dplyr::arrange(rt) %>% + dplyr::summarize(estimate=Hmisc::wtd.mean(rt, weights = weight, normwt=TRUE), + lower=Hmisc::wtd.quantile(rt, weights = weight, normwt=TRUE, probs=lo), + upper=Hmisc::wtd.quantile(rt, weights = weight, normwt=TRUE, probs=hi)) %>% + dplyr::rename(geoid=location) + } else{ + rc_state<-rc%>% + dplyr::group_by(scenario, date=as.Date(time), pdeath, location) %>% + dplyr::arrange(rt) %>% + dplyr::summarize(estimate=Hmisc::wtd.mean(rt, weights = weight, normwt=TRUE), + lower=Hmisc::wtd.quantile(rt, weights = weight, normwt=TRUE, probs=lo), + upper=Hmisc::wtd.quantile(rt, weights = weight, normwt=TRUE, probs=hi)) %>% + dplyr::rename(geoid=location) + + rc<-rc %>% + dplyr::group_by(scenario, date=as.Date(time), pdeath, location, geoid) %>% + dplyr::summarize(estimate=Hmisc::wtd.mean(rt, weights = weight, normwt=TRUE), + lower=Hmisc::wtd.quantile(rt, weights = weight, normwt=TRUE, probs=lo), + upper=Hmisc::wtd.quantile(rt, weights = weight, normwt=TRUE, probs=hi)) %>% + dplyr::bind_rows(rc_state) + } return(rc) @@ -219,7 +274,7 @@ plot_ts_hosp_state_sample <- function (hosp_state_totals, sim_num %in% sampled_sims) %>% dplyr::mutate(sim_num = factor(sim_num)) %>% dplyr::rename(pltvar = !!varname) - + rc <- ggplot(data=to_plt, aes(x=time, colour = scenario_name, group = interaction(sim_num, scenario_name))) + @@ -232,26 +287,26 @@ plot_ts_hosp_state_sample <- function (hosp_state_totals, values = scenario_colors) + theme_minimal() + theme(axis.title.x = element_blank(), - legend.position = "bottom", - legend.title = element_blank()) + + legend.position = "bottom", + legend.title = element_blank()) + guides(color = guide_legend(nrow = 2, override.aes = list(alpha=1))) - - + + if(plot_intervention){ interv_dates <- data.frame(xmin = as.Date(interv_start_date), xmax = as.Date(interv_end_date), ymin = 0, ymax = 1.05*max(to_plt$NincidHosp)) - + rc <- rc + geom_rect(data = interv_dates, inherit.aes = FALSE, aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), color = "grey", fill = "grey", alpha = 0.33) } - - + + return(rc) - + } @@ -286,7 +341,7 @@ plot_hist_incidHosp_state <- function(hosp_state_totals, dplyr::filter(time >= sim_start_date & time <= summary_date) %>% dplyr::group_by(scenario_name, sim_num) %>% dplyr::summarise(pltVar = sum(pltVar)) %>% - ungroup + dplyr::ungroup() rc <- ggplot(data=to_plt, aes(x = pltVar, fill = scenario_name, color = scenario_name)) + @@ -297,18 +352,19 @@ plot_hist_incidHosp_state <- function(hosp_state_totals, scale_x_continuous(paste("by", print_pretty_date_short(summary_date)), labels = scales::comma, - ) + + ) + ylab("Number of simulations") + theme_bw() + guides("none") + theme(legend.position = "none", axis.text.x = element_text(angle = 45, hjust = 1, vjust =1)) - - + + return(rc) - + } + ##' ##' Plot map showing infections per 10K on a specific date for one scenario ##' @@ -336,17 +392,17 @@ plot_geounit_attack_rate_map <- function (cum_inf_geounit_dates, display_date <- as.Date(display_date) shp$geoid <- as.character(shp$geoid) - + to_plt <- cum_inf_geounit_dates %>% dplyr::filter(scenario_name == scenariolabel, time == display_date) %>% - left_join(geodat) %>% + dplyr::left_join(geodat) %>% dplyr::rename(pop = !!popnodes) %>% dplyr::group_by(geoid) %>% dplyr::summarise(attack_rate=mean(N/pop)*10000) %>% dplyr::ungroup() - plot_shp <- left_join(shp, to_plt, by="geoid") + plot_shp <- dplyr::left_join(shp, to_plt, by="geoid") rc <- ggplot(plot_shp) + geom_sf(aes(fill=attack_rate)) + @@ -362,8 +418,8 @@ plot_geounit_attack_rate_map <- function (cum_inf_geounit_dates, panel.grid.minor = element_blank(), panel.border = element_blank(), axis.ticks.y=element_blank()) - return(rc) - + return(rc) + } ##' @@ -401,9 +457,9 @@ plot_geounit_map <- function(cum_inf_geounit_dates, dplyr::left_join(geodat) %>% dplyr::rename(pop = !!popnodes, plot_var = !!plot_var) %>% group_by(geoid) %>% dplyr::summarise(geoid_rate = mean(plot_var/pop) * 10000) %>% - dplyr::ungroup + dplyr::ungroup() - plot_shp <- left_join(shp, to_plt, by = "geoid") + plot_shp <- dplyr::left_join(shp, to_plt, by = "geoid") if(is.null(clims)){ clims = range(plot_shp$geoid_rate, na.rm=TRUE) @@ -510,8 +566,8 @@ make_scn_state_table_withVent <- function(current_scenario, dplyr::mutate(pdeath = pdeath_labels[match(pdeath, pdeath_levels)]) - xx<-pivot_longer(xx, cols = nIncidInf_final:nCurrVent_hi) %>% - mutate(Outcome = case_when(stringr::str_detect(name, "nIncidInf")~"Total infections", + xx<-tidyr::pivot_longer(xx, cols = nIncidInf_final:nCurrVent_hi) %>% + dplyr::mutate(Outcome = dplyr::case_when(stringr::str_detect(name, "nIncidInf")~"Total infections", stringr::str_detect(name, "nIncidHosp")~"Total hospital admissions", stringr::str_detect(name, "pIncidHosp")~"Peak daily hospital admissions", stringr::str_detect(name, "nCurrHosp")~"Maximum daily hospital occupancy", @@ -522,25 +578,25 @@ make_scn_state_table_withVent <- function(current_scenario, stringr::str_detect(name, "pIncidVent")~"Peak daily ventilator usage", stringr::str_detect(name, "nCurrVent")~"Maximum daily ventilator usage", stringr::str_detect(name, "nIncidDeath")~"Total deaths"), - name = case_when(stringr::str_detect(name, "final")~"est", + name = dplyr::case_when(stringr::str_detect(name, "final")~"est", stringr::str_detect(name, "lo")~"lo", stringr::str_detect(name, "hi")~"hi")) %>% - pivot_wider(names_from="name", values_from="value") %>% - mutate(ci = make_CI(lo, hi), + tidyr::pivot_wider(names_from="name", values_from="value") %>% + dplyr::mutate(ci = make_CI(lo, hi), est = prettyNum(conv_round(est), big.mark=",")) %>% dplyr::select(Outcome, pdeath, est, ci) %>% - unite("est", est:ci, sep = "\n") + tidyr::unite("est", est:ci, sep = "\n") xx <- xx %>% - add_row(.before = 1) %>% - mutate(est = if_else(is.na(est), table_names[i-1], est)) + dplyr::add_row(.before = 1) %>% + dplyr::mutate(est = dplyr::if_else(is.na(est), table_names[i-1], est)) if(i==2){ tmp<-xx } else{ tmp <- xx %>% dplyr::select(est) %>% - bind_cols(tmp, .) + dplyr::bind_cols(tmp, .) } } @@ -594,138 +650,131 @@ make_scn_time_summary_table_withVent <- function(hosp_state_totals, pi_high = 0.975, round_digit=-2) { ##Make the period ranges and labels - period_breaks <- sort(as.Date(period_breaks)) #out of order leads to bad things.... - period_breaks <- c(min(hosp_state_totals$time)-1, as.Date(period_breaks), max(hosp_state_totals$time)+1) + period_breaks <- sort(as.Date(period_breaks)) + period_breaks <- c(min(hosp_state_totals$time) - 1, as.Date(period_breaks)) + # , + # max(hosp_state_totals$time) + 1) len <- length(period_breaks) - lbls <- sprintf("%s-%s", format(period_breaks[1:(len-1)], "%b %d"), - format(period_breaks[2:len], "%b %d")) - - ## Build the table with summaries of all of the periods in it. - tbl_df <- hosp_state_totals %>% - dplyr::mutate(period = cut(time, period_breaks, labels=lbls)) %>% - dplyr::group_by(period, scenario_name, sim_num) %>% #summarize totals in periods by scenario - dplyr::summarize(PeriodInf = sum(NincidInf), - PeriodDeath = sum(NincidDeath), - PeriodHosp = sum(NincidHosp), - PeriodPkHosp = max(NhospCurr), - PeriodICU = sum(NincidICU), - PeriodPkICU = max(NICUCurr), - PeriodVent = sum(NincidVent), - PeriodPkVent = max(NVentCurr)) %>% - dplyr::ungroup() %>% - dplyr::group_by(period, scenario_name) %>% #now get means and prediction intervals - dplyr::summarize(PeriodInfPILow = round(quantile(PeriodInf, probs = c(pi_low)),digits = round_digit), - PeriodDeathPILow = round(quantile(PeriodDeath, probs = c(pi_low)),digits = round_digit), - PeriodHospPILow = round(quantile(PeriodHosp, probs = c(pi_low)),digits = round_digit), - PeriodPkHospPILow = round(quantile(PeriodPkHosp, probs = c(pi_low)),digits = round_digit), - PeriodICUPILow = round(quantile(PeriodICU, probs = c(pi_low)),digits = round_digit), - PeriodPkICUPILow = round(quantile(PeriodPkICU, probs = c(pi_low)),digits = round_digit), - PeriodVentPILow = round(quantile(PeriodVent, probs = c(pi_low)),digits = round_digit), - PeriodPkVentPILow = round(quantile(PeriodPkVent, probs = c(pi_low)),digits = round_digit), - PeriodInfPIHigh = round(quantile(PeriodInf, probs = c(pi_high)),digits = round_digit), - PeriodDeathPIHigh = round(quantile(PeriodDeath, probs = c(pi_high)),digits = round_digit), - PeriodHospPIHigh = round(quantile(PeriodHosp, probs = c(pi_high)),digits = round_digit), - PeriodPkHospPIHigh = round(quantile(PeriodPkHosp, probs = c(pi_high)),digits = round_digit), - PeriodICUPIHigh = round(quantile(PeriodICU, probs = c(pi_high)),digits = round_digit), - PeriodPkICUPIHigh = round(quantile(PeriodPkICU, probs = c(pi_high)),digits = round_digit), - PeriodVentPIHigh = round(quantile(PeriodICU, probs = c(pi_high)),digits = round_digit), - PeriodPkVentPIHigh = round(quantile(PeriodPkVent, probs = c(pi_high)),digits = round_digit), - PeriodInf = round(mean(PeriodInf),digits = round_digit), - PeriodDeath = round(mean(PeriodDeath),digits = round_digit), - PeriodHosp = round(mean(PeriodHosp),digits = round_digit), - PeriodPkHosp = round(mean(PeriodPkHosp),digits = round_digit), - PeriodICU = round(mean(PeriodICU),digits = round_digit), - PeriodPkICU = round(mean(PeriodPkICU), digits = round_digit), - PeriodVent = round(mean(PeriodVent),digits = round_digit), - PeriodPkVent = round(mean(PeriodPkVent), digits = round_digit)) %>% - dplyr::ungroup() %>% ##make hi/low into CIs - dplyr::mutate(PeriodInfPI = paste(format(PeriodInfPILow,big.mark=","), format(PeriodInfPIHigh,big.mark=","), sep="-"), - PeriodDeathPI = paste(format(PeriodDeathPILow,big.mark=","), format(PeriodDeathPIHigh,big.mark=","), sep="-"), - PeriodHospPI = paste(format(PeriodHospPILow,big.mark=","), format(PeriodHospPIHigh,big.mark=","), sep="-"), - PeriodPkHospPI = paste(format(PeriodPkHospPILow,big.mark=","), format(PeriodPkHospPIHigh,big.mark=","), sep="-"), - PeriodICUPI = paste(format(PeriodICUPILow,big.mark=","), format(PeriodICUPIHigh,big.mark=","), sep="-"), - PeriodPkICUPI = paste(format(PeriodPkICUPILow,big.mark=","), format(PeriodPkICUPIHigh,big.mark=","), sep="-"), - PeriodVentPI = paste(format(PeriodVentPILow,big.mark=","), format(PeriodVentPIHigh,big.mark=","), sep="-"), - PeriodPkVentPI = paste(format(PeriodPkVentPILow,big.mark=","), format(PeriodPkVentPIHigh,big.mark=","), sep="-")) %>% - dplyr::select(-PeriodInfPILow, -PeriodInfPIHigh, - -PeriodDeathPILow, -PeriodDeathPIHigh, - -PeriodHospPILow, -PeriodHospPIHigh, - -PeriodPkHospPILow, -PeriodPkHospPIHigh, - -PeriodICUPILow, -PeriodICUPIHigh, - -PeriodPkICUPILow, -PeriodPkICUPIHigh, - -PeriodVentPILow, -PeriodVentPIHigh, - -PeriodPkVentPILow, -PeriodPkVentPIHigh,) - - - tmp<-sprintf("%s_%s", rep(lbls, each=2),c("mean","95% PI")) - - - ##inellegant but should work - tbl_df <- - dplyr::bind_rows(tbl_df%>% - dplyr::select(period,scenario_name, PeriodInf, PeriodInfPI)%>% - dplyr::mutate(outcome="Infections in Period")%>% - dplyr::rename(mean=PeriodInf,`95% PI`=PeriodInfPI), - tbl_df%>% - dplyr::select(period,scenario_name, PeriodDeath, PeriodDeathPI)%>% - dplyr::mutate(outcome="Deaths in Period")%>% - dplyr::rename(mean=PeriodDeath,`95% PI`=PeriodDeathPI), - tbl_df%>% - dplyr::select(period,scenario_name, PeriodHosp, PeriodHospPI)%>% - dplyr::mutate(outcome="Hospital Admissions in Period")%>% - rename(mean=PeriodHosp,`95% PI`=PeriodHospPI), - tbl_df%>% - dplyr::select(period,scenario_name, PeriodPkHosp, PeriodPkHospPI)%>% - dplyr::mutate(outcome="Peak Hospital Occupancy in Period")%>% - dplyr::rename(mean=PeriodPkHosp,`95% PI`=PeriodPkHospPI), - tbl_df%>% - dplyr::select(period,scenario_name, PeriodICU, PeriodICUPI)%>% - dplyr::mutate(outcome="ICU Admissions in Period")%>% - dplyr::rename(mean=PeriodICU,`95% PI`=PeriodICUPI), - tbl_df%>% - dplyr::select(period,scenario_name, PeriodPkICU, PeriodPkICUPI)%>% - dplyr::mutate(outcome="Peak ICU Occupancy in Period")%>% - dplyr::rename(mean=PeriodPkICU,`95% PI`=PeriodPkICUPI), - tbl_df%>% - dplyr::select(period,scenario_name, PeriodVent, PeriodVentPI)%>% - dplyr::mutate(outcome="Incident Ventilations in Period")%>% - dplyr::rename(mean=PeriodVent,`95% PI`=PeriodVentPI), - tbl_df%>% - dplyr::select(period,scenario_name, PeriodPkVent, PeriodPkVentPI)%>% - dplyr::mutate(outcome="Peak Ventilators in Use in Period")%>% - dplyr::rename(mean=PeriodPkVent,`95% PI`=PeriodPkVentPI)) %>% - dplyr::mutate(period=as.character(period)) %>% - tidyr::pivot_wider(names_from=period, values_from = c(mean,`95% PI`), names_sep=".")%>% - setNames(nm = sub("(.*)\\.(.*)", "\\2_\\1", names(.)))%>% - dplyr::select(outcome,scenario_name,all_of(tmp)) - - #tells how to group columns - tbl_df <- flextable::as_grouped_data(tbl_df,groups="outcome") - tmp <- is.na(tbl_df$scenario_name) - tbl_df$scenario_name[tmp] <-tbl_df$outcome[tmp] - tbl_df <- tbl_df%>%dplyr::select(-outcome) - typology<-tibble(col_keys=colnames(tbl_df), - colA=c("",rep(lbls,each=2)), - colB=c("",rep(c("mean","95% PI"),length(lbls)))) - - - flx <- flextable::flextable(tbl_df) %>% - flextable::colformat_num(digits=0)%>% - #flextable::merge_v(j="outcome")%>% - flextable::autofit(add_w=.05)%>% - flextable::valign(valign="top") %>% - flextable::set_header_df(mapping = typology, key = "col_keys" )%>% - #flextable::merge_h(part="header")%>% - flextable::bold(j=sprintf("%s_mean",lbls))%>% - flextable::bold(part="header",bold=TRUE)%>% - flextable::bold(j = 1, i =which(tmp), bold = TRUE, part = "body" )%>% - flextable::align(i=1,align = "center", part="header") %>% - flextable::align(i=2,j=which(typology$colB=="mean"), align = "right", part="header") %>% - flextable::hline(i=2, part="header", border = officer::fp_border())%>% - flextable::hline_top(part="header", border = officer::fp_border(width=2))%>% - flextable::border(i=which(tmp), border.top = officer::fp_border(col="grey")) + lbls <- sprintf("%s-%s", format(period_breaks[1:(len - 1)], + "%b %d"), format(period_breaks[2:len], "%b %d")) + + tbl_df <- hosp_state_totals %>% + dplyr::mutate(period = cut(time, + period_breaks, labels = lbls)) %>% + dplyr::group_by(period, scenario_name, sim_num) %>% + dplyr::summarize(PeriodInf = sum(NincidInf), + PeriodCase = sum(NincidCase), + PeriodDeath = sum(NincidDeath), + PeriodHosp = sum(NincidHosp), + PeriodPkHosp = max(NhospCurr), + PeriodICU = sum(NincidICU), + PeriodPkICU = max(NICUCurr), + PeriodVent = sum(NincidVent), + PeriodPkVent = max(NVentCurr)) %>% + dplyr::ungroup() %>% + dplyr::group_by(period, scenario_name) %>% + dplyr::summarize(PeriodInfPILow = round(quantile(PeriodInf, probs = c(pi_low)), digits = round_digit), + PeriodCasePILow = round(quantile(PeriodCase, probs = c(pi_low)), digits = round_digit), + PeriodDeathPILow = round(quantile(PeriodDeath, probs = c(pi_low)), digits = round_digit), + PeriodHospPILow = round(quantile(PeriodHosp, probs = c(pi_low)), digits = round_digit), + PeriodPkHospPILow = round(quantile(PeriodPkHosp, probs = c(pi_low)), digits = round_digit), + PeriodICUPILow = round(quantile(PeriodICU, probs = c(pi_low)), digits = round_digit), + PeriodPkICUPILow = round(quantile(PeriodPkICU, probs = c(pi_low)), digits = round_digit), + PeriodVentPILow = round(quantile(PeriodVent, probs = c(pi_low)), digits = round_digit), + PeriodPkVentPILow = round(quantile(PeriodPkVent, probs = c(pi_low)), digits = round_digit), + PeriodInfPIHigh = round(quantile(PeriodInf, probs = c(pi_high)), digits = round_digit), + PeriodCasePIHigh = round(quantile(PeriodCase, probs = c(pi_high)), digits = round_digit), + PeriodDeathPIHigh = round(quantile(PeriodDeath, probs = c(pi_high)), digits = round_digit), + PeriodHospPIHigh = round(quantile(PeriodHosp, probs = c(pi_high)), digits = round_digit), + PeriodPkHospPIHigh = round(quantile(PeriodPkHosp, probs = c(pi_high)), digits = round_digit), + PeriodICUPIHigh = round(quantile(PeriodICU, probs = c(pi_high)), digits = round_digit), + PeriodPkICUPIHigh = round(quantile(PeriodPkICU, probs = c(pi_high)), digits = round_digit), + PeriodVentPIHigh = round(quantile(PeriodICU, probs = c(pi_high)), digits = round_digit), + PeriodPkVentPIHigh = round(quantile(PeriodPkVent, probs = c(pi_high)), digits = round_digit), + PeriodInf = round(mean(PeriodInf), digits = round_digit), + PeriodCase = round(mean(PeriodCase), digits = round_digit), + PeriodDeath = round(mean(PeriodDeath), digits = round_digit), + PeriodHosp = round(mean(PeriodHosp), digits = round_digit), + PeriodPkHosp = round(mean(PeriodPkHosp), digits = round_digit), + PeriodICU = round(mean(PeriodICU), digits = round_digit), + PeriodPkICU = round(mean(PeriodPkICU), digits = round_digit), + PeriodVent = round(mean(PeriodVent), digits = round_digit), + PeriodPkVent = round(mean(PeriodPkVent), digits = round_digit)) %>% + dplyr::ungroup() %>% + dplyr::mutate(PeriodInfPI = paste(format(PeriodInfPILow, big.mark = ","), format(PeriodInfPIHigh, big.mark = ","), sep = "-"), + PeriodCasePI = paste(format(PeriodCasePILow, big.mark = ","), format(PeriodCasePIHigh, big.mark = ","), sep = "-"), + PeriodDeathPI = paste(format(PeriodDeathPILow, big.mark = ","), format(PeriodDeathPIHigh, big.mark = ","), sep = "-"), + PeriodHospPI = paste(format(PeriodHospPILow, big.mark = ","), format(PeriodHospPIHigh, big.mark = ","), sep = "-"), + PeriodPkHospPI = paste(format(PeriodPkHospPILow, big.mark = ","), format(PeriodPkHospPIHigh, big.mark = ","), sep = "-"), + PeriodICUPI = paste(format(PeriodICUPILow, big.mark = ","), format(PeriodICUPIHigh, big.mark = ","), sep = "-"), + PeriodPkICUPI = paste(format(PeriodPkICUPILow, big.mark = ","), format(PeriodPkICUPIHigh, big.mark = ","), sep = "-"), + PeriodVentPI = paste(format(PeriodVentPILow, big.mark = ","), format(PeriodVentPIHigh, big.mark = ","), sep = "-"), + PeriodPkVentPI = paste(format(PeriodPkVentPILow, big.mark = ","), format(PeriodPkVentPIHigh, big.mark = ","), sep = "-")) %>% + dplyr::select(-PeriodInfPILow, -PeriodInfPIHigh, + -PeriodCasePILow, -PeriodCasePIHigh, + -PeriodDeathPILow, -PeriodDeathPIHigh, + -PeriodHospPILow, -PeriodHospPIHigh, + -PeriodPkHospPILow, -PeriodPkHospPIHigh, + -PeriodICUPILow, -PeriodICUPIHigh, + -PeriodPkICUPILow, -PeriodPkICUPIHigh, + -PeriodVentPILow, -PeriodVentPIHigh, + -PeriodPkVentPILow, -PeriodPkVentPIHigh) + + tmp <- sprintf("%s_%s", rep(lbls, each = 2), c("mean", "95% PI")) + tbl_df <- dplyr::bind_rows(tbl_df %>% dplyr::select(period, + scenario_name, PeriodInf, PeriodInfPI) %>% dplyr::mutate(outcome = "Infections in Period") %>% + dplyr::rename(mean = PeriodInf, `95% PI` = PeriodInfPI), + tbl_df %>% dplyr::select(period, + scenario_name, PeriodCase, PeriodCasePI) %>% dplyr::mutate(outcome = "Cases in Period") %>% + dplyr::rename(mean = PeriodCase, `95% PI` = PeriodCasePI), + tbl_df %>% dplyr::select(period, scenario_name, PeriodDeath, + PeriodDeathPI) %>% dplyr::mutate(outcome = "Deaths in Period") %>% + dplyr::rename(mean = PeriodDeath, `95% PI` = PeriodDeathPI), + tbl_df %>% dplyr::select(period, scenario_name, PeriodHosp, + PeriodHospPI) %>% dplyr::mutate(outcome = "Hospital Admissions in Period") %>% + rename(mean = PeriodHosp, `95% PI` = PeriodHospPI), + tbl_df %>% dplyr::select(period, scenario_name, PeriodPkHosp, + PeriodPkHospPI) %>% dplyr::mutate(outcome = "Peak Hospital Occupancy in Period") %>% + dplyr::rename(mean = PeriodPkHosp, `95% PI` = PeriodPkHospPI), + tbl_df %>% dplyr::select(period, scenario_name, PeriodICU, + PeriodICUPI) %>% dplyr::mutate(outcome = "ICU Admissions in Period") %>% + dplyr::rename(mean = PeriodICU, `95% PI` = PeriodICUPI), + tbl_df %>% dplyr::select(period, scenario_name, PeriodPkICU, + PeriodPkICUPI) %>% dplyr::mutate(outcome = "Peak ICU Occupancy in Period") %>% + dplyr::rename(mean = PeriodPkICU, `95% PI` = PeriodPkICUPI), + tbl_df %>% dplyr::select(period, scenario_name, PeriodVent, + PeriodVentPI) %>% dplyr::mutate(outcome = "Incident Ventilations in Period") %>% + dplyr::rename(mean = PeriodVent, `95% PI` = PeriodVentPI), + tbl_df %>% dplyr::select(period, scenario_name, PeriodPkVent, + PeriodPkVentPI) %>% dplyr::mutate(outcome = "Peak Ventilators in Use in Period") %>% + dplyr::rename(mean = PeriodPkVent, `95% PI` = PeriodPkVentPI)) %>% + dplyr::mutate(period = as.character(period)) %>% + tidyr::pivot_wider(names_from = period, + values_from = c(mean, `95% PI`), names_sep = ".") %>% + setNames(nm = sub("(.*)\\.(.*)", "\\2_\\1", names(.))) %>% + dplyr::select(outcome, scenario_name, all_of(tmp)) + + tbl_df <- flextable::as_grouped_data(tbl_df, groups = "outcome") + tmp <- is.na(tbl_df$scenario_name) + tbl_df$scenario_name[tmp] <- tbl_df$outcome[tmp] + tbl_df <- tbl_df %>% dplyr::select(-outcome) + typology <- dplyr::tibble(col_keys = colnames(tbl_df), + colA = c("", rep(lbls, each = 2)), + colB = c("", rep(c("mean", "95% PI"), length(lbls)))) + + flx <- flextable::flextable(tbl_df) %>% flextable::colformat_num(digits = 0) %>% + flextable::autofit(add_w = 0.05) %>% flextable::valign(valign = "top") %>% + flextable::set_header_df(mapping = typology, key = "col_keys") %>% + flextable::bold(j = sprintf("%s_mean", lbls)) %>% + flextable::bold(part = "header", bold = TRUE) %>% + flextable::bold(j = 1, i = which(tmp), bold = TRUE, part = "body") %>% + flextable::align(i = 1, align = "center", part = "header") %>% + flextable::align(i = 2, j = which(typology$colB == "mean"), align = "right", part = "header") %>% + flextable::hline(i = 2, part = "header", border = officer::fp_border()) %>% + flextable::hline_top(part = "header", border = officer::fp_border(width = 2)) %>% + flextable::border(i = which(tmp), border.top = officer::fp_border(col = "grey")) return(flx) } @@ -765,11 +814,11 @@ plot_event_time_by_geoid <- function(hosp_county_peaks, if(is.null(value_name) & (length(unique(hosp_county_peaks$scenario_name)) == 1)){stop("Value name must be provided if only one scenario is plotted.")} value_name <- rlang::sym(value_name) - + hosp_county_peaks$time[is.na(hosp_county_peaks$time)] <- end_date+1 hosp_county_peaks$time[hosp_county_peaks$time > end_date] <- end_date+1 hosp_county_peaks$time[hosp_county_peaks$time < start_date] <- start_date-1 - + to_plt <- hosp_county_peaks %>% dplyr::group_by(geoid, scenario_name) %>% dplyr::summarise(mean_time = mean(time), @@ -794,7 +843,7 @@ plot_event_time_by_geoid <- function(hosp_county_peaks, scale_y_date(time_caption, date_breaks = "1 week", date_labels = "%b %d" - ) + + ) + xlab(geoid_caption) + theme_bw() + theme(axis.text.x = element_text(angle = 90, hjust = 1), legend.position = "bottom") + @@ -808,7 +857,7 @@ plot_event_time_by_geoid <- function(hosp_county_peaks, scale_y_date(time_caption, date_breaks = "1 week", date_labels = "%b %d" - ) + + ) + xlab(geoid_caption) + scale_color_manual("Scenario", values = scenario_colors) + @@ -816,18 +865,17 @@ plot_event_time_by_geoid <- function(hosp_county_peaks, theme(axis.text.x = element_text(angle = 90, hjust = 1), legend.position = "bottom") + coord_flip() } - - + + return(rc) - + } - ##' ##' Compare model outputs and data from CSSE ##' -##' @param state_hosp_totals state hosp data frame -##' @param jhu_obs_dat df with case data by geoid and cols incidI, incidDeath, and currhosp (if plotting hospitalizations) +##' @param state_hosp_totals df with model outputs summed over geoid (e.g., from load_hosp_geocombined_totals), with cols sim_num, pdeath, date, NincidCase, NincidDeath, NhospCurr +##' @param jhu_obs_dat df with observed data by geoid and cols incidI, incidDeath, and currhosp (if plotting hospitalizations) ##' @param scenario_colors character vector with scenario colors ##' @param pdeath_filter IFR level assumption ##' @param obs_data_col character string of observed data color @@ -836,23 +884,23 @@ plot_event_time_by_geoid <- function(hosp_county_peaks, ##' @param date_breaks breaks for dates in figure ##' @param sim_start_date simulation start date ##' @param sim_end_date simulation end date -##' @param week whether to aggregate incident cases/deaths to weeks +##' @param week whether to aggregate incident cases/deaths to weeks (sum for incid cases/deaths, avg for daily hosp beds) ##' @param hosp whether to show hosp values ##' ##' @export plot_model_vs_obs <- function(state_hosp_totals, - jhu_obs_dat, - scenario_colors, - pdeath_filter = pdeath_default, - obs_data_col = "black", - ci.L = 0, - ci.U = 1, - date_breaks = "1 month", - sim_start_date, - sim_end_date, - week=FALSE, - hosp=FALSE, - tendency="mean" + jhu_obs_dat, + scenario_colors, + pdeath_filter = pdeath_default, + obs_data_col = "black", + ci.L = 0, + ci.U = 1, + date_breaks = "1 month", + sim_start_date, + sim_end_date, + week=FALSE, + hosp=FALSE, + tendency="mean" ) { state_hosp_totals <- @@ -864,36 +912,50 @@ plot_model_vs_obs <- function(state_hosp_totals, dplyr::filter(between(date, as.Date(sim_start_date), as.Date(sim_end_date))) jhu_obs_dat <- jhu_obs_dat %>% - dplyr::select(source, date, incidI, incidDeath) %>% + {if(hosp) dplyr::select(., source, date, incidI, incidDeath, currhosp) else(dplyr::select(., source, date, incidI, incidDeath))} %>% dplyr::filter(between(date, as.Date(sim_start_date), as.Date(sim_end_date))) %>% - dplyr::group_by(source, date) %>% - dplyr::summarise_all(sum, na.rm=TRUE) %>% - ungroup()%>% - rename(NincidConfirmed=incidI, + dplyr::group_by(date) %>% #sum over all geoids + dplyr::summarise_if(is.numeric,sum, na.rm=TRUE) %>% + dplyr::ungroup()%>% + dplyr::rename(NincidConfirmed=incidI, NincidDeathsObs=incidDeath) if(hosp){ state_hosp_summary <- state_hosp_totals %>% + {if(week) dplyr::group_by(.,date=lubridate::ceiling_date(date, "weeks"), scenario_name, sim_num) %>% + dplyr::summarize(NhospCurr=mean(NhospCurr)) + else(.)}%>% dplyr::group_by(date, scenario_name) %>% - dplyr::summarize(lo = quantile(NhospCurr, ci.L), + dplyr::summarize(lo = quantile(NhospCurr, ci.L), # statistics taken for different sim_nums (separate slots aka MCMC chains) hi = quantile(NhospCurr, ci.U), mean= mean(NhospCurr), - median = median(NhospCurr))%>% - dplyr::rename(est=!!as.symbol(tendency)) + median = median(NhospCurr)) + + state_hosp_summary<-state_hosp_summary %>% + dplyr::mutate(est = !!as.symbol(tendency)) + + if(week){ + jhu_obs_dat <- jhu_obs_dat %>% + dplyr::group_by(date=lubridate::ceiling_date(date, "weeks")) %>% + dplyr::summarize(NincidConfirmed=sum(NincidConfirmed), + NincidDeathsObs=sum(NincidDeathsObs), + currhosp=mean(currhosp))%>% + dplyr::filter(date!=max(date)) + } - incid_hosp_plot <- - ggplot(state_hosp_summary, aes(x = date)) + + incid_hosp_plot <- state_hosp_summary %>% + dplyr::left_join(jhu_obs_dat %>% dplyr::filter(date < max(date)) %>% dplyr::select(date, truth_var=currhosp)) %>% + ggplot(aes(x = date)) + geom_line(aes(y = est, color = scenario_name)) + geom_ribbon(aes(ymin=lo, ymax=hi, fill = scenario_name), linetype = 0, alpha=0.2) + - geom_point(data = jhu_obs_dat %>% filter(date < max(date)), aes(x = date, y = currhosp), color = obs_data_col) + - #ylab("Incident Cases") + - #theme(legend.position = "bottom") + + geom_point(aes(x = date, y = truth_var), color = obs_data_col) + scale_x_date(date_breaks = date_breaks, date_labels = "%b %Y", limits = c(as.Date(sim_start_date), as.Date(sim_end_date))) + scale_y_continuous("Daily occupied hospital beds", labels = scales::comma) + scale_color_manual("Scenario", + breaks = levels(state_hosp_summary$scenario_name), values = scenario_colors) + theme_minimal() + theme(axis.title.x = element_blank(), @@ -903,50 +965,48 @@ plot_model_vs_obs <- function(state_hosp_totals, guides(color = guide_legend(nrow = 2, override.aes = list(alpha=1)), fill = FALSE) + coord_cartesian(ylim = c(0, 1.5*max(jhu_obs_dat$currhosp))) - } - - if(week){ - jhu_obs_dat <- jhu_obs_dat %>% - dplyr::group_by(date=lubridate::ceiling_date(date, "weeks")) %>% - dplyr::summarize(NincidConfirmed=sum(NincidConfirmed), - NincidDeathsObs=sum(NincidDeathsObs))%>% - ungroup() %>% - dplyr::filter(date!=max(date)) + } else{ + + if(week){ + jhu_obs_dat <- jhu_obs_dat %>% + dplyr::group_by(date=lubridate::ceiling_date(date, "weeks")) %>% + dplyr::summarize(NincidConfirmed=sum(NincidConfirmed), + NincidDeathsObs=sum(NincidDeathsObs))%>% + dplyr::ungroup() + } + } state_inf_summary <- state_hosp_totals %>% {if(week) dplyr::group_by(.,date=lubridate::ceiling_date(date, "weeks"), scenario_name, sim_num) %>% - dplyr::summarize(NincidInf=sum(NincidInf), - NincidCase=sum(NincidCase)) + dplyr::summarize(NincidCase=sum(NincidCase)) else(.)}%>% dplyr::group_by(date, scenario_name) %>% - dplyr::summarize( - # ci_lower_incid_inf = quantile(NincidInf, ci.L), - # ci_upper_incid_inf = quantile(NincidInf, ci.U), - # mean_incid_inf = mean(NincidInf), - # median_incid_inf = median(NincidInf), + dplyr::summarize( # statistics taken for different sim_nums (separate slots aka MCMC chains) lo = quantile(NincidCase, ci.L), hi = quantile(NincidCase, ci.U), mean = mean(NincidCase), median = median(NincidCase)) %>% dplyr::group_by(scenario_name) %>% - dplyr::filter(date!=max(date))%>% - dplyr::rename(est=!!as.symbol(tendency)) + dplyr::filter(date!=max(date)) + + state_inf_summary<-state_inf_summary %>% + dplyr::mutate(est = !!as.symbol(tendency)) ### Incidence of infections plot - incid_infections_plot <- - ggplot(state_inf_summary, aes(x = date)) + + incid_infections_plot <- state_inf_summary %>% + dplyr::left_join(jhu_obs_dat %>% dplyr::filter(date < max(date)) %>% dplyr::select(date, truth_var = NincidConfirmed)) %>% + ggplot(aes(x = date)) + geom_line(aes(y = est, color = scenario_name)) + geom_ribbon(aes(ymin=lo, ymax=hi, fill = scenario_name), linetype = 0, alpha=0.2) + - geom_point(data = jhu_obs_dat %>% filter(date < max(date)), aes(x = date, y = NincidConfirmed), color = obs_data_col) + - #ylab("Incident Cases") + - #theme(legend.position = "bottom") + + geom_point(aes(x = date, y = truth_var), color = obs_data_col) + scale_x_date(date_breaks = date_breaks, date_labels = "%b %Y", limits = c(as.Date(sim_start_date), as.Date(sim_end_date))) + scale_y_continuous("Incident Cases", labels = scales::comma) + scale_color_manual("Scenario", + breaks = levels(state_inf_summary$scenario_name), values = scenario_colors) + theme_minimal() + theme(axis.title.x = element_blank(), @@ -968,21 +1028,23 @@ plot_model_vs_obs <- function(state_hosp_totals, mean = mean(NincidDeath), median = median(NincidDeath)) %>% dplyr::group_by(scenario_name) %>% - dplyr::filter(date!=max(date)) %>% - dplyr::rename(est=!!as.symbol(tendency)) + dplyr::filter(date!=max(date)) - incid_deaths_plot <- - ggplot(state_death_summary, aes(x = date)) + + state_death_summary <- state_death_summary %>% + dplyr::mutate(est = !!as.symbol(tendency)) + + incid_deaths_plot <-state_death_summary %>% + dplyr::left_join(jhu_obs_dat %>% dplyr::filter(date% dplyr::select(date, truth_var=NincidDeathsObs)) %>% + ggplot(aes(x = date)) + geom_line(aes(y = est, color = scenario_name)) + geom_ribbon(aes(ymin=lo, ymax=hi, fill = scenario_name), linetype = 0, alpha=0.2) + - geom_point(data = jhu_obs_dat %>% filter(date% + plt_dat <- hosp_geounit_relative %>% dplyr::rename(threshold = !!value_name) %>% dplyr::filter(time >= start_date & time <= end_date) %>% dplyr::filter(scenario_name %in% scenario_labels) %>% @@ -1043,31 +1104,31 @@ plot_needs_relative_to_threshold_heatmap <- function(hosp_geounit_relative, dplyr::mutate(name_num=seq_along(name)) if(length(scenario_labels)==1){ - + rc <- ggplot(plt_dat, aes(x = time, y = name_num)) + geom_tile(aes(fill = log_prop_needed)) + scale_fill_gradient2(paste("Log", legend_title), low = scale_colors[1], mid = scale_colors[2], high = scale_colors[3], midpoint = 0, na.value = "grey 30", labels = scales::comma, limits = c(floor(min(plt_dat$log_prop_needed)), ceiling(max(plt_dat$log_prop_needed)))) + scale_y_continuous("", - breaks = plt_dat$name_num, - labels = plt_dat$name, - sec.axis = dup_axis(name = value_label, - breaks = plt_dat$name_num, - labels = plt_dat$threshold)) + + breaks = plt_dat$name_num, + labels = plt_dat$name, + sec.axis = dup_axis(name = value_label, + breaks = plt_dat$name_num, + labels = plt_dat$threshold)) + scale_x_date(date_breaks = "2 weeks", date_labels = "%b %d") + theme_bw() + theme(legend.position = "bottom", axis.title.x = element_blank(), axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) - + } else{ - + rc <- ggplot(plt_dat, aes(x = time, y = name_num)) + geom_tile(aes(fill = log_prop_needed)) + scale_fill_gradient2(paste("Log", legend_title), low = scale_colors[1], mid = scale_colors[2], high = scale_colors[3], midpoint = 0, na.value = "grey 30", labels = scales::comma, limits = c(floor(min(plt_dat$log_prop_needed)), ceiling(max(plt_dat$log_prop_needed)))) + scale_y_continuous("", - breaks = plt_dat$name_num, - labels = plt_dat$name, - sec.axis = dup_axis(name = value_label, - breaks = plt_dat$name_num, - labels = plt_dat$threshold)) + + breaks = plt_dat$name_num, + labels = plt_dat$name, + sec.axis = dup_axis(name = value_label, + breaks = plt_dat$name_num, + labels = plt_dat$threshold)) + scale_x_date(date_breaks = "2 weeks", date_labels = "%b %d") + theme_bw() + theme(legend.position = "bottom", axis.title.x = element_blank(), axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) + @@ -1166,14 +1227,14 @@ plot_inference_r <- function(r_dat, } - return(rc) + return(plot_rc) } + ##' Sparkline table with R estimates ##' ##' @param r_dat df with R or reduction estimates per sim (only one scenario) -##' @param county_dat df with cumulative infections, scenario, pdeath, sim_num, and -##' time across geoids and all simulations +##' @param county_dat df with SEIR outputs ##' @param current_scenario name of scenario inputs to use ##' @param susceptible whether to show estimates adjusted for cumulative infections, ##' in which case the estimate reflects the median Rt for past interventions and the Rt for @@ -1213,55 +1274,71 @@ make_sparkline_tab_r <- function(r_dat, require(tidyverse) if(length(npi_labels)!=length(npi_levels)) {stop("Length of npi levels and labels must be equal")} if(susceptible & is.null(county_dat)) {stop("You must specify county_dat")} - + timeline<-crossing(geoid=unique(r_dat$geoid), date = seq(min(r_dat$start_date), max(r_dat$end_date), by=1)) intervention_names <- r_dat %>% dplyr::filter(scenario==current_scenario, - pdeath==pdeath_filter) %>% + pdeath==pdeath_filter & npi_name %in% npi_levels) %>% dplyr::distinct(npi_name, start_date, end_date, geoid) %>% dplyr::group_by(geoid) %>% dplyr::mutate(end_date=if_else(npi_name=="local_variance", lead(start_date)-1, end_date)) - r_dat <- r_dat%>% - dplyr::filter(scenario==current_scenario, - pdeath==pdeath_filter) %>% - dplyr::select(geoid, sim_num, npi_name, reduction, local_r, scenario, start_date, end_date) %>% - dplyr::left_join(timeline) %>% - dplyr::mutate(reduction=if_else(dateend_date, NA_real_, reduction)) %>% - drop_na() %>% - dplyr::mutate(reduction=ifelse(npi_name=="local_variance", 1, 1-reduction)) %>% - dplyr::group_by(geoid, sim_num, date, scenario) %>% - dplyr::summarize(reduction=prod(reduction), - local_r=unique(local_r)) %>% - dplyr::mutate(r=reduction*local_r) %>% - dplyr::select(-local_r, -reduction) - - if(susceptible){ - r_dat<-county_dat %>% + r_dat2 <- list() + for(i in 1:length(unique(r_dat$geoid))){ + r_dat2[[i]] <- r_dat%>% dplyr::filter(scenario==current_scenario, - pdeath==pdeath_filter) %>% - dplyr::select(geoid, scenario, pdeath, sim_num, cum_inf, date=time) %>% - dplyr::right_join(r_dat) %>% - dplyr::left_join(geodat) %>% - dplyr::mutate(r=r*(1-cum_inf/pop)) + pdeath==pdeath_filter, + geoid==unique(r_dat$geoid)[i]) %>% + dplyr::select(geoid, sim_num, npi_name, reduction, local_r, scenario, start_date, end_date) %>% + dplyr::left_join(timeline) %>% + dplyr::mutate(reduction=if_else(dateend_date, NA_real_, reduction)) %>% + tidyr::drop_na() %>% + dplyr::mutate(reduction=ifelse(npi_name=="local_variance", 1, 1-reduction)) %>% + dplyr::group_by(geoid, sim_num, date, scenario) %>% + dplyr::summarize(reduction=prod(reduction), + local_r=unique(local_r)) %>% + dplyr::mutate(r=reduction*local_r) %>% + dplyr::select(-local_r, -reduction) + + if(susceptible){ + susceptible_dat <- county_dat %>% + dplyr::filter(comp=="S" & geoid == unique(r_dat$geoid)[i] & pdeath==pdeath_filter & scenario==current_scenario) %>% + dplyr::left_join(geodat) %>% + dplyr::mutate(vacc = value/pop, + vacc = dplyr::case_when(p_comp==0 ~ vacc, + p_comp==1 ~ vacc*(1-0.5), + p_comp==2 ~ vacc*(1-0.9))) + + susceptible_dat <- susceptible_dat %>% + dplyr::group_by(time, scenario, sim_num, pdeath, geoid, location, pop, name) %>% + dplyr::summarize(vacc = sum(vacc)) + + r_dat2[[i]]<- r_dat2[[i]] %>% + dplyr::left_join(susceptible_dat) %>% + dplyr::mutate(rt = r*vacc) %>% + dplyr::group_by(scenario, time, location) %>% + dplyr::mutate(weight=pop/sum(pop)) + } + + r_dat2[[i]] <- r_dat2[[i]] %>% + dplyr::group_by(geoid, name, date) %>% + dplyr::summarize(est_lo=quantile(r, pi_lo, na.rm=TRUE), + est_hi=quantile(r, pi_hi, na.rm=TRUE), + estimate=mean(r, na.rm=TRUE)) } - r_dat <- r_dat %>% - dplyr::group_by(geoid, name, date) %>% - dplyr::summarize(est_lo=quantile(r, pi_lo, na.rm=TRUE), - est_hi=quantile(r, pi_hi, na.rm=TRUE), - estimate=mean(r, na.rm=TRUE)) + r_dat <- dplyr::bind_rows(r_dat2) r_dat <- r_dat %>% - left_join(intervention_names) %>% - dplyr::mutate(estimate = if_else(dateend_date, NA_real_, estimate)) %>% - drop_na() %>% + dplyr::left_join(intervention_names) %>% + dplyr::mutate(estimate = dplyr::if_else(dateend_date, NA_real_, estimate)) %>% + tidyr::drop_na() %>% dplyr::group_by(geoid) %>% - dplyr::mutate(mid_point=if_else(max(end_date)==end_date, # for summary values mid_point=date + dplyr::mutate(mid_point=dplyr::if_else(max(end_date)==end_date, # for summary values mid_point=date Sys.Date(), start_date+floor((end_date-start_date)/2))) @@ -1272,11 +1349,11 @@ make_sparkline_tab_r <- function(r_dat, dplyr::mutate_if(is.numeric, signif, digits=2) %>% dplyr::mutate_if(is.numeric, as.character) %>% dplyr::mutate(npi_name=factor(npi_name, levels=npi_levels, labels=npi_labels), - est_lo = stringr::str_replace(est_lo, "^", '\n\\('), - est_hi = stringr::str_replace(est_hi, "$", '\\)')) %>% + est_lo = stringr::str_replace(est_lo, "^", '\n\\('), + est_hi = stringr::str_replace(est_hi, "$", '\\)')) %>% dplyr::arrange(npi_name) %>% - unite(col="pi", est_lo:est_hi, sep="-") %>% - unite(col="estimate", estimate:pi, sep="\n") %>% + tidyr::unite(col="pi", est_lo:est_hi, sep="-") %>% + tidyr::unite(col="estimate", estimate:pi, sep="\n") %>% tidyr::pivot_wider(id_cols="name", values_from=estimate, names_from=npi_name) %>% dplyr::arrange(name) %>% dplyr::rename(Location=name) @@ -1290,7 +1367,7 @@ make_sparkline_tab_r <- function(r_dat, fill_values <- expand_palette(npi_num) color_values <- colorspace::darken(fill_values, 0.3) - + # solution from https://stackoverflow.com/questions/61741440/is-there-a-way-to-embed-a-ggplot-image-dynamically-by-row-like-a-sparkline-usi @@ -1300,8 +1377,8 @@ make_sparkline_tab_r <- function(r_dat, dplyr::mutate(plot_var=est_lo, npi_name="blank")) %>% dplyr::mutate(npi_name=factor(npi_name, - levels=c(npi_levels, "blank"), - labels=c(npi_labels,"blank"))) %>% + levels=c(npi_levels, "blank"), + labels=c(npi_labels,"blank"))) %>% dplyr::select(name, date, estimate, plot_var, npi_name) %>% dplyr::arrange(name)%>% dplyr::group_by(name) %>% @@ -1326,28 +1403,28 @@ make_sparkline_tab_r <- function(r_dat, dplyr::mutate(` `=NA) # Table - r_output <- tibble(r_tab, + r_output <- dplyr::tibble(r_tab, ` ` = NA, .rows = nrow(r_tab)) %>% - gt() %>% - text_transform( - locations = cells_body(vars(` `)), + gt::gt() %>% + gt::text_transform( + locations = gt::cells_body(vars(` `)), fn = function(x) { - purrr::map(r_plot$plot, ggplot_image, height = px(px_qual), aspect_ratio=wh_ratio) + purrr::map(r_plot$plot, ggplot_image, height = gt::px(px_qual), aspect_ratio=wh_ratio) } ) r_output %>% - tab_options(table.font.size = pct(80)) %>% - tab_options(column_labels.font.weight = "bold", + gt::tab_options(table.font.size = gt::pct(80)) %>% + gt::tab_options(column_labels.font.weight = "bold", table_body.hlines.color = "#000000", table_body.border.bottom.color="#000000", table_body.border.top.color = "#000000", column_labels.border.top.color = "#000000", column_labels.border.bottom.color = "#000000") %>% - tab_style(style=list(cell_text(style="oblique"), - cell_borders(sides="right")), - locations=cells_body(columns=vars(Location))) + gt::tab_style(style=list(gt::cell_text(style="oblique"), + gt::cell_borders(sides="right")), + locations=gt::cells_body(columns=vars(Location))) } @@ -1393,8 +1470,8 @@ make_sparkline_tab_intervention_effect <- function(r_dat, dplyr::left_join(geodat) %>% dplyr::group_by(geoid, start_date, end_date, npi_name, name) %>% dplyr::summarize(est_lo=quantile(reduction, pi_lo, na.rm=TRUE), - est_hi=quantile(reduction, pi_hi, na.rm=TRUE), - estimate=mean(reduction, na.rm=TRUE)) %>% + est_hi=quantile(reduction, pi_hi, na.rm=TRUE), + estimate=mean(reduction, na.rm=TRUE)) %>% dplyr::mutate_if(is.numeric, signif, digits=2) %>% dplyr::filter(npi_name!="local_variance") @@ -1402,11 +1479,11 @@ make_sparkline_tab_intervention_effect <- function(r_dat, r_tab<-r_dat%>% dplyr::mutate_if(is.numeric, as.character) %>% dplyr::mutate(npi_name=factor(npi_name, levels=npi_levels, labels=npi_labels), - est_lo = stringr::str_replace(est_lo, "^", '\n\\('), - est_hi = stringr::str_replace(est_hi, "$", '\\)')) %>% + est_lo = stringr::str_replace(est_lo, "^", '\n\\('), + est_hi = stringr::str_replace(est_hi, "$", '\\)')) %>% dplyr::arrange(npi_name) %>% - unite(col="pi", est_lo:est_hi, sep="-") %>% - unite(col="estimate", estimate:pi, sep="\n") %>% + tidyr::unite(col="pi", est_lo:est_hi, sep="-") %>% + tidyr::unite(col="estimate", estimate:pi, sep="\n") %>% tidyr::pivot_wider(id_cols="name", values_from=estimate, names_from=npi_name) %>% dplyr::arrange(name) %>% dplyr::rename(Location=name) @@ -1450,28 +1527,28 @@ make_sparkline_tab_intervention_effect <- function(r_dat, dplyr::mutate(` `=NA) #Table - r_output <- tibble(r_tab, + r_output <- dplyr::tibble(r_tab, ` ` = NA, .rows = nrow(r_tab)) %>% - gt() %>% - text_transform( - locations = cells_body(vars(` `)), + gt::gt() %>% + gt::text_transform( + locations = gt::cells_body(vars(` `)), fn = function(x) { - purrr::map(r_plot$plot, ggplot_image, height = px(px_qual), aspect_ratio=wh_ratio) + purrr::map(r_plot$plot, ggplot_image, height = gt::px(px_qual), aspect_ratio=wh_ratio) } ) - + r_output %>% - tab_options(table.font.size = pct(80)) %>% - tab_options(column_labels.font.weight = "bold", + gt::tab_options(table.font.size = gt::pct(80)) %>% + gt::tab_options(column_labels.font.weight = "bold", table_body.hlines.color = "#000000", table_body.border.bottom.color="#000000", table_body.border.top.color = "#000000", column_labels.border.top.color = "#000000", column_labels.border.bottom.color = "#000000") %>% - tab_style(style=list(cell_text(style="oblique"), - cell_borders(sides="right")), - locations=cells_body(columns=vars(Location))) + gt::tab_style(style=list(gt::cell_text(style="oblique"), + gt::cell_borders(sides="right")), + locations=gt::cells_body(columns=vars(Location))) } @@ -1484,7 +1561,9 @@ make_sparkline_tab_intervention_effect <- function(r_dat, ##' @param hosp whether hospitalization data is included in truth_dat with varname currhosp ##' @param filter_by variable name for filtering estimates either: scenario or pdeath ##' @param filter_val desired value of variable -##' @param geodat df with location names +##' @param fig_labs names for the labels of the figures that will show incidence cases, deaths, and hospitalizations +##' @param pi_lo lower limit to interval +##' @param pi_hi upper limit to interval ##' ##' @return plot comparing observed and modeled estimates by geoid ##' @@ -1509,14 +1588,14 @@ plot_truth_by_county <- function(truth_dat, end_date<-as.Date(end_date) if(filter_by!="pdeath" & filter_by!="scenario") stop("You can only filter by 'pdeath' or 'scenario'") - group_var<-if_else(filter_by=="pdeath", "scenario_name", "pdeath") + group_var<-dplyr::if_else(filter_by=="pdeath", "scenario_name", "pdeath") county_dat<-county_dat%>% dplyr::filter(!!as.symbol(filter_by)==filter_val)%>% dplyr::group_by(geoid, !!as.symbol(group_var), sim_num, time=lubridate::ceiling_date(time, unit="week"))%>% dplyr::summarize(NincidCase=sum(NincidCase, na.rm=TRUE), - NincidDeath=sum(NincidDeath, na.rm=TRUE), - NhospCurr=mean(NhospCurr, na.rm=TRUE)) %>% + NincidDeath=sum(NincidDeath, na.rm=TRUE), + NhospCurr=mean(NhospCurr, na.rm=TRUE)) %>% dplyr::group_by(geoid) %>% dplyr::filter(time% dplyr::group_by(geoid, time=lubridate::ceiling_date(date, unit="week")) %>% dplyr::summarize(incidI=sum(incidI, na.rm=TRUE), - incidDeath=sum(incidDeath, na.rm=TRUE), - currhosp=mean(currhosp, na.rm=TRUE)) %>% + incidDeath=sum(incidDeath, na.rm=TRUE), + currhosp=mean(currhosp, na.rm=TRUE)) %>% dplyr::group_by(geoid)%>% dplyr::filter(time% + rc <- dplyr::bind_rows(truth_dat%>% dplyr::mutate(confirmed=incidI, type=fig_labs[1]), truth_dat%>% @@ -1542,29 +1621,29 @@ plot_truth_by_county <- function(truth_dat, truth_dat%>% dplyr::mutate(confirmed=currhosp, type=fig_labs[3])) %>% - dplyr::select(-starts_with("incid")) %>% + dplyr::select(-tidyselect::starts_with("incid")) %>% dplyr::right_join( - bind_rows(county_dat %>% - group_by(time, geoid, !!as.symbol(group_var))%>% - summarize(low=quantile(NincidCase,pi_lo), + dplyr::bind_rows(county_dat %>% + dplyr::group_by(time, geoid, !!as.symbol(group_var))%>% + dplyr::summarize(low=quantile(NincidCase,pi_lo), high=quantile(NincidCase,pi_hi), est=mean(NincidCase), type=fig_labs[1]), county_dat %>% - group_by(time, geoid, !!as.symbol(group_var))%>% - summarize(low=quantile(NincidDeath,pi_lo), + dplyr::group_by(time, geoid, !!as.symbol(group_var))%>% + dplyr::summarize(low=quantile(NincidDeath,pi_lo), high=quantile(NincidDeath,pi_hi), est=mean(NincidDeath), type=fig_labs[2]), county_dat %>% - group_by(time, geoid, !!as.symbol(group_var))%>% - summarize(low=quantile(NhospCurr,pi_lo), + dplyr::group_by(time, geoid, !!as.symbol(group_var))%>% + dplyr::summarize(low=quantile(NhospCurr,pi_lo), high=quantile(NhospCurr,pi_hi), est=mean(NhospCurr), type=fig_labs[3]))) %>% - ungroup() %>% + dplyr::ungroup() %>% dplyr::mutate(type = factor(type, levels = fig_labs[3:1]), - confirmed=if_else(confirmed==0, NA_real_, confirmed)) + confirmed=dplyr::if_else(confirmed==0, NA_real_, confirmed)) } else{ truth_dat <- truth_dat %>% @@ -1574,35 +1653,35 @@ plot_truth_by_county <- function(truth_dat, dplyr::filter(time% - mutate(confirmed=incidI, + dplyr::mutate(confirmed=incidI, type=fig_labs[1]), truth_dat%>% - mutate(confirmed=incidDeath, + dplyr::mutate(confirmed=incidDeath, type=fig_labs[2])) %>% - dplyr::select(-starts_with("incid")) %>% + dplyr::select(-tidyselect::starts_with("incid")) %>% dplyr::right_join( dplyr::bind_rows(county_dat %>% - group_by(time, geoid, !!as.symbol(group_var))%>% - summarize(low=quantile(NincidCase,pi_lo), + dplyr::roup_by(time, geoid, !!as.symbol(group_var))%>% + dplyr::summarize(low=quantile(NincidCase,pi_lo), high=quantile(NincidCase,pi_hi), est=mean(NincidCase), type=fig_labs[1]), county_dat %>% - group_by(time, geoid, !!as.symbol(group_var))%>% - summarize(low=quantile(NincidDeath,pi_lo), + dplyr::group_by(time, geoid, !!as.symbol(group_var))%>% + dplyr::summarize(low=quantile(NincidDeath,pi_lo), high=quantile(NincidDeath,pi_hi), est=mean(NincidDeath), type=fig_labs[2]))) %>% - ungroup() %>% + dplyr::ungroup() %>% dplyr::mutate(type = factor(type, levels = fig_labs), - confirmed=if_else(confirmed==0, NA_real_, confirmed)) + confirmed=dplyr::if_else(confirmed==0, NA_real_, confirmed)) } plot_rc<-list() for(i in 1:length(unique(as.character(rc$type)))){ plot_rc[[i]]<-rc %>% - filter(type==unique(as.character(rc$type))[i]) %>% + dplyr::filter(type==unique(as.character(rc$type))[i]) %>% dplyr::group_by(!!as.symbol(group_var), geoid)%>% dplyr::filter(time% dplyr::ungroup()%>% @@ -1619,8 +1698,8 @@ plot_truth_by_county <- function(truth_dat, strip.background.x = element_blank(), strip.background.y=element_rect(fill="white"), strip.text.y =element_text(face="bold"))+ - ylab("Counts (log scale)")+ - xlab("Time (weeks)")+ + ylab("Counts")+ + xlab("Date (week)")+ facet_grid(rows=vars(name), scales="free") + scale_y_sqrt()+ labs(subtitle = unique(as.character(rc$type))[i]) @@ -1629,17 +1708,15 @@ plot_truth_by_county <- function(truth_dat, return(plot_rc) } + ##' Time series comparing Rt estimates by scenario over time -##' @param county_dat df with geoid, sim_num, cum_inf, scenario, and time columns -##' @param truth_dat df with date, geoid, incidI, and Confirmed columns -##' @param r_dat df with daily estimate of Rt from load_r_daily_sims_filtered +##' @param rc df with daily rt estimates for one geoid/location and columns "estimate", "lower", and "upper" ##' @param geodat df with geoid and population columns ##' @param pdeath_filter which pdeath to select ##' @param scenario_colors colors for each scenario ##' @param scenario_levels levels applied to scenarios ##' @param scenario_labels label applied to scenarios -##' @param pi_lo lower limit to interval -##' @param pi_hi upper limit to interval +##' @param truth_source ##' ##' @return a table with the effectiveness per intervention period and a bar graph ##' @@ -1648,68 +1725,33 @@ plot_truth_by_county <- function(truth_dat, ##'@export ##' -plot_rt_ts <- function(county_dat=NULL, - truth_dat, - r_dat, - pdeath_filter = pdeath_default, - scenario_colors, - scenario_levels, - scenario_labels, - geodat=geodata, - incl_geoids, - susceptible=TRUE, - pi_lo=0.025, - pi_hi=0.975, - truth_source="USA Facts" -){ - require(tidyverse) - if(is.null(county_dat) & susceptible){stop("county_dat is missing")} - - geodat <- geodat %>% - rename(pop=starts_with("pop")) - - r_dat<-r_dat %>% - dplyr::filter(pdeath==pdeath_filter) %>% - dplyr::filter(geoid %in% incl_geoids) - - if(susceptible){ - rc<-county_dat %>% - dplyr::filter(pdeath==pdeath_filter) %>% - dplyr::filter(geoid %in% incl_geoids) %>% - left_join(geodat)%>% - dplyr::select(geoid, sim_num, cum_inf, pop, scenario, time) %>% - dplyr::right_join(r_dat) %>% - dplyr::group_by(scenario, date=time) %>% - dplyr::mutate(rt=rt*(1-cum_inf/pop)) %>% - dplyr::mutate(weight = pop/sum(pop)) - } else { - rc<- r_dat%>% - dplyr::group_by(scenario, date=time) %>% - left_join(geodat) %>% - dplyr::mutate(weight=pop/sum(pop)) - } +plot_rt_ts <- function(rc, + geodat, + truth_dat, + scenario_levels = scn_levels, + scenario_labels = scn_labels, + scenario_colors = scn_colors, + truth_source = "JHU CSSE", + pdeath_filter = "med"){ - rc<-rc %>% - dplyr::group_by(scenario, date) %>% - dplyr::summarize(estimate=Hmisc::wtd.mean(rt, weights=weight, normwt=TRUE), - lower=Hmisc::wtd.quantile(rt, weights=weight, normwt=TRUE, probs=pi_lo), - upper=Hmisc::wtd.quantile(rt, weights=weight, normwt=TRUE, probs=pi_hi)) + geodat<-geodat %>% + dplyr::rename(pop=tidyselect::starts_with("pop")) truth_dat<-truth_dat%>% - filter(geoid %in% incl_geoids) %>% - group_by(date)%>% - summarize(NincidConfirmed=sum(incidI), - NcumulConfirmed=sum(Confirmed)) + dplyr::filter(geoid %in% geodat$geoid) %>% + dplyr::group_by(date)%>% + dplyr::summarize(NincidConfirmed=sum(incidI), + NcumulConfirmed=sum(Confirmed)) truth_dat<-truth_dat%>% - dplyr::filter(NcumulConfirmed!=0)%>% - calcR0(geodat=geodat, by_geoid=FALSE, incl_geoids = incl_geoids, pop_col="pop") %>% + dplyr::filter(NcumulConfirmed>=2)%>% + calcR0(geodat=geodat, by_geoid=FALSE, incl_geoids = geodat$geoid, pop_col="pop") %>% dplyr::mutate(scenario=truth_source) - dplyr::bind_rows(rc, truth_dat) %>% + rc<-dplyr::bind_rows(rc, truth_dat) %>% dplyr::mutate(`Based on`=factor(scenario, - levels=c(scenario_levels, truth_source), - labels=c(scenario_labels, paste0(truth_source, " confirmed cases")))) %>% + levels=c(scenario_levels, truth_source), + labels=c(scenario_labels, paste0(truth_source, " confirmed cases")))) %>% ggplot(aes(x=date, y=estimate, ymin=lower, ymax=upper))+ geom_line(aes(col=`Based on`), size=0.75)+ geom_ribbon(aes(fill=`Based on`), alpha=0.12) + @@ -1726,8 +1768,13 @@ plot_rt_ts <- function(county_dat=NULL, theme(legend.position= "bottom", panel.grid=element_blank()) + guides(col=guide_legend(nrow=2)) + + return(rc) + } + + ##' Plot ratio of outcomes ##' @param hosp_state_totals df with hospitalization outcomes ##' @param start_date start of comparison period @@ -1779,7 +1826,7 @@ plot_scn_outcomes_ratio<-function(hosp_state_totals, dplyr::arrange(scenario_name) %>% dplyr::group_by(sim_num) %>% dplyr::mutate_if(is.numeric, function(x){x/lag(x)}) %>% - drop_na() %>% + tidyr::drop_na() %>% dplyr::group_by(scenario_name) %>% dplyr::summarize(AvghospCurr_lo=quantile(AvghospCurr, pi_lo), AvghospCurr_hi=quantile(AvghospCurr, pi_hi), @@ -1810,7 +1857,7 @@ plot_scn_outcomes_ratio<-function(hosp_state_totals, plt_dat<-dat_wide %>% dplyr::bind_rows() %>% tidyr::pivot_longer(cols=AvghospCurr_lo:NincidCase) %>% - dplyr::mutate(var=case_when(stringr::str_detect(name, "Avghosp")~"Daily average of occupied hospital beds", + dplyr::mutate(var=dplyr::case_when(stringr::str_detect(name, "Avghosp")~"Daily average of occupied hospital beds", stringr::str_detect(name, "incidHosp")~"Total hospital admissions", stringr::str_detect(name, "AvgICU") ~ "Daily average of occupied ICU beds", stringr::str_detect(name, "incidICU") ~ "Total ICU admissions", @@ -1822,7 +1869,7 @@ plot_scn_outcomes_ratio<-function(hosp_state_totals, "Daily average of occupied hospital beds", "Total hospital admissions", "Daily average of occupied ICU beds", "Total ICU admissions", "Daily average deaths", "Total deaths")), - name=case_when(stringr::str_detect(name, "_lo")~"lower", + name=dplyr::case_when(stringr::str_detect(name, "_lo")~"lower", stringr::str_detect(name, "_hi")~"upper", TRUE~"estimate")) %>% tidyr::pivot_wider(names_from=name, values_from=value) @@ -1877,11 +1924,13 @@ make_scn_county_table_withVent <- function(county_dat, dplyr::filter(!is.na(time)) %>% dplyr::filter(time >= start_date, time <= end_date) %>% dplyr::group_by(pdeath, sim_num, name, scenario_name) %>% - dplyr::summarize(TotalIncidCase = sum(NincidCase, na.rm = TRUE), + dplyr::summarize(TotalIncidInf = sum(NincidInf, na.rm=TRUE), + TotalIncidCase = sum(NincidCase, na.rm = TRUE), TotalIncidHosp = sum(NincidHosp, na.rm = TRUE), TotalIncidICU = sum(NincidICU, na.rm = TRUE), TotalIncidVent = sum(NincidVent, na.rm=TRUE), TotalIncidDeath = sum(NincidDeath, na.rm = TRUE), + AvgIncidInf = sum(NincidInf, na.rm=TRUE)/n(), AvgIncidCase = sum(NincidCase, na.rm=TRUE)/n(), AvgIncidDeath = sum(NincidDeath, na.rm = TRUE)/n(), maxHospAdm = max(NincidHosp, na.rm=TRUE), @@ -1892,81 +1941,93 @@ make_scn_county_table_withVent <- function(county_dat, maxVentCap = max(NVentCurr, na.rm=TRUE)) %>% dplyr::ungroup() %>% dplyr::group_by(pdeath, name, scenario_name) %>% - dplyr::summarize(nIncidCase_final = mean(TotalIncidCase), - nIncidCase_lo = quantile(TotalIncidCase, pi_lo), - nIncidCase_hi = quantile(TotalIncidCase, pi_hi), - aIncidCase_final = mean(AvgIncidCase), - aIncidCase_lo = quantile(AvgIncidCase, pi_lo), - aIncidCase_hi = quantile(AvgIncidCase, pi_hi), - aIncidDeath_final = mean(AvgIncidDeath), - aIncidDeath_lo = quantile(AvgIncidDeath, pi_lo), - aIncidDeath_hi = quantile(AvgIncidDeath, pi_hi), - nIncidHosp_final = mean(TotalIncidHosp), - nIncidHosp_lo = quantile(TotalIncidHosp, pi_lo), - nIncidHosp_hi = quantile(TotalIncidHosp, pi_hi), - pIncidHosp_final = mean(maxHospAdm), - pIncidHosp_lo = quantile(maxHospAdm, pi_lo), - pIncidHosp_hi = quantile(maxHospAdm, pi_hi), - nIncidICU_final = mean(TotalIncidICU), - nIncidICU_lo = quantile(TotalIncidICU, pi_lo), - nIncidICU_hi = quantile(TotalIncidICU, pi_hi), - pIncidICU_final = mean(maxICUAdm), - pIncidICU_lo = quantile(maxICUAdm, pi_lo), - pIncidICU_hi = quantile(maxICUAdm, pi_hi), - nIncidVent_final = mean(TotalIncidVent), - nIncidVent_lo = quantile(TotalIncidVent, pi_lo), - nIncidVent_hi = quantile(TotalIncidVent, pi_hi), - pIncidVent_final = mean(maxVentAdm),pIncidVent_lo = quantile(maxVentAdm, pi_lo), - pIncidVent_hi = quantile(maxVentAdm, pi_hi), - nIncidDeath_final = mean(TotalIncidDeath), - nIncidDeath_lo = quantile(TotalIncidDeath, pi_lo), - nIncidDeath_hi = quantile(TotalIncidDeath, pi_hi), - nCurrHosp_final = mean(maxHospCap), - nCurrHosp_lo = quantile(maxHospCap, pi_lo), - nCurrHosp_hi = quantile(maxHospCap, pi_hi), - nCurrICU_final = mean(maxICUCap), - nCurrICU_lo = quantile(maxICUCap, pi_lo), - nCurrICU_hi = quantile(maxICUCap, pi_hi), - nCurrVent_final = mean(maxVentCap), - nCurrVent_lo = quantile(maxVentCap, pi_lo), - nCurrVent_hi = quantile(maxVentCap, pi_hi)) %>% + dplyr::summarize(nIncidInf_final = mean(TotalIncidInf), + nIncidInf_lo = quantile(TotalIncidInf, pi_lo), + nIncidInf_hi = quantile(TotalIncidInf, pi_hi), + aIncidInf_final = mean(AvgIncidInf), + aIncidInf_lo = quantile(AvgIncidInf, pi_lo), + aIncidInf_hi = quantile(AvgIncidInf, pi_hi), + nIncidCase_final = mean(TotalIncidCase), + nIncidCase_lo = quantile(TotalIncidCase, pi_lo), + nIncidCase_hi = quantile(TotalIncidCase, pi_hi), + aIncidCase_final = mean(AvgIncidCase), + aIncidCase_lo = quantile(AvgIncidCase, pi_lo), + aIncidCase_hi = quantile(AvgIncidCase, pi_hi), + aIncidDeath_final = mean(AvgIncidDeath), + aIncidDeath_lo = quantile(AvgIncidDeath, pi_lo), + aIncidDeath_hi = quantile(AvgIncidDeath, pi_hi), + nIncidHosp_final = mean(TotalIncidHosp), + nIncidHosp_lo = quantile(TotalIncidHosp, pi_lo), + nIncidHosp_hi = quantile(TotalIncidHosp, pi_hi), + pIncidHosp_final = mean(maxHospAdm), + pIncidHosp_lo = quantile(maxHospAdm, pi_lo), + pIncidHosp_hi = quantile(maxHospAdm, pi_hi), + nIncidICU_final = mean(TotalIncidICU), + nIncidICU_lo = quantile(TotalIncidICU, pi_lo), + nIncidICU_hi = quantile(TotalIncidICU, pi_hi), + pIncidICU_final = mean(maxICUAdm), + pIncidICU_lo = quantile(maxICUAdm, pi_lo), + pIncidICU_hi = quantile(maxICUAdm, pi_hi), + nIncidVent_final = mean(TotalIncidVent), + nIncidVent_lo = quantile(TotalIncidVent, pi_lo), + nIncidVent_hi = quantile(TotalIncidVent, pi_hi), + pIncidVent_final = mean(maxVentAdm),pIncidVent_lo = quantile(maxVentAdm, pi_lo), + pIncidVent_hi = quantile(maxVentAdm, pi_hi), + nIncidDeath_final = mean(TotalIncidDeath), + nIncidDeath_lo = quantile(TotalIncidDeath, pi_lo), + nIncidDeath_hi = quantile(TotalIncidDeath, pi_hi), + nCurrHosp_final = mean(maxHospCap), + nCurrHosp_lo = quantile(maxHospCap, pi_lo), + nCurrHosp_hi = quantile(maxHospCap, pi_hi), + nCurrICU_final = mean(maxICUCap), + nCurrICU_lo = quantile(maxICUCap, pi_lo), + nCurrICU_hi = quantile(maxICUCap, pi_hi), + nCurrVent_final = mean(maxVentCap), + nCurrVent_lo = quantile(maxVentCap, pi_lo), + nCurrVent_hi = quantile(maxVentCap, pi_hi)) %>% dplyr::ungroup() %>% - dplyr::mutate(aIncidCase = prettyNum(conv_round(aIncidCase_final), big.mark=",", scientific=FALSE,trim=TRUE), - aIncidCase_CI = make_CI(aIncidCase_lo, aIncidCase_hi), - aIncidDeath = prettyNum(conv_round(aIncidDeath_final), big.mark=",", scientific=FALSE,trim=TRUE), - aIncidDeath_CI = make_CI(aIncidDeath_lo, aIncidDeath_hi), - nIncidCase = prettyNum(conv_round(nIncidCase_final), big.mark=",",scientific=FALSE,trim=TRUE), - nIncidCase_CI = prettyNum(make_CI(nIncidCase_lo, nIncidCase_hi), big.mark=",",scientific=FALSE,trim=TRUE), - nIncidHosp = prettyNum(conv_round(nIncidHosp_final), big.mark=",",scientific=FALSE,trim=TRUE), - nIncidHosp_CI = make_CI(nIncidHosp_lo, nIncidHosp_hi), - pIncidHosp = prettyNum(conv_round(pIncidHosp_final), big.mark=",",scientific=FALSE,trim=TRUE), - pIncidHosp_CI = make_CI(pIncidHosp_lo, pIncidHosp_hi), - nCurrHosp = prettyNum(conv_round(nCurrHosp_final), big.mark=",",scientific=FALSE,trim=TRUE), - nCurrHosp_CI = make_CI(nCurrHosp_lo, nCurrHosp_hi), - nIncidICU = prettyNum(conv_round(nIncidICU_final), big.mark=",",scientific=FALSE,trim=TRUE), - nIncidICU_CI = make_CI(nIncidICU_lo, nIncidICU_hi), - pIncidICU = prettyNum(conv_round(pIncidICU_final), big.mark=",",scientific=FALSE,trim=TRUE), - pIncidICU_CI = make_CI(pIncidICU_lo, pIncidICU_hi), - nCurrICU = prettyNum(conv_round(nCurrICU_final), big.mark=",",scientific=FALSE,trim=TRUE), - nCurrICU_CI = make_CI(nCurrICU_lo, nCurrICU_hi), - nIncidVent = prettyNum(conv_round(nIncidVent_final), big.mark=",",scientific=FALSE,trim=TRUE), - nIncidVent_CI = make_CI(nIncidVent_lo, nIncidVent_hi), - pIncidVent = prettyNum(conv_round(pIncidVent_final), big.mark=",",scientific=FALSE,trim=TRUE), - pIncidVent_CI = make_CI(pIncidVent_lo, pIncidVent_hi), - nCurrVent = prettyNum(conv_round(nCurrVent_final), big.mark=",",scientific=FALSE,trim=TRUE), - nCurrVent_CI = make_CI(nCurrVent_lo, nCurrVent_hi), - nIncidDeath = prettyNum(conv_round(nIncidDeath_final), big.mark=",",scientific=FALSE,trim=TRUE), - nIncidDeath_CI = make_CI(nIncidDeath_lo, nIncidDeath_hi)) %>% - dplyr::select(-ends_with("lo"), -ends_with("hi"), -ends_with("final")) + dplyr::mutate(aIncidInf = prettyNum(conv_round(aIncidInf_final), big.mark=",", scientific=FALSE,trim=TRUE), + aIncidInf_CI = make_CI(aIncidInf_lo, aIncidInf_hi), + aIncidCase = prettyNum(conv_round(aIncidCase_final), big.mark=",", scientific=FALSE,trim=TRUE), + aIncidCase_CI = make_CI(aIncidCase_lo, aIncidCase_hi), + aIncidDeath = prettyNum(conv_round(aIncidDeath_final), big.mark=",", scientific=FALSE,trim=TRUE), + aIncidDeath_CI = make_CI(aIncidDeath_lo, aIncidDeath_hi), + nIncidInf = prettyNum(conv_round(nIncidInf_final), big.mark=",", scientific=FALSE,trim=TRUE), + nIncidInf_CI = make_CI(nIncidInf_lo, nIncidInf_hi), + nIncidCase = prettyNum(conv_round(nIncidCase_final), big.mark=",",scientific=FALSE,trim=TRUE), + nIncidCase_CI = prettyNum(make_CI(nIncidCase_lo, nIncidCase_hi), big.mark=",",scientific=FALSE,trim=TRUE), + nIncidHosp = prettyNum(conv_round(nIncidHosp_final), big.mark=",",scientific=FALSE,trim=TRUE), + nIncidHosp_CI = make_CI(nIncidHosp_lo, nIncidHosp_hi), + pIncidHosp = prettyNum(conv_round(pIncidHosp_final), big.mark=",",scientific=FALSE,trim=TRUE), + pIncidHosp_CI = make_CI(pIncidHosp_lo, pIncidHosp_hi), + nCurrHosp = prettyNum(conv_round(nCurrHosp_final), big.mark=",",scientific=FALSE,trim=TRUE), + nCurrHosp_CI = make_CI(nCurrHosp_lo, nCurrHosp_hi), + nIncidICU = prettyNum(conv_round(nIncidICU_final), big.mark=",",scientific=FALSE,trim=TRUE), + nIncidICU_CI = make_CI(nIncidICU_lo, nIncidICU_hi), + pIncidICU = prettyNum(conv_round(pIncidICU_final), big.mark=",",scientific=FALSE,trim=TRUE), + pIncidICU_CI = make_CI(pIncidICU_lo, pIncidICU_hi), + nCurrICU = prettyNum(conv_round(nCurrICU_final), big.mark=",",scientific=FALSE,trim=TRUE), + nCurrICU_CI = make_CI(nCurrICU_lo, nCurrICU_hi), + nIncidVent = prettyNum(conv_round(nIncidVent_final), big.mark=",",scientific=FALSE,trim=TRUE), + nIncidVent_CI = make_CI(nIncidVent_lo, nIncidVent_hi), + pIncidVent = prettyNum(conv_round(pIncidVent_final), big.mark=",",scientific=FALSE,trim=TRUE), + pIncidVent_CI = make_CI(pIncidVent_lo, pIncidVent_hi), + nCurrVent = prettyNum(conv_round(nCurrVent_final), big.mark=",",scientific=FALSE,trim=TRUE), + nCurrVent_CI = make_CI(nCurrVent_lo, nCurrVent_hi), + nIncidDeath = prettyNum(conv_round(nIncidDeath_final), big.mark=",",scientific=FALSE,trim=TRUE), + nIncidDeath_CI = make_CI(nIncidDeath_lo, nIncidDeath_hi)) %>% + dplyr::select(-tidyselect::ends_with("lo"), -tidyselect::ends_with("hi"), -tidyselect::ends_with("final")) county_tab <- county_tab[order(colnames(county_tab))] county_tab <- county_tab %>% + unite("InfAvg", aIncidInf:aIncidInf_CI, sep="\n") %>% unite("CaseAvg", aIncidCase:aIncidCase_CI, sep="\n") %>% unite("DeathAvg", aIncidDeath:aIncidDeath_CI, sep="\n") %>% unite("HospPeakMax", nCurrHosp:nCurrHosp_CI, sep="\n") %>% unite("ICUPeakMax", nCurrICU:nCurrICU_CI, sep="\n") %>% unite("VentPeakMax", nCurrVent:nCurrVent_CI, sep="\n") %>% + unite("InfIncid", nIncidInf:nIncidInf_CI, sep="\n") %>% unite("DeathIncid", nIncidDeath:nIncidDeath_CI, sep="\n") %>% unite("HospIncid", nIncidHosp:nIncidHosp_CI, sep="\n") %>% unite("ICUIncid", nIncidICU:nIncidICU_CI, sep="\n") %>% @@ -1977,7 +2038,7 @@ make_scn_county_table_withVent <- function(county_dat, unite("VentPeakAdmin", pIncidVent:pIncidVent_CI, sep="\n") county_tab <- county_tab[order(colnames(county_tab))] %>% - dplyr::select(name, scenario_name, pdeath, starts_with("Case"), starts_with("Hosp"), starts_with("ICU"), starts_with("Vent"), starts_with("Death")) + dplyr::select(name, pdeath, scenario_name, tidyselect::starts_with("Inf"), tidyselect::starts_with("Case"), tidyselect::starts_with("Hosp"), tidyselect::starts_with("ICU"), tidyselect::starts_with("Vent"), tidyselect::starts_with("Death")) if(!is.na(pdeath_filter)){ newnames <- c(NA_character_, NA_character_, "Daily average","Total", "Total", "Daily peak admissions", "Daily peak capacity", "Total", "Daily peak admissions", "Daily peak capacity", "Total", "Daily peak admissions", "Daily peak capacity", "Daily average", "Total") @@ -1985,17 +2046,18 @@ make_scn_county_table_withVent <- function(county_dat, county_tab %>% dplyr::arrange(name) %>% dplyr::filter(pdeath == pdeath_filter) %>% - dplyr::select(-pdeath) %>% + dplyr::select(-pdeath, -InfAvg, -InfIncid) %>% flextable::flextable() %>% - flextable::set_header_labels(name = "County", scenario_name = "Scenario", CaseAvg = "CONFIRMED CASES", CaseIncid = "CONFIRMED CASES", HospIncid = "HOSPITALIZATIONS", # pdeath= "IFR", + flextable::set_header_labels(name = "County", scenario_name = "Scenario", + CaseAvg = "CONFIRMED CASES", CaseIncid = "CONFIRMED CASES", HospIncid = "HOSPITALIZATIONS", HospPeakAdmin = "HOSPITALIZATIONS", HospPeakMax = "HOSPITALIZATIONS", ICUIncid = "ICU", ICUPeakAdmin = "ICU", ICUPeakMax = "ICU", VentIncid = "VENTILATIONS", VentPeakAdmin = "VENTILATIONS", VentPeakMax = "VENTILATIONS", DeathAvg = "DEATHS", DeathIncid = "DEATHS") %>% - flextable::merge_at(i = 1, j = 3:4, part = "header") %>% - flextable::merge_at(i = 1, j = 5:7, part = "header") %>% - flextable::merge_at(i = 1, j = 8:10, part = "header") %>% + flextable::merge_at(i = 1, j = 3:4, part = "header") %>% + flextable::merge_at(i = 1, j = 5:7, part = "header") %>% + flextable::merge_at(i = 1, j = 8:10, part = "header") %>% flextable::merge_at(i = 1, j = 11:13, part = "header") %>% - flextable::merge_at(i = 1, j = 14:15, part = "header") %>% + flextable::merge_at(i = 1, j = 14:15, part = "header") %>% flextable::add_header_row(values = c(newnames), top = FALSE) %>% #flextable::merge_v(j = 1) %>% flextable::autofit() %>% @@ -2004,35 +2066,7 @@ make_scn_county_table_withVent <- function(county_dat, flextable::align(align="center", part = "all") %>% flextable::bold(part="header")%>% flextable::bold(j=1, part="body") - } else { - newnames <- c(NA_character_,NA_character_, NA_character_,"Daily average", "Total", "Total", "Daily peak admissions", "Daily peak capacity", "Total", "Daily - peak admissions", "Daily peak capacity", "Total", "Daily peak admissions", "Daily peak capacity", "Daily average", "Total") - - county_tab %>% - dplyr::mutate(pdeath = factor(pdeath, - levels=pdeath_levels, - labels=pdeath_labels)) %>% - dplyr::arrange(name, desc(pdeath)) %>% - flextable::flextable() %>% - flextable::set_header_labels(name = "County", scenario_name="Scenario", pdeath = "IFR", CaseAvg = "CONFIRMED CASES", - CaseIncid = "CONFIRMED CASES", HospIncid = "HOSPITALIZATIONS", - HospPeakAdmin = "HOSPITALIZATIONS", HospPeakMax = "HOSPITALIZATIONS", ICUIncid = "ICU", - ICUPeakAdmin = "ICU", ICUPeakMax = "ICU", VentIncid = "VENTILATIONS", VentPeakAdmin = "VENTILATIONS", - VentPeakMax = "VENTILATIONS", DeathAvg = "DEATHS", DeathIncid = "DEATHS") %>% - flextable::merge_at(i = 1, j = 4:5, part = "header") %>% - flextable::merge_at(i = 1, j = 6:8, part = "header") %>% - flextable::merge_at(i = 1, j = 9:11, part = "header") %>% - flextable::merge_at(i = 1, j = 12:14, part = "header") %>% - flextable::merge_at(i = 1, j = 15:16, part = "header") %>% - flextable::add_header_row(values = c(newnames), top = FALSE) %>% - #flextable::merge_v(j = 1) %>% - flextable::autofit() %>% - #flextable::border(i=seq(3, 174, by = 3), border.bottom=officer::fp_border(color="grey", width=0.5)) %>% - flextable::border(j=c(1,2,3,5,8,11,14,16), border.right = officer::fp_border(color="grey", style = "solid", width=0.5)) %>% - flextable::align(align="center", part = "all") %>% - flextable::bold(part="header")%>% - flextable::bold(j=c(1,2,3), part="body") - } + } } @@ -2083,9 +2117,9 @@ calcR0 <- function(USAfacts, Rt1 <- dplyr::bind_rows(Rt1) }else{ covid <- covid %>% - group_by(Date) %>% - summarise_if(is.numeric, sum) %>% - ungroup() + dplyr::group_by(Date) %>% + dplyr::summarise_if(is.numeric, sum) %>% + dplyr::ungroup() pop <- sum(geodat[geodat$geoid %in% incl_geoids, pop_col]) incid <- setNames(covid$New.Cases,1:nrow(covid)) estR0 <- R0::estimate.R(incid, mGT, begin=1, end=as.numeric(length(incid)), methods=c("TD"), pop.size=pop, nsim=1000) @@ -2269,7 +2303,7 @@ plot_county_outcomes <- function(county_dat, start_date <- as.Date(start_date) end_date <- as.Date(end_date) - group_var <- if_else(filter_by=="pdeath", "scenario", "pdeath") + group_var <- dplyr::if_else(filter_by=="pdeath", "scenario", "pdeath") county_dat<- county_dat %>% dplyr::filter(!!as.symbol(filter_by)==filter_val)%>% @@ -2305,7 +2339,7 @@ plot_county_outcomes <- function(county_dat, for(i in 1:length(rc_type)){ plot_rc[[i]]<-rc %>% - filter(type==rc_type[i]) %>% + dplyr::filter(type==rc_type[i]) %>% ggplot(aes(x=time)) + geom_line(aes(y=est, color=var)) + geom_ribbon(aes(ymin=lo, ymax=hi, fill=var), alpha=0.1)+ @@ -2362,12 +2396,12 @@ forecast_plot<-function(usa_facts=NULL, if(!any(scenarios %in% unique(county_dat$scenario))){stop("None of the specified scenarios exist in county_dat.")} forecast_start<-as.Date(forecast_start) - truth_var<-if_else(str_detect(var, "ase"), "incidI", - if_else(str_detect(var, "eath"), "incidDeath", - if_else(str_detect(var, "osp"), "hosps", NULL))) + truth_var<-dplyr::if_else(stringr::str_detect(var, "ase"), "incidI", + dplyr::if_else(stringr::str_detect(var, "eath"), "incidDeath", + dplyr::if_else(stringr::str_detect(var, "osp"), "hosps", NULL))) county_dat<-county_dat%>% - filter(scenario %in% scenarios) + dplyr::filter(scenario %in% scenarios) if(is.null(reichlab)){ color_vals <- c("black", color_vals[1:length(forecast)]) @@ -2379,79 +2413,78 @@ forecast_plot<-function(usa_facts=NULL, if(is.null(usa_facts)){ county_dat<-county_dat%>% - mutate(scenario=scenario_name) %>% - group_by(time=lubridate::ceiling_date(time, "weeks"), sim_num, scenario) %>% - summarize(est=sum(!!as.symbol(var)))%>% - group_by(time, scenario) %>% - summarize(lo=quantile(est, pi_lo), + dplyr::mutate(scenario=scenario_name) %>% + dplyr::group_by(time=lubridate::ceiling_date(time, "weeks"), sim_num, scenario) %>% + dplyr::summarize(est=sum(!!as.symbol(var)))%>% + dplyr::group_by(time, scenario) %>% + dplyr::summarize(lo=quantile(est, pi_lo), hi=quantile(est, pi_hi), mean=mean(est), median=median(est)) %>% - mutate(truth_var=NA_real_) %>% - group_by(scenario) %>% - filter(time!=max(time))%>% - rename(est=!!as.symbol(tendency)) + dplyr::mutate(truth_var=NA_real_) %>% + dplyr::group_by(scenario) %>% + dplyr::filter(time!=max(time))%>% + dplyr::rename(est=!!as.symbol(tendency)) } else{ usa_facts<-usa_facts %>% dplyr::filter(date <= forecast_start) %>% - group_by(time=lubridate::ceiling_date(date, "weeks")) %>% - summarize(truth_var=sum(!!as.symbol(truth_var))) %>% - filter(time!=max(time)| + dplyr::group_by(time=lubridate::ceiling_date(date, "weeks")) %>% + dplyr::summarize(truth_var=sum(!!as.symbol(truth_var))) %>% + dplyr::filter(time!=max(time)| (as.numeric(as.Date(forecast_start)-time)==6)) %>% - mutate(scenario=truth_source) + dplyr::mutate(scenario=truth_source) county_dat<-county_dat %>% - mutate(scenario=scenario_name) %>% - group_by(time=lubridate::ceiling_date(time, "weeks"), sim_num, scenario) %>% - summarize(est=sum(!!as.symbol(var))) %>% - group_by(time, scenario) %>% - summarize(lo=quantile(est, pi_lo), + dplyr::mutate(scenario=scenario_name) %>% + dplyr::group_by(time=lubridate::ceiling_date(time, "weeks"), sim_num, scenario) %>% + dplyr::summarize(est=sum(!!as.symbol(var))) %>% + dplyr::group_by(time, scenario) %>% + dplyr::summarize(lo=quantile(est, pi_lo), hi=quantile(est, pi_hi), mean=mean(est), median=median(est)) %>% - filter(time>=max(usa_facts$time)) %>% - group_by(scenario) %>% - filter(time!=max(time)) %>% - rename(est=!!as.symbol(tendency))%>% - bind_rows(usa_facts, .) + dplyr::filter(time>=max(usa_facts$time)) %>% + dplyr::group_by(scenario) %>% + dplyr::filter(time!=max(time)) %>% + dplyr::rename(est=!!as.symbol(tendency))%>% + dplyr::bind_rows(usa_facts, .) - } - + } if(!is.null(reichlab)){ - reichlab_var<-if_else(str_detect(var, "ase"), "case", "death") + reichlab_var<-dplyr::if_else(stringr::str_detect(var, "ase"), "case", "death") county_dat<-reichlab %>% - filter(quantile %in% c(pi_lo, 0.5, pi_hi) & - str_detect(target_variable, paste("inc", reichlab_var))) %>% - mutate(N=as.numeric(horizon), - quantile=case_when(quantile==0.5~"est", + dplyr::filter(quantile %in% c(pi_lo, 0.5, pi_hi) & + stringr::str_detect(target_variable, paste("inc", reichlab_var))) %>% + dplyr::mutate(N=as.numeric(horizon), + quantile=dplyr::case_when(quantile==0.5~"est", quantile==pi_lo~"lo", quantile==pi_hi~"hi"), time=lubridate::ceiling_date(forecast_date, "weeks")+7*(N-1), scenario="COVID-19 Forecast Hub") %>% dplyr::select(N, time, scenario, quantile, value) %>% - pivot_wider(names_from=quantile, values_from=value) %>% + tidyr::pivot_wider(names_from=quantile, values_from=value) %>% dplyr::select(time, scenario, lo, hi, est) %>% - bind_rows(county_dat) + dplyr::bind_rows(county_dat) } - + if(is.null(usa_facts)){ ymax <- max( - county_dat %>% filter(time > xmin_date) %>% filter(est==max(est)) %>% pull(est) %>% unique() + county_dat %>% dplyr::filter(time > xmin_date) %>% dplyr::filter(est==max(est)) %>% dplyr::pull(est) %>% unique() ) } else { ymax <- max( - usa_facts %>% filter(time > xmin_date) %>% filter(truth_var==max(truth_var)) %>% pull(truth_var) %>% unique(), - county_dat %>% filter(time > xmin_date) %>% filter(est==max(est)) %>% pull(est) %>% unique() + usa_facts %>% dplyr::filter(time > xmin_date) %>% dplyr::filter(truth_var==max(truth_var)) %>% dplyr::pull(truth_var) %>% unique(), + county_dat %>% dplyr::filter(time > xmin_date) %>% dplyr::filter(est==max(est)) %>% dplyr::pull(est) %>% unique() ) } plot_dat<-county_dat %>% - mutate(scenario=factor(scenario, levels=scen_levels)) %>% - arrange(scenario, time) %>% + dplyr::mutate(scenario=factor(scenario, levels=scen_levels)) %>% + dplyr::arrange(scenario, time) %>% ggplot(aes(x=time))+ geom_line(aes(y=truth_var, col=scenario))+ geom_line(aes(y=est, col=scenario), linetype="dashed")+ @@ -2466,3 +2499,707 @@ forecast_plot<-function(usa_facts=NULL, return(plot_dat) } + + +##' Time series comparing reported and estimated cases and deaths in each +##' geoid by pdeath or scenario; possible to show hospitalization data +##' +##' @param truth_dat df with date, geoid, incidI, incidDeath; hosps if adding +##' hospitalization data +##' @param model_dat df with model estimates +##' @param hosp whether hospitalization data is included in truth_dat with varname currhosp +##' @param filter_by variable name for filtering estimates either: scenario or pdeath +##' @param filter_val desired value of variable +##' @param geodat df with location names +##' @param fig_labs names for the labels of the figures that will show incidence cases, deaths, and hospitalizations +##' @param pi_lo lower limit to interval +##' @param pi_hi upper limit to interval +##' +##' @return plot comparing observed and modeled estimates by geoid +##' +##' +##' +##'@export +##' +plot_truth_by_location <- function(truth_dat, + model_dat, + hosp=FALSE, + filter_by, + filter_val, + start_date, + end_date, + geodat=geodata, + fig_labs=c("Incident Cases", "Incident Deaths"), + pi_lo=0.025, + pi_hi=0.975 +){ + + if(filter_by!="pdeath" & filter_by!="scenario") stop("You can only filter by 'pdeath' or 'scenario'") + + group_var<-if_else(filter_by=="pdeath", "scenario_name", "pdeath") + + model_dat<-model_dat%>% + dplyr::filter(!!as.symbol(filter_by)==filter_val)%>% + dplyr::group_by(geoid, name, !!as.symbol(group_var), sim_num, time=lubridate::ceiling_date(time, unit="week"))%>% + dplyr::summarize(NincidCase=sum(NincidCase, na.rm=TRUE), + NincidDeath=sum(NincidDeath, na.rm=TRUE), + NincidHosp=sum(NincidHosp, na.rm=TRUE)) %>% + dplyr::group_by(geoid) %>% + dplyr::filter(time% + dplyr::group_by(geoid, name, time=lubridate::ceiling_date(date, unit="week")) %>% + dplyr::summarize(incidI=sum(incidI, na.rm=TRUE), + incidDeath=sum(incidDeath, na.rm=TRUE), + incidH=sum(incidH, na.rm=TRUE)) %>% + dplyr::group_by(geoid)%>% + dplyr::filter(time% + dplyr::mutate(confirmed=incidI, + type=fig_labs[1]), + truth_dat%>% + dplyr::mutate(confirmed=incidDeath, + type=fig_labs[2]), + truth_dat%>% + dplyr::mutate(confirmed=incidH, + type=fig_labs[3])) %>% + dplyr::select(-starts_with("incid")) %>% + dplyr::right_join( + bind_rows(model_dat %>% + group_by(time, geoid, name, !!as.symbol(group_var))%>% + summarize(low=quantile(NincidCase,pi_lo), + high=quantile(NincidCase,pi_hi), + est=mean(NincidCase), + type=fig_labs[1]), + model_dat %>% + group_by(time, geoid, name, !!as.symbol(group_var))%>% + summarize(low=quantile(NincidDeath,pi_lo), + high=quantile(NincidDeath,pi_hi), + est=mean(NincidDeath), + type=fig_labs[2]), + model_dat %>% + group_by(time, geoid, name, !!as.symbol(group_var))%>% + summarize(low=quantile(NincidHosp,pi_lo), + high=quantile(NincidHosp,pi_hi), + est=mean(NincidHosp), + type=fig_labs[3]))) %>% + dplyr::ungroup() %>% + dplyr::mutate(type = factor(type, levels = fig_labs[3:1]), + confirmed=if_else(confirmed==0, NA_real_, confirmed)) + } else{ + + truth_dat <- truth_dat %>% + dplyr::group_by(geoid, name, time=lubridate::ceiling_date(date, unit="week")) %>% + dplyr::summarize(incidI=sum(incidI), + incidDeath=sum(incidDeath)) %>% + dplyr::filter(time% + mutate(confirmed=incidI, + type=fig_labs[1]), + truth_dat%>% + mutate(confirmed=incidDeath, + type=fig_labs[2])) %>% + dplyr::select(-starts_with("incid")) %>% + dplyr::right_join( + dplyr::bind_rows(model_dat %>% + group_by(time, geoid, name, !!as.symbol(group_var))%>% + summarize(low=quantile(NincidCase,pi_lo), + high=quantile(NincidCase,pi_hi), + est=mean(NincidCase), + type=fig_labs[1]), + model_dat %>% + group_by(time, geoid, name, !!as.symbol(group_var))%>% + summarize(low=quantile(NincidDeath,pi_lo), + high=quantile(NincidDeath,pi_hi), + est=mean(NincidDeath), + type=fig_labs[2]))) %>% + dplyr::ungroup() %>% + dplyr::mutate(type = factor(type, levels = fig_labs), + confirmed=if_else(confirmed==0, NA_real_, confirmed)) + } + + plot_rc<-list() + + for(i in 1:length(unique(as.character(rc$type)))){ + plot_rc[[i]]<-rc %>% + filter(type==unique(as.character(rc$type))[i]) %>% + dplyr::group_by(!!as.symbol(group_var), geoid, name)%>% + dplyr::filter(time% + dplyr::ungroup()%>% + dplyr::filter(time>as.Date(start_date), time% + dplyr::left_join(geodat)%>% + ggplot(aes(x=time)) + + geom_line(aes(y=est, color=!!as.symbol(group_var))) + + geom_ribbon(alpha=0.1, aes(fill=!!as.symbol(group_var), ymin=low, ymax=high))+ + geom_point(aes(y=confirmed), color="black") + + theme_bw()+ + theme(panel.grid = element_blank(), + legend.title=element_blank(), + legend.position="bottom", + strip.background.x = element_blank(), + strip.background.y=element_rect(fill="white"), + strip.text.y =element_text(face="bold"))+ + ylab("Counts")+ + xlab("Date (week)")+ + facet_grid(rows=vars(name), scales="free") + + scale_y_sqrt()+ + labs(subtitle = unique(as.character(rc$type))[i]) + } + + return(plot_rc) +} + +##' Plot intermediate likelihood over time for each geoID +##' +##' @param llik_interm df with geoid, name, ll, lik_type, scenario, pdeath, iter_num +##' @param filter_by variable name for filtering estimates either: scenario or pdeath +##' @param filter_val desired value of variable +##' @param burn_in number of iterations to discard at the beginning +##' +##' @return plot of accepted global and chimeric log likelihood values by geoid +##' +##' +##' +##'@export +##' +plot_llik_by_location <- function(llik_interm, + filter_by = "pdeath", + filter_val = "med", + burn_in = 0 +){ + + if(filter_by!="pdeath" & filter_by!="scenario") stop("You can only filter by 'pdeath' or 'scenario'") + group_var<-if_else(filter_by=="pdeath", "scenario", "pdeath") + + llik_interm <-llik_interm %>% + dplyr::filter(!!as.symbol(filter_by)==filter_val)%>% + dplyr::filter(iter_num >= burn_in)%>% + dplyr::mutate(slot_num = as.factor(slot_num))%>% + drop_na(iter_num) # for now, because don't know what it means to have no iteration number + + rc <- llik_interm + + group_names <- unique(rc[group_var]) #names of groups which will form columns of grid plot + + plot_rc<-list() + + for(i in 1:length(group_names)){ + print(i) + plot_rc[[i]]<-rc %>% + dplyr::filter(!!as.symbol(group_var)==group_names[i]) %>% + dplyr::filter(accept==1)%>% + ggplot(aes(x=iter_num)) + + geom_step(aes(y=ll, color=lik_type, alpha=slot_num)) + + theme_bw()+ + theme(panel.grid = element_blank(), + legend.title=element_blank(), + legend.position="bottom", + strip.background.x = element_blank(), + strip.background.y=element_rect(fill="white"), + strip.text.y =element_text(face="bold"))+ + ylab("Log Likelihood")+ + xlab("Iterations")+ + facet_grid(rows=vars(name), scales="free") + + scale_alpha_discrete(range=if(length(unique((rc$slot_num)))==1){c(0.99,1)}else{c(0.2,1)}) + + #scale_y_sqrt()+ + labs(subtitle = group_names[i]) + } + + return(plot_rc) +} + +##' Plot intermediate likelihood, combined over time for each geoid +##' +##' @param llik_interm df with geoid, name, ll, lik_type, scenario, pdeath, iter_num +##' @param filter_by variable name for filtering estimates either: scenario or pdeath +##' @param filter_val desired value of variable +##' @param burn_in number of iterations to discard at the beginning +##' +##' @return plot of accepted global and chimeric log likelihood values for entire nation +##' +##' +##' +##'@export +##' +plot_llik_total <- function(llik_interm, + filter_by = "pdeath", + filter_val = "med", + burn_in = 0 +){ + + if(filter_by!="pdeath" & filter_by!="scenario") stop("You can only filter by 'pdeath' or 'scenario'") + group_var<-if_else(filter_by=="pdeath", "scenario", "pdeath") + + llik_interm <-llik_interm %>% + dplyr::filter(!!as.symbol(filter_by)==filter_val)%>% + dplyr::filter(iter_num >= burn_in)%>% + dplyr::mutate(slot_num = as.factor(slot_num))%>% + drop_na(iter_num) # for now, because don't know what it means to have no iteration number + + rc <- llik_interm + + group_names <- unique(rc[group_var]) #names of groups which will form columns of grid plot + + plot_rc<-list() + + for(i in 1:length(group_names)){ + print(i) + plot_rc[[i]]<-rc %>% + filter(!!as.symbol(group_var)==group_names[i]) %>% + dplyr::group_by(lik_type,iter_num,slot_num) %>% + dplyr::summarize(ll=sum(ll, na.rm=TRUE),accept=any(accept))%>% #add log likelihoods for all geoids together at each timepoint + dplyr::filter(accept==1)%>% + ggplot(aes(x=iter_num)) + + geom_step(aes(y=ll, color=lik_type,alpha=slot_num)) + + theme_bw()+ + ylab("Log Likelihood")+ + xlab("Iterations")+ + scale_alpha_discrete(range=if(length(unique((rc$slot_num)))==1){c(0.99,1)}else{c(0.2,1)}) + + #scale_y_sqrt()+ + labs(subtitle = group_names[i]) + } + + return(plot_rc) +} + +##' Plot acceptance rate over time for each geoID and combined global +##' +##' @param llik_interm df with geoid, name, ll, lik_type, scenario, pdeath, iter_num +##' @param filter_by variable name for filtering estimates either: scenario or pdeath +##' @param filter_val desired value of variable +##' @param burn_in number of iterations to discard at the beginning +##' +##' @return plot of chimeric and global acceptance rate by geoid +##' +##' +##' +##'@export +##' +plot_accept_by_location <- function(llik_interm, + filter_by = "pdeath", + filter_val = "med", + burn_in = 0 +){ + + if(filter_by!="pdeath" & filter_by!="scenario") stop("You can only filter by 'pdeath' or 'scenario'") + group_var<-if_else(filter_by=="pdeath", "scenario", "pdeath") + + llik_interm <-llik_interm %>% + dplyr::filter(!!as.symbol(filter_by)==filter_val)%>% + dplyr::filter(iter_num >= burn_in)%>% + dplyr::mutate(slot_num = as.factor(slot_num))%>% + drop_na(iter_num) # for now, because don't know what it means to have no iteration number + + rc <- llik_interm + + group_names <- unique(rc[group_var]) #names of groups which will form columns of grid plot + + plot_rc<-list() + + for(i in 1:length(group_names)){ + print(i) + plot_rc[[i]]<-rc %>% + filter(!!as.symbol(group_var)==group_names[i]) %>% + ggplot(aes(x=iter_num)) + + geom_line(aes(y=accept_avg, color=lik_type,alpha=slot_num)) + + theme_bw()+ + theme(panel.grid = element_blank(), + legend.title=element_blank(), + legend.position="bottom", + strip.background.x = element_blank(), + strip.background.y=element_rect(fill="white"), + strip.text.y =element_text(face="bold"))+ + ylab("Average acceptance rate")+ + xlab("Iterations")+ + facet_grid(rows=vars(name), scales="fixed") + + scale_alpha_discrete(range=if(length(unique((rc$slot_num)))==1){c(0.99,1)}else{c(0.2,1)}) + + #scale_y_sqrt()+ + labs(subtitle = group_names[i]) + } + + return(plot_rc) +} + +##' Plot rolling mean acceptance rate over time for each geoID and combined global +##' +##' @param llik_interm df with geoid, name, ll, lik_type, scenario, pdeath, iter_num +##' @param filter_by variable name for filtering estimates either: scenario or pdeath +##' @param filter_val desired value of variable +##' @param burn_in number of iterations to discard at the beginning +##' +##' @return plot of chimeric and global acceptance rate by geoid +##' +##' +##' +##'@export +##' +plot_accept_by_location_rolling <- function(llik_interm, + filter_by = "pdeath", + filter_val = "med", + roll_period = 10, + burn_in = 0 +){ + + if(filter_by!="pdeath" & filter_by!="scenario") stop("You can only filter by 'pdeath' or 'scenario'") + group_var<-if_else(filter_by=="pdeath", "scenario", "pdeath") + + llik_interm <-llik_interm %>% + dplyr::filter(!!as.symbol(filter_by)==filter_val)%>% + dplyr::filter(iter_num >= burn_in)%>% + dplyr::mutate(slot_num = as.factor(slot_num))%>% + drop_na(iter_num) # for now, because don't know what it means to have no iteration number + + rc <- llik_interm + + group_names <- unique(rc[group_var]) #names of groups which will form columns of grid plot + + plot_rc<-list() + + for(i in 1:length(group_names)){ + print(i) + plot_rc[[i]]<-rc %>% + filter(!!as.symbol(group_var)==group_names[i]) %>% + dplyr::group_by(lik_type,geoid,slot_num) %>% + dplyr::mutate(accept_avg_roll=rollmean(accept, roll_period,fill=NA,align="right"))%>% #add log likelihoods for all geoids together at each timepoint + ggplot(aes(x=iter_num)) + + geom_line(aes(y=accept_avg_roll, color=lik_type,alpha=slot_num)) + + theme_bw()+ + theme(panel.grid = element_blank(), + legend.title=element_blank(), + legend.position="bottom", + strip.background.x = element_blank(), + strip.background.y=element_rect(fill="white"), + strip.text.y =element_text(face="bold"))+ + ylab("Rolling mean acceptance rate")+ + xlab("Iterations")+ + facet_grid(rows=vars(name), scales="fixed") + + scale_alpha_discrete(range=if(length(unique((rc$slot_num)))==1){c(0.99,1)}else{c(0.2,1)}) + + #scale_y_sqrt()+ + labs(subtitle = group_names[i]) + } + + return(plot_rc) +} + +##' Plot cumulative # of acceptances over MCMC iterations for each geoID and combined global +##' +##' @param llik_interm df with geoid, name, ll, lik_type, scenario, pdeath, iter_num +##' @param filter_by variable name for filtering estimates either: scenario or pdeath +##' @param filter_val desired value of variable +##' @param burn_in number of iterations to discard at the beginning +##' +##' @return plot of chimeric and global acceptance rate by geoid +##' +##' +##' +##'@export +##' +plot_accept_by_location_cumul <- function(llik_interm, + filter_by = "pdeath", + filter_val = "med", + burn_in = 0 +){ + + if(filter_by!="pdeath" & filter_by!="scenario") stop("You can only filter by 'pdeath' or 'scenario'") + group_var<-if_else(filter_by=="pdeath", "scenario", "pdeath") + + llik_interm <-llik_interm %>% + dplyr::filter(!!as.symbol(filter_by)==filter_val)%>% + dplyr::filter(iter_num >= burn_in)%>% + dplyr::mutate(slot_num = as.factor(slot_num))%>% + drop_na(iter_num) # for now, because don't know what it means to have no iteration number + + rc <- llik_interm + + group_names <- unique(rc[group_var]) #names of groups which will form columns of grid plot + + plot_rc<-list() + + for(i in 1:length(group_names)){ + print(i) + plot_rc[[i]]<-rc %>% + filter(!!as.symbol(group_var)==group_names[i]) %>% + dplyr::group_by(lik_type,geoid,slot_num) %>% + dplyr::mutate(accept_cumul=cumsum(accept))%>% #add up all previous acceptances + ggplot(aes(x=iter_num)) + + geom_line(aes(y=accept_cumul, color=lik_type,alpha=slot_num)) + + theme_bw()+ + theme(panel.grid = element_blank(), + legend.title=element_blank(), + legend.position="bottom", + strip.background.x = element_blank(), + strip.background.y=element_rect(fill="white"), + strip.text.y =element_text(face="bold"))+ + ylab("Cumulative acceptances")+ + xlab("Iterations")+ + facet_grid(rows=vars(name), scales="fixed") + + scale_alpha_discrete(range=if(length(unique((rc$slot_num)))==1){c(0.99,1)}else{c(0.2,1)}) + + #scale_y_sqrt()+ + labs(subtitle = group_names[i]) + } + + return(plot_rc) +} + + +##' Plot intermediate spar over time for each geoID +##' +##' @param spar_interm df with value, parameter, lik_type, scenario, pdeath, iter_num +##' @param filter_by variable name for filtering estimates either: scenario or pdeath +##' @param filter_val desired value of variable +##' @param burn_in number of iterations to discard before plotting +##' +##' @return plot of accepted global and chimeric SEIR parameter values values by geoid +##' +##' +##' +##'@export +##' +plot_spars <- function(spar_interm, + filter_by = "pdeath", + filter_val = "med", + burn_in = 0 +){ + + + if(filter_by!="pdeath" & filter_by!="scenario") stop("You can only filter by 'pdeath' or 'scenario'") + group_var<-if_else(filter_by=="pdeath", "scenario", "pdeath") + + #spar_filter_edit <- paste(snpi_filter,collapse="|") + + spar_interm <-spar_interm %>% + dplyr::filter(!!as.symbol(filter_by)==filter_val)%>% + dplyr::filter(iter_num >= burn_in)%>% + dplyr::mutate(slot_num = as.factor(slot_num))%>% + drop_na(iter_num) # for now, because don't know what it means to have no iteration number + + rc <- spar_interm + + group_names <- unique(rc[group_var]) #names of groups which will form columns of grid plot + + plot_rc<-list() + + #for(i in 1:length(unique(as.character(rc$type)))){ + for(i in 1:length(group_names)){ + print(i) + plot_rc[[i]]<-rc %>% + dplyr::filter(!!as.symbol(group_var)==group_names[i]) %>% + dplyr::filter(accept==1)%>% + ggplot(aes(x=iter_num)) + + geom_step(aes(y=value, color=lik_type, alpha=slot_num)) + + theme_bw()+ + theme(panel.grid = element_blank(), + legend.title=element_blank(), + legend.position="top", + strip.background.x = element_blank(), + strip.background.y=element_rect(fill="white"), + strip.text.y =element_text(face="bold"))+ + ylab("Value")+ + xlab("Iterations")+ + facet_grid(rows=vars(parameter), scales="free") + + scale_alpha_discrete(range=if(length(unique((rc$slot_num)))==1){c(0.99,1)}else{c(0.2,1)}) + + #scale_y_sqrt()+ + labs(subtitle = group_names[i]) + } + + return(plot_rc) +} + + +##' Plot intermediate hpar over time for each geoID +##' +##' @param hpar_interm df with value, parameter, lik_type, scenario, pdeath, iter_num +##' @param pdeath_filter string that indicates which pdeath to import from outcome_dir +##' @param scenario_filter string that indicates which scenario to import from outcome_dir +##' @param hpar_filter string that indicates which outcome parameters will be plotted +##' @param fig_labs string that indicates subtitles to appear on plots of each parameter in hpar_filter +##' @param burn_in number of iterations to discard before plotting +##' +##' @return plot of accepted global and chimeric outcome parameter values by geoid +##' +##' +##' +##'@export +##' +plot_hpar_by_location <- function(hpar_interm, + pdeath_filter, + scenario_filter, + hpar_filter = c('incidC_probability','incidD_probability'), #will find any parameters containing these phrases + fig_labs=c("Case detection probability","Death probability"), + burn_in = 0 +){ + + + hpar_filter_edit <- paste(hpar_filter,collapse="|") + + hpar_interm <-hpar_interm %>% + dplyr::filter(pdeath==pdeath_filter)%>% + dplyr::filter(scenario==scenario_filter)%>% + dplyr::filter(iter_num >= burn_in)%>% + dplyr::mutate(slot_num = as.factor(slot_num))%>% + unite(outcome_quantity,c("outcome","quantity"), sep="_")%>% #make a new parameter that combines the outcome+quantity variables + drop_na(iter_num) # for now, because don't know what it means to have no iteration number + + rc <- hpar_interm + + plot_rc<-list() + + for(i in 1:length(hpar_filter)){ + print(i) + plot_rc[[i]]<-rc %>% + dplyr::filter(grepl(hpar_filter[i],outcome_quantity))%>% + dplyr::filter(accept==1)%>% + ggplot(aes(x=iter_num)) + + geom_step(aes(y=value, color=lik_type, alpha=slot_num)) + + theme_bw()+ + theme(panel.grid = element_blank(), + legend.title=element_blank(), + legend.position="top", + strip.background.x = element_blank(), + strip.background.y=element_rect(fill="white"), + strip.text.y =element_text(face="bold"))+ + ylab("Value")+ + xlab("Iterations")+ + facet_grid(rows=vars(name), scales="fixed") + + scale_alpha_discrete(range=if(length(unique((rc$slot_num)))==1){c(0.99,1)}else{c(0.2,1)}) + + #scale_y_sqrt()+ + labs(subtitle = fig_labs[i]) + } + + return(plot_rc) +} + + +##' Plot intermediate snpi over time for each geoID +##' +##' @param snpi_interm df with value, parameter, lik_type, scenario, pdeath, iter_num +##' @param pdeath_filter string that indicates which pdeath to import from outcome_dir +##' @param scenario_filter string that indicates which scenario to import from outcome_dir +##' @param snpi_filter string that indicates which SNPI parameters will be plotted +##' @param fig_labs string that indicates subtitles to appear on plots of each parameter in snpi_filter +##' @param burn_in number of iterations to discard before plotting +##' +##' @return plot of accepted global and chimeric NPI (on R0) parameter values by geoID +##' +##' +##' +##'@export +##' +plot_snpi_by_location <- function(snpi_interm, + pdeath_filter, + scenario_filter, + snpi_filter = c('local_variance','lockdown_partial'), #will find any NPI containing these phrases + fig_labs=c("Local variance in R0","Partial lockdown"), + burn_in = 0 +){ + + snpi_interm <-snpi_interm %>% + dplyr::filter(pdeath==pdeath_filter)%>% + dplyr::filter(scenario==scenario_filter)%>% + dplyr::filter(iter_num >= burn_in)%>% + dplyr::mutate(slot_num = as.factor(slot_num))%>% + drop_na(iter_num) # for now, because don't know what it means to have no iteration number + + rc <- snpi_interm + + plot_rc<-list() + + for(i in 1:length(snpi_filter)){ + print(i) + plot_rc[[i]]<-rc %>% + dplyr::filter(grepl(snpi_filter[i],npi_name))%>% + dplyr::filter(accept==1)%>% + ggplot(aes(x=iter_num)) + + geom_step(aes(y=reduction, color=lik_type, alpha=slot_num)) + + theme_bw()+ + theme(panel.grid = element_blank(), + legend.title=element_blank(), + legend.position="top", + strip.background.x = element_blank(), + strip.background.y=element_rect(fill="white"), + strip.text.y =element_text(face="bold"))+ + ylab("Reduction in R0")+ + xlab("Iterations")+ + facet_grid(rows=vars(name), scales="fixed") + + scale_alpha_discrete(range=if(length(unique((rc$slot_num)))==1){c(0.99,1)}else{c(0.2,1)}) + + #scale_y_sqrt()+ + labs(subtitle = fig_labs[i]) + } + + return(plot_rc) +} + + + +##' Plot intermediate hnpi over time for each geoID +##' +##' @param hnpi_interm df with value, parameter, lik_type, scenario, pdeath, iter_num +##' @param pdeath_filter string that indicates which pdeath to import from outcome_dir +##' @param scenario_filter string that indicates which scenario to import from outcome_dir +##' @param hnpi_filter string that indicates which SNPI parameters will be plotted +##' @param fig_labs string that indicates subtitles to appear on plots of each parameter in snpi_filter +##' @param burn_in number of iterations to discard before plotting +##' +##' @return plot of accepted global and chimeric NPI (on outcomes) parameter values by geoID +##' +##' +##' +##'@export +##' +plot_hnpi_by_location <- function(hnpi_interm, + pdeath_filter, + scenario_filter, + hnpi_filter = c('incidC_shift'), #will find any NPI containing these phrases + fig_labs=c("Shift in case detection rate"), + burn_in=0 +){ + + hnpi_filter_edit <- paste(hnpi_filter,collapse="|") + + hnpi_interm <-hnpi_interm %>% + dplyr::filter(pdeath==pdeath_filter)%>% + dplyr::filter(scenario==scenario_filter)%>% + dplyr::filter(iter_num >= burn_in)%>% + dplyr::mutate(slot_num = as.factor(slot_num))%>% + #dplyr::filter(grepl(hnpi_filter_edit,npi_name))%>% # find npi_names that contain these phrases in them + drop_na(iter_num) # for now, because don't know what it means to have no iteration number + + rc <- hnpi_interm + + #rc$type = rc$npi_name # need this for columns of plot grid, for now it isn't used + + plot_rc<-list() + + #for(i in 1:length(unique(as.character(rc$type)))){ + for(i in 1:length(hnpi_filter)){ + print(i) + plot_rc[[i]]<-rc %>% + dplyr::filter(grepl(hnpi_filter[i],npi_name))%>% + dplyr::filter(accept==1)%>% + ggplot(aes(x=iter_num)) + + geom_step(aes(y=reduction, color=lik_type, alpha=slot_num)) + + theme_bw()+ + theme(panel.grid = element_blank(), + legend.title=element_blank(), + legend.position="top", + strip.background.x = element_blank(), + strip.background.y=element_rect(fill="white"), + strip.text.y =element_text(face="bold"))+ + ylab("Reduction in parameter")+ + xlab("Iterations")+ + facet_grid(rows=vars(name), scales="fixed") + + scale_alpha_discrete(range=if(length(unique((rc$slot_num)))==1){c(0.99,1)}else{c(0.2,1)}) + + #scale_y_sqrt()+ + labs(subtitle = fig_labs[i]) + } + + return(plot_rc) +} + diff --git a/R/pkgs/report.generation/R/ReportLoadData.R b/R/pkgs/report.generation/R/ReportLoadData.R index cd89c7d85..5e88626b0 100644 --- a/R/pkgs/report.generation/R/ReportLoadData.R +++ b/R/pkgs/report.generation/R/ReportLoadData.R @@ -39,7 +39,7 @@ load_cum_inf_geounit_dates <- function(outcome_dir, hosp_pre_process <- function(x) { x %>% - dplyr::select(geoid, scenario, pdeath, location, time, filename, !!varname) + dplyr::select(geoid, scenario, pdeath, location, time, file_name, !!varname) } ##filter to munge the data at the scenario level if (!is.null(incl_geoids)) { @@ -80,6 +80,7 @@ load_cum_inf_geounit_dates <- function(outcome_dir, } + ##' Convenience function to load cumulative geounit hosp outcomes at a specific date for the given scenario ##' ##' TODO : FIX ME @@ -120,7 +121,7 @@ load_cum_hosp_geounit_date <- function(outcome_dir, hosp_pre_process <- function(x) { x %>% - dplyr::select(geoid, scenario, location, pdeath, time, filename, incidI, incidD, incidICU, incidH, incidVent) + dplyr::select(geoid, scenario, location, pdeath, time, file_name, incidI, incidD, incidICU, incidH, incidVent) } ##filter to munge the data at the scenario level if (!is.null(incl_geoids)) { @@ -134,7 +135,7 @@ load_cum_hosp_geounit_date <- function(outcome_dir, NincidICU=sum(incidICU), NincidHosp=sum(incidH), NincidVent = sum(incidVent)) %>% - ungroup() + dplyr::ungroup() } } else { hosp_post_process <- function(x) { @@ -190,39 +191,61 @@ load_hosp_geocombined_totals <- function(outcome_dir, pre_process=function(x) {x}, incl_geoids, inference=TRUE, + vacc_compartment=FALSE, ... ) { require(tidyverse) - + + if(vacc_compartment){ hosp_post_process <- function(x) { - x %>% - dplyr::group_by(geoid, pdeath, scenario, sim_num, location) %>% - dplyr::arrange(time) %>% - dplyr::mutate(cum_hosp=cumsum(incidH)) %>% - dplyr::mutate(cum_death=cumsum(incidD)) %>% - dplyr::mutate(cum_case=cumsum(incidC)) %>% - dplyr::mutate(cum_inf=cumsum(incidI)) %>% - dplyr::group_by(pdeath, scenario, time, sim_num) %>% - dplyr::summarize(NhospCurr=sum(hosp_curr), - NICUCurr=sum(icu_curr), - NincidDeath=sum(incidD), - NincidInf=sum(incidI), - NincidCase=sum(incidC), - NincidICU=sum(incidICU), - NincidHosp=sum(incidH), - NincidVent=sum(incidVent), - NVentCurr=sum(vent_curr), - cum_hosp=sum(cum_hosp), - cum_death=sum(cum_death), - cum_case=sum(cum_case), - cum_inf=sum(cum_inf)) %>% - dplyr::mutate(scenario_name = factor(scenario, - levels = scenario_levels, - labels = scenario_labels)) %>% - ungroup() + x %>% + dplyr::group_by(pdeath, scenario, sim_num, location, p_comp, time) %>% + dplyr::summarize(NhospCurr=sum(hosp_curr), + NICUCurr=sum(icu_curr), + NincidDeath=sum(incidD), + NincidInf=sum(incidI), + NincidCase=sum(incidC), + NincidICU=sum(incidICU), + NincidHosp=sum(incidH), + NincidVent=sum(incidVent), + NVentCurr=sum(vent_curr)) %>% + dplyr::group_by(pdeath, scenario, sim_num, location, p_comp) %>% + dplyr::mutate(cum_hosp=cumsum(NincidHosp)) %>% + dplyr::mutate(cum_death=cumsum(NincidDeath)) %>% + dplyr::mutate(cum_case=cumsum(NincidCase)) %>% + dplyr::mutate(cum_inf=cumsum(NincidInf)) %>% + dplyr::ungroup() %>% + dplyr::mutate(scenario_name = factor(scenario, + levels = scenario_levels, + labels = scenario_labels)) } + } else{ + hosp_post_process <- function(x) { + x %>% + dplyr::group_by(pdeath, scenario, sim_num, location, time) %>% + dplyr::summarize(NhospCurr=sum(hosp_curr), + NICUCurr=sum(icu_curr), + NincidDeath=sum(incidD), + NincidInf=sum(incidI), + NincidCase=sum(incidC), + NincidICU=sum(incidICU), + NincidHosp=sum(incidH), + NincidVent=sum(incidVent), + NVentCurr=sum(vent_curr)) %>% + dplyr::group_by(pdeath, scenario, sim_num, location) %>% + dplyr::mutate(cum_hosp=cumsum(NincidHosp)) %>% + dplyr::mutate(cum_death=cumsum(NincidDeath)) %>% + dplyr::mutate(cum_case=cumsum(NincidCase)) %>% + dplyr::mutate(cum_inf=cumsum(NincidInf)) %>% + dplyr::ungroup() %>% + dplyr::mutate(scenario_name = factor(scenario, + levels = scenario_levels, + labels = scenario_labels)) + } + } + rc<- load_hosp_sims_filtered(outcome_dir=outcome_dir, pdeath_filter=pdeath_filter, pre_process=pre_process, @@ -279,32 +302,59 @@ load_hosp_county <- function(outcome_dir, pre_process=function(x) {x}, incl_geoids, inference=TRUE, + vacc_compartment = FALSE, ... ) { require(tidyverse) + + if(vacc_compartment){ hosp_post_process <- function(x) { - x %>% - dplyr::group_by(geoid, pdeath, scenario, sim_num, location) %>% - dplyr::mutate(cum_hosp=cumsum(incidH)) %>% - dplyr::mutate(cum_death=cumsum(incidD)) %>% - dplyr::mutate(cum_case=cumsum(incidC)) %>% - dplyr::mutate(cum_inf=cumsum(incidI)) %>% - dplyr::rename(NhospCurr=hosp_curr, - NICUCurr=icu_curr, - NincidDeath=incidD, - NincidInf=incidI, - NincidCase=incidC, - NincidICU=incidICU, - NincidHosp=incidH, - NincidVent=incidVent, - NVentCurr=vent_curr) %>% - dplyr::ungroup() %>% - dplyr::mutate(scenario_name = factor(scenario, - levels = scenario_levels, - labels = scenario_labels)) + x %>% + dplyr::group_by(geoid, pdeath, scenario, sim_num, location, p_comp) %>% + dplyr::mutate(cum_hosp=cumsum(incidH)) %>% + dplyr::mutate(cum_death=cumsum(incidD)) %>% + dplyr::mutate(cum_case=cumsum(incidC)) %>% + dplyr::mutate(cum_inf=cumsum(incidI)) %>% + dplyr::rename(NhospCurr=hosp_curr, + NICUCurr=icu_curr, + NincidDeath=incidD, + NincidInf=incidI, + NincidCase=incidC, + NincidICU=incidICU, + NincidHosp=incidH, + NincidVent=incidVent, + NVentCurr=vent_curr) %>% + dplyr::ungroup() %>% + dplyr::mutate(scenario_name = factor(scenario, + levels = scenario_levels, + labels = scenario_labels)) } + } else{ + hosp_post_process <- function(x) { + x %>% + dplyr::group_by(geoid, pdeath, scenario, sim_num, location, time) %>% + dplyr::summarize(NhospCurr=sum(hosp_curr), + NICUCurr=sum(icu_curr), + NincidDeath=sum(incidD), + NincidInf=sum(incidI), + NincidCase=sum(incidC), + NincidICU=sum(incidICU), + NincidHosp=sum(incidH), + NincidVent=sum(incidVent), + NVentCurr=sum(vent_curr)) %>% + dplyr::group_by(geoid, pdeath, scenario, sim_num, location) %>% + dplyr::mutate(cum_hosp=cumsum(NincidHosp)) %>% + dplyr::mutate(cum_death=cumsum(NincidDeath)) %>% + dplyr::mutate(cum_case=cumsum(NincidCase)) %>% + dplyr::mutate(cum_inf=cumsum(NincidInf)) %>% + dplyr::ungroup() %>% + dplyr::mutate(scenario_name = factor(scenario, + levels = scenario_levels, + labels = scenario_labels)) + } + } rc<- load_hosp_sims_filtered(outcome_dir=outcome_dir, pdeath_filter=pdeath_filter, @@ -318,6 +368,7 @@ load_hosp_county <- function(outcome_dir, return(rc) } + ##' Convenience function to load the slice for each geoid where the value of an outcome exceeds a given threshold ##' ##' @param threshold A named numeric vector. This function will pull the first time slice that meets or exceeds all thresholds @@ -327,6 +378,7 @@ load_hosp_county <- function(outcome_dir, ##' @param variable character string of variable to which to compare threshold ##' @param end_date simulation end date character string ##' @param incl_geoids optional character vector of geoids that are included in the report, if not included, all geoids will be used + ##' @return a data frame with columns ##' - scenario_name ##' - sim_num @@ -379,7 +431,7 @@ load_hosp_geounit_threshold <- function(threshold, inference=inference) rc <- rc %>% dplyr::group_by(geoid, scenario, sim_num) %>% - group_map(function(.x,.y){ + dplyr::group_map(function(.x,.y){ .x <- .x %>% arrange(time) # Take the first element of the arranged data frame that meets the threshold if(.y$geoid %in% names(threshold)) { @@ -393,7 +445,7 @@ load_hosp_geounit_threshold <- function(threshold, return(.x) }, .keep = TRUE) %>% do.call(what=dplyr::bind_rows) %>% - ungroup() + dplyr::ungroup() return(rc) } @@ -404,12 +456,12 @@ load_hosp_geounit_threshold <- function(threshold, ##' ##' @param filename geodata.csv filename ##' @param geoid_len length of geoid character string -##' @param geoid_pad what to pad the geoid character string with +##' @param geoid_pad what to pad the geoid character string with ##' @param to_lower whether to make all column names lowercase ##' @param names whether to add a name column to each geoid (US only) ##' ##' @return a data frame with columns -##' - +##' - ##' @export ### all of the peak times for each sim and each county so we can make a figure for when things peak load_geodata_file <- function(filename, @@ -418,26 +470,46 @@ load_geodata_file <- function(filename, to_lower = FALSE, names = FALSE ) { + # TODO : FIX ME (either use library or remove entirely and use namespaces) require(tigris) if(!file.exists(filename)){stop(paste(filename,"does not exist in",getwd()))} geodata <- readr::read_csv(filename) - - if (to_lower) { + + if(to_lower){ names(geodata) <- tolower(names(geodata)) } - if (!("geoid" %in% names(geodata))) { - stop(paste(filename, "does not have a column named geoid")) + if(!('geoid' %in% names(geodata))){stop(paste(filename,"does not have a column named geoid"))} + + if(geoid_len > 0){ + geodata$geoid <- stringr::str_pad(geodata$geoid,geoid_len, pad = geoid_pad) } - - if (geoid_len > 0) { - geodata$geoid <- stringr::str_pad(geodata$geoid, geoid_len, pad = geoid_pad) + + if(names) { + + # make augmented fips_codes file that has entries for the whole state too + state_fips <- fips_codes %>% + dplyr::distinct(state,.keep_all=TRUE) %>% + dplyr::select(-county,-county_code,-state) %>% + dplyr::rename(name=state_name,geoid=state_code) %>% + dplyr::mutate(geoid=paste0(geoid, "000")) + + fips_codes2 <- fips_codes%>% + unite(col="geoid", ends_with("_code"), sep="") %>% + dplyr::select(-state_name, -state) %>% + dplyr::rename(name=county) %>% + dplyr::mutate(name=stringr::str_remove(name, " County")) + + fips_codes_w_states <-rbind(state_fips,fips_codes2) + + geodata<-geodata %>% + dplyr::left_join(fips_codes_w_states) } if(names) { geodata<-geodata %>% dplyr::left_join(fips_codes%>% - unite(col="geoid", ends_with("_code"), sep="") %>% + tidyr::unite(col="geoid", tidyselect::ends_with("_code"), sep="") %>% dplyr::select(-state_name, -state) %>% dplyr::rename(name=county) %>% dplyr::mutate(name=stringr::str_remove(name, " County"))) @@ -452,34 +524,28 @@ load_geodata_file <- function(filename, ##' ##' @param filename shapefile name ##' @param geoid_len length of geoid character string -##' @param geoid_pad what to pad the geoid character string with +##' @param geoid_pad what to pad the geoid character string with ##' @param to_lower whether to make all column names lowercase -##' +##' ##' @return a data frame with columns -##' - +##' - ##' @export load_shape_file<- function(filename, geoid_len = 0, geoid_pad = "0", to_lower = FALSE ) { - if (!file.exists(filename)) { - stop(paste(filename, "does not exist in", getwd())) - } + if(!file.exists(filename)){stop(paste(filename,"does not exist in",getwd()))} shp <- suppressMessages(sf::st_read(filename, quiet = TRUE)) - - if (to_lower) { + + if(to_lower){ names(shp) <- tolower(names(shp)) } - if (!("geoid" %in% names(shp))) { - stop(paste(filename, "does not have a column named geoid")) - } - if (geoid_len > 0) { - - if (is.na(geoid_pad) | nchar(geoid_pad) > 1) { - stop(paste("Invalid geoid_pad value. Please provide a character or numeric value")) - } - shp$geoid <- stringr::str_pad(shp$geoid, geoid_len, pad = geoid_pad) + if(!('geoid' %in% names(shp))){stop(paste(filename,"does not have a column named geoid"))} + if(geoid_len > 0){ + + if(is.na(geoid_pad) | nchar(geoid_pad)>1){stop(paste("Invalid geoid_pad value. Please provide a character or numeric value"))} + shp$geoid <- stringr::str_pad(shp$geoid,geoid_len, pad = geoid_pad) } return(shp) } @@ -490,23 +556,23 @@ load_shape_file<- function(filename, ##' @param jhu_data_dir data directory ##' @param countries character vector of countries ##' @param states character vector of states (state abbreviations is US-only) -##' +##' ##' @return a data frame with columns ##' - date ##' - NcumulConfirmed ##' - NcumulDeathsObs ##' - NincidConfirmed ##' - NincidDeathsObs -##' +##' ##' @export load_jhu_csse_for_report <- function(jhu_data_dir = "JHU_CSSE_Data", countries = c("US"), states) { - + require(magrittr) - - us_data_only <- (countries == c("US")) - if (us_data_only) { + + us_data_only = (countries == c("US")) + if(us_data_only) { message("For US data, consider using load_usafacts_for_report() instead of load_jhu_csse_for_report().") } @@ -521,13 +587,13 @@ load_jhu_csse_for_report <- function(jhu_data_dir = "JHU_CSSE_Data", us_data_only=us_data_only) jhu_dat <- dplyr::full_join(jhu_cases, jhu_deaths) - - if (us_data_only) { - #' @importFrom magrittr %>% - jhu_dat <- jhu_dat %>% dplyr::mutate(Province_State = state_abb) + + if(us_data_only) + { + jhu_dat <- jhu_dat %>% dplyr::mutate(Province_State=state_abb) } - - jhu_dat <- + + jhu_dat <- jhu_dat %>% dplyr::mutate(date = as.Date(Update)) %>% dplyr::filter(Country_Region %in% countries) %>% @@ -562,6 +628,7 @@ load_USAFacts_for_report <- function(data_dir = "data/case_data", geodat=geodata, aggregate=FALSE) { + require(magrittr) usaf_dat <- covidcommon::get_USAFacts_data(case_data_filename = file.path(data_dir,"USAFacts_case_data.csv"), @@ -572,7 +639,7 @@ load_USAFacts_for_report <- function(data_dir = "data/case_data", dplyr::mutate(date=as.Date(Update), geoid=FIPS) %>% dplyr::select(-FIPS, -Update) - + } else{ usaf_dat <- usaf_dat %>% dplyr::filter(FIPS %in% incl_geoids) %>% @@ -585,13 +652,13 @@ load_USAFacts_for_report <- function(data_dir = "data/case_data", if(aggregate){ usaf_dat <- usaf_dat %>% dplyr::group_by(date, source) %>% - dplyr::summarise(across(-geoid, ~sum(na.rm=TRUE))) %>% + dplyr::summarise(dplyr::across(-geoid, ~sum(na.rm=TRUE))) %>% dplyr::ungroup() %>% dplyr::rename(NcumulConfirmed=Confirmed, NcumulDeathsObs=Deaths, NincidConfirmed=incidI, NincidDeathsObs=incidDeath) - + } else{ usaf_dat <- usaf_dat %>% dplyr::left_join(geodat) @@ -643,7 +710,7 @@ load_hosp_geounit_relative_to_threshold <- function(outcome_dir, stop("You provided more than one catch all threshold") } catch_all_threshold <- Inf - if (sum(names(threshold) == "") > 0) { + if(sum(names(threshold) == "") > 0){ catch_all_threshold <- threshold[names(threshold) == ""] } if(length(pdeath_filter) > 1) {stop("You provided more than one pdeath value")} @@ -652,6 +719,7 @@ load_hosp_geounit_relative_to_threshold <- function(outcome_dir, warning("You have not specified scenario labels for this function. You may encounter future errors.") } + end_date <- lubridate::as_date(end_date) if(is.null(incl_geoids)){incl_geoids<-unique(county_dat$geoid)} @@ -661,7 +729,7 @@ load_hosp_geounit_relative_to_threshold <- function(outcome_dir, incl_geoids=incl_geoids, inference=inference, pre_process=function(x){x%>% - select(geoid, time, pdeath, scenario, filename, ends_with("curr"))}) + dplyr::select(geoid, time, pdeath, scenario, file_name, tidyselect::ends_with("curr"))}) county_dat<-county_dat %>% dplyr::left_join(geodat) %>% @@ -694,7 +762,7 @@ load_hosp_geounit_relative_to_threshold <- function(outcome_dir, county_dat %>% dplyr::left_join(data.frame(geoid = names(threshold), threshold_value = threshold), by = c("geoid")) %>% dplyr::rename(pltVar = !!variable) %>% - dplyr::mutate(prop_needed = pltVar / threshold_value) %>% + dplyr::mutate(prop_needed = pltVar/threshold_value) %>% dplyr::mutate(log_prop_needed = log(prop_needed)) %>% dplyr::mutate(log_prop_needed = ifelse(pltVar == 0, floor(min(log_prop_needed[which(is.finite(log_prop_needed))])), @@ -704,7 +772,7 @@ load_hosp_geounit_relative_to_threshold <- function(outcome_dir, log_prop_needed)) %>% ## if threshold is 0, set the value to the ceiling of the max value among all other logs values (for plotting purposes) dplyr::rename(!!variable := pltVar) %>% return() - + } ##' Convenience function for loading intervention effect and R estimates @@ -756,7 +824,7 @@ load_r_sims_filtered <- function(outcome_dir, dplyr::mutate(local_r = r0*(1-reduction)) %>% # county_r0 ought to be renamed to "geogroup_r0" dplyr::select(geoid, sim_num, local_r, scenario, r0) %>% dplyr::left_join(snpi) %>% - dplyr::mutate(r = if_else(npi_name=="local_variance", + dplyr::mutate(r = dplyr::if_else(npi_name=="local_variance", local_r, local_r*(1-reduction))) %>% dplyr::left_join(geodat) %>% @@ -794,50 +862,46 @@ load_r_daily_sims_filtered <- function(outcome_dir, spar <- load_spar_sims_filtered(outcome_dir=outcome_dir, pre_process=function(x) {x %>% dplyr::filter(parameter=="R0")}, - pdeath_filter=pdeath_filter, - ...) %>% - dplyr::select(r0=value, location, scenario, pdeath, date, sim_num) + pdeath_filter=pdeath_filter) %>% + dplyr::select(r0=value, location, scenario, pdeath, sim_num) snpi<- load_snpi_sims_filtered(outcome_dir=outcome_dir, pre_process=function(x) {x %>% dplyr::filter(parameter=="r0")}, pdeath_filter=pdeath_filter, - incl_geoids=incl_geoids, - ...) %>% + incl_geoids=incl_geoids) %>% dplyr::select(-parameter) if(mtr){ npi <- snpi %>% - left_join(spar) %>% - dplyr::select(-date) %>% + dplyr::left_join(spar) %>% mtr_estimates(n_periods=n_periods) } else { npi <- snpi %>% - left_join(spar) %>% - dplyr::select(-date) %>% - mutate(start_date=lubridate::ymd(start_date), + dplyr::left_join(spar) %>% + dplyr::mutate(start_date=lubridate::ymd(start_date), end_date=lubridate::ymd(end_date)) } - geoiddate<-crossing(geoid=incl_geoids, time=seq(min(as.Date(npi$start_date)), max(as.Date(npi$end_date)), 1)) + geoiddate<-tidyr::crossing(geoid=incl_geoids, time=seq(min(as.Date(npi$start_date)), max(as.Date(npi$end_date)), 1)) rc<-list() for(i in 1:length(incl_geoids)){ rc[[i]]<-npi %>% - filter(geoid == incl_geoids[i])%>% - left_join(geoiddate)%>% - mutate(geoid=if_else(start_date>time | end_date% - drop_na() %>% - group_by(geoid, sim_num, time, pdeath, scenario, location) %>% - mutate(reduction=1-reduction)%>% - summarize(reduction=prod(reduction), + dplyr::filter(geoid == incl_geoids[i])%>% + dplyr::left_join(geoiddate)%>% + dplyr::mutate(geoid=dplyr::if_else(start_date>time | end_date% + tidyr::drop_na() %>% + dplyr::group_by(geoid, sim_num, time, pdeath, scenario, location) %>% + dplyr::mutate(reduction=1-reduction)%>% + dplyr::summarize(reduction=prod(reduction), r0=unique(r0)) %>% - mutate(rt=reduction*r0, + dplyr::mutate(rt=reduction*r0, reduction=1-reduction) } - rc<-bind_rows(rc) %>% - ungroup() + rc<-dplyr::bind_rows(rc) %>% + dplyr::ungroup() warning("Finished loading") return(rc) @@ -856,9 +920,9 @@ load_r_daily_sims_filtered <- function(outcome_dir, ##' ##'@export load_npi_sims_filtered <- function(outcome_dir, - pdeath_filter=c("high", "med", "low"), - incl_geoids, - ... + pdeath_filter=c("high", "med", "low"), + incl_geoids, + ... ) { require(tidyverse) @@ -868,10 +932,11 @@ load_npi_sims_filtered <- function(outcome_dir, pdeath_filter=pdeath_filter, incl_geoids=incl_geoids, ...) %>% - dplyr::select(-parameter, -date) + dplyr::select(-parameter) warning("Finished loading") return(npi) } + diff --git a/R/pkgs/report.generation/R/setup_testing_environment.R b/R/pkgs/report.generation/R/setup_testing_environment.R new file mode 100644 index 000000000..c2ea2c20e --- /dev/null +++ b/R/pkgs/report.generation/R/setup_testing_environment.R @@ -0,0 +1,221 @@ +#' @export +setup_testing_environment <- function(cf = "config.yml"){ + params = list(config_file = cf) + library(tidyverse) + library(covidcommon) + library(report.generation) + + LOADED_CONFIG <<- FALSE + LOADED_GEOIDS <<- FALSE + LOADED_HOSP_STATE_TOTALS <<- FALSE + LOADED_INF_CTY_PEAKS <<- FALSE + LOADED_HOSP_CTY_PEAKS <<- FALSE + LOADED_HOSP_CTY_TOTALS <<- FALSE + LOADED_INF_CTY_TOTALS <<- FALSE + LOADED_SHAPEFILES <<- FALSE + LOADED_POPULATION <<- FALSE + + + if(! LOADED_CONFIG){ + if(is.null(params$config_file)){ + stop("A document parameter `config_file` is required to load the config file") + } + config <- covidcommon:::load_config(params$config_file) + LOADED_CONFIG <<- TRUE + } + + ## Code loads the state geodata if it has not yet been loaded + if(!LOADED_GEOIDS){ + geodata <<- read.csv(file.path(config$spatial_setup$base_path, config$spatial_setup$geodata)) + geodata$geoid <<- ifelse(nchar(geodata$geoid)==4, paste0("0", geodata$geoid), as.character(geodata$geoid)) + included_geoids <<- geodata[["geoid"]][geodata[[config$spatial_setup$include_in_report]]] + LOADED_GEOIDS <<- TRUE + } + + + if (!LOADED_CONFIG) { stop("This chunk requires the config to be loaded")} + ## Code loads the state hospitalization totals if it has not yet been loaded + if (!LOADED_HOSP_CTY_PEAKS) { + hosp_post_process <- function(x) { + x %>% + ungroup %>% + filter(!is.na(time)) %>% + group_by(geoid) %>% + dplyr::slice(which.max(hosp_curr)) %>% + ungroup + } + + scn_dirs <- paste(config$name,config$interventions$scenarios,sep='_') + hosp_cty_peaks <<- NULL + + for (i in 1:length(scn_dirs)) { + for (pdeath in config$hospitalization$parameters$p_death_names) { + hosp_cty_peaks <<- dplyr::bind_rows(hosp_cty_peaks, load_hosp_sims_filtered(scn_dirs[i], + name_filter = pdeath, + post_process = hosp_post_process) %>% + mutate(scenario_num=i, scenario_name=config$report$formatting$scenario_labels[i], pdeath=pdeath)) + } + } + + + LOADED_HOSP_CTY_PEAKS <<- TRUE + } + + + if (!LOADED_CONFIG){stop("This chunk requires the config to be loaded")} + ## Code loads the state hospitalization totals if it has not yet been loaded + if (!LOADED_HOSP_CTY_TOTALS) { + hosp_post_process <- function(x) { + x %>% + dplyr::filter(!is.na(time)) %>% + dplyr::filter(geoid %in% included_geoids) %>% + group_by(geoid, time, sim_num) %>% + summarize(NhospCurr = sum(hosp_curr), + NICUCurr = sum(icu_curr), + NincidDeath = sum(incidD), + NincidInf = sum(incidI), + NincidICU=sum(incidICU), + NincidHosp=sum(incidH)) %>% + ungroup() + + } + + scn_dirs <<- paste(config$name,config$interventions$scenarios,sep='_') + hosp_cty_totals <<- NULL + + for (i in 1:length(scn_dirs)) { + for (pdeath in config$hospitalization$parameters$p_death_names) { + hosp_cty_totals <<- dplyr::bind_rows(hosp_cty_totals, load_hosp_sims_filtered(scn_dirs[i], + name_filter = pdeath, + post_process = hosp_post_process) %>% + mutate(scenario_num=i, scenario_name=config$report$formatting$scenario_labels[i], pdeath=pdeath)) + } + } + + + LOADED_HOSP_CTY_TOTALS <<- TRUE + } + + + + if (!LOADED_CONFIG){stop("This chunk requires the config to be loaded")} + + if (!LOADED_HOSP_STATE_TOTALS) { + hosp_post_process <- function(x) { + x %>% + dplyr::filter(!is.na(time)) %>% + group_by(time, sim_num) %>% + dplyr::summarize(NhospCurr = sum(hosp_curr), + NICUCurr = sum(icu_curr), + NincidDeath = sum(incidD), + NincidInf = sum(incidI), + NincidICU=sum(incidICU), + NincidHosp=sum(incidH)) %>% + ungroup() + } + + scn_dirs <<- paste(config$name,config$interventions$scenarios,sep='_') + hosp_state_totals <<- NULL + + for (i in 1:length(scn_dirs)) { + for (pdeath in config$hospitalization$parameters$p_death_names) { + hosp_state_totals <<- dplyr::bind_rows(hosp_state_totals, load_hosp_sims_filtered(scn_dirs[i], + name_filter = pdeath, + post_process = hosp_post_process) %>% + mutate(scenario_num = i, scenario_name = config$report$formatting$scenario_labels[i], pdeath=pdeath)) + } + } + + + LOADED_HOSP_STATE_TOTALS <- TRUE + } + + + + if(!LOADED_CONFIG){stop("This chunk requires the config to be loaded")} + ## Code loads the county infection peaks if it has not yet been loaded + if (!LOADED_INF_CTY_PEAKS) { + + inf_pre_process <- function(x) { + x %>% + dplyr::filter(comp == "diffI") + } + + inf_post_process <- function(x) { + x %>% + ungroup %>% + dplyr::filter(!is.na(time)) %>% + dplyr::mutate(geoid=ifelse(nchar(geoid)==4, paste0("0",geoid),geoid)) %>% + dplyr::filter(geoid %in% included_geoids) %>% + group_by(geoid) %>% + dplyr::slice(which.max(N)) %>% + ungroup + } + + scn_dirs <<- paste(config$name,config$interventions$scenarios,sep='_') + inf_cty_peaks <<- NULL + + for (i in 1:length(scn_dirs)) { + inf_cty_peaks <<- dplyr::bind_rows(inf_cty_peaks, load_scenario_sims_filtered(scn_dirs[i], + pre_process = inf_pre_process, + post_process = inf_post_process) %>% + mutate(scenario_num=i, + scenario_name=config$report$formatting$scenario_labels[i])) + + } + + + LOADED_INF_CTY_PEAKS <<- TRUE + } + + + if(!LOADED_CONFIG){stop("This chunk requires the config to be loaded")} + ## Code loads the county infection peaks if it has not yet been loaded + if (!LOADED_INF_CTY_TOTALS) { + + inf_pre_process <- function(x) { + x %>% + dplyr::filter(comp == "cumI") + } + + inf_post_process <- function(x) { + x %>% + ungroup %>% + dplyr::mutate(geoid=ifelse(nchar(geoid)==4, paste0("0",geoid),geoid)) %>% + dplyr::filter(!is.na(time), geoid %in% included_geoids) + } + + scn_dirs <<- paste(config$name,config$interventions$scenarios,sep='_') + inf_cty_totals <<- NULL + + for (i in 1:length(scn_dirs)) { + inf_cty_totals <<- inf_cty_totals %>% + dplyr::bind_rows(load_scenario_sims_filtered(scn_dirs[i], + pre_process = inf_pre_process, + post_process = inf_post_process) %>% + mutate(scenario_num=i, + scenario_name=config$report$formatting$scenario_labels[i]) %>% + group_by(geoid, time, scenario_num, scenario_name) %>% + dplyr::summarize(Nincid=mean(N)) %>% + ungroup() + ) + } + + LOADED_INF_CTY_TOTALS <<- TRUE + } + + if (!LOADED_CONFIG){stop("This chunk requires loading the config")} + + if (!LOADED_SHAPEFILES | !LOADED_POPULATION){ + shp <<- suppressMessages(sf::st_read(paste(config$spatial_setup$base_path,config$spatial_setup$shapefile_name,sep='/'), quiet=TRUE) %>% + dplyr::rename(geoid = GEOID, countyname = NAME)) + shp <- shp[shp[["geoid"]] %in% included_geoids,] + LOADED_SHAPEFILES <<- TRUE + + cty_names <<- shp %>% + sf::st_drop_geometry() %>% + dplyr::select(geoid, countyname) + LOADED_POPULATION <<- TRUE + } + +} diff --git a/R/pkgs/report.generation/tests/testthat/test-county_loading.R b/R/pkgs/report.generation/tests/testthat/test-county_loading.R index aacbfce78..f5b6a64b9 100644 --- a/R/pkgs/report.generation/tests/testthat/test-county_loading.R +++ b/R/pkgs/report.generation/tests/testthat/test-county_loading.R @@ -92,7 +92,7 @@ test_that("Simulation loading works", { scenario_labels = 'baseline', incl_geoids = included_geoids, inference=FALSE - ))+2 + )) } ) diff --git a/R/scripts/filter_MC.R b/R/scripts/filter_MC.R index 34b03f47b..6eab866d7 100644 --- a/R/scripts/filter_MC.R +++ b/R/scripts/filter_MC.R @@ -86,19 +86,19 @@ if(!(config$seeding$method %in% c('FolderDraw','InitialConditionsFolderDraw'))){ state_level <- ifelse(!is.null(config$spatial_setup$state_level) && config$spatial_setup$state_level, TRUE, FALSE) -##Load infromationon geographic locations from geodata file. +##Load information on geographic locations from geodata file. suppressMessages(geodata <- report.generation::load_geodata_file( paste( config$spatial_setup$base_path, config$spatial_setup$geodata, sep = "/" ), - geoid_len=5 #Is this hardcode a good idea. + geoid_len=5 #Is this hardcode a good idea? )) obs_nodename <- config$spatial_setup$nodenames ##Load simulations per slot from config if not defined on command line -##command options take precendence -if (is.na(opt$simulations_per_slot)) { +##command options take precedence +if(is.na(opt$simulations_per_slot)){ opt$simulations_per_slot <- config$filtering$simulations_per_slot } print(paste("Running",opt$simulations_per_slot,"simulations")) @@ -133,14 +133,14 @@ if (all(scenarios == "all")){ ##Creat heirarchical stats object if specified hierarchical_stats <- list() if("hierarchical_stats_geo"%in%names(config$filtering)) { - hierarchical_stats <- config$filtering$hierarchical_stats_geo + hierarchical_stats <- config$filtering$hierarchical_stats_geo } ##Create priors if specified defined_priors <- list() if("priors"%in%names(config$filtering)) { - defined_priors <- config$filtering$priors + defined_priors <- config$filtering$priors } @@ -179,14 +179,14 @@ if (gt_end_date > lubridate::ymd(config$end_date)) { obs <- inference::get_ground_truth( - data_path = data_path, - fips_codes = fips_codes_, - fips_column_name = obs_nodename, - start_date = gt_start_date, - end_date = gt_end_date, - gt_source = gt_source, - gt_scale = gt_scale, - variant_filename = config$seeding$variant_filename + data_path = data_path, + fips_codes = fips_codes_, + fips_column_name = obs_nodename, + start_date = gt_start_date, + end_date = gt_end_date, + gt_source = gt_source, + gt_scale = gt_scale, + variant_filename = config$seeding$variant_filename ) geonames <- unique(obs[[obs_nodename]]) @@ -205,29 +205,49 @@ data_stats <- lapply( end_date = gt_end_date ) }) %>% - set_names(geonames) + set_names(geonames) required_packages <- c("dplyr", "magrittr", "xts", "zoo", "stringr") # Load gempyor module gempyor <- reticulate::import("gempyor") +#Temporary +#print("Setting random number seed") +#set.seed(1) # set within R +#reticulate::py_run_string(paste0("rng_seed = ", 1)) #set within Python + +# Scenario loop ----- + for(scenario in scenarios) { for(deathrate in deathrates) { # Data ------------------------------------------------------------------------- # Load - slot_prefix <- covidcommon::create_prefix(config$name,scenario,deathrate,opt$run_id,sep='/',trailing_separator='/') # "USA/inference/med/2022.03.04.10:18:42.CET/" - - gf_prefix <- covidcommon::create_prefix(prefix=slot_prefix,'global','final',sep='/',trailing_separator='/') # "USA/inference/med/2022.03.04.10:18:42.CET/global/final/" - ci_prefix <- covidcommon::create_prefix(prefix=slot_prefix,'chimeric','intermediate',sep='/',trailing_separator='/') #] "USA/inference/med/2022.03.04.10:18:42.CET/chimeric/intermediate/" - gi_prefix <- covidcommon::create_prefix(prefix=slot_prefix,'global','intermediate',sep='/',trailing_separator='/') # USA/inference/med/2022.03.04.10:21:44.CET/global/intermediate/" - - chimeric_block_prefix <- covidcommon::create_prefix(prefix=ci_prefix, slot=list(opt$this_slot,"%09d"), sep='.', trailing_separator='.') # "USA/inference/med/2022.03.04.10:18:42.CET/chimeric/intermediate/000000001." - chimeric_local_prefix <- covidcommon::create_prefix(prefix=chimeric_block_prefix, slot=list(opt$this_block,"%09d"), sep='.', trailing_separator='.') # "USA/inference/med/2022.03.04.10:18:42.CET/chimeric/intermediate/000000001.000000001." - - global_block_prefix <- covidcommon::create_prefix(prefix=gi_prefix, slot=list(opt$this_slot,"%09d"), sep='.', trailing_separator='.') # "USA/inference/med/2022.03.04.10:18:42.CET/global/intermediate/000000001." - global_local_prefix <- covidcommon::create_prefix(prefix=global_block_prefix, slot=list(opt$this_block,"%09d"), sep='.', trailing_separator='.') # "USA/inference/med/2022.03.04.10:18:42.CET/global/intermediate/000000001.000000001." + ## file name prefixes for this scneario + deathrate combination + ## Create prefix is roughly equivalent to paste(...) so + ## create_prefix("USA", "inference", "med", "2022.03.04.10.18.42.CET", sep='/') + ## would be "USA/inference/med/2022.03.04.10.18.42.CET" + ## There is some fanciness about formatting though so + ## create_prefix(("43", "%09d")) + ## would be "000000043" + ## if a prefix argument is explicitly specified, the separator between it and the rest is skipped instead of sep so + ## trailing separator is always added at the end of the string if specified. + ## create_prefix(prefix="USA/", "inference", "med", "2022.03.04.10.18.42.CET", sep='/', trailing_separator='.') + ## would be "USA/inference/med/2022.03.04.10.18.42.CET." + slot_prefix <- covidcommon::create_prefix(config$name,scenario,deathrate,opt$run_id,sep='/',trailing_separator='/') + + gf_prefix <- covidcommon::create_prefix(prefix=slot_prefix,'global','final',sep='/',trailing_separator='/') + cf_prefix <- covidcommon::create_prefix(prefix=slot_prefix,'chimeric','final',sep='/',trailing_separator='/') + ci_prefix <- covidcommon::create_prefix(prefix=slot_prefix,'chimeric','intermediate',sep='/',trailing_separator='/') + gi_prefix <- covidcommon::create_prefix(prefix=slot_prefix,'global','intermediate',sep='/',trailing_separator='/') + + chimeric_block_prefix <- covidcommon::create_prefix(prefix=ci_prefix, slot=list(opt$this_slot,"%09d"), sep='.', trailing_separator='.') + chimeric_local_prefix <- covidcommon::create_prefix(prefix=chimeric_block_prefix, slot=list(opt$this_block,"%09d"), sep='.', trailing_separator='.') + global_block_prefix <- covidcommon::create_prefix(prefix=gi_prefix, slot=list(opt$this_slot,"%09d"), sep='.', trailing_separator='.') + global_local_prefix <- covidcommon::create_prefix(prefix=global_block_prefix, slot=list(opt$this_block,"%09d"), sep='.', trailing_separator='.') + + ### Set up initial conditions ---------- ## python configuration: build simulator model initialized with compartment and all. gempyor_inference_runner <- gempyor$InferenceSimulator( config_path=opt$config, @@ -239,9 +259,13 @@ for(scenario in scenarios) { initialize=TRUE # Shall we pre-compute now things that are not pertubed by inference ) + ## Using the prefixes, create standardized files of each type (e.g., seir) of the form + ## {variable}/{prefix}{block-1}.{run_id}.{variable}.{ext} + ## N.B.: prefix should end in "{slot}." first_global_files <- inference::create_filename_list(opt$run_id, global_block_prefix, opt$this_block - 1) first_chimeric_files <- inference::create_filename_list(opt$run_id, chimeric_block_prefix, opt$this_block - 1) - + ## print("RUNNING: initialization of first block") + ## Functions within this function save variables to files of the form variable/name/scenario/deathrate/run_id/global/intermediate/slot.(block-1),run_id.variable.ext and also copied into the /chimeric/ version, which are referenced by first_global_files and first_chimeric_files inference::initialize_mcmc_first_block( opt$run_id, opt$this_block, @@ -275,11 +299,12 @@ for(scenario in scenarios) { }, is_resume = opt[['is-resume']] ) - - + + ## So far no acceptances have occurred current_index <- 0 -### Load initial files + + ### Load initial files (were created within function initialize_mcmc_first_block) seeding_col_types <- NULL suppressMessages(initial_seeding <- readr::read_csv(first_chimeric_files[['seed_filename']], col_types=seeding_col_types)) if (opt$stoch_traj_flag) { @@ -291,24 +316,48 @@ for(scenario in scenarios) { initial_hpar <- arrow::read_parquet(first_chimeric_files[['hpar_filename']]) chimeric_likelihood_data <- arrow::read_parquet(first_chimeric_files[['llik_filename']]) global_likelihood_data <- arrow::read_parquet(first_global_files[['llik_filename']]) - -#####Get the full likelihood (WHY IS THIS A DATA FRAME) + + ##Add initial perturbation sd values to parameter files---- + initial_snpi <- inference::add_perturb_column_snpi(initial_snpi,config$interventions$settings) + initial_hnpi <- inference::add_perturb_column_hnpi(initial_hnpi,config$interventions$settings) + + #Need to write these parameters back to the SAME chimeric file since they have a new column now + arrow::write_parquet(initial_snpi,first_chimeric_files[['snpi_filename']]) + arrow::write_parquet(initial_hnpi,first_chimeric_files[['hnpi_filename']]) + + # Also need to add this column to the global file (it will always be equal in the first block) (MIGHT NOT BE WORKING) + arrow::write_parquet(initial_snpi,first_global_files[['snpi_filename']]) + arrow::write_parquet(initial_hnpi,first_global_files[['hnpi_filename']]) + + #####Get the full likelihood (WHY IS THIS A DATA FRAME) # Compute total loglik for each sim global_likelihood <- sum(global_likelihood_data$ll) -#####LOOP NOTES -### current means proposed -### initial means accepted/current + #####LOOP NOTES + ### initial means accepted/current + ### current means proposed + + startTimeCount=Sys.time() + ##Loop over simulations in this block ---- -#####Loop over simulations in this block - for(this_index in seq_len(opt$simulations_per_slot)) { - print(paste("Running simulation", this_index)) + # keep track of running average global acceptance rate, since old global likelihood data not kept in memory. Each geoID has same value for acceptance rate in global case, so we just take the 1st entry + old_avg_global_accept_rate <- global_likelihood_data$accept_avg[1] + for( this_index in seq_len(opt$simulations_per_slot)) { + print(paste("Running simulation", this_index)) + + startTimeCountEach=Sys.time() + ## Create filenames + + ## Using the prefixes, create standardized files of each type (e.g., seir) of the form + ## {variable}/{prefix}{block-1}.{run_id}.{variable}.{ext} + ## N.B.: prefix should end in "{block}." this_global_files <- inference::create_filename_list(opt$run_id, global_local_prefix, this_index) this_chimeric_files <- inference::create_filename_list(opt$run_id, chimeric_local_prefix, this_index) - ## Do perturbations from accepted + ### Do perturbations from accepted parameters to get proposed parameters ---- + proposed_seeding <- inference::perturb_seeding( seeding = initial_seeding, date_sd = config$seeding$date_sd, @@ -316,22 +365,36 @@ for(scenario in scenarios) { amount_sd = config$seeding$amount_sd, continuous = !(opt$stoch_traj_flag) ) + +<<<<<<< HEAD proposed_snpi <- inference::perturb_snpi(initial_snpi, config$interventions$settings) proposed_hnpi <- inference::perturb_hnpi(initial_hnpi, config$interventions$settings) proposed_spar <- initial_spar - if(!deathrate %in% names(config$outcomes$settings)){ - stop(paste("Deathrate",deathrate,"does not appear in outcomes::settings in the config")) - } proposed_hpar <- inference::perturb_hpar(initial_hpar, config$outcomes$settings[[deathrate]]) - +======= + # Old perturbation method, directly from config + + #proposed_snpi <- inference::perturb_snpi(initial_snpi, config$interventions$settings) + #proposed_hnpi <- inference::perturb_hnpi(initial_hnpi, config$interventions$settings) + # New perturbation method, from parameter file instead + print("NOTE: Perturbations are being read from files instead of configs after 1st iteration in each slot for snpi and hnpi") + proposed_snpi <- inference::perturb_snpi_from_file(initial_snpi, config$interventions$settings, chimeric_likelihood_data) + proposed_hnpi <- inference::perturb_hnpi_from_file(initial_hnpi, config$interventions$settings, chimeric_likelihood_data) + proposed_spar <- inference::perturb_spar_from_file(initial_spar, config$interventions$settings, chimeric_likelihood_data) + proposed_hpar <- inference::perturb_hpar_from_file(initial_hpar, config$interventions$settings, chimeric_likelihood_data) +>>>>>>> origin/inference + proposed_spar <- initial_spar + proposed_hpar <- inference::perturb_hpar(initial_hpar, config$outcomes$settings[[deathrate]]) + + ## Write files that need to be written for other code to read + # writes to file of the form variable/name/scenario/deathrate/run_id/global/intermediate/slot.block.iter.run_id.variable.ext write.csv(proposed_seeding,this_global_files[['seed_filename']]) arrow::write_parquet(proposed_snpi,this_global_files[['snpi_filename']]) arrow::write_parquet(proposed_hnpi,this_global_files[['hnpi_filename']]) arrow::write_parquet(proposed_spar,this_global_files[['spar_filename']]) arrow::write_parquet(proposed_hpar,this_global_files[['hpar_filename']]) - ## Update the prefix gempyor_inference_runner$update_prefix(new_prefix=global_local_prefix) ## Run the simulator @@ -342,16 +405,15 @@ for(scenario in scenarios) { if(err != 0){ stop("InferenceSimulator failed to run") } - + sim_hosp <- report.generation:::read_file_of_type(gsub(".*[.]","",this_global_files[['hosp_filename']]))(this_global_files[['hosp_filename']]) %>% dplyr::filter(time >= min(obs$date),time <= max(obs$date)) - - - + lhs <- unique(sim_hosp[[obs_nodename]]) rhs <- unique(names(data_stats)) all_locations <- rhs[rhs %in% lhs] - + + ## Compare model output to data and calculate likelihood ---- proposed_likelihood_data <- inference::aggregate_and_calc_loc_likelihoods( all_locations = all_locations, modeled_outcome = sim_hosp, @@ -372,40 +434,70 @@ for(scenario in scenarios) { start_date = gt_start_date, end_date = gt_end_date ) - - + + rm(sim_hosp) - + ## UNCOMMENT TO DEBUG ## print(global_likelihood_data) ## print(chimeric_likelihood_data) ## print(proposed_likelihood_data) - + ## Compute total loglik for each sim proposed_likelihood <- sum(proposed_likelihood_data$ll) - + ## For logging - print(paste("Current likelihood",global_likelihood,"Proposed likelihood",proposed_likelihood)) - - if(inference::iterateAccept(global_likelihood, proposed_likelihood) || ((current_index == 0) && (opt$this_block == 1))){ - print("****ACCEPT****") + print(paste("Current likelihood",formatC(global_likelihood,digits=2,format="f"),"Proposed likelihood",formatC(proposed_likelihood,digits=2,format="f"))) + + ## Global likelihood acceptance or rejection decision ---- + + + proposed_likelihood_data$accept <- ifelse(inference::iterateAccept(global_likelihood, proposed_likelihood) || ((current_index == 0) && (opt$this_block == 1)),1,0) + if(proposed_likelihood_data$accept == 1) { + + print("**** ACCEPT (Recording) ****") if ((opt$this_block == 1) && (current_index == 0)) { print("by default because it's the first iteration of a block 1") } + old_global_files <- inference::create_filename_list(opt$run_id, global_local_prefix, current_index) old_chimeric_files <- inference::create_filename_list(opt$run_id, chimeric_local_prefix, current_index) + + #IMPORTANT: This is the index of the most recent globally accepted parameters current_index <- this_index + + proposed_likelihood_data$accept <- 1 # global acceptance decision (0/1), same recorded for each geoID + + #This carries forward to next iteration as current global likelihood global_likelihood <- proposed_likelihood + #This carries forward to next iteration as current global likelihood data global_likelihood_data <- proposed_likelihood_data - + + warning("Removing unused files") sapply(old_global_files, file.remove) } else { - print("****REJECT****") + print("**** REJECT (Recording) ****") + warning("Removing unused files") sapply(this_global_files, file.remove) } - arrow::write_parquet(proposed_likelihood_data, this_global_files[['llik_filename']]) + avg_global_accept_rate <- ((this_index-1)*old_avg_global_accept_rate + proposed_likelihood_data$accept)/(this_index) # update running average acceptance probability + proposed_likelihood_data$accept_avg <-avg_global_accept_rate + proposed_likelihood_data$accept_prob <- exp(min(c(0, proposed_likelihood - global_likelihood))) #acceptance probability + + + old_avg_global_accept_rate <- avg_global_accept_rate # keep track, since old global likelihood data not kept in memory + + ## Print average global acceptance rate + # print(paste("Average global acceptance rate: ",formatC(100*avg_global_accept_rate,digits=2,format="f"),"%")) + + # prints to file of the form llik/name/scenario/deathrate/run_id/global/intermediate/slot.block.iter.run_id.llik.ext + arrow::write_parquet(proposed_likelihood_data, this_global_files[['llik_filename']]) + + ## Chimeric likelihood acceptance or rejection decisions (one round) ----- + # "Chimeric" means GeoID-specific + seeding_npis_list <- inference::accept_reject_new_seeding_npis( seeding_orig = initial_seeding, seeding_prop = proposed_seeding, @@ -418,45 +510,71 @@ for(scenario in scenarios) { orig_lls = chimeric_likelihood_data, prop_lls = proposed_likelihood_data ) + + old_avg_chimeric_accept_rate <- chimeric_likelihood_data$accept_avg # keep track of running average chimeric acceptance rate, for each geoID, since old chimeric likelihood data not kept in memory + + # Update accepted parameters to start next simulation initial_seeding <- seeding_npis_list$seeding initial_snpi <- seeding_npis_list$snpi initial_hnpi <- seeding_npis_list$hnpi initial_hpar <- seeding_npis_list$hpar chimeric_likelihood_data <- seeding_npis_list$ll - arrow::write_parquet(chimeric_likelihood_data, this_chimeric_files[['llik_filename']]) - + + # Update running average acceptance rate + chimeric_likelihood_data$accept_avg <- ((this_index-1)*old_avg_chimeric_accept_rate + chimeric_likelihood_data$accept)/(this_index) # update running average acceptance probability. CHECK, this depends on values being in same order in both dataframes. Better to bind?? + + arrow::write_parquet(chimeric_likelihood_data, this_chimeric_files[['llik_filename']]) # to file of the form llik/name/scenario/deathrate/run_id/chimeric/intermediate/slot.block.iter.run_id.llik.parquet + + ## Write accepted parameters to file + # writes to file of the form variable/name/scenario/deathrate/run_id/chimeric/intermediate/slot.block.iter.run_id.variable.ext + write.csv(initial_seeding,this_chimeric_files[['seed_filename']]) + arrow::write_parquet(initial_snpi,this_chimeric_files[['snpi_filename']]) + arrow::write_parquet(initial_hnpi,this_chimeric_files[['hnpi_filename']]) + arrow::write_parquet(initial_spar,this_chimeric_files[['spar_filename']]) + arrow::write_parquet(initial_hpar,this_chimeric_files[['hpar_filename']]) + + print(paste("Current index is ",current_index)) - # print(proposed_likelihood_data) - # print(chimeric_likelihood_data) - - ###Memory managment + + ###Memory management rm(proposed_snpi) rm(proposed_hnpi) rm(proposed_hpar) rm(proposed_seeding) + + endTimeCountEach=difftime(Sys.time(), startTimeCountEach, units = "secs") + print(paste("Time to run this MCMC iteration is ",formatC(endTimeCountEach,digits=2,format="f")," seconds")) } - -#####Do MCMC end copy. Fail if unsucessfull - cpy_res <- inference::perform_MCMC_step_copies(current_index, - opt$this_slot, - opt$this_block, - opt$run_id, - global_local_prefix, - gf_prefix, - global_block_prefix) - - if(!prod(unlist(cpy_res))) {stop("File copy failed:", paste(unlist(cpy_res),paste(names(cpy_res),"|")))} - - -#####Write currently accepted files to disk + + endTimeCount=difftime(Sys.time(), startTimeCount, units = "secs") + # print(paste("Time to run all MCMC iterations is ",formatC(endTimeCount,digits=2,format="f")," seconds")) + + #####Do MCMC end copy. Fail if unsucessfull + # moves the most recently globally accepted parameter values from global/intermediate file to global/final + cpy_res_global <- inference::perform_MCMC_step_copies_global(current_index, + opt$this_slot, + opt$this_block, + opt$run_id, + global_local_prefix, + gf_prefix, + global_block_prefix) + if(!prod(unlist(cpy_res_global))) {stop("File copy failed:", paste(unlist(cpy_res_global),paste(names(cpy_res_global),"|")))} + # moves the most recently chimeric accepted parameter values from chimeric/intermediate file to chimeric/final + + cpy_res_chimeric <- inference::perform_MCMC_step_copies_chimeric(this_index, + opt$this_slot, + opt$this_block, + opt$run_id, + chimeric_local_prefix, + cf_prefix, + chimeric_block_prefix) + if(!prod(unlist(cpy_res_chimeric))) {stop("File copy failed:", paste(unlist(cpy_res_chimeric),paste(names(cpy_res_chimeric),"|")))} + #####Write currently accepted files to disk + #files of the form variables/name/scenario/deathrate/run_id/chimeric/intermediate/slot.block.run_id.variable.parquet output_chimeric_files <- inference::create_filename_list(opt$run_id, chimeric_block_prefix, opt$this_block) + #files of the form variables/name/scenario/deathrate/run_id/global/intermediate/slot.block.run_id.variable.parquet output_global_files <- inference::create_filename_list(opt$run_id, global_block_prefix, opt$this_block) - readr::write_csv(initial_seeding,output_chimeric_files[['seed_filename']]) - arrow::write_parquet(initial_snpi,output_chimeric_files[['snpi_filename']]) - arrow::write_parquet(initial_hnpi,output_chimeric_files[['hnpi_filename']]) - arrow::write_parquet(initial_spar,output_chimeric_files[['spar_filename']]) - arrow::write_parquet(initial_hpar,output_chimeric_files[['hpar_filename']]) - arrow::write_parquet(chimeric_likelihood_data,output_chimeric_files[['llik_filename']]) + warning("Chimeric hosp and seir files not yet supported, just using the most recently generated file of each type") current_index_global_files <- inference::create_filename_list(opt$run_id, global_local_prefix, current_index) file.copy(current_index_global_files[['hosp_filename']],output_chimeric_files[['hosp_filename']]) diff --git a/R/scripts/full_filter.R b/R/scripts/full_filter.R index 9739c7fba..dc5b709aa 100644 --- a/R/scripts/full_filter.R +++ b/R/scripts/full_filter.R @@ -29,6 +29,8 @@ if(opt$config == ""){ )) } +print(paste('Running ',opt$j,' jobs in parallel')) + config <- covidcommon::load_config(opt$config) deathrates <- opt$deathrates diff --git a/R/scripts/get_location_pop_and_ifr.R b/R/scripts/get_location_pop_and_ifr.R new file mode 100644 index 000000000..3980ecf74 --- /dev/null +++ b/R/scripts/get_location_pop_and_ifr.R @@ -0,0 +1,61 @@ + + +# Extract the age-stratified Population and IFR estimates for the Country/Location of interest. +# - the base data is generated by the COVID19_IFR repo which pulls data from Worldpop based on tiffs and shapefiles. + + +# Country ISO +iso_of_interest <- "KOR" + + +# SETUP ------------------------------------------------------------------- +library(tidyverse) + + + +# LOAD BASE DATA ---------------------------------------------------------- + +# Save as parquets (removing CSVs for size) +# global_pop_5yr <- read_csv("data/ifr_and_pop/GLOBAL_age_pop_5yr.csv") %>% rename(age = age_group) +# arrow::write_parquet(global_pop_5yr, "data/ifr_and_pop/GLOBAL_age_pop_5yr.parquet") +# global_pop_1yr <- read_csv("data/ifr_and_pop/GLOBAL_age_pop_data.csv") %>% select(-loc) %>% separate(age, into=c("age_l","age_r"), sep="_", remove = FALSE) +# arrow::write_parquet(global_pop_5yr, "data/ifr_and_pop/GLOBAL_age_pop_1yr.parquet") +# global_ifr_adm2 <- read_csv("data/ifr_and_pop/GLOBAL_ifr_ageadj.csv") +# arrow::write_parquet(global_ifr_adm2, "data/ifr_and_pop/GLOBAL_ifr_ageadj_adm2.parquet") +# global_ifr_adm1 <- read_csv("data/ifr_and_pop/GLOBAL_ifr_ageadj_adm1.csv") +# arrow::write_parquet(global_ifr_adm1, "data/ifr_and_pop/GLOBAL_ifr_ageadj_adm1.parquet") +# global_ifr_adm0 <- read_csv("data/ifr_and_pop/GLOBAL_ifr_ageadj_adm0.csv") +# arrow::write_parquet(global_ifr_adm0, "data/ifr_and_pop/GLOBAL_ifr_ageadj_adm0.parquet") + + +global_pop_5yr <- arrow::read_parquet("data/ifr_and_pop/GLOBAL_age_pop_5yr.parquet") +global_pop_1yr <- arrow::read_parquet("data/ifr_and_pop/GLOBAL_age_pop_1yr.parquet") +global_ifr_adm2 <- arrow::read_parquet("data/ifr_and_pop/GLOBAL_ifr_ageadj_adm2.parquet") +global_ifr_adm1 <- arrow::read_parquet("data/ifr_and_pop/GLOBAL_ifr_ageadj_adm1.parquet") +global_ifr_adm0 <- arrow::read_parquet("data/ifr_and_pop/GLOBAL_ifr_ageadj_adm0.parquet") + + + +# SUBSET TO SPECIFIED LOCATION -------------------------------------------- +# -- current setup is just for Country subsets. For state or other, need to modify on your own + +loc_pop_5yr <- global_pop_5yr %>% filter(adm0 == iso_of_interest) +loc_pop_1yr <- global_pop_1yr %>% filter(adm0 == iso_of_interest) +loc_ifr_adm2 <- global_ifr_adm2 %>% filter(adm0 == iso_of_interest) +loc_ifr_adm1 <- global_ifr_adm1 %>% filter(adm0 == iso_of_interest) +loc_ifr_adm0 <- global_ifr_adm0 %>% filter(adm0 == iso_of_interest) + + + +# SAVE LOCATION DATA ------------------------------------------------------ + +arrow::write_parquet(loc_pop_5yr, paste0("data/ifr_and_pop/", iso_of_interest, "_age_pop_5yr.parquet")) +arrow::write_parquet(loc_pop_1yr, paste0("data/ifr_and_pop/", iso_of_interest, "_age_pop_1yr.parquet")) +arrow::write_parquet(loc_ifr_adm2, paste0("data/ifr_and_pop/", iso_of_interest, "_ifr_ageadj_adm2.parquet")) +arrow::write_parquet(loc_ifr_adm1, paste0("data/ifr_and_pop/", iso_of_interest, "_ifr_ageadj_adm1.parquet")) +arrow::write_parquet(loc_ifr_adm0, paste0("data/ifr_and_pop/", iso_of_interest, "_ifr_ageadj_adm0.parquet")) + + + + + diff --git a/data/ifr_and_pop/GLOBAL_age_pop_1yr.parquet b/data/ifr_and_pop/GLOBAL_age_pop_1yr.parquet new file mode 100644 index 000000000..60dc664a2 Binary files /dev/null and b/data/ifr_and_pop/GLOBAL_age_pop_1yr.parquet differ diff --git a/data/ifr_and_pop/GLOBAL_age_pop_5yr.parquet b/data/ifr_and_pop/GLOBAL_age_pop_5yr.parquet new file mode 100644 index 000000000..60dc664a2 Binary files /dev/null and b/data/ifr_and_pop/GLOBAL_age_pop_5yr.parquet differ diff --git a/data/ifr_and_pop/GLOBAL_ifr_ageadj_adm0.parquet b/data/ifr_and_pop/GLOBAL_ifr_ageadj_adm0.parquet new file mode 100644 index 000000000..16b3abe19 Binary files /dev/null and b/data/ifr_and_pop/GLOBAL_ifr_ageadj_adm0.parquet differ diff --git a/data/ifr_and_pop/GLOBAL_ifr_ageadj_adm1.parquet b/data/ifr_and_pop/GLOBAL_ifr_ageadj_adm1.parquet new file mode 100644 index 000000000..ce8ed5c58 Binary files /dev/null and b/data/ifr_and_pop/GLOBAL_ifr_ageadj_adm1.parquet differ diff --git a/data/ifr_and_pop/GLOBAL_ifr_ageadj_adm2.parquet b/data/ifr_and_pop/GLOBAL_ifr_ageadj_adm2.parquet new file mode 100644 index 000000000..249f365ad Binary files /dev/null and b/data/ifr_and_pop/GLOBAL_ifr_ageadj_adm2.parquet differ diff --git a/gempyor_pkg/src/gempyor/outcomes.py b/gempyor_pkg/src/gempyor/outcomes.py index b5d6ed42c..fd489b2b8 100644 --- a/gempyor_pkg/src/gempyor/outcomes.py +++ b/gempyor_pkg/src/gempyor/outcomes.py @@ -149,7 +149,7 @@ def read_parameters_from_config(s: setup.Setup): print( "Intersect with seir simulation: ", len(branching_data.geoid.unique()), - "keeped", + "kept", ) if len(branching_data.geoid.unique()) != len(s.spatset.nodenames): diff --git a/local_install.R b/local_install.R index 08824b6f5..1e13363ce 100644 --- a/local_install.R +++ b/local_install.R @@ -1,10 +1,19 @@ +# Installs the custom-made packages in this repository + library(devtools) install.packages(c("covidcast","data.table","vroom","dplyr"), force=TRUE) +# To run if operating in the container initial.options <- commandArgs(trailingOnly = FALSE) file.arg.name <- "--file=" -script.name <- sub(file.arg.name, "", initial.options[grep(file.arg.name, initial.options)]) - -pkg.dir <- paste0(dirname(script.name), "/R/pkgs/") +script.name <- sub(file.arg.name, "", initial.options[grep(file.arg.name, initial.options)]) # get the name of this file, by looking for the option "--file" in the arguments that were used to start this R instance and getting the term that comes after +pkg.dir <- paste0(dirname(script.name), "/R/pkgs/") # find the directory that this file is within install.packages(list.files(pkg.dir,full.names=TRUE),type='source',repos=NULL) + +# to run within a local instance of R studio + +#install.packages(list.files("./R/pkgs/",full.names=TRUE),type='source',repos=NULL) #install from files. Run from COVIDScenarioPipeline folder. Might need to run twice since packages are interdependent and might not be installed in correct order +# devtools::install_github("HopkinsIDD/globaltoolboxlite") #install the covidimportation package from a separate Github repo +# devtools::install_github("HopkinsIDD/covidImportation") +# diff --git a/notebooks/MCMC_diagnosis.Rmd b/notebooks/MCMC_diagnosis.Rmd new file mode 100644 index 000000000..d316a943b --- /dev/null +++ b/notebooks/MCMC_diagnosis.Rmd @@ -0,0 +1,1254 @@ +--- +title: "R Notebook" +params: + config_file: config_2020_updated_incidC_incidH_shifts.yml + continue_on_error: yes + hosp_eval: no + pi_hi: 0.95 + pi_lo: 0.05 + runs_folder: model_output_2020_updated_incidC_incidH_shifts_n100_j4 # directory with model output (i.e. with subdirs: hosp, snpi, spar) + pdeath_default: "med" + #states: "ALL" + states: ["CA","FL","IL","NY","OH","TX","WA"] +output: + html_document: + df_print: paged + toc: true + toc_depth: 3 +editor_options: + chunk_output_type: inline +#Note: For now it seems like this notebook must be in the main folder of COVID19_USA (although it is supposed to be able to find the correct directory) +--- + +```{r setup, include=F} + +## Block with general knitr options, library load, etc. Nothing specific to the project. +knitr::opts_knit$set(eval.after = 'fig.cap', + root.dir = rprojroot::find_root(rprojroot::has_file(params$config_file))) # the root does not seem to be working and carrying thru to other chunks + +knitr::opts_chunk$set( + echo = FALSE, + fig.align = "center", + message = FALSE, + warning = FALSE, + error = params$continue_on_error, + cache.lazy = FALSE, + bitmapType = "cairo" +) + +options(scipen=999) + +#Preamble +# Make sure to run local_install.R first +library(tidyverse) +library(covidcommon) +library(report.generation) +library(covidImportation) +library(ggtext) +library(cowplot) +library(googlesheets4) +#library(covidHubUtils) +library(gtsummary) +library(zoo) +library(corrplot) +library(reshape2) +``` + +```{r parameter_definitions} + +config_file=params$config_file +continue_on_error= params$continue_on_error +hosp_eval=params$hosp_eval +pi_hi= params$pi_hi +pi_lo= params$pi_hi +runs_folder= params$runs_folder +states= params$states +pdeath_default= params$pdeath_default + +# These filters select which fitted parameters were be plotted. They have to be changed depending on the config and what was run +this_hpar_filter <-c('incidC_probability','incidH_probability','incidD_probability') +this_hpar_fig_labs <- c("Case detection probability","Hospitalization probability","Death probability") +this_snpi_filter <-c('local_variance','Seas_mar','lockdown','open_p1','open_p2') +this_snpi_fig_labs <- c("Local variance in R0","March seasonality","Lockdown","Phase 1 re-opening","Phase 2 re-opening") +this_hnpi_filter <-c('incidCshift1','incidCshift2','incidH_adj') +this_hnpi_fig_labs <- c("State-specific reduction in incidC (phase 1)","State-specific reduction in incidC (phase 2)","State-specific adjustment to hospitalization rate") +``` + +```{r rmd_formatting} +fig_counter <- 1 +tab_counter <- 1 +multi_fig_height <- length(states) +``` + + +```{r load_config} +config <- covidcommon::load_config(config_file) +``` + + +```{r config_vals} +scn_labels <- "inference" +scn_levels <- "inference" +scn_colors = c("#550055") +# scn_labels <- config$report$formatting$scenario_labels +# scn_levels<-config$report$formatting$scenario_levels +sim_start_date <- config$start_date +sim_end_date <- config$end_date +max_date <-Sys.Date() + + +pdeath_label= paste0(100*config$outcomes$settings[[pdeath_default]]$incidD$probability$value$value,"% IFR") + +``` + + + +```{r} +# In case you want to pre-maturely end and knit the document +#knitr::knit_exit() +``` + + + +```{r} + +# Load in basic information on geoIDs of interest + +geodata_all <- load_geodata_file(file.path(config$spatial_setup$base_path, config$spatial_setup$geodata), + geoid_len=5, + names=TRUE) %>% + mutate(name=str_to_title(name)) %>% + rename(pop=starts_with("pop")) + +included_geoids_all <- geodata_all$geoid + +``` + + +```{r choose_states} + +if(states=="ALL"){ + + states <- geodata_all$USPS +} + +multi_fig_height <- length(states) + +geodata_some <-geodata_all %>% dplyr::filter(USPS %in% states) + +included_geoids_some <- geodata_some$geoid +``` + + +```{r load_model_timecourse_summed} +# Gets model outcomes over time summed over all geoIDs of interest +model_output_summed <-load_hosp_geocombined_totals(runs_folder, + scenario_levels = scn_levels, + scenario_labels = scn_labels, + incl_geoids = included_geoids_all, + pdeath_filter=pdeath_default) + +``` + + + +```{r load_model_timecourse} +# Gets model outcomes over time for all geoIDs of interest individually +model_timecourse_all <- load_hosp_county(runs_folder, + scenario_levels = scn_levels, + scenario_labels = scn_labels, + incl_geoids=included_geoids_all, + pdeath_filter = pdeath_default) %>% + left_join(geodata_all) %>% + mutate(name = factor(name, levels=sort(geodata_all$name, decreasing=TRUE))) + +``` + + +```{r load_truth, cache=TRUE, cache.extra=paste(Sys.Date())} + +# Get up to date data on cases, deaths, and hospitalizations - takes a long time to run + +#truth_dat <-get_groundtruth_from_source(source="csse") # for county level +truth_dat_raw <- covidcommon::get_groundtruth_from_source(source = "csse", scale = "US state") + +``` + + +```{r process_truth} + +truth_dat_all<-truth_dat_raw%>% + dplyr::select(Confirmed, Deaths, incidI, incidDeath, USPS=source, date=Update, geoid=FIPS) %>% + left_join(geodata_all) + +#truth_dat_some <-truth_dat_all %>% dplyr::filter(USPS %in% states) + +``` + + + +```{r load_hosp_data, cache=TRUE, cache.extra=paste(Sys.Date())} + +# State level hospitalization data +# PROBLEM: These functions replace missing data with zeros! And not all states report until Aug 1. So, added code to replace all values before Aug 1 with NA +hosp_data_start_date ="2020-08-01" + +hosp_dat_all<-get_hhsCMU_incidH_st_data() %>% + dplyr::select(date=Update, USPS=source, incidH=incidH_confirmed)%>% + #drop_na() + mutate(incidH=ifelse(date>hosp_data_start_date,incidH,NA)) + +hosp_dat_all <- geodata_all %>% + left_join(hosp_dat_all) + +currhosp_dat_all<-get_hhsCMU_hospCurr_st_data() %>% + dplyr::select(date=Update, USPS=source, currhosp=hospCurr_confirmed)%>% + #drop_na() + mutate(currhosp=ifelse(date>hosp_data_start_date,currhosp,NA)) + +currhosp_dat_all <- geodata_all %>% + left_join(currhosp_dat_all) + +# This is specific to California data! +# #only works by county. only has county name, not FIPS - not sure how they treat redundant counties! +# currhosp_dat <- read.csv("https://data.ca.gov/dataset/529ac907-6ba1-4cb7-9aae-8966fc96aeef/resource/42d33765-20fd-44b8-a978-b083b7542225/download/hospitals_by_county.csv") %>% +# dplyr::rename(name = county, +# date = todays_date, +# currhosp=hospitalized_covid_confirmed_patients) %>% +# tibble() + +# currhosp_dat <- currhosp_dat %>% +# left_join(geodata) %>% +# mutate(date = lubridate::ymd(date)) + +# Currently hospitalized +# hosp_dat<-currhosp_dat %>% +# group_by(date) %>% +# summarize(currhosp=sum(currhosp))%>% +# left_join(hosp_dat) +``` + +```{r} + +# combine case + death data with hospitalization data + +# prevalence +combo_obs_dat<-currhosp_dat_all%>% + select(geoid, date, currhosp)%>% + right_join(truth_dat_all) + +# incidence +combo_obs_dat<-hosp_dat_all%>% + select(geoid, date, incidH)%>% + right_join(combo_obs_dat) + +combo_obs_dat<-combo_obs_dat %>% + dplyr::relocate(geoid, USPS, name, pop, date) %>% + arrange(geoid,date) + #dplyr::rename(source=USPS) #just needed for this plotting function for some reason + +``` + +# Epidemic timecourse from simulations vs data + +## National + +Using the final parameter values from each chain + +```{r calib} + +# Creates plots for time course of cases + deaths (+hospitalization) aggregated over geoIDs, model + data, for final parameter values from each slot + +calib_plts <- plot_model_vs_obs(model_output_summed, + jhu_obs_dat = combo_obs_dat %>% dplyr::rename(source=USPS), #needed for plotting, not sure why + scenario_colors = scn_colors, + obs_data_col = "black", + ci.L = 0.025, + ci.U = 0.975, + date_breaks = "1 month", + sim_start_date = sim_start_date, + sim_end_date = max_date, + week = TRUE, + hosp=TRUE, tendency="mean") + +calib_legend<-get_legend(calib_plts[[1]]+theme(legend.position="bottom")) + +#calib_plts[[1]]<-calib_plts[[1]]+theme(legend.position = "none")+scale_x_date(breaks="1 month", date_labels="%b %y") +#calib_plts[[2]]<-calib_plts[[2]]+theme(legend.position = "none")+scale_x_date(breaks="1 month", date_labels="%b %y") +calib_plts[[1]]<-calib_plts[[1]]+theme(legend.position = "none") +calib_plts[[2]]<-calib_plts[[2]]+theme(legend.position = "none") + +if(length(calib_plts)>2){ + calib_plts[[3]]<-calib_plts[[3]]+theme(legend.position = "none") +} + +``` + +```{r single_geoid_plot, fig.height=3, fig.width=10, fig.cap=cap} + +summed_fit<- plot_grid(NULL, NULL, NULL, NULL, NULL, NULL, + NULL, calib_plts[[1]], + NULL, calib_plts[[2]], + NULL, calib_plts[[3]], + NULL, NULL,calib_legend, NULL, NULL, NULL, + ncol=6, nrow=3, + rel_heights=c(0.05, 0.9, 0.05), + rel_widths = c(0.03, 0.47, 0.03, 0.47, 0.03, 0.47), + labels=c("","","", "Model Calibration", "", "","", + "A", "", "B", "","C","")) +summed_fit + +cap<-paste0("**Fig. ", fig_counter, " ** Model calibration to incident cases (A) and incident deaths (B) reported by JHU CSSE for each state at ", pdeath_label, " assumptions, summed over entire country. (C) [Optional] Model comparison to hospitalizations (not fit). Shaded areas show 95% confidence intervals based on ",max(model_output_summed$sim_num),"independent inference runs and black points/lines indicate data reported by JHU CSSE.") + +fig_counter<-fig_counter+1 +``` + +## State-level + +(for subset of states only now) + +```{r county_truth, include=FALSE} +#replace term "_some" in variables with "_all" to plot for all locations instead of subset + +combo_obs_dat_some <-combo_obs_dat %>% dplyr::filter(USPS %in% states) +model_timecourse_some <-model_timecourse_all %>% dplyr::filter(USPS %in% states) + +calib<-plot_truth_by_location(truth_dat=combo_obs_dat_some, + model_dat=model_timecourse_some, + start_date=sim_start_date, + end_date=max_date, + geodat=geodata_some, + filter_by="pdeath", + filter_val=pdeath_default, + hosp=TRUE) + +cases<-calib[[1]]+theme(legend.position="none", + strip.text.y = element_blank(), + strip.background.y = element_blank(), + axis.title.y = element_blank()) + background_grid() + +cases<-ggplot_gtable(ggplot_build(cases)) + +deaths<-calib[[2]]+theme(legend.position="none", + #strip.text.y=element_blank(), + #strip.background.y=element_blank(), + axis.title.y = element_blank()) + background_grid() + +deaths<-ggplot_gtable(ggplot_build(deaths)) + +# Comment out if set hosp=FALSE in calling the plotting function +hosps<-calib[[3]]+theme(legend.position="none", + axis.title.y = element_blank(), + strip.text = element_text(size=10, face="bold")) + background_grid() + +hosps<-ggplot_gtable(ggplot_build(hosps)) + +cases$widths<-deaths$widths +hosps$widths<-deaths$widths + +plot_calib<-gridExtra::grid.arrange(grobs = list(cases, deaths, hosps), ncol=3, nrow=1) +legend<-get_legend(calib[[1]]+theme(legend.position="bottom")) +``` + + +```{r county_plot, fig.height=multi_fig_height, fig.width=10, fig.cap=cap} +#{r county_plot, fig.height=10, fig.width=10, fig.cap=cap} +cowplot::plot_grid(plot_calib, legend, nrow=2, rel_heights = c(1, 0.01)) + +cap<- paste0("**Fig. ", fig_counter, "** Calibration of estimated incident cases and deaths to reported data from JHU CSSE, and validation of estimates for occupied hospital beds when compared to CDPH data. Here, modeled cases are calculated as a percent of modeled infection that is fit to county data. Black points represent actual data, lines represent means and shading represents the 95% prediction interval for each scenario at ",pdeath_label," and 0.5% assumptions. Note that JHU CSSE data were reported as daily cumulative cases and deaths. In this figure, daily cumulative case counts were differenced in order to report the incident cases and deaths. **In comparing the actual and modeled data, we emphasize that limited testing and reporting delays may affect the quality of the reported case data early on in the outbreak.**") + +fig_counter<-fig_counter+1 + +``` + + +# Likelihood trends + +## Likelihood by MCMC iteration: state-level + + +```{r} +llik_final <- load_llik_sims_filtered(outcome_dir=runs_folder, + pdeath_filter=pdeath_default, + incl_geoids=included_geoids_all) %>% + dplyr::select(c(-filename,-file_name,-runID)) +``` + +```{r} +llik_interm <- load_llik_sims_filtered_interm(outcome_dir=runs_folder, + pdeath_filter=pdeath_default, + incl_geoids=included_geoids_all) %>% + dplyr::select(c(-filename,-file_name,-runID, -location)) + +llik_interm <- geodata_all %>% + left_join(llik_interm) + +llik_interm_some <-llik_interm %>% dplyr::filter(USPS %in% states) +``` + + + +```{r mcmc_llik_plot, include=FALSE} + +llik_plot <- plot_llik_by_location(llik_interm_some%>% dplyr::filter(slot_num %in% c(1,2)), #change to show for single slot or remove for all slots + filter_by="pdeath", + filter_val=pdeath_default, + burn_in = 0 +) + +lik_type1 <-llik_plot[[1]]+theme(legend.position="none", + #strip.text.y = element_blank(), + #strip.background.y = element_blank(), + #axis.title.y = element_blank(), + strip.text = element_text(size=10, face="bold")) + background_grid() + +lik_type1<-ggplot_gtable(ggplot_build(lik_type1)) + +# Uncomment if more than one scenario or death rate being included +# lik_type2 <-llik_plot[[2]]+theme(legend.position="none", +# #strip.text.y = element_blank(), +# #strip.background.y = element_blank(), +# #axis.title.y = element_blank(), +# strip.text = element_text(size=10, face="bold")) +# +# lik_type2<-ggplot_gtable(ggplot_build(lik_type2)) + +# repeat for other entries in llik_plot if more scenarios were looked at + +#plot_llik_grid <-gridExtra::grid.arrange(grobs = list(lik_type1,lik_type2), ncol=2, nrow=1) +plot_llik_grid <-gridExtra::grid.arrange(grobs = list(lik_type1), ncol=2, nrow=1) +legend<-get_legend(llik_plot[[1]]+theme(legend.position="top")) +``` + + +```{r llik_plot, fig.height=multi_fig_height, fig.width=10, fig.cap=cap} +cowplot::plot_grid(plot_llik_grid, legend, nrow=2, rel_heights = c(1, 0.01)) + +cap<- paste0("**Fig. ", fig_counter, "** GeoID-specific log-likelihood values by MCMC step. Here 'chimeric' values are the likelihood for accepted parameters in the chimeric likelihood, and 'global' values are the likelihood values accepted for the global likelihood (only plot iterations at which accept=1). These two likelihood values are equivalent only at steps where the chimeric likelihood was accepted for that GeoID.**") + +fig_counter<-fig_counter+1 + +``` + + +## Likelihood by MCMC iteration: national-level + + +```{r mcmc_llik_plot_total, include=FALSE} + +llik_plot <- plot_llik_total(llik_interm,# %>% dplyr::filter(slot_num==1), #change to show for only a single slot + filter_by="pdeath", + filter_val=pdeath_default, + burn_in = 0 +) + +``` + + +```{r llik_plot_total, fig.height=2, fig.width=5, fig.cap=cap} +llik_plot[[1]] + +cap<- paste0("**Fig. ", fig_counter, "** Total log-likelihood values by MCMC step (summed over all GeoIDs). Here 'chimeric' values are the total likelihood for accepted parameters in the chimeric likelihood, and 'global' values are the total likelihood values for the accepted parameters of the global likelihood, which are recorded in the global likelihood files (only plot iterations at which accept=1). The chimeric (accepted) likelihood is always higher since acceptance decisions are made on a geoID-by-geoID level, and only accepted for GeoIDs where the acceptance would improve the geoID-specific likelihood. These two likelihood values would only be equivalent at steps where the chimeric likelihood was accepted for every single GeoID.**") + +fig_counter<-fig_counter+1 + +``` + + +## Proportion of likelihoods accepted or rejected + + +```{r mcmc_accept_rate_plot, include=FALSE} + +accept_plot <- plot_accept_by_location(llik_interm_some,# %>% dplyr::filter(slot_num==1), #change to show for only a single slot + filter_by="pdeath", + filter_val=pdeath_default, + burn_in = 0) + +accept_type1 <-accept_plot[[1]]+theme(legend.position="none", + #strip.text.y = element_blank(), + #strip.background.y = element_blank(), + #axis.title.y = element_blank(), + strip.text = element_text(size=10, face="bold")) + background_grid() + +accept_type1<-ggplot_gtable(ggplot_build(accept_type1)) + +# accept_type2 <-accept_plot[[2]]+theme(legend.position="none", +# #strip.text.y = element_blank(), +# #strip.background.y = element_blank(), +# #axis.title.y = element_blank(), +# strip.text = element_text(size=10, face="bold")) +# +# accept_type2<-ggplot_gtable(ggplot_build(accept_type2)) + +# repeat for other entries in llik_plot if more scenarios were looked at + +#plot_accept_grid <-gridExtra::grid.arrange(grobs = list(accept_type1,accept_type2), ncol=2, nrow=1) +plot_accept_grid <-gridExtra::grid.arrange(grobs = list(accept_type1), ncol=2, nrow=1) +legend<-get_legend(accept_plot[[1]]+theme(legend.position="top")) +``` + + +```{r accept_plot, fig.height=multi_fig_height, fig.width=10, fig.cap=cap} +cowplot::plot_grid(plot_accept_grid, legend, nrow=2, rel_heights = c(1, 0.01)) + +cap<- paste0("**Fig. ", fig_counter, "** Acceptance rate of proposed parameters by MCMC step for each state ('chimeric' values) along with the global acceptance rate. Acceptance rate is averaged over all previous steps. **") + +fig_counter<-fig_counter+1 + +``` + +## Average acceptance as rolling mean over an interval + + +```{r mcmc_accept_rate_plot_roll, include=FALSE} + +this_roll_period = 10 +this_burn_in = 10 + +accept_plot <- plot_accept_by_location_rolling(llik_interm_some,# %>% dplyr::filter(slot_num==1), #change to show for only a single slot + filter_by="pdeath", + filter_val=pdeath_default, + roll_period = this_roll_period, + burn_in = this_burn_in) + +accept_type1 <-accept_plot[[1]]+theme(legend.position="none", + #strip.text.y = element_blank(), + #strip.background.y = element_blank(), + #axis.title.y = element_blank(), + strip.text = element_text(size=10, face="bold")) + background_grid() + +accept_type1<-ggplot_gtable(ggplot_build(accept_type1)) + +# accept_type2 <-accept_plot[[2]]+theme(legend.position="none", +# #strip.text.y = element_blank(), +# #strip.background.y = element_blank(), +# #axis.title.y = element_blank(), +# strip.text = element_text(size=10, face="bold")) +# +# accept_type2<-ggplot_gtable(ggplot_build(accept_type2)) + +# repeat for other entries in llik_plot if more scenarios were looked at + +#plot_accept_grid <-gridExtra::grid.arrange(grobs = list(accept_type1,accept_type2), ncol=2, nrow=1) +plot_accept_grid <-gridExtra::grid.arrange(grobs = list(accept_type1), ncol=2, nrow=1) +legend<-get_legend(accept_plot[[1]]+theme(legend.position="top")) +``` + + +```{r accept_plot_roll, fig.height=multi_fig_height, fig.width=10, fig.cap=cap} +cowplot::plot_grid(plot_accept_grid, legend, nrow=2, rel_heights = c(1, 0.01)) + +cap<- paste0("**Fig. ", fig_counter, "** Acceptance rate of proposed parameters by MCMC step for each state ('chimeric' values) along with the global acceptance rate. Acceptance rate is averaged over the previous",this_roll_period,"steps**") + +fig_counter<-fig_counter+1 + +``` +## Cumulative acceptances + + +```{r mcmc_accept_cumul, include=FALSE} + +accept_plot <- plot_accept_by_location_cumul(llik_interm_some,# %>% dplyr::filter(slot_num==1), #change to show for only a single slot + filter_by="pdeath", + filter_val=pdeath_default, + burn_in = 0) + +accept_type1 <-accept_plot[[1]]+theme(legend.position="none", + #strip.text.y = element_blank(), + #strip.background.y = element_blank(), + #axis.title.y = element_blank(), + strip.text = element_text(size=10, face="bold")) + background_grid() + +accept_type1<-ggplot_gtable(ggplot_build(accept_type1)) + +# accept_type2 <-accept_plot[[2]]+theme(legend.position="none", +# #strip.text.y = element_blank(), +# #strip.background.y = element_blank(), +# #axis.title.y = element_blank(), +# strip.text = element_text(size=10, face="bold")) +# +# accept_type2<-ggplot_gtable(ggplot_build(accept_type2)) + +# repeat for other entries in llik_plot if more scenarios were looked at + +#plot_accept_grid <-gridExtra::grid.arrange(grobs = list(accept_type1,accept_type2), ncol=2, nrow=1) +plot_accept_grid <-gridExtra::grid.arrange(grobs = list(accept_type1), ncol=2, nrow=1) +legend<-get_legend(accept_plot[[1]]+theme(legend.position="top")) +``` + + +```{r accept_plot_cumul, fig.height=multi_fig_height, fig.width=10, fig.cap=cap} +cowplot::plot_grid(plot_accept_grid, legend, nrow=2, rel_heights = c(1, 0.01)) + +cap<- paste0("**Fig. ", fig_counter, "** Cumulative number of acceptances of proposed parameters by MCMC step for each state ('chimeric' values) along with the global acceptance rate**") + +fig_counter<-fig_counter+1 + +``` + +# Parameter inference trends + +## Basic SEIR parameter by MCMC iteration + +Note: For these runs, these parameters are the same for all states and are fixed, not estimated, so nothing to plot + +```{r} +# Load final SEIR parameter values + +spar_final <- load_spar_sims_filtered(outcome_dir=runs_folder, + pre_process=function(x){x}, + pdeath_filter=pdeath_default) +``` + +```{r} + +# Load final values of intervention parameters +spar_interm<- load_spar_sims_filtered_interm(outcome_dir=runs_folder, + pre_process=function(x){x}, + pdeath_filter=pdeath_default) %>% + dplyr::select(c(-location,-runID,-file_name)) + +spar_interm <- spar_interm %>% + left_join(llik_interm%>%dplyr::filter(USPS=='MA')%>%dplyr::select(-accept_avg,-accept_prob,-ll,-colnames(geodata_all))) #only need for 1 geoID, same for all + +``` + +```{r mcmc_spar_plot, include=FALSE} + +spar_plot <- plot_spars(spar_interm,#%>% dplyr::filter(slot_num==1), + filter_by="pdeath", + filter_val=pdeath_default, + burn_in = 0 +) + +spar_type1 <-spar_plot[[1]]+theme(legend.position="none", + #strip.text.y = element_blank(), + #strip.background.y = element_blank(), + #axis.title.y = element_blank(), + strip.text = element_text(size=10, face="bold")) + background_grid() + +spar_type1<-ggplot_gtable(ggplot_build(spar_type1)) + +# accept_type2 <-accept_plot[[2]]+theme(legend.position="none", +# #strip.text.y = element_blank(), +# #strip.background.y = element_blank(), +# #axis.title.y = element_blank(), +# strip.text = element_text(size=10, face="bold")) +# +# accept_type2<-ggplot_gtable(ggplot_build(accept_type2)) + +# repeat for other entries in llik_plot if more scenarios were looked at + +#plot_accept_grid <-gridExtra::grid.arrange(grobs = list(accept_type1,accept_type2), ncol=2, nrow=1) +plot_spar_grid <-gridExtra::grid.arrange(grobs = list(spar_type1), ncol=2, nrow=1) +legend<-get_legend(spar_plot[[1]]+theme(legend.position="top")) +``` + + +```{r spars_plot, fig.height=10, fig.width=10, fig.cap=cap} +cowplot::plot_grid(plot_spar_grid, legend, nrow=2, rel_heights = c(1, 0.01)) + +cap<- paste0("**Fig. ", fig_counter, "** SEIR parameters by MCMC step**") + +fig_counter<-fig_counter+1 + +``` + +## Outcome model parameter values by MCMC iteration + +```{r} +# Load final Outcomes parameter values + +hpar_final <- load_hpar_sims_filtered(outcome_dir=runs_folder, + pre_process=function(x){x}, # {x %>% dplyr::filter(parameter=="R0")}, + pdeath_filter=pdeath_default, + incl_geoids = included_geoids_all) +``` + +```{r} + +# Load final values of intervention parameters +hpar_interm<- load_hpar_sims_filtered_interm(outcome_dir=runs_folder, + pre_process=function(x){x}, + pdeath_filter=pdeath_default, + incl_geoids=included_geoids_all) %>% + dplyr::select(c(-location,-runID,-file_name)) + +hpar_interm <- geodata_all %>% + left_join(hpar_interm) %>% + left_join(llik_interm%>%dplyr::select(-accept_avg,-accept_prob,-ll)) + +hpar_interm_some <- hpar_interm %>% dplyr::filter(USPS %in% states) + +``` + +```{r mcmc_hpar_plot,include=FALSE} + +# These filters are defined at the top of the file now +#this_hpar_filter <-c('incidC_probability','incidH_probability','incidD_probability') +#this_fig_labs <- c("Case detection probability","Hospitalization probability","Death probability") + +hpar_plot <- plot_hpar_by_location(hpar_interm_some,#%>% dplyr::filter(slot_num==1), + pdeath_filter = pdeath_default, + scenario_filter = 'inference', + hpar_filter = this_hpar_filter, + fig_labs=this_hpar_fig_labs +) + +hpar_plot_list <- list() + +# for first column of plots, keep y axis and remove strip label +hpar_plot_list[[1]] <- ggplot_gtable(ggplot_build(hpar_plot[[1]]+theme(legend.position="none", + #strip.text.y = element_blank(), + #strip.background.y = element_blank() + strip.text = element_text(size=10, face="bold") +))) + +# for last column of plots, remove y axis and keep strip label +hpar_plot_list[[length(this_hpar_filter)]] <- ggplot_gtable(ggplot_build(hpar_plot[[length(this_hpar_filter)]]+theme(legend.position="none", + strip.text = element_text(size=10, face="bold"), + axis.title.y = element_blank()))) + +# for all other column of plots, remove y axis and remove strip label + +if (length(this_hpar_filter)>2){ + for (i in 2:(length(this_hpar_filter)-1)){ + print(i) + hpar_plot_list[[i]] <- ggplot_gtable(ggplot_build(hpar_plot[[i]]+theme(legend.position="none", + #strip.background.y = element_blank(), + #strip.text.y = element_blank(), + strip.text = element_text(size=10, face="bold"), + axis.title.y = element_blank()))) + } +} + + +plot_hpar_grid <-gridExtra::grid.arrange(grobs = hpar_plot_list, ncol=length(this_hpar_filter), nrow=1) +legend<-get_legend(hpar_plot[[1]]+theme(legend.position="top")) +``` + + +```{r hpar_plot, fig.height=multi_fig_height, fig.width=10, fig.cap=cap} +cowplot::plot_grid(plot_hpar_grid, legend, nrow=2, rel_heights = c(1, 0.01)) + +cap<- paste0("**Fig. ", fig_counter, "** Inferred outcome parameter values by MCMC step for each GeoID.'Global' values are the parameters accepted in the global likelihood, and 'Chimeric' values are the parameters accepted at the GeoID-level in the chimeric likelihood **") + +fig_counter<-fig_counter+1 + +``` + +## SNPIs by MCMC iteration + + +```{r} + +# Load final values of intervention parameters + +snpi_final<- load_snpi_sims_filtered(outcome_dir=runs_folder, + pre_process=function(x){x}, + pdeath_filter=pdeath_default, + incl_geoids=included_geoids_all) %>% + dplyr::select(sim_num, scenario, pdeath, geoid, npi_name, start_date, end_date, reduction) # need to fix sim_num parsing +``` + +```{r} +# Load final values of intervention parameters +snpi_interm<- load_snpi_sims_filtered_interm(outcome_dir=runs_folder, + pre_process=function(x){x}, + pdeath_filter=pdeath_default, + incl_geoids=included_geoids_all) %>% + dplyr::select(c(-location,-runID, -file_name)) + +snpi_interm <- geodata_all %>% + left_join(snpi_interm)%>% + left_join(llik_interm%>%dplyr::select(-accept_avg,-accept_prob,-ll)) + +snpi_interm_some <-snpi_interm %>% dplyr::filter(USPS %in% states) +``` + + +```{r mcmc_snpi_plot, include=FALSE} + +# These filters are defined at the top of the file now +#this_snpi_filter <-c('local_variance','Seas_mar','lockdown','open_p1','open_p2') +#this_fig_labs <- c("Local variance in R0","March seasonality","Lockdown","Phase 1 re-opening","Phase 2 re-opening") + +snpi_plot <- plot_snpi_by_location(snpi_interm_some%>% dplyr::filter(slot_num %in% c(1,2)), #change to show for single slot or remove for all slots + pdeath_filter = pdeath_default, + scenario_filter = 'inference', + snpi_filter = this_snpi_filter, + fig_labs=this_snpi_fig_labs +) + +snpi_plot_list <- list() + +# for first column of plots, keep y axis and remove strip label +snpi_plot_list[[1]] <- ggplot_gtable(ggplot_build(snpi_plot[[1]]+theme(legend.position="none", + #strip.text.y = element_blank(), + #strip.background.y = element_blank() + strip.text = element_text(size=10, face="bold") +))) + +# for last column of plots, remove y axis and keep strip label +snpi_plot_list[[length(this_snpi_filter)]] <- ggplot_gtable(ggplot_build(snpi_plot[[length(this_snpi_filter)]]+theme(legend.position="none", + strip.text = element_text(size=10, face="bold"), + axis.title.y = element_blank()))) + +# for all other column of plots, remove y axis and remove strip label + +if (length(this_snpi_filter)>2){ + for (i in 2:(length(this_snpi_filter)-1)){ + print(i) + snpi_plot_list[[i]] <- ggplot_gtable(ggplot_build(snpi_plot[[i]]+theme(legend.position="none", + #strip.background.y = element_blank(), + #strip.text.y = element_blank(), + strip.text = element_text(size=10, face="bold"), + axis.title.y = element_blank()))) + } +} + + +plot_snpi_grid <-gridExtra::grid.arrange(grobs = snpi_plot_list, ncol=length(this_snpi_filter), nrow=1) +legend<-get_legend(snpi_plot[[1]]+theme(legend.position="top")) +``` + + +```{r snpi_plot, fig.height=multi_fig_height, fig.width=10, fig.cap=cap} +cowplot::plot_grid(plot_snpi_grid, legend, nrow=2, rel_heights = c(1, 0.01)) + +cap<- paste0("**Fig. ", fig_counter, "** Inferred paramter values by MCMC step for each GeoID.'Global' values are the parameters accepted in the global likelihood, and 'Chimeric' values are the parameters accepted at the GeoID-level in the chimeric likelihood **") + +fig_counter<-fig_counter+1 + +``` + +## HNPIs by MCMC iteration + +This section might be blank if no outcome parameters were fit + + + +```{r} + +# Load final values of intervention parameters +# +hnpi_final<- load_hnpi_sims_filtered(outcome_dir=runs_folder, + pre_process=function(x){x}, + pdeath_filter=pdeath_default, + incl_geoids=included_geoids_all) %>% + dplyr::select(sim_num, scenario, pdeath, geoid, npi_name, start_date, end_date, reduction) # need to fix sim_num parsing +``` + + +```{r} +# Load final values of intervention parameters +hnpi_interm<- load_hnpi_sims_filtered_interm(outcome_dir=runs_folder, + pre_process=function(x){x}, + pdeath_filter=pdeath_default, + incl_geoids=included_geoids_all) %>% + dplyr::select(c(-location,-runID, file_name)) + +hnpi_interm <- geodata_all %>% + left_join(hnpi_interm)%>% + left_join(llik_interm%>%dplyr::select(-accept_avg,-accept_prob,-ll)) + +hnpi_interm_some <-hnpi_interm %>% dplyr::filter(USPS %in% states) +``` + +```{r} +eval_hnpi <-exists("hnpi_interm")==T +``` + + +```{r mcmc_hnpi_plot, include=FALSE, eval=eval_hnpi} + +hnpi_plot <- plot_hnpi_by_location(hnpi_interm_some%>% dplyr::filter(slot_num %in% c(1,2)), #change to show for single slot or remove for all slots + pdeath_filter = pdeath_default, + scenario_filter = 'inference', + hnpi_filter = this_hnpi_filter, + fig_labs=this_hnpi_fig_labs +) + +hnpi_plot_list <- list() + +# for first column of plots, keep y axis and remove strip label +hnpi_plot_list[[1]] <- ggplot_gtable(ggplot_build(hnpi_plot[[1]]+theme(legend.position="none", + #strip.text.y = element_blank(), + #strip.background.y = element_blank() + strip.text = element_text(size=10, face="bold") +))) + +# for last column of plots, remove y axis and keep strip label +hnpi_plot_list[[length(this_hnpi_filter)]] <- ggplot_gtable(ggplot_build(hnpi_plot[[length(this_hnpi_filter)]]+theme(legend.position="none", + strip.text = element_text(size=10, face="bold"), + axis.title.y = element_blank()))) + +# for all other column of plots, remove y axis and remove strip label + +if (length(this_hnpi_filter)>2){ + for (i in 2:(length(this_hnpi_filter)-1)){ + print(i) + hnpi_plot_list[[i]] <- ggplot_gtable(ggplot_build(hnpi_plot[[i]]+theme(legend.position="none", + #strip.background.y = element_blank(), + #strip.text.y = element_blank(), + strip.text = element_text(size=10, face="bold"), + axis.title.y = element_blank()))) + } +} + + +plot_hnpi_grid <-gridExtra::grid.arrange(grobs = hnpi_plot_list, ncol=max(3,length(this_hnpi_filter)), nrow=1) +legend<-get_legend(hnpi_plot[[1]]+theme(legend.position="top")) + +``` + + +```{r hnpi_plot, fig.height=multi_fig_height, fig.width=10, fig.cap=cap, eval=eval_hnpi} + +#if (exists("hnpi_interm")==T) { + +cowplot::plot_grid(plot_hnpi_grid, legend, nrow=2, rel_heights = c(1, 0.01)) + +cap<- paste0("**Fig. ", fig_counter, "** Inferred paramter values by MCMC step for each GeoID.'Global' values are the proposed parameters at each step, and 'Chimeric' values are the parameters accepted at the GeoID-level in the chimeric likelihood **") + +fig_counter<-fig_counter+1 + +#} + + + +``` + + +## Seeding parameters + +```{r} + +seed_all <- load_seed_sims_filtered_interm(outcome_dir=runs_folder, + pdeath_filter=pdeath_default, + incl_geoids=included_geoids_all) + +seed_all <- geodata_all %>% + left_join(seed_all) + +seed_interm <- seed_all%>% dplyr::filter(is_final =="intermediate") + +seed_interm_some <-seed_interm %>% dplyr::filter(USPS %in% states) + +seed_final <- seed_all %>% + dplyr::filter(is_final =="final", lik_type =="global") %>% + dplyr::select(-block_num,-iter_num) %>% + dplyr::rename(sim_num=slot_num) + +rm(seed_all) + +``` + +Don't have any plots developed yet for seeding values, but do have code to read them in + +# Correlations between fit parameter values (identifiability analysis) + +Correlations are for within a particular MCMC chain, treating each iteration as a sample + +## For each state individually + + + +```{r} + +# fix this to use config to figure out what parameters were fit or not (instead of removing NA values) +# Fix this to average over all slots + +#these_states=c('MA','NY','AZ','CA','IL','TX') +these_states = states + +for (this_state in these_states){ + + this_state_name=str_replace_all(geodata_all$name[geodata_all$USPS==this_state]," ","") + print(this_state_name) + + snpi_interm_mat <- snpi_interm %>% dplyr::filter(slot_num==1)%>% + filter(USPS==this_state) %>% + dplyr::filter(lik_type=='chimeric')%>% + dplyr::select(c("npi_name","reduction","iter_num")) %>% + drop_na() %>% + tidyr::spread(npi_name,reduction) %>% + dplyr::select(-starts_with("Dose")) %>% # remove variables that were not fit (otherwise get NAs in correlation matrix due to zero STD) + #select_if(~ !any(is.na(.))) # remove any other rows or columns with NA values + dplyr::select(where(~!sd(.) == 0)) # remove any columns with zero standard deviation (ie were not fit) + + + names(snpi_interm_mat) <- str_remove(names(snpi_interm_mat),fixed(paste0(this_state_name,'_'),ignore_case=TRUE)) # remove states names from intervention names + names(snpi_interm_mat) <- str_remove(names(snpi_interm_mat),fixed(paste0(this_state,'_'),ignore_case=TRUE)) # remove states names from intervention names + + hpar_interm_mat <-hpar_interm %>% dplyr::filter(slot_num==1)%>% + filter(USPS==this_state) %>% + dplyr::filter(lik_type=='chimeric') %>% + unite(outcome_quantity,c("outcome","quantity"), sep="_")%>% #make a new parameter that combines the outcome+quantity variables + dplyr::select(c("outcome_quantity","value","iter_num")) %>% + drop_na(iter_num) %>% # for now, because don't know what it means to have no iteration number + tidyr::spread(outcome_quantity,value) %>% + dplyr::select(where(~!sd(.) == 0)) # remove any columns with zero standard deviation (ie were not fit) + + if (exists("hnpi_interm")==T) { + + hnpi_interm_mat <-hnpi_interm %>% dplyr::filter(slot_num==1)%>% + filter(USPS==this_state) %>% + dplyr::filter(lik_type=='chimeric') %>% + dplyr::select(c("npi_name","reduction","iter_num")) %>% + drop_na(iter_num) %>% # for now, because don't know what it means to have no iteration number + tidyr::spread(npi_name,reduction) + + if(nrow(hnpi_interm_mat)>0){ # if there are interventions for this geoID + hnpi_interm_mat <- hnpi_interm_mat %>% + dplyr::select(where(~!sd(.) == 0)) # remove any columns with zero standard deviation (ie were not fit) + } + + names(hnpi_interm_mat) <- str_remove(names(hnpi_interm_mat),fixed(paste0(this_state_name,'_'),ignore_case=TRUE)) # remove states names from intervention names + names(hnpi_interm_mat) <- str_remove(names(hnpi_interm_mat),fixed(paste0(this_state,'_'),ignore_case=TRUE)) # remove states names from intervention names + + fit_param_mat <- snpi_interm_mat %>% left_join(hpar_interm_mat) %>% left_join(hnpi_interm_mat) + + }else{ + + fit_param_mat <- snpi_interm_mat %>% left_join(hpar_interm_mat) + } + + fit_param_mat$iter_num <- NULL + + fit_param_cor = cor(fit_param_mat) + + corrplot(cor(fit_param_mat), method = "ellipse",order = "hclust", addrect = 3, title=this_state_name) + +} + + +``` + +## Average over all states + +```{r} +# Fix this to average over all slots +all_fit_param_cor<- data.frame(Var1=character(),Var2=character(),value=double(),state=character()) + +these_states <- unique(snpi_interm$USPS) + +for (this_state in these_states){ + + this_state_name=str_replace_all(geodata_all$name[geodata_all$USPS==this_state]," ","") + + snpi_interm_mat <- snpi_interm%>% dplyr::filter(slot_num==1) %>% + filter(USPS==this_state) %>% + dplyr::filter(lik_type=='chimeric')%>% + dplyr::select(c("npi_name","reduction","iter_num")) %>% + drop_na() %>% + tidyr::spread(npi_name,reduction) %>% + dplyr::select(-starts_with("Dose")) %>% # remove variables that were not fit (otherwise get NAs in correlation matrix due to zero STD) + #select_if(~ !any(is.na(.))) # remove any other rows or columns with NA values + dplyr::select(where(~!sd(.) == 0)) # remove any columns with zero standard deviation (ie were not fit) + + names(snpi_interm_mat) <- str_remove(names(snpi_interm_mat),fixed(paste0(this_state_name,'_'),ignore_case=TRUE)) # remove states names from intervention names + names(snpi_interm_mat) <- str_remove(names(snpi_interm_mat),fixed(paste0(this_state,'_'),ignore_case=TRUE)) # remove states names from intervention names + + if (this_state=="VI"){ # US Virgin Islands, name in geodata does not match name in config + names(snpi_interm_mat) <- str_remove(names(snpi_interm_mat),fixed(paste0("VirginIslands",'_'),ignore_case=TRUE)) # remove states names from intervention names + } + + hpar_interm_mat <-hpar_interm %>% dplyr::filter(slot_num==1)%>% + filter(USPS==this_state) %>% + dplyr::filter(lik_type=='chimeric') %>% + unite(outcome_quantity,c("outcome","quantity"), sep="_")%>% #make a new parameter that combines the outcome+quantity variables + dplyr::select(c("outcome_quantity","value","iter_num")) %>% + drop_na(iter_num) %>% # for now, because don't know what it means to have no iteration number + tidyr::spread(outcome_quantity,value) %>% + dplyr::select(where(~!sd(.) == 0)) # remove any columns with zero standard deviation (ie were not fit) + + if (exists("hnpi_interm")==T) { + + hnpi_interm_mat <-hnpi_interm%>% dplyr::filter(slot_num==1) %>% + filter(USPS==this_state) %>% + dplyr::filter(lik_type=='chimeric') %>% + dplyr::select(c("npi_name","reduction","iter_num")) %>% + drop_na(iter_num) %>% # for now, because don't know what it means to have no iteration number + tidyr::spread(npi_name,reduction) + + if(nrow(hnpi_interm_mat)>0){ # if there are interventions for this geoID + hnpi_interm_mat %>% + dplyr::select(where(~!sd(.) == 0)) # remove any columns with zero standard deviation (ie were not fit) + } + + names(hnpi_interm_mat) <- str_remove(names(hnpi_interm_mat),fixed(paste0(this_state_name,'_'),ignore_case=TRUE)) # remove states names from intervention names + names(hnpi_interm_mat) <- str_remove(names(hnpi_interm_mat),fixed(paste0(this_state,'_'),ignore_case=TRUE)) # remove states names from intervention names + + fit_param_mat <- snpi_interm_mat %>% left_join(hpar_interm_mat) %>% left_join(hnpi_interm_mat) + + }else{ + + fit_param_mat <- snpi_interm_mat %>% left_join(hpar_interm_mat) + } + + fit_param_mat$iter_num <- NULL + + fit_param_cor = cor(fit_param_mat) + + all_fit_param_cor <-rbind(all_fit_param_cor,mutate(melt(fit_param_cor),state=this_state_name)) + + #corrplot(cor(snpi_interm_mat), method = "ellipse",order = "hclust", addrect = 3, title=this_state_name) + +} +``` + +```{r} +mean_fit_param_cor <- all_fit_param_cor %>% group_by(Var1,Var2) %>% summarize(mean_corr=mean(value)) +mean_fit_param_cor$Var1 <-str_replace_all(mean_fit_param_cor$Var1, + regex(c("seas_jan"="seas_01jan", + "seas_feb"="seas_02feb", + "seas_mar"="seas_03mar", + "seas_apr"="seas_04apr", + "seas_may"="seas_05may", + "seas_jun"="seas_06jun", + "seas_jul"="seas_07jul", + "seas_aug"="seas_08aug", + "seas_sep"="seas_09sep", + "seas_oct"="seas_10oct", + "seas_nov"="seas_11nov", + "seas_dec"="seas_12dec" + ), ignore_case=TRUE)) +mean_fit_param_cor$Var2 <-str_replace_all(mean_fit_param_cor$Var2, + regex(c("seas_jan"="seas_01jan", + "seas_feb"="seas_02feb", + "seas_mar"="seas_03mar", + "seas_apr"="seas_04apr", + "seas_may"="seas_05may", + "seas_jun"="seas_06jun", + "seas_jul"="seas_07jul", + "seas_aug"="seas_08aug", + "seas_sep"="seas_09sep", + "seas_oct"="seas_10oct", + "seas_nov"="seas_11nov", + "seas_dec"="seas_12dec" + ), ignore_case=TRUE)) + +mean_fit_param_cor_mat <- mean_fit_param_cor %>% + tidyr::spread(Var2,mean_corr) %>% + mutate_all(~replace(.,is.na(.),0)) +mean_fit_param_cor_mat$Var1<- NULL + + +corrplot(cor(mean_fit_param_cor_mat), method = "ellipse",order = "hclust", addrect = 4, title="Mean over all states") +corrplot(cor(mean_fit_param_cor_mat), method = "ellipse",order = "alphabet", title="Mean over all states") +``` + + +# Variance of posterior vs proposal step size + +This is a way of measuring whether adaptive MCMC could help. (Ignore covariance for now). If observed std >> perturb_sd, suggests that proposal too small. If observed_sd << perturb_sd, suggests proposal too big. Done for each parameteres + +```{r} + +# calculate variance for runs for a certain range of iterations (after ~100 runs acceptance ->0) +iter_start <-0 +iter_finish <-max(snpi_interm$iter_num, na.rm=TRUE) + +snpi_interm_var_mat <- snpi_interm %>% dplyr::filter(slot_num==1)%>% drop_na() %>% filter(iter_num > iter_start & iter_num < iter_finish) %>% dplyr::filter(lik_type=='chimeric')%>% group_by(USPS,geoid,scenario,pdeath,lik_type,npi_name) %>% dplyr::summarize(npi_sd_observed=sd(reduction)) %>% ungroup() + +# Get the perturbation values used in the simulation from the config file + +snpi_fit_inds <- config$interventions$settings[unlist(lapply(config$interventions$settings, exists, x="perturbation"))] +snpi_fit_names <- names(snpi_fit_inds)# %>% +#str_remove(npi_trimmer) %>% +#unique() # NOTE: this assumes all interventions have inference, if the projections for some geoids do NOT have inference, then you will need to modify the input to the Rt/effectiveness plots + +#get original perturbation +snpi_fit_perturb_sd <- lapply(snpi_fit_names,function(x){config$interventions$settings[[x]]$perturbation$sd}) + +# make a table of NPI name and perturbation +snpi_fit_df <- data.frame(unlist(snpi_fit_names),unlist(snpi_fit_perturb_sd)) +names(snpi_fit_df) <- c("npi_name","npi_sd_input") + +snpi_interm_var_mat <- snpi_interm_var_mat %>% left_join(snpi_fit_df) %>% drop_na() + +snpi_interm_var_mat$npi_sd_fold_change <-snpi_interm_var_mat$npi_sd_observed/snpi_interm_var_mat$npi_sd_input + +snpi_interm_var_mat +# compare + +``` + +# Final likelihood values + +```{r} + +# get final global values + +llik_final <- geodata_all %>% + left_join(llik_final) %>% + select(c(USPS,geoid,ll,sim_num)) + +for (i in unique(llik_final$sim_num)){ + + llik_final_temp <- llik_final%>% dplyr::filter(sim_num==i) + + sumLL <- sum(llik_final_temp$ll) + #print(sumLL) + + llik_final <- llik_final %>% add_row(USPS="TT",geoid="00000",sim_num=i,ll=sumLL) + +} + +``` + +```{r} +# get final chimeric values + +llik_final_chimeric <- llik_interm %>% + dplyr::filter(lik_type == "chimeric") %>% + dplyr::filter(iter_num == max(llik_interm$iter_num,na.rm=TRUE)) %>% + select(c(USPS,geoid,ll,slot_num)) %>% + dplyr::rename(sim_num=slot_num) + +# get the total + +for (i in unique(llik_final_chimeric$sim_num)){ + + llik_final_temp <- llik_final_chimeric%>% dplyr::filter(sim_num==i) + + sumLL <- sum(llik_final_temp$ll) + #print(sumLL) + + llik_final_chimeric <- llik_final_chimeric %>% add_row(USPS="TT",geoid="00000",sim_num=i,ll=sumLL) + +} + +llik_final_combo <- full_join(llik_final,llik_final_chimeric,by = c("USPS", "geoid","sim_num"),suffix = c(".global", ".chimeric")) %>% + dplyr::relocate(USPS,geoid,sim_num) %>% + dplyr::arrange(geoid,sim_num) +``` + + +```{r} +llik_final_combo +``` + + +#OTHER + +Example code to read in SEIR values directly + +```{r} +seir <- load_seir_sims_filtered(outcome_dir = runs_folder, + pdeath_filter=pdeath_default, + incl_geoids = included_geoids_all, + pre_process = function(x){x %>% filter(comp %in% c("S","R"))}, + post_process = function(x){x %>% pivot_wider(names_from=comp, values_from=value)} #make each variable into column + ) + +``` + diff --git a/notebooks/MCMC_diagnosis.html b/notebooks/MCMC_diagnosis.html new file mode 100644 index 000000000..9ffe3bf0d --- /dev/null +++ b/notebooks/MCMC_diagnosis.html @@ -0,0 +1,420 @@ + + + + + + + + + + + + + +R Notebook + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + + + + + +
+

Epidemic timecourse from simulations vs data

+
+

National

+

Using the final parameter values from each chain

+
+**Fig. 1 ** Model calibration to incident cases (A) and incident deaths (B) reported by JHU CSSE for each state at 0.55% IFR assumptions, summed over entire country. (C) [Optional] Model comparison to hospitalizations (not fit). Shaded areas show 95% confidence intervals based on 4independent inference runs and black points/lines indicate data reported by JHU CSSE. +

+Fig. 1 Model calibration to incident cases (A) and incident deaths (B) reported by JHU CSSE for each state at 0.55% IFR assumptions, summed over entire country. (C) [Optional] Model comparison to hospitalizations (not fit). Shaded areas show 95% confidence intervals based on 4independent inference runs and black points/lines indicate data reported by JHU CSSE. +

+
+
+
+

State-level

+

(for subset of states only now)

+
+**Fig. 2** Calibration of estimated incident cases and deaths to reported data from JHU CSSE, and validation of estimates for occupied hospital beds when compared to CDPH data. Here, modeled cases are calculated as a percent of modeled infection that is fit to county data. Black points represent actual data, lines represent means and shading represents the 95% prediction interval for each scenario at 0.55% IFR and 0.5% assumptions. Note that JHU CSSE data were reported as daily cumulative cases and deaths. In this figure, daily cumulative case counts were differenced in order to report the incident cases and deaths. **In comparing the actual and modeled data, we emphasize that limited testing and reporting delays may affect the quality of the reported case data early on in the outbreak.** +

+Fig. 2 Calibration of estimated incident cases and deaths to reported data from JHU CSSE, and validation of estimates for occupied hospital beds when compared to CDPH data. Here, modeled cases are calculated as a percent of modeled infection that is fit to county data. Black points represent actual data, lines represent means and shading represents the 95% prediction interval for each scenario at 0.55% IFR and 0.5% assumptions. Note that JHU CSSE data were reported as daily cumulative cases and deaths. In this figure, daily cumulative case counts were differenced in order to report the incident cases and deaths. In comparing the actual and modeled data, we emphasize that limited testing and reporting delays may affect the quality of the reported case data early on in the outbreak. +

+
+
+
+ + +
+

Correlations between fit parameter values (identifiability analysis)

+

Correlations are for within a particular MCMC chain, treating each iteration as a sample

+
+

For each state individually

+ +
## [1] "California"
+

+
## [1] "Florida"
+

+
## [1] "Illinois"
+

+
## [1] "NewYork"
+

+
## [1] "Ohio"
+

+
## [1] "Texas"
+

+
## [1] "Washington"
+

+
+
+

Average over all states

+

+
+
+
+

Variance of posterior vs proposal step size

+

This is a way of measuring whether adaptive MCMC could help. (Ignore covariance for now). If observed std >> perturb_sd, suggests that proposal too small. If observed_sd << perturb_sd, suggests proposal too big. Done for each parameteres

+
+ +
+
+
+

Final likelihood values

+
+ +
+

#OTHER

+

Example code to read in SEIR values directly

+
+ + + + +
+ + + + + + + + + + + + + + +