diagrams, links and minor edits
jim-barlow committed Jan 26, 2024
1 parent 8a6259b commit 88a0e88
Showing 4 changed files with 46 additions and 12 deletions.
4 changes: 2 additions & 2 deletions docs/reference/decodedata/index.md
@@ -6,10 +6,10 @@ _Attribute_ | Value
_**Name**_ | Decode Data
_**Project ID**_ | `decodedata`
_**Access**_ | Licensed
_**Description**_ | Profiles historic event data, then builds and deploys a base query and set of bespoke functions to remodel the data into a simple flat structure. It adds count metrics for `event_names` and type-specific value columns for `event_params` and `user_properties`.
_**Description**_ | Remodels GA4 event data into a flat structure, adding count metrics for `event_names` and type-specific value columns for `event_params` and `user_properties`.

# Deployment
Functions are currently deployed into the following [BigQuery regions](https://cloud.google.com/bigquery/docs/locations), but can be mirrored to additional regions as required:
Decode Data functions are currently deployed into the following [BigQuery regions](https://cloud.google.com/bigquery/docs/locations), but can be mirrored to additional regions as required:

Region Name | Dataset ID
--- | ---
8 changes: 4 additions & 4 deletions docs/reference/decodedata/usage/advanced.md
@@ -25,10 +25,10 @@ Advanced filters are implemented via the `event_options JSON` variable, which is
Note that the placeholders need to be replaced by a valid `JSON` value or array. Placeholder `"<[DATE]>"` would be replaced by e.g. `"2024-01-01"` and placeholder `["<ARRAY<STRING>"]` would be replaced by e.g. `["12345678", "87654321"]`.
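For illustration, a fully populated `event_options` value might then look like the following sketch (the `excluded_stream_ids` key is hypothetical, shown only to demonstrate an array-valued option; `start_date` and `end_date` are documented below):

```json
{
  "start_date": "2024-01-01",
  "end_date": "2024-03-31",
  "excluded_stream_ids": ["12345678", "87654321"]
}
```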

## Data Profile Filters
Note that in each logical case, events are _not_ filtered out, simply excluded from the profiling and therefore not included in the relevant output STRUCT (`event_count`, `event_param`, `user_property`). This means that the row count for the result of a query against the `[dataset_id].GA4_EVENTS` date-bounded table function for a specific date range should _precisely_ match the row count in both the source GA4 table shard range `[dataset_id].events_*` _and_ the output table `[dataset_id].EVENTS`.
Note that in each logical case, events are _not_ filtered out, simply excluded from the profiling and therefore not included in the relevant output `STRUCT` (`event_count`, `event_param`, `user_property`). This means that the row count for the result of a query against the `[dataset_id].GA4_EVENTS` date-bounded table function for a specific date range should _precisely_ match the row count in both the source GA4 table shard range `[dataset_id].events_*` _and_ the output table `[dataset_id].EVENTS`.
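This invariant can be verified with a reconciliation query along the following lines (a sketch only; `dataset_id` and the date range are placeholders):

```sql
-- Row counts should match exactly between the GA4 source shards and the output table
SELECT
  (SELECT COUNT(*)
   FROM `dataset_id.events_*`
   WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131') AS source_rows,
  (SELECT COUNT(*)
   FROM `dataset_id.EVENTS`
   WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31') AS output_rows;
```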

### Date Range
Date ranges are used to only use a specific date range to build the custom decoder functions, only including columns in the decoded model which are observed in the defined date range (in addition to the standard set of columns). Neither or both `start_date` and `end_date` need to be passed in order for the deployment to be executed.
Date ranges restrict the observation window used to build the custom decoder functions, only including columns in the decoded model which are observed in the defined date range (in addition to the standard set of columns). Either neither or both of `start_date` and `end_date` must be passed in order for the deployment to be executed.

!!! info "advanced event_options: `start_date, end_date`"
=== "JSON"
@@ -39,7 +39,7 @@ Date ranges are used to only use a specific date range to build the custom decod
}
```

=== "Static GoogleSQL"
=== "GoogleSQL (Static)"
```sql
DECLARE event_options JSON;

@@ -51,7 +51,7 @@ Date ranges are used to only use the custom decod
"""
```

=== "Dynamic GoogleSQL"
=== "GoogleSQL (Dynamic)"
```sql
DECLARE event_options JSON;
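-- The remainder of this tab is truncated in the diff; one possible dynamic
-- construction is sketched below (an assumption, not the repository's code:
-- key names are taken from the static example, the 90 day window is illustrative):
SET event_options = TO_JSON(STRUCT(
  CAST(DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY) AS STRING) AS start_date,
  CAST(CURRENT_DATE AS STRING) AS end_date
));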

14 changes: 9 additions & 5 deletions docs/reference/decodedata/usage/automation.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,3 @@
# Automation Options
There are several options to ensure that your output table is kept updated in a timely and efficient manner.

## BigQuery Native
The simplest way to automatically synchronise the output `EVENTS` date-partitioned table is to use the native flow runner (`RUN_FLOW`) function, which is deployed with your core resources by default.
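A minimal invocation looks like the following (a sketch; `[dataset_id]` is a placeholder, and the argument semantics follow the incremental example later in this section):

```sql
-- Incremental refresh: rebuild date partitions observed in the trailing 7 days
CALL [dataset_id].RUN_FLOW (CURRENT_DATE - 7, NULL);
```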

@@ -73,7 +70,14 @@ The `start_date` is used to define the observation window in which the new date
```

### Automation Deployment
Native automation is achieved using BigQuery Scheduled Queries, which needs to be enabled for the project. Incremental refresh should be used to minimise costs and control for unpredictable inbound data timing.
Native automation is achieved using BigQuery [Scheduled Queries](../../terminology.md), which needs to be enabled for the project. Incremental refresh should be used to minimise costs and control for unpredictable inbound data timing, and it is recommended to schedule the incremental refresh every hour (and not more frequently than every 30 minutes).

It is also recommended to use a query label to support future integration with job-level cost data, enabling granular cost tracking by GA4 property.
It is also recommended to use the query label `scheduled_query_id` to support integration with job-level cost data, enabling granular cost tracking by GA4 property.

!!! info "`RUN_FLOW`: Incremental, 7 day observation window with `query_label`"
    ```sql
    SET @@query_label = "scheduled_query_id:ga4_dataset_id";

    CALL [dataset_id].RUN_FLOW (CURRENT_DATE - 7, NULL);
    ```
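Once jobs carry this label, per-property cost can be aggregated from the BigQuery `INFORMATION_SCHEMA.JOBS` view, for example (a sketch; the `region-eu` qualifier and 30 day window are illustrative):

```sql
-- Aggregate billed bytes by the scheduled_query_id label value
SELECT
  (SELECT value FROM UNNEST(labels) WHERE key = 'scheduled_query_id') AS ga4_dataset_id,
  ROUND(SUM(total_bytes_billed) / POW(2, 40), 4) AS tib_billed
FROM `region-eu`.INFORMATION_SCHEMA.JOBS
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY ga4_dataset_id;
```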

32 changes: 31 additions & 1 deletion docs/reference/decodedata/usage/resources.md
@@ -12,9 +12,39 @@ The following resources are deployed to the destination dataset in all deploymen
| `EVENTS` | [`DATE-PARTITIONED TABLE`](../../terminology.md) | Partitioned by `event_date` | Output table containing remodelled event data. To connect this table optimally to Looker Studio, use the `event_date` partitioning column as the report date field in Looker Studio.
| `RUN_FLOW`| [`PROCEDURE`](../../terminology.md) | `start_date DATE`, `end_date DATE` | Runs the flow to refresh the output `EVENTS` table, with the behaviour controlled by the arguments.
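As an indicative example of how these resources are invoked (a sketch; `[dataset_id]` is a placeholder, and the `GA4_EVENTS` signature is assumed from its description elsewhere in these docs as a date-bounded table function):

```sql
-- Inspect one day of decoded events via the date-bounded table function
SELECT * FROM [dataset_id].GA4_EVENTS('2024-01-01', '2024-01-01');

-- Refresh the output EVENTS table for the same day
CALL [dataset_id].RUN_FLOW ('2024-01-01', '2024-01-01');
```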

## Architecture
These resources interoperate in the following architectural configuration:

```mermaid
flowchart TB
subgraph GA4 Dataset
subgraph source data
analytics.events[analytics_#########.events_*]
end
subgraph data modelling
analytics.RUN_FLOW((RUN_FLOW))
subgraph analytics.GA4_EVENTS[GA4_EVENTS]
analytics.GA4_event_names[GA4_event_names]
analytics.GA4_event_params[GA4_event_params]
analytics.GA4_user_properties[GA4_user_properties]
end
end
subgraph output data
analytics.EVENTS>EVENTS]
end
analytics.events --> analytics.GA4_EVENTS
analytics.GA4_EVENTS --> analytics.RUN_FLOW --> analytics.EVENTS
end
```

## Usage
### BigQuery
The `EVENTS` date-partitioned table is the output events table to which you connect downstream tools, logic and processes. Note that using the GoogleSQL [`CURRENT_DATE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions#current_date) function enables dynamic ranges to be set in a clear and concise manner:
The `EVENTS` date-partitioned table is the output events table to which you connect downstream tools, logic and processes. Note that using the GoogleSQL [`CURRENT_DATE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions#current_date) function enables dynamic ranges to be set in a clear and concise manner.
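For example, a dynamic trailing 28 day window can be expressed as follows (a sketch; `dataset_id` is a placeholder and the selected columns are illustrative):

```sql
-- Partition pruning on event_date keeps the scanned bytes bounded
SELECT event_date, event_name, COUNT(*) AS events
FROM `dataset_id.EVENTS`
WHERE event_date BETWEEN CURRENT_DATE - 28 AND CURRENT_DATE
GROUP BY event_date, event_name
ORDER BY event_date;
```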

#### Query Data
??? info "basic query example: `EVENTS`"
