diagrams, links and minor edits
jim-barlow committed Jan 26, 2024
1 parent 8a6259b commit 88a0e88
Showing 4 changed files with 46 additions and 12 deletions.
4 changes: 2 additions & 2 deletions docs/reference/decodedata/index.md
@@ -6,10 +6,10 @@ _Attribute_ | Value
_**Name**_ | Decode Data
_**Project ID**_ | `decodedata`
_**Access**_ | Licensed
_**Description**_ | Profiles historic event data, then builds and deploys a base query and set of bespoke functions to remodel the data into a simple flat structure. It adds count metrics for `event_names` and type-specific value columns for `event_params` and `user_properties`.
_**Description**_ | Remodels GA4 event data into a flat structure, adding count metrics for `event_names` and type-specific value columns for `event_params` and `user_properties`.

# Deployment
Functions are currently deployed into the following [BigQuery regions](https://cloud.google.com/bigquery/docs/locations), but can be mirrored to additional regions as required:
Decode Data functions are currently deployed into the following [BigQuery regions](https://cloud.google.com/bigquery/docs/locations), but can be mirrored to additional regions as required:

Region Name | Dataset ID
--- | ---
8 changes: 4 additions & 4 deletions docs/reference/decodedata/usage/advanced.md
@@ -25,10 +25,10 @@ Advanced filters are implemented via the `event_options JSON` variable, which is
Note that the placeholders need to be replaced by a valid `JSON` value or array. Placeholder `"<[DATE]>"` would be replaced by e.g. `"2024-01-01"` and placeholder `["<ARRAY<STRING>"]` would be replaced by e.g. `["12345678", "87654321"]`.
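For illustration, a fully populated `event_options` value might then look like the following sketch (the `excluded_stream_ids` key is hypothetical, shown only to demonstrate an array-valued option; `start_date` and `end_date` are documented below):

```json
{
  "start_date": "2024-01-01",
  "end_date": "2024-03-31",
  "excluded_stream_ids": ["12345678", "87654321"]
}
```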

## Data Profile Filters
Note that in each logical case, events are _not_ filtered out, simply excluded from the profiling and therefore not included in the relevant output STRUCT (`event_count`, `event_param`, `user_property`). This means that the row count for the result of a query against the `[dataset_id].GA4_EVENTS` date-bounded table function for a specific date range should _precisely_ match the row count in both the source GA4 table shard range `[dataset_id].events_*` _and_ the output table `[dataset_id].EVENTS`.
Note that in each logical case, events are _not_ filtered out, simply excluded from the profiling and therefore not included in the relevant output `STRUCT` (`event_count`, `event_param`, `user_property`). This means that the row count for the result of a query against the `[dataset_id].GA4_EVENTS` date-bounded table function for a specific date range should _precisely_ match the row count in both the source GA4 table shard range `[dataset_id].events_*` _and_ the output table `[dataset_id].EVENTS`.
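This invariant can be verified with a reconciliation query along the following lines (a sketch only; `dataset_id` and the date range are placeholders):

```sql
-- Row counts should match exactly between the GA4 source shards and the output table
SELECT
  (SELECT COUNT(*)
   FROM `dataset_id.events_*`
   WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131') AS source_rows,
  (SELECT COUNT(*)
   FROM `dataset_id.EVENTS`
   WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31') AS output_rows;
```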

### Date Range
Date ranges are used to only use a specific date range to build the custom decoder functions, only including columns in the decoded model which are observed in the defined date range (in addition to the standard set of columns). Neither or both `start_date` and `end_date` need to be passed in order for the deployment to be executed.
Date ranges restrict the observation window used to build the custom decoder functions, only including columns in the decoded model which are observed in the defined date range (in addition to the standard set of columns). Either neither or both of `start_date` and `end_date` must be passed in order for the deployment to be executed.

!!! info "advanced event_options: `start_date, end_date`"
=== "JSON"
@@ -39,7 +39,7 @@ Date ranges are used to only use a specific date range to build the custom decod
}
```

=== "Static GoogleSQL"
=== "GoogleSQL (Static)"
```sql
DECLARE event_options JSON;

@@ -51,7 +51,7 @@ Date ranges are used to only use the custom decod
"""
```

=== "Dynamic GoogleSQL"
=== "GoogleSQL (Dynamic)"
```sql
DECLARE event_options JSON;
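-- The remainder of this tab is truncated in the diff; one possible dynamic
-- construction is sketched below (an assumption, not the repository's code:
-- key names are taken from the static example, the 90 day window is illustrative):
SET event_options = TO_JSON(STRUCT(
  CAST(DATE_SUB(CURRENT_DATE, INTERVAL 90 DAY) AS STRING) AS start_date,
  CAST(CURRENT_DATE AS STRING) AS end_date
));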

14 changes: 9 additions & 5 deletions docs/reference/decodedata/usage/automation.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,3 @@
# Automation Options
There are several options to ensure that your output table is kept updated in a timely and efficient manner.

## BigQuery Native
The simplest way to automatically synchronise the output `EVENTS` date-partitioned table is to use the native flow runner (`RUN_FLOW`) function, which is deployed with your core resources by default.
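A minimal invocation looks like the following (a sketch; `[dataset_id]` is a placeholder, and the argument semantics follow the incremental example later in this section):

```sql
-- Incremental refresh: rebuild date partitions observed in the trailing 7 days
CALL [dataset_id].RUN_FLOW (CURRENT_DATE - 7, NULL);
```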

@@ -73,7 +70,14 @@ The `start_date` is used to define the observation window in which the new date
```

### Automation Deployment
Native automation is achieved using BigQuery Scheduled Queries, which needs to be enabled for the project. Incremental refresh should be used to minimise costs and control for unpredictable inbound data timing.
Native automation is achieved using BigQuery [Scheduled Queries](../../terminology.md), which needs to be enabled for the project. Incremental refresh should be used to minimise costs and control for unpredictable inbound data timing, and it is recommended to schedule the incremental refresh every hour (and not more frequently than every 30 minutes).

It is also recommended to use a query label to support future integration with job-level cost data, enabling granular cost tracking by GA4 property.
It is also recommended to use the query label `scheduled_query_id` to support integration with job-level cost data, enabling granular cost tracking by GA4 property.

!!! info "`RUN_FLOW`: Incremental, 7 day observation window with `query_label`"
    ```sql
    SET @@query_label = "scheduled_query_id:ga4_dataset_id";

    CALL [dataset_id].RUN_FLOW (CURRENT_DATE - 7, NULL);
    ```
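Once jobs carry this label, per-property cost can be aggregated from the BigQuery `INFORMATION_SCHEMA.JOBS` view, for example (a sketch; the `region-eu` qualifier and 30 day window are illustrative):

```sql
-- Aggregate billed bytes by the scheduled_query_id label value
SELECT
  (SELECT value FROM UNNEST(labels) WHERE key = 'scheduled_query_id') AS ga4_dataset_id,
  ROUND(SUM(total_bytes_billed) / POW(2, 40), 4) AS tib_billed
FROM `region-eu`.INFORMATION_SCHEMA.JOBS
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY ga4_dataset_id;
```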

32 changes: 31 additions & 1 deletion docs/reference/decodedata/usage/resources.md
@@ -12,9 +12,39 @@ The following resources are deployed to the destination dataset in all deploymen
| `EVENTS` | [`DATE-PARTITIONED TABLE`](../../terminology.md) | Partitioned by `event_date` | Output table containing remodelled event data. To connect this table optimally to Looker Studio, use the `event_date` partitioning column as the report date field in Looker Studio.
| `RUN_FLOW`| [`PROCEDURE`](../../terminology.md) | `start_date DATE`, `end_date DATE` | Runs the flow to refresh the output `EVENTS` table, with the behaviour controlled by the arguments.
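As an indicative example of how these resources are invoked (a sketch; `[dataset_id]` is a placeholder, and the `GA4_EVENTS` signature is assumed from its description elsewhere in these docs as a date-bounded table function):

```sql
-- Inspect one day of decoded events via the date-bounded table function
SELECT * FROM [dataset_id].GA4_EVENTS('2024-01-01', '2024-01-01');

-- Refresh the output EVENTS table for the same day
CALL [dataset_id].RUN_FLOW ('2024-01-01', '2024-01-01');
```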

## Architecture
These resources interoperate in the following architectural configuration:

```mermaid
flowchart TB
subgraph GA4 Dataset
subgraph source data
analytics.events[analytics_#########.events_*]
end
subgraph data modelling
analytics.RUN_FLOW((RUN_FLOW))
subgraph analytics.GA4_EVENTS[GA4_EVENTS]
analytics.GA4_event_names[GA4_event_names]
analytics.GA4_event_params[GA4_event_params]
analytics.GA4_user_properties[GA4_user_properties]
end
end
subgraph output data
analytics.EVENTS>EVENTS]
end
analytics.events --> analytics.GA4_EVENTS
analytics.GA4_EVENTS --> analytics.RUN_FLOW --> analytics.EVENTS
end
```

## Usage
### BigQuery
The `EVENTS` date-partitioned table is the output events table to which you connect downstream tools, logic and processes. Note that using the GoogleSQL [`CURRENT_DATE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions#current_date) function enables dynamic ranges to be set in a clear and concise manner:
The `EVENTS` date-partitioned table is the output events table to which you connect downstream tools, logic and processes. Note that using the GoogleSQL [`CURRENT_DATE`](https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions#current_date) function enables dynamic ranges to be set in a clear and concise manner.
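For example, a dynamic trailing 28 day window can be expressed as follows (a sketch; `dataset_id` is a placeholder and the selected columns are illustrative):

```sql
-- Partition pruning on event_date keeps the scanned bytes bounded
SELECT event_date, event_name, COUNT(*) AS events
FROM `dataset_id.EVENTS`
WHERE event_date BETWEEN CURRENT_DATE - 28 AND CURRENT_DATE
GROUP BY event_date, event_name
ORDER BY event_date;
```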

#### Query Data
??? info "basic query example: `EVENTS`"
