Commit 9ece92f
Bug/string agg limit (#133)
* bug/string-agg-limit
* update changelog & regen docs
* Update int_jira__pivot_daily_field_history.sql
* update incremental compatible and regen docs
* regen docs
* update yml
* add tests
* update conversation disablement
* regen docs
* Update int_jira__issue_comments.sql
* update readme
* release review updates
* update readme
* Apply suggestions from code review

Co-authored-by: Jamie Rodriguez <[email protected]>
1 parent 9474332 · commit 9ece92f

16 files changed: +163 -25 lines

CHANGELOG.md (+15)

@@ -1,3 +1,18 @@
+# dbt_jira v0.19.0
+[PR #133](https://github.com/fivetran/dbt_jira/pull/133) contains the following updates:
+
+## Breaking Changes
+- This change is marked as breaking due to its impact on Redshift configurations.
+- For Redshift users, comment data aggregated under the `conversations` field in the `jira__issue_enhanced` table is now disabled by default to prevent consistent errors related to Redshift's varchar length limits.
+  - If you wish to re-enable `conversations` on Redshift, set the `jira_include_conversations` variable to `true` in your `dbt_project.yml`.
+
+## Under the Hood
+- Updated the `comment` seed data to ensure conversations are correctly disabled for Redshift by default.
+- Renamed the `jira_is_databricks_sql_warehouse` macro to `jira_is_incremental_compatible`, which was updated to return `true` if the Databricks runtime is an all-purpose cluster (previously it checked only for a SQL warehouse runtime) or if the target is any other non-Databricks-supported destination.
+  - This update addresses Databricks runtimes (e.g., endpoints and external runtimes) that do not support the `insert_overwrite` incremental strategy used in the `jira__daily_issue_field_history` and `int_jira__pivot_daily_field_history` models.
+  - For Databricks users, the `jira__daily_issue_field_history` and `int_jira__pivot_daily_field_history` models will now apply the incremental strategy only if running on an all-purpose cluster. All other Databricks runtimes will not utilize an incremental strategy.
+- Added consistency tests for the `jira__project_enhanced` and `jira__user_enhanced` models.
+
 # dbt_jira v0.18.0
 [PR #131](https://github.com/fivetran/dbt_jira/pull/131) contains the following updates:
 ## Breaking Changes
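For reference, re-enabling conversations on Redshift as described in the breaking change above would be a small addition to your root `dbt_project.yml`. A minimal sketch, using only the variable named in this changelog:

```yml
vars:
    jira_include_conversations: true # re-enables the `conversations` aggregation on Redshift (disabled there by default as of v0.19.0)
```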

README.md (+22 -6)

@@ -66,7 +66,7 @@ Include the following jira package version in your `packages.yml` file:
 ```yaml
 packages:
   - package: fivetran/jira
-    version: [">=0.18.0", "<0.19.0"]
+    version: [">=0.19.0", "<0.20.0"]
 
 ```
 ### Step 3: Define database and schema variables
@@ -82,14 +82,30 @@ vars:
 Your Jira connector may not sync every table that this package expects. If you do not have the `SPRINT`, `COMPONENT`, or `VERSION` tables synced, add the respective variables to your root `dbt_project.yml` file. Additionally, if you want to remove comment aggregations from your `jira__issue_enhanced` model, add the `jira_include_comments` variable to your root `dbt_project.yml`:
 ```yml
 vars:
-    jira_using_sprints: false # Disable if you do not have the sprint table or do not want sprint-related metrics reported
-    jira_using_components: false # Disable if you do not have the component table or do not want component-related metrics reported
-    jira_using_versions: false # Disable if you do not have the versions table or do not want versions-related metrics reported
-    jira_using_priorities: false # disable if you are not using priorities in Jira
-    jira_include_comments: false # This package aggregates issue comments so that you have a single view of all your comments in the jira__issue_enhanced table. This can cause limit errors if you have a large dataset. Disable to remove this functionality.
+    jira_using_sprints: false # Enabled by default. Disable if you do not have the sprint table or do not want sprint-related metrics reported.
+    jira_using_components: false # Enabled by default. Disable if you do not have the component table or do not want component-related metrics reported.
+    jira_using_versions: false # Enabled by default. Disable if you do not have the versions table or do not want versions-related metrics reported.
+    jira_using_priorities: false # Enabled by default. Disable if you are not using priorities in Jira.
+    jira_include_comments: false # Enabled by default. Disabling will remove the aggregation of comments via the `count_comments` and `conversations` columns in the `jira__issue_enhanced` table.
 ```
+
 ### (Optional) Step 5: Additional configurations
 
+#### Controlling conversation aggregations in `jira__issue_enhanced`
+
+The `dbt_jira` package offers variables to enable or disable conversation aggregations in the `jira__issue_enhanced` table. These settings allow you to manage the amount of data processed and avoid potential performance or limit issues with large datasets.
+
+- `jira_include_conversations`: Controls only the `conversation` [column](https://github.com/fivetran/dbt_jira/blob/main/models/jira.yml#L125-L127) in the `jira__issue_enhanced` table.
+  - Default: Disabled for Redshift due to string size constraints; enabled for other supported warehouses.
+  - Setting this to `false` removes the `conversation` column but retains the `count_comments` field if `jira_include_comments` is still enabled. This is useful if you want a comment count without the full conversation details.
+
+In your `dbt_project.yml` file:
+
+```yml
+vars:
+    jira_include_conversations: false/true # Disabled by default for Redshift; enabled for other supported warehouses.
+```
+
 #### Define daily issue field history columns
 The `jira__daily_issue_field_history` model generates historical data for the columns specified by the `issue_field_history_columns` variable. By default, the only columns tracked are `status`, `status_id`, and `sprint`, but all fields found in the Jira `FIELD` table's `field_name` column can be included in this model. The most recent value of any tracked column is also captured in `jira__issue_enhanced`.
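As a quick illustration of overriding this variable (a minimal sketch; the column list simply mirrors the one used in this package's integration tests later in this commit, and any `field_name` values from the Jira `FIELD` table may be substituted):

```yml
vars:
    issue_field_history_columns: ['summary', 'story points', 'components'] # any `field_name` values from the Jira FIELD table
```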

dbt_project.yml (+1 -1)

@@ -1,5 +1,5 @@
 name: 'jira'
-version: '0.18.0'
+version: '0.19.0'
 config-version: 2
 require-dbt-version: [">=1.3.0", "<2.0.0"]
 vars:

docs/catalog.json (+1 -1)

Large diffs are not rendered by default.

docs/manifest.json (+1 -1)

Large diffs are not rendered by default.

integration_tests/dbt_project.yml (+3 -4)

@@ -1,9 +1,11 @@
 name: 'jira_integration_tests'
-version: '0.18.0'
+version: '0.19.0'
 config-version: 2
 profile: 'integration_tests'
 
 vars:
+  # Comment out the below when generating docs
+  issue_field_history_columns: ['summary', 'story points', 'components']
   jira_source:
     jira_schema: jira_integrations_tests_41
     jira_comment_identifier: "comment"
@@ -28,9 +30,6 @@ vars:
     jira_user_identifier: "user"
     jira_version_identifier: "version"
 
-  # Comment out the below when generating docs
-  issue_field_history_columns: ['summary', 'story points', 'components']
-
 models:
   jira:
     +schema: "{{ 'jira_integrations_tests_sqlw' if target.name == 'databricks-sql' else 'jira' }}"

integration_tests/seeds/comment.csv (+1)

Large diffs are not rendered by default.

integration_tests/tests/consistency/consistency_issue_enhanced.sql (+4 -2)

@@ -4,13 +4,15 @@
         enabled=var('fivetran_validation_tests_enabled', false)
     ) }}
 
+{# Exclude columns that depend on calculations involving the current time in seconds or aggregate strings in a random order, as they will differ between runs. #}
+{% set exclude_columns = ['open_duration_seconds', 'any_assignment_duration_seconds', 'last_assignment_duration_seconds'] %}
 with prod as (
-    select *
+    select {{ dbt_utils.star(from=ref('jira__issue_enhanced'), except=exclude_columns) }}
     from {{ target.schema }}_jira_prod.jira__issue_enhanced
 ),
 
 dev as (
-    select *
+    select {{ dbt_utils.star(from=ref('jira__issue_enhanced'), except=exclude_columns) }}
     from {{ target.schema }}_jira_dev.jira__issue_enhanced
 ),

integration_tests/tests/consistency/consistency_project_enhanced.sql (new file, +49)

@@ -0,0 +1,49 @@
+
+{{ config(
+    tags="fivetran_validations",
+    enabled=var('fivetran_validation_tests_enabled', false)
+) }}
+
+{# Exclude columns that depend on calculations involving the current time in seconds or aggregate strings in a random order, as they will differ between runs. #}
+{% set exclude_columns = ['avg_age_currently_open_seconds', 'avg_age_currently_open_assigned_seconds', 'median_age_currently_open_seconds', 'median_age_currently_open_assigned_seconds', 'epics', 'components'] %}
+
+with prod as (
+    select {{ dbt_utils.star(from=ref('jira__project_enhanced'), except=exclude_columns) }}
+    from {{ target.schema }}_jira_prod.jira__project_enhanced
+),
+
+dev as (
+    select {{ dbt_utils.star(from=ref('jira__project_enhanced'), except=exclude_columns) }}
+    from {{ target.schema }}_jira_dev.jira__project_enhanced
+),
+
+prod_not_in_dev as (
+    -- rows from prod not found in dev
+    select * from prod
+    except distinct
+    select * from dev
+),
+
+dev_not_in_prod as (
+    -- rows from dev not found in prod
+    select * from dev
+    except distinct
+    select * from prod
+),
+
+final as (
+    select
+        *,
+        'from prod' as source
+    from prod_not_in_dev
+
+    union all -- union since we only care if rows are produced
+
+    select
+        *,
+        'from dev' as source
+    from dev_not_in_prod
+)
+
+select *
+from final

integration_tests/tests/consistency/consistency_user_enhanced.sql (new file, +49)

@@ -0,0 +1,49 @@
+
+{{ config(
+    tags="fivetran_validations",
+    enabled=var('fivetran_validation_tests_enabled', false)
+) }}
+
+{# Exclude columns that depend on calculations involving the current time in seconds or aggregate strings in a random order, as they will differ between runs. #}
+{% set exclude_columns = ['avg_age_currently_open_seconds', 'median_age_currently_open_seconds', 'projects'] %}
+
+with prod as (
+    select {{ dbt_utils.star(from=ref('jira__user_enhanced'), except=exclude_columns) }}
+    from {{ target.schema }}_jira_prod.jira__user_enhanced
+),
+
+dev as (
+    select {{ dbt_utils.star(from=ref('jira__user_enhanced'), except=exclude_columns) }}
+    from {{ target.schema }}_jira_dev.jira__user_enhanced
+),
+
+prod_not_in_dev as (
+    -- rows from prod not found in dev
+    select * from prod
+    except distinct
+    select * from dev
+),
+
+dev_not_in_prod as (
+    -- rows from dev not found in prod
+    select * from dev
+    except distinct
+    select * from prod
+),
+
+final as (
+    select
+        *,
+        'from prod' as source
+    from prod_not_in_dev
+
+    union all -- union since we only care if rows are produced
+
+    select
+        *,
+        'from dev' as source
+    from dev_not_in_prod
+)
+
+select *
+from final

macros/jira_is_databricks_sql_warehouse.sql → macros/jira_is_incremental_compatible.sql (renamed, +4 -2)

@@ -1,14 +1,16 @@
-{% macro jira_is_databricks_sql_warehouse() %}
+{% macro jira_is_incremental_compatible() %}
 {% if target.type in ('databricks') %}
     {% set re = modules.re %}
     {% set path_match = target.http_path %}
-    {% set regex_pattern = "sql/.+/warehouses/" %}
+    {% set regex_pattern = "sql/protocol" %}
     {% set match_result = re.search(regex_pattern, path_match) %}
     {% if match_result %}
        {{ return(True) }}
     {% else %}
        {{ return(False) }}
     {% endif %}
+{% elif target.type in ('bigquery','snowflake','postgres','redshift','sqlserver') %}
+    {{ return(True) }}
 {% else %}
     {{ return(False) }}
 {% endif %}
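For context, the new pattern keys off the Databricks target's `http_path`: HTTP paths for all-purpose (interactive) clusters typically start with `sql/protocolv1/...`, while SQL warehouse paths (`/sql/1.0/warehouses/...`) do not contain `sql/protocol`. A hypothetical `profiles.yml` target illustrating which runtimes the macro treats as incremental-compatible (placeholder host, token, and IDs; only `http_path` matters to the macro):

```yml
jira:
  target: databricks
  outputs:
    databricks:
      type: databricks
      schema: jira
      host: <workspace>.cloud.databricks.com
      token: <personal-access-token>
      # All-purpose cluster path matches "sql/protocol" -> macro returns true -> incremental materialization
      http_path: sql/protocolv1/o/1234567890123456/0123-456789-abc123
      # A SQL warehouse path would not match -> macro returns false -> table materialization
      # http_path: /sql/1.0/warehouses/abcdef1234567890
```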

models/intermediate/field_history/int_jira__pivot_daily_field_history.sql (+2 -2)

@@ -1,6 +1,6 @@
 {{
     config(
-        materialized='table' if jira.jira_is_databricks_sql_warehouse() else 'incremental',
+        materialized='incremental' if jira_is_incremental_compatible() else 'table',
         partition_by = {'field': 'valid_starting_at_week', 'data_type': 'date'}
             if target.type not in ['spark','databricks'] else ['valid_starting_at_week'],
         cluster_by = ['valid_starting_at_week'],
@@ -176,4 +176,4 @@ final as (
 )
 
 select *
-from final
+from final

models/intermediate/int_jira__issue_comments.sql (+8 -4)

@@ -20,14 +20,18 @@ agg_comments as (
 
     select
         comment.issue_id,
-        {{ fivetran_utils.string_agg( "comment.created_at || ' - ' || jira_user.user_display_name || ': ' || comment.body", "'\\n'" ) }} as conversation,
         count(comment.comment_id) as count_comments
 
-    from
-        comment
+    {%- if var('jira_include_conversations', False if target.type == 'redshift' else True) %}
+        ,{{ fivetran_utils.string_agg(
+            "comment.created_at || ' - ' || jira_user.user_display_name || ': ' || comment.body",
+            "'\\n'" ) }} as conversation
+    {% endif %}
+
+    from comment
     join jira_user on comment.author_user_id = jira_user.user_id
 
     group by 1
 )
 
-select * from agg_comments
+select * from agg_comments

models/intermediate/int_jira__issue_join.sql (+1 -1)

@@ -110,7 +110,7 @@ join_issue as (
     {% endif %}
 
     {% if var('jira_include_comments', True) %}
-    ,issue_comments.conversation
+    {{ ',issue_comments.conversation' if var('jira_include_conversations', False if target.type == 'redshift' else True) }}
     ,coalesce(issue_comments.count_comments, 0) as count_comments
     {% endif %}

models/jira.yml (+1)

@@ -125,6 +125,7 @@ models:
       - name: conversation
         description: >
           Line-separated list of comments made on this issue, including the timestamp and author name of each comment.
+          (Disabled by default for Redshift.)
       - name: count_comments
         description: The number of comments made on this issues.
       - name: first_assigned_at

models/jira__daily_issue_field_history.sql (+1 -1)

@@ -1,6 +1,6 @@
 {{
     config(
-        materialized='table' if jira.jira_is_databricks_sql_warehouse() else 'incremental',
+        materialized='incremental' if jira_is_incremental_compatible() else 'table',
         partition_by = {'field': 'date_week', 'data_type': 'date'}
             if target.type not in ['spark', 'databricks'] else ['date_week'],
         cluster_by = ['date_week'],
