Bug/string agg limit #133

Merged
merged 15 commits into from Nov 12, 2024
Changes from 12 commits
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,18 @@
# dbt_jira v0.19.0
[PR #133](https://github.com/fivetran/dbt_jira/pull/133) contains the following updates:

## Breaking Changes
- This change is marked as breaking due to its impact on Redshift configurations.
- For Redshift users, comment data aggregated under the `conversations` field in the `jira__issue_enhanced` table is now disabled by default to prevent consistent errors related to Redshift's varchar length limits.
- If you wish to re-enable `conversations` on Redshift, set the `jira_include_conversations` variable to `true` in your `dbt_project.yml`.
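For reference, re-enabling conversations on Redshift would look something like the following in your root `dbt_project.yml` (a sketch based on the variable described above):

```yml
vars:
  jira_include_conversations: true # Re-enables `conversations` on Redshift; may reintroduce varchar-limit errors on large datasets.
```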

## Under the Hood
- Updated the `comment` seed data to ensure conversations are correctly disabled for Redshift by default.
- Updated the `jira_is_databricks_sql_warehouse` macro to `jira_is_incremental_compatible`. This macro now returns `true` if the Databricks runtime is an all-purpose cluster (previously it checked only for a SQL warehouse runtime) or if the target is any other non-Databricks-supported destination.
Contributor
Just to clarify that it's renamed

Suggested change
- Updated the `jira_is_databricks_sql_warehouse` macro to `jira_is_incremental_compatible`. This macro now returns `true` if the Databricks runtime is an all-purpose cluster (previously it checked only for a SQL warehouse runtime) or if the target is any other non-Databricks-supported destination.
- Renamed the `jira_is_databricks_sql_warehouse` macro to `jira_is_incremental_compatible`. Updated this macro to now return `true` if the Databricks runtime is an all-purpose cluster (previously it checked only for a SQL warehouse runtime) or if the target is any other non-Databricks-supported destination.

Contributor Author
Updated!

- This update addresses Databricks runtimes (e.g., endpoints and external runtimes) that do not support the `insert_overwrite` incremental strategy used in the `jira__daily_issue_field_history` and `int_jira__pivot_daily_field_history` models.
- For Databricks users, the `jira__daily_issue_field_history` and `int_jira__pivot_daily_field_history` models will now apply the incremental strategy only if running on an all-purpose cluster. All other Databricks runtimes will not utilize an incremental strategy.
- Added consistency tests for the `jira__project_enhanced` and `jira__user_enhanced` models.

# dbt_jira v0.18.0
[PR #131](https://github.com/fivetran/dbt_jira/pull/131) contains the following updates:
## Breaking Changes
14 changes: 8 additions & 6 deletions README.md
@@ -66,7 +66,7 @@ Include the following jira package version in your `packages.yml` file:
```yaml
packages:
- package: fivetran/jira
version: [">=0.18.0", "<0.19.0"]
version: [">=0.19.0", "<0.20.0"]

```
### Step 3: Define database and schema variables
@@ -82,11 +82,13 @@ vars:
Your Jira connector may not sync every table that this package expects. If you do not have the `SPRINT`, `COMPONENT`, or `VERSION` tables synced, add the respective variables to your root `dbt_project.yml` file. Additionally, if you want to remove comment aggregations from your `jira__issue_enhanced` model, add the `jira_include_comments` variable to your root `dbt_project.yml`:
```yml
vars:
jira_using_sprints: false # Disable if you do not have the sprint table or do not want sprint-related metrics reported
jira_using_components: false # Disable if you do not have the component table or do not want component-related metrics reported
jira_using_versions: false # Disable if you do not have the versions table or do not want versions-related metrics reported
jira_using_priorities: false # disable if you are not using priorities in Jira
jira_include_comments: false # This package aggregates issue comments so that you have a single view of all your comments in the jira__issue_enhanced table. This can cause limit errors if you have a large dataset. Disable to remove this functionality.
jira_using_sprints: false # Enabled by default. Disable if you do not have the sprint table or do not want sprint-related metrics reported.
jira_using_components: false # Enabled by default. Disable if you do not have the component table or do not want component-related metrics reported.
jira_using_versions: false # Enabled by default. Disable if you do not have the versions table or do not want versions-related metrics reported.
jira_using_priorities: false # Enabled by default. Disable if you are not using priorities in Jira.
jira_include_comments: false # Enabled by default. Disabling will remove the aggregation of comments via the `count_comments` and `conversations` columns in the `jira__issue_enhanced` table.
jira_include_conversations: false # Disabled by default for Redshift, enabled by default for other supported warehouses. Controls only the `conversation` column in the `jira__issue_enhanced` table. Disabling removes `conversation` columns but keeps the `count_comments` field if `jira_include_comments` is still enabled, which can help avoid limit errors with large datasets.
Contributor

I kinda wonder if this should be in its own section (or a subsection of this section), since there is no conversation source table. It's a little different from the other variables here, so I fear this format might be a lil confusing.

Also, if we do this, we can move the info about the default value out from the yml comment, which you currently need to scroll quite a bit to the right to see the end of https://github.com/fivetran/dbt_jira/tree/bug/string-agg-limit?tab=readme-ov-file#step-4-disable-models-for-non-existent-sources

@fivetran-catfritz

Contributor Author

Good point. I moved it!


```
### (Optional) Step 5: Additional configurations

2 changes: 1 addition & 1 deletion dbt_project.yml
@@ -1,5 +1,5 @@
name: 'jira'
version: '0.18.0'
version: '0.19.0'
config-version: 2
require-dbt-version: [">=1.3.0", "<2.0.0"]
vars:
2 changes: 1 addition & 1 deletion docs/catalog.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

7 changes: 3 additions & 4 deletions integration_tests/dbt_project.yml
@@ -1,9 +1,11 @@
name: 'jira_integration_tests'
version: '0.18.0'
version: '0.19.0'
config-version: 2
profile: 'integration_tests'

vars:
# Comment out the below when generating docs
issue_field_history_columns: ['summary', 'story points', 'components']
jira_source:
jira_schema: jira_integrations_tests_41
jira_comment_identifier: "comment"
@@ -28,9 +28,6 @@ vars:
jira_user_identifier: "user"
jira_version_identifier: "version"

# Comment out the below when generating docs
issue_field_history_columns: ['summary', 'story points', 'components']

models:
jira:
+schema: "{{ 'jira_integrations_tests_sqlw' if target.name == 'databricks-sql' else 'jira' }}"
1 change: 1 addition & 0 deletions integration_tests/seeds/comment.csv
Contributor

Small note, but I noticed the end model results in no conversation or count_comments because of this join. It seems that since we don't have any author_user_id or user_id combinations within the seed data the inner join always results in no records.

Would you be able to update the seed data to have at least one match so we can verify results in the future?

Contributor

(screenshot attached)

Contributor Author

I updated the seed data to include one match, and the counts are now showing up. Were you using the seed data from my branch?

(screenshot attached: 2024-11-11, 5:33 PM)

Contributor

Very interesting 🤔 thanks for showing the results! It must have been a mismatch on my end with using old seed data in place of your updates. This is all good to me!

Large diffs are not rendered by default.

@@ -4,13 +4,15 @@
enabled=var('fivetran_validation_tests_enabled', false)
) }}

{% set exclude_columns = ['open_duration_seconds', 'any_assignment_duration_seconds', 'last_assignment_duration_seconds'] %}

with prod as (
select *
select {{ dbt_utils.star(from=ref('jira__issue_enhanced'), except=exclude_columns) }}
from {{ target.schema }}_jira_prod.jira__issue_enhanced
),

dev as (
select *
select {{ dbt_utils.star(from=ref('jira__issue_enhanced'), except=exclude_columns) }}
from {{ target.schema }}_jira_dev.jira__issue_enhanced
),

@@ -0,0 +1,48 @@

{{ config(
tags="fivetran_validations",
enabled=var('fivetran_validation_tests_enabled', false)
) }}

{% set exclude_columns = ['avg_age_currently_open_seconds', 'avg_age_currently_open_assigned_seconds', 'median_age_currently_open_seconds', 'median_age_currently_open_assigned_seconds', 'epics', 'components'] %}
fivetran-jamie marked this conversation as resolved.

with prod as (
select {{ dbt_utils.star(from=ref('jira__project_enhanced'), except=exclude_columns) }}
from {{ target.schema }}_jira_prod.jira__project_enhanced
),

dev as (
select {{ dbt_utils.star(from=ref('jira__project_enhanced'), except=exclude_columns) }}
from {{ target.schema }}_jira_dev.jira__project_enhanced
),

prod_not_in_dev as (
-- rows from prod not found in dev
select * from prod
except distinct
select * from dev
),

dev_not_in_prod as (
-- rows from dev not found in prod
select * from dev
except distinct
select * from prod
),

final as (
select
*,
'from prod' as source
from prod_not_in_dev

union all -- union since we only care if rows are produced

select
*,
'from dev' as source
from dev_not_in_prod
)

select *
from final
48 changes: 48 additions & 0 deletions integration_tests/tests/consistency/consistency_user_enhanced.sql
@@ -0,0 +1,48 @@

{{ config(
tags="fivetran_validations",
enabled=var('fivetran_validation_tests_enabled', false)
) }}

{% set exclude_columns = ['avg_age_currently_open_seconds', 'median_age_currently_open_seconds', 'projects'] %}
fivetran-jamie marked this conversation as resolved.

with prod as (
select {{ dbt_utils.star(from=ref('jira__user_enhanced'), except=exclude_columns) }}
from {{ target.schema }}_jira_prod.jira__user_enhanced
),

dev as (
select {{ dbt_utils.star(from=ref('jira__user_enhanced'), except=exclude_columns) }}
from {{ target.schema }}_jira_dev.jira__user_enhanced
),

prod_not_in_dev as (
-- rows from prod not found in dev
select * from prod
except distinct
select * from dev
),

dev_not_in_prod as (
-- rows from dev not found in prod
select * from dev
except distinct
select * from prod
),

final as (
select
*,
'from prod' as source
from prod_not_in_dev

union all -- union since we only care if rows are produced

select
*,
'from dev' as source
from dev_not_in_prod
)

select *
from final
@@ -1,14 +1,16 @@
{% macro jira_is_databricks_sql_warehouse() %}
{% macro jira_is_incremental_compatible() %}
{% if target.type in ('databricks') %}
{% set re = modules.re %}
{% set path_match = target.http_path %}
{% set regex_pattern = "sql/.+/warehouses/" %}
{% set regex_pattern = "sql/protocol" %}
{% set match_result = re.search(regex_pattern, path_match) %}
{% if match_result %}
{{ return(True) }}
{% else %}
{{ return(False) }}
{% endif %}
{% elif target.type in ('bigquery','snowflake','postgres','redshift','sqlserver') %}
{{ return(True) }}
{% else %}
{{ return(False) }}
{% endif %}
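The renamed macro's branching can be sketched outside of Jinja as follows. This is a hypothetical Python re-implementation for illustration only; `target_type` and `http_path` stand in for dbt's `target.type` and `target.http_path` and are not part of the package:

```python
import re

# Hypothetical sketch of jira_is_incremental_compatible's logic (not the
# package's actual Jinja code).
def is_incremental_compatible(target_type: str, http_path: str = "") -> bool:
    if target_type == "databricks":
        # All-purpose clusters expose "sql/protocol" in their HTTP path; the
        # previous check matched SQL-warehouse paths ("sql/.+/warehouses/") instead.
        return re.search("sql/protocol", http_path) is not None
    # Any other supported destination is treated as incremental-compatible.
    return target_type in ("bigquery", "snowflake", "postgres", "redshift", "sqlserver")
```

With this sketch, a Databricks all-purpose cluster path returns `True`, a SQL-warehouse path returns `False`, and the other supported warehouses return `True` unconditionally.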
@@ -1,6 +1,6 @@
{{
config(
materialized='table' if jira.jira_is_databricks_sql_warehouse() else 'incremental',
materialized='incremental' if jira_is_incremental_compatible() else 'table',
partition_by = {'field': 'valid_starting_at_week', 'data_type': 'date'}
if target.type not in ['spark','databricks'] else ['valid_starting_at_week'],
cluster_by = ['valid_starting_at_week'],
@@ -176,4 +176,4 @@ final as (
)

select *
from final
from final
12 changes: 8 additions & 4 deletions models/intermediate/int_jira__issue_comments.sql
@@ -20,14 +20,18 @@ agg_comments as (

select
comment.issue_id,
{{ fivetran_utils.string_agg( "comment.created_at || ' - ' || jira_user.user_display_name || ': ' || comment.body", "'\\n'" ) }} as conversation,
count(comment.comment_id) as count_comments

from
comment
{%- if var('jira_include_conversations', False if target.type == 'redshift' else True) %}
,{{ fivetran_utils.string_agg(
"comment.created_at || ' - ' || jira_user.user_display_name || ': ' || comment.body",
"'\\n'" ) }} as conversation
{% endif %}

from comment
join jira_user on comment.author_user_id = jira_user.user_id

group by 1
)

select * from agg_comments
select * from agg_comments
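Outside of SQL, the aggregation above can be sketched in Python. The comment rows here are hypothetical (not the package's seed data), and `include_conversations` mirrors `var('jira_include_conversations', ...)`:

```python
# Sketch of the newline-separated conversation aggregation that
# fivetran_utils.string_agg performs, with hypothetical comment rows.
comments = [
    {"created_at": "2024-11-01", "author": "Ada", "body": "Looks good"},
    {"created_at": "2024-11-02", "author": "Grace", "body": "Ship it"},
]

# Mirrors var('jira_include_conversations', False if target.type == 'redshift' else True)
include_conversations = True

# count_comments is always produced; conversation only when enabled.
row = {"count_comments": len(comments)}
if include_conversations:
    row["conversation"] = "\n".join(
        f"{c['created_at']} - {c['author']}: {c['body']}" for c in comments
    )
```

This is why disabling conversations on Redshift avoids varchar-limit errors while keeping the comment count intact: the long concatenated string is simply never built.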
2 changes: 1 addition & 1 deletion models/intermediate/int_jira__issue_join.sql
@@ -110,7 +110,7 @@ join_issue as (
{% endif %}

{% if var('jira_include_comments', True) %}
,issue_comments.conversation
{{ ',issue_comments.conversation' if var('jira_include_conversations', False if target.type == 'redshift' else True) }}
,coalesce(issue_comments.count_comments, 0) as count_comments
{% endif %}

1 change: 1 addition & 0 deletions models/jira.yml
@@ -125,6 +125,7 @@ models:
- name: conversation
description: >
Line-separated list of comments made on this issue, including the timestamp and author name of each comment.
(Disabled by default for Redshift.)
- name: count_comments
description: The number of comments made on this issue.
- name: first_assigned_at
2 changes: 1 addition & 1 deletion models/jira__daily_issue_field_history.sql
@@ -1,6 +1,6 @@
{{
config(
materialized='table' if jira.jira_is_databricks_sql_warehouse() else 'incremental',
materialized='incremental' if jira_is_incremental_compatible() else 'table',
partition_by = {'field': 'date_week', 'data_type': 'date'}
if target.type not in ['spark', 'databricks'] else ['date_week'],
cluster_by = ['date_week'],