-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug/string agg limit #133
Bug/string agg limit #133
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fivetran-catfritz thanks for working through this PR! Overall this PR looks really good, I do have an open ended question which I would like your thoughts around before moving forward.
Let me know if you would like to discuss in more detail.
@@ -1,4 +1,4 @@ | |||
{{ config(enabled=var('jira_include_comments', True)) }} | |||
{{ config(enabled=var('jira_include_comments', False if target.type == 'redshift' else True)) }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking a closer look at this now, since you were able to identify that the string_agg was likely the only issue I think we may be going overboard with turning off the entirety of this model by default for Redshift users. Especially when there is still some valuable information here (with the count_comments
field).
Additionally, if we disable these models by default then we don't end up using the stg_jira__comment
staging model at all in the downstream transformations.
Can we instead adjust this model in particular to not disable it entirely, but just provide the conditional on the string_agg to include or not include based on the jira_using_comments
variable. This way we only ignore this problematic field for Redshift while still keeping the other aggregate comment field which could be useful in downstream models.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would then need to make slight adjustments in the int_jira__issue_join
model to account for this dynamic functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fivetran-joemarkiewicz Thanks for the explanation! I updated accordingly, and it's ready for re-review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making this updates @fivetran-catfritz! I have two small comments, but not necessary for blocking approval. Once those are updated this will be good for release review!
README.md
Outdated
jira_using_components: false # Enabled by default. Disable if you do not have the component table or do not want component-related metrics reported. | ||
jira_using_versions: false # Enabled by default. Disable if you do not have the versions table or do not want versions-related metrics reported. | ||
jira_using_priorities: false # Enabled by default. Disable if you are not using priorities in Jira. | ||
jira_include_comments: false # Disabled by default for Redshift, enabled by default for other supported warehouses. This package aggregates issue comments so that you have a single view of all your comments in the jira__issue_enhanced table. This can cause limit errors if you have a large dataset. Disable to remove this functionality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also document the new variable for conversations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh yes updated!
integration_tests/seeds/comment.csv
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small note, but I noticed the end model results in no conversation or count_comments because of this join. It seems that since we don't have any author_user_id
or user_id
combinations within the seed data the inner join always results in no records.
Would you be able to update the seed data to have at least one match so we can verify results in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very interesting 🤔 thanks for showing the results! It must have been a mismatch on my end with using old seed data in place of your updates. This is all good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fivetran-joemarkiewicz thank you! I made the updates, and I think the seed data is ready to go.
README.md
Outdated
jira_using_components: false # Enabled by default. Disable if you do not have the component table or do not want component-related metrics reported. | ||
jira_using_versions: false # Enabled by default. Disable if you do not have the versions table or do not want versions-related metrics reported. | ||
jira_using_priorities: false # Enabled by default. Disable if you are not using priorities in Jira. | ||
jira_include_comments: false # Disabled by default for Redshift, enabled by default for other supported warehouses. This package aggregates issue comments so that you have a single view of all your comments in the jira__issue_enhanced table. This can cause limit errors if you have a large dataset. Disable to remove this functionality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh yes updated!
integration_tests/seeds/comment.csv
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, left some questions and doc suggestions
README.md
Outdated
jira_using_versions: false # Enabled by default. Disable if you do not have the versions table or do not want versions-related metrics reported. | ||
jira_using_priorities: false # Enabled by default. Disable if you are not using priorities in Jira. | ||
jira_include_comments: false # Enabled by default. Disabling will remove the aggregation of comments via the `count_comments` and `conversations` columns in the `jira__issue_enhanced` table. | ||
jira_include_conversations: false # Disabled by default for Redshift, enabled by default for other supported warehouses. Controls only the `conversation` column in the `jira__issue_enhanced` table. Disabling removes `conversation` columns but keeps the `count_comments` field if `jira_include_comments` is still enabled, which can help avoid limit errors with large datasets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kinda wonder if this should be in its own section (or subsection of this section) , since there is no conversation
source table. It's a little different from the other variables here, so I fear this format might be a lil confusing.
Also, if we do this, we can move the info about the default value out from the yml comment, which you currently need to scroll quite a bit to the right to see the end of https://github.com/fivetran/dbt_jira/tree/bug/string-agg-limit?tab=readme-ov-file#step-4-disable-models-for-non-existent-sources
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I moved it!
CHANGELOG.md
Outdated
|
||
## Under the Hood | ||
- Updated the `comment` seed data to ensure conversations are correctly disabled for Redshift by default. | ||
- Updated the `jira_is_databricks_sql_warehouse` macro to `jira_is_incremental_compatible`. This macro now returns `true` if the Databricks runtime is an all-purpose cluster (previously it checked only for a SQL warehouse runtime) or if the target is any other non-Databricks-supported destination. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify that it's renamed
- Updated the `jira_is_databricks_sql_warehouse` macro to `jira_is_incremental_compatible`. This macro now returns `true` if the Databricks runtime is an all-purpose cluster (previously it checked only for a SQL warehouse runtime) or if the target is any other non-Databricks-supported destination. | |
- Renamed the `jira_is_databricks_sql_warehouse` macro to `jira_is_incremental_compatible`. Updated this macro to now return `true` if the Databricks runtime is an all-purpose cluster (previously it checked only for a SQL warehouse runtime) or if the target is any other non-Databricks-supported destination. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fivetran-jamie Thanks for reviewing! I have made the updates.
README.md
Outdated
jira_using_versions: false # Enabled by default. Disable if you do not have the versions table or do not want versions-related metrics reported. | ||
jira_using_priorities: false # Enabled by default. Disable if you are not using priorities in Jira. | ||
jira_include_comments: false # Enabled by default. Disabling will remove the aggregation of comments via the `count_comments` and `conversations` columns in the `jira__issue_enhanced` table. | ||
jira_include_conversations: false # Disabled by default for Redshift, enabled by default for other supported warehouses. Controls only the `conversation` column in the `jira__issue_enhanced` table. Disabling removes `conversation` columns but keeps the `count_comments` field if `jira_include_comments` is still enabled, which can help avoid limit errors with large datasets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I moved it!
CHANGELOG.md
Outdated
|
||
## Under the Hood | ||
- Updated the `comment` seed data to ensure conversations are correctly disabled for Redshift by default. | ||
- Updated the `jira_is_databricks_sql_warehouse` macro to `jira_is_incremental_compatible`. This macro now returns `true` if the Databricks runtime is an all-purpose cluster (previously it checked only for a SQL warehouse runtime) or if the target is any other non-Databricks-supported destination. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great! just a few suggestions in the new readme section
Co-authored-by: Jamie Rodriguez <[email protected]>
PR Overview
This PR will address the following Issue/Feature:
This PR will result in the following new package version:
Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:
PR Checklist
Basic Validation
Please acknowledge that you have successfully performed the following commands locally:
Before marking this PR as "ready for review" the following have been applied:
Detailed Validation
Please share any and all of your validation steps:
Updated the
comment
seed data to have a 65534 character longbody
field (varchar max is 65535) (row 0 'AAA...' below).jira_include_comments
not defaults to false for redshift and does not fail.confirmed still enabled in another warehouse, like BQ for exampple
consistency tests pass
If you had to summarize this PR in an emoji, which would it be?
💃