Add macro to generate row counts for all tables in a schema #923
BindusekharGorintla wants to merge 3 commits into elementary-data:master
This macro generates a row count summary for all tables in a given schema. It queries information_schema.tables to list the tables, then builds a UNION ALL query that returns the row count of each table in that schema.

Benefits
- Automates row count checks across all tables in a schema.
- Useful for data quality monitoring and schema validation.
- Provides a quick snapshot of table sizes during dbt runs.
- Logs the number of tables processed for transparency.
📝 Walkthrough
Adds a new dbt Jinja macro that queries the information_schema for a given schema and emits a UNION ALL SQL query, which returns the run timestamp, catalog, schema, table name, and row count for every table in the computed target schema.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ 3 passed
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@macros/edr/metadata_collection/get_all_table_counts_from_schema.sql`:
- Around line 13-26: The macro currently emits nothing when
results['table_name'] is empty which yields invalid SQL; update the template to
check if results['table_name'] is defined and has length before the for-loop
and, if empty, emit a safe placeholder SELECT (for example: select
'{{run_started_at}}' as date_time, null as catalog_name, null as schema_name,
'NO_TABLES_FOUND' as table_name, 0 as count) so callers always receive valid
SQL; keep the existing for-loop that iterates over results['table_name'] and its
use of results['table_catalog'][loop.index-1],
results['table_schema'][loop.index-1], and results['table_name'][loop.index-1]
when the list is non-empty.
- Line 1: The macro get_all_table_counts_from_schema currently defaults
catalog_name to target.catalog which doesn't exist on some adapters (e.g.,
Postgres/Redshift) and will raise at runtime; change the parameter default to
use adapter-agnostic target.database or make catalog_name optional and fallback
to target.database when target.catalog is undefined inside the macro
(referencing get_all_table_counts_from_schema, the schema_name and catalog_name
parameters) so the macro works across adapters or explicitly guard/branch on
adapter type before using target.catalog.
- Around line 5-9: The SQL directly interpolates schema_to_use into the
catalog_sql string, which risks SQL injection and syntax errors; update the
catalog_sql construction to quote/escape the schema identifier instead of raw
interpolation (use the adapter's quoting/identifier functions or dbt-utils
quoting helper and apply lower() if needed) so that schema_to_use is safely
quoted when used with catalog_name and information_schema.tables; adjust the
template where catalog_sql is defined to call the quoting helper for
schema_to_use and ensure the rest of the query (the FROM {{ catalog_name
}}.information_schema.tables and where clause) uses the quoted value.
- Around line 19-20: Replace the direct interpolations of catalog/schema/table
with adapter-safe quoted identifiers and switch to loop.index0; specifically,
update the SELECT and FROM to use expressions like {{
adapter.quote(results['table_catalog'][loop.index0]) }}, {{
adapter.quote(results['table_schema'][loop.index0]) }}, and {{
adapter.quote(results['table_name'][loop.index0]) }} (keep the visible column
aliases like date_time and relation unchanged) so identifiers are properly
quoted and protected from spaces/reserved words.
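The query-building behavior these comments ask for (a placeholder row when no tables are found, plus quoted identifiers) can be sketched outside dbt. The following is a minimal Python illustration, not the macro itself: `quote_ident`, `sql_literal`, and `build_row_count_sql` are hypothetical helper names, the column aliases follow the placeholder row suggested above, and ANSI double-quote quoting is an assumption (in the macro, `adapter.quote` would choose the adapter's own quoting style).

```python
def quote_ident(name: str) -> str:
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def sql_literal(value: str) -> str:
    """Single-quote a string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_row_count_sql(tables, run_ts: str) -> str:
    """Build one UNION ALL query over (catalog, schema, table) triples.

    `tables` stands in for the rows returned by information_schema.tables;
    `run_ts` is an already formatted run timestamp.
    """
    ts = sql_literal(run_ts)
    if not tables:
        # Placeholder row so callers always receive valid SQL.
        return (
            f"select {ts} as date_time, null as catalog_name, "
            "null as schema_name, 'NO_TABLES_FOUND' as table_name, 0 as count"
        )
    selects = []
    for catalog, schema, table in tables:
        # Quote each part separately so spaces and reserved words are safe.
        relation = ".".join(quote_ident(part) for part in (catalog, schema, table))
        selects.append(
            f"select {ts} as date_time, {sql_literal(catalog)} as catalog_name, "
            f"{sql_literal(schema)} as schema_name, {sql_literal(table)} as table_name, "
            f"count(*) as count from {relation}"
        )
    return "\nunion all\n".join(selects)
```

Note that the names appear twice in each select: as string literals for the output columns and as quoted identifiers in the FROM clause. The two need different escaping, which is why the sketch keeps two separate helpers.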
🧹 Nitpick comments (2)
macros/edr/metadata_collection/get_all_table_counts_from_schema.sql (2)
19-19: `run_started_at` timestamp format may need casting.
The `run_started_at` variable is a Python datetime object. When interpolated as a string, the format may not be compatible with all database timestamp types. Consider explicit casting or formatting.
♻️ Proposed improvement for timestamp handling
    - select '{{run_started_at}}' as date_time, ...
    + select cast('{{run_started_at.strftime("%Y-%m-%d %H:%M:%S")}}' as timestamp) as date_time, ...
Or use the adapter's timestamp literal format for better cross-database compatibility.
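To see why the explicit format matters, the sketch below compares the default string form of a timezone-aware Python datetime with the `strftime` form used in the proposed change. The helper name `timestamp_literal` and the sample value are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timezone


def timestamp_literal(run_started_at: datetime) -> str:
    """Render a datetime as a cast(...) expression with second precision."""
    formatted = run_started_at.strftime("%Y-%m-%d %H:%M:%S")
    return f"cast('{formatted}' as timestamp)"


# Sample value standing in for dbt's run_started_at (assumed timezone-aware, UTC).
run_started_at = datetime(2024, 1, 2, 3, 4, 5, 678901, tzinfo=timezone.utc)

# Default interpolation keeps microseconds and the UTC offset.
print(str(run_started_at))                # 2024-01-02 03:04:05.678901+00:00
# The explicit format drops both, leaving a plain timestamp literal.
print(timestamp_literal(run_started_at))  # cast('2024-01-02 03:04:05' as timestamp)
```

The trade-off is that the formatted value loses sub-second precision and the offset, which is usually acceptable for a per-run snapshot column.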
3-3: Hardcoded schema naming convention may not fit all use cases.
The schema construction `target.schema ~ '_' ~ schema_name` assumes a specific naming convention. Consider making the full schema name passable directly, or document this convention clearly.
💡 Suggested enhancement for flexibility
    -{%- macro get_all_table_counts_from_schema(schema_name, catalog_name = target.catalog) -%}
    -
    -{%- set schema_to_use = target.schema ~ '_' ~ schema_name -%}
    +{%- macro get_all_table_counts_from_schema(schema_name, catalog_name = target.catalog | default(target.database, true), use_target_prefix = true) -%}
    +
    +{%- set schema_to_use = (target.schema ~ '_' ~ schema_name) if use_target_prefix else schema_name -%}
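The resolution rules in this suggestion can be sketched in Python to make the two fallbacks explicit. This is an illustration under assumptions: `resolve_schema` and `resolve_catalog` are hypothetical names, and `target` is modelled as a plain dict rather than dbt's target object.

```python
from typing import Optional


def resolve_schema(target_schema: str, schema_name: str,
                   use_target_prefix: bool = True) -> str:
    """Prefix schema_name with the target schema unless the caller opts out."""
    return f"{target_schema}_{schema_name}" if use_target_prefix else schema_name


def resolve_catalog(target: dict, catalog_name: Optional[str] = None) -> str:
    """Use an explicit catalog if given, else target catalog, else target database.

    Jinja's default(value, true) falls back for undefined and falsy values
    alike; chaining with `or` mirrors that behavior here.
    """
    return catalog_name or target.get("catalog") or target["database"]
```

With `use_target_prefix` left at its default, `resolve_schema("dev", "sales")` yields `dev_sales`, matching the original convention; passing `False` uses `sales` as given.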
Added database Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Table and schema names Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>