Add macro to generate row counts for all tables in a schema#923

Closed
BindusekharGorintla wants to merge 3 commits into elementary-data:master from BindusekharGorintla:patch-1

Conversation


@BindusekharGorintla BindusekharGorintla commented Feb 3, 2026

This macro generates a row count summary for all tables in a given schema. It dynamically queries information_schema.tables to list the schema's tables, then builds a UNION ALL query that returns the row count for each table in that schema.

Benefits

  • Automates row count checks across all tables in a schema.
  • Useful for data quality monitoring and schema validation.
  • Provides a quick snapshot of table sizes during dbt runs.
  • Logs the number of tables processed for transparency.
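Based on the description above, the macro likely looks roughly like the following sketch. This is a reconstruction from the PR summary, not the exact committed code; the column order returned by information_schema.tables and its name casing vary by adapter:

```sql
{%- macro get_all_table_counts_from_schema(schema_name, catalog_name=target.database) -%}
{#- Reconstructed sketch: enumerate all tables in the given schema -#}
{%- set tables_query -%}
    select table_catalog, table_schema, table_name
    from {{ catalog_name }}.information_schema.tables
    where lower(table_schema) = lower('{{ schema_name }}')
{%- endset -%}
{%- set results = run_query(tables_query) -%}
{#- Log how many tables will be processed, for transparency -#}
{%- do log("Processing " ~ results.rows | length ~ " tables from schema " ~ schema_name, info=true) -%}
{%- for row in results.rows %}
    select
        '{{ run_started_at }}' as date_time,
        '{{ row[0] }}' as catalog_name,
        '{{ row[1] }}' as schema_name,
        '{{ row[2] }}' as table_name,
        count(*) as row_count
    from {{ row[0] }}.{{ row[1] }}.{{ row[2] }}
    {%- if not loop.last %}
    union all
    {%- endif %}
{%- endfor %}
{%- endmacro -%}
```

Calling {{ get_all_table_counts_from_schema('analytics') }} from a model or run-operation would then render a single UNION ALL query over every table found in the schema.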

Summary by CodeRabbit

  • New Features
    • Added schema-wide row-count collection for metadata tracking and monitoring; automatically enumerates all tables in a schema and produces per-table row counts.
    • Each record includes query timestamp, catalog, schema and table identifiers to aid auditing, trend analysis, and alerting for data volume changes.

@github-actions
Contributor

github-actions bot commented Feb 3, 2026

👋 @BindusekharGorintla
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@coderabbitai

coderabbitai bot commented Feb 3, 2026

Warning

Rate limit exceeded

@BindusekharGorintla has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 11 minutes and 26 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Adds a new dbt Jinja macro that queries the information_schema for a given schema and emits a UNION ALL SQL query which returns run timestamp, catalog, schema, table names, and row counts for every table in the computed target schema.

Changes

Cohort / File(s) Summary

New Metadata Collection Macro: macros/edr/metadata_collection/get_all_table_counts_from_schema.sql
Adds get_all_table_counts_from_schema(schema_name, catalog_name = target.catalog | default(target.database, true)). Computes the target schema as target.schema + '_' + schema_name, queries information_schema.tables for tables, logs the total table count, and generates a UNION ALL of per-table SELECTs returning run_started_at, table_catalog, table_schema, table_name, and COUNT(*) for each table.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 I hop through schemas, one by one,
I ask the catalog what tables live there,
I stitch their counts into a single run,
A tiny carrot of data I share,
Row by row — tally, union, and flair. 🍃

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The pull request title accurately and concisely summarizes the main change: adding a macro to generate row counts for tables in a schema, which matches the file addition and objectives.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; check skipped.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@macros/edr/metadata_collection/get_all_table_counts_from_schema.sql`:
- Around line 13-26: The macro currently emits nothing when
results['table_name'] is empty which yields invalid SQL; update the template to
check if results['table_name'] is defined and has length before the for-loop
and, if empty, emit a safe placeholder SELECT (for example: select
'{{run_started_at}}' as date_time, null as catalog_name, null as schema_name,
'NO_TABLES_FOUND' as table_name, 0 as count) so callers always receive valid
SQL; keep the existing for-loop that iterates over results['table_name'] and its
use of results['table_catalog'][loop.index-1],
results['table_schema'][loop.index-1], and results['table_name'][loop.index-1]
when the list is non-empty.
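As a sketch, the guard described above could look like the following, assuming the `results` dict-of-lists shape the comment refers to; the fallback row keeps the rendered SQL valid when the schema has no tables:

```sql
{%- if results['table_name'] is defined and results['table_name'] | length > 0 -%}
    {%- for table_name in results['table_name'] %}
    select '{{ run_started_at }}' as date_time,
           '{{ results['table_catalog'][loop.index - 1] }}' as catalog_name,
           '{{ results['table_schema'][loop.index - 1] }}' as schema_name,
           '{{ table_name }}' as table_name,
           count(*) as count
    from {{ results['table_catalog'][loop.index - 1] }}.{{ results['table_schema'][loop.index - 1] }}.{{ table_name }}
    {%- if not loop.last %} union all {%- endif %}
    {%- endfor -%}
{%- else -%}
    {#- Fallback row so callers always receive valid SQL -#}
    select '{{ run_started_at }}' as date_time,
           null as catalog_name,
           null as schema_name,
           'NO_TABLES_FOUND' as table_name,
           0 as count
{%- endif -%}
```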
- Line 1: The macro get_all_table_counts_from_schema currently defaults
catalog_name to target.catalog which doesn't exist on some adapters (e.g.,
Postgres/Redshift) and will raise at runtime; change the parameter default to
use adapter-agnostic target.database or make catalog_name optional and fallback
to target.database when target.catalog is undefined inside the macro
(referencing get_all_table_counts_from_schema, the schema_name and catalog_name
parameters) so the macro works across adapters or explicitly guard/branch on
adapter type before using target.catalog.
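One way to sketch the adapter-agnostic fallback described above: `target` behaves like a dict in the dbt Jinja context, so resolving the catalog inside the macro body avoids evaluating `target.catalog` at signature time on adapters that do not define it:

```sql
{%- macro get_all_table_counts_from_schema(schema_name, catalog_name=none) -%}
{#- Resolve the catalog lazily so adapters without target.catalog still work -#}
{%- if catalog_name is none -%}
    {%- set catalog_name = target.get('catalog', target.database) -%}
{%- endif -%}
...
{%- endmacro -%}
```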
- Around line 5-9: The SQL directly interpolates schema_to_use into the
catalog_sql string, which risks SQL injection and syntax errors; update the
catalog_sql construction to quote/escape the schema identifier instead of raw
interpolation (use the adapter's quoting/identifier functions or dbt-utils
quoting helper and apply lower() if needed) so that schema_to_use is safely
quoted when used with catalog_name and information_schema.tables; adjust the
template where catalog_sql is defined to call the quoting helper for
schema_to_use and ensure the rest of the query (the FROM {{ catalog_name
}}.information_schema.tables and where clause) uses the quoted value.
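Escaping the interpolated schema name, as suggested, could be sketched like this; `replace("'", "''")` doubles any embedded single quotes so the string literal stays valid and injection-safe:

```sql
{%- set catalog_sql -%}
    select table_catalog, table_schema, table_name
    from {{ catalog_name }}.information_schema.tables
    where lower(table_schema) = lower('{{ schema_to_use | replace("'", "''") }}')
{%- endset -%}
```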
- Around line 19-20: Replace the direct interpolations of catalog/schema/table
with adapter-safe quoted identifiers and switch to loop.index0; specifically,
update the SELECT and FROM to use expressions like {{
adapter.quote(results['table_catalog'][loop.index0]) }}, {{
adapter.quote(results['table_schema'][loop.index0]) }}, and {{
adapter.quote(results['table_name'][loop.index0]) }} (keep the visible column
aliases like date_time and relation unchanged) so identifiers are properly
quoted and protected from spaces/reserved words.
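Applied to the loop body, the quoting change described above might read as follows; `adapter.quote` is dbt's built-in adapter-aware identifier quoting, and `loop.index0` replaces the `loop.index - 1` arithmetic:

```sql
{%- for table_name in results['table_name'] %}
    select '{{ run_started_at }}' as date_time,
           '{{ results['table_catalog'][loop.index0] }}' as catalog_name,
           '{{ results['table_schema'][loop.index0] }}' as schema_name,
           '{{ table_name }}' as table_name,
           count(*) as count
    from {{ adapter.quote(results['table_catalog'][loop.index0]) }}.{{ adapter.quote(results['table_schema'][loop.index0]) }}.{{ adapter.quote(results['table_name'][loop.index0]) }}
    {%- if not loop.last %} union all {%- endif %}
{%- endfor %}
```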
🧹 Nitpick comments (2)
macros/edr/metadata_collection/get_all_table_counts_from_schema.sql (2)

19-19: run_started_at timestamp format may need casting.

The run_started_at variable is a Python datetime object. When interpolated as a string, the format may not be compatible with all database timestamp types. Consider explicit casting or formatting.

♻️ Proposed improvement for timestamp handling
-    select '{{run_started_at}}' as date_time, ...
+    select cast('{{run_started_at.strftime("%Y-%m-%d %H:%M:%S")}}' as timestamp) as date_time, ...

Or use the adapter's timestamp literal format for better cross-database compatibility.


3-3: Hardcoded schema naming convention may not fit all use cases.

The schema construction target.schema ~ '_' ~ schema_name assumes a specific naming convention. Consider making the full schema name passable directly, or document this convention clearly.

💡 Suggested enhancement for flexibility
-{%- macro get_all_table_counts_from_schema(schema_name, catalog_name = target.catalog) -%}
-
-{%- set schema_to_use = target.schema ~ '_' ~ schema_name -%}
+{%- macro get_all_table_counts_from_schema(schema_name, catalog_name = target.catalog | default(target.database, true), use_target_prefix = true) -%}
+
+{%- set schema_to_use = (target.schema ~ '_' ~ schema_name) if use_target_prefix else schema_name -%}

Added database

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Table and schema names

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@BindusekharGorintla BindusekharGorintla closed this by deleting the head repository Feb 9, 2026
