Scrape mz_introspection data as CSVs per cluster replica #31951
Conversation
Force-pushed from e82b8ed to 6313548
Force-pushed from d01849a to fdd5ec2
- Convert queries to transactions to account for different cluster session variables per introspection query
- Introduce separate timeouts for connection and query execution
- Extend relation dumping logic for introspection relations

- Create a future instead of spawning a task
- Separate closures and functions into separate functions
- Make the docs link conditional based on the cluster replica
- Cancel queries when not responsive via cancel_token
- Set the max retry duration to 20s until dump_relation gets cancelled
- Set the max number of errors for cluster replicas to 3 until dump_all_relations skips the cluster replica entirely
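The retry and skip behavior described in the commit message (a 20s retry budget per relation, and skipping a replica after 3 errors) can be sketched as follows. This is a minimal std-only illustration, not the PR's code; `retry_with_budget`, `MAX_RETRY_DURATION`, and `MAX_REPLICA_ERRORS` are hypothetical stand-ins for the logic in `dump_relation`/`dump_all_relations`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical retry budget per relation dump (the PR uses 20s).
const MAX_RETRY_DURATION: Duration = Duration::from_secs(20);
/// Hypothetical per-replica error cap (the PR uses 3).
const MAX_REPLICA_ERRORS: usize = 3;

/// Retry `op` until it succeeds, the time budget elapses, or the
/// replica's running error count reaches the cap. The caller skips
/// the replica entirely once `None` is returned with the cap hit.
fn retry_with_budget<T, E>(
    mut op: impl FnMut() -> Result<T, E>,
    error_count: &mut usize,
) -> Option<T> {
    let deadline = Instant::now() + MAX_RETRY_DURATION;
    while Instant::now() < deadline && *error_count < MAX_REPLICA_ERRORS {
        match op() {
            Ok(v) => return Some(v),
            Err(_) => *error_count += 1,
        }
    }
    None
}
```

A replica whose `error_count` already reached the cap is never queried again, which matches the "skips the cluster replica entirely" behavior in the commit message.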
Force-pushed from 82c6483 to e783750
Nice!
@@ -615,57 +615,231 @@ const RELATIONS: &[Relation] = &[
    },
];

-const TIMEOUT: Duration = Duration::from_secs(60);
+static PG_CONNECTION_TIMEOUT: Duration = Duration::from_secs(60);
/// Timeout for a query. We use 6 minutes since it's a good
The comment says 6 minutes but the timeout is 20 seconds?
    anyhow::Error,
> {
    let mut pg_config = PgConfig::from_str(connection_string)?;
    pg_config.connect_timeout(PG_CONNECTION_TIMEOUT);
It might be nice to have the timeout be configurable via a CLI arg? Doesn't need to be fixed in this PR though!
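The reviewer's suggestion of a CLI-configurable timeout could look like the sketch below. This is not part of the PR; the flag name `--pg-connection-timeout-secs` and the helper `connect_timeout_from_args` are hypothetical, and a real implementation would likely use the crate's existing argument parser rather than scanning `std::env::args` by hand.

```rust
use std::time::Duration;

/// Hard-coded default, matching the PG_CONNECTION_TIMEOUT in the diff.
const DEFAULT_CONNECT_TIMEOUT: Duration = Duration::from_secs(60);

/// Return the connection timeout from a hypothetical
/// `--pg-connection-timeout-secs N` flag, falling back to the default
/// when the flag is absent or its value does not parse.
fn connect_timeout_from_args(args: &[String]) -> Duration {
    args.iter()
        .position(|a| a == "--pg-connection-timeout-secs")
        .and_then(|i| args.get(i + 1))
        .and_then(|v| v.parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or(DEFAULT_CONNECT_TIMEOUT)
}
```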
let copy_stream = std::pin::pin!(copy_stream);
let mut reader = StreamReader::new(copy_stream);
tokio::io::copy(&mut reader, &mut file).await?;
Here we should call `file.sync_all().await?` so we make sure the data is fsync-ed to disk.
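The suggestion above can be sketched as follows. The PR uses tokio's async `File`, but the same pattern applies with std, which is used here so the example is self-contained; `copy_relation_to_file` is a hypothetical stand-in for the copy logic in the diff.

```rust
use std::fs::File;
use std::io::Read;

/// Stream `src` into a file at `path`, then fsync before returning.
/// Without `sync_all`, the written bytes may still sit in the OS page
/// cache and be lost on a crash or power failure.
fn copy_relation_to_file(mut src: impl Read, path: &str) -> std::io::Result<()> {
    let mut file = File::create(path)?;
    std::io::copy(&mut src, &mut file)?;
    // Equivalent of the suggested `file.sync_all().await?` in the
    // tokio version: flush file data and metadata to disk.
    file.sync_all()?;
    Ok(())
}
```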
}
// We query the column names to write the header row of the CSV file.
// TODO (SangJunBak): Use `WITH (HEADER TRUE)` once database-issues#2846 is implemented.
I think this comment is stale? Or it should maybe be moved to `copy_relation_to_csv(...)`?
Ah good callout on how it should be closer to copy_relation_to_csv.
let pg_client = Arc::clone(&pg_client);
let relation_name = relation_name.clone();
let relation_category = relation_category.clone();
let mut pg_client_lock = pg_client.lock().await;
Can you document somewhere in this crate why we don't want to run these queries in parallel, and we'd prefer serial execution?
Based on PR #31859

Chain of upstream PRs as of 2025-03-20:
- PR #31859 (Add system catalog dumper for self-managed debug tool): main ← jun/#8908/dump-catalog-2
- jun/#8908/dump-catalog-2 ← jun/#8908/dump-catalog-3

See commit messages for details.

https://github.com/MaterializeInc/database-issues/issues/8908
Motivation
Tips for reviewer
Checklist
$T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.