
Scrape mz_introspection data as CSVs per cluster replica #31951


Merged: 8 commits into jun/#8908/dump-catalog-2 from jun/#8908/dump-catalog-3 on Mar 25, 2025

Conversation

@SangJunBak (Contributor) commented Mar 19, 2025

Based on PR #31859

Chain of upstream PRs as of 2025-03-20

See commit messages for details

Motivation

  • This PR adds a known-desirable feature.

https://github.com/MaterializeInc/database-issues/issues/8908

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@SangJunBak force-pushed the jun/#8908/dump-catalog-3 branch from e82b8ed to 6313548 on March 20, 2025 03:55
@SangJunBak requested a review from ParkMyCar on March 20, 2025 04:43
@SangJunBak marked this pull request as ready for review on March 20, 2025 04:44
@SangJunBak force-pushed the jun/#8908/dump-catalog-2 branch from d01849a to fdd5ec2 on March 20, 2025 04:50
- Convert queries to transactions to account for different cluster session variables per introspection query
- Introduce separate timeouts for connection and query execution
- Extend relation dumping logic to introspection relations
- Create a future instead of spawning a task
- Separate closures and functions into separate functions
- Make the docs link conditional based on the cluster replica
- Cancel unresponsive queries via cancel_token
- Set the max retry duration to 20s, after which dump_relation is cancelled
- Set the max number of errors per cluster replica to 3, after which dump_all_relations skips the replica entirely
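
The retry and cancellation behavior described in the last three commits can be sketched as follows. This is a minimal illustration, not the PR's code: the function names mirror the commit messages (`dump_relation`, `dump_all_relations`), but the signatures, error handling, and use of `tokio_util`'s `CancellationToken` are assumptions.

```rust
use std::time::Duration;

use tokio_util::sync::CancellationToken;

/// Per-relation budget; once spent, the in-flight query is cancelled.
const MAX_RETRY_DURATION: Duration = Duration::from_secs(20);
/// After this many failed relations, the replica is skipped entirely.
const MAX_ERRORS_PER_REPLICA: usize = 3;

/// Hypothetical stand-in for the PR's `dump_relation`.
async fn dump_relation(relation: &str, cancel: &CancellationToken) -> anyhow::Result<()> {
    let _ = (relation, cancel);
    Ok(())
}

/// Sketch of `dump_all_relations`' error handling for one replica.
async fn dump_all_relations(relations: &[&str]) {
    let mut errors = 0;
    for relation in relations {
        let cancel = CancellationToken::new();
        // Bound the attempt (including retries) at MAX_RETRY_DURATION;
        // on timeout, cancel the query rather than leaving it running.
        match tokio::time::timeout(MAX_RETRY_DURATION, dump_relation(relation, &cancel)).await {
            Ok(Ok(())) => {}
            Ok(Err(err)) => {
                eprintln!("failed to dump {relation}: {err}");
                errors += 1;
            }
            Err(_elapsed) => {
                cancel.cancel();
                errors += 1;
            }
        }
        if errors >= MAX_ERRORS_PER_REPLICA {
            eprintln!("too many errors; skipping the rest of this replica");
            return;
        }
    }
}
```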
@SangJunBak force-pushed the jun/#8908/dump-catalog-3 branch from 82c6483 to e783750 on March 20, 2025 04:54
@SangJunBak changed the title from "Extract introspection queries" to "Scrape mz_introspection data as CSVs per cluster replica" on Mar 20, 2025
@ParkMyCar (Member) left a comment:

Nice!

```diff
@@ -615,57 +615,231 @@ const RELATIONS: &[Relation] = &[
     },
 ];

-const TIMEOUT: Duration = Duration::from_secs(60);
+static PG_CONNECTION_TIMEOUT: Duration = Duration::from_secs(60);
+/// Timeout for a query. We use 6 minutes since it's a good
```
@ParkMyCar (Member):

The comment says 6 minutes but the timeout is 20 seconds?

```rust
    anyhow::Error,
> {
    let mut pg_config = PgConfig::from_str(connection_string)?;
    pg_config.connect_timeout(PG_CONNECTION_TIMEOUT);
```
@ParkMyCar (Member):

It might be nice to have the timeout be configurable via a CLI arg? Doesn't need to be fixed in this PR though!
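
For reference, a configurable timeout could look roughly like this with `clap`; the flag name and default are invented for illustration and are not part of the PR:

```rust
use std::time::Duration;

use clap::Parser;

#[derive(Parser)]
struct Args {
    /// Connection timeout in seconds (hypothetical flag; the PR
    /// currently hardcodes PG_CONNECTION_TIMEOUT to 60 seconds).
    #[arg(long, default_value_t = 60)]
    pg_connection_timeout: u64,
}

fn main() {
    let args = Args::parse();
    let timeout = Duration::from_secs(args.pg_connection_timeout);
    // Would be threaded through to pg_config.connect_timeout(timeout).
    println!("using connect timeout: {timeout:?}");
}
```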

```rust
let copy_stream = std::pin::pin!(copy_stream);
let mut reader = StreamReader::new(copy_stream);
tokio::io::copy(&mut reader, &mut file).await?;
```

@ParkMyCar (Member):

Here we should call file.sync_all().await? so we make sure the data is fsync-ed to disk.
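
For context, the suggested change would slot in right after the copy, roughly like this. The surrounding signature and types are assumptions; only the `sync_all` call is the reviewer's suggestion:

```rust
use tokio::fs::File;
use tokio::io::AsyncWriteExt;
use tokio_util::io::StreamReader;

/// Sketch: copy the COPY-TO-STDOUT stream into the CSV file, then
/// fsync before returning so a crash cannot leave the file only
/// partially flushed. Everything but `sync_all` mirrors the hunk above.
async fn copy_to_csv(
    copy_stream: impl futures::Stream<Item = std::io::Result<bytes::Bytes>>,
    mut file: File,
) -> std::io::Result<()> {
    let copy_stream = std::pin::pin!(copy_stream);
    let mut reader = StreamReader::new(copy_stream);
    tokio::io::copy(&mut reader, &mut file).await?;
    file.flush().await?;
    // The reviewer's suggestion: force the bytes to stable storage.
    file.sync_all().await?;
    Ok(())
}
```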

```rust
}

// We query the column names to write the header row of the CSV file.
// TODO (SangJunBak): Use `WITH (HEADER TRUE)` once database-issues#2846 is implemented.
```
@ParkMyCar (Member):

I think this comment is stale? Or it should maybe be moved to copy_relation_to_csv(...)?

@SangJunBak (Contributor, Author):

Ah good callout on how it should be closer to copy_relation_to_csv.
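
The workaround that the TODO refers to, sketched against Materialize's catalog tables; the exact SQL and helper in the PR may differ:

```rust
use tokio_postgres::Client;

/// Sketch of the header-row workaround: since `COPY ... TO` cannot yet
/// emit a header row (database-issues#2846), fetch the column names
/// from the catalog and write them as the first line of the CSV.
async fn header_row(client: &Client, relation: &str) -> anyhow::Result<String> {
    let rows = client
        .query(
            "SELECT c.name
             FROM mz_columns c
             JOIN mz_relations r ON r.id = c.id
             WHERE r.name = $1
             ORDER BY c.position",
            &[&relation],
        )
        .await?;
    let names: Vec<String> = rows.iter().map(|row| row.get(0)).collect();
    Ok(names.join(","))
}
```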

```rust
let pg_client = Arc::clone(&pg_client);
let relation_name = relation_name.clone();
let relation_category = relation_category.clone();
let mut pg_client_lock = pg_client.lock().await;
```
@ParkMyCar (Member):

Can you document somewhere in this crate why we don't want to run these queries in parallel, and we'd prefer serial execution?
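
The serialization being asked about comes from the locking pattern in the hunk above. A rough sketch of the rationale, with an illustrative function and SQL that are not the PR's:

```rust
use std::sync::Arc;

use tokio::sync::Mutex;
use tokio_postgres::Client;

/// Each dump sets per-session variables (e.g. which cluster replica to
/// target) before running its query, so two dumps interleaving on the
/// same connection would clobber each other's session state. Holding
/// the lock across the whole transaction forces serial execution.
async fn dump_on_replica(
    pg_client: Arc<Mutex<Client>>,
    cluster: &str,
    replica: &str,
) -> anyhow::Result<()> {
    let client = pg_client.lock().await;
    // Illustrative only: identifiers are not escaped in this sketch.
    client
        .batch_execute(&format!(
            "SET cluster = {cluster}; SET cluster_replica = {replica};"
        ))
        .await?;
    // ...run the introspection query / COPY while still holding the lock...
    Ok(())
}
```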

@SangJunBak merged commit 7680279 into jun/#8908/dump-catalog-2 on Mar 25, 2025 (2 of 4 checks passed)
@SangJunBak deleted the jun/#8908/dump-catalog-3 branch on March 25, 2025 04:35