Scrape mz_introspection data as CSVs per cluster replica #31951
Conversation
Force-pushed from e82b8ed to 6313548
Force-pushed from d01849a to fdd5ec2
- Convert queries to transactions to account for different cluster session variables per introspection query
- Introduce separate timeouts for connection and query execution
- Extend relation dumping logic for introspection relations

- Create a future instead of spawning a task
- Separate closures and functions into separate functions
- Make the docs link conditional based on the cluster replica
- Cancel queries when not responsive via cancel_token
- Set the max retry duration to 20s until dump_relation gets cancelled
- Set the max number of errors for cluster replicas to 3 until dump_all_relations skips the cluster replica entirely
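The retry and skip behavior described in the commit message (a 20s retry budget per relation, and skipping a replica after 3 errors) can be sketched as follows. This is a minimal std-only illustration, not the PR's code; `retry_with_budget`, `MAX_RETRY_DURATION`, and `MAX_REPLICA_ERRORS` are hypothetical stand-ins for the logic in `dump_relation`/`dump_all_relations`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical retry budget per relation dump (the PR uses 20s).
const MAX_RETRY_DURATION: Duration = Duration::from_secs(20);
/// Hypothetical per-replica error cap (the PR uses 3).
const MAX_REPLICA_ERRORS: usize = 3;

/// Retry `op` until it succeeds, the time budget elapses, or the
/// replica's running error count reaches the cap. The caller skips
/// the replica entirely once `None` is returned with the cap hit.
fn retry_with_budget<T, E>(
    mut op: impl FnMut() -> Result<T, E>,
    error_count: &mut usize,
) -> Option<T> {
    let deadline = Instant::now() + MAX_RETRY_DURATION;
    while Instant::now() < deadline && *error_count < MAX_REPLICA_ERRORS {
        match op() {
            Ok(v) => return Some(v),
            Err(_) => *error_count += 1,
        }
    }
    None
}
```

A replica whose `error_count` already reached the cap is never queried again, which matches the "skips the cluster replica entirely" behavior in the commit message.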
Force-pushed from 82c6483 to e783750
Nice!
@@ -615,57 +615,231 @@ const RELATIONS: &[Relation] = &[
    },
];

-const TIMEOUT: Duration = Duration::from_secs(60);
+static PG_CONNECTION_TIMEOUT: Duration = Duration::from_secs(60);
/// Timeout for a query. We use 6 minutes since it's a good
The comment says 6 minutes but the timeout is 20 seconds?
    anyhow::Error,
> {
    let mut pg_config = PgConfig::from_str(connection_string)?;
    pg_config.connect_timeout(PG_CONNECTION_TIMEOUT);
It might be nice to have the timeout be configurable via a CLI arg? Doesn't need to be fixed in this PR though!
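The reviewer's suggestion of a CLI-configurable timeout could look like the sketch below. This is not part of the PR; the flag name `--pg-connection-timeout-secs` and the helper `connect_timeout_from_args` are hypothetical, and a real implementation would likely use the crate's existing argument parser rather than scanning `std::env::args` by hand.

```rust
use std::time::Duration;

/// Hard-coded default, matching the PG_CONNECTION_TIMEOUT in the diff.
const DEFAULT_CONNECT_TIMEOUT: Duration = Duration::from_secs(60);

/// Return the connection timeout from a hypothetical
/// `--pg-connection-timeout-secs N` flag, falling back to the default
/// when the flag is absent or its value does not parse.
fn connect_timeout_from_args(args: &[String]) -> Duration {
    args.iter()
        .position(|a| a == "--pg-connection-timeout-secs")
        .and_then(|i| args.get(i + 1))
        .and_then(|v| v.parse::<u64>().ok())
        .map(Duration::from_secs)
        .unwrap_or(DEFAULT_CONNECT_TIMEOUT)
}
```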
let copy_stream = std::pin::pin!(copy_stream);
let mut reader = StreamReader::new(copy_stream);
tokio::io::copy(&mut reader, &mut file).await?;
Here we should call `file.sync_all().await?` so we make sure the data is fsync-ed to disk.
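The suggestion above can be sketched as follows. The PR uses tokio's async `File`, but the same pattern applies with std, which is used here so the example is self-contained; `copy_relation_to_file` is a hypothetical stand-in for the copy logic in the diff.

```rust
use std::fs::File;
use std::io::Read;

/// Stream `src` into a file at `path`, then fsync before returning.
/// Without `sync_all`, the written bytes may still sit in the OS page
/// cache and be lost on a crash or power failure.
fn copy_relation_to_file(mut src: impl Read, path: &str) -> std::io::Result<()> {
    let mut file = File::create(path)?;
    std::io::copy(&mut src, &mut file)?;
    // Equivalent of the suggested `file.sync_all().await?` in the
    // tokio version: flush file data and metadata to disk.
    file.sync_all()?;
    Ok(())
}
```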
}
// We query the column names to write the header row of the CSV file.
// TODO (SangJunBak): Use `WITH (HEADER TRUE)` once database-issues#2846 is implemented.
I think this comment is stale? Or it should maybe be moved to `copy_relation_to_csv(...)`?
Ah good callout on how it should be closer to copy_relation_to_csv.
let pg_client = Arc::clone(&pg_client);
let relation_name = relation_name.clone();
let relation_category = relation_category.clone();
let mut pg_client_lock = pg_client.lock().await;
Can you document somewhere in this crate why we don't want to run these queries in parallel, and we'd prefer serial execution?
Based on PR #31859

Chain of upstream PRs as of 2025-03-20:
- PR #31859 (Add system catalog dumper for self-managed debug tool): main ← jun/#8908/dump-catalog-2
- jun/#8908/dump-catalog-2 ← jun/#8908/dump-catalog-3

See commit messages for details.

https://github.com/MaterializeInc/database-issues/issues/8908
Motivation
Tips for reviewer
Checklist
$T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.