DOC-4238, DOC-4510 Sideloader GA prep, Relaunch function, begin migration docs cleanup #194

Open
wants to merge 24 commits into
base: main

Contributor

@aimurphy aimurphy commented Mar 31, 2025

Direct changes for Sideloader GA

Prepare for Sideloader GA (DOC-4238):

  • Bring Astra DB Sideloader docs from astra-vector-docs repo.
  • Separate the content into multiple pages. The first page is "About Sideloader"; the other pages appear below it in the nav. Unless otherwise noted, the Astra DB Sideloader content is unchanged apart from the multi-page layout.
  • Add a small troubleshooting page, including the relaunch functionality (DOC-4510) and the current manual restart option.
  • Refresh the prerequisites and add some new ones. The PCU prerequisite and pricing information are under "Target Astra DB database requirements".
  • Refresh the ZDM Phase 2 page, and add Sideloader as one of the options for Phase 2 of ZDM. I will probably edit this again in the future.
  • Add Astra DB Sideloader to the migration tools list on the migration docs index page. This is specifically for Astra DB and different from the DSE Sideloader. Note that this page will get more updates in a future PR.

General migration doc cleanup

Begin general cleanup of the migration docs:

  • Clean up redundant CDM content - The same partials were used on cdm-overview.adoc + cdm-steps.adoc and cassandra-data-migrator.adoc. There was no difference between the instructions aside from being on two pages vs one page. Now, the content is all on cassandra-data-migrator.adoc, and cdm-overview.adoc just includes the entire body of cassandra-data-migrator.adoc.
  • Remove CDM parameters content that was supposed to be removed in DOC-4573.
  • Remove redundant partials (small amounts of content, content that was used only once, content that was commented out).
  • Remove unnecessary usage of ifdef and {imageprefix}.
  • Replace outdated release notes page with links to GitHub repos for ZDM proxy, ZDM automation, and CDM.
  • Add links to migration content in other docsets (until we can do a more formal alignment).

The following pages will receive further edits in future PRs. They don't need a detailed review at this time:

⚠️ Dependent PRs

This PR is dependent on:

@aimurphy aimurphy self-assigned this Mar 31, 2025
plpesvc-ds commented Mar 31, 2025

Build successful! ✅
Pull requests with matching branch name doc-4238 included in the build:

Deploying draft.
Deploy successful! View draft

Collaborator

@alicel alicel left a comment

submitting my comments so far

--
Because {sstable-sideloader} operations are typically short-term, resource-intensive events, you can create a flexible capacity PCU group exclusively to support your target database during the migration.

//After PCU PR merge: xref:astra-db-serverless:administration:create-pcu.adoc#flexible-capacity[create a flexible capacity PCU group]
Contributor Author

Edit link before merge (1)

--
If you plan to keep your target database in a PCU group after the migration, you can create a committed capacity PCU group for your target database.

//After PCU PR merge: xref:astra-db-serverless:administration:create-pcu.adoc#committed-capacity[create a committed capacity PCU group]
Contributor Author

Edit link before merge (2)

Comment on lines +156 to +171
== Command line knowledge requirements

Due to the nature of the {sstable-sideloader} process and the tools involved, you need to be familiar with using the command line, including the following:

* Installing and using CLI tools
* Issuing curl commands
* Basic scripting
* Modifying example commands for your environment
* Security best practices

[IMPORTANT]
====
The {sstable-sideloader} process uses authentication credentials to write to the migration directory and your database.

Make sure you understand how to securely store and use sensitive credentials when working on the command line.
====
Contributor Author

Eric suggested moving this to the top of the page
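
To illustrate the credential guidance in the excerpt above, here is a minimal shell sketch of one common pattern: keep the secret in a file that only the current user can read, load it into an environment variable, and reference the variable in commands instead of typing the value. The variable name `SIDELOADER_TOKEN`, the temporary file, the bearer-style header, and the placeholder value are all hypothetical and are not taken from the Sideloader docs.

```shell
set -eu

# Hypothetical names throughout; adapt to your own secret store.
token_file="$(mktemp)"            # stands in for a real secrets file
chmod 600 "$token_file"           # owner read/write only
printf '%s' 'placeholder-not-a-real-token' > "$token_file"

# Load the secret from the file so the value is never typed on the command line.
SIDELOADER_TOKEN="$(cat "$token_file")"
export SIDELOADER_TOKEN

# Build the header from the variable; the literal value never appears in the command text.
auth_header="Authorization: Bearer ${SIDELOADER_TOKEN}"

# Report only the length, never the secret itself.
echo "token loaded (${#SIDELOADER_TOKEN} characters)"
```

A command such as curl could then receive `"$auth_header"` as a header argument, which keeps the raw token out of shell history.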

Contributor

@eric-schneider eric-schneider left a comment

Reviewed the general migration doc cleanup. Will follow up with Sideloader + approval later tonight or tomorrow morning.


include::partial$cdm-prerequisites.adoc[]
It facilitates data transfer by creating multiple jobs that access the {cass-short} cluster concurrently, making it an ideal choice for migrating large datasets.
Contributor

I generally favor repeating the noun instead of using a pronoun when starting new paragraphs. It might be better for SEO too, since I would expect that the para would be more likely to be chosen as a snippet in Google results. But who knows, really.

Suggested change
- It facilitates data transfer by creating multiple jobs that access the {cass-short} cluster concurrently, making it an ideal choice for migrating large datasets.
+ {cass-migrator-short} facilitates data transfer by creating multiple jobs that access the {cass-short} cluster concurrently, making it an ideal choice for migrating large datasets.


[[cdm-build-jar-local]]
== Build {cass-migrator} JAR for local development (optional)
This container's `assets` directory includes all required migration tools: `cassandra-data-migrator`, `dsbulk`, and `cqlsh`.
Contributor

Suggested change
- This container's `assets` directory includes all required migration tools: `cassandra-data-migrator`, `dsbulk`, and `cqlsh`.
+ The container's `assets` directory includes all required migration tools: `cassandra-data-migrator`, `dsbulk`, and `cqlsh`.

Comment on lines +42 to +48
[source,bash,subs="+quotes"]
----
# Replace **PATCH** with your Spark patch version
wget https://archive.apache.org/dist/spark/spark-3.5.**PATCH**/spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz

include::partial$use-cdm-migrator.adoc[]
tar -xvzf spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz
----
Contributor

In the future, this should probably be broken into two separate code blocks for clarity and easy copy-ability. I initially didn't see the first command because my eyes tricked me into thinking it was part of the comment.
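
One possible way to act on this in the meantime, offered as a sketch and not as proposed doc text: set the patch version once in a variable so that the download and extract steps are clearly separate and can be copied without editing placeholders inline. `SPARK_PATCH`, its example value, and the function names are assumptions for illustration; the URL pattern and archive name come from the block under review.

```shell
# Hypothetical example value; replace with your Spark patch version.
SPARK_PATCH=4

spark_archive="spark-3.5.${SPARK_PATCH}-bin-hadoop3-scala2.13.tgz"
spark_url="https://archive.apache.org/dist/spark/spark-3.5.${SPARK_PATCH}/${spark_archive}"

# Step 1: download the archive.
download_spark() {
  wget "$spark_url"
}

# Step 2: extract the archive.
extract_spark() {
  tar -xvzf "$spark_archive"
}

# Show what would be fetched before running anything.
echo "Archive: $spark_archive"
echo "URL:     $spark_url"

# Run the steps when ready:
#   download_spark && extract_spark
```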

== Migrate or validate specific partition ranges
. Add the `cassandra-data-migrator` dependency to `pom.xml`:
+
[source,xml]
Contributor

Suggested change
- [source,xml]
+ [source,xml,subs="+quotes"]

For example, you can do the following:

* Check for large field guardrail violations before migrating.
* Use the `partition.min` and `partition.max` parameters to migrate or validate specific token ranges
Contributor

Suggested change
- * Use the `partition.min` and `partition.max` parameters to migrate or validate specific token ranges
+ * Use the `partition.min` and `partition.max` parameters to migrate or validate specific token ranges.
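
For context on the second bullet, here is a sketch of how a token range might be assembled for the command line. The fully qualified property names (`spark.cdm.filter.cassandra.partition.min` and `spark.cdm.filter.cassandra.partition.max`) and the token values are assumptions for illustration and do not come from this diff; confirm them against the properties reference for the CDM version you run.

```shell
# Hypothetical token range; choose values that fit your table's partitioner.
partition_min="-1000000"
partition_max="1000000"

# Assumed fully qualified property names; verify them for your CDM version.
range_args="--conf spark.cdm.filter.cassandra.partition.min=${partition_min} --conf spark.cdm.filter.cassandra.partition.max=${partition_max}"

# Print the arguments that would be appended to a spark-submit command.
echo "$range_args"
```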


* A log message with `level=debug` or `level=info` is very likely not an error, but something expected and normal.
There are cases where protocol errors are fatal, and they will kill an active connection that was being used to serve requests.
However, it is also possible to get normal protocol log message that contain wording that sounds like an error.
Contributor

Suggested change
- However, it is also possible to get normal protocol log message that contain wording that sounds like an error.
+ However, it is also possible to get normal protocol log messages that contain wording that sounds like an error.
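
The text under review describes triaging log entries by level. A minimal sketch of that idea follows. The sample lines are synthetic and exist only to exercise the filter (they are not real proxy output), and the file is a temporary stand-in for an actual log; only the `level=debug` and `level=info` markers come from the excerpt.

```shell
# Temporary stand-in for a real log file, filled with synthetic lines.
log_file="$(mktemp)"
cat > "$log_file" <<'EOF'
level=debug msg="synthetic sample line 1"
level=info msg="synthetic sample line 2"
level=warn msg="synthetic sample line 3"
level=error msg="synthetic sample line 4"
EOF

# Count entries that are very likely routine.
routine_count="$(grep -c -E 'level=(debug|info)' "$log_file")"

# Show everything else for closer inspection.
grep -v -E 'level=(debug|info)' "$log_file"

echo "routine entries: ${routine_count}"
```

Filtering this way narrows the review to entries above `info`, though a line that survives the filter still needs to be read in context before it is treated as an error.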

----

You can also provide a `-version` command line parameter to the {product-proxy} and it will only print the version.
Example:
This message is immediately before the long `Parsed configuration` string.
Contributor

Suggested change
- This message is immediately before the long `Parsed configuration` string.
+ This message is logged immediately before the long `Parsed configuration` string.

If you encounter a problem during your migration, please contact us.
In the {product-proxy} GitHub repo, submit a https://github.com/datastax/zdm-proxy/issues[GitHub Issue].
Only to the extent that the issue's description does not contain **your proprietary or private** information, please include the following:
If possible, include the following information in the issue description, and remove all proprietary and private information before submitting the issue:
Contributor

Seems a bit redundant with the notice right above it.

Suggested change
- If possible, include the following information in the issue description, and remove all proprietary and private information before submitting the issue:
+ If possible, include the following information in the issue description:


=== Reporting a performance issue
* <<proxy-logs,{product-proxy} logs>>, ideally at `debug` level, if you can easily reproduce the issue and tolerate restarting the proxy instances to apply the log level configuration change.
Contributor

Sorry if I'm making a mess of the log leveling. I'm assuming you're using CAPS to refer to the log level configuration setting, and smalls to refer to individual messages/entries.

Suggested change
- * <<proxy-logs,{product-proxy} logs>>, ideally at `debug` level, if you can easily reproduce the issue and tolerate restarting the proxy instances to apply the log level configuration change.
+ * <<proxy-logs,{product-proxy} logs>>, ideally at `DEBUG` level, if you can easily reproduce the issue and tolerate restarting the proxy instances to apply the log level configuration change.


* The driver language and version that the client application is using.

For performance-related issues, provide following additional information:
Contributor

Suggested change
- For performance-related issues, provide following additional information:
+ For performance-related issues, provide the following additional information:

5 participants