Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pytorch example with alluxiofs #49

Merged
merged 6 commits into from
Jun 24, 2024
Merged

Conversation

SibylYang
Copy link
Collaborator

An example that uses Alluxiofs to speed up pytorch distributed NLP training with large CSV files.
Formatted the file using black and reorder_python_imports.

@SibylYang
Copy link
Collaborator Author

@lucyge2022 @LuQQiu For your review. Thanks!

fsspec.register_implementation("alluxiofs", AlluxioFileSystem, clobber=True)
alluxio_fs = fsspec.filesystem(
"alluxiofs", etcd_hosts="localhost", etcd_port=2379, target_protocol="s3"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so in here, we are already in a child process right?
we can reuse the alluxio_fs instance created per child process when we spawn main() correct?

…rocess. Add dependencies in the top comment lines.
@SibylYang
Copy link
Collaborator Author

@lucyge2022 Moved alluxio_fs initialization and file preprocessing to the parent process. Add dependencies in the top comment lines.

# with open(self.output_filename, 'a') as file:
# file.write(
# f'access to global index {index}, which is line {target_line_index} in file {target_file_name}: {target_line.iloc[0]}\n')
# file.write(f'__getitem__ total access: {self.total_access}\n')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove if not needed

Copy link
Collaborator

@lucyge2022 lucyge2022 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM besides comments

@lucyge2022
Copy link
Collaborator

lucyge2022 commented Jun 22, 2024

@SibylYang also there's some test case failure, the on in CI/lint seems to be bcos of style, can u use pre-commit tool (apt-get install pre-commit) to screen thru the scripts, it will correct the code style for you.
for the CI/Tests , can u make this change to file tests/conftest.py

@@ -93,7 +93,7 @@ def launch_alluxio_dockers(with_etcd=False):
     run_cmd_etcd = (
         "docker run --platform linux/amd64 -d --rm --net=alluxio_network -p 4001:4001 -p 2380:2380 -p 2379:2379 "
         f"-v {TEST_DIR}:/etc/ssl/certs "
-        "--name etcd quay.io/coreos/etcd:latest "
+        "--name etcd quay.io/coreos/etcd:v3.5.13 "

@SibylYang
Copy link
Collaborator Author

@lucyge2022 code updated to address your comments

@lucyge2022 lucyge2022 merged commit bd39f9e into fsspec:main Jun 24, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants