Pytorch example with alluxiofs #49
@lucyge2022 @LuQQiu For your review. Thanks!
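```python
import fsspec
from alluxiofs import AlluxioFileSystem

fsspec.register_implementation("alluxiofs", AlluxioFileSystem, clobber=True)
# Alluxio cluster is reached through etcd at localhost:2379; the underlying storage is S3
alluxio_fs = fsspec.filesystem(
    "alluxiofs", etcd_hosts="localhost", etcd_port=2379, target_protocol="s3"
)
```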
So here, we are already in a child process, right?
Can we reuse the alluxio_fs instance created per child process when we spawn main()?
@lucyge2022 Moved the alluxio_fs initialization and file preprocessing to the parent process, and added the dependencies to the comment lines at the top of the file.
```python
# with open(self.output_filename, 'a') as file:
#     file.write(
#         f'access to global index {index}, which is line {target_line_index} in file {target_file_name}: {target_line.iloc[0]}\n')
#     file.write(f'__getitem__ total access: {self.total_access}\n')
```
Remove if not needed.
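The commented-out logging refers to a global `index` that resolves to a line in one of several CSV files. A minimal sketch of that lookup, assuming per-file line counts are computed once during preprocessing: cumulative end offsets plus `bisect` find the right file in O(log n) per access. `GlobalIndex` and `locate` are hypothetical names, not taken from the PR.

```python
import bisect
from itertools import accumulate


class GlobalIndex:
    """Map a global row index onto (file name, line within that file).

    Hypothetical helper: line counts per file are assumed to be known
    up front, e.g. computed once in the parent process.
    """

    def __init__(self, file_names, line_counts):
        if len(file_names) != len(line_counts):
            raise ValueError("file_names and line_counts must have the same length")
        self.file_names = list(file_names)
        # ends[i] is the exclusive upper bound of global indices in file i
        self.ends = list(accumulate(line_counts))

    def __len__(self):
        return self.ends[-1] if self.ends else 0

    def locate(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # First file whose exclusive end is greater than index (skips empty files)
        file_pos = bisect.bisect_right(self.ends, index)
        start = self.ends[file_pos - 1] if file_pos > 0 else 0
        return self.file_names[file_pos], index - start
```

Inside `__getitem__`, `locate(index)` would yield the `target_file_name` and `target_line_index` that the logging line above prints.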
LGTM besides the comments.
@SibylYang Also, there are some test failures. The one in CI/lint seems to be caused by code style. Can you run the pre-commit tool (apt-get install pre-commit) over the scripts? It will correct the code style for you.
@lucyge2022 Code updated to address your comments.
An example that uses alluxiofs to speed up PyTorch distributed NLP training on large CSV files.
Formatted the file using black and reorder_python_imports.
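The conversation does not show how rows are split across ranks in the distributed example. As an illustration only, assuming the example relies on the non-shuffled behaviour of `torch.utils.data.DistributedSampler` with `drop_last=False`, the partitioning can be sketched in plain Python: pad the index list by wrapping around so every rank gets the same count, then give each rank every `num_replicas`-th index starting at its `rank`. `shard_indices` is a hypothetical name.

```python
def shard_indices(num_samples, num_replicas, rank):
    """Return the dataset indices assigned to one rank (no shuffling).

    Sketch of DistributedSampler-style partitioning with drop_last=False:
    indices are padded by wrapping around to the start so the total
    divides evenly, then strided by num_replicas.
    """
    if num_replicas <= 0 or not 0 <= rank < num_replicas:
        raise ValueError("need num_replicas > 0 and 0 <= rank < num_replicas")
    indices = list(range(num_samples))
    per_rank = -(-num_samples // num_replicas)  # ceiling division
    total_size = per_rank * num_replicas
    padding = total_size - num_samples
    # Repeat indices from the start until the list length equals total_size
    while padding > 0:
        take = min(padding, num_samples)
        indices += indices[:take]
        padding -= take
    return indices[rank:total_size:num_replicas]
```

With 10 rows and 3 ranks, each rank receives 4 indices and rows 0 and 1 are read twice; that duplication is the cost of keeping batch counts equal across ranks.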