Enable conf to read from cloud path, needed for spark cluster mode, removed logging levels #927
Summary
This PR introduces support for reading configuration files from various cloud storage providers, including Google Cloud Storage (GCS), Amazon S3, and Azure Blob Storage, in addition to local file systems. The implementation uses Hadoop's FileSystem API to handle paths dynamically based on their prefixes, allowing seamless integration of cloud-based file management into existing workflows.
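The prefix-based dispatch described above can be sketched as follows. This is a minimal, hypothetical illustration (the object and method names are not from this PR): the path's URI scheme decides whether the configuration lives on a remote filesystem, and for remote paths one would then open the file through Hadoop's `FileSystem` API rather than `java.io`.

```scala
import java.net.URI

// Illustrative helper: classify a config path by its URI scheme.
// Remote schemes cover GCS (gs), S3 (s3/s3a), and Azure (abfs/wasbs).
object ConfPath {
  private val remoteSchemes = Set("gs", "s3", "s3a", "abfs", "wasbs")

  // A path with no scheme (e.g. "/etc/app/conf.yaml") is treated as local.
  def scheme(path: String): String =
    Option(new URI(path).getScheme).getOrElse("file")

  def isRemote(path: String): Boolean =
    remoteSchemes.contains(scheme(path))

  // For remote paths, reading would go through Hadoop's FileSystem API,
  // e.g. (requires hadoop-common on the classpath):
  //   val uri  = new URI(path)
  //   val fs   = org.apache.hadoop.fs.FileSystem.get(uri, hadoopConf)
  //   val in   = fs.open(new org.apache.hadoop.fs.Path(path))
}
```

Dispatching on the scheme keeps the call sites unchanged: the same read path handles both `gs://bucket/conf.yaml` and `/etc/app/conf.yaml`.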
Why / Goal
The goal of this change is to make configuration management more flexible by enabling the application to read configuration files directly from cloud storage. In particular, this is needed for Spark cluster mode, where the driver runs on a cluster node and cannot read configuration files from the submitting machine's local filesystem.
Test Plan
Added unit tests: Implemented tests to verify cloud path resolution and file accessibility for GCS, S3, and Azure.
Covered by existing CI: Updated CI configurations to include the cloud-related tests.
Integration tested: Validated functionality through end-to-end tests against real cloud paths.
Reviewers
@nikhil-zlai @caiocamatta-stripe @haotizhong