Enable conf to read from cloud path, needed for spark cluster mode, removed logging levels #927

Open
wants to merge 5 commits into
base: main

krisnaru
Contributor

Summary

This PR adds support for reading configuration files from cloud storage providers, including Google Cloud Storage (GCS), Amazon S3, and Azure Blob Storage, in addition to the local file system. The implementation uses Hadoop's FileSystem API and resolves each path according to its prefix, so cloud-hosted configuration files fit into existing workflows.
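For readers unfamiliar with the approach, here is a minimal sketch of reading a file through Hadoop's FileSystem API. It is not the code in this PR: the helper name `readConfText` is hypothetical, and it assumes the relevant Hadoop connector (for example the GCS connector, `hadoop-aws`, or `hadoop-azure`) is on the classpath and configured for the scheme in use.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import scala.io.Source

// Hypothetical helper, not part of this PR: reads a whole file as UTF-8 text.
// Hadoop chooses the FileSystem implementation from the path's scheme
// (e.g. gs://, s3a://, abfss://, file://); a path without a scheme
// falls back to the default file system set in the Hadoop configuration.
def readConfText(pathStr: String, hadoopConf: Configuration = new Configuration()): String = {
  val path = new Path(pathStr)
  val fs = path.getFileSystem(hadoopConf)
  val in = fs.open(path)
  try Source.fromInputStream(in, "UTF-8").mkString
  finally in.close()
}
```

Because the scheme selects the file system, the same call works for a local path and a cloud path, which is what makes this suitable for Spark cluster mode, where the driver may not share a local disk with the submitting machine.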

Why / Goal

The goal of this change is to enhance the flexibility of our configuration management by enabling the application to read configuration files directly from cloud storage. This improvement supports various use cases, such as:

Simplifying configuration management for distributed applications.
Enhancing portability and scalability by allowing configurations to be stored in a centralized cloud location.
Unlocking opportunities for easier deployment and integration with cloud-native architectures.

Test Plan

Added Unit Tests: Implemented tests to verify cloud path resolution and file accessibility for GCS, S3, and Azure.
Covered by existing CI: Updated CI configurations to include cloud-related tests.
Integration tested: Validated functionality through end-to-end tests with real cloud paths.
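As an illustration of what a path-resolution check could look like, the sketch below classifies a path by its URI scheme using only the standard library. The names `StorageKind` and `storageKind` and the list of schemes are assumptions for this example and are not taken from the PR's tests.

```scala
import java.net.URI

// Hypothetical classification of storage back ends, for illustration only.
sealed trait StorageKind
case object Local extends StorageKind
case object Gcs extends StorageKind
case object S3 extends StorageKind
case object Azure extends StorageKind
case object Unknown extends StorageKind

// Maps a path to a storage kind based on its URI scheme.
// A path with no scheme (e.g. /tmp/team.conf) is treated as local.
// Note: new URI(...) throws URISyntaxException on malformed input such as spaces.
def storageKind(path: String): StorageKind =
  Option(new URI(path).getScheme).map(_.toLowerCase) match {
    case None | Some("file")                       => Local
    case Some("gs")                                => Gcs
    case Some("s3" | "s3a" | "s3n")                => S3
    case Some("abfs" | "abfss" | "wasb" | "wasbs") => Azure
    case Some(_)                                   => Unknown
  }
```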

Reviewers

@nikhil-zlai @caiocamatta-stripe @haotizhong

@krisnaru krisnaru changed the title support conf cloud paths Support cluster mode for spark and to enable conf to read from cloud path Feb 20, 2025
@krisnaru krisnaru changed the title Support cluster mode for spark and to enable conf to read from cloud path Enable conf to read from cloud path, needed for spark cluster mode, removed logging levels Feb 20, 2025
@hzding621
Collaborator

hzding621 commented Feb 20, 2025

cc @pallavia7 to take a look

@@ -107,9 +107,6 @@ object SparkSessionBuilder {
baseBuilder
}
val spark = builder.getOrCreate()
// disable log spam
spark.sparkContext.setLogLevel("ERROR")
Collaborator


Could you undo this? Otherwise, the logs will be too spammy.

5 participants