Enable conf to read from cloud path, needed for spark cluster mode, removed logging levels #927

krisnaru · 2025-02-20T05:35:06Z

Summary

This PR introduces support for reading configuration files from various cloud storage providers, including Google Cloud Storage (GCS), Amazon S3, and Azure Blob Storage, in addition to local file systems. The implementation uses Hadoop's FileSystem API to handle paths dynamically based on their prefixes, allowing seamless integration of cloud-based file management into existing workflows.
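As a rough illustration of the prefix-based handling described above, here is a minimal sketch that classifies a configuration path by its URI scheme using only the JDK. The object name `ConfPathSketch`, the `Storage` type, and the scheme list are hypothetical and are not taken from this PR's `Driver.scala`; the real change delegates this decision to Hadoop's FileSystem API rather than matching on schemes by hand.

```scala
import java.net.URI

// Hypothetical helper for illustration only; names are not from the PR.
object ConfPathSketch {
  sealed trait Storage
  case object Local extends Storage
  case object Gcs extends Storage
  case object S3 extends Storage
  case object AzureBlob extends Storage
  final case class Other(scheme: String) extends Storage

  // Classify a path by its URI scheme (the "prefix" before "://").
  // Paths without a scheme are treated as local files.
  // Assumption: the path is a syntactically valid URI (e.g. no raw spaces);
  // otherwise java.net.URI throws URISyntaxException.
  def classify(path: String): Storage = {
    val scheme = Option(new URI(path).getScheme).map(_.toLowerCase)
    scheme match {
      case None | Some("file")                                         => Local
      case Some("gs")                                                  => Gcs
      case Some("s3") | Some("s3a") | Some("s3n")                      => S3
      case Some("wasb") | Some("wasbs") | Some("abfs") | Some("abfss") => AzureBlob
      case Some(other)                                                 => Other(other)
    }
  }
}
```

With Hadoop on the classpath, the same dispatch is typically obtained by calling `new Path(pathString).getFileSystem(hadoopConf)`, which selects a FileSystem implementation from the path's scheme, so no hand-written matching like the above is needed in production code.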

Why / Goal

The goal of this change is to let the application read configuration files directly from cloud storage, which is needed to run Spark in cluster mode. It supports use cases such as:

Simplifying configuration management for distributed applications.
Enhancing portability and scalability by allowing configurations to be stored in a centralized cloud location.
Unlocking opportunities for easier deployment and integration with cloud-native architectures.

Test Plan

Added Unit Tests: Implemented tests to verify cloud path resolution and file accessibility for GCS, S3, and Azure.
Covered by existing CI: Updated CI configurations to include cloud-related tests.
Integration tested: Validated functionality through end-to-end tests with real cloud paths.

Reviewers

@nikhil-zlai @caiocamatta-stripe @haotizhong

spark/src/main/scala/ai/chronon/spark/Driver.scala

hzding621 · 2025-02-20T21:00:14Z

cc @pallavia7 to tal

pengyu-hou · 2025-02-21T19:23:04Z

spark/src/main/scala/ai/chronon/spark/SparkSessionBuilder.scala

@@ -107,9 +107,6 @@ object SparkSessionBuilder {
      baseBuilder
    }
    val spark = builder.getOrCreate()
-    // disable log spam
-    spark.sparkContext.setLogLevel("ERROR")


Could you undo this? Otherwise, the logs will be too spammy.

support conf cloud paths

e5acc20

krisnaru changed the title ~~support conf cloud paths~~ Support cluster mode for spark and to enable conf to read from cloud path Feb 20, 2025

Merge branch 'main' into conf-cloud-path

76b3654

nikhil-zlai approved these changes Feb 20, 2025

knarukulla added 2 commits February 20, 2025 09:16

feedback

8f0b9ca

disable logging

f15ce80

krisnaru changed the title ~~Support cluster mode for spark and to enable conf to read from cloud path~~ Enable conf to read from cloud path, needed for spark cluster mode, removed logging levels Feb 20, 2025

scala fmt

9f8d542

pengyu-hou reviewed Feb 21, 2025
