cw-alexcroteau

What changes were proposed in this pull request?

  • Added a new configuration property, spark.skipUnixNativeRm, to explicitly skip trying the Unix rm binary
  • Grouped boolean conditions for skipping the native rm binary into a helper, shouldTryUnixNativeRm
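The helper described above could look roughly like the sketch below. It is an illustration only, not the code in this PR: the class name `NativeRmPolicy` and the three boolean parameters are hypothetical stand-ins for whatever conditions the existing code already checks, and reading the property through `Boolean.getBoolean` assumes it is exposed as a JVM system property.

```java
// Hypothetical sketch: names other than spark.skipUnixNativeRm and
// shouldTryUnixNativeRm are assumptions made for illustration.
final class NativeRmPolicy {
  static final String SKIP_PROPERTY = "spark.skipUnixNativeRm";

  private NativeRmPolicy() {}

  /**
   * Groups every condition that decides whether the native rm path is attempted.
   * The parameters stand in for the pre-existing checks; the last clause is the
   * new opt-out. Boolean.getBoolean returns true only when the system property
   * is set to "true" (case-insensitive).
   */
  static boolean shouldTryUnixNativeRm(boolean isUnix, boolean hasFilter, boolean isMacTesting) {
    return isUnix
        && !hasFilter
        && !isMacTesting
        && !Boolean.getBoolean(SKIP_PROPERTY);
  }
}
```

Keeping the opt-out inside one predicate means the call site in deleteRecursively stays a single `if`, and any later condition can be added in one place.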

Why are the changes needed?

In some Unix setups, such as minimal containers, the rm binary is not available. Receiving an error message every time deleteRecursively() is called is confusing and noisy, even with the accompanying warning message. Invoking a nonexistent binary also has a slight performance cost.

As an application maintainer, I would like to be able to avoid this situation when I know my setup will never have the rm Unix binary.

Does this PR introduce any user-facing change?

  • When a user sets the spark.skipUnixNativeRm property, deleteRecursivelyUsingUnixNative in deleteRecursively is never called.

How was this patch tested?

No unit tests exist for this specific class, and the change is minor enough that I do not consider it worth the risk of adding unit tests to the file, especially given my limited knowledge of Spark and of the intended behaviour of that file, including potentially unexpected side effects.

Was this patch authored or co-authored using generative AI tooling?

No

@cw-alexcroteau
Author

Note: I am waiting for access to JIRA. I will create a ticket and update the PR title as soon as access is granted.

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for making a PR, @cw-alexcroteau.

If that's a limitation of the underlying OS, why don't we automatically detect it and disable the native Unix rm logic? Initializing a static variable in a Java class static block might be better than this new Java system property. Could you try that approach?
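One way to read this suggestion is sketched below, under stated assumptions: the class name `NativeRmProbe`, the field `RM_AVAILABLE`, and the helper `isOnPath` are hypothetical, and scanning `PATH` for an executable file is only one possible detection strategy, not something proposed in this thread. The check runs once, when the class is initialized.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of one-time detection in a static block.
final class NativeRmProbe {
  /** Computed once at class initialization; true if an executable rm is on PATH. */
  static final boolean RM_AVAILABLE;

  static {
    RM_AVAILABLE = isOnPath("rm", System.getenv("PATH"));
  }

  private NativeRmProbe() {}

  /** Returns true if some directory listed in pathEnv holds an executable regular file named binary. */
  static boolean isOnPath(String binary, String pathEnv) {
    if (pathEnv == null || pathEnv.isEmpty()) {
      return false;
    }
    for (String dir : pathEnv.split(File.pathSeparator)) {
      if (dir.isEmpty()) {
        continue;
      }
      try {
        Path candidate = Paths.get(dir, binary);
        if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
          return true;
        }
      } catch (InvalidPathException e) {
        // Malformed PATH entry: ignore it and keep scanning.
      }
    }
    return false;
  }
}
```

With a flag like this, deleteRecursively could consult `NativeRmProbe.RM_AVAILABLE` before attempting the native path, so no user-facing property would be required. The trade-off is that the result is fixed for the lifetime of the JVM.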

@vrozov
Member

vrozov commented Oct 1, 2025

@cw-alexcroteau which minimal container do you use? Is it a custom build or a published Docker image (for example, Alpine)?

3 participants