Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-3985: [JAVA] adding default serializable pacakge to trusted package list #3333

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

abhilakshyadobhal
Copy link

@abhilakshyadobhal abhilakshyadobhal commented Mar 8, 2025

What is the purpose of the change

With this change, the default packages will always be going to be part of trusted packages.

What is the fix?

  • Static Block Enhancement:
    The static block has been updated to always include both user-defined packages (if any) and the default packages (java.lang,java.math,java.io,java.net,org.apache.avro.reflect), while removing duplicate entries.

  • Constructor Updates:
    Constructors in SpecificDatumReader now call initializeTrustedPackages(), ensuring that every instance is correctly populated with the trusted packages.

Verifying this change

This change is already covered by existing tests, such as (please describe tests).

Documentation

  • Feature Impact:
    This commit is a bug fix for the package initialization in SpecificDatumReader and does not introduce any new features.

  • Code Documentation:
    Code comments, especially in the initializeTrustedPackages() method, have been updated to clearly explain the logic behind the inclusion of both user-defined and default packages.

JIRA Ticket

@github-actions github-actions bot added the Java Pull Requests for Java binding label Mar 8, 2025
@martin-g martin-g requested a review from jbonofre March 8, 2025 13:56
…icDatumReader

This commit addresses an issue in Avro's SpecificDatumReader where the trustedPackages list was not properly
initialized in every constructor. Although the issue is tracked under HADOOP-19315 in the Hadoop JIRA board,
the underlying problem exists in Avro.

Changes include:
- Modifying the static initialization block to combine both user-defined packages from the
  org.apache.avro.SERIALIZABLE_PACKAGES system property and the default packages
  ("java.lang,java.math,java.io,java.net,org.apache.avro.reflect"), ensuring that duplicates are removed.
- Updating the initializeTrustedPackages() method to call clear() before adding packages, preventing duplicate entries.
- Ensuring that initializeTrustedPackages() is invoked in every constructor of SpecificDatumReader so that each instance
  always has the correct trustedPackages list.

This improvement ensures that custom packages specified via the system property are correctly recognized as trusted,
preventing security errors during deserialization.
@KalleOlaviNiemitalo
Copy link
Contributor

KalleOlaviNiemitalo commented Mar 8, 2025

Previously, if the org.apache.avro.SERIALIZABLE_PACKAGES system property was set as "*", then only that was added to SERIALIZABLE_PACKAGES and to trustedPackages, and trustAllPackages() saw that trustedPackages was a one-element list containing only "*" so it returned true.

Now with this PR, trustedPackages will contain the default trusted packages as well, so it won't be a one-element list, and trustAllPackages() will return false.

That's why I think this should not be merged as is.

Is it possible to add a test case for that regression? I'm not sure how to isolate system property changes from other tests running in parallel.

@KalleOlaviNiemitalo
Copy link
Contributor

HADOOP-19315 "Bump avro from 1.9.2 to 1.11.4" was already marked as fixed. If an Avro change is still needed then perhaps that should be a separate AVRO issue in Jira.

* and any duplicate entries are avoided.</p>
*
* <p>Before adding the packages, the list is cleared to prevent duplicate entries if this method is invoked
* multiple times, ensuring that the list remains consistent and up-to-date across all instances.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "up-to-date across all instances" mean here?

  • AFAICS, SERIALIZABLE_PACKAGES is not modified after it has been initialised, so there is no question of the trustedPackages list being "up-to-date" with SERIALIZABLE_PACKAGES.
  • As the method only modified the trustedPackages list in one instance, the "across all instances" wording seems misleading.

Out of curiosity, were parts of the PR description and the javadoc comment created using generative AI? The Apache Software Foundation publishes some guidelines for that at ASF GENERATIVE TOOLING GUIDANCE.

@KalleOlaviNiemitalo
Copy link
Contributor

#3330 fixes AVRO-3985 so that trustedPackages is initialized no matter which constructor of SpecificDatumReader is used. So if that pull request is merged first, then the only user-visible change remaining in this #3333 will be that the default serializable packages are always included. At that point, this PR should be retitled.

Copy link
Contributor

@KalleOlaviNiemitalo KalleOlaviNiemitalo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait for #3330 to be merged first, which has an issue in the AVRO project.

Delete the initializeTrustedPackages() calls from those constructors that call this(…).

Retitle this pull request when it no longer adds initializeTrustedPackages() calls in constructors due to the changes above.

Fix the org.apache.avro.SERIALIZABLE_PACKAGES = * case so that trustAllPackages() returns true again. Add a test for that if possible.

}

private final List<String> trustedPackages = new ArrayList<>();

public SpecificDatumReader() {
this(null, null, SpecificData.get());
initializeTrustedPackages();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling initializeTrustedPackages() here is not necessary because this constructor calls another constructor of the same class this(null, null, SpecificData.get()) and that constructor will call initializeTrustedPackages(). It should be called only after the super constructor.

This PR makes similar unnecessary changes to other delegating constructors too.

Copy link
Author

@abhilakshyadobhal abhilakshyadobhal Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have updated the PR, and will rebase once the other change #3330 gets merged.

@abhilakshyadobhal abhilakshyadobhal changed the title HADOOP-19315: [Java] Ensure trustedPackages is initialized in all constructors of SpecificDatumReader AVRO-3985: [JAVA] adding default serializable pacakge to trusted package list Mar 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Java Pull Requests for Java binding
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants