-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AVRO-3985: [JAVA] adding default serializable pacakge to trusted package list #3333
base: main
Are you sure you want to change the base?
Conversation
…icDatumReader This commit addresses an issue in Avro's SpecificDatumReader where the trustedPackages list was not properly initialized in every constructor. Although the issue is tracked under HADOOP-19315 in the Hadoop JIRA board, the underlying problem exists in Avro. Changes include: - Modifying the static initialization block to combine both user-defined packages from the org.apache.avro.SERIALIZABLE_PACKAGES system property and the default packages ("java.lang,java.math,java.io,java.net,org.apache.avro.reflect"), ensuring that duplicates are removed. - Updating the initializeTrustedPackages() method to call clear() before adding packages, preventing duplicate entries. - Ensuring that initializeTrustedPackages() is invoked in every constructor of SpecificDatumReader so that each instance always has the correct trustedPackages list. This improvement ensures that custom packages specified via the system property are correctly recognized as trusted, preventing security errors during deserialization.
0da2588
to
f20ed6f
Compare
Previously, if the org.apache.avro.SERIALIZABLE_PACKAGES system property was set as "*", then only that was added to SERIALIZABLE_PACKAGES and to trustedPackages, and trustAllPackages() saw that trustedPackages was a one-element list containing only "*" so it returned true. Now with this PR, trustedPackages will contain the default trusted packages as well, so it won't be a one-element list, and trustAllPackages() will return false. That's why I think this should not be merged as is. Is it possible to add a test case for that regression? I'm not sure how to isolate system property changes from other tests running in parallel. |
HADOOP-19315 "Bump avro from 1.9.2 to 1.11.4" was already marked as fixed. If an Avro change is still needed then perhaps that should be a separate AVRO issue in Jira. |
* and any duplicate entries are avoided.</p> | ||
* | ||
* <p>Before adding the packages, the list is cleared to prevent duplicate entries if this method is invoked | ||
* multiple times, ensuring that the list remains consistent and up-to-date across all instances.</p> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "up-to-date across all instances" mean here?
- AFAICS, SERIALIZABLE_PACKAGES is not modified after it has been initialised, so there is no question of the trustedPackages list being "up-to-date" with SERIALIZABLE_PACKAGES.
- As the method only modified the trustedPackages list in one instance, the "across all instances" wording seems misleading.
Out of curiosity, were parts of the PR description and the javadoc comment created using generative AI? The Apache Software Foundation publishes some guidelines for that at ASF GENERATIVE TOOLING GUIDANCE.
#3330 fixes AVRO-3985 so that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait for #3330 to be merged first, which has an issue in the AVRO project.
Delete the initializeTrustedPackages() calls from those constructors that call this(…).
Retitle this pull request when it no longer adds initializeTrustedPackages() calls in constructors due to the changes above.
Fix the org.apache.avro.SERIALIZABLE_PACKAGES = *
case so that trustAllPackages()
returns true
again. Add a test for that if possible.
} | ||
|
||
private final List<String> trustedPackages = new ArrayList<>(); | ||
|
||
public SpecificDatumReader() { | ||
this(null, null, SpecificData.get()); | ||
initializeTrustedPackages(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling initializeTrustedPackages()
here is not necessary because this constructor calls another constructor of the same class this(null, null, SpecificData.get())
and that constructor will call initializeTrustedPackages()
. It should be called only after the super
constructor.
This PR makes similar unnecessary changes to other delegating constructors too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have updated the PR, and will rebase once the other change #3330 gets merged.
What is the purpose of the change
With this change, the default packages will always be going to be part of trusted packages.
What is the fix?
Static Block Enhancement:
The static block has been updated to always include both user-defined packages (if any) and the default packages (
java.lang,java.math,java.io,java.net,org.apache.avro.reflect
), while removing duplicate entries.Constructor Updates:
Constructors in
SpecificDatumReader
now callinitializeTrustedPackages()
, ensuring that every instance is correctly populated with the trusted packages.Verifying this change
This change is already covered by existing tests, such as (please describe tests).
Documentation
Feature Impact:
This commit is a bug fix for the package initialization in
SpecificDatumReader
and does not introduce any new features.Code Documentation:
Code comments, especially in the
initializeTrustedPackages()
method, have been updated to clearly explain the logic behind the inclusion of both user-defined and default packages.JIRA Ticket