fix: Expose hash to FFI udf/udaf/udwf to fix their Eq #17350
I'm wondering if we can save ourselves some effort here and do this:

    pub struct FFI_ScalarUDF {
        ...
        /// Internal hash function result
        pub hash_value: u64,
        ...
    }

    impl From<Arc<ScalarUDF>> for FFI_ScalarUDF {
        fn from(udf: Arc<ScalarUDF>) -> Self {
            let name = udf.name().into();
            let aliases = udf.aliases().iter().map(|a| a.to_owned().into()).collect();
            let volatility = udf.signature().volatility.into();
            let short_circuits = udf.short_circuits();
            let mut state = DefaultHasher::new();
            udf.hash(&mut state);
            let hash_value = state.finish();
            let private_data = Box::new(ScalarUDFPrivateData { udf });
            Self {
                name,
                aliases,
                volatility,
                short_circuits,
                invoke_with_args: invoke_with_args_fn_wrapper,
                return_type: return_type_fn_wrapper,
                return_field_from_args: return_field_from_args_fn_wrapper,
                coerce_types: coerce_types_fn_wrapper,
                hash_value,
                clone: clone_fn_wrapper,
                release: release_fn_wrapper,
                private_data: Box::into_raw(private_data) as *mut c_void,
            }
        }
    }

    impl PartialEq for ForeignScalarUDF {
        fn eq(&self, other: &Self) -> bool {
            let Self {
                name,
                aliases,
                udf,
                signature,
            } = self;
            name == &other.name
                && aliases == &other.aliases
                && signature == &other.signature
                && udf.hash_value == other.udf.hash_value
        }
    }

    impl Hash for ForeignScalarUDF {
        fn hash<H: Hasher>(&self, state: &mut H) {
            let Self {
                name,
                aliases,
                udf,
                signature,
            } = self;
            name.hash(state);
            aliases.hash(state);
            // This appears to be a hash of the hash value, but if you review how
            // u64 is hashed, it is just pushing the byte values into state.
            udf.hash_value.hash(state);
            signature.hash(state);
        }
    }

And further, I wonder if we even need It seems like we have an opportunity here to have a simpler path, but I'm not 100% confident I haven't overlooked some need to call
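The code comment above about how `u64` is hashed can be checked with a small sketch. `RecordingHasher` and `bytes_fed_by` are hypothetical names introduced only for this illustration, and the behaviour shown reflects the current standard library implementation, where `u64::hash` goes through `Hasher::write_u64` and its default body forwards the native-endian bytes to `write`.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that records every byte fed into it.
#[derive(Default)]
struct RecordingHasher {
    bytes: Vec<u8>,
}

impl Hasher for RecordingHasher {
    fn write(&mut self, bytes: &[u8]) {
        self.bytes.extend_from_slice(bytes);
    }

    fn finish(&self) -> u64 {
        0 // the final hash is irrelevant for this sketch
    }
}

/// Returns the bytes that `value.hash(..)` pushes into the hasher state.
fn bytes_fed_by<T: Hash>(value: &T) -> Vec<u8> {
    let mut hasher = RecordingHasher::default();
    value.hash(&mut hasher);
    hasher.bytes
}

fn main() {
    let hash_value: u64 = 0x0123_4567_89ab_cdef;
    let fed = bytes_fed_by(&hash_value);
    // The eight native-endian bytes of the u64 are what reach the state.
    assert_eq!(fed, hash_value.to_ne_bytes().to_vec());
    println!("{fed:?}");
}
```

So feeding a stored `hash_value` into `state` mixes its eight bytes into the outer hash rather than running a second full hash over it.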
Thanks for your feedback! Will take a look and think more carefully about the cases
Excellent! Thank you for diving in!
It sounds like this depends on the hash not colliding.
@findepi Sorry for the confusing wording; I just fixed the feature description. For Eq, we do compare signatures and aliases in addition to hash values.
If two different functions return the same hash (e.g.
I must be missing something - how is that any different than any of the UDFs that aren't going through ffi? This is calling the same function during initialization.
For normal functions, the Eq is not based on the Hash and is not susceptible to hash collisions.
You can think of this the following way. If I replace the hash function with a function that always returns the same thing (e.g. However, looking at
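The constant-hash thought experiment can be made concrete with a minimal sketch. `Func`, `hash_of` and `eq_by_hash` are hypothetical names, not DataFusion types. A constant hash is legal, so a content-based `Eq` keeps working with it, while an equality check derived only from hash values reports two different functions as equal.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a UDF; the derived equality compares contents.
#[derive(Debug, PartialEq, Eq)]
struct Func {
    name: &'static str,
}

impl Hash for Func {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Deliberately poor but still valid: every value hashes the same.
        0u64.hash(state);
    }
}

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut state = DefaultHasher::new();
    value.hash(&mut state);
    state.finish()
}

/// Equality derived only from hash values, the pattern being questioned.
fn eq_by_hash(a: &Func, b: &Func) -> bool {
    hash_of(a) == hash_of(b)
}

fn main() {
    let abs = Func { name: "abs" };
    let sqrt = Func { name: "sqrt" };

    // Content-based equality still tells the two functions apart.
    assert_ne!(abs, sqrt);
    // Hash-based equality reports a false positive.
    assert!(eq_by_hash(&abs, &sqrt));

    // A HashSet stays correct because it falls back to Eq on collisions.
    let mut set = HashSet::new();
    set.insert(abs);
    set.insert(sqrt);
    assert_eq!(set.len(), 2);
}
```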
Ok, I think @findepi makes a fair point. We can probably just revert the last commits and I'll re-review. Sorry for the extra work.
Sorry, I might be missing some context here. I understand that hash values can only be used to avoid more expensive comparisons if they are unequal. But if hash values are equal, it doesn't guarantee anything about equality, so how should we check equality of ForeignUDFs? I'm not sure if having
@timsaucer what if we simply don't do #17087?
Yeah, I guess it's better for Eq to return a false negative (false for equal UDFs) than a false positive (true for unequal UDFs), although it would be good if we could return a more accurate result.
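The false-negative trade-off can be illustrated with a short sketch that assumes equality is decided by pointer identity. `Func` and `Handle` are hypothetical types for this example only. Pointer identity can miss two separately created but identical functions, yet it never reports different functions as equal.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the wrapped function.
#[derive(Debug)]
struct Func {
    name: String,
}

/// Hypothetical handle whose equality is pointer identity only.
struct Handle {
    inner: Arc<Func>,
}

impl PartialEq for Handle {
    fn eq(&self, other: &Self) -> bool {
        // True only when both handles point at the same allocation.
        Arc::ptr_eq(&self.inner, &other.inner)
    }
}

fn main() {
    let shared = Arc::new(Func { name: "abs".to_string() });
    let a = Handle { inner: Arc::clone(&shared) };
    let b = Handle { inner: shared };
    // Same contents as `a`, but a separate allocation.
    let c = Handle { inner: Arc::new(Func { name: "abs".to_string() }) };

    assert!(a == b); // same allocation: equal
    assert!(a != c); // false negative: identical contents, different pointers
    println!("{} vs {}", a.inner.name, c.inner.name);
}
```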
Agreed!
Agreed too.
Which issue does this PR close?
Rationale for this change
As described in the issue, the original `Eq` implementation for Foreign UDF/UDAF/UDWF compares based on a pointer hash, which fails to recognize the same UDFs if their pointers differ. This PR fixes that by exposing a `hash` method in the FFI interface so that `Eq` compares the actual hash values of the UDFs (as well as their signatures and aliases).

What changes are included in this PR?

For FFI UDF, UDAF and UDWF, I have made the following changes:

- A `hash` function is exposed in the FFI module.
- In the `Foreign_` module, the `PartialEq` trait compares the results of the hash functions.
- In the `Foreign_` module, the `Hash` trait now uses the result of the hash function.

Are these changes tested?
Yes. The added unit tests for UDF, UDAF and UDWF failed before this change and pass now.
Are there any user-facing changes?
No.
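
As a rough illustration of the change described under "What changes are included in this PR?", the sketch below exposes a hash callback through an FFI-style struct. It is a simplified assumption, not the actual datafusion-ffi code: `MyUdf`, `FfiUdf`, `hash_fn_wrapper`, `release_fn_wrapper` and `hash_value` are hypothetical names, and the real structs carry many more fields.

```rust
use std::collections::hash_map::DefaultHasher;
use std::ffi::c_void;
use std::hash::{Hash, Hasher};

/// Hypothetical provider-side function; only its name takes part in hashing.
#[derive(Hash)]
struct MyUdf {
    name: String,
}

/// Hypothetical, heavily simplified FFI-style wrapper.
#[repr(C)]
struct FfiUdf {
    /// Asks the providing side to hash the wrapped function.
    hash: unsafe extern "C" fn(udf: &FfiUdf) -> u64,
    /// Frees `private_data` on the providing side.
    release: unsafe extern "C" fn(udf: &mut FfiUdf),
    private_data: *mut c_void,
}

unsafe extern "C" fn hash_fn_wrapper(udf: &FfiUdf) -> u64 {
    // SAFETY: `private_data` was created from a `Box<MyUdf>` in `FfiUdf::new`.
    let inner = unsafe { &*(udf.private_data as *const MyUdf) };
    let mut state = DefaultHasher::new();
    inner.hash(&mut state);
    state.finish()
}

unsafe extern "C" fn release_fn_wrapper(udf: &mut FfiUdf) {
    if !udf.private_data.is_null() {
        // SAFETY: the pointer came from `Box::into_raw` and is freed only once.
        drop(unsafe { Box::from_raw(udf.private_data as *mut MyUdf) });
        udf.private_data = std::ptr::null_mut();
    }
}

impl FfiUdf {
    fn new(inner: MyUdf) -> Self {
        Self {
            hash: hash_fn_wrapper,
            release: release_fn_wrapper,
            private_data: Box::into_raw(Box::new(inner)) as *mut c_void,
        }
    }

    /// Consumer-side helper: calls back through the function pointer.
    fn hash_value(&self) -> u64 {
        // SAFETY: `self` was built by `FfiUdf::new`, so the pointer is valid.
        unsafe { (self.hash)(self) }
    }
}

impl Drop for FfiUdf {
    fn drop(&mut self) {
        let release = self.release;
        // SAFETY: `release` matches how `private_data` was allocated.
        unsafe { release(self) }
    }
}

fn main() {
    let a = FfiUdf::new(MyUdf { name: "abs".to_string() });
    let b = FfiUdf::new(MyUdf { name: "abs".to_string() });
    // Different allocations, so a pointer-based comparison would differ...
    assert_ne!(a.private_data, b.private_data);
    // ...while the exposed hash agrees for identical functions.
    assert_eq!(a.hash_value(), b.hash_value());
}
```

As the conversation points out, matching hash values do not prove equality, so a comparison built on this callback alone can still report a false positive when two different functions collide.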