-
-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support WHERE x IN (list) clauses with multiple items in list #352
base: main
Are you sure you want to change the base?
Conversation
src/shillelagh/backends/apsw/vt.py
Outdated
if isinstance(constraint, set): | ||
# A WHERE x IN (list) clause sends in constraintargs as a set when N>1: | ||
# {item1,...,itemN}. We serialize this set to a string '["item1",...,"itemN"]'. | ||
constraint = json.dumps(list(constraint)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, but why do you have to serialize this at all here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because these sets belong to an "Equals" clause and the current logic translates that for a string to '==string' and to leverage that principle the set needs to be converted to somethign like '==["string1", "string2"]'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code should look something like this:
if isinstance(constraint, set):
constraint = [type_map[column_type.type]().parse(c) for c in constraint]
value = [column_type.format(c) for c in constraint]
# operator = IN (not sure if eq is passed in here or if IN is set)
else:
constraint = type_map[column_type.type]().parse(constraint)
value = column_type.format(constraint)
all_bounds[column_name].add((operator, value))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok yes, you will need to set the operator explicitly here to Operator.IN
and create that operator as otherwise it will be EQ
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - let me try that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all_bounds[column_name].add((operator, value))
which value
is a list so (operator, value)
is unhashable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So change this line:
value = [column_type.format(c) for c in constraint]
To
value = tuple(column_type.format(c) for c in constraint)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or use frozenset: https://docs.python.org/3/library/stdtypes.html#frozenset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that should work, then we got adding new type of operator left, waiting for the update and merge ^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went with tuple() and added the Operator.IN
this also needs some tests |
src/shillelagh/backends/apsw/vt.py
Outdated
if isinstance(constraint, set): | ||
# A WHERE x IN (list) clause sends in constraintargs as a set when N>1: | ||
# {item1,...,itemN}. We serialize this set to a string '["item1",...,"itemN"]'. | ||
constraint = json.dumps(list(constraint)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok yes, you will need to set the operator explicitly here to Operator.IN
and create that operator as otherwise it will be EQ
I think the failing integration tests look unrelated to your work? And they were probably fixed here -> 48bb76a You should just run |
Yes, thanks, I just figured that out too that the pre-commit is changing files. I think there still is one issue related to my PR though that I'm trying to track down. I seem to be changing the signature of requested_columns of VTTable. |
For pre-commit, just make sure it's installed on your local machine https://pre-commit.com/#install |
You should just be able to merge the latest main branch, or rebase onto main to incorporate the changes related to the API change. 14ed1f6 |
This particular error in my new test, I cannot figure out: FAILED tests/backends/apsw/vt_test.py::test_virtual_best_index_object_with_in_statement - AttributeError: 'VTTable' object has no attribute 'requested_columns' |
The test runs successfully for my on Linux Python 3.9 .... 🤷 Something funny with the Github actions set up maybe ... I'm not a GH actions expert. 😉 |
|
||
all_bounds[column_name].add((operator, value)) | ||
if isinstance(constraint, set) and operator == Operator.EQ: | ||
# See also https://rogerbinns.github.io/apsw/vtable.html#apsw.VTCursor.Filter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit but I would write this as:
if isinstance(constraint, set) and operator is Operator.EQ:
# See also https://rogerbinns.github.io/apsw/vtable.html#apsw.VTCursor.Filter
constraint = [type_map[column_type.type]().parse(c) for c in constraint]
value = tuple(column_type.format(c) for c in constraint)
operator = Operator.IN
else:
constraint = type_map[column_type.type]().parse(constraint)
value = column_type.format(constraint)
all_bounds[column_name].add((operator, value))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That causes warnings because in the 'if' branch value is a tuple and in the 'else' branch it is a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can solve that by forward declaring the type
value: Tuple | string # update type as needed
if isinstance(constraint, set) and operator is Operator.EQ:
# See also https://rogerbinns.github.io/apsw/vtable.html#apsw.VTCursor.Filter
constraint = [type_map[column_type.type]().parse(c) for c in constraint]
value = tuple(column_type.format(c) for c in constraint)
operator = Operator.IN
else:
constraint = type_map[column_type.type]().parse(constraint)
value = column_type.format(constraint)
all_bounds[column_name].add((operator, value))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Any OK? I'd rather not get into a type cast war trying to list all possibilities
value: Any
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any
is fine
src/shillelagh/filters.py
Outdated
return Impossible() | ||
|
||
# we only accept tuples | ||
value = values.pop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why pop here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copied the concept from all the other filters. I'll remove the check for 'tuple' below and then normalize it with all the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@betodealmeida can comment on the pop and the rest of this code.
src/shillelagh/filters.py
Outdated
|
||
# we only accept tuples | ||
value = values.pop() | ||
if not isinstance(value, tuple): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably be looser here with the types accepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These come from https://rogerbinns.github.io/apsw/vtable.html#apsw.VTCursor.Filte. Only set is mentioned there I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but that is tuple. Where does this get called from any ways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tuple? That's what we chose in
value = tuple(column_type.format(c) for c in constraint)
But I'm OK removing this constraint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not totally clear where this code gets called. I remember it being cases where the sql engine doesn't handle filtering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this slid under my radar. Taking a look this weekend.
# if true, a set from an IN statement will be assigned the new IN operator and | ||
# passed in as a tuple to bounds in ``get_rows`` and ``get_data``. This requires | ||
# supports_requested_columns to be set to true as well. | ||
supports_in_statements = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we need explicit support at the adapter level. I think we might be able to just add the IN
adapter, and if a given column doesn't support it we could replace it with a bunch of Equal
operators.
Summary
Up to now, the constraintargs for a WHERE IN clause with more than one element in a list comes in as None in the VTCursor.Filter() method. With the use of the VTTable.BestIndexObject() and using apsw 3.41.0+, it is possible to get this list passed in as a set to the VTCursor.Filter() method. This PR serializes that set to a string.
Testing instructions