WIP: Add JSON serializer for ASTs and store them upon node creation #699
base: main
✅ Deploy Preview for thriving-cassata-78ae72 canceled.
This makes sense to me, thanks @shangyian! So much good stuff in so few lines. Am I understanding correctly that this PR creates and stores the query AST (via the node validation) but doesn't actually use it yet in the SQL generation? It makes sense to break that out into a separate PR.
@@ -415,7 +416,7 @@ def validate_node_data( # pylint: disable=too-many-locals
dependencies_map,
)
validated_node.required_dimensions = matched_bound_columns

validated_node.query_ast = json.loads(json.dumps(query_ast, cls=ASTEncoder))
Yep, this just handles serializing and storing the ast. As I mentioned below, I might try some basic deserialization to make sure it works for query building, but I'll put the actual implementation in a separate PR :)
@shangyian a few questions after taking a quick peek. I think these are all pretty much the same question from different directions, since they all concern information that is used after compilation but may be ignored during this serialization:
- How are parent and parent_key handled when deserializing?
- Does this account for potential circular references like Column <-> Table?
- For Table in particular, some of the ignored attributes are only set during compilation and I think are potentially used in some build stuff. Are these somehow backfilled during deserialization?
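For context on the circular-reference question: by default, Python's `json.dumps` raises `ValueError` when it detects a circular reference. The sketch below shows one way an encoder could short-circuit a Column <-> Table style cycle instead of raising. It is only an illustration under stated assumptions: `SketchNode`, `SketchEncoder`, and the `refs` attribute are hypothetical names, not the PR's actual `ASTEncoder` or AST classes.

```python
import json
from typing import Any, Set


class SketchNode:
    """Hypothetical stand-in for an AST node (not the project's real class)."""

    def __init__(self, name: str):
        self.name = name
        self.refs: list = []  # may point back at an ancestor, forming a cycle


class SketchEncoder(json.JSONEncoder):
    """Serializes SketchNode graphs, replacing back-references with None."""

    def default(self, o: Any) -> Any:
        if isinstance(o, SketchNode):
            # Convert the whole subtree to plain dicts/lists ourselves so we
            # can track which nodes are on the current path.
            return self._convert(o, set())
        return super().default(o)

    def _convert(self, value: Any, active: Set[int]) -> Any:
        if isinstance(value, list):
            return [self._convert(item, active) for item in value]
        if not isinstance(value, SketchNode):
            return value
        if id(value) in active:
            # Node is already being serialized higher up: short-circuit
            # instead of raising.
            return None
        active.add(id(value))
        try:
            fields = {k: self._convert(v, active) for k, v in vars(value).items()}
            return {"__class__": type(value).__name__, **fields}
        finally:
            active.discard(id(value))


table = SketchNode("table")
column = SketchNode("column")
table.refs.append(column)
column.refs.append(table)  # Column <-> Table style cycle

encoded = json.loads(json.dumps(table, cls=SketchEncoder))
```

Tracking only the nodes on the current path (rather than every node seen so far) means a node that is legitimately referenced twice without forming a cycle is still serialized both times. The trade-off is that the dropped back-reference has to be restored by other means on deserialization.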
@@ -102,6 +102,10 @@ class Node(ABC):

_is_compiled: bool = False

@property
def json_ignore_keys(self):
I like this pattern 🙂
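As a rough illustration of the pattern in this hunk, a per-class property can list the attributes an encoder should skip. This is a minimal sketch, not the PR's code: only the `json_ignore_keys` name comes from the diff; `SketchBase`, `SketchColumn`, `IgnoreKeysEncoder`, and the specific ignored keys are assumptions.

```python
import json
from typing import Any, List, Optional


class SketchBase:
    """Hypothetical base class; only `json_ignore_keys` mirrors the diff."""

    @property
    def json_ignore_keys(self) -> List[str]:
        # Assumed list of keys; the real property may return something else.
        return ["parent", "parent_key", "_is_compiled"]


class SketchColumn(SketchBase):
    def __init__(self, name: str, parent: Optional[SketchBase] = None):
        self.name = name
        self.parent = parent
        self.parent_key: Optional[str] = None


class IgnoreKeysEncoder(json.JSONEncoder):
    def default(self, o: Any) -> Any:
        if isinstance(o, SketchBase):
            ignored = set(o.json_ignore_keys)
            # Emit every instance attribute except the ones the node opts out of.
            return {k: v for k, v in vars(o).items() if k not in ignored}
        return super().default(o)


table = SketchColumn("t")
column = SketchColumn("id", parent=table)
serialized = json.dumps(column, cls=IgnoreKeysEncoder)
```

Because each class declares its own ignore list, subclasses can override the property without the encoder needing to know about them.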
@shangyian this is awesome. From what you said, it sounds like adjustments may still be needed once we start deserializing and using this code?
Reading on the bigger screen now... I see this is just meant to be serialization. When I was imagining this, if I had to handle potential circular stuff like in my question and your writeup, I figured maybe a flat structure like
So right now it's handling the circular stuff by storing a
Yeah, so it sounds like I might need to take a stab at deserialization and make sure that all works with this setup. If not, a flat structure like you described will probably help! I think having Table fully populated with columns matters when we're trying to build a query that needs one or more columns from that table to be grouped or filtered on as dimensions.
@agorajek It's quite possible, so I'll try setting up some basic deserialization before merging just to make sure that this setup is actually enough.
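One possible shape for the basic deserialization mentioned here, addressing the earlier question about `parent` and `parent_key`: if those links are skipped during serialization, they can be backfilled while walking the stored JSON. The sketch below is purely hypothetical (`SketchNode` and `rebuild` are illustrative names, and the real AST classes would need type-specific construction).

```python
from typing import Any, Optional


class SketchNode:
    """Hypothetical generic node; the real AST has typed classes."""

    def __init__(self) -> None:
        self.parent: Optional["SketchNode"] = None
        self.parent_key: Optional[str] = None


def rebuild(
    data: Any,
    parent: Optional[SketchNode] = None,
    parent_key: Optional[str] = None,
) -> Any:
    """Turn stored JSON back into nodes, restoring parent links on the way down."""
    if isinstance(data, list):
        return [rebuild(item, parent, parent_key) for item in data]
    if not isinstance(data, dict):
        return data  # plain scalar value
    node = SketchNode()
    node.parent, node.parent_key = parent, parent_key
    for key, value in data.items():
        # Children learn which node owns them and under which attribute.
        setattr(node, key, rebuild(value, node, key))
    return node


tree = rebuild({"name": "select", "columns": [{"name": "id"}]})
```

This covers only structural back-links; attributes that are set during compilation would still need to be recomputed or stored separately.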
…r references and thus can serialize more of the AST
3cea1d7 to b7eff1c
Summary
This PR adds a custom JSON encoder for query ASTs: ASTEncoder. This encoder uses our own circular check so that we can short-circuit the processing of circular dependencies but not raise an error. We may want to determine what's causing these circular dependencies (it looks related to FunctionTableExpression), but that's a separate issue.

This also adds a query_ast column to NodeRevision so that every time we create a node, we can store the parsed query AST alongside it. The logic for actually using this cached AST can be done separately.

Test Plan
- make check passes
- make test shows 100% unit test coverage

Deployment Plan