-
-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #4816: Apply @ temp operators from left to right #4826
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with this. makes more sense. wait for another approval before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but wait for @ret2libc first, please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey!
I think this is a "big" change as in it completely changes the shell behaviour as it has been so far. That said, I'm fine with it as long as it is well thought and works well!
- test/db: are those the only places where the order really matters? I remember we have a lot of tests where we chain several
@
together. - parsing commands should avoid global state? i think we don't have any for now
- when i think about the shell grammar i always think about it recursively. this change kinda breaks it and makes things (in my view) harder to follow. It might just be I'm used to the right-to-left way though.
More on the last point. The thing is that now the command is structured as <cmd> <@tmp1> <@tmp2> ....
while before it was <cmd> <@tmp>
recursively. the reason the recursion works well in my mind is that each tmp_stmt is structured in this way: 1) setup 2) run original <cmd>
3) teardown. For example, @ addr
does 1) save original addresses and change address to new one 2) run command 3) restore saved addresses. This works well with the previous command structure, but less well imho with the new one (which is the reason why you have some hacks around, I guess, like global child_idx, grandparent, etc.).
What if whenever you see a tmp_stmt
you "rewrite" it in a recursive way like before?
essentially, the user can write px @!5 @x:1234
but then internally rizin parse it as (tmp_blksz_stmt (tmp_hex_stmt (simple_stmt)))
so that we can reuse the parsing functions as before? What do you think? The rewriting can probably happen on the fly in the DEFINE_HANDLE_TS_FCN_AND_SYMBOL(tmp_stmt) {
.
librz/core/cmd/cmd.c
Outdated
static uint32_t tmp_child_idx = 0; | ||
|
||
static TSNode tmp_get_next_node(TSNode cur) { | ||
TSNode next = ts_node_next_named_sibling(cur); | ||
tmp_child_idx++; | ||
if (ts_node_is_null(next)) { | ||
next = ts_node_named_child(ts_node_parent(cur), 0); | ||
tmp_child_idx = 0; | ||
} | ||
return next; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this tbh, it makes parallel parsing of commands not safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we ever, ever decide to parse commands in parallel, I think the usual tricks can be applied e.g. make tmp_child_idx
a thread-local variable, pass tmp_child_idx
around as an argument etc.
@@ -3102,7 +3102,8 @@ static bool substitute_args(struct tsr2cmd_state *state, TSNode args, TSNode *ne | |||
// If do_unwrap is true, then quote unwrapping is always done, else cd is | |||
// checked. An arg of raw type (this can be determined if cd is available) | |||
// prevents unescaping and quote unwrapping regardless. | |||
static RzCmdParsedArgs *ts_node_handle_arg_prargs(struct tsr2cmd_state *state, TSNode command, TSNode arg, uint32_t child_idx, bool do_unwrap, const RzCmdDesc *cd) { | |||
static RzCmdParsedArgs *ts_node_handle_arg_prargs(struct tsr2cmd_state *state, TSNode command, TSNode arg, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the grandparent
used for? why is it necessary?
also, why a change in the tmp_*_stmt
requires changes here? I think this might be a sign that something is not as it should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the
grandparent
used for? why is it necessary?
It's necessary because the following line
Line 3115 in 14eecce
arg = ts_node_named_child(new_command, child_idx); |
assumes that
TSNode arg
is a child of TSNode command
. I need it to go 1 layer deeper, because arg
is now a child of a ts_*_stmt
which is a child of ts_stmt
which contains the whole list of ts_*_stmt
+ a _simple_stmt
at the beginning.
also, why a change in the
tmp_*_stmt
requires changes here? I think this might be a sign that something is not as it should be.
My opinion on this is that substitution (like for backticks) should be allowed on any node regardless on whether the node is a command, but then I see
Line 3044 in 14eecce
return ts_parser_parse_string(state->parser, NULL, state->input, strlen(state->input)); |
and I decided it would be better to just let sleeping dragons lie
Yes but for most of them, the
Parsing commands currently do use the stack to hold previous nodes that have been visited, so in a sense global state is used. Ofc if parallel parsing is desired, a thread will get their own stack automatically.
I do see that the original code is basically doing
I honestly did think hard about the approach you proposed (one reason being that changing 16 different |
I personally find that "need it to go 1 layer deeper, because arg is now a child of a ts_stmt which is a child of ts_stmt which contains the whole list of ts_stmt + a _simple_stmt at the beginning" & co is very hard to follow and introduces some global state (not just global vars, but some global "concept" as well). WRT editing the input, yeah it might not be trivial indeed. Going back to the main change: is it really safe and consistent doing it? Like, what about What is the result of something like:
? (unrelated, but there's probably a bug where |
Yes it appears to be a new concept. I plan to rename Perhaps this is easier to follow. For
Substitution needs to be done on the highlighted
Oh that's a relief.
Since I'm not moving nodes, it's safe and consistent as long as tree-sitter is safe and consistent -- "safe and consistent" here meaning that the semantics is precisely defined, and the parser is deterministic and won't go into an infinite loop.
Its tree is
which is
which is actually the same as before. Only if you have something like:
(the
which is
where previously it was
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd say... can we just find a way to avoid tmp_child_idx
?
yes eedef9b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great! thanks a lot ;)
Your checklist for this pull request
Detailed description
This pr fixes #4816 by making the application of the
@
temp operators (listed at@?
) go from left to right instead of right to left. This was more work than I expected.The main issue encountered was that the
tmp_*_stmt
nodes in the grammar cannot be statements anymore (i.e. they cannot have a_simple_stmt
at the front). This disqualifies them from being commands wrt to the substitution functionsts_node_handle_arg()
andts_node_handle_arg_prargs()
. I think I've managed to shoehorn in a fix without too much of a fuss.The
tmp_*_stmt
nodes in the grammar should be renamed to e.g.tmp_*_op
, but this has been deferred due to the mass renaming involved and because the renaming affects autocomplete.Test plan
All builds are green.
Closing issues
Closes #4816.