Skip to content

Conversation

@iguinn
Copy link
Contributor

@iguinn iguinn commented Jun 1, 2025

This is an attempt to address #135 using the asteval package, which is safer than the built in eval. This disables certain features of the language using essentially a white list (see here).

The way this is not written is for Table.eval to take an extra argument library (similar to view_as). The table will then be view_ased appropriately for this library, and the fields will be fed into the interpreter. In addition, modules needed for using that library are fed in (e.g., ak if library is "ak").

I also added an argument as_lgdo to control the return type, so you can return as the library you provide or as an LGDO object (defaults to True since this is the current behavior).

I left in the parameters argument to feed in external values, but removed the modules argument; the user actually can still feed modules as parameters, but I'm not so sure we want that (at some point Table.eval isn't the right tool for the job...).

The libraries included here are (note numpy is automatically included):

  • lgdo: includes the Table as LGDO objects and includes lgdo.types module
  • pd: calls pandas.DataFrame.eval
  • numexpr: similar to before; uses flatten to use __ for nested objects, and then includes the ndas for each field (excluding VectorOfVectors fields)
  • ak: includes fields as ak objects, and includes the ak module
  • np: will throw an Error because view_as('np') is not implemented

Implementing each of these was extremely simple with asteval, and extending this system further would be fairly trivial!

Right now, this PR is not ready to merge; I have submitted it for the sake of discussion. In particular, this adds a new dependency in asteval, and it works a little differently from the old implementation. It should be possible to make the default behavior for no provided library act like before (basically ak if needed and numexpr otherwise). In addition, I have to still make sure the tests are appropriate.

@tdixon97
Copy link
Contributor

tdixon97 commented Jun 3, 2025

I am not sure about this, it will break a lot of our code in reboost to take away the modules option...
I think a much better solution is to build the function expressions themself than to just use another package for eval that will surely have limitations.

@tdixon97
Copy link
Contributor

tdixon97 commented Jun 3, 2025

I think it will also break the pygama evt tier where we eval pygama functions

@iguinn
Copy link
Contributor Author

iguinn commented Jun 5, 2025

Interesting...Good to know. We can consider adding modules back in (you can still add modules through the parameters arg, but that's not obvious).

I actually agree that for complex expressions, writing a function is the way to go. The reason I'm interested in this function is that I want to be able to use it with query in #154 (and eventually see if we can build a light-weight event loader based on these tools). It should be noted that query, map, and hist in that PR are all designed to work with user-defined function as well.

This function is important for pygama/hit tier processing, though, so it's important that we understand what functionality might break. At this time, I see this PR more as a proposal for an approach, and if we agree it's worth pursuing this approach we can try to fix it up to minimize breakage and write better tests and such.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants