Regular expression export/import #9976
@josevalim What do you think about this?
CT Test Results: 4 files, 228 suites, 1h 54m 13s, for commit efd5ef0.
I believe this is fantastic and simplifies many of the issues we had to tackle in Elixir. Thank you. It would be fantastic if this could be used from Erlang too. Perhaps a pass in the compiler could rewrite re:compile into re:import? Also, do you see this making it into 28.1, or would it be 29 only?
The plan is to get this export/import functionality into 28.1, and then potentially do the loader optimization later, maybe as early as 28.2.
@sverker Making it part of 28.1 would help Elixir codebases migrate to the latest OTP, so thank you. I have one additional question: do you think it is reasonable for
I have one additional thought: what if the export were part of the existing tagged tuple? For example, you can add a new field to
The "import" step was literally free but unsafe. It is now safe but not totally free. It has to
I did some measurements, and the import seems to be at least a factor of 10 cheaper than compiling the corresponding expression. Compiling a large 20 kB regex took ~500μs while importing it took ~40μs. Our idea was to keep the import as a separate step for performance reasons, at least to begin with. After all, the only reason to precompile a regex is performance. If you don't care much about that, just send the regex across node instances uncompiled. For example, if someone has existing generated code looking like this
then the loader trick would probably not trigger as the regex argument to If we keep the import separate, then the code generation could be changed simply by adding the
The loader can detect the calls to We can always add automatic import to
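The compile-versus-import comparison above can be reproduced roughly with a sketch like the following. It assumes OTP 28.1 or later, that re:compile/2 returns {ok, Exported} when given option export, and that re:import/1 returns the compiled pattern directly. The module name re_import_bench is hypothetical, and the figures will depend on the pattern and the machine.

```erlang
-module(re_import_bench).
-export([compare/1]).

%% Time one re:compile/1 call against one re:import/1 call for the same
%% pattern. timer:tc/1 returns {Microseconds, Result}. Single calls are
%% noisy, so treat the numbers as a ballpark only.
compare(Regex) ->
    {CompileUs, {ok, _}} = timer:tc(fun() -> re:compile(Regex) end),
    %% Assumption: option export yields {ok, ExportedTerm}.
    {ok, Exported} = re:compile(Regex, [export]),
    %% Assumption: re:import/1 returns the compiled pattern itself.
    {ImportUs, _} = timer:tc(fun() -> re:import(Exported) end),
    #{compile_us => CompileUs, import_us => ImportUs}.
```

On a node that is not compatible with the exporting one, re:import/1 falls back to recompiling, so the import cost there should approach the compile cost.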
Got it, thank you. I think I misunderstood it initially, but it is now clear to me: I need to call
Yes. Except, instead of a new
Given
Split off build_compile_error() from build_compile_result().
Leverages newly added :re.import/1. erlang/otp#9976
Problem
Before OTP 28.0 it was possible to abuse the compiled format of regular expressions, as returned by re:compile, as if it were a serialized format to be imported into other Erlang node instances. This abuse happened to work as long as the underlying hardware architecture and PCRE version were not too incompatible. But it was unsafe, as passing an incompatible compiled regular expression to re:run could result in any kind of unpleasant behavior.

In OTP 28.0 the compiled format was changed so that it no longer exposes the internals of PCRE; instead it is a safe (magic) reference to the internal regex structures. A compiled regex is now safe but can only be used in the node instance that compiled it.
Solution
This PR introduces a supported, safe way to export compiled regular expressions. The exported format is self-contained and can be stored off-node or sent to other nodes. If the importing node is compatible (architecture and PCRE version), the compiled regex can be used directly with minimal overhead. If not, the regular expression is recompiled from the original string and options, which are included in the exported format as a fallback.
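Because the exported format is an ordinary Erlang term rather than a node-local reference, it can be persisted with term_to_binary/1 and read back elsewhere. The sketch below assumes OTP 28.1 or later, that re:compile/2 with option export returns {ok, Exported}, and that re:import/1 returns the compiled pattern directly; the module re_export_store and its functions are hypothetical.

```erlang
-module(re_export_store).
-export([save/2, load/1]).

%% Compile with option export and persist the result. term_to_binary/1
%% applies because the exported format is a plain term (a tuple of
%% binaries and options), not a node-local reference.
save(Path, Regex) ->
    {ok, Exported} = re:compile(Regex, [export]),
    file:write_file(Path, term_to_binary(Exported)).

%% Read the term back, possibly on another node, and import it.
%% re:import/1 uses the serialized PCRE code when the node is compatible
%% and otherwise recompiles from the original pattern and options.
load(Path) ->
    {ok, Bin} = file:read_file(Path),
    re:import(binary_to_term(Bin)).
```

Sending the exported term in a message to a process on another node works the same way, since it is just data.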
Usage
Compile the regex with option export, then, potentially in another node, pass the result to re:import before using it with re:run.
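A minimal sketch of this compile, export and import flow follows. It assumes OTP 28.1 or later, that re:compile/2 returns {ok, Exported} when given option export, and that re:import/1 returns the compiled pattern directly, as the Future optimization section implies. The module and function names are hypothetical.

```erlang
-module(re_export_usage).
-export([export_regex/1, use_exported/2]).

%% Exporting node: compile with option export. The result is a
%% self-contained term rather than a node-local compiled pattern.
export_regex(Regex) ->
    {ok, Exported} = re:compile(Regex, [export]),
    Exported.

%% Importing node (possibly a different one): convert the exported term
%% back into a compiled pattern, then run it as usual.
use_exported(Exported, Subject) ->
    MP = re:import(Exported),
    re:run(Subject, MP, [{capture, all, binary}]).
```

For example, use_exported(export_regex(<<"a(b+)c">>), <<"xabbcx">>) should return {match, [<<"abbc">>, <<"bb">>]}.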
Exported format
The exported format is opaque but currently looks like this:
{re_exported_pattern, HeaderBin, OrigBin, OrigOpts, EncodedBin}
- EncodedBin: binary containing the compiled regex as encoded by pcre2_serialize_encode()
- HeaderBin: binary with some meta information, including a CRC checksum over EncodedBin
- OrigBin: original regular expression as a binary string
- OrigOpts: options passed to re:compile/2

Future optimization
Users who previously generated Erlang code with compiled regular expressions as literals would now instead compile with option export and generate re:import(Literal) instead of just the literal. If done that way, the BEAM loader could be optimized to detect such calls to re:import with literal arguments, evaluate them at load time, and replace each call with the returned compiled regular expression as a literal term.
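A code generator following this scheme might emit re:import(Literal) like the sketch below. It assumes OTP 28.1 or later and that option export returns {ok, Exported}; the module re_codegen and gen_function/2 are hypothetical, and this is one possible way to print the literal, not the PR's own tooling.

```erlang
-module(re_codegen).
-export([gen_function/2]).

%% Generate source text for a zero-arity function that returns a compiled
%% regex. Rather than embedding a compiled pattern as a literal, embed the
%% exported term and wrap it in re:import/1. ~p prints the function name,
%% the tuple, its binaries and its option atoms as valid Erlang syntax.
gen_function(Name, Regex) when is_atom(Name) ->
    {ok, Exported} = re:compile(Regex, [export]),
    lists:flatten(io_lib:format("~p() ->~n    re:import(~p).~n",
                                [Name, Exported])).
```

Until the loader optimization exists, each call to such a generated function performs the import at run time.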