Releases: zeek/spicy
v1.14.0
New Functionality
- GH-2028: New interprocedural optimizations.
  We added infrastructure for performing interprocedural optimizations, and as a first user added a pass which removes unused function parameters in GH-2030. While this works on any code, it is mainly intended to simplify generated parser code for better runtime performance.
- GH-1697: Remove some dead statements based on control and data flow.
  We now collect control and data flow information. We use this to detect and remove "dead statements", i.e., statements whose effects are not seen by any other needed computation. Currently we handle two classes of dead statements:
  - assignments which are overwritten before being used
  - unreachable code, e.g., due to a preceding `return`, `break` or `throw`
  The implementation is not yet able to cover all possible Spicy language constructs, so it is behind a feature flag and not enabled by default. To enable it, set the environment variable `HILTI_OPTIMIZER_ENABLE_CFG=1` when compiling Spicy code with, e.g., `spicyc`.
  We encourage users to test this compilation mode and, if possible, use the compiled parsers in production. If parsers compiled this way show the intended runtime behavior in tests, they should also be fine to use in production.
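To make the two classes of dead statements from GH-1697 concrete, here is a small illustration. It is written in C++ rather than Spicy so that it can be compiled on its own, and both function names are hypothetical. The second function shows what the first could be reduced to by hand; it is not output of the Spicy optimizer.

```cpp
// Hand-written illustration of dead statements; not generated by Spicy.
inline int with_dead_statements(int x) {
    int y = 0;  // dead: this value is overwritten below before it is ever read
    y = x * 2;
    return y;
    y = -1;     // dead: unreachable, because it follows the return
}

// The same computation with both dead statements removed by hand.
inline int without_dead_statements(int x) {
    int y = x * 2;
    return y;
}
```

Both functions compute the same result for every input, which is the property a dead-statement removal pass has to preserve.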
Changed Functionality
- GH-2050: Prefer stdout over stderr for `--help` messages.
  Spicy tools now emit `--help` output to stdout instead of `stderr`.
- GH-2068: Allow disabling building of tests.
  We added a new CMake option `SPICY_ENABLE_TESTS` which, if toggled on, forces building of test and benchmark binaries; it is `ON` by default. Projects building Spicy can use this flag to disable building of tests if they are not interested in them. We also provide a configure flag `--disable-tests` which has the effect of turning it off.
- GH-1663: Speed up checking of iterator compatibility.
  We were previously using a control block which held a `weak_ptr` to the protected data. This was pretty inefficient for a number of reasons:
  - access to the controlled data always required a `weak_ptr::lock` which created a temporary `shared_ptr` copy and immediately destroyed it after access
  - to check whether the control block was expired we used `lock` instead of `expired`, which introduced the same overhead
  - to check compatibility of iterators we compared `shared_ptr`s to the control data, which again required full locks instead of using `owner_before`
  This manifested in, e.g., loops often being less performant than possible. We now changed how we hold data to make iterating collections cheaper.
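The costs listed for GH-1663 can be illustrated with plain standard-library calls. The sketch below is not Spicy's runtime code: `Control` and the four helper functions are hypothetical names, and it only contrasts the `lock`-based checks with their `expired` and `owner_before` counterparts.

```cpp
#include <memory>

// Hypothetical stand-in for the data an iterator's control block refers to.
struct Control {
    int generation = 0;
};

// Liveness check via lock(): creates a temporary shared_ptr, which bumps
// and then drops the reference count just to answer a yes/no question.
inline bool is_alive_via_lock(const std::weak_ptr<Control>& w) {
    return w.lock() != nullptr;
}

// Liveness check via expired(): only inspects the use count.
inline bool is_alive_via_expired(const std::weak_ptr<Control>& w) {
    return ! w.expired();
}

// Compatibility check via lock(): two temporary shared_ptrs are created
// only to compare the pointers they hold.
inline bool same_owner_via_lock(const std::weak_ptr<Control>& a, const std::weak_ptr<Control>& b) {
    return a.lock() == b.lock();
}

// Compatibility check via owner_before(): compares control-block identity
// without creating any shared_ptr. Two weak_ptrs share an owner exactly
// when neither is ordered before the other.
inline bool same_owner_via_owner_before(const std::weak_ptr<Control>& a, const std::weak_ptr<Control>& b) {
    return ! a.owner_before(b) && ! b.owner_before(a);
}
```

Note that the two compatibility checks differ once the data is gone: after expiry, `lock()` yields null on both sides and compares equal, whereas `owner_before` still distinguishes the original owners.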
- GH-2086: Fix scope resolution of local variables.
  If usage of a local comes before its declaration, we no longer resolve that usage to this local. It will either be resolved to an upper-layer ID (if there is one of the same name), or rejected if it is otherwise unknown.
- GH-2066: When C++ compilation fails, ask user for help.
  We do expect C++ code generated by Spicy to be valid, so C++ compiler errors in generated code are likely bugs. We now record the output of the C++ compiler in a dedicated file `hilti-jit-error.log` and ask users to file a ticket in case C++ compilation failed.
- GH-1660: When printing anonymous bitfields inside a struct, lift up the fields.
  This now prints, e.g., `[$fin=1, $rsv=0, $opcode=2, $remaining=255]` instead of `[$<anon>=(1, 0, 2, 255)]`.
  In addition, we also prettify non-anonymous bitfields. They now print as, e.g., `[$y=(a: 4, b: 8)]` instead of `[$y=(4, 8)]`.
- GH-1085: Allow registering a module twice.
  So far, if one compiled the same HILTI module twice, each into its own HLTO, then when loading the two HLTOs, the runtime system would skip the second instance. However, that's not really what we want: a module could intentionally be part of multiple HLTOs, in which case each should get its own copy of that module state (i.e., its globals).
  This change allows the same module to be registered multiple times, with the HLTO linker scope distinguishing between the instances at runtime, as usual. To make that work, we move computation of the scope from compile time to runtime, using the library's absolute path as the scope.
- GH-1905: Fix operator precedence in Spicy grammar.
  We fixed the precedence of a number of operators to be closer to what users would expect from other languages like C++ or Python.
  - we reduced the precedence of the `in` operator
  - pre- and postfix operators `++` and `--` now have the same precedence and are right associative
  - unary negate was changed to match the precedence of other unary operators
- Switch compilation to C++20.
  Like Zeek, Spicy now requires a C++20 compiler. As part of this change we cleaned up the implementation to take advantage of newer C++ functionality in a number of places. We also moved from the external libraries `linb::any` to `std::any`, and `ghc::filesystem` to `std::filesystem`.
- Update supported platforms.
  We dropped support for the following platforms:
  - debian-11
  - fedora-40
  We added support for:
  - debian-13
  - fedora-42
- GH-1660: Render all bitfield instances with included field names.
- GH-2099: Fully implement iterator interface for `set::Iterator`.
- GH-2052: Move calling convention from function to function type.
Bug fixes
- GH-2057: Fix `bytes` iterator dereference operation.
- GH-2065: Error for redefined locals from statement inits.
- GH-2061: Fix cyclic usage of unit types inside other types.
- GH-2074: Fix fiber abortion.
- GH-2063: Fix C++ compilation issue with weak->strong refs.
- GH-2064: Ensure generated typeinfos are declared before used.
- GH-2044: Catch if methods are implemented multiple times.
- GH-2078: Fix C++ output for constants of constant type.
- GH-1988: Enforce that block-local declarations must be variables.
- GH-1996: Catch exceptions in `processInput` gracefully.
- GH-2091: Fix strong->value reference coercion in calls.
- GH-2100: Add missing deref operations for struct try-member/has-member operators.
- GH-2119: Fix missing `inline` functions in enum prototypes.
- GH-2142, GH-2134: Complete information exposed for reflection in typeinfo.
- GH-2135: Add `&cxx-any-as-ptr` attribute.
Documentation
- GH-1905: Document operator precedence.
v1.11.6
Bug fixes
- GH-2074: Fix fiber abortion.
  When aborting a fiber, we need to activate it once more, to then leave it for good by raising an `AbortException`. The problem was that this exception ended up being caught by user code because it was derived from `std::exception`. This change removes the base class so that the exception is guaranteed to go back to the managing fiber code, where we just ignore it.
- GH-2073: Prevent throwing naked exception when yielding from aborted fiber.
v1.13.2
Bug fixes
- GH-2119: Fix missing inline functions in enum prototypes.
  Our prototype generation could miss function bodies for `inline` functions.
- GH-2074: Fix fiber abortion.
  When aborting a fiber, we need to activate it once more, to then leave it for good by raising an `AbortException`. The problem was that this exception ended up being caught by user code because it was derived from `std::exception`. This change removes the base class so that the exception is guaranteed to go back to the managing fiber code, where we just ignore it.
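The reasoning behind the GH-2074 fix can be shown with ordinary C++ exception handling. The following is a minimal sketch, not the runtime's implementation: it does not model fibers at all, and `AbortSignal`, `run_user_code` and `run_managed` are hypothetical names. It only demonstrates that an exception type without a `std::exception` base passes through a `catch ( const std::exception& )` handler in user code and reaches the outer, managing code.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the abort signal; deliberately has no base class.
struct AbortSignal {};

// Simulates user code that catches every std::exception raised by its work.
template<typename F>
std::string run_user_code(F&& work) {
    try {
        work();
        return "completed";
    } catch ( const std::exception& e ) {
        return std::string("user code caught: ") + e.what();
    }
}

// Simulates the managing code, the only place that handles AbortSignal.
template<typename F>
std::string run_managed(F&& work) {
    try {
        return run_user_code(work);
    } catch ( const AbortSignal& ) {
        return "aborted";
    }
}
```

With this setup, `run_managed([] { throw AbortSignal{}; })` ends in the managing handler, while a `std::runtime_error` thrown by the same kind of callback is still intercepted by the user-level handler. A user-level `catch (...)` would still intercept the signal; the sketch only covers handlers for `std::exception`.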
v1.13.1
v1.11.5
v1.13.0
New Functionality
- GH-1788: We now support decoding from and encoding to UTF16, in particular the new `UTF16LE` and `UTF16BE` charsets for little and big endian encoding, respectively.
- GH-1961: We now support creating type values in Spicy code. The primary use case for this is to pass type information to host applications, and debugging.
  A type value is typically created from either `typeinfo(TYPE)` or `typeinfo(value)`, or coercion from an existing ID of a custom type like `global T: type = MyStruct;`. The resulting value can be printed, or stored in a variable of type `type`, e.g., `global bool_t: type = typeinfo(bool);`.
- GH-1971: Extend unit `switch` based on look-ahead to support blocks of items.
  In 1.12.0 we added support for grouping related unit fields in blocks; there the primary use case was `if` blocks to group fields with identical dependencies. We now also support such blocks inside unit `switch` constructs with look-ahead so one can write the following code:

      # Parses either `a` followed by another `a`, or `b`.
      type X = unit {
          switch {
              -> {
                  : b"a";
                  : b"a";
              }
              -> : b"b";
          };
      };

- GH-1538: Implement compound statements (`{...}`). This allows introducing local scopes, e.g., to group related code.
- GH-1946: `string`'s `encode` method gained an optional `errors` argument to influence error handling. The parameter defaults to `DecodeErrorStrategy::REPLACE`, reproducing the previous implicit behavior.
- GH-2010: `bytes` and `string` gained `ends_with` methods.
- GH-1965: Add support for case-insensitive matching to regular expressions.
  By adding an `i` flag to a regular expression pattern, it will now be matched case-insensitively (e.g., `/foobar/i`).
- GH-1962: Add `spicy-dump` option to enable profiling.
Changed Functionality
- GH-1981, GH-1982, GH-1991: We now catch more user errors in defining function overloads. Previously these would likely (hopefully) have failed in C++ compilation down the line, but are now cleanly rejected.
- GH-1977: We now reject function overloads which only differ in their return type.
- GH-1991: We now reject function prototypes without `&cxxname`.
  Since in Spicy global declarations can be in any order, there is no need to introduce a function with a prototype if it is declared later. The only valid use case for function prototypes was if the function was implemented in C++ and bound to the Spicy name with `&cxxname`.
- We have cleaned up our implementation for runtime type information, primarily intended for custom host applications.
  - `type_info::Value` instances obtained through runtime type introspection can now be rendered to a user-facing representation with a new `to_string` method.
  - The runtime representation was changed to correctly encode that tuple elements can remain unset. A Spicy-side tuple `tuple<T1, T2, T3>` now gets turned into `std::tuple<std::optional<T1>, std::optional<T2>, std::optional<T3>>`, which captures the full semantics.
  - We added type information for types previously not exposed, namely `Null`, `Nothing` and `List`. We also fixed the exposed type information for `result<void>`.
- GH-2011: We have optimized allocations for unit fields extracting vectors, which should speed up extracting especially small and medium-size vectors.
- GH-2035: We have dropped support for Ubuntu 20.04 (Focal Fossa) since it has reached end of standard support upstream.
- GH-2026: Speed up matching of character classes in regexps.
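For host applications affected by the tuple representation change listed above (`tuple<T1, T2, T3>` becoming a `std::tuple` of `std::optional` elements), the following sketch shows what handling possibly-unset elements could look like. It is an illustration under assumptions: `HostTuple` and `count_set` are hypothetical names, and the element types are plain standard-library stand-ins rather than the runtime's own types.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <tuple>

// Hypothetical host-side view of a three-element Spicy tuple; every element
// is wrapped in std::optional because it can remain unset.
using HostTuple = std::tuple<std::optional<std::uint64_t>, std::optional<std::string>, std::optional<bool>>;

// Counts how many elements of the tuple are actually set.
inline int count_set(const HostTuple& t) {
    return std::apply([](const auto&... e) { return (0 + ... + (e.has_value() ? 1 : 0)); }, t);
}
```

Code that previously read tuple elements directly would now check `has_value()` (or use `value_or`) on each element before using it.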
Bug fixes
- GH-1580: Catch when functions aren't called.
- GH-1961: Fix generated C++ prototype header.
- GH-1966: Reject anonymous units in variables and fields.
- GH-1967: Fix inactive stack size check during module initialization.
- GH-1968: Fix coercion of function call arguments.
- GH-1976: Fix unit `&max-size` not returning to proper loc.
- GH-2007: Fix using `&try` with `&max-size`, and potentially other cases.
- GH-2016: Fix `&size` expressions evaluating multiple times.
- GH-2038: Prevent escape of non-HILTI exception in lower-level driver functions.
- GH-2047: Make sure `bytes::to[U]Int` returns runtime integers.
- GH-2049: Add `#include <cstdint>` for fixed-width integers.
Documentation
- GH-1155: Document iteration over maps/set/vectors.
- GH-1963: Document `assert-exception`.
- GH-1964: Document use of `$$` inside `&{while,until,until-including}`.
- GH-1973: Remove documentation of unsupported `&nosub`.
- GH-1974: Add documentation on how to interpret stack traces involving fibers.
- GH-1975: Fix possibly-incorrect custom host compile command.
- GH-2039: Touchup docs style section.
- GH-1970, GH-2003: Fix minor typos in documentation.
v1.11.4
Bug fixes
- GH-2047: Make sure `bytes::to[U]Int` returns runtime integers.
- GH-2049: Fix building with GCC15.
- GH-1999, GH-2004: Adjust build setup for cmake-4.
- GH-2038: Prevent escape of non-HILTI exception in lower-level driver functions.
- GH-1918: Fix potential segfault with stream iterators.
- GH-1871: Fix `&max-size` on unit containing a `switch`.
v1.12.0
New Functionality
- We now support `if` around a block of unit items:

      type X = unit {
          x: uint8;

          if ( self.x == 1 ) {
              a1: bytes &size=2;
              a2: bytes &size=2;
          };
      };

  One can also add an `else`-block:

      type X = unit {
          x: uint8;

          if ( self.x == 1 ) {
              a1: bytes &size=2;
              a2: bytes &size=2;
          }
          else {
              b1: bytes &size=2;
              b2: bytes &size=2;
          };
      };

- We now support attaching an `%error` handler to an individual field:

      type Test = unit {
          a: b"A";
          b: b"B" %error { print "field B %error", self; }
          c: b"C";
      };

  With input `AxC`, that handler will trigger, whereas with `ABx` it won't. If the unit had a unit-wide `%error` handler as well, that one would trigger in both cases (i.e., for `b`, in addition to its field local handler).
  The handler can also be provided separately from the field:

      on b %error { ... }

  In that separate version, one can receive the error message as well by declaring a corresponding string parameter:

      on b(msg: string) %error { ... }

  This works externally, from outside the unit, as well:

      on Test::b(msg: string) %error { ... }

- GH-1856: We added support for specifying a dedicated error message for `requires` failures.
  This now allows creating custom error messages when a `&requires` condition fails. Example:

      type Foo = unit {
          x: uint8 &requires=($$ == 1 : error"Deep trouble!'");

          # or, shorter:
          y: uint8 &requires=($$ == 1 : "Deep trouble!'");
      };

  This is powered by a new condition test expression `COND : ERROR`.
- We reworked C++ code generation so that many parsers should now compile faster. This is accomplished both by improved dependency tracking when emitting C++ code for a module and by a couple of new peephole optimization passes which additionally reduce the emitted code.
Changed Functionality
- Add `CMAKE_CXX_FLAGS` to `HILTI_CONFIG_RUNTIME_LD_FLAGS`.
- Speed up compilation of many parsers by streamlining generated C++ code.
- Add `starts_with`, `split`, `split1`, `lower` and `upper` methods to `string`.
- GH-1874: Add new library function `spicy::bytes_to_mac`.
- Optimize `spicy::bytes_to_hexstring` and `spicy::bytes_to_mac`.
- Improve validation of attributes so incompatible or invalid attributes should be rejected more reliably.
- Optimize parsing for `bytes` of fixed size as well as literals.
- Add a couple of peephole optimizations to reduce emitted C++ code.
- GH-1790: Provide proper error message when trying to access an unknown unit field.
- GH-1792: Prioritize error message reporting unknown field.
- GH-1803: Fix namespacing of `hilti` IDs in Spicy-side diagnostic output.
- GH-1895: No longer escape backslashes when printing strings or bytes.
- GH-1857: Support `&requires` for individual vector items.
- GH-1859: Improve error message when a unit parameter is used as a field.
- GH-1898: Disallow attributes on "type aliases".
- GH-1938: Deprecate `&count` attribute.
Bug fixes
- GH-1815: Disallow expanding limited `View`'s again with `limit`.
- Fix `to_uint(ByteOrder)` for empty byte ranges.
- Fix undefined shifts of 32bit integer in `toInt()`.
- GH-1817: Prevent null ptr dereference when looking on nodes without `Scope`.
- Fix use of move'd from variable.
- GH-1823: Don't qualify magic linker symbols with C++ namespace.
- Fix diagnostics seen when compiling with GCC.
- GH-1852: Fix `skip` with units.
- GH-1832: Fail for vectors with bytes but no stop.
- GH-1860: Fix parsing for vectors of literals.
- GH-1847: Fix resynchronization issue with trimmed input.
- GH-1844: Fix nested look-ahead parsing.
- GH-1842: Fix when input redirection becomes visible.
- GH-1846: Fix bug with captures groups.
- GH-1875: Fix potential nullptr dereference when comparing streams.
- GH-1867: Fix infinite loops with recursive types.
- GH-1868: Associate source code locations with current fiber instead of current thread.
- GH-1871: Fix `&max-size` on unit containing a `switch`.
- GH-1791: Fix usage of `&convert` with units requiring parameters.
- GH-1858: Fix the literals parsers not following coercions.
- GH-1893: Encompass child node's location in parent.
- GH-1919: Validate that sets are sortable.
- GH-1918: Fix potential segfault with stream iterators.
- GH-1856: Disallow dereferencing a `result<void>` value.
- Fix issue with type inference for `result` constructor.
Documentation
v1.11.3
Bug fixes
- GH-1846: Fix bug with captures groups.
  When extracting the data matching capture groups we'd take it from the beginning of the stream, not the beginning of the current view, even though the latter is what we are matching against.
- Add missing trim after matching a regular expression.
- GH-1875: Fix potential nullptr dereference when comparing streams.
  Because we are operating on unsafe iterators, we need to catch when one goes out of bounds.
- GH-1842: Fix when input redirection becomes visible.
  With `&parse-at/from` we were updating the internal state on our current position immediately, meaning the updates were already visible when evaluating other attributes on the same field afterwards, which is unexpected.
- GH-1844: Fix nested look-ahead parsing.
  When parsing nested vectors all using look-ahead, we need to return control back to the upper level when an inner look-ahead isn't found.
  This may change the error message for "normal" look-ahead parsing (see test baseline), but the new one seems fine and potentially even better.
v1.11.2
Bug fixes
- GH-1860: Fix parsing for vectors of literals.
  This was broken in two ways:
  - with the `(LITERAL)[]` syntax, the parser would not recognize literals using type constructors
  - with the syntax `LITERAL[]`, we'd try to store the parsed value into a vector
- GH-1847: Fix resynchronization issue with trimmed input.
  When input had been trimmed, `View::advanceToNextData` could end up returning a view starting ahead of the valid area.
- GH-1852: Fix `skip` with units.
  For unit parsing with `skip`, we would create a temporary instance but wouldn't properly initialize it, meaning for example that parameters weren't available. We now generally fully initialize any destination, even if temporary.