This document provides guidelines and instructions for developers working on the Hammer parsing library.
Tested on Ubuntu 22.04 and 24.04 (VM and WSL).
Install formatting tools:

    sudo apt install -y clang-format

Use clang-format to keep C code consistent.
Examples:
- Format all C sources and headers (in bash, ** only recurses into subdirectories after shopt -s globstar):

    clang-format -i **/*.c **/*.h

- Format a single file:

    clang-format -i path/to/file.c

The repository centralizes the semantic version in the VERSION file. Update that file to bump the project version. All downstream artifacts read from it:
- VERSION: semantic version (for example: 1.0.0)
- GitHub tags: v{VERSION} (for example: v1.0.0)
- Debian packages: {VERSION}-1
- Shared library: libhammer.so.{VERSION}
- pkg-config entries read the same version
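As an illustration only, the naming patterns above can be expressed as a small C helper. This is a sketch, not the project's build logic: the names version_names, strip_newline, and derive_version_names are hypothetical, and the real build may derive these strings differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the names derived from VERSION. */
typedef struct {
    char git_tag[64];     /* v{VERSION} */
    char deb_version[64]; /* {VERSION}-1 */
    char soname[64];      /* libhammer.so.{VERSION} */
} version_names;

/* Strips one trailing newline, as fgets() leaves one when reading a
 * one-line VERSION file. */
void strip_newline(char *s) {
    size_t n = strlen(s);
    if (n > 0 && s[n - 1] == '\n') {
        s[n - 1] = '\0';
    }
}

/* Returns 1 if an snprintf() return value means the text fit in cap bytes. */
static int fits(int written, size_t cap) {
    return written >= 0 && (size_t)written < cap;
}

/* Fills out from a version string such as "1.0.0".
 * Returns 0 on success, -1 if any derived name would be truncated. */
int derive_version_names(const char *version, version_names *out) {
    assert(version != NULL && out != NULL);
    int a = snprintf(out->git_tag, sizeof out->git_tag, "v%s", version);
    int b = snprintf(out->deb_version, sizeof out->deb_version, "%s-1", version);
    int c = snprintf(out->soname, sizeof out->soname, "libhammer.so.%s", version);
    if (fits(a, sizeof out->git_tag) &&
        fits(b, sizeof out->deb_version) &&
        fits(c, sizeof out->soname)) {
        return 0;
    }
    return -1;
}
```

Given the contents of VERSION with the newline stripped, derive_version_names fills in the tag, Debian version, and shared library name in one place, which mirrors the single-source-of-truth idea.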
Install coverage tools:

    sudo apt install lcov xdg-utils

Generate coverage and open the HTML report:

    scons -c --variant=debug &&
    scons --coverage --variant=debug test &&
    mkdir -p coverage &&
    lcov --directory build/debug/src --zerocounters &&
    lcov --ignore-errors gcov --capture --initial --directory build/debug/src --output-file coverage/base.info &&
    scons --coverage --variant=debug test &&
    lcov --ignore-errors gcov --capture --directory build/debug/src --output-file coverage/test.info &&
    lcov --add-tracefile coverage/base.info --add-tracefile coverage/test.info --output-file coverage/coverage.info &&
    genhtml coverage/coverage.info --output-directory coverage/html &&
    xdg-open coverage/html/index.html

Notes:
- On WSL, replace xdg-open with wslview (from the wslu package).
- Coverage and object files (.gcov, .gcno, .gcda, .o) are generated under build/debug/ or build/opt/.
- To generate .gcov files manually: scons --coverage --variant=debug gcov.
Install the required tools:

    sudo apt install swig default-jdk libgtest-dev
    pip install setuptools

Build and test all language bindings:

    scons bindings=all test

To target a specific binding, pass it individually and use its alias (testpython, testjava, or testcpp):

    scons bindings=python testpython
    scons bindings=java testjava
    scons bindings=cpp testcpp

If JAVA_HOME is not set, the build locates javac via PATH. To use a specific JDK:

    JAVA_HOME=/usr/lib/jvm/java-21-openjdk-amd64 scons bindings=java

Install ruff:

    sudo apt install pipx
    pipx install ruff

Lint all Python and SCons files with:

    ruff check $(find . -path ./build -prune -o \( -name "*.py" -o -name "SConstruct" -o -name "SConscript" \) -print)

Notes:
- ruff configuration lives in ruff.toml.
Install Doxygen:

    sudo apt install doxygen

Build the documentation:

    doxygen Doxyfile

Output will be in docs/html/. Open the main page:

    xdg-open docs/html/index.html

For WSL, use wslview instead of xdg-open.
- Make h_action functions be called only after parse is complete.
- Allow alternative input streams (e.g., zlib, base64)
  - Bonus points if layered...
- Add consistency check to the bitreader
- We should support the use of parse-table-based parse methods; add a parse_compile method that must be called before the newly-created parser is used.
- Implement data structure linearization function
- Implement free function for parsers
As a matter of convenience, there are several identifiers that internal anaphoric macros use. Chances are that if you use these names for other things, you're gonna have a bad time.
In particular, these names, and the macros that use them, are:
- state: Used by a_new and company. Should be an HParseState*.
- mm__: Used by h_new and h_free. Should be an HAllocator*.
- stk__: Used in desugaring. Should be an HCFStack*.
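To show why these names matter, here is a minimal sketch of an anaphoric macro. Everything in it is hypothetical (demo_allocator, demo_new, demo_free, and the counter helpers are invented for illustration and are not Hammer's definitions); the only point is that the macros pick up a variable named mm__ from the surrounding scope instead of taking it as an argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an allocator handle; not Hammer's HAllocator. */
typedef struct demo_allocator {
    size_t live; /* number of outstanding allocations */
} demo_allocator;

void *demo_alloc(demo_allocator *a, size_t size) {
    void *p = malloc(size);
    if (p != NULL) {
        a->live++;
    }
    return p;
}

void demo_release(demo_allocator *a, void *p) {
    if (p != NULL) {
        assert(a->live > 0);
        a->live--;
        free(p);
    }
}

/* Anaphoric macros: the allocator is never passed in. They capture
 * whatever variable named mm__ is in scope at the point of use. */
#define demo_new(type, count) ((type *)demo_alloc(mm__, sizeof(type) * (count)))
#define demo_free(ptr) demo_release(mm__, (ptr))

/* The parameter has to be called mm__ for the macros above to compile. */
int *make_counter(demo_allocator *mm__, int start) {
    int *c = demo_new(int, 1);
    if (c != NULL) {
        *c = start;
    }
    return c;
}

void drop_counter(demo_allocator *mm__, int *c) {
    demo_free(c);
}
```

If a function using such macros declared an unrelated local called mm__, the macros would silently bind to it, which is the kind of bad time the warning above is about.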
Many functions come in several variants, to handle receiving optional parameters or parameters in multiple different forms. For example, you often have a global memory manager that is used for an entire program. In this case, you can leave off the memory manager arguments, letting them be implicit instead. Further, it is often convenient to pass an array or va_list to a function instead of listing the arguments inline (e.g., for wrapping a function, generating the arguments programmatically, or writing bindings for another language).
Because we have found that most variants fall into a fairly small set of forms, and to minimize the number of API calls that users need to remember, there is a consistent naming scheme for these function variants: the function name is followed by two underscores and a set of single-character "flags" indicating which optional features that particular variant has (in alphabetical order, of course):
- __a: takes variadic arguments as a void*[] (not implemented yet, but will be soon).
- __m: takes a memory manager as the first argument, to override the system memory manager.
- __v: takes the variadic argument list as a va_list.
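The scheme can be sketched with a made-up function family. Nothing here is part of Hammer: ex_allocator, ex_system_allocator, and ex_collect are hypothetical names, chosen so that the suffixes __m, __v, and __mv line up with the flags described above. The sketch assumes the common layering in which the most general variant does the work and the others forward to it.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical memory manager; not Hammer's HAllocator. */
typedef struct ex_allocator {
    void *(*alloc)(struct ex_allocator *self, size_t size);
    void (*release)(struct ex_allocator *self, void *ptr);
    size_t live; /* outstanding allocations, for demonstration */
} ex_allocator;

void *ex_malloc(ex_allocator *self, size_t size) {
    void *p = malloc(size);
    if (p != NULL) {
        self->live++;
    }
    return p;
}

void ex_mfree(ex_allocator *self, void *ptr) {
    if (ptr != NULL) {
        self->live--;
        free(ptr);
    }
}

/* Hypothetical default manager used by the variants without __m. */
ex_allocator ex_system_allocator = { ex_malloc, ex_mfree, 0 };

/* __mv: explicit memory manager, arguments as a va_list. Does the work. */
int *ex_collect__mv(ex_allocator *mm__, size_t count, va_list ap) {
    int *out = mm__->alloc(mm__, sizeof(int) * (count > 0 ? count : 1));
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        out[i] = va_arg(ap, int);
    }
    return out;
}

/* __m: explicit memory manager, arguments listed inline. */
int *ex_collect__m(ex_allocator *mm__, size_t count, ...) {
    va_list ap;
    va_start(ap, count);
    int *out = ex_collect__mv(mm__, count, ap);
    va_end(ap);
    return out;
}

/* __v: implicit memory manager, arguments as a va_list. */
int *ex_collect__v(size_t count, va_list ap) {
    return ex_collect__mv(&ex_system_allocator, count, ap);
}

/* Plain name: implicit memory manager, arguments listed inline. */
int *ex_collect(size_t count, ...) {
    va_list ap;
    va_start(ap, count);
    int *out = ex_collect__v(count, ap);
    va_end(ap);
    return out;
}
```

A caller with one global allocator would use ex_collect(3, 10, 20, 30), while a binding that already holds a va_list and its own allocator would call ex_collect__mv directly.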
If the __m function variants are used or system_allocator is overridden, some difficult questions arise, particularly about the behavior when multiple memory managers are combined. As a general rule of thumb (exceptions will be explicitly documented), assume the following:
If you have a function f, which is passed a memory manager m and returns a value r, any function that uses r as a parameter must also be told to use m as a memory manager.
In other words, don't let the (memory manager) streams cross.
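The rule can be made concrete with a toy example. This is only a sketch under stated assumptions: toy_allocator and toy_value are hypothetical types, and the ownership check in toy_destroy is an illustrative debugging aid, not something the source says Hammer performs.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical memory manager that only counts live values. */
typedef struct toy_allocator {
    size_t live;
} toy_allocator;

/* Hypothetical value r that remembers which manager m produced it. */
typedef struct toy_value {
    toy_allocator *owner;
    int payload;
} toy_value;

/* f(m) -> r: creates a value using manager m. */
toy_value *toy_make(toy_allocator *m, int payload) {
    toy_value *r = malloc(sizeof *r);
    if (r == NULL) {
        return NULL;
    }
    r->owner = m;
    r->payload = payload;
    m->live++;
    return r;
}

/* Any function that takes r must be given the same m.
 * Returns 0 on success, or -1 (freeing nothing) if the managers differ. */
int toy_destroy(toy_allocator *m, toy_value *r) {
    if (r == NULL) {
        return 0;
    }
    if (r->owner != m) {
        return -1; /* the streams crossed */
    }
    m->live--;
    free(r);
    return 0;
}
```

Passing a value made with one manager to toy_destroy along with a different manager is rejected here; in code without such a check, the same mistake would typically corrupt one manager's bookkeeping instead.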
Regarding parse_result_t:
- If a parse fails, the parse_result_t will be NULL.
- If a parse is successful but there's nothing there (i.e., if end_p succeeds), then there's a parse_result_t but its ast is NULL.
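A caller therefore has three cases to distinguish. The sketch below uses hypothetical stand-in types (sketch_parse_result and sketch_ast are not the library's definitions; only the NULL result and the NULL ast field come from the notes above) to show the order in which the checks have to be made.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real parse_result_t has more to it. */
typedef struct sketch_ast {
    int placeholder;
} sketch_ast;

typedef struct sketch_parse_result {
    const sketch_ast *ast; /* NULL when the parse matched nothing */
} sketch_parse_result;

typedef enum {
    PARSE_FAILED,   /* result pointer itself is NULL */
    PARSE_OK_EMPTY, /* result exists, ast is NULL (e.g. end_p succeeded) */
    PARSE_OK_VALUE  /* result exists and carries an ast */
} parse_outcome;

/* Check the result pointer first; only then is it safe to read ast. */
parse_outcome classify_result(const sketch_parse_result *r) {
    if (r == NULL) {
        return PARSE_FAILED;
    }
    if (r->ast == NULL) {
        return PARSE_OK_EMPTY;
    }
    return PARSE_OK_VALUE;
}
```

The key point is that a NULL ast is not a failure, so testing only the ast field would misreport a successful empty parse.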
Regarding input location:
- If parse is successful, input is left at beginning of next thing to be read.
- If parse fails, location is UNPREDICTABLE.
If CONSISTENCY_CHECK is defined, a number of additional internal consistency checks are enabled.
Regarding butnot and difference:
There's a "do what I say, not what I do" variation in how we implemented these (versus how jsparse did it). In jsparse, butnot succeeds if p1 and p2 both match and p1's result is longer than p2's, though its comments say it should succeed if p2's result is longer than p1's. Likewise, jsparse's difference succeeds if p1 and p2 both match, full stop, returning the result of p2 if p2's result is shorter than p1's or the result of p1 otherwise, though its comments say it should succeed if p2's result is shorter than p1's. Whatever; we're doing what the comments say.
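The commented rules for the case where both parsers match can be written down as two predicates. This is a sketch of that one case only: the function names are hypothetical, the case where p2 fails is deliberately not covered because the note above does not describe it, and treating equal lengths as a failure is an assumption based on reading "longer" and "shorter" strictly.

```c
#include <stdbool.h>
#include <stddef.h>

/* len1 and len2 are the lengths of p1's and p2's results.
 * Both functions assume p1 and p2 have already matched. */

/* butnot, per the comments: succeed when p2's result is longer than p1's. */
bool butnot_when_both_match(size_t len1, size_t len2) {
    return len2 > len1;
}

/* difference, per the comments: succeed when p2's result is shorter than p1's. */
bool difference_when_both_match(size_t len1, size_t len2) {
    return len2 < len1;
}
```

For example, with a 3-byte result from p1 and a 2-byte result from p2, difference succeeds and butnot does not.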