[bazel] WIP Initial bazelification #4544
Draft: maru-ava wants to merge 1,055 commits into master from maru/bazel
+80,191 −0
Conversation
Adds a concurrency group to the attach-static-libs workflow (https://docs.github.com/en/enterprise-cloud@latest/actions/reference/workflow-syntax-for-github-actions#example-using-concurrency-and-the-default-behavior). That workflow pushes/pulls a branch derived from the PR branch name to https://github.com/ava-labs/firewood-go-ethhash throughout its run. If multiple commits on the same branch concurrently attempt to push/pull that branch, they can interfere, leading to unexpected results (pulling different code than was pushed) or git errors. This change ensures that the workflow started by commit 1 is cancelled before the workflow for commit 2 starts. This is intended to fix an issue observed in CI on ava-labs/firewood#1024: the run for commit 0 (https://github.com/ava-labs/firewood/actions/runs/16055519899/job/45308974261#step:3:8) succeeded, but the timestamps show it ran concurrently with the run for the next commit, which failed to push (https://github.com/ava-labs/firewood/actions/runs/16055531848/job/45308958694#step:9:66). <img width="1349" alt="Screenshot 2025-07-07 at 11 34 22" src="https://github.com/user-attachments/assets/1780572f-1b15-49c0-bbf3-b80d47b9a415" /> <img width="1346" alt="Screenshot 2025-07-07 at 11 37 49" src="https://github.com/user-attachments/assets/a36b4ccb-d27a-4920-92f6-fc2b9b7605ea" /> Subsequent workflow runs then failed to check out the branch: https://github.com/ava-labs/firewood/actions/runs/16055543248/job/45309129819 <img width="1343" alt="Screenshot 2025-07-07 at 11 39 28" src="https://github.com/user-attachments/assets/e959eaf2-a198-4267-9df4-fb0f4e141c5b" /> Given the overlapping timestamps and the force pushes/pulls to the same branch name when multiple commits can run the same workflow in parallel, conflicting attempts to push to the same branch are the most likely cause.
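The fix above can be sketched as a workflow-level concurrency group. This is a minimal illustration, not the actual workflow file; the workflow name and triggers here are assumptions:

```yaml
name: attach-static-libs
on:
  push:

# One concurrency group per branch: when a new commit starts this workflow,
# any in-flight run for the same branch is cancelled first, so two runs can
# never push/pull the derived branch at the same time.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```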
Gauges weren't being reported via the homegrown HTTP endpoint we use for the ffi; now they are. This is particularly important now that we have a few useful gauges in firewood. The surrounding HTTP structure might go away, but the underlying `stats` call is likely to stay, so this enhancement should remain useful. This change also takes the opportunity to reduce memory allocations by writing directly to strings instead of converting back and forth between string/bytes/string with all the UTF-8 conversions. Histograms are still not supported, but we don't have any and are unlikely to add any anytime soon.
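The allocation-reduction idea can be sketched in Rust. The `Gauge` type and `render_gauges` helper here are hypothetical stand-ins, not Firewood's actual `stats` types:

```rust
use std::fmt::Write;

// Hypothetical gauge representation for illustration only.
struct Gauge {
    name: &'static str,
    value: u64,
}

// Append Prometheus-style lines directly into a String, avoiding
// String -> Vec<u8> -> String round trips and their UTF-8 re-validation.
fn render_gauges(out: &mut String, gauges: &[Gauge]) {
    for g in gauges {
        // Writing into a String cannot fail, so unwrap is safe here.
        writeln!(out, "{} {}", g.name, g.value).unwrap();
    }
}

fn main() {
    let mut out = String::new();
    render_gauges(&mut out, &[Gauge { name: "firewood_cache_size", value: 42 }]);
    print!("{out}");
}
```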
I noticed an error in this action's output caused by an incorrect firewood version: https://github.com/ava-labs/firewood/actions/runs/16127166800/job/45506937221#step:7:13
Upcoming changes make supporting this harder, and it's not necessary: a LinearAddress is the same size as a pointer, so it's more efficient to return the actual value rather than a pointer to it.
Since GitHub Actions is his favorite programming language, let's keep him as a reviewer of those tickets. Everything else should now be reviewed by the firewood team.
This cleans up a lot of the remaining `#[allow]` attributes in the code base. Only one remains, in `firewood-macros/tests/compile_pass/with_attributes.rs`, because I did not want to alter the macro's tests.
Introduces a new type, `MaybePersistedNode`, a node that is either a SharedNode or a LinearAddress. This new type will allow writes to occur while readers are still reading this revision, using an `ArcSwap`. Note that this does not actually put any additional nodes in memory; it just sets things up so they could be in memory. Additional changes needed for this feature to be complete are:
- [ ] BranchNode children can now be Node, LinearAddress or a MaybePersistedNode
- [ ] When converting a `MutableProposal` to an `ImmutableProposal`, don't actually allocate space, just create `SharedNode`s from the `Node`s.
- [ ] Remove `NodeStoreHeader` from `NodeStore`. The header should move into the `RevisionManager`.
- [ ] Allocate new nodes from the recently-freed nodes from an expired revision, then from the freelist. This can be done by maintaining a local freelist from the nodes deleted in the expiring revision and falling back to the actual freelist. Deleted nodes that are not reused must still be pushed onto the freelist.
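A minimal Rust sketch of the idea, with stand-in types for `SharedNode` and `LinearAddress` (the real type is wrapped in an `ArcSwap` so readers can race with persistence; none of the method names here are Firewood's actual API):

```rust
use std::sync::Arc;

// Hypothetical stand-ins for Firewood's real types.
type SharedNode = Arc<String>; // really Arc<Node>
type LinearAddress = u64;      // really a non-zero file offset

// A node that is either still in memory or already written to disk.
enum MaybePersistedNode {
    InMemory(SharedNode),
    Persisted(LinearAddress),
}

impl MaybePersistedNode {
    fn is_persisted(&self) -> bool {
        matches!(self, MaybePersistedNode::Persisted(_))
    }

    // Once a node has been written out, swap the in-memory form
    // for its on-disk address.
    fn persist_at(&mut self, addr: LinearAddress) {
        *self = MaybePersistedNode::Persisted(addr);
    }
}

fn main() {
    let mut node = MaybePersistedNode::InMemory(Arc::new("leaf".to_string()));
    assert!(!node.is_persisted());
    node.persist_at(4096);
    assert!(node.is_persisted());
}
```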
Nodes on the delete list might be unpersisted. This can't actually happen yet, but support for it is there. This is half of making branch children unpersisted. Reaping unpersisted nodes is easy, we just drop them. If they are persisted, they need to move to the freelist, as they did before.
Branch children might also not be persisted. This change also splits up `hash_helper`, which had grown too long for clippy.
- [x] Roots may not be persisted (Part 1: #1041)
- [x] BranchNode children can now be Node, LinearAddress or a MaybePersistedNode (Part 2: #1045 and this PR #1047)
- [ ] When converting a `MutableProposal` to an `ImmutableProposal`, don't actually allocate space, just create `SharedNode`s from the `Node`s.
- [ ] Remove `NodeStoreHeader` from `NodeStore`. The header should move into the `RevisionManager`.
- [ ] Allocate new nodes from the recently-freed nodes from an expired revision, then from the freelist. This can be done by maintaining a local freelist from the nodes deleted in the expiring revision and falling back to the actual freelist. Deleted nodes that are not reused must still be pushed onto the freelist.
This PR enables Firewood to export metrics so that clients such as Coreth can aggregate those metrics into their own.

## How

Splits the Firewood API into the following:

```go
// Start recorder
func StartMetrics()

// Start recorder and exporter
func StartMetricsWithExporter()

// Gather latest metrics
func GatherMetrics() (string, error)
```

`GatherMetrics()` is used to create a custom Prometheus gatherer that can be used by Coreth's Multigatherer.

## Testing

Extended the existing metrics test to account for `Gatherer`.
Any error from Firewood is obviously a Firewood error; the user should be able to figure this out.
A nodestore and its related types evolved over time, becoming more
complex. This change splits the functionality into different areas to
make specific parts of the nodestore logic easier to find.
Doc comments about the layout are extensive and are not repeated here.
Other than moving things around, some additional changes are:
- `UpdateError` was dead code. It was used in early development of
merkle's insert but is no longer used.
- Accessors for various members of NodeStoreHeader instead of
referencing them directly. This makes auditing their use a little
easier, and improves scoping and readability.
This is the first part of this diff: #1033
dump has to work with unpersisted nodes, as there are a bunch of tests that assume an immutable proposal can be dumped. This was useful anyway while debugging some tests in the next commit, and small enough to commit separately.
While looking into Taskfiles for `ffi`, I noticed that we're still using `golangci-lint v1` while AvalancheGo is already using `golangci-lint v2`. This PR upgrades the version of `golangci-lint` used in `ffi` to `v2`. To maintain synchrony between the Firewood and AvalancheGo linter rules, an additional CI job (`expected-golangci-yaml-diff`) ensures that the difference between the two `.golangci.yaml` files is equal to the contents of `golangci_yaml_expected_changes.txt`. ## What - Copied over the current AvalancheGo `golangci.yml` file and removed AvalancheGo-specific rules. - Bumped the version of the `golangci-lint` action used in CI. - Added the `expected-golangci-yaml-diff` CI job. - Linted files. ## How This Was Tested Ran `golangci-lint` locally + CI.
The way we used serde when serializing free areas but not other nodes was inconsistent and caused an issue where we expected data to be encoded one way but it was actually encoded another way. This commit removes the dependency on serde for serialization in the storage crate, specifically in the `FreeArea` serialization. All serialization is now done manually, which is consistent and avoids the encoding issues we were facing with serde. Fixes: #831 #1057
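The manual-serialization approach can be sketched as follows. The layout here (a 1-byte area-size index followed by the 8-byte little-endian address of the next free area) is assumed for illustration and may not match Firewood's actual `FreeArea` encoding:

```rust
// Hand-rolled, explicit encoding: every byte's position and endianness is
// spelled out, so there is no surprise about how serde chose to encode it.
fn encode_free_area(size_index: u8, next: u64) -> [u8; 9] {
    let mut buf = [0u8; 9];
    buf[0] = size_index;
    buf[1..9].copy_from_slice(&next.to_le_bytes());
    buf
}

fn decode_free_area(buf: &[u8; 9]) -> (u8, u64) {
    let next = u64::from_le_bytes(buf[1..9].try_into().unwrap());
    (buf[0], next)
}

fn main() {
    // Round-trip: what we encode is exactly what we decode.
    let buf = encode_free_area(3, 0xDEAD_BEEF);
    assert_eq!(decode_free_area(&buf), (3, 0xDEAD_BEEF));
}
```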
Add functions to split leaked ranges into areas that can be added to free lists. This concludes this diff: #1033.
This PR adds checks for stored area misalignment. We only focus on areas that are not leaked (areas tracked in the trie or in the free list); as long as those areas are aligned, leaked areas will be aligned too. When a misaligned area is identified, we return not only the address of that area but also who points to it directly. If the area is allocated for a trie node, the root or the parent trie node is returned. If the area is free, the free list head or the parent of the free area is returned. We return the parent because, if we were to fix the misalignment (by migrating to another area or removing it from the free list), the parent pointer would need to be updated accordingly. As a side effect, `FreeListIterator` now returns the parent of each free list item. This design can help the checker fix various issues it sees when traversing the free list, but adds some overhead if it is used for other purposes.
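The core check can be sketched like this; the `ALIGN` value and the `Parent` variants are assumptions for illustration, not the checker's actual types:

```rust
const ALIGN: u64 = 16; // assumed area alignment, not Firewood's actual value

// Who points directly at a stored area (hypothetical variants).
#[derive(Debug, PartialEq)]
enum Parent {
    Root,
    TrieNode(u64),       // address of the parent trie node
    FreeListHead(usize), // index of the free list whose head points here
}

// On misalignment, return both the offending address and its parent,
// so the parent pointer can be fixed up when the area is migrated.
fn check_alignment(addr: u64, parent: Parent) -> Result<(), (u64, Parent)> {
    if addr % ALIGN == 0 {
        Ok(())
    } else {
        Err((addr, parent))
    }
}

fn main() {
    assert!(check_alignment(32, Parent::Root).is_ok());
    assert_eq!(
        check_alignment(33, Parent::TrieNode(32)),
        Err((33, Parent::TrieNode(32)))
    );
}
```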
It's cumbersome (and prone to TOCTOU issues) for users to validate whether a file exists and, if it does, set a flag. This should be handled internally. The `truncate` config now more closely resembles its actual meaning: if the file already exists, it will be wiped and re-created. There are unit tests in Rust and in Go for this. Close #1039
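The internal handling can be sketched as a single open call instead of an exists-then-flag dance; `open_db_file` is a hypothetical helper, not Firewood's actual API:

```rust
use std::fs::OpenOptions;
use std::io::Write;

// One atomic open: no separate "does the file exist?" check that could
// race with another process (the TOCTOU issue mentioned above).
fn open_db_file(path: &std::path::Path, truncate: bool) -> std::io::Result<std::fs::File> {
    OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)       // create the file if it is missing, either way
        .truncate(truncate) // truncate=true wipes any existing contents
        .open(path)
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("firewood_truncate_demo.db");
    open_db_file(&path, false)?.write_all(b"existing data")?;
    // Reopening with truncate=true wipes and re-creates the file.
    let f = open_db_file(&path, true)?;
    assert_eq!(f.metadata()?.len(), 0);
    std::fs::remove_file(&path)?;
    Ok(())
}
```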
An UnpersistedIterator iterates through a NodeStore looking for unpersisted nodes, that is, MaybeUnpersisted nodes that have not yet been persisted. Note that a MaybeUnpersistedNode can never have a persisted parent, so it's relatively easy to find the unpersisted ones. There are tests that verify that the nodes are returned in the order that they will need persisting (that is, depth first). --------- Signed-off-by: Ron Kuris <[email protected]> Co-authored-by: Suyan Qu <[email protected]>
Branch nodes were already handling the overflow correctly, but leaf nodes were not. A failing test case was added to ensure this fix works. Previously: ```text ---- node::test::test_serialize_deserialize::leaf_node_obnoxiously_long_partial_path stdout ---- thread 'node::test::test_serialize_deserialize::leaf_node_obnoxiously_long_partial_path' panicked at storage/src/node/mod.rs:553:9: assertion `left == right` failed left: Leaf([Leaf 7 4 6 8 6 9 7 3 2 0 6 9 7 3 2 0 6 1 2 0 7 2 6 5 6 1 6 c 6 c 7 9 2 0 6 c 6 f 6 e 6 7 2 0 7 0 6 1 7 2 7 4 6 9 6 1 6 c 2 0 7 0 6 1 7 4 6 8 2 c 2 0 6 c 6 9 6 b 6 5 2 0 7 3 6 f 2 0 6 c 6 f 6 e 6 7 2 0 6 9 7 4 2 7 7 3 2 0 6 d 6 f 7 2 6 5 2 0 7 4 6 8 6 1 6 e 2 0 3 6 3 3 2 0 6 e 6 9 6 2 6 2 6 c 6 5 7 3 2 0 6 c 6 f 6 e 6 7 2 0 7 7 6 8 6 9 6 3 6 8 2 0 7 4 7 2 6 9 6 7 6 7 6 5 7 2 7 3 2 0 2 3 3 1 3 0 3 5 3 6 2 e 04050607]) right: Leaf([Leaf [invalid ca] 1 7 4 6 8 6 9 7 3 2 0 6 9 7 3 2 0 6 1 2 0 7 2 6 5 6 1 6 c 6 c 7 9 2 0 6 c 6 f 6 e 6 7 2 0 7 0 6 1 7 2 7 4 6 9 6 1 6 c 2 0 7 0 6 1 7 4 6 8 2 c 2 0 0c0609060b06]) note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace ``` fixes #1056 --------- Signed-off-by: Joachim Brandon LeBlanc <[email protected]>
This test inserts a bunch of keys and then verifies that the resulting hashes are the same if we delete them in the reverse order. However, the test was generating a duplicate key, which will not reproduce the original hash set. Changed the test to remove any duplicate keys found; it's okay if the data is the same. Now passes with the errant key from #1075. Also fixed the merkle dump generator, which I used to debug this: a bracket was missing, so graphviz was failing to parse the file. Also shortened the hashes, since the full hashes made the nodes look much bigger, and added a newline. See the attached merkle dump for the new format.
While traversing the trie, we can optionally check if all hashes are valid.
I noticed I was not auto-added to reviews for the ffi and .github directories, and I discovered that this is because the wildcard rule was being ignored if a more specific rule matched. I have updated the `CODEOWNERS` file to ensure that Ron and I are added to all reviews.
The checker decides whether to enable ethhash at build time through feature flags. This change exposes that feature flag when building `fwdctl`.
- [x] Successfully rebased on main (commit 03f12df) - [x] Created METRICS.md with comprehensive documentation of all 29 metrics - [x] Added note about one metrics instance per process - [x] Updated README.md to reference METRICS.md - [x] Added METRICS.md to license header exclusion list in .github/check-license-headers.yaml - [x] Clean single commit with only documentation changes - [x] No unrelated PR changes visible in diff <issue_title>Add comprehensive metrics documentation to README</issue_title> > ### Problem > There is no comprehensive document listing or describing the available metrics in Firewood. Users and developers cannot easily understand what metrics are available for monitoring or how to interpret them. > > ### Proposed Solution > Add a dedicated "Metrics" section to the top-level README.md that documents all available metrics. > > ### Content to Include > - List of all available metrics with their names > - Description of what each metric measures > - Usage examples for enabling and gathering metrics > - Information about metric labels and values > - Examples of how to interpret metrics for monitoring and debugging > > ### Current State > - Metrics are mentioned briefly in `ffi/README.md` > - The `firewood-macros/README.md` shows how metrics are instrumented > - No comprehensive listing exists of what metrics are actually available > > ### References > - See `storage/src/macros.rs` for metric macro implementations > - See `ffi/README.md` for existing metrics documentation > - See `firewood-macros/README.md` for metrics macro usage - Fixes ava-labs/firewood#1398 --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: rkuris <[email protected]> Co-authored-by: demosdemon <[email protected]> Co-authored-by: Brandon LeBlanc <[email protected]>
I noticed there was one remaining component using arc-swap. This change replaces it with a standard Mutex and removes the dependency on arc-swap.
As discussed offline and in #1460, the `ffi-nix` job takes around 15 minutes to complete and runs on every PR. This PR changes it so that the `ffi-nix` job runs only on commits to `main`.
This PR adds `FjallStore`, an implementation of `RootStore` that allows for persistent storage of revisions (in contrast to `MockStore`, which keeps everything in-memory). The main changes of this PR are as follows: - Adds `FjallStore` implementation - Extends the `DbConfig` to take in a path to where `FjallStore` will be stored - Extends the FFI database config to enable archival support - Adds `FjallStore` tests in both Rust and Golang
The previous work to defer allocating nodes until commit time had the side effect of allowing nodes to exceed the maximum allowed size at hash time. This defers the error until commit time, which is not ideal. This change updates the test to confirm that the allocator still errors when trying to allocate a node that is too large; however, we would prefer to catch this earlier without serializing the node multiple times. As part of this change, I created a helper type that the test can use to verify the exact error returned instead of comparing against the string representation of the error. In doing so, I discovered that there were vestiges of old code where the allocator would potentially return an area larger than necessary when allocating from the free list. That functionality was removed in #1217; thus, I have simplified the remaining code to reflect that, which also simplified my `giant_node` test changes. Similarly, I removed the wasted metric, which was no longer relevant because we no longer select free nodes larger than needed. I added `DisplayTruncatedHex` to the `Debug` representations of the leaf and branch nodes to truncate node values when printing debug info. This was necessary before fixing the `giant_node` test because the debug output included the full contents of the node, which would flood the terminal when the test failed and cause it to hang. Closes: #1054
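The truncation idea behind `DisplayTruncatedHex` can be sketched like this; the cap and the wrapper's API here are assumptions for illustration, only the motivation matches the PR:

```rust
use std::fmt;

const MAX_SHOWN: usize = 8; // assumed cap, not Firewood's actual limit

// Debug-print at most MAX_SHOWN bytes of a value as hex, so giant node
// values don't flood the terminal when a test fails.
struct TruncatedHex<'a>(&'a [u8]);

impl fmt::Debug for TruncatedHex<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let shown = &self.0[..self.0.len().min(MAX_SHOWN)];
        for b in shown {
            write!(f, "{b:02x}")?;
        }
        if self.0.len() > MAX_SHOWN {
            // Indicate how much was elided instead of printing it all.
            write!(f, "...(+{} bytes)", self.0.len() - MAX_SHOWN)?;
        }
        Ok(())
    }
}

fn main() {
    let big = vec![0xab_u8; 100];
    assert_eq!(format!("{:?}", TruncatedHex(&big[..2])), "abab");
    assert!(format!("{:?}", TruncatedHex(&big)).contains("+92 bytes"));
}
```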
Upgrading dependencies included `[email protected]` which pins its dependency on `generic-array` to `0.14.7`, removing the deprecation warnings we were seeing that were added to `generic-array` in version `0.14.9` (where `0.14.8` was yanked).
This doesn't totally complete #1113 since I didn't touch the Proof or Iterator files. However, this does clarify a lot of the use cases and what is valid for concurrent operations. The best way to review this is to open the godocs and read through them, commenting on anything that doesn't seem clear. To reproduce the godocs, run: ``` cd ./ffi godoc -http=:<PORT_NUM> ``` Please leave nit-picky comments! It would be great if any reader of the docs would know how to use the code. Closes #1348, #1413
This PR closes #1030 by migrating from the standard library's `Mutex` and `RwLock` implementations to `parking_lot` equivalents throughout the codebase.
With `FjallStore` now merged in, I believe there's not a strong reason to keep around `MockStore`. This PR removes `MockStore` from Firewood. This PR also includes cleaning up the `FjallStore` tests by adding a helper for constructing instances of `Db` with `FjallStore`: 9b0d009.
IMO, the `reopen()` method of `TestDb` is a misnomer - while this method does not truncate the existing Firewood file, it uses the default configuration for the new `TestDb` instance, ignoring the prior instance's configuration. As a consumer, I would expect `reopen()` to maintain the existing configuration. This PR does the following: - Unifies `reopen()` and `reopen_with_config()` into a single method - Stores the `dbconfig` in `testDb` to allow the database configuration to persist across reopens - Adds documentation to both `reopen()` and `replace()` - Removes `into_root_store()` test helpers since `RootStore` reopens are now handled via the database configuration This PR is a precursor to #1481.
…anager` (#1482) In preparation of moving in-memory revisions to their own data structure (#1470), this PR moves creation of `RootStore` into the `RevisionManager` - this move will allow us to inject the in-memory revisions data structure into `RootStore` during creation. This PR is a precursor to #1480.
While taking a stab at #1483, I noticed after removing `NoOpStore` in favor of `Option<T>`, there's actually no need for the `Box` pointer since, if we have `Some`, it'll always be an instance of `FjallStore`. This PR does the following: - Removes the usage of `Box` pointers for `RootStore` - Removes the `RootStore` trait and renames `FjallStore => RootStore` - Updates code to use new `RootStore` naming and logic
Force-pushed from 93843f3 to 72eae69
Base automatically changed from alarso16/migrate-coreth-6-lint to maru/migrate-coreth-subtree-merge on November 25, 2025 19:15
#1488 found a panic during serialization of a node, indicating that a child node was either not hashed or not allocated, so the parent node could not be serialized. After a walk-through of the code, I noticed that #1389, where we began serializing nodes in batches in order to prevent corrupting the free list after a failed write, introduced a case where this could occur. Previously, we would serialize a node, write it to storage, and then mark it as allocated in one step. Now we serialize and allocate the node and add it to a batch before moving on to the next node. As before, we only mark the node as allocated after writing it to disk. This means that if the batch contained both a child and its parent, the parent node would fail to serialize because it could not reference the child's address. This change refactors the allocation logic to mark nodes as allocated during batching, ensuring that a parent in the same batch can serialize successfully. The linear address itself is not read by viewers of the `Allocated` state because the node is not considered committed until the entire write has completed (i.e., after the root and header are written), so this change does not introduce any early reads of uncommitted data. In the course of writing tests, I discovered that the arena allocator returns the number of bytes reserved by the arena, not the number of bytes occupied by items within it. This meant the batching logic should always have generated a batch of one element. To fix this, I now track the allocated length of each area. This successfully triggered the bug I discovered; however, it also means the bug might not be the source of the original panic.
The upstream `.golangci.yaml` was updated to relocate the linters block. This commit updates our `.golangci.yaml` to do the same and updates the patch file to reflect this change. Upstream: <#4535>
Why this should be merged
How this works
How this was tested
Need to be documented in RELEASES.md?