
add support for incremental snapshots #103

Open
Crypt-iQ wants to merge 6 commits into dergoegge:master from Crypt-iQ:01082026/tmp_snapshot_copy

Conversation

Crypt-iQ (Contributor) commented Jan 9, 2026

Posting mainly to get high-level feedback on the design before I continue any further.

Basically, an IncrementalSnapshotStage is created that wraps the mutation stage and will insert an Operation::IncrementalSnapshot into the program (not persisted to the corpus). Mutators are then aware not to mutate before the snapshot point. It then runs for 50 iterations with the incremental snapshot.

It is pretty slow, which is something I want to look into. I can clean things up a lot (e.g. not making incremental snapshots the default). I think some more intelligent snapshot placement could be used as well (not placing the snapshot operation so early in the program, only placing it after certain messages/operations, etc.). Another thing that is wonky: sometimes the input gets evicted to disk since we're using a CachedOnDiskCorpus, which means we occasionally need to reload it.
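The frozen-prefix idea described above can be sketched in a few lines. This is a minimal model, not the PR's code: `Op`, `place_snapshot`, `frozen_prefix_len`, and `mutate_suffix` are hypothetical names standing in for the real `Operation::IncrementalSnapshot` machinery and snapshot-aware mutators.

```rust
// Minimal model of the frozen-prefix idea. All names here are
// illustrative; they only show how a snapshot marker splits a program
// into a frozen prefix and a mutable suffix.

#[derive(Clone, Debug, PartialEq)]
enum Op {
    Message(u8),         // stand-in for a real protocol operation
    IncrementalSnapshot, // marker: everything before this is frozen
}

/// Insert the snapshot marker at `pos` (not persisted to the corpus).
fn place_snapshot(program: &mut Vec<Op>, pos: usize) {
    program.insert(pos, Op::IncrementalSnapshot);
}

/// Length of the prefix a mutator must not touch.
fn frozen_prefix_len(program: &[Op]) -> usize {
    program
        .iter()
        .position(|op| *op == Op::IncrementalSnapshot)
        .map(|i| i + 1)
        .unwrap_or(0)
}

/// A snapshot-aware mutator only rewrites operations after the marker.
fn mutate_suffix(program: &mut [Op]) {
    let start = frozen_prefix_len(program);
    for op in &mut program[start..] {
        if let Op::Message(b) = op {
            *b = b.wrapping_add(1); // trivial stand-in mutation
        }
    }
}
```

Under a model like this, everything up to and including the marker is replayed from the snapshot, and only the suffix is fuzzed.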

@Crypt-iQ Crypt-iQ marked this pull request as draft January 9, 2026 23:16
@Crypt-iQ Crypt-iQ force-pushed the 01082026/tmp_snapshot_copy branch from a5227b4 to 73d4e3d Compare January 9, 2026 23:18
@Crypt-iQ Crypt-iQ force-pushed the 01082026/tmp_snapshot_copy branch from 73d4e3d to 17fb0e0 Compare January 14, 2026 20:54
tokatoka (Contributor) commented

This is more for a discussion.

If I understand correctly, what we do now is like this:
After we take a snapshot, we hold IncrementalSnapshotMetadata and there we manage

  • the corpus ID relevant to the snapshot
  • the prefix that is frozen in the snapshot
  • the number of times we used this snapshot

And the next time we enter the IncrementalSnapshotStage, we restore the corpus ID again and return to fuzz the corpus entry using the snapshot that we saved earlier in the IncrementalSnapshotMetadata.

I wonder what would be the reason that we don't do like this in the IncrementalSnapshotStage

  1. Always take a tmp snapshot at the start of the stage.
  2. then make the inner_stage use them later on.
  3. At the end of the stage discard this snapshot.

This way it would look more simple to manage.

Is it because we don't want to discard the tmp snapshot too early?
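The take/use/discard lifecycle proposed in steps 1-3 could look roughly like this. A hedged sketch only: `TmpSnapshot`, `IncrementalSnapshotStage::perform`, and the closure-based inner stage are illustrative stand-ins, not the project's actual trait signatures.

```rust
// Hedged sketch of the proposed stage lifecycle, assuming the stage
// drives the inner stage itself. Names are illustrative, not the PR's.

struct TmpSnapshot {
    /// Operations before this index are frozen by the snapshot.
    frozen_prefix_len: usize,
}

struct IncrementalSnapshotStage {
    /// How many fuzzing rounds to run against one snapshot.
    iterations: usize,
}

impl IncrementalSnapshotStage {
    /// Take a snapshot, run the inner stage against it N times, then
    /// discard it. Returns the number of rounds that ran.
    fn perform(&self, prefix_len: usize, mut run_inner: impl FnMut(&TmpSnapshot)) -> usize {
        // 1. Always take a tmp snapshot at the start of the stage.
        let snapshot = TmpSnapshot { frozen_prefix_len: prefix_len };

        // 2. Then make the inner stage use it for `iterations` rounds.
        for _ in 0..self.iterations {
            run_inner(&snapshot);
        }

        // 3. At the end of the stage, discard this snapshot.
        drop(snapshot);
        self.iterations
    }
}
```

Because the snapshot never outlives one `perform` call, no cross-stage metadata is needed to track it.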

Crypt-iQ (Contributor, Author) commented Jan 16, 2026

This is more for a discussion.

If I understand correctly, what we do now is like this: After we take a snapshot, we hold IncrementalSnapshotMetadata and there we manage

* the corpus ID relevant to the snapshot

* the prefix that is frozen in the snapshot

* the number of times we used this snapshot
  And the next time we enter the IncrementalSnapshotStage, we restore the corpus ID again and return to fuzz the corpus entry using the snapshot that we saved earlier in the `IncrementalSnapshotMetadata`.

Yup, that's right.

I wonder what would be the reason that we don't do like this in the IncrementalSnapshotStage

1. Always take a tmp snapshot at the start of the stage.

2. then make the inner_stage use them later on.

3. At the end of the stage discard this snapshot.

This way it would look more simple to manage.

Is it because we don't want to discard the tmp snapshot too early?

I think this works if IncrementalSnapshotStage calls inner_stage.perform N times instead of 1 like it does now. I think this would also allow using the probe metadata instead of ignoring it when a tmp snapshot exists and lets me get rid of the logic that overrides whatever the scheduler chose.

I also think IncrementalSnapshotMetadata should be kept so that the mutators in inner_stage know frozen_prefix_len, though it's possible to pass a variable instead of using metadata.

@Crypt-iQ Crypt-iQ force-pushed the 01082026/tmp_snapshot_copy branch 2 times, most recently from 2f3862a to 0444d65 Compare January 29, 2026 15:55
@Crypt-iQ Crypt-iQ changed the title add support for tmp snapshots add support for incremental snapshots Jan 29, 2026
@Crypt-iQ Crypt-iQ force-pushed the 01082026/tmp_snapshot_copy branch 2 times, most recently from a29f111 to 2548f1c Compare January 29, 2026 16:36
@Crypt-iQ Crypt-iQ marked this pull request as ready for review January 29, 2026 17:07
Crypt-iQ (Contributor, Author) commented Jan 29, 2026

Implemented @tokatoka's suggestion so that IncrementalSnapshotStage runs the snapshot iterations inside of it and then discards the snapshot at the end. This also lets us get rid of IncrementalSnapshotMetadata, which is nice. Something to note: the probing and stability stages will work on the initial input used in IncrementalSnapshotStage and not on any of the mutated versions of that input. I wasn't really sure how to address that.

There is a memory leak that I'm investigating. I'm also going to come up with a benchmarking plan, because in theory this should give better and quicker coverage than running without incremental snapshots. One TODO is that incremental snapshots will be behind a flag so that they're not always enabled. Lastly, there is a commit here, Crypt-iQ@1567d36, that can be used to increase the instruction limit when testing.

@Crypt-iQ Crypt-iQ force-pushed the 01082026/tmp_snapshot_copy branch from 2548f1c to 2ecef2d Compare February 11, 2026 13:55
Commit message excerpts:

  …d_next
  Also change the scenarios to accept runners

  …incremental snapshots
  The IncrementalSnapshotStage is configured to run with an incremental snapshot for a configurable number of iterations. The mutators have awareness of the snapshot position.

  This should prevent some cache thrashing when inputs are evicted to disk.
@Crypt-iQ Crypt-iQ force-pushed the 01082026/tmp_snapshot_copy branch from 2ecef2d to 10cad9c Compare February 12, 2026 21:59
Crypt-iQ (Contributor, Author) commented

The results from benchmarking incremental snapshots against master were underwhelming. I ran the two branches on a clean machine, each with 6 cores, for 2 hours, with the instruction count increased 10x to 40960. I then parsed the stdout of each to create the graphs below. I also compared the coverage with a script similar to deterministic-fuzz-coverage to see exactly where the branches differed in coverage.

[Image: fuzzamoto-comparison-r8, coverage comparison graphs]

The results in the run that produced the graph were pretty consistent across several different benchmarks where I varied:

  • where I was placing the snapshot
  • amount of time (this run was the longest at 2 hours, most were ~1 hour)
  • if I was placing the snapshot at all (depending on how many instructions were present)
  • changing the number of iterations the snapshot is used for

In the vast majority of them, incremental snapshots had less coverage (typically in txgraph.cpp or in script/interpreter.cpp), a smaller corpus size, and were always faster. The stability measurement usually varied across the benchmarks, so I can't draw any meaningful conclusion about that. Also, I think something happened with stability for the master branch in this particular run, so I've decided to ignore it, since most of the time the master branch had roughly the same stability.

Going into this, I expected incremental snapshots to have less coverage overall, but "deeper" coverage in some areas. There are some "quirky" edges that are hit (e.g. being unable to fetch a value from the index in index/blockfilterindex.cpp, having a too-large v3 child in policy/truc_policy.cpp), but I can't really say that it's hitting "deeper" branches. I think running the benchmarks for much longer (~1 week) would give me a better picture, but I sometimes use the machine for other things, which I found pretty quickly ruins the benchmark.

The other thing that I think would really help is intelligent snapshot placement, because right now the placement doesn't bias towards anything; it just takes the snapshot wherever (regardless of whether the input is even interesting!). If it instead biased towards taking the snapshot after long sequences of blocks, long sequences of txns, or based on custom feedback (like hitting assertions, or maybe even reacting to the bitcoind logs), I think that could make this branch more effective.
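As one illustration of the biased placement mentioned above, a scoring pass could prefer the position just after the longest run of block or transaction operations. This is a sketch of that heuristic only, not code from the branch; `Kind` and `best_snapshot_pos` are hypothetical names, and custom-feedback signals (assertions, bitcoind logs) are left out.

```rust
// Hypothetical placement heuristic: snapshot right after the longest
// contiguous run of block/tx operations, instead of uniformly at random.

#[derive(Clone, Copy)]
enum Kind {
    Block,
    Tx,
    Other,
}

/// Returns the position just after the longest run of Block or Tx
/// operations, or 0 if there is no such run.
fn best_snapshot_pos(ops: &[Kind]) -> usize {
    let mut best = (0usize, 0usize); // (run length, end position)
    let mut run = 0usize;
    for (i, k) in ops.iter().enumerate() {
        match k {
            Kind::Block | Kind::Tx => {
                run += 1;
                if run > best.0 {
                    best = (run, i + 1);
                }
            }
            Kind::Other => run = 0,
        }
    }
    best.1
}
```

A real version would presumably also weight by whether the prefix is interesting at all, per the discussion above.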
