atuchin-m
Collaborator

@atuchin-m atuchin-m commented Aug 25, 2025

This PR introduces new structures to store cosmetic filters in a flatbuffer.

  • The algorithms that sort and apply rules should be unaffected; only the storage layer changes.
  • CosmeticFilterCache is now a view over flatbuffer data.
  • The old storage layer (via serde) is removed; the version is now stored in the flatbuffer.
  • A new container, FlatMap, is introduced, with tests.
  • Most host-specific rules are stored in a single FlatMap (domain_hash => HostnameSpecificRules). The most common rule kinds, however, are stored as dedicated multi-maps to save memory.
  • The code that builds the flatbuffer structure is moved to dedicated files.
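
A minimal sketch of the FlatMap idea from the list above, assuming it is a sorted key array paired with a parallel value array and searched with binary search. The names FlatMapView and get and the sample data are hypothetical and not taken from the PR; the real container reads flatbuffer vectors rather than Rust slices.

```rust
/// Hypothetical read-only map view over two parallel, pre-sorted slices.
/// Assumption: keys are sorted ascending and unique; values[i] belongs to keys[i].
struct FlatMapView<'a, K, V> {
    keys: &'a [K],
    values: &'a [V],
}

impl<'a, K: Ord, V> FlatMapView<'a, K, V> {
    fn new(keys: &'a [K], values: &'a [V]) -> Self {
        assert_eq!(keys.len(), values.len());
        debug_assert!(keys.windows(2).all(|w| w[0] < w[1]));
        Self { keys, values }
    }

    /// O(log N) lookup with no allocation and no hashing.
    fn get(&self, key: &K) -> Option<&'a V> {
        match self.keys.binary_search(key) {
            Ok(index) => Some(&self.values[index]),
            Err(_) => None,
        }
    }
}

fn main() {
    // domain_hash => rule payload (a stand-in for HostnameSpecificRules).
    let keys: [u64; 3] = [11, 42, 97];
    let values = ["##.ad", "##.banner", "##.promo"];
    let map = FlatMapView::new(&keys, &values);

    assert_eq!(map.get(&42), Some(&"##.banner"));
    assert_eq!(map.get(&5), None);
}
```

Because the view only borrows the underlying data, building it costs nothing at load time, which fits the "view over flatbuffer data" description above.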

Perf impact:

  • Memory usage improves by 43%.
  • Loading a .dat file is about 3x faster.
  • An unimportant regression in rule-match-first-request/brave-list: the absolute time of ~1ms is still totally fine for the first request.
  • A potentially bad regression in cosmetic-class-id-match/brave-list. The new code uses an O(log N) binary search instead of an O(1) hash-based (HashSomething) lookup. For a large number of selectors (~1000) that can make a difference. We're going to improve this in a dedicated PR.
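
To illustrate the trade-off in the last bullet, here is a rough sketch that contrasts a sorted-slice lookup with a hash lookup over the same data. The class names and helper functions are made up for illustration, and the probe count is the textbook bound rather than a measurement of the actual code.

```rust
use std::collections::HashSet;

/// Textbook worst-case number of probes for a binary search over `n`
/// sorted items: floor(log2(n)) + 1 for n > 0.
fn max_binary_search_probes(n: usize) -> u32 {
    if n == 0 {
        0
    } else {
        usize::BITS - n.leading_zeros()
    }
}

/// Sorted-slice lookup, as a flat (flatbuffer-like) layout would allow.
fn contains_sorted(sorted: &[String], needle: &str) -> bool {
    sorted
        .binary_search_by(|probe| probe.as_str().cmp(needle))
        .is_ok()
}

fn main() {
    // Hypothetical class names standing in for ~1000 simple selectors.
    let mut classes: Vec<String> = (0..1000).map(|i| format!("ad-slot-{i}")).collect();
    classes.sort();
    let hashed: HashSet<&str> = classes.iter().map(String::as_str).collect();

    // Both structures give the same answers; only the lookup cost differs.
    for needle in ["ad-slot-7", "ad-slot-999", "content"] {
        assert_eq!(contains_sorted(&classes, needle), hashed.contains(needle));
    }

    // About 10 string comparisons per sorted lookup for 1000 entries,
    // versus one hash plus (typically) one comparison for the hash set.
    assert_eq!(max_binary_search_probes(1000), 10);
}
```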

@atuchin-m atuchin-m self-assigned this Aug 25, 2025
@atuchin-m atuchin-m requested a review from antonok-edm as a code owner August 25, 2025 13:36

@github-actions github-actions bot left a comment

Rust Benchmark

| Benchmark suite | Current: 4ce4fb0 | Previous: eb5c30c | Ratio |
|---|---|---|---|
| rule-match-browserlike/brave-list | 2245830384 ns/iter (± 17985345) | 2278858883 ns/iter (± 38485237) | 0.99 |
| rule-match-first-request/brave-list | 1135373 ns/iter (± 12516) | 1019219 ns/iter (± 12255) | 1.11 |
| blocker_new/brave-list | 147912749 ns/iter (± 466527) | 162380552 ns/iter (± 5943396) | 0.91 |
| blocker_new/brave-list-deserialize | 19912584 ns/iter (± 78113) | 65009694 ns/iter (± 2409298) | 0.31 |
| memory-usage/brave-list-initial | 10072443 ns/iter (± 3) | 17549140 ns/iter (± 3) | 0.57 |
| memory-usage/brave-list-initial/max | 64817658 ns/iter (± 3) | 64817658 ns/iter (± 3) | 1 |
| memory-usage/brave-list-initial/alloc-count | 1534721 ns/iter (± 3) | 1515544 ns/iter (± 3) | 1.01 |
| memory-usage/brave-list-1000-requests | 2516487 ns/iter (± 3) | 2505576 ns/iter (± 3) | 1.00 |
| memory-usage/brave-list-1000-requests/alloc-count | 66641 ns/iter (± 3) | 66123 ns/iter (± 3) | 1.01 |
| url_cosmetic_resources/brave-list | 199167 ns/iter (± 1533) | 213731 ns/iter (± 2730) | 0.93 |
| cosmetic-class-id-match/brave-list | 15628013 ns/iter (± 4624104) | 4346725 ns/iter (± 1205995) | 3.60 |

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Rust Benchmark'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.10.

| Benchmark suite | Current: 4ce4fb0 | Previous: eb5c30c | Ratio |
|---|---|---|---|
| rule-match-first-request/brave-list | 1135373 ns/iter (± 12516) | 1019219 ns/iter (± 12255) | 1.11 |
| cosmetic-class-id-match/brave-list | 15628013 ns/iter (± 4624104) | 4346725 ns/iter (± 1205995) | 3.60 |
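
The Ratio column is consistent with current divided by previous, so values above 1.0 mean the current commit is slower. The sketch below reproduces that arithmetic for a few rows; the function names are hypothetical, and the alert rule (ratio strictly above the threshold) is an assumption based on the wording of the alert, not on the tool's source.

```rust
/// Ratio as it appears in the table: current / previous.
fn ratio(current: f64, previous: f64) -> f64 {
    current / previous
}

/// Assumption: an alert fires when the ratio exceeds the threshold
/// (1.10 in the alert text above).
fn is_regression(current: f64, previous: f64, threshold: f64) -> bool {
    ratio(current, previous) > threshold
}

fn main() {
    // (name, current, previous) copied from the benchmark comment.
    let rows = [
        ("rule-match-first-request/brave-list", 1_135_373.0, 1_019_219.0),
        ("cosmetic-class-id-match/brave-list", 15_628_013.0, 4_346_725.0),
        ("blocker_new/brave-list-deserialize", 19_912_584.0, 65_009_694.0),
        ("memory-usage/brave-list-initial", 10_072_443.0, 17_549_140.0),
    ];
    for (name, current, previous) in rows {
        println!(
            "{name}: ratio {:.2}, alert: {}",
            ratio(current, previous),
            is_regression(current, previous, 1.10)
        );
    }
}
```

Read this way, the 0.57 ratio for memory-usage/brave-list-initial matches the 43% memory reduction in the description, and 0.31 for blocker_new/brave-list-deserialize corresponds to roughly 3x faster loading.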

This comment was automatically generated by workflow using github-action-benchmark.

@atuchin-m atuchin-m requested a review from boocmp August 26, 2025 20:07
@@ -191,34 +166,40 @@ impl CosmeticFilterCache {
) -> Vec<String> {
let mut selectors = vec![];

let cs = self.filter_data_context.memory.root().cosmetic_filters();
Collaborator

Name this cf? Or even cosmetic_filters.

Collaborator

@antonok-edm antonok-edm left a comment

(partial review, will look over the rest soon)

let filter_data_context = FilterDataContext::new(memory);
Self::from_context(filter_data_context)
let mut filter_set = FilterSet::new(true);
filter_set.network_filters = network_filters;
Collaborator

It's a bit odd to have Blocker::new() build a FilterSet, then manually move network filters into the set, then build an Engine, and finally make the blocker from the engine. Understood this is test only, but perhaps we should consider a different constructor method for the tests.

Collaborator Author

Yeah, ideally these test constructors should be removed in favor of building the engine directly (or via another high-level test method).
Preserving them, though, reduces the diff in this PR (we don't need to rewrite a lot of tests), so I suggest changing this in another PR.

P.S. We can avoid using Engine completely because it's a flatbuffer root table. And we need to get serialized flatbuffer data.

Comment on lines -16 to -19
/// Newer formats start with this magic byte sequence.
/// Calculated as the leading 4 bytes of `echo -n 'brave/adblock-rust' | sha512sum`.
const ADBLOCK_RUST_DAT_MAGIC: [u8; 4] = [0xd1, 0xd9, 0x3a, 0xaf];
const ADBLOCK_RUST_DAT_VERSION: u8 = 1;
Collaborator

Could we keep the byte sequence and version at the start of the serialized file? Brave iOS is currently using it.

Collaborator Author

Do we use that magic somewhere outside adblock-rust to identify the file?

Now the whole file content is a flatbuffer and we verify the format before using the file, so it looks like we don't need an extra magic number.

Collaborator

When the iOS client is updated to use adblock-rust v0.11.x, it will read the older file and try to interpret it as a flatbuffer rather than looking at the first bytes and knowing to exit immediately because of a version mismatch.

If we're lucky that file will have a deserialization error, but in the worst case it may parse and be full of junk data. Keeping the separate format header will guarantee future clients are always able to early-exit even if we ever switch to something other than flatbuffers later on.
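
As a rough illustration of the early exit described here, the sketch below checks a fixed prefix before touching the payload. The constants are the ones quoted from the removed lines; the exact layout (4 magic bytes followed by one version byte) and the names HeaderError and strip_header are assumptions made for this example.

```rust
/// Magic bytes and version quoted in the removed lines above.
const ADBLOCK_RUST_DAT_MAGIC: [u8; 4] = [0xd1, 0xd9, 0x3a, 0xaf];
const ADBLOCK_RUST_DAT_VERSION: u8 = 1;

#[derive(Debug, PartialEq)]
enum HeaderError {
    TooShort,
    BadMagic,
    UnsupportedVersion(u8),
}

/// Hypothetical helper: validates the 5-byte prefix and returns the payload
/// that follows it, without inspecting the payload itself.
/// Assumed layout: [magic: 4 bytes][version: 1 byte][payload...].
fn strip_header(data: &[u8]) -> Result<&[u8], HeaderError> {
    let header_len = ADBLOCK_RUST_DAT_MAGIC.len() + 1;
    if data.len() < header_len {
        return Err(HeaderError::TooShort);
    }
    if !data.starts_with(&ADBLOCK_RUST_DAT_MAGIC) {
        return Err(HeaderError::BadMagic);
    }
    let version = data[ADBLOCK_RUST_DAT_MAGIC.len()];
    if version != ADBLOCK_RUST_DAT_VERSION {
        return Err(HeaderError::UnsupportedVersion(version));
    }
    Ok(&data[header_len..])
}

fn main() {
    let mut file = ADBLOCK_RUST_DAT_MAGIC.to_vec();
    file.push(ADBLOCK_RUST_DAT_VERSION);
    file.extend_from_slice(b"payload");
    assert_eq!(strip_header(&file), Ok(&b"payload"[..]));

    // A file written by a newer or older format exits before any parsing.
    file[4] = 2;
    assert_eq!(strip_header(&file), Err(HeaderError::UnsupportedVersion(2)));
}
```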

Collaborator Author

Verifying a flatbuffer is quick, and the migration happens once every few months. I don't think it makes sense to save some CPU cycles here.

As for correctness: theoretically it's possible to have a broken file that will be interpreted as a valid flatbuffer. The same holds for the old .dat: a partially written file will pass the magic header verification.

I would add some kind of CRC checksum instead (to be sure that the file is consistent).
If you want to preserve the magic number, let's keep it, but as part of the flatbuffer.
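
One possible shape for the checksum idea, as a sketch only: a plain bitwise CRC-32 (the common IEEE polynomial) stored after the payload. The framing (payload followed by 4 little-endian checksum bytes) and the names seal and open are hypothetical; nothing here is taken from the PR.

```rust
/// Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All ones if the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical framing: payload followed by its CRC-32 as 4 little-endian bytes.
fn seal(payload: &[u8]) -> Vec<u8> {
    let mut out = payload.to_vec();
    out.extend_from_slice(&crc32(payload).to_le_bytes());
    out
}

/// Returns the payload only if the trailing checksum matches it.
fn open(file: &[u8]) -> Option<&[u8]> {
    let split = file.len().checked_sub(4)?;
    let (payload, tail) = file.split_at(split);
    let stored = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
    if crc32(payload) == stored {
        Some(payload)
    } else {
        None
    }
}

fn main() {
    let sealed = seal(b"serialized flatbuffer bytes");
    assert_eq!(open(&sealed), Some(&b"serialized flatbuffer bytes"[..]));

    // A single flipped bit in the payload is always detected by a CRC.
    let mut corrupted = sealed.clone();
    corrupted[0] ^= 1;
    assert_eq!(open(&corrupted), None);
}
```

A partially written file would, with very high probability, fail the same check, because the last four bytes would no longer be the checksum of the bytes before them.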

@@ -51,16 +57,18 @@ pub struct Engine {
filter_data_context: FilterDataContextRef,
}

const ADBLOCK_FLATBUFFER_VERSION: u32 = 1;
Collaborator

should be incremented

Collaborator Author

Because it's a new format, we can start with 1 here.

@@ -29,17 +29,91 @@ table NetworkFilterList {
filter_map_values: [NetworkFilter] (required);
}

// A table to store most host-specific cosmetic rules.
// The most common kinds of rules (see hostname_inject_script_*
// and hostname_hide_*) are stored separately to save memory.
Collaborator

Just noting the /// doc comments are applied inconsistently

#[cfg(test)]
use crate::filters::cosmetic::CosmeticFilter;
use crate::filters::cosmetic::{CosmeticFilterAction, CosmeticFilterOperator};
use crate::filters::fb_network::FilterDataContextRef;
Collaborator

probably worth renaming or moving out of fb_network now that it's used by cosmetic filters

Collaborator Author

yep, it makes sense to move FilterDataContext*

use crate::cosmetic_filter_utils::SpecificFilterType;
use crate::cosmetic_filter_utils::{encode_script_with_permission, key_from_selector};
use crate::filters::cosmetic::{CosmeticFilter, CosmeticFilterMask, CosmeticFilterOperator};
use crate::filters::fb_network::flat::fb;
Collaborator

another fb_network used by cosmetic filters

3 participants