Use the one flatbuffer to store all lists#489
Conversation
| pub(crate) fn filter_list(&self) -> fb::NetworkFilterList<'_> { | ||
| unsafe { fb::root_as_network_filter_list_unchecked(self.data()) } | ||
| pub(crate) fn root(&self) -> fb::Engine<'_> { | ||
| unsafe { fb::root_as_engine_unchecked(self.data()) } |
There was a problem hiding this comment.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
| flatbuffers::Vector<'a, flatbuffers::ForwardsUOffset<NetworkFilterList>>, | ||
| >>(Engine::VT_LISTS, None) | ||
| .unwrap() | ||
| } |
There was a problem hiding this comment.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
| None, | ||
| ) | ||
| .unwrap() | ||
| } |
There was a problem hiding this comment.
reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage
Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage
Cc @thypon @kdenhartog
There was a problem hiding this comment.
Rust Benchmark
Details
| Benchmark suite | Current: 0bb327f | Previous: 4738d3f | Ratio |
|---|---|---|---|
rule-match-browserlike/brave-list |
2268568240 ns/iter (± 20502033) |
2252126465 ns/iter (± 13369021) |
1.01 |
rule-match-first-request/brave-list |
1006882 ns/iter (± 6061) |
1000284 ns/iter (± 8364) |
1.01 |
blocker_new/brave-list |
150003954 ns/iter (± 1107279) |
150674514 ns/iter (± 1975624) |
1.00 |
blocker_new/brave-list-deserialize |
63987710 ns/iter (± 1362897) |
62360204 ns/iter (± 1865060) |
1.03 |
memory-usage/brave-list-initial |
16282069 ns/iter (± 3) |
16225933 ns/iter (± 3) |
1.00 |
memory-usage/brave-list-initial/max |
64817658 ns/iter (± 3) |
64817658 ns/iter (± 3) |
1 |
memory-usage/brave-list-initial/alloc-count |
1514486 ns/iter (± 3) |
1514650 ns/iter (± 3) |
1.00 |
memory-usage/brave-list-1000-requests |
2516487 ns/iter (± 3) |
2505592 ns/iter (± 3) |
1.00 |
memory-usage/brave-list-1000-requests/alloc-count |
66572 ns/iter (± 3) |
66070 ns/iter (± 3) |
1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
f8e5204 to
27de614
Compare
| // TODO: do we need another feature for this? | ||
| #[cfg(feature = "unsync-regex-caching")] | ||
| pub(crate) type SharedStateRef = std::rc::Rc<SharedState>; | ||
| #[cfg(not(feature = "unsync-regex-caching"))] | ||
| pub(crate) type SharedStateRef = std::rc::Arc<SharedState>; |
There was a problem hiding this comment.
Same feature for both is fine, but it would be nice to rename it since it'd no longer be strictly about regex caching
There was a problem hiding this comment.
On that note though, I don't think we actually need any refcounted pointers for this? We could pass the &SharedState in as an argument to whatever functions require it, or give Blocker a <'a> lifetime so it can own a shared_state: &'a SharedState field
There was a problem hiding this comment.
That sounds logical, but the issue is that Engine (the primary owner of shared_state) also own a Blocker instance (that also stores shared_state).
A class member cannot use the same lifetime as another class member without stuff like https://docs.rs/ouroboros/latest/ouroboros/attr.self_referencing.html
Without using Rc Blocker will became Blocker<'a> with a limited lifetime and can't be stored as a member of Engine, which results in major changes in the codebase.
There was a problem hiding this comment.
the feature renamed to single-thread
There was a problem hiding this comment.
My thinking is it could be organized like this minimal example:
type FilterDataContextRef = [u8; 32];
struct Engine {
filter_data_ref: FilterDataContextRef,
}
impl Engine {
fn new() -> Self {
let filter_data_ref = [0u8; 32];
Self {
filter_data_ref,
}
}
fn blocker<'a>(&'a self) -> Blocker<'a> {
Blocker {
filter_data_ref: &self.filter_data_ref,
}
}
fn check_network_request(&self) -> bool {
self.blocker().check()
}
}
struct Blocker<'a> {
filter_data_ref: &'a FilterDataContextRef,
}
impl<'a> Blocker<'a> {
fn check(&self) -> bool {
true
}
}Key point is Engine owns all the data and has the blocker() method to produce a scoped convenience struct that can be used for the current network blocking methods without moving too much code around.
The only other fields on Blocker are:
regex_manager, which also makes sense to keep on the top-levelEngineso that it may be used for cosmetic filtering as needed (some of uBO's newer syntax features could benefit from it)tags_enabled, which I don't have strong preferences about since I'd like to get rid of it anyways
| builder.add_filter(filter, list_id as u32); | ||
| } | ||
|
|
||
| builder.finish(if optimize { |
There was a problem hiding this comment.
optimize && id != FilterId::RemoveParam as u32 ?
There was a problem hiding this comment.
it was my initial attempt, but we can't use lambas with a capture as fn (like fn(u32) -> bool))
There was a problem hiding this comment.
see my other comment, but for future reference: https://doc.rust-lang.org/std/keyword.move.html
There was a problem hiding this comment.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Rust Benchmark'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.10.
| Benchmark suite | Current: 0bb327f | Previous: 4738d3f | Ratio |
|---|---|---|---|
blocker_new/brave-list-deserialize |
69734003 ns/iter (± 1109450) |
62360204 ns/iter (± 1865060) |
1.12 |
This comment was automatically generated by workflow using github-action-benchmark.
| @@ -0,0 +1,337 @@ | |||
| //! Builder for creating flatbuffer with serialized engine. | |||
There was a problem hiding this comment.
May be rename it to fb_builder.rs ?
There was a problem hiding this comment.
I'm asking because we have fb_network.rs and flat_filter_map.rs and I suppose that fb for flatbuffers and flat for flat containers
There was a problem hiding this comment.
yes, sounds good to me
antonok-edm
left a comment
There was a problem hiding this comment.
one more general thing - could we do these in a 0.11.x branch rather than in master? That way it's a little bit easier to maintain the history of breaking changes vs patch releases
| builder.finish(if optimize { | ||
| // Don't optimize removeparam, since it can fuse filters without respecting distinct | ||
| |id: u32| id != FilterId::RemoveParam as u32 | ||
| } else { | ||
| |_| false | ||
| }) |
There was a problem hiding this comment.
what's the reason to pass this as a lambda rather than moving the check inside finish()?
There was a problem hiding this comment.
Heh, the initial idea was to make 2 layers: for the storage and for the logic how and what.
But in fact that parts of code is strongly connected to each other.
Changed to just bool.
| builder.add_filter(filter, list_id as u32); | ||
| } | ||
|
|
||
| builder.finish(if optimize { |
There was a problem hiding this comment.
see my other comment, but for future reference: https://doc.rust-lang.org/std/keyword.move.html
| // Reconstruct the unique_domains_hashes_map from the flatbuffer data | ||
| let root = memory.root(); | ||
| let mut unique_domains_hashes_map: HashMap<crate::utils::Hash, u32> = HashMap::new(); | ||
| for (index, hash) in root.unique_domains_hashes().iter().enumerate() { | ||
| unique_domains_hashes_map.insert(hash, index as u32); | ||
| } | ||
| FilterDataContextRef::new(Self { | ||
| memory, | ||
| unique_domains_hashes_map, | ||
| }) |
There was a problem hiding this comment.
not a concern for this particular PR, but does it make sense to hold data that needs to be "reconstructed" in a separate buffer? Then we don't necessarily need to hold both copies in memory.
There was a problem hiding this comment.
The problem is that stored unique_domains_hashes is index => ShortHash (basically), but we need ShortHash => index.
Maybe it makes sense to store additional mapping (despite it will eat some small amount memory).
Well, I believe we need to dedup and index not only domains, but the other strings, so let's postpone this for some time.
|
|
||
| Self { | ||
| blocker: Blocker::new(network_filters, &blocker_options), | ||
| blocker: Blocker::from_context(filter_data_context.clone()), |
There was a problem hiding this comment.
Convention is to explicitly use Rc::clone(x) or Arc::clone(x) rather than x.clone() to make it clear that it's just a refcount increase.
https://doc.rust-lang.org/book/ch15-04-rc.html#using-rct-to-share-data
In these instances it depends on the feature config, so I'd suggest FilterDataContextRef::clone(x) instead
| @@ -0,0 +1,337 @@ | |||
| //! Builder for creating flatbuffer with serialized engine. | |||
There was a problem hiding this comment.
yes, sounds good to me
We can in theory. But the workflow isn't clear to me. |
|
[puLL-Merge] - brave/adblock-rust@489 DescriptionThis PR refactors the internal storage system for network filters in the adblock-rust library. The main changes include:
Possible Issues
ChangesChanges
sequenceDiagram
participant Client
participant Engine
participant FilterDataContext
participant FlatBufferBuilder
participant Blocker
Client->>Engine: from_filter_set(rules, optimize)
Engine->>FlatBufferBuilder: make_flatbuffer(network_filters, optimize)
FlatBufferBuilder->>FlatBufferBuilder: categorize filters into lists
FlatBufferBuilder->>FlatBufferBuilder: optimize if requested
FlatBufferBuilder->>FilterDataContext: new(memory)
FilterDataContext-->>Engine: FilterDataContextRef
Engine->>Blocker: from_context(filter_data_context)
Blocker-->>Engine: Blocker instance
Engine-->>Client: Engine instance
Client->>Engine: check_network_request(request)
Engine->>Blocker: check(request, resources)
Blocker->>Blocker: get_list(NetworkFilterListId)
Blocker->>FilterDataContext: access filter data
FilterDataContext-->>Blocker: NetworkFilterList
Blocker-->>Engine: BlockerResult
Engine-->>Client: result
|
The PR moves from per-NetworkList flatbuffers to a one (per-Engine).
It doesn't affect the performance metrics, but open possibilities to put cosmetic filters to the same flatbuffer.
It also simplifies the serialization and deserialization code.
Notes:
insert_dupwas dropped, currently it didn't have any effect with flatbuffers (we compared the flatbuffers offsets, not the unique ids). Maybe it makes sense to restore it the future versions.NetworkFilterLists::newis left as-is for benches and tests to avoid excessive diff.