Trie implementation

While looking at the `cov` command recently, I realized that it performs extremely poorly with a large number of addresses (5000+) provided in the IP list argument. For example, passing one's `rawaddrman` addresses (40k+ addrs) would take upwards of 10 minutes to get a coverage check. Not good.

The reason is simple, and is fundamental to kartograf: looking up an IP address in a list of network prefixes _n_ is of complexity O(n). Thus for looking up _m_ number of IPs, the worst case would be _m_ * _n_ runtime. 

Turns out the canonical way of looking up IP addresses in networks is via a _trie_. It's a binary tree, where each node represent a bit in the IP address. So you can "walk" down the bits in an IP address until you find the longest prefix matching, and see if it has an associated ASN. In short, for any IP lookup within a list of IP networks, we want to use a trie because the lookup is constant time. Building the trie remains O(m * n): you pay the upfront cost of building it, to search it for "free".

Benchmarking `cov`, I got the following results on my machine (32GB mem),
using some local data I had, with a fixed map file of 692_583 lines, and the corresponding with IP list length:

| ip_list_len | current_cov | cov_with_trie |
| :--------------| :-----------------| :----------------    |  
| 437            |   6s                |   8.5s               |
| 5407          |   1m14s        |   8.9s               |
| 65049        |   10m +        |   9.8s                |


I have three PRs in the works related to this:
- add trie implementation #115 
- use trie in `cov` command https://github.com/asmap/kartograf/pull/116
- use trie in `merge` command https://github.com/asmap/kartograf/pull/123

There will be some implementation details to sort out as the behavior may change a bit, so I wanted to open this issue for discussion and visibility. Overall we will get better performance and nicer code.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Trie implementation #117

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Trie implementation #117

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions