Replace Marshal by Granular_marshal in ocaml-index #1889
Optimize use of reachable_words
Fix fd leak: always close granular_marshal in_channel
Compressed index_format by de-duplicating filenames/longidents
Fix fd leak: keep latest in_channel open for granular_marshal
in the union find structures
In my own testing, indexing time increases slightly (about 5%) and file size does not increase. Fetches are now close to instantaneous, and Merlin's memory usage is vastly reduced since the indexes are no longer fully loaded into memory. Indexing one large library in Dune:
The impact on the larger Dune parallel build appears to be much lower: a 1.6% increase. @liam923 I am going to merge these improvements, which make sense for most users, but we should also evaluate the impact of this PR on "larger" codebases.
Initial PR presentation by @Lucccyo:
The current implementation of ocaml-index uses Marshal to store its data on disk.
Searching for occurrences in massive projects is time-consuming because every search loads all the data structures from disk.
This Pull Request replaces Marshal with a granular version to make reading the ocaml-index more efficient.
It comes with granular implementations of two data structures, set and map, based on the Stdlib implementations.
During a search operation, the program lazily loads only the required part of the ocaml-index.
This works because the heavy nodes of granular_map and granular_set hold their children behind link indirections,
which introduce serialization boundaries and allow Marshal to delay the deserialization of those children.
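
The idea of a serialization boundary can be illustrated with a minimal sketch. This is not the Granular_marshal API: the `link` type and the `save`, `load_root` and `fetch` functions are hypothetical names chosen for this example, and the layout (a flat table of offsets rather than links inside map and set nodes) is an assumption made to keep it short. Each payload is written as its own Marshal block, and a link only records where that block starts, so reading the root does not deserialize the payloads.

```ocaml
(* Hypothetical sketch: a link stores the file offset of a separately
   marshalled value and caches it once it has been read. *)
type 'a link = { offset : int; mutable cached : 'a option }

(* Write each payload as its own Marshal block, then a root table that maps
   keys to block offsets. The first 4 bytes hold the root table's offset. *)
let save path (entries : (string * string list) list) =
  let oc = open_out_bin path in
  output_binary_int oc 0; (* placeholder, patched below *)
  let table =
    List.map
      (fun (key, payload) ->
        let pos = pos_out oc in
        Marshal.to_channel oc (payload : string list) [];
        (key, pos))
      entries
  in
  let root_pos = pos_out oc in
  Marshal.to_channel oc (table : (string * int) list) [];
  seek_out oc 0;
  output_binary_int oc root_pos;
  close_out oc

(* Read only the root table; every payload stays on disk behind its link. *)
let load_root ic : (string * string list link) list =
  seek_in ic 0;
  let root_pos = input_binary_int ic in
  seek_in ic root_pos;
  let table : (string * int) list = Marshal.from_channel ic in
  List.map (fun (key, offset) -> (key, { offset; cached = None })) table

(* Deserialize the value behind a link on first access only. *)
let fetch ic link =
  match link.cached with
  | Some v -> v
  | None ->
    seek_in ic link.offset;
    let v = Marshal.from_channel ic in
    link.cached <- Some v;
    v

let () =
  let path = Filename.temp_file "granular_demo" ".idx" in
  save path [ ("Foo.x", [ "a.ml:1"; "b.ml:7" ]); ("Bar.y", [ "c.ml:3" ]) ];
  let ic = open_in_bin path in
  let root = load_root ic in
  (* Only the entry that is searched for gets deserialized. *)
  let occurrences = fetch ic (List.assoc "Bar.y" root) in
  List.iter print_endline occurrences;
  assert ((List.assoc "Foo.x" root).cached = None);
  close_in ic;
  Sys.remove path
```

In this sketch the input channel has to stay open for as long as links may still be fetched, which is the kind of constraint the fd-leak commits listed above deal with.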