Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace Marshal by Granular_marshal in ocaml-index #1889

Merged
merged 8 commits into from
Feb 5, 2025

Conversation

voodoos
Copy link
Collaborator

@voodoos voodoos commented Jan 17, 2025

Initial PR presentation by @Lucccyo:

The current implementation of ocaml-index uses Marshal to store on the disk the data.
Searching for occurrences on massive projects is time-consuming because the search loads all the data structures from the disk to perform the search.

This Pull Request aims to replace Marshal with a granular version to make the ocaml-index more efficient in reading.
It comes with two granular implementations of the data structures set and map, based on the Stdlib implementation.
During a search operation, the program lazily loads only the required part of the ocaml-index.
It works because the heavy nodes of the granular_map and granular_set have link indirections,
introducing serialization boundaries, which allows Marshal to delay the deserialization of their children.

voodoos added a commit to voodoos/merlin that referenced this pull request Jan 21, 2025
voodoos added a commit to voodoos/merlin that referenced this pull request Jan 22, 2025
voodoos added a commit to voodoos/merlin that referenced this pull request Jan 22, 2025
voodoos added a commit to voodoos/merlin that referenced this pull request Jan 22, 2025
voodoos added a commit to voodoos/merlin that referenced this pull request Jan 22, 2025
@voodoos
Copy link
Collaborator Author

voodoos commented Feb 5, 2025

On my own testing the indexing time is a slight (5%) increase of the indexing time and no increase in the file size. Fetches are now close to instantaneous and Merlin's memory usage vastly reduced since the index are not fully loaded into memory anymore.

Indexing one large library in Dune:

Benchmark 1: Main
  Time (mean ± σ):      1.116 s ±  0.011 s    [User: 0.941 s, System: 0.159 s]
  Range (min … max):    1.100 s …  1.132 s    10 runs

Benchmark 1: With the new granular-marshall:
  Time (mean ± σ):      1.177 s ±  0.011 s    [User: 1.004 s, System: 0.158 s]
  Range (min … max):    1.162 s …  1.195 s    10 runs

Impact on the larger Dune parallel build appear to be much lower: 1.6% increase.

@liam923 I am going to merge these improvements that make sense for most users, but we should also evaluate the impact of this PR on "larger" codebases.

@voodoos voodoos merged commit 102eee4 into ocaml:main Feb 5, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants