Switch to Whoosh implementation to improve speed #762
@eepMoody what's the status on this PR? If I understand correctly, a fair amount of the search is already in place in staging... Where does this fit in?
Ah sorry, I hit a bit of a wall on getting the actual switch to Elasticsearch (which is a major performance boost) sorted out. What's in production adds the new search methods, but the performance isn't really improved. Might be a little bit worse, but hard to tell until it hits real resources. The main bit that's outstanding is the deployment setup, since ES needs to run in its own container/server. I haven't quite gotten that into a state where it's functional in production. If you want to take a peek, maybe there's something obvious that I've overlooked or another approach that might make more sense?
Use Whoosh for text/fuzzy search and spaCy word embeddings for semantic similarity. Removes the need for a separate ES service.
f3358bd to 598c9f8
Replaces the existing hand-tuned search with the Whoosh framework.
This has major performance benefits but incurs a much higher indexing time. To deal with this, I've adopted a strategy that caches indexes in the repo, unpacks them for quick development purposes, then runs a background indexing process inside the server after deployment. In practice, this means search results may start slightly out of date (if a major content release has not been cached) but will be consistent within approximately 5 minutes of deploy.
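For reviewers who want a picture of the caching strategy described above, here is a minimal, standard-library-only sketch. It is not the code in this PR: the function names (`pack_index`, `unpack_index`, `start_background_reindex`) and the `rebuild` callable are hypothetical, and the real indexing step (Whoosh, in this branch) is assumed to be passed in by the caller.

```python
import threading
import zipfile
from pathlib import Path
from typing import Callable


def pack_index(index_dir: Path, archive: Path) -> None:
    """Zip every file under index_dir so the archive can be checked in."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(index_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the index root so unpacking is location-independent.
                zf.write(path, path.relative_to(index_dir).as_posix())


def unpack_index(archive: Path, index_dir: Path) -> None:
    """Restore the cached index so search works immediately after startup."""
    index_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(index_dir)


def start_background_reindex(
    rebuild: Callable[[Path], None], index_dir: Path
) -> threading.Thread:
    """Run the slow rebuild off the main thread.

    The cached (possibly stale) index keeps serving queries until the
    rebuild finishes. `rebuild` is a placeholder for the real indexer.
    """
    worker = threading.Thread(target=rebuild, args=(index_dir,), daemon=True)
    worker.start()
    return worker
```

On server start, the idea would be to call `unpack_index` first and then `start_background_reindex`. A real implementation would also need to make sure the rebuild swaps in the new index safely while queries are running, which this sketch does not attempt.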
Major changes:
quickindex, which will index and create zipped files for checking in
NOTE: being kept in draft until #844 is merged, as this is based on that branch