Draft
Conversation
Remove debug CLI line that I put in because I was too lazy to set up the IDE to do it properly 😁
Restores extract_streaming_native. Hopefully this was the only one I accidentally axed.
pylint will deduct points in its report for useless returns, but MyPy expects an explicit `return None`; disabling `useless-return` in pylint will avoid this conflict.
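A minimal sketch of the conflict described above (the function and field names are illustrative, not from the project): mypy's default `warn_no_return` behavior wants every path of an `Optional`-returning function to return explicitly, while pylint's R1711 flags the trailing `return None` as useless.

```python
from typing import Dict, Optional

def lookup(table: Dict[str, str], key: str) -> Optional[str]:
    """Illustrates the mypy/pylint conflict described above."""
    if key in table:
        return table[key]
    # mypy wants this branch to return explicitly, but pylint's
    # useless-return (R1711) flags the line -- hence disabling
    # useless-return in the pylint config.
    return None
```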
Faster unpackers didn't respect workers / batch sizes. I forgot that workers/batch_size varied inside the strategies, which led to my profiling attempts providing useless data. All strategies should now respect num_workers/batch_size. Optimized now beats Native by 6/10ths of a second (UF is slower on average by 2 seconds); I should experiment with batch_size arguments.

I'd like to do some more TLC to get rid of the extraction strategy pattern, or at least merge the 3 single-pass extractors and the 2 double-pass extractors. If I abstract a bit further, I could probably extend it to support checksum validation; that may end up being the *true* extraction strategy pattern: some class that passes off functions to the parallel 'passes'. Also need to add timings to the master class; my profiling hack isn't great, and we already have an extraction stats result.
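A sketch of what "all strategies respect num_workers/batch_size" could look like: every strategy funnels through one harness that takes the same two knobs, so profiling runs compare like with like. The function names here are hypothetical, not the project's actual API.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive batches of at most `batch_size` items."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch

def run_extraction(
    files: Iterable[T],
    extract_batch: Callable[[List[T]], None],
    num_workers: int,
    batch_size: int,
) -> None:
    """Shared harness: every strategy goes through the same
    num_workers/batch_size knobs (hypothetical sketch)."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Consume the map to propagate any worker exceptions.
        list(pool.map(extract_batch, batched(files, batch_size)))
```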
Unpacking speed seems to be hard-bottlenecked by IO and my parser for pyfilesystem. Using a native implementation to extract files seems to be on par with (maybe better than) parallel processing. This mostly says that my pyfilesystem setup is terrible, which, uh... makes me wonder if SgaV2 v1.0 is actually faster than SgaV2 v2.0. I'd REALLY like to find a profiling framework that I'd like to use. Gonna run an A/B test between Serial and Optimized (Optimized is generally the best now).
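The Serial-vs-Optimized A/B test could be as simple as wall-clock medians over a few runs; this is a crude stand-in for a real profiling framework, with entirely hypothetical names.

```python
import statistics
import time
from typing import Callable, Dict

def ab_test(strategies: Dict[str, Callable[[], None]], runs: int = 5) -> Dict[str, float]:
    """Run each strategy several times and report the median
    wall-clock time in seconds (crude A/B harness, not a profiler)."""
    results: Dict[str, float] = {}
    for name, strategy in strategies.items():
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            strategy()
            timings.append(time.perf_counter() - start)
        # Median is more robust to one-off IO stalls than the mean.
        results[name] = statistics.median(timings)
    return results
```

For finer-grained data, the stdlib `cProfile` module can break a single run down by function instead of timing the whole strategy as a black box.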
More micro-optimizations. Currently saving 1.5 seconds over the base PR. Doing some really bad extrapolation, that might bring us to ~120% faster for large files. At least the code has fewer duplicated blocks. Think it's time to throw in the towel and just finish what I have here, then finish migrating to SGA-V2.
Kind of an ugly hack: NativeV2Parse accepts a boolean or an (int) -> boolean callable that determines whether to include the drive in the response.
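One common way to tame that bool-or-callable parameter is to normalize it at the boundary; this helper is a guess at the shape of the hack, not the real NativeV2Parse signature.

```python
from typing import Callable, Union

# A plain bool applies to every drive; a callable is asked per drive index.
DrivePredicate = Union[bool, Callable[[int], bool]]

def include_drive(flag: DrivePredicate, drive_index: int) -> bool:
    """Normalize the boolean/(int)->boolean argument described above
    (hypothetical helper; the real parser's API may differ)."""
    if callable(flag):
        return flag(drive_index)
    return flag
</antml>```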
Migrate v2-only code to an entrypoint plugin, then push a new update?
Also seem to have sped up Native; should optimize the data/metadata pairing.
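If the data/metadata pairing is currently a nested scan, indexing one side in a dict drops it to a single pass. The entry shape and the `"name"` key here are assumptions for illustration only.

```python
from typing import Dict, List, Optional, Tuple

Entry = Dict[str, object]

def pair_data_with_metadata(
    data_entries: List[Entry], meta_entries: List[Entry]
) -> List[Tuple[Entry, Optional[Entry]]]:
    """Pair each data entry with its metadata by name in O(n + m)
    via a lookup dict, instead of an O(n * m) nested scan
    (hypothetical field names)."""
    meta_by_name = {m["name"]: m for m in meta_entries}
    return [(d, meta_by_name.get(d["name"])) for d in data_entries]
```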
Gonna axe the delta strategy and progressive strategy and just ship what's left. Streaming/Optimized now seem to hit the same performance (more testing needed).
Rather than abstracting to pyfilesystem ALL the time, we only need to use it for packing (for now) and for doing complex operations on SGA archives (like modifying select files in place).
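The extraction side of that split could bypass the filesystem abstraction entirely and write straight to disk with plain stdlib IO; this is a sketch with an assumed `(relative_path, bytes)` record shape, not the project's actual code.

```python
import os
from typing import Iterable, Tuple

def extract_native(records: Iterable[Tuple[str, bytes]], out_dir: str) -> None:
    """Direct-IO extraction path: skip the pyfilesystem layer and write
    decompressed blobs straight to disk (record shape is assumed)."""
    for rel_path, data in records:
        dest = os.path.join(out_dir, rel_path)
        # Create intermediate directories on demand.
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        with open(dest, "wb") as handle:
            handle.write(data)
```

Packing and in-place edits would stay on the pyfilesystem side, where the uniform filesystem interface actually pays for itself.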
Currently butchers CannibalToast's PR for several reasons: