Skip to content

Native CLI#26

Draft
ModernMAK wants to merge 29 commits intomainfrom
native-cli
Draft

Native CLI#26
ModernMAK wants to merge 29 commits intomainfrom
native-cli

Conversation

@ModernMAK
Copy link
Member

Rather than abstracting to pyfilesystem ALL the time, we only need to use it for packing (for now) and for doing complex operations on SGA archives (like modifying select files in place)

Currently butchers CannibalToast's PR for several reasons;

  • To completely avoid using pyfilesystem, see pyfilesystem (fs) is dead Issue-Tracker#51
  • Fix some minor bugs
    • name-block would only work if name count was longer than the size of the buffer
    • some misnamed properties (sga called ultra_fast instead of ultra_fast_native or something)
    • handle os.O_BINARY not existing on unix systems (nobody will use this on linux, but couldn't hurt)
  • CI fixes (mostly mypy/pylint)

Trimming code paths that don't execute
Reducing all dependencies on pyfilesystem
more hacking away at chafe

Starting to get to the point where I can 'abstract' away calls and reimplement a plugin architecture for native calls

But BEFORE that

a quick pass at anything else that core exposes that can be native for v2; just to get a basis
remove debug cli line that I put in because I was too lazy to setup the IDE to do it properly 😁
restores extract_streaming_native

Hopefully this was the only one i accdientally axed
reducing a lot of redundant code

Will it affect timings?
Probably; so long as ultrafast still hits ~10 seconds on definitive-edition files, I'm not concerned
pylint will deduct points in it's report for useless returns

MyPy expects explicit `return None`

disabling `useless-retursn` in pylint will avoid this conflict
more cleanup
Faster unpackers didn't respect workers / batch sizes

I forgot that workers/batch_size varied inside of strategies; which led to my profiling attempts providing useless data.

All strategies should now respect num_workers/batch_size

Optimized now beats Native by 6/10ths of a second (UF is slower on average by 2 seconds); should experiment with batch_size arguments

I'd like to do some more TLC to get rid of the extraction strategy pattern. Or at least; merge the 3 single pass extractors and the 2 double pass extractors

If I abstract a bit further; I could probably extend it to support checksum validation; that may end up being the *true* extraction strategy pattern; some class that passes off functions to the parallel 'passes'

Also need to add timings to the master class; my profiling hack isn't great, and we already have an extraction stats result
unpacking speed seems to be HARD bottlenecked by IO and my parser for pyfilesystem. Using a native implementation to extrract files seems to be on par (maybe better) than parallel processing.

This mostly says that my pyfilesystem setup is terrible; which uh... makes me wonder if SgaV2 v1.0 is actually faster than SgaV2 v2.0

I'd REALLY like to find a profiling framework that I'd like to use

Gonna run an AB test between Serial and Optimized (optimized is generally the best now)
AB test is in; for a single worker; they're essential the same

The big slowdown is actually creating directories at the start
mistake on my part mean that we always used only 1 worker;
parallel IS faster than serial by about ~3 seconds

And the directory cache fix shaves another ~1 seconds
more micro optimizations

Currently saving 1.5 seconds over the base PR

Doing some really bad extrapolation, that might bring us to ~120% faster for large files

At least the code has less duplicated blocks

Think its time to throw in the towel and just finish what I have here, then finish migrating to SGA-V2
Kind of ugly hack
NativeV2Parse accepts a boolean/(int)->boolean that determins whether to include the drive in response
migrate v2 only code to an entrypoint plugin then push new update?
Also seem to have sped up native

should optimize the data/metadata pairing
gonna axe delta strategy/progressive strategy and just ship what's left
Streaming/Optimized now seem to hit the same performance (more testing needed)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant