REVAMP core to FIX #20 reuse of instantiated plugin digesters #21

Open
ankostis wants to merge 10 commits into Miserlou:master from ankostis:digfact

@ankostis ankostis commented Dec 16, 2016

HASHING REVAMP

Had to revamp to solve the bug in #20 about reusing instantiated plugin-digesters.

In the previous version, each "digester" was a "convoluted" 2-tuple
(digester-instance, final-hash-func).
Solving the reuse bug without refactoring would have required
re-initializing the plugins for each input.

In this PR, each registered digester is actually a factory_function(fsize: int)
that creates a digester object with just 2 methods:

  • update(bytes)
  • hexdigest() -> str # lower
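
The two-method interface above can be sketched roughly as follows. This is only an illustration, not the PR's actual code: `HashlibDigester`, `make_factory`, `FACTORIES` and `digest_input` are hypothetical names, and the registry is assumed to be a plain dict.

```python
import hashlib


class HashlibDigester:
    """Hypothetical wrapper exposing only update() and hexdigest()."""

    def __init__(self, algo):
        self._hash = hashlib.new(algo)

    def update(self, data):
        self._hash.update(data)

    def hexdigest(self):
        # Always return a lower-case str.
        return self._hash.hexdigest().lower()


def make_factory(algo):
    """Return a factory_function(fsize) that builds a fresh digester."""
    def factory(fsize):
        # fsize is accepted for interface compatibility but not needed here.
        return HashlibDigester(algo)
    return factory


# Assumed registry shape: the registry holds factories, never instances.
FACTORIES = {"MD5": make_factory("md5"), "SHA1": make_factory("sha1")}


def digest_input(data):
    """Hash one input with a brand-new digester per algorithm."""
    results = {}
    for name, factory in FACTORIES.items():
        digester = factory(len(data))
        digester.update(data)
        results[name] = digester.hexdigest()
    return results
```

Because every input gets its own digester from the factory, hashing the same bytes twice yields the same result, which is the property the reuse bug violated.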

NOTE: the factory-function takes fsize as its argument. This is
useful for git-digesters, which can avoid slurping files whose size is known;
it is also handy for URL-resources. The trade-off is that
all other digesters must use a "special" factory just to ignore the fsize arg.
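
To illustrate why `fsize` matters for git-digesters: a git blob hash is the SHA-1 of a `blob <size>\0` header followed by the content, so the size must be known before the first byte is hashed. The sketch below is an assumption about how such a digester could look, not the PR's implementation; `GitBlobDigester` and `ignore_fsize` are hypothetical names.

```python
import hashlib


class GitBlobDigester:
    """Hypothetical git-blob digester: streams when fsize is known, slurps otherwise."""

    def __init__(self, fsize=None):
        self._hash = hashlib.sha1()
        if fsize is None:
            # Size unknown: buffer everything, the header is written at the end.
            self._chunks = []
        else:
            # Size known: write the header now and stream the content.
            self._chunks = None
            self._hash.update(b"blob %d\0" % fsize)

    def update(self, data):
        if self._chunks is None:
            self._hash.update(data)
        else:
            self._chunks.append(data)

    def hexdigest(self):
        if self._chunks is None:
            return self._hash.hexdigest()
        body = b"".join(self._chunks)
        slurped = hashlib.sha1(b"blob %d\0" % len(body))
        slurped.update(body)
        return slurped.hexdigest()


def ignore_fsize(constructor):
    """The "special" factory: adapt a no-arg constructor to factory(fsize)."""
    def factory(fsize):
        return constructor()
    return factory


md5_factory = ignore_fsize(hashlib.md5)
```

With a known size the content never has to be held in memory; with `fsize=None` the digester falls back to buffering, which is the slurping the NOTE refers to.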

MODULES REVAMP

As explained in #20, enabling plugins in Travis across PY-versions revealed
structural module issues.
In general it is easier if module-names do not shadow their package-name,
and using relative imports is helpful. So I had to move:

  • omniparse.omniparse.py --> omniparse.__init__.py
  • and move project coords --> omniparse._version.py.

Other changes

(most changes are in commit e15bef5, impossible to separate, sorry)

  • FIX plugins - multiple problems were preventing them from running -
    added travis TCs to detect them.
  • Add -x family option to exclude families (that was easy) :-).
  • Implement inclusion/exclusion logic within a class that avoids
    needless instantiation of excluded digesters.
  • Do not git-slurp if the URL provides Content-Length.
  • Avoid some top-level imports, to speed up cmd-line launch for
    help-msg.
  • FIX: was comparing hash-matches case-sensitively.
  • FIX: Stop printing this ugly message about plugins-failed-to-load, even if not installed!
  • Centrally ensure all hashes are str/lower.
  • Run TravisCI on DEV for PY35+ and PY36+; many things change in the future.
  • Add TCs to prove the errors found.

TODO:

  • We can move main-code into its own __main__.py file, and do the trick I described in "Rework cmd-line interface with Inclusions/eXclusions" #10 with 2 cmds (oh, & omnihash).
  • Restructure functions a bit, e.g. grouping digester-setup functions together.
  • Unfortunately, PY36+ fails (allowed failure in Travis), so FIX this.
  • Retrofit CRCs as plugin.

Currently plugins install digester-instances when initialized.
The SAME digesters are re-used multiple times,
corrupting hashes for all but the 1st input!

- Installed TCs to detect this bug.
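
The corruption described above is easy to reproduce with a plain `hashlib` object. This is a sketch of the failure mode, not the plugin code itself; `shared` and `buggy_digest` are hypothetical names.

```python
import hashlib

# One digester instance created once, as the plugins did at initialization.
shared = hashlib.md5()


def buggy_digest(data):
    """Re-use the shared instance: state from earlier inputs leaks in."""
    shared.update(data)
    return shared.hexdigest()


first = buggy_digest(b"abc")   # correct: only b"abc" has been fed so far
second = buggy_digest(b"abc")  # wrong: this is really the hash of b"abcabc"
```

Only the first input gets a correct hash; every later one is the hash of all inputs concatenated so far.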
MAJOR REVAMP to solve the bug discovered in a21bf38
about reusing plugin-digesters.

In the previous version, each "digester" was a 2-tuple
`(digester-instance, final-hash-func)`.
Solving the reuse bug without refactoring would have required
re-initializing the plugins for each input.

In this revision, each registered *digester* is actually a
`factory_function(fsize: int)` that will create a *digester* class with
2 methods:
- update(bytes)
- hexdigest() -> str # lower

NOTE that the factory-function takes `fsize` as its argument. This is
necessary for git-digesters not to always slurp bytes, particularly for
URL-resources; so all other digesters must use a "special" factory that
ignores the `fsize` arg.


Other changes, impossible to separate in commits:

- FIX plugins - multiple problems were preventing them from running -
added travis TCs to detect them.
- Add `-x family` option to exclude families.
- The inclusion/exclusion logic is implemented within a class.
- Do not git-slurp if the URL provides Content-Length.
- Avoid needless instantiation of excluded digesters.
- Avoid some top-level imports, to speed up cmd-line launch for
help-msg.
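
One way the inclusion/exclusion class could avoid instantiating excluded digesters is to hold only factories and call them after filtering. The sketch below is an assumption about the design, not the PR's class; `DigesterSelector`, `FAMILIES` and the include/exclude semantics (exclusion wins, empty inclusion means "all") are hypothetical.

```python
import hashlib


class DigesterSelector:
    """Decide which families run, and build digesters only for those."""

    def __init__(self, families, include=(), exclude=()):
        # families: {family_name: {digester_name: factory(fsize)}}
        self._families = families
        self._include = {f.lower() for f in include}
        self._exclude = {f.lower() for f in exclude}

    def is_selected(self, family):
        family = family.lower()
        if family in self._exclude:
            return False
        # An empty include-set means "everything not excluded".
        return not self._include or family in self._include

    def create(self, fsize):
        """Call factories of selected families only; excluded ones are never invoked."""
        digesters = {}
        for family, factories in self._families.items():
            if not self.is_selected(family):
                continue
            for name, factory in factories.items():
                digesters[name] = factory(fsize)
        return digesters


FAMILIES = {
    "sha": {"SHA1": lambda fsize: hashlib.sha1(),
            "SHA256": lambda fsize: hashlib.sha256()},
    "md": {"MD5": lambda fsize: hashlib.md5()},
}
selected = DigesterSelector(FAMILIES, exclude=["md"]).create(fsize=None)
```

Here `selected` holds only the SHA digesters; the MD5 factory is never called.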
Trying to work across PY-versions is easier if not shadowing
package-name with module, and using relative imports instead.

So move code:
- omniparse.omniparse.py --> omniparse.__init__.py
- and move project coords --> omniparse._version.py.
@ankostis

After SHAttered, this utility might become a bit more popular.

Would you check this PR?
