WIP: add a new cache which works on the CFG level and directly loads an object file #923

Open
wants to merge 3,173 commits into
base: master

undingen
Contributor

Our previous cache worked at the LLVM IR level: it hashed the IR and, if the hash matched exactly, loaded the corresponding object file.
This cache hashes the CFG, and if we find an object file with the same hash we don't even build the LLVM IR. Because a lot can change that is not visible in the CFG (e.g. a function in our runtime gets renamed), it's not safe to reuse a cache file from one pyston build in another. The cache therefore also contains the timestamp of the pyston executable.
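
The lookup described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the function names, the use of SHA-256, the `.o` file naming and the 8-byte timestamp header are all assumptions made for the example.

```python
import hashlib
import os
import struct


def cfg_hash(cfg_text):
    """Hash the printed CFG (hypothetical: SHA-256 over its string form)."""
    return hashlib.sha256(cfg_text.encode("utf-8")).hexdigest()


def store(cache_dir, cfg_text, exe_timestamp, obj_bytes):
    """Write an object file, prefixed with the executable's timestamp."""
    path = os.path.join(cache_dir, cfg_hash(cfg_text) + ".o")
    with open(path, "wb") as f:
        f.write(struct.pack("<q", exe_timestamp))  # assumed 8-byte header
        f.write(obj_bytes)


def load(cache_dir, cfg_text, exe_timestamp):
    """Return the cached object bytes, or None on a miss or a stale entry."""
    path = os.path.join(cache_dir, cfg_hash(cfg_text) + ".o")
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return None
    if len(data) < 8:
        return None
    (stored_timestamp,) = struct.unpack("<q", data[:8])
    if stored_timestamp != exe_timestamp:
        # Built by a different executable: the runtime may have changed in
        # ways the CFG hash cannot see, so treat the entry as a miss.
        return None
    return data[8:]
```

On a hit, the caller would skip IR generation entirely and hand the returned bytes to the loader. On a miss, it would compile as before and call `store`.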

Info:

  • we embed a lot of pointers to our CFG nodes inside our LLVM IR
    • our CFG nodes don't have a deterministic memory layout. If we had bytecode, we could just embed the offset into the bytecode array for statements and the offset into the constant array for constants (as CPython does)
    • therefore there is a new AssignConstantVisitor which records the actual pointer value of everything we embed and gives constants a deterministic ordering.
    • there are two kinds: pointer references, which need to be replaced with the actual value (e.g. current_stmt), and "materializable" things, which are AST nodes for which we create Python instances at runtime (e.g. AST_Str nodes, ...)
  • we generate external symbols with mat_ (= materialize) and ptr_ (= pointer reference) prefixes, which we then replace with the actual values during linking
  • there are some special symbol names, e.g. cParentModule, cCF, ...
  • a third prefix, str_, is currently used as a hack for the random small strings we have in a lot of places which are not directly represented in the CFG, e.g. a print statement calls write. These currently just get embedded as the actual string value prefixed by str_ (so characters that are not allowed in a symbol name are not supported...)
  • some general changes which should be made to decrease the number of pointers we embed: e.g. lambdas should get remapped directly to function defs during CFG construction, and we should create the CFFunction directly during CFG construction (i.e. not call wrapFunction during execution)...
  • all functions we call must be called by their actual name, not their memory location, and all LLVM JITed functions contain the hash inside their name
  • patchpoints are a problem because the stackmap info is not enough to reconstruct the PP info... Therefore we embed a new PP arg at the end of every PP, which is just a string containing the PP info.
    • e.g. num slots, size, names and types of all frame vars (we also use the same mechanism for OSR)
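
A rough sketch of the prefix scheme from the list above. This is a simplified model in Python rather than the actual linker hook: the `<prefix>_<index>` naming, the function names and the callback parameters are hypothetical, and the special names such as cParentModule are not handled.

```python
def assign_symbols(embedded):
    """Name each embedded value by its position in a fixed visiting order.

    `embedded` is a list of (kind, value) pairs, where kind is "ptr" or
    "mat". Because names depend only on the order, they are the same on
    every run even though the values live at different addresses.
    """
    symbols = {}
    for index, (kind, value) in enumerate(embedded):
        if kind not in ("ptr", "mat"):
            raise ValueError("unknown kind: %r" % (kind,))
        symbols["%s_%d" % (kind, index)] = value
    return symbols


def resolve(name, symbols, materialize, address_of):
    """Map an external symbol name to the value to patch in at link time."""
    prefix, _, rest = name.partition("_")
    if prefix == "str":
        # The string itself is encoded in the symbol name.
        return rest
    if prefix == "mat":
        # Create the runtime object for the recorded AST node.
        return materialize(symbols[name])
    if prefix == "ptr":
        # Substitute the current address of the recorded node.
        return address_of(symbols[name])
    raise KeyError(name)
```

The object file only ever refers to the stable names. The per-process values are supplied when the symbols are resolved, which is what makes the same object file reusable across runs.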

Issues:

  • currently we don't embed the type recorders
  • disabled type speculation
  • hashing should be improved: the CFG printed as a string doesn't contain all the info, e.g. OSR thresholds, ...
  • had to disable inlining because of some inlining error (should be easy to fix)
  • removed a pyston specific pass which replaced sym names with actual pointer values
  • had to disable tryCallattrConstant because it embeds references to default args of the runtime functions, and these are currently not tracked by the AssignConstantVisitor. It would probably also lead to issues where we use static functions inside the runtime, which we probably can't resolve by name.
  • only a small perf change, even though loading the object files is very fast. We may want to load objects in another thread, or early for functions where we already have a cache file. We may also want to directly load the most optimized version instead of the one with the requested optimization level.
           django_template3.py             2.7s (4)             2.7s (4)  -1.7%
                 pyxl_bench.py             2.2s (4)             2.0s (4)  -6.0%
     sqlalchemy_imperative2.py             2.6s (4)             2.5s (4)  -2.0%
                       geomean                 2.5s                 2.4s  -3.3%

kmod and others added 30 commits August 11, 2015 01:03
rewrite float comparison based on CPython implementation
NumPy is huge, bigger than our previous (arbitrary) number by an order
of magnitude.
C extensions (NumPy) might inherit classes in C code and expect to find
tp_number. This is just copied from CPython's PyType_Ready.

This requires assigning some of the runtime functions to the sq_ and mp_ slots
otherwise there are infinite loops from Pyston attributes.
Those should never exist because all Python objects should be created
through the CPython API except for type objects. Unfortunately, some
places like NumPy do that so we need a means of patching it for now.
…bjects.

Instead of GCKind::HIDDEN_CLASS, use GCKind::RUNTIME and require that
the runtime objects have a gc_visit function by inheriting from
GCAllocatedRuntime.
Refactors on types of GC objects
Previously, we would just call these "conservative python" objects,
and scan their memory conservatively.  The problem with this is that
the builtin type might have defined a custom GC handler that needs to
be called in addition to the conservative scanning (ie it stores GC
pointers out-of-band that are not discoverable via other gc pointers).

We had dealt with these kinds of issues before which is why I added
the "conservative python kind", but I think the real solution here is
to say that to the GC, these objects are just python objects, and
then let the type machinery decide how to scan the objects, including
how to handle the inheritance rules.  We were already putting
"conservativeGCHandler" as the gc_handler on these conservative types,
so let's use it.
Previously it would have to call out to checkAndThrowCAPIException(),
which is quite a bit slower than what it now can do, which is directly
checking the return value.
Improve dictionary performance
First job: check for cases where we call isSubclass() when it would
be faster to call Py*_Check

Probably overkill for this.  Pretty cool though that it found a
case that would have been impossible to spot textually, where there
was an implicit this-> member access.
ie everything that the linter was warning about
rudi-c and others added 27 commits September 4, 2015 15:10
This will reallocate all objects in the small heap and update
all references that were pointing to this object.

This is not functional yet, there are still references that we are not
tracking at other points in the program, so it's still gated behind the
MOVING_GC flag.
Last few fixes to make sqlalchemy_declarative work
Optionally move objects around in memory to prepare Pyston for a moving collector.
ie concerning things like:
  a, b = 1, 2

The irgen phase already knows how to do unpacking in a type-analyzed
way (a and b will be of type int), but type speculation needed to
have that added.
Used a hardcoded CXX exception style in the non-rewriteable case.
Type system fix: need to add unpacking to the type system
Implement some PyNumber_XXX functions, to enable "test_operator"
We already supported changing the values, but not the number
of them.  The main trickiness here is
- We had been assuming that the number of defaults was immutable,
  so I had to find the places that we used it and add invalidation.
- We assumed that all functions based on the same source function would
  have the same number of defaults.

For the first one, I found all the places that looked at the defaults array,
which should hopefully be all the places that need invalidation.

One tricky part is that we will embed the num_defaults data into code produced
by the LLVM tier, and we currently don't have any mechanism for invalidating
those functions.  This commit side-steps that, since the only functions that
we can inline are the builtins, and you aren't allowed to change their defaults
anyway.  So I added a "can_change_defaults" flag.

For the second part, I moved "num_defaults" from the CLFunction (our "code" object)
to the BoxedFunction (our "function" object), and then changed the users to pull
it from there.
Allow changing the number of default arguments
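
For reference, this is the Python-level behaviour in question, shown here with the `__defaults__` attribute (also spelled `func_defaults` in Python 2). The function `greet` is just an illustrative example, not taken from the test suite.

```python
def greet(name, greeting="hello"):
    return "%s, %s" % (greeting, name)


# One default: `name` is required.
before = greet("world")

# Replace the defaults tuple with a longer one. Defaults bind to the
# trailing parameters, so `name` now has a default as well.
greet.__defaults__ = ("everyone", "hi")
after = greet()
```

Any compiled code that baked in the old number of defaults would be wrong after the assignment, which is why the count has to live on the function object and be invalidated on change.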
Add support for symbolic patchpoint targets to SelectionDAG and the
X86 backend.
kmod added the wip label Oct 15, 2015
kmod force-pushed the master branch 2 times, most recently from 352fd89 to 6488a3e October 28, 2020 21:01