WIP: add a new cache which works on the CFG level and directly loads an object file #923

undingen · 2015-09-16T21:41:29Z

Our previous cache worked on the LLVM IR level: it hashed the IR, and if the IR was exactly the same as a cached one, we loaded the corresponding object file.
This cache hashes the CFG, and if we find an object file with the same hash we don't even build the LLVM IR. Because a lot of things that are not visible in the CFG can change (e.g. a function in our runtime gets renamed), it's not safe to reuse a cache file from one pyston build in another one. Therefore the cache also contains the timestamp of the pyston executable.
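A minimal sketch of how such a cache key could be formed, assuming the CFG is hashed from its printed form and the executable's modification time is available as an integer (e.g. st_mtime from stat()). cfgCacheKey is a hypothetical name, not code from this PR, and std::hash is only a stand-in: its output is not guaranteed to be stable across standard library implementations, so an on-disk cache would want a fixed hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <sstream>
#include <string>

// Hypothetical cache-key helper. The CFG hash alone is not enough: a different
// pyston build may generate different code for the same CFG (e.g. a renamed
// runtime function), so the executable's timestamp is folded into the key.
std::string cfgCacheKey(const std::string& printed_cfg, std::int64_t exe_timestamp) {
    std::size_t cfg_hash = std::hash<std::string>{}(printed_cfg);
    std::ostringstream key;
    key << std::hex << cfg_hash << '-' << std::dec << exe_timestamp;
    return key.str();
}
```

The key could then serve as the object file name: a rebuilt pyston binary gets a new timestamp, so every old entry simply stops matching instead of being loaded with stale assumptions.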

Info:

- we embed a lot of pointers to our CFG nodes inside our LLVM IR
  - our CFG nodes don't have a deterministic memory layout. E.g. if we had bytecode, we could just embed the offset into the bytecode array for statements, and for constants the offset into the constant array (like CPython does)
  - therefore there is a new AssignConstantVisitor, which makes the actual pointer values of the things we embed known and gives constants a deterministic ordering
  - there are two types: pointer references, which need to be changed to the actual value (e.g. current_stmt), and "materializable" things, which are AST nodes for which we create Python instances at runtime, e.g. AST_Str nodes, ...
- we generate external syms with mat_ (= materialize) and ptr_ (= pointer reference) prefixes, which we then replace with the actual values during linking
- some special symbol names, e.g. cParentModule, cCF, ...
- a third prefix, str_, is currently used as a hack for the random small strings we have in a lot of places that are not directly represented in the CFG (e.g. a print statement calls write). These currently just get embedded as the actual string value prefixed by str_, so characters which are not allowed in a sym name are not supported
- some general changes we should make to decrease the number of pointers we embed: e.g. lambdas should directly get remapped to function defs during CFG construction, we should create the CFFunction directly during CFG construction (i.e. don't call wrapFunction during execution), ...
- all functions we call must be called by their actual name, not their memory location, and all LLVM-JITed functions contain the hash inside their name
- patchpoints are a problem because the stackmap info is not enough to reconstruct the PP info. Therefore we add a new PP arg at the end of every PP, which is just a string containing the PP info
  - e.g. num slots, size, names and types of all frame vars (we also use the same mechanism for OSR)
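The prefix scheme described in the list can be pictured as one resolver that the linker calls for each external symbol. This is an illustration under assumptions, not the code in this PR: the mat_<n>/ptr_<n> numbering, the EmbeddedValues tables and resolveEmbeddedSymbol are hypothetical, and real runtime functions are assumed to go through the normal by-name lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical per-function tables. The index used in mat_<n> / ptr_<n> is the
// deterministic number a pass like AssignConstantVisitor would assign.
struct EmbeddedValues {
    std::vector<std::uintptr_t> materialized;                 // mat_<n>: objects created at runtime from AST nodes
    std::vector<std::uintptr_t> pointers;                     // ptr_<n>: e.g. the current statement
    std::unordered_map<std::string, std::uintptr_t> strings;  // str_<text>: small strings such as "write"
    std::unordered_map<std::string, std::uintptr_t> special;  // e.g. "cParentModule"
};

static bool hasPrefix(const std::string& s, const std::string& prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Returns table[n] for a decimal index string n, or nullopt if n is not a valid index.
static std::optional<std::uintptr_t> lookupIndexed(const std::string& digits,
                                                   const std::vector<std::uintptr_t>& table) {
    if (digits.empty() || digits.find_first_not_of("0123456789") != std::string::npos)
        return std::nullopt;
    std::size_t idx = std::stoul(digits);
    if (idx >= table.size()) return std::nullopt;
    return table[idx];
}

// Maps an external symbol emitted by the JIT to the address to patch in.
// Returns nullopt for anything else, which is left to the regular resolver.
std::optional<std::uintptr_t> resolveEmbeddedSymbol(const EmbeddedValues& v, const std::string& name) {
    if (hasPrefix(name, "mat_")) return lookupIndexed(name.substr(4), v.materialized);
    if (hasPrefix(name, "ptr_")) return lookupIndexed(name.substr(4), v.pointers);
    if (hasPrefix(name, "str_")) {
        auto it = v.strings.find(name.substr(4));
        if (it == v.strings.end()) return std::nullopt;
        return it->second;
    }
    auto it = v.special.find(name);
    if (it == v.special.end()) return std::nullopt;
    return it->second;
}
```

Because the tables are rebuilt on every run while the symbol names stay fixed, the same cached object file can be relinked against fresh addresses; that is the reason the names must be deterministic and the addresses must not appear in the object code.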

Issues:

- currently we don't embed the type recorders
- disabled type speculation
- hashing should be improved: the CFG printed as a string doesn't contain all the info, e.g. OSR thresholds, ...
- had to disable inlining because of an inlining error (should be easy to fix)
- removed a Pyston-specific pass which replaced sym names with actual pointer values
- had to disable tryCallattrConstant because it embeds references to default args of the runtime functions, which are currently not tracked by the AssignConstantVisitor. It would probably also lead to issues where we use static functions inside the runtime, which we probably can't resolve by name.
- only a small perf change, even though loading the object files is very fast. We may want to load objects in another thread, or early for functions where we already have a cache file. We may also want to directly load the most optimized version instead of the one with the requested optimization level.

                     benchmark                before                after  change
           django_template3.py             2.7s (4)             2.7s (4)  -1.7%
                 pyxl_bench.py             2.2s (4)             2.0s (4)  -6.0%
     sqlalchemy_imperative2.py             2.6s (4)             2.5s (4)  -2.0%
                       geomean                 2.5s                 2.4s  -3.3%
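The geomean row is the geometric mean of the per-benchmark ratios. It can be checked from the percentage column with a few lines of C++ (geomeanChangePercent is only an illustrative helper); the percentages are used rather than the seconds because the seconds shown are rounded, which is why django_template3 shows 2.7s in both columns yet -1.7%.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark changes given in percent (e.g. -1.7 for -1.7%):
// convert each change to a ratio, average the logs, convert back to a percentage.
double geomeanChangePercent(const std::vector<double>& changes_percent) {
    assert(!changes_percent.empty());
    double log_sum = 0.0;
    for (double pct : changes_percent)
        log_sum += std::log(1.0 + pct / 100.0);
    return (std::exp(log_sum / static_cast<double>(changes_percent.size())) - 1.0) * 100.0;
}
```

geomeanChangePercent({-1.7, -6.0, -2.0}) evaluates to about -3.25, which rounds to the -3.3% in the table.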

C extensions (e.g. NumPy) might subclass types in C code and expect to find tp_as_number filled in. This is just copied from CPython's PyType_Ready. It requires assigning some of the runtime functions to the sq_ and mp_ slots; otherwise there are infinite loops through Pyston attributes.
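For reference, the PyType_Ready step being copied lets a type that leaves a tp_as_* table pointer NULL share its base type's table, so C code can dereference tp_as_number on a subclass. Below is a reduced sketch; TypeObject and the *Methods structs are hypothetical simplifications (one slot each, tp_as_buffer omitted), not CPython's or Pyston's real definitions.

```cpp
#include <cassert>

// Simplified stand-ins for CPython's method tables; the real ones have many
// typed function-pointer slots, reduced here to a single untyped slot each.
struct NumberMethods   { void* nb_add = nullptr; };
struct SequenceMethods { void* sq_length = nullptr; };
struct MappingMethods  { void* mp_length = nullptr; };

// Simplified stand-in for PyTypeObject: just the base and the table pointers.
struct TypeObject {
    TypeObject* tp_base = nullptr;
    NumberMethods* tp_as_number = nullptr;
    SequenceMethods* tp_as_sequence = nullptr;
    MappingMethods* tp_as_mapping = nullptr;
};

// Mirrors the step in PyType_Ready where a type that leaves a method-table
// pointer NULL shares its base type's table instead.
void inheritMethodTables(TypeObject* type) {
    const TypeObject* base = type->tp_base;
    if (base == nullptr) return;
    if (type->tp_as_number == nullptr) type->tp_as_number = base->tp_as_number;
    if (type->tp_as_sequence == nullptr) type->tp_as_sequence = base->tp_as_sequence;
    if (type->tp_as_mapping == nullptr) type->tp_as_mapping = base->tp_as_mapping;
}
```

Since a subclass ends up calling through its base's tables, presumably the sq_ and mp_ entries of Pyston's builtin types have to point at real runtime functions for this to behave, which matches the slot assignment described above.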

Previously, we would just call these "conservative python" objects and scan their memory conservatively. The problem with this is that the builtin type might have defined a custom GC handler that needs to be called in addition to the conservative scanning (i.e. it stores GC pointers out-of-band that are not discoverable via other GC pointers). We had dealt with these kinds of issues before, which is why I added the "conservative python kind", but I think the real solution here is to say that, to the GC, these objects are just Python objects, and then let the type machinery decide how to scan them, including how to handle the inheritance rules. We were already putting "conservativeGCHandler" as the gc_handler on these conservative types, so let's use it.

First job: check for cases where we call isSubclass() when it would be faster to call Py*_Check. Probably overkill for this, but it's pretty cool that it found a case that would have been impossible to spot textually, where there was an implicit this-> member access.

This will reallocate all objects in the small heap and update all references that were pointing to them. This is not functional yet: there are still references at other points in the program that we are not tracking, so it's still gated behind the MOVING_GC flag.

We already supported changing the values of the default arguments, but not the number of them. The main trickiness here is:
- We had been assuming that the number of defaults was immutable, so I had to find the places where we used it and add invalidation.
- We assumed that all functions based on the same source function would have the same number of defaults.

For the first one, I found all the places that looked at the defaults array, which should hopefully be all the places that need invalidation. One tricky part is that we embed the num_defaults data into code produced by the LLVM tier, and we currently don't have any mechanism for invalidating those functions. This commit side-steps that, since the only functions that we can inline are the builtins, and you aren't allowed to change the defaults of those anyway. So I added a "can_change_defaults" flag.

For the second part, I moved "num_defaults" from the CLFunction (our "code" object) to the BoxedFunction (our "function" object), and then changed the users to pull it from there.
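A rough model of the resulting design, for illustration only: FunctionObject, setDefaults and the defaults_version counter are hypothetical, and the counter merely stands in for "add invalidation"; it is not how Pyston invalidates. It shows the two points above: the count lives on each function object, and a can_change_defaults flag rejects changes for functions whose defaults may have been baked into generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

struct Box {};  // stand-in for a Python object

// Hypothetical, simplified function object. The defaults (and so their count)
// live on each function object rather than on the shared code object, so two
// functions created from the same source can have different counts.
struct FunctionObject {
    std::vector<Box*> defaults;
    bool can_change_defaults = true;  // false for builtins, whose defaults may be inlined
    int defaults_version = 0;         // hypothetical: bumped so cached code can notice the change

    std::size_t numDefaults() const { return defaults.size(); }

    void setDefaults(std::vector<Box*> new_defaults) {
        if (!can_change_defaults)
            throw std::runtime_error("defaults of this function cannot be changed");
        defaults = std::move(new_defaults);
        ++defaults_version;
    }
};
```

Keeping the flag on the function object means the check costs one branch on the (rare) mutation path rather than anything on the call path.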

kmod and others added 30 commits August 11, 2015 01:03

Merge pull request pyston#821 from Daetalus/float_comparision

ae12787

rewrite float comparison based on CPython implementation

Add missing #define for NumPy's npy_common.h

6c4a0a1

Increase maximum size of BSS section.

b42cb29

NumPy is huge, bigger than our previous (arbitrary) number by an order of magnitude.

Simple implementation of abs(complex) and PyComplex_AsCComplex.

b1b8d67

Stub implementation for Ellipsis.

da72a44

Add function to register nonheap root objects.

477c209

Those should never exist because all Python objects should be created through the CPython API, except for type objects. Unfortunately, some places like NumPy do that, so we need a means of patching it for now.

Extension modules might want to have their own tp_free.

30c004f

Update testsuite submodule to include NumPy test.

d837de3

Make GC handling of HiddenClass more general, support other runtime o…

5de3104

…bjects. Instead of GCKind::HIDDEN_CLASS, use GCKind::RUNTIME and require that the runtime objects have a gc_visit function by inheriting from GCAllocatedRuntime.

Move GC-related declarations to gc folder and add comments.

4c99ad4

Get Pyston building on Fedora

a0ce81f

Merge pull request pyston#820 from rudi-c/gc_types

45b15b3

Refactors on types of GC objects

Misc fixes / helpers

c30e503

Dict change: scan manually instead of conservatively

7c6b521

Switch BoxedDict to llvm::DenseMap

779cb5b

Have the rewriter check for CAPI excs directly

893dbbb

Previously it would have to call out to checkAndThrowCAPIException(), which is quite a bit slower than what it now can do, which is directly checking the return value.
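The idea of checking the return value directly can be shown in plain C++; this sketches only the check's logic, not the machine code the rewriter emits. CAPIError and checkCAPIReturn are hypothetical names. The sentinel is a parameter because C API functions signal errors differently: NULL for object-returning calls and -1 for many int-returning ones, which is what the following "return codes other than NULL" commit is about.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical exception type standing in for a converted Python exception.
struct CAPIError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Only take the slow path when the return value equals the function's error
// sentinel, instead of unconditionally calling a helper after every call.
template <typename T>
T checkCAPIReturn(T result, T error_value) {
    if (result == error_value)
        throw CAPIError("C API call signalled an error");
    return result;
}
```

On the common, successful path this is a single compare and branch, which is the saving over calling out to checkAndThrowCAPIException() every time.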

Support checking for return codes other than NULL

8813c42

Merge pull request pyston#823 from kmod/fedora_fixes

cd5a4d0

Get Pyston building on Fedora

Merge pull request pyston#822 from kmod/perf4

76c4219

Improve dictionary performance

Microoptimizations

be7aae7

Have rearrangeArguments return in place

61a68d3

Optimize some type-checking

ee5b6d4

Merge pull request pyston#783 from rudi-c/numpy_fix

109df64

Some work on the NumPy test

Merge pull request pyston#824 from kmod/perf4

47d0270

More small optimizations

Remove the broken LLVM rules in the Makefile

f0efd51

Change isSubclass to PyFoo_Check

4edd24e

i.e. everything that the linter was warning about

Introduction of slice ast type & updated libpypa

7e14a2c

rudi-c and others added 27 commits September 4, 2015 15:10

ifdef out some moving GC code for now.

8a510e3

Merge pull request pyston#895 from kmod/sqlalchemy

db991b3

Last few fixes to make sqlalchemy_declarative work

Merge pull request pyston#889 from rudi-c/movingmerge

7b84d99

Optionally move objects around in memory to prepare Pyston for a moving collector.

rewrite oldstyle class getattro

4975866

Merge pull request pyston#899 from undingen/perf_oldstyle2

9a3a43c

rewrite oldstyle class getattro

Type system fix: need to add unpacking to the type system

414d207

i.e. concerning things like: a, b = 1, 2. The irgen phase already knows how to do unpacking in a type-analyzed way (a and b will be of type int), but type speculation needed to have that added.
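What "adding unpacking to the type system" means for a, b = 1, 2 can be sketched with a toy type representation. Type and propagateUnpack are hypothetical and far simpler than Pyston's type analysis: when the right-hand side is a tuple of matching arity, each target takes the corresponding element type instead of falling back to an unknown type.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical mini type representation: either a named scalar type ("int",
// "unknown", ...) or "tuple" with element types.
struct Type {
    std::string name;
    std::vector<Type> elts;  // only used when name == "tuple"
};

// Type propagation for `a, b = rhs`: if rhs is a tuple of matching arity, each
// target gets the corresponding element type; otherwise nothing is known.
void propagateUnpack(std::map<std::string, Type>& env,
                     const std::vector<std::string>& targets, const Type& rhs) {
    bool known = rhs.name == "tuple" && rhs.elts.size() == targets.size();
    for (std::size_t i = 0; i < targets.size(); ++i) {
        if (known)
            env[targets[i]] = rhs.elts[i];
        else
            env[targets[i]] = Type{"unknown", {}};
    }
}
```

With (int, int) on the right, both a and b come out as int, so later code can be specialized on that, mirroring what irgen already did.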

Callattr fix

80f4bc3

Used a hardcoded CXX exception style in the non-rewriteable case.

Merge pull request pyston#901 from kmod/speculation_fix

1782dd5

Type system fix: need to add unpacking to the type system

enable test_operator

b8e204b

get slice copy before calculating the position

1eb9a9a

Implement some PyNumber_XXX functions, most of the code is copied from …

c66b6f1

…CPython with some Pyston changes

add PySequence_SetItem and GetItem

5bacdd4

PrintVisitor: use raw_ostream

fe6885a

Merge pull request pyston#904 from undingen/printvisitor

f96c49a

PrintVisitor: use raw_ostream

Merge pull request pyston#900 from Daetalus/test_operator

2294c2a

Implement some PyNumber_XXX functions to enable "test_operator"

I think this test was in the wrong file

cd43a0f

Merge pull request pyston#905 from kmod/change_numdefaults

54a9559

Allow changing the number of default arguments

Fix set comparisons

b91071c

Merge pull request pyston#908 from undingen/fix_set_cmp

9df41bb

Fix set comparisons

rebase llvm to r235483 Apr 22 2015

5ce93e8

Add support for symbolic patchpoint targets to SelectionDAG and the X86 backend.

Remove workaround for symbolic patchpoint targets

8fde2cc

fix boxing passes after switching to symbols

924813e

WIP: CFG level object cache

38f8ff9

call wrapFunction during CFG processing

f9f727b

HACK: disable OSR inside the llvm tier to investigate perf difference

9337270

kmod added the wip label Oct 15, 2015

kmod force-pushed the master branch 2 times, most recently from 352fd89 to 6488a3e October 28, 2020 21:01
