- 
                Notifications
    You must be signed in to change notification settings 
- Fork 30
Float4 arrayView support #646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: code-reflection
Are you sure you want to change the base?
Conversation
| 👋 Welcome back rbrchen! A progress list of the required criteria for merging this PR into  | 
| ❗ This change is not yet ready to be integrated. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work. I have a few comments/questions.
All test passing also for my setup (OpenCL backend).
| .space() | ||
| .oparen(); | ||
| // if the value to be stored is an operation, recurse on the operation | ||
| if (hatVectorStoreView.operands().get(1) instanceof Op.Result r && r.op() instanceof HATVectorBinaryOp) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be automatically visited when HATVectorBinaryOp is found? Do you have the use case in which this is needed?  The reason I am asking is that this also works with the prev. proposal for single Float4 loads./stores . So I wonder which pattern triggers the new check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need to propagate this change also for the CUDA codegen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point; currently, a line like vC[index * 4] = Float4.add(vA[index * 4], vB[index * 4]) won't be handled properly (the add operation won't be recursed over, and the resulting code will look like vstore4(vA, 0, &c->array[index*4]) instead of storing the add operation's result).
(Edit: I originally thought that if we extracted the array accesses vA[index * 4] and vB[index * 4] into Float4 variables first before operating on them, the original code would work, but it doesn't seem to be the case)
There might also be something I'm missing that causes a case like vC[index * 4] = Float4.add(vA[index * 4], vB[index * 4]) to fail with the original code; I'll give it another pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, got it. Thanks for the clarification. So it looks to me that possibly there are some missing HATVectorVarLoadOps in this case.
For example:
Float4.add(vA[index * 4], vB[index * 4]), would have two HATVectorVarLoadOp loading a float4. But I get now, with the array views, we will see actually two plain arrays being loaded. So if that's the case, it makes sense to invoke recurse.
I am not suggesting to use HATVectorVarLoadOp, but it is good we have the cases identified.
| // } | ||
| switch (op) { | ||
| case JavaOp.InvokeOp iop -> { | ||
| if (isVectorOperation(iop)) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This HAT Phase is invoked after all vector operations have been inserted in the new code-model. So I wonder if this is actually triggered. My take is that, at this point in the compilation pipeline, all VectoLoads, Stores and VSelect are already in place. Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct; this is leftover from when I was testing earlier haha. Good catch!
| } | ||
| } | ||
| // handles only 1D and 2D arrays | ||
| case JavaOp.ArrayAccessOp.ArrayLoadOp alop -> { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My take is that this phase is hitting this case for all the actual array view. If the prev. block still insert the dialect ops for vectors, we could substitute the prev.. pipeline of hat phases with this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh interesting, are you proposing combining some of the hat phases into one larger overall phase using this case logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the whole logic works, then yes, we can condense it in one phase. I personally prefer different compiler phases, so if something goes wrong, or we find a new use-case that does not match the patterns for one of the phases, we can easily detect phases involved. But not strong opinion.
| vop.resultType(), | ||
| 4, | ||
| HATVectorViewOp.VectorType.FLOAT4, | ||
| false, //TODO: fix | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This flag was to indentify if the store is set to a private or shared array. In this case, we don't have a pointer.
Using this search will probably work:
    private boolean findIsSharedOrPrivateSpace(CoreOp.VarAccessOp.VarLoadOp varLoadOp) {
        return findIsSharedOrPrivateSpace(varLoadOp.operands().get(0));
    }
    private boolean findIsSharedOrPrivateSpace(Value v) {
        if (v instanceof Op.Result r && r.op() instanceof CoreOp.VarAccessOp.VarLoadOp varLoadOp) {
            return findIsSharedOrPrivateSpace(varLoadOp);
        } else  if (v instanceof CoreOp.Result r && (r.op() instanceof HATLocalVarOp || r.op() instanceof HATPrivateVarOp)) {
            return true;
        }else{
            return false;
        }
    }I think in the future we can add a new Op for expressing global array accesses too (from parameters), so we can simplify how we handle different levels of memory within the code gen.
Preliminary
Float4support for arrayViews. Currently only supported forF32ArrayPaddedbuffers, which can be accessed like aFloat4[]as shown below:At the moment, to use an element in the array, the
Float4must be loaded into a separate variable first, i.e.A value can be stored into the array through the following syntax:
Progress
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/babylon.git pull/646/head:pull/646$ git checkout pull/646Update a local copy of the PR:
$ git checkout pull/646$ git pull https://git.openjdk.org/babylon.git pull/646/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 646View PR using the GUI difftool:
$ git pr show -t 646Using diff file
Download this PR as a diff file:
https://git.openjdk.org/babylon/pull/646.diff