riscv64/s390x: add *_overflow #9214

Draft

ghostway0 wants to merge 6 commits into bytecodealliance:main from ghostway0:ghostway/9186

Contributor

ghostway0 commented Sep 8, 2024

currently a draft and only riscv64

ghostway0 added 4 commits

September 7, 2024 11:13

wip

43d66cd

wip

969f859

wip

f285275

wip

ee538a0

ghostway0 force-pushed the ghostway/9186 branch from f504e73 to b4c2cc0

September 9, 2024 17:23

wip

242d45d

ghostway0 force-pushed the ghostway/9186 branch from b4c2cc0 to 242d45d

September 9, 2024 18:34

github-actions bot added the cranelift label

Contributor Author

ghostway0 commented Sep 11, 2024

@bjorn3 can you take a look? Also, I haven't found a umul equivalent (or a mul with zext*) in s390x. Do you know what they are called?

wip

0f18f9c

Member

uweigand commented Sep 19, 2024

Hi @ghostway0, a few comments on the s390x part:

All the new instruction rules you added seem to provide only a single return, the overflow bit. However, my understanding is that smul_overflow and all the other overflow instructions are defined to have two returns, the low-part result and the overflow bit. I think you'll need to use some form of with_flags to construct the pair of results (like x86 and aarch64 already do).
You're simply re-using the same instructions used for the "normal" operation (add/sub/mul) also for the overflow operation. That is correct for 32-bit and 64-bit operations, but not for 8-bit and 16-bit operations. The reason is that the s390x ISA does not actually have any 8-bit or 16-bit arithmetic instructions, so we simply use the 32-bit version also for 8-bit and 16-bit operations. That provides the correct (low-part) result, but any overflow indication would be incorrect.
There is no unsigned-multiply instruction with overflow indication on our platform. What other compilers do is to use the 32x32->64 or 64x64->128 bit wide multiply instruction, and check whether the high-part of the output is zero.
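The last two points can be illustrated outside the backend. The following is a minimal Rust sketch, not s390x lowering code: the function names are hypothetical, and it only models the arithmetic. It shows an unsigned multiply whose overflow flag comes from the high half of a widening multiply, and an 8-bit signed add computed at 32-bit width, where the overflow flag has to be derived separately because the 32-bit operation itself never overflows.

```rust
/// Unsigned 32-bit multiply with an overflow flag, modeled with a
/// 32x32->64 widening multiply: overflow iff the high half is nonzero.
/// (Hypothetical helper name, for illustration only.)
fn umul_overflow_u32(x: u32, y: u32) -> (u32, bool) {
    let wide = (x as u64) * (y as u64);
    let lo = wide as u32; // low-part result
    let hi = (wide >> 32) as u32; // high-part result
    (lo, hi != 0)
}

/// Signed 8-bit add carried out at 32-bit width. The 32-bit add cannot
/// overflow for 8-bit inputs, so any overflow flag it produced would be
/// wrong for the 8-bit operation. Instead, check whether the wide result
/// survives a round trip through the narrow type.
fn sadd_overflow_i8_via_i32(x: i8, y: i8) -> (i8, bool) {
    let wide = (x as i32) + (y as i32);
    let lo = wide as i8; // low 8 bits, reinterpreted as signed
    (lo, (lo as i32) != wide)
}
```

Both helpers return the pair of results (low part and overflow bit) that the first point asks for.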

afonso360 reviewed

Contributor

afonso360 left a comment

👋 Hey,

I don't know if this is ready for review yet, but it's a great start!

A few comments for the RISC-V part. I didn't check the lowerings in a lot of detail, mostly just spotting a few things that could be shorter.

Thanks for working on this!

Edit: I also ran the fuzzer (with these changes) and it pointed out this testcase.

Fuzzer testcase

Testcase:

test interpret
test run
target riscv64gc 
target x86_64

function %a(i8) -> i8 {
block0(v0: i8):
    v1, v2 = smul_overflow v0, v0
    return v2
}
; run: %a(-15) == 1

Result:

 ERROR cranelift_filetests::concurrent > FAIL: run
FAIL ./test.clif: run

Caused by:
    Failed test: run: %a(-15) == 1, actual: 0
1 tests
Error: 1 failure
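For reference, the expected value in this testcase can be checked with plain Rust integer arithmetic. This is a small sketch using the standard `overflowing_mul` method; the wrapper name is hypothetical. Squaring -15 gives 225, which does not fit in an i8, so the overflow bit should be set and the low part wraps to -31.

```rust
/// Models `smul_overflow v0, v0` on i8 with Rust's built-in
/// overflowing multiply. (Hypothetical wrapper, for illustration only.)
fn smul_overflow_i8(x: i8) -> (i8, bool) {
    x.overflowing_mul(x)
}
```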

cranelift/codegen/src/isa/riscv64/lower.isle

+                (let ((hi XReg (rv_mulhu x y))
+                      (res XReg (rv_mul x y))
+                      (one XReg (imm $I8 1))
+                      (of XReg (gen_select_xreg (cmp_eqz hi) (zero_reg) one)))

Contributor

afonso360 Sep 21, 2024

It might be better to use rv_snez here instead of a select between one and zero.
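To see why a set-if-not-zero is enough, here is a minimal Rust model of the 64-bit unsigned case. It is a sketch of the arithmetic only, not ISLE: the function names are hypothetical, the `u128` product stands in for the `mul`/`mulhu` pair, and `snez` stands in for the RISC-V pseudo-instruction of the same name.

```rust
/// Select between one and zero, as the snippet above does.
fn select_flag(hi: u64) -> u64 {
    if hi == 0 { 0 } else { 1 }
}

/// Model of `snez`: 1 if the input is nonzero, else 0.
fn snez(hi: u64) -> u64 {
    (hi != 0) as u64
}

/// Unsigned 64-bit multiply with overflow flag.
fn umul_overflow_u64(x: u64, y: u64) -> (u64, u64) {
    let wide = (x as u128) * (y as u128);
    let hi = (wide >> 64) as u64; // what mulhu would produce
    let lo = wide as u64; // what mul would produce
    (lo, snez(hi))
}
```

`select_flag` and `snez` agree on every input, so the select can be dropped without changing the result.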

cranelift/codegen/src/isa/riscv64/lower.isle

+                      (res XReg (rv_mul tmp_x tmp_y))
+                      (hi XReg (rv_srli res (imm12_const (ty_bits ty))))
+                      (one XReg (imm $I8 1))
+                      (of XReg (gen_select_xreg (cmp_eqz hi) (zero_reg) one)))

Contributor

afonso360 Sep 21, 2024

Ditto here

cranelift/codegen/src/isa/riscv64/lower.isle

+              (rule 1 (lower (has_type $I64 (uadd_overflow x y)))
+                (let ((sum XReg (rv_add x y))
+                      (one XReg (imm $I8 1))
+                      (of XReg (gen_select_xreg (int_compare (IntCC.UnsignedLessThan) sum x) one (zero_reg))))

Contributor

afonso360 Sep 21, 2024

It might be better to use rv_sltu here instead of a select between one and zero. The RISC-V comparison instructions already return zero or one, and they are a lot shorter than our current implementation of select_xreg.
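The same idea in plain Rust, as a sketch of the arithmetic rather than the lowering itself (function names are hypothetical; `sltu` models the RISC-V set-less-than-unsigned instruction): an unsigned add wrapped around exactly when the sum is smaller than one of the operands, and the comparison result is already the 0/1 overflow flag.

```rust
/// Model of `sltu`: 1 if a < b (unsigned), else 0.
fn sltu(a: u64, b: u64) -> u64 {
    (a < b) as u64
}

/// Unsigned 64-bit add with overflow flag.
fn uadd_overflow_u64(x: u64, y: u64) -> (u64, u64) {
    let sum = x.wrapping_add(y);
    (sum, sltu(sum, x))
}
```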

cranelift/codegen/src/isa/riscv64/lower.isle

+                      (high_tmp XReg (rv_add (value_regs_get x 1) (value_regs_get y 1)))
+                      ;; add carry.
+                      (high XReg (rv_add high_tmp carry))
+                      (of XReg (gen_select_xreg (int_compare (IntCC.UnsignedLessThan) high carry) one (zero_reg))))

Contributor

afonso360 Sep 21, 2024

Ditto here for sltu.
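For the 128-bit case, a two-limb model may help when checking the flag logic. This is a hedged Rust sketch with hypothetical names, where values are `(lo, hi)` pairs of `u64`. It assumes the high half can wrap in two places, when adding the two high limbs and when adding the carry, and ORs both comparisons; whether the PR handles the first case elsewhere is not visible in the quoted snippet.

```rust
/// Unsigned 128-bit add on (lo, hi) limb pairs with overflow flag.
fn uadd_overflow_u128_limbs(x: (u64, u64), y: (u64, u64)) -> ((u64, u64), u64) {
    let lo = x.0.wrapping_add(y.0);
    let carry = (lo < x.0) as u64; // sltu lo, x_lo
    let high_tmp = x.1.wrapping_add(y.1);
    let of_tmp = (high_tmp < x.1) as u64; // sltu high_tmp, x_hi
    let high = high_tmp.wrapping_add(carry);
    let of_carry = (high < carry) as u64; // sltu high, carry
    ((lo, high), of_tmp | of_carry)
}
```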

cranelift/codegen/src/isa/riscv64/lower.isle

+                      (res XReg (rv_mul tmp_x tmp_y))
+                      (hi XReg (rv_srai res (imm12_const (ty_bits ty))))
+                      (one XReg (imm $I8 1))
+                      (of XReg (gen_select_xreg (cmp_eqz hi) (zero_reg) one)))

Contributor

afonso360 Sep 21, 2024

This could also be a snez
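For the signed narrow-type case, it may be worth comparing two ways of deriving the flag. The Rust sketch below uses hypothetical names and assumes both operands were already sign-extended from `bits` bits, with `bits` at most 32. A check that only shifts the product right by the type width, which is how the snippet above appears to read, reports no overflow for -15 * -15 on 8 bits, matching the `actual: 0` in the fuzzer testcase, and it also flags small negative products. Comparing the product against its own sign-extended low bits avoids both problems.

```rust
/// Shift-only check: nonzero bits above `bits` are taken to mean overflow.
fn shift_only_check(x: i64, y: i64, bits: u32) -> u64 {
    let res = x * y;
    ((res >> bits) != 0) as u64
}

/// Sign-extend the low `bits` bits of the product and compare with the
/// full product; they differ exactly when the narrow result overflowed.
fn smul_overflow_narrow(x: i64, y: i64, bits: u32) -> (i64, u64) {
    let res = x * y; // fits in 64 bits because bits <= 32
    let shift = 64 - bits;
    let narrowed = (res << shift) >> shift; // arithmetic shift right
    (narrowed, (narrowed != res) as u64)
}
```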

cranelift/codegen/src/isa/riscv64/lower.isle

+                    ;; madd    dst_hi, x_hi, y_lo, dst_hi
+                    ;; madd    dst_lo, x_lo, y_lo, zero
+                    (dst_hi1 XReg (rv_mulhu x_lo y_lo))
+                    (one XReg (imm $I32 1))

Contributor

afonso360 Sep 21, 2024

This one doesn't seem to be used anywhere. Similarly, the rules below contain a few unused instructions that materialize one.

cranelift/codegen/src/isa/riscv64/lower.isle

+                    (one XReg (imm $I32 1))
+                    (dst_hi2 ValueRegs (smadd_overflow64 x_lo y_hi (value_regs_get dst_hi1 0)))
+                    (dst_hi ValueRegs (smadd_overflow64 x_hi y_lo (value_regs_get dst_hi2 0)))
+                    (dst_lo XReg (madd x_lo y_lo (zero_reg)))

Contributor

afonso360 Sep 21, 2024

Instead of doing madd here, we can just multiply x_lo and y_lo and save one instruction.

cranelift/codegen/src/isa/riscv64/lower.isle

+                    (one XReg (imm $I32 1))
+                    (dst_hi2 ValueRegs (umadd_overflow64 x_lo y_hi (value_regs_get dst_hi1 0)))
+                    (dst_hi ValueRegs (umadd_overflow64 x_hi y_lo (value_regs_get dst_hi2 0)))
+                    (dst_lo XReg (madd x_lo y_lo (zero_reg)))

Contributor

afonso360 Sep 21, 2024

Here we could use mul instead of madd and save one instruction.

cranelift/codegen/src/isa/riscv64/inst.isle

+                (let ((one XReg (imm $I8 1))
+                      (hi XReg (rv_mulhu x y))
+                      (m XReg (rv_mul x y))
+                      (of_mul XReg (gen_select_xreg (cmp_eqz hi) (zero_reg) one))

Contributor

afonso360 Sep 21, 2024

We could replace this with a snez instruction

cranelift/codegen/src/isa/riscv64/inst.isle

+                      (m XReg (rv_mul x y))
+                      (of_mul XReg (gen_select_xreg (cmp_eqz hi) (zero_reg) one))
+                      (sum XReg (rv_add m z))
+                      (of_add XReg (gen_select_xreg (int_compare (IntCC.UnsignedLessThan) sum m) one (zero_reg)))

Contributor

afonso360 Sep 21, 2024

This could also be replaced with a sltu instruction which is a shorter sequence than a full select.
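Putting the two suggestions together, an unsigned multiply-add with overflow can be modeled as follows. This is a Rust sketch of the arithmetic with a hypothetical function name; combining the two flags with a bitwise OR is an assumption, since the quoted lines do not show how the helper merges them.

```rust
/// Computes x * y + z on u64 and reports whether either step wrapped.
fn umadd_overflow_u64(x: u64, y: u64, z: u64) -> (u64, u64) {
    let wide = (x as u128) * (y as u128);
    let hi = (wide >> 64) as u64; // mulhu
    let m = wide as u64; // mul
    let of_mul = (hi != 0) as u64; // snez hi
    let sum = m.wrapping_add(z);
    let of_add = (sum < m) as u64; // sltu sum, m
    (sum, of_mul | of_add)
}
```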
