Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv64/x390: add *_overflow #9214

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

ghostway0
Copy link
Contributor

currently a draft and only riscv64

@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Sep 10, 2024
@ghostway0
Copy link
Contributor Author

@bjorn3 can you take a look? also, I haven't found umul equivalent (or mul with zext*) in s390x. do you know what are their names?

@uweigand
Copy link
Member

Hi @ghostway0, a few comments on the s390x part:

  • All the new instruction rules you added seem to provide only a single return, the overflow bit. However, my understanding is that smul_overflow and all the other overflow instructions are defined to have two returns, the low-part result and the overflow bit. I think you'll need to use some form of with_flags to construct the pair of results (like x86 and aarch64 already do).
  • You're simply re-using the same instructions used for the "normal" operation (add/sub/mul) also for the overflow operation. That is correct for 32-bit and 64-bit operations, but not for 8-bit and 16-bit operations. The reason is that the 390x ISA does not actually have any 8-bit or 16-bit arithmetic instructions, so we simply use the 32-bit version also for 8-bit and 16-bit operations. That provides the correct (low-part) result, but any overflow indication would be incorrect.
  • There is no unsigned-multiply instruction with overflow indication on our platform. What other compilers do is to use the 32x32->64 or 64x64->128 bit wide multiply instruction, and check whether the high-part of the output is zero.

Copy link
Contributor

@afonso360 afonso360 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋 Hey,

I don't know if this is ready for review yet, but It's a great start!

A few comments for the RISC-V part. I didn't check the lowerings in a lot of detail, mostly just spotting a few things that could be shorter.

Thanks for working on this!

Edit: I also ran the fuzzer (with these changes) and it pointed out this testcase.

Fuzzer testcase

Testcase:

test interpret
test run
target riscv64gc 
target x86_64

function %a(i8) -> i8 {
block0(v0: i8):
    v1, v2 = smul_overflow v0, v0
    return v2
}
; run: %a(-15) == 1

Result:

 ERROR cranelift_filetests::concurrent > FAIL: run
FAIL ./test.clif: run

Caused by:
    Failed test: run: %a(-15) == 1, actual: 0
1 tests
Error: 1 failure

(let ((hi XReg (rv_mulhu x y))
(res XReg (rv_mul x y))
(one XReg (imm $I8 1))
(of XReg (gen_select_xreg (cmp_eqz hi) (zero_reg) one)))
Copy link
Contributor

@afonso360 afonso360 Sep 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to use rv_snez here instead of a select between one and zero.

(res XReg (rv_mul tmp_x tmp_y))
(hi XReg (rv_srli res (imm12_const (ty_bits ty))))
(one XReg (imm $I8 1))
(of XReg (gen_select_xreg (cmp_eqz hi) (zero_reg) one)))
Copy link
Contributor

@afonso360 afonso360 Sep 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here

(rule 1 (lower (has_type $I64 (uadd_overflow x y)))
(let ((sum XReg (rv_add x y))
(one XReg (imm $I8 1))
(of XReg (gen_select_xreg (int_compare (IntCC.UnsignedLessThan) sum x) one (zero_reg))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to use rv_sltu here instead of a select between one and zero. The RISC-V comparision functions already return a zero or one, and they are a lot shorter than our current implementation of select_xreg

(high_tmp XReg (rv_add (value_regs_get x 1) (value_regs_get y 1)))
;; add carry.
(high XReg (rv_add high_tmp carry))
(of XReg (gen_select_xreg (int_compare (IntCC.UnsignedLessThan) high carry) one (zero_reg))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto here for sltu.

(res XReg (rv_mul tmp_x tmp_y))
(hi XReg (rv_srai res (imm12_const (ty_bits ty))))
(one XReg (imm $I8 1))
(of XReg (gen_select_xreg (cmp_eqz hi) (zero_reg) one)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be a snez

;; madd dst_hi, x_hi, y_lo, dst_hi
;; madd dst_lo, x_lo, y_lo, zero
(dst_hi1 XReg (rv_mulhu x_lo y_lo))
(one XReg (imm $I32 1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one doesn't seem to be used anywhere, similarly in the rules below there are a few one unused instructions.

(one XReg (imm $I32 1))
(dst_hi2 ValueRegs (smadd_overflow64 x_lo y_hi (value_regs_get dst_hi1 0)))
(dst_hi ValueRegs (smadd_overflow64 x_hi y_lo (value_regs_get dst_hi2 0)))
(dst_lo XReg (madd x_lo y_lo (zero_reg)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing madd here, we can just multiply x_lo and y_lo and save one instruction.

(one XReg (imm $I32 1))
(dst_hi2 ValueRegs (umadd_overflow64 x_lo y_hi (value_regs_get dst_hi1 0)))
(dst_hi ValueRegs (umadd_overflow64 x_hi y_lo (value_regs_get dst_hi2 0)))
(dst_lo XReg (madd x_lo y_lo (zero_reg)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we could use mul instead of madd and save one instruction.

(let ((one XReg (imm $I8 1))
(hi XReg (rv_mulhu x y))
(m XReg (rv_mul x y))
(of_mul XReg (gen_select_xreg (cmp_eqz hi) (zero_reg) one))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could replace this with a snez instruction

(m XReg (rv_mul x y))
(of_mul XReg (gen_select_xreg (cmp_eqz hi) (zero_reg) one))
(sum XReg (rv_add m z))
(of_add XReg (gen_select_xreg (int_compare (IntCC.UnsignedLessThan) sum m) one (zero_reg)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be replaced with a sltu instruction which is a shorter sequence than a full select.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants