Legalize: implement soft-float legalizations #25924
Conversation
Simplifies the logic, clarifies the comment, and fixes a minor bug: we exported the Windows ABI name *instead of* the standard compiler-rt name, when it is meant to be exported *in addition to* the standard name (this matches LLVM's behavior and is more useful).
soft_f16,
/// Like `soft_f16`, but for 32-bit floating-point types.
soft_f32,
/// Like `soft_f16`, but for 64-bit floating-point types.
soft_f64,
/// Like `soft_f16`, but for 80-bit floating-point types.
soft_f80,
/// Like `soft_f16`, but for 128-bit floating-point types.
soft_f128,
Should we take advantage of some of these in the LLVM backend for the situations where we currently do manual soft float lowering? IIRC this mainly affects f16 and f128 on some targets.
No; it's strictly better in terms of both compiler performance and runtime performance to do the work in backends rather than Legalize, so ripping out that support would be counterproductive. Legalize is a tool for incomplete backends (currently technically all of them), not something they should be designed to rely on.
> both compiler performance and runtime performance
Wait, why would runtime performance be worse?
Doing the translation in Legalize means doing it at the AIR level rather than the MIR level, so it's typical for it to ultimately result in more bloated MIR. For instance, when calling int<->float conversion functions, we might need to zero/sign-extend the integer to an ABI size before we do it; or when operating on c_longdouble we may need to bitcast the result back from f128; or when calling extended int routines (for >128 bits) we might need to make an alloc so we can pass a pointer to an integer. When combined, this can cause one conversion, say u150 -> c_longdouble (with 80-bit long double), to turn into AIR like this:
%1 = block(c_longdouble, {
  %2 = intcast(u256, [original operand])
  %3 = alloc(*u256)
  %4 = store(%3, %2)
  %5 = legalize_compiler_rt_call(__floatuneixf, [%3, <usize, 150>])
  %6 = bitcast(c_longdouble, %5)
  %7 = br(%1, %6)
})
That's a lot of operations, and a typical non-optimizing backend might lower them much less efficiently than it could lower the operation as a whole. For instance, the operand is quite likely to already be spilled to the stack and to be represented in memory as a zero-extended u256, so the intcast/alloc/store should all be nops, but a backend is pretty much guaranteed to at the very least reserve more stack space and shuffle memory around when lowering those AIR instructions. This applies less to the LLVM backend than to machine code backends, because we aren't dealing with register allocation / spills manually in the LLVM backend, but it's still definitely going to happen.
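For concreteness, here is a minimal sketch of the kind of Zig source that hits exactly this path (the function name is just for illustration); on a target where `c_longdouble` is the 80-bit format, the single `@floatFromInt` below legalizes into the AIR block shown above:

```zig
// Minimal sketch: a u150 -> c_longdouble conversion. With `soft_f80` enabled,
// Legalize expands this single @floatFromInt into intcast + alloc + store +
// a compiler-rt call (__floatuneixf) + bitcast, as in the AIR dump above.
fn toLongDouble(x: u150) c_longdouble {
    return @floatFromInt(x);
}
```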
Oh, for posterity, here's the patch I used to enable these legalizations in the x86_64 backend so as to test this PR:

commit e5d30c005ceaa792775a0e353a3a8398aed5c155
Author: Matthew Lugg <[email protected]>
Date: Fri Nov 14 10:54:12 2025 +0000
DO NOT MERGE; make x86_64 use soft float
diff --git a/src/codegen/x86_64/CodeGen.zig b/src/codegen/x86_64/CodeGen.zig
index b43b359de1..a391527037 100644
--- a/src/codegen/x86_64/CodeGen.zig
+++ b/src/codegen/x86_64/CodeGen.zig
@@ -73,6 +73,12 @@ pub fn legalizeFeatures(_: *const std.Target) *const Air.Legalize.Features {
.expand_packed_store,
.expand_packed_struct_field_val,
.expand_packed_aggregate_init,
+
+ .soft_f16,
+ .soft_f32,
+ .soft_f64,
+ .soft_f80,
+ .soft_f128,
});
}
@@ -173690,8 +173696,40 @@ fn genBody(cg: *CodeGen, body: []const Air.Inst.Index) InnerError!void {
for (ops) |op| try op.die(cg);
},
- // No soft-float `Legalize` features are enabled, so this instruction never appears.
- .legalize_compiler_rt_call => unreachable,
+ .legalize_compiler_rt_call => {
+ const inst_data = air_datas[@intFromEnum(inst)].legalize_compiler_rt_call;
+ const extra = cg.air.extraData(Air.Call, inst_data.payload);
+ const args: []const Air.Inst.Ref = @ptrCast(cg.air.extra.items[extra.end..][0..extra.data.args_len]);
+
+ var sfba_state = std.heap.stackFallback(512, cg.gpa);
+ const sfba = sfba_state.get();
+
+ const arg_tys = try sfba.alloc(Type, args.len);
+ defer sfba.free(arg_tys);
+ const arg_tys_ip = try sfba.alloc(InternPool.Index, args.len);
+ defer sfba.free(arg_tys_ip);
+ const arg_vals = try sfba.alloc(MCValue, args.len);
+ defer sfba.free(arg_vals);
+
+ for (arg_tys, arg_tys_ip, arg_vals, args) |*ty, *ty_ip, *mcv, arg| {
+ ty.* = cg.typeOf(arg);
+ ty_ip.* = ty.*.toIntern();
+ mcv.* = .{ .air_ref = arg };
+ }
+
+ assert(inst_data.func.@"callconv"(zcu.getTarget()).eql(cg.target.cCallingConvention().?));
+ const ret = try cg.genCall(.{ .extern_func = .{
+ .return_type = inst_data.func.returnType().toIntern(),
+ .param_types = arg_tys_ip,
+ .sym = inst_data.func.name(cg.target),
+ } }, arg_tys, arg_vals, .{ .safety = true });
+
+ var bt = cg.liveness.iterateBigTomb(inst);
+ for (args) |arg| try cg.feed(&bt, arg);
+
+ const result = if (cg.liveness.isUnused(inst)) .unreach else ret;
+ cg.finishAirResult(inst, result);
+ },
.work_item_id, .work_group_size, .work_group_id => unreachable,
}
A new `Legalize.Feature` tag is introduced for each float bit width (16/32/64/80/128). When e.g. `soft_f16` is enabled, all arithmetic and comparison operations on `f16` are converted to calls to the appropriate compiler_rt function using the new AIR tag `.legalize_compiler_rt_call`. This includes casts where the source *or* target type is `f16`, as well as integer<=>float conversions to or from `f16`. Occasionally, operations are legalized to blocks because extra code is required; for instance, legalizing `@floatFromInt` where the integer type is larger than 64 bits requires calling an arbitrary-width integer conversion function which accepts a pointer to the integer, so we need an `alloc` to create such a pointer and store the integer there (after possibly zero-extending or sign-extending it).

No backend currently uses these new legalizations (and as such, no backend currently needs to implement `.legalize_compiler_rt_call`). However, for testing purposes, I tried modifying the self-hosted x86_64 backend to enable all of the soft-float features (and implement the AIR instruction). This modified backend was able to pass all of the behavior tests (except for one `@mod` test where the LLVM backend has a bug resulting in incorrect compiler-rt behavior!), including the tests specific to the self-hosted x86_64 backend.

`f16` and `f80` legalizations are likely of particular interest to backend developers, because most architectures do not have instructions to operate on these types. However, enabling *all* of these legalizations can be useful when developing a new backend, to hit the ground running and pass a good number of tests more easily.
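As a rough illustration of what these features do (the compiler-rt symbol named in the comment is an assumption for illustration, not something pinned down here), an ordinary `f16` operation like the one below is rewritten by Legalize into a `.legalize_compiler_rt_call` when `soft_f16` is enabled:

```zig
// Sketch: with `soft_f16` enabled, the `+` no longer lowers to a native
// floating-point add; Legalize replaces it with a call to the matching
// compiler_rt routine (e.g. something like __addhf3 -- name assumed).
fn addHalf(a: f16, b: f16) f16 {
    return a + b;
}
```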
Force-pushed from 5e9f06e to 421bc43
See commit messages. (@alexrp, having implemented the requested legalization I now expect you to PR 7 backends /j)