Replace FMA's LZC with CVW's LZA #149

emustafa96 · 2025-04-29T11:56:43Z

Replace leading zero counter with leading zero anticipator in FMA sum path

Summary

This PR optimizes the floating-point multiply-add (FMA) unit by replacing the sequential leading zero counter (LZC) in the sum path with a parallel leading zero anticipator (LZA). This change removes normalization from the critical path, significantly improving FMA performance.

Problem

The previous implementation computed the sum first, then counted leading zeros for normalization:

Multiply → Align → Add/Subtract → Count Leading Zeros → Normalize → Result
                                      ↑
                               Critical path bottleneck

This sequential approach added unnecessary latency to the FMA operation, as normalization had to wait for the complete sum calculation.

Solution

Added Schmookler's leading zero anticipation algorithm (IEEE Xplore), as implemented in the CORE-V Wally (CVW) core, which predicts the normalization shift count in parallel with the sum computation:

Multiply → Align → Add/Subtract ──────────→ Normalize → Result
           ↓                               ↗
           └── Leading Zero Anticipator ──┘
           (in parallel)

Technical Details

The LZA implementation:

Uses carry-lookahead logic (P/G/K signals) to predict leading zero patterns
Handles both addition and subtraction operations via the sub control signal
Detects and corrects mispredictions by one
Feeds the predicted shift count directly to the normalization stage
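
As a rough behavioral illustration of two of the points above, the sketch below computes per-bit P/G/K signals and applies an off-by-one correction after the normalization shift. It is not a translation of fmalza.sv: the anticipator's indicator logic is not modeled, and the function names, the 8-bit width in the usage note, and the assumption that the predicted count is either exact or one too small are all illustrative choices rather than details taken from the patch.

```python
def pgk(a: int, b: int, width: int):
    """Per-bit propagate, generate, and kill signals for a + b."""
    mask = (1 << width) - 1
    p = (a ^ b) & mask      # exactly one operand bit is 1
    g = (a & b) & mask      # both operand bits are 1
    k = ~(a | b) & mask     # both operand bits are 0
    return p, g, k


def leading_zeros(x: int, width: int) -> int:
    """Exact leading zero count of a width-bit value (what an LZC computes)."""
    return width - x.bit_length()


def normalize(total: int, predicted: int, width: int):
    """Left-shift by the predicted count, then fix a miss by one.

    Assumes 0 < total < 2**width and that `predicted` is either the exact
    leading zero count or one less than it (an assumption of this sketch).
    """
    mask = (1 << width) - 1
    shifted = (total << predicted) & mask
    # If the MSB is still 0 after the shift, the prediction was short by one.
    corrected = (shifted >> (width - 1)) == 0
    if corrected:
        shifted = (shifted << 1) & mask
    return shifted, predicted + int(corrected)
```

For example, with `width=8` and `total=0b00010110`, `normalize(total, 3, 8)` and `normalize(total, 2, 8)` both return the same normalized value and the same final shift count, which is why a one-position correction after the shifter is enough to make an approximate prediction usable.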

Testing

Verified with the sequential equivalence check of Synopsys VC Formal
Proven to be equivalent

  Summary Proofs:
   ----------------------------------------------------------------------------------------------------------------------
    VpId |           Name |      Type |         Parent |     #A |     #C |     #S |     #F |     #I |    Status |     %
   ----------------------------------------------------------------------------------------------------------------------
       0 |         seqdef |      root |            nil |     13 |      3 |     13 |      0 |      0 |   success |   100
       0 |      seqdef-rw |        or |         seqdef |      - |      - |      - |      - |      - |         - |     -
       0 |          rw1_1 |       int |      seqdef-rw |      5 |      0 |      5 |      0 |      0 |   success |   100
       0 |       rw1_1-ur |        or |          rw1_1 |      - |      - |      - |      - |      - |         - |     -
       0 |           ur_1 |      leaf |       rw1_1-ur |      4 |      0 |      4 |      0 |      0 |   success |   100
       0 |      rw1_1-dcp | decompose |          rw1_1 |      - |      - |      - |      - |      - |         - |     -
       0 |         idcp_1 |      leaf |      rw1_1-dcp |      4 |      0 |      4 |      0 |      0 |   success |   100
   ----------------------------------------------------------------------------------------------------------------------

emustafa96 · 2025-06-25T12:21:07Z

The following script can be used to verify, with the sequential equivalence check of Synopsys VC Formal, that the proposed changes are sequentially equivalent to the current implementation (vcf -file script_below.tcl):

set_fml_appmode SEQ

set SCRIPT_DIR [file normalize [file join [file dirname [info script]] ]]

set flist_golden [list \
 "common_cells/src/cf_math_pkg.sv" \
 "common_cells/src/lzc.sv" \
 "cvfpu/src/fpnew_pkg.sv" \
 "cvfpu/src/fpnew_classifier.sv" \
 "cvfpu/src/fpnew_rounding.sv" \
 "cvfpu/src/fpnew_fma_multi.sv" \
]
set flist_impl [list \
 "common_cells/src/cf_math_pkg.sv" \
 "common_cells/src/lzc.sv" \
 "cvfpu/src/fpnew_pkg.sv" \
 "cvfpu/src/fpnew_classifier.sv" \
 "cvfpu/src/fpnew_rounding.sv" \
 "cvfpu/src/fpnew_fma_multi_new.sv" \
 "cvfpu/vendor/cvw/fma/fmalza.sv" \
]

analyze -format sverilog -library spec -vcs $flist_golden +incdir+common_cells/include
analyze -format sverilog -library impl -vcs $flist_impl +incdir+common_cells/include


elaborate_seq -spectop fpnew_fma_multi -impltop fpnew_fma_multi

map_by_name -clock spec.clk_i

create_clock -period 100 spec.clk_i
create_reset spec.rst_ni -sense low

fvassume -expr {spec.src_fmt_i == 0}
fvassume -expr {spec.src2_fmt_i == 0}
fvassume -expr {spec.dst_fmt_i == 0}

sim_run -stable
sim_set_state -uninitialized -apply 0

check_fv -block

report_proofs

Make sure the paths to cvfpu and common_cells are correct relative to where vcf is called.

cvfpu/src/fpnew_fma_multi_new.sv contains the changes of this patch, while cvfpu/src/fpnew_fma_multi.sv and all other source files are the current version from develop. Different source and destination formats can be tried manually (unfortunately, runtime explodes when attempting to constrain these more loosely via, e.g., spec.src_fmt_i inside {0,1,2,3,4}).
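
To try other formats one fixed combination at a time, the three fvassume lines of the script can be generated rather than edited by hand. The helper below is only a convenience sketch: the function name is hypothetical, the format encodings 0 to 4 are taken from the `inside {0,1,2,3,4}` example above, and whether every combination is a legal configuration of the unit is not checked here.

```python
from itertools import product

# Format encodings 0..4, as in the `inside {0,1,2,3,4}` example (assumed range).
FORMATS = range(5)


def assume_lines(src_fmt: int, src2_fmt: int, dst_fmt: int) -> list:
    """Return the three fvassume lines that pin one format combination."""
    return [
        f"fvassume -expr {{spec.src_fmt_i == {src_fmt}}}",
        f"fvassume -expr {{spec.src2_fmt_i == {src2_fmt}}}",
        f"fvassume -expr {{spec.dst_fmt_i == {dst_fmt}}}",
    ]


if __name__ == "__main__":
    # Print one block of constraints per combination; each block would replace
    # the three fvassume lines in a separate copy of the script.
    for src, src2, dst in product(FORMATS, repeat=3):
        print("\n".join(assume_lines(src, src2, dst)), end="\n\n")
```

Each generated block corresponds to one separate vcf run, which keeps every proof as tightly constrained as the original script.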

rgiunti · 2025-10-28T16:19:47Z

Hi @emustafa96. I tested the PR using the UVM testbench https://github.com/openhwgroup/cvfpu-uvm.git. I configured the FPU instance to use a merged slice for the FMA unit so that the ADD MUL operations stress your modifications. As a regression test, I ran 10000 random transactions (random operation, operands, FP format, and FP rounding mode) for each of 10 different seeds and compared the results with those of the MPFR golden model. Everything looks fine, so if you agree with my test and results, I think the PR can be merged.

emustafa96 · 2025-10-28T16:24:27Z

Hi @rgiunti, thank you for the effort! Based on the formal equivalence check and your testing, I also think we can merge.

michael-platzer added 3 commits April 29, 2025 13:55

⚡️ Replace FMA's LZC with CVW's LZA

ebe1ad7

Add CVW's LZA to Bender manifest

b56c6f2

Cleanup FMA's LZA correction and normalization logic

69fcc9f

emustafa96 requested a review from lucabertaccini as a code owner April 29, 2025 11:56

zarubaf requested review from davideschiavone and stmach June 25, 2025 10:24
