Incorrect UBIT handling for VSIB instructions #338

flobernd · 2024-10-31T08:53:47Z

Hi there,

while running some tests in Zydis, we found a few instructions that do decode in Zydis, but not in XED. After looking at it in detail, I think this might be a bug in XED.

For example:

62 EA F9 01 93 04 11 (XED: vgatherqpd xmm16, k1, qword ptr [r17+xmm18*1])
62 62 FD 06 A0 A4 A4 3E 25 62 62 (XED: vpscatterdq qword ptr ss:[rsp+xmm20*4+0x6262253E] {k6}, xmm28)

All of these instructions have EVEX.U set to 0 which is not allowed for "legacy EVEX" instructions, but XED still decodes them.

It even seems that you even added UBIT=1 to the patterns of these instructions, but it seems to be ignored when decoding:

VPSCATTERDQ
PATTERN:    EVV 0xA0 V66 V0F38 MOD[mm] MOD!=3 UBIT=1 REG[rrr] RM[0b100] RM=4 BCRC=0   VL128  W1 UISA_VMODRM_XMM() eanot16  NOVSR  ZEROING=0  ESIZE_64_BITS() NELEM_GSCAT()

We have not checked further to see if only VSIB instructions are affected or if this behavior applies to other "legacy evex" instructions as well.

cc @mappzor

The text was updated successfully, but these errors were encountered:

marjevan · 2024-10-31T16:06:10Z

Hi,

According to Intel's APX public specification, instructions with ModRM.MOD != 3 reinterpret the U bit as X4, which is ignored if not used by the instruction.

From the APX spec:

As shown below, the X4 (or U) bit is not used in the VSIB memory address calculation:

In XED, the ILD scanner applies the same reinterpretation when APX is enabled, consistent with the spec:

xed/src/dec/xed-ild.c

Lines 1392 to 1398 in b86dd50

    
           if (!mod3 && apx_supported(d)) { 
        
               // [APX] Reinterpret the Ubit with memory index fourth EGPR bit. 
        
               // Pacify the iPattern's UBIT condition and set the X4 inverted value. 
        
               xed_uint_t ubit = xed3_operand_get_ubit(d); 
        
               xed3_operand_set_ubit(d, 1);  // Needed for non-APX EVEX instructions 
        
               xed3_operand_set_rexx4(d, ~ubit & 1); 
        
           }

flobernd · 2024-10-31T21:09:44Z

Hi @marjevan ,

thanks a lot for clarifying!

Our UBIT check was too strict for "legacy" EVEX instructions and did not consider the modrm.mod != 3 case.

flobernd · 2024-11-01T21:00:19Z

Hi @marjevan ,

short follow up question, if you don't mind. Is there a specific reason for using a UBIT=1 filter for APX AMX instructions?

For example:

{
ICLASS:      TILELOADD
CPL:         3
CATEGORY:    AMX_TILE
EXTENSION:   APXEVEX
ISA_SET:     APX_F_AMX
EXCEPTIONS:  AMX-E3-EVEX
REAL_OPCODE: Y
ATTRIBUTES:  DISP8_NO_SCALE NOTSX SPECIAL_AGEN_REQUIRED 
PATTERN:     EVV 0x4B VF2 V0F38 MOD[mm] MOD!=3 REG[rrr] RM[0b100] MODRM() BCRC=0 UBIT=1 W0 VL128 mode64 NOEVSR ZEROING=0 MASK=0
OPERANDS:    REG0=TMM_R3():w:tv:u32 MEM0:r:ptr:u32
IFORM:       TILELOADD_TMMu32_MEMu32_APX
}

This is currently unclear for me as these instructions are only available if APX is enabled anyways, which means that the relaxed UBIT handling always applies (all APX_F_AMX instructions as well have MOD!=3) and both UBIT=0 and UBIT=1 will be allowed in the end. Is there anything I'm missing here?

These are as well the only APX definitions that do not use the EVAPX() or EVAPX_SCC() nonterminal. I know that MASK=0, etc. is semantically equal, but for consistency it would be better to use equivalent NF=0, etc. filters.

The SDM as well defines them using the EEVEX fields, e.g.:

EVEX.128.F2.0F38.W0 4B !(11):rrr:100 A V/N.E. APX_F and AMX-TILE
TILELOADD {NF=0} tmm1, sibmem

marjevan · 2024-11-03T07:31:01Z

Hi,
You’re correct; the UBIT constraint isn’t necessary here, as it's effectively "ignored" by the ILD fixup for MOD != 3 instructions on APX systems.

In theory, some new EVEX instructions might be valid on non-APX systems and would require a UBIT constraint, but this doesn't apply to the APX-promoted instructions (as above). We may consider removing the UBIT constraint leftovers for APX-promoted instructions in the future as a refinement - though this wouldn’t impact decoding functionality and would primarily serve as an internal enhancement.

As for the NF pattern constraint, adding it would require an additional restriction on the lower 2 bits of the EVEX.mask field. We aim to keep the instruction pattern as compact as possible to decrease the number of keys calculated for the instruction hash tables, as this can influence performance.

flobernd · 2024-11-04T09:40:33Z

Hi @marjevan, thanks again for the fast reply!

We may consider removing the UBIT constraint leftovers for APX-promoted instructions in the future as a refinement - though this wouldn’t impact decoding functionality and would primarily serve as an internal enhancement.

👍 Yes, this was the intention of my question here.

As for the NF pattern constraint, adding it would require an additional restriction on the lower 2 bits of the EVEX.mask field. We aim to keep the instruction pattern as compact as possible to decrease the number of keys calculated for the instruction hash tables, as this can influence performance.

Understood the performance concern, but why would this restriction be required for just the APX_F_AMX instructions and not the "regular" APX ones?

In all other places you handle this by adding the EVAPX() nonterminal to the pattern, which clears EVEX.b and EVEX.aaa. IMO the pattern for e.g. TILELOADD should change from:

EVV 0x4B VF2 V0F38 MOD[mm] MOD!=3 REG[rrr] RM[0b100] MODRM() BCRC=0 UBIT=1 W0 VL128 mode64 NOEVSR ZEROING=0 MASK=0

to:

EVV 0x4B VF2 V0F38 MOD[mm] MOD!=3 REG[rrr] RM[0b100] MODRM() ND=0 NF=0 {UBIT=1} W0 VL128 mode64 NOEVSR ZEROING=0 EVAPX()

This uses the same amount of restrictions (BCRC=0 and MASK=0 is replaced by ND=0 and NF=0) with the only addition of the extra EVAPX() use (and if you remove UBIT=1 you would even end up with a smaller pattern).

Or is there a special detail to the APX_F_AMX that I'm missing right now?

The release updates XED according to Intel's latest ISA publications, as detailed in ISE054, ISE055 and AVX10.2-rev2.0. This version includes support for: - Intel Diamond Rapids (DMR) chip - Diamond Rapids AMX instructions - MOVRS and AVX10-MOVRS instructions - SM4 EVEX instructions - MSR-IMM instructions (including APX-promoted variants) - Encoding updates for various AVX10.2 instructions - Other updates across XED chip definitions General - Shared Library Build for Python: Introduces a unique XED shared library build, exposing all XED APIs via a shared library object. This enables the library to be loaded in Python environments, allowing interaction with XED using Python APIs. See the [examples in pyext/README.md](pyext/examples/README.md) for more details. (Closes intelxed#302) - Disassembler Enhancements: Adds support for emitting CS/DS ignored branch hint prefixes, configurable through the `xed_format_options_t` structure. - Updates minimum Python requirement from 3.8 to 3.9. - Improves Internal ISA definition APX files (See intelxed#338) Fixes - Resolves C11 build warnings with GCC (Fixes intelxed#332) - Improves length and error reporting for illegal instructions caused by a zeroed EVEX map (Resolves intelxed#334) Co-authored-by: Arjevani, Maor <[email protected]>

marjevan self-assigned this Oct 31, 2024

marjevan added the requesting feedback label Oct 31, 2024

flobernd closed this as completed Oct 31, 2024

marjevan removed the requesting feedback label Nov 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect UBIT handling for VSIB instructions #338

Incorrect UBIT handling for VSIB instructions #338

flobernd commented Oct 31, 2024

marjevan commented Oct 31, 2024

flobernd commented Oct 31, 2024

flobernd commented Nov 1, 2024

marjevan commented Nov 3, 2024

flobernd commented Nov 4, 2024

Incorrect UBIT handling for VSIB instructions #338

Incorrect UBIT handling for VSIB instructions #338

Comments

flobernd commented Oct 31, 2024

marjevan commented Oct 31, 2024

flobernd commented Oct 31, 2024

flobernd commented Nov 1, 2024

marjevan commented Nov 3, 2024

flobernd commented Nov 4, 2024