Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect UBIT handling for VSIB instructions #338

Closed
flobernd opened this issue Oct 31, 2024 · 5 comments
Closed

Incorrect UBIT handling for VSIB instructions #338

flobernd opened this issue Oct 31, 2024 · 5 comments
Assignees

Comments

@flobernd
Copy link

Hi there,

while running some tests in Zydis, we found a few instructions that do decode in Zydis, but not in XED. After looking at it in detail, I think this might be a bug in XED.

For example:

  • 62 EA F9 01 93 04 11 (XED: vgatherqpd xmm16, k1, qword ptr [r17+xmm18*1])
  • 62 62 FD 06 A0 A4 A4 3E 25 62 62 (XED: vpscatterdq qword ptr ss:[rsp+xmm20*4+0x6262253E] {k6}, xmm28)

All of these instructions have EVEX.U set to 0 which is not allowed for "legacy EVEX" instructions, but XED still decodes them.

It even seems that you even added UBIT=1 to the patterns of these instructions, but it seems to be ignored when decoding:

VPSCATTERDQ
PATTERN:    EVV 0xA0 V66 V0F38 MOD[mm] MOD!=3 UBIT=1 REG[rrr] RM[0b100] RM=4 BCRC=0   VL128  W1 UISA_VMODRM_XMM() eanot16  NOVSR  ZEROING=0  ESIZE_64_BITS() NELEM_GSCAT()

We have not checked further to see if only VSIB instructions are affected or if this behavior applies to other "legacy evex" instructions as well.

cc @mappzor

@marjevan
Copy link
Member

Hi,

According to Intel's APX public specification, instructions with ModRM.MOD != 3 reinterpret the U bit as X4, which is ignored if not used by the instruction.

From the APX spec:
image

As shown below, the X4 (or U) bit is not used in the VSIB memory address calculation:
image

In XED, the ILD scanner applies the same reinterpretation when APX is enabled, consistent with the spec:

xed/src/dec/xed-ild.c

Lines 1392 to 1398 in b86dd50

if (!mod3 && apx_supported(d)) {
// [APX] Reinterpret the Ubit with memory index fourth EGPR bit.
// Pacify the iPattern's UBIT condition and set the X4 inverted value.
xed_uint_t ubit = xed3_operand_get_ubit(d);
xed3_operand_set_ubit(d, 1); // Needed for non-APX EVEX instructions
xed3_operand_set_rexx4(d, ~ubit & 1);
}

@flobernd
Copy link
Author

Hi @marjevan ,

thanks a lot for clarifying!

Our UBIT check was too strict for "legacy" EVEX instructions and did not consider the modrm.mod != 3 case.

@flobernd
Copy link
Author

flobernd commented Nov 1, 2024

Hi @marjevan ,

short follow up question, if you don't mind. Is there a specific reason for using a UBIT=1 filter for APX AMX instructions?

For example:

{
ICLASS:      TILELOADD
CPL:         3
CATEGORY:    AMX_TILE
EXTENSION:   APXEVEX
ISA_SET:     APX_F_AMX
EXCEPTIONS:  AMX-E3-EVEX
REAL_OPCODE: Y
ATTRIBUTES:  DISP8_NO_SCALE NOTSX SPECIAL_AGEN_REQUIRED 
PATTERN:     EVV 0x4B VF2 V0F38 MOD[mm] MOD!=3 REG[rrr] RM[0b100] MODRM() BCRC=0 UBIT=1 W0 VL128 mode64 NOEVSR ZEROING=0 MASK=0
OPERANDS:    REG0=TMM_R3():w:tv:u32 MEM0:r:ptr:u32
IFORM:       TILELOADD_TMMu32_MEMu32_APX
}

This is currently unclear for me as these instructions are only available if APX is enabled anyways, which means that the relaxed UBIT handling always applies (all APX_F_AMX instructions as well have MOD!=3) and both UBIT=0 and UBIT=1 will be allowed in the end. Is there anything I'm missing here?

These are as well the only APX definitions that do not use the EVAPX() or EVAPX_SCC() nonterminal. I know that MASK=0, etc. is semantically equal, but for consistency it would be better to use equivalent NF=0, etc. filters.

The SDM as well defines them using the EEVEX fields, e.g.:

EVEX.128.F2.0F38.W0 4B !(11):rrr:100 A V/N.E. APX_F and AMX-TILE
TILELOADD {NF=0} tmm1, sibmem

@marjevan
Copy link
Member

marjevan commented Nov 3, 2024

Hi,
You’re correct; the UBIT constraint isn’t necessary here, as it's effectively "ignored" by the ILD fixup for MOD != 3 instructions on APX systems.

In theory, some new EVEX instructions might be valid on non-APX systems and would require a UBIT constraint, but this doesn't apply to the APX-promoted instructions (as above). We may consider removing the UBIT constraint leftovers for APX-promoted instructions in the future as a refinement - though this wouldn’t impact decoding functionality and would primarily serve as an internal enhancement.

As for the NF pattern constraint, adding it would require an additional restriction on the lower 2 bits of the EVEX.mask field. We aim to keep the instruction pattern as compact as possible to decrease the number of keys calculated for the instruction hash tables, as this can influence performance.

@flobernd
Copy link
Author

flobernd commented Nov 4, 2024

Hi @marjevan, thanks again for the fast reply!

We may consider removing the UBIT constraint leftovers for APX-promoted instructions in the future as a refinement - though this wouldn’t impact decoding functionality and would primarily serve as an internal enhancement.

👍 Yes, this was the intention of my question here.

As for the NF pattern constraint, adding it would require an additional restriction on the lower 2 bits of the EVEX.mask field. We aim to keep the instruction pattern as compact as possible to decrease the number of keys calculated for the instruction hash tables, as this can influence performance.

Understood the performance concern, but why would this restriction be required for just the APX_F_AMX instructions and not the "regular" APX ones?

In all other places you handle this by adding the EVAPX() nonterminal to the pattern, which clears EVEX.b and EVEX.aaa. IMO the pattern for e.g. TILELOADD should change from:

EVV 0x4B VF2 V0F38 MOD[mm] MOD!=3 REG[rrr] RM[0b100] MODRM() BCRC=0 UBIT=1 W0 VL128 mode64 NOEVSR ZEROING=0 MASK=0

to:

EVV 0x4B VF2 V0F38 MOD[mm] MOD!=3 REG[rrr] RM[0b100] MODRM() ND=0 NF=0 {UBIT=1} W0 VL128 mode64 NOEVSR ZEROING=0 EVAPX()

This uses the same amount of restrictions (BCRC=0 and MASK=0 is replaced by ND=0 and NF=0) with the only addition of the extra EVAPX() use (and if you remove UBIT=1 you would even end up with a smaller pattern).

Or is there a special detail to the APX_F_AMX that I'm missing right now?

Scr3amer pushed a commit to Scr3amer/xed that referenced this issue Feb 13, 2025
The release updates XED according to Intel's latest ISA publications, as detailed in
ISE054, ISE055 and AVX10.2-rev2.0.

This version includes support for:
  - Intel Diamond Rapids (DMR) chip
  - Diamond Rapids AMX instructions
  - MOVRS and AVX10-MOVRS instructions
  - SM4 EVEX instructions
  - MSR-IMM instructions (including APX-promoted variants)
  - Encoding updates for various AVX10.2 instructions
  - Other updates across XED chip definitions


General
  - Shared Library Build for Python: Introduces a unique XED shared library build, exposing all XED APIs via a shared library object. This enables the library to be loaded in Python environments, allowing interaction with XED using Python APIs.
  See the [examples in pyext/README.md](pyext/examples/README.md) for more details. (Closes intelxed#302)
  - Disassembler Enhancements: Adds support for emitting CS/DS ignored branch hint prefixes, configurable through the `xed_format_options_t` structure.
  - Updates minimum Python requirement from 3.8 to 3.9.
  - Improves Internal ISA definition APX files (See intelxed#338)

Fixes
  - Resolves C11 build warnings with GCC (Fixes intelxed#332)
  - Improves length and error reporting for illegal instructions caused by a zeroed EVEX map (Resolves intelxed#334)

Co-authored-by: Arjevani, Maor <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants