@jwnrt

@jwnrt jwnrt commented Oct 28, 2025

This changes the SPI device's discarding behaviour in two ways:

  1. The byte count is correctly adjusted for discarded bytes so we can receive a new packet when it's finished.
  2. We discard subsequent packets in a transaction if one of them fails. This ensures we treat the transaction as one and don't stop discarding at an arbitrary packet boundary.

This is motivated by seeing that QEMU never stops discarding after it receives an unrecognised flash command as part of a multi-packet transaction (e.g. a "read" transaction which sends a "write read command" packet with EOT=0 followed by a "read bytes" packet with EOT=1).
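A minimal sketch of the two behaviours described above, assuming a simplified model rather than the actual QEMU code. Every name here (`DiscardModel`, `model_begin_packet`, `model_feed`) is hypothetical. It shows discarded bytes still being subtracted from the packet's byte count, and the discard flag surviving until a packet with EOT=1 completes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; names are not taken from the QEMU source. */
typedef struct {
    size_t remaining;  /* payload bytes still expected in the current packet */
    bool in_packet;    /* true while a packet payload is being received */
    bool eot;          /* EOT flag taken from the current packet header */
    bool discard_txn;  /* keep discarding until a packet with EOT=1 ends */
} DiscardModel;

static void model_end_packet(DiscardModel *m)
{
    m->in_packet = false;
    if (m->eot) {
        /* The transaction is over, so discarding stops here and only here. */
        m->discard_txn = false;
    }
}

/* Start a packet; `failed` marks a packet that triggers a soft error. */
static void model_begin_packet(DiscardModel *m, size_t len, bool eot,
                               bool failed)
{
    m->remaining = len;
    m->eot = eot;
    m->in_packet = true;
    if (failed) {
        m->discard_txn = true;
    }
    if (len == 0) {
        model_end_packet(m);
    }
}

/* Feed payload bytes; returns how many of them were discarded. */
static size_t model_feed(DiscardModel *m, size_t count)
{
    assert(m->in_packet);
    size_t used = count < m->remaining ? count : m->remaining;
    bool discarding = m->discard_txn;

    /* Discarded bytes still count towards the packet length. */
    m->remaining -= used;
    if (m->remaining == 0) {
        model_end_packet(m);
    }
    return discarding ? used : 0;
}
```

In this model, a failing packet with EOT=0 followed by a packet with EOT=1 is discarded in full, and the next transaction is accepted normally.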

@rivos-eblot

rivos-eblot commented Oct 28, 2025

I do understand the rationale.
However would there be a way to implement this feature using a BUS_STATE state rather than another boolean?

@jwnrt
Author

jwnrt commented Oct 28, 2025

I tried that implementation, but the state ended up identical to IDLE except for the transition to DISCARD in handle_header.

I don't mind switching to a state if you'd prefer

@rivos-eblot

rivos-eblot commented Oct 28, 2025

I don't mind switching to a state if you'd prefer

If it is not too much work and does not clutter the code, I would say it is preferable, as it is always easier to debug with a single state than to track a combination of multiple variables. If this ends up being harder to read, please leave it as is.

@jwnrt jwnrt force-pushed the jw/spi-device-discard branch from 9109cba to 3354aca Compare October 28, 2025 16:21
@jwnrt
Author

jwnrt commented Oct 28, 2025

I've made the change and it passes my test, though I may not have got the ideal implementation

edit: oops, let me fix formatting

@jwnrt jwnrt force-pushed the jw/spi-device-discard branch from 3354aca to c1970f6 Compare October 28, 2025 16:23

@rivos-eblot rivos-eblot left a comment

I think you also want to add

 * Copyright (c) 2025 lowRISC contributors.

to the file header.


@AlexJones0 AlexJones0 left a comment

Looks good to me after @rivos-eblot's comments are addressed.

It might also be nice to quickly document this behaviour in the "SPI device CharDev protocol" section of docs/opentitan/spi_device.md, given that the protocol is described in detail (including handling error states) there?

@jwnrt jwnrt force-pushed the jw/spi-device-discard branch from c1970f6 to ea8242e Compare October 29, 2025 09:29
@jwnrt jwnrt requested a review from rivos-eblot October 29, 2025 09:29
@jwnrt jwnrt force-pushed the jw/spi-device-discard branch from ea8242e to d5144c4 Compare October 29, 2025 10:15
@jwnrt
Author

jwnrt commented Oct 29, 2025

Rebased on TPM changes

@jwnrt jwnrt requested a review from AlexJones0 October 29, 2025 10:17

@rivos-eblot rivos-eblot left a comment

I think I totally lost track of what we want to achieve and how we handle it.
DISCARD_PACKET may not be helping, as we actively store the incoming packet.

I really need to spend some time to understand this new state machine to be sure it does not introduce unexpected behavior.

I think this part is the most confusing:

    switch (bus->state) {
    case SPI_BUS_IDLE:
    case SPI_BUS_FLASH:
    case SPI_BUS_DISCARD_PACKET:
        BUS_CHANGE_STATE(s, IDLE);
        break;
    case SPI_BUS_DISCARD:
    case SPI_BUS_ERROR:
        BUS_CHANGE_STATE(s, DISCARD_PACKET);
        break;
    default:
        g_assert_not_reached();
        break;
    }
  • I do not think we should return to IDLE until the full transaction is over. Maybe that is the case here, I'm not sure (at the very least, we need some comments).
  • Resuming from ERROR to DISCARD_PACKET is certainly wrong. For example, ERROR is set when the incoming header is invalid, which means there is no way to recover from it, as we can no longer interpret the incoming data.

Maybe switching to a new state was not a good idea, I'm not sure, but something is missing: once we enter the state where we want to discard everything until the last byte of the last packet of a transaction is received (and the same number of 0xFF bytes is sent back), we should not return to some regular state. Maybe we need one or more extra states to manage:

  • ignoring bytes till the end of the current packet (DISCARD_PACKET)
  • entering a special state to handle the header of the next packet if the transaction to discard is not over (IDLE_IGNORE?), only entering IDLE once the last byte of the packet of the ignored transaction has been received
  • entering the above state when a new packet is received (DISCARD_PACKET)

DISCARD might also need to be renamed or removed: entering this state follows a flash error, so I guess that we want to enter DISCARD_PACKET in this case. It does not seem DISCARD is needed anymore.
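One possible reading of this suggestion, sketched as a single state machine. This is an illustration under assumptions, not the reviewed code: the `ST_*` names and the `sketch_*` functions are hypothetical, `ST_IDLE_IGNORE` corresponds to the tentatively named IDLE_IGNORE above, and DISCARD is folded into DISCARD_PACKET as proposed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state set; not the enum used in the QEMU source. */
typedef enum {
    ST_IDLE,           /* waiting for the first header of a transaction */
    ST_IDLE_IGNORE,    /* waiting for a header inside a discarded transaction */
    ST_FLASH,          /* processing a flash packet */
    ST_DISCARD_PACKET, /* ignoring bytes until the end of the current packet */
    ST_ERROR           /* invalid header seen: latched until reset */
} SketchBusState;

/* A packet header has been received and checked. */
static SketchBusState sketch_on_header(SketchBusState st, bool header_valid)
{
    switch (st) {
    case ST_IDLE:
        return header_valid ? ST_FLASH : ST_ERROR;
    case ST_IDLE_IGNORE:
        return header_valid ? ST_DISCARD_PACKET : ST_ERROR;
    case ST_ERROR:
        return ST_ERROR;
    default:
        assert(0 && "header received in the middle of a packet");
        return st;
    }
}

/* The flash layer rejected the command (a recoverable error). */
static SketchBusState sketch_on_flash_error(SketchBusState st)
{
    return st == ST_FLASH ? ST_DISCARD_PACKET : st;
}

/* The last payload byte of the current packet has been received. */
static SketchBusState sketch_on_packet_end(SketchBusState st, bool eot)
{
    switch (st) {
    case ST_FLASH:
        return ST_IDLE;
    case ST_DISCARD_PACKET:
        /* Only return to IDLE once the discarded transaction is over. */
        return eot ? ST_IDLE : ST_IDLE_IGNORE;
    case ST_ERROR:
        return ST_ERROR; /* never recovers without a reset */
    default:
        assert(0 && "packet ended while no packet was in progress");
        return st;
    }
}
```

The key property is that ERROR has no outgoing transition, and DISCARD_PACKET can only reach IDLE through a packet with EOT=1.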

@jwnrt
Author

jwnrt commented Oct 29, 2025

Let me check my understanding:

  1. When errors occur on malformed packets, we want to remain latched in the error state until reset.
  2. When "soft" errors occur (e.g. unrecognised commands) we want to discard all packets until the end of transaction.

So perhaps we need two state machines, one for the transaction and one for the current packet.

The transaction might have states:

  1. IDLE: waiting for a new transaction.
  2. DISCARD: discard all bytes until end of transaction.
  3. ERROR: latch until reset, stop parsing packets.
  4. FLASH: processing a flash transaction.
  5. TPM: processing a TPM transaction.

The packet might have states:

  1. IDLE: waiting for a new packet.
  2. DISCARD: discard remaining bytes in packet.
  3. ACCEPT/PROCESS: process flash or TPM data.
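The two state lists above could be sketched as a pair of enums with a handful of transition helpers. This is a rough illustration only: the type and function names (`TwoLevel`, `two_level_*`) are hypothetical, and the choice to ignore the requested mode once a transaction is already under way is an assumption made to keep the sketch small.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical types mirroring the two lists above. */
typedef enum { TXN_IDLE, TXN_DISCARD, TXN_ERROR, TXN_FLASH, TXN_TPM } TxnState;
typedef enum { PKT_IDLE, PKT_DISCARD, PKT_ACCEPT } PktState;

typedef struct {
    TxnState txn; /* remembers what to do with the whole transaction */
    PktState pkt; /* remembers what to do with the current packet */
} TwoLevel;

/* A valid header arrived; `kind` is TXN_FLASH or TXN_TPM. */
static void two_level_packet_start(TwoLevel *s, TxnState kind)
{
    assert(kind == TXN_FLASH || kind == TXN_TPM);
    if (s->txn == TXN_ERROR) {
        return; /* latched: stop parsing packets */
    }
    if (s->txn == TXN_IDLE) {
        s->txn = kind; /* the first packet fixes the transaction mode */
    }
    s->pkt = (s->txn == TXN_DISCARD) ? PKT_DISCARD : PKT_ACCEPT;
}

/* "Soft" error, e.g. an unrecognised command. */
static void two_level_soft_error(TwoLevel *s)
{
    s->txn = TXN_DISCARD;
    s->pkt = PKT_DISCARD;
}

/* Malformed packet: latch until reset. */
static void two_level_malformed(TwoLevel *s)
{
    s->txn = TXN_ERROR;
    s->pkt = PKT_IDLE;
}

/* The last byte of the current packet was consumed. */
static void two_level_packet_end(TwoLevel *s, bool eot)
{
    if (s->txn == TXN_ERROR) {
        return;
    }
    s->pkt = PKT_IDLE;
    if (eot) {
        s->txn = TXN_IDLE;
    }
}

static void two_level_reset(TwoLevel *s)
{
    s->txn = TXN_IDLE;
    s->pkt = PKT_IDLE;
}
```

Note that the packet state is derived from the transaction state at every packet start, which is one way the two machines end up coupled.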

@rivos-eblot

Let me check my understanding:

  1. When errors occur on malformed packets, we want to remain latched in the error state until reset.
  2. When "soft" errors occur (e.g. unrecognised commands) we want to discard all packets until the end of transaction.

Yes, I think so.

So perhaps we need two state machines, one for the transaction and one for the current packet.

If you think it is easier, why not.

Questions:

  1. Why would there be a differentiation at transaction level between flash and TPM modes?
  2. What would be the use case for DISCARD at packet level? It seems that if some unexpected data is received, it should be propagated to the end of the transaction (?)

@jwnrt
Author

jwnrt commented Oct 29, 2025

Why would there be a differentiation at transaction level between flash and TPM modes?

@engdoreis can help me out here, but I think the TPM and flash have separate buffers and configuration registers that influence how incoming data is processed. Apparently both TPM and flash can be enabled simultaneously, but the chip select determines which mode to use for a transaction. We don't want to accept transactions which interleave flash and TPM packets, so we need to remember which mode the transaction is in.

What would be the use case for DISCARD at packet level? It seems that if some unexpected data is received, it should be propagated to the end of the transaction (?)

My intention was to handle a case like this:

  1. First packet arrives with an unknown command. EOT=0.
  2. This is a soft error, so we enter DISCARD at both the transaction and packet level.
  3. At the end of the packet, we return to IDLE at the packet level but retain DISCARD at the transaction level.
  4. The next packet arrives, but we transition directly to DISCARD at the packet level instead of ACCEPT.

I think I'm not understanding what you mean by propagating the data to the end of the transaction. My plan was for DISCARD at the transaction level to remember that we need to discard all subsequent packets, and DISCARD at the packet level to remember that we need to discard all remaining bytes in the current packet.

@engdoreis

Apparently both TPM and flash can be enabled simultaneously, but the chip select determines which mode to use for a transaction

Yes, that's correct, that's how the HW is implemented: flash mode and TPM mode can be enabled at the same time, and the CS line used determines how the incoming packet is processed.

@rivos-eblot

@engdoreis can help me out here, but I think the TPM and flash have separate buffers and configuration registers that influence how incoming data is processed. Apparently both TPM and flash can be enabled simultaneously, but the chip select determines which mode to use for a transaction. We don't want to accept transactions which interleave flash and TPM packets, so we need to remember which mode the transaction is in.

If I got it right, as the SPI device CharDev protocol does not support this feature yet, and it has been chosen to not support it for now, I think this use case can be dismissed for now.

  1. First packet arrives with an unknown command. EOT=0.
  2. This is a soft error, so we enter DISCARD at both the transaction and packet level.
  3. At the end of the packet, we return to IDLE at the packet level but retain DISCARD at the transaction level.
  4. The next packet arrives, but we transition directly to DISCARD at the packet level instead of ACCEPT.

I'm really not sure this can be handled with 2 SMs as it seems both are inter-dependent, and/or I think the naming is confusing.

The transport level should decode the header, enter a fatal error state if some byte is deemed invalid, forward the payload, and inform the next layer whether it is the last packet of a transaction. So it needs an ERROR state whenever it is no longer able to parse the incoming stream, and should leave this ERROR state only on SPI Device reset. I do not think it needs a "DISCARD" state.

The next layer, and here starts my confusion about the naming, should enter the FLASH or TPM mode, and enter a DISCARD mode if the current command no longer accepts extra bytes for the current packet. I'm not sure whether it needs both ERROR and DISCARD modes in this case.
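A rough sketch of that layering, under the assumption that the transport layer hands payloads to the next layer through a callback. None of these names (`Transport`, `PayloadFn`, `transport_packet`, `count_payload`) exist in the QEMU source, and header decoding is reduced to a `header_valid` flag because the header format is not described here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Callback through which the transport layer forwards a packet payload. */
typedef void (*PayloadFn)(void *opaque, const uint8_t *data, size_t len,
                          bool last_packet);

typedef struct {
    bool error;        /* latched once the stream can no longer be parsed */
    PayloadFn deliver; /* next layer (flash/TPM handling, discarding, ...) */
    void *opaque;
} Transport;

/* Handle one packet whose header has already been checked. */
static void transport_packet(Transport *t, bool header_valid, bool eot,
                             const uint8_t *payload, size_t len)
{
    assert(t->deliver != NULL);
    if (t->error) {
        return; /* only a reset leaves the error state */
    }
    if (!header_valid) {
        t->error = true;
        return;
    }
    /* Forward the payload and tell the next layer whether it is the last. */
    t->deliver(t->opaque, payload, len, eot);
}

static void transport_reset(Transport *t)
{
    t->error = false;
}

/* Example next layer: counts forwarded bytes and completed transactions. */
typedef struct {
    size_t bytes;
    size_t transactions;
} Counter;

static void count_payload(void *opaque, const uint8_t *data, size_t len,
                          bool last_packet)
{
    Counter *c = opaque;
    (void)data;
    c->bytes += len;
    if (last_packet) {
        c->transactions++;
    }
}
```

In this split, the transport layer has no DISCARD state at all; any decision to ignore the remaining bytes of a transaction would live behind the callback.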

Please ignore my last response regarding the data, it is meaningless.

@ziuziakowska

We would never want to discard only some bytes in the middle of a transaction that has been marked as erroneous, and discarding should persist until after a packet with EOT=1 (which represents the de-assertion of chip select), so I understand the rationale of this change, but I think an additional DISCARD_PACKET state makes things a lot more confusing and risks making the DISCARD state redundant.

I think the onus should fall on the IDLE state here in ot_spi_device_chr_handle_header - it is solely responsible for parsing the header and dispatching to the different states such as Flash or TPM. If the previous transaction packet has been sent with EOT=0 and the bus ended in the DISCARD state, that should set a flag that makes this function dispatch into DISCARD until after it has received a packet with EOT=1.
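The flag-based dispatch described here might look roughly like the following. This is a hedged sketch, not the body of ot_spi_device_chr_handle_header: `SketchBus`, `sketch_handle_header`, `sketch_command_rejected` and `sketch_packet_done` are hypothetical names, and only the flash path is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced state set; TPM and ERROR are left out for brevity. */
typedef enum { BUS_IDLE, BUS_FLASH, BUS_DISCARD } SketchState;

typedef struct {
    SketchState state;
    bool eot;          /* EOT flag of the packet being received */
    bool discard_next; /* previous packet ended in DISCARD with EOT=0 */
} SketchBus;

/* IDLE alone parses the header and dispatches to the next state. */
static void sketch_handle_header(SketchBus *b, bool eot)
{
    assert(b->state == BUS_IDLE);
    b->eot = eot;
    b->state = b->discard_next ? BUS_DISCARD : BUS_FLASH;
}

/* The flash layer did not recognise the command. */
static void sketch_command_rejected(SketchBus *b)
{
    b->state = BUS_DISCARD;
}

/* The packet's last byte has been received. */
static void sketch_packet_done(SketchBus *b)
{
    /* Carry the discard over only while the transaction is still open. */
    b->discard_next = (b->state == BUS_DISCARD) && !b->eot;
    b->state = BUS_IDLE;
}
```

The state machine itself keeps a single DISCARD state, and the flag only records whether the next header should dispatch into it.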

@jwnrt jwnrt force-pushed the jw/spi-device-discard branch 2 times, most recently from 9109cba to b46c1d6 Compare October 30, 2025 10:37
@jwnrt
Author

jwnrt commented Oct 30, 2025

I have reverted to the boolean flag separate from the state machine, but maybe the SPI device does need overhauling separately.

jwnrt added 2 commits October 30, 2025 15:59
We must adjust the byte count by the number discarded so that we can
return to `IDLE` when the packet ends.

Signed-off-by: James Wainwright <[email protected]>
If one packet in a transaction triggers an error and needs discarding,
we must continue discarding the remaining packets until CS is released.

Signed-off-by: James Wainwright <[email protected]>
@jwnrt jwnrt force-pushed the jw/spi-device-discard branch from b46c1d6 to 8a9f6c3 Compare October 30, 2025 15:59
@jwnrt jwnrt requested a review from rivos-eblot October 30, 2025 16:00

@rivos-eblot rivos-eblot left a comment

LGTM (not tested)

@jwnrt jwnrt merged commit bd6d66f into lowRISC:ot-9.2.0 Oct 30, 2025
12 of 13 checks passed
@jwnrt jwnrt deleted the jw/spi-device-discard branch October 30, 2025 23:59

5 participants