arch/arm/src/stm32/stm32_spi: chunk DMA exchanges to honor 16-bit NDTR #18878
Open
cjj66619 wants to merge 1 commit into
xiaoxiang781216
approved these changes
May 14, 2026
Contributor
Hi! Could you remove the leftover template text from your PR description and include a log from your test? Also, the commit message can just be a small summary, with most of the detail in the PR description. Just a nitpick though!
Author
OK, thank you for the suggestion!
The STM32 DMA NDTR/CNDTR transfer-count register is 16 bits wide on
every STM32 series the in-tree driver supports (IPv1 CNDTR, IPv2
SxNDTR). spi_exchange()'s DMA path forwarded the caller's full
nwords to stm32_dmasetup(), so a single SPI_EXCHANGE() of >= 65536
words silently programmed NDTR to (nwords & 0xffff). When the
truncated count was zero - the typical case for an exact 64 KiB
transfer (flash erase block, FAT cluster, common DMA staging
buffer) - the stream completed instantly with no transfer-complete
interrupt and the caller deadlocked in spi_dmarxwait().
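
The truncation described above can be checked with plain integer arithmetic. The following is a minimal sketch, not driver code: `ndtr_latched()` is a hypothetical helper that only models a 16-bit count register keeping the low 16 bits of the requested word count.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 16-bit transfer-count register (not a NuttX
 * function): only the low 16 bits of the requested count are kept.
 */

static uint16_t ndtr_latched(size_t nwords)
{
  return (uint16_t)(nwords & 0xffff);
}
```

Under this model, `ndtr_latched(65536)` and `ndtr_latched(1048576)` both yield 0, which is the zero-count case described above, while `ndtr_latched(65537)` yields 1.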
Walk the request in chunks of at most 65535 words inside the
existing DMA branch, reusing the same spi_dma{rx,tx}{setup,start,
wait}() sequence per chunk. Single-descriptor transfers (every
in-tree caller today) are byte-for-byte identical. CONFIG_SPI_TRIGGER
is honored for the first chunk only; subsequent chunks must run
unconditionally because re-arming between chunks of one logical
exchange was never part of the SPI_TRIGGER contract.
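
The chunk walk described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the patch itself: `DMA_MAX_WORDS`, `next_chunk()` and `count_chunks()` are hypothetical names, and the per-chunk setup/start/wait round is only marked by a comment.

```c
#include <stddef.h>

/* Hypothetical constant: largest count a 16-bit NDTR/CNDTR can hold. */

#define DMA_MAX_WORDS 65535u

/* Size of the next chunk for a request with 'remaining' words left. */

static size_t next_chunk(size_t remaining)
{
  return remaining > DMA_MAX_WORDS ? DMA_MAX_WORDS : remaining;
}

/* Walk a request of 'nwords' words and return how many DMA chunks it
 * needs.  A real driver would run one setup/start/wait round per
 * iteration and advance its buffer pointers by the chunk size.
 */

static size_t count_chunks(size_t nwords)
{
  size_t nchunks = 0;

  while (nwords > 0)
    {
      size_t chunk = next_chunk(nwords);

      /* One DMA setup/start/wait round for 'chunk' words goes here. */

      nwords -= chunk;
      nchunks++;
    }

  return nchunks;
}
```

In this sketch any request of up to 65535 words takes exactly one pass through the loop, which matches the claim that single-descriptor transfers are unchanged; a 65536-word request splits into chunks of 65535 and 1.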
Drive-by: rescale priv->buflen-clamped nwords so the DMA
descriptor matches the actually-copied byte count, promote the
spiinfo() format specifier from %d to %zu, and fix two adjacent
comment typos.
See the PR description for reproduction, NSH log and benchmark
numbers (5.12 MB/s, 97.5% of SCK/8 @ 42 MHz on STM32F407 + W25Q128).
Signed-off-by: Jinji Cui <113000688+cjj66619@users.noreply.github.com>
78a11a2 to
7fbc303
acassis
approved these changes
May 14, 2026
Summary
STM32 DMA `NDTR`/`CNDTR` is 16 bits wide on every supported series (IPv1 `CNDTR` on F0/F1/F3/G4/L0/L1/L4, IPv2 `SxNDTR` on F2/F4/F7/H7). `spi_exchange()`'s DMA path forwarded the full `nwords` to `stm32_dmasetup()`, so a single `SPI_EXCHANGE()` of >= 65536 words programmed `NDTR` to `(nwords & 0xffff)`. For an exact 64 KiB transfer (flash erase block, FAT cluster, common DMA staging buffer) the count becomes zero, the stream completes instantly with no TC interrupt, and the caller deadlocks in `spi_dmarxwait()`.

Repro on any STM32 board with `CONFIG_STM32_SPI{n}_DMA=y` and a SPI flash behind `drivers/mtd/w25.c`:

This patch walks the request in chunks of at most 65535 words inside the existing DMA branch of `spi_exchange()`, reusing the existing `spi_dma{rx,tx}{setup,start,wait}()` sequence per chunk. Single-descriptor transfers (every in-tree caller today) are byte-for-byte identical; previously-deadlocking transfers now succeed. Fixes `drivers/mtd/w25.c`, `drivers/mmcsd/mmcsd_spi.c` and any user of `SPI_EXCHANGE()` / `SPI_RECVBLOCK()` / `SPI_SNDBLOCK()` with >= 64 KiB buffers.

Testing
STM32F407VGT6 + W25Q128 over SPI1 (SCK 42 MHz, MODE3), NuttX 12.x + LittleFS. Bench tool: `apps/examples/w25bench` (in-tree NSH builtin); each iteration loops internally for a >= 256 KiB timed window.

Without this patch:

`--size=64k` and `--size=1m` block indefinitely in `spi_dmarxwait()`; 60 s watchdog fires.

With this patch:

All three sizes saturate at 5120 KB/s ≈ 97.5% of SCK/8; the literal `poll` is a hard-coded `w25bench` label, the actual path is `CONFIG_STM32_SPI1_DMA=y` + this patch (label fix in a follow-up).

- `git apply --check` on `apache/nuttx@e05292f9da`: clean
- `tools/checkpatch.sh -f` + `tools/nxstyle` on the touched file: clean
- `make stm32f4discovery:nsh` default and with `STM32_SPI1_DMA=y` forced on: builds, no new warnings.

No H7 / F0 / F1 hardware on hand. The fix is IP-agnostic (both `CNDTR` and `SxNDTR` are 16-bit per the RMs); a smoke build on those would be appreciated.
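
For reference, the 97.5% figure quoted in the results above can be re-derived with integer arithmetic. This is a sketch under two assumptions that are not stated explicitly in the PR: 8-bit SPI words (so the ceiling is SCK/8 bytes per second) and KB meaning 1000 bytes (consistent with the 5.12 MB/s in the commit message). The helper names are hypothetical.

```c
#include <stdint.h>

/* Theoretical ceiling for 8-bit words: one byte per 8 SCK cycles. */

static uint32_t spi_ceiling_bytes_per_sec(uint32_t sck_hz)
{
  return sck_hz / 8;
}

/* Measured rate as a fraction of the ceiling, in tenths of a percent
 * (975 means 97.5%), rounded to nearest.  Returns 0 if the ceiling
 * would be zero.
 */

static uint32_t efficiency_permille(uint32_t measured_bps, uint32_t sck_hz)
{
  uint64_t ceiling = spi_ceiling_bytes_per_sec(sck_hz);

  if (ceiling == 0)
    {
      return 0;
    }

  return (uint32_t)(((uint64_t)measured_bps * 1000 + ceiling / 2) / ceiling);
}
```

With SCK at 42 MHz the ceiling is 5250000 bytes per second, and `efficiency_permille(5120000, 42000000)` evaluates to 975, i.e. 97.5%.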
Signed-off-by: Jinji Cui <113000688+cjj66619@users.noreply.github.com>