
Added stealthchop lag compensation code#7258

Draft
dmbutyugin wants to merge 9 commits into
Klipper3d:masterfrom
dmbutyugin:stealthchop-comp

Conversation

@dmbutyugin

Collaborator

It is often reported that enabling StealthChop mode in TMC drivers results in positional errors of the stepper motors, which can lead to geometrical distortions of the printed models, especially on more complicated kinematics such as CoreXY. This is a proof-of-concept PR that adds runtime compensation for stepper position lag in StealthChop mode. I opened this PR as a draft to facilitate further testing following the discussions in these Discourse threads (and the original proposal from here). @nefelim4ag FYI

Signed-off-by: Dmitry Butyugin <dmbutyugin@google.com>
This integration eliminates one subtraction of very close values,
improving numerical stability of integration, and it is also faster,
requiring 16 multiplications and 9 additions per move rather than
20 multiplications and 11 additions in the previous implementation.

Signed-off-by: Dmitry Butyugin <dmbutyugin@google.com>
@github-actions


Thank you for your contribution to Klipper. Unfortunately, a reviewer has not assigned themselves to this GitHub Pull Request. All Pull Requests are reviewed before merging, and a reviewer will need to volunteer. Further information is available at: https://www.klipper3d.org/CONTRIBUTING.html

There are some steps that you can take now:

  1. Perform a self-review of your Pull Request by following the steps at: https://www.klipper3d.org/CONTRIBUTING.html#what-to-expect-in-a-review
    If you have completed a self-review, be sure to state the results of that self-review explicitly in the Pull Request comments. A reviewer is more likely to participate if the bulk of a review has already been completed.
  2. Consider opening a topic on the Klipper Discourse server to discuss this work. The Discourse server is a good place to discuss development ideas and to engage users interested in testing. Reviewers are more likely to prioritize Pull Requests with an active community of users.
  3. Consider helping out reviewers by reviewing other Klipper Pull Requests. Taking the time to perform a careful and detailed review of others' work is appreciated. Regular contributors are more likely to prioritize the contributions of other regular contributors.

Unfortunately, if a reviewer does not assign themselves to this GitHub Pull Request then it will be automatically closed. If this happens, then it is a good idea to move further discussion to the Klipper Discourse server. Reviewers can reach out on that forum to let you know if they are interested and when they are available.

Best regards,
~ Your friendly GitIssueBot

PS: I'm just an automated script, not a human being.

@MRX8024

MRX8024 commented May 13, 2026

Contributor

Hi, I ran some tests with stealthchop and I have a couple of questions.

What exactly do L and R represent in cmd_ESTIMATE_STEPPER_SHAPER_PARAM()? If L is the stepper motor inductance, why is its minvalue=0.1? Are the units millihenries?

GCode:

SET_VELOCITY_LIMIT ACCEL=10000
G4 P1000
G28 X
G0 X100 F10000
G4 P1000
G0 X110
G0 X90
G0 X110
G0 X90
G0 X110
G0 X90

I’m using QSH4218-47-28-040 stepper motors with a phase resistance of 0.5 ohm and a phase inductance of 0.6 mH.

I also have encoders installed that are named the same as the steppers themselves, so on the "deviation_stepper_*" graphs, these are encoder readings.

["kin(stepper_y)"],
["deviation(angle(stepper_y),kin(stepper_y))"],

[graph: spreadcycle]

[graph: stealthchop comp disabled]

[graph: stealthchop-comp=1.0e-06, velocity_smooth_time=0.001]

[graph: stealthchop-comp=1.0e-06, velocity_smooth_time=0.1]

[graph: stealthchop-comp=1.0e-03, velocity_smooth_time=0.001]

[graph: stealthchop-comp=1.0e-03, velocity_smooth_time=0.1]

[graph: stealthchop-comp=1.0e-02, velocity_smooth_time=0.1]

Unfortunately, during testing with the following configuration:

[stepper_shaper stepper_y]
force_stealthchop_comp: False
force_spreadcycle_comp: False
velocity_smooth_time: 0.005
stealthchop_comp: 1.0e-01
spreadcycle_comp: 0.00

the BTT TMC5160 Plus driver failed, with its mcu/mosfets shorting/burning out. I would assume this is related to abnormally abrupt motion with effectively unbounded speed. It might make sense to add protection limiting the maximum stepper speed.

["kin(stepper_y)"],
["stepq(stepper_y)"],
["derivative(stepq(stepper_y))"],
[graph: stealthchop-comp=1.0e-01, velocity_smooth_time=0.005, velocity]

@dmbutyugin

Collaborator Author

@MRX8024 thanks for testing!

What exactly do L and R represent in cmd_ESTIMATE_STEPPER_SHAPER_PARAM()? If L is the stepper motor inductance, why is its minvalue=0.1? Are the units millihenries?

R is resistance in Ohm, L is inductance in mH (millihenry).

I’m using QSH4218-47-28-040 stepper motors with a phase resistance of 0.5 ohm and a phase inductance of 0.6 mH.

So, based on these values, a reasonable first guess for the compensation parameter is 0.001 * 0.6 / 0.5 = 1.2e-3. I guess this also roughly matches your tests, with 1e-3 showing a good (about 2x) reduction in the positional lag, and 1e-2 already showing a large overshoot. So, I suppose an optimal value would be somewhere in the range of 1.2e-3 to 3e-3.
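The arithmetic above can be sketched as a small helper. This is only an illustration of the datasheet-based first guess described in this comment (inductance in mH, resistance in Ohm); the function name is hypothetical and is not part of the PR.

```python
def stealthchop_comp_guess(inductance_mh, resistance_ohm):
    """First-guess compensation constant from datasheet values.

    Follows the formula quoted in this thread: 0.001 * L[mH] / R[Ohm].
    The function name is hypothetical, used here for illustration only.
    """
    if resistance_ohm <= 0:
        raise ValueError("phase resistance must be positive")
    # 0.001 converts millihenry to henry, so the result is L/R in seconds.
    return 0.001 * inductance_mh / resistance_ohm


# QSH4218-47-28-040 values quoted above: 0.6 mH, 0.5 Ohm
print(stealthchop_comp_guess(0.6, 0.5))
```

The result is only a starting point; per the comment above, the optimal value may be somewhat higher than the datasheet-derived one.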

Unfortunately, during testing with the following configuration:

[stepper_shaper stepper_y]
force_stealthchop_comp: False
force_spreadcycle_comp: False
velocity_smooth_time: 0.005
stealthchop_comp: 1.0e-01
spreadcycle_comp: 0.00

the BTT TMC5160 Plus driver failed, with its mcu/mosfets shorting/burning out. I would assume this is related to abnormally abrupt motion with effectively unbounded speed. It might make sense to add protection limiting the maximum stepper speed.

I'm sorry to hear that. To be honest, I am not familiar with any software or hardware defect of the TMC5160 that could lead to such an outcome, though that possibility cannot be ruled out completely, of course. But generally, there are a few 'prevention' mechanisms that, even if accidental, should protect a TMC driver from damage:

  • Too high a velocity would result in very tight step timings, causing either stepcompress errors or 'Timer too close' errors,
  • TMC drivers have a minimum step duration, and if the steps are shorter than that threshold, they simply fail to detect the steps. For example, STM32H723 with all optimizations enabled can generate steps too fast and exceed the step rate detectable by TMC, so the 'step on both edges' optimization is disabled for it. So, in the end, the maximum step rate that the MCU sends to a TMC driver is limited, which also limits the maximum velocity (for example, the aforementioned MCU can send, based on benchmarks, around 7.4M steps per second to a single driver; at 256 microsteps and 0.2mm full step, this amounts to ~5.8 m/s linear velocity),
  • Then even if the actual motion speed is still high (e.g. one sets low microstepping settings), the actual current that the driver can send to the motor (and through its own circuitry) is still limited by the PSU voltage and by the stepper resistance and inductance, so even if velocity is 'infinite', the current cannot be infinite. It should also not operate in 'short-the-phase-and-forget' mode, since at 7.4M steps/second each phase alternates the current at ~7.2 kHz (with 256 microstepping, and even higher frequency if microstepping is lower).
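The two figures quoted in the list above (~5.8 m/s and ~7.2 kHz) can be reproduced with a short calculation. This is a sketch under the assumption that one electrical period of a two-phase stepper spans four full steps; the function names are hypothetical.

```python
def max_linear_velocity_mm_s(step_rate, microsteps, full_step_mm):
    """Linear velocity reached when the MCU emits `step_rate` microsteps/s."""
    full_steps_per_s = step_rate / microsteps
    return full_steps_per_s * full_step_mm


def phase_current_frequency_hz(step_rate, microsteps):
    """Frequency at which each phase current alternates.

    Assumption: one electrical period covers 4 full steps
    (two-phase bipolar stepper).
    """
    return step_rate / (microsteps * 4)


# Values from the comment: 7.4M steps/s, 256 microsteps, 0.2 mm per full step
velocity = max_linear_velocity_mm_s(7.4e6, 256, 0.2)
frequency = phase_current_frequency_hz(7.4e6, 256)
print("%.0f mm/s, %.0f Hz" % (velocity, frequency))
```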

So, alas, I'm not sure how this PR, even if misconfigured, could have fried the TMC5160. I'm only aware of a TMC2208 bug that, on a quick direction change (which could have happened with PA) in stealthchop mode, would trigger an over-current protection error (but AFAIR it was a 'false triggering' bug). So again, I'm not sure if the TMC5160 could have some hardware bug that would make it short the phases at very high step rates in stealthchop mode; I'm just not aware of anything like that. But if some issue like that exists, I wonder if the extremely low resistance and inductance of the motor contributed to this unfortunate outcome (and also whether you are using a higher-voltage PSU)?

Separately, it would indeed be a good idea to tighten the safety limits. For now they are set as they are to allow more permissive testing, since we do not yet know exactly what the reasonable limits for each parameter are (and FWIW, one could trigger somewhat similar behavior with PA by setting a high PA constant and a small pressure_advance_smooth_time).

@KevinOConnor

Collaborator

But generally, there are a few 'prevention' mechanisms that, even if accidental, should protect a TMC driver from damage:

Too high a velocity would result in very tight step timings, causing either stepcompress errors or 'Timer too close' errors,

For what it is worth, my reading "between the lines" of the tmc specifications is that stealthchop mode doesn't actually measure current (it's not a "current chopper") - it just makes guesses about how long to apply power. (See: https://klipper.discourse.group/t/spreadcycle-and-stealthchop-an-advanced-guide/23938/1 )

So, as I understand it, it is not necessary for the mcu to send lots of steps to the tmc driver to get it to fully power a coil for an extended period of time. It is only necessary for the driver to make poor guesses. So, as a random example, if the driver guesses it needs 10x more power to move 1mm/s faster and the mcu jumps velocity from 20mm/s to 100mm/s, then I'd expect the driver could enable power to a coil for 800x longer. Thus, in this random example, it doesn't need lots of mcu generated steps for things to go awry.

I know the tmc2209s (and similar) have an over-current check even when in stealthchop mode. I'm not sure about the tmc5160. Since the tmc5160 has external mosfets, I'd guess a close read of the spec would be needed to see what kind of over-current checks there are.

If it was me, I'd probably avoid any kind of abrupt pulsing movements (high bursts of acceleration) while in stealthchop mode.

I am not an expert on TMC drivers - this is just my general understanding - so take what I say here "with a grain of salt".

Cheers,
-Kevin

@nefelim4ag

nefelim4ag commented May 14, 2026

Collaborator

AFAIU.
Stealth is the voltage source. It measures the current only during the ON phase.
It controls the current by the width of the PWM, basically.
Unlike Spread, normally there is only one On/Off cycle for each coil, and that is it.

[image: "positive" direction]
[image: "negative" direction]

So, normally, it is simpler than the SpreadCycle.
The main problem that can happen is that a large external MOSFET can take some time switching on and off. So, there can be transient states, which should be covered by the BBMCLKS and DRV_STRENGTH.
If that happens, all MOSFETs are half open at the same time, which will short the whole thing:

[image: transient short state]
And I guess, it can only happen during the polarity switching for the stealth (transition from one full step to the next full step).

And it should normally be covered by the S2G/S2VS.
So, I would expect the reason for the above event is the driver misconfiguration, where it just happens that Stealth triggered this state.

So, I guess, the question is, why has protection not been triggered?

@nefelim4ag

nefelim4ag commented May 16, 2026

Collaborator

For future testing, we basically want to ensure that documented stealth lag is visible on the real print in the first place.

To do so, the suggested procedure is:

  • On CoreXY limit velocity to <140mm/s (to not trigger resonance during diagonal movements)
  • On Cartesian < 200 mm/s for the same reason
  • Interpolation should be disabled; it does not make sense to use one with stealth (and it can add additional lag)
  • It is suggested to have 10k acceleration and to have good enough shapers (otherwise it can be hard to distinguish anything).

Suggested models:
StealthTest.zip
One model already has small cracks and requires high-resolution slicing to be printed
Another one (with photos below) needs to be printed with reduced flow, like 95%.

So one can print the model samples in the Spread, then in the stealth (without compensation)

Photos

Spread
[photo]

Stealth
[2 photos]

It is advisable to have more than one sample at each mode.

Then, one can enable StealthChop lag compensation, based on the motor datasheet, for example.
My motors are LDO 2504 and have 1.2 Ohm phase resistance and 1.5mH phase inductance.
So, I can calculate the compensation constant like: 0.001 * 1.5 / 1.2 = 0.00125

And define it in the configuration:

[tmc5160 stepper_x]
stealthchop_threshold: 99999
...
[stepper_shaper stepper_x]
force_stealthchop_comp: True
stealthchop_comp: 0.00125

And try to reprint the test samples.

Alas, for example, I cannot reproduce distinguishable results on my machine in the first place.

Photos: [6 images]

Btw,
If it is used on Z

Stepper 'stepper stepper_z' position skew after probe: pos -4099 now -4105
Stepper 'stepper stepper_z1' position skew after probe: pos -3824 now -3821
Stepper 'stepper stepper_z2' position skew after probe: pos -3824 now -3821

Right now, I'm not sure why it happens with the probe. My pure guess is that it is because we compare the trapq position with the stepcompress position, and the stepcompress position can be shifted by the compensated lag.

And for whatever reason: def verify_no_probe_skew(self, haltpos): does not utilize the haltpos internally

Regards,
-Timofey

@nefelim4ag

nefelim4ag commented May 30, 2026

Collaborator
Stepper 'stepper stepper_z' position skew after probe: pos -4099 now -4105

Hmmm, I think that trigger happens at the position which is obviously behind the commanded one.
Because this is how we compensate, we output steps in a way that the toolhead will be at the requested (trapq?) position, where the target stepper position is in the future.

It looks pretty similar to how FOC works. FOC adjusts the voltage phase, so the current phase would be 90 degrees ahead of the rotor. We adjust the step position/voltage phase, so the toolhead position would be where the current actually is.

Upon halt, the toolhead would reach the commanded position, and lag would be subtracted.
So, the above difference is the difference between the position at which the trigger happened and the stepper position it was commanded for.

I guess that stepcompress_find_past_position() should be somehow aware of that difference.
Otherwise, get_past_mcu_position() is invalid as long as there is any compensation enabled.
Because get_past_mcu_position() no longer represents where the toolhead and stepper actually were at the time.
But only the commanded position.

Regards,
-Timofey

@KevinOConnor

Collaborator

Stepper 'stepper stepper_z' position skew after probe: pos -4099 now -4105

FYI, this warning appears if all the steppers on a multi-stepper axis (eg, stepper_z, stepper_z1, stepper_z2) do not travel the same distance during a homing/probing attempt.

So, for example, if all the stepper_z motors start at mcu position 101 and stepper_z/stepper_z1 halt at position 4099 while stepper_z2 halts at mcu position 4105 then a warning is produced.

The code is not expecting the steppers controlling a single axis to move different amounts during homing/probing. If they do move different amounts, there is no correction for it, and the motors will be skewed for the remainder of the print (until the next z_tilt_adjust, quad_gantry_level, or the next home/probe that skews them further).
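The condition described above can be illustrated with a short check. This is a simplified sketch of the idea only, not Klipper's actual implementation; the function name and the dictionary-based interface are hypothetical.

```python
def find_probe_skew(start_positions, halt_positions):
    """Return {stepper_name: extra_steps} for steppers whose travel differs
    from that of the first stepper on the axis.

    Hypothetical helper for illustration; positions are mcu step counts.
    """
    names = list(start_positions)
    reference = names[0]
    ref_travel = halt_positions[reference] - start_positions[reference]
    skewed = {}
    for name in names[1:]:
        travel = halt_positions[name] - start_positions[name]
        if travel != ref_travel:
            skewed[name] = travel - ref_travel
    return skewed


# The example from the comment: all start at 101, stepper_z2 halts later.
start = {"stepper_z": 101, "stepper_z1": 101, "stepper_z2": 101}
halt = {"stepper_z": 4099, "stepper_z1": 4099, "stepper_z2": 4105}
print(find_probe_skew(start, halt))
```

A non-empty result corresponds to the case where a warning would be produced.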

Ideally, the code could detect a skew during homing/probing and restore the proper axis alignment. (Restoring alignment would allow support for multi-mcu multi-stepper axes.) However, this has not been implemented, and due to the complexity of the current homing/probing code it's likely not easy to implement.

Cheers,
-Kevin

@nefelim4ag

Collaborator

do not travel the same distance during a homing/probing attempt.

Omg, you are right.
I cannot make Z_TILT converge.

Sorry for the above nonsense, I simply enabled compensation only for the stepper_z

[stepper_shaper stepper stepper_z]
force_stealthchop_comp: True
stealthchop_comp: 0.00125
[stepper_shaper stepper stepper_z] # should be z1
force_stealthchop_comp: True
stealthchop_comp: 0.00125
[stepper_shaper stepper stepper_z] # should be z2
force_stealthchop_comp: True
stealthchop_comp: 0.00125

After the configuration fix, I no longer see those messages.

Thanks,
-Timofey

@dewi-ny-je


Some comments on the current code from an LLM review, even though it's still a draft PR:

Things that are correct / well done

  • Math refactor is sound. I verified move_integrate_weighted is algebraically identical to the old extruder_integrate/extruder_integrate_time pair, and the new "average × interval" form genuinely is more numerically stable — it avoids the catastrophic cancellation of subtracting two large /t⁴ terms at DUMMY_T=500.
  • The step-window accounting is right. The convention is gen_steps_pre_active = forward look-ahead, gen_steps_post_active = backward look-behind. The StealthChop time-lead is a forward shift (max = stealthchop_lag_const), so adding it to pre_active (and hst to both sides) is correct. The add/subtract bookkeeping in set_lag_correction and the re-add in shaper_note_generation_time are consistent.
  • Sign handling. calc_stealthchop_time_lag uses a Taylor fallback for ω→0 (avoids div-by-zero) and is direction-symmetric. The xyz/IDEX velocity paths normalize to a signed unit vector + magnitude so the projection recovers the correct signed per-motor velocity.
  • Zero-cost when disabled (mostly): early-out when both lag constants are 0.
  • TMC mode validation is subtle but correct. stealthchop_threshold: 0 → TPWMTHRS=0xfffff → get_velocity_threshold() returns 0 → treated as SpreadCycle, which matches Klipper's documented semantics (0 disables StealthChop). I traced this carefully because it looks wrong at first glance.

Findings

1. (High) SET_STEPPER_SHAPER / ESTIMATE_... apply stale values

In stepper_shaper.py, _set_lag_correction(self, stealthchop_comp, spreadcycle_comp, error) validates the passed parameters, but the actual call uses the instance attributes:

if not self.stepper.set_lag_correction(
        self.calc_rad_per_mm(), self.stealthchop_comp,   # <-- self.*, not the params
        self.spreadcycle_comp, self.velocity_smooth_time):

Both cmd_SET_STEPPER_SHAPER and cmd_ESTIMATE_STEPPER_SHAPER_PARAM set self.stealthchop_comp = ... after calling _set_lag_correction. Result: the first SET_STEPPER_SHAPER X STEALTHCHOP_COMP=0.005 actually pushes the old value (0) to the C side; the new value only takes effect on the next invocation. The feature's primary command is effectively always one step behind. (_connect/_update happen to pass self.*, so they're unaffected — which is why it isn't immediately obvious.) Fix: use the function parameters in the set_lag_correction call.
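The pattern described in this finding can be reproduced in isolation. The sketch below is a minimal stand-alone model, not the PR's code: FakeStepper, ShaperSketch and their methods are hypothetical names, and the argument list is reduced to the two compensation values.

```python
class FakeStepper:
    """Hypothetical stand-in that records the values it receives."""
    def __init__(self):
        self.applied = None

    def set_lag_correction(self, stealthchop_comp, spreadcycle_comp):
        self.applied = (stealthchop_comp, spreadcycle_comp)
        return True


class ShaperSketch:
    def __init__(self, stepper):
        self.stepper = stepper
        self.stealthchop_comp = 0.0
        self.spreadcycle_comp = 0.0

    def apply_stale(self, stealthchop_comp, spreadcycle_comp):
        # Pattern from the finding: parameters are ignored, self.* is sent.
        return self.stepper.set_lag_correction(
            self.stealthchop_comp, self.spreadcycle_comp)

    def apply_fixed(self, stealthchop_comp, spreadcycle_comp):
        # Suggested fix: forward the function parameters.
        return self.stepper.set_lag_correction(
            stealthchop_comp, spreadcycle_comp)

    def set_comp(self, value, apply):
        # The attribute is updated only after the apply call, as described.
        if apply(value, self.spreadcycle_comp):
            self.stealthchop_comp = value


shaper = ShaperSketch(FakeStepper())
shaper.set_comp(0.005, shaper.apply_stale)
print(shaper.stepper.applied)  # the old value reaches the stepper
```

With apply_fixed in place of apply_stale, the first call already forwards the new value.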

2. (High) SpreadCycle asin produces NaN on reverse moves

calc_spreadcycle_position_lag clamps only the upper bound:

double phase_lag = sc->spreadcycle_lag_const * velocity;
if (phase_lag > 1.) phase_lag = 1.;
return asin(phase_lag) / sc->rad_per_mm;

velocity is signed; for a negative-direction move with phase_lag < -1, asin() returns NaN, which corrupts step generation. With spreadcycle_comp=0.1 this triggers at motor velocity < −10 mm/s — easily hit in normal printing. Needs a symmetric clamp (if (phase_lag < -1.) phase_lag = -1.;).
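A symmetric clamp could look like the sketch below, written in Python for illustration. It mirrors the three-line C snippet quoted above but is not the PR's code, and the function and parameter names are assumptions. Note one difference: in Python, math.asin raises ValueError outside [-1, 1], whereas the C asin() returns NaN.

```python
import math


def spreadcycle_position_lag(lag_const, velocity, rad_per_mm):
    """Position lag with the phase lag clamped to [-1, 1] on both sides.

    Hypothetical Python rendering of the quoted C function; the names
    are illustrative only.
    """
    phase_lag = lag_const * velocity
    # Clamp both bounds so that asin() stays within its domain
    # for moves in either direction.
    phase_lag = max(-1.0, min(1.0, phase_lag))
    return math.asin(phase_lag) / rad_per_mm


# With lag_const=0.1, a velocity of -50 mm/s gives phase_lag=-5 before clamping.
print(spreadcycle_position_lag(0.1, -50.0, 1.0))
```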

3. (Medium) tmc2660 will AttributeError at connect

_lookup_tmc iterates endstop_phase.TRINAMIC_DRIVERS (which includes tmc2660) and calls self.tmc_module.get_stealthchop_threshold(). The method was added to tmc2130/2208/2209/2240/5160 but not tmc2660. A [stepper_shaper] on a TMC2660-driven stepper crashes. Either add the method to tmc2660 (returning 0, i.e. always SpreadCycle) or guard with getattr.
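The getattr guard mentioned above could be sketched as follows. This is a stand-alone illustration, not the PR's code: the helper name and the two stand-in classes are hypothetical, and treating a missing method as threshold 0 (always SpreadCycle) follows the suggestion in this finding.

```python
def lookup_stealthchop_threshold(tmc_module):
    """Return the driver's StealthChop threshold, or 0 if the driver
    object does not provide get_stealthchop_threshold().

    Hypothetical helper; 0 is interpreted as 'always SpreadCycle'.
    """
    getter = getattr(tmc_module, "get_stealthchop_threshold", None)
    if getter is None:
        return 0.0
    return getter()


class DriverWithThreshold:
    """Stand-in for a driver module that implements the method."""
    def get_stealthchop_threshold(self):
        return 99999.0


class DriverWithoutThreshold:
    """Stand-in for a driver module lacking the method (e.g. tmc2660)."""


print(lookup_stealthchop_threshold(DriverWithThreshold()))
print(lookup_stealthchop_threshold(DriverWithoutThreshold()))
```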

4. (Low) Unsupported kinematics error message is opaque

Velocity callbacks exist only for cartesian, corexy, generic(cartesian), and IDEX. On delta/corexz/deltesian/polar/rotary_delta/winch, calc_smoothed_velocity_cb is NULL, so set_lag_correction correctly fails — but the user sees "Invalid stepper shaper configuration with stealthchop_comp=…", not "unsupported kinematics." Worth a clearer message.

5. (Low) Test doesn't validate behavior

stepper_shaper.test only checks that commands don't crash. It would pass even with the stale-value bug from finding #1, and it wouldn't catch the NaN in #2 (the test never does a reverse SpreadCycle move fast enough). Consider asserting on generated step output.

Minor nits

  • Naming inconsistency: MCU_stepper.set_lag_correction parameter linear_lag_const vs the C/Python name spreadcycle_lag_const.
  • find_move_at_time (stepcorr.c) and get_velocity_across_moves (kin_shaper.c) use likely() on loop conditions that are almost always false; calc_smoothed_velocity uses unlikely() for the same pattern. (Matches the pre-existing get_axis_position_across_moves, so cosmetic.)
  • itersolve now routes every guess through stepcorr_calc_position (extra branch/indirection in the hottest loop) even for printers not using the feature — negligible, but it's a global hot path.

Verdict

Solid, mathematically careful work consistent with the existing codebase, and appropriately marked Draft. The StealthChop path (the PR's focus) is in good shape; the SpreadCycle path is less finished (no ESTIMATE support + the NaN bug). Before merge I'd want #1 and #2 fixed (both are functional correctness bugs), #3 addressed for robustness, and behavioral test coverage added.

@fehr3dp

fehr3dp commented Jun 19, 2026


At a request over Discord, I have tested my platform consisting of LDO-42STH48-2804AH + BTT TMC5160T on my AB motors.

When enabled my config was

[tmc5160 stepper_(x/y)]
cs_pin: PC13
spi_software_sclk_pin: PG8
spi_software_mosi_pin: PG6
spi_software_miso_pin: PG7
interpolate: false
run_current: 2.5
sense_resistor: 0.075
stealthchop_threshold: 99999
diag1_pin: ^!PF4
driver_SGT: 2

[stepper_shaper stepper_(x/y)]
force_stealthchop_comp: True
force_spreadcycle_comp: False
velocity_smooth_time: 0.005
stealthchop_comp: 0.000857
spreadcycle_comp: 0.00
Spread: [3 photos]
Stealth: [3 photos]
With Comp: [3 photos]


6 participants