-
Notifications
You must be signed in to change notification settings - Fork 151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
asm::delay() only blocks half the time intended on an STM32H7 #430
Comments
Oh I forgot to mention that I tried this with 0.7.4 and the current master. In my example repo I patch in the current master. |
Hm, that's annoying, we deliberately double the number of cycles I vaguely recall checking this on an stm32h7 and getting the right result, I'll see if I can dig it out and investigate a bit further. I guess we could read CPUID at runtime to try and work out what CPU we're on, but even then things like whether execution is from flash or RAM or XIP (Q)SPI flash, whether i-cache is on, whether a flash accelerator is configured, etc etc etc will all heavily influence the number of cycles. The current calculation is a two-instruction loop and it runs half the specified number of cycles of iterations, i.e. it will retire as many instructions as requested delay. It might be that the best fix we can do is make the documentation even more vague on precisely how many cycles of delay you'll get... |
Sorry.. ;) But since this actually broke my code which had to wait for an external device reset I think it's an important bug. |
I suspect the only practical solution might indeed be to not touch the scaling factor anymore for behavioral stability/compatibility and make the docs say that this is an uncalibrated delay with both an an offset and a scaling factor that depend on cortex-m architecture and e.g. enabled cpu/code location/cache features. |
I double checked this in #325 (comment) and indeed we currently only delay half the required time on CM7; the fix in #312 doesn't fully account for the core being able to issue both instructions in the loop at the same time. I guess ideally we'd experiment a bit more about when a CM7 core will or won't dual-issue the loop, but if it seems like it does always manage it, I'd be inclined to try doing feature detection and delaying for about the right number of cycles for the current core, rather than being wrong on everything (as currently) or 3x worse on one core than another (if we fixed it for cm7). |
I'd definitely prefer guaranteeing an error on the long delay side rather than trying "about right". For the frequent applications of e.g. guaranteeing datasheet timing a relative error of 3 is acceptable (if annoying) while 0.9 would be unacceptable. |
The
asm::delay()
function delays a program only by half the time/CPU cycles intended on an STM32H7.I have prepared a somewhat minimal example here. My CPU clock is at 400 MHz and I specify a delay of 4_000_000_000 cycles which should lead to a 10 second delay. The actual delay is 5 seconds for me.
I have also confirmed the same behavior in other programs on an STM32H7 with much shorter timescales and looking at IOs on a scope. The delay is always exactly half the time expected.
One guess is that this is due to the dual issue pipeline of the H7 eating up two instructions per cycle.
The text was updated successfully, but these errors were encountered: