weird storage.erase_filesystem() problems on metro rp2350 #10104
Comments
discord stream of consciousness from me: https://discord.com/channels/327254708534116352/327298996332658690/1344430703114321930 |
There is a possible race condition. The global variable |
The problem is reproducible on an Adafruit Feather RP2350 with an HSTX to DVI adapter attached. Reproducing the problem also requires initializing |
Running with stock 9.2.4 problem does not occur. A top of |
I'm beginning to think that the problem is an interaction between DMA read activity feeding the HSTX peripheral and the XIP section while a flash block is being programmed. It's possibly a starvation issue for XIP during flash programming. Going through the Pico SDK and RP2350 bootrom code with a fine-toothed comb, I'm not finding anything that would break from the brief delay introduced by running the IRQ service routine for the framebuffer. Admittedly there remains much hand waving in this explanation. I'm devising a stand-alone test to see if I can eliminate as many variables as possible. In the meantime, I recommend backing out #10049 and explicitly turning off HSTX DMA during flash write operations. |
We could put in a call to release displays if it only affects |
@jepler AFAICT it affects all flash writes. I'm giving myself a crash course on DMA/HSTX/TMDS operation. It's impressively complicated. There is code in |
There's another spot in
We'll want to factor all flash writes into a single function. |
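The idea of funneling every flash write through one function, and pausing display DMA around it, can be sketched on a host machine as below. This is only an illustration of the proposal in this thread, not the fix that landed: `guarded_flash_write`, `pause_display_dma`, `resume_display_dma`, `raw_flash_program`, and the `fake_flash` array are all hypothetical stand-ins, and no Pico SDK calls are used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLASH_SIZE 4096

// Host-side stand-ins (assumptions): a byte array plays the role of flash,
// and a flag plays the role of "HSTX display DMA is running".
static uint8_t fake_flash[FLASH_SIZE];
static bool display_dma_active = true;
static bool dma_was_active_during_write = false;

// Hypothetical helper: stop display DMA and report whether it was running.
static bool pause_display_dma(void) {
    bool was_active = display_dma_active;
    display_dma_active = false;
    return was_active;
}

// Hypothetical helper: restore display DMA to its previous state.
static void resume_display_dma(bool was_active) {
    display_dma_active = was_active;
}

// Stand-in for the low-level program routine. It records whether DMA was
// still running, which is the hazard the thread is trying to avoid.
static void raw_flash_program(uint32_t offset, const uint8_t *data, size_t len) {
    if (display_dma_active) {
        dma_was_active_during_write = true;
    }
    memcpy(&fake_flash[offset], data, len);
}

// The single entry point that every caller (filesystem, nvm, erase) would use.
bool guarded_flash_write(uint32_t offset, const uint8_t *data, size_t len) {
    if (offset > FLASH_SIZE || len > FLASH_SIZE - offset) {
        return false;  // reject writes that would run past the end
    }
    bool was_active = pause_display_dma();
    raw_flash_program(offset, data, len);
    resume_display_dma(was_active);
    return true;
}
```

With one entry point, a guard added for one kind of flash write cannot be forgotten for another, which is the concern raised above about the separate write paths.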
while it's true that the audio dma disable maybe should be done in nvm's write_page as well, this was added to fix bad audio output while interrupts were delayed, not to fix incorrect flash writes or bad xip after flash writes. For more background: |
Tangentially related, the code in |
I would like to do a 9.2.5 release pretty soon, but consider this a blocker. What is the best short-term way forward? Is it
Or are we on the way to a fix? Is there anything to be done with DMA priorities that could improve glitching? |
I've not nailed the root cause. Earlier I wrote that it appeared not to be IRQ related, but on closer examination this may not be correct. The flash write code in the SDK appears to assume that interrupts are completely disabled and the "victim" core is entirely quiescent. There's a very funky window where the bootrom is re-entered to re-initialize XIP that could be hazardous to interrupt. I'm adding a third DMA channel to re-trigger the command DMA channel in framebuffer to eliminate the interrupt, but since I'm also climbing a steep learning curve it's slow going. If someone with more knowledge of RP2350 DMA wants to jump in, please do. |
I think the simplest thing is to disable the HSTX display before doing erase_filesystem. It resets anyway so just turn it off early. Let's deal with other flash writes during HSTX separately. I haven't actually seen it myself so I'm wary it is a big issue. |
Note to self: Constructing a new framebuffer via |
This fix? tannewt@b4675f7#diff-06a92b3a928c9d1ed0731ea128ee37130e527aed7cf4ec3c100b0049de8b49b1R272-R274 I don't see how it leaks DMA channels. |
@tannewt Turns out it wasn't actually leaking DMA channels, it simply wasn't resetting the DMA channel numbers in the framebuffer object so it was attempting to un-claim them twice. It looked like a leak on first glance. Since zero is a valid DMA channel, I changed the channels in framebuffer to |
That's my working branch and has all of my changes. |
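The double un-claim described above can be illustrated with a small host-side sketch. Everything here is an assumption made for illustration: the sentinel name `NO_CHANNEL` and its value, the struct fields, and the `claimed[]` table standing in for the SDK's claim bookkeeping. The value actually used in the working branch is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical sentinel. Zero cannot be used because 0 is a valid channel.
#define NO_CHANNEL (-1)
#define NUM_CHANNELS 16  // illustrative channel count

static bool claimed[NUM_CHANNELS];  // stand-in for the SDK's claim table
static int unclaim_calls;           // counts real un-claim operations

// Hypothetical framebuffer object holding its two DMA channel numbers.
typedef struct {
    int pixel_channel;
    int command_channel;
} framebuffer_t;

// Claim the lowest free channel, or return NO_CHANNEL if none is left.
static int claim_unused_channel(void) {
    for (int i = 0; i < NUM_CHANNELS; i++) {
        if (!claimed[i]) {
            claimed[i] = true;
            return i;
        }
    }
    return NO_CHANNEL;
}

// Un-claim a channel once, then reset the stored number so that a second
// deinit becomes a no-op instead of a double un-claim.
static void release_channel(int *channel) {
    if (*channel == NO_CHANNEL) {
        return;
    }
    assert(claimed[*channel]);
    claimed[*channel] = false;
    unclaim_calls++;
    *channel = NO_CHANNEL;
}

void framebuffer_init(framebuffer_t *fb) {
    fb->pixel_channel = claim_unused_channel();
    fb->command_channel = claim_unused_channel();
}

void framebuffer_deinit(framebuffer_t *fb) {
    release_channel(&fb->pixel_channel);
    release_channel(&fb->command_channel);
}
```

Calling `framebuffer_deinit` twice on the same object un-claims each channel only once, because the stored channel numbers are reset to the sentinel after the first call.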
Found the root cause: In |
Appreciate the sleuthing @eightycc ! |
CircuitPython version and board name
Code/REPL
Behavior
Usually freezes with the white LED on.
With a pico-probe attached, a variety of weird crashes and double faults are observed. For instance, on one occasion it crashed within a function in flash whose content appeared to have been overwritten; but on restart, the content was restored (so a problem with the XIP cache?).
I believe that by trying various revisions I excluded my recent changes to auto-initialize HSTX & to interrupt handling. However, I encourage anyone picking up this issue to double check. Especially the interrupt handling change, which was designed to prevent HSTX display glitches during flash writes, could be related to this....
Description
No response
Additional information
No response