Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve RP2 Network Performance Issue #10027

Merged
merged 4 commits into from
Feb 13, 2025
Merged

Conversation

eightycc
Copy link
Collaborator

@eightycc eightycc commented Feb 4, 2025

This update resolves issue #9837. It makes the following changes:

  1. Updates the lwIP stack to its current top of main. Update is necessary to support fully dynamic memory allocation. This involves re-basing a local change. For now it's on a branch in my private repo: https://github.com/eightycc/lwip/tree/circuitpython9.
  2. Updates lwIP periodic timer configuration from 500/250 ms (slow/fast) to 50/25 ms. This improves network performance where the large timer intervals were inducing retries due to timeouts.
  3. Configures lwIP to use dynamically allocated memory throughout. Uses tlsf as its memory manager.
  4. Re-orders RP2 port initialization so that the tlsf heap is initialized early so that lwIP initialization can use it.

…rval.

Update lwIP to current top of tree with local changes.
@eightycc
Copy link
Collaborator Author

eightycc commented Feb 4, 2025

@bablokb Forgot the pre-commit hook in my repo, but the build artifacts should be fine. Please give it a try.

@tannewt
Copy link
Member

tannewt commented Feb 5, 2025

Ok, I've merged it into adafruit/lwip. Looks like they are about to release 2.2.1 too.

@anecdata
Copy link
Member

anecdata commented Feb 5, 2025

Using PR artifacts, simple non-TLS HTTP response on Pico W HTTP is now consistently under 50ms (ranged from 30-250ms with 9.2.4), on Pimoroni Pico Plus 2 W it's now 15-25ms (also ranged from 30-250ms with 9.2.4).

HTTPS on Pimoroni Pico Plus 2 W is now 1-2sec. Slight improvement from 1.5-2.25sec.

HTTPS on Pico W now times out rather than getting MemoryError:.

Measurements are approximate: they are from the server-reported times with httpserver debug turned on.

@eightycc
Copy link
Collaborator Author

eightycc commented Feb 5, 2025

@anecdata Thank you for testing this out. Much appreciated. Where this update should also help is with stability and throughput. Switching to all dynamic allocations for lwIP reduces the static RAM size from 116K to 72K, good for the memory-constrained RP2040. Of course lwIP will consume memory for control blocks and buffers, but how much memory that costs depends on the app, not on a guess of the size of a given static allocation pool. We were blowing through some of the static allocations in some cases causing lwIP to slow way down.

@bablokb
Copy link

bablokb commented Feb 6, 2025

Tested this with the original setup. Page load with 9 requests is now down to about 1.2s (ranging from 1.1-1.8, repeated loads are mostly below 1.2s). So this is a huge improvement.

I also repeated the iperf3-test. Throughput is up from 389 Kbits/sec (see #9837) to 3.95 Mbits/sec now.

Do you see any chance that I could backport this to 8.0.5? I know this is a very old version, but I need it due to memory constrains (independent of network use). 8.0.5 has 152K to start with, while 9.2.4 has 128K.

> iperf3 -l 4K -R -c 192.xxx.x.xxx
Connecting to host 192.xxx.x.xxx, port 5201
Reverse mode, remote host 192.xxx.x.xxx is sending
[  5] local 192.xxx.x.xxx port 55632 connected to 192.xxx.x.xxx port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   772 KBytes  6.30 Mbits/sec                  
[  5]   1.00-2.00   sec   828 KBytes  6.78 Mbits/sec                  
[  5]   2.00-3.06   sec   456 KBytes  3.54 Mbits/sec                  
[  5]   3.06-4.08   sec   444 KBytes  3.58 Mbits/sec                  
[  5]   4.08-5.08   sec   388 KBytes  3.17 Mbits/sec                  
[  5]   5.08-6.00   sec   424 KBytes  3.77 Mbits/sec                  
[  5]   6.00-7.06   sec   360 KBytes  2.78 Mbits/sec                  
[  5]   7.06-8.01   sec   360 KBytes  3.12 Mbits/sec                  
[  5]   8.01-9.00   sec   392 KBytes  3.23 Mbits/sec                  
[  5]   9.00-10.09  sec   440 KBytes  3.31 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.12  sec  4.76 MBytes  3.95 Mbits/sec    0             sender
[  5]   0.00-10.09  sec  4.75 MBytes  3.95 Mbits/sec                  receiver

@eightycc
Copy link
Collaborator Author

eightycc commented Feb 6, 2025

@bablokb Thank you the tests! This update depends on tlsf so it won't work with CP versions earlier than 9.0.0. If you haven't already, give this update a try on the Pico W. It reduces the static RAM footprint substantially, so hopefully your app will fit without resorting to backporting.

@eightycc
Copy link
Collaborator Author

eightycc commented Feb 6, 2025

@tannewt Please move this PR out of draft status and review.

@eightycc eightycc marked this pull request as ready for review February 6, 2025 14:22
@eightycc
Copy link
Collaborator Author

eightycc commented Feb 6, 2025

The following issues may benefit from this PR: #9576, #8017, #9124, #8799, #7318, #9150, #9127, #8964, and #8115.

@bablokb
Copy link

bablokb commented Feb 6, 2025

If you haven't already, give this update a try on the Pico W. It reduces the static RAM footprint substantially, so hopefully your app will fit without resorting to backporting.

Good point. Just checked: the PR-artifact has 128K available, 9.2.4 has 71K available but 9.0.0 was at 125K. What in the hell did they add in between?! The 125K from 9.0.0 were about 7K short (even without any networking, wifi never imported). So maybe if I can find and remove the bloat from 9.0.0 to 9.2.4 it might even work. Going back to nm and objdump...

@eightycc
Copy link
Collaborator Author

eightycc commented Feb 6, 2025

@bablokb That's interesting. There could be another library using static allocations that got added or reconfigured after 9.0.0.

@dhalbert
Copy link
Collaborator

dhalbert commented Feb 6, 2025

According to https://git.savannah.gnu.org/cgit/lwip.git, it looks like lwip 2.2.1 was released yesterday??

@eightycc
Copy link
Collaborator Author

eightycc commented Feb 6, 2025

According to https://git.savannah.gnu.org/cgit/lwip.git, it looks like lwip 2.2.1 was released yesterday??

Crazy coincidence or preternatural foresight?

@bablokb
Copy link

bablokb commented Feb 7, 2025

That's interesting. There could be another library using static allocations that got added or reconfigured after 9.0.0.

I retested and it turned out to be a wrong assumption about what happens when entering the REPL. I used this code:

import gc
print(f"free memory: {gc.mem_free()}")

from the REPL and from code.py. Depending on what happened before you enter the REPL (with CTRL-C), results are extremely different. So running from code.py 9.2.4 gives me 138K and your PR 183K which should be enough for my needs. The 128K cited above for the PR-artifact was after my iperf-tests, the 71K for 9.2.4 was after a different test using wifi and a display.

@eightycc
Copy link
Collaborator Author

eightycc commented Feb 7, 2025

@anecdata The RP2350 has hardware acceleration for SHA-256 controlled by MBEDTLS_SHA256_ALT, but it doesn't appear to be active in CircuitPython builds for the RP2350. This will impact HTTPS timings. I'll look into it more deeply and submit a PR if necessary.

@tannewt
Copy link
Member

tannewt commented Feb 7, 2025

What did they add in between?!

USB Host was added and it has code that needs to live in RAM (so it can run while flash accesses happen). We don't load it when we need it so it's always there.

@bablokb
Copy link

bablokb commented Feb 7, 2025

What did they add in between?!

USB Host was added and it has code that needs to live in RAM (so it can run while flash accesses happen). We don't load it when we need it so it's always there.

The difference I noticed was actually due to the fact that the REPL does not garbage-collect before it starts - something I wasn't aware off. So nothing was added, the lower RAM was only due to whatever ran before the REPL started.

@bablokb
Copy link

bablokb commented Feb 12, 2025

I retested the newest artifact in my target setup. Works great. This PR is a game-changer. It does not only change network performance, but with this PR there is also much more memory available. Until now I was stuck with 8.0.5 with the Pico-W, because all versions afterwards did not provide enough memory, but with this PR I can leave 8.0.5 behind.

Definitely the MVP (most-valuable PR) award from my side. Looking forward to see this in a release.

@eightycc eightycc requested a review from jepler February 12, 2025 11:31
Copy link
Member

@tannewt tannewt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you!

jepler pushed a commit to jepler/circuitpython that referenced this pull request Mar 28, 2025
Side-setting can also be used to change pin directions instead of pin
values.  This adds a parameter `side_pindir` to decorator `asm_pio()` to
configure it.

Also replaces a few close-by 0s with corresponding PIO.* constants.

Addresses issue adafruit#10027.

Signed-off-by: Markus Gyger <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants