Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DO NOT COMMIT test for u55 + mv2 #9830

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

DO NOT COMMIT test for u55 + mv2 #9830

wants to merge 1 commit into from

Conversation

digantdesai
Copy link
Contributor

Summary

[PLEASE REMOVE] See CONTRIBUTING.md's Pull Requests for ExecuTorch PR guidelines.

[PLEASE REMOVE] If this PR closes an issue, please add a Fixes #<issue-id> line.

[PLEASE REMOVE] If this PR introduces a fix or feature that should be the upcoming release notes, please add a "Release notes: " label. For a list of available release notes labels, check out CONTRIBUTING.md's Pull Requests.

Test plan

[PLEASE REMOVE] How did you test this PR? Please write down any manual commands you used and note down tests that you have written if applicable.

Copy link

pytorch-bot bot commented Apr 2, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9830

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit e466669 with merge base bcf4b46 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 2, 2025
Copy link

github-actions bot commented Apr 2, 2025

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@digantdesai
Copy link
Contributor Author

@zingo - If I run MV2 locally I get a different number for ethos PMU, any clue on how to start debugging? I believe George also confirmed that something odd is going on after comparing two runs.

@zingo
Copy link
Collaborator

zingo commented Apr 4, 2025

@zingo - If I run MV2 locally I get a different number for ethos PMU, any clue on how to start debugging? I believe George also confirmed that something odd is going on after comparing two runs.

When you run it locally do you use run.sh or
python3 backends/arm/test/test_model.py --test_output=arm_test/test_model --target=ethos-u55-128 --model=mv2 --extra_flags="-DET_ATOL=2.00 -DET_RTOL=2.00" --build_libs --memory_mode Sram_Only

( --memory_mode Sram_Only seem to be missing from at least this version of the patch)

But I get the same big number 8M with it so looking in the lor there is probably a bug in the backends/arm/test/test_model.py test flow as I spot that Vela is using Sram_Only BUT the elf is build with Shared_Sram

So until we fix this run.sh would work best for testing (or running all sub scrips directly)

@zingo
Copy link
Collaborator

zingo commented Apr 4, 2025

Found the bug and a PR is on it's way :) With it I now see 5.5M NPU cycles on mv2 when using Sram_Only with test_model.py also so it should match run.sh testing. Thanks for testing and reporting this :)

@zingo
Copy link
Collaborator

zingo commented Apr 4, 2025

I hope this PR will fix the problem you got.
#9902

@gggekov
Copy link
Collaborator

gggekov commented Apr 4, 2025

Thanks @zingo @digantdesai !
For completeness, the performance on mobilenet_v2 on U55-128 on the Corstone-300 is:

  • Shared Sram memory mode, Ethos_U55_Deep_Embedded system config: 8.3M NPU active cycles
  • Shared Sram memory mode, Ethos_U55_High_End_Embedded system config: 6.7M NPU active cycles
  • Sram Only memory mode, Ethos_U55_Deep_Embedded and Ethos_U55_High_End_Embedded - 5.5M NPU active cycles

For Shared_Sram, we see a difference in the performance between the Ethos_U55_Deep_Embedded and the Ethos_U55_High_End_Embedded system configs and that is expected. This is because the memory bandwidth and latency are different between the two system configurations. With the deeply embedded timing adapter settings, the NPU perceives bus bandwidth of 4bits/clock cycle(=250MB/s if you are @ 500MHz) on the external memory and for the High_End_Embedded, the bus bandwidth is 8bits/cc(or 500MB/s if the NPU is clocked at 500MHz) for the external memory. Note that even though the performance is different for Shared_Sram(8.3M vs 6.7M cycles), the PMU counters for amount of beats read on the AXI0/AXI1 interfaces are the same. In other words, the NPU reads the same amount of data between the deeply embedded & high end embedded system configs. This is again expected because we run the same model, just in the deeply embedded case the NPU has to wait longer for the data to arrive due to the lower bandwidth on the Flash.

As for Sram_Only, we place everything in the SRAM and the SRAM bandwidth is the same between Deeply Embedded & High End systems, therefore we see identical performance.

@zingo zingo added the module: arm Issues related to arm backend label Apr 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. module: arm Issues related to arm backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants