Bad hostname after going to the REPL when using mdns #10048

Open
Neradoc opened this issue Feb 10, 2025 · 11 comments
Labels
bug espressif applies to multiple Espressif chips web workflow
Comments

@Neradoc

Neradoc commented Feb 10, 2025

CircuitPython version and board name

Adafruit CircuitPython 9.2.4-11-g9eeb5d93e7 on 2025-02-05; Adafruit QT Py ESP32C3 with ESP32-C3FN4
Adafruit CircuitPython 9.2.4 on 2025-01-29; Seeed Studio XIAO ESP32C3 with ESP32-C3FN4
Adafruit CircuitPython 9.0.0 on 2024-03-19; Seeed Studio XIAO ESP32C3 with ESP32-C3FN4

Code/REPL

import mdns
import wifi
mdnserv = mdns.Server(wifi.radio)
mdnserv.hostname = "test-name"

Behavior

When the mDNS hostname is set just before the code ends, it ends up scrambled with random data (it looks like a dangling pointer). Not leaving code.py avoids this, as does, seemingly, adding a few seconds of sleep before the end. Even with a 5-second sleep, though, I still sometimes get a bad mDNS hostname after a power cycle.

I could not repro on an S2 or S3 QTPY, it seems limited to the C3.
It happens with 9.0.0 but not 8.2.10

  • The bad hostname does not seem to affect the actual hostname used as reported by a mdns scanner.
  • It makes the web workflow version.json (accessed via IP) be invalid json if the characters are bad.
  • wifi.radio.hostname however still reports the default hostname. For example: cpy-d_xiao_esp32c3-d4f98d03fbec

There seem to be other issues related to mdns since 9.0. They don't appear to be the same issue, though they might have the same underlying cause.

Description

No response

Additional information

Board properly reported from the home page of another board as test-name:
Image

Default hostname reported by the wifi module:

Adafruit CircuitPython 9.2.4-44-g255eea9b9c on 2025-02-09; Seeed Studio XIAO ESP32C3 with ESP32-C3FN4
>>> import wifi
>>> wifi.radio.hostname
'cpy-d_xiao_esp32c3-d4f98d03fbec'
>>> 

But has a bad name in web workflow:
Image

If the name causes a bad json:
Image

Uncaught (in promise) SyntaxError: JSON.parse: bad control character in string literal at line 1 column 297 of the JSON data

One such json (varies on reloads):

{"web_api_version": 4, "version": "9.2.4-44-g255eea9b9c", "build_date": "2025-02-09", "board_name": "Seeed Studio XIAO ESP32C3", "mcu_name": "ESP32-C3FN4", "board_id": "seeed_xiao_esp32c3", "creator_id": 796806, "creation_id": 12779521, "hostname": "be8d94e303e36182432040d60379fbc2592cf3ca3fa158�tp=UDP�vn=65537
vs=845.6.1�ov=18.3�vv=1�Z", "port": 80, "UID": "4D9FD830BFCE", "ip": "192.168.1.44"}
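
For reference, the parse error above can be reproduced off-device. The sketch below is plain desktop Python, not CircuitPython firmware code, and the corrupted hostname is made up for illustration. It suggests that a raw control character inside a JSON string is rejected by a strict parser, while the same value parses once it is escaped.

```python
import json

# Hypothetical corrupted hostname containing a raw control character (0x01).
bad_hostname = "test\x01name"

# Building the document by plain string formatting leaves the control
# character unescaped inside the string literal.
raw_doc = '{"hostname": "%s", "port": 80}' % bad_hostname


def parses(text):
    """Return True if text is valid JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# json.dumps escapes the control character as \u0001.
escaped_doc = json.dumps({"hostname": bad_hostname, "port": 80})

print(parses(raw_doc))      # False
print(parses(escaped_doc))  # True
```

Escaping would only make version.json parseable again; the hostname value itself would still be wrong.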

@Neradoc
Author

Neradoc commented Feb 11, 2025

I have revised my tests, and it now seems that the issue manifests on the S2 and S3 too, as soon as you enter the REPL. I missed that previously, apparently because I relied on stopping the code without pressing a key to enter the REPL. This happened regardless of timing.

Before code.py

{"web_api_version": 4, "version": "9.2.4-42-ge19ff435d2", "build_date": "2025-02-08", "board_name": "Adafruit QT Py ESP32S2", "mcu_name": "ESP32S2", "board_id": "adafruit_qtpy_esp32s2", "creator_id": 9114, "creation_id": 33042, "hostname": "cpy-t_qtpy_esp32s2-58cf79ab206a", "port": 80, "UID": "85FC97BA02A6", "ip": "192.168.1.15"}

While code.py is running or after ctrl-C

{"web_api_version": 4, "version": "9.2.4-42-ge19ff435d2", "build_date": "2025-02-08", "board_name": "Adafruit QT Py ESP32S2", "mcu_name": "ESP32S2", "board_id": "adafruit_qtpy_esp32s2", "creator_id": 9114, "creation_id": 33042, "hostname": "test-name", "port": 80, "UID": "85FC97BA02A6", "ip": "192.168.1.15"}

After entering the REPL (here the name is empty)

{"web_api_version": 4, "version": "9.2.4-42-ge19ff435d2", "build_date": "2025-02-08", "board_name": "Adafruit QT Py ESP32S2", "mcu_name": "ESP32S2", "board_id": "adafruit_qtpy_esp32s2", "creator_id": 9114, "creation_id": 33042, "hostname": "", "port": 80, "UID": "85FC97BA02A6", "ip": "192.168.1.15"}

After letting this run in the REPL for a couple of seconds

while True: print("hello")

{"web_api_version": 4, "version": "9.2.4-42-ge19ff435d2", "build_date": "2025-02-08", "board_name": "Adafruit QT Py ESP32S2", "mcu_name": "ESP32S2", "board_id": "adafruit_qtpy_esp32s2", "creator_id": 9114, "creation_id": 33042, "hostname": "t("hello")
"
, "port": 80, "UID": "85FC97BA02A6", "ip": "192.168.1.15"}

That issue with version.json might be a specific Web Workflow issue that could be mitigated there. Or it might be an issue with the mdns server being deinitialized on reload while it is still supposed to be in use by the web workflow.
Also, retrieving seemingly arbitrary memory might be a security issue?
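
One possible shape for a Web Workflow side mitigation is to validate the hostname before reporting it and fall back to a known-good default. This is only a sketch in desktop Python; `safe_hostname` and the allowed character set are assumptions for illustration, not existing CircuitPython code. Underscore is allowed here because the default hostnames shown in this thread contain one.

```python
import re

# Assumed rule: 1 to 63 characters from letters, digits, hyphen, underscore.
_HOSTNAME_RE = re.compile(r"[A-Za-z0-9_-]{1,63}")


def safe_hostname(candidate, fallback):
    """Return candidate if it looks like a sane hostname, else fallback."""
    if isinstance(candidate, str) and _HOSTNAME_RE.fullmatch(candidate):
        return candidate
    return fallback


default = "cpy-t_qtpy_esp32s2-58cf79ab206a"
print(safe_hostname("test-name", default))       # test-name
print(safe_hostname("", default))                # falls back to the default
print(safe_hostname('t("hello")\n', default))    # falls back to the default
```

This would hide the symptom in version.json but would not address whatever is overwriting the stored name.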

@Neradoc Neradoc added espressif applies to multiple Espressif chips and removed esp32-c3 labels Feb 11, 2025
@Neradoc Neradoc changed the title Bad hostname when using mdns just before the end of code Bad hostname after going to the REPL when using mdns Feb 12, 2025
@veleek

veleek commented Feb 25, 2025

I'm seeing the exact same behavior. Oftentimes the hostname will be a single control character (I've seen 0x01: SOH - Start of Header, and 0x02: STX - Start of Text) as well as random strings as shown above.

Image

This prevents me from loading the web workflow code editor. Depending on the state, I can occasionally just refresh the browser (WITHOUT restarting the device) and the hostname will change, giving me a hostname without an invalid character. But usually, when it's set to SOH or STX, refreshing does not change the hostname and I need to manually reset the device.

@veleek

veleek commented Feb 27, 2025

The version.json file is populated by _reply_with_version_json, and the hostname is only set in one place. common_hal_mdns_server_get_hostname (which also has a Raspberry Pi implementation that we can probably ignore) just returns data.

Eventually, in mdns_server_construct there's a call into esp_netif_get_hostname (this is just a random implementation in the adafruit repo, which is a fork of the Espressif repo). I don't see any obvious reported issues with the code, and I think the return value of that function just comes from what's passed in to esp_netif_set_hostname.

Okay, now we're getting somewhere. We call esp_netif_set_hostname in:

@veleek

veleek commented Feb 27, 2025

Wait, just noticed that the logic around how the hostname is populated was changed in ed0e640, so maybe this is a question for @tannewt?

@tannewt
Member

tannewt commented Feb 27, 2025

I suspect this is a mismatch of my expectations of how the IDF works. I thought it duplicated the given string and then held onto it: https://github.com/espressif/esp-idf/blob/0461e2ff88369c3da0d4caced31e8488f53376cd/components/esp_netif/lwip/esp_netif_lwip.c#L1689

However, there is some tricky task stuff here that may mean we sometimes corrupt the hostname before it is copied. Maybe we need to read it back until it matches after setting it?
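
The "read it back until it matches" idea could look roughly like the following. This is a desktop Python sketch of the control flow only; `set_hostname_verified` and `FlakyNetif` are hypothetical stand-ins, not IDF or CircuitPython APIs, and the real check would live in C next to the esp_netif_set_hostname call.

```python
def set_hostname_verified(set_fn, get_fn, hostname, attempts=5):
    """Set the hostname, read it back, and retry until it matches.

    Returns the number of attempts used; raises RuntimeError if the
    value never reads back correctly.
    """
    for attempt in range(1, attempts + 1):
        set_fn(hostname)
        if get_fn() == hostname:
            return attempt
    raise RuntimeError("hostname did not read back as %r" % hostname)


class FlakyNetif:
    """Hypothetical interface whose first few writes store garbage."""

    def __init__(self, bad_writes):
        self._bad_writes = bad_writes
        self._stored = ""

    def set_hostname(self, name):
        if self._bad_writes > 0:
            self._bad_writes -= 1
            self._stored = "\x01"  # simulate a corrupted value
        else:
            self._stored = name

    def get_hostname(self):
        return self._stored


netif = FlakyNetif(bad_writes=2)
tries = set_hostname_verified(netif.set_hostname, netif.get_hostname, "test-name")
print(tries, netif.get_hostname())  # 3 test-name
```

A read-back only helps if the corruption happens at set time. If the stored name is overwritten later (for example after entering the REPL), the check would pass and the name could still go bad.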

@veleek

veleek commented Feb 28, 2025

Okay just so I'm on the same page, we're looking at:

char cpy_default_hostname[board_len + (MAC_ADDRESS_LENGTH * 2) + 6];
uint8_t mac[MAC_ADDRESS_LENGTH];
esp_wifi_get_mac(ESP_IF_WIFI_STA, mac);
snprintf(cpy_default_hostname, sizeof(cpy_default_hostname), "cpy-%s-%x", CIRCUITPY_BOARD_ID + board_trim, (unsigned int)mac);
const char *default_lwip_local_hostname = cpy_default_hostname;
ESP_ERROR_CHECK(esp_netif_set_hostname(self->netif, default_lwip_local_hostname));

You're saying that the call to esp_netif_set_hostname may trigger some asynchronous process to set the hostname, so by the time it uses the provided char* (pointing to cpy_default_hostname), the buffer has gone out of scope and is being overwritten by something? Yeah, I could see that causing a problem, but it's not clear to me where in the IDF code that forking actually happens.
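
To make the suspected failure mode concrete, here is a desktop Python analogy (not the firmware code). A `bytearray` stands in for the stack buffer, a `memoryview` for a stored char*, and `bytes(...)` for a private copy such as strdup would make. All names are illustrative.

```python
# The caller's temporary buffer, standing in for a C stack array.
stack_buf = bytearray(b"test-name\x00")

# Option 1: keep only a reference to the caller's memory (like storing the char*).
held_view = memoryview(stack_buf)

# Option 2: take a private copy at set time (like strdup).
held_copy = bytes(stack_buf)

# Later, the caller's memory is reused for something unrelated.
stack_buf[:] = b't("hello")'

print(bytes(held_view))  # b't("hello")'
print(held_copy)         # b'test-name\x00'
```

The held reference now shows whatever was written last, which resembles the `t("hello")` hostname seen in version.json, while the copy is unaffected.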

@veleek

veleek commented Feb 28, 2025

The underlying async call looks like it would be happening here: https://github.com/espressif/esp-idf/blob/0461e2ff88369c3da0d4caced31e8488f53376cd/components/esp_netif/lwip/esp_netif_lwip.c#L212-L227
But it looks like that should still block until it receives a result. I don't see anything that wouldn't require the return value that it gets from eventually executing the callback.
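
The blocking pattern described here can be modeled as below. This is a desktop Python sketch with hypothetical names (`Worker`, `call_blocking`), not the IDF implementation. The caller posts a request to another thread and waits for the result, so the caller's buffer stays valid for the whole call as long as the callee copies it before returning.

```python
import queue
import threading


class Worker:
    """Stand-in for a network task that owns the hostname state."""

    def __init__(self):
        self.hostname = None
        self._requests = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            func, arg, done, result = self._requests.get()
            result.append(func(arg))
            done.set()  # wake the blocked caller

    def call_blocking(self, func, arg):
        """Run func(arg) on the worker thread and wait for its return value."""
        done = threading.Event()
        result = []
        self._requests.put((func, arg, done, result))
        done.wait()
        return result[0]

    def set_hostname(self, buf):
        # Copy while the caller is still blocked, so buf is still valid.
        self.hostname = bytes(buf)
        return 0


worker = Worker()
buf = bytearray(b"test-name")
rc = worker.call_blocking(worker.set_hostname, buf)
buf[:] = b"XXXXXXXXX"  # caller reuses its buffer after the call returns
print(rc, worker.hostname)  # 0 b'test-name'
```

Under this model the stored name survives the caller reusing its buffer, which matches the reading that the IDF call "should work" if it both blocks and copies.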

@tannewt
Member

tannewt commented Feb 28, 2025

Yup, that was my conclusion too. It looks like it should work. :-) Maybe this is an IDF bug.

Maybe LWIP does a second delay. I know the MDNS code does this a lot, where it issues a message to another thread to set a value. Reading the value doesn't go through a message; it reads memory directly.
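
A small model of that set-by-message, read-directly asymmetry, in desktop Python with made-up names (`MdnsLike`, `release_worker`); it is not the MDNS component's code. The setter only queues a request, so a read issued right after the set can still see the old value.

```python
import queue
import threading


class MdnsLike:
    """Hypothetical service: writes go through a queue, reads hit memory."""

    def __init__(self):
        self._hostname = "old-name"
        self._actions = queue.Queue()
        self._gate = threading.Event()  # lets the demo delay the worker
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            name = self._actions.get()
            self._gate.wait()          # simulate the worker being busy
            self._hostname = name
            self._actions.task_done()

    def set_hostname(self, name):
        self._actions.put(name)        # returns immediately

    def get_hostname(self):
        return self._hostname          # direct read, no message

    def release_worker(self):
        self._gate.set()
        self._actions.join()           # wait until the queued set is applied


m = MdnsLike()
m.set_hostname("test-name")
before = m.get_hostname()
m.release_worker()
after = m.get_hostname()
print(before, after)  # old-name test-name
```

This only shows a stale read. It would explain a delayed update, not by itself a hostname made of unrelated bytes.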

@veleek

veleek commented Mar 3, 2025

I've never built and run my own circuitpython firmware. How difficult is the process if I wanted to do something simple like add a check right after the esp_netif_set_hostname call to verify that it was actually set?

@dhalbert
Collaborator

dhalbert commented Mar 3, 2025

It's not that hard if you are familiar with software development on Linux. See https://learn.adafruit.com/building-circuitpython.

@veleek

veleek commented Mar 9, 2025

I've made it a bit farther but I'm not sure what the next steps are. I've got everything building and I was hoping to use the ESP_LOG library. I've added some logs but I'm not sure how I can capture this output. For reference: fed68b4

I'm using an Adafruit QT Py ESP32-S3 (no PSRAM), https://circuitpython.org/board/adafruit_qtpy_esp32s3_nopsram/, and this is the only CircuitPython device I currently have.

Any suggestions?

4 participants