
Command acknowledgement deadlock inside the publish loop #11

@nik-markovic

When a command is received while telemetry is being published, the following deadlock can occur:

  • iotc_nrf_mqtt_publish(telemetry) [ sys_mutex_lock(&mutex_mqtt_pub, K_FOREVER); ]
  • iotc_nrf_mqtt_loop();
  • mqtt_handle_rx()
  • event_notify()
  • mqtt_evt_handler(MQTT_EVT_PUBLISH)
  • config->data_cb()
  • iotc_on_mqtt_data()
  • iotcl_process_event()
  • iotc_process_callback()
  • on_command()
  • iotconnect_sdk_send_packet(ack) [ attempts to take mutex_mqtt_pub, which is still held by iotc_nrf_mqtt_publish() above ]

Something along this path eventually releases it, logging "get_event_payload: EAGAIN", but only after a long time.

I am also concerned about calling iotc_nrf_mqtt_loop() from inside iotc_nrf_mqtt_publish(). Because the loop can also deliver an inbound MQTT message from the broker, we may use too much stack, and this could even cause recursion (publish → loop → command callback → publish). iotc_nrf_mqtt_loop() should probably be called only from main, so that stack usage can be estimated more reliably. I am not sure how much impact this change would have on the MQTT confirmation code.
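One possible shape for a main-only loop, sketched under assumptions: publish() no longer calls the MQTT loop, and on_command() queues its ack in a hypothetical single-slot buffer that main drains after the loop returns, so no publish is attempted while the publish lock is held. All names here are hypothetical, and the effect on the MQTT confirmation code is not modeled.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical single-slot buffer for an ack queued by on_command(). */
static char pending_ack[64];
static bool ack_pending = false;

/* Counters exposed for inspection. */
int acks_sent = 0;
int nested_lock_attempts = 0;

/* Stand-in for mutex_mqtt_pub. */
static bool pub_lock_held = false;

/* Publish path: takes the lock, sends, releases. It does NOT run the MQTT
 * loop, so no inbound callback can execute while the lock is held. */
static void publish(const char *payload)
{
    if (pub_lock_held) {
        nested_lock_attempts++; /* this would be the deadlock from the issue */
        return;
    }
    pub_lock_held = true;
    (void)payload; /* the actual MQTT publish would go here */
    pub_lock_held = false;
}

/* Receive-side command handler: queue the ack instead of publishing it. */
static void on_command(const char *ack)
{
    strncpy(pending_ack, ack, sizeof(pending_ack) - 1);
    pending_ack[sizeof(pending_ack) - 1] = '\0';
    ack_pending = true;
}

/* Stand-in for iotc_nrf_mqtt_loop(); simulates one inbound command. */
static void mqtt_loop(bool command_arrived)
{
    if (command_arrived) {
        on_command("{\"ack\":\"ok\"}");
    }
}

/* One iteration of main: the only place the MQTT loop runs. */
void main_loop_iteration(bool command_arrived)
{
    publish("telemetry");
    mqtt_loop(command_arrived);
    if (ack_pending) { /* send the queued ack outside any callback */
        ack_pending = false;
        publish(pending_ack);
        acks_sent++;
    }
}
```

With a single slot, a second command arriving before main drains the buffer would overwrite the first ack; a deeper queue (for example a Zephyr k_msgq) would avoid that.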

@syjen Assigning to you, but feel free to reassign this ticket to someone else on your team.
