IAS Zone zone_state cached as Not_enrolled (0) in appdb due to race with proactive enroll_response #759

@TheJulianJES

Summary

The IAS Zone zone_state attribute (0x0000) is sometimes persisted as Not_enrolled (0) in the appdb / diagnostics dumps, even when the device is actually enrolled and reporting status normally. Because the cached value is reused on every restart (only_cache=True), the bad value sticks indefinitely.

Root cause

In IASZoneClusterHandler.async_configure() (zha/zigbee/cluster_handlers/security.py:361-400):

  1. await self.bind()
  2. await self.write_attributes_safe({cie_addr: ieee}) — per ZCL spec, writing CIE address can transition the device back to Not_enrolled until enrollment completes
  3. self._cluster.create_catching_task(self.enroll_response(...)) (fire-and-forget, not awaited)
  4. Returns

Then async_initialize() runs, and zone_state (in ZCL_INIT_ATTRS with cached=True) is read from the device because the cache is empty (fresh pair) or wiped (re-interview). The read can race the proactive enroll_response, so the device returns 0 (Not_enrolled). That value is cached, persisted to appdb via AttributeReadEvent, and reused forever.
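
The ordering problem described above can be reproduced without any Zigbee hardware. The following is a minimal sketch, not ZHA code: `FakeDevice`, `configure_then_initialize`, and the `await_enroll` flag are hypothetical names, and the assumption that writing the CIE address resets the device to Not_enrolled is taken from the description above.

```python
import asyncio

NOT_ENROLLED, ENROLLED = 0, 1


class FakeDevice:
    """Hypothetical stand-in for an IAS Zone device (not a zigpy class)."""

    def __init__(self) -> None:
        self.zone_state = ENROLLED

    async def write_cie_addr(self) -> None:
        # Assumption from the issue text: writing the CIE address can put
        # the device back into Not_enrolled until enrollment completes.
        self.zone_state = NOT_ENROLLED

    async def enroll_response(self) -> None:
        await asyncio.sleep(0.01)  # simulated radio round trip
        self.zone_state = ENROLLED

    async def read_zone_state(self) -> int:
        return self.zone_state


async def configure_then_initialize(await_enroll: bool) -> int:
    """Return the zone_state value that ends up in the cache."""
    device = FakeDevice()
    cache: dict[str, int] = {}

    # "async_configure": write CIE address, then schedule the enroll response.
    await device.write_cie_addr()
    task = asyncio.create_task(device.enroll_response())
    if await_enroll:
        await task  # the variant that waits for enrollment first

    # "async_initialize": the cache is empty, so read from the device.
    cache["zone_state"] = await device.read_zone_state()

    await task  # let the background task finish before returning
    return cache["zone_state"]


racy = asyncio.run(configure_then_initialize(await_enroll=False))
safe = asyncio.run(configure_then_initialize(await_enroll=True))
print(f"fire-and-forget cached {racy}, awaited cached {safe}")
```

In this model the fire-and-forget variant always caches 0 because the read runs before the scheduled task gets a turn. On real hardware the outcome depends on timing, which matches the "sometimes" in the summary.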

ZHA's cluster_command handler for the device-initiated enroll command (security.py:353-359) sends enroll_response but does not trigger a re-read of zone_state, so even when enrollment subsequently completes, the cached value isn't refreshed.

Why both fresh pair and re-interview are affected

  • Fresh pair: Cache is empty → reads from device → race. After successful enrollment via the device's own enroll command, no re-read happens, so the stale 0 persists in the appdb.
  • Re-interview: zigpy's _device_reinterviewed cascade-deletes the old attribute cache, so the new shadow device starts with empty cache and goes through the same race. Worse, re-interview re-writes cie_addr on a device that was already enrolled, which on spec-compliant devices forces them back to Not_enrolled until the new enroll handshake completes — and the device usually doesn't initiate a new enroll after re-interview, so the proactive enroll_response is the only path back. If the read beats it, 0 sticks again.
  • Restart: async_initialize(from_cache=True) reads zone_state cache-only, so a previously-bad value is never re-validated.
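
Why the bad value survives restarts can be shown with a simplified model of a cache-first attribute read. This is a sketch under assumptions: `read_zone_state` and its `only_cache` parameter are hypothetical and only mirror the behaviour described in the bullets above, not the actual zigpy implementation.

```python
from typing import Optional

NOT_ENROLLED, ENROLLED = 0, 1


def read_zone_state(cache: dict, device_value: int, only_cache: bool) -> Optional[int]:
    """Hypothetical cache-first read: the device is asked only on a cache miss."""
    if "zone_state" in cache:
        return cache["zone_state"]  # a cached value is never re-validated
    if only_cache:
        return None  # cache miss, and the device must not be queried
    cache["zone_state"] = device_value  # persist whatever the device said
    return device_value


cache: dict = {}

# Pairing: the read loses the race, so the device still reports Not_enrolled.
at_pairing = read_zone_state(cache, device_value=NOT_ENROLLED, only_cache=False)

# Later restart: the device is enrolled by now, but only the cache is consulted.
after_restart = read_zone_state(cache, device_value=ENROLLED, only_cache=True)

print(at_pairing, after_restart)
```

Both reads return 0 in this model, even though the device reports 1 by the time of the restart.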

Note

ZHA itself doesn't consume zone_state for any entity — it's effectively diagnostics-only data, but it's misleading in dumps and bug reports.

Possible fixes

  1. await the enroll response task instead of create_catching_task, then re-read zone_state so the post-enroll value is what gets cached.
  2. Re-read zone_state in the response task itself — wrap the proactive enroll into a small helper that sends the response and then reads the attribute back.
  3. Re-read zone_state in cluster_command when the device initiates an enroll, so even devices that take their time to send enroll get a fresh value cached.
  4. Drop zone_state from ZCL_INIT_ATTRS — nothing in ZHA uses it, and on-demand reads would avoid persisting stale state.

Likely the cleanest is a combination of (2) and (3): always re-read after replying with enroll_response.
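
A rough sketch of that combination follows. `FakeCluster` and `enroll_and_refresh` are hypothetical, and the method signatures are simplified for illustration; the real zigpy cluster methods take additional arguments and return different structures.

```python
import asyncio

NOT_ENROLLED, ENROLLED = 0, 1


class FakeCluster:
    """Hypothetical stand-in for the IAS Zone cluster (not the zigpy API)."""

    def __init__(self) -> None:
        self.device_state = NOT_ENROLLED
        # Stale value left behind by the racing read during initialization.
        self.cache = {"zone_state": NOT_ENROLLED}

    async def enroll_response(self, enroll_response_code: int, zone_id: int) -> None:
        await asyncio.sleep(0)  # simulated round trip
        self.device_state = ENROLLED

    async def read_attributes(self, names: list) -> dict:
        # Reading from the device also refreshes the attribute cache.
        values = {name: self.device_state for name in names if name == "zone_state"}
        self.cache.update(values)
        return values


async def enroll_and_refresh(cluster: FakeCluster) -> None:
    """Send the enroll response, then read zone_state back from the device."""
    await cluster.enroll_response(enroll_response_code=0, zone_id=0)
    await cluster.read_attributes(["zone_state"])


cluster = FakeCluster()
asyncio.run(enroll_and_refresh(cluster))
print(cluster.cache["zone_state"])
```

The same helper could be called from both the proactive path in async_configure() and the cluster_command handler, so the cached value is refreshed regardless of which side starts the enrollment.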
