discovery(ansible): slm_manager health-check diagnostic tasks need block/rescue to execute on failure

## Discovery

Noticed during review of PR #7918 (merged in e35b32a6d).

In `autobot-slm-backend/ansible/roles/slm_manager/tasks/main.yml`, two diagnostic tasks were added to collect and display SLM backend logs when the health check fails:

```yaml
- name: "SLM | Collect backend logs on health check failure"
  when: _slm_health is defined and _slm_health is failed
  ...

- name: "SLM | Display backend logs on health check failure"
  when: _slm_health is defined and _slm_health is failed
  ...
```

## Problem

In Ansible, when the `uri` task's `until` + `retries` loop is exhausted, the task is marked as failed and Ansible **halts the play** (default behavior, no `ignore_errors`). Because the play is halted at the health check task, the subsequent diagnostic tasks **never execute**.

The `_slm_health is failed` condition is correctly written, but it can only be evaluated if Ansible reaches those tasks — which it won't when the health check fails without error handling.

## Fix

Wrap the health check and diagnostics in a `block/rescue` pattern:

```yaml
- block:
    - name: "SLM | Wait for backend health"
      ansible.builtin.uri:
        url: "http://{{ slm_backend_host }}:{{ slm_backend_port }}/api/health"
        method: GET
        status_code: 200
        timeout: 10
      register: _slm_health
      retries: 120
      delay: 5
      until: _slm_health is succeeded
      tags: ['slm', 'service']

  rescue:
    - name: "SLM | Collect backend logs on health check failure"
      ansible.builtin.shell:
        cmd: >-
          journalctl -u {{ slm_backend_service }} -n 100 --no-pager 2>&1 ||
          tail -100 {{ slm_log_dir }}/slm-backend.log 2>/dev/null ||
          echo "No logs available yet"
      register: _slm_backend_log_output
      changed_when: false
      failed_when: false
      tags: ['slm', 'service', 'debug']

    - name: "SLM | Display backend logs on health check failure"
      ansible.builtin.debug:
        msg: "{{ _slm_backend_log_output.stdout_lines | default([]) }}"
      tags: ['slm', 'service', 'debug']

    - name: "SLM | Fail with diagnostic context"
      ansible.builtin.fail:
        msg: "SLM backend health check failed after {{ 120 * 5 }}s. See logs above."
      tags: ['slm', 'service']
```

## Impact

Low — the primary fix in PR #7918 (timeout increase + connect_timeout) works correctly. This is a diagnostics-only gap. Fresh-install failures are harder to debug without the logs, which is the original intent.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

discovery(ansible): slm_manager health-check diagnostic tasks need block/rescue to execute on failure #7919

Discovery

Problem

Fix

Impact

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

discovery(ansible): slm_manager health-check diagnostic tasks need block/rescue to execute on failure #7919

Description

Discovery

Problem

Fix

Impact

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions