[Autoscaler v1] AutoscalerSummary Active node check ignores raylet termination #52198

Closed
nadongjun opened this issue Apr 10, 2025 · 10 comments · Fixed by #52409
Labels
bug, core, core-autoscaler, P1

Comments

@nadongjun (Contributor)

What happened + What you expected to happen

Description

  • The Ray Autoscaler (v1) AutoscalerSummary currently uses LoadMetrics.is_active(ip) to determine whether a node is active. However, this check does not account for whether the raylet on that node is still running.
  • In particular, if a node’s raylet has already exited (e.g., due to idle timeout), but the node is still returned by the NodeProvider as part of the non_terminated_nodes list, the autoscaler will incorrectly consider the node as active. This leads to inconsistencies in the summary() output.
  • Although this situation may not occur frequently, it highlights the need to revise the logic for determining active nodes. The current implementation results in inaccurate cluster state reporting in edge cases like this.
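
For context, here is a minimal sketch of the classification described above. It is based only on the behavior reported in this issue; the class, function, and attribute names are simplified paraphrases, not the exact Ray v1 source.

# Sketch only: illustrates the reported behavior, not the actual Ray v1 code.
class LoadMetricsSketch:
    def __init__(self):
        # ip -> timestamp of the last raylet heartbeat seen for that node
        self.last_heartbeat_time_by_ip = {}

    def is_active(self, ip):
        # Membership check only: stays True even after the raylet on `ip`
        # has been drained and has exited.
        return ip in self.last_heartbeat_time_by_ip

def summarize_active(non_terminated_ips, load_metrics):
    # The summary marks every non-terminated node whose IP passes is_active()
    # as "Active", so a node whose raylet already exited but that the
    # NodeProvider still returns is reported as Active.
    return [ip for ip in non_terminated_ips if load_metrics.is_active(ip)]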

Versions / Dependencies

2.44.1

Reproduction script

Reproduction

  1. A Ray worker (ray.worker.gpu / 192.168.1.40) is launched.
  2. The worker becomes idle and its raylet exits after the idle timeout.
  3. The NodeProvider still includes the worker in the non_terminated_nodes() response.
  4. As a result, the worker is still marked as active in the autoscaler summary and ray status, even though its raylet is no longer running.

Log

Even after "Draining 1 raylet(s)." was logged for the worker (ray.worker.gpu / 192.168.1.40), the node’s IP remained in LoadMetrics, indicating that the node was still being tracked even though its raylet had been terminated.

======== Autoscaler status: 2025-04-10 01:30:28.225299 ========
Node status
---------------------------------------------------------------
Active:
 1 ray.head.default
 1 ray.worker.gpu
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
---------------------------------------------------------------
Usage:
 0.0/32.0 CPU
 0B/19.82GiB memory
 0B/9.15GiB object_store_memory

Demands:
 (no resource demands)
2025-04-10 01:30:28,225 INFO autoscaler.py:589 -- StandardAutoscaler: Terminating the node with id 100d04f27901636e710d8dab49eab126b2c0e4588579544beb8f3d72 and ip 192.168.1.40. (idle)
2025-04-10 01:30:28,225 INFO autoscaler.py:543 -- Node last used: Thu Apr 10 01:22:35 2025.
2025-04-10 01:30:28,225 INFO autoscaler.py:675 -- Draining 1 raylet(s).
2025-04-10 01:30:28,226 INFO node_provider.py:173 -- NodeProvider: 100d04f27901636e710d8dab49eab126b2c0e4588579544beb8f3d72: Terminating node
2025-04-10 01:30:28,226 INFO node_provider.py:176 -- submit_scale_request 
2025-04-10 01:30:28,226 INFO node_provider.py:199 -- {'desired_num_workers': {'ray.worker.gpu': 0}, 'workers_to_delete': ['100d04f27901636e710d8dab49eab126b2c0e4588579544beb8f3d72']}
2025-04-10 01:30:28,226 INFO node_provider.py:236 -- _patch
2025-04-10 01:30:28,227 DEBUG connectionpool.py:241 -- Starting new HTTP connection (1): 192.168.1.30:50000
2025-04-10 01:30:28,228 DEBUG connectionpool.py:544 -- http://192.168.1.30:50000 "PATCH /nodes HTTP/1.1" 200 37
2025-04-10 01:30:28,229 INFO autoscaler.py:461 -- The autoscaler took 0.006 seconds to complete the update iteration.
2025-04-10 01:30:28,229 INFO monitor.py:433 -- :event_summary:Removing 1 nodes of type ray.worker.gpu (idle).
2025-04-10 01:30:33,252 INFO node_provider.py:231 -- _get
2025-04-10 01:30:33,253 DEBUG connectionpool.py:241 -- Starting new HTTP connection (1): 192.168.1.30:50000
2025-04-10 01:30:33,255 DEBUG connectionpool.py:544 -- http://192.168.1.30:50000 "GET /nodes HTTP/1.1" 200 377
2025-04-10 01:30:33,256 INFO node_provider.py:172 -- get_node_data{'dd9f073845c670f20633936b798c3a26bb746a575d583580a490d8e0': NodeData(kind='head', type='ray.head.default', ip='192.168.1.10', status='up-to-date', replica_index=None), '100d04f27901636e710d8dab49eab126b2c0e4588579544beb8f3d72': NodeData(kind='worker', type='ray.worker.gpu', ip='192.168.1.40', status='up-to-date', replica_index=None)}
2025-04-10 01:30:33,256 INFO autoscaler.py:146 -- The autoscaler took 0.004 seconds to fetch the list of non-terminated nodes.
2025-04-10 01:30:33,256 INFO node_provider.py:203 -- safe_to_scale
2025-04-10 01:30:33,257 INFO custom_load_metrics.py:8 -- [CustomLoadMetrics] prune_active_ips called
2025-04-10 01:30:33,257 INFO custom_load_metrics.py:25 -- [CustomLoadMetrics] ray_nodes_last_used_time_by_ip: {'192.168.1.10': 1744248131.4891038, '192.168.1.40': 1744248155.2064462}
2025-04-10 01:30:33,257 INFO custom_load_metrics.py:27 -- [CustomLoadMetrics] static_resources_by_ip: {'192.168.1.10': {'memory': 9765379278.0, 'CPU': 16.0, 'object_store_memory': 4882689638.0, 'node:192.168.1.10': 1.0, 'node:__internal_head__': 1.0}, '192.168.1.40': {'memory': 11521410253.0, 'node:192.168.1.40': 1.0, 'CPU': 16.0, 'object_store_memory': 4937747251.0}}
2025-04-10 01:30:33,257 INFO custom_load_metrics.py:29 -- [CustomLoadMetrics] raylet_id_by_ip: {'192.168.1.10': b'\\\xf7\xc5^\xde\xf5t\x85j1]\x8f\xf2D6X\xcb\x11\xe5\xa5\xc2\xfc.E\x7f\xb0\xa3\xcf', '192.168.1.40': b'\xd5\x0b\x94~O\x13.\x89r\x84\x91w4\xac\xa0N9_g\x1d\x1e\x83Dc<\xf4~\x9f'}
2025-04-10 01:30:33,257 INFO custom_load_metrics.py:31 -- [CustomLoadMetrics] dynamic_resources_by_ip: {'192.168.1.10': {'memory': 9765379278.0, 'CPU': 16.0, 'object_store_memory': 4882689638.0, 'node:__internal_head__': 1.0, 'node:192.168.1.10': 1.0}, '192.168.1.40': {'node:192.168.1.40': 1.0, 'memory': 11521410253.0, 'CPU': 16.0, 'object_store_memory': 4937747251.0}}
2025-04-10 01:30:33,257 INFO custom_load_metrics.py:33 -- [CustomLoadMetrics] last_heartbeat_time_by_ip: {'192.168.1.10': 1744248633.2381039, '192.168.1.40': 1744248628.1984463}
2025-04-10 01:30:33,258 INFO autoscaler.py:418 -- 
======== Autoscaler status: 2025-04-10 01:30:33.258024 ========
Node status
---------------------------------------------------------------
Active:
 1 ray.head.default
 1 ray.worker.gpu
Pending:
 (no pending nodes)
Recent failures:
 (no failures)

Resources
---------------------------------------------------------------
Usage:
 0.0/32.0 CPU
 0B/19.82GiB memory
 0B/9.15GiB object_store_memory

Issue Severity

Low: It annoys or frustrates me.

nadongjun added the bug and triage labels (Apr 10, 2025)
masoudcharkhabi added the community-contribution, core, docs, go, and P3 labels and removed the community-contribution and P3 labels (Apr 10, 2025)
@bhks commented Apr 10, 2025

I am happy to take this one up. Since I am new to the community, is there anyone who can help me understand a little bit about the components and systems?

@nadongjun (Contributor, Author)

@bhks Thank you, but I’m currently working on this task. If you have any questions related to the structure of the work, feel free to email me — I’ll do my best to help with what I know. Thanks!

@bhks commented Apr 10, 2025

No problem, I will just watch for now then. I’m trying to get started working with the Ray community.

@bhks commented Apr 10, 2025

@nadongjun On the problem side, are you thinking of using the timestamp or heartbeat time reported for that node/IP? Or are there other mechanisms that could be used?

@nadongjun (Contributor, Author)

@bhks Yes, exactly. Right now, the Autoscaler reads the list of non-terminated nodes from the Provider and just checks whether there’s a value in last_heartbeat_time_by_ip. Even if a node has already passed the idle timeout or has been terminated, it won’t be reflected if the Provider still considers it active.

I’m thinking about two possible directions: one is to use LoadMetrics to determine whether a node has hit the idle or heartbeat timeout and should be removed, and the other is to clear the last_heartbeat_time_by_ip entry when the worker with that IP gets terminated, so the state stays accurate.
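
As a rough sketch of the second direction (the dict names are taken from the log output above; the exact set of attributes and the call site in the autoscaler are assumptions, not the actual Ray code):

def prune_terminated_ip(self, ip):
    # Hypothetical LoadMetrics helper: drop all per-IP bookkeeping for a node
    # whose raylet/worker has just been terminated, so that is_active(ip) and
    # the autoscaler summary stop reporting it as active.
    for table in (
        self.last_heartbeat_time_by_ip,
        self.ray_nodes_last_used_time_by_ip,
        self.static_resources_by_ip,
        self.dynamic_resources_by_ip,
        self.raylet_id_by_ip,
    ):
        table.pop(ip, None)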

If you’re interested in the Autoscaler side, I’d also recommend checking out Autoscaler v2 / KubeRay — it’s actively being developed and worth following:

ray-project/kuberay#2600

@bhks commented Apr 11, 2025

Thank you @nadongjun for sharing that issue with me. I will follow up there.

jcotant1 removed the go label (Apr 11, 2025)
@dayshah (Contributor) commented Apr 15, 2025

cc @rueian, can you pick this up? Thanks so much!

dayshah added the P1 and core-autoscaler labels and removed the P3, triage, and docs labels (Apr 15, 2025)
@rueian (Contributor) commented Apr 15, 2025

Sure, I will start working on this.

@nadongjun (Contributor, Author)

Hi @rueian,
Would a simple heartbeat check like the one below be a reasonable approach here? What do you think?

def is_active(self, ip):
    # Treat a node as active only if its raylet heartbeat is recent enough,
    # instead of only checking that a heartbeat was ever recorded for the IP.
    last_heartbeat = self.last_heartbeat_time_by_ip.get(ip)
    if last_heartbeat is None:
        return False
    return (time.time() - last_heartbeat) < AUTOSCALER_HEARTBEAT_TIMEOUT_S

@rueian (Contributor) commented Apr 17, 2025

Hi @nadongjun,

We just need to update the LoadMetrics after we terminate those nodes, including idle nodes and dead nodes:

See #52409
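
A minimal sketch of that idea is below; the function name and the way LoadMetrics is reached are illustrative assumptions, and the actual change is in the linked PR.

def terminate_and_update_metrics(autoscaler, node_ids):
    # Resolve IPs before terminating, terminate via the NodeProvider, then
    # drop the terminated IPs from LoadMetrics so the next summary() no longer
    # counts them as active, even if non_terminated_nodes() is briefly stale.
    ips = [autoscaler.provider.internal_ip(node_id) for node_id in node_ids]
    autoscaler.provider.terminate_nodes(node_ids)
    for ip in ips:
        autoscaler.load_metrics.last_heartbeat_time_by_ip.pop(ip, None)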
