MLFlow logger: log system metrics for all nodes when rank_zero_only is True #3905

@WeichenXu123

🚀 Feature Request

When both rank_zero_only and log_system_metrics are set to True, the MLflow logger should log system metrics for all nodes. These metrics should all be logged into the MLflow run created by rank 0, with the metric keys grouped by node IP.

Motivation

In the current Composer MLflow logger implementation, when both rank_zero_only and log_system_metrics are set to True, system metrics are logged only for the first node, so users cannot view the system metrics of the other nodes.

[Optional] Implementation
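
One possible shape for the key grouping described in the feature request, as a rough sketch only. The helper names (`node_ip`, `group_by_node`, `merge_node_metrics`) and the `system/<ip>/<name>` key layout are assumptions made for illustration; they are not part of Composer or MLflow, and the issue does not specify a key format.

```python
import socket


def node_ip() -> str:
    """Best-effort IP of this node (hypothetical helper).

    Falls back to loopback if the hostname cannot be resolved.
    """
    try:
        return socket.gethostbyname(socket.gethostname())
    except OSError:
        return "127.0.0.1"


def group_by_node(metrics: dict[str, float], ip: str) -> dict[str, float]:
    """Prefix each metric key with the node IP so that keys coming from
    different nodes do not collide inside a single run.

    The "system/<ip>/<name>" layout is an assumed convention.
    """
    return {f"system/{ip}/{name}": value for name, value in metrics.items()}


def merge_node_metrics(per_node: dict[str, dict[str, float]]) -> dict[str, float]:
    """Flatten {node_ip: {metric_name: value}} into one dict of grouped keys."""
    merged: dict[str, float] = {}
    for ip, metrics in per_node.items():
        merged.update(group_by_node(metrics, ip))
    return merged
```

Rank 0 could then pass the merged dictionary to its own run, for example with `mlflow.log_metrics`. How the per-node values reach rank 0 (a gather across ranks, or each node writing directly to the rank-0 run ID) is left open here.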

Additional context

Labels: enhancement
