Metric Logging updates 5/N #363

felipemello1 · 2025-10-09T19:51:41Z

Changes the logging mode configuration so it's clearer:

Before:

reduce_across_ranks: bool
share_run_id: bool

After:

logging_mode: Enum[GLOBAL_REDUCE, PER_RANK_REDUCE, PER_RANK_NO_REDUCE]
per_rank_share_run: bool

Adds class:

class LoggingMode(Enum):
    GLOBAL_REDUCE = "global_reduce"
    PER_RANK_REDUCE = "per_rank_reduce"
    PER_RANK_NO_REDUCE = "per_rank_no_reduce"

Introduces the "PER_RANK_NO_REDUCE" mode. This means we call backend.log_stream(metric) as soon as we get the metric, without any reduction.
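To make the difference between the three modes concrete, here is a minimal sketch of how a backend config might be resolved into a `LoggingMode`. The enum is the one added in this PR; the `logging_mode` config key, the `resolve_mode` and `streams_immediately` helpers, and the `GLOBAL_REDUCE` default are assumptions made for illustration, not code from the PR.

```python
from enum import Enum
from typing import Any


class LoggingMode(Enum):
    GLOBAL_REDUCE = "global_reduce"
    PER_RANK_REDUCE = "per_rank_reduce"
    PER_RANK_NO_REDUCE = "per_rank_no_reduce"


def resolve_mode(config: dict[str, Any]) -> LoggingMode:
    """Hypothetical helper: accept an enum member or its string value."""
    # Assumption: GLOBAL_REDUCE is used when the key is missing.
    raw = config.get("logging_mode", LoggingMode.GLOBAL_REDUCE)
    if isinstance(raw, LoggingMode):
        return raw
    # Lookup by value; raises ValueError for unknown strings.
    return LoggingMode(str(raw).lower())


def streams_immediately(mode: LoggingMode) -> bool:
    """Only PER_RANK_NO_REDUCE logs each metric as soon as it is pushed."""
    return mode is LoggingMode.PER_RANK_NO_REDUCE
```

Resolving to the enum once, at config time, means a typo such as `"per_rank_noreduce"` fails early with a `ValueError` instead of silently falling back to another mode.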

Before, MetricLogger.push(metric) would just collect the metric. Now, it also logs it right away to any backend in "PER_RANK_NO_REDUCE" mode.

def push(self, metric: Metric) -> None:
    # flush in "PER_RANK_NO_REDUCE" mode
    for backend in self.per_rank_no_reduce_backends:
        backend.log_stream(metric=metric, global_step=self.global_step)

    # Always accumulate for reduction and state return
    key = metric.key
    if key not in self.accumulators:
        self.accumulators[key] = metric.reduction.accumulator_class(
            metric.reduction
        )
    self.accumulators[key].append(metric.value)
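The stream-then-accumulate behavior of `push` can be tried in isolation with a small stand-alone sketch. Everything here is a simplified stand-in: `Metric` has only `key` and `value`, `MeanAccumulator` replaces the lookup through `metric.reduction.accumulator_class`, and `RecordingBackend` and `ToyCollector` are hypothetical names, not classes from the PR.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    # Simplified stand-in for the PR's Metric dataclass.
    key: str
    value: float


class MeanAccumulator:
    """Toy accumulator that keeps a running mean."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def append(self, value: float) -> None:
        self.total += value
        self.count += 1

    def get_value(self) -> float:
        return self.total / self.count


class RecordingBackend:
    """Toy backend that records every streamed metric in a list."""

    def __init__(self) -> None:
        self.streamed: list[tuple[str, float, int]] = []

    def log_stream(self, metric: Metric, global_step: int) -> None:
        self.streamed.append((metric.key, metric.value, global_step))


class ToyCollector:
    def __init__(self, per_rank_no_reduce_backends: list[RecordingBackend]) -> None:
        self.per_rank_no_reduce_backends = per_rank_no_reduce_backends
        self.accumulators: dict[str, MeanAccumulator] = {}
        self.global_step = 0

    def push(self, metric: Metric) -> None:
        # Stream the raw value first, with no reduction.
        for backend in self.per_rank_no_reduce_backends:
            backend.log_stream(metric=metric, global_step=self.global_step)
        # Then accumulate, so reduced values are still available later.
        if metric.key not in self.accumulators:
            self.accumulators[metric.key] = MeanAccumulator()
        self.accumulators[metric.key].append(metric.value)


backend = RecordingBackend()
collector = ToyCollector([backend])
collector.push(Metric("loss", 2.0))
collector.push(Metric("loss", 4.0))
```

After the two pushes, `backend.streamed` holds both raw values while `collector.accumulators["loss"]` holds their running mean, so the streaming and reducing paths can coexist on the same metric.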

Notice that in this mode the x-axis is the timestamp.

Main design change: logger backends now have async def log_batch and def log_stream. It is not totally clear to me whether both should be async, both sync, or whether I should try to unify them.

class LoggerBackend(ABC):
    """Abstract logger_backend for metric logging, e.g. wandb, jsonl, etc."""

    def __init__(self, logger_backend_config: dict[str, Any]) -> None:
        self.logger_backend_config = logger_backend_config

    @abstractmethod
    async def init(
        self,
        role: BackendRole,
        primary_logger_metadata: dict[str, Any] | None = None,
        process_name: str | None = None,
    ) -> None:
        """Initializes backend, e.g. wandb.run.init()."""
        pass

    @abstractmethod
    async def log_batch(
        self, metrics: list[Metric], global_step: int, *args, **kwargs
    ) -> None:
        """Log batch of accumulated metrics to backend"""
        pass

    def log_stream(self, metric: Metric, global_step: int, *args, **kwargs) -> None:
        """Stream single metric to backend immediately."""
        pass

    async def finish(self) -> None:
        pass

    def get_metadata_for_secondary_ranks(self) -> dict[str, Any] | None:
        """Return sharable state after primary init (e.g., for shared modes). Called only on globals."""
        return None
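To show what the mixed async/sync interface looks like from a backend author's side, here is a minimal sketch of an in-memory backend. The base class is a trimmed copy of the interface above, included only so the block runs on its own. The `BackendRole` members, the two-field `Metric`, and `InMemoryBackend` itself are hypothetical stand-ins, not code from the PR.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class BackendRole(Enum):
    # Hypothetical members; the real enum is defined elsewhere in the PR.
    LOCAL = "local"
    GLOBAL = "global"


@dataclass
class Metric:
    # Simplified stand-in for the PR's Metric dataclass.
    key: str
    value: float


class LoggerBackend(ABC):
    """Trimmed copy of the interface above, enough to run this sketch."""

    def __init__(self, logger_backend_config: dict[str, Any]) -> None:
        self.logger_backend_config = logger_backend_config

    @abstractmethod
    async def init(
        self,
        role: BackendRole,
        primary_logger_metadata: Optional[dict[str, Any]] = None,
        process_name: Optional[str] = None,
    ) -> None: ...

    @abstractmethod
    async def log_batch(self, metrics: list[Metric], global_step: int, *args, **kwargs) -> None: ...

    def log_stream(self, metric: Metric, global_step: int, *args, **kwargs) -> None:
        pass


class InMemoryBackend(LoggerBackend):
    async def init(self, role, primary_logger_metadata=None, process_name=None) -> None:
        self.prefix = process_name or role.value
        self.rows: list[str] = []

    async def log_batch(self, metrics, global_step, *args, **kwargs) -> None:
        for m in metrics:
            self.rows.append(f"[{self.prefix}] batch step={global_step} {m.key}={m.value}")

    def log_stream(self, metric, global_step, *args, **kwargs) -> None:
        self.rows.append(f"[{self.prefix}] stream step={global_step} {metric.key}={metric.value}")


async def demo() -> list[str]:
    backend = InMemoryBackend({})
    await backend.init(BackendRole.LOCAL, process_name="trainer_0")
    backend.log_stream(Metric("loss", 0.5), global_step=1)  # sync: no await
    await backend.log_batch([Metric("loss", 0.5)], global_step=1)  # async
    return backend.rows


rows = asyncio.run(demo())
```

The call sites in `demo` make the trade-off visible: `log_stream` can be called from plain synchronous code such as `push`, while `log_batch` has to be awaited from a coroutine.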


Felipe Mello added 21 commits October 8, 2025 08:38

commit

77488cf

commit

feb4771

update backend role typehints and enum

41ceaa4

update where we check FORGE_DISABLE_METRICS

8a24e71

remove protected import

3f3bc51

Merge branch 'timestamp_logging_diff1' into timestamp_logging_diff2

d82c354

protect import

4fe2611

Merge branch 'timestamp_logging_diff1' into timestamp_logging_diff2

8759bc8

Merge branch 'main' of https://github.com/meta-pytorch/forge into tim…

fbb4a9e

…estamp_logging_diff2

record_metric uses dataclass Metric

d81a4ed

commit

1e2255d

Merge branch 'main' of https://github.com/meta-pytorch/forge into tim…

a94c612

…estamp_logging_diff3

commit

5b477e8

commit

f2b3eed

revert

471b88a

Merge branch 'timestamp_logging_diff2_5' into timestamp_logging_diff3

1a02784

remove unnecessary code

fa4895f

better logging

7bb1fe7

docs/names

43d5d27

Merge branch 'timestamp_logging_diff2_5' into timestamp_logging_diff3

c97eb98

commit

75355a2

meta-cla bot added the CLA Signed label Oct 9, 2025

felipemello1 changed the title ~~Metric Logging updates 5/N~~ [draft] Metric Logging updates 5/N Oct 9, 2025

Felipe Mello added 2 commits October 9, 2025 12:52

Merge branch 'main' of https://github.com/meta-pytorch/forge into tim…

70e9c67

…estamp_logging_diff3

Merge branch 'timestamp_logging_diff3' into timestamp_logging_diff4

12f77c9

felipemello1 marked this pull request as ready for review October 9, 2025 20:56

felipemello1 requested a review from allenwang28 October 9, 2025 20:59

felipemello1 changed the title ~~[draft] Metric Logging updates 5/N~~ Metric Logging updates 5/N Oct 9, 2025

felipemello1 requested review from ebsmothers and joecummings October 10, 2025 22:55
