@sufeng-buaa
Contributor

Motivation

Following #17482, this PR organizes the observability-related code, removes redundant code, and unifies the interfaces for time statistics, request latency metrics, and request tracing.

Modifications

  • Move observability-related code to python/sglang/srt/observability
  • Remove redundant code, refactor inappropriate code, and correct non-standard naming.
  • Timestamp recording
    • In python/sglang/srt/observability/req_time_stats.py, APIServerReqTimeStats, DPControllerReqTimeStats, and SchedulerReqTimeStats are defined to record timestamps for the tokenizer/gRPC server, the DP controller, and the scheduler, respectively. A series of set_*_time methods set the timestamps, and get methods calculate latencies from them.
    • Use monotonic time uniformly, and refresh the offset between monotonic time and realtime (wall-clock time) on each incoming request, so that timestamps can be converted to realtime when necessary.
  • Request latency metrics
    • Define the base class ReqTimeStatsBase for APIServerReqTimeStats, DPControllerReqTimeStats, and SchedulerReqTimeStats, integrating a metrics collector inside it. Export latency information to the metrics collector within each set_*_time method.
  • Request tracing
    • ReqTimeStatsBase also integrates a trace context. Trace spans are exported within each set_*_time method. __getstate__ and __setstate__ are defined on ReqTimeStatsBase to propagate the trace context.
    • Refactor the tracing package and optimize the span structure.
    • Support trace levels, and adjust them dynamically via an HTTP API.
    • Support tracing for requests with parallel_sample_num > 1.
    • Support tracing for retracted requests.
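
As a rough illustration of the pattern described in the list above (not the code in this PR), the sketch below records monotonic timestamps through set_*_time methods, exports a latency to an injected collector, and exposes a get method. The names `ListCollector`, `TimeStatsSketch`, `set_created_time`, `set_first_token_time`, `get_ttft`, and the metric name are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ListCollector:
    """Hypothetical stand-in for a metrics collector; it only stores observations."""

    observations: list = field(default_factory=list)

    def observe(self, name: str, value: float) -> None:
        self.observations.append((name, value))


@dataclass
class TimeStatsSketch:
    """Illustrative only: records monotonic timestamps and exports latencies."""

    collector: ListCollector
    created_time: Optional[float] = None
    first_token_time: Optional[float] = None

    def set_created_time(self, ts: Optional[float] = None) -> None:
        # Default to the monotonic clock; an explicit value eases testing.
        self.created_time = time.perf_counter() if ts is None else ts

    def set_first_token_time(self, ts: Optional[float] = None) -> None:
        self.first_token_time = time.perf_counter() if ts is None else ts
        # Export the latency as a side effect of setting the timestamp.
        latency = self.get_ttft()
        if latency is not None:
            self.collector.observe("time_to_first_token_seconds", latency)

    def get_ttft(self) -> Optional[float]:
        if self.created_time is None or self.first_token_time is None:
            return None
        return self.first_token_time - self.created_time


collector = ListCollector()
stats = TimeStatsSketch(collector)
stats.set_created_time(10.0)
stats.set_first_token_time(10.25)
```

Exporting from inside the setter keeps call sites to a single line per lifecycle event, which appears to be the intent of the unified interface.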

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions bot added the documentation label Jan 28, 2026
@sufeng-buaa
Contributor Author

/tag-and-rerun-ci

@sufeng-buaa
Contributor Author

/gemini review

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@sufeng-buaa sufeng-buaa force-pushed the sufeng-buaa/observability_integration branch from fbb3f87 to c381672 Compare January 29, 2026 05:45
Collaborator

@ShangmingCai ShangmingCai left a comment

The disaggregation part LGTM.

global_diff_realtime_monotonic = time.time() - time.perf_counter()


def calibrate_time_diff():

Contributor

Using global_diff_realtime_monotonic is a good idea for converting perf_counter values to timestamps. However, when DP > 1, you may have a single tokenization manager but multiple schedulers running across different devices. In that case, you need to be careful: each process/device can have its own monotonic clock offset, so the conversion may be inconsistent across ranks unless those offsets are synchronized or computed per rank.
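
A minimal sketch of the per-process conversion under discussion. The reference point of `time.perf_counter()` is undefined, so only differences between its readings are meaningful, and the offset below is valid only in the process that sampled it. The helper names `calibrate_offset` and `to_realtime` are hypothetical, not the PR's API.

```python
import time


def calibrate_offset() -> float:
    """Return wall-clock time minus monotonic time, sampled in this process."""
    return time.time() - time.perf_counter()


def to_realtime(monotonic_ts: float, offset: float) -> float:
    """Convert a perf_counter reading to a wall-clock timestamp."""
    return monotonic_ts + offset


offset = calibrate_offset()
start = time.perf_counter()
wall_start = to_realtime(start, offset)
```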

Contributor Author

Monotonic time is typically obtained via the Linux kernel's vDSO (or a system call), and the Linux kernel keeps timestamps consistent across CPUs. Even in the extremely rare cases where they diverge, non-observability functionality relies only on time retrieved within the same process, while observability features can tolerate such small timing discrepancies.

Contributor

But if the tokenizer manager and the scheduler are running on completely different machines, will the monotonic time they obtain still be consistent?

Contributor Author

Fixed. Monotonic timestamps are now corrected during deserialization by propagating them together with a diff.
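
One way such a correction could work, sketched under assumptions rather than taken from the PR: the sender attaches its own offset (wall clock minus monotonic) in `__getstate__`, and the receiver re-bases the timestamp onto its local monotonic clock in `__setstate__`. This assumes the wall clocks of the two machines are synchronized (for example via NTP). `StampSketch`, `current_offset`, and `sender_offset` are hypothetical names; the hooks are called directly here, although `pickle` would invoke them the same way.

```python
import time


def current_offset() -> float:
    """Wall-clock time minus monotonic time for the current process."""
    return time.time() - time.perf_counter()


class StampSketch:
    """Hypothetical container holding one monotonic timestamp."""

    def __init__(self) -> None:
        self.monotonic_ts = time.perf_counter()

    def __getstate__(self) -> dict:
        # Ship the sender's offset alongside the raw monotonic timestamp.
        state = self.__dict__.copy()
        state["sender_offset"] = current_offset()
        return state

    def __setstate__(self, state: dict) -> None:
        state = dict(state)
        sender_offset = state.pop("sender_offset")
        self.__dict__.update(state)
        # Re-base: sender monotonic -> wall clock -> receiver monotonic.
        self.monotonic_ts += sender_offset - current_offset()


original = StampSketch()
state = original.__getstate__()
# Simulate a sender whose monotonic clock reads 100 s lower than the receiver's.
state["sender_offset"] += 100.0
restored = StampSketch.__new__(StampSketch)
restored.__setstate__(state)
shift = restored.monotonic_ts - original.monotonic_ts
```

After re-basing, latencies computed on the receiver compare like with like, at the cost of inheriting any wall-clock skew between the machines.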

@sufeng-buaa sufeng-buaa force-pushed the sufeng-buaa/observability_integration branch 2 times, most recently from 2c1c70f to 8092b3f Compare January 30, 2026 10:44
@sufeng-buaa
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and well-executed refactoring of the observability code. The changes centralize timing, metrics, and tracing logic into new ReqTimeStats and TraceContext objects, which greatly improves code organization and maintainability. The new API for tracing is much cleaner and more powerful, with features like dynamic trace levels. The use of monotonic time and careful handling of time across processes are also commendable. Overall, this is a high-quality refactoring that enhances the observability of the system. My review includes a few minor suggestions for improving the documentation.

@sufeng-buaa
Copy link
Contributor Author

/rerun-failed-ci

@sufeng-buaa sufeng-buaa force-pushed the sufeng-buaa/observability_integration branch 2 times, most recently from c64a0eb to 457d9b6 Compare February 2, 2026 12:20
@ishandhanani
Collaborator

ishandhanani commented Feb 2, 2026

The consolidation and tracing pieces LGTM overall. The AI review at https://app.devin.ai/review/sgl-project/sglang/pull/17862 concurs.

@sufeng-buaa sufeng-buaa force-pushed the sufeng-buaa/observability_integration branch from 457d9b6 to 5de694f Compare February 4, 2026 06:18
Labels

documentation, high priority, run-ci

4 participants