Conversation


@XG-xin XG-xin commented Dec 5, 2025

Motivation

Move reasoning tokens from metadata to metrics in the Node.js OpenAI tests. Some older dd-trace versions need to be skipped via configuration.

Changes

Workflow

  1. ⚠️ Create your PR as a draft ⚠️
  2. Work on your PR until the CI passes
  3. Mark it as ready for review
    • Test logic is modified? -> Get a review from the RFC owner.
    • Framework is modified, or non-obvious usage of it? -> Get a review from the R&P team.

🚀 Once your PR is reviewed and the CI is green, you can merge it!

🛟 #apm-shared-testing 🛟

Reviewer checklist

  • If PR title starts with [<language>], double-check that only <language> is impacted by the change
  • No system-tests internals are modified. Otherwise, I have approval from the R&P team
  • A docker base image is modified?
    • the relevant build-XXX-image label is present
  • A scenario is added (or removed)?


github-actions bot commented Dec 5, 2025

CODEOWNERS have been resolved as:

tests/integration_frameworks/llm/openai/test_openai_llmobs.py           @DataDog/ml-observability

@XG-xin XG-xin marked this pull request as ready for review December 9, 2025 19:20
@XG-xin XG-xin requested a review from a team as a code owner December 9, 2025 19:20
@sabrenner (Contributor) commented:

I think what we can do to best avoid issues with updating the repos is:

  1. Split out the Responses tests into their own test class in test_openai_llmobs.py, i.e.
@features.llm_observability_openai_llm_interactions
@scenarios.integration_frameworks
class TestOpenAiResponses(BaseOpenaiTest):

then, in each of the nodejs.yml and python.yml manifests, we can mark those test classes as irrelevant:

test_openai_llmobs.py:
  TestOpenAiEmbeddingInteractions: *ref_5_80_0
  TestOpenAiLlmInteractions: *ref_5_80_0
  TestOpenAiPromptTracking: missing_feature
  TestOpenAiResponses: irrelevant

and similarly for python.yml. We can update these once the features land in a given release.
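
For illustration, a minimal sketch of what the split-out class could look like in test_openai_llmobs.py. It reuses the decorators and base class shown above; the method name test_responses_llm_span and its body are placeholders, not the real assertions:

@features.llm_observability_openai_llm_interactions
@scenarios.integration_frameworks
class TestOpenAiResponses(BaseOpenaiTest):
    # Responses-API tests live in their own class so the nodejs.yml and
    # python.yml manifests can mark the whole class irrelevant until the
    # feature ships in a tracer release.
    def test_responses_llm_span(self):
        # placeholder: move the existing Responses assertions here unchanged
        ...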

This will resolve the Responses tests. For the other tests, where we assert that reasoning tokens are present but 0, we can do:

assert_llmobs_span_event(
  ...
  metrics=mock.ANY
)

# assert input, output, total, and maybe cached tokens separately
assert llm_span_event["metrics"]["input_tokens"] == ...

to make the tests version-independent and more resilient to future metrics, since we only want to assert specific tokens in those tests. Let me know if this makes/doesn't make sense!
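
To make the suggestion concrete, a hedged sketch of that pattern; the positional argument to assert_llmobs_span_event and the expected_* values are placeholders for whatever the existing tests pass:

from unittest import mock

assert_llmobs_span_event(
    span,              # placeholder for the test's existing positional arguments
    metrics=mock.ANY,  # accept any metrics dict instead of pinning its exact shape
)

# Then assert only the token counts this test cares about, so metrics added in
# newer tracer versions (e.g. reasoning or cached tokens) don't break it.
assert llm_span_event["metrics"]["input_tokens"] == expected_input_tokens
assert llm_span_event["metrics"]["output_tokens"] == expected_output_tokens
assert llm_span_event["metrics"]["total_tokens"] == expected_input_tokens + expected_output_tokens

The key point is that mock.ANY keeps the top-level metrics comparison version-independent, while the explicit per-key asserts still pin the values the test actually verifies.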

