Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add instrumentation of AWS Bedrock to use gen_ai conventions #13355

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

anuraaga
Copy link
Contributor

@anuraaga anuraaga commented Feb 20, 2025

Hello - I'm working on getting genai conventions implemented in OTel instrumentation. This adds genai conventions for AWS bedrock calls. It is a Java version of what's in Python and being worked on for JS

This is an initial implementation to include the minimum support, just Converse API with span attributes. Will followup with metrics, log events, other APIs, etc.

One point for the gen ai instrumentations is we're generally using an API recording mechanism rather than crafted mocks - this has become the industry standard for genai instrumentations due to the complexity and non-determinism in LLM responses. i.e., Python is also using a similar mechanism. While I've kept the implementation local to aws-sdk for now, it is with the intent that it could be moved to testing to be used in a future openai sdk instrumentation, or others.

/cc @codefromthecrypt

@anuraaga anuraaga requested a review from a team as a code owner February 20, 2025 05:30
@@ -72,6 +89,7 @@ tasks {
}

check {
dependsOn(testing.suites)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this causes other suites defined above to be run that I think previously weren't

isEnabled = false;
} else {
try {
Class.forName("software.amazon.awssdk.services.bedrockruntime.model.ConverseRequest");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think unlike SQS / SNS, because there is no entrypoint added for wrapping, I need this for safety when bedrock isn't included (like in :test target)

@@ -128,6 +131,7 @@ public TracingExecutionInterceptor(
}

@Override
@SuppressWarnings("deprecation") // need to access deprecated signer
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with compileOnly, the compile versions of the other artifacts also get bumped causing some new deprecations

Copy link

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this to bare and also introducing infrastructure with a minimal change before doing too much, @anuraaga! I'm clicking approve but I'm not an approver, but anyway I approve ;)

Those looking, if you scroll bottom-up you'll see the recorded request, which is also how other genai SDKs have been working due to norms in other SDKs and also routine edge cases of responses which change more frequently than some might expect (not specific about bedrock)

RECORD_WITH_REAL_API is the key where it allows someone with credentials to record, without manual copy/paste, which allows the tests to run and survive refactoring etc vs needing credentials always.

cc'ing a few folks who I know are interested in java side like @karthikscale3 @ThomasVitale and @salaboy

@salaboy
Copy link

salaboy commented Feb 20, 2025

Good stuff! I am happy to see progress in these areas! @JonasKunz :)

Copy link

@xrmx xrmx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, asserts in tests looks familiar to what we have in the python implementation :)

@anuraaga
Copy link
Contributor Author

@trask I noticed it says "requested a review from a team", not the team (on mobile it says "unknown") and there aren't any reviewers, so maybe there is something wrong with the assign script. Would you be able to help find a reviewer? Thanks!

@anuraaga
Copy link
Contributor Author

@trask Sorry might be a red herring. @codefromthecrypt showed that he sees a reviewer listed, it is just not showing up for me. Perhaps there is a visibility restriction on the team. If there are eyes on the PR already, no worries

image

@Cirilla-zmh
Copy link
Contributor

Great work! 👍

However, I have some concerns. Similar instrumentation also appears in frameworks like Spring AI for GenAI, which serves as a higher-level API. Clearly, Spring AI also references the AWS SDK. I think this may cause some confusion for new users, as they have to deal with the existence of two similar spans and add some configuration to suppress one of them.

Of course, this is definitely a common situation, similar to the relationship between Kafka and Spring-Kafka. The difference is that we can automatically suppress one of them. However, given that Spring AI provides library instrumentation, addressing this issue may become more challenging. cc @trask @ThomasVitale

I think we need to consider these issues, but I certainly don’t believe this will become a barrier for merging this PR. :)

@Cirilla-zmh
Copy link
Contributor

BTW, do you plan to log the GenAI prompts and completions in events? I didn't see those elements in this PR. If you would be willing to share your approach, that would be incredibly helpful!

@anuraaga
Copy link
Contributor Author

Hi @Cirilla-zmh - yes, this is just an initial PR to add the minimum level of instrumentation, while setting up the test infrastructure that will allow more easily iterating on features with smaller PRs. Will be following up with events (we'll probably use the stable log API directly), metrics, etc.

Comment on lines +124 to +125
ConverseRequest(BEDROCK_RUNTIME, "ConverseRequest", request("gen_ai.request.model", "modelId")),
;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ConverseRequest(BEDROCK_RUNTIME, "ConverseRequest", request("gen_ai.request.model", "modelId")),
;
ConverseRequest(BEDROCK_RUNTIME, "ConverseRequest", request("gen_ai.request.model", "modelId"));

trace ->
trace.hasSpansSatisfyingExactly(
span ->
span.hasName("chat amazon.titan-text-lite-v1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also assert the spanKind for all of these

Suggested change
span.hasName("chat amazon.titan-text-lite-v1")
span.hasName("chat amazon.titan-text-lite-v1")
.hasKind(SpanKind.INTERNAL)

equalTo(GEN_AI_USAGE_INPUT_TOKENS, 8),
equalTo(GEN_AI_USAGE_OUTPUT_TOKENS, 14),
equalTo(
GEN_AI_RESPONSE_FINISH_REASONS, Arrays.asList("end_turn")))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arrays.asList is used several times, could do static import

Suggested change
GEN_AI_RESPONSE_FINISH_REASONS, Arrays.asList("end_turn")))));
GEN_AI_RESPONSE_FINISH_REASONS, asList("end_turn")))));

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants