Add instrumentation of AWS Bedrock to use gen_ai conventions #13355
base: main
Conversation
@@ -72,6 +89,7 @@ tasks {
}

check {
  dependsOn(testing.suites)
Note this causes other suites defined above to be run, which I think previously weren't.
      isEnabled = false;
    } else {
      try {
        Class.forName("software.amazon.awssdk.services.bedrockruntime.model.ConverseRequest");
I think unlike SQS / SNS, because there is no entrypoint added for wrapping, I need this for safety when bedrock isn't included (like in the :test target).
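For reference, a minimal sketch of this kind of classpath guard, assuming a standalone helper - this is illustrative, not the PR's exact code:

final class BedrockAvailabilityCheck {
  private BedrockAvailabilityCheck() {}

  static boolean bedrockRuntimeOnClasspath() {
    try {
      // Resolves only when the bedrockruntime artifact is present; in targets that
      // don't include it (like :test), this throws and instrumentation stays disabled.
      Class.forName("software.amazon.awssdk.services.bedrockruntime.model.ConverseRequest");
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }
}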
@@ -128,6 +131,7 @@ public TracingExecutionInterceptor(
}

@Override
@SuppressWarnings("deprecation") // need to access deprecated signer
Even with compileOnly, the compile versions of the other artifacts also get bumped, causing some new deprecations.
Thanks for bringing this to bear, and for introducing the infrastructure with a minimal change before doing too much, @anuraaga! I'm clicking approve even though I'm not an approver, but anyway I approve ;)
For those looking: if you scroll bottom-up you'll see the recorded request. This is also how other genai SDK instrumentations have been working, both because of norms in other SDKs and because of the routine edge cases in responses, which change more frequently than some might expect (not specific to Bedrock).
RECORD_WITH_REAL_API is the key: it allows someone with credentials to record without manual copy/paste, which lets the tests run and survive refactoring etc. without always needing credentials.
cc'ing a few folks who I know are interested in the Java side, like @karthikscale3, @ThomasVitale and @salaboy.
Good stuff! I am happy to see progress in these areas! @JonasKunz :)
LGTM, the asserts in the tests look similar to what we have in the Python implementation :)
@trask I noticed it says "requested a review from a team" rather than naming the team (on mobile it says "unknown"), and there aren't any reviewers, so maybe there is something wrong with the assign script. Would you be able to help find a reviewer? Thanks!
@trask Sorry, this might be a red herring. @codefromthecrypt showed that he sees a reviewer listed; it's just not showing up for me. Perhaps there is a visibility restriction on the team. If there are already eyes on the PR, no worries.
Great work! 👍 However, I have some concerns. Similar instrumentation also appears in GenAI frameworks like Spring AI, which serve as higher-level APIs. Clearly, Spring AI also references the AWS SDK. I think this may cause some confusion for new users, as they have to deal with two similar spans and add configuration to suppress one of them. Of course, this is a common situation, similar to the relationship between Kafka and Spring-Kafka; the difference is that there we can automatically suppress one of them. However, given that Spring AI provides library instrumentation, addressing this issue may become more challenging. cc @trask @ThomasVitale I think we need to consider these issues, but I certainly don't believe this should become a barrier to merging this PR. :)
BTW, do you plan to log the GenAI prompts and completions in events? I didn't see those elements in this PR. If you would be willing to share your approach, that would be incredibly helpful!
Hi @Cirilla-zmh - yes, this is just an initial PR to add the minimum level of instrumentation while setting up the test infrastructure that will allow iterating on features more easily with smaller PRs. Will be following up with events (we'll probably use the stable log API directly), metrics, etc.
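As a rough, non-authoritative sketch of what emitting such an event through the stable Logs API could look like (the scope name, event name, and attributes below are illustrative assumptions, not settled choices):

import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.logs.Logger;

// Hypothetical helper emitting a gen_ai prompt as a log-based event.
class GenAiEventEmitter {
  private final Logger eventLogger;

  GenAiEventEmitter(OpenTelemetry openTelemetry) {
    // Instrumentation scope name is an assumption for illustration.
    this.eventLogger = openTelemetry.getLogsBridge().get("io.opentelemetry.aws-sdk-2.2");
  }

  void emitUserMessage(String promptText) {
    eventLogger
        .logRecordBuilder()
        .setAttribute(AttributeKey.stringKey("event.name"), "gen_ai.user.message")
        .setAttribute(AttributeKey.stringKey("gen_ai.system"), "aws.bedrock")
        .setBody(promptText)
        .emit();
  }
}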
ConverseRequest(BEDROCK_RUNTIME, "ConverseRequest", request("gen_ai.request.model", "modelId")),
;
ConverseRequest(BEDROCK_RUNTIME, "ConverseRequest", request("gen_ai.request.model", "modelId")),
;
ConverseRequest(BEDROCK_RUNTIME, "ConverseRequest", request("gen_ai.request.model", "modelId"));
trace ->
    trace.hasSpansSatisfyingExactly(
        span ->
            span.hasName("chat amazon.titan-text-lite-v1")
We could also assert the spanKind for all of these
span.hasName("chat amazon.titan-text-lite-v1") | |
span.hasName("chat amazon.titan-text-lite-v1") | |
.hasKind(SpanKind.INTERNAL) |
equalTo(GEN_AI_USAGE_INPUT_TOKENS, 8),
equalTo(GEN_AI_USAGE_OUTPUT_TOKENS, 14),
equalTo(
    GEN_AI_RESPONSE_FINISH_REASONS, Arrays.asList("end_turn")))));
Arrays.asList is used several times, could do a static import.
GEN_AI_RESPONSE_FINISH_REASONS, Arrays.asList("end_turn")))));
GEN_AI_RESPONSE_FINISH_REASONS, asList("end_turn")))));
Hello - I'm working on getting genai conventions implemented in OTel instrumentation. This adds genai conventions for AWS Bedrock calls. It is a Java version of what's in Python and being worked on for JS.

This is an initial implementation that includes the minimum support: just the Converse API with span attributes. Will follow up with metrics, log events, other APIs, etc.

One point for the gen ai instrumentations is that we're generally using an API recording mechanism rather than crafted mocks - this has become the industry standard for genai instrumentations due to the complexity and non-determinism of LLM responses; Python is also using a similar mechanism. While I've kept the implementation local to aws-sdk for now, the intent is that it could be moved to testing to be used in a future openai sdk instrumentation, or others.
/cc @codefromthecrypt
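To make the recording approach concrete, here is a hedged sketch of the kind of toggle described above - the RECORD_WITH_REAL_API name comes from the discussion, while the local replay endpoint, dummy credentials, and region are illustrative assumptions rather than the PR's actual test setup:

import java.net.URI;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.bedrockruntime.BedrockRuntimeClient;
import software.amazon.awssdk.services.bedrockruntime.BedrockRuntimeClientBuilder;

final class TestClientFactory {
  private TestClientFactory() {}

  static BedrockRuntimeClient clientForTest() {
    boolean record = Boolean.parseBoolean(System.getenv("RECORD_WITH_REAL_API"));
    BedrockRuntimeClientBuilder builder = BedrockRuntimeClient.builder().region(Region.US_EAST_1);
    if (!record) {
      // Default mode: replay previously recorded HTTP exchanges from a local server,
      // so no AWS credentials are needed and the tests survive refactoring.
      builder
          .endpointOverride(URI.create("http://localhost:8080"))
          .credentialsProvider(
              StaticCredentialsProvider.create(AwsBasicCredentials.create("testing", "testing")));
    }
    // When RECORD_WITH_REAL_API is set, real credentials from the environment are used
    // against the real endpoint so the recording layer can capture fresh responses.
    return builder.build();
  }
}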