Contributor

@RanVaknin RanVaknin commented Sep 5, 2025

This PR adds:

  1. New utils-lite code package that provides SdkInternalThreadLocal, a wrapper around ThreadLocal.
  2. Support for trace ID propagation using the new utils-lite class.

Background

Previously, we implemented trace ID propagation using SLF4J's MDC in PR #6363, but this was
reverted because only the MDC interface is available: no implementation is provided by the SDK, Lambda runtime, or X-Ray SDK.

Solution

Added a small utils-lite utility class that provides thread-local key-value storage using ThreadLocal<Map<String, String>>. In this case, it allows the Lambda Runtime Interface Client, AWS SDK, and X-Ray SDK to share trace context via this one package, but it can be extended to other use cases.

Example:

SdkInternalThreadLocal.put("some-value", foo);
String traceId = SdkInternalThreadLocal.get("some-value");
SdkInternalThreadLocal.remove("some-value");
SdkInternalThreadLocal.clear();
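
For readers unfamiliar with the pattern, here is a minimal sketch of what a ThreadLocal<Map<String, String>> wrapper with these four operations could look like. The class name ThreadLocalStoreSketch is hypothetical and the body is an assumption for illustration, not the actual SdkInternalThreadLocal source from this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the utility described above; not the SDK's actual source.
final class ThreadLocalStoreSketch {
    // One map per thread, so values written on one thread are invisible to others.
    private static final ThreadLocal<Map<String, String>> STORAGE =
            ThreadLocal.withInitial(HashMap::new);

    private ThreadLocalStoreSketch() {
    }

    // Stores or replaces the value for the key on the current thread.
    static void put(String key, String value) {
        STORAGE.get().put(key, value);
    }

    // Returns the current thread's value for the key, or null if absent.
    static String get(String key) {
        return STORAGE.get().get(key);
    }

    // Removes a single key from the current thread's map.
    static String remove(String key) {
        return STORAGE.get().remove(key);
    }

    // Drops the current thread's whole map.
    static void clear() {
        STORAGE.remove();
    }
}
```

Because the storage is static and keyed by thread, any library on the classpath that calls the same class on the same thread sees the same values, which is the sharing property the description relies on.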

@RanVaknin RanVaknin requested a review from a team as a code owner September 5, 2025 17:59
@RanVaknin RanVaknin added the api-surface-area-approved-by-team Indicate API surface area introduced by this PR has been approved by team label Sep 5, 2025
@RanVaknin RanVaknin force-pushed the feature/master/utils-lite-lambda-trace branch from 9badcf1 to 85642fe Compare September 11, 2025 17:58
.importPackages("software.amazon.awssdk.utilslite");

@Test
public void utilsLitePackage_shouldOnlyContainAllowedClasses() {
Contributor

Is this a requirement?
Why is this test required?

Contributor Author

This follows @zoewangg's suggestion to add a manual check that ensures we don't add classes to this package unintentionally. The sole purpose of this package right now is to provide this wrapper. If we want to expand it in the future, this test serves as another checkpoint.

private static void saveTraceId(ExecutionAttributes executionAttributes) {
    String traceId = executionAttributes.getAttribute(TRACE_ID);
    if (traceId != null) {
        SdkInternalThreadLocal.put(CONCURRENT_TRACE_ID_KEY, executionAttributes.getAttribute(TRACE_ID));
Contributor

In SdkInternalThreadLocal we do a put but never call remove/clear, so if TRACE_ID is unique, the map in SdkInternalThreadLocal will keep growing. How do we clear/remove CONCURRENT_TRACE_ID_KEY for requests that are complete?

Contributor Author

If a thread goes back to the thread pool and a fresh request is fired from the Lambda Runtime Interface Client, that request will carry a brand new trace ID stored under the same key in the map, so it overwrites the previous value.

If the thread is killed, a clean map is created when a new thread is spun up.
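
The overwrite argument can be checked with a small sketch. It assumes a plain ThreadLocal<Map<String, String>> and a hypothetical key name ("trace-id-key"); it stands in for the SDK classes rather than using them. Many simulated requests run on one reused worker thread, each with a unique trace ID, and the map size is read back afterwards.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class TraceIdOverwriteDemo {
    // Hypothetical key name, for illustration only.
    static final String KEY = "trace-id-key";
    static final ThreadLocal<Map<String, String>> STORAGE =
            ThreadLocal.withInitial(HashMap::new);

    private TraceIdOverwriteDemo() {
    }

    // Runs `requests` simulated invocations on one reused worker thread and
    // returns how many entries that thread's map holds after the last one.
    static int entriesAfter(int requests) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            int size = 0;
            for (int i = 0; i < requests; i++) {
                String traceId = "trace-" + i; // unique value per request
                size = pool.submit(() -> {
                    // Same key every time, so the previous value is replaced.
                    STORAGE.get().put(KEY, traceId);
                    return STORAGE.get().size();
                }).get();
            }
            return size;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling TraceIdOverwriteDemo.entriesAfter(1000) returns 1 in this sketch: the map does not grow with the number of requests. Note that the last value still stays on the thread until the next put or an explicit remove.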


@Override
public void beforeExecution(Context.BeforeExecution context, ExecutionAttributes executionAttributes) {
    String traceId = SdkInternalThreadLocal.get(CONCURRENT_TRACE_ID_KEY);
Contributor

@joviegas joviegas Sep 11, 2025

Since we are using ThreadLocal, can we confirm whether this code can run on a virtual thread?
That is, if a customer makes a call on a virtual thread, has the impact been investigated?
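
The thread does not record an answer to this question. As background only, the sketch below shows the general ThreadLocal property that matters here: a value set on one thread is not visible from another thread. It uses a platform thread so that it runs on older JDKs; on JDK 21+, virtual threads also keep their own thread-local values, so starting the worker with Thread.ofVirtual() should behave the same way in this respect. The class and field names are hypothetical, and the sketch says nothing about which thread the Lambda runtime calls put on.

```java
final class ThreadLocalIsolationDemo {
    // Hypothetical stand-in for the per-thread trace ID storage.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    private ThreadLocalIsolationDemo() {
    }

    // Sets a value on the calling thread, then reads it from a freshly
    // started thread and returns what that thread saw.
    static String readFromNewThread(String valueOnCaller) {
        TRACE_ID.set(valueOnCaller);
        String[] seen = new String[1];
        Thread worker = new Thread(() -> seen[0] = TRACE_ID.get());
        worker.start();
        try {
            worker.join(); // join makes the worker's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            TRACE_ID.remove(); // leave the calling thread clean
        }
        return seen[0];
    }
}
```

In this sketch readFromNewThread("abc") returns null, which suggests that a request issued from a different thread than the one that stored the trace ID would simply see no value rather than a stale one.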

public void beforeExecution(Context.BeforeExecution context, ExecutionAttributes executionAttributes) {
    String traceId = SdkInternalThreadLocal.get(CONCURRENT_TRACE_ID_KEY);
    if (traceId != null) {
        executionAttributes.putAttribute(TRACE_ID, traceId);
Contributor

@joviegas joviegas Sep 11, 2025

I was also thinking about the case where the client runs on a host where SdkInternalThreadLocal is never used or cleared:

1. beforeExecution():
   - SdkInternalThreadLocal.get(CONCURRENT_TRACE_ID_KEY) → null (first time)
   - Creates HashMap for Thread-1 (empty)

Now let's imagine the user makes each request on a new thread, as below:

for (int i = 0; i < 1_000_000; i++) {
    new Thread(() -> {
        AmazonS3 client = AmazonS3ClientBuilder.standard().build();
        client.getObject(...);  // Creates HashMap for this thread
        client.shutdown();
        // Thread dies, but not sure what will happen to the above empty HashMap instantiated
    }).start();
}

Result: 1,000,000 HashMaps created? Can we please confirm this will not leak empty HashMaps?

Contributor Author

Great callout. I didn't think about this. I changed it so that the HashMap only gets initialized when .put() is called (initially by the Lambda RIC).
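
A minimal sketch of the lazy-initialization idea described in this reply, under the assumption that the storage is a ThreadLocal with no initial value. The class name LazyThreadLocalStoreSketch and the hasMapOnCurrentThread helper are hypothetical and exist only to make the behavior observable; this is not the code that was pushed to the PR.

```java
import java.util.HashMap;
import java.util.Map;

final class LazyThreadLocalStoreSketch {
    // No initial value: a thread that only reads never gets a HashMap.
    private static final ThreadLocal<Map<String, String>> STORAGE = new ThreadLocal<>();

    private LazyThreadLocalStoreSketch() {
    }

    // Allocates the per-thread map on first write only.
    static void put(String key, String value) {
        Map<String, String> map = STORAGE.get();
        if (map == null) {
            map = new HashMap<>();
            STORAGE.set(map);
        }
        map.put(key, value);
    }

    // Read path: returns null without allocating when nothing was stored.
    static String get(String key) {
        Map<String, String> map = STORAGE.get();
        return map == null ? null : map.get(key);
    }

    // Drops the current thread's map, if any.
    static void clear() {
        STORAGE.remove();
    }

    // Hypothetical helper, exposed only so the sketch can show
    // whether a map has been allocated for the current thread.
    static boolean hasMapOnCurrentThread() {
        return STORAGE.get() != null;
    }
}
```

With this shape, the one-thread-per-request loop from the review comment would call only get() and would not allocate a HashMap per thread unless something on that thread had called put() first.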


Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)