
Bug: X-Amzn-Trace-Id Header Growth in instrumentFetch Causes 431 Errors When Using SQS Triggers #3692

Closed
drewjohnston-0000 opened this issue Mar 4, 2025 · 4 comments
Labels
not-a-bug New and existing bug reports incorrectly submitted as bug rejected This is something we will not be working on. At least, not in the measurable future tracer This item relates to the Tracer Utility

Comments

@drewjohnston-0000

Expected Behavior

  • The X-Amzn-Trace-Id header is added once per outgoing request.

  • On retries, the instrumentation should either replace the trace header or prevent duplicate appends so that the header does not grow over successive invocations.

Current Behavior

  • When a Lambda function triggered by SQS (with DLQ enabled) fails and is retried, the Powertools middleware re-applies the X-Amzn-Trace-Id header.
  • This causes the header to accumulate extra trace context (additional Parent and Sampled values) on each retry, as illustrated right after this list.
  • Eventually, the header becomes so large that nginx returns a 431 "Request Header Fields Too Large" error.
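
As described above, this would mean a single x-amzn-trace-id value carries the trace context more than once. Purely for illustration (these values are invented, not taken from any logs), the growth would look roughly like this:

First attempt: x-amzn-trace-id: Root=1-67a1b2c3-<trace-id>;Parent=<segment-1>;Sampled=1
First retry:   x-amzn-trace-id: Root=1-67a1b2c3-<trace-id>;Parent=<segment-1>;Sampled=1, Root=1-67a1b2c3-<trace-id>;Parent=<segment-2>;Sampled=1
Second retry:  one more Root/Parent/Sampled block is appended, and so on until the header exceeds the server's limit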

Code snippet

public instrumentFetch(): void {
  /**
   * Create a segment at the start of a request made with `undici` or `fetch`.
   *
   * The `message` must be `unknown` because that's the type expected by `subscribe`
   *
   * @param message The message received from the `undici` channel
   */
  const onRequestStart = (message: unknown): void => {
    const { request } = message as DiagnosticsChannel.RequestCreateMessage;

    const parentSubsegment = this.getSegment();
    const requestURL = getRequestURL(request);
    if (parentSubsegment && requestURL) {
      const method = request.method;

      const subsegment = parentSubsegment.addNewSubsegment(
        requestURL.hostname
      );
      subsegment.addAttribute('namespace', 'remote');

      // addHeader is not part of the type definition but it's available:
      // https://github.com/nodejs/undici/blob/main/docs/docs/api/DiagnosticsChannel.md#undicirequestcreate
      // @ts-expect-error
      request.addHeader(
        'X-Amzn-Trace-Id',
        `Root=${environmentVariablesService.getXrayTraceId()};Parent=${subsegment.id};Sampled=${subsegment.notTraced ? '0' : '1'}`
      );

      (subsegment as HttpSubsegment).http = {
        request: {
          url: `${requestURL.protocol}//${requestURL.hostname}${requestURL.pathname}`,
          method,
        },
      };

      this.setSegment(subsegment);
    }
  };

  /**
   * Enrich the subsegment with the response details, and close it.
   * Then, set the parent segment as the active segment.
   *
   * `message` must be `unknown` because that's the type expected by `subscribe`
   *
   * @param message The message received from the `undici` channel
   */
  // ... rest of the existing code ...
}

Steps to Reproduce

  • Deploy a Lambda function instrumented with Powertools for AWS Lambda (TypeScript), with tracing enabled, that uses undici to make downstream API calls.
  • Configure the Lambda to be invoked via an SQS event source with a DLQ in place (a minimal infrastructure sketch follows this list).
  • Cause the Lambda function to fail (for example, by throwing an error) so that the message is retried.
  • Observe that on each retry, the instrumentFetch method appends an additional X-Amzn-Trace-Id header.
  • Over several retries, inspect the outgoing HTTP requests to find that the header has grown, eventually triggering 431 errors from nginx.
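
For illustration, a minimal CDK sketch of such a setup could look like the following (this assumes aws-cdk-lib v2; the construct names, entry path, and maxReceiveCount are placeholders rather than values from the report):

import { Duration, Stack, type StackProps } from 'aws-cdk-lib';
import { Runtime, Tracing } from 'aws-cdk-lib/aws-lambda';
import { SqsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { Queue } from 'aws-cdk-lib/aws-sqs';
import type { Construct } from 'constructs';

export class ReproStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Failed messages are retried, then moved to the DLQ after maxReceiveCount attempts.
    const dlq = new Queue(this, 'Dlq');
    const queue = new Queue(this, 'Queue', {
      visibilityTimeout: Duration.seconds(30),
      deadLetterQueue: { queue: dlq, maxReceiveCount: 3 },
    });

    // Consumer instrumented with Powertools Tracer; active tracing is enabled
    // so that Tracer can create segments and propagate the trace header.
    const consumer = new NodejsFunction(this, 'Consumer', {
      entry: 'src/consumer.ts', // hypothetical path to the handler code
      runtime: Runtime.NODEJS_22_X,
      tracing: Tracing.ACTIVE,
    });
    consumer.addEventSource(
      new SqsEventSource(queue, { batchSize: 1, reportBatchItemFailures: true })
    );
  }
}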

Possible Solution

  • Modify the instrumentFetch implementation to check for an existing X-Amzn-Trace-Id header and either replace it or prevent it from being appended multiple times (see the sketch after this list).
  • Alternatively, provide an option or configuration to reset the trace header on Lambda invocations, ensuring that only one trace header is sent per outgoing request.
  • Consider truncating or simplifying the header, especially for retry scenarios, to ensure it stays within acceptable size limits.
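
To make the first option concrete, here is a minimal sketch of what a guard inside onRequestStart (shown in the snippet above) could look like. This is not the library's actual fix; the internal shape of the request's headers field differs across undici versions, so the hasTraceHeader check below is illustrative only:

const onRequestStart = (message: unknown): void => {
  const { request } = message as DiagnosticsChannel.RequestCreateMessage;

  // ... existing subsegment creation as in the snippet above (defines `subsegment`) ...

  // Illustrative guard: only add the trace header if the request doesn't already carry one.
  // `headers` is an internal undici field (string or array depending on the version),
  // so a real implementation would need a version-aware check.
  const rawHeaders = (request as unknown as { headers?: unknown }).headers;
  const hasTraceHeader = String(rawHeaders ?? '')
    .toLowerCase()
    .includes('x-amzn-trace-id');

  if (!hasTraceHeader) {
    // @ts-expect-error -- addHeader is not part of the public type definitions
    request.addHeader(
      'X-Amzn-Trace-Id',
      `Root=${environmentVariablesService.getXrayTraceId()};Parent=${subsegment.id};Sampled=${subsegment.notTraced ? '0' : '1'}`
    );
  }
};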

Powertools for AWS Lambda (TypeScript) version

latest

AWS Lambda function runtime

22.x

Packaging format used

Lambda Layers

Execution logs

@drewjohnston-0000 drewjohnston-0000 added bug Something isn't working triage This item has not been triaged by a maintainer, please wait labels Mar 4, 2025

boring-cyborg bot commented Mar 4, 2025

Thanks for opening your first issue here! We'll come back to you as soon as we can.
In the meantime, check out the #typescript channel on our Powertools for AWS Lambda Discord: Invite link

@dreamorosi
Contributor

dreamorosi commented Mar 4, 2025

Hi @drewjohnston-0000 thank you for opening this issue.

I've been trying to reproduce the behavior you have described but I am unable to do so.

I have created a stack with an SQS queue, a DLQ, and an API. The SQS queue has a consumer function (the one that should be exhibiting the bug) and the API also has a function that simply logs the headers of the incoming request - this second function replaces the NGINX server you mentioned in your report.

The SQS consumer function uses Tracer to instrument fetch requests made with the undici package, as you specified. The function processes a batch of records coming from the queue and forces a failure after making the request. There's a bit more to the function to process the batch, sign requests, and log, but that's pretty much what you described (if I understood it correctly).

See consumer function code
import {
  BatchProcessor,
  EventType,
  processPartialResponse,
} from '@aws-lambda-powertools/batch';
import { Tracer } from '@aws-lambda-powertools/tracer';
import type { SQSHandler, SQSRecord } from 'aws-lambda';
import { fetch } from 'undici';
import { AwsClient } from 'aws4fetch';
import { Logger } from '@aws-lambda-powertools/logger';

const logger = new Logger({ logLevel: 'DEBUG' });
const signer = new AwsClient({
  region: process.env.AWS_REGION,
  service: 'lambda',
  accessKeyId: process.env.AWS_ACCESS_KEY_ID || '',
  secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY || '',
  sessionToken: process.env.AWS_SESSION_TOKEN,
});
const tracer = new Tracer();
const processor = new BatchProcessor(EventType.SQS);
const API_URL = process.env.API_URL || '';

const recordHandler = async (record: SQSRecord) => {
  const { body } = record;
  const { action } = JSON.parse(body) as {
    action: 'fail' | 'succeed';
  };

  try {
    logger.debug('url', {
      url: API_URL,
    });
    const request = await signer.sign(API_URL, {
      method: 'POST',
    });
    logger.debug('signed', { request: request.headers });
    const response = await fetch(API_URL, request);
    const body = await response.json();
    if (action === 'fail') {
      throw new Error(`Failing request: ${JSON.stringify(body)}`);
    }
    return body;
  } catch (error) {
    tracer.addErrorAsMetadata(error as Error);
    throw error;
  }
};

export const handler: SQSHandler = async (event, context) => {
  return processPartialResponse(event, recordHandler, processor, {
    context,
  });
};

On the other hand, the API function is very simple and its only purpose is to log the headers of the request. If the bug is confirmed, the size of the x-amzn-trace-id header should increase on every subsequent retry.

See API function code
import { Logger } from '@aws-lambda-powertools/logger';
import type { APIGatewayProxyEventV2 } from 'aws-lambda';

const logger = new Logger({ logLevel: 'DEBUG' });

export const handler = async (event: APIGatewayProxyEventV2) => {
  logger.debug('event headers', {
    headers: event.headers,
  });
  return {
    statusCode: 200,
    body: JSON.stringify({
      message: 'Hello from Lambda!',
    }),
  };
};

After deploying the stack, I have then sent a single message to the queue with this payload:

{"action": "fail"}
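
(For reference, this can be done with the AWS CLI; the queue URL below is a placeholder.)

aws sqs send-message \
  --queue-url https://sqs.eu-west-1.amazonaws.com/123456789012/repro-queue \
  --message-body '{"action": "fail"}'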

Then, I observed the trace data, and confirmed that the request is being instrumented (see example below):

[Screenshot: X-Ray trace showing the instrumented outgoing request]

Finally, I checked the logs of both functions and confirmed that the trace headers looked like this across all three attempts:

First request - "x-amzn-trace-id": "Self=1-67c70bc7-26d76bd768b96b4016dbb0f3;Root=1-67c70bc6-d4d6d6c475f1be02cdb23271;Parent=e9db11cd30c722df;Sampled=1",
First retry - "x-amzn-trace-id": "Self=1-67c70be5-3cb7149c659c781420046d78;Root=1-67c70be4-929ff72656370db96e17c480;Parent=3b20f2e887f7d51a;Sampled=1",
Second retry - "x-amzn-trace-id": "Self=1-67c70c03-7664698b78d1b0371aff4edd;Root=1-67c70c02-3b2624e1fbad9afb6c5636ec;Parent=2613b5b73b864bf4;Sampled=1",

In all three requests, the trace header that arrived at the API has the expected components and doesn't appear to be repeated or otherwise abnormally large.

You can find the entire stack with all the components, and optionally try the test for yourself, at this repo: https://github.com/dreamorosi/3692


As a side note, while it's true that we don't check for an existing header or clear existing headers, this is because the Request object we're applying the header to is always a brand new one, so there's no case (as far as I know) in which it would be reused. Additionally, a retried SQS message is inherently part of a new Lambda invocation, which in itself excludes object/scope reuse.

The only case I can think of - unless I'm missing something or have misunderstood the setup - is that you're already adding the trace ID manually; in that case, the solution would be to not do that and let Tracer add the header.
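
For example (hypothetical user code, not taken from this report), manually propagating the incoming trace context like this could result in a second trace header on top of the one set by Tracer, and should be removed:

// Hypothetical anti-pattern: the `_X_AMZN_TRACE_ID` environment variable is set by Lambda,
// but Tracer already adds the X-Amzn-Trace-Id header when instrumenting fetch.
await fetch(API_URL, {
  method: 'POST',
  headers: {
    'X-Amzn-Trace-Id': process.env._X_AMZN_TRACE_ID ?? '',
  },
});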

With this in mind, could you please provide a minimal reproduction example, similar to the one I shared above, that consistently reproduces the bug? This would help us understand what's happening and hopefully provide a fix.

@dreamorosi dreamorosi added tracer This item relates to the Tracer Utility not-a-bug New and existing bug reports incorrectly submitted as bug need-response This item requires a response from a customer and will considered stale after 2 weeks and removed bug Something isn't working triage This item has not been triaged by a maintainer, please wait labels Mar 4, 2025
@dreamorosi dreamorosi moved this from Triage to Pending customer in Powertools for AWS Lambda (TypeScript) Mar 4, 2025
github-actions bot commented Mar 19, 2025

This issue has not received a response in 2 weeks. If you still think there is a problem, please leave a comment to prevent the issue from closing automatically.

@github-actions github-actions bot added the pending-close-response-required This issue will be closed soon unless the discussion moves forward label Mar 19, 2025
github-actions bot commented Mar 26, 2025

Greetings! We are closing this issue because it has been open a long time and hasn’t been updated in a while and may not be getting the attention it deserves. We encourage you to check if this is still an issue in the latest release and if you find that this is still a problem, please feel free to comment or reopen the issue.

@github-actions github-actions bot added the rejected This is something we will not be working on. At least, not in the measurable future label Mar 26, 2025
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 26, 2025
@github-project-automation github-project-automation bot moved this from Pending customer to Coming soon in Powertools for AWS Lambda (TypeScript) Mar 26, 2025
@dreamorosi dreamorosi removed pending-close-response-required This issue will be closed soon unless the discussion moves forward need-response This item requires a response from a customer and will considered stale after 2 weeks labels Mar 26, 2025
@dreamorosi dreamorosi moved this from Coming soon to Closed in Powertools for AWS Lambda (TypeScript) Mar 26, 2025