Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Add multiple dimensets to the same Metrics instance #6198

Open
2 tasks done
north-star-saj opened this issue Mar 2, 2025 · 22 comments
Open
2 tasks done
Assignees
Labels

Comments

@north-star-saj
Copy link

north-star-saj commented Mar 2, 2025

Edit history

03/20/2025: Updated EMF spec and code snippit for accuracy.

Use case

Overview

Enabling the AWS Powertools Python package to support multiple dimension sets for the same Metrics instance would significantly enhance the packages monitoring capabilities. This feature would allow users to gain more granular insights and comprehensive views of their applications by creating aggregating metrics across various dimensions.

Reference Code

This feature request is akin to the aws_embedded_metrics (code link) put_dimensions method which adds a dimension set to a common MetricsContext. The metrics then get serialized into Embedded Metric Format (EMF) with multiple dimension sets (code link).

Example Use Case

This is a simplified example that demonstrates my use case. In my usecase, one of these dimensions's values is not known in advance. Instead, it is dynamically retrieved.

I am monitoring a lambda that gets deployed to two environments (beta and production) across three regions (us-east-1, us-west-1, and eu-west-1). My lambda produces application-specific metrics such as SuccessfulRequests, FailedRequests, and RetryCount. By creating multiple dimension sets across these three dimensions, I can create aggregate metrics which enable a comprehensive view of my application.

The generated EMF log may look something like this:

03/20/2025 edit: fixed incorrect emf spec with repetitive dimensions / keys

{
  "_aws": {
    "Timestamp": 946684800000,
    "CloudWatchMetrics": [
      {
        "Namespace": "MyApplication",
        "Dimensions": [
          [
            "Environment",
            "Region"
          ],
          [
            "Enviornment"
          ],
          [
            "Region"
          ]
        ],
        "Metrics": [
          {
            "Name": "SuccessfulRequests",
            "Unit": "Count"
          },
          {
            "Name": "FailedRequests",
            "Unit": "Count"
          },
          {
            "Name": "RetryCount",
            "Unit": "Count"
          }
        ]
      }
    ]
  },
  "Environment": "Production",
  "Region": "us-west-2",
  "SuccessfulRequests": 20,
  "FailedRequests": 0,
  "RetryCount": 1
}

Benefits

  • Improved granularity in monitoring and alerting: Supports better visibility into metrics for environments, regions, and other dynamic factors.
  • Easier aggregation of metrics across various dimensions, making it simpler to analyze and report on complex data sets.
  • Automation of dimension management reduces manual errors and maintenance overhead.

Solution/User Experience

The following is an example uses a new add_dimension_set method defined in the AmazonCloudWatchEMFProvider class.

Edit on 03/20/2025: added missing add_dimension_set call to code snippit

# logic which would produce the example EMF log from the ticket
@metrics.log_metrics
def lambda_handler(event: dict, context: LambdaContext):
    metrics.add_dimension_set({"environment": STAGE})
    metrics.add_dimension_set({"environment:" STAGE, "region": REGION})
    metrics.add_dimension_set({"region": REGION})
    metrics.add_metric(name="SuccessfulRequests", unit=MetricUnit.Count, value=20)
    metrics.add_metric(name="FailedRequests", unit=MetricUnit.Count, value=0)
    metrics.add_metric(name="RetryCount", unit=MetricUnit.Count, value=1)

Considerations:

From my point of view, these are a few considerations that will need to be accounted for as part of this feature request.

  • Does add_dimension add the dimension to all dimension sets? How does it work when invoked before or after the add_dimension_set method?
  • Does add_dimension_set handle duplicate dimensions? What about duplicate dimension keys, but differing values?
  • Is the new method name clear to existing and future customers (e.g. add_dimension_set) ?

Alternatives

I've considered the following alternatives. Each of these solutions come up short compared to an easy-to-use method in powertools that lets me add multiple dimension set to the same metric value.

Alternative Benefits Weaknesses
Publish metrics using aws-embedded-metrics-python Supports multi-dimension set metrics Limited to EMF logging; adds unnecessary dependencies like async which is not required in Lambda.
Create multiple instances of EphemeralMetrics for each dimensionset Enables aggregated metric logging Creates excess EMF logs, error prone and repetitive
Create custom metric without using dimension sets Simple to implement Lacks granularity and flexibility; requires manual aggregation of metrics and fails to provide the same level of dynamic insight across multiple dimensions.
Use CloudWatch Metric Math Allows mathematical operations across metrics Complex to set up, especially for a large number of dimensions. As the number of dimensions increases, maintaining and scaling Metric Math queries becomes harder
Use CloudWatch SEARCH functionality Enables querying and analyzing metrics data Requires advanced querying skills, which may not be familiar to all users.Additionally, this approach may require additional resources for query optimization, performance tuning, and potentially more API calls to fetch and process large volumes of data.

Acknowledgment

@north-star-saj north-star-saj added feature-request feature request triage Pending triage from maintainers labels Mar 2, 2025
Copy link

boring-cyborg bot commented Mar 2, 2025

Thanks for opening your first issue here! We'll come back to you as soon as we can.
In the meantime, check out the #python channel on our Powertools for AWS Lambda Discord: Invite link

@leandrodamascena
Copy link
Contributor

Hey @north-star-saj! Thanks so much for opening this issue here! I'm working on finishing up a few items for the release we have on Friday, but I'll give you some feedback here by the end of the week.

@leandrodamascena leandrodamascena added metrics and removed triage Pending triage from maintainers labels Mar 4, 2025
@leandrodamascena leandrodamascena moved this from Triage to Backlog in Powertools for AWS Lambda (Python) Mar 4, 2025
@leandrodamascena leandrodamascena self-assigned this Mar 4, 2025
@leandrodamascena leandrodamascena added the revisit Maintainer to provide update or revisit prioritization in the next cycle label Mar 4, 2025
@north-star-saj
Copy link
Author

Hey @leandrodamascena , just checking in -- is there anything I can clear up in the feature request? I'd be open to providing a proof of concept / creating a PR once we align on the general idea if that helps.

@leandrodamascena
Copy link
Contributor

Hey @leandrodamascena , just checking in -- is there anything I can clear up in the feature request? I'd be open to providing a proof of concept / creating a PR once we align on the general idea if that helps.

Hey hey @north-star-saj! I had some internal stuff to sort out, but I promise I'll give you feedback here tomorrow.

@leandrodamascena
Copy link
Contributor

Hi @north-star-saj! Thank you so much for the detailed explanation you gave me during the meeting we had!

Yes, I think it makes sense to add support for adding multiple dimension sets for the same metric, especially since you can get better visualization in CloudWatch metrics without doing any trick with MATH or SEARCH. Also, DimensionSet in EMF Specification is an array of arrays, which means EMF expects customers to create multiple dimension sets. Some other popular EMF libraries, such as aws-embedded-metrics-python, also support this.

I think the experience could look like this:

from aws_lambda_powertools import Metrics
from aws_lambda_powertools.metrics import MetricUnit

metrics = Metrics(namespace="A")

metrics.add_dimension(name="dimension1", value="1") # This continue working AS IS

metrics.add_dimension_set({"dimension1": "1", "dimension2": "2"}) # This is a new method to add a new array to the Dimensions
metrics.add_dimension_set({"dimension1": "1", "dimension2": "2", "dimension3": "3"}) # Same here
metrics.add_metric(name="mymmmm", unit=MetricUnit.Bytes, value=1 )

metrics.flush_metrics()

And this will produce the following output. Note that dimension1 will be the only value in the first array because it was added by the add_dimension method. Also, add_dimension will not add dimensions to any other array than the first one, which means that add_dimension is intended to be the first dimensions you will have if you use this method.

{
   "_aws":{
      "Timestamp":1742404106448,
      "CloudWatchMetrics":[
         {
            "Namespace":"A",
            "Dimensions":[
               [
                  "dimension1"
               ],
               [
                  "dimension1",
                  "dimension2"
               ],
               [
                  "dimension1",
                  "dimension2",
                  "dimension3"
               ]
            ],
            "Metrics":[
               {
                  "Name":"mymmmm",
                  "Unit":"Bytes"
               }
            ]
         }
      ]
   },
   "dimension1":"1",
   "dimension2":"2",
   "dimension3":"3",
   "mymmmm":[
      1.0
   ]
}

We need to be aware that this may increase the costs for the customer in CloudWatch who uses this feature. But this is something we can make well documented if we go ahead with this implementation.

I would like to hear the opinion of other maintainers. Taggging @aws-powertools/lambda-typescript-core @aws-powertools/lambda-java-core and @aws-powertools/lambda-dotnet-core.

@dreamorosi
Copy link
Contributor

Thanks for flagging this, I'll take a look and share my feedback before end of the week.

@leandrodamascena leandrodamascena removed the revisit Maintainer to provide update or revisit prioritization in the next cycle label Mar 20, 2025
@dreamorosi
Copy link
Contributor

Thanks for bringing this up and for the detailed request, including pros/cons and alternative. Well done.

I have a few questions I'd like to understand before moving forward, some of these might be obviously a lack of knowledge of the EMF spec on my part, but I'd still appreciate clarifications. To ease the discussion, I'm gonna number each point but the order is just random.

Point 1

First, can we expound/clarify this statement? Which costs are we talking here: ingestion, storage, or?

We need to be aware that this may increase the costs for the customer in CloudWatch who uses this feature.

Point 2

Next, I kind of understand the code example here but I don't understand the one in the original post. Specifically, how does this:

# logic which would produce the example EMF log from the ticket
@metrics.log_metrics
def lambda_handler(event: dict, context: LambdaContext):
    metrics.add_dimension_set({"environment": STAGE})
    metrics.add_dimension_set({"environment:" STAGE, "region": REGION})
    metrics.add_metric(name="SuccessfulRequests", unit=MetricUnit.Count, value=20)
    metrics.add_metric(name="FailedRequests", unit=MetricUnit.Count, value=0)
    metrics.add_metric(name="RetryCount", unit=MetricUnit.Count, value=1)

can yield this:

{
    "_aws": {
        "Timestamp": 946684800000,
        "CloudWatchMetrics": [
            {
                "Namespace": "MyApplication",
                "Dimensions": [["Environment", "Region"]],
                "Metrics": [
                    {"Name": "SuccessfulRequests", "Unit": "Count"},
                    {"Name": "FailedRequests", "Unit": "Count"},
                    {"Name": "RetryCount", "Unit": "Count"}
                ]
            },
            {
                "Namespace": "MyApplication",
                "Dimensions": [["Environment"]],
                "Metrics": [
                    {"Name": "SuccessfulRequests", "Unit": "Count"},
                    {"Name": "FailedRequests", "Unit": "Count"},
                    {"Name": "RetryCount", "Unit": "Count"}
                ]
            },
            {
                "Namespace": "MyApplication",
                "Dimensions": [["Region"]],
                "Metrics": [
                    {"Name": "SuccessfulRequests", "Unit": "Count"},
                    {"Name": "FailedRequests", "Unit": "Count"},
                    {"Name": "RetryCount", "Unit": "Count"}
                ]
            }
        ]
    },
    "Environment": "Production",
    "Region": "us-west-2",
    "SuccessfulRequests": 20,
    "FailedRequests": 0,
    "RetryCount": 1
}

specifically, where is this one coming from?

{
    "Namespace": "MyApplication",
    "Dimensions": [["Region"]],
    "Metrics": [
        {"Name": "SuccessfulRequests", "Unit": "Count"},
        {"Name": "FailedRequests", "Unit": "Count"},
        {"Name": "RetryCount", "Unit": "Count"}
    ]
}

Point 3

Regarding this question:

Does add_dimension add the dimension to all dimension sets? How does it work when invoked before or after the add_dimension_set method?

I agree with @leandrodamascena comment above that add_dimension only adds said dimension to its own array and not others, but just to be extra clear, to say it in other words is it correct to say that: "add_dimension always creates a single item dimension array, no matter when it's called"?

Basically this:

metrics.add_dimension(name="dimension1", value="1") # This continue working AS IS

metrics.add_dimension_set({"dimension1": "1", "dimension2": "2"}) # This is a new method to add a new array to the Dimensions
metrics.add_metric(name="mymmmm", unit=MetricUnit.Bytes, value=1 )

and this:

metrics.add_dimension_set({"dimension1": "1", "dimension2": "2"}) # This is a new method to add a new array to the Dimensions
metrics.add_dimension(name="dimension1", value="1") # This continue working AS IS
metrics.add_metric(name="mymmmm", unit=MetricUnit.Bytes, value=1 )

are functionally equivalent in the sense that they'll both yield two arrays where the only difference is the order

  • [["dimension1"], ["dimension1", "dimension2"]]
  • [["dimension1", "dimension2"],["dimension1"]]

Point 4

This point for me it's still unanswered, especially the second question - and to be honest I don't have a mental model to answer:

Does add_dimension_set handle duplicate dimensions? What about duplicate dimension keys, but differing values?

In all the examples above we showed adding dimensions with the same value, what happens if I do this?

metrics.add_dimension_set({"dimension1": "3", "dimension2": "2"}) # dimension1 has value 3 here
metrics.add_dimension(name="dimension1", value="1") # dimension1 has value 1 here
metrics.add_metric(name="mymmmm", unit=MetricUnit.Bytes, value=1 )

How do we represent this in EMF? Does the last dimension to be added overrides the previous one? or?


Thanks

@leandrodamascena
Copy link
Contributor

Thank you so much for responding so quickly @dreamorosi! I'll wait for @north-star-saj to share his thoughts as well - he said he will this week - and I'll respond to both of them because they'll probably have some overlap.

@dreamorosi
Copy link
Contributor

I'd still would like @hjgraca and @phipag share their thoughts on the proposal sooner rather than later, to avoid many back and forth. I think it's better we all put forward all the questions/doubts/feedback we have, so we can batch the clarifications and avoid extending this into weeks.

@dreamorosi
Copy link
Contributor

I also forgot two more points so I guess:

Point 5

How does this new set of dimensions interact with default dimensions? If a metric instance has some default dimensions, do they get added to each one of the arrays in the array?

Point 6

This is more of a positioning / documentation comment - but I think we'll need to do a very good job at conveying to customers when to use this, vs when to use the ephemeral metrics, vs when to just use regular single-dimension metrics.

I have the sense that this is the kind of feature that is very intuitive if you have an advanced understanding of how metrics work in CloudWatch, but it might not be immediately clear to the average customer what to use when "I just want some metrics with dimensions".

@sthulb
Copy link
Contributor

sthulb commented Mar 20, 2025

Overall, I agree with the idea. Allowing dimension sets would indeed make the metrics lib more useful for customers. I think we need to be clear in docs that this will create an elevated cost.

@dreamorosi In point 2, I assume there was an omission of the dimension set.

Without reading the EMF spec to verify, is there a significant difference between:

1:

        "CloudWatchMetrics": [
            {
                "Namespace": "MyApplication",
                "Dimensions": [["Environment", "Region"]],
                "Metrics": [
                    {"Name": "SuccessfulRequests", "Unit": "Count"},
                    {"Name": "FailedRequests", "Unit": "Count"},
                    {"Name": "RetryCount", "Unit": "Count"}
                ]
            },
            {
                "Namespace": "MyApplication",
                "Dimensions": [["Environment"]],
                "Metrics": [
                    {"Name": "SuccessfulRequests", "Unit": "Count"},
                    {"Name": "FailedRequests", "Unit": "Count"},
                    {"Name": "RetryCount", "Unit": "Count"}
                ]
            },
            {
                "Namespace": "MyApplication",
                "Dimensions": [["Region"]],
                "Metrics": [
                    {"Name": "SuccessfulRequests", "Unit": "Count"},
                    {"Name": "FailedRequests", "Unit": "Count"},
                    {"Name": "RetryCount", "Unit": "Count"}
                ]
            }
        ]
    },

2:

      "CloudWatchMetrics":[
         {
            "Namespace":"A",
            "Dimensions":[
               [
                  "dimension1"
               ],
               [
                  "dimension1",
                  "dimension2"
               ],
               [
                  "dimension1",
                  "dimension2",
                  "dimension3"
               ]
            ],
            "Metrics":[
               {
                  "Name":"mymmmm",
                  "Unit":"Bytes"
               }
            ]
         }
      ]
   },

From a debugging point of view option 1 seems easier to read though

@phipag
Copy link

phipag commented Mar 20, 2025

I just reviewed the proposed feature and realized that the Java runtime already supports dimension sets. In fact, the only way to add a new dimension is by using dimension sets. Adding a single dimension would simply be a dimension set of length 1. Please correct me if I am missing a point of this feature request.

The reason for this behavior is that Java uses the official EMF library (https://github.com/awslabs/aws-embedded-metrics-java) under the hood and exposes the default EMF metrics logger.

Here is an example that I just tested and I believe it satisfies the feature requested by @north-star-saj:

public class App implements RequestHandler<Object, String> {
    MetricsLogger metricsLogger = MetricsUtils.metricsLogger();

    @Metrics(namespace = "HelloWorldFunction", service = "Powertools")
    public String handleRequest(final Object input, final Context context) {
        var startTime = System.currentTimeMillis();
        var endTime = System.currentTimeMillis();
        metricsLogger.putDimensions(DimensionSet.of("environment", "prod", "region", "us-west-2"));
        metricsLogger.putDimensions(DimensionSet.of("environment", "prod"));
        metricsLogger.putDimensions(DimensionSet.of("region", "us-west-2"));
        metricsLogger.putMetric("ExecutionTime", (double) endTime - startTime, Unit.MILLISECONDS);
        return "OK";
    }
}

This gives me the following output (ignore default service dimension).

{
  "_aws": {
    "Timestamp": 1742467748165,
    "CloudWatchMetrics": [
      {
        "Namespace": "HelloWorldFunction",
        "Metrics": [
          {
            "Name": "ExecutionTime",
            "Unit": "Milliseconds"
          }
        ],
        "Dimensions": [
          [
            "Service"
          ],
          [
            "environment",
            "region"
          ],
          [
            "environment"
          ],
          [
            "region"
          ]
        ]
      }
    ]
  },
  "function_request_id": "c0e5afc1-c033-4aa6-9b13-26bfc324874b",
  "ExecutionTime": 0.0,
  "environment": "prod",
  "functionVersion": "$LATEST",
  "Service": "Powertools",
  "logStreamId": "$LATEST",
  "region": "us-west-2",
  "executionEnvironment": "AWS_Lambda_java17"
}

@dreamorosi regarding Point 4: The Java runtime overwrites the value to the last one that was set. For example:

        metricsLogger.putDimensions(DimensionSet.of("environment", "prod", "region", "us-west-2"));
        metricsLogger.putDimensions(DimensionSet.of("environment", "prod"));
        metricsLogger.putDimensions(DimensionSet.of("region", "us-west-3"));

yields

{
  "environment": "prod",
  "region": "us-west-3",
}

and

        metricsLogger.putDimensions(DimensionSet.of("region", "us-west-3"));
        metricsLogger.putDimensions(DimensionSet.of("environment", "prod", "region", "us-west-2"));
        metricsLogger.putDimensions(DimensionSet.of("environment", "prod"));

yields

{
  "environment": "prod",
  "region": "us-west-2",
}

It always overwrites the value of a dimension with the latest one defined.

@hjgraca
Copy link
Contributor

hjgraca commented Mar 20, 2025

Overall I agree with all that has been discussed.

  1. add_dimension will add a single dimension to the position to when it was called and will not update other dimensions. This is to keep the current behaviour as is.
    The new functionality will only be added to the new dimension_set. This leads to my question on naming, can this be just add_dimensions and maybe in the future add a set_dimensions that can add a list of dictionaries and in one go you can add all the dimensions?
  2. Overriding values, based on the code link provided I will say the same, override the latest set value for the same dimension key # Duplicate dimension sets are removed before being added to the end of the collection.
    # This ensures only latest dimension value is used as a target member on the root EMF node.
    # This operation is O(n^2), but acceptable given sets are capped at 30 dimensions
  3. Same as the point @dreamorosi raised, how do we deal with default dimensions? this will be included in all sets, so we need to refresh all dimensions if we add default dimensions later

@dreamorosi
Copy link
Contributor

This is interesting, thank you both for commenting, and everyone for clarifying some of the questions.

@phipag makes a good point about a single metric dimension being a set of one - should we consider marking the add_dimension (singular) as deprecated and keep only the other one (plural) which can be more versatile? @leandrodamascena thoughts?

@north-star-saj
Copy link
Author

north-star-saj commented Mar 21, 2025

Hey all, thanks for taking the time to consider this request! I added my input where I could. It seems most of the talk is around ergonomics at this point. Please let me know if I need to clarify or contribute to anything.


In point 2, I assume there was an omission of the dimension set.

@dreamorosi and @sthulb, I've fixed errors in my original EMF and code snippit. I wrote the example by hand when obscuring my real implementation -- sorry for the confusion! It should be correct now, but it's still untested. @leandrodamascena did a better job than me at demonstrating my original request.

Point 1: First, can we expound/clarify this statement? Which costs are we talking here: ingestion, storage, or?

It's likely a metric related cost risk, given the metrics cost model and that each unique dimensionset is a new metric (e.g. 3 dimensionsets * 2 metric values = 6 metrics). I'd assume ingestion and storage costs would be marginal in most cases as the multi-dimension set EMF log is compact.

The reason for this behavior is that Java uses the official EMF library (https://github.com/awslabs/aws-embedded-metrics-java) under the hood and exposes the default EMF metrics logger.

@phipag, my team uses the equivalent aws-embedded-metrics-python put_dimensions method. I didn't realize until now that the put_dimensions removes duplicate dimensions as @hjgraca said.

fwiw, while removing duplicates is not what I originally asked for, evidently it's the behavior I'd prefer. My use case is focused around aggregate metrics which removing duplicate dimensions helps achieve.

@dreamorosi
Copy link
Contributor

Thank you @north-star-saj and everyone else for clarifying several points.

Just to recap where we're at with the issue, here's my understanding:

What has been settled/agreed upon

  • We are in favor of adding support for this use case because: 1/ the EMF spec already supports multiple dimension sets, and 2/ the Java runtime already supports this
  • The experience would look something like this (pending agreement on the method name):
metrics = Metrics(namespace="A")

# Existing functionality
metrics.add_dimension(name="dimension1", value="1")

# New functionality
metrics.add_dimension_set({"dimension1": "1", "dimension2": "2"})
metrics.add_dimension_set({"dimension1": "1", "dimension2": "2", "dimension3": "3"})
metrics.add_metric(name="mymmmm", unit=MetricUnit.Bytes, value=1)

metrics.flush_metrics()
Click to see EMF output
{
   "_aws":{
      "CloudWatchMetrics":[
         {
            "Namespace":"A",
            "Dimensions":[
               ["dimension1"],
               ["dimension1", "dimension2"],
               ["dimension1", "dimension2", "dimension3"]
            ],
            "Metrics":[
               {
                  "Name":"mymmmm",
                  "Unit":"Bytes"
               }
            ]
         }
      ]
   },
   "dimension1":"1",
   "dimension2":"2",
   "dimension3":"3",
   "mymmmm":[1.0]
}
  • Duplicate dimension keys will be handled by overwriting the value with the latest value provided
  • Before launching the feature we'll need to clarify in our docs: 1/ when to use this feature vs others, 2/ cost implications of using this

Outstanding items/points

  1. Default dimensions how do default dimensions interact with dimensions set?
  • Should default dimensions be added to all future dimension set?
  • Should default dimensions be added retroactively to existing, but not flushed, dimension sets if added at runtime?
  1. Should we mark add_dimension() (singular) as deprecated in favor of the new method?
  2. How do we want to name the new method?
  • add_dimensions() (plural) is consistent with existing one but inconsistent with Java
  • add_dimension_set() consistent with Java

My answers to the outstanding points:

  1. I view default dimensions as something set before a request happens (i.e. constructor or method in the init phase) or at the beginning of a request (i.e. multi tenant app). Because of this, I would only add default dimensions to new sets added after the default dimensions were set, and not retroactively
metrics = Metrics()
metrics.set_default_dimensions(environment=STAGE, another="one")

metrics.add_dimension_set({"dimension1": "1", "dimension2": "2"}) # this includes environment and another
# for some reason I want to add more default dimensions
metrics.set_default_dimensions(tenant_id="1") # this does not set tenant_id retroactively into the previous set
metrics.add_dimension_set({"foo": "1", "bar": "2"}) # this includes environment, another, and tenant_id
  1. I'm in favor of keeping the method to avoid breaking changes for customers, and instead simplify the internal implementation to possibly call the future add_dimensions() (plural) under the hood.
  2. I woud name it add_dimensions() (plural) to be consistent with the current naming, and also because the word "set" here refers to how it's represented in EMF rather than what you're actually passing to the method - which imo is exposing implementation details and useless

@phipag
Copy link

phipag commented Mar 21, 2025

I agree with your answers to the outstanding items @dreamorosi. One comment regarding Java consistency. The method is named putDimensions in Java (only the data class is named DimensionSet).

Therefore, I am also in favor of calling it add_dimensions.

@leandrodamascena
Copy link
Contributor

Thank you all for your feedback and ideas. My answers to questions that still make sense to answer because you guys have covered almost everything already.

I view default dimensions as something set before a request happens (i.e. constructor or method in the init phase) or at the beginning of a request (i.e. multi tenant app). Because of this, I would only add default dimensions to new sets added after the default dimensions were set, and not retroactively

I agree with that.

Duplicate dimension keys will be handled by overwriting the value with the latest value provided

Just to make sure I understood this correctly, we are proposing this experience:

metrics.add_dimension_set({"dimension1": "1", "dimension2": "2"}) # dimension1 = 1
metrics.add_dimension_set({"dimension1": "1", "dimension2": "2", "dimension3": "3"}) # dimension1 = 1
metrics.add_dimension_set({"dimension1": "2", "dimension2": "2", "dimension3": "3"}) # dimension1 = 2
metrics.add_dimension(name="dimension1", value="3") # dimension1 = 3

So dimension1 will end up with value 3. But in this case, should we warn that an existing dimension already exists with a previous value or not? Warning means we need to go through all dimension + dimension_set arrays to notify this. I don't know the computation costs of doing this for multiple metrics. I'm happy to not warn and just highlight this in the documentation.

Before launching the feature we'll need to clarify in our docs: 1/ when to use this feature vs others, 2/ cost implications of using this

+1

I view default dimensions as something set before a request happens (i.e. constructor or method in the init phase) or at the beginning of a request (i.e. multi tenant app). Because of this, I would only add default dimensions to new sets added after the default dimensions were set, and not retroactively

+1

I'm in favor of keeping the method to avoid breaking changes for customers, and instead simplify the internal implementation to possibly call the future add_dimensions() (plural) under the hood.

+1

I woud name it add_dimensions() (plural) to be consistent with the current naming, and also because the word "set" here refers to how it's represented in EMF rather than what you're actually passing to the method - which imo is exposing implementation details and useless

I'm on the fence here. When the IDE autocompletes with suggestions, customers may have trouble understanding what the singular and plural methods do and what they need. I'd stick with add_dimension_set, but I'm happy to go with the majority and adopt add_dimensions if that's the case.

Please let me know your final thoughts before we agree and implement them.

@dreamorosi
Copy link
Contributor

So dimension1 will end up with value 3. But in this case, should we warn that an existing dimension already exists with a previous value or not? Warning means we need to go through all dimension + dimension_set arrays to notify this. I don't know the computation costs of doing this for multiple metrics. I'm happy to not warn and just highlight this in the documentation.

Yes, it ends with value 3.

I agree with calling it out in the docs but not warn, for example we don't warn when metadata or log attributes are overwritten.

I'm on the fence here. When the IDE autocompletes with suggestions, customers may have trouble understanding what the singular and plural methods do and what they need. I'd stick with add_dimension_set, but I'm happy to go with the majority and adopt add_dimensions if that's the case.

I think the parameter types are different enough to show the difference, at least in TypeScript, but I can understand. I don't have a strong preference even though I am inclined to keep add_dimensions.


Should we move the issue to the backlog and create a copy in the TS & .NET repos?

@leandrodamascena
Copy link
Contributor

I agree with calling it out in the docs but not warn, for example we don't warn when metadata or log attributes are overwritten.

OK

I think the parameter types are different enough to show the difference, at least in TypeScript, but I can understand. I don't have a strong preference even though I am inclined to keep add_dimensions.

In Python, it will show differences in types, but you know, Python... Let's keep add_dimensions and do a good job of documenting it.

Should we move the issue to the backlog and create a copy in the TS & .NET repos?

Sure. Do you create in TS? @hjgraca do you do the same in .NET? Should I do both?

I'm moving it to the backlog and expect to work in the next week.

@dreamorosi
Copy link
Contributor

Yes, I'll create the one in TS next week.

I'll also take some time to flash out the integration and items to do based on the conversation. I'll initially open it up for the community, and if nobody picks it up we'll work on it in 2 iterations.

@dreamorosi
Copy link
Contributor

Upon further inspection, in TS we already have an addDimensions() method but after having this discussion I now think it's very broken and we should treat this as a bug in TS.

Specifically, when calling the method instead of creating a new set, we just add the dimensions to the existing (and only) set and that's all.

I have opened an issue aws-powertools/powertools-lambda-typescript#3777 to track the item, and we'll work on it in Q2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Backlog
Development

No branches or pull requests

6 participants