Feature request: Add multiple dimensets to the same Metrics instance #6198
Comments
Thanks for opening your first issue here! We'll come back to you as soon as we can. |
Hey @north-star-saj! Thanks so much for opening this issue here! I'm working on finishing up a few items for the release we have on Friday, but I'll give you some feedback here by the end of the week. |
Hey @leandrodamascena , just checking in -- is there anything I can clear up in the feature request? I'd be open to providing a proof of concept / creating a PR once we align on the general idea if that helps. |
Hey hey @north-star-saj! I had some internal stuff to sort out, but I promise I'll give you feedback here tomorrow. |
Hi @north-star-saj! Thank you so much for the detailed explanation you gave me during the meeting we had! Yes, I think it makes sense to add support for adding multiple dimension sets for the same metric, especially since you can get better visualization in CloudWatch metrics without doing any trick with MATH or SEARCH. Also, I think the experience could look like this: from aws_lambda_powertools import Metrics
from aws_lambda_powertools.metrics import MetricUnit
metrics = Metrics(namespace="A")
metrics.add_dimension(name="dimension1", value="1") # This continues working as is
metrics.add_dimension_set({"dimension1": "1", "dimension2": "2"}) # This is a new method that adds a new array to the Dimensions
metrics.add_dimension_set({"dimension1": "1", "dimension2": "2", "dimension3": "3"}) # Same here
metrics.add_metric(name="mymmmm", unit=MetricUnit.Bytes, value=1)
metrics.flush_metrics() And this will produce the following output. Note that {
"_aws":{
"Timestamp":1742404106448,
"CloudWatchMetrics":[
{
"Namespace":"A",
"Dimensions":[
[
"dimension1"
],
[
"dimension1",
"dimension2"
],
[
"dimension1",
"dimension2",
"dimension3"
]
],
"Metrics":[
{
"Name":"mymmmm",
"Unit":"Bytes"
}
]
}
]
},
"dimension1":"1",
"dimension2":"2",
"dimension3":"3",
"mymmmm":[
1.0
]
} We need to be aware that this may increase CloudWatch costs for customers who use this feature, but that is something we can document well if we go ahead with this implementation. I would like to hear the opinion of other maintainers. Tagging @aws-powertools/lambda-typescript-core @aws-powertools/lambda-java-core and @aws-powertools/lambda-dotnet-core. |
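As an illustration of the proposal above, here is a minimal sketch of how several dimension sets could be serialized into one EMF document. It is plain Python and does not use Powertools: `build_emf` is a hypothetical helper, and the single `add_dimension` call from the example is modelled as a dimension set of length one.

```python
import json
import time


def build_emf(namespace, dimension_sets, metrics):
    """Hypothetical helper, not a Powertools API.

    dimension_sets: list of dicts mapping dimension name -> value.
    metrics: dict mapping metric name -> (unit, value).
    """
    # Dimension values live at the root of the EMF document, one value per key.
    root_values = {}
    for dimension_set in dimension_sets:
        root_values.update(dimension_set)

    return {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One inner array of dimension names per dimension set.
                    "Dimensions": [list(ds) for ds in dimension_sets],
                    "Metrics": [
                        {"Name": name, "Unit": unit}
                        for name, (unit, _) in metrics.items()
                    ],
                }
            ],
        },
        **root_values,
        **{name: [float(value)] for name, (_, value) in metrics.items()},
    }


emf = build_emf(
    "A",
    [
        {"dimension1": "1"},
        {"dimension1": "1", "dimension2": "2"},
        {"dimension1": "1", "dimension2": "2", "dimension3": "3"},
    ],
    {"mymmmm": ("Bytes", 1)},
)
print(json.dumps(emf, indent=2))
```

The printed document has the same shape as the output shown above, apart from the timestamp.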
Thanks for flagging this, I'll take a look and share my feedback before end of the week. |
Thanks for bringing this up and for the detailed request, including pros/cons and alternatives. Well done. I have a few questions I'd like to understand before moving forward. Some of these might obviously stem from a lack of knowledge of the EMF spec on my part, but I'd still appreciate clarifications. To ease the discussion, I'm gonna number each point, but the order is just random. Point 1 First, can we expound/clarify this statement? Which costs are we talking about here: ingestion, storage, or?
Point 2 Next, I kind of understand the code example here but I don't understand the one in the original post. Specifically, how does this: # logic which would produce the example EMF log from the ticket
@metrics.log_metrics
def lambda_handler(event: dict, context: LambdaContext):
    metrics.add_dimension_set({"environment": STAGE})
    metrics.add_dimension_set({"environment": STAGE, "region": REGION})
    metrics.add_metric(name="SuccessfulRequests", unit=MetricUnit.Count, value=20)
    metrics.add_metric(name="FailedRequests", unit=MetricUnit.Count, value=0)
    metrics.add_metric(name="RetryCount", unit=MetricUnit.Count, value=1) can yield this: {
"_aws": {
"Timestamp": 946684800000,
"CloudWatchMetrics": [
{
"Namespace": "MyApplication",
"Dimensions": [["Environment", "Region"]],
"Metrics": [
{"Name": "SuccessfulRequests", "Unit": "Count"},
{"Name": "FailedRequests", "Unit": "Count"},
{"Name": "RetryCount", "Unit": "Count"}
]
},
{
"Namespace": "MyApplication",
"Dimensions": [["Environment"]],
"Metrics": [
{"Name": "SuccessfulRequests", "Unit": "Count"},
{"Name": "FailedRequests", "Unit": "Count"},
{"Name": "RetryCount", "Unit": "Count"}
]
},
{
"Namespace": "MyApplication",
"Dimensions": [["Region"]],
"Metrics": [
{"Name": "SuccessfulRequests", "Unit": "Count"},
{"Name": "FailedRequests", "Unit": "Count"},
{"Name": "RetryCount", "Unit": "Count"}
]
}
]
},
"Environment": "Production",
"Region": "us-west-2",
"SuccessfulRequests": 20,
"FailedRequests": 0,
"RetryCount": 1
} specifically, where is this one coming from? {
"Namespace": "MyApplication",
"Dimensions": [["Region"]],
"Metrics": [
{"Name": "SuccessfulRequests", "Unit": "Count"},
{"Name": "FailedRequests", "Unit": "Count"},
{"Name": "RetryCount", "Unit": "Count"}
]
} Point 3 Regarding this question:
I agree with @leandrodamascena's comment above that Basically this: metrics.add_dimension(name="dimension1", value="1") # This continues working as is
metrics.add_dimension_set({"dimension1": "1", "dimension2": "2"}) # This is a new method that adds a new array to the Dimensions
metrics.add_metric(name="mymmmm", unit=MetricUnit.Bytes, value=1) and this: metrics.add_dimension_set({"dimension1": "1", "dimension2": "2"}) # This is a new method that adds a new array to the Dimensions
metrics.add_dimension(name="dimension1", value="1") # This continues working as is
metrics.add_metric(name="mymmmm", unit=MetricUnit.Bytes, value=1) are functionally equivalent in the sense that they'll both yield two arrays where the only difference is the order.
Point 4 For me this point is still unanswered, especially the second question, and to be honest I don't have a mental model to answer it:
In all the examples above we showed adding dimensions with the same value, what happens if I do this? metrics.add_dimension_set({"dimension1": "3", "dimension2": "2"}) # dimension1 has value 3 here
metrics.add_dimension(name="dimension1", value="1") # dimension1 has value 1 here
metrics.add_metric(name="mymmmm", unit=MetricUnit.Bytes, value=1) How do we represent this in EMF? Does the last dimension to be added override the previous one? Or something else? Thanks |
Thank you so much for responding so quickly @dreamorosi! I'll wait for @north-star-saj to share his thoughts as well - he said he will this week - and I'll respond to both of them because they'll probably have some overlap. |
I also forgot two more points so I guess: Point 5 How does this new set of dimensions interact with default dimensions? If a metric instance has some default dimensions, do they get added to each one of the arrays in the array? Point 6 This is more of a positioning / documentation comment - but I think we'll need to do a very good job at conveying to customers when to use this, vs when to use the ephemeral metrics, vs when to just use regular single-dimension metrics. I have the sense that this is the kind of feature that is very intuitive if you have an advanced understanding of how metrics work in CloudWatch, but it might not be immediately clear to the average customer what to use when "I just want some metrics with dimensions". |
Overall, I agree with the idea. Allowing dimension sets would indeed make the metrics lib more useful for customers. I think we need to be clear in docs that this will create an elevated cost. @dreamorosi In point 2, I assume there was an omission of the dimension set. Without reading the EMF spec to verify, is there a significant difference between: 1: "CloudWatchMetrics": [
{
"Namespace": "MyApplication",
"Dimensions": [["Environment", "Region"]],
"Metrics": [
{"Name": "SuccessfulRequests", "Unit": "Count"},
{"Name": "FailedRequests", "Unit": "Count"},
{"Name": "RetryCount", "Unit": "Count"}
]
},
{
"Namespace": "MyApplication",
"Dimensions": [["Environment"]],
"Metrics": [
{"Name": "SuccessfulRequests", "Unit": "Count"},
{"Name": "FailedRequests", "Unit": "Count"},
{"Name": "RetryCount", "Unit": "Count"}
]
},
{
"Namespace": "MyApplication",
"Dimensions": [["Region"]],
"Metrics": [
{"Name": "SuccessfulRequests", "Unit": "Count"},
{"Name": "FailedRequests", "Unit": "Count"},
{"Name": "RetryCount", "Unit": "Count"}
]
}
]
}, 2: "CloudWatchMetrics":[
{
"Namespace":"A",
"Dimensions":[
[
"dimension1"
],
[
"dimension1",
"dimension2"
],
[
"dimension1",
"dimension2",
"dimension3"
]
],
"Metrics":[
{
"Name":"mymmmm",
"Unit":"Bytes"
}
]
}
]
}, From a debugging point of view option 1 seems easier to read though |
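One way to compare the two layouts is to expand each into the metric identities it describes. The sketch below assumes that every inner array in "Dimensions" is an independent dimension set and that each listed metric is extracted once per set, which is how the thread describes EMF; `expand` is a hypothetical helper, and size or readability differences between the layouts are ignored.

```python
def expand(directives):
    """Return the set of (namespace, metric name, dimension keys) described
    by a list of EMF metric directives. Hypothetical helper for comparison."""
    identities = set()
    for directive in directives:
        for dimension_set in directive["Dimensions"]:
            for metric in directive["Metrics"]:
                identities.add(
                    (
                        directive["Namespace"],
                        metric["Name"],
                        tuple(sorted(dimension_set)),
                    )
                )
    return identities


metric_definitions = [
    {"Name": "SuccessfulRequests", "Unit": "Count"},
    {"Name": "FailedRequests", "Unit": "Count"},
]

# Layout 1: one directive per dimension set.
split = [
    {"Namespace": "MyApplication", "Dimensions": [["Environment", "Region"]], "Metrics": metric_definitions},
    {"Namespace": "MyApplication", "Dimensions": [["Environment"]], "Metrics": metric_definitions},
    {"Namespace": "MyApplication", "Dimensions": [["Region"]], "Metrics": metric_definitions},
]

# Layout 2: a single directive carrying all dimension sets.
combined = [
    {
        "Namespace": "MyApplication",
        "Dimensions": [["Environment", "Region"], ["Environment"], ["Region"]],
        "Metrics": metric_definitions,
    },
]

same = expand(split) == expand(combined)
print(same, len(expand(combined)))
```

Under that assumption both layouts describe the same metrics, so the choice between them is mostly about log size and readability.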
I just reviewed the proposed feature and realized that the Java runtime already supports dimension sets. In fact, the only way to add a new dimension is by using dimension sets. Adding a single dimension would simply be a dimension set of length 1. Please correct me if I am missing a point of this feature request. The reason for this behavior is that Java uses the official EMF library (https://github.com/awslabs/aws-embedded-metrics-java) under the hood and exposes the default EMF metrics logger. Here is an example that I just tested and I believe it satisfies the feature requested by @north-star-saj: public class App implements RequestHandler<Object, String> {
MetricsLogger metricsLogger = MetricsUtils.metricsLogger();
@Metrics(namespace = "HelloWorldFunction", service = "Powertools")
public String handleRequest(final Object input, final Context context) {
var startTime = System.currentTimeMillis();
var endTime = System.currentTimeMillis();
metricsLogger.putDimensions(DimensionSet.of("environment", "prod", "region", "us-west-2"));
metricsLogger.putDimensions(DimensionSet.of("environment", "prod"));
metricsLogger.putDimensions(DimensionSet.of("region", "us-west-2"));
metricsLogger.putMetric("ExecutionTime", (double) endTime - startTime, Unit.MILLISECONDS);
return "OK";
}
} This gives me the following output (ignore default service dimension). {
"_aws": {
"Timestamp": 1742467748165,
"CloudWatchMetrics": [
{
"Namespace": "HelloWorldFunction",
"Metrics": [
{
"Name": "ExecutionTime",
"Unit": "Milliseconds"
}
],
"Dimensions": [
[
"Service"
],
[
"environment",
"region"
],
[
"environment"
],
[
"region"
]
]
}
]
},
"function_request_id": "c0e5afc1-c033-4aa6-9b13-26bfc324874b",
"ExecutionTime": 0.0,
"environment": "prod",
"functionVersion": "$LATEST",
"Service": "Powertools",
"logStreamId": "$LATEST",
"region": "us-west-2",
"executionEnvironment": "AWS_Lambda_java17"
} @dreamorosi regarding Point 4: The Java runtime overwrites the value to the last one that was set. For example: metricsLogger.putDimensions(DimensionSet.of("environment", "prod", "region", "us-west-2"));
metricsLogger.putDimensions(DimensionSet.of("environment", "prod"));
metricsLogger.putDimensions(DimensionSet.of("region", "us-west-3")); yields {
"environment": "prod",
"region": "us-west-3",
} and metricsLogger.putDimensions(DimensionSet.of("region", "us-west-3"));
metricsLogger.putDimensions(DimensionSet.of("environment", "prod", "region", "us-west-2"));
metricsLogger.putDimensions(DimensionSet.of("environment", "prod")); yields {
"environment": "prod",
"region": "us-west-2",
} It always overwrites the value of a dimension with the latest one defined. |
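The overwrite behaviour follows from the shape of the EMF document: dimension values are stored as top-level keys, so each key can hold only one value per log event. A minimal Python sketch of that last-write-wins behaviour, using a hypothetical `DimensionRecorder` class that is neither the Powertools nor the aws-embedded-metrics API:

```python
class DimensionRecorder:
    """Hypothetical stand-in that mimics the behaviour described above."""

    def __init__(self):
        self.dimension_sets = []  # lists of dimension names, in call order
        self.values = {}          # one value per dimension name

    def put_dimensions(self, dimensions):
        # Record which names form this set, then store the values.
        # A later call with the same name replaces the earlier value.
        self.dimension_sets.append(list(dimensions))
        self.values.update(dimensions)


recorder = DimensionRecorder()
recorder.put_dimensions({"environment": "prod", "region": "us-west-2"})
recorder.put_dimensions({"environment": "prod"})
recorder.put_dimensions({"region": "us-west-3"})
print(recorder.dimension_sets)
print(recorder.values)
```

With the calls in this order, `region` ends up as `us-west-3`, matching the first Java example; reversing the order leaves it as `us-west-2`.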
Overall I agree with all that has been discussed.
|
This is interesting, thank you both for commenting, and everyone for clarifying some of the questions. @phipag makes a good point about a single metric dimension being a set of one - should we consider marking the |
Hey all, thanks for taking the time to consider this request! I added my input where I could. It seems most of the talk is around ergonomics at this point. Please let me know if I need to clarify or contribute to anything.
@dreamorosi and @sthulb, I've fixed errors in my original EMF and code snippet. I wrote the example by hand when obscuring my real implementation, sorry for the confusion! It should be correct now, but it's still untested. @leandrodamascena did a better job than me at demonstrating my original request.
It's likely a metric-related cost risk, given the metrics cost model and that each unique dimension set is a new metric (e.g. 3 dimension sets * 2 metric values = 6 metrics). I'd assume ingestion and storage costs would be marginal in most cases, as the multi-dimension-set EMF log is compact.
@phipag, my team uses the equivalent aws-embedded-metrics-python, fwiw. While removing duplicates is not what I originally asked for, evidently it's the behavior I'd prefer. My use case is focused on aggregate metrics, which removing duplicate dimensions helps achieve. |
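The cost arithmetic mentioned above (3 dimension sets * 2 metric values = 6 metrics) can be written out as a small sketch. `distinct_metric_count` is a hypothetical helper; it counts metrics for a single combination of dimension values, so the total grows further as the dimensions take on more values (for example, more regions).

```python
def distinct_metric_count(dimension_sets, metric_names):
    """Number of distinct metrics produced by one log event:
    one per (unique dimension set, metric name) pair."""
    # Duplicate dimension sets do not create additional metrics.
    unique_sets = {frozenset(dimension_set) for dimension_set in dimension_sets}
    return len(unique_sets) * len(set(metric_names))


count = distinct_metric_count(
    [["environment", "region"], ["environment"], ["region"]],
    ["SuccessfulRequests", "FailedRequests"],
)
print(count)
```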
Thank you @north-star-saj and everyone else for clarifying several points. Just to recap where we're at with the issue, here's my understanding: What has been settled/agreed upon
metrics = Metrics(namespace="A")
# Existing functionality
metrics.add_dimension(name="dimension1", value="1")
# New functionality
metrics.add_dimension_set({"dimension1": "1", "dimension2": "2"})
metrics.add_dimension_set({"dimension1": "1", "dimension2": "2", "dimension3": "3"})
metrics.add_metric(name="mymmmm", unit=MetricUnit.Bytes, value=1)
metrics.flush_metrics() EMF output: {
"_aws":{
"CloudWatchMetrics":[
{
"Namespace":"A",
"Dimensions":[
["dimension1"],
["dimension1", "dimension2"],
["dimension1", "dimension2", "dimension3"]
],
"Metrics":[
{
"Name":"mymmmm",
"Unit":"Bytes"
}
]
}
]
},
"dimension1":"1",
"dimension2":"2",
"dimension3":"3",
"mymmmm":[1.0]
}
Outstanding items/points
My answers to the outstanding points:
metrics = Metrics()
metrics.set_default_dimensions(environment=STAGE, another="one")
metrics.add_dimension_set({"dimension1": "1", "dimension2": "2"}) # this includes environment and another
# for some reason I want to add more default dimensions
metrics.set_default_dimensions(tenant_id="1") # this does not set tenant_id retroactively into the previous set
metrics.add_dimension_set({"foo": "1", "bar": "2"}) # this includes environment, another, and tenant_id
|
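The default-dimension behaviour proposed in the snippet above can be sketched as follows. `MetricsSketch` is a hypothetical class written only to illustrate the semantics under discussion (defaults are merged into each set when the set is added, never retroactively); it is not the Powertools implementation.

```python
class MetricsSketch:
    """Hypothetical model of the proposed default-dimension semantics."""

    def __init__(self):
        self.default_dimensions = {}
        self.dimension_sets = []

    def set_default_dimensions(self, **dimensions):
        # Later calls add to the defaults; existing sets are left untouched.
        self.default_dimensions.update(dimensions)

    def add_dimension_set(self, dimensions):
        # Snapshot the defaults known at this moment into the new set.
        self.dimension_sets.append({**self.default_dimensions, **dimensions})


metrics = MetricsSketch()
metrics.set_default_dimensions(environment="prod", another="one")
metrics.add_dimension_set({"dimension1": "1", "dimension2": "2"})
metrics.set_default_dimensions(tenant_id="1")
metrics.add_dimension_set({"foo": "1", "bar": "2"})

for dimension_set in metrics.dimension_sets:
    print(sorted(dimension_set))
```

The first set carries `environment` and `another` only, while the second also carries `tenant_id`.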
I agree with your answers to the outstanding items @dreamorosi. One comment regarding Java consistency. The method is named Therefore, I am also in favor of calling it |
Thank you all for your feedback and ideas. Here are my answers to the questions that still make sense to answer, since you have covered almost everything already.
I agree with that.
Just to make sure I understood this correctly, we are proposing this experience: metrics.add_dimension_set({"dimension1": "1", "dimension2": "2"}) # dimension1 = 1
metrics.add_dimension_set({"dimension1": "1", "dimension2": "2", "dimension3": "3"}) # dimension1 = 1
metrics.add_dimension_set({"dimension1": "2", "dimension2": "2", "dimension3": "3"}) # dimension1 = 2
metrics.add_dimension(name="dimension1", value="3") # dimension1 = 3 So
+1
+1
+1
I'm on the fence here. When the IDE autocompletes with suggestions, customers may have trouble understanding what the singular and plural methods do and what they need. I'd stick with Please let me know your final thoughts before we agree and implement them. |
Yes, it ends with value 3. I agree with calling it out in the docs but not with emitting a warning; for example, we don't warn when metadata or log attributes are overwritten.
I think the parameter types are different enough to show the difference, at least in TypeScript, but I can understand. I don't have a strong preference even though I am inclined to keep Should we move the issue to the backlog and create a copy in the TS & .NET repos? |
OK
In Python, it will show differences in types, but you know, Python... Let's keep
Sure. Will you create the one in TS? @hjgraca, will you do the same in .NET, or should I do both? I'm moving it to the backlog and expect to work on it next week. |
Yes, I'll create the one in TS next week. I'll also take some time to flesh out the integration and items to do based on the conversation. I'll initially open it up for the community, and if nobody picks it up we'll work on it in 2 iterations. |
Upon further inspection, in TS we already have an Specifically, when calling the method instead of creating a new set, we just add the dimensions to the existing (and only) set and that's all. I have opened an issue aws-powertools/powertools-lambda-typescript#3777 to track the item, and we'll work on it in Q2. |
Edit history
03/20/2025: Updated EMF spec and code snippet for accuracy.
Use case
Overview
Enabling the AWS Powertools Python package to support multiple dimension sets for the same Metrics instance would significantly enhance the package's monitoring capabilities. This feature would allow users to gain more granular insights and comprehensive views of their applications by creating aggregate metrics across various dimensions.
Reference Code
This feature request is akin to the aws_embedded_metrics (code link) put_dimensions method, which adds a dimension set to a common MetricsContext. The metrics then get serialized into Embedded Metric Format (EMF) with multiple dimension sets (code link).
Example Use Case
This is a simplified example that demonstrates my use case. In my use case, one of these dimensions' values is not known in advance. Instead, it is dynamically retrieved.
I am monitoring a Lambda that gets deployed to two environments (beta and production) across three regions (us-east-1, us-west-1, and eu-west-1). My Lambda produces application-specific metrics such as SuccessfulRequests, FailedRequests, and RetryCount. By creating multiple dimension sets across these three dimensions, I can create aggregate metrics which enable a comprehensive view of my application.
The generated EMF log may look something like this:
03/20/2025 edit: fixed incorrect emf spec with repetitive dimensions / keys
Benefits
Solution/User Experience
The following example uses a new add_dimension_set method defined in the AmazonCloudWatchEMFProvider class.
Edit on 03/20/2025: added missing add_dimension_set call to code snippet
Considerations:
From my point of view, these are a few considerations that will need to be accounted for as part of this feature request.
add_dimension add the dimension to all dimension sets? How does it work when invoked before or after the add_dimension_set method?
add_dimension_set handle duplicate dimensions? What about duplicate dimension keys, but differing values?
add_dimension_set ) ?
Alternatives
I've considered the following alternatives. Each of these solutions comes up short compared to an easy-to-use method in Powertools that lets me add multiple dimension sets to the same metric value.
aws-embedded-metrics-python
EphemeralMetrics for each dimensionset
Acknowledgment