Feature Request: Blacklist Host Tags #3130

Open

ericlarssen-wf opened this issue Mar 6, 2019 · 28 comments
@ericlarssen-wf

It would be great if it were possible to strip host tags off of metrics. Tags such as which autoscaling group a metric comes from are not very valuable and can clutter the tags for a particular metric. Being able to exclude tags based on a regex would make it possible to strip multiple tags at once, including automatically generated ones.
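
To make the request concrete, here is a purely hypothetical sketch of what such an option could look like in datadog.yaml. The key below does not exist in the agent today; it is only meant to illustrate the idea of a regex-based tag blacklist:

## Hypothetical configuration - this key is NOT implemented in the agent.
## Shown only to illustrate the requested regex-based tag exclusion.
tags_blacklist:
  - "^autoscaling_group:.*"
  - "^instance-id:.*"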

@tonglil
Contributor

tonglil commented Feb 6, 2021

To give an example, in a cattle-like environment, I don't need host, internal-hostname, instance-id, instance-template, or created-by tags on my metrics, as these are automatically generated and cycled frequently in the runtime environment.

Maybe I need one of host OR instance-id, but I'm usually not drilling down to that level, especially if I'm more concerned with high-level metrics.

@hermanbanken

We could really use this too. We have high-cardinality tags, and ALL of our metrics are also annotated by the various hosts, multiplying the whole thing enormously.

I believe the instance-related tags are all added by the DataDog agent itself, right? Based on where the data is sourced from.

@kolloch

kolloch commented Jun 23, 2021

👍 on the possibility to remove host tags!

I'd also appreciate the possibility to remove the kube_replicaset tag.

Maybe one could have a generic blacklist in the agent?

@kirecek

kirecek commented Dec 13, 2021

hmm, the issue was created in 2019, but I assume this exclude option is still not implemented, right? Or did you guys figure out some workaround by any chance?

@danopia

danopia commented Dec 13, 2021

I think Metrics Without Limits can work around the whole pricing aspect of this extra cardinality. But the actual remove-tags feature doesn't exist, as far as I know.

@knowshan

knowshan commented Jan 9, 2022

We would like to have this feature as well. This should be configurable through AWS integration configuration OR Datadog agent.

@richid

richid commented Apr 11, 2022

I will say that I've tried overwriting these tags with a single dummy value in the datadog.yaml file:

tags:
  - aws:ec2:fleet-id:dummy

But that dummy value just gets added to the list of tag values for that tag key.

@andrew-kolesnikov

andrew-kolesnikov commented Aug 25, 2022

My team could really use this too.
Really sad to see that this was requested several years ago but appears to have been neither satisfied nor rejected.

@alexb-img

We require tags to be filtered/removed at the source (ingest) as well. Metrics Without Limits only removes tags during indexing, not ingest.

@arloliu

arloliu commented Oct 18, 2022

Our company requires this feature too; the cost of the many unused tags is high.

@adudek

adudek commented Nov 4, 2022

Same here - I would appreciate the possibility to create a filter mask for tags. If tags cannot be removed: in principle, monitoring should not affect infrastructure, nor should it impact cost (tags play a functional role in some scenarios).
Since tagging is a cash cow for Datadog, I doubt anyone will pick this up :(

@patbl

patbl commented Nov 5, 2022

I wrote a Ruby script that adds custom tag groups that exclude tags you don't want. You'll want to tailor it to your use case, or use it as inspiration. It takes about 8 hours to run against the Datadog account I work on, which has around 10k metrics. My company set up a Datadog monitor for custom-metrics usage, and we re-run this script whenever it alerts.

I agree with others that Datadog's tooling for managing tags on large numbers of metrics is poor. Having Terraform configuration for thousands of metrics isn't practical, and neither is manually configuring them through the web UI. All we're asking for is a blacklist, which seems a lot easier to implement than many other parts of Datadog's tooling, which I'm generally impressed by.
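
For anyone who wants to build something similar in another language, here is a minimal sketch of the kind of call such a script makes, assuming Datadog's v2 "create a tag configuration" endpoint (POST /api/v2/metrics/{metric_name}/tags). The metric name, tag list, and metric type are placeholders, and you should verify the payload against the current API reference before relying on it:

package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
)

// Sketch only: creates a tag configuration for one metric so that only the
// listed tags are indexed. Endpoint and payload follow the Datadog v2
// "create a tag configuration" API as I understand it - verify against the
// current API reference before using.
func main() {
	metric := "my_app.request_count" // placeholder metric name
	payload := fmt.Sprintf(`{
	  "data": {
	    "type": "manage_tags",
	    "id": %q,
	    "attributes": {
	      "metric_type": "count",
	      "tags": ["env", "service"]
	    }
	  }
	}`, metric)

	url := "https://api.datadoghq.com/api/v2/metrics/" + metric + "/tags"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewBufferString(payload))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("DD-API-KEY", os.Getenv("DD_API_KEY"))
	req.Header.Set("DD-APPLICATION-KEY", os.Getenv("DD_APP_KEY"))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}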

@zekth

zekth commented May 14, 2023

Following up on this one. There was a question about supporting it not only for statsd but also for openmetrics. Also, as I see it, we currently have tag exclusion for EC2 tags. Do we want to make a generic pkg so that these exclude-tags features work the same way, by consuming string slices from the configuration?

I'm happy to implement this if I get a green flag from the maintainer team.

Tagging @alexb-img and @olivielpeau because you were active on #6526

@LutaoX

LutaoX commented Jun 15, 2023

I'm a PM at Datadog and want to chime in here to provide context that we are aware of this feature request and are actively looking for details about the use cases, telemetry pipeline needs and pain-points. We highly encourage customers to reach out to us via our support channel (https://www.datadoghq.com/support/) or your CSM contact about this topic! Thanks!

@2rs2ts

2rs2ts commented Jun 20, 2023

@LutaoX Let me just say that it's super awesome to hear from a PM in a public setting, because until now that's definitely not been the norm from my experience with requests on this repo; my company had internally started assuming that filing feature requests in Github to go along with our support cases was pointless.

Anyway, I'm pretty sure I've reached out before about this and ended up having a support case where I linked this issue, but I don't want to go dig it back up and cause confusion on the customer support end by necro-ing a 2-3 year old issue. One of the big use cases for us is in Kubernetes, where we don't want the host tag from the reporting agent to get applied to statsd metrics, because (to make a long story short) internalTrafficPolicy: local does not actually turn off kube-proxy's load balancing behavior, thereby making it impossible for pod Foo on host A to guarantee that any statsd metrics it sends aren't tagged as potentially coming from other hosts. (I brought it up with the sig-networking at some point but they told me that that's intentional behavior of kube-proxy and that a KEP would be needed to add a new option to actually only send traffic to the local pod.)

Obviously this would be a different part of the codebase, but there's also the matter of EC2 tags where we want some of those tags (such as Env) but not others (such as weird inventorying tags that the company forces us to add but which we don't want showing up in the DataDog web UI.) I'm not sure if that falls under "blacklist host tags" but it sure feels adjacent to it at the very least, and I'm nearly certain I've also brought this up with customer support–maybe even in the same breath as the matter of the host tag in Kubernetes.

I'm only commenting about my use cases here to make sure they get a little bump and to let anyone who's in the same boat as I am just reference my comment when they open their support tickets. Like, if one can summarize one's request by saying "please do what this guy on github said" then hey, I saved them time :)

@lmello

lmello commented Oct 16, 2023

I implemented this for k8s workloads; I guess my implementation could be expanded to cover host tags as well.
To do that, I added a removeTags method to the taglist.

#20161

@cptkng

cptkng commented Jan 17, 2024

I'm a PM at Datadog and want to chime in here to provide context that we are aware of this feature request and are actively looking for details about the use cases, telemetry pipeline needs and pain-points. We highly encourage customers to reach out to us via our support channel (https://www.datadoghq.com/support/) or your CSM contact about this topic! Thanks!

I created a support case around the issue not long ago. @LutaoX , I don't know if you can access it, but here's the case: http://help.datadoghq.com/hc/requests/1402253

@cptkng

cptkng commented Jan 17, 2024

We require tags to be filtered/removed at the source (ingest) as well. Metrics Without Limits only removes tags during indexing, not ingest.

And that's exactly the issue. While you can define which tags you want indexed with Metrics Without Limits (and that's good, as indexing is really expensive), you cannot define what is ingested. And though the price per ingested metric is much lower than per indexed metric, if you have many hosts with many custom metrics, you still end up with a super high bill.

@ide

ide commented Mar 7, 2024

Pragmatically speaking, is there a workaround to reduce high counts of custom metrics other than using something other than Datadog? More precisely, my reading of this GitHub issue and the problem at hand is that the Datadog agent always adds tags, some of which have high cardinalities like host and name, which significantly raises the number of custom metrics and the customer's bill outside of the customer's control. The two ways I am aware of to remove these tags are either to use a custom Datadog agent or not use Datadog altogether, and it would be great to learn of a viable workaround.

IMO ideally the agent would allow for custom tag transformers that receive a tag–value pair from the agent and return a tag–value pair to send to the Datadog service, where returning null would drop that tag–value. But even a simpler API to drop tags regardless of their values would be a great feature.
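
To illustrate the kind of hook being asked for, here is a hypothetical sketch in Go (the agent's language) of such a tag-transformer API. Nothing like this exists in the agent today; the names and signatures are invented purely to make the idea concrete:

package main

import (
	"fmt"
	"strings"
)

// TagTransformer is a hypothetical hook: it receives a tag key/value pair and
// returns the pair to send, or ok=false to drop the tag entirely.
// This interface does NOT exist in the Datadog agent; it only sketches the idea.
type TagTransformer func(key, value string) (newKey, newValue string, ok bool)

// dropHighCardinality drops host-like tags and passes everything else through.
func dropHighCardinality(key, value string) (string, string, bool) {
	switch key {
	case "host", "instance-id", "kube_replicaset":
		return "", "", false // drop the tag
	default:
		return key, value, true // keep unchanged
	}
}

// applyTransform runs a transformer over "key:value" formatted tags.
func applyTransform(tags []string, t TagTransformer) []string {
	out := make([]string, 0, len(tags))
	for _, tag := range tags {
		key, value, _ := strings.Cut(tag, ":")
		if k, v, ok := t(key, value); ok {
			out = append(out, k+":"+v)
		}
	}
	return out
}

func main() {
	tags := []string{"env:prod", "host:ip-10-0-0-12", "kube_replicaset:web-7f9c"}
	fmt.Println(applyTransform(tags, dropHighCardinality)) // [env:prod]
}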

@adudek

adudek commented Mar 8, 2024

@ide I've never tested this, but there is a Vector proxy service you can deploy between the agent and Datadog. You'd have to reconfigure agents and build your own filtering rules, but anything can become financially viable at a certain threshold.

@toan-hf

toan-hf commented Mar 23, 2024

Hi everyone! I think the concern about blacklisting host tags here is super valid, regardless of whether the Datadog agent supports it. I have currently tested with Vector.dev to disable some unnecessary tags before they are ingested & indexed into Datadog.
Although this is super important for us (and our budget), it seems this feature may take longer to implement on the Datadog side, so I'm sharing my implementation here for visibility.

Our architecture looks like this:

Datadog-Agent --> Vector.Dev ---> Datadog Platform

Step 1: It is important to note that your DD-Agent version must be higher than 7.45.1 so that the environment variables below can be applied:

DD_OBSERVABILITY_PIPELINES_WORKER_METRICS_ENABLED - boolean - optional - default: false
    ## Enables forwarding of metrics to an Observability Pipelines Worker

DD_OBSERVABILITY_PIPELINES_WORKER_METRICS_URL - string - optional - default: ""
    ## This is the URL of the vector.dev service that you need to enter

DD_OBSERVABILITY_PIPELINES_WORKER_LOGS_ENABLED - boolean - optional - default: false
    ## Enables forwarding of logs to an Observability Pipelines Worker

DD_OBSERVABILITY_PIPELINES_WORKER_LOGS_URL - string - optional - default: ""

DD_OBSERVABILITY_PIPELINES_WORKER_TRACES_ENABLED - boolean - optional - default: false
    ## Enables forwarding of traces to an Observability Pipelines Worker

DD_OBSERVABILITY_PIPELINES_WORKER_TRACES_URL - string - optional - default: ""
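
For example, to forward only metrics, the minimal pair to set is the following (the hostname is an assumption about where the Vector service from Step 2 is deployed; the port matches the listener configured in Step 3):

    DD_OBSERVABILITY_PIPELINES_WORKER_METRICS_ENABLED=true
    DD_OBSERVABILITY_PIPELINES_WORKER_METRICS_URL=http://vector.observability.svc.cluster.local:8282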

Step 2: Set up the vector.dev service

(It can be managed by Helm, so please utilise it.) The outcome is that you have the service up and running, with a practical endpoint to receive the data.

Step 3: The rule that I used on vector.dev to discard the unnecessary tags

sources:
  ## This allows your vector.dev service to receive the traffic from the Datadog-Agent
  datadog_agents:
    type: datadog_agent
    address: 0.0.0.0:8282
    multiple_outputs: true
    store_api_key: false

## This deletes the two tags pod_phase & namespace, and also appends an image_id tag with the default value themystery
transforms:
  drop_one_tag:
    type: remap
    inputs:
      - datadog_agents.metrics
    source: |-
      del(.tags.pod_phase)
      del(.tags.namespace)
      .tags.image_id = "themystery"

## Output all of them to the Datadog Platform
sinks:
  datadog_metrics:
    type: datadog_metrics
    inputs:
      - drop_one_tag
    compression: gzip
    default_api_key: ${DD_API_KEY}
    site: "datadoghq.eu"

Finally, you can check your metric after it arrives in the Datadog Platform; the pod_phase / namespace tags will no longer appear.

Hope this helps everyone (not just as a short-term fix but even as a long-term model).

@Twe3tTwe3t

We have this same issue as well, particularly in an Azure AKS environment.
In our case, one team looks after the AKS infrastructure, so they tag the underlying infrastructure with common tags such as team, service, and env, as well as some custom tags such as 'costcentre'.
We then have multiple application teams that run their applications on top of the provided AKS clusters, and those teams also tag their deployments with the same tags.
What we are seeing, on logging especially, are duplicate tags coming from both the host tags and the application logs. This is becoming a pain point, as we back-bill customers for their logging usage in Datadog.
We've checked, double-checked, and re-checked all our configuration to make sure none of the settings that pull host tags as labels are enabled in the DD agent, so it should only apply a customer's application tags to its logs and metrics.

@lifttocode

lifttocode commented Jul 18, 2024

More precisely, my reading of this GitHub issue and the problem at hand is that the Datadog agent always adds tags, some of which have high cardinalities like host and name, which significantly raises the number of custom metrics and the customer's bill outside of the customer's control.

For those facing issues with high cardinality due to host tags in Datadog, I’d like to share a solution that worked for me.

In the official Datadog documentation on DogStatsD metrics submission - host tags, there is a somewhat vague but crucial statement: “The submitted host tag overrides any hostname collected by or configured in the Agent.” Leveraging this, I discovered that by adding an empty host: tag to all generated metrics, I successfully eliminated the unnecessary host tags that were significantly increasing the cardinality. Now, all custom metrics submitted to Datadog only include the tags that I intend to include.

My breakthrough came from Datadog Agent release 6.6.0, which introduced an enhancement allowing DogStatsD to support the removal of hostnames on events and services checks, similar to metrics, by adding an empty host: tag.
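
As a concrete illustration of the empty host: trick, here is a minimal sketch using the datadog-go DogStatsD client; the metric name and other tags are made up, and the same effect should be achievable with a raw datagram such as my_app.queue_depth:42|g|#env:prod,host: :

package main

import (
	"log"

	"github.com/DataDog/datadog-go/v5/statsd"
)

func main() {
	// Connect to the local DogStatsD endpoint exposed by the agent.
	client, err := statsd.New("127.0.0.1:8125")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// The empty "host:" tag asks the agent to submit the metric without a
	// hostname, overriding the host tag it would normally attach.
	err = client.Gauge("my_app.queue_depth", 42, []string{"env:prod", "host:"}, 1)
	if err != nil {
		log.Fatal(err)
	}
}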

@ide

ide commented Jul 19, 2024

@lifttocode Thank you for sharing that. Specifying host: appears to have removed the "host" tag.

Relatedly, I tried doing the same for other tags (other_tag:) but the behavior is slightly different. The agent will use other_tag: to override the tag that the agent would have otherwise specified but it still sends other_tag: to Datadog. The Datadog website UI displays the tag as other_tag without a value. Trying it out is the easiest way to see the behavior for yourself. It is still useful to be able to override tags that would otherwise increase your ingested or indexed tag counts.

@sherifabdlnaby

We have this same issue as well, particularly in a Azure AKS environment. In our case, our scenario is, 1 team looks after the AKS infrastructure, so therefore they tag the underlying infrastructure with common tags such as team, service, env as well as some custom tags such as 'costcentre'. We then have multiple application teams that will run their application on top of the provided AKS clusters, those teams too will tag their deployments with the same tags. What we are seeing on logging especially are duplicate tags, coming off the both the host tags as well as application logs. this is becoming a pain point as we back bill customers for their logging usage into Datadog. We've checked, double checked, re-checked all our configuration to make sure none of the settings are enabled in the DD-agent to pull host tags as labels, so should only be apply a customers application tags to its logs and metrics.

THIS

@2rs2ts

2rs2ts commented Oct 23, 2024

@lifttocode

“The submitted host tag overrides any hostname collected by or configured in the Agent.”

That's good to know; unfortunately, it seems that autodiscovery-scraped metrics (e.g. openmetrics, prometheus, etc.) that have a host label on them just end up creating host aliases, so the metrics are double-tagged with two host:... tags. So I guess it's not a consistent experience, or maybe I'm just misunderstanding something. Even if my problem exists between my keyboard and chair in that case, there's still the matter of wanting a partial list of host tags to apply... well, that or duplicating the host tags at the agent level. I just don't want my users to have to re-specify env and a bunch of other tags every time they set up any autodiscovery (or StatsD metrics, for that matter).

@Scalahansolo

How is this not an obvious feature to include in the agent? At this point it just feels like Datadog is being overly greedy and not implementing this as a way to extract more money out of its customers.

@kaarolch

kaarolch commented Feb 20, 2025

@ide never tested this, but there is a Vector proxy service You can deploy between agent and datadog. You'd have to reconfigure agents and build your filtering rules, but anything can become financially viable at a certain threshold.

Unfortunately we tried multiple options with Vector as a proxy, but:

  • When you drop the host tag, your metrics will have different values in Datadog. This is especially critical for gauges (the sink takes the last value of a series). Counters look a little better, but the Datadog sinks perform aggregation: when multiple Vector aggregators send similar series (where the host is often the unique series differentiator), the series can be marked as duplicates in the Datadog backend and subsequently dropped.
  • With Vector, you can temporarily duplicate metrics. We also tried renaming the metrics from metrics_a to metrics_a_new_host_tag and changing the host tag to the aggregator's pod (sts) name. Additionally, we sent our cluster_id, hoping to have enough unique data to differentiate counter series between aggregator pods and clusters. Unfortunately, the metrics have different values; the characteristic is not bad, but there is still a 5-10% diff. Below is the VRL transform so you can test it on your own:
# metric_duplicate.yaml
type: route
inputs:
  - metrics_rename
route:
  host_rename: .name == "metric_a"

# metric_rename_host.yaml
type: remap
inputs:
  - metric_duplicate.host_rename
source: |-
  .name = "metric_a_with_new_host"
  .tags.host = "${POD_NAME}"
  # del(.tags.host) <- this can drop host tags.

# metric_sink_route.yaml
type: route
inputs:
  - metric_duplicate._unmatched
  - metric_rename_host
route:
  sink_s3: .tags.s3=true
  sink_datadog: .tags.s3=false

Then you can compare the results and check how much your data differs, especially when you sum the series by environment tags.

It is worth mentioning that, by default, the host tag is used to merge integration tags (AWS, k8s, etc.), which are currently merged on the DD backend side, not the client side. When you remove the host tag, you also drop a lot of environment tags.

We have a lot of metrics that only need 1-2 tags like environment and cluster_id, but when we dropped host tags we saw different metric results.

When we dropped any other tag, everything worked as expected and the summary data points were almost identical.

The only working solution is enabling DD MWL (Metrics Without Limits) and reducing indexed tags; dropping the host tag there works, but unfortunately you will need to pay for ingested metrics.
