
azure-monitor-opentelemetry incorrectly resolves cloud_RoleInstance in AKS #45532

@greatvovan

Description

  • Package Name: azure-monitor-opentelemetry
  • Package Version: 1.8.6
  • Operating System: Linux (managed)
  • Python Version: 3.12

Describe the bug
When exporting user metrics to Application Insights, the cloud_RoleInstance field gets an unexpected value. Attempts to set it to the pod name don't work.

To Reproduce
Steps to reproduce the behavior:

  1. Include the code:

```python
from opentelemetry.metrics import get_meter
from azure.monitor.opentelemetry import configure_azure_monitor

configure_azure_monitor()

meter = get_meter('MyMeter')
gauge = meter.create_gauge(name='MyGauge')
gauge.set(123.45)
```
  2. Set the environment variable `OTEL_RESOURCE_ATTRIBUTES=service.name=MyService,service.namespace=MyNamespace` or `OTEL_SERVICE_NAME=MyService`.
  3. Deploy to AKS (as a Pod, Job, Deployment, etc.).

Expected behavior
The pod name in the cloud_RoleInstance field.

Additional context
Looking at the `_get_cloud_role_instance()` function, the resolution process goes as follows:

  • ResourceAttributes.SERVICE_INSTANCE_ID
  • ResourceAttributes.K8S_POD_NAME
  • platform.node() — hostname
    The issue is that something (probably the Azure resource detectors) sets the value to a UUID (probably the VM's), and that value takes priority. On AKS, though, VM attributes make very little sense, as it is a containerized environment.
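For clarity, the fallback chain above can be sketched like this (a hypothetical re-implementation for illustration only; the real `_get_cloud_role_instance()` lives in the exporter and reads OpenTelemetry `Resource` attributes rather than a plain dict):

```python
import platform

def resolve_cloud_role_instance(attrs: dict) -> str:
    # 1. service.instance.id wins if any resource detector has set it
    if "service.instance.id" in attrs:
        return attrs["service.instance.id"]
    # 2. otherwise fall back to the pod name detected in Kubernetes
    if "k8s.pod.name" in attrs:
        return attrs["k8s.pod.name"]
    # 3. last resort: the node's hostname
    return platform.node()

# With a detector injecting a VM UUID, the pod name is shadowed:
resolve_cloud_role_instance({
    "service.instance.id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
    "k8s.pod.name": "myjob-abc12",
})  # → the UUID, not the pod name
```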

To add to that, if I manually configure the exporter like so

```python
import os

from opentelemetry.metrics import set_meter_provider
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from azure.monitor.opentelemetry.exporter import AzureMonitorMetricExporter

resource = Resource.create({"service.name": "MyService"})
metric_exporter = AzureMonitorMetricExporter(
    connection_string=os.getenv('APPLICATIONINSIGHTS_CONNECTION_STRING')
)
reader = PeriodicExportingMetricReader(metric_exporter)
provider = MeterProvider(resource=resource, metric_readers=[reader])
set_meter_provider(provider)
meter = provider.get_meter('MyMeter')
```

then the pod name appears. This makes me think the behavior is an unplanned problem, since it works inconsistently across different setups.

Suggestion
I think we should suppress producing `ResourceAttributes.SERVICE_INSTANCE_ID` when `_is_on_aks()` is true, to allow the K8S branch to work.
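A rough sketch of what that suppression could look like; `detected_attributes` is a made-up name, and only `_is_on_aks()` and `SERVICE_INSTANCE_ID` come from the actual code, so treat this as an illustration of the idea rather than a patch:

```python
def detected_attributes(vm_metadata: dict, on_aks: bool) -> dict:
    """Hypothetical Azure resource detector: build resource attributes
    from VM metadata, skipping the VM UUID when running on AKS."""
    attrs = {}
    if not on_aks:
        # Only advertise the VM UUID outside AKS, so that in-cluster
        # the K8S_POD_NAME branch of _get_cloud_role_instance() wins.
        attrs["service.instance.id"] = vm_metadata.get("vmId", "")
    return attrs

detected_attributes({"vmId": "some-uuid"}, on_aks=True)   # → {}
```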

Failed workaround attempts
I tried to use the K8S Downward API to set the pod name into `service.instance.id` manually, like so:

```yaml
          env:
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: service.name=MyService,service.namespace=MyNamespace,service.instance.id=$(MY_POD_NAME)
```

but I got inconsistent behavior: sometimes it works and sometimes it does not. I could not figure out the pattern; subsequent runs of the same job may yield different cloud_RoleInstance values. There is a related issue which, rather confusingly, says that resource detectors take priority over environment variables, but then it is unclear why the behavior is inconsistent here, and why the K8S branch exists at all.

EDIT: Successful workaround
Similarly to the linked issue, setting `OTEL_EXPERIMENTAL_RESOURCE_DETECTORS=otel` resolves both problems:

  1. Correct value for cloud_RoleInstance.
  2. Control over it through `OTEL_RESOURCE_ATTRIBUTES`.
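Applied in code, the workaround looks roughly like this (a sketch only; the variables must be set before `configure_azure_monitor()` runs, e.g. via the pod spec, and `MyService`/`MyNamespace` are the example names from above):

```python
import os

# Restrict resource detection to the stock OTel detectors so that no
# Azure VM detector can inject service.instance.id (workaround from the
# linked issue; must be in place before configure_azure_monitor()).
os.environ["OTEL_EXPERIMENTAL_RESOURCE_DETECTORS"] = "otel"
os.environ["OTEL_RESOURCE_ATTRIBUTES"] = (
    "service.name=MyService,service.namespace=MyNamespace"
)

# configure_azure_monitor() would be called after this point.
```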
