Metric collection for errors always reports the same error "ListenerExecutionFailedException" regardless of what's thrown in the consumer method #3741

Open
renannprado opened this issue Feb 13, 2025 · 6 comments

Comments

@renannprado

renannprado commented Feb 13, 2025

In what version(s) of Spring for Apache Kafka are you seeing this issue?

3.3.2

Describe the bug

I followed this part of the documentation to integrate Micrometer observation into the Kafka listener container:

@Bean
KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, MyPayload>> myProcessKafkaListenerContainerFactory(
    ConsumerFactory<String, MyPayload> myPayloadKafkaConsumerFactory,
    ObservationRegistry observationRegistry
) {
    ConcurrentKafkaListenerContainerFactory<String, MyPayload> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(myPayloadKafkaConsumerFactory);
    factory.setConcurrency(1);
    // the relevant part: enable Micrometer observation for the listener container
    factory.getContainerProperties().setObservationEnabled(true);
    factory.getContainerProperties().setObservationRegistry(observationRegistry);

    return factory;
}

Then after throwing an exception from the consumer:

@RetryableTopic(
    kafkaTemplate = "myProcessRecoverKafkaTemplate",
    attempts = "3",
    backoff = @Backoff(
        delay = 2000, maxDelay = 10000, multiplier = 5
    )
)
@KafkaListener(
    id = "processmyProcess",
    topics = "my-process",
    containerFactory = "myProcessKafkaListenerContainerFactory"
)
public void processPayload(
    ConsumerRecord<String, MyPayload> data
) {
    // always fail, to exercise the error path
    throw new MyException();
}
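
For completeness, the other pieces the snippet refers to look roughly like this on my side (illustrative sketches only; just the names matter for reproducing):

public class MyException extends RuntimeException {

    public MyException() {
        super("payload processing failed");
    }
}

// the KafkaTemplate bean referenced by @RetryableTopic(kafkaTemplate = "...")
@Bean
KafkaTemplate<String, MyPayload> myProcessRecoverKafkaTemplate(
    ProducerFactory<String, MyPayload> producerFactory
) {
    return new KafkaTemplate<>(producerFactory);
}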

This is what I see in the metric:

spring_kafka_listener_seconds_count{error="ListenerExecutionFailedException",messaging_kafka_consumer_group="processMethod-retry-10000",messaging_operation="receive",messaging_source_kind="topic",messaging_source_name="my-method-retry-10000",messaging_system="kafka",spring_kafka_listener_id="processMethod-retry-10000-0"} 6

Without configuring Micrometer observation:

spring_kafka_listener_seconds_sum{exception="ListenerExecutionFailedException",name="my-process-retry-4000-0",result="failure"} 0.02504

I tracked it down to this piece of code:

Though I don't know whether this is exclusive to the Micrometer observation version of the metric or not.

I've described this as a bug because the metric is not particularly insightful when every error counts towards the same tag value regardless of what happens in the consumer.

To Reproduce

Create a Kafka consumer, configure Micrometer observability (see above), throw a custom exception from within the consumer, and check which metric is generated.

Expected behavior

If MyException is thrown during the consumption of the message, that's the exception I'd like to see in the metric.

Basically:

catch (ListenerExecutionFailedException e) {
	listenerError = e;
	if (e.getCause() != null) {
		currentObservation.error(e.getCause());
	} else {
		currentObservation.error(e);
	}

	handleException(records, acknowledgment, consumer, message, e);
}

Maybe I'm oversimplifying and the actual solution is something else, but it makes what I mean very explicit.

And then that's what I would like to see in the metric:

spring_kafka_listener_seconds_count{error="MyException",messaging_kafka_consumer_group="processMethod-retry-10000",messaging_operation="receive",messaging_source_kind="topic",messaging_source_name="my-method-retry-10000",messaging_system="kafka",spring_kafka_listener_id="processMethod-retry-10000-0"} 6
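
As a possible interim workaround (just a rough sketch, not thoroughly verified), an ObservationFilter bean could unwrap the ListenerExecutionFailedException before the error tag is recorded; this assumes Spring Boot applies ObservationFilter beans to the auto-configured ObservationRegistry and that filters run before the metrics handler reads the error from the context:

import io.micrometer.observation.ObservationFilter;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.listener.ListenerExecutionFailedException;

// Rough sketch: replace the wrapper exception with its cause so the
// "error" tag reflects the exception actually thrown by the listener.
@Bean
ObservationFilter unwrapListenerExecutionFailedException() {
    return context -> {
        Throwable error = context.getError();
        if (error instanceof ListenerExecutionFailedException && error.getCause() != null) {
            context.setError(error.getCause());
        }
        return context;
    };
}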

Sample

I didn't prepare one, but I can provide one if really needed.

@sobychacko
Contributor

@renannprado I think this is how it works now and may not be a bug. With that said, this is an opportunity to correct it so that the error tag contains more information on the exception. We will look into it. Thanks for the report.

@artembilan
Member

Sounds reasonable, @renannprado .

Please, consider contributing the fix: https://github.com/spring-projects/spring-kafka/blob/main/CONTRIBUTING.adoc!

@renannprado
Author

@artembilan if nobody works on it, I believe I should have some time before the end of the month. Thanks for the quick replies!

@artembilan
Member

Thank you, @renannprado, for letting us know about your availability!
Unfortunately, the end of the month doesn't work for us.
We have a release scheduled for next Tuesday.
Sure, we could leave it as is until your fix and the next release cycle, but keeping it as a known bug, without a decent reason not to incorporate the fix into the release right now, would lead to some questions from the community.

Well, we may treat this as a breaking change, since we are going to change the value of that error tag.
That way, existing metrics would no longer be collected consistently after an upgrade.
(Not saying that they work correctly right now, though 😄.)

For me, as a developer of this library, it is a bug and has to be fixed.
For end users who deal with real metrics, this might cause a headache.
Therefore, we may just do the fix for 4.0.0, which is due in November.

So, please tell me, @renannprado: how harmful would it be for your metrics collector that some timer tags are going to be different now?

@renannprado
Author

@artembilan the bug label was removed by @sobychacko, but I still believe it's a bug.

Sadly, I can't stop what I'm doing right now to come and fix this bug, so I wanted to at least report it. Please don't wait for me if you'd like to have a release including these changes by Tuesday.

Would it be a breaking change, though? You're not renaming the tag (or are you?), just making sure the right exception class appears in its value. Sure, someone might have an alert or dashboard relying on exactly this value, but that would be misleading/wrong anyway.

From my point of view, it's not going to be a breaking change: I just started to integrate these metrics, so I don't have any alarms or dashboards using them yet. You could change it however it makes sense, but I'm only a single user 😁.

@artembilan
Member

Yeah...
Makes sense - "false alarm".
So, fixing it to the proper values as soon as possible is better than being afraid of breaking some reports, which are really misleading/wrong, as you said.
Bringing the bug label back...
