-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MINOR: Improve Producer error message for failed metadata update #18587
base: trunk
Are you sure you want to change the base?
Conversation
We should provide the same informative error message for both timeout cases.
Java 17
Java 23
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @mjsax! It seems like a straightforward improvement to the error logging. Just had one very minor question based on curiosity. Thanks!
private String getErrorMessage(Integer partitionsCount, String topic, Integer partition, long maxWaitMs) { | ||
return partitionsCount == null ? | ||
String.format("Topic %s not present in metadata after %d ms.", | ||
topic, maxWaitMs) : | ||
String.format("Partition %d of topic %s with partition count %d is not present in metadata after %d ms.", | ||
partition, topic, partitionsCount, maxWaitMs); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any value in including the partition
value in the error message if it's non-null
?
On line 1085 the method waitOnMetadata()
exits when partitionsCount
is non-null
but partition
is null
. It seems like it's possible for partitionsCount
to be null
but partition
non-null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are 3 cases too:
partitionsCount == null
meaning we don't know how many partitions there arepartitionsCount != null && partition == null
meaning we do know how many partitions there are but no partition was specifiedpartitionsCount != null && partition != null
meaning we do know how many partitions there are and a partition was specified
Seems to me that 3 messages would be best so that we never get a "null" as an ugly insert. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any value in including the partition value in the error message if it's non-null?
Yes. partition
is an input parameter to the method, and it's only null
if the caller does not care about the metadata of a specific partitions (as pointed out by Andrew). So it's if not-null, the user cares and we should log it.
If partition != null
, it helps to identify the case of a non-existing partition the user wants to write into. Assume a topic with 4 partitions, and user specific partition 5 as explicit argument when calling send()
. The producer would try to learn about the leader broker for partition 5, but it does not exist, and it would loop until max.block.ms
is exhausted and finally fail with: Partition 5 of topic <FOO> with partition count 4 is not present in metadata after XXX ms.
-- Then we can suspect that the topic may really only have 4 partitions... W/o Partition 5
it's totally unclear why the metadata could not be found and why we hit max.block.ms
on send()
; the generic error Topic %s not present in metadata after %d ms.
does not help us to identify this case.
Or did you mean, if partition == null
? For this case, I agree it's not really helpful to log partition
(as the user does not care anyway), but it think we don't log it anyway given the current code... (cf below)
Seems to me that 3 messages would be best so that we never get a "null" as an ugly insert. wdyt?
I believe the code is correct as-is. There is only two cases:
-
partition == null
. If we get metadata successfully, it implies thatpartitionCount != null
and we just move on and don't throwTimeoutException
. If we cannot get any metadata for the topic, we stay withpartitionCount == null
and useTopic %s not present in metadata after %d ms.
for theTimeoutException
. -
partition != null
. For this case, if we get metadata successfully, we have apartitionCount
, but we might still fail, if we don't get the metadata for the specified partition. For this case, we would usePartition %d of topic %s with partition count %d is not present in metadata after %d ms
as error message. The other failure case is, that we don't get any metadata for the topic at all, and thuspartitionCount == null
and we still log the right thing. The fact thatpartition != null
does not matter ifpartitionCount == null
.
Hence, case (3) is not an error case... If we did get metadata (partitionCount != null
) and if the user does not care (partition == null
), we will never throw TimeoutException
.
BTW, I saw several other branches that are recently failing with similar test failures, including one of mine 🤷♂️ |
private String getErrorMessage(Integer partitionsCount, String topic, Integer partition, long maxWaitMs) { | ||
return partitionsCount == null ? | ||
String.format("Topic %s not present in metadata after %d ms.", | ||
topic, maxWaitMs) : | ||
String.format("Partition %d of topic %s with partition count %d is not present in metadata after %d ms.", | ||
partition, topic, partitionsCount, maxWaitMs); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are 3 cases too:
partitionsCount == null
meaning we don't know how many partitions there arepartitionsCount != null && partition == null
meaning we do know how many partitions there are but no partition was specifiedpartitionsCount != null && partition != null
meaning we do know how many partitions there are and a partition was specified
Seems to me that 3 messages would be best so that we never get a "null" as an ugly insert. wdyt?
Thanks. There is already a ticket for Java 17:
Java 23:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. Thanks for the patch.
We should provide the same informative error message for both timeout cases.