Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MINOR: Improve Producer error message for failed metadata update #18587

Open
wants to merge 1 commit into
base: trunk
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -1104,8 +1104,7 @@ private ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long
metadata.awaitUpdate(version, remainingWaitMs);
} catch (TimeoutException ex) {
// Rethrow with original maxWaitMs to prevent logging exception with remainingWaitMs
final String errorMessage = String.format("Topic %s not present in metadata after %d ms.",
topic, maxWaitMs);
final String errorMessage = getErrorMessage(partitionsCount, topic, partition, maxWaitMs);
if (metadata.getError(topic) != null) {
throw new TimeoutException(errorMessage, metadata.getError(topic).exception());
}
Expand All @@ -1114,11 +1113,7 @@ private ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long
cluster = metadata.fetch();
elapsed = time.milliseconds() - nowMs;
if (elapsed >= maxWaitMs) {
final String errorMessage = partitionsCount == null ?
String.format("Topic %s not present in metadata after %d ms.",
topic, maxWaitMs) :
String.format("Partition %d of topic %s with partition count %d is not present in metadata after %d ms.",
partition, topic, partitionsCount, maxWaitMs);
final String errorMessage = getErrorMessage(partitionsCount, topic, partition, maxWaitMs);
if (metadata.getError(topic) != null && metadata.getError(topic).exception() instanceof RetriableException) {
throw new TimeoutException(errorMessage, metadata.getError(topic).exception());
}
Expand All @@ -1134,6 +1129,13 @@ private ClusterAndWaitTime waitOnMetadata(String topic, Integer partition, long
return new ClusterAndWaitTime(cluster, elapsed);
}

private String getErrorMessage(Integer partitionsCount, String topic, Integer partition, long maxWaitMs) {
return partitionsCount == null ?
String.format("Topic %s not present in metadata after %d ms.",
topic, maxWaitMs) :
String.format("Partition %d of topic %s with partition count %d is not present in metadata after %d ms.",
partition, topic, partitionsCount, maxWaitMs);
}
Comment on lines +1132 to +1138
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any value in including the partition value in the error message if it's non-null?

On line 1085 the method waitOnMetadata() exits when partitionsCount is non-null but partition is null. It seems like it's possible for partitionsCount to be null but partition non-null.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are 3 cases too:

  1. partitionsCount == null meaning we don't know how many partitions there are
  2. partitionsCount != null && partition == null meaning we do know how many partitions there are but no partition was specified
  3. partitionsCount != null && partition != null meaning we do know how many partitions there are and a partition was specified

Seems to me that 3 messages would be best so that we never get a "null" as an ugly insert. wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any value in including the partition value in the error message if it's non-null?

Yes. partition is an input parameter to the method, and it's only null if the caller does not care about the metadata of a specific partitions (as pointed out by Andrew). So it's if not-null, the user cares and we should log it.

If partition != null, it helps to identify the case of a non-existing partition the user wants to write into. Assume a topic with 4 partitions, and user specific partition 5 as explicit argument when calling send(). The producer would try to learn about the leader broker for partition 5, but it does not exist, and it would loop until max.block.ms is exhausted and finally fail with: Partition 5 of topic <FOO> with partition count 4 is not present in metadata after XXX ms. -- Then we can suspect that the topic may really only have 4 partitions... W/o Partition 5 it's totally unclear why the metadata could not be found and why we hit max.block.ms on send(); the generic error Topic %s not present in metadata after %d ms. does not help us to identify this case.

Or did you mean, if partition == null? For this case, I agree it's not really helpful to log partition (as the user does not care anyway), but it think we don't log it anyway given the current code... (cf below)

Seems to me that 3 messages would be best so that we never get a "null" as an ugly insert. wdyt?

I believe the code is correct as-is. There is only two cases:

  1. partition == null. If we get metadata successfully, it implies that partitionCount != null and we just move on and don't throw TimeoutException. If we cannot get any metadata for the topic, we stay with partitionCount == null and use Topic %s not present in metadata after %d ms. for the TimeoutException.

  2. partition != null. For this case, if we get metadata successfully, we have a partitionCount, but we might still fail, if we don't get the metadata for the specified partition. For this case, we would use Partition %d of topic %s with partition count %d is not present in metadata after %d ms as error message. The other failure case is, that we don't get any metadata for the topic at all, and thus partitionCount == null and we still log the right thing. The fact that partition != null does not matter if partitionCount == null.

Hence, case (3) is not an error case... If we did get metadata (partitionCount != null) and if the user does not care (partition == null), we will never throw TimeoutException.

/**
* Validate that the record size isn't too large
*/
Expand Down
Loading