Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DnsServiceDiscoverer: add total resolution timeout #2908

Merged
merged 3 commits into from
May 10, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -100,6 +100,7 @@
import static io.servicetalk.dns.discovery.netty.DnsResolverAddressTypes.IPV4_PREFERRED;
import static io.servicetalk.dns.discovery.netty.DnsResolverAddressTypes.IPV6_PREFERRED;
import static io.servicetalk.dns.discovery.netty.DnsResolverAddressTypes.preferredAddressType;
import static io.servicetalk.dns.discovery.netty.DnsResolverAddressTypes.toRecordTypeNames;
import static io.servicetalk.dns.discovery.netty.ServiceDiscovererUtils.calculateDifference;
import static io.servicetalk.transport.netty.internal.BuilderUtils.datagramChannel;
import static io.servicetalk.transport.netty.internal.BuilderUtils.socketChannel;
Expand All @@ -113,6 +114,7 @@
import static java.util.Collections.singletonList;
import static java.util.Comparator.comparing;
import static java.util.Objects.requireNonNull;
import static java.util.concurrent.TimeUnit.MILLISECONDS;
import static java.util.concurrent.TimeUnit.NANOSECONDS;
import static java.util.concurrent.TimeUnit.SECONDS;
import static java.util.function.Function.identity;
Expand All @@ -129,6 +131,7 @@ final class DefaultDnsClient implements DnsClient {
private final MinTtlCache ttlCache;
private final long maxTTLNanos;
private final long ttlJitterNanos;
private final long resolutionTimeoutMillis;
private final ListenableAsyncCloseable asyncCloseable;
@Nullable
private final DnsServiceDiscovererObserver observer;
Expand All @@ -149,6 +152,7 @@ final class DefaultDnsClient implements DnsClient {
Duration srvHostNameRepeatInitialDelay, Duration srvHostNameRepeatJitter,
@Nullable Integer maxUdpPayloadSize, @Nullable final Integer ndots,
@Nullable final Boolean optResourceEnabled, @Nullable final Duration queryTimeout,
@Nullable Duration resolutionTimeout,
final DnsResolverAddressTypes dnsResolverAddressTypes,
@Nullable final SocketAddress localAddress,
@Nullable final DnsServerAddressStreamProvider dnsServerAddressStreamProvider,
Expand Down Expand Up @@ -222,6 +226,10 @@ final class DefaultDnsClient implements DnsClient {
builder.nameServerProvider(toNettyType(dnsServerAddressStreamProvider));
}
resolver = builder.build();
this.resolutionTimeoutMillis = resolutionTimeout != null ? resolutionTimeout.toMillis() :
// Default value is chosen based on a combination of default "timeout" and "attempts" options of
// /etc/resolv.conf: https://man7.org/linux/man-pages/man5/resolv.conf.5.html
resolver.queryTimeoutMillis() * 2;
Copy link
Member Author

@idelpivnitskiy idelpivnitskiy May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open for alternative suggestions. Originally, I planned to use resolver.queryTimeoutMillis() * resolver.maxQueriesPerResolve(), but Netty's default values are:

queryTimeout = 5 sec
maxQueriesPerResolve = 16

which results in 80 sec and seems unreasonably high.

Decided to go with x2 because because the linux default for attempts (maxQueriesPerResolve) is 2 (see https://man7.org/linux/man-pages/man5/resolv.conf.5.html).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw, this seems reasonable to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me as well 👍

}

@Override
Expand Down Expand Up @@ -424,9 +432,21 @@ protected AbstractDnsSubscription newSubscription(
return new AbstractDnsSubscription(subscriber) {
@Override
protected Future<DnsAnswer<HostAndPort>> doDnsQuery(final boolean scheduledQuery) {
Promise<DnsAnswer<HostAndPort>> promise = nettyIoExecutor.eventLoopGroup().next().newPromise();
resolver.resolveAll(new DefaultDnsQuestion(name, SRV))
.addListener((Future<? super List<DnsRecord>> completedFuture) -> {
final EventLoop eventLoop = nettyIoExecutor.eventLoopGroup().next();
final Promise<DnsAnswer<HostAndPort>> promise = eventLoop.newPromise();
final Future<List<DnsRecord>> resolveFuture =
resolver.resolveAll(new DefaultDnsQuestion(name, SRV));
final Future<?> timeoutFuture = resolutionTimeoutMillis == 0L ? null : eventLoop.schedule(() -> {
if (!promise.isDone() && promise.tryFailure(DnsNameResolverTimeoutException.newInstance(
name, resolutionTimeoutMillis, SRV.toString(),
SrvRecordPublisher.class, "doDnsQuery"))) {
resolveFuture.cancel(true);
}
}, resolutionTimeoutMillis, MILLISECONDS);
resolveFuture.addListener((Future<? super List<DnsRecord>> completedFuture) -> {
if (timeoutFuture != null) {
timeoutFuture.cancel(true);
}
Throwable cause = completedFuture.cause();
if (cause != null) {
promise.tryFailure(cause);
Expand Down Expand Up @@ -501,9 +521,21 @@ protected Future<DnsAnswer<InetAddress>> doDnsQuery(final boolean scheduledQuery
if (scheduledQuery) {
ttlCache.prepareForResolution(name);
}
Promise<DnsAnswer<InetAddress>> dnsAnswerPromise =
nettyIoExecutor.eventLoopGroup().next().newPromise();
resolver.resolveAll(name).addListener(completedFuture -> {
final EventLoop eventLoop = nettyIoExecutor.eventLoopGroup().next();
final Promise<DnsAnswer<InetAddress>> dnsAnswerPromise = eventLoop.newPromise();
final Future<List<InetAddress>> resolveFuture = resolver.resolveAll(name);
final Future<?> timeoutFuture = resolutionTimeoutMillis == 0L ? null : eventLoop.schedule(() -> {
if (!dnsAnswerPromise.isDone() && dnsAnswerPromise.tryFailure(
DnsNameResolverTimeoutException.newInstance(name, resolutionTimeoutMillis,
toRecordTypeNames(addressTypes), ARecordPublisher.class, "doDnsQuery"))) {
resolveFuture.cancel(true);
}
}, resolutionTimeoutMillis, MILLISECONDS);

resolveFuture.addListener(completedFuture -> {
if (timeoutFuture != null) {
timeoutFuture.cancel(true);
}
Throwable cause = completedFuture.cause();
if (cause != null) {
dnsAnswerPromise.tryFailure(cause);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
import io.servicetalk.client.api.ServiceDiscovererEvent;
import io.servicetalk.transport.api.HostAndPort;
import io.servicetalk.transport.api.IoExecutor;
import io.servicetalk.utils.internal.DurationUtils;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
Expand Down Expand Up @@ -124,6 +125,8 @@ public final class DefaultDnsServiceDiscovererBuilder implements DnsServiceDisco
private IoExecutor ioExecutor;
@Nullable
private Duration queryTimeout;
@Nullable
private Duration resolutionTimeout;
private int consolidateCacheSize = DEFAULT_CONSOLIDATE_CACHE_SIZE;
private int minTTLSeconds = DEFAULT_MIN_TTL_POLL_SECONDS;
private int maxTTLSeconds = DEFAULT_MAX_TTL_POLL_SECONDS;
Expand Down Expand Up @@ -258,16 +261,23 @@ public DefaultDnsServiceDiscovererBuilder ndots(final int ndots) {
}

@Override
public DefaultDnsServiceDiscovererBuilder queryTimeout(final Duration queryTimeout) {
this.queryTimeout = queryTimeout;
public DefaultDnsServiceDiscovererBuilder queryTimeout(final @Nullable Duration queryTimeout) {
this.queryTimeout = queryTimeout == null ? null : DurationUtils.ensureNonNegative(queryTimeout, "queryTimeout");
return this;
}

@Override
public DefaultDnsServiceDiscovererBuilder resolutionTimeout(final @Nullable Duration resolutionTimeout) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the name between resolution and query is quite close and not directly obvious how they are different. WDYT about using "total" in the name to make it more clear - i.e. totalResolutionTimeout in addition to the javadoc clarification you have in place?

Copy link
Member Author

@idelpivnitskiy idelpivnitskiy May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I debated between adding "total" or not in the name as well. Decided to drop it for 2 reasons:

  1. Consistency with naming in DnsServiceDiscovererObserver.DnsResolutionObserver. This timeout starts when onNewResolution is called and stops when DnsResolutionObserver terminates.
  2. Bcz of the SRV lookups, it's not really "total".

this.resolutionTimeout = resolutionTimeout == null ? null :
DurationUtils.ensureNonNegative(resolutionTimeout, "resolutionTimeout");
return this;
}

@Override
public DefaultDnsServiceDiscovererBuilder dnsResolverAddressTypes(
@Nullable final DnsResolverAddressTypes dnsResolverAddressTypes) {
this.dnsResolverAddressTypes = dnsResolverAddressTypes != null ? dnsResolverAddressTypes :
systemDefault();
DEFAULT_DNS_RESOLVER_ADDRESS_TYPES;
return this;
}

Expand Down Expand Up @@ -385,8 +395,8 @@ DnsClient build() {
ttlJitter.toNanos(),
srvConcurrency, completeOncePreferredResolved, srvFilterDuplicateEvents,
srvHostNameRepeatInitialDelay, srvHostNameRepeatJitter, maxUdpPayloadSize, ndots, optResourceEnabled,
queryTimeout, dnsResolverAddressTypes, localAddress, dnsServerAddressStreamProvider, observer,
missingRecordStatus, nxInvalidation);
queryTimeout, resolutionTimeout, dnsResolverAddressTypes, localAddress, dnsServerAddressStreamProvider,
observer, missingRecordStatus, nxInvalidation);
return filterFactory == null ? rawClient : filterFactory.create(rawClient);
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -117,11 +117,17 @@ public DnsServiceDiscovererBuilder ndots(final int ndots) {
}

@Override
public DnsServiceDiscovererBuilder queryTimeout(final Duration queryTimeout) {
public DnsServiceDiscovererBuilder queryTimeout(final @Nullable Duration queryTimeout) {
delegate = delegate.queryTimeout(queryTimeout);
return this;
}

@Override
public DnsServiceDiscovererBuilder resolutionTimeout(final @Nullable Duration resolutionTimeout) {
delegate = delegate.resolutionTimeout(resolutionTimeout);
return this;
}

@Override
public DnsServiceDiscovererBuilder dnsResolverAddressTypes(
@Nullable final DnsResolverAddressTypes dnsResolverAddressTypes) {
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
/*
* Copyright © 2024 Apple Inc. and the ServiceTalk project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package io.servicetalk.dns.discovery.netty;

import io.servicetalk.concurrent.internal.ThrowableUtils;

import java.net.UnknownHostException;

final class DnsNameResolverTimeoutException extends UnknownHostException {
private static final long serialVersionUID = 3089160074512305891L;

private DnsNameResolverTimeoutException(final String name, final String recordType, final long timeoutMs) {
super("Resolution for '" + name + "' [" + recordType + "] timed out after " + timeoutMs + " milliseconds");
}

@Override
public Throwable fillInStackTrace() {
return this;
}

static DnsNameResolverTimeoutException newInstance(final String name, final long timeoutMs, final String recordType,
final Class<?> clazz, final String method) {
return ThrowableUtils.unknownStackTrace(new DnsNameResolverTimeoutException(name, recordType, timeoutMs),
clazz, method);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
package io.servicetalk.dns.discovery.netty;

import io.netty.channel.socket.InternetProtocolFamily;
import io.netty.handler.codec.dns.DnsRecordType;
import io.netty.resolver.ResolvedAddressTypes;
import io.netty.resolver.dns.DnsNameResolverBuilder;

Expand Down Expand Up @@ -52,6 +53,9 @@ public enum DnsResolverAddressTypes {
*/
IPV6_PREFERRED_RETURN_ALL;

private static final String A_AAAA_STRING = DnsRecordType.A + ", " + DnsRecordType.AAAA;
private static final String AAAA_A_STRING = DnsRecordType.AAAA + ", " + DnsRecordType.A;

/**
* The default value, based on "java.net" system properties: {@code java.net.preferIPv4Stack} and
* {@code java.net.preferIPv6Stack}.
Expand Down Expand Up @@ -109,4 +113,22 @@ static InternetProtocolFamily preferredAddressType(ResolvedAddressTypes resolved
": " + resolvedAddressTypes);
}
}

static String toRecordTypeNames(DnsResolverAddressTypes dnsResolverAddressType) {
switch (dnsResolverAddressType) {
case IPV4_ONLY:
return DnsRecordType.A.toString();
case IPV6_ONLY:
return DnsRecordType.AAAA.toString();
case IPV4_PREFERRED:
case IPV4_PREFERRED_RETURN_ALL:
return A_AAAA_STRING;
case IPV6_PREFERRED:
case IPV6_PREFERRED_RETURN_ALL:
return AAAA_A_STRING;
default:
throw new IllegalArgumentException("Unknown value for " + DnsResolverAddressTypes.class.getName() +
": " + dnsResolverAddressType);
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -41,8 +41,8 @@ public interface DnsServiceDiscovererBuilder {
* @return {@code this}.
*/
default DnsServiceDiscovererBuilder consolidateCacheSize(int consolidateCacheSize) {
throw new UnsupportedOperationException("DnsServiceDiscovererBuilder#consolidateCacheSize(int) is not " +
"supported by " + getClass());
throw new UnsupportedOperationException(
"DnsServiceDiscovererBuilder#consolidateCacheSize(int) is not supported by " + getClass());
}

/**
Expand Down Expand Up @@ -123,8 +123,8 @@ default DnsServiceDiscovererBuilder consolidateCacheSize(int consolidateCacheSiz
*/
default DnsServiceDiscovererBuilder ttl(int minSeconds, int maxSeconds, int minCacheSeconds, int maxCacheSeconds,
int negativeCacheSeconds) {
throw new UnsupportedOperationException("DnsServiceDiscovererBuilder#ttl(int, int, int, int, int) is not " +
"supported by " + getClass());
throw new UnsupportedOperationException(
"DnsServiceDiscovererBuilder#ttl(int, int, int, int, int) is not supported by " + getClass());
}

/**
Expand All @@ -146,8 +146,8 @@ default DnsServiceDiscovererBuilder ttl(int minSeconds, int maxSeconds, int minC
* @return {@code this}.
*/
default DnsServiceDiscovererBuilder localAddress(@Nullable SocketAddress localAddress) {
throw new UnsupportedOperationException("DnsServiceDiscovererBuilder#localAddress(SocketAddress) is not " +
"supported by " + getClass());
throw new UnsupportedOperationException(
"DnsServiceDiscovererBuilder#localAddress(SocketAddress) is not supported by " + getClass());
}

/**
Expand Down Expand Up @@ -182,19 +182,46 @@ DnsServiceDiscovererBuilder dnsServerAddressStreamProvider(

/**
* Set the number of dots which must appear in a name before an initial absolute query is made.
* <p>
* If not set, the default value is read from {@code ndots} option of {@code /etc/resolv.conf}).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if it is not present in the resolv.conf?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs for resolv.conf clarify the behavior if value is not specified in the file: https://man7.org/linux/man-pages/man5/resolv.conf.5.html
netty follows the same behavior. I would prefer to avoid documenting the behavior ST does not control directly, bcz either linux or netty behavior can change and it's hard to track to keep docs consistent.

*
* @param ndots the ndots value.
* @return {@code this}.
*/
DnsServiceDiscovererBuilder ndots(int ndots);

/**
* Sets the timeout of each DNS query performed by this service discoverer.
* Sets the timeout of each DNS query performed by this service discoverer as part of a resolution request.
* <p>
* Zero ({@code 0}) disables the timeout. If not set, the default value is read from {@code timeout} option of
* {@code /etc/resolv.conf}). Similar to linux systems, this value may be silently capped.
*
* @param queryTimeout the query timeout value
* @return {@code this}.
* @return {@code this}
* @see #resolutionTimeout(Duration)
*/
DnsServiceDiscovererBuilder queryTimeout(@Nullable Duration queryTimeout);

/**
* Sets the total timeout of each DNS resolution performed by this service discoverer.
* <p>
* Each resolution may execute one or more DNS queries, like following multiple CNAME(s) or trying different search
* domains. This is the total timeout for all intermediate queries involved in a single resolution request. Note,
* that <a href="https://tools.ietf.org/html/rfc2782">SRV</a> resolutions may generate independent resolutions for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a slightly different conversations, but I wonder if this is the right behavior. Even if technically it's two different resolutions, from a user perspective I still want to apply a "total" timeout, right? Since in the end they are just individual queries against a nameserver where we need a total timeout to cap?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I don't like this fact either but it's what it's. the way SRV resolutions are implemented right now makes it too complicated to apply a timeout at SRV+all_A_resolutions level. requires a big refactoring.
taking into account, they are less frequent and used only for BACKGROUND discovery strategy, the need for a proper total timeout is less.

* {@code A/AAAA} records. In this case, this timeout will be applied to an {@code SRV} resolution and each
* {@code A/AAAA} resolution independently.
* <p>
* Zero ({@code 0}) disables the timeout. If not set, it defaults to {@link #queryTimeout(Duration) query timeout}
* value multiplied by {@code 2}.
*
* @param resolutionTimeout the query timeout value
* @return {@code this}
* @see #queryTimeout(Duration)
*/
DnsServiceDiscovererBuilder queryTimeout(Duration queryTimeout);
default DnsServiceDiscovererBuilder resolutionTimeout(@Nullable Duration resolutionTimeout) {
throw new UnsupportedOperationException(
"DnsServiceDiscovererBuilder#resolutionTimeout(Duration) is not supported by " + getClass());
}

/**
* Sets the list of the protocol families of the address resolved.
Expand Down
Loading
Loading