client: enhance waiting methods #120

Open
drewdzzz wants to merge 8 commits into master from client_wait

Conversation

@drewdzzz (Collaborator) commented Apr 17, 2025

Enhancement of waiting methods.
See the commits for details.

Closes #109
Closes #111
Closes #128

Method `Connector::waitAny` is for some reason placed in the service
interfaces section, but it is not used internally at all - this method
actually belongs to the public API. Let's move it there.
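
For illustration only, a rough sketch of how `waitAny` might be called
through the public API once it is moved there; the `Buf_t` alias and the
connection setup are assumptions, not part of this patch:

	// Hypothetical usage: wait until any of the registered connections
	// has a response ready to be processed.
	Connector<Buf_t, NetProvider> client;
	/* ... connect and send requests on one or more connections ... */
	std::optional<Connection<Buf_t, NetProvider>> ready =
		client.waitAny(/* timeout (ms) = */ 1000);
	if (ready.has_value()) {
		// *ready is the connection that has at least one response ready.
	}
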
@drewdzzz force-pushed the client_wait branch 4 times, most recently from d732248 to 5c1ceb5 on April 17, 2025 at 11:38
@drewdzzz changed the title from "Client wait" to "client: enhance waiting methods" on Apr 17, 2025
@drewdzzz force-pushed the client_wait branch 4 times, most recently from c3436e7 to d78ac90 on April 17, 2025 at 11:51
@drewdzzz (Collaborator, Author)

Ignoring the clang-format suggestions - I stick to the old style in the functions I patch in order to reduce the diff.

The suggestion:
diff --git a/src/Client/Connector.hpp b/src/Client/Connector.hpp
index bcaa03f..fd8bfb0 100644
--- a/src/Client/Connector.hpp
+++ b/src/Client/Connector.hpp
@@ -74,12 +74,10 @@ public:
 	int connect(Connection<BUFFER, NetProvider> &conn,
 		    const std::string& addr, unsigned port);
 
-	int wait(Connection<BUFFER, NetProvider> &conn, rid_t future,
-		 int timeout = -1, Response<BUFFER> *result = nullptr);
-	int waitAll(Connection<BUFFER, NetProvider> &conn,
-		    const std::vector<rid_t > &futures, int timeout = -1);
-	int waitCount(Connection<BUFFER, NetProvider> &conn,
-		      size_t feature_count, int timeout = -1);
+	int wait(Connection<BUFFER, NetProvider> &conn, rid_t future, int timeout = -1,
+		 Response<BUFFER> *result = nullptr);
+	int waitAll(Connection<BUFFER, NetProvider> &conn, const std::vector<rid_t> &futures, int timeout = -1);
+	int waitCount(Connection<BUFFER, NetProvider> &conn, size_t feature_count, int timeout = -1);
 	std::optional<Connection<BUFFER, NetProvider>> waitAny(int timeout = -1);
 	////////////////////////////Service interfaces//////////////////////////
 	void readyToDecode(const Connection<BUFFER, NetProvider> &conn);
@@ -209,8 +207,7 @@ Connector<BUFFER, NetProvider>::wait(Connection<BUFFER, NetProvider> &conn,
 		return -1;
 	}
 	if (! conn.futureIsReady(future)) {
-		LOG_DEBUG("Connection has been timed out: future ", future,
-			  " is not ready");
+		LOG_DEBUG("Connection has been timed out: future ", future, " is not ready");
 		return -1;
 	}
 	LOG_DEBUG("Feature ", future, " is ready and decoded");
@@ -307,8 +304,7 @@ Connector<BUFFER, NetProvider>::waitCount(Connection<BUFFER, NetProvider> &conn,
 		LOG_ERROR("Connection got an error: ", conn.getError().msg);
 		return -1;
 	}
-	LOG_DEBUG("Connection has been timed out: only ",
-		  conn.getFutureCount() - ready_futures, " are ready");
+	LOG_DEBUG("Connection has been timed out: only ", conn.getFutureCount() - ready_futures, " are ready");
 	return -1;
 }
 
diff --git a/src/Utils/Timer.hpp b/src/Utils/Timer.hpp
index 3d3888a..a97d808 100644
--- a/src/Utils/Timer.hpp
+++ b/src/Utils/Timer.hpp
@@ -40,7 +40,7 @@ public:
 	}
 	bool isExpired() const
 	{
-		if (m_Timeout == std::chrono::milliseconds{-1})
+		if (m_Timeout == std::chrono::milliseconds {-1})
 			return false;
 		std::chrono::time_point<std::chrono::steady_clock> end =
 			std::chrono::steady_clock::now();
@@ -50,7 +50,7 @@ public:
 	}
 	int elapsed() const
 	{
-		if (m_Timeout == std::chrono::milliseconds{-1})
+		if (m_Timeout == std::chrono::milliseconds {-1})
 			return 0;
 		std::chrono::time_point<std::chrono::steady_clock> end =
 			std::chrono::steady_clock::now();

There are some test cases that do not close their connections, which
violates the isolation of the following test cases. The problem is that
they cannot close them because the connections are dead (see tarantool#122),
so let's simply use separate instances of `Connector` for such cases.

Also, there is a test case `many_conn_ping` that waits for any of
several connections and then closes them. The problem is that the other
connections can be ready to decode and hence won't be dropped because
of tarantool#123. I tried to simply wait for all of them after the case, but
then I hit another bug, tarantool#124, so let's simply use a separate instance
of `Connector` here as well.

Workaround for tarantool#122
Workaround for tarantool#123
Workaround for tarantool#124
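
For illustration, a rough sketch of the workaround; the helper name
`dead_connection_case` and the `Buf_t` alias are hypothetical, not taken
from the actual test code:

	// Give the problematic case its own Connector so the connection it
	// cannot close (it is already dead, see tarantool#122) does not leak
	// into the shared Connector used by the following test cases.
	static void
	dead_connection_case(const char *addr, unsigned port)
	{
		Connector<Buf_t, NetProvider> isolated_client;
		Connection<Buf_t, NetProvider> conn(isolated_client);
		if (isolated_client.connect(conn, addr, port) != 0)
			return;
		/* ... exercise the error path on the dead connection ... */
		// No close() here: everything local to this case is simply dropped.
	}
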
@drewdzzz marked this pull request as ready for review on April 18, 2025 at 09:53
@CuriousGeorgiy (Member) left a comment

LGTM.

Dropped some thoughts about the current state of the API. Also, it's not good that we have active users (customers) while we change the API like this; I think it's high time we started doing releases and versioning (let's open a ticket for this).

If you agree, please open some more tickets.

@drewdzzz (Collaborator, Author) commented Apr 22, 2025

> I think it's high time we should start doing releases and versioning (let's open a ticket for this)

I thought about it, but it seemed to me that the connector is in beta, so it's OK to change the API significantly.
Well, now that you have suggested it again, I no longer think so - filed #125; let's think about versioning a bit later.

@drewdzzz force-pushed the client_wait branch 2 times, most recently from e2c072a to 7042904 on April 22, 2025 at 07:53
@drewdzzz assigned CuriousGeorgiy and unassigned drewdzzz on Apr 22, 2025
@drewdzzz force-pushed the client_wait branch 5 times, most recently from 9a3856d to 02c2039 on April 22, 2025 at 13:32
@drewdzzz assigned mkostoevr and CuriousGeorgiy and unassigned drewdzzz on Apr 22, 2025
@drewdzzz (Collaborator, Author) commented Apr 22, 2025

@mkostoevr @CuriousGeorgiy
I found out that the previous implementation didn't poll requests at all when timeout = 0. I have pushed a fixed version.
All the changes are related only to the commit `client: redesign waiting methods`.

UPD: the two new commits are `client: handle timeout more safely` and `client: poll connections on non-blocking wait`. The commit `client: redesign waiting methods` is left intact.

Currently, calling any waiting method (`wait`, `waitAll` and so on)
with a zero timeout blocks execution until all required responses are
received. That's not great, because in some scenarios users want to
check whether any responses are ready without blocking execution.

Of course, one can use `wait(1)`, which sleeps for only a millisecond,
but it is still surprising that `wait(0)` sleeps forever. Let's stick
to the `epoll_wait` convention: `wait(0)` is non-blocking and
`wait(-1)` blocks forever. Note that before this commit, passing `-1`
would lead to an assertion failure.

Note that the waiting methods used to block forever by default. To
maintain this behavior, the commit updates the default timeout value,
since the old default (zero) now means a non-blocking poll of the
connections.

Also, this commit makes `wait(0)` non-blocking, but it does not poll
the connections yet. We will implement that part in a separate commit
for the sake of clarity of the patch.

Part of tarantool#111
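
For illustration, a minimal sketch of the new semantics; the `client`
Connector, the `conn` Connection and the `ping()` request are assumed to
be set up elsewhere and are only example names:

	// New semantics, modeled after epoll_wait:
	//   timeout = -1 -> block until the result is ready (old default behavior),
	//   timeout =  0 -> non-blocking check, returns -1 if nothing is ready yet,
	//   timeout >  0 -> wait for at most that many milliseconds.
	rid_t f = conn.ping();
	if (client.wait(conn, f, 0) == 0) {
		// The response has already arrived.
	} else {
		// Not ready yet - fall back to a blocking wait.
		client.wait(conn, f, -1);
	}
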
Currently, all wait methods just cycle in a loop while the timer is not
expired. At the beginning of each iteration, the expression
`timeout - timer.elapsed()` is used to calculate the timeout for
`NetProvider::wait()`, and that `wait` expects only non-negative
timeouts (or `-1` for infinity). So, for example, if the operating
system reschedules the process right between the `timer.isExpired()`
and `timer.elapsed()` calls, the timer may have expired by the time we
are scheduled again, and we will suddenly pass a negative value to the
provider.

Let's replace the `elapsed` method with a safer `timeLeft`, which
returns zero when the timeout has expired. Such a method will also be
handy in the following commits.

Closes tarantool#128
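
A possible shape of the new helper, based on the `Timer` fields visible
in the diff above; the `m_Start` member name is an assumption, not taken
from the patch:

	// Sketch of Timer::timeLeft(): like the old elapsed()-based math, but
	// clamped at zero so a negative value can never reach NetProvider::wait().
	int timeLeft() const
	{
		if (m_Timeout == std::chrono::milliseconds{-1})
			return -1; /* Infinite timeout is passed through as is. */
		auto now = std::chrono::steady_clock::now();
		auto left = m_Timeout -
			std::chrono::duration_cast<std::chrono::milliseconds>(now - m_Start);
		return left > std::chrono::milliseconds{0} ? (int)left.count() : 0;
	}
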
In the previous commits, we made `wait(0)` non-blocking. However, it
must still poll the connections (otherwise it is just pointless), so we
should check whether the timer has expired only after the first call to
`NetProvider::wait()`.

Also, `LibevNetProvider` should run its event loop with the special
`EVRUN_NOWAIT` flag when the timeout is zero, so that the poll is
non-blocking.

Closes tarantool#111
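
A simplified sketch of the intended loop shape inside a waiting method;
the `m_NetProvider` member name, the provider's `wait(int)` signature and
the Timer construction are abbreviated assumptions, not the literal patch:

	// Poll the provider at least once before checking expiration, so that a
	// zero timeout still drains whatever is already readable on the sockets.
	Timer timer{timeout};
	timer.start();
	do {
		if (m_NetProvider.wait(timer.timeLeft()) != 0)
			return -1;
		if (conn.futureIsReady(future))
			return 0;
	} while (!timer.isExpired());
	return -1;
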
Currently, we log an exceeded timeout at the error level. The problem
is that there are no dedicated polling methods, so users have to use the
`wait` methods, and when they do, timeout errors spam the log. In any
case, we return -1 on timeout, so users can log timeouts on their own.

As for tests: we have no convenient way to check the logs right now, so
let's simply change the log level without tests; the commit is trivial
anyway.

Closes tarantool#109
We need to get this method covered because we are going to refer to it
from the section describing the waiting methods (like `Connector::wait()`).
The new section covers all available ways of waiting for responses.