Fuseki unresponsive to requests if performing with a very large sync #173

Open

tomkxy opened this issue Aug 13, 2022 · 2 comments

Comments
tomkxy commented Aug 13, 2022

If a Fuseki instance is performing a very large sync, it is unresponsive. This causes a problem in a setup with load balancers, which are not able to detect that state and keep directing traffic to the Fuseki node.

afs (Owner) commented Aug 19, 2022

This is at startup?

This is how it currently works:

As well as the sync that happens when a request comes in (the normal case for an active system), there is a background thread that does a sync every 5 minutes. This ensures an idle server does not get too far behind in an active cluster.

Sync at startup is done by making that background thread sync immediately when the background task starts, then every 5 minutes.
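
A minimal sketch of that scheduling, assuming a hypothetical Syncable stand-in for the patch-log sync (this is not the actual rdf-delta API):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BackgroundSync {
    // Hypothetical stand-in for the operation that catches the local
    // database up with the patch log.
    interface Syncable { void sync(); }

    static ScheduledExecutorService startBackgroundSync(Syncable conn) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        // Initial delay 0: the first sync runs as soon as the background
        // task starts, then repeats every 5 minutes.
        exec.scheduleAtFixedRate(conn::sync, 0, 5, TimeUnit.MINUTES);
        return exec;
    }
}
```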

The original idea was to not hold up startup in the case where a server has an existing database. But I can see that if the database is a long way behind, the server is not useful for serving requests, and if it is not a long way behind, the sync is quick anyway.

This could be changed to run the first sync synchronously as the dataset is constructed, before the HTTP server is started.
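
Sketched with the same hypothetical Syncable as above, the change would amount to reordering startup like this:

```java
public class SynchronousStartup {
    interface Syncable { void sync(); }

    static void start(Syncable conn, Runnable startHttpServer) {
        conn.sync();            // first sync runs synchronously; startup blocks
                                // until the node has caught up with the patch log
        // ... then schedule the usual 5-minute background sync ...
        startHttpServer.run();  // the HTTP port only opens after the sync, so a
                                // load balancer never sees a responsive-but-stale node
    }
}
```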


If this is happening not at startup, then something else is going on. #171 may be the issue; the fix includes an "if already doing a sync elsewhere (same server), do not sync but serve the request from the current (unsync'ed) state" behaviour, as if the request had arrived just before the sync started.

A sync is a write transaction on the database. A write request will be held up because TDB allows only one writer at a time, but any number of readers can overlap (and see the before-write state of the database). The design favours query/reading data: read requests proceed with no need for any data locking.
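
For illustration, this is the standard Jena transaction pattern that description corresponds to (shown here with TDB2; the dataset path is made up):

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.tdb2.TDB2Factory;

public class TxnSketch {
    public static void main(String[] args) {
        Dataset ds = TDB2Factory.connectDataset("/data/DB");

        // A sync is a write transaction: TDB allows only one writer at a
        // time, so another write request queues behind it.
        ds.begin(ReadWrite.WRITE);
        try {
            // ... apply patches to the dataset ...
            ds.commit();
        } finally {
            ds.end();
        }

        // Readers are not blocked by the writer: each read transaction sees
        // a consistent snapshot of the database as of when it began (MVCC).
        ds.begin(ReadWrite.READ);
        try {
            // ... run queries against the pre-write state ...
        } finally {
            ds.end();
        }
    }
}
```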

tomkxy (Author) commented Aug 24, 2022

Hi @afs ,

This is at startup?

When I raised the issue, it was driven by observations in our environment:

  • One of them is that applying patches is very slow, which is an issue in our setup. One potential remedy is to put the patch log on SSDs, which we haven't done until now. Let's see whether that helps.
  • The other point is that on at least one occasion, the Fuseki fell back to an initial sync. Honestly, I don't know whether this happened at startup or while it was running, and I don't know what the reason was.

Anyhow, if something like that happens (which it can in reality), it is essential in an HA setup that a load balancer, or in our case Kubernetes, can spot it (liveness and readiness probes) and not route traffic to such a node.
I would say the way syncing is implemented right now is absolutely fine, but I still think there needs to be some means to spot at least a full sync, which, as we saw, can happen (unfortunately infrastructure is not as robust as we might wish).
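
As a sketch of the kind of signal I mean: a small readiness endpoint backed by a flag that the sync code would flip (the flag, path, and port are all hypothetical; as far as I know nothing like this exists in Fuseki/rdf-delta today), using only the JDK's built-in HTTP server:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicBoolean;

public class ReadinessEndpoint {
    // Hypothetical flag: the sync code would set this to false while a
    // full/initial sync is running and back to true when it completes.
    static final AtomicBoolean ready = new AtomicBoolean(false);

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
        server.createContext("/ready", exchange -> {
            // 503 tells a Kubernetes readinessProbe (or a load balancer
            // health check) to keep traffic away from this node.
            int status = ready.get() ? 200 : 503;
            byte[] body = (status == 200 ? "OK" : "SYNCING").getBytes();
            exchange.sendResponseHeaders(status, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }
}
```

A readinessProbe pointing at /ready would then take the pod out of rotation for the duration of the sync without restarting it.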
