Fuseki unresponsive to requests if performing with a very large sync #173

Open

tomkxy opened this issue Aug 13, 2022 · 2 comments

Comments
tomkxy commented Aug 13, 2022

If a Fuseki instance is performing a very large sync, it is unresponsive. This causes a problem in a setup with load balancers, which are not able to detect that state and keep directing traffic to the Fuseki node.

afs (Owner) commented Aug 19, 2022

This is at startup?

This is how it currently works:

As well as the sync that happens when a request comes in (the normal case for an active system), there is a background thread that does a sync every 5 minutes. This ensures an idle server does not get too far behind in an active cluster.

Sync at startup is done by making that background thread sync immediately when the background task starts, then every 5 minutes.
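
A minimal sketch of that scheduling, assuming a hypothetical Syncable stand-in for the patch-log sync (this is not the actual rdf-delta API):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BackgroundSync {
    // Hypothetical stand-in for the operation that catches the local
    // database up with the patch log.
    interface Syncable { void sync(); }

    static ScheduledExecutorService startBackgroundSync(Syncable conn) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        // Initial delay 0: the first sync runs as soon as the background
        // task starts, then repeats every 5 minutes.
        exec.scheduleAtFixedRate(conn::sync, 0, 5, TimeUnit.MINUTES);
        return exec;
    }
}
```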

The original idea was to not hold up startup in the case where a server has an existing database. But I can see that if the database is a long way behind, the server is not useful for serving requests, and if it is not a long way behind, the sync is quick anyway.

This could be changed to run the first sync synchronously as the dataset is constructed, before the HTTP server is started.
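
Sketched with the same hypothetical Syncable as above, the change would amount to reordering startup like this:

```java
public class SynchronousStartup {
    interface Syncable { void sync(); }

    static void start(Syncable conn, Runnable startHttpServer) {
        conn.sync();            // first sync runs synchronously; startup blocks
                                // until the node has caught up with the patch log
        // ... then schedule the usual 5-minute background sync ...
        startHttpServer.run();  // the HTTP port only opens after the sync, so a
                                // load balancer never sees a responsive-but-stale node
    }
}
```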


If this is happening not at startup, then something else is going on. #171 may be the issue; the fix includes an "if already doing a sync elsewhere (same server), do not sync but serve the request from the current (unsync'ed) state" behaviour, as if the request had arrived just before the sync started.

A sync is a write transaction on the database. A write request will be held up because TDB allows only one writer at a time, but any number of readers can overlap (and see the before-write state of the database). The design favours query/reading data: read requests proceed with no need for any data locking.
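
For illustration, this is the standard Jena transaction pattern that description corresponds to (shown here with TDB2; the dataset path is made up):

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.tdb2.TDB2Factory;

public class TxnSketch {
    public static void main(String[] args) {
        Dataset ds = TDB2Factory.connectDataset("/data/DB");

        // A sync is a write transaction: TDB allows only one writer at a
        // time, so another write request queues behind it.
        ds.begin(ReadWrite.WRITE);
        try {
            // ... apply patches to the dataset ...
            ds.commit();
        } finally {
            ds.end();
        }

        // Readers are not blocked by the writer: each read transaction sees
        // a consistent snapshot of the database as of when it began (MVCC).
        ds.begin(ReadWrite.READ);
        try {
            // ... run queries against the pre-write state ...
        } finally {
            ds.end();
        }
    }
}
```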

tomkxy (Author) commented Aug 24, 2022

Hi @afs ,

This is at startup?

When I raised the issue, it was driven by observations in our environment:

  • One of them is that applying patches is very slow, which is an issue in our setup. One potential remedy is to put the patch log on SSDs, which we haven't done until now. Let's see whether that helps.
  • The other point is that on at least one occasion, the Fuseki fell back to an initial sync. Honestly, I don't know whether this happened at startup or while it was running, and I don't know what the reason was.

Anyhow, if something like that happens (which it can in reality), it is essential in an HA setup that a load balancer, or in our case Kubernetes, can spot it (liveness and readiness probes) and not route traffic to such a node.
I would say the way syncing is implemented right now is absolutely fine, but I still think there needs to be some means to spot at least a full sync, which, as we saw, can happen (unfortunately infrastructure is not as robust as we might wish).
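
As a sketch of the kind of signal I mean: a small readiness endpoint backed by a flag that the sync code would flip (the flag, path, and port are all hypothetical; as far as I know nothing like this exists in Fuseki/rdf-delta today), using only the JDK's built-in HTTP server:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicBoolean;

public class ReadinessEndpoint {
    // Hypothetical flag: the sync code would set this to false while a
    // full/initial sync is running and back to true when it completes.
    static final AtomicBoolean ready = new AtomicBoolean(false);

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
        server.createContext("/ready", exchange -> {
            // 503 tells a Kubernetes readinessProbe (or a load balancer
            // health check) to keep traffic away from this node.
            int status = ready.get() ? 200 : 503;
            byte[] body = (status == 200 ? "OK" : "SYNCING").getBytes();
            exchange.sendResponseHeaders(status, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }
}
```

A readinessProbe pointing at /ready would then take the pod out of rotation for the duration of the sync without restarting it.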
