
Implement recursive dispatching #111

Closed · wants to merge 6 commits

Conversation

@Alex079 commented Mar 3, 2022

This PR may be related to #54

The goal of this PR is to support composition of data loader invocations.

  • when using the 'thenCompose' method to chain data loader calls, all composed load calls will be dispatched (see the sketch below)
  • the methods 'dispatchAll' or 'dispatchAllWithCount' will recur until there are no more calls to dispatch
  • the method 'dispatchAllWithCount' will block in order to return the counter of dispatches
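
For illustration, here is a rough sketch of the kind of chained usage this PR targets. The loaders and keys are made up, and the `DataLoaderFactory` builder is the one from recent java-dataloader versions; treat it as a sketch of intent, not code from this PR.

```java
import org.dataloader.DataLoader;
import org.dataloader.DataLoaderFactory;
import org.dataloader.DataLoaderRegistry;

import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class ComposedLoadSketch {
    public static void main(String[] args) {
        // Hypothetical loaders whose batch functions complete asynchronously, as real IO would.
        DataLoader<Long, Long> bestFriendIdLoader = DataLoaderFactory.newDataLoader(keys ->
                CompletableFuture.supplyAsync(() ->
                        keys.stream().map(id -> id + 100).collect(Collectors.toList())));
        DataLoader<Long, String> userLoader = DataLoaderFactory.newDataLoader(keys ->
                CompletableFuture.supplyAsync(() ->
                        keys.stream().map(id -> "user-" + id).collect(Collectors.toList())));

        DataLoaderRegistry registry = new DataLoaderRegistry();
        registry.register("bestFriendIds", bestFriendIdLoader);
        registry.register("users", userLoader);

        // The inner load() is only issued once the outer one has completed, so a single
        // dispatchAll() pass can miss it; this PR keeps dispatching until no calls remain.
        CompletableFuture<String> bestFriendName = bestFriendIdLoader.load(1L)
                .thenCompose(userLoader::load);

        registry.dispatchAll();
    }
}
```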

Thanks to @npiguet for the original idea and code review.

alex079 added 2 commits March 1, 2022 23:27
Support composition of data loader invocations.
@bbakerman (Member) commented

Nominally this PR does indeed solve the composition of data loader calls - the problem is it does it in such a way that negates the benefits of batching.

This is related to graphql-java rather than data loader per se, but any other use of data loader would have the same problem.

Let me explain in graphql terms (but know that it could apply to other data loader use cases).

Imagine we have a query of "friends and enemies" where the fields call dataloader under the covers to get user info for friends and enemies.

query x {
  friends {
    name
    friends {       # l1
      name
      friends {     # l2
        name
      }
      enemies {     # l2
        name
      }
    }
    enemies {       # l1
      name
    }
  }
}

Ideally, to MAXIMIZE batching (which is dataloader's primary purpose), we want to delay dispatching until we have a number of outstanding requests - the bigger the batch size, typically the better the performance (by avoiding N+1 queries).

In graphql-java we do this by tracking "field levels" - we don't dispatch until we have internally called all the fields in the one level.

OK, so in the above, the first dispatch happens and the first level of friends and enemies calls are then dataloader.dispatch()ed, and we have maximized the batching window. All the friends and all the enemies can be one batch call. Let's say a person has 7 friends and 7 enemies; then we have 1 call to the user service with 14 ids.
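
To put numbers on that, here is a tiny standalone sketch (a stubbed user service, using the `DataLoaderFactory` helper from recent java-dataloader versions) showing the 14 ids arriving as one batch:

```java
import org.dataloader.DataLoader;
import org.dataloader.DataLoaderFactory;

import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class BatchingWindowSketch {
    public static void main(String[] args) {
        // Stubbed "user service": echoes the requested ids and reports the batch size.
        DataLoader<Long, String> userLoader = DataLoaderFactory.newDataLoader(keys -> {
            System.out.println("user service called with " + keys.size() + " ids");
            return CompletableFuture.completedFuture(
                    keys.stream().map(id -> "user-" + id).collect(Collectors.toList()));
        });

        // 7 friends and 7 enemies are load()ed before any dispatch happens...
        for (long id = 1; id <= 7; id++) userLoader.load(id);       // friends
        for (long id = 101; id <= 107; id++) userLoader.load(id);   // enemies

        // ...so one dispatch produces ONE batch call with 14 ids.
        userLoader.dispatch().join();   // prints: user service called with 14 ids
    }
}
```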

This is what we want from dataloader.

Now we come back and there is a second level of fields that wants to get the friends of my friends, and so on. Again, in graphql-java these will not be "dataloader.dispatched" until we have all of the fields at a given level underway, and then we can call dataloader.dispatch() and maximize batching on the next level of fields.

OKAY, that's how it works today. Notice that there is a "window" of levels where we don't want to dispatch until some convenient time (like when all the fields of level N are ready to go, say).

NOW, if we used your PR pattern, then what would happen is this. The first level would be dispatched and MAYBE (because graphql-java is an async library) it starts working on the second level. But your PR dispatches all calls in a hard loop, e.g. while there are outstanding fetches on the dataloader, it runs them again.

What can happen therefore is that some of the "friends l2" fields might be in flight and get dispatched while none of the enemies fields have made it back yet, say. Then some of the enemy fields do make it onto the thread scheduler and they get dispatched in another hard loop.

What this will do is compress the "batching window" - all of a sudden you might get 1 batch call for 1-2 friend objects instead of one call for 7 objects.

By sitting in a hard loop (after a dataloader.load call completes) you don't know if it's a good time to dispatch or not. YES, you have caused chained dataloader CFs to complete, BUT you have lost control of when dispatch fires in relation to other calls that might be about to get executed.

If you're interested, the reason this works in Node.js is that they have a tick event, which is a callback fired when there is "no more work to be done on the Node.js work queue" (since Node.js is single threaded). So nominally you can set off N fields in a hard loop (setting up N dataloader loads), and then when it finally yields to the event loop, the tick fires and they use that to decide to dispatch().

However, testing of this actually shows that they TOO can batch prematurely - because they have no idea when it's a good time to dispatch - and in bad conditions they can fire off small batch loads.

We chose in java-dataloader to say:

  • Since we can't know about chained CFs and their "co-dependencies"
    • (a CF can't tell you that it's in fact chained to another CF in a dependent fashion)
  • we require that the CF returned from dataloader.load() has to be the one we track, AND only those outstanding CFs will be dispatched, once per call to dispatch.

This way a consumer (like graphql-java, which is our primary consumer) can be in charge of knowing WHEN to dispatch. This comes with a price, as you know - which is that chained dispatch calls will "miss" their dependent inner calls.

To that end we have created org.dataloader.registries.ScheduledDataLoaderRegistry, which is more akin to the Node.js mechanism - it runs on a schedule (via java.util.concurrent.ScheduledExecutorService, say every 20ms) and evaluates a predicate (is depth > 2, or is time since dispatch > 20 ms) and, if so, it will call dispatch for you, multiple times if need be.

This will allow you to maximize your batching windows by allowing depth or time to dictate when extra dispatch calls are made. You can use this in graphql-java, say, because it will only ever be responsible for 1 level of outstanding load calls.
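
For reference, wiring that up looks roughly like this (builder and predicate names as in recent java-dataloader releases; check the javadoc for your version):

```java
import org.dataloader.DataLoader;
import org.dataloader.registries.DispatchPredicate;
import org.dataloader.registries.ScheduledDataLoaderRegistry;

import java.time.Duration;

public class ScheduledRegistrySketch {
    static ScheduledDataLoaderRegistry buildRegistry(DataLoader<Long, String> userLoader) {
        // Dispatch when the queue is deep enough OR enough time has passed since the last dispatch.
        DispatchPredicate predicate = DispatchPredicate.dispatchIfDepthGreaterThan(2)
                .or(DispatchPredicate.dispatchIfLongerThan(Duration.ofMillis(20)));

        return ScheduledDataLoaderRegistry.newScheduledRegistry()
                .register("users", userLoader)
                .dispatchPredicate(predicate)
                .schedule(Duration.ofMillis(20))  // the "tick" interval of the scheduled check
                .build();
    }
}
```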

I hope this helps explain the problem.

This is why we won't accept this PR - not because it won't work - but because its tradeoffs are not what we want.

ps. I have written a version of this PR in other forms early on in various explorations of the problem space. This is why I am so sure about its behaviour.

A project member left a review comment on this excerpt from the PR diff:

        }
    }
    if (futuresToDispatch.size() > 0) {
        CompletableFuture.allOf(futuresToDispatch.toArray(new CompletableFuture[0])).join();

join() is typically avoided in an async system. You're going to make them sync onto each other.
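
For illustration, a minimal sketch (a hypothetical helper, not code from this PR) of the non-blocking shape being hinted at: chain off the allOf() future instead of join()ing it.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

class NonBlockingDispatchSketch {
    // Returns a future that completes when this round of dispatched futures is done,
    // carrying the count, instead of blocking the calling thread with join().
    static CompletableFuture<Integer> afterDispatch(List<CompletableFuture<?>> futuresToDispatch) {
        return CompletableFuture
                .allOf(futuresToDispatch.toArray(new CompletableFuture[0]))
                .thenApply(done -> futuresToDispatch.size());
    }
}
```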

@Alex079 (Author) commented Mar 13, 2022

Hello,

Thank you for the explanation.
Indeed, it looks like there is a race condition between graphql-java producing more load calls when transitioning to the next level and java-dataloader (this PR) checking that there are no more calls to dispatch.
One way to resolve this is to let graphql-java (or another consumer) know that data loading has finished by returning a completable future from the dispatch call. The tricky part is that this CF must be passed from the data loader registry to the field tracking approach and combined with the field-level future.
To achieve this, I can create a DataLoaderRegistry.dispatch() method which will return CompletableFuture<Integer> and will eventually replace the original methods DataLoaderRegistry.dispatchAll() and DataLoaderRegistry.dispatchAllWithCount(). Next, it would be possible to use the CF returned from the registry dispatch call in graphql-java (e.g. by making instrumentation hooks return a CF when they process a CF result as input, and combining the CF from the hooks after field-level completion). Roughly, the new registry surface would look like the sketch below.
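
Sketched as an interface (names provisional; nothing here exists in java-dataloader yet):

```java
import java.util.concurrent.CompletableFuture;

// Provisional sketch of the proposed registry method, not part of java-dataloader today.
interface RecursiveDispatching {
    /**
     * Dispatches until there are no more load calls left, including calls produced by
     * thenCompose chains, and completes with the total number of dispatched calls.
     */
    CompletableFuture<Integer> dispatch();
}
```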

Do you think this could be a viable approach?

Regards,
Alex

Alex079 and others added 4 commits March 26, 2022 20:51
- revert changes in `dispatchAll` and `dispatchAllWithCount` methods
- add a new method `dispatch`
@samuelAndalon commented May 5, 2022

👀 Using a different strategy to dispatch the data loader registry could help to solve this issue:
instead of doing it by level, do it like the original JS dataloader implementation does, by dispatching when synchronous execution is exhausted (event-loop tick).

graphql-java/graphql-java#1198

https://opensource.expediagroup.com/graphql-kotlin/docs/server/data-loader/data-loader-instrumentation#dispatching-by-synchronous-execution-exhaustion

This PR, #115, allows access to the futures from each DataLoader CacheMap and reduces some of the logic we had to decorate onto DataLoaderRegistry to make this custom instrumentation work:

https://github.com/ExpediaGroup/graphql-kotlin/blob/master/executions/graphql-kotlin-dataloader-instrumentation/src/main/kotlin/com/expediagroup/graphql/dataloader/instrumentation/syncexhaustion/DataLoaderSyncExecutionExhaustedInstrumentation.kt

@softprops commented

This looks promising! Are there tradeoffs worth noting in comparison to today's default dispatching within graphql-java?

@samuelAndalon commented May 9, 2022

This new custom instrumentation keeps track of the state of each ExecutionStrategy, so more computation will be needed, but that is better because it will optimize the number of IO operations that data fetchers need to execute.

Having said that, this instrumentation COVERS what the default by-level one (DataLoaderDispatcherInstrumentation) does.

@Alex079 (Author) commented May 9, 2022

Thank you for the suggestion!

It looks like it is still possible that this new instrumentation, when invoked by the engine in, e.g., the graphql.execution.ExecutionStrategy#fetchField method (and possibly others), will have the same problem as described in the first comment on this PR. Namely, the engine does not track the status of instrumentation callbacks and may dispatch the next level independently of the instrumentation callback result. If data loading completes first, everything is OK; if not, the next level's load calls will start being dispatched, possibly in sub-optimal quantities.
In an attempt to fix this undefined behavior, I made a PR in the graphql-java repo: graphql-java/graphql-java#2787

Unlike the suggested approach, in this PR in the java-dataloader repo I try to dispatch as many calls as possible from the registry itself, doing as many rounds as needed to dispatch any further calls produced after dispatching.

The result of the registry dispatch method call in my case is a CompletableFuture which completes only after all the load calls and their compositions have been dispatched. This future is then combined (not composed) with the result of field fetching, which completes because all the load calls complete.
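
In code, the "combined (not composed)" part looks roughly like this (the two futures are placeholders for the registry dispatch result and the engine's field-level future):

```java
import java.util.concurrent.CompletableFuture;

class CombinedNotComposedSketch {
    // Placeholder futures: `dispatchDone` stands for the registry dispatch result described
    // above, `fieldFetch` for the engine's field fetching at the current level.
    static <T> CompletableFuture<T> levelComplete(CompletableFuture<Integer> dispatchDone,
                                                  CompletableFuture<T> fieldFetch) {
        // Combined, not composed: both must finish, but neither triggers new work in the other.
        return fieldFetch.thenCombine(dispatchDone, (fieldResult, dispatchCount) -> fieldResult);
    }
}
```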

Combining the two futures makes each level's data loading stable in all cases but one, as far as I can see. Namely, thenComposeAsync in a data fetcher can still cause problems and break the registry's detection of new load calls.

@bbakerman (Member) commented

I am closing this PR as outdated - we won't be accepting it as is - but there are plenty of great conversations here, so thank you to all involved.

@bbakerman closed this May 8, 2023