Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor query performance as traversal width increases #1519

Open
5 tasks done
e-compton-bb opened this issue Apr 4, 2024 · 5 comments
Open
5 tasks done

Poor query performance as traversal width increases #1519

e-compton-bb opened this issue Apr 4, 2024 · 5 comments
Labels
bug Something is not working.

Comments

@e-compton-bb
Copy link

e-compton-bb commented Apr 4, 2024

Preflight checklist

Ory Network Project

No response

Describe the bug

In our system, we have "projects" which a user can view and "clips" which a user can view. If a clip is in a project, then the user can view the clip transitively. This works fine when a clip is in a very small number of projects, but if a clip is in ~200 different projects, then it slows down to >1s to check if the user can view the clip.

Running Keto locally, it appears to be churning through requests against SQL, loading every relationship into memory before checking if the user has access to any of them.

Reproducing the bug

Permissions model:

class User implements Namespace {}

class UserGroup implements Namespace {}

class Project implements Namespace {
  related: {
    owners: User[];
    editors: (User | UserGroup)[];
  };
  permits = {
    view: (ctx: Context): boolean =>
      this.related.owners.includes(ctx.subject) || this.related.editors.includes(ctx.subject),
  };
}

class Clip implements Namespace {
  related: {
    owners: User[];
    parents: Project[];
  };
  permits = {
    view: (ctx: Context): boolean =>
      this.related.owners.includes(ctx.subject) || this.related.parents.traverse((parent) => parent.permits.view(ctx)),
  };
}

Create 200 relationships of different projects all pointing to the same clipId, make a user an owner or editor on any one of the project.

Check if the user can access the clip

Expected: The query to still be fairly fast <50ms
Actual: On an M2 Mac with SQLite >150ms, but on a real database deployed in a container can take >1s

Relevant log output

No response

Relevant configuration

log:
  level: warn

namespaces:
  location: file:///home/nonroot/model.ts

serve:
  read:
    host: 0.0.0.0
    port: 4466
  write:
    host: 0.0.0.0
    port: 4467

Version

v0.12.0

On which operating system are you observing this issue?

macOS

In which environment are you deploying?

Docker

Additional Context

No response

@e-compton-bb e-compton-bb added the bug Something is not working. label Apr 4, 2024
@e-compton-bb
Copy link
Author

After some more digging, the main problem here seems to be the breadth-first search in checkExpandSubject. It grabs every Project tuple from the DB and then it starts looking through each one for the user. Then each lookup is a separate query which has it's own round trip time. Which explains why it's so much slower in a real database.

@aeneasr
Copy link
Member

aeneasr commented Apr 5, 2024

Hey Edward, I believe that this is not yet optimized properly. If you have good ideas how this could be optimized we're happy to hear them!

@e-compton-bb
Copy link
Author

Could switch from a breadth-first search to a depth-first search. In this case the relationship chain we're trying to find is: Clip -> Project -> User, so by enumerating all the Projects first we always get worst-case time.

There's also a lot of chattiness between the container and the DB, so if more work could be offloaded to the DB it could make a big difference. This seems like a really good use-case for Postgres's WITH RECURSIVE clauses, although the current implementation seems to be independent of particular SQL versions
https://www.dylanpaulus.com/posts/postgres-is-a-graph-database

@stephenpmurray
Copy link

Hi, we're having performance issues with some instances of keto using sqlite which sound very similar.

Actual: On an M2 Mac with SQLite >150ms, but on a real database deployed in a container can take >1s

What is the 'real database' that you are using?

@e-compton-bb
Copy link
Author

Hi, we're having performance issues with some instances of keto using sqlite which sound very similar.

Actual: On an M2 Mac with SQLite >150ms, but on a real database deployed in a container can take >1s

What is the 'real database' that you are using?

I think for that benchmark it was just the official postgres docker image, but our main app is using AWS Aurora RDS running Postgres which saw the worst performance of the three

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working.
Projects
None yet
Development

No branches or pull requests

3 participants