-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Git services: improve sync repos #12428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
for remote_repository_relation in admin_repo_relations: | ||
remote_repository_relation.admin = True | ||
remote_repository_relation.save() | ||
).update(admin=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small optimization here, since we don't need to save each object individually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also nice to avoid the save()
logic if we don't need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the Git service repository synchronization process across GitHub, BitBucket, and GitLab to eliminate redundant API calls and fix organization-repository relationship issues. The main purpose is to consolidate repository syncing into a single method and separate it from organization syncing, while fixing incorrect permission handling and organization relationships.
Key changes:
- Separation of repository and organization syncing logic to eliminate duplicate API calls
- Introduction of organization caching to avoid repeated database queries for the same organization
- Fix for repository-organization relationship updates when repositories move between organizations
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
readthedocs/oauth/services/github.py |
Adds organization caching, separates repository/organization sync, removes duplicate repository fetching from org sync |
readthedocs/oauth/services/bitbucket.py |
Adds organization caching, fixes admin permission handling, ensures all repos have workspace organizations |
readthedocs/oauth/services/gitlab.py |
Adds organization caching, switches to unified /projects endpoint, fixes relative URL handling for avatars |
readthedocs/rtd_tests/tests/test_oauth_sync.py |
Updates test expectations to reflect that org sync no longer creates repositories |
readthedocs/rtd_tests/tests/test_oauth.py |
Updates tests to reflect new repository creation patterns and organization handling |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a solid improvement. My questions are mostly around if we should be considering adding some of this logic to the base class.
Overall I don't love having to support so many VCS providers, and wonder if we should be considering trimming down to just GH, but I think that's a longer term discussion given the usage of the other platforms we still have.
In terms of making the code here faster -- it seems like this is a small speedup -- maybe 50-60% with the caching? Do we think this is enough to re-enable the task?
for remote_repository_relation in admin_repo_relations: | ||
remote_repository_relation.admin = True | ||
remote_repository_relation.save() | ||
).update(admin=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also nice to avoid the save()
logic if we don't need it.
readthedocs/oauth/services/github.py
Outdated
@@ -36,6 +36,10 @@ class GitHubService(UserService): | |||
url_pattern = re.compile(r"github\.com") | |||
supports_build_status = True | |||
|
|||
def __init__(self, *args, **kwargs): | |||
super().__init__(*args, **kwargs) | |||
self._organizations_cache = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're setting this on all of the classes, should it just be in the base model?
|
||
repo.save() | ||
|
||
def _make_absolute_url(self, url): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do this for all the providers, just to be defensive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only observed this on GL, mainly from groups objects.
yeah, I do feel this is going to be huge per improvement for users that belong to large orgs. But not sure about the task... last time I checked we sync repositories of 2K users per-day, I'll also want to check if this tasks is automatically retrying on timeout. |
Common changes across providers:
create_organization
.Specific changes:
sync_repositories
method, I changed Gitlab do also do that. In order to do that I reverted to use the previous /projects endpoint (GitLab: handle when a repository is moved to another group #12233)./~/uploads...
instead of including the domain as well, so we are normalizing all URLs from Gitlab now.