-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Randomize solver document listing #2625
Randomize solver document listing #2625
Conversation
thoth/storages/sync.py
Outdated
processed, synced, skipped, failed = 0, 0, 0, 0 | ||
for document_id in document_ids or solver_store.get_document_listing(): | ||
for document_id in document_ids or document_listing: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please shuffle both to be consistent.
thoth/storages/sync.py
Outdated
) -> tuple: | ||
"""Sync solver documents into graph.""" | ||
document_listing = list(solver_store.get_document_listing()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like an unnecessary step if document_ids
is provided.
thoth/storages/sync.py
Outdated
) -> tuple: | ||
"""Sync solver documents into graph.""" | ||
document_listing = list(solver_store.get_document_listing()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like an unnecessary step if document_ids
is provided.
thoth/storages/sync.py
Outdated
@@ -98,8 +99,11 @@ def sync_solver_documents( | |||
graceful: bool = False, | |||
graph: Optional[GraphDatabase] = None, | |||
is_local: bool = False, | |||
randomize_listing: Optional[bool] = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have this configurable via environment variables on a library level? That way we can control this behavior directly from deployment manifests.
thoth/storages/sync.py
Outdated
@@ -98,8 +99,11 @@ def sync_solver_documents( | |||
graceful: bool = False, | |||
graph: Optional[GraphDatabase] = None, | |||
is_local: bool = False, | |||
randomize_listing: Optional[bool] = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have this configurable via environment variables on a library level? That way we can control this behavior directly from deployment manifests.
Thanks for the review @fridex 💯 Should we also randomize the listing in other sync methods? |
+1, that would be great |
Will we need to write a separate script that will do this conversion /migration for everything currently in the DB? |
Also shouldn't this micro version be bumped with this feature. |
thoth/storages/sync.py
Outdated
processed, synced, skipped, failed = 0, 0, 0, 0 | ||
for document_id in document_ids or solver_store.get_document_listing(): | ||
for document_id in document_ids or document_listing: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please shuffle both (if shuffling is enabled) to be consistent.
thoth/storages/sync.py
Outdated
@@ -38,6 +39,7 @@ | |||
from .graph import GraphDatabase | |||
|
|||
_LOGGER = logging.getLogger(__name__) | |||
RANDOMIZE_LISTING = os.getenv("RANDOMIZE_LISTING", 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the environment variables in Thoth share THOTH_ prefix, ideally stating which component or library should take the configuration:
RANDOMIZE_LISTING = os.getenv("RANDOMIZE_LISTING", 0) | |
_RANDOMIZE_LISTING = bool(int(os.getenv("THOTH_STORAGES_RANDOMIZE_LISTING", 0))) |
Also, no need to have this public outside of the module.
thoth/storages/sync.py
Outdated
processed, synced, skipped, failed = 0, 0, 0, 0 | ||
for document_id in document_ids or adviser_store.get_document_listing(): | ||
for document_id in document_ids or document_listing: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The snippet above might be prone to programmatic errors as the if-else is basically done twice. It might be a good idea to be explicit and state which listing is actually used just once:
for document_id in document_ids or document_listing: | |
if _RANDOMIZE_LISTING: | |
listing = random.shuffle(document_ids or document_listing) | |
for document_id in listing: |
thoth/storages/sync.py
Outdated
@@ -61,8 +63,13 @@ def sync_adviser_documents( | |||
adviser_store = AdvisersResultsStore() | |||
adviser_store.connect() | |||
|
|||
listing = list(adviser_store.get_document_listing()) or document_ids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we cast the generator to a list only if randomization is on? I know it might be nitpicking, but in some cases it might improve perf :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No you're right, thanks! 💯 Added in 03acd31
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks! 💯 👍🏻
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: fridex The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Related Issues and Dependencies
Related to thoth-station/thoth-application#2513
This introduces a breaking change
Randomizing the solver documents listing is optional.
This should yield a new module release
This Pull Request implements
Introduce an optional argument to the
sync_solver_documents
method in storages to randomize solver document listing.