Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Randomize solver document listing #2625

Merged

Conversation

mayaCostantini
Copy link
Contributor

Related Issues and Dependencies

Related to thoth-station/thoth-application#2513

This introduces a breaking change

  • No

Randomizing the solver documents listing is optional.

This should yield a new module release

  • Yes

This Pull Request implements

Introduce an optional argument to the sync_solver_documents method in storages to randomize solver document listing.

@sesheta sesheta added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Apr 12, 2022
processed, synced, skipped, failed = 0, 0, 0, 0
for document_id in document_ids or solver_store.get_document_listing():
for document_id in document_ids or document_listing:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please shuffle both to be consistent.

) -> tuple:
"""Sync solver documents into graph."""
document_listing = list(solver_store.get_document_listing())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an unnecessary step if document_ids is provided.

) -> tuple:
"""Sync solver documents into graph."""
document_listing = list(solver_store.get_document_listing())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an unnecessary step if document_ids is provided.

@@ -98,8 +99,11 @@ def sync_solver_documents(
graceful: bool = False,
graph: Optional[GraphDatabase] = None,
is_local: bool = False,
randomize_listing: Optional[bool] = False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have this configurable via environment variables on a library level? That way we can control this behavior directly from deployment manifests.

@@ -98,8 +99,11 @@ def sync_solver_documents(
graceful: bool = False,
graph: Optional[GraphDatabase] = None,
is_local: bool = False,
randomize_listing: Optional[bool] = False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have this configurable via environment variables on a library level? That way we can control this behavior directly from deployment manifests.

@mayaCostantini
Copy link
Contributor Author

Thanks for the review @fridex 💯 Should we also randomize the listing in other sync methods?

@fridex
Copy link
Contributor

fridex commented Apr 13, 2022

Thanks for the review @fridex 100 Should we also randomize the listing in other sync methods?

+1, that would be great

@sesheta sesheta added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 13, 2022
@Gregory-Pereira
Copy link
Member

Will we need to write a separate script that will do this conversion /migration for everything currently in the DB?

@Gregory-Pereira
Copy link
Member

Also shouldn't this micro version be bumped with this feature.

processed, synced, skipped, failed = 0, 0, 0, 0
for document_id in document_ids or solver_store.get_document_listing():
for document_id in document_ids or document_listing:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please shuffle both (if shuffling is enabled) to be consistent.

@@ -38,6 +39,7 @@
from .graph import GraphDatabase

_LOGGER = logging.getLogger(__name__)
RANDOMIZE_LISTING = os.getenv("RANDOMIZE_LISTING", 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the environment variables in Thoth share THOTH_ prefix, ideally stating which component or library should take the configuration:

Suggested change
RANDOMIZE_LISTING = os.getenv("RANDOMIZE_LISTING", 0)
_RANDOMIZE_LISTING = bool(int(os.getenv("THOTH_STORAGES_RANDOMIZE_LISTING", 0)))

Also, no need to have this public outside of the module.

processed, synced, skipped, failed = 0, 0, 0, 0
for document_id in document_ids or adviser_store.get_document_listing():
for document_id in document_ids or document_listing:
Copy link
Contributor

@fridex fridex Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The snippet above might be prone to programmatic errors as the if-else is basically done twice. It might be a good idea to be explicit and state which listing is actually used just once:

Suggested change
for document_id in document_ids or document_listing:
if _RANDOMIZE_LISTING:
listing = random.shuffle(document_ids or document_listing)
for document_id in listing:

@@ -61,8 +63,13 @@ def sync_adviser_documents(
adviser_store = AdvisersResultsStore()
adviser_store.connect()

listing = list(adviser_store.get_document_listing()) or document_ids
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we cast the generator to a list only if randomization is on? I know it might be nitpicking, but in some cases it might improve perf :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No you're right, thanks! 💯 Added in 03acd31

Copy link
Contributor

@fridex fridex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks! 💯 👍🏻

@sesheta
Copy link
Member

sesheta commented Apr 25, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fridex

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sesheta sesheta added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 25, 2022
@sesheta sesheta merged commit 1ce78b4 into thoth-station:master Apr 25, 2022
@mayaCostantini mayaCostantini deleted the randomize-document-listing branch April 26, 2022 05:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants