[Epic] Managing data across deployments #2513
Comments
Thank you @fridex for the issue with all details.
/label sig-devsecops
@harshad16: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/label sig/devops
@harshad16: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
I think Maya's current PR should update everything in such a way in
Is your feature request related to a problem? Please describe.
As a Thoth operator, I would like to make sure data are properly managed across deployments. Currently, we use the staging environment to compute data (given the resources available) and propagate a database dump to production environments. This has proved not to be scalable long-term: production can write to the database as well, so the environments easily get out of sync, and overwriting the production database with staging data can lose information.
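The current dump-and-propagate flow can be sketched as two commands: dump the staging database and restore it over production. A minimal sketch, assuming PostgreSQL; the host and database names are hypothetical placeholders, not the real deployment values:

```python
# Sketch of the current dump-and-restore propagation, assuming PostgreSQL.
# Host and database names below are hypothetical placeholders.

def build_dump_command(host: str, database: str, out_file: str) -> list:
    """Build a pg_dump invocation that writes a custom-format archive."""
    return ["pg_dump", "--host", host, "--format", "custom",
            "--file", out_file, database]

def build_restore_command(host: str, database: str, dump_file: str) -> list:
    """Build a pg_restore invocation that drops and recreates objects
    (--clean) -- which is exactly how writes made directly in production
    can be discarded."""
    return ["pg_restore", "--host", host, "--clean",
            "--dbname", database, dump_file]

if __name__ == "__main__":
    print(" ".join(build_dump_command("stage-db.example.com", "thoth", "/tmp/thoth.dump")))
    print(" ".join(build_restore_command("prod-db.example.com", "thoth", "/tmp/thoth.dump")))
```

The `--clean` flag makes the overwrite behavior explicit: anything production wrote since the last dump is replaced wholesale.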
One proposed solution discussed was to perform updates per table. This introduces overhead and possible inconsistencies we should avoid (e.g., package entries created in the database by solvers could be overwritten by packages detected during container image analyses).
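The inconsistency risk above can be shown with a toy simulation: plain Python dicts stand in for the tables, and the schema (package name, version, provenance) is invented for illustration, not taken from the real Thoth database:

```python
# Toy simulation of the per-table overwrite hazard. The "tables" are plain
# dicts keyed by package name; the schema is invented for illustration.

prod_packages = {
    # Entry created in production by a container image analysis.
    "flask": {"version": "2.0.1", "source": "image-analysis"},
}

staging_packages = {
    # Entry computed in staging by a solver run.
    "flask": {"version": "1.1.2", "source": "solver"},
}

def naive_per_table_sync(prod: dict, staging: dict) -> None:
    """Blindly upsert every staging row into prod -- last writer wins."""
    for key, row in staging.items():
        prod[key] = row  # overwrites prod-only information

naive_per_table_sync(prod_packages, staging_packages)
# The image-analysis provenance recorded in prod is now lost.
print(prod_packages["flask"]["source"])  # prints "solver"
```

Any per-table scheme would need per-row merge rules to avoid this, which is the maintenance overhead the paragraph above refers to.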
Another solution is to let syncs happen at the application level. In other words, we could keep running our background job that copies documents from the staging environment to production (document-sync-job). The job places documents on Ceph in the production environment, and a subsequent graph-sync job syncs those data into the database so they are available in prod even though they were computed in staging. This approach seems scalable and might require less maintenance.
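The two-stage flow above can be sketched with a toy model: in-memory dicts stand in for the Ceph buckets and the production database, and the function names are illustrative, not the real job APIs:

```python
# Toy model of the document-sync + graph-sync pipeline. Dicts stand in for
# Ceph buckets and the production database; names are illustrative only.

def document_sync(staging_bucket: dict, prod_bucket: dict) -> int:
    """Copy documents from the staging object store to the production one.

    Only missing keys are copied, so the job is idempotent and never
    touches documents produced in production itself.
    """
    copied = 0
    for key, doc in staging_bucket.items():
        if key not in prod_bucket:
            prod_bucket[key] = doc
            copied += 1
    return copied

def graph_sync(prod_bucket: dict, prod_db: dict) -> int:
    """Ingest documents from production object storage into the database,
    skipping ones already synced, so prod-local writes are never clobbered."""
    synced = 0
    for key, doc in prod_bucket.items():
        if key not in prod_db:
            prod_db[key] = doc
            synced += 1
    return synced

staging_bucket = {"solver/doc-1": {"result": "..."}}
prod_bucket = {"image-analysis/doc-2": {"result": "..."}}
prod_db = {}

document_sync(staging_bucket, prod_bucket)
graph_sync(prod_bucket, prod_db)
print(sorted(prod_db))  # both staging-computed and prod-computed documents
```

Because both stages are additive and idempotent, re-running them after a failure is safe, which is part of why this approach looks lower-maintenance than per-table merges.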
Additional Info:
Epic: #2216