requeue recently fetched components #390

Open
jeffmcaffer opened this issue Feb 9, 2019 · 5 comments

@jeffmcaffer
Member

jeffmcaffer commented Feb 9, 2019

We have cases now where the definitions are being built from reasonably old harvested data and as a result the definitions are substantially out of date. Previously we talked about having the service check the current crawler tool levels and requeue any definition based on older tools. We'd still return the old definition but soon enough the new one would be ready.

Thoughts:

  • pretty soon now we are going to rescan the world so we'll be up to date
  • pretty soon after that we'll do some sort of change that will make those obsolete
  • There is an issue if we do this lazily: the same component could get queued many times, causing queue bloat and even duplicate work for long-running tools.
  • may be better to do it as a "job" where we periodically queue up "old" definitions.
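
The queue-bloat concern with lazy requeueing can be illustrated with a small de-duplication guard. This is only a sketch under assumptions: `RequeueTracker`, its methods, and the use of a plain coordinates string as the key are hypothetical names, not part of the service, and the in-memory `Set` only covers a single service instance (several instances would need a shared store).

```typescript
// Hypothetical sketch: none of these names come from the actual service.
// Remembers which components are already waiting to be re-harvested so a
// burst of requests for the same component queues it only once.
class RequeueTracker {
  private readonly pending = new Set<string>();
  private readonly enqueue: (coordinates: string) => void;

  constructor(enqueue: (coordinates: string) => void) {
    this.enqueue = enqueue;
  }

  // Queues the component unless it is already pending.
  // Returns true when a new queue entry was created.
  request(coordinates: string): boolean {
    if (this.pending.has(coordinates)) {
      return false;
    }
    this.pending.add(coordinates);
    this.enqueue(coordinates);
    return true;
  }

  // Call once the fresh harvest has arrived so the component
  // can be queued again in the future.
  complete(coordinates: string): void {
    this.pending.delete(coordinates);
  }
}
```

Calling `request` repeatedly for the same coordinates before `complete` runs produces a single queue entry, which is the behavior the bullet above is asking for.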

Summary:

  • we can defer for a bit but should put this in place relatively soon after rescanning everything

thoughts @dabutvin ?

@storrisi storrisi added this to the February 2019 milestone Feb 11, 2019
@storrisi storrisi modified the milestones: February 2019, March 2019 Feb 28, 2019
@dabutvin
Member

I really like the idea of doing it lazily, even if we end up running a job as well.
If a definition is requested and it has an old version of a tool we should return the existing definition and queue it for fresh harvesting for next time.
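
A rough sketch of that lazy path, to make the flow concrete. Everything here is an assumption for illustration: the `StoredDefinition` shape, the `getDefinitionLazily` function, and the `isOutdated` and `requeue` callbacks are hypothetical and do not reflect the service's real interfaces.

```typescript
// Hypothetical shape: a definition plus the tool versions it was built from.
interface StoredDefinition {
  coordinates: string;
  tools: Record<string, string>; // tool name -> version used for this definition
}

// Returns the existing definition right away. If any tool it was built with
// is outdated, the component is also handed to `requeue` so a fresh
// definition is available for a later request.
function getDefinitionLazily(
  coordinates: string,
  store: Map<string, StoredDefinition>,
  isOutdated: (tool: string, version: string) => boolean,
  requeue: (coordinates: string) => void
): StoredDefinition | undefined {
  const definition = store.get(coordinates);
  if (definition === undefined) {
    return undefined; // nothing stored yet; normal harvesting applies
  }
  const stale = Object.entries(definition.tools).some(
    ([tool, version]) => isOutdated(tool, version)
  );
  if (stale) {
    requeue(coordinates);
  }
  return definition; // the caller still gets the old definition this time
}
```

The caller never waits on the harvest; the only cost on the read path is the version check.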

@jeffmcaffer
Member Author

Updating with some thinking...

The webhooks from the crawler to the service tell the service about all new tool runs. In that run output we have the version of the tools that were used. The service should just keep a running max. Then when it sees a definition that has an older tool, it can requeue that component.

Take care to handle startup conditions where the service does not yet have a value for a given tool.

Also note that some tools can have different version numbers depending on the type/provider being crawled. For example, the ClearlyDefined tool for Crates may be 1.2.0 and for NPMs may be 1.9.0. So the key for the table is type/provider/tool.
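
As a sketch of the running-max table described above: the class and method names (`ToolVersionTable`, `record`, `isOutdated`) are hypothetical, and the version comparison assumes plain dotted numeric versions such as 1.2.0 rather than full semver with pre-release tags.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the service's API.

// Compares dotted numeric versions such as "1.2.0".
// Returns a negative number, zero, or a positive number.
function compareVersions(a: string, b: string): number {
  const partsA = a.split(".").map(Number);
  const partsB = b.split(".").map(Number);
  const length = Math.max(partsA.length, partsB.length);
  for (let i = 0; i < length; i++) {
    const diff = (partsA[i] ?? 0) - (partsB[i] ?? 0);
    if (diff !== 0) {
      return diff;
    }
  }
  return 0;
}

class ToolVersionTable {
  // Keyed by "type/provider/tool", e.g. "npm/npmjs/clearlydefined".
  private readonly latest = new Map<string, string>();

  private static key(type: string, provider: string, tool: string): string {
    return `${type}/${provider}/${tool}`;
  }

  // Called for every tool run reported by a crawler webhook; keeps the max.
  record(type: string, provider: string, tool: string, version: string): void {
    const key = ToolVersionTable.key(type, provider, tool);
    const current = this.latest.get(key);
    if (current === undefined || compareVersions(version, current) > 0) {
      this.latest.set(key, version);
    }
  }

  // True only when a strictly newer version has been seen for this key.
  // Right after startup nothing is known, so nothing is reported as outdated.
  isOutdated(type: string, provider: string, tool: string, version: string): boolean {
    const current = this.latest.get(ToolVersionTable.key(type, provider, tool));
    return current !== undefined && compareVersions(version, current) < 0;
  }
}
```

Treating an unknown key as "not outdated" is one way to handle the startup condition: the service simply does not requeue anything for a tool until a webhook has reported at least one run of it.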

@jeffmcaffer jeffmcaffer modified the milestones: March 2019, April 2019 Apr 15, 2019
@jeffmcaffer
Member Author

@AlexWebYourmind this is a good issue to have a service side dev look at.

@storrisi storrisi modified the milestones: April 2019, May 2019 May 2, 2019
@ignacionr
Member

Maybe once we determine that the fetched definition represents an update opportunity, we should also set a response header for the CDN to not cache it, and/or signal the caller (maybe an X-header, or another seemingly out-of-band indication).
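
A minimal sketch of what those headers might look like. `Cache-Control: no-store` is the standard HTTP directive that tells caches not to keep the response; the `X-Definition-Stale` header name and the `staleResponseHeaders` helper are made up for illustration.

```typescript
// Hypothetical helper: builds extra response headers for a definition that
// has been identified as an update opportunity.
function staleResponseHeaders(isStale: boolean): Record<string, string> {
  if (!isStale) {
    return {}; // fresh definitions keep the default caching behavior
  }
  return {
    // Standard directive: caches (including the CDN) must not store this response.
    "Cache-Control": "no-store",
    // Assumed header name, signalling the caller that a refresh is underway.
    "X-Definition-Stale": "true",
  };
}
```

The result can be merged into whatever header-setting call the web framework provides before the old definition is sent back.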

@dabutvin
Member

dabutvin commented May 6, 2019

@ignacionr that makes sense. good call

@storrisi storrisi modified the milestones: May 2019, June 2019 Jun 3, 2019
@storrisi storrisi modified the milestones: June 2019, July 2019 Jul 1, 2019
@storrisi storrisi modified the milestones: July 2019, August 2019 Aug 2, 2019
@storrisi storrisi modified the milestones: August 2019, September 2019 Sep 5, 2019
@storrisi storrisi modified the milestones: September 2019, October 2019 Oct 1, 2019
@jeffmendoza jeffmendoza removed this from the October 2019 milestone Nov 4, 2019
5 participants