This repository was archived by the owner on Aug 16, 2023. It is now read-only.

Make fetching and filtering more performant #3

@danisyellis

Description


(NOTE: I made this ticket a while ago, but performance (the amount of time Starfish takes to run) has never been a problem for us at Indeed. I think a company would have to be checking an extremely large number of employees for this to matter, so I'm putting this in the backlog. If someone wants to work on it, great, but I don't think it's a particularly useful change at the moment.)

Currently, we:

  1. ping the GitHub API for a person's events
  2. look at the first page and keep only the events whose types we care about
  3. repeat this for every page of event history that GitHub has (it keeps up to 300 events per user, returned 30 per page, so at most 10 pages)
  4. look through the resulting array of events for one in the correct time period, and stop looking when we find one.

However, there's no reason to look through all 10 pages of a person's events if an event on the first page already meets both criteria.
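To make the cost concrete, here is a minimal TypeScript sketch of the current flow. It assumes a hypothetical `fetchPage(username, page)` helper that wraps GitHub's per-user events endpoint and returns an empty array once history runs out; `RELEVANT_TYPES`, `MAX_PAGES`, and the function names are illustrative, not Starfish's actual code. Only the `type` and `created_at` fields come from GitHub's event payloads.

```typescript
// Only the fields this sketch reads from a GitHub event payload.
interface GitHubEvent {
  type: string;       // e.g. "PushEvent"
  created_at: string; // ISO 8601 timestamp
}

// Hypothetical helper: returns one page of a user's events,
// or an empty array when there is no more history.
type FetchPage = (username: string, page: number) => Promise<GitHubEvent[]>;

// Assumed values; Starfish's real configuration may differ.
const RELEVANT_TYPES = new Set(["PullRequestEvent", "IssuesEvent", "PushEvent"]);
const MAX_PAGES = 10; // assumed cap: 300 events / 30 per page

// True if the event's timestamp falls inside [start, end].
function inRange(e: GitHubEvent, start: Date, end: Date): boolean {
  const t = new Date(e.created_at).getTime();
  return t >= start.getTime() && t <= end.getTime();
}

// Current flow: fetch and type-filter every page, then search by time.
async function hadContributionCurrent(
  username: string,
  fetchPage: FetchPage,
  start: Date,
  end: Date,
): Promise<{ found: boolean; pagesFetched: number }> {
  const relevant: GitHubEvent[] = [];
  let pagesFetched = 0;
  for (let page = 1; page <= MAX_PAGES; page++) {
    const events = await fetchPage(username, page);
    pagesFetched++;
    if (events.length === 0) break; // history exhausted
    relevant.push(...events.filter((e) => RELEVANT_TYPES.has(e.type)));
  }
  // The time check only starts after every page has been requested.
  const found = relevant.some((e) => inRange(e, start, end));
  return { found, pagesFetched };
}
```

`pagesFetched` is returned only to make the cost visible: it grows with the length of the user's history even when the very first event would have been enough.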

So, refactor the code to check each event for both

  1. event type, and
  2. whether it happened in the time range

BEFORE fetching the next page. If both are true, log the contributor's alternate ID and move on to the next person.
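A minimal sketch of the proposed refactor, under the same assumptions as above (a hypothetical `fetchPage` helper, an illustrative list of relevant event types, an assumed 10-page cap), plus a hypothetical `Contributor` record whose `githubUsername` and `alternateId` field names are made up for illustration. Each page is checked for both criteria before the next page is requested, and the alternate ID is logged as soon as a match is found.

```typescript
interface GitHubEvent {
  type: string;
  created_at: string; // ISO 8601 timestamp
}

// Hypothetical helper: one page of a user's events, [] when exhausted.
type FetchPage = (username: string, page: number) => Promise<GitHubEvent[]>;

// Hypothetical contributor record; field names are illustrative.
interface Contributor {
  githubUsername: string;
  alternateId: string;
}

const RELEVANT_TYPES = new Set(["PullRequestEvent", "IssuesEvent", "PushEvent"]); // assumed
const MAX_PAGES = 10; // assumed cap

// True if the event is a type we care about AND falls inside [start, end].
function qualifies(e: GitHubEvent, start: Date, end: Date): boolean {
  if (!RELEVANT_TYPES.has(e.type)) return false;
  const t = new Date(e.created_at).getTime();
  return t >= start.getTime() && t <= end.getTime();
}

// Proposed flow: stop paging as soon as one event meets both criteria.
async function hadContribution(
  username: string,
  fetchPage: FetchPage,
  start: Date,
  end: Date,
): Promise<{ found: boolean; pagesFetched: number }> {
  let pagesFetched = 0;
  for (let page = 1; page <= MAX_PAGES; page++) {
    const events = await fetchPage(username, page);
    pagesFetched++;
    if (events.length === 0) break; // history exhausted
    if (events.some((e) => qualifies(e, start, end))) {
      return { found: true, pagesFetched }; // skip the remaining pages
    }
  }
  return { found: false, pagesFetched };
}

// Log each matching contributor's alternate ID, then move on to the next person.
async function checkContributors(
  contributors: Contributor[],
  fetchPage: FetchPage,
  start: Date,
  end: Date,
): Promise<string[]> {
  const matched: string[] = [];
  for (const c of contributors) {
    const { found } = await hadContribution(c.githubUsername, fetchPage, start, end);
    if (found) {
      console.log(c.alternateId);
      matched.push(c.alternateId);
    }
  }
  return matched;
}
```

With this ordering, a user whose first page already contains a qualifying event costs one request instead of one per page of history; users with no qualifying event still cost the full walk, so the worst case is unchanged.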

Metadata


Assignees

No one assigned

    Labels

    wontfix: This will not be worked on

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
