Milestones

  • Turn each final transformation Python notebook into an Airflow task.
    - A given Python notebook should take the Full Output and perform transformations before putting the results into a database
    - Geocoding augmentation
    - Filling in data that comes from COPA/Socrata

    No due date
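A minimal sketch of the per-record transformation such a task would perform, assuming hypothetical field names (`log_no`, `beat`) and an in-memory stand-in for the COPA/Socrata data; in production this callable would be wrapped in an Airflow PythonOperator and write its results to the database:

```python
# Hypothetical lookup built from COPA/Socrata exports; the field
# names here are assumptions, not the project's actual schema.
COPA_LOOKUP = {
    "2020-0001234": {"beat": "0313", "incident_datetime": "2020-01-15T22:30:00"},
}

def transform_record(record: dict) -> dict:
    """Fill gaps in a Full Output record from the COPA/Socrata lookup.

    This is the kind of step a notebook cell performs today; as an
    Airflow task it would iterate over a whole S3 object's rows
    before loading the results into the database.
    """
    out = dict(record)
    extra = COPA_LOOKUP.get(out.get("log_no"), {})
    for key, value in extra.items():
        out.setdefault(key, value)  # only fill fields the record is missing
    return out

enriched = transform_record({"log_no": "2020-0001234", "officer_name": "DOE, JANE"})
```

Geocoding augmentation would slot in the same way: another pure function over the record, composed inside the same task.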
  • GIVEN an S3 bucket THEN I can create a unified dataset WITH a unique identifier for each of the involved officers, used across all of the datasets, THAT matches the current format of Full Output

    No due date
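One way to get an identifier that stays stable across datasets is to hash the fields used to match an officer. A sketch, assuming the matching keys are first name, last name, and appointment date (an assumption; the project may match on other fields):

```python
import hashlib

def officer_uid(first: str, last: str, appointed: str) -> str:
    """Derive a stable unique identifier from the officer-matching keys.

    The keys used here (first, last, appointment date) are assumptions.
    Because the hash is deterministic, the same officer gets the same
    ID in every dataset that feeds the unified Full Output.
    """
    key = "|".join(s.strip().upper() for s in (first, last, appointed))
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]

uid_a = officer_uid("Jane", "Doe", "2001-05-07")
uid_b = officer_uid(" jane ", "DOE", "2001-05-07")  # messy duplicate row
```

Normalizing (strip + uppercase) before hashing means trivially inconsistent rows still collapse to one identifier.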
  • Clean S3 data the same way it is cleaned in the individual/ folder
    - Find the template folder for individual scripts in the latest Dropbox
    - Scenario: WHEN an Airflow task is run THEN the data in a particular S3 bucket subfolder should be cleaned, and the results placed in a new bucket with metadata

    No due date
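A sketch of the cleaning step itself, with the S3 read/write omitted (in the Airflow task this would sit between a boto3 `get_object` and `put_object`). The normalization rules shown are assumptions standing in for whatever the individual/ template scripts actually do:

```python
import csv
import io

def clean_rows(raw_csv: str) -> list[dict]:
    """Normalize a raw CSV the way the individual/ scripts do:
    column names lowercased and underscored, values stripped,
    name fields uppercased. These exact rules are placeholders
    until the template folder is located.
    """
    reader = csv.DictReader(io.StringIO(raw_csv))
    cleaned = []
    for row in reader:
        out = {}
        for col, val in row.items():
            key = col.strip().lower().replace(" ", "_")
            val = (val or "").strip()
            if key.endswith("name"):
                val = val.upper()
            out[key] = val
        cleaned.append(out)
    return cleaned

rows = clean_rows("Last Name, Star No\n doe ,  1234 \n")
```

Keeping the cleaner a pure string-in/rows-out function makes it testable outside Airflow and reusable across bucket subfolders.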
  • WHEN a file is placed in the initial S3 bucket THEN the file should be transformed appropriately AND cleaned AND THEN placed into a resultant S3 bucket WITH a metadata file indicating:
    - Data source
    - Date of ETL (Extract-Transform-Load)
    - ETL source (the code that ran this ETL)
    - ETL version (git hash describing the version of the code)

    No due date
    1/2 issues closed
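The metadata file listed above can be generated as a small JSON sidecar. A sketch; the function and argument names are assumptions, and the git hash is passed in rather than shelled out (in production it could be captured once at deploy time with `git rev-parse HEAD`):

```python
import json
from datetime import datetime, timezone

def build_metadata(data_source: str, etl_source: str, git_hash: str) -> str:
    """Build the JSON metadata file that accompanies each cleaned
    object in the resultant S3 bucket: data source, ETL date,
    ETL source (the code that ran), and ETL version (git hash).
    """
    meta = {
        "data_source": data_source,
        "etl_date": datetime.now(timezone.utc).isoformat(),
        "etl_source": etl_source,
        "etl_version": git_hash,
    }
    return json.dumps(meta, indent=2)

# Hypothetical values for illustration only.
blob = build_metadata("COPA", "etl/clean_complaints.py", "3f9a1c0")
```

Writing `blob` next to the cleaned object (e.g. `key + ".meta.json"`) keeps every file in the resultant bucket traceable back to the code version that produced it.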