Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define db sqlalchemy #15

Merged
merged 25 commits into from
Aug 25, 2023
Merged

Define db sqlalchemy #15

merged 25 commits into from
Aug 25, 2023

Conversation

rshewitt
Copy link
Contributor

@rshewitt rshewitt commented Aug 15, 2023

Pull Request

Related to 4400

About

  • Design harvest tables using SQLAlchemy
  • Create an ERD using PlantUML

PR TASKS

  • The actual code changes.
  • Tests written and passed.
  • Any changes to docs?

@rshewitt rshewitt requested a review from a team August 15, 2023 21:43
@rshewitt rshewitt linked an issue Aug 15, 2023 that may be closed by this pull request
2 tasks
@github-actions
Copy link

github-actions bot commented Aug 15, 2023

Coverage

Coverage Report
FileStmtsMissCoverMissing
harvester
   __init__.py30100% 
harvester/db/models
   __init__.py50100% 
   models.py530100% 
harvester/extract
   __init__.py1922 89%
   dcatus.py1122 82%
harvester/utils
   __init__.py00100% 
   json.py2266 73%
   pg.py3544 89%
   s3.py2466 75%
harvester/validate
   __init__.py00100% 
   dcat_us.py240100% 
TOTAL1962090% 

Tests Skipped Failures Errors Time
29 0 💤 0 ❌ 0 🔥 22.927s ⏱️

@robert-bryson
Copy link
Contributor

All of the added tests fail. I'm not sure why the check is reporting success.

@jbrown-xentity
Copy link
Contributor

I found the "tests failing but the jobs passes problem". tee eats the exit code, so as long as it writes the output then the command "succeeds". We need to change line 45 in the github project to something like set -o pipefail; poetry run pytest --junitxml=pytest.xml --cov=harvester | tee pytest-coverage.txt. You can confirm locally by running the above command, and then checking the exit code by running echo $?. You can confirm the current behavior by running the current pytest command, and then running the echo command to check the exit code. @rshewitt can you implement that in this PR?

@rshewitt
Copy link
Contributor Author

rshewitt commented Aug 17, 2023

yes, i'll add this!

Copy link
Contributor

@nickumia-reisys nickumia-reisys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine as an initial pass. Still lots of open questions. I think this may be stuck for the same reason S3 is stuck, pending how Airflow integrates with it. Will we be initializing the DB with a python package like this? Or does Airflow handle DB setup and init?

I don't mind merging this in the meantime. ⏳

docs/diagrams/src/README.md Show resolved Hide resolved
tests/db/sqlalchemy/conftest.py Outdated Show resolved Hide resolved
@btylerburton
Copy link
Contributor

btylerburton commented Aug 23, 2023

when it's fitting, i'd like to have a deeper discussion about the data model given the pivot to Airflow. i'm not sure if we need to merge this, but i don't want to lose this work.

EDIT: let's merge this as we have no reason not to

btylerburton
btylerburton previously approved these changes Aug 23, 2023
@rshewitt
Copy link
Contributor Author

rshewitt commented Aug 25, 2023

I think it's fine as an initial pass. Still lots of open questions. I think this may be stuck for the same reason S3 is stuck, pending how Airflow integrates with it. Will we be initializing the DB with a python package like this? Or does Airflow handle DB setup and init?

I don't mind merging this in the meantime. ⏳

I don't think we'll use a python package like this for DB setup and init. Based on what i've read it looks like the only kind of DDL expected from us is to create a database and user with the necessary privileges for airflow to use ( and possibly updating pg_hba.conf to add the user to the database access control list ). airflow comes with a db init utility for its metadata database. here's the ERD of the airflow metadata database.

Copy link
Contributor

@nickumia-reisys nickumia-reisys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⌛ ⌛

@nickumia-reisys nickumia-reisys merged commit 9aab59f into main Aug 25, 2023
4 checks passed
@nickumia-reisys nickumia-reisys deleted the define-db-sqlalchemy branch August 25, 2023 19:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[MVP] Define DB using SQLAlchemy
5 participants