refactor/update: use subgraphs on dcat diagram + mark end of for loops
nickumia-reisys committed Sep 21, 2023
1 parent de73d75 commit 951fa61
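
In short: the diagram's three harvest stages now live in their own `subgraph` blocks (laid out top-to-bottom inside a left-to-right `flowchart`), and each for-loop is bounded by explicit START/END marker nodes. Below is a minimal sketch of the two patterns, using node names from the diagram in the diff but with most edges omitted:

```mermaid
flowchart LR
    %% Stages chain left-to-right; each stage's internals flow top-to-bottom.
    gather_stage ==> fetch_stage
    fetch_stage ==> import_stage

    subgraph gather_stage [Gather Stage]
        direction TB
        gs([GATHER STARTED])
        ge([GATHER ENDED])
        %% Explicit START/END nodes mark where the per-dataset loop begins and ends.
        for_each_dataset[[For Each Source Dataset START]]
        for_each_dataset_end[[For Each Source Dataset END]]
        gs ==> for_each_dataset
        for_each_dataset ==> for_each_dataset_end
        for_each_dataset_end ==> ge
    end

    subgraph fetch_stage [Fetch Stage]
        direction TB
        fs([FETCH STARTED]) ==> do_nothing[[Nothing to do]] ==> fe([FETCH ENDED])
    end

    subgraph import_stage [Import Stage]
        direction TB
        is([IMPORT STARTED]) ==> ie([IMPORT ENDED])
    end
```

Linking the subgraph ids directly (`gather_stage ==> fetch_stage`) keeps the stage-to-stage order readable, while `direction TB` lets each stage lay itself out independently.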
Showing 2 changed files with 144 additions and 131 deletions.
273 changes: 143 additions & 130 deletions docs/dcat.mmd
@@ -1,16 +1,146 @@
-flowchart TD
+flowchart LR

-%% Old Logic
-gs([GATHER STARTED])
-ge([GATHER ENDED])
-fs([FETCH STARTED])
-fe([FETCH ENDED])
-is([IMPORT STARTED])
-ie([IMPORT ENDED])
+%% Algorithm
+gather_stage ==> fetch_stage
+fetch_stage ==> import_stage
+
+subgraph gather_stage [Gather Stage]
+direction TB
+gs([GATHER STARTED])
+ge([GATHER ENDED])
+gs ==> load_remote_catalog
+load_remote_catalog ==> validate_conforms_to
+validate_conforms_to == No ==> error
+validate_conforms_to == Yes ==> check_schema_version
+load_remote_catalog --> source_data
+load_remote_catalog --> catalog_values
+catalog_values --> check_schema_version
+check_schema_version-- No -->default_schema_version
+check_schema_version-- Yes -->schema_version
+schema_version --> get_existing_datasets
+default_schema_version --> get_existing_datasets
+get_existing_datasets --> existing_datasets
+get_existing_datasets ==> is_parent_
+is_parent_ == Yes ==> existing_parents
+existing_parents --> is_parent_demoted
+is_parent_ == No ==> is_parent_demoted
+is_parent_demoted -- Yes --> orphaned_parents
+is_parent_demoted == No ==> is_parent_promoted
+existing_datasets --> is_parent_promoted
+is_parent_promoted -- Yes --> new_parents
+is_parent_promoted == No ==> load_config
+load_config --> hc_filter
+load_config --> hc_defaults
+load_config ==> is_identifier_both
+is_identifier_both-. Yes .-> error
+is_identifier_both == No ==> for_each_dataset
+hc_filter --> dataset_contains_filter
+for_each_dataset ==> dataset_contains_filter
+dataset_contains_filter-. Yes .-> skip
+dataset_contains_filter == No ==> has_identifier
+has_identifier-. No .-> error
+has_identifier == Yes ==> multiple_identifier
+multiple_identifier-. Yes .-> skip
+multiple_identifier == No ==> unique_datsets
+unique_datsets --> unique_existing
+unique_existing == Yes ==> hash_exists
+unique_existing -- Yes --> seen_datasets
+unique_existing == No ==> new_pkg_id
+hash_exists == Yes ==> get_source_hash
+get_source_hash ==> is_active
+is_active == Yes ==> make_upstream_content_hash
+is_active == No ==> HarvestObjectExtra
+hash_exists == No ==> make_upstream_content_hash
+orphaned_parents-- Disjunction -->make_upstream_content_hash
+new_parents-- Disjunction -->make_upstream_content_hash
+make_upstream_content_hash ==> check_hash
+check_hash-. Yes .-> skip
+check_hash-- No -->HarvestObjectExtra
+new_pkg_id --> HarvestObjectExtra
+Append__is_collection --> HarvestObjectExtra
+schema_version --> HarvestObjectExtra
+default_schema_version --> HarvestObjectExtra
+catalog_values --> HarvestObjectExtra
+Append__collection_pkg_id --> HarvestObjectExtra
+HarvestObjectExtra ==> is_parent_2
+is_parent_2 == Yes ==> Harvest_first
+is_parent_2 == No ==> Harvest_second
+Harvest_first ==> for_each_dataset_end
+Harvest_second ==> for_each_dataset_end
+for_each_dataset_end ==> for_each_existing
+for_each_existing --> seen_datasets
+for_each_existing ==> is_deleted
+seen_datasets-. Inverse .-> skip
+is_deleted-. Yes .-> skip
+seen_datasets --> delete
+is_deleted== No ==>delete
+delete-. exception .-> error
+delete ==> for_each_existing_end
+for_each_existing_end ==> ge
+end
+subgraph fetch_stage [Fetch Stage]
+direction TB
+fs([FETCH STARTED])
+fe([FETCH ENDED])
+fs ==> do_nothing
+do_nothing ==> fe
+end
+subgraph import_stage [Import Stage]
+direction TB
+is([IMPORT STARTED])
+ie([IMPORT ENDED])
+is ==> empty_dataset
+empty_dataset == Yes ==> ie
+empty_dataset == No ==> has_title
+has_title == Yes ==> extract_extras
+has_title-. No .->error_2
+extract_extras --> default_schema_version_2
+extract_extras --> default_collection
+extract_extras --> default_parent
+extract_extras --> default_catalog
+extract_extras ==> does_parent_exist
+does_parent_exist == Yes ==> fetch_parent
+does_parent_exist == No ==> new_pkg_title
+does_parent_exist-. No .->error_2
+default_collection --> does_parent_exist
+default_parent --> does_parent_exist
+fetch_parent ==> new_pkg_title
+new_pkg_title ==> is_title_valid
+is_title_valid== Yes ==> is_federal
+is_title_valid-. No .->error_2
+default_schema_version_2 --> is_federal
+hc_defaults_2 --> is_federal
+new_pkg_title ==> is_federal
+is_federal == Yes ==> federal_validation
+is_federal == No ==> non_federal_validation
+federal_validation ==> validate_dataset
+non_federal_validation ==> validate_dataset
+validate_dataset ==> get_owner_org
+get_owner_org ==> make_upstream_content_hash_2
+make_upstream_content_hash_2 ==> assemble_basic_dataset_info
+assemble_basic_dataset_info ==> add_dataset_specific_info
+add_dataset_specific_info ==> is_geospatial
+is_geospatial == Yes ==> tag_geospatial
+is_geospatial == No ==> is_collection
+tag_geospatial ==> is_collection
+is_collection == Yes ==> tag_collection_parent
+is_collection == No ==> tag_collection_child
+tag_collection_parent ==> tag_catalog_values
+tag_collection_child ==> tag_catalog_values
+tag_catalog_values ==> is_existing
+is_existing == Yes ==> get_existing_pkg
+is_existing == No ==> create
+get_existing_pkg ==> avoid_resource_overwriting
+avoid_resource_overwriting ==> update
+create ==> update_object_reference
+update ==> update_object_reference
+update_object_reference ==> ie
+end


%% Data
error[\Error/]
+error_2[\Error/]
skip[/Skip\]
source_data[(Source Datasets)]
catalog_values[(Catalog Values)]
@@ -29,6 +159,7 @@ flowchart TD
default_catalog[(catalog_values=none)]
hc_filter[(Source Config Filter)]
hc_defaults[(Source Config Defaults)]
+hc_defaults_2[(Source Config Defaults)]
new_pkg_id[(New package id)]
HarvestObjectExtra[(Create Harvest Object)]
new_pkg_title[(New package title)]
@@ -41,8 +172,10 @@ flowchart TD
get_existing_datasets[[Get Existing Datasets]]
get_source_hash[[Source Hash]]
%% set_dataset_info[[Set Dataset Info]]
-for_each_dataset[[For Each Source Dataset]]
-for_each_existing[[For Each Existing Dataset]]
+for_each_dataset[[For Each Source Dataset START]]
+for_each_dataset_end[[For Each Source Dataset END]]
+for_each_existing[[For Each Existing Dataset START]]
+for_each_existing_end[[For Each Existing Dataset END]]
update[[Update Dataset]]
delete[[Delete Dataset]]
do_nothing[[Nothing to do]]
@@ -88,123 +221,3 @@ flowchart TD
has_title{Does the dataset have a title?}
does_parent_exist{Does Parent exist?}
is_title_valid{Is the title valid?}


-%% Algorithm
-gs ==> load_remote_catalog
-load_remote_catalog ==> validate_conforms_to
-validate_conforms_to == No ==> error
-validate_conforms_to == Yes ==> check_schema_version
-load_remote_catalog --> source_data
-load_remote_catalog --> catalog_values
-catalog_values --> check_schema_version
-check_schema_version-- No -->default_schema_version
-check_schema_version-- Yes -->schema_version
-schema_version --> get_existing_datasets
-default_schema_version --> get_existing_datasets
-get_existing_datasets --> existing_datasets
-get_existing_datasets ==> is_parent_
-is_parent_ == Yes ==> existing_parents
-existing_parents --> is_parent_demoted
-is_parent_ == No ==> is_parent_demoted
-is_parent_demoted -- Yes --> orphaned_parents
-is_parent_demoted == No ==> is_parent_promoted
-existing_datasets --> is_parent_promoted
-is_parent_promoted -- Yes --> new_parents
-is_parent_promoted == No ==> load_config
-load_config --> hc_filter
-load_config --> hc_defaults
-load_config ==> is_identifier_both
-is_identifier_both-. Yes .-> error
-is_identifier_both == No ==> for_each_dataset
-hc_filter --> dataset_contains_filter
-for_each_dataset ==> dataset_contains_filter
-dataset_contains_filter-. Yes .-> skip
-dataset_contains_filter == No ==> has_identifier
-has_identifier-. No .-> error
-has_identifier == Yes ==> multiple_identifier
-multiple_identifier-. Yes .-> skip
-multiple_identifier == No ==> unique_datsets
-unique_datsets --> unique_existing
-unique_existing == Yes ==> hash_exists
-unique_existing -- Yes --> seen_datasets
-unique_existing == No ==> new_pkg_id
-hash_exists == Yes ==> get_source_hash
-get_source_hash ==> is_active
-is_active == Yes ==> make_upstream_content_hash
-is_active == No ==> HarvestObjectExtra
-hash_exists == No ==> make_upstream_content_hash
-orphaned_parents-- Disjunction -->make_upstream_content_hash
-new_parents-- Disjunction -->make_upstream_content_hash
-make_upstream_content_hash ==> check_hash
-check_hash-. Yes .-> skip
-check_hash-- No -->HarvestObjectExtra
-new_pkg_id --> HarvestObjectExtra
-Append__is_collection --> HarvestObjectExtra
-schema_version --> HarvestObjectExtra
-default_schema_version --> HarvestObjectExtra
-catalog_values --> HarvestObjectExtra
-Append__collection_pkg_id --> HarvestObjectExtra
-HarvestObjectExtra ==> is_parent_2
-is_parent_2 == Yes ==> Harvest_first
-is_parent_2 == No ==> Harvest_second
-Harvest_first ==> for_each_existing
-Harvest_second ==> for_each_existing
-for_each_existing --> seen_datasets
-for_each_existing ==> is_deleted
-seen_datasets-. Inverse .-> skip
-is_deleted-. Yes .-> skip
-seen_datasets --> delete
-is_deleted== No ==>delete
-delete-. exception .-> error
-delete ==> ge
-ge ==> fs
-fs ==> do_nothing
-do_nothing ==> fe
-fe ==> is
-is ==> empty_dataset
-empty_dataset == Yes ==> ie
-empty_dataset == No ==> has_title
-has_title == Yes ==> extract_extras
-has_title-. No .->error
-extract_extras --> default_schema_version_2
-extract_extras --> default_collection
-extract_extras --> default_parent
-extract_extras --> default_catalog
-extract_extras ==> does_parent_exist
-does_parent_exist == Yes ==> fetch_parent
-does_parent_exist == No ==> new_pkg_title
-does_parent_exist-. No .->error
-default_collection --> does_parent_exist
-default_parent --> does_parent_exist
-fetch_parent ==> new_pkg_title
-new_pkg_title ==> is_title_valid
-is_title_valid== Yes ==> is_federal
-is_title_valid-. No .->error
-default_schema_version_2 --> is_federal
-hc_defaults --> is_federal
-new_pkg_title ==> is_federal
-is_federal == Yes ==> federal_validation
-is_federal == No ==> non_federal_validation
-federal_validation ==> validate_dataset
-non_federal_validation ==> validate_dataset
-validate_dataset ==> get_owner_org
-get_owner_org ==> make_upstream_content_hash_2
-make_upstream_content_hash_2 ==> assemble_basic_dataset_info
-assemble_basic_dataset_info ==> add_dataset_specific_info
-add_dataset_specific_info ==> is_geospatial
-is_geospatial == Yes ==> tag_geospatial
-is_geospatial == No ==> is_collection
-tag_geospatial ==> is_collection
-is_collection == Yes ==> tag_collection_parent
-is_collection == No ==> tag_collection_child
-tag_collection_parent ==> tag_catalog_values
-tag_collection_child ==> tag_catalog_values
-tag_catalog_values ==> is_existing
-is_existing == Yes ==> get_existing_pkg
-is_existing == No ==> create
-get_existing_pkg ==> avoid_resource_overwriting
-avoid_resource_overwriting ==> update
-create ==> update_object_reference
-update ==> update_object_reference
-update_object_reference ==> ie
2 changes: 1 addition & 1 deletion docs/dcat.svg
(Regenerated SVG rendering of the diagram; diff not displayed.)

1 comment on commit 951fa61

@github-actions
Coverage

Coverage Report
| File | Stmts | Miss | Cover |
| --- | ---: | ---: | ---: |
| harvester/\_\_init\_\_.py | 3 | 0 | 100% |
| harvester/db/models/\_\_init\_\_.py | 5 | 0 | 100% |
| harvester/db/models/models.py | 53 | 0 | 100% |
| harvester/extract/\_\_init\_\_.py | 19 | 2 | 89% |
| harvester/extract/dcatus.py | 11 | 2 | 82% |
| harvester/utils/\_\_init\_\_.py | 0 | 0 | 100% |
| harvester/utils/json.py | 22 | 6 | 73% |
| harvester/utils/pg.py | 35 | 4 | 89% |
| harvester/utils/s3.py | 24 | 6 | 75% |
| harvester/validate/\_\_init\_\_.py | 0 | 0 | 100% |
| harvester/validate/dcat_us.py | 24 | 0 | 100% |
| **TOTAL** | **196** | **20** | **90%** |

| Tests | Skipped | Failures | Errors | Time |
| ---: | ---: | ---: | ---: | ---: |
| 29 | 0 💤 | 0 ❌ | 0 🔥 | 17.282s ⏱️ |
