Discuss how Annotations should be implemented in HashStore
What format should we use to store annotation content in /hashstore/metadata? JSON-LD or EML?
What is HashStore's responsibility when storing annotations?
Is the EML document already formed at this point?
Where is the content coming from?
Who currently creates the EML documents to be stored?
Summarize issue discussion into substorage design document
Initial Proposal to kickstart the conversation (the content below is not final, and will likely change):
A dataset that is represented by an EML document can be broken down into two components:
Attributes that describe the dataset (ex. title, author, method, keywordSet, etc.)
Attributes that represent the tables associated with the dataset (ex. dataTable, otherEntity, etc.)
A HashStore annotation is a mapping document that should consist of a single parent member and a list that represents the child members
This document's location in hashstore/metadata is formed by calculating the SHA-256 hex digest of a given pid and formatId
The parent member's value is the id (location) of the parent metadata document in hashstore/metadata
- The id/location/address of this document is formed by calculating the SHA-256 hex digest of a given pid, formatId and the string "parent". Ex. sha-256(pid + formatId + "parent")
- This document is composed of the attributes/content that describe the dataset (ex. title, author, method, keywordSet, etc.)
The List/HashMap of child members uses a number as each key, and the id (location) of the child's metadata document in hashstore/metadata as the value
- The id/address of each child is formed by calculating the SHA-256 hex digest of a given pid, formatId and (int) key. Ex. sha-256(pid + formatId + 0) where 0 is the first table in the dataset
- Each child represents a data table in the dataset, or chunk of data that belongs to the dataset
Note: The format of the parent/child metadata documents to be stored/chunked requires further discussion/clarification
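The id scheme described above can be sketched as follows. This is only an illustration of the proposal, not a settled implementation: the function names are hypothetical, and the sketch assumes plain string concatenation with UTF-8 encoding and the integer key rendered as a decimal string.

```python
import hashlib


def _sha256_hex(text: str) -> str:
    """Return the SHA-256 hex digest of a string (UTF-8 encoding is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def annotation_id(pid: str, format_id: str) -> str:
    # Location of the mapping (annotation) document: sha-256(pid + formatId)
    return _sha256_hex(pid + format_id)


def parent_id(pid: str, format_id: str) -> str:
    # Location of the parent metadata document: sha-256(pid + formatId + "parent")
    return _sha256_hex(pid + format_id + "parent")


def child_id(pid: str, format_id: str, key: int) -> str:
    # Location of a child metadata document: sha-256(pid + formatId + key)
    # The integer key is converted to its decimal string form (an assumption).
    return _sha256_hex(pid + format_id + str(key))
```

One thing to consider during the discussion: concatenating without a delimiter means that different (pid, formatId) pairs can produce the same input string (for example "ab" + "c" and "a" + "bc"), so a separator may be worth adding.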
---
title: HashStoreAnnotation Class
---
classDiagram
direction RL
class HashStoreAnnotation{
+String Parent
+List~Dict/KVP~ Children
+setParent(string)
+setChildren(List)
+getContent()
+setContent()
+getChildrenTotal()
}
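For discussion purposes, the class in the diagram could look roughly like the Python sketch below. The method names follow the diagram; everything else is an assumption, in particular the use of plain JSON as the serialized form, since the actual format (JSON-LD or EML) is still an open question.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HashStoreAnnotation:
    """Mapping document: one parent id plus a list of {key: child id} entries."""

    parent: str = ""
    children: List[Dict[int, str]] = field(default_factory=list)

    def setParent(self, parent: str) -> None:
        self.parent = parent

    def setChildren(self, children: List[Dict[int, str]]) -> None:
        self.children = list(children)

    def getContent(self) -> str:
        # Placeholder serialization (plain JSON). JSON object keys must be
        # strings, so the integer keys are written out as strings.
        return json.dumps(
            {
                "parent": self.parent,
                "children": [
                    {str(k): v for k, v in child.items()} for child in self.children
                ],
            }
        )

    def setContent(self, content: str) -> None:
        # Inverse of getContent(): restore the parent and the integer-keyed children.
        data = json.loads(content)
        self.parent = data["parent"]
        self.children = [
            {int(k): v for k, v in child.items()} for child in data["children"]
        ]

    def getChildrenTotal(self) -> int:
        return len(self.children)
```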
Example/flow to store an annotation document:
hs_annotation = HashStoreAnnotation()
# Get and store parent content
# Get and store children content
# Get parent location
# (sha256_hex is a placeholder for "SHA-256 hex digest of the given string")
dataset_parent = sha256_hex(pid + formatId + "parent")
# Create child list
dataset_children = [
    {0: sha256_hex(pid + formatId + str(0))},
    {1: sha256_hex(pid + formatId + str(1))},
    # ...
]
hs_annotation.setParent(dataset_parent)
hs_annotation.setChildren(dataset_children)
# getContent() will format the document to be written based on the chosen format
hs_annotation_content = hs_annotation.getContent()
hashstore.store_metadata(pid, hs_annotation_content, formatId)
Example/flow to work with/retrieve an annotation document:
# Retrieve the mapping document
hs_annotation_stream = hashstore.retrieve_metadata(pid, formatId)
hs_annotation = HashStoreAnnotation()
hs_annotation.setContent(hs_annotation_stream)
hsa_parent = hs_annotation.parent
hsa_children = hs_annotation.children
# Iterate over the first 1000 table items
for i in range(0, 1000):
    # Each entry in the child list maps its key to the child's id, ex. {0: <id>}
    rel_path = shard(hsa_children[i][i])
    location = "/hashstore/metadata/" + rel_path
    # ... Do what we will with each child element
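The `shard()` call in the flow above is not defined in this proposal. A minimal sketch of what it might do is shown below, assuming the hex digest is split into a fixed number of directory levels of fixed width followed by the remainder; the `depth` and `width` parameters and their default values are assumptions for illustration only.

```python
def shard(digest: str, depth: int = 3, width: int = 2) -> str:
    """Split a hex digest into a relative path of `depth` directories plus the remainder."""
    # Take `depth` tokens of `width` characters each from the front of the digest.
    tokens = [digest[i * width:(i + 1) * width] for i in range(depth)]
    # Whatever is left becomes the file name.
    tokens.append(digest[depth * width:])
    # Drop empty tokens so short inputs do not produce empty path segments.
    return "/".join(token for token in tokens if token)
```

With these assumed defaults, `shard("abcdef1234")` returns `"ab/cd/ef/1234"`, which would then be appended to `/hashstore/metadata/`.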