Update namespace in urn #1808

Open
wants to merge 1 commit into
base: main
from

Conversation

MBueschelberger

@MBueschelberger MBueschelberger commented Feb 14, 2023

Related to #1806: the double colon in the URN is replaced with a single colon so that it fully complies with the URN scheme.

Member

@kinow kinow left a comment

Hi @MBueschelberger

Approved the GH Actions workflow to run. I don't know much about the URN spec or syntax, but a search of the common-workflow-language organization finds many other occurrences (55 “code” results):

https://github.com/search?q=org%3Acommon-workflow-language+urn%3Ahash%3A%3Asha1%3A&type=code

Also in this IETF doc: https://datatracker.ietf.org/doc/id/draft-thiemann-hash-urn-00.txt

Reading the document you linked in the issue linked here, I would expect :sha1, so no idea why we have the ::sha1, nor why this document above also uses the same two colons. Strange.

@codecov

codecov bot commented Feb 14, 2023

Codecov Report

Merging #1808 (3d1797f) into main (c69221b) will increase coverage by 0.06%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #1808      +/-   ##
==========================================
+ Coverage   83.55%   83.62%   +0.06%     
==========================================
  Files          44       44              
  Lines        8102     8102              
  Branches     2218     2218              
==========================================
+ Hits         6770     6775       +5     
+ Misses        851      847       -4     
+ Partials      481      480       -1     
| Impacted Files | Coverage Δ |
|---|---|
| cwltool/provenance.py | 81.58% <ø> (ø) |
| cwltool/job.py | 81.88% <0.00%> (+0.39%) ⬆️ |
| cwltool/workflow_job.py | 87.42% <0.00%> (+0.56%) ⬆️ |

@mr-c mr-c requested a review from stain February 14, 2023 15:16
@stain
Member

stain commented Jun 14, 2023

If we're using the urn:hash scheme then we should follow its recommendations, which in this case call for a :: because the media type is unknown. Unfortunately this scheme is stuck at the 'draft' stage (now expired).

As far as I can tell, however, it is compatible with the older RFC 2141 you linked to.

<URN> ::= "urn:" <NID> ":" <NSS>
<NID>         ::= <let-num> [ 1,31<let-num-hyp> ]
<NSS>         ::= 1*<URN chars>

Substituting the example urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL we get:

NID = "hash"
NSS = ":sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL"

Now the question is if ":" is permitted in <URN chars>:

<URN chars>   ::= <trans> | "%" <hex> <hex>
<trans>       ::= <upper> | <lower> | <number> | <other> | <reserved>
<other>       ::= "(" | ")" | "+" | "," | "-" | "." |
                     ":" | "=" | "@" | ";" | "$" |
                     "_" | "!" | "*" | "'"

And there it is, in the second line of <other>.
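
The RFC 2141 check above can be sketched in Python. This is only an illustration, not cwltool code: the name `split_urn_rfc2141` is hypothetical, and the pattern covers letters, digits, the `<other>` characters quoted above, and percent-escapes. It leaves out `<reserved>`, whose definition is not quoted in this thread.

```python
import re
from typing import Optional, Tuple

# <other> as quoted above from RFC 2141; note that ":" is in the set.
OTHER = "()+,-.:=@;$_!*'"

# <URN chars>, simplified: letters, digits, <other>, or "%" <hex> <hex>.
# Assumption: <reserved> is omitted because its definition is not quoted here.
URN_CHAR = r"(?:[A-Za-z0-9" + re.escape(OTHER) + r"]|%[0-9A-Fa-f]{2})"

# <NID>: one letter or digit, then up to 31 letters, digits, or hyphens.
NID = r"[A-Za-z0-9][A-Za-z0-9-]{0,31}"

URN_RFC2141 = re.compile(r"[Uu][Rr][Nn]:(" + NID + r"):(" + URN_CHAR + r"+)")


def split_urn_rfc2141(urn: str) -> Optional[Tuple[str, str]]:
    """Return (NID, NSS) if the URN fits the simplified grammar, else None."""
    match = URN_RFC2141.fullmatch(urn)
    return (match.group(1), match.group(2)) if match else None


print(split_urn_rfc2141("urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL"))
# → ('hash', ':sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL')
```

The leading ":" ends up inside the NSS, which is why the double colon passes this grammar.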

In the newer RFC 8141 the rules are different:

assigned-name = "urn" ":" NID ":" NSS
NID           = (alphanum) 0*30(ldh) (alphanum)
ldh           = alphanum / "-"
NSS           = pchar *(pchar / "/")

NID and NSS are assigned as before, and hash is still a valid NID. The question now is whether the NSS can start with ":", that is, whether ":" is a pchar:

      pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

":" is listed directly in pchar, so an NSS that starts with ":" is valid here as well.
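
The same check against the RFC 8141 rules, again as a hedged sketch rather than anything from cwltool. The name `split_urn_rfc8141` is hypothetical, `unreserved` and `sub-delims` are filled in from RFC 3986, and only the `assigned-name` part is handled (the optional r-, q-, and f-components are ignored).

```python
import re
from typing import Optional, Tuple

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"  (RFC 3986)
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCHAR = r"(?:[" + UNRESERVED + SUB_DELIMS + r":@]|%[0-9A-Fa-f]{2})"

# NID = (alphanum) 0*30(ldh) (alphanum): 2 to 32 characters overall.
NID = r"[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]"

# NSS = pchar *(pchar / "/"): the first character must be a pchar.
NSS = PCHAR + r"(?:" + PCHAR + r"|/)*"

URN_RFC8141 = re.compile(r"[Uu][Rr][Nn]:(" + NID + r"):(" + NSS + r")")


def split_urn_rfc8141(urn: str) -> Optional[Tuple[str, str]]:
    """Return (NID, NSS) if the assigned-name matches, else None."""
    match = URN_RFC8141.fullmatch(urn)
    return (match.group(1), match.group(2)) if match else None


print(split_urn_rfc8141("urn:hash::sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL"))
# → ('hash', ':sha1:LBPI666ED2QSWVD3VSO5BG5R54TE22QL')
print(split_urn_rfc8141("urn:hash:/sha1"))  # NSS may not start with "/"
# → None
```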

Therefore I don't see the need for this pull request: it breaks a namespace that was already valid, if a bit odd-looking (but then so are all URNs! :-) )

The real fix would be to get rid of the insecure SHA1 altogether and use sha256 or better, which could then use the newer "Naming Things with Hashes" scheme (RFC 6920), which deliberately does not support SHA1 and has a registry of modern checksum algorithms.
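
For illustration, here is a minimal sketch of what an RFC 6920 identifier for a file's content could look like. It is not what cwltool does today: the helper name `ni_uri` is hypothetical, and the sketch assumes the authority-less `ni:///sha-256;<value>` form with the digest encoded as base64url without padding.

```python
import base64
import hashlib


def ni_uri(data: bytes) -> str:
    """Build an 'ni' URI (RFC 6920 style) from the SHA-256 digest of data."""
    digest = hashlib.sha256(data).digest()
    # base64url encoding with the trailing "=" padding stripped
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "ni:///sha-256;" + value


print(ni_uri(b"Hello World!"))
```

A 32-byte SHA-256 digest always yields a 43-character value in this encoding.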

@stain
Member

stain commented Jun 14, 2023

Reading the document you linked in the issue linked here, I would expect :sha1, so no idea why we have the ::sha1, nor why this document above also uses the same two colons. Strange.

You are thinking of the separate urn:sha1 scheme which is not well defined.

@mr-c
Member

mr-c commented Jun 14, 2023

@MBueschelberger Can you help us by sharing your motivation for this change?

@MBueschelberger
Author

Hi @mr-c,

I requested this change because I was not able to parse the graph with any of the available RDFLib versions. Removing the second colon solved the problem for me.

That is why I started this discussion about the origin of the ::.

Best regards

@mr-c
Member

mr-c commented Sep 28, 2023

@simleo Do you have experience using rdflib to parse CWL prov aggregates that contain urn:hash::sha1 (note the double colon)?

https://github.com/common-workflow-language/cwltool/pull/1808/files#diff-2bb87b99f4b0d10faa69f7bcec12f404bc4a41ac1aae63f8a14d6460fa5798a5L520

@simleo
Contributor

simleo commented Sep 28, 2023

@mr-c no, sorry. In runcrate, I use cwlprov-py to read CWLProv ROs, but I don't use rdflib directly.

5 participants