Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

date-time tests problem #11

Open
mbaudis opened this issue Nov 29, 2021 · 8 comments
Open

date-time tests problem #11

mbaudis opened this issue Nov 29, 2021 · 8 comments
Assignees

Comments

@mbaudis
Copy link
Member

mbaudis commented Nov 29, 2021

The verifier throws an error for standard ISO8601 dates against the date-time format:

ERROR beacon_verifier::utils    >    ERROR: Format { format: "date-time" } - "2001-01-01" is not a "date-time" (/createDateTime)

AFAIK an ISO date fulfils the date-time requirement?

@mbaudis
Copy link
Member Author

mbaudis commented Nov 29, 2021

Addendum: also errors w/ time expansion:

 ERROR beacon_verifier::utils    >    ERROR: Format { format: "date-time" } - "2020-06-01T00:00:00.000000" is not a "date-time" (/updateDateTime)

@MrRobb
Copy link
Member

MrRobb commented Dec 1, 2021

This is not on my control to change since date-times in RFC3339 Section 5.6 are defined as the full date, a 'T', the time with optional sub-second precision and the time offset.

If it should only be a date, or something more custom with a regex pattern, then the proper changes should be made in the model to reflect that decision. But if you specify the date-time format of json-schemas, the data should be complete to be valid.

@MrRobb MrRobb closed this as completed Dec 1, 2021
@mbaudis
Copy link
Member Author

mbaudis commented Dec 1, 2021

Partial date-time formats are valid under ECMA script specifications and should verify.

Also, the only other solution AFAIK is to drop the format flag and to use only string with description for its use, i.e. no verification.

More details

While RFC3339 (July 2002) details the specification of a date-time format as the complete representation of an ISO8601 date & time string and recommend this or its parts for "new internet standards", JSON/ECMA ... definitions use an ISO8601 date-time format in a slightly different manner: same ISO string format (obviously), but allowing atomic representations. This is being documented e.g. in

ECMAScript defines a string interchange format for date-times based upon a simplification of the ISO 8601 Extended Format. The format is as follows: YYYY-MM-DDTHH:mm:ss.sssZ
...
This format includes date-only forms:
YYYY
YYYY-MM
YYYY-MM-DD
It also includes time-only forms with an optional time zone offset appended:
THH:mm
THH:mm:ss
THH:mm:ss.sss
Also included are “date-times” which may be any combination of the above.

Now, while the forced use of strict RFC 3339 times may make sense for system-generated timestamps (document updates etc.) it does not seem sensible for usually more fuzzy entries such as sample collection times - time there may be needed, but not in common scenarios. But anyway it seems sensible to verify against the common interpretation (i.e. ISO 8601 partial).

@mbaudis mbaudis reopened this Dec 1, 2021
@jrambla
Copy link

jrambla commented Dec 2, 2021

We are following Schemablocks suggestion here: RFC3339.
Relaxing a standard usage is actually not using the standard.
Converting that in an open string will remove the purpose of the data type.
Being conformant to the data type is a matter of formatting the output, seems easy to me.

@jrambla jrambla closed this as completed Dec 2, 2021
@mbaudis
Copy link
Member Author

mbaudis commented Dec 2, 2021

Sorry, @jrambla - SchemaBlocks documentation of time formats does not reference RFC3339 (as some recommendation how to check/implement) but ISO 8601 (as the format). And I wrote the {S}[B] page myself (RFC3339 isn't mentioned anywhere).

AFAIK RFC3339 is mentioned in timestamp docs such as updateDateTime / updated, for timestamp purposes (and to make it explicit that ISO strings are valid, in contrast to system time / time since epoch etc. - many discussions especially with PXF devs about the difference between format & implementation of timestamp expressions).

Summary

  • format date-time is nice to make sure nobody drops in random strings
  • no harm being done by using partial formats (which are considered ISO compliant, as documented in MS and ECMA etc. docs)
  • date-time should validate partial ISO 8601 strings such as 1999 (thanks, Prince, you were one of the best!) as commonly documented
  • if the validator doesn't cope => fix?
  • if that is a problem drop the format and use description only, for fields which are not of the "timestamp" form (i.e. system generated) but rather loosely defined dates / times

And yes, we can go with "date-time has to be fully RFC3339 date-time compliant" in which case data has to be converted during I/O. But IMO this introduces data errors, since data is being changed from low to higher granularity, since most dates and times for measurements, collections, observations will be rather fuzzy!

(comment updated)

@mbaudis mbaudis reopened this Dec 2, 2021
@jrambla
Copy link

jrambla commented Dec 3, 2021

As I understand the issue:
The current schemas include the format: "date-time" as part of the property definition.
As such, dates should be date-times and they should follow an strict rule for the format.
This is a simple formatting issue that has the garnularity consequence that you mention.

Expressing the date in other formats, using a relaxed validation, does not sound right to me, as we could not rely on the "automatic" validation of the type anymore, and we must revert to trust the implementers.
Remember that using the verifier would not be mandatory for all Beacon instances in the Univers, hence, the schema MUST be as precise as necessary.

Therefore, if I need to choose, given that the schemas are out, I will choose to stick to the current strict one.

@mbaudis
Copy link
Member Author

mbaudis commented Dec 3, 2021

But then the format should be dropped for all those fields where it can be anything from date +- time space, e.g. relatedvto collection or observation points, since they frequently will be date only (even year), but in some instances time will be used/needed. Forcing conversion to strict, complete date-time by implementers will introduce errors (e.g. adding wrong precision), i.e. be destructive.

I guess this is why newer standards recognize validation of partial.

And to iterate: "1999" is perfect ISO; "1999-01-01T00:00:00.000Z" checks out RFC but is about a year away from what Prince was meaning.

@mbaudis
Copy link
Member Author

mbaudis commented Dec 14, 2021

Oh well, JSON docs for current drafts https://json-schema.org/understanding-json-schema/reference/string.html?highlight=datetime#id7 really are ambiguous, now having date, time, and date-time, but no statement about the handling (besides that format definitions are hints, but verifiers may follow them...). Which leaves the interpretation ambiguous.

Which leaves us with the problem of fields with varying granularity...

As @jrambla mentioned the option of patterns: Here is mine for ISO date +/- time:

^\d{4}-\d{2}-\d{2}([Tt]\d{2}:\d{2}:\d{2}(\.\d{1,3})?([Zz]|[+-]\d{2}:\d{2})?)?$
or
^\d{4}-\d\d-\d\d([Tt]\d\d:\d\d:\d\d(\.\d{1,3})?([Zz]|[+-]\d\d:\d\d)?)?$

This does not verify the sanity of the dates or times, just the string format. Happy to get a better one!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants