Python conversion of xsd:gYear and xsd:gYearMonth typed Literals should not return datetime.date objects #3078

Open
lu-pl opened this issue Feb 25, 2025 · 7 comments

@lu-pl
Contributor

lu-pl commented Feb 25, 2025

Currently, the rdflib.Literal.toPython conversion of xsd:gYear and xsd:gYearMonth-typed rdflib.Literals returns datetime.date objects with the day (xsd:gYearMonth) or day and month (xsd:gYear) arguments set to 1.

```python
from rdflib import Literal, XSD

year = Literal("2025", datatype=XSD.gYear)
year_month = Literal("2025-02", datatype=XSD.gYearMonth)

year_month.toPython()  # datetime.date(2025, 2, 1)
year.toPython()        # datetime.date(2025, 1, 1)
```

I think this is plain wrong: an xsd:gYear value of "2025" is not equivalent to datetime.date(2025, 1, 1).
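
To make the information loss concrete, here is a minimal stdlib-only sketch. The two `naive_*` helpers are hypothetical stand-ins that mimic the conversion described above; they are not rdflib's actual parsing functions.

```python
from datetime import date


def naive_gyear_to_date(lexical: str) -> date:
    """Hypothetical stand-in: fill the missing month and day with 1."""
    return date(int(lexical), 1, 1)


def naive_gyearmonth_to_date(lexical: str) -> date:
    """Hypothetical stand-in: fill the missing day with 1."""
    year, month = lexical.split("-")
    return date(int(year), int(month), 1)


as_year = naive_gyear_to_date("2025")             # xsd:gYear "2025"
as_year_month = naive_gyearmonth_to_date("2025-01")  # xsd:gYearMonth "2025-01"
as_full_date = date.fromisoformat("2025-01-01")   # xsd:date "2025-01-01"

# Three different XSD values collapse into the same Python object,
# so the original datatype can no longer be recovered from the result.
collapsed = as_year == as_year_month == as_full_date
print(collapsed)  # True
```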

Note that issue #1379 addressed this already, but focused more on a serialization problem and was then closed without really discussing the datetime conversion of xsd:gYear and xsd:gYearMonth.

A practical solution for this would be to just return an rdflib.Literal with the datatype property set; this way, the XSD type information would still be available.

I would be happy to implement the necessary changes and open a PR if this (or some other) solution is agreed upon.

@WhiteGobo
Contributor

WhiteGobo commented Feb 25, 2025

This could be related to #2419.
I'm not very familiar with xsd:gYear, but as far as I know some of the date datatypes are processed via isodatetime, despite the data being expressed in classes from datetime.

@WhiteGobo
Contributor

WhiteGobo commented Feb 25, 2025

I don't expect any fixed behaviour from toPython, so I'm not per se against this change. These are just some arguments for consideration.
I think it is reasonable to express xsd:gYear as a date when converting the data to Python. If datetime does not offer a class that can express "just a year with no set month", then maybe we should not do anything different when handling data in the context of toPython. At the very least, I wouldn't expect any guarantee that such information is not lost during conversion. I don't think (though I'm not sure) that the XSD standard defines any rules for, e.g., calculating the time difference between a date expressed as xsd:gYear and one expressed as xsd:gYearMonth. For comparison, see the XPath function fn:subtract-dates.
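
The time-difference point can be illustrated with a short stdlib-only sketch. The two `date` objects are assumed stand-ins for what the conversion described in the initial post produces; no rdflib call is involved.

```python
from datetime import date

year = date(2025, 1, 1)        # stand-in for xsd:gYear "2025"
year_month = date(2025, 2, 1)  # stand-in for xsd:gYearMonth "2025-02"

# Python computes an exact difference, although the 31 days depend
# entirely on the defaulted month and day, not on the source values.
delta = year_month - year
print(delta.days)  # 31
```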

@lu-pl
Contributor Author

lu-pl commented Mar 12, 2025

Thanks for the input, and sorry for my late reply.

As mentioned in the initial post, imo there is no way to express xsd:gYear and xsd:gYearMonth in Python. Maybe timedeltas could be an option, but this too is ontologically not correct.

I think returning an rdflib.Literal with the datatype property set is the best way to handle this problem, so if anyone could sanity-check this approach and give me a "go", I would open a PR with that change.

@lu-pl
Contributor Author

lu-pl commented Mar 17, 2025

I took a quick look at the toPython parsing functions for xsd:gYear and xsd:gYearMonth; the basic checks defined in those functions should of course be maintained.

I.e., my plan would be to change the parse_xsd_gyear and parse_xsd_gyearmonth functions to just run an XSD-type validation check and return the typed Literal unchanged (if the check passes).

I think it would be best to outsource the responsibility for XSD-type validation to some external tool, either xmlschema or lxml. This would mean introducing a dependency, though, or at least making lxml a non-optional dependency.

@lu-pl
Contributor Author

lu-pl commented Mar 17, 2025

Actually, for simple XSD types like xsd:gYear and xsd:gYearMonth it would suffice to use regexes for validation.
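
A minimal sketch of what such regex validation could look like, using only the standard library. The patterns follow the lexical representations given in the XSD 1.1 datatypes specification; the names `check_gyear` and `check_gyearmonth` are hypothetical and are not the functions in rdflib, and the plain string stands in for the typed Literal that would be returned unchanged.

```python
import re

# Optional timezone: "Z" or an offset between -14:00 and +14:00.
_TZ = r"(?:Z|[+-](?:(?:0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"
# Year: optional minus sign, at least four digits, no extra leading zeros.
_YEAR = r"-?(?:[1-9][0-9]{3,}|0[0-9]{3})"

GYEAR_RE = re.compile(_YEAR + _TZ)
GYEARMONTH_RE = re.compile(_YEAR + r"-(?:0[1-9]|1[0-2])" + _TZ)


def check_gyear(lexical: str) -> str:
    """Return the lexical form unchanged if it is a valid xsd:gYear."""
    if GYEAR_RE.fullmatch(lexical) is None:
        raise ValueError(f"not a valid xsd:gYear: {lexical!r}")
    return lexical


def check_gyearmonth(lexical: str) -> str:
    """Return the lexical form unchanged if it is a valid xsd:gYearMonth."""
    if GYEARMONTH_RE.fullmatch(lexical) is None:
        raise ValueError(f"not a valid xsd:gYearMonth: {lexical!r}")
    return lexical


print(check_gyear("2025"))          # 2025
print(check_gyearmonth("2025-02"))  # 2025-02
```

Using `fullmatch` rather than `match` keeps trailing garbage such as `"2025-02-01"` from passing as an xsd:gYearMonth.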

lu-pl added a commit to acdh-oeaw/rdfproxy that referenced this issue Mar 17, 2025
The change adds a code path for handling
xsd:gYear/xsd:gYearMonth-typed RDF literals to
`rdfproxy.sparqlwrapper.SPARQLWrapper._get_bindings_from_json_response`.

See the corresponding RDFLib issue: RDFLib/rdflib#3078

Concerns #232.
lu-pl added a commit to acdh-oeaw/rdfproxy that referenced this issue Mar 18, 2025
@WhiteGobo
Contributor

Actually, you are totally correct. gYear and gMonth are literally not supposed to be castable to datetime, as shown by a table in the XPath functions documentation.

lu-pl added a commit to lu-pl/rdflib that referenced this issue Apr 16, 2025
Issue RDFLib#3078 reports that toPython-casting of xsd:gYear and
xsd:gYearMonth to datetime objects is not possible, as there is no
appropriate Python equivalent for those types. The current
implementation casts xsd:gYear and xsd:gYearMonth to datetime objects
assuming January 1st for xsd:gYear and the 1st day of the given month
for xsd:gYearMonth. This is plain wrong.

The change re-implements parse_xsd_gyear and parse_xsd_gyearmonth so
that XSD-types are checked against a regex and gYear and gYearMonth
are not converted to Python datetime objects anymore but simply return
the value unchanged after checking.
lu-pl added a commit to lu-pl/rdflib that referenced this issue Apr 16, 2025
Issue RDFLib#3078 reports that toPython-casting of xsd:gYear and
xsd:gYearMonth to datetime objects is not possible, as there is no
appropriate Python equivalent for those types.

The current implementation casts xsd:gYear and xsd:gYearMonth to
datetime objects assuming January 1st for xsd:gYear and the 1st day
of the given month for xsd:gYearMonth. This is plain wrong.

The change removes casting to datetime objects in
rdflib.Literal.toPython for xsd:gYear and xsd:gYearMonth.

Closes RDFLib#3078 .
@lu-pl
Contributor Author

lu-pl commented Apr 16, 2025

> Actually you are totally correct. gYear and gMonth is literally not supposed to be castable to datetime as seen by a table here in this documentation to xpath-functions.

PR pending: #3115

This is still a draft because I would like to add docs for this.

lu-pl added a commit to lu-pl/rdflib that referenced this issue Apr 16, 2025
Issue RDFLib#3078 reports, that rdflib.Literal.toPython casting of xsd:gYear and
xsd:gYearMonth to datetime objects is not possible, as there is no
appropriate Python equivalence for those types.

The current implementation casts xsd:gYear and xsd:gYearMonth to
datetime objects assuming January 1st for xsd:gYear and the 1st day
of the given month for xsd:gYearMonth. This is plain wrong.

The change removes casting to datetime objects in
rdflib.Literal.toPython for xsd:gYear and xsd:gYearMonth.

Closes RDFLib#3078 .