SKOS importer doesn't like special characters #346

Open
mgbeyer opened this issue Jul 8, 2015 · 4 comments

Comments

@mgbeyer

mgbeyer commented Jul 8, 2015

If the subject part of an N-Triples line contains characters such as a slash (/) or a hash (#), the importer rejects the line (example: "WARN -- : SkosImporter: Invalid origin. Skipping :concept/#Abbreviations rdf:type skos:concept").
But characters like / and # are normal parts of a URI. For example, one of the thesauri we'd like to import into iQvoc has several path levels below the context path set by the default namespace, to distinguish actual concepts from our own classes and properties (among other things). So if you strip the leading default namespace from a subject string (as the importer does), the remaining part of the URI still contains slashes and is rejected by the importer.

In general, a URI should be allowed to contain UTF-8 special characters so that regional character sets are supported.
So I wonder why the importer actively rejects every character outside the minimal set "a-zA-Z0-9_.-". Was this a deliberate design decision with a sound purpose, and am I missing something? I'd greatly appreciate it if you could elaborate on that a little.
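The character set in question is simply what Ruby's `CGI.escape` leaves untouched: the slash, the hash, non-ASCII letters and spaces checked below are all changed by escaping, so any comparison of the form `CGI.escape(s) != s` fails for them. A minimal sketch using only the standard `cgi` library; the helper name `escape_report` and the sample origins other than the one from the warning above are hypothetical.

```ruby
require "cgi"

# Hypothetical helper: returns CGI.escape(origin) together with a flag telling
# whether escaping left the string unchanged (the condition the validator relies on).
def escape_report(origin)
  escaped = CGI.escape(origin)
  [escaped, escaped == origin]
end

[
  "Abbreviations",          # letters only
  "concept/#Abbreviations", # origin from the first warning in this issue
  "Ökonomie",               # hypothetical origin with a non-ASCII letter
  "two words"               # hypothetical origin with a space
].each do |origin|
  escaped, unchanged = escape_report(origin)
  puts "#{origin} -> #{escaped} (#{unchanged ? 'unchanged' : 'changed'})"
end
```

Only "Abbreviations" comes back unchanged; "/" becomes "%2F", "#" becomes "%23", "Ö" becomes its UTF-8 bytes "%C3%96", and the space becomes "+".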

@mjansing
Contributor

I can't reproduce the problem. Please provide more information about the imported triples. The fragment identifier should be the last part of a URI (after the file name); your leading slash looks a bit curious.
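To see what "the fragment identifier should be the last part of a URI" means in practice, Ruby's standard `uri` library can split a URI into its path and fragment. This is a small sketch; the full URI is an assumption, built by expanding ":concept/#Abbreviations" with the default namespace from later in this thread, since the namespace used in the first report is not stated. The helper `path_and_fragment` is hypothetical.

```ruby
require "uri"

# Hypothetical helper: splits a URI string into its path and its
# fragment identifier (the part after "#").
def path_and_fragment(uri_string)
  uri = URI.parse(uri_string)
  [uri.path, uri.fragment]
end

# Assumed expansion of ":concept/#Abbreviations": the "#" follows a trailing slash.
p path_and_fragment("http://lod.gesis.org/thesoz/concept/#Abbreviations")

# Hash-URI style: the fragment follows the last path segment ("file name") directly.
p path_and_fragment("http://lod.gesis.org/thesoz/concept#Abbreviations")
```

In the first case the path ends in "/" before the fragment, which is the "leading slash" mentioned above; in the second, "concept" plays the role of the file name.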

@mgbeyer
Author

mgbeyer commented Jul 16, 2015

Thanks for the reply!

I don't know what you mean by "after filename". What file name?
Anyway, here's more detailed information about what we're trying to import (sorry, this is a bit lengthy :))

The (stripped-down) N-Triples file:

<http://lod.gesis.org/thesoz/classification/0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://lod.gesis.org/thesoz/classification/0> <http://www.w3.org/2004/02/skos/core#inScheme> <http://lod.gesis.org/thesoz/> .
<http://lod.gesis.org/thesoz/classification/0> <http://www.w3.org/2004/02/skos/core#prefLabel> "Grundlagen der Sozialwissenschaften\u00A00"@de .
<http://lod.gesis.org/thesoz/classification/0> <http://www.w3.org/2004/02/skos/core#prefLabel> "Fundamentals of the Social Sciences\u00A00"@en .
<http://lod.gesis.org/thesoz/classification/0> <http://www.w3.org/2004/02/skos/core#prefLabel> "'fondements des sciences sociales\u00A00"@fr .
<http://lod.gesis.org/thesoz/classification/0> <http://www.w3.org/2004/02/skos/core#notation> "0"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://lod.gesis.org/thesoz/classification/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://lod.gesis.org/thesoz/classification/1> <http://www.w3.org/2004/02/skos/core#inScheme> <http://lod.gesis.org/thesoz/> .
<http://lod.gesis.org/thesoz/classification/1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Grundlagen der Sozialwissenschaften\u00A00"@de .
<http://lod.gesis.org/thesoz/classification/1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Fundamentals of the Social Sciences\u00A00"@en .
<http://lod.gesis.org/thesoz/classification/1> <http://www.w3.org/2004/02/skos/core#prefLabel> "'fondements des sciences sociales\u00A00"@fr .
<http://lod.gesis.org/thesoz/classification/1> <http://www.w3.org/2004/02/skos/core#notation> "0"^^<http://www.w3.org/2001/XMLSchema#string> .

What seems to be the problem

We're using NAMESPACE='http://lod.gesis.org/thesoz/' as the default, so the subjects that remain after stripping it still contain a slash (like "classification/0").
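What stripping the default namespace leaves behind can be sketched in a few lines. This does not reproduce iQvoc's importer code; `strip_namespace` is a hypothetical helper that only mimics the abbreviation visible in the log (":classification/0"). It also shows the result for the longer namespace "http://lod.gesis.org/thesoz/classification/".

```ruby
DEFAULT_NAMESPACE = "http://lod.gesis.org/thesoz/"

# Hypothetical helper: returns the part of +subject+ that follows +namespace+,
# or nil when the subject lies outside that namespace.
def strip_namespace(subject, namespace = DEFAULT_NAMESPACE)
  return nil unless subject.start_with?(namespace)

  subject.delete_prefix(namespace)
end

subject = "http://lod.gesis.org/thesoz/classification/0"
p strip_namespace(subject)                                                # default namespace
p strip_namespace(subject, "http://lod.gesis.org/thesoz/classification/") # longer namespace
```

With the default namespace the remainder is "classification/0" (contains a slash); with the longer one it is "0" (starts with a digit). Either way, one of the two validator rules applies.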
I'm aware that if we extend the namespace to "http://lod.gesis.org/thesoz/classification/", the remaining subjects start with a number, which the importer also rejects, for reasons that are unclear to me (see the validator method of the Origin class in /app/aides/origin.rb). So we're basically talking about this code fragment in the validator method of the Origin class:

    # should not start with a number
    valid = false if initial_value.match(/^\d.*/)

    # should not contain special chars
    valid = false if CGI.escape(initial_value) != initial_value
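The two quoted checks can be wrapped into a stand-alone method to test origins outside of iQvoc. This is a sketch, not the actual Origin class: `valid_origin?` is a hypothetical name, and the real validator may contain further rules that are not shown in this thread.

```ruby
require "cgi"

# Hypothetical stand-alone version of the two checks quoted above.
def valid_origin?(initial_value)
  valid = true

  # should not start with a number
  valid = false if initial_value.match(/^\d.*/)

  # should not contain special chars (anything CGI.escape would change)
  valid = false if CGI.escape(initial_value) != initial_value

  valid
end

%w[classification/0 0 Abbreviations].each do |origin|
  puts "#{origin}: #{valid_origin?(origin) ? 'valid' : 'invalid'}"
end
```

"classification/0" fails the escaping check, "0" fails the leading-digit check, and only "Abbreviations" passes, which matches the warnings in the log.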

OK, here's the output:

I, [2015-07-16T11:44:58.282643 #14596]  INFO -- : Known namespaces:
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    1: skos: => http://www.w3.org/2004/02/skos/core#
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    2: skos: => http://www.w3.org/2008/05/skos#
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    3: rdf: => http://www.w3.org/1999/02/22-rdf-syntax-ns#
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    4: : => http://lod.gesis.org/thesoz/
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    5: rdfs: => http://www.w3.org/2000/01/rdf-schema#
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    6: owl: => http://www.w3.org/2002/07/owl#
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    7: dct: => http://purl.org/dc/terms/
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    8: foaf: => http://xmlns.com/foaf/spec/
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    9: void: => http://rdfs.org/ns/void#
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    10: iqvoc: => http://try.iqvoc.net/schema#
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- : Known first level classes:
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    1: skos:Concept => Concept::SKOS::Base
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    2: skos:Collection => Collection::SKOS::Unordered
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- : Known second level classes:
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    1: skos:prefLabel => Labeling::SKOS::PrefLabel
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    2: skos:altLabel => Labeling::SKOS::AltLabel
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    3: skos:changeNote => Note::SKOS::ChangeNote
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    4: skos:definition => Note::SKOS::Definition
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    5: skos:editorialNote => Note::SKOS::EditorialNote
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    6: skos:example => Note::SKOS::Example
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    7: skos:historyNote => Note::SKOS::HistoryNote
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    8: skos:scopeNote => Note::SKOS::ScopeNote
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    9: skos:related => Concept::Relation::SKOS::Related
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    10: skos:broader => Concept::Relation::SKOS::Broader::Mono
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    11: skos:narrower => Concept::Relation::SKOS::Narrower::Base
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    12: skos:closeMatch => Match::SKOS::CloseMatch
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    13: skos:exactMatch => Match::SKOS::ExactMatch
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    14: skos:relatedMatch => Match::SKOS::RelatedMatch
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    15: skos:broadMatch => Match::SKOS::BroadMatch
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    16: skos:narrowMatch => Match::SKOS::NarrowMatch
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    17: skos:notation => Notation::Base
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    18: skos:topConceptOf => Concept::SKOS::Scheme
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- :    19: skos:member => Collection::Member::SKOS::Base
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- : default namespace: 'http://lod.gesis.org/thesoz/'
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- : publish: 'true'
I, [2015-07-16T11:44:58.282643 #14596]  INFO -- : SkosImporter: Importing triples...
W, [2015-07-16T11:44:58.292643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/0 rdf:type skos:Concept
W, [2015-07-16T11:44:58.292643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/0 skos:inScheme :
W, [2015-07-16T11:44:58.292643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/0 skos:prefLabel "Grundlagen der Sozialwissenschaften\u00A00"@de
W, [2015-07-16T11:44:58.292643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/0 skos:prefLabel "Fundamentals of the Social Sciences\u00A00"@en
W, [2015-07-16T11:44:58.292643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/0 skos:prefLabel "'fondements des sciences sociales\u00A00"@fr
W, [2015-07-16T11:44:58.292643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/0 skos:notation "0"^^<http://www.w3.org/2001/XMLSchema#string>
W, [2015-07-16T11:44:58.292643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/1 rdf:type skos:Concept
W, [2015-07-16T11:44:58.292643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/1 skos:inScheme :
W, [2015-07-16T11:44:58.302643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/1 skos:prefLabel "Grundlagen der Sozialwissenschaften\u00A00"@de
W, [2015-07-16T11:44:58.302643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/1 skos:prefLabel "Fundamentals of the Social Sciences\u00A00"@en
W, [2015-07-16T11:44:58.302643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/1 skos:prefLabel "'fondements des sciences sociales\u00A00"@fr
W, [2015-07-16T11:44:58.302643 #14596]  WARN -- : SkosImporter: Invalid origin. Skipping :classification/1 skos:notation "0"^^<http://www.w3.org/2001/XMLSchema#string>
I, [2015-07-16T11:44:58.302643 #14596]  INFO -- : Computing 'forward' defined triples...
I, [2015-07-16T11:44:58.302643 #14596]  INFO -- : Basic import done (took 0 seconds).
I, [2015-07-16T11:44:58.302643 #14596]  INFO -- : Publishing 0 new subjects...
I, [2015-07-16T11:44:58.302643 #14596]  INFO -- : Publishing of 0 subjects done (took 0 seconds). 0 are in draft state.
I, [2015-07-16T11:44:58.302643 #14596]  INFO -- : Imported 0 published and 0 draft subjects in 0 seconds.
I, [2015-07-16T11:44:58.302643 #14596]  INFO -- : First step took 0 seconds, publishing took 0 seconds.

As I said: lengthy as hell, sorry :-) But I guess it'll help to clarify the problem...

@mjansing
Contributor

Thanks. I added some formatting to your comment. I'll look into it.

@mjansing
Contributor

BTW

...we're facing subjects, starting with a number, which is also not approved by the importer for reasons unclear...

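Origins should not start with a number so that iQvoc is able to generate a valid RDF/XML serialization: RDF/XML relies on XML names in places such as element names and rdf:ID values, and an XML name cannot begin with a digit. See the RDF/XML syntax grammar for details.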
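The underlying XML rule can be checked with a regular expression. The pattern below is a simplified, ASCII-only approximation of the XML NCName production (the full grammar also admits many non-ASCII characters); `ascii_ncname?` is a hypothetical helper, and how iQvoc itself maps origins to RDF/XML names is not shown in this thread.

```ruby
# Simplified ASCII-only approximation of an XML NCName: starts with a letter
# or underscore, followed by letters, digits, underscores, hyphens or periods.
NCNAME_ASCII = /\A[A-Za-z_][A-Za-z0-9_.-]*\z/

# Hypothetical helper: true if +name+ fits the simplified NCName pattern.
def ascii_ncname?(name)
  NCNAME_ASCII.match?(name)
end

%w[Abbreviations 0 classification/0].each do |local_part|
  puts "#{local_part}: #{ascii_ncname?(local_part) ? 'usable' : 'not usable'} as an XML name"
end
```

"0" is rejected because of its leading digit, and "classification/0" because "/" is not a name character at all.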
