Skip to content

Latest commit

 

History

History
129 lines (91 loc) · 4.73 KB

File metadata and controls

129 lines (91 loc) · 4.73 KB

Tests

Selene Selection Engine

Selene is a tool of processing references to portions of resources like used in stand-off annotations. The Web Annotation Data Model (WADM) calls the mechanisms for referencing a portion of a textual, pictorial etc. resource selectors. Other standards may call them pointers. The purpose of Selene is processing of selectors and pointers respectively, not the rest of stand-off annotations.

Selene offers commands for

Normalizing Selectors

A selection of the same portion of a resource may be expressed in different ways. Normalizing selectors is important for storing them in a uniform way and key for testing equality without applying them to the resource.

Transforming Selectors

Deriving different representations (images) from a resource (preimage) is a common task, e.g. transforming an XML encoded text to HTML or plain text. Selene can transform pointers into the preimage to pointers into the image (forward transformation of selectors) and vice versa (backward transformation of selectors) with the same transformation that is used for deriving a representation of the resource. Transforming selectors is important for making standoff annotations interoperable when they were aggregated for a particular representation of a resource; it is thus a corner stone of generating and processing FAIR research data.

preimage image Transformation Techn. forward selector transformation backward selector transformation
XML XML arbitrary XSLT ✅ (XML selector to XML selector) ✅ (XML selector to XML selector)
XML XHTML arbitrary XSLT ✅ (XML selector to HTML selector) ✅ (HTML selector to XML selector)
XML HTML arbitrary XSLT ❓ (XML selector to HTML selector) ❓ (HTML selector to XML selector)
XML plain text arbitrary XSLT ✅ (XML selector to text selector) ✅ (text selector to XML selector)

Currently, selector transformations forward and backward are only supported alongside XSLT transformations of preimage to image. Why? Because XSLT is a formidable technology based on a truly declarative programming paradigm. And Selene's selector transformation exploits the possibilities only a declarative language can provide. Lern more about the preconditions to the XSLT in the project's Wiki.

Yes, as you can see in the table, Selene is capable of rewriting pointers into plain text (inter-glyph character positions) to pointers into XML (XPath selectors refined by inter-glyph character positions) if the plain text was derived from XML with XSLT.

Converting Selector Serializations

Converting between different selector serialization formats.

Roadmap

  • Fall 2025: core library with transformation of pointers and unit tests, minimal CLI
  • Winter 2025/6: implement distributed text service (other project) as a deployment platform for Selene
  • Spring 2026: add REST API endpoints for normalizing and transforming pointers as an extension to distributed text services
  • Sommer 2026: Introduce the technology on conferences or workshops

Getting Started

Java version required: OpenJDK 17+

Tesing and building:

./mvnw clean test package

After building with Maven, there is a command line interface in cli/target/bin/oasel.

Generating Java Docs, which will be in target/site/apidocs:

./mvnw javadoc:aggregate

Javadocs are online for the latest release.

Usage Examples (Linux):

cli/target/bin/oasel normalize test/gesang.annot1.json -l jsonld -f ttl
cli/target/bin/oasel normalize test/gesang.annot1.json -l jsonld  -x "path(root(.))" -f ntriples

Building native executable:

  1. set up graalvm
tar -zxf graalvm.tgz
export GRAALVM_HOMT=graalvm-...
  1. build native executable
./mvnw package -Pnative
cli/target/bin/selwr transforms --xpath='//*:head[1]/text()[1]' --character=5 --xsl=test/text-with-toc-pkg.xsl test/Gesang.tei.xml --normalizer-xpath="path(.)" --config=saxon-config.xml

Further Reading