
Postprocessing with XSLT #2

@ebeshero


Here is a list of things that @Yuying-Jin and I have decided are best handled in post-processing of collation files with XSLT.

  • Solitary witness with only one token of content:
    XPath: `//app[count(rdgGrp) = 1][rdgGrp[not(contains(@n, ','))]][count(descendant::rdg) < 3]`

  • Solitary witness holding meaningful content: if it starts a new sentence, move it down. To check, see whether this witness ends with a period in a preceding rdgGrp and whether the other witnesses that follow start with a capital letter (?)

    • Or reconsider: move all of these down.
      As of 2022-10-11, all solitary witnesses are now moved down.
  • In the process of consolidating solitary witnesses, deal with this:

Let's try creating a conditional processing rule in the template rule on `app` with `@mode="restructure"`: if the `$norm` param contains only `['']` (i.e. `string-length($norm) = 4`), do NOT create a new `rdgGrp`; simply move the `$loner` param into the existing structure.

Example of the problem: these do not need to be two separate `rdgGrp` elements:

```xml
<app>
  <rdgGrp n="['with', 'my', 'aunt', 'and', 'my']">
    <rdg wit="f1818">with my aunt and my </rdg>
    <rdg wit="f1823">with my aunt and my </rdg>
    <rdg wit="fThomas">with my aunt and my </rdg>
    <rdg wit="fMS">with my aunt &amp; my </rdg>
  </rdgGrp>
  <rdgGrp n="['', 'with', 'my', 'aunt', 'and', 'my']">
    <rdg wit="fMS">&lt;sga-add eID="c56-0104__main__d5e21929"/&gt; with my aunt &amp; my </rdg>
  </rdgGrp>
</app>
```

2022-10-18 Likely solved with 8925989
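To test the single-token solitary-witness XPath from the first bullet outside an XSLT run, here is a rough Python equivalent built on the standard-library `xml.etree.ElementTree`. Its limited XPath has no `count()` or `contains()`, so each predicate is spelled out by hand. This is a sketch, not the project's code: the function name `single_token_solitary_apps` and the `<collation>` wrapper in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET


def single_token_solitary_apps(root: ET.Element) -> list[ET.Element]:
    """Rough Python equivalent of the XPath
    //app[count(rdgGrp) = 1][rdgGrp[not(contains(@n, ','))]][count(descendant::rdg) < 3]
    """
    hits = []
    for app in root.iter("app"):
        groups = app.findall("rdgGrp")
        if len(groups) != 1:                  # count(rdgGrp) = 1
            continue
        if "," in groups[0].get("n", ""):     # one normalized token: no comma in @n
            continue
        if len(app.findall(".//rdg")) < 3:    # count(descendant::rdg) < 3
            hits.append(app)
    return hits


# Hypothetical sample: only the first app has a single token in @n.
SAMPLE = """<collation>
  <app><rdgGrp n="['Chapter']"><rdg wit="fMS">Chapter </rdg></rdgGrp></app>
  <app><rdgGrp n="['with', 'my']"><rdg wit="f1818">with my </rdg></rdgGrp></app>
</collation>"""

hits = single_token_solitary_apps(ET.fromstring(SAMPLE))
print([a.find("rdgGrp").get("n") for a in hits])  # ["['Chapter']"]
```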
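The proposed `restructure` guard and the two-`rdgGrp` example can be sketched in Python with the standard-library `ast` and `xml.etree.ElementTree`. `norm_is_empty` is the `['']` test, and `redundant_groups` flags pairs of `rdgGrp` elements in one `app` whose `@n` token lists match once empty tokens are dropped, which is the pattern in the example. The function names are hypothetical, the sketch only detects the case rather than moving `$loner`, and it is not necessarily how 8925989 resolved it.

```python
import ast
import xml.etree.ElementTree as ET


def norm_is_empty(norm: str) -> bool:
    """Proposed guard: $norm holds nothing but [''] (4 characters long)."""
    return norm.strip() == "['']"


def tokens(n: str) -> list[str]:
    """Parse an @n value such as "['', 'with', 'my']" into its non-empty tokens."""
    return [t for t in ast.literal_eval(n) if t]


def redundant_groups(app: ET.Element) -> list[tuple[ET.Element, ET.Element]]:
    """Pairs of rdgGrp elements in one app whose tokens match once '' is dropped."""
    groups = app.findall("rdgGrp")
    pairs = []
    for i, first in enumerate(groups):
        for second in groups[i + 1:]:
            if tokens(first.get("n", "[]")) == tokens(second.get("n", "[]")):
                pairs.append((first, second))
    return pairs


# Shortened from the example in this issue (two of the four witnesses kept).
EXAMPLE = """<app>
  <rdgGrp n="['with', 'my', 'aunt', 'and', 'my']">
    <rdg wit="f1818">with my aunt and my </rdg>
    <rdg wit="fMS">with my aunt &amp; my </rdg>
  </rdgGrp>
  <rdgGrp n="['', 'with', 'my', 'aunt', 'and', 'my']">
    <rdg wit="fMS">&lt;sga-add eID="c56-0104__main__d5e21929"/&gt; with my aunt &amp; my </rdg>
  </rdgGrp>
</app>"""

for first, second in redundant_groups(ET.fromstring(EXAMPLE)):
    print([r.get("wit") for r in second.findall("rdg")])  # ['fMS']
```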

  • Ampersands and other special characters in the `nodeToXML()` output for long tokens, adds, and dels (`inlineVariationEvent`) came out double-escaped, as `&amp;amp;` or `&amp;quot;`.

    • This turned out to be a serious problem that may have distorted the collation and its normalization of `&` to `and`.
      Repaired in 0b02d1c.
  • For inline variation events where we have constructed "long tokens", we now output each long token as a complete unit, from start to end, in a single `<rdgGrp>` and `<rdg>`. As a result, the other witnesses sometimes split awkwardly around it. We should smooth this out with the following algorithm:

    • If the content of a long token `contains()` a string from the very next `<app>` element (`following::app[1]`), move the contents of that next `<app>` up into the long token's own `<app>`, and remove the next `<app>`.
  • If an `app` shows a single `rdgGrp` with all witnesses unified on a paragraph marker, but one witness (most likely f1831) has actually been out of alignment for a long stretch, move the paragraph marker to the preceding or following `app` that contains that witness. XPath: `//app[count(rdgGrp) = 1][count(.//rdg) = 5][preceding-sibling::app[1][count(.//rdg) < 5]][matches(rdgGrp/@n, '^\[.<p') and matches(rdgGrp/@n, '>.\]$')]`
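Double escaping of the kind described in the ampersand bullet is easy to reproduce: escaping a string that has already been escaped turns `&amp;` into `&amp;amp;` and `&quot;` into `&amp;quot;`. The Python sketch below uses the standard-library `xml.sax.saxutils.escape` to show this and adds a small detector for serialized output. We have not traced exactly how `nodeToXML()` produced the doubled entities, so treat this as an illustration of the symptom, not of the repair in 0b02d1c; `find_double_escapes` is a hypothetical helper.

```python
import re
from xml.sax.saxutils import escape

# An entity whose leading "&" was escaped a second time, e.g. "&amp;amp;".
DOUBLE_ESCAPED = re.compile(r"&amp;(?:amp|quot|lt|gt|apos);")


def find_double_escapes(xml_text: str) -> list[str]:
    """Return every double-escaped entity found in serialized XML text."""
    return DOUBLE_ESCAPED.findall(xml_text)


QUOT = {'"': "&quot;"}
once = escape('aunt & "my"', QUOT)   # aunt &amp; &quot;my&quot;
twice = escape(once, QUOT)           # aunt &amp;amp; &amp;quot;my&amp;quot;
print(find_double_escapes(once))     # []
print(find_double_escapes(twice))    # ['&amp;amp;', '&amp;quot;', '&amp;quot;']
```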
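The long-token smoothing step could look roughly like the Python sketch below (standard-library `xml.etree.ElementTree`). Several details are assumptions, not decisions recorded in this issue: the caller already knows which `app` holds the long token; the long token is taken to be the longest reading in that `app`; "a string from the next app" means any non-empty reading text there; the next sibling `app` stands in for `following::app[1]`; and the next app's `rdgGrp` elements are moved wholesale without recomputing `@n`. `reading_texts` and `absorb_next_app` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def reading_texts(app: ET.Element) -> list[str]:
    """Trimmed, non-empty text of every rdg inside an app."""
    texts = ("".join(r.itertext()).strip() for r in app.iter("rdg"))
    return [t for t in texts if t]


def absorb_next_app(parent: ET.Element, long_app: ET.Element) -> bool:
    """If the long token in long_app contains a reading from the next sibling
    app, move that app's rdgGrp elements into long_app and delete it."""
    siblings = list(parent)
    start = siblings.index(long_app) + 1
    # Simplification: the next *sibling* app stands in for following::app[1].
    nxt = next((s for s in siblings[start:] if s.tag == "app"), None)
    if nxt is None:
        return False
    # Assumption: the long token is the longest reading in its app.
    long_token = max(reading_texts(long_app), key=len, default="")
    if not any(t in long_token for t in reading_texts(nxt)):
        return False
    for grp in nxt.findall("rdgGrp"):
        long_app.append(grp)  # @n is not recomputed here
    parent.remove(nxt)
    return True


# Hypothetical sample: fMS carries a long token that swallows "aunt".
DOC = """<collation>
  <app>
    <rdgGrp n="['with', 'my']"><rdg wit="f1818">with my </rdg></rdgGrp>
    <rdgGrp n="['with', 'my', 'aunt', '&amp;', 'my']"><rdg wit="fMS">with my aunt &amp; my </rdg></rdgGrp>
  </app>
  <app><rdgGrp n="['aunt']"><rdg wit="f1818">aunt </rdg></rdgGrp></app>
  <app><rdgGrp n="['and', 'my']"><rdg wit="f1818">and my </rdg></rdgGrp></app>
</collation>"""

root = ET.fromstring(DOC)
first = root.find("app")
print(absorb_next_app(root, first))  # True: "aunt" occurs in the long token
print(len(root.findall("app")))      # 2
print(absorb_next_app(root, first))  # False: "and my" does not
```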
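A Python reading of the paragraph-marker XPath in the last bullet, using the standard-library `xml.etree.ElementTree` and `re`, can help when checking candidates by hand. It only detects the suspect `app` elements; choosing whether to move the marker to the preceding or following `app` is left open. The function name and the `['<p>']` marker format in the sample are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

P_START = re.compile(r"^\[.<p")  # matches(rdgGrp/@n, '^\[.<p')
P_END = re.compile(r">.\]$")     # matches(rdgGrp/@n, '>.\]$')


def misaligned_paragraph_apps(root: ET.Element) -> list[ET.Element]:
    """Apps unified on a paragraph marker (5 rdg) whose preceding sibling
    app holds fewer than 5 rdg elements."""
    hits = []
    for parent in root.iter():
        prev_app = None  # tracks preceding-sibling::app[1]
        for child in parent:
            if child.tag != "app":
                continue
            groups = child.findall("rdgGrp")
            n = groups[0].get("n", "") if len(groups) == 1 else ""
            if (
                len(groups) == 1
                and len(child.findall(".//rdg")) == 5
                and prev_app is not None
                and len(prev_app.findall(".//rdg")) < 5
                and P_START.search(n)
                and P_END.search(n)
            ):
                hits.append(child)
            prev_app = child
    return hits


# Hypothetical sample: f1831 is missing from the app before the marker.
SAMPLE = """<collation>
  <app><rdgGrp n="['the']">
    <rdg wit="f1818">the </rdg><rdg wit="f1823">the </rdg>
    <rdg wit="fThomas">the </rdg><rdg wit="fMS">the </rdg>
  </rdgGrp></app>
  <app><rdgGrp n="['&lt;p&gt;']">
    <rdg wit="f1818">&lt;p/&gt;</rdg><rdg wit="f1823">&lt;p/&gt;</rdg>
    <rdg wit="f1831">&lt;p/&gt;</rdg><rdg wit="fThomas">&lt;p/&gt;</rdg>
    <rdg wit="fMS">&lt;p/&gt;</rdg>
  </rdgGrp></app>
</collation>"""

hits = misaligned_paragraph_apps(ET.fromstring(SAMPLE))
print(len(hits))  # 1
```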
