Postprocessing with XSLT

Here is a list of things that @Yuying-Jin and I have decided are best handled in post-processing of collation files with XSLT.

- [x]  Solitary witness with only one token of content: 
XPath: `//app[count(rdgGrp) = 1][rdgGrp[not(contains(@n, ','))]][count(descendant::rdg) < 3]`
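
For checking this XPath outside an XSLT processor, here is a minimal Python sketch of the same three tests using only the standard library. It assumes the collation file has no XML namespace; the function name and the sample data are made up for illustration and are not part of the project code.

```python
import xml.etree.ElementTree as ET

def find_single_token_loners(root):
    """Return <app> elements with one rdgGrp, a comma-free @n, and fewer than 3 rdg."""
    hits = []
    for app in root.iter("app"):
        groups = app.findall("rdgGrp")      # count(rdgGrp) = 1
        if len(groups) != 1:
            continue
        if "," in groups[0].get("n", ""):   # not(contains(@n, ','))
            continue
        if len(list(app.iter("rdg"))) < 3:  # count(descendant::rdg) < 3
            hits.append(app)
    return hits

# Made-up sample: only the first <app> holds a single normalized token.
SAMPLE = """<root>
  <app><rdgGrp n="['the']"><rdg wit="f1831">the </rdg></rdgGrp></app>
  <app><rdgGrp n="['with', 'my']"><rdg wit="f1818">with my </rdg></rdgGrp></app>
</root>"""

hits = find_single_token_loners(ET.fromstring(SAMPLE))
print([app.find("rdgGrp").get("n") for app in hits])
```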


- [x]  Solitary witness holding meaningful content: If it starts a new sentence, move it down. Check whether this witness in a preceding rdgGrp ends with a period, and whether the other witnesses that follow start with a capital letter(?)
    * Or reconsider: move all of these down.
    **As of 2022-10-11, all solitary witnesses are now moved down.**

- [x]  In the process of consolidating solitary witnesses, deal with this:

Let's try creating a conditional processing rule in the template rule on app with `@mode="restructure"`:
IF the `$norm` param contains only `['']` (`string-length() = 4`), do NOT create a new rdgGrp; simply move the `$loner` param into the existing structure.

Example of the problem: these do not need to be two separate rdgGrp elements:

```
<app>
    <rdgGrp n="['with', 'my', 'aunt', 'and', 'my']">
        <rdg wit="f1818">with my aunt and my </rdg>
        <rdg wit="f1823">with my aunt and my </rdg>
        <rdg wit="fThomas">with my aunt and my </rdg>
        <rdg wit="fMS">with my aunt &amp; my </rdg>
    </rdgGrp>
    <rdgGrp n="['', 'with', 'my', 'aunt', 'and', 'my']">
        <rdg wit="fMS">&lt;sga-add eID="c56-0104__main__d5e21929"/&gt; with my aunt &amp; my </rdg>
    </rdgGrp>
</app>
```
             
  **2022-10-18 Likely solved with https://github.com/FrankensteinVariorum/collateX-Testing/commit/8925989b6fb3c634ae4182592916a528b77272ac** 
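
To illustrate the goal (not the template rule from the linked commit), here is a hedged Python sketch that folds together rdgGrp elements whose `@n` lists differ only by empty-string tokens. The function names are hypothetical, and treating "equal after dropping `''`" as the merge condition is an assumption based on the example above. Note that in this example fMS ends up listed twice in the merged group; the sketch does not deduplicate witnesses.

```python
import ast
import xml.etree.ElementTree as ET

def tokens_without_empties(n_value):
    """Parse an @n value such as "['', 'with']" and drop empty-string tokens."""
    return tuple(tok for tok in ast.literal_eval(n_value) if tok != "")

def merge_equivalent_groups(app):
    """Fold each rdgGrp into an earlier one whose non-empty tokens are identical."""
    seen = {}
    for grp in list(app.findall("rdgGrp")):
        key = tokens_without_empties(grp.get("n"))
        if key in seen:
            seen[key].extend(list(grp))  # move the <rdg> children over
            app.remove(grp)              # drop the now-redundant rdgGrp
        else:
            seen[key] = grp
    return app

# The example from this issue, reproduced as a string.
EXAMPLE = r"""<app>
  <rdgGrp n="['with', 'my', 'aunt', 'and', 'my']">
    <rdg wit="f1818">with my aunt and my </rdg>
    <rdg wit="f1823">with my aunt and my </rdg>
    <rdg wit="fThomas">with my aunt and my </rdg>
    <rdg wit="fMS">with my aunt &amp; my </rdg>
  </rdgGrp>
  <rdgGrp n="['', 'with', 'my', 'aunt', 'and', 'my']">
    <rdg wit="fMS">&lt;sga-add eID="c56-0104__main__d5e21929"/&gt; with my aunt &amp; my </rdg>
  </rdgGrp>
</app>"""

merged = merge_equivalent_groups(ET.fromstring(EXAMPLE))
print(len(merged.findall("rdgGrp")), len(merged.findall("rdgGrp/rdg")))
```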

- [x]  Ampersand and other special characters generated by nodeToXML() output of longTokens, adds, dels (inlineVariationEvent): `&amp;amp;` or `&amp;quot;` 
    * This turned out to be a serious problem that might have distorted the collation and its handling of normalized `&` to `and`. 
    **Repaired in https://github.com/FrankensteinVariorum/collateX-Testing/commit/0b02d1c56e7a42fd2d6441bdeb6da22e2aaa3691**

- [ ]  For inlineVariation events where we have constructed "long tokens": we now output these as a complete unit from start to end in only one `<rdgGrp>` and `<rdg>`. This means that sometimes the other witnesses split around them awkwardly. We should smooth out the awkwardness with this algorithm:
    * If the contents of a "long token" `contains()` a string from the very next `<app>` element `following::app[1]`, then move the contents of that very next `<app>` up into the current app of the long token, and remove the very next `<app>`. 
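
A rough Python sketch of this smoothing step follows, under several assumptions that the issue does not settle: the apps are siblings under one parent, "move the contents up" means appending the next app's rdgGrp children to the long-token app, and the caller supplies a hypothetical `is_long_token` test, since how long tokens are recognized is not specified here. The sample data is invented.

```python
import xml.etree.ElementTree as ET

def rdg_text(rdg):
    """Full text content of one <rdg>."""
    return "".join(rdg.itertext())

def absorb_following_app(parent, is_long_token):
    """Pull the next <app> into a long-token <app> when its text occurs in the long token."""
    apps = [child for child in parent if child.tag == "app"]
    removed = set()
    for current, nxt in zip(apps, apps[1:]):
        if id(current) in removed or not is_long_token(current):
            continue
        long_text = " ".join(rdg_text(r) for r in current.iter("rdg"))
        next_strings = [rdg_text(r).strip() for r in nxt.iter("rdg")]
        if any(s and s in long_text for s in next_strings):
            current.extend(list(nxt))  # move the rdgGrp children up
            parent.remove(nxt)         # remove the very next <app>
            removed.add(id(nxt))
    return parent

# Invented sample: the first app holds a long token containing "dear".
SAMPLE = """<root>
  <app><rdgGrp n="long"><rdg wit="fMS">&lt;del&gt;my dear friend&lt;/del&gt; </rdg></rdgGrp></app>
  <app><rdgGrp n="['dear']"><rdg wit="f1818">dear </rdg></rdgGrp></app>
  <app><rdgGrp n="['sir']"><rdg wit="f1818">sir </rdg></rdgGrp></app>
</root>"""

def looks_like_long_token(app):
    # Hypothetical test: any rdg whose text carries flattened <del> markup.
    return any("<del" in rdg_text(r) for r in app.iter("rdg"))

root = absorb_following_app(ET.fromstring(SAMPLE), looks_like_long_token)
print(len(root.findall("app")), len(root.find("app").findall("rdgGrp")))
```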

- [ ]  If an app shows one rdgGrp with all witnesses unified on a paragraph marker, but one witness (most likely f1831) actually has not been running in alignment for a long time, move the paragraph marker to the preceding or following app containing that witness.  XPath: `//app[count(rdgGrp) = 1 ][count(.//rdg) = 5][preceding-sibling::app[1][count((.//rdg)) < 5]][matches(rdgGrp/@n, '^\[.<p' ) and matches(rdgGrp/@n, '>.\]$')]`
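
The XPath above relies on `matches()`, which needs XPath 2.0 or later. As a cross-check, here is a minimal Python sketch of the same conditions. It assumes the apps are direct children of one parent element and that there is no XML namespace; the function name and the sample `@n` value are illustrative guesses, not taken from real collation output.

```python
import re
import xml.etree.ElementTree as ET

def find_unified_paragraph_apps(parent, witness_count=5):
    """Find apps with one rdgGrp on a <p marker where the preceding app lacks a witness."""
    hits = []
    previous_app = None
    for child in parent:
        if child.tag != "app":
            continue
        groups = child.findall("rdgGrp")
        n_value = groups[0].get("n", "") if groups else ""
        if (
            previous_app is not None
            and len(groups) == 1
            and len(list(child.iter("rdg"))) == witness_count
            and len(list(previous_app.iter("rdg"))) < witness_count
            and re.search(r"^\[.<p", n_value)
            and re.search(r">.\]$", n_value)
        ):
            hits.append(child)
        previous_app = child  # plays the role of preceding-sibling::app[1]
    return hits

# Invented sample: the second app unifies five witnesses on a paragraph marker.
SAMPLE = """<root>
  <app><rdgGrp n="['word']">
    <rdg wit="f1818">word </rdg><rdg wit="f1823">word </rdg>
    <rdg wit="fThomas">word </rdg><rdg wit="fMS">word </rdg>
  </rdgGrp></app>
  <app><rdgGrp n="['&lt;p/&gt;']">
    <rdg wit="f1818">&lt;p/&gt; </rdg><rdg wit="f1823">&lt;p/&gt; </rdg>
    <rdg wit="fThomas">&lt;p/&gt; </rdg><rdg wit="fMS">&lt;p/&gt; </rdg>
    <rdg wit="f1831">&lt;p/&gt; </rdg>
  </rdgGrp></app>
</root>"""

root = ET.fromstring(SAMPLE)
hits = find_unified_paragraph_apps(root)
print(len(hits))
```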



Postprocessing with XSLT #2
