External DTD Encoding with EXIficient #46

Open
brandonprry opened this issue Jan 23, 2025 · 9 comments · May be fixed by #47

Comments

@brandonprry

I've read that EXI should support encoding an external DTD reference, for instance:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY file SYSTEM "file:///tmp/fragment.xml" >]>
<foo>&file;</foo>

I've attempted to encode this with the CMD class, but I can't get it to encode in a way that preserves the external DTD reference after decoding, even with the -preserveDTDs, -preservePIs, etc. arguments.

Can EXIficient encode an external DTD reference so that it still decodes as an external reference to a fragment?

Thanks so much.

@jsbiff

jsbiff commented Jan 23, 2025

@brandonprry

When you encoded the EXI file, did you set the -includeOptions option to save the encoding options you used, so that the decoder would use the same options? Alternatively, did you explicitly pass those options (-preserveDTDs, etc.) to the decode command?

@danielpeintner
Member

FYI: EXIficient was part of the EXI implementation report. Hence, all features are implemented.

By default, most XML artifacts such as comments, prefixes, and DTDs are not preserved (to get the best compression). As mentioned in this thread, you need to set the corresponding options when encoding the file if you need them.

See https://github.com/EXIficient/exificient?tab=readme-ov-file#command-line-interface for the command-line interface.

@brandonprry
Author

Here is what I'm doing exactly, with the outputs. I am not able to successfully persist the external reference.

bperry@MacBookPro target % cat ~/test.xml 
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
<!ENTITY file SYSTEM "file:///tmp/fragment.xml" >]>
<foo>&file;</foo>

bperry@MacBookPro target % java -jar exificient-jar-with-dependencies.jar -encode -preserveDTDs -preservePIs -preserveLexicalValues -retainEntityReference -preservePrefixes -includeOptions -preserveLexicalValues  -i ~/test.xml -o /tmp/fdsa.exi
bperry@MacBookPro target % cat /tmp/fdsa.exi | xxd
00000000: a008 03a0 3666 f6f0 0001 33c2 1454 c454  ....6f....3..T.T
00000010: d454 e542 0666 f6f2 0414 e593 e202 08cc  .T.B.f..........
00000020: dedf 4119 9a5b 1940                      ..A..[.@
bperry@MacBookPro target % java -jar exificient-jar-with-dependencies.jar -decode -i /tmp/fdsa.exi -o /tmp/test.xml
bperry@MacBookPro target % cat /tmp/test.xml
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE foo [
<!ELEMENT foo ANY>
]>
<foo/>

@danielpeintner
Member

I looked into it, and it seems the handler is missing one callback: the method below has an empty body, while the other methods are implemented.

public void externalEntityDecl(String name, String publicId, String systemId)
throws SAXException {
}

I will try to fix that
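
To illustrate what a non-empty version of this callback might do, here is a minimal sketch that uses only the JDK's SAX `DeclHandler` interface. The class name `EntityDeclCollector` and its `format` and `declarations` methods are hypothetical; they are not taken from EXIficient or from any pull request. The sketch only shows how the three callback arguments map back to an `<!ENTITY ...>` declaration.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ext.DeclHandler;

/** Hypothetical helper (not EXIficient code): rebuilds entity declarations from SAX callbacks. */
class EntityDeclCollector implements DeclHandler {
    private final List<String> declarations = new ArrayList<>();

    @Override
    public void externalEntityDecl(String name, String publicId, String systemId) {
        // Record the declaration instead of silently dropping it.
        declarations.add(format(name, publicId, systemId));
    }

    // The remaining DeclHandler callbacks are out of scope for this sketch.
    @Override
    public void elementDecl(String name, String model) { }

    @Override
    public void attributeDecl(String eName, String aName, String type, String mode, String value) { }

    @Override
    public void internalEntityDecl(String name, String value) { }

    List<String> declarations() {
        return declarations;
    }

    /** Formats one external entity declaration as DTD text. */
    static String format(String name, String publicId, String systemId) {
        StringBuilder sb = new StringBuilder("<!ENTITY ");
        if (name.startsWith("%")) {
            // SAX reports parameter entities with a leading '%'.
            sb.append("% ").append(name.substring(1));
        } else {
            sb.append(name);
        }
        if (publicId != null) {
            sb.append(" PUBLIC ").append(quote(publicId)).append(' ').append(quote(systemId));
        } else {
            sb.append(" SYSTEM ").append(quote(systemId));
        }
        return sb.append('>').toString();
    }

    // Use single quotes only when the literal itself contains a double quote.
    private static String quote(String literal) {
        char q = literal.indexOf('"') >= 0 ? '\'' : '"';
        return q + literal + q;
    }

    public static void main(String[] args) {
        EntityDeclCollector collector = new EntityDeclCollector();
        collector.externalEntityDecl("file", null, "file:///tmp/fragment.xml");
        System.out.println(collector.declarations().get(0));
    }
}
```

Called with the values from the test document (`file`, no public ID, `file:///tmp/fragment.xml`), `format` produces `<!ENTITY file SYSTEM "file:///tmp/fragment.xml">`.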

@danielpeintner danielpeintner linked a pull request Jan 24, 2025 that will close this issue
@danielpeintner
Member

I created #47

Please have a look and report back @brandonprry

@brandonprry
Author

That almost does it, but I don't see the external entity reference being persisted after decoding.

Here is the XML after decoding from EXI with the exact command as pasted above.

bperry@MacBookPro target % cat /tmp/test.xml
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY file SYSTEM "file:///tmp/fragment.xml">
]>
<foo/>

Here is the original XML, with the entity reference inside the element, which is missing after decoding. The ENTITY declaration itself, however, is persisted correctly.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
<!ENTITY file SYSTEM "file:///tmp/fragment.xml" >]>
<foo>&file;</foo>

@danielpeintner
Member

I checked the EXI file, which contains the text "file" as an entity reference, but the call that should bring it back to XML doesn't do it...

Not sure... entity references are a bit special and my knowledge has faded.

protected void handleEntityReference(char[] erName) throws SAXException {
    String entityReferenceName = new String(erName);
    contentHandler.skippedEntity(entityReferenceName);
}
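
One thing that may be worth checking is what the receiving `ContentHandler` does with `skippedEntity`: the JDK's `DefaultHandler` implements it as a no-op, so a handler that doesn't override it drops the reference silently. Whether that is the cause here is an assumption and has not been confirmed in this thread. A minimal sketch of a handler that writes the reference back out, using only JDK SAX classes (the class `EntityAwareWriter` is hypothetical and is not EXIficient code):

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical serializer (not EXIficient code): keeps skipped entities as references. */
class EntityAwareWriter extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        out.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            out.append(' ').append(atts.getQName(i))
               .append("=\"").append(escape(atts.getValue(i))).append('"');
        }
        out.append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        out.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(escape(new String(ch, start, length)));
    }

    @Override
    public void skippedEntity(String name) {
        // SAX also reports parameter entities ("%name") and the external
        // DTD subset ("[dtd]") here; only general entities become &name;.
        if (!name.startsWith("%") && !name.equals("[dtd]")) {
            out.append('&').append(name).append(';');
        }
    }

    String result() {
        return out.toString();
    }

    // Escapes the characters that would otherwise break the markup.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        // Replays the events a decoder would emit for <foo>&file;</foo>.
        EntityAwareWriter writer = new EntityAwareWriter();
        writer.startElement("", "foo", "foo", new AttributesImpl());
        writer.skippedEntity("file");
        writer.endElement("", "foo", "foo");
        System.out.println(writer.result());
    }
}
```

Replaying the three events for the test document yields `<foo>&file;</foo>`, so a serializer along these lines would keep the reference as long as the decoder reaches `skippedEntity`.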

@brandonprry
Author

I will dig.

@brandonprry
Author

I've tried several EXI implementations: OpenV2G, RISE-V2G, and some others. I've not yet found one that supports external entities. This may just be a squirrelly feature that hasn't been useful for applications consuming EXI yet.

I'll leave this open. It's not a pertinent feature for me at the moment, but it will be. I should still be able to dig and find out what code needs to be fixed to make this work.
