Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DRAFT] Support Encoding option when writing XML #593

Draft
wants to merge 14 commits into
base: 2.17
Choose a base branch
from

Conversation

pjfanning
Copy link
Member

@pjfanning pjfanning commented May 1, 2023

@pjfanning pjfanning marked this pull request as draft May 1, 2023 08:57
@pjfanning pjfanning force-pushed the encoding-option branch 2 times, most recently from 1e0ec22 to 96ee4e9 Compare May 1, 2023 09:21
@pjfanning
Copy link
Member Author

@cowtowncoder when you get a chance, could you review the work in progress? If the new XmlMapper methods look ok, I can extend the coverage on them.

import java.io.IOException;
import java.nio.charset.Charset;

public class TestCharset extends XmlTestBase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start, probably will want to check Latin-1 (ISO-8859-1) as well.

I guess this is WIP so probably were already planning (ditto for basic UTF-8)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can add more tests, including for other charsets. I guess that the point at this stage is try to agree an API approach first. Keeping the number of test cases low at this point means it's easier for me to adjust the API (without the need to then rewrite all the tests).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, agreed.

@cowtowncoder
Copy link
Member

Makes sense for the most part. The only thing I am not sure how I feel about is addition of more writeValue() methods in ObjectMapper: I understand convenience aspect but this introduces discrepancy between ObjectWriter and ObjectMapper. There's also the problem of combinatorial expansion if we wanted to cover all permutations.
So in some ways I wonder if we'd require construction of ToXmlGenerator, then using writeValue() passing that: not as convenient, but incremental improvement to first allow doing this, then seeing how to make it more convenient.

I guess if we were to consider extending support for encodings to be core part of jackson-databind, that could be solved better, but obviously bigger undertaking.

So maybe adding methods here only first makes sense.

@pjfanning
Copy link
Member Author

pjfanning commented May 2, 2023

I'm not sure about the idea to force users to create a ToXmlGenerator. I understand that this avoids expanding the number of methods but it just doesn't feel like the natural way for users to use the API. In practice, most Jackson users don't know about JsonGenerators and the various subclasses like ToXmlGenerator.

One alternative approach would be to add a setter on XmlMapper or XmlFactory where users can set a default charset.

The problem here is that if users need to create XML outputs with different charsets that they would need to create an XmlMapper instance for every charset that they need to support. We could have this setting on XmlMapper and allow it to be changed just before any 'write' call. This goes against the aim of making Mapper instances immutable but it might be the tidiest approach to allowing multiple charsets without having to add quite a few new 'write' methods.

If we keep the current approach in this PR of adding new write methods, we could also add them to a new XmlWriter subclass of ObjectWriter.

@cowtowncoder
Copy link
Member

I'm not sure about the idea to force users to create a ToXmlGenerator. I understand that this avoids expanding the number of methods but it just doesn't feel like the natural way for users to use the API. In practice, most Jackson users don't know about JsonGenerators and the various subclasses like ToXmlGenerator.

Yes, that is problematic as well.

One alternative approach would be to add a setter on XmlMapper or XmlFactory where users can set a default charset.

That does not seem right either, although of the two, XmlFactory should probably have it if anything.

Could use Builder(); but could not set it on per-call basis.

The problem here is that if users need to create XML outputs with different charsets that they would need to create an XmlMapper instance for every charset that they need to support. We could have this setting on XmlMapper and allow it to be changed just before any 'write' call. This goes against the aim of making Mapper instances immutable but it might be the tidiest approach to allowing multiple charsets without having to add quite a few new 'write' methods.

If we keep the current approach in this PR of adding new write methods, we could also add them to a new XmlWriter subclass of ObjectWriter.

Unfortunately when I tried to make ObjectWriter (and ObjectReader) extensible, I quickly found out that just won't work wrt chaining. This was quite a while ago (5+ years?) but I don't think things have changed.

I need to think about this some more. I am not dead set against new methods in XmlMapper but still trying to think of something better. This might be why I was originally thinking of a handler, callback style approach; configurable on XmlMapper or XmlFactory, but more flexible. Not sure that'd be good either, but might be why I had the idea.

@pjfanning
Copy link
Member Author

@cowtowncoder I added an XmlWriter to this PR - an extension to ObjectWriter but with the additional write methods that I added to XmlMapper. The methods on XmlMapper that return an ObjectWriter will return an XmlMapper that can be cast to get access to the extra methods. I also added a toXmlWriter(ObjectWriter) static method to XmlMapper.

If these API changes are ok, I can add extra test coverage to this PR.

@pjfanning
Copy link
Member Author

@cowtowncoder is this PR worth continuing with?

@ifindya
Copy link

ifindya commented Jul 27, 2023

Good draft, really needed.

@cowtowncoder
Copy link
Member

Ok, so:

  1. I really want to support ability to specify encoding and other XML prolog settings
  2. I do not want ObjectWriter (or ObjectReader) to be sub-classed (I played with this idea at some point but... it doesn't really compose nicely, unfortunately, from what I remember)

given this I don't think this PR as-is will be worth pursuing. But I think it's useful to keep open until an alternative path is found.

@pjfanning pjfanning changed the base branch from 2.16 to 2.17 November 20, 2023 09:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants