Refactor StdDateFormat #1671
…ateDeserializationTest test cases and under gh1667, gh1668
Sounds good; I'll have a look. Forgot to add: to me, a combination of improvements is fine, so that:
I assume this is along the lines of what you are doing anyway, but thought I'd mention it.
if (looksLikeISO8601(dateStr)) { // also includes "plain"
    dt = parseAsISO8601(dateStr, pos, true);
} else {
    // Also consider "stringified" simple time stamp
Is this removed because it is handled at a higher level, by the deserializer (and not by the DateFormat implementation)?
Not really... look a couple of lines further down in the same method...
Cfr https://github.com/brenuart/jackson-databind/blob/stddateformat/src/main/java/com/fasterxml/jackson/databind/util/StdDateFormat.java#L281-L291
Ok.
    }
}
protected int indexOfAnyFromEnd(String s, int startPos, int endPos, char... chars)
It looks like there are only 2 calls: one with 1 char (which can just use String.lastIndexOf(ch)), another with 2. So maybe write a version for the second case, to avoid the inner loop as well as varargs.
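A minimal sketch of the two-character variant suggested above. The name `lastIndexOfEither` is hypothetical, and it assumes `startPos` is inclusive and `endPos` exclusive, which may not match the exact semantics of the existing `indexOfAnyFromEnd`:

```java
// Hypothetical replacement for the two-char call of indexOfAnyFromEnd(..., char... chars):
// no varargs array, no inner loop over the candidate chars.
final class CharSearch {
    private CharSearch() { }

    /**
     * Scans s backwards from endPos - 1 down to startPos (assumed half-open range)
     * and returns the index of the last c1 or c2, or -1 if neither occurs.
     */
    static int lastIndexOfEither(String s, int startPos, int endPos, char c1, char c2) {
        for (int i = endPos - 1; i >= startPos; i--) {
            char c = s.charAt(i);
            if (c == c1 || c == c2) {
                return i;
            }
        }
        return -1;
    }
}
```

For the single-character call, `String.lastIndexOf(ch, fromIndex)` already performs the same backwards search.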
private transient Map<DateFormat, SimpleDateFormat> _clonedFormats = new HashMap<>();
Maps are a bit expensive, especially when the key is a DateFormat (depending on how equals and hashCode are defined).
What would work better would be an EnumMap, since that's basically just an array with an index. So, create a private enum with the variants, and pass that.
An alternative could be to just always clone() or re-create the instance for every call, and completely do away with recycling.
And yet a third possibility would be to use a ThreadLocal with a container object. That would probably work best, as it would allow reuse across calls without the possibility of race conditions. The only (?) concern would be how to clear settings if they are changed. So maybe that doesn't work after all. :)
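The EnumMap option could look roughly like the following sketch. `FormatVariant`, `FormatCache` and the listed patterns are illustrative assumptions rather than StdDateFormat's actual fields; the point is that lookups index an array by `ordinal()` instead of hashing DateFormat keys. Like the HashMap it would replace, this cache is not thread-safe.

```java
import java.text.SimpleDateFormat;
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

// Hypothetical set of format variants; the patterns are illustrative.
enum FormatVariant {
    ISO8601("yyyy-MM-dd'T'HH:mm:ss.SSSZ"),
    PLAIN_DATE("yyyy-MM-dd"),
    RFC1123("EEE, dd MMM yyyy HH:mm:ss zzz");

    final String pattern;

    FormatVariant(String pattern) {
        this.pattern = pattern;
    }
}

// Cache of configured SimpleDateFormat objects, keyed by enum instead of by DateFormat.
final class FormatCache {
    private final TimeZone tz;
    private final Locale locale;
    // EnumMap is array-backed and indexed by ordinal(): no equals/hashCode on DateFormat.
    private final Map<FormatVariant, SimpleDateFormat> formats = new EnumMap<>(FormatVariant.class);

    FormatCache(TimeZone tz, Locale locale) {
        this.tz = tz;
        this.locale = locale;
    }

    /** Returns the cached instance for the variant, creating it on first use. Not thread-safe. */
    SimpleDateFormat get(FormatVariant variant) {
        SimpleDateFormat f = formats.get(variant);
        if (f == null) {
            f = new SimpleDateFormat(variant.pattern, locale);
            f.setTimeZone(tz);
            formats.put(variant, f);
        }
        return f;
    }

    /** Drops all cached instances, e.g. when the time zone or leniency changes. */
    void clear() {
        formats.clear();
    }
}
```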
Enum is a good option indeed; I think we could go that way.
ThreadLocal is another option I thought of. We could even build something like a DateFormatCache/factory from which we would ask for pre-configured instances. This cache could be used in different places: by StdDateFormat, but maybe also by DateBasedDeserializer when it creates contextual instances. I would love to add something like the baseline vs. local DateFormat instances you did here initially. I planned to look at that possibility right after...
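One way to address the "how to clear settings" concern with ThreadLocal is to store the configuration alongside the cached instance and rebuild the instance whenever a caller asks for a different configuration. The sketch below assumes a hypothetical `DateFormatCache` with a single slot per thread; it is not an existing Jackson class, and it compares time zones by ID only, which is a simplification.

```java
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical per-thread cache: one SimpleDateFormat per thread, rebuilt only
// when the requested configuration differs from the one it was created with.
final class DateFormatCache {
    private static final class Entry {
        final String pattern;
        final String tzId;
        final Locale locale;
        final boolean lenient;
        final SimpleDateFormat format;

        Entry(String pattern, TimeZone tz, Locale locale, boolean lenient) {
            this.pattern = pattern;
            this.tzId = tz.getID();
            this.locale = locale;
            this.lenient = lenient;
            this.format = new SimpleDateFormat(pattern, locale);
            this.format.setTimeZone(tz);
            this.format.setLenient(lenient);
        }

        boolean matches(String p, TimeZone tz, Locale l, boolean len) {
            return pattern.equals(p) && tzId.equals(tz.getID())
                && locale.equals(l) && lenient == len;
        }
    }

    private static final ThreadLocal<Entry> CURRENT = new ThreadLocal<>();

    private DateFormatCache() { }

    /**
     * Returns this thread's instance for the given configuration. Callers must not
     * reconfigure it (setTimeZone, setLenient...), or the entry would no longer describe it.
     */
    static SimpleDateFormat get(String pattern, TimeZone tz, Locale locale, boolean lenient) {
        Entry e = CURRENT.get();
        if (e == null || !e.matches(pattern, tz, locale, lenient)) {
            e = new Entry(pattern, tz, locale, lenient);
            CURRENT.set(e);
        }
        return e.format;
    }
}
```

Because each thread only ever sees its own instance, there is no race; the trade-off is that a change of settings simply replaces the slot instead of clearing shared state.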
Ok, I think this would be a solid improvement. Given the amount of change, I think this would make most sense for 2.9, leaving 2.8 as is. One other thing I was going to suggest was just this: it is probably fine to make use of … What do you think? Apologies for adding more work here, but I think that perhaps the existing code should just be mostly killed, at least with respect to ISO-8601 handling, with …
I agree with you.
Maybe... This would certainly make the code more readable, but I'm not sure about performance. We should do some benchmarking to get some figures, I guess... The current code could have been made simpler if we dropped support for the optional time fields. I kept that feature because the previous version tried to make both minutes and seconds optional. If that is not a desired feature, then we would have less to parse from the input string and would delegate more to SimpleDateFormat. Writing a tolerant parser is not a trivial task... There are so many weird cases to take into account. Honestly, I think we should validate the proposed BNF first, possibly with additional comments such as the actual behavior with more than 3 secfrac digits... Maybe remove some less desirable flexibility, etc. Once we are fine with it, we can think about another implementation and then see if we can do without SimpleDateFormat...
That's fine - I was expecting your comments. It helps me understand your vision and expectations too... Expect some more commits on this PR after your comments.
@brenuart JDK's regex I suspect … Support for optional fields is, at least with basic regexp grouping, simple to add. Validation then is just a matter of ensuring that the input string matches, after which fields can be extracted (a "missing" optional field is expressed as an empty String for the matching group).
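A rough sketch of the regex approach, assuming a simplified pattern (not the BNF from this PR) with optional hours, minutes, seconds, fraction and zone, each numeric field taking 1 or 2 digits. Note that with `java.util.regex`, an optional group that did not participate in the match is reported as `null` by `Matcher.group(int)`, not as an empty String. `Iso8601Fields` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based splitter for a simplified ISO-8601 subset.
final class Iso8601Fields {
    // Groups: 1=year 2=month 3=day 4=hour 5=minute 6=second 7=fraction 8=zone
    private static final Pattern ISO8601 = Pattern.compile(
        "(\\d{4})-(\\d{1,2})-(\\d{1,2})"
        + "(?:T(\\d{1,2})(?::(\\d{1,2})(?::(\\d{1,2})(?:\\.(\\d+))?)?)?"
        + "(Z|[+-]\\d{2}(?::?\\d{2})?)?)?");

    private Iso8601Fields() { }

    /**
     * Returns the 8 captured fields, or null if the input does not match.
     * A field whose optional group did not participate is null.
     */
    static String[] split(String input) {
        Matcher m = ISO8601.matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[8];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }
}
```

Validation then reduces to the `matches()` call; range checks and leniency would still be applied to the extracted numbers afterwards.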
One alternative suggestion: would it be possible to refactor the new tests into 2 sections: … the latter would be added under … Plus, most importantly, we could add these test cases before patching the implementation to pass more failing tests.
I decided to go a bit of a different route, as per #1678, and made some changes to tests.
I will also see if I can rewrite serialization; the changes checked in so far are for deserialization.
This PR addresses issues highlighted in the DateDeserializationTest test cases, plus #1667 and #1668.
The original version had different code paths to handle +hhmm, +hh:mm, Z and no TZ, hence producing different and inconsistent results for edge cases (like missing digits or time fields).
This new version makes sure the sanitisation process is the same for all cases. For instance, if we detect a Z timezone indicator, it is replaced by +0000 before going through the next steps.
I made sure this new version behaves the same as the original and accepts the same input. Of course it differs for cases that were "wrongly" accepted or refused... The test cases defined in DateDeserializationTest were very helpful to detect changes in behavior...
Concerning ISO8601, this new version behaves as described by the following BNF:
The parser accepts date-time, full-date and partial-time inputs.
Optional Digits
Month, day, hours, minutes and seconds can be expressed with 1 or 2 digits.
Optional Fields
As described in ISO8601, any number of values may be dropped from any of the date and time representations, but in the order from the least to the most significant.
The parser supports optional hours, minutes and seconds. This means the following forms are now accepted regardless of the timezone indicator:
2000-01-02T03:04:05
2000-01-02T03:04
2000-01-02T03
2000-01-02
The original version had support for optional seconds but only if a timezone was present (failed if no timezone).
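The uniform behaviour described here comes from normalising the timezone suffix once, before the fields are parsed. A minimal sketch of such a step, assuming only the Z, +hhmm and +hh:mm forms (anything else is left untouched); `ZoneNormalizer` is a hypothetical name, and it uses `StringBuilder` (the unsynchronised counterpart of the StringBuffer mentioned in this PR):

```java
// Hypothetical sanitisation step: rewrite supported zone suffixes into +hhmm / -hhmm
// so later parsing only has to handle one shape.
final class ZoneNormalizer {
    private ZoneNormalizer() { }

    static String normalize(String input) {
        StringBuilder sb = new StringBuilder(input);
        int len = sb.length();
        // "Z" designates UTC: replace it with an explicit +0000 offset.
        if (len > 0 && sb.charAt(len - 1) == 'Z') {
            sb.setLength(len - 1);
            sb.append("+0000");
            return sb.toString();
        }
        // "+hh:mm" / "-hh:mm" at the end: drop the colon to get "+hhmm".
        if (len >= 6) {
            char sign = sb.charAt(len - 6);
            if ((sign == '+' || sign == '-') && sb.charAt(len - 3) == ':') {
                sb.deleteCharAt(len - 3);
            }
        }
        return sb.toString();
    }
}
```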
Fraction of Seconds vs. millis
Cfr. #1668
This new version dropped the millis concept in favour of the more correct fraction of seconds. This means the following forms now produce the correct/expected result:
2000-01-02T03:04:05.1 --> 100 millis
2000-01-02T03:04:05.01 --> 10 millis
2000-01-02T03:04:05.001 --> 1 milli
If there are more than 3 digits after the dot:
2000-01-02T03:04:05.1234 --> 5 seconds + 0.1234 seconds
Leniency
Leniency is fully supported on all fields except the old "millis", as it is now interpreted as a fraction of a second. The original version had a hard limit of 2 digits on some fields that somewhat reduced the leniency support.
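Since the digits after the dot are now a fraction rather than a lenient millis field, they can be scaled to milliseconds by right-padding or truncating them to three digits. A sketch with a hypothetical helper; dropping digits beyond the third (rather than rounding) is an assumption, motivated by `java.util.Date` only having millisecond precision:

```java
// Hypothetical helper: converts the digits after '.' into milliseconds,
// treating them as a fraction of a second.
final class SecondFraction {
    private SecondFraction() { }

    /** "1" -> 100, "01" -> 10, "001" -> 1; digits after the third are truncated. */
    static int toMillis(String digits) {
        if (digits.isEmpty()) {
            throw new IllegalArgumentException("empty fraction");
        }
        int millis = 0;
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("not a digit: " + c);
            }
            if (i < 3) {
                millis = millis * 10 + (c - '0');
            }
        }
        // Right-pad to three digits: "1" means 0.1 s = 100 ms, "01" means 10 ms.
        for (int i = digits.length(); i < 3; i++) {
            millis *= 10;
        }
        return millis;
    }
}
```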
Although not expressed in the BNF above (maybe I should write another for the lenient mode), the following is now accepted:
2000-01-32 --> equivalent to 2000-02-01
2000-01-02T03:04:61 --> equivalent to 2000-01-02T03:05:01
When leniency is turned off, the parser throws a ParseException when a field is not within the accepted range.
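The lenient and strict behaviours above mirror what `java.util.Calendar` does with out-of-range fields. The following illustration (not the PR's code; `LeniencyDemo` is a hypothetical name) builds instants in UTC. In strict mode Calendar throws `IllegalArgumentException`, which a DateFormat implementation would have to translate into the ParseException mentioned above.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Illustration of lenient vs. strict field handling with java.util.Calendar.
final class LeniencyDemo {
    private LeniencyDemo() { }

    /** Builds a UTC instant from raw fields; month is 1-based here (Calendar's is 0-based). */
    static Calendar build(int year, int month, int day, int hour, int minute, int second,
                          boolean lenient) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.setLenient(lenient);
        cal.set(year, month - 1, day, hour, minute, second);
        // Forces computation: lenient mode rolls overflow into the next field,
        // strict mode throws IllegalArgumentException for out-of-range fields.
        cal.getTimeInMillis();
        return cal;
    }
}
```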
Others
Some other "technical" changes:
… if/then/else conditions and make the code flow easier to follow.
parse(String) and parse(String, ParsePosition) methods …
ParseException did not take into account the current position set in the ParsePosition
StringBuffer is now used throughout the sanitisation process, reducing the String manipulation overhead
WHAT NOW?
I reviewed the existing test cases so they reflect the new behaviour. Diff against the previous version to see the impact...
I also discovered that some of these tests are nonsensical, some are now deprecated, and some new ones may have to be written in the light of the proposed BNF. I have not done that yet, so the only changes made clearly highlight the differences between this version and the previous one.
IMHO most of these tests should explicitly test StdDateFormat without Jackson and be hosted in a separate file. The stuff that involves Jackson (like timezone settings, annotations, etc.) should be separated and could stay in the current test class.