
libicu v78 formatting changes #7047

@ShortDevelopment

Description


The following Intl test started failing in the macOS CI, likely due to the new libicu v78.1 (see #7038 (comment)):

// the "literal" tested here is the first of two literals, the second of which is a space between "12" and "AM"
test({ hour: "numeric", weekday: "long" }, ["weekday", "literal", "hour", "dayPeriod"], ["Saturday", ", ", "12", "AM"]);

ci-linux.txt
ci-macos.txt
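The comment in the test suggests that the harness compares only the first part of each type. That means the test hinges on the literal between the weekday and the hour. The following is a hypothetical sketch of that comparison (`firstPartOfEachType` and `partsMatch` are illustrative names, not the real ChakraCore test harness). It shows why dropping the comma is enough to make the test fail. The last lines run the same options through whatever engine executes the script, so their result depends on the ICU that engine links.

```javascript
// Hypothetical re-creation of the test's comparison, not the actual ChakraCore harness.
// Keeps only the first part of each type, which is why the expected arrays have four
// entries even though formatToParts returns five parts (two literals).
function firstPartOfEachType(parts) {
    const seen = new Set();
    return parts.filter((part) => {
        if (seen.has(part.type)) {
            return false;
        }
        seen.add(part.type);
        return true;
    });
}

function partsMatch(parts, expectedTypes, expectedValues) {
    const filtered = firstPartOfEachType(parts);
    return filtered.length === expectedTypes.length &&
        filtered.every((part, i) => part.type === expectedTypes[i] && part.value === expectedValues[i]);
}

// Hand-written parts for a "weekday, hour dayPeriod" pattern (the ICU 77 shape).
// The second literal is never compared, so its exact value does not matter here.
const partsIcu77 = [
    { type: "weekday", value: "Saturday" },
    { type: "literal", value: ", " },
    { type: "hour", value: "12" },
    { type: "literal", value: " " },
    { type: "dayPeriod", value: "AM" },
];
// Same parts for a "weekday hour dayPeriod" pattern (the en-US ICU 78 shape): no comma.
const partsIcu78 = partsIcu77.map((part, index) => (index === 1 ? { type: "literal", value: " " } : part));

const types = ["weekday", "literal", "hour", "dayPeriod"];
const values = ["Saturday", ", ", "12", "AM"];
console.log(partsMatch(partsIcu77, types, values)); // true
console.log(partsMatch(partsIcu78, types, values)); // false

// Live check in the running engine: January 1, 2000 (local time) was a Saturday.
const live = new Intl.DateTimeFormat("en-US", { weekday: "long", hour: "numeric" })
    .formatToParts(new Date(2000, 0, 1, 0));
console.log(JSON.stringify(live), partsMatch(live, types, values));
```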

ChakraCore

ChakraCore (CC) builds an ICU skeleton in JavaScript, in getPatternForOptions:

/**
 * Given a user-provided options object, getPatternForOptions generates an LDML/ICU pattern and then
 * sets the pattern and all of the relevant options implemented by the pattern on the provided dtf before returning.
 *
 * @param {Object} dtf the DateTimeFormat internal object
 * @param {Object} options the options object originally given by the user
 */
const getPatternForOptions = (function () {
    // symbols come from the Unicode LDML: http://www.unicode.org/reports/tr35/tr35-dates.html#Date_Field_Symbol_Table
    const symbolForOption = {
        weekday: "E",
        era: "G",
        year: "y",
        month: "M",
        day: "d",
        // for hour, we have some special handling
        hour: "j", hour12: "h", hour24: "H",
        minute: "m",
        second: "s",
        timeZoneName: "z",
    };
    // NOTE - keep this up to date with the map in PlatformAgnostic::Intl::GetDateTimePartKind and the UDateFormatField enum
    const optionForSymbol = {
        E: "weekday", c: "weekday", e: "weekday",
        G: "era",
        y: "year", u: "year", U: "year",
        M: "month", L: "month",
        d: "day",
        h: "hour", H: "hour", K: "hour", k: "hour",
        m: "minute",
        s: "second",
        z: "timeZoneName", Z: "timeZoneName", v: "timeZoneName", V: "timeZoneName", O: "timeZoneName", X: "timeZoneName", x: "timeZoneName",
    };
    // lengths here are how many times a symbol is repeated in a skeleton for a given option
    // the Intl spec recommends that Intl "short" -> CLDR "abbreviated" and Intl "long" -> CLDR "wide"
    const symbolLengthForOption = {
        numeric: 1,
        "2-digit": 2,
        short: 3,
        long: 4,
        narrow: 5,
    };
    const optionForSymbolLength = {
        1: "numeric",
        2: "2-digit",
        3: "short",
        4: "long",
        5: "narrow",
    };
    // for fixing up the hour pattern later
    const patternForHourCycle = {
        h12: "h",
        h23: "H",
        h11: "K",
        h24: "k",
    };
    const hourCycleForPattern = {
        h: "h12",
        H: "h23",
        K: "h11",
        k: "h24",
    };
    // take the hour12 option by name so that we don't call the getter for options.hour12 twice
    return function (dtf, options, hour12) {
        const resolvedOptions = _.reduce(dateTimeComponents, function (resolved, component) {
            const prop = component[0];
            const value = GetOption(options, prop, "string", component[1], undefined);
            if (value !== undefined) {
                resolved[prop] = value;
            }
            return resolved;
        }, _.create());
        const hc = dtf.hourCycle;
        // Build up a skeleton by repeating skeleton keys (like "G", "y", etc) for a count corresponding to the intl option value.
        const skeleton = _.reduce(_.keys(resolvedOptions), function (skeleton, optionKey) {
            let optionValue = resolvedOptions[optionKey];
            if (optionKey === "hour") {
                // hour12/hourCycle resolution in the spec has multiple issues:
                // hourCycle and -hc can be out of sync: https://github.com/tc39/ecma402/issues/195
                // hour12 has precedence over a more specific option in hourCycle/hc
                // hour12 can force a locale that prefers h23 and h12 to use h11 or h24, according to the spec
                // We temporarily work around these similarly to firefox and implement custom hourCycle/hour12 resolution.
                // TODO(jahorto): follow up with Intl spec about these issues
                if (hour12 === true || (hour12 === undefined && (hc === "h11" || hc === "h12"))) {
                    optionKey = "hour12";
                } else if (hour12 === false || (hour12 === undefined && (hc === "h23" || hc === "h24"))) {
                    optionKey = "hour24";
                }
            }
            return skeleton + _.repeat(symbolForOption[optionKey], symbolLengthForOption[optionValue]);
        }, "");
        let pattern = platform.getPatternForSkeleton(dtf.locale, skeleton);
        // getPatternForSkeleton (udatpg_getBestPattern) can ignore, add, and modify fields compared to the markers we gave in the skeleton.
        // Most importantly, udatpg_getBestPattern will determine the most-preferred hour field for a locale and time type (12 or 24).
        // Scan the generated pattern to extract the resolved fields, and fix up the hour field if the user requested an explicit hour cycle
        let inLiteral = false;
        let i = 0;
        while (i < pattern.length) {
            let cur = pattern[i];
            const isQuote = cur === "'";
            if (inLiteral) {
                if (isQuote) {
                    inLiteral = false;
                }
                ++i;
                continue;
            } else if (isQuote) {
                inLiteral = true;
                ++i;
                continue;
            } else if (cur === " ") {
                ++i;
                continue;
            }
            // we are not in a format literal, so we are in a symbolic section of the pattern
            // now, we can force the correct hour pattern and set the internal slots correctly
            if (cur === "h" || cur === "H" || cur === "K" || cur === "k") {
                if (hc && hour12 === undefined) {
                    // if we have found an hour-like symbol and the user wanted a specific hour cycle,
                    // replace it and all such following contiguous symbols with the symbol corresponding
                    // to the user-requested hour cycle, if they are different
                    const replacement = patternForHourCycle[hc];
                    if (replacement !== cur) {
                        if (pattern[i + 1] === cur) {
                            // 2-digit hour
                            pattern = _.substring(pattern, 0, i) + replacement + replacement + _.substring(pattern, i + 2);
                        } else {
                            // numeric hour
                            pattern = _.substring(pattern, 0, i) + replacement + _.substring(pattern, i + 1);
                        }
                        // we have modified pattern[i] so we need to update cur
                        cur = pattern[i];
                    }
                } else {
                    // if we have found an hour-like symbol and the user didn't request an hour cycle,
                    // set the internal hourCycle property from the resolved pattern
                    dtf.hourCycle = hourCycleForPattern[cur];
                }
            }
            let k = i + 1;
            while (k < pattern.length && pattern[k] === cur) {
                ++k;
            }
            const resolvedKey = optionForSymbol[cur];
            const resolvedValue = optionForSymbolLength[k - i];
            dtf[resolvedKey] = resolvedValue;
            i = k;
        }
        dtf.pattern = pattern;
    };
})();

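The skeleton is then handed to libicu, which picks a locale-appropriate pattern via udatpg_getBestPattern. On the native side (EntryIntl_GetPatternForSkeleton, reached from platform.getPatternForSkeleton), this is the udatpg_getBestPatternWithOptions variant called with UDATPG_MATCH_ALL_FIELDS_LENGTH: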

Var IntlEngineInterfaceExtensionObject::EntryIntl_GetPatternForSkeleton(RecyclableObject *function, CallInfo callInfo, ...)
// ... (excerpt)
return udatpg_getBestPatternWithOptions(
    dtpg,
    reinterpret_cast<const UChar *>(skeleton->GetSz()),
    skeleton->GetLength(),
    UDATPG_MATCH_ALL_FIELDS_LENGTH,
    buf,
    bufLen,
    status
);
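For the options in the failing test, the skeleton-building half of getPatternForOptions reduces to the standalone sketch below. It is a simplification, not ChakraCore's code. `buildSkeleton` and `COMPONENT_ORDER` are hypothetical names, and `COMPONENT_ORDER` assumes that `dateTimeComponents` lists components in the ECMA-402 table order (weekday before hour). Option validation by `GetOption` is dropped, and so is the hour-cycle fix-up on the returned pattern. Passing `h12` as the resolved hour cycle (an assumption for en-US) yields the `EEEEh` skeleton that appears in the libicu comparison.

```javascript
// Hypothetical stand-in for ChakraCore's dateTimeComponents: only the ordering matters here.
const COMPONENT_ORDER = ["weekday", "era", "year", "month", "day", "hour", "minute", "second", "timeZoneName"];

// Same symbol and length tables as in getPatternForOptions.
const symbolForOption = {
    weekday: "E", era: "G", year: "y", month: "M", day: "d",
    hour: "j", hour12: "h", hour24: "H",
    minute: "m", second: "s", timeZoneName: "z",
};
const symbolLengthForOption = { numeric: 1, "2-digit": 2, short: 3, long: 4, narrow: 5 };

// Builds a skeleton from already-validated options; hourCycle plays the role of dtf.hourCycle.
function buildSkeleton(options, hour12, hourCycle) {
    let skeleton = "";
    for (const key of COMPONENT_ORDER) {
        const value = options[key];
        if (value === undefined) {
            continue;
        }
        let symbolKey = key;
        if (key === "hour") {
            // Same precedence as the original: an explicit hour12 wins, then hourCycle;
            // with neither, "j" lets ICU choose the locale's preferred hour symbol.
            if (hour12 === true || (hour12 === undefined && (hourCycle === "h11" || hourCycle === "h12"))) {
                symbolKey = "hour12";
            } else if (hour12 === false || (hour12 === undefined && (hourCycle === "h23" || hourCycle === "h24"))) {
                symbolKey = "hour24";
            }
        }
        skeleton += symbolForOption[symbolKey].repeat(symbolLengthForOption[value]);
    }
    return skeleton;
}

console.log(buildSkeleton({ hour: "numeric", weekday: "long" }, undefined, "h12")); // EEEEh
console.log(buildSkeleton({ hour: "numeric", weekday: "long" }, undefined, undefined)); // EEEEj
console.log(buildSkeleton({ hour: "2-digit", minute: "2-digit" }, false, undefined)); // HHmm
```

The skeleton follows component order rather than the order in which the user wrote the options, so `{ hour, weekday }` still produces `EEEE` before `h`. Nothing in the skeleton describes the separator, so whether a comma appears between the weekday and the hour is decided entirely by ICU's pattern data.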

libicu

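The table below shows how the output of udatpg_getBestPattern changed from libicu v77.1 to v78.1 (de-DE is included only for additional context).
Notice the missing "," after the weekday for en-US in v78.1, and that both locales now return EEEE instead of cccc.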

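| version | locale | skeleton | pattern |
| --- | --- | --- | --- |
| v77.1 | en-US | `EEEEh` | `cccc, h/a` |
| v77.1 | en-US | `cccch` | `cccc, h/a` |
| v77.1 | de-DE | `EEEEh` | `cccc, h 'Uhr' a` |
| v77.1 | de-DE | `cccch` | `cccc, h 'Uhr' a` |
| v78.1 | en-US | `EEEEh` | `EEEE h/a` |
| v78.1 | en-US | `cccch` | `EEEE h/a` |
| v78.1 | de-DE | `EEEEh` | `EEEE, h/a` |
| v78.1 | de-DE | `cccch` | `EEEE, h/a` |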
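Only the first literal of these patterns reaches the failing assertion. To make that concrete, the patterns from the table can be split into formatToParts-style parts. The following is a simplified, hypothetical splitter (`splitPattern` and `firstLiteral` are not ChakraCore or ICU APIs). It only knows the symbols that appear in these patterns (E, c, h, a) and treats everything else as literal text. It reads the patterns exactly as written in the table, including the `/`.

```javascript
// Hypothetical, simplified LDML pattern splitter; only knows the symbols used in the table.
const typeForSymbol = new Map([["E", "weekday"], ["c", "weekday"], ["h", "hour"], ["a", "dayPeriod"]]);

function splitPattern(pattern) {
    const parts = [];
    let literal = "";
    const flushLiteral = () => {
        if (literal !== "") {
            parts.push({ type: "literal", value: literal });
            literal = "";
        }
    };
    let i = 0;
    while (i < pattern.length) {
        const ch = pattern[i];
        if (ch === "'") {
            // Quoted text such as 'Uhr' is literal; escaped '' quotes are not handled in this sketch.
            const close = pattern.indexOf("'", i + 1);
            const end = close === -1 ? pattern.length : close;
            literal += pattern.slice(i + 1, end);
            i = end + 1;
        } else if (typeForSymbol.has(ch)) {
            // A run of the same symbol letter is one field, e.g. "cccc" or "EEEE".
            flushLiteral();
            let j = i;
            while (j < pattern.length && pattern[j] === ch) {
                j++;
            }
            parts.push({ type: typeForSymbol.get(ch), value: pattern.slice(i, j) });
            i = j;
        } else {
            literal += ch;
            i++;
        }
    }
    flushLiteral();
    return parts;
}

// The first literal is the one the test compares against ", ".
function firstLiteral(pattern) {
    const part = splitPattern(pattern).find((p) => p.type === "literal");
    return part ? part.value : undefined;
}

for (const pattern of ["cccc, h/a", "cccc, h 'Uhr' a", "EEEE h/a", "EEEE, h/a"]) {
    console.log(JSON.stringify(pattern), "->", JSON.stringify(firstLiteral(pattern)));
}
// "cccc, h/a" -> ", "
// "cccc, h 'Uhr' a" -> ", "
// "EEEE h/a" -> " "
// "EEEE, h/a" -> ", "
```

Only the en-US v78.1 pattern loses the comma in its first literal, which matches the macOS-only failure of the test.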
