
Design document for percent formatting #1068


Open
wants to merge 15 commits into
base: main
from

Conversation

aphillips
Member

This document is focused for now on documenting the options.

@aphillips added the design, functions, and LDML48 labels on Apr 7, 2025

#### Use a dedicated `:percent` function

Use a new function `:percent` dedicated to percentages.
Member

Should we consider names besides :percent?

The function could apply to all dimensionless units including permille, permillion, perbillion, etc.

For example: {$var :dimensionless unit=permillion}

Member Author

We should consider other names. I'll add that option.

I'm not wild about unit=percent (mille, billion, etc. etc.). It's verbose and the other uses seem rare. Really only percent and permille are backed by CLDR data. The others strike me as special uses for unit or number formatting.

Comment on lines 160 to 161
#### Scaling
Implementation always scales the number. The value `0.5` formats as `50%`
Member

If we go with the "always scales" approach, :unit can still be used to not scale.
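
To make the two behaviors concrete, here is a minimal Java sketch built on the JDK's `java.text.NumberFormat`, whose percent instance multiplies its operand by 100. The class and method names (`PercentScalingSketch`, `formatScaled`, `formatUnscaled`) are hypothetical and are not part of MessageFormat or ICU; they only illustrate what "scales" and "does not scale" could mean for a numeric operand.

```java
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.util.Locale;

final class PercentScalingSketch {
    // "Always scales": the operand is a fraction, so 0.5 is shown as 50%.
    static String formatScaled(double fraction, Locale locale) {
        return NumberFormat.getPercentInstance(locale).format(fraction);
    }

    // "Does not scale" (hypothetical helper): the operand is already a
    // percentage, so 50 is shown as 50%. The decimal point is moved left
    // first because the JDK percent formatter will multiply by 100.
    static String formatUnscaled(double percentage, Locale locale) {
        BigDecimal fraction = BigDecimal.valueOf(percentage).movePointLeft(2);
        return NumberFormat.getPercentInstance(locale).format(fraction);
    }

    public static void main(String[] args) {
        System.out.println(formatScaled(0.5, Locale.US));   // 50%
        System.out.println(formatUnscaled(50, Locale.US));  // 50%
    }
}
```

Both calls produce the same text from different operands, which is the crux of the disagreement: the message author has to know which convention the function follows.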

@aphillips requested review from gibson042 and macchiati on April 8, 2025
Collaborator

@eemeli left a comment

A first pass. I don't agree with the currently proposed design, but let's first get the available options better presented.

It would be nice to have more example code for the options.

Comment on lines +199 to +200
Implementation automatically does (or does not) scale.
There is an option to switch to the other behavior.
Collaborator

Suggested change
Implementation automatically does (or does not) scale.
There is an option to switch to the other behavior.
Formatter automatically does (or does not) scale.
There is an option to switch to the other behavior.
The option here may be:
- An option `scaling` with boolean values `true` and `false`.
- An option `scale` with a small set of supported integer values, possibly only `1` and `100`.
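
A rough sketch of the second variant, an integer `scale` option limited to `1` and `100`, might look like the following. Everything here is an assumption for illustration: the class `ScaleOptionSketch`, the method `formatPercent`, and the locale-independent `%` suffix are not taken from the design document or from any existing implementation.

```java
import java.math.BigDecimal;

final class ScaleOptionSketch {
    // Hypothetical formatter: multiplies the operand by the scale and appends
    // a percent sign. Only 1 (no scaling) and 100 (fraction to percent) are
    // accepted; any other value is rejected as a bad option.
    static String formatPercent(BigDecimal operand, int scale) {
        if (scale != 1 && scale != 100) {
            throw new IllegalArgumentException("unsupported scale: " + scale);
        }
        BigDecimal shown = operand.multiply(BigDecimal.valueOf(scale)).stripTrailingZeros();
        // A real formatter would use locale data for digits and the sign;
        // the fixed "%" suffix is a simplification.
        return shown.toPlainString() + "%";
    }

    public static void main(String[] args) {
        System.out.println(formatPercent(new BigDecimal("0.5"), 100)); // 50%
        System.out.println(formatPercent(new BigDecimal("50"), 1));    // 50%
    }
}
```

A boolean `scaling` option would reduce to the same two cases, with `true` standing for `100` and `false` for `1`.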

Comment on lines 92 to 94
- Allow `unit=percent` in `:unit` that is identical to `:percent` in formatting performance,
for compatibility with CLDR units,
but document that this usage is not preferred.
Collaborator

Would this mean that :unit unit=percent would or would not apply scaling? And why do we need or benefit from compatibility with CLDR units here?

Member Author

> why do we need or benefit from compatibility with CLDR units here?

We don't have to be compatible, except that currently the definition of the unit option values is completely delegated to the unit identifiers found here in TR35. It would be unfortunate to say "unit identifiers except this one specific one"

> Would this mean that :unit unit=percent would or would not apply scaling?

That's a good question. In the most recent WG discussion, there was a sentiment that we should make them behave identically to avoid confusion. There's an equal sentiment that they should be opposite each other (for utility). Here I'm trying to express equivalent performance without binding to a specific scaling/not-scaling choice (since that is separate).

Contributor

Currently, my agreement with the proposal is dependent on :unit unit=percent not scaling. So, I see specifying that here one way or another as important.

Member Author

> So, I see specifying that here one way or another as important.

👍 It is imperative that we specify one or the other.

> Currently, my agreement with the proposal is dependent on :unit unit=percent not scaling.

The proposal is to make :percent and :unit unit=percent perform identically, so both would scale by default. Is your opposition to :unit scaling so that message writers could get access to both behaviors without having to use a scale option? Articulating your reasoning will help me improve the design doc to include that as a design we considered (and perhaps sway consensus).

Member

There is a misunderstanding of unit scaling.

For unit formatting, CLDR has both an input unit and an output unit, where the output unit typically depends on the unit preferences. For example, <3.5, meter> input with foot output formats as "11.5 feet" (in English). There is scaling involved, in the conversion of 3.5 to 11.5. If there is no specified output unit, or the output unit is explicitly the same as the input unit, then there is no scaling. Thus:

<3.5, meter> input with meter output doesn't scale.

  • If I supply <0.35, percent> as the input and the output unit were percent, it would format as 0.35%. Just like meter ==> meter doesn't scale.

However, if I supply the right input unit, then percent does scale (just like meter ==> foot). The base unit for such dimensionless units is 'part'.

With <0.35, part> as the input and the output unit of percent, the format is "35%".

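Here are sample conversions that I just generated (no formatting):

| Input value | Input unit | Output value | Output unit |
|---|---|---|---|
| 0.35 | part | 0.35 | part |
| 0.35 | part | 35.0 | percent |
| 0.35 | part | 350.0 | permille |
| 0.35 | part | 3500.0 | permyriad |
| 0.35 | part | 350000.0 | part-per-1e6 |
| 0.35 | part | 3.5E8 | part-per-1e9 |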
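
The conversions above follow from treating each unit as a fixed number of units per `part`. The sketch below reproduces that arithmetic in plain Java; it does not call CLDR or ICU, and the class `PartConversionSketch`, the `PER_PART` table, and the `convert` method are assumptions made for illustration only.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

final class PartConversionSketch {
    // How many of each unit make up one 'part' (hypothetical lookup table).
    static final Map<String, BigDecimal> PER_PART = new LinkedHashMap<>();
    static {
        PER_PART.put("part", BigDecimal.ONE);
        PER_PART.put("percent", new BigDecimal("100"));
        PER_PART.put("permille", new BigDecimal("1000"));
        PER_PART.put("permyriad", new BigDecimal("10000"));
        PER_PART.put("part-per-1e6", new BigDecimal("1000000"));
        PER_PART.put("part-per-1e9", new BigDecimal("1000000000"));
    }

    // Converts a value between two dimensionless units. When the input and
    // output units are the same, the value comes back numerically unchanged.
    static BigDecimal convert(BigDecimal value, String fromUnit, String toUnit) {
        BigDecimal from = PER_PART.get(fromUnit);
        BigDecimal to = PER_PART.get(toUnit);
        if (from == null || to == null) {
            throw new IllegalArgumentException("unknown unit");
        }
        // All factors are powers of ten, so the division is always exact.
        return value.multiply(to).divide(from);
    }

    public static void main(String[] args) {
        BigDecimal input = new BigDecimal("0.35");
        for (String unit : PER_PART.keySet()) {
            System.out.println(input + " part = " + convert(input, "part", unit) + " " + unit);
        }
    }
}
```

With this model, `<0.35, percent>` to percent leaves the number alone, while `<0.35, part>` to percent yields 35, matching the distinction drawn in the comment.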

Member Author

I understood that to be the case.

:unit can override the unit, in which case scaling occurs. The question is what happens when there is no other unit? Using MeasureFormat in ICU4J can only be an approximation, since the only way to call it is with a Measure object. Presumably a bare number operand in MF would, behind the scenes, be packaged with the unit.

I'm not suggesting that :unit does not convert. Only that the default behavior of unit=percent is unscaled given a numeric operand. This is different from MF1's handling of operand,number,percent formatting and the proposed performance of :percent. Do you disagree?
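
For comparison, the MF1-style `{0,number,percent}` behavior can be observed with the JDK's own `java.text.MessageFormat`, which uses the same pattern syntax. This is a sketch with the JDK class rather than ICU4J, on the assumption that both treat this pattern the same way; the wrapper class `Mf1PercentSketch` and its `format` method are hypothetical.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class Mf1PercentSketch {
    // Formats an MF1-style pattern in a fixed locale so the output is stable.
    static String format(String pattern, Object... args) {
        return new MessageFormat(pattern, Locale.US).format(args);
    }

    public static void main(String[] args) {
        // The percent style scales: 0.35 is rendered as 35%.
        System.out.println(format("Completion: {0,number,percent}.", 0.35));
        // Prints: Completion: 35%.
    }
}
```

A numeric operand of `35` passed to the same pattern would come out as `3,500%`, which is the behavior that an unscaled `unit=percent` would differ from.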

Contributor

@bearfriend, Apr 22, 2025

> Is your opposition to :unit scaling so that message writers could get access to both behaviors without having to use a scale option?

Sorry I wasn't clear on that. Yes, I want them to act differently, so I guess that just isn't this proposal but an alternative. The reason, though, is not just to have access to both behaviors (though it's an excellent side benefit) but because it makes semantic/intuitive sense to me.

I see them as for different purposes, where the input value to :percent is for (or from) some computation which results in a ~number, and the input to :unit would be roughly a string (or semantically equivalent in its static intent, if that makes sense).

1/10 = .1 -> format via :percent -> "You've completed 10% of your tasks"
vs
A user inputs into a marketing tool a discount value of "10" and selects "%" (as opposed to "$", "lbs", "items" etc.), and that uses :unit to render things like "10% off", "$10 off", "Get 10 lbs free...", "Buy 10 get 1 free" or similar.

This is how I, presumptuously, think most people would expect each to work. Happy to be wrong about that, though.

I'm not terribly familiar with the input -> output scaling Mark mentioned, so I'll try to digest that a bit more and see if it changes my perspective. It doesn't initially seem problematic, though.

Member Author

@bearfriend I think those are both great use cases and will add them to the document.

Note that the "proposed solution" is a strawman. The alternatives considered are what is important. We'll see if a consensus emerges--or vote on which technical decisions to make.

Member Author

@macchiati

> There is a misunderstanding of unit scaling.

I added your example (suitably edited and expanded) at length to the document. Check for veracity.

Comment on lines +191 to +192
- Only percent and permille are backed with CLDR data and symbols.
Other scales would impose an implementation burden.
Member

CLDR has data for other scales, too, via portion units.

Member Author

That's not really percent/per mille type scaling though, is it?

{{Completion: {$n :unit unit=percent}.}}
```

#### Use `:math multiply` to scale
Member

Please note my concern about implementation burden due to having to support a more general function than we actually need.

Member Author

The :math function, currently in draft and originally included to support plural offsets from MF1, is certainly a potential "slippery slope".

Many programming languages have math-related classes or function sets with many different operators in them. The existence of a :math function in MF would certainly invite proposals for many of these to migrate into messages, regardless of utility. This in a specification that is strongly **UN**typed.

If we go down the :math route, I would suggest that we write a full design document, including considerations for what our policies would be about future expansion. We should also consider whether math is the right name, and look at different design strategies, such as unbundling functionality into separate functions. Is it a better imposition of burden to have separate required :add and :subtract functions than a required function that has addition, subtraction, scaling, etc. into which we might add hard-to-achieve functionality? There is also the question of versioning the :math function if we add new operations to it over time, creating a portability hazard.


**Pros**
- Consistent with ICU MessageFormat

Member

@macchiati, Apr 21, 2025

Suggested change
- Widely viewed as a number format option in spreadsheets and other contexts, so many people are familiar with it as a type of number format.
- Consistent with compact number formats, which _also_ scale; e.g., "3.5 M" for 3500000.

Member Author

Perhaps debatable?

It's certainly proximate to numeric formats, at least in some spreadsheets. FWIW, we do group it into the number functions and it certainly takes a numeric operand. But I think a case can be made that :number type=percent or :percent are both intuitive--and the latter becomes maybe a bit more obvious given :currency.

The meta debate we're having is a classic in the I18N space: split or lump? Should we prefer functions that do many things with lots of options? Or should we prefer functions that do roughly one thing with minimal options (and lots and lots of functions)?


Note that the "123" button is "More Formats" in Google sheets:

[image: Google Sheets screenshot]

Excel puts percent after date/time:

[image: Excel screenshot]

Member

A fair number of the other Pros/Cons are debatable... But I'll tweak my suggested change.

I'm not wild about :percent as a separate function; nor wild about :scientific or :engineering or :compact or even :integer. Just the sheer volume of duplicated options gets to be very daunting.

Member Author

> Just the sheer volume of duplicated options gets to be very daunting.

The worst of all worlds would be lots of functions, each of which has lots of options, and where some functions are general purpose and overlap with special purpose ones. With support for custom functions, that will sometimes be unavoidable. But for the default function set we should have a clear policy/design philosophy. The meta debate is, in many ways, more important than the concrete decision of what to name the percent formatting function (but percent is as good a trial horse, I think, as we're going to get). Note that the discussion about semantic skeletons is also considering the problem of function packaging.

Member

macchiati commented Apr 21, 2025 via email

Collaborator

eemeli commented Apr 22, 2025

I just realised that this whole discussion is also related to #1015 (review), which we probably ought to address as well.

In other words, as we currently don't have :number notation, we probably ought to figure out how we're going to do its style of scaling as well.

Comment on lines +75 to +78
In MF, a bare number literal, such as `.local $foo = {35}`
or an implementation-specific number type (such as an `int` in Java)
might be considered to use the input unit of `part`
unless we specified that the `percent` unit value or `:percent` function overrode the `part` unit with `percent`.
Collaborator

Does MeasureFormat or any other unit formatter implementation ever make such an assumption, of having a numerical input value not match the formatter's output units?

Member Author

AFAICT, it's not possible in MeasureFormat to format a number without a unit... and "part" is not a unit. MeasureFormat's Javadoc says that it doesn't do conversions. But then I shouldn't have been looking at that class, but rather NumberFormatter. That class implements the interface that @macchiati describes, including conversion. And it can be fed a number whose unit is "nothing":

        LocalizedNumberFormatter nf = NumberFormatter.withLocale(Locale.getDefault())
                .unit(MeasureUnit.PERCENT);
        System.out.println(nf.format(5.0));

This doesn't scale, though. It produces 5%.

Collaborator

I don't know about ICU, but CLDR does appear to treat part as a unit.

Member Author

Not exactly. There are various measures (such as parts-per-million, which is a flavor of concentr-part [concentration]) that have "parts", but not a measurement part all by itself. ICU represents the various measures in MeasureUnit and it doesn't have a standalone PART member.

Which is beside the point. The percent format in NumberFormatter is unscaled when working on a raw number.

Labels
design (Design document or issues related to design), functions (Issue pertains to the default function set), LDML48 (LDML48 Release)

9 participants