Design document for percent formatting #1068

Open: wants to merge 15 commits into base: main

363 changes: 363 additions & 0 deletions exploration/percent-format.md
@@ -0,0 +1,363 @@
# Formatting Percent Values

Status: **Proposed**

<details>
<summary>Metadata</summary>
<dl>
<dt>Contributors</dt>
<dd>@aphillips</dd>
<dt>First proposed</dt>
<dd>2025-04-07</dd>
<dt>Pull Requests</dt>
<dd>#1068</dd>
</dl>
</details>

## Objective

_What is this proposal trying to achieve?_

One of the capabilities present in ICU MessageFormat is the ability to format a number as a percentage.
This design enumerates the approaches considered for adding this ability as a _default function_
in Unicode MessageFormat.

## Background

_What context is helpful to understand this proposal?_

> [!NOTE]
> This design is an outgrowth of discussions in #956 and various teleconferences.

Developers and translators often need to insert a numeric value into a formatted message as a percentage.
The format of a percentage can vary by locale, including
the symbol used,
the presence or absence of spaces,
the shaping of digits,
the position of the symbol,
and other variations.

One of the key problems is whether the value should be "scaled".
That is, does the value `0.5` format as `50%` or `0.5%`?
Developers need to know which behavior will occur so that they can adjust the value passed appropriately.

> [!NOTE]
> In ICU4J:
> - MessageFormat (MF1) scales.
> - MeasureFormat does not scale.
>
> In JavaScript:
> - `Intl.NumberFormat(locale, { style: 'percent' })` scales
> - `Intl.NumberFormat(locale, { style: 'unit', unit: 'percent' })` does not scale

It is also possible for Unicode MessageFormat to provide support for scaling in the message itself,
perhaps by extending the `:math` function.
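
The two JavaScript behaviors listed in the note above can be checked directly with the built-in `Intl.NumberFormat`. This is a small sketch rather than part of the proposal; the `en-US` locale and the sample value are arbitrary choices.

```typescript
// Compare the two built-in JavaScript percent formatters for the same input.
const value = 0.5;

// style: "percent" multiplies the value by 100 before formatting.
const scaled = new Intl.NumberFormat("en-US", { style: "percent" }).format(value);

// style: "unit" with unit: "percent" formats the value as given.
const unscaled = new Intl.NumberFormat("en-US", {
  style: "unit",
  unit: "percent",
}).format(value);

console.log(scaled);   // "50%"
console.log(unscaled); // "0.5%"
```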

An additional concern is whether to add a dedicated `:percent` function,
use one of the existing number-formatting functions `:number` and `:integer` with an option `style=percent`,
or use the proposed _optional_ function `:unit` with an option `unit=percent`.
Combinations of these approaches might also be used.

### Unit Scaling

This section describes the scaling behavior of ICU's `NumberFormatter` class and its `unit()` method,
which is one model for how Unicode MessageFormat might implement percents and units.
There is a difference between _input_ scaling and _output_ scaling in ICU's `NumberFormatter`.

For example, an input of <3.5, `meter`> with `meter` as the output unit doesn't scale.

If one supplies <0.35, `percent`> as the input and the output unit is `percent`,
`MeasureFormat` formats the value as 0.35%,
just as `meter` ==> `meter` doesn't scale.

However, if one supplies a different input unit, then percent does scale
(just like `meter` ==> `foot`).
The base unit for such dimensionless units is called 'part'.
In MF, a bare number literal, such as `.local $foo = {35}`
or an implementation-specific number type (such as an `int` in Java)
might be considered to use the input unit of `part`
unless we specified that the `percent` unit value or `:percent` function overrode the `part` unit with `percent`.

Comment on lines +76 to +79
Collaborator

Does MeasureFormat or any other unit formatter implementation ever make such an assumption, of having a numerical input value not match the formatter's output units?

Member Author

AFAICT, it's not possible in MeasureFormat to format a number without a unit... and "part" is not a unit. MeasureFormat's Javadoc says that it doesn't do conversions. But then I shouldn't have been looking at that class, but rather NumberFormatter. That class implements the interface that @macchiati describes, including conversion. And it can be fed a number whose unit is "nothing":

        LocalizedNumberFormatter nf = NumberFormatter.withLocale(Locale.getDefault())
                .unit(MeasureUnit.PERCENT);
        System.out.println(nf.format(5.0));

This doesn't scale though. It produces 5%

Collaborator

I don't know about ICU, but CLDR does appear to treat part as a unit.

Member Author

Not exactly. There are various measures (such as parts-per-million, which is a flavor of concentr-part [concentration]) that have "parts", but not a measurement part all by itself. ICU represents the various measures in MeasureUnit and it doesn't have a standalone PART member.

Which is beside the point. The percent format in NumberFormatter is unscaled when working on a raw number.


With <0.35 `part`> as the input and the output unit of `percent`, the format is "35%".

| Amount | Input Unit | Formatted Value | Output Unit |
|---|---|---|---|
| 0.35 | part | 0.35 | part |
| 0.35 | part | 35.0 | percent |
| 0.35 | part | 350.0 | permille |
| 0.35 | part | 3500.0 | permyriad |
| 0.35 | part | 350000.0 | part-per-1e6 |
| 0.35 | part | 3.5E8 | part-per-1e9 |
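
The arithmetic behind the table can be sketched as a multiplication by a per-unit factor. This is only an illustration: `FACTOR_FROM_PART` and `convertFromPart` are hypothetical names, not ICU or CLDR APIs, and rounding to 12 significant digits is an assumption made here to hide binary floating-point noise.

```typescript
// Hypothetical conversion factors from the dimensionless base unit `part`.
const FACTOR_FROM_PART: Record<string, number> = {
  "part": 1,
  "percent": 1e2,
  "permille": 1e3,
  "permyriad": 1e4,
  "part-per-1e6": 1e6,
  "part-per-1e9": 1e9,
};

// Convert an amount expressed in `part` to the requested output unit.
function convertFromPart(amount: number, outputUnit: string): number {
  const factor = FACTOR_FROM_PART[outputUnit];
  if (factor === undefined) {
    throw new RangeError(`Unsupported output unit: ${outputUnit}`);
  }
  // Round to 12 significant digits to suppress floating-point artifacts.
  return Number((amount * factor).toPrecision(12));
}

// Print the converted value of 0.35 part for every unit in the table.
for (const unit of Object.keys(FACTOR_FROM_PART)) {
  console.log(unit, convertFromPart(0.35, unit));
}
```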

## Use-Cases

_What use-cases do we see? Ideally, quote concrete examples._

Developers wish to write messages that format a numeric value as a percentage in a locale-sensitive manner.

The numeric value of the operand is not pre-scaled because it is the result of a computation,
e.g. `var savings = discount / price`.

The numeric value of the operand is pre-scaled,
e.g. `var savingsPercent = 50`.

Users need control over most formatting details, identical to general number formatting:
- negative number sign display
- digit shaping
- minimum number of fractional digits
- maximum number of fractional digits
- minimum number of decimal digits
- grouping used (for very large percentages, i.e. > 999%)
- etc.
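
As a rough illustration of this kind of control, JavaScript's `Intl.NumberFormat` already exposes comparable options for its scaling `percent` style. The mapping from the list above to these particular option names is an assumption for illustration; Unicode MessageFormat option names may differ.

```typescript
// Percent formatting with explicit control over sign, fraction digits, and grouping.
const percentFormat = new Intl.NumberFormat("en-US", {
  style: "percent",
  signDisplay: "exceptZero", // show a sign on positive and negative values
  minimumFractionDigits: 1,
  maximumFractionDigits: 2,
  useGrouping: true,         // group digits in large percentages
});

console.log(percentFormat.format(0.1234)); // "+12.34%"
console.log(percentFormat.format(-0.05));  // "-5.0%"
console.log(percentFormat.format(12.5));   // "+1,250.0%"
```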

## Requirements

_What properties does the solution have to manifest to enable the use-cases above?_

- **Be consistent**
  - Any solution for scaling percentages should be a model for other, similar scaling operations,
    such as _per-mille_ or _per-myriad_,
    as well as other, non-percent or even non-unit scaling.
    This does not mean that a scaling mechanism, or any particular scaling mechanism, is itself a requirement.
  - Any solution for formatting percentages should be a model for solving related problems with:
    - per-mille
    - per-myriad
    - compact notation
    - scientific notation
    - (others??)

## Constraints

_What prior decisions and existing conditions limit the possible design?_

## Proposed Design

_Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._

- Use a dedicated function `:percent` that scales by default.
- Provide an option `scaling` with values `true` and `false` and defaulting to `true`.
- Provide the same options as `:number`, _except_ that `select` does not support the `ordinal` value.
- Allow `unit=percent` in `:unit` that is identical to `:percent` in formatting capabilities,
for compatibility with CLDR units,
but document that this usage is not preferred.
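
The proposed behavior can be modeled outside MessageFormat as follows. This is a sketch under stated assumptions, not an implementation of the specification: `formatPercent` and `PercentOptions` are hypothetical names, and the sketch delegates to `Intl.NumberFormat`, using the scaling `percent` style when `scaling` is `true` and the non-scaling `percent` unit otherwise.

```typescript
// Hypothetical model of the proposed `:percent` function; not a real MessageFormat API.
interface PercentOptions {
  scaling?: boolean; // assumed to default to true, as in the proposal above
}

function formatPercent(
  value: number,
  locale: string,
  options: PercentOptions = {},
): string {
  const scaling = options.scaling ?? true;
  const formatter = scaling
    ? new Intl.NumberFormat(locale, { style: "percent" })                 // multiplies by 100
    : new Intl.NumberFormat(locale, { style: "unit", unit: "percent" }); // formats as given
  return formatter.format(value);
}

console.log(formatPercent(0.5, "en-US"));                    // "50%"
console.log(formatPercent(50, "en-US", { scaling: false })); // "50%"
console.log(formatPercent(50, "en-US"));                     // "5,000%"
```

The last call shows the hazard discussed in this document: passing a pre-scaled value to a formatter that scales by default produces a result that is off by a factor of 100.
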

## Alternatives Considered

_What other solutions are available?_
_How do they compare against the requirements?_
_What other properties they have?_

### Combinations of Functions and Scaling

Any proposed design needs to choose one or more functions,
each of which has a scaling approach
(or a combination of approaches).
It is possible to have separate functions, one that is scaling and one that is non-scaling.

Some working group members suspect that having a function that scales and one that does not
would represent a hazard,
since users would be forced to look up which one has which behavior.

Other working group members have expressed that the use cases for pre-scaled vs. non-pre-scaled are separate
and that having separate functions for these is logically sensible.

### Function Alternatives

#### Use `:unit`

Leverage the `:unit` function by using the existing unit option value `percent`.
The ICU implementation of `MeasureFormat` does **_not_** scale the percentage,
although this does not have to be the default behavior of UMF's percent unit format.

```
You saved {$savings :unit unit=percent} on your order today!
```

The `:unit` alternative could also support other unit-like alternatives, such as
_per-mille_ and _per-myriad_ formatting.
It doesn't fit as cleanly with other notational variations left out of v47, such as
compact notation (1000000 => 1M, 1000 => 1K),
or scientific notation (1000000 => 1.0e6).

_Pros_
- Uses an existing set of functionality
- Might provide a more consistent interface for formatting "number-like" values
- Keeps percentage formatting out of `:number` and `:integer`, making those functions more "pure"

_Cons_
- `:unit` won't be REQUIRED, so percentage format will not be guaranteed across implementations.
Requiring `:unit unit=percent` would be complicated at best.
- Implementation of `:unit` in its entirety requires significantly more data than implementation of
percentage formatting.
- More verbose placeholder

---

#### Use `:number`/`:integer` with `style=percent`

Use the existing functions for number formatting with the `style` option set to `percent`.
(This was previously the design.)

```
You saved {$savings :number style=percent} on your order today!
```

_Pros_
- Consistent with ICU MessageFormat

Member (@macchiati, Apr 21, 2025)

Suggested change
- Widely viewed as a number format option in spreadsheets and other contexts, so many people are familiar with it as a type of number format.
- Consistent with compact number formats, which _also_ scale; eg "3.5 M" for 3500000.

Member Author

Perhaps debatable?

It's certainly proximate to numeric formats, at least in some spreadsheets. FWIW, we do group it into the number functions and it certainly takes a numeric operand. But I think a case can be made that :number type=percent or :percent are both intuitive--and the latter becomes maybe a bit more obvious given :currency.

The meta debate we're having is a classic in the I18N space: split or lump? Should we prefer functions that do many things with lots of options? Or should we prefer functions that do roughly one thing with minimal options (and lots and lots of functions)?


Note that the "123" button is "More Formats" in Google sheets:

image

Excel puts percent after date/time:

image

Member

A fair number of the other Pros/Cons are debatable... But I'll tweak my suggested change.

I'm not wild about :percent as a separate function; nor wild about :scientific or :engineering or :compact or even :integer. Just the sheer volume of duplicated options gets to be very daunting.

Member Author

> Just the sheer volume of duplicated options gets to be very daunting.

The worst of all worlds would be lots of functions each of which has lots of options and where some functions are general purpose and overlap with special purpose ones. With support for custom functions, that will sometimes be unavoidable. But for the default function set we should have a clear policy/design philosophy. The meta debate is, in many ways, more important, than the concrete decision of what to name the percent formatting function (but percent is as good a trial horse, I think, as we're going to get). Note that the discussion about semantic skeletons also is considering the problem of function packaging.

_Cons_
- It's the only special case remaining in these functions,
unless we also restore compact, scientific, and other notational variations.

---

#### Use a dedicated `:percent` function

Use a new function `:percent` dedicated to percentages.

Member

Should we consider names besides :percent?

The function could apply to all dimensionless units including permille, permillion, perbillion, etc.

For example: {$var :dimensionless unit=permillion}

Member Author

We should consider other names. I'll add that option.

I'm not wild about unit=percent (mille, billion, etc. etc.). It's verbose and the other uses seem rare. Really only percent and permille are backed by CLDR data. The others strike me as special uses for unit or number formatting.


```
You saved {$savings :percent} on your order today!
```

> [!NOTE]
> @sffc suggested that we should consider other names for `:percent`.
> The name shown here could be considered a placeholder pending other suggestions.

_Pros_
- Least verbose placeholder
- Clear what the placeholder does; self-documenting
- Consistent with separating specialized formats from `:number`/`:integer`
as was done with `:currency`

_Cons_
- Adds to a (growing) list of functions
- Not "special enough" to warrant its own formatter?
- Not analogous to `:currency`: currency formatting depends on currency codes,
which in turn affect default fraction digits and other presentation details.
Nothing like that applies to percents.

---

#### Use a generic scaling function

Use a new function with a more generic name so that it can be used to format other scaled values.
For example, it might use an option `unit` to select `percent`/`permille`/etc.

```
You saved {$savings :dimensionless unit=percent} on your order today!
You saved {$savings :scaled per=100} on your order today!
```

_Pros_
- Could be used to support non-percent/non-permille scales that might exist in other cultures
- Somewhat generic
- Unlike currency or unit values, "per" units do not have to be stored with the value to prevent loss of fidelity,
since the scaling is done to a plain old number.
This would not apply if the values are not scaled.

_Cons_
- Only percent and permille are backed with CLDR data and symbols.
Other scales would impose an implementation burden.

Comment on lines +259 to +260
Member

CLDR has data for other scales, too, via portion units.

Member Author

That's not really percent/per mille type scaling though, is it?

- More verbose. Might be harder for users to understand and use.

### Scaling Alternatives

#### No Scaling
The user has to scale the number.
The value `0.5` formats as `0.5%`.

> Example.
> ```
> .local $pctSaved = {50}
> {$pctSaved :percent}
> ```
> Prints as `50%`.

#### Always Scale
The implementation always scales the number.
The value `0.5` formats as `50%`.

> Example.
> ```
> .local $pctSaved = {50}
> {$pctSaved :percent}
> ```
> Prints as `5,000%`.

#### Optional Scaling
Implementation automatically does (or does not) scale.
There is an option to switch to the other behavior.

Comment on lines +288 to +289
Collaborator

Suggested change
Formatter automatically does (or does not) scale.
There is an option to switch to the other behavior.
The option here may be:
- An option `scaling` with boolean values `true` and `false`.
- An option `scale` with a small set of supported integer values, possibly only `1` and `100`.


> Example. Note that `scaling=false` is only to demonstrate switching.
>```
> .local $pctSaved = {50}
> {$pctSaved :percent} {$pctSaved :percent scaling=false}
>```
> Prints as `5,000% 50%` if `:percent` is autoscaling by default.

#### Provide scaling via additions to `:math`
Regardless of the scaling done by the percent formatting function,
there might need to be an in-message mechanism for scaling/descaling values.
The (currently DRAFT) function `:math` was added to support offsets in number matching/formatting.
Extension of `:math` to support other mathematical capabilities would allow for scaling.

> Example.
>```
> .local $pctSaved = {0.5}
> .local $pctScaled = {$pctSaved :math exp=2}
> {$pctSaved :percent} {$pctScaled :unit unit=percent}
>```
> Prints as `50% 50%` if `:percent` is autoscaling by default and `:unit` is not.

_Pros_
- Users may find utility in performing math transforms in messages rather than in business logic.
- Should be easy to implement, given that basic math functionality is common

_Cons_
- Implementation burden, especially when providing generic mathematical operations
- Designs should be generic and extensible, not tied to short term needs of a given formatter.
- Potential for abuse and misuse is higher.
- "Real" math utilities or classes tend to have a long list of functions with many capabilities.
A complete implementation would require a lot of design work and effort or introduce
instability into the message regime as new options are introduced over time.
Compare with `java.lang.Math`

Two proposals exist for using `:math`:

##### Use `:math exp` to scale
Provide functionality to scale numbers with integer powers of 10 using the `:math` function.

Examples using `:unit`, each of which would format as "Completion: 50%.":
```
.local $n = {50}
{{Completion: {$n :unit unit=percent}.}}

.local $n = {0.5 :math exp=2}
{{Completion: {$n :unit unit=percent}.}}
```

_Pros_
- Avoids multiplication of random values
- Useful for other scaling operations

_Cons_
- Cannot use _digit size option_ as the `exp` option value type, since negative exponents are a Thing


##### Use `:math multiply` to scale
Provide arbitrary integer multiplication functionality using the `:math` function.

Examples using `:unit`, each of which would format as "Completion: 50%.":
```
.local $n = {50}
{{Completion: {$n :unit unit=percent}.}}

.local $n = {0.5 :math multiply=100}
{{Completion: {$n :unit unit=percent}.}}
```

_Pros_
- Can be used for other general purpose math

_Cons_
- Brings in multiplication
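
The two `:math` proposals can be compared with a small model. This is a sketch only: `mathExp` and `mathMultiply` are hypothetical helper names that mirror the `exp` and `multiply` options shown above, and restricting both options to integers is an assumption drawn from the descriptions in this section.

```typescript
// Hypothetical model of `:math exp=N`: scale by an integer power of 10.
function mathExp(value: number, exp: number): number {
  if (!Number.isInteger(exp)) {
    throw new RangeError("exp must be an integer");
  }
  // For negative exponents, divide by an exact power of ten instead of
  // multiplying by an inexact binary fraction such as 0.01.
  return exp >= 0 ? value * 10 ** exp : value / 10 ** -exp;
}

// Hypothetical model of `:math multiply=N`: multiply by an arbitrary integer.
function mathMultiply(value: number, multiply: number): number {
  if (!Number.isInteger(multiply)) {
    throw new RangeError("multiply must be an integer");
  }
  return value * multiply;
}

console.log(mathExp(0.5, 2));        // 50
console.log(mathMultiply(0.5, 100)); // 50
console.log(mathExp(50, -2));        // 0.5
```

Under these assumptions, both proposals scale `0.5` to `50`, but only `exp` can express the reverse operation (descaling `50` to `0.5`), since an integer-only `multiply` has no value equivalent to dividing by 100.
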