-
-
Notifications
You must be signed in to change notification settings - Fork 137
Improved Italian pluralization with multi-word inflection #279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 2.2.x
Are you sure you want to change the base?
Conversation
a12a49f to
99dfff8
Compare
|
Hi, I have completed the corrections on the PR. |
99dfff8 to
0138449
Compare
|
This would be great for other romance languages like Spanish as well. |
|
@garrettw yes, definitely, but I don't know why the PR hasn't been merged yet even though all the automated tests have passed. |
|
This is not a bugfix, it should target 2.1.x |
|
Yes, I confirm that it is not a fix but an expansion, although it introduces a simplification of the rules for pluralization and singularization of Italian words. |
| // Split the phrase into words and process each one | ||
| $words = preg_split('/([ -])/', $word, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY); | ||
| if ($words === false) { | ||
| return parent::inflect($word); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this fails, it feels like you should rather throw an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d say that as a fallback it’s right for it to handle the string as if it were a single word, so I would leave it like that without raising exceptions, because in Italian there are often compound or multiple words, so if there were any unmanaged cases exceptions would be raised, whereas this way it would still be handled.
| ['fascia', 'fasce'], | ||
| ['fascio', 'fasci'], | ||
| ['targa', 'targhe'], | ||
| // ['carta di credito', 'carte di credito'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are some of these commented out? I can see that some multi-word phrases are inflected correctly, is there an issue with these particular ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it commented out because it's a compound-word case that uses a preposition and we should handle it with a dedicated PR separately since Italian is very complex grammatically. So when there are words like "carta di credito" it should convert to "carte di credito" and, as you can see, the last word stays singular. If I uncommented that test right now the test would fail, but simply because the current multilingual inflector does not handle prepositions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scusa, i miei giorni di scuola italiana erano molti anni fá, quindi non mi ricordo tutte le regole grammaticali. ;)
Is there a pattern we can apply? As much as I appreciate adding an inflector for Italian, I'd like to not merge it while it doesn't support a good part of phrases, as that tends to generate a number of bug reports once the inflector is used. Unfortunately my high school days are long gone, so I forgot most of my Italian grammar rules, but if it always affects <noun> <preposition> <word> for a list of prepositions (best list I could find was "di, a, da, in, con, su, per, tra, fra", not sure if we should add "allo, dalla" and some others) then it might be possible to extract the noun, inflect that as its own word, then reassemble the compound phrase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ciao! Non sapevo avessi studiato italiano! Grande! 💪🏼
I confirm that the current behavior is intentional: "simple" compound words (e.g., compounds without prepositions like “carta intestata” or joined compound words) are already handled by the inflector and the related works; whereas phrases of the type (e.g., "carta di credito") are deliberately left to fallback and therefore commented out in the tests because the current logic does not correctly inflect only the initial part while leaving the rest unchanged.
Italian has morphological/agreement rules and cases (articles, elisions, contractions) that make prepositional cases more complex.
Implementing robust handling requires: recognizing the preposition, isolating the noun to inflect, respecting agreement rules and any elisions/contractions (e.g., "a + il = al", "di + lo = dello") and an extensive list of prepositions/variants.
To avoid regressions and user-facing bugs I have chosen to leave these cases as a fallback until a specific and tested solution is introduced.
But for now the inflection rules have been simplified at the core of Italian grammar and everything works as it should, except for possible incorrect inflections caused by words with irregular pluralization/singularization forms, which will be added gradually because there is no list of all the irregular words of the Italian language from which we can draw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be willing to take a stab at figuring out whether the inflector is able to handle these prepositional cases? I haven't dug into the internals of the inflector, so I'm not sure if it's possible to fetch and handle the full phrase. Doesn't have to be in this PR, it's just about figuring out whether it's actually possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it’s absolutely possible, and a solution could certainly be found. In the end a very simple rule might be to make the word before the preposition plural, while leaving the word after it singular. This alone would cover 90% of cases. Ex. "carta di credito" will become "carte di credito". As you can see, the last word "credito" it remains to singular form.
Then now that I think about it, it should work the same way in English: if I want to go from the singular "carta di credito" to the plural "carte di credito", it should become "credit card" which becomes "credit cards" and not "credits cards". Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is correct. Good to know -- with that in mind, I'd be fine merging this and then taking care of prepositional cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be awesome
|
|
||
| return implode('', $result); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class should come with tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have implemented the tests for the MultiWordInflector class.
35b4de3 to
2e23870
Compare
In Italian, unlike English, when forming the plural of a combination of words, often all the words need to be pluralized, not just the last one. For example, "libro giallo" (yellow book) becomes "libri gialli" (yellow books) so both the noun and the adjective change.
To handle this peculiarity, I created a new class called MultiWordInflector that uses and extends the standard WordInflector class to correctly transform each word in a phrase into plural or singular.
I had to modify some parts of the word inflection system. In particular, the RulesetInflector class was changed into an abstract class based on a generic class, so that a new version specific to Italian could be created without altering the original one. I also updated the class factory to allow choosing which version of RulesetInflector to use. Finally, I modified the Italian inflection class to use this new configuration and thus allow correct plural handling in multi-word phrases.