docs(lorem): define allowed words #2885

Open
wants to merge 2 commits into
base: next
from

Conversation

xDivisionByZerox
Member

Description

Document the expected words for a locale in the lorem module.

Related to

This is the first part of #2884.

@xDivisionByZerox xDivisionByZerox added c: docs Improvements or additions to documentation p: 1-normal Nothing urgent m: lorem Something is referring to the lorem module labels May 8, 2024
@xDivisionByZerox xDivisionByZerox added this to the v9.0 milestone May 8, 2024
@xDivisionByZerox xDivisionByZerox requested a review from a team May 8, 2024 10:19
@xDivisionByZerox xDivisionByZerox self-assigned this May 8, 2024
@xDivisionByZerox xDivisionByZerox requested a review from a team as a code owner May 8, 2024 10:19

netlify bot commented May 8, 2024

Deploy Preview for fakerjs ready!

Name Link
🔨 Latest commit 79e8529
🔍 Latest deploy log https://app.netlify.com/sites/fakerjs/deploys/665995697eeb600007b0a2fd
😎 Deploy Preview https://deploy-preview-2885.fakerjs.dev

@xDivisionByZerox xDivisionByZerox linked an issue May 8, 2024 that may be closed by this pull request

codecov bot commented May 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.96%. Comparing base (a082ed2) to head (79e8529).

Additional details and impacted files
@@           Coverage Diff            @@
##             next    #2885    +/-   ##
========================================
  Coverage   99.95%   99.96%            
========================================
  Files        2986     2986            
  Lines      215926   215929     +3     
  Branches      598      950   +352     
========================================
+ Hits       215839   215855    +16     
+ Misses         87       74    -13     
Files Coverage Δ
src/modules/lorem/index.ts 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes

Member

@ST-DDT ST-DDT left a comment

I thought about using normal words if they don't use Latin. 🤔

@xDivisionByZerox
Member Author

I thought about using normal words if they don't use Latin. 🤔

Oh wow, then I completely misunderstood what we discussed in the team meeting when we made the decision. Discuss again in tomorrow's meeting?

@matthewmayer
Contributor

We should survey what current non-Latin locales actually do and document that (e.g., are they nonsense words, real words, or transliterations of Latin lorem?).

@ST-DDT ST-DDT added the s: needs decision Needs team/maintainer decision label May 8, 2024
@matthewmayer
Contributor

matthewmayer commented May 9, 2024

25 locales currently have a lorem/words.ts file. For each, I generated 5 sample words:

Latin

| code | script | words(5) | notes |
| --- | --- | --- | --- |
| cs_CZ | Latn | quasi neque quasi delectus minima | standard Latin lorem |
| de | Latn | excepturi inventore nihil eveniet velit | standard Latin lorem |
| en | Latn | crur capillus denique veritas audacia | standard Latin lorem |
| fr | Latn | aliquid vitae accusamus suscipit est | standard Latin lorem |
| fr_CH | Latn | iure ratione dicta voluptas illo | standard Latin lorem |
| nl | Latn | veritatis quibusdam maxime magnam possimus | standard Latin lorem |
| pl | Latn | accusamus eaque deleniti quam distinctio | standard Latin lorem |
| pt_BR | Latn | vitae aliquid temporibus laudantium nam | standard Latin lorem |
| sk | Latn | quidem possimus corrupti odio voluptate | standard Latin lorem |
| tr | Latn | optio natus quis aspernatur molestias | standard Latin lorem |
| uz_UZ_latin | Latn | tutamen ullam magni auctor delectatio | standard Latin lorem |
| en_BORK | Latn | thees lebureeuoos gesh ooccoor injuy | real English words with spelling modifications |
| lv | Latn | māxīmē vulnēro xīphīās soļ āēgrotātīo | standard Latin lorem with extra diacritics |
| es_MX | Latn | Fichero Incorpóreo Basurear Engarbarse Gendarme | random real and nonsense Spanish words |
| vi | Latn | yêu bè vàng ngọt độc | random real Vietnamese words |

Non-Latin

(I'm not really enough of an expert in non-Latin languages to tell whether these are nonsense or real words.)

| code | script | words(5) | notes |
| --- | --- | --- | --- |
| ar | Arab | الذات اصرخ ليونة أتذكر فشيأ | |
| dv | Thaa | އިންގިލާބެއް އެންމެ ތަރައްގީ މުޅިން ގެއްލުންނުވާ | |
| el | Grek | nihil similique laudantium aliquid qui | standard Latin lorem (perhaps should be changed) |
| fa | Arab | تمام به پایان بلکه ستون داشت | |
| he | Hebr | דולור תוק לפתיעם רוגצה קלאצי | Transliteration mimicking lorem ipsum in Hebrew |
| hy | Armn | աշխարհում բոլորն մեկ հասած իրենց | |
| ja | Jpan | 色々 独裁 錠 めいがら たて | |
| ko | Kore | 형에 확정될 자유를 정한다. 범하고 | |
| ru | Cyrl | направлений модели внедрения профессионального играет | Random real words |
| ur | Arab | چاسدسد چسد ساسدبھ اسدفگبطاسدفد اسداسدھدسبابگ ابنسد | |
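
A survey like this can be partly automated by checking which script the generated words are actually written in. The following is only a rough sketch: `dominantScript` and the `SCRIPTS` list are hypothetical helpers, not part of Faker, and they rely on the Unicode `Script` property as exposed by JavaScript regular expressions. Note that it reports component scripts such as `Hang` or `Hira`, not the composite ISO 15924 codes `Kore` and `Jpan` used in the tables.

```typescript
// Hypothetical helper, not part of Faker: guesses the dominant script of a
// sample by counting how many characters belong to each Unicode script.
// Pairs are [ISO 15924 code, Unicode Script property value].
const SCRIPTS: ReadonlyArray<[string, string]> = [
  ['Latn', 'Latin'],
  ['Cyrl', 'Cyrillic'],
  ['Grek', 'Greek'],
  ['Arab', 'Arabic'],
  ['Hebr', 'Hebrew'],
  ['Armn', 'Armenian'],
  ['Thaa', 'Thaana'],
  ['Hang', 'Hangul'],
  ['Hani', 'Han'],
  ['Hira', 'Hiragana'],
  ['Kana', 'Katakana'],
];

// Built with the RegExp constructor so the \p{...} escapes are resolved at runtime.
const MATCHERS = SCRIPTS.map(([code, name]) => ({
  code,
  regex: new RegExp('\\p{Script=' + name + '}', 'gu'),
}));

function dominantScript(sample: string): string | undefined {
  let best: string | undefined;
  let bestCount = 0;
  for (const { code, regex } of MATCHERS) {
    // match() with a global regex returns every match, or null if there is none.
    const count = (sample.match(regex) || []).length;
    if (count > bestCount) {
      best = code;
      bestCount = count;
    }
  }
  // undefined means no character of any listed script was found.
  return best;
}
```

Applied to the `el` sample above, this reports `Latn` even though the locale's script is `Grek`, which is the kind of mismatch the notes column calls out. It cannot tell real words from nonsense words; that still needs a native speaker.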

Contributor

@matthewmayer matthewmayer left a comment

Suggested rewrite. In some ways the concept of "lorem ipsum" is unique to Latin-based languages, because you can write words using Ancient Latin, a dead language, that are "foreign" and yet immediately obvious as "words" in most Latin-based languages.

So I'm not sure there's a definite "best" way to handle lorem ipsum in non-Latin languages. In some languages it may make sense to transliterate Latin words into the script, like "l-o-r-e-m"; in other cases you may just want to use random words or characters.

@@ -2,6 +2,9 @@ import type { LocaleEntry } from './definitions';

/**
* The possible definitions related to lorem texts.
*
* The words in this module are determined by the ISO 15924 script of the locale.
* If a locale uses the Latin script, it will utilize Latin lorem words, while a locale using the Cyrillic script will use Cyrillic lorem words, and so forth.
Contributor

Suggested change
- * If a locale uses the Latin script, it will utilize Latin lorem words, while a locale using the Cyrillic script will use Cyrillic lorem words, and so forth.
+ * If a locale uses the Latin script, it should generally utilize Latin "lorem ipsum" words, while a locale using another script should use real or nonsense words to give the same effect as Latin lorem text.

@ST-DDT
Member

ST-DDT commented May 23, 2024

Team Proposal

  • The lorem module will always return the Latin lorem words/sentences.
  • The lorem word definitions will be inlined into the module and removed from the locale data
  • We will add replacement methods to the word module, that generate sentences/paragraphs with words of that locale

@matthewmayer
Contributor

Would it make more sense to move the lorem definitions to the base locale?

@Shinigami92
Member

Team Proposal

  • The lorem module will always return the Latin lorem words/sentences.
  • The lorem word definitions will be inlined into the module and removed from the locale data
  • We will add replacement methods to the word module, that generate sentences/paragraphs with words of that locale

I was not available at the last team meeting, and I'm missing the reasons 👀
Why will lorem always return latin?

@ST-DDT
Member

ST-DDT commented May 26, 2024

Why will lorem always return latin?

Because all locales that use Latin characters use Latin anyway, and those that don't use normal words instead and are sometimes even incompatible with the Latin sentence structure.

@ST-DDT
Member

ST-DDT commented May 30, 2024

Team Task

Everybody should make their suggestions for the lorem module definitions and expectations.
We will discuss the proposals/expectations in the next team meeting.

@Shinigami92
Member

Team Task

Everybody should make their suggestions for the lorem module definitions and expectations. We will discuss the proposals/expectations in the next team meeting.

My expectation:

If we do not at least change the behavior of lorem, there should be, e.g., a word/lorem.blindtext that generates randomized but localized placeholder texts (wiki:de:Blindtext, wiki:en:Filler_text).
The name and module are up for discussion.

@ST-DDT
Member

ST-DDT commented May 31, 2024

@Shinigami92 Could you please elaborate on what you would consider the defining difference between the word module and the lorem module?

@Shinigami92
Member

@Shinigami92 Could you please elaborate on what you would consider the defining difference between the word module and the lorem module?

Personally, I would say word is for generating words like nouns, verbs and so on, while lorem is more for placeholder texts, for example to test responsive table cells in a frontend.

However, I have not read our docs yet, and both modules date from long before I joined the project, so I wouldn't like to be made responsible for any historical decisions.
Instead, I would rather find a good way forward than look into the past.
So I don't care whether it is called lorem, word or anything else, as long as the functionality is provided.

@ST-DDT
Member

ST-DDT commented Jun 1, 2024

I wouldn't like to be made responsible for any historical decisions.

That was not my intention. I'm sorry.


I spent quite some time thinking about this. The following represents my personal opinion.
I'll split the answer into multiple comments to make it easier to react to them using emojis.
I hope the reactions (and other answers) help us all determine where we are on the same page and where we are not.

@ST-DDT
Member

ST-DDT commented Jun 1, 2024

For me, lorem (module) is a specific type of blind text that you are not supposed to read/be able to understand.

@ST-DDT
Member

ST-DDT commented Jun 1, 2024

If we define lorem to be Latin, then those locales that currently return locale-specific words would be impacted.
If we define lorem to be locale-specific, then the locales that currently use Latin would lose their original intent of creating un-understandable text.
If we do not define lorem, then the locales that use Latin are unable to generate (pseudo-)understandable blind texts unless we add new methods for that.

@ST-DDT
Member

ST-DDT commented Jun 1, 2024

If we add the explicit concept of (pseudo-)understandable blind text, then it is likely that it will require the same or at least similar methods to the current lorem module, so that you are able to generate a string matching your length requirements.
These would cause conflicts with the existing methods and would need some form of disambiguation.
Either by prefixing them with blind e.g. blindText, blindSentence, or moving them to a different new module.

@matthewmayer
Contributor

I think the real question is: does "lorem" mean general blind text or specifically the Latin lorem ipsum text?

Does the idea of "Chinese lorem" or "Thai lorem" or "Hebrew lorem" make sense?

@ST-DDT
Member

ST-DDT commented Jun 1, 2024

Does the idea of "Chinese lorem" or "Thai lorem" or "Hebrew lorem" make sense?

Important question.
For me, Chinese lorem is just a blind text.
If it doesn't use Latin(-like) characters, including translations, it is a blind text but not lorem.
If it is a "phonetic translation", then it is just gibberish using the locale's characters, no better than faker.string.fromCharacters.
The closest non-Latin-character-based equivalent I can think of is the Japanese DoReMi spelling training thing(?), because I assume their brain turns off as soon as it recognizes the "intro". Which kind of leads me to "we should probably start our lorem text with lorem ipsum", but that is a different feature request altogether.

@matthewmayer Could you please share your opinion on any of these questions? Ultimately we need answers/shared opinions to form any kind of consensus.

@matthewmayer
Contributor

I don't really know. I think we need to try and involve some native speakers of non-Latin languages.

@ST-DDT
Member

ST-DDT commented Jun 6, 2024

Team Proposal

  • We want the lorem module to consist only of lorem ipsum (Latin)
  • We want to add a new text (or similar) module that generates blind texts in the current locale
    • We are not sure yet whether these will be hard-coded pseudo-realistic sentences or wild combinations of words that may or may not follow the normal syntax of the language (e.g. firstName drives through city vs noun verb adjective vs word word word)
    • The module should roughly mimic the methods in the lorem module
    • The lorem module and the blind text module should have links (at each method) between each other to help with discovery
  • The actual redefinition of the lorem module is planned for v10, not now (after we have the blind-text module)
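
To make the "word word word" variant of the proposal more concrete, here is a minimal sketch of what a blind-text generator that mimics the lorem methods might look like. Everything in it is an assumption for illustration: `createBlindText`, the `BlindText` interface, the `random` parameter and the `terminator` option are hypothetical names, not an existing or planned Faker API, and the word list stands in for whatever locale data the module would use.

```typescript
// Hypothetical sketch only; none of these names exist in Faker.
type Random = () => number; // expected to return a float in [0, 1)

interface BlindText {
  word(): string;
  words(count: number): string;
  sentence(wordCount: number): string;
}

function createBlindText(
  wordList: readonly string[],
  random: Random = Math.random,
  terminator: string = '.' // assumption: some locales need another mark, e.g. '。'
): BlindText {
  if (wordList.length === 0) {
    throw new Error('createBlindText requires at least one word');
  }

  // Picks one word from the locale's word list.
  const word = (): string =>
    wordList[Math.floor(random() * wordList.length)];

  // Joins `count` random words with spaces; no grammar is applied.
  const words = (count: number): string => {
    const picked: string[] = [];
    for (let i = 0; i < count; i++) {
      picked.push(word());
    }
    return picked.join(' ');
  };

  // Capitalizes the first character (a no-op for uncased scripts)
  // and appends the terminator.
  const sentence = (wordCount: number): string => {
    const text = words(wordCount);
    return text.charAt(0).toUpperCase() + text.slice(1) + terminator;
  };

  return { word, words, sentence };
}
```

For example, `createBlindText(['eins', 'zwei', 'drei']).sentence(4)` would produce a four-word German-looking sentence. Joining with spaces and ending with a period are Latin-style conventions, which is the same sentence-structure concern raised earlier in this thread, so a real implementation would likely need per-locale rules.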

@ST-DDT ST-DDT modified the milestones: v9.0, v10.0 Jun 6, 2024
Labels
c: docs Improvements or additions to documentation m: lorem Something is referring to the lorem module p: 1-normal Nothing urgent s: needs decision Needs team/maintainer decision
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve specification of lorem module and definitions
4 participants