From 7c64ba6c4489d83b080a360e0e257a4f7384f17b Mon Sep 17 00:00:00 2001
From: steppi
Date: Thu, 1 Feb 2024 20:05:09 -0500
Subject: [PATCH 1/4] PDEP-14: Publish translations of pandas.pydata.org

---
 .../pdeps/0014-translate-website-content.md | 140 ++++++++++++++++++
 1 file changed, 140 insertions(+)
 create mode 100644 web/pandas/pdeps/0014-translate-website-content.md

diff --git a/web/pandas/pdeps/0014-translate-website-content.md b/web/pandas/pdeps/0014-translate-website-content.md
new file mode 100644
index 0000000000000..d3780392be657
--- /dev/null
+++ b/web/pandas/pdeps/0014-translate-website-content.md
@@ -0,0 +1,140 @@
+# PDEP-14: Publish translations of pandas.pydata.org
+
+- Created: 01 February 2024
+- Status: Under discussion
+- Discussion: [#56301](https://github.com/pandas-dev/pandas/issues/56301)
+- Author: [Albert Steppi](https://github.com/steppi),
+- Revision: 1
+
+## Abstract
+
+The suggestion is to have official translations made for content of the core
+project website [pandas.pydata.org](https://pandas.pydata.org) and provide a
+language drop-down selector on [pandas.pydata.org](https://pandas.pydata.org)
+similar to what currently exists at [numpy.org](https://numpy.org).
+
+
+## Motivation and Scope
+
+Pandas is a foundational package in the Scientific Python ecosystem and there
+are many potential users with no or low English proficiency who would benefit
+from having high quality information about Pandas available in their native
+language.
+
+Translation of all content presents considerable challenge due to its sheer
+volume and due to the tendency for technical documentation to exist in a state
+of flux. The suggestion is to have translations for a targeted subset, selected:
+
+- from things which are relatively stable to reduce the ongoing burden of
+  keeping translations up to date.
+- to maximize the benefit to users and potential users who currently have no or
+  a low level of English proficiency, given the person-hours and resources that
+  are likely to be available now and into the future.
+
+Consideration of what subset of content would be most useful for users with
+no or a low level of English proficiency could be a guiding principal to help
+select what information should be available on the core project website, outside
+of the technical documentation.
+
+## Detailed Description
+
+The following is a list of all pages on the core project website which are sourced
+from markdown files at https://github.com/pandas-dev/pandas/tree/main/web/pandas.
+
+- Landing page: https://pandas.pydata.org
+- About pandas: https://pandas.pydata.org/about
+- Project roadmap: https://pandas.pydata.org/about/roadmap.html
+- Governance: https://pandas.pydata.org/about/governance.html
+- Team: https://pandas.pydata.org/about/team.html
+- Sponsors: https://pandas.pydata.org/about/sponsors.html
+- Citing and logo: https://pandas.pydata.org/about/citing.html
+- Getting started: https://pandas.pydata.org/getting_started.html
+- Code of conduct: https://pandas.pydata.org/community/coc.html
+- Ecosystem: https://pandas.pydata.org/community/ecosystem.html
+- Contribute: https://pandas.pydata.org/contribute.html
+
+Provisionally, the suggestion is for all of this content to be translated with
+the possible exception of the "Project roadmap", which may be of limited
+interest to new users. Currently the "Getting started" section may be of
+limited utility to users unable to engage with the externally linked content. In
+the "Project roadmap" within the subsection labeled "Documentation improvements"
+there is a stated goal to:
+
+*Improve the "Getting Started" documentation, designing and writing learning
+ paths for users different backgrounds (e.g. brand new to programming, familiar
+ with other languages like R, already familiar with Python).*
+
+It is recommended that this goal be accomplished alongside translation work in
+order to make this page more useful to those with no or low English proficiency.
+This would also prevent the need for retranslation if this goal were to be
+accomplished after the original translation work is completed.
+
+A language selection drop-down should be added to the navigation-bar similar to
+what exists at https://numpy.org.
+
+
+## Usage and Impact
+
+The primary impact would be lowering the barrier to entry for non-English
+speakers to get started using Pandas and moving along the path towards learning
+to use it skillfully.
+
+In 2022 it was estimated that there were approximately 400 million native
+speakers of English and between 1.5 - 2 billion people who speak English as a
+second language worldwide
+[Wikipedia](https://web.archive.org/web/20240129080609/https://en.wikipedia.org/wiki/English-speaking_world).
+With an estimated world population of over 8 billion people, this leaves many
+for whom the Pandas core website is not directly accessible. Pandas is an
+important piece of software infrastructure for data manipulation and analysis
+with utility beyond the English speaking world. There is a vast population of
+users and potential users who could benefit from having official information
+about Pandas published in their native language.
+
+Although automated translation tools can help those with no or low English
+proficiency access the content of the Pandas website, these tools often still
+struggle with the technical and jargon-laden language of scientific
+software. This was evinced during the translation of https://numpy.org.
+Automatic translation tools are invaluable as a starting point for human
+translators, but human translators remain important to ensure accuracy.
+
+## Implementation
+
+The bulk of the work for setting up translation infrastructure, finding and
+vetting translators, and working out how to publish translations, will fall
+upon a cross-functional team funded by the [Scientific Python Community & Communications
+Infrastructure grant](https://scientific-python.org/doc/scientific-python-community-and-communications-infrastructure-2022.pdf)
+to work on adding translations for the main websites of all
+[Scientific Python core projects](https://scientific-python.org/specs/core-projects/).
+The goal is to minimize the burden on the core Pandas maintainers.
+
+A GitHub repository should be set up to mirror content from the core webpage
+which is selected for translation. A GitHub action should be set up to keep
+the mirrored repository up-to-date. Either an action within the main Pandas
+repo which pushes updates to the mirror, or a cron in the mirror which polls
+for relevant updates in Pandas repo and pulls them when necessary.
+
+The mirrored repository would then be synced to the Crowdin localization
+management platform as described in
+[Crowdin's documentation](https://support.crowdin.com/github-integration/).
+There would be separate folders within the mirror repository, one for each target
+language, with the content initially untranslated.
+Crowdin would then provide a user interface for translators, and updates
+to translations would be pushed to the branch `l10n_main` on the mirrored
+repository. Periodically, manual pull requests would be made to the main Pandas
+repo, adding translated content within folders alongside of the English content.
+
+Translations will be managed within an enterprise Crowdin organization created for
+Scientific Python localization projects. Access to this organization is
+invite-only, and translators will be vetted to help safe-guard against the
+spamming of low quality or inflammatory translations. Approval from a trusted
+admin would be required before translations are merged into the main Pandas
+repo.
+
+A language drop-down selector will need to be added to the navigation-bar of
+the Pandas website. The plan is for development of a generic solution that
+can be reused for all Scientific Python website translations.
+
+
+### PDEP History
+
+- 01 February 2024: Initial draft

From 38ae16ea03b281a1cb1158ccb9f36a587f7f4cd0 Mon Sep 17 00:00:00 2001
From: steppi
Date: Thu, 1 Feb 2024 20:54:32 -0500
Subject: [PATCH 2/4] PDEP-14: Add pull request to Discussion section

---
 web/pandas/pdeps/0014-translate-website-content.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/web/pandas/pdeps/0014-translate-website-content.md b/web/pandas/pdeps/0014-translate-website-content.md
index d3780392be657..cfe32447b4d5d 100644
--- a/web/pandas/pdeps/0014-translate-website-content.md
+++ b/web/pandas/pdeps/0014-translate-website-content.md
@@ -3,6 +3,7 @@
 - Created: 01 February 2024
 - Status: Under discussion
 - Discussion: [#56301](https://github.com/pandas-dev/pandas/issues/56301)
+  [#57204](https://github.com/pandas-dev/pandas/pull/57204)
 - Author: [Albert Steppi](https://github.com/steppi),
 - Revision: 1
 

From 2610360146c4bc6e7d329d64d0fa311fc48c206e Mon Sep 17 00:00:00 2001
From: steppi
Date: Fri, 2 Feb 2024 22:02:17 -0500
Subject: [PATCH 3/4] PDEP-14: First revision

---
 .../pdeps/0014-translate-website-content.md | 154 +++++-------------
 1 file changed, 41 insertions(+), 113 deletions(-)

diff --git a/web/pandas/pdeps/0014-translate-website-content.md b/web/pandas/pdeps/0014-translate-website-content.md
index cfe32447b4d5d..0b3a9b04a57db 100644
--- a/web/pandas/pdeps/0014-translate-website-content.md
+++ b/web/pandas/pdeps/0014-translate-website-content.md
@@ -5,100 +5,26 @@
 - Discussion: [#56301](https://github.com/pandas-dev/pandas/issues/56301)
   [#57204](https://github.com/pandas-dev/pandas/pull/57204)
 - Author: [Albert Steppi](https://github.com/steppi),
-- Revision: 1
+- Revision: 2
 
 ## Abstract
 
 The suggestion is to have official translations made for content of the core
-project website [pandas.pydata.org](https://pandas.pydata.org) and provide a
-language drop-down selector on [pandas.pydata.org](https://pandas.pydata.org)
-similar to what currently exists at [numpy.org](https://numpy.org).
+project website [pandas.pydata.org](https://pandas.pydata.org) and offer
+a low-friction way for users to access these translations on the core
+project website.
 
+## Motivation, Scope, Usage, and Impact
 
-## Motivation and Scope
+There are many potential users with no or a low level of English proficiency
+who could benefit from quality official translations of the Pandas website
+content. Though translations for all documentation would be valuable,
+producing and maintaining translations for such a large and oft-changing
+collection of text would take an immense and sustained effort which may
+be infeasible. The suggestion is instead to have translations made for only
+a key set of pages from the core project website.
 
-Pandas is a foundational package in the Scientific Python ecosystem and there
-are many potential users with no or low English proficiency who would benefit
-from having high quality information about Pandas available in their native
-language.
-
-Translation of all content presents considerable challenge due to its sheer
-volume and due to the tendency for technical documentation to exist in a state
-of flux. The suggestion is to have translations for a targeted subset, selected:
-
-- from things which are relatively stable to reduce the ongoing burden of
-  keeping translations up to date.
-- to maximize the benefit to users and potential users who currently have no or
-  a low level of English proficiency, given the person-hours and resources that
-  are likely to be available now and into the future.
-
-Consideration of what subset of content would be most useful for users with
-no or a low level of English proficiency could be a guiding principal to help
-select what information should be available on the core project website, outside
-of the technical documentation.
-
-## Detailed Description
-
-The following is a list of all pages on the core project website which are sourced
-from markdown files at https://github.com/pandas-dev/pandas/tree/main/web/pandas.
-
-- Landing page: https://pandas.pydata.org
-- About pandas: https://pandas.pydata.org/about
-- Project roadmap: https://pandas.pydata.org/about/roadmap.html
-- Governance: https://pandas.pydata.org/about/governance.html
-- Team: https://pandas.pydata.org/about/team.html
-- Sponsors: https://pandas.pydata.org/about/sponsors.html
-- Citing and logo: https://pandas.pydata.org/about/citing.html
-- Getting started: https://pandas.pydata.org/getting_started.html
-- Code of conduct: https://pandas.pydata.org/community/coc.html
-- Ecosystem: https://pandas.pydata.org/community/ecosystem.html
-- Contribute: https://pandas.pydata.org/contribute.html
-
-Provisionally, the suggestion is for all of this content to be translated with
-the possible exception of the "Project roadmap", which may be of limited
-interest to new users. Currently the "Getting started" section may be of
-limited utility to users unable to engage with the externally linked content. In
-the "Project roadmap" within the subsection labeled "Documentation improvements"
-there is a stated goal to:
-
-*Improve the "Getting Started" documentation, designing and writing learning
- paths for users different backgrounds (e.g. brand new to programming, familiar
- with other languages like R, already familiar with Python).*
-
-It is recommended that this goal be accomplished alongside translation work in
-order to make this page more useful to those with no or low English proficiency.
-This would also prevent the need for retranslation if this goal were to be
-accomplished after the original translation work is completed.
-
-A language selection drop-down should be added to the navigation-bar similar to
-what exists at https://numpy.org.
-
-
-## Usage and Impact
-
-The primary impact would be lowering the barrier to entry for non-English
-speakers to get started using Pandas and moving along the path towards learning
-to use it skillfully.
-
-In 2022 it was estimated that there were approximately 400 million native
-speakers of English and between 1.5 - 2 billion people who speak English as a
-second language worldwide
-[Wikipedia](https://web.archive.org/web/20240129080609/https://en.wikipedia.org/wiki/English-speaking_world).
-With an estimated world population of over 8 billion people, this leaves many
-for whom the Pandas core website is not directly accessible. Pandas is an
-important piece of software infrastructure for data manipulation and analysis
-with utility beyond the English speaking world. There is a vast population of
-users and potential users who could benefit from having official information
-about Pandas published in their native language.
-
-Although automated translation tools can help those with no or low English
-proficiency access the content of the Pandas website, these tools often still
-struggle with the technical and jargon-laden language of scientific
-software. This was evinced during the translation of https://numpy.org.
-Automatic translation tools are invaluable as a starting point for human
-translators, but human translators remain important to ensure accuracy.
-
-## Implementation
+## Detailed Description and Implementation
 
 The bulk of the work for setting up translation infrastructure, finding and
 vetting translators, and working out how to publish translations, will fall
@@ -106,36 +32,38 @@ upon a cross-functional team funded by the [Scientific Python Community & Commun
 Infrastructure grant](https://scientific-python.org/doc/scientific-python-community-and-communications-infrastructure-2022.pdf)
 to work on adding translations for the main websites of all
 [Scientific Python core projects](https://scientific-python.org/specs/core-projects/).
-The goal is to minimize the burden on the core Pandas maintainers.
-
-A GitHub repository should be set up to mirror content from the core webpage
-which is selected for translation. A GitHub action should be set up to keep
-the mirrored repository up-to-date. Either an action within the main Pandas
-repo which pushes updates to the mirror, or a cron in the mirror which polls
-for relevant updates in Pandas repo and pulls them when necessary.
+The hope is to minimize the burden on the core Pandas maintainers.
 
-The mirrored repository would then be synced to the Crowdin localization
-management platform as described in
+No translated content would be hosted within the Pandas repository itself.
+Instead, a separate GitHub repository could be set up containing the content
+selected for translation. This repository could then be synced to the Crowdin
+localization management platform as described in
 [Crowdin's documentation](https://support.crowdin.com/github-integration/).
-There would be separate folders within the mirror repository, one for each target
-language, with the content initially untranslated.
-Crowdin would then provide a user interface for translators, and updates
-to translations would be pushed to the branch `l10n_main` on the mirrored
-repository. Periodically, manual pull requests would be made to the main Pandas
-repo, adding translated content within folders alongside of the English content.
-
-Translations will be managed within an enterprise Crowdin organization created for
-Scientific Python localization projects. Access to this organization is
-invite-only, and translators will be vetted to help safe-guard against the
-spamming of low quality or inflammatory translations. Approval from a trusted
-admin would be required before translations are merged into the main Pandas
-repo.
-
-A language drop-down selector will need to be added to the navigation-bar of
-the Pandas website. The plan is for development of a generic solution that
-can be reused for all Scientific Python website translations.
+Crowdin would then provide a user interface for translators, and updates to
+translations would be pushed to a feature branch, with completed translations
+periodically merged into `main` after approval by trusted
+language-specific admins working across the Scientific Python core projects
+participating in the translation program. There will be no need for Pandas
+maintainers to verify the quality of translations.
+
+The result would be a repository containing parallel versions of content from
+pandas.pydata.org, translated into various languages. Translated content could
+then be pulled from this repository during generation of the Pandas website. A
+low-friction means of choosing between languages could then be added, such as a
+drop-down language selector similar to what now exists for https://numpy.org, or
+simple links similar to what now exists for https://www.sympy.org/en/index.html.
+A developer supported by the "Scientific Python Community & Communications
+Infrastructure grant" could assist with making the changes necessary for the
+Pandas website to support publication of translations.
+
+If desired, a cron job could be set up on the repository containing translated
+content to check for relevant changes or updates to the Pandas website's content
+and pull them if necessary. Translators could then receive a notification from
+Crowdin that there are new strings to translate. This could help with the
+process of keeping translations up to date.
 
 
 ### PDEP History
 
 - 01 February 2024: Initial draft
+- 02 February 2024: First revision

From eec127c216a4e6740cfc926ba0790eb30b6cd174 Mon Sep 17 00:00:00 2001
From: steppi
Date: Sat, 3 Feb 2024 11:08:46 -0500
Subject: [PATCH 4/4] PDEP-14: Unbump revision number

---
 web/pandas/pdeps/0014-translate-website-content.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/web/pandas/pdeps/0014-translate-website-content.md b/web/pandas/pdeps/0014-translate-website-content.md
index 0b3a9b04a57db..bffc6ae0a8081 100644
--- a/web/pandas/pdeps/0014-translate-website-content.md
+++ b/web/pandas/pdeps/0014-translate-website-content.md
@@ -5,7 +5,7 @@
 - Discussion: [#56301](https://github.com/pandas-dev/pandas/issues/56301)
   [#57204](https://github.com/pandas-dev/pandas/pull/57204)
 - Author: [Albert Steppi](https://github.com/steppi),
-- Revision: 2
+- Revision: 1
 
 ## Abstract
 
@@ -66,4 +66,3 @@ process of keeping translations up to date.
 ### PDEP History
 
 - 01 February 2024: Initial draft
-- 02 February 2024: First revision
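The revised PDEP says translated content could be pulled from the separate translations repository while the pandas website is generated, with a low-friction language switcher on top. One possible build step is sketched below. For a given page and language, it picks which Markdown file to render and falls back to the English source when no translation exists yet. It also builds the per-language URLs that a drop-down or a list of links would need. This is a sketch under stated assumptions, not part of the proposal or of the existing pandas website tooling. The `translations/<lang>/<page>.md` layout, the `/<lang>/` URL prefix, and the function names `resolve_source` and `language_links` are all hypothetical.

```python
# Hypothetical sketch: paths, URL scheme and function names are illustrative
# assumptions, not part of PDEP-14 or of the pandas website generator.
import tempfile
from pathlib import Path


def resolve_source(page: str, lang: str, english_root: Path, translations_root: Path) -> Path:
    """Return the Markdown file to render for `page` in language `lang`.

    Assumed layout: English sources under `english_root` (e.g. web/pandas) and
    translations under `translations_root/<lang>/` with the same relative paths.
    """
    if lang != "en":
        candidate = translations_root / lang / page
        if candidate.is_file():
            return candidate
    # English was requested, or this page has no translation yet: use English.
    return english_root / page


def language_links(page: str, languages: list[str]) -> dict[str, str]:
    """Map each language code to the URL of `page` in that language.

    Assumed URL scheme: English at the site root, other languages under /<lang>/.
    """
    html_path = page.removesuffix(".md") + ".html"
    return {
        lang: ("/" if lang == "en" else f"/{lang}/") + html_path
        for lang in languages
    }


if __name__ == "__main__":
    # Demo: Spanish has a translated About page but no Team page yet.
    with tempfile.TemporaryDirectory() as tmp:
        english_root = Path(tmp, "web", "pandas")
        translations_root = Path(tmp, "translations")
        (english_root / "about").mkdir(parents=True)
        (english_root / "about" / "index.md").write_text("About pandas")
        (english_root / "about" / "team.md").write_text("Team")
        (translations_root / "es" / "about").mkdir(parents=True)
        (translations_root / "es" / "about" / "index.md").write_text("Acerca de pandas")
        for page in ("about/index.md", "about/team.md"):
            source = resolve_source(page, "es", english_root, translations_root)
            print(page, "->", source.relative_to(tmp))
    print(language_links("about/index.md", ["en", "es"]))
```

Falling back to English page by page means a language could be published before every selected page has been translated. Untranslated pages would simply show the English text.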
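The revised PDEP also describes an optional cron job on the translations repository. That job would check for relevant changes to the pandas website content and pull them when necessary. A minimal sketch of the change-detection step follows. It assumes three things:
- the job runs on a schedule, for example from a scheduled GitHub Actions workflow;
- it has a fresh checkout of the pandas repository;
- the translations repository keeps a copy of the English sources in its own folder.

The sketch compares SHA-256 digests of each selected page and copies only the pages that changed. `SELECTED_PAGES`, `file_digest`, `sync_english_sources` and the folder layout are hypothetical. Committing the copied files and the Crowdin sync that would then notify translators are deliberately left out.

```python
# Hypothetical sketch of the optional scheduled sync job; names and layout
# are illustrative assumptions, not part of PDEP-14.
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical selection of pages to translate, relative to web/pandas.
SELECTED_PAGES = ["index.md", "about/index.md", "getting_started.md"]


def file_digest(path: Path) -> Optional[str]:
    """SHA-256 hex digest of a file's bytes, or None if the file is missing."""
    if not path.is_file():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_english_sources(pandas_web: Path, english_copy: Path, pages: list[str]) -> list[str]:
    """Copy selected pages whose content differs upstream; return those pages."""
    changed = []
    for page in pages:
        upstream = pandas_web / page
        local = english_copy / page
        upstream_digest = file_digest(upstream)
        if upstream_digest is None:
            # Page deleted or renamed upstream: leave it for a human to handle.
            continue
        if upstream_digest != file_digest(local):
            local.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(upstream, local)
            changed.append(page)
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pandas_web = Path(tmp, "pandas", "web", "pandas")
        english_copy = Path(tmp, "translations", "en")
        (pandas_web / "about").mkdir(parents=True)
        (pandas_web / "index.md").write_text("Welcome")
        (pandas_web / "about" / "index.md").write_text("About")
        # getting_started.md does not exist in this demo, so it is skipped.
        print(sync_english_sources(pandas_web, english_copy, SELECTED_PAGES))
        # -> ['index.md', 'about/index.md']
        (pandas_web / "index.md").write_text("Welcome, updated")
        print(sync_english_sources(pandas_web, english_copy, SELECTED_PAGES))
        # -> ['index.md']
```

The sketch compares file contents rather than modification times. Git does not preserve file timestamps, so a fresh clone would otherwise make every page look changed.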