ENH: Implement translations infrastructure #61380

goanpeca · 2025-04-30T03:24:20Z

Hello team!

This PR is a proposal for adding the translations infrastructure to the pandas web page.

Following the discussion in #56301, we (a group of folks working on the Scientific Python grant) have been working to set up infrastructure and translate the contents of the pandas web site. As of this moment, we have 100% translations for the pandas website into Spanish and Brazilian Portuguese, with other languages available for translation (depending on volunteer translators).

To build, the command remains the same:

python pandas_web.py pandas/content --target-path build

If you want to check out other related work, please take a look at scipy/scipy.org#617

You can read more about how the translation process works at https://scientific-python-translations.github.io/docs/

What does this PR do?

Downloads and extracts the latest available translations (over 90% completion) from https://github.com/Scientific-Python-Translations/pandas-translations. The setting can be changed here
Adds a language switcher (thanks @melissawm ❤️ 🚀).
Adds a new section to the config to store additional translation information.
Handles site generation for each language.
Keeps everything in the same script.

Supersedes #61220

Demo

cc @mroeschke @datapythonista

datapythonista · 2025-04-30T21:42:42Z

/preview

github-actions · 2025-04-30T21:43:14Z

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/61380/

datapythonista · 2025-04-30T21:47:43Z

Thanks @goanpeca for this. Do you mind adding some more context here? I can't see in https://github.com/Scientific-Python-Translations/pandas-translations much information, like what languages are available, or how to fix a bad translation, which would be useful to know.

Also, in the docs generated from this PR I can't see any language dropdown or anything different from our current docs. What should we expect?

melissawm · 2025-05-01T13:43:35Z

Hi @datapythonista - this is a follow up to #61220, a proof-of-concept CI job to build the website with translations that don't live in this repo. This PR and #61220 are meant to work together and I'm happy to incorporate one into the other once we agree on the general direction and workflow for this.

Let us know if we can answer any other questions. Unfortunately I'm not sure how to get the preview for the other PR, I relied on building locally to test that things were working.

datapythonista · 2025-05-01T22:16:55Z

Sorry, I missed #61220 and the issue discussion.

I don't fully understand what you're doing here, but below I describe how to add translations without adding too much complexity to this repo, which I don't think any core dev would be on board with.

1. You decide on how to generate translations and manage it independently from this repo, and end up with a structure like this with the translated documents:

+ es/
  - index.md
  + about/
    - team.md
    - ...
  - ...
+ pt/
  - index.md
  + about/
    - team.md
    - ...
  - ...

2. In our CI, before calling pandas_web.py you download this directory structure to the web/ directory. No other changes needed, this will create all translated pages.
3. We add a dropdown with the languages to the website (you can add the language list to web/pandas/config.yml).
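
As a rough illustration of the dropdown idea, here is a minimal sketch of how a language list could be turned into per-language links for a page. The `LANGUAGES` list, the `language_links` helper, and the assumption that pages are rendered to `.html` are all hypothetical; the thread only says the list could live in web/pandas/config.yml and that each language sits in its own directory.

```python
from posixpath import join

# Hypothetical language list; the thread only says it could live in
# web/pandas/config.yml. The name and shape here are assumptions.
LANGUAGES = ["en", "es", "pt"]
DEFAULT_LANGUAGE = "en"


def language_links(page, languages=LANGUAGES, default=DEFAULT_LANGUAGE):
    """Map each language code to the URL of `page` in that language.

    The default language lives at the site root; every other language
    lives under a directory named after its code (es/, pt/, ...).
    """
    links = {}
    for code in languages:
        prefix = "" if code == default else code
        links[code] = "/" + join(prefix, page)
    return links
```

A template could then loop over `language_links("about/team.html")` to render one dropdown entry per language.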

I think this makes everyone's life easy, and we get the expected result.

melissawm · 2025-05-02T11:58:18Z

Thanks @datapythonista !

Can you clarify what is missing from #61220 to match your description? That is pretty much what is done in that PR. Maybe this is confusing because we chose to do it in two parts exactly because we wanted to decouple the reorganization of the repo + switcher (in #61220) from the actual translations (this PR).

Happy to follow up with any feedback in the other PR as well. Cheers!

datapythonista · 2025-05-02T13:26:01Z

In #61220 you are moving all the current website pages; that should be undone. You are adding the translated pages to this repo, which we don't want. You are making changes to pandas_web.py, which is not needed based on what I described above.

The only changes in a PR to this pandas repo should be adding a CI step as per step 2, and editing the website template with the language dropdown as per step 3.

melissawm · 2025-05-02T13:45:03Z

I see! I will rework what I have there to match your proposal. Thanks!

datapythonista

Great improvement to the PR, this makes a lot of sense to me.

I'd personally still simplify things more here in two ways. And feel free to disagree, as it's something opinionated.

First, if I understand correctly, you download the translations of the web, and the percentage of the translated content, and then you check for each language if it's translated enough to be published. Personally, I think you should take care of this logic in your repo when generating the tar, not here. First to simplify the code here, and second to avoid downloading translations that are not going to be used.

Just as a suggestion, I wouldn't use this approach, even in your repo. Imagine you publish translations that are at least 90%, and we have Spanish at 100%. Then I add new content that is 11% of the website. And automatically the Spanish translations that are already indexed by search engines, in user bookmarks, in links in blog posts... are deleted from our website. Not great in my opinion, much better to simply get the new content in English and hope it will eventually be translated.
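
The arithmetic behind this concern can be sketched as follows. This is only an illustration of a hard cut-off rule, not the code in the PR; the unit counts are made up, and "11% of the website" is read here as 11% of the site after the new content lands.

```python
def coverage(translated_units, total_units):
    """Fraction of the site content that has a translation."""
    return translated_units / total_units


def is_published(translated_units, total_units, threshold=0.90):
    """Hard cut-off rule: publish only at or above the threshold."""
    return coverage(translated_units, total_units) >= threshold


# Hypothetical numbers: Spanish is fully translated, 89 of 89 units.
before = is_published(89, 89)
# 11 new English units land, so new content is 11% of a 100-unit site
# and Spanish coverage drops to 89%.
after = is_published(89, 100)
```

Under these assumptions `before` is true and `after` is false, so a single content addition would be enough to unpublish a language that was complete a moment earlier.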

Another thing I would do is to extract the tar file as it is downloaded. So the tar file is passed to gzip/tarfile in memory, with the io module, not as a path on disk. With this you can get all the code here in a single short function. Or we could even create a GitHub action in your repo with this, as it's generic, and just use it here. So only the CI step would live in this PR.
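
A minimal sketch of that in-memory approach, using only the standard library, might look like the following. The function names and the idea of passing the URL and target directory as arguments are assumptions for illustration; this is not the code in the PR.

```python
import io
import tarfile
import urllib.request


def extract_tar_bytes(data, target_dir):
    """Extract a gzipped tar archive held in memory into target_dir."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        # Use the safer "data" extraction filter where this Python has it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(target_dir, **kwargs)


def download_translations(url, target_dir):
    """Download the archive at url and unpack it without a temp file."""
    with urllib.request.urlopen(url) as response:
        extract_tar_bytes(response.read(), target_dir)
```

Splitting the extraction from the download keeps the part that touches the network separate, so the extraction logic can be exercised with an archive built in memory.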

Finally, we already have a configuration file for the website. We could save the url of the translations tar there. Also good in the script, just a question of preference. Or if you go for the github action approach, it could simply be a parameter in the CI step. Then you would need another one for the target dir in this repo.

In any case, the approach here is also very reasonable, all above are suggestions that personally I think would make things simpler.

goanpeca · 2025-05-12T02:50:10Z

> Great improvement to the PR, this makes a lot of sense to me.

Thanks for the review @datapythonista.

> First, if I understand correctly, you download the translations of the web, and the percentage of the translated content, and then you check for each language if it's translated enough to be published. Personally, I think you should take care of this logic in your repo when generating the tar, not here. First to simplify the code here, and second to avoid downloading translations that are not going to be used.

Fixed!

> Just as a suggestion, I wouldn't use this approach, even in your repo. Imagine you publish translations that are at least 90%, and we have Spanish at 100%. Then I add new content that is 11% of the website. And automatically the Spanish translations that are already indexed by search engines, in user bookmarks, in links in blog posts... are deleted from our website. Not great in my opinion, much better to simply get the new content in English and hope it will eventually be translated.

This is also fixed!

> Another thing I would do is to extract the tar file as it is downloaded. So the tar file is passed to gzip/tarfile in memory, with the io module, not as a path on disk. With this you can get all the code here in a single short function. Or we could even create a GitHub action in your repo with this, as it's generic, and just use it here. So only the CI step would live in this PR.

Did not follow this one as I think things are now simpler and in a single script.

> Finally, we already have a configuration file for the website. We could save the url of the translations tar there. Also good in the script, just a question of preference. Or if you go for the github action approach, it could simply be a parameter in the CI step. Then you would need another one for the target dir in this repo.

Added the information to the config file as requested and updated the scripts to handle site generation for languages. Moved all logic to the existing script.

> In any case, the approach here is also very reasonable, all above are suggestions that personally I think would make things simpler.

Please let me know what you think about the current changes.

This PR now supersedes #61220

Thanks @melissawm!

goanpeca · 2025-05-21T12:47:10Z

> I'd split the preprocessors in a slightly different way as suggested in the comments, but I think the way the code is now is very simple and easy to understand and maintain. Thanks a lot for all the updates here.

Made some new changes based on your suggestions @datapythonista.

datapythonista · 2025-05-21T14:17:21Z

/preview

github-actions · 2025-05-21T14:18:16Z

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/61380/

datapythonista · 2025-05-21T16:47:39Z

The sponsor logos in the home page don't render correctly. I guess the problem is not in this PR, but in the translation of the html file, no?

goanpeca · 2025-05-21T16:56:42Z

> The sponsor logos in the home page don't render correctly. I guess the problem is not in this PR, but in the translation of the html file, no?

Would it be OK to use absolute URLs, since the English pages live in the root but the translated pages live in es/something? That would not work on the preview, though.

I could use /static/img... instead of ../static as is currently used. Frameworks rely on filters like relative_url / absolute_url to handle these cases and prepend the appropriate base_folder or base_url to the link.

Either that, or copying assets folder into each language.
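
To make the path problem concrete, here is a small sketch of computing an asset path relative to the page that links to it, so the same asset resolves from both the root and a language directory. The `relative_to_page` helper is hypothetical; it is not the `relative_url` filter of any particular framework and not code from pandas_web.py.

```python
import posixpath


def relative_to_page(asset, page):
    """Return the path to `asset` relative to the directory of `page`.

    Both arguments are paths from the site root, without a leading slash.
    """
    page_dir = posixpath.dirname(page)
    # Anchor both paths at "/" so the result does not depend on the
    # current working directory.
    return posixpath.relpath("/" + asset, start="/" + page_dir)
```

With this, a page at es/index.html would link to ../static/img/logo.svg while the root index.html would link to static/img/logo.svg, which also keeps working under a preview prefix because no absolute URL is involved.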

datapythonista · 2025-05-21T17:00:05Z

The images of the books should be implemented in the same exact way, and those seem to be working fine in the translated pages. Doesn't seem like we need to change the links, feels more like a problem in the translated content for those images, no?

goanpeca · 2025-05-21T17:00:41Z

> The images of the books should be implemented in the same exact way, and those seem to be working fine in the translated pages. Doesn't seem like we need to change the links, feels more like a problem in the translated content for those images, no?

I will look into the content.

goanpeca · 2025-05-21T17:01:25Z

> I guess the problem is not in this PR, but in the translation of the html file, no?

Correct. This is an issue in the translations. (Working on those fixes)

Besides that, is there anything else you think needs revision?

datapythonista · 2025-06-01T10:55:09Z

/preview

github-actions · 2025-06-01T10:55:50Z

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/61380/

datapythonista · 2025-06-01T11:07:19Z

Thanks for the updates @goanpeca. The PR seems reasonable now.

Seeing how the translations are implemented, like having a copy of the whole home page for each language, I'm a bit concerned about how these translations are going to be maintained.

It would be good first to know more details of the grant. Like, for how long there are funds to keep the translations up to date after we merge this.

With approaches like how Django translates content, if a translation is not maintained, updated texts will default back to the base language. With this approach, if tomorrow we change the styles of the website, all the translated pages will appear immediately broken. This will make it very hard for us to make any change to the website. If this was a community effort it would be a problem. But if this is a time limited grant, which I guess is the case, this is a much bigger problem.

It'd be good to get your feedback, but if we can't use an approach with .po files, maybe it's better to keep the translated pages in an unofficial domain that we can link from our website. Otherwise feels like we'll be merging this, and in a couple of years when the translations are outdated and broken we'll have to revert this.

goanpeca · 2025-06-01T14:44:58Z

Hi @datapythonista !

> Seeing how the translations are implemented, like having a copy of the whole home page for each language, I'm a bit concerned about how these translations are going to be maintained.

Volunteers will maintain translations as they keep coming and new volunteers will be added as more translation sprints are organized within the scientific python organization.

There will also be periodic maintenance to ensure all the infrastructure keeps running smoothly.

> It would be good first to know more details of the grant. Like, for how long there are funds to keep the translations up to date after we merge this.

@trallard can provide more details on this.

> With approaches like how Django translates content, if a translation is not maintained, updated texts will default back to the base language. With this approach, if tomorrow we change the styles of the website, all the translated pages will appear immediately broken. This will make it very hard for us to make any change to the website. If this was a community effort it would be a problem. But if this is a time limited grant, which I guess is the case, this is a much bigger problem.

The same approach works with the Crowdin infrastructure: any untranslated content will use the original language.

This has been running smoothly for numpy.org for many months (a year now?)!

> It'd be good to get your feedback, but if we can't use an approach with .po files, maybe it's better to keep the translated pages in an unofficial domain that we can link from our website. Otherwise feels like we'll be merging this, and in a couple of years when the translations are outdated and broken we'll have to revert this.

md and po files work in more or less the same way, as the segmentation in Crowdin stores sentences/paragraphs, so even if the files are moved, a phrase that was already translated will be automatically available.

As new files are added or changed, the infrastructure and bot will pick up the changes on a weekly basis and inform translators that new strings are available.

In any case, any big change will require some extra attention in case the whole site infrastructure changes, but even then, already translated content will be available to be reused.
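
The fallback behaviour described above can be pictured with a toy model of a segment-level translation memory. This is only a sketch of the idea under simple assumptions (exact-match lookup keyed by the source text); it does not reflect how Crowdin is implemented, and the sample strings are invented.

```python
def translate_segments(segments, memory):
    """Translate each source segment, falling back to the source text.

    `memory` maps a source-language segment to its translation. A segment
    with no entry is returned unchanged, i.e. in the base language.
    """
    return [memory.get(segment, segment) for segment in segments]


# Hypothetical Spanish translation memory, keyed by the English text.
memory_es = {
    "Getting started": "Primeros pasos",
    "Install pandas": "Instalar pandas",
}

# The page gained a new paragraph that has not been translated yet.
page = ["Getting started", "Install pandas", "Read the release notes"]
rendered = translate_segments(page, memory_es)
```

Because the lookup is keyed by the segment text rather than by file, a segment that moves to a different page would still find its existing translation, while the new paragraph stays in English until someone translates it.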

I hope this answers most of your questions.

datapythonista · 2025-06-01T21:47:37Z

I don't fully understand how given what you say, the home page of Spanish and Portuguese is broken (the sponsor logos), but let's give it a try.

Can you fix the broken home pages, please? Then we can introduce a change to the website in this same PR to see what really happens when content changes.

github-actions · 2025-07-02T00:08:27Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

trallard · 2025-07-02T15:19:36Z

I thought I had replied here @datapythonista but it seems I did not. So apologies as things have been way too chaotic on my end.
Anyway our grant has ended but we can commit some amount of maintenance level for the foreseeable future.
If we are still interested in moving this forward I can coordinate internally and have someone look at your questions or outstanding actions to get this over the finish line.

melissawm · 2025-07-15T14:49:26Z

Hi @datapythonista and @goanpeca - I'm happy to follow up here and fix the PR so it's in an appropriate state.

@goanpeca would you mind giving me permissions to push to your PR? Thank you!

goanpeca · 2025-07-18T12:14:58Z

Hi @melissawm, just gave you access :), let me know if it works!

zhouyao1994 · 2025-08-06T15:48:46Z

Great, maybe I can help translate to Chinese.

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

melissawm · 2025-08-28T14:01:12Z

Hi all,

Sorry for the long wait here. I finally updated the PR to merge main and fix the display of the sponsor logos in the landing page. One issue left to solve is that this fix will only land once the PR is merged, so for now the translated pages don't show all the sponsor logos, as they are using the main version of the landing page still. If it makes it easier, I can send in a separate PR for the logos, and update this one to be complete.

Thanks, and happy to address any other feedback.

(I did not activate/do not use the copilot review function, not sure why that popped up)

goanpeca force-pushed the translations branch from 59c82e1 to 987b544 Compare April 30, 2025 03:25

goanpeca mentioned this pull request Apr 30, 2025

ENH: Create infrastructure for translations #61220

Closed

5 tasks

datapythonista added the Docs label Apr 30, 2025

goanpeca force-pushed the translations branch 3 times, most recently from cd72e5c to 2b85ad4 Compare May 8, 2025 01:47

Add script to import translations

d7f0545

goanpeca force-pushed the translations branch from 2b85ad4 to d7f0545 Compare May 8, 2025 01:50

datapythonista reviewed May 8, 2025

goanpeca force-pushed the translations branch 3 times, most recently from 4a6532b to 04f9259 Compare May 12, 2025 02:38

goanpeca marked this pull request as ready for review May 12, 2025 02:47

goanpeca requested a review from datapythonista May 12, 2025 02:50

goanpeca changed the title ~~Update CI to include Translations from Scientific Python Repo~~ Implement translations infrastructure May 12, 2025

goanpeca force-pushed the translations branch 3 times, most recently from 09aaca5 to 7e68731 Compare May 12, 2025 03:04

Update scripts to handle tranlsations

17063a7

goanpeca force-pushed the translations branch from 7e68731 to 17063a7 Compare May 12, 2025 03:04

Merge branch 'main' of github.com:pandas-dev/pandas into translations

8eec135

goanpeca force-pushed the translations branch 2 times, most recently from 9bc7032 to 5245445 Compare May 21, 2025 13:02

goanpeca requested a review from datapythonista May 21, 2025 13:54

Split preprocessor logic and more code review changes

99e9635

goanpeca force-pushed the translations branch from 5245445 to 99e9635 Compare May 21, 2025 14:19

Merge branch 'main' into translations

eadb5c9

github-actions bot added the Stale label Jul 2, 2025

Merge branch 'main' into translations

7783c89

Copilot AI review requested due to automatic review settings August 28, 2025 13:10

Copilot AI reviewed Aug 28, 2025

Update script and fix sponsors image display

698ac63
