Implement translations infrastructure #61380
/preview |
Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/61380/ |
Thanks @goanpeca for this. Do you mind adding some more context here? I can't see much information in https://github.com/Scientific-Python-Translations/pandas-translations, like what languages are available, or how to fix a bad translation, which would be useful to know. Also, in the docs generated from this PR I can't see any language dropdown or anything different from our current docs. What are we expecting? |
Hi @datapythonista - this is a follow-up to #61220, a proof-of-concept CI job to build the website with translations that don't live in this repo. This PR and #61220 are meant to work together and I'm happy to incorporate one into the other once we agree on the general direction and workflow for this. Let us know if we can answer any other questions. Unfortunately I'm not sure how to get the preview for the other PR; I relied on building locally to test that things were working. |
Sorry, I missed #61220 and the issue discussion. I don't fully understand what you're doing here, but I describe next how to add translations without adding too much complexity to this repo, which I don't think any core dev would be on board with.
I think this makes everyone's life easy, and we get the expected result. |
Thanks @datapythonista ! Can you clarify what is missing from #61220 to match your description? That is pretty much what is done in that PR. Maybe this is confusing because we chose to do it in two parts exactly because we wanted to decouple the reorganization of the repo + switcher (in #61220) from the actual translations (this PR). Happy to follow up with any feedback in the other PR as well. Cheers! |
In #61220 you are moving all the current website pages; that should be undone. You are adding the translated pages to this repo, and we don't want that. You are making changes to pandas_web.py, which is not needed based on what I described above. The only changes in a PR to this pandas repo should be adding a CI step as per step 2 and editing the website template with the language dropdown as per step 3. |
I see! I will rework what I have there to match your proposal. Thanks! |
Great improvement to the PR, this makes a lot of sense to me.
I'd personally still simplify things more here in two ways. And feel free to disagree, as it's something opinionated.
First, if I understand correctly, you download the translations of the website and the percentage of translated content, and then you check for each language whether it's translated enough to be published. Personally, I think it would be better to handle this logic in your repo when generating the tar, not here. First to simplify the code here, and second to avoid downloading translations that are not going to be used.
Just as a suggestion, I wouldn't use this approach, even in your repo. Imagine you publish translations that are at least 90%, and we have Spanish at 100%. Then I add new content that is 11% of the website. And automatically the Spanish translations that are already indexed by search engines, in user bookmarks, in links in blog posts... are deleted from our website. Not great in my opinion, much better to simply get the new content in English and hope it will eventually be translated.
Another thing I would do is to extract the tar file as it is downloaded. So the tar file is passed to gzip/tarfile in memory, with the io module, not as a path on disk. With this you can get all the code here in a single short function. Or we could even create a GitHub Action in your repo with this, as it's generic, and just use it here. So only the CI step would live in this PR.
Finally, we already have a configuration file for the website. We could save the url of the translations tar there. Also good in the script, just a question of preference. Or if you go for the github action approach, it could simply be a parameter in the CI step. Then you would need another one for the target dir in this repo.
In any case, the approach here is also very reasonable, all above are suggestions that personally I think would make things simpler.
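For illustration only, the in-memory extraction suggested in this review could look roughly like the sketch below. It is not the code from this PR: the function names `extract_tar_bytes` and `download_and_extract` are hypothetical, and the archive is assumed to be a gzip-compressed tar served over HTTP(S).

```python
import io
import tarfile
import urllib.request


def extract_tar_bytes(data: bytes, target_dir: str) -> list[str]:
    """Extract a gzip-compressed tar archive held in memory into target_dir.

    Hypothetical helper: the archive never touches disk as a .tar.gz file.
    """
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject unsafe members
            # (absolute paths, "..", special files).
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
    return names


def download_and_extract(url: str, target_dir: str) -> list[str]:
    """Download the archive at url (an assumed location) and extract it."""
    with urllib.request.urlopen(url) as response:
        return extract_tar_bytes(response.read(), target_dir)
```

A CI step or a reusable GitHub Action could call `download_and_extract` with the translations URL and the `web` source directory as parameters, which would keep this logic out of pandas_web.py.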
Thanks for the review @datapythonista.
Fixed!
This is also fixed!
Did not follow this one as I think things are now simpler and in a single script.
Added the information to the config file as requested and updated the scripts to handle site generation for languages. Moved all logic to the existing script.
Please let me know what you think about the current changes. This PR now supersedes #61220. Thanks @melissawm! |
/preview |
Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/61380/ |
Thanks @goanpeca, a couple of comments. It feels like we are repeating in this PR logic that we already have implemented. If a markdown file is added to the web directory before the website is rendered, it will be rendered in the same way as the existing pages. There is no need to make changes to pandas_web.py for the translations: just copy the content of the translations tar into the web directory, and all the translated content will be in the rendered website. In the preview, links to translations are broken. This is likely caused by assuming the website is always going to be hosted at the root of the domain, not in a subdirectory. This can't be assumed. Finally, I think it'd be good to handle the downloading and uncompressing of the translation file in a GitHub Action. Or, less neat, in the step in the CI config, but outside pandas_web.py. This will keep things simple in pandas_web.py, and it'll be easier to maintain. |
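One way to avoid depending on where the site is hosted is to generate the language links relative to the current page. The sketch below is only an assumption about how that could be done; `relative_link` is a hypothetical helper, not something that exists in pandas_web.py, and the page paths are invented.

```python
import posixpath


def relative_link(current_page: str, target_page: str) -> str:
    """Return a link from current_page to target_page.

    Both arguments are paths relative to the site root, so the result
    works whether the site is served from "/" or from a subdirectory.
    """
    current_dir = posixpath.dirname(current_page) or "."
    return posixpath.relpath(target_page, start=current_dir)
```

For example, `relative_link("about/team.html", "es/about/team.html")` gives `../es/about/team.html`, which resolves correctly under a preview prefix as well as at the domain root.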
Hi again, thanks for the comments @datapythonista !
If this request stems from not wanting to have this happen locally, I could add a flag to process translations, and only handle English if it is not set.
|
Thanks for the clarification @goanpeca, I understand now. The idea is that pandas maintainers shouldn't have to spend time maintaining the script that builds the website, so I want to keep it as simple as possible. Ideally we'd like to use Hugo or another static site generator. The problem is that the pandas website has a decent amount of content that is dynamically fetched from different sources. So what I did was write an extremely simple, almost single-function static site generator that renders markdown files into HTML using a template, leaving them in the same structure as they are found. It uses a context, which is the YAML config file parsed as a Python object. There are a couple of tiny helper functions for this, and also a Preprocessors class which enriches the context with the external sources.
My concern here is that the 120 lines of code that our static site generator needs (excluding the preprocessors, which are independent small functions) grow significantly in size and complexity. I still think what I said about creating a GitHub Action in the repo managing the translations is the way to go. 90% of what's needed is to download a tar file and uncompress it in a directory. If we don't translate the navbar, I guess this is fine and super simple, and the only thing needed in pandas_web.py is adding to the context the language of the page being translated. Does this make sense?
If we get to this point, then only the translation of the navigation bar would be missing while keeping things very simple. And for that, we could just add a JSON file to each language directory, fetch a different file with all the translations from the server in the navbar preprocessor, or something very simple like that. |
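As a rough sketch of the two ideas in the comment above (adding the page language to the context, and a per-language JSON file for the navbar), something like the following might be enough. Everything here is assumed for illustration: the helper names, the `navbar.json` file name, and the convention that translated pages live under a directory named after the language code.

```python
import json
from pathlib import Path, PurePosixPath

DEFAULT_LANGUAGE = "en"  # assumption: untranslated pages are English


def page_language(relative_path: str, languages: set[str]) -> str:
    """Guess a page's language from the first component of its path."""
    parts = PurePosixPath(relative_path).parts
    if parts and parts[0] in languages:
        return parts[0]
    return DEFAULT_LANGUAGE


def load_navbar_translations(source_dir: str, language: str) -> dict[str, str]:
    """Load <source_dir>/<language>/navbar.json if it exists (hypothetical file)."""
    path = Path(source_dir) / language / "navbar.json"
    if not path.exists():
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def translate_navbar(items: list[str], translations: dict[str, str]) -> list[str]:
    """Translate navbar labels, falling back to the original label."""
    return [translations.get(item, item) for item in items]
```

With this shape, a missing or partial `navbar.json` simply leaves the untranslated labels in English, which matches the preference earlier in the thread for showing English content until a translation exists.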
Hello team!
This PR is a proposal for adding the translations infrastructure to the pandas web page.
Following the discussion in #56301, we (a group of folks working on the Scientific Python grant) have been working to set up infrastructure and translate the contents of the pandas web site. As of this moment, we have 100% translations for the pandas website into Spanish and Brazilian Portuguese, with other languages available for translation (depending on volunteer translators).
To build, the command remains the same:
If you want to check out other related work, please take a look at scipy/scipy.org#617
You can read more about how the translation process works at https://scientific-python-translations.github.io/docs/
What does this PR do?
Supersedes #61220
Demo
cc @mroeschke @datapythonista