Added i18n component and related scripts #1082

Open: wants to merge 83 commits into base branch master

@yihao03 (Member) commented Mar 28, 2025

Run "yarn trans" to start the translation.

yihao03 and others added 27 commits March 12, 2025 22:12
…toring segment handling to use arrays for better management
… merging logic and remove unnecessary debug logs
…ded try catch when parsing xml to json to report failed parsing possibly attributed to unsound xml structure.
- Created a new XML file for references (97references97.xml) containing a comprehensive list of references used in the SICP JS project.
- Added a new XML file for the index preface (98indexpreface98.xml) to provide context and formatting for the index section.
- Introduced a new XML file for the making section (99making99.xml) detailing the background, interactive features, and development history of the SICP JS project.
- Updated subsection2.xml to close the previously open SUBSECTION tag and include a comment for clarity.

@yihao03 (Member, Author) commented Apr 12, 2025

Breaking changes: the XML sources are now divided into per-language folders, currently en and cn, to hold the translated content. The same layout applies to the json folder after running "yarn json". The frontend needs to be changed accordingly to fetch JSON files from the corresponding URL.
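
As a rough illustration of the frontend change, a helper could build the URL of a generated JSON file from a language folder name. This is only a sketch under assumptions: the `/json` base path, the `Locale` type, the `jsonUrl` function and the file name `1.1.json` are hypothetical, and only the `en` and `cn` folder names come from the comment above.

```typescript
// Hypothetical helper: the names and the "/json" base path are assumptions,
// not taken from the actual frontend.
type Locale = "en" | "cn";

const DEFAULT_LOCALE: Locale = "en";

function isLocale(value: string): value is Locale {
  return value === "en" || value === "cn";
}

// Build the URL of a generated JSON file inside its per-language folder.
// Unknown locales fall back to the English folder.
function jsonUrl(baseUrl: string, locale: string, fileName: string): string {
  const folder = isLocale(locale) ? locale : DEFAULT_LOCALE;
  const trimmedBase = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return `${trimmedBase}/${folder}/${fileName}`;
}

console.log(jsonUrl("/json/", "cn", "1.1.json"));
```

Falling back to English for an unknown locale is one possible design choice; a frontend might equally prefer to surface an error.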

@coder114514 coder114514 requested a review from arnav-goel10 June 5, 2025 14:03
@coder114514 coder114514 self-assigned this Jun 5, 2025

@arnav-goel10 left a comment

LGTM!

@RichDom2185 (Member) left a comment

I did not look at the contents of the zh_CN folder. I assume they are generated by AI? Or was there manual editing? If the former, please add a gitattributes file so that reviewers (and GitHub) know it's machine generated.

Comment on lines +33 to +37
- name: Clone translated_xmls
  run: |
    git clone -b translated_xmls https://github.com/source-academy/sicp.git translated_xmls
    mv translated_xmls/* xml/
    rm -r translated_xmls

Why are we cloning from a specific branch instead of merging it to the default branch?

Could you explain what these new workflows are for?

@coder114514 commented Jul 5, 2025

This workflow triggers translation of the English XMLs that have changed. It is not yet set to run on pushes because the AI translation is not stable enough.

The translate-everything workflow translates all English XMLs.
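
The selection step of a workflow that translates only changed files can be sketched as a pure function: given the paths changed in a push, keep only the English XML sources. This is an illustration under assumptions, not the PR's implementation. The `xml/en/` prefix is inferred from the folder layout mentioned elsewhere in this thread, and `selectChangedEnglishXmls` and the example paths are hypothetical.

```typescript
// Hypothetical sketch; the real workflow may detect changes differently.
const ENGLISH_XML_DIR = "xml/en/"; // assumed location of the English sources

function selectChangedEnglishXmls(changedPaths: string[]): string[] {
  return changedPaths
    .map((p) => p.replace(/\\/g, "/")) // normalise Windows separators
    .filter((p) => p.startsWith(ENGLISH_XML_DIR) && p.endsWith(".xml"))
    // Remove duplicates while keeping the first-seen order.
    .filter((p, index, all) => all.indexOf(p) === index);
}

const changed = [
  "xml/en/chapter1/chapter1.xml",
  "xml/cn/chapter1/chapter1.xml",
  "README.md",
  "xml/en/chapter1/chapter1.xml",
];
console.log(selectChangedEnglishXmls(changed));
```

A translate-everything run would skip this filter and pass every file under the English folder to the translator.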

Comment on lines +38 to +39
echo API_KEY=${{ secrets.OPENAI_KEY }} >> .env
# echo API_KEY=${{ secrets.OPENAI_KEY2 }} >> .env

If they are no longer used, the secret should be deleted.

At the start of the project, Yi Hao and I were each given an API key, so there are two secrets. One is now kept as a backup.

Comment on lines +29 to +30
echo API_KEY=${{ secrets.OPENAI_KEY }} >> .env
# echo API_KEY=${{ secrets.OPENAI_KEY2 }} >> .env

Ditto.

@coder114514

> I did not look at the contents of the zh_CN folder. I assume they are generated by AI? Or was there manual editing? If the former, please add a gitattributes file so that reviewers (and GitHub) know it's machine generated.

All the AI-translated XMLs will go into the translated_xmls branch, so I will delete the zh_CN folder. Should I add a gitattributes file to the translated_xmls branch to mark them as AI generated?

@RichDom2185 (Member)

> All the AI-translated XMLs will go into the translated_xmls branch, so I will delete the zh_CN folder. Should I add a gitattributes file to the translated_xmls branch to mark them as AI generated?

How is the zh_CN folder different from the translated_xmls branch? Why not just merge the translations directly to master instead of using a separate branch?

@coder114514

> How is the zh_CN folder different from the translated_xmls branch? Why not just merge the translations directly to master instead of using a separate branch?

They are the same. The translated_xmls branch was created separately because the translation workflows (translate-changed and translate-everything) use an action called deploy, which pushes files to a branch, to push the AI-generated translations to GitHub. The branch exists for that purpose.

@RichDom2185 (Member)

> They are the same. The translated_xmls branch was created separately because the translation workflows (translate-changed and translate-everything) use an action called deploy, which pushes files to a branch, to push the AI-generated translations to GitHub. The branch exists for that purpose.

I see, noted. What is the rationale for running it in a workflow, as opposed to running it locally?

@coder114514

> I see, noted. What is the rationale for running it in a workflow, as opposed to running it locally?

sicp.sourceacademy.org is a static site, so the translated XMLs need to be deployed too, and thus we are storing them on GitHub.

This is not really about the translation workflows. The translator can be run locally and then the generated content can be manually pushed to the translated_xmls branch.

@RichDom2185 (Member) commented Aug 19, 2025

@yihao03 I still don't understand why you removed the zh_CN directory. I know it's AI created. But we need it for deployment right? Why run it as a workflow and push to a separate branch? Why do we need the separate branch at all?

Just run the command locally and push it to master branch?

In my view, if it's going to be deployed, then it should be in the master branch, not some other branch.

Comment on lines +41 to +43
github_token: ${{ secrets.GITHUB_TOKEN }}
publish_dir: ./translation_output
force_orphan: false # leave the possibility for direct modification on translated xmls

If you do this, it will overwrite the deployment, no?

Are these files autogenerated? What is the ai_files folder used for?

@yihao03 (Member, Author)

Hi @RichDom2185, I have been out of the loop for a while, but I believe ai_files holds files for the AI to refer to. It hasn't been properly implemented or put to use yet, but the plan is for it to have a folder structure that mirrors the xml/en directory, where each file contains specific terms and instructions for translating the corresponding source file.
Based on our discussions with Prof @martin-henz, we agreed that users should not edit AI-generated output directly to correct mistakes, as the changes will be overwritten when we regenerate the output using AI. If the translations need amendment or improvement, it is therefore better to edit the prompt given to the AI model, which could be read from these files.
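
The planned mirroring can be illustrated with a small path mapping. This is a sketch only: the comment above says the feature is not yet implemented, so the `ai_files/` root, the choice to keep the same file name and extension, the function name `instructionPathFor` and the example path are all assumptions.

```typescript
// Hypothetical sketch: ai_files is not yet in use, so this layout is assumed.
const SOURCE_ROOT = "xml/en/";
const INSTRUCTIONS_ROOT = "ai_files/";

// Map an English source file to the file that would hold its translation
// terms and instructions, or null if the path is not under the source root.
function instructionPathFor(sourcePath: string): string | null {
  if (!sourcePath.startsWith(SOURCE_ROOT)) {
    return null;
  }
  return INSTRUCTIONS_ROOT + sourcePath.slice(SOURCE_ROOT.length);
}

console.log(instructionPathFor("xml/en/chapter1/section1/subsection1.xml"));
```

With a mapping like this, a translator could look up per-file instructions before each request and fall back to a general prompt when no matching file exists.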

@RichDom2185 RichDom2185 marked this pull request as draft August 19, 2025 17:06
@yihao03 yihao03 marked this pull request as ready for review August 20, 2025 02:24
@yihao03 (Member, Author) commented Aug 20, 2025

> @yihao03 I still don't understand why you removed the zh_CN directory. I know it's AI created. But we need it for deployment right? Why run it as a workflow and push to a separate branch? Why do we need the separate branch at all?
>
> Just run the command locally and push it to master branch?
>
> In my view, if it's going to be deployed, then it should be in the master branch, not some other branch.

Hi, this was a design decision made by @coder114514, as he was in charge of the deployment workflow while I mainly worked on the translation logic. However, I do agree with you that zh_CN should live in the master branch, as that was my original design too.

5 participants