Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add eo-datascience-cookbook to cookbook_gallery.txt #224

Merged
merged 1 commit into from
Mar 11, 2025
Merged

Conversation

jukent
Copy link
Contributor

@jukent jukent commented Feb 3, 2025

Waiting for Cookbook to pass nightly tests

Copy link

github-actions bot commented Feb 3, 2025

👋 Thanks for opening this PR! The Cookbook will be automatically built with GitHub Actions. To see the status of your deployment, click below.
🔍 Git commit SHA: a521587
✅ Deployment Preview URL: https://ProjectPythia.github.io/cookbook-gallery/_preview/224

@MartinSchobben
Copy link

It is passing now: https://github.com/ProjectPythia/eo-datascience-cookbook/actions/workflows/nightly-build.yaml

@jukent jukent marked this pull request as ready for review February 6, 2025 14:41
@jukent jukent requested a review from a team as a code owner February 6, 2025 14:41
@jukent jukent requested review from dopplershift, mgrover1, ktyle, dcamron and r-ford and removed request for a team February 6, 2025 14:41
Copy link
Member

@r-ford r-ford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @MartinSchobben and the rest of the EO Data Science Cookbook team, thanks for contributing these tutorials! I think the Cookbook is in great shape, so I just have a few minor comments:

  • Like most other Project Pythia tutorials, I think it would be helpful to include a prerequisite section somewhere. I think it's fine to not have one for every notebook, but this section overview notebook might be a good place to list some prereqs.
  • I was unable to download the data in the Datacubes notebook when running locally. It seems to work fine when running on GitHub actions, so I'm not sure what the issue is.
  • The instructions on running locally in the README didn't really work for me, since conda was taking too long to resolve conflicts. Instead, creating environments from the .yml files in the three notebook directories with mamba worked fine. I was then able to run make kernel. I'm not familiar with this way of creating environments, so maybe some additional guidance in the README would help.
  • I opened a PR with some README and typo fixes, but I messed up some of the notebook metadata because of my confusion with the environments, so feel free to make those changes on your end. The only changes there that matter in my opinion are the author links in the README.

@ktyle
Copy link
Contributor

ktyle commented Feb 19, 2025

Thanks @r-ford! I agree with you about the multiple environments for each distinct notebook subdirectory. I combined each of them into a single top-level environment.yml, built the environment locally, and then ran through each notebook without error. This also has the nice effect of making the Binderhub image build a lot faster too. See ProjectPythia/eo-datascience-cookbook#4.

@MartinSchobben
Copy link

MartinSchobben commented Feb 20, 2025

Hi @jukent, @r-ford and @ktyle,

Thanks a lot for the review and feedback. I respond here to clarify our course of action in regards to the comments and requested changes.

Like most other Project Pythia tutorials, I think it would be helpful to include a prerequisite section somewhere. I think it's fine to not have one for every notebook, but this section overview notebook might be a good place to list some prereqs.

We agree that this is indeed a valuable addition that guides the reader. We will add a prerequisite to each of the courses and the separate notebooks of the tutorials and templates.

I was unable to download the data in the Datacubes notebook when running locally. It seems to work fine when running on GitHub actions, so I'm not sure what the issue is.

We can't replicate this issue. For reference I add here a link to the huggingface client used in the notebook.

The instructions on running locally in the README didn't really work for me, since conda was taking too long to resolve conflicts. Instead, creating environments from the .yml files in the three notebook directories with mamba worked fine. I was then able to run make kernel. I'm not familiar with this way of creating environments, so maybe some additional guidance in the README would help.

That is an interesting observation. The idea of separating/modularizing the environments was made after much deliberation in order to prevent future conflicts in setting up one big conda environment. We will gives some more feedback below (in response to @ktyle ) on why we think that this is the better maintainable approach in the long run. We have not seen issues with setting up the environments. This should also be unrelated to using GNU Make, which is still calling conda. Besides running it on GitHub Actions, setting up environments in Linux Mint 21.3, Windows with WSL and the TU Vienna JupyterHub did not indicate any issues here. Perhaps it would, however, be better to show the user the standard way of creating a conda environment locally (with a yml file like the other cookbooks do). We will amend this in a future version.

I opened a PR with some README and typo fixes, but I messed up some of the notebook metadata because of my confusion with the environments, so feel free to make those changes on your end. The only changes there that matter in my opinion are the author links in the README.

Thanks for the edits. We will make sure that we fix the author links.

Thanks @r-ford! I agree with you about the multiple environments for each distinct notebook subdirectory. I combined each of them into a single top-level environment.yml, built the environment locally, and then ran through each notebook without error. This also has the nice effect of making the Binderhub image build a lot faster too. See ProjectPythia/eo-datascience-cookbook#4.

@ktyle Thank you very much for looking into such detail in our cookbook. As already mentioned above, we only came up with this structure after much deliberation and through some experience which we gained from using the notebooks for teaching. The following things made use consider a multiple environment setup:

  • We foresee adding more lecture material to this book in the future (e.g. we are currently working on a course for remote drought monitoring). This expansion likely entails additional dependencies. We think that one increasing yml list of dependencies might result in conflicts when resolving the environment at some point. Hence, several files might alleviate this problem by only using the strictly needed dependencies for a course.
  • We can also better maintain the dependencies as there is a more clear relation between the notebooks and the yml files.
  • Most importantly, we would like to use the cookbook as teaching material for standalone courses (e.g. microwave remote sensing). We would find it therefore more logical to only build that environment for the particular course without additional clutter.

Of course, as you have noted, there are also some downsides to this approach:

  • It can take a long time to setup all the environment when invoking make kernel. It is currently probably longer than one big conda environment, but I am unsure how this would relate during future expansion.
  • We used a postBuild call to make kernel, which makes setting up the Binder image more time consuming.
  • The GitHub actions for publish cookbook takes a long time due to point 1, as caching is only performed for the top level yml file (could be changed).

Proposed solutions

We can think of three solutions:

  1. We keep the original multiple environment structure. This would require that we fix the Github actions to cache all environments. This would be relatively straightforward. We do this in our Quarto mirror repository. We should then also create a multiple kernel Binder image. We do not have the solution for that at the moment, but we would expect that such solutions exist.
  2. We go for Use a single environment.yml eo-datascience-cookbook#4.
  3. Alternatively, we can also create a hybrid of both 1 and 2. This way we have the standalone module (e.g. microwave remote sensing) for teaching and all the functionality for the Pythia cookbook. This means a top level master yml and the smaller yml files related to sections of the book with the Makefile for convenience.

I hope we can have some discussion on the possible solutions. Ultimately we will respect your decision on the matter.

Thanks once more! We are glad to be able to contribute to this awesome project!

@r-ford
Copy link
Member

r-ford commented Feb 20, 2025

Alternatively, we can also create a hybrid of both 1 and 2. This way we have the standalone module (e.g. microwave remote sensing) for teaching and all the functionality for the Pythia cookbook. This means a top level master yml and the smaller yml files related to sections of the book with the Makefile for convenience.

I'm interested in this hybrid approach if possible, since it makes the Cookbook testing and maintenance simpler, while still meeting your needs for standalone courses. I'm not familiar enough with our Cookbook Actions to know if you can store some environment yml files without them being used when the book is built, but I think it's worth a try.

@MartinSchobben
Copy link

@r-ford then we will go for the hybrid option. This would be a relatively minor adjustment; 1) creating the master yml file and 2) remove all reference to themake kernel call in the GitHub actions and README. @ktyle, would you agree with this?

@ktyle
Copy link
Contributor

ktyle commented Feb 21, 2025

@MartinSchobben yep sounds like a great plan!

@MartinSchobben
Copy link

MartinSchobben commented Mar 11, 2025

@jukent, @ktyle, @r-ford

Everything should now be in working order with:

  • a single environment
  • requirement sections at the start of each chapter

@r-ford r-ford self-requested a review March 11, 2025 19:20
Copy link
Member

@r-ford r-ford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making those changes! I was able to run the notebooks without issue. I think this is ready to merge.

@jukent
Copy link
Contributor Author

jukent commented Mar 11, 2025

LGTM! Thanks so much for your persistence and hard work. I hope you'll consider joining us for the Cook-off next summer.

@jukent jukent merged commit 2439922 into main Mar 11, 2025
2 checks passed
github-actions bot pushed a commit that referenced this pull request Mar 11, 2025
@brian-rose
Copy link
Member

Did we already add @MartinSchobben to the ProjectPythia GitHub org?

@brian-rose brian-rose deleted the jukent-patch-2 branch March 11, 2025 19:48
@jukent
Copy link
Contributor Author

jukent commented Mar 11, 2025

@brian-rose , yes we did.

@MartinSchobben
Copy link

Thank you all for the help an patience! I will consider the Cook-off.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants