Update xee dataflow example
This updates the xee dataflow example to prevent users from accidentally deleting their storage bucket when running the example.

This is a simple fix for a bug in a recent [push to gcsfs](fsspec/gcsfs#608). Paired with some [logic in the zarr library for writing datasets](https://github.com/zarr-developers/zarr-python/blob/df4c25f70c8a1e2b43214d7f26e80d34df502e7e/src/zarr/v2/storage.py#L567), that change lets users accidentally remove their bucket when writing to the root of a cloud storage bucket. This is a problem because the bucket may hold other data, and deleting the bucket removes everything in it.

Changes in this PR include:
1. pinning the `gcsfs` version to `<=2024.2.0`, which predates the PR that introduced the bug
2. pointing the example's output at a subdirectory of the bucket instead of the bucket root
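
The second change could also be enforced in code rather than only in the documented command. The following is a minimal sketch, not part of this commit or of xee: `require_bucket_subdirectory` is a hypothetical helper name, and it assumes output locations are given as `gs://` URLs like the one in the example.

```python
from urllib.parse import urlparse


def require_bucket_subdirectory(output_url: str) -> str:
    """Return output_url unchanged if it points below the bucket root.

    Hypothetical guard (not part of xee): raises ValueError for URLs that
    are not gs:// URLs and for bare bucket roots such as gs://my-bucket
    or gs://my-bucket/.
    """
    parsed = urlparse(output_url)
    if parsed.scheme != "gs":
        raise ValueError(f"expected a gs:// URL, got {output_url!r}")
    if not parsed.netloc:
        raise ValueError(f"missing bucket name in {output_url!r}")
    # Strip slashes so that 'gs://bucket/' is treated as the bucket root.
    if not parsed.path.strip("/"):
        raise ValueError(
            f"{output_url!r} is a bucket root; write to a subdirectory "
            "such as gs://<bucket>/output/ instead"
        )
    return output_url
```

A pipeline script could call this on its `--output` value before opening the store, so that a bucket root is rejected before any write is attempted.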

PiperOrigin-RevId: 655683820
KMarkert authored and Xee authors committed Jul 25, 2024
1 parent 88a4fac commit 03170f2
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion examples/dataflow/README.md
@@ -104,7 +104,7 @@ This example is focused on pulling data from Earth Engine, transforming the data
  ```shell
  python ee_to_zarr_dataflow.py \
  --input NASA/GPM_L3/IMERG_V06 \
- --output gs://xee-out-${PROJECT} \
+ --output gs://xee-out-${PROJECT}/output/ \
  --target_chunks='time=6' \
  --runner DataflowRunner \
  --project $PROJECT \
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -44,7 +44,7 @@ tests = [
  dataflow = [
  "absl-py",
  "apache-beam[gcp]",
- "gcsfs",
+ "gcsfs<=2024.2.0",
  "xarray-beam",
  ]
  examples = [
