
Commit 040ad40

feat: update application directory logic (#15)
* use persistent application directory by default
* formatting wheels.yaml
* update readme with data dir info
* add retry to downloading datasets on ci
1 parent 5da7d95 commit 040ad40

9 files changed: +134 -45 lines changed


Diff for: .github/workflows/build-test.yaml

+17 -12
@@ -19,13 +19,13 @@ on:
 [main]

 ## Paste this snippet into the workflow file to enable tmate debugging
-# - name: Setup tmate session
-# uses: mxschmitt/action-tmate@v3
-# if:
-# ${{ github.event_name == 'workflow_dispatch' && inputs.debug_enabled
-# }}
-# with:
-# limit-access-to-actor: false
+# - name: Setup tmate session
+# uses: mxschmitt/action-tmate@v3
+# if:
+# ${{ github.event_name == 'workflow_dispatch' && inputs.debug_enabled
+# }}
+# with:
+# limit-access-to-actor: false

 concurrency:
   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
@@ -185,7 +185,7 @@ jobs:
       - name: Move Geant4 datasets (Ubuntu/MacOS)
         if: matrix.python-version != '3.11' && matrix.os != 'windows-latest'
         run: |
-          GEANT4_DATA_DIR=$(python -c "import geant4_python_application; print(geant4_python_application.data_dir)")
+          GEANT4_DATA_DIR=$(python -c "import geant4_python_application; print(geant4_python_application.data_directory())")
           mkdir -p ${{ github.workspace }}/geant4-data $GEANT4_DATA_DIR
           mv ${{ github.workspace }}/geant4-data/* $GEANT4_DATA_DIR

@@ -194,14 +194,19 @@ jobs:
         if: matrix.python-version != '3.11' && matrix.os == 'windows-latest'
         shell: powershell
         run: |
-          $env:GEANT4_DATA_DIR = python -c "import geant4_python_application; print(geant4_python_application.data_dir)"
+          $env:GEANT4_DATA_DIR = python -c "import geant4_python_application; print(geant4_python_application.data_directory())"
           New-Item -ItemType Directory -Path "${{ github.workspace }}\geant4-data" -Force
           New-Item -ItemType Directory -Path $env:GEANT4_DATA_DIR -Force
           Move-Item "${{ github.workspace }}\geant4-data\*" $env:GEANT4_DATA_DIR -Force

-      - name: Check import
-        run: |
-          python -c "import geant4_python_application; geant4_python_application.Application()"
+      - name: Install datasets
+        uses: nick-fields/retry@v2
+        with:
+          timeout_minutes: 30
+          retry_wait_seconds: 60
+          max_attempts: 5
+          command: |
+            python -c "import geant4_python_application as g4; g4.install_datasets()"

       - name: Run tests
         run: |
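
The "Install datasets" step above retries the download up to 5 times with a 60 second pause between attempts. For running the same thing outside CI, a rough Python equivalent could look like the sketch below. The `retry` helper and its parameter names are hypothetical (they are not part of `geant4_python_application` or of the `nick-fields/retry` action), and the action's `timeout_minutes` limit is not reproduced here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    max_attempts: int = 5,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` until it succeeds or `max_attempts` calls have failed."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception:
            # Re-raise the final failure instead of swallowing it.
            if attempt == max_attempts:
                raise
            # Pause before the next attempt (injectable so tests need not wait).
            sleep(wait_seconds)
```

With the package installed, this could be used as `retry(g4.install_datasets)`, where `g4` is `geant4_python_application` as imported in the workflow command.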

Diff for: .github/workflows/wheels.yaml

+2 -3
@@ -1,16 +1,15 @@
 name: Wheels

 on:
+  schedule:
+    - cron: "0 4 * * *" # every day at 4am UTC
   workflow_dispatch:
     inputs:
       publish_to_test_pypi:
         type: boolean
         description: Publish to Test PyPI
         default: false

-  schedule:
-    - cron: "0 4 * * *"
-
 env:
   GEANT4_VERSION: 11.2.0
   XERCES_VERSION: 3.2.4

Diff for: README.md

+51 -10
@@ -37,22 +37,63 @@ directory automatically during simulation startup.
 pip install -i https://test.pypi.org/simple/ geant4-python-application==0.0.2.dev1
 ```

-## Usage
+### Geant4 data files
+
+The Python wheels for this package do not contain any Geant4 data files due to
+their size. These files will be downloaded on demand without the need of any
+user interaction.
+
+The default location for these files can be obtained by calling:
+
+```bash
+python -c "import geant4_python_application; print(geant4_python_application.get_data_path())"
+```
+
+**Uninstalling the package will not remove the data files. The user is
+responsible for removing them manually.**
+
+#### Overriding the default data path
+
+The default data path can be overridden by calling `application_directory`. This
+should be done before the application is initialized.
+
+```python
+from geant4_python_application import application_directory
+
+application_directory("/some/other/path")
+```
+
+Overriding the default data directory is encouraged when submitting batch jobs
+to a cluster in order to avoid downloading the data files multiple times. In
+this case the application directory should point to some shared location.
+
+#### Using a temporary directory
+
+A temporary directory can be used by calling:

 ```python
-from geant4_python_application import Application, basic_gdml
+from geant4_python_application import application_directory

-# The data files will be downloaded to a temporary directory the first time this is called
-app = Application()
+application_directory(temp=True)
+```

-app.setup_manager(n_threads=4)
-app.setup_physics()
-app.setup_detector(gdml=basic_gdml)
-app.setup_action()
+The operating system will take care of cleaning up the temporary directory.
+
+It is possible that the operating system will delete only some of the files in
+the temporary directory which can lead to a Geant4 runtime error. In this case
+it's recommended to delete the temporary directory manually and let the
+application recreate it again.
+
+## Usage
+
+```python
+import geant4_python_application as g4

-app.initialize()
+# Use a temporary directory for the Geant4 data files (remove this line to use the default location)
+g4.application_directory(temp=True)

-events = app.run(n_events=100)
+with g4.Application(gdml=g4.basic_gdml, seed=137) as app:
+    events = app.run(n_events=100)

 print(events)
 ```

Diff for: pyproject.toml

+1
@@ -37,6 +37,7 @@ dependencies = [
     "awkward-pandas",
     "numpy",
     "requests",
+    "platformdirs",
     "tqdm",
 ]
 description = "Geant4 Python Application"

Diff for: src/geant4_python_application/__init__.py

+4 -2
@@ -8,8 +8,9 @@
     pybind11_version,
 )
 from geant4_python_application.application import Application
-from geant4_python_application.datasets import data_dir, install_datasets
 from geant4_python_application.detector import Detector
+from geant4_python_application.files.datasets import data_directory, install_datasets
+from geant4_python_application.files.directories import application_directory
 from geant4_python_application.gdml import basic_gdml

 version = __version__
@@ -24,6 +25,7 @@
     "Application",
     "Detector",
     "basic_gdml",
-    "data_dir",
     "install_datasets",
+    "data_directory",
+    "application_directory",
 ]

Diff for: src/geant4_python_application/application.py

+1 -2
@@ -7,7 +7,6 @@
 import awkward as ak

 import geant4_python_application
-import geant4_python_application.datasets
 import geant4_python_application.events
 from geant4_python_application._geant4_application import (
     Application as Geant4Application,
@@ -71,7 +70,7 @@ class Application:
     def __init__(
         self, n_threads: int = 0, gdml: str = None, physics=None, seed: int = 0
     ):
-        geant4_python_application.datasets.install_datasets(show_progress=True)
+        geant4_python_application.install_datasets(show_progress=True)

         self._pipe, child_pipe = multiprocessing.Pipe()
         self._process = multiprocessing.Process(

Diff for: src/geant4_python_application/files/__init__.py

+1
@@ -0,0 +1 @@
+from __future__ import annotations

Diff for: src/geant4_python_application/datasets.py renamed to src/geant4_python_application/files/datasets.py

+21 -16
@@ -16,17 +16,21 @@

 url = "https://cern.ch/geant4-data/datasets"

+
 # It is discouraged to use the package directory to store data
 # data_dir = os.path.join(os.path.dirname(__file__), "geant4/data")
 # another idea is to use 'platformdirs' to store data in a platform-specific location

-data_dir = os.path.join(
-    tempfile.gettempdir(),
-    geant4_python_application.__name__,
-    "geant4",
-    geant4_python_application.geant4_version,
-    "data",
-)
+
+def data_directory() -> str:
+    return os.path.join(
+        geant4_python_application.application_directory(),
+        geant4_python_application.__name__,
+        "geant4",
+        geant4_python_application.geant4_version,
+        "data",
+    )
+

 # the datasets versions should be updated with each Geant4 version
 # https://geant4.web.cern.ch/download/11.2.0.html#datasets
@@ -155,27 +159,28 @@ def _download_extract_dataset(dataset: Dataset, pbar: tqdm):

         f.seek(0)
         with tarfile.open(fileobj=f, mode="r:gz") as tar:
-            tar.extractall(data_dir)
+            tar.extractall(data_directory())


 def install_datasets(force: bool = False, show_progress: bool = True):
-    os.environ["GEANT4_DATA_DIR"] = data_dir
+    os.environ["GEANT4_DATA_DIR"] = data_directory()
     datasets_to_download = []
     for dataset in datasets:
-        path = os.path.join(data_dir, dataset.name + dataset.version)
+        path = os.path.join(data_directory(), dataset.name + dataset.version)
         os.environ[dataset.env] = path
         if not os.path.exists(path) or force:
             datasets_to_download.append(dataset)

     if len(datasets_to_download) == 0:
         return

-    os.makedirs(data_dir, exist_ok=True)
+    os.makedirs(data_directory(), exist_ok=True)
     if show_progress:
         print(
             f"""
-Geant4 datasets (<2GB) will be installed to temporary directory {data_dir}
+Geant4 datasets (<2GB) will be installed to {data_directory()}
 This may take a while but only needs to be done once.
+You can override the default location by calling `application_directory(path)` or `application_directory(temp=True)` to use a temporary directory.
 The following Geant4 datasets will be installed: {", ".join([f"{dataset.name}@v{dataset.version}" for dataset in datasets_to_download])}"""
         )

@@ -196,14 +201,14 @@ def install_datasets(force: bool = False, show_progress: bool = True):
         concurrent.futures.wait(futures)

     if show_progress:
-        total_size_gb = sum(fp.stat().st_size for fp in Path(data_dir).rglob("*")) / (
-            1024**3
-        )
+        total_size_gb = sum(
+            fp.stat().st_size for fp in Path(data_directory()).rglob("*")
+        ) / (1024**3)
         print(f"Geant4 datasets size on disk after extraction: {total_size_gb:.2f}GB")


 def uninstall_datasets():
-    dir_to_remove = os.path.dirname(data_dir)
+    dir_to_remove = os.path.dirname(data_directory())
     package_dir = os.path.dirname(__file__)

     if not os.path.relpath(package_dir, dir_to_remove).startswith(".."):
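
This diff replaces the module-level `data_dir` constant with a `data_directory()` function. A likely reason is that a constant is computed once at import time, so it could not reflect a later call to `application_directory(...)`. The sketch below illustrates the difference using only the standard library; `_application_directory`, `set_application_directory` and the `.example-app` default are hypothetical stand-ins, not the package's actual code.

```python
import os
import tempfile

# Hypothetical stand-in for the module-level state kept in files/directories.py.
_application_directory = os.path.join(os.path.expanduser("~"), ".example-app")


def set_application_directory(path: str) -> None:
    """Override the application directory at runtime."""
    global _application_directory
    _application_directory = path


# Eager: evaluated once at import time, like the removed `data_dir` constant.
DATA_DIR_AT_IMPORT = os.path.join(_application_directory, "geant4", "data")


def data_directory() -> str:
    # Lazy: evaluated on every call, so later overrides are picked up.
    return os.path.join(_application_directory, "geant4", "data")


set_application_directory(tempfile.gettempdir())
print(DATA_DIR_AT_IMPORT)  # still under the original default
print(data_directory())    # now under the system temporary directory
```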

Diff for: src/geant4_python_application/files/directories.py

+36
@@ -0,0 +1,36 @@
+from __future__ import annotations
+
+import os
+import pathlib
+import tempfile
+
+import platformdirs
+
+import geant4_python_application
+
+_app_name = geant4_python_application.__name__
+# TODO: get from pyproject.toml
+_app_author = "lobis"
+
+_dirs = platformdirs.AppDirs(_app_name, _app_author)
+
+_application_directory = _dirs.user_data_dir
+
+
+def application_directory(path: str | None = None, *, temp: bool = False) -> str:
+    global _application_directory
+
+    if temp and path:
+        raise ValueError("Cannot set both temp and path options")
+    if temp:
+        _application_directory = tempfile.gettempdir()
+        return _application_directory
+    if path:
+        # override the default application directory
+        # make sure path exists, otherwise create it. Throw if it cannot be created
+        pathlib.Path(path).mkdir(parents=True, exist_ok=True)
+        if not os.path.isdir(path):
+            raise ValueError(f"Cannot create application directory: {path}")
+        _application_directory = path
+
+    return _application_directory
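
The `path` branch above relies on `pathlib.Path.mkdir(parents=True, exist_ok=True)`. The standalone sketch below shows how that standard-library call behaves in the cases that matter here. It suggests that when `path` already exists as a regular file, the caller would most likely see the `FileExistsError` raised by `mkdir` rather than the `ValueError` from the `os.path.isdir` check that follows. The directory and file names are made up for illustration.

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Missing parent directories are created in a single call.
    nested = os.path.join(root, "a", "b")
    pathlib.Path(nested).mkdir(parents=True, exist_ok=True)
    created = os.path.isdir(nested)

    # Calling mkdir again on an existing directory does nothing.
    pathlib.Path(nested).mkdir(parents=True, exist_ok=True)

    # exist_ok=True does not cover a path that exists as a regular file.
    file_path = os.path.join(root, "not-a-dir")
    pathlib.Path(file_path).touch()
    try:
        pathlib.Path(file_path).mkdir(parents=True, exist_ok=True)
        error_name = None
    except OSError as exc:
        error_name = type(exc).__name__

print(created, error_name)
```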
