@@ -43,6 +43,7 @@ This command runs a job in Studio using the specified query file. You can config
 * `--start-time START_TIME` - Time to schedule the task in YYYY-MM-DDTHH:mm format or natural language.
 * `--cron CRON` - Cron expression for the cron task.
 * `--no-wait` - Do not wait for the job to finish.
+* `--ignore-checkpoints` - Ignore existing checkpoints and run from scratch.
 * `-h`, `--help` - Show the help message and exit.
 * `-v`, `--verbose` - Be verbose.
 * `-q`, `--quiet` - Be quiet.
@@ -156,6 +157,7 @@ datachain job run query.py --no-wait
 
 ## Notes
 
+* **Checkpoints**: Running the same script multiple times via `datachain job run` automatically links jobs together, enabling checkpoint reuse. If a previous run of the same script (by absolute path) exists, DataChain will resume from where it left off.
 * Closing the logs command (e.g., with Ctrl+C) will only stop displaying the logs but will not cancel the job execution
 * To cancel a running job, use the `datachain job cancel` command
 * The job will continue running in Studio even after you stop viewing the logs
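
Reviewer sketch, not part of the diff: the new flag combines with the existing ones on the same command line. The helper below only assembles the argument list and does not run anything; `build_job_run_command` is a hypothetical name introduced for illustration, while the flags `--no-wait` and `--ignore-checkpoints` and the file name `query.py` are taken from the documentation in this diff.

```python
import shlex


def build_job_run_command(script, ignore_checkpoints=False, no_wait=False):
    """Assemble the argv for `datachain job run` (hypothetical helper; nothing is executed)."""
    argv = ["datachain", "job", "run", script]
    if no_wait:
        # Documented flag: do not wait for the job to finish.
        argv.append("--no-wait")
    if ignore_checkpoints:
        # Flag added in this diff: ignore existing checkpoints and run from scratch.
        argv.append("--ignore-checkpoints")
    return argv


# Default rerun: per the new note, checkpoints from a previous run may be reused.
resume_cmd = build_job_run_command("query.py")

# Forced fresh run that also returns immediately.
fresh_cmd = build_job_run_command("query.py", ignore_checkpoints=True, no_wait=True)

print(shlex.join(resume_cmd))
print(shlex.join(fresh_cmd))
```

Passing the resulting list to `subprocess.run` would be one way to script reruns, assuming the `datachain` CLI is installed and authenticated against Studio.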
docs/guide/checkpoints.md
Lines changed: 13 additions & 2 deletions
@@ -20,13 +20,24 @@ This means that if your script creates multiple datasets and fails partway throu
 
 ### Studio Runs
 
-When running jobs on Studio, the checkpoint workflow is managed through the UI:
+#### Using `datachain job run` CLI
+
+When you run `datachain job run my_script.py`, DataChain automatically:
+
+1. **Links jobs** by finding previous runs of the same script (by absolute path) that were also executed in Studio
+2. **Passes checkpoint context** to Studio, enabling checkpoint reuse across runs
+
+This means running the same script multiple times via `datachain job run` will automatically benefit from checkpoints without any additional configuration.
+
+#### Using Studio UI
+
+When triggering jobs through the Studio interface:
 
 1. **Job execution** is triggered using the Run button in the Studio interface
 2. **Checkpoint control** is explicit - you choose between:
    - **Run from scratch**: Ignores any existing checkpoints and recreates all datasets
    - **Continue from last checkpoint**: Resumes from the last successful checkpoint, skipping already-completed stages
-3. **Parent-child job linking** is handled automatically by the system - no need for script path matching or job name conventions
+3. **Parent-child job linking** is handled automatically by the system
 4. **Checkpoint behavior** during execution is the same as local runs: datasets are saved at each `.save()` call and can be reused on retry
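
Reviewer sketch, not part of the diff: the linking rule described for `datachain job run` (match previous runs by absolute script path) can be pictured with a toy model. This is not DataChain's implementation; `JobRecord`, `find_parent_job`, and the in-memory `history` list are hypothetical names used only for illustration, and treating `--ignore-checkpoints` as "no parent job" is an assumption, since the diff does not say how the flag interacts with linking.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class JobRecord:
    job_id: str
    script_path: str  # absolute path of the script that was run


def find_parent_job(history, script, ignore_checkpoints=False):
    """Return the most recent earlier job for the same absolute script path, or None."""
    if ignore_checkpoints:
        # Assumption: a from-scratch run has no parent, so nothing is reused.
        return None
    target = str(Path(script).resolve())
    # history is ordered oldest to newest, so scan it backwards.
    for job in reversed(history):
        if job.script_path == target:
            return job
    return None


history = [
    JobRecord("job-1", str(Path("query.py").resolve())),
    JobRecord("job-2", str(Path("other.py").resolve())),
]

# "./query.py" and "query.py" resolve to the same absolute path, so they link.
parent = find_parent_job(history, "./query.py")
print(parent.job_id if parent else None)
```

One consequence of matching on the absolute path in a model like this: moving or copying the script to another directory would start a fresh chain of jobs rather than resuming the old one.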