|
| 1 | +# Diff-based CI test selection |
| 2 | + |
| 3 | +This document explains the diff-driven test-selection system that lets some GPU CI |
| 4 | +workflows run only the tests a PR could affect, instead of the whole suite, while |
| 5 | +still satisfying Required status checks. |
| 6 | + |
| 7 | +It currently drives **`modal-torch-latest`** (which runs `tests/unit/v1/` on |
| 8 | +[modal.com](https://modal.com) GPUs), but is built to drive more workflows from |
| 9 | +one config — see [Adding a workflow](#adding-a-new-workflow). |
| 10 | + |
| 11 | +- [TL;DR](#tldr) |
| 12 | +- [Why](#why) |
| 13 | +- [Moving parts](#moving-parts) |
| 14 | +- [How a decision is made](#how-a-decision-is-made) |
| 15 | +- [Driving it (as a contributor)](#driving-it-as-a-contributor) |
| 16 | +- [Changing it (as a maintainer)](#changing-it-as-a-maintainer) |
| 17 | +- [Security model](#security-model) |
| 18 | +- [Failure modes & guarantees](#failure-modes--guarantees) |
| 19 | +- [FAQ / troubleshooting](#faq--troubleshooting) |
| 20 | + |
| 21 | + |
| 22 | +## TL;DR |
| 23 | + |
| 24 | +- On a PR, `ci/tests_fetcher.py` diffs your branch against the base branch, traces |
| 25 | + the import graph from your changed files to the impacted tests, and writes the |
| 26 | + list to `ci/.test_selection/test_list.txt`. |
| 27 | +- The workflow runs pytest on just that list. `push` to `master` and manual runs |
| 28 | + always run everything. |
| 29 | +- It is **fail-safe**: anything it can't reason about safely → run the *full* suite. |
| 30 | + It never silently runs *fewer* tests than reality. |
| 31 | +- Preview locally: |
| 32 | + ```bash |
| 33 | + python ci/tests_fetcher.py --base origin/master # what would CI run? |
| 34 | + python ci/tests_fetcher.py --base origin/master --explain # ...and why? |
| 35 | + ``` |
| 36 | +- Force a full run: put `[test all]` (or `[no filter]`) in a commit message. |
| 37 | + |
| 38 | + |
| 39 | +## Why |
| 40 | + |
| 41 | +`modal-torch-latest` is a **Required** check, so it must report a status on every |
| 42 | +PR — which means we can't use GitHub's path filters (`on.<event>.paths`) to skip |
| 43 | +it, because a skipped Required job blocks merges. Instead, the job always runs but |
| 44 | +*selects* what to test: a docs-only PR resolves to "no impacted tests" and exits |
| 45 | +fast (still green), while a PR touching a leaf module runs only the tests that |
| 46 | +import it. |
| 47 | + |
| 48 | +The design is a small, self-contained take on HuggingFace `transformers`' |
| 49 | +`utils/tests_fetcher.py`. |
| 50 | + |
| 51 | + |
| 52 | +## Moving parts |
| 53 | + |
| 54 | +| File | Role | |
| 55 | +| --- | --- | |
| 56 | +| `.github/workflows/modal-torch-latest.yml` | The workflow: a `collect-tests` job (runs the fetcher) gating a `deploy` job (runs pytest on modal). | |
| 57 | +| `ci/tests_fetcher.py` | The selector. AST-parses the repo, builds an import graph, decides `all` / `subset` / `none`, writes the test-list file, emits a job summary. | |
| 58 | +| `ci/test_tests_fetcher.py` | Self-tests for the selector (pure stdlib; run in `collect-tests`). | |
| 59 | +| `ci/torch_latest.py` | The modal runner. Reads the test-list file and feeds it to pytest. | |
| 60 | +| `ci/.test_selection/test_list.txt` | The hand-off artifact (one pytest target per line). Git-ignored. | |
| 61 | + |
| 62 | +### Job flow |
| 63 | + |
| 64 | +``` |
| 65 | + pull_request_target / push / workflow_dispatch |
| 66 | + │ |
| 67 | + ┌─────────────┴─────────────┐ |
| 68 | + │ collect-tests │ (no secrets; AST-only) |
| 69 | + │ checkout PR head │ |
| 70 | + │ restore ci/ from base │ |
| 71 | + │ self-test the fetcher │ |
| 72 | + │ run tests_fetcher.py │ |
| 73 | + │ → mode (all|subset|none) │ |
| 74 | + │ → upload test_list.txt │ |
| 75 | + └─────────────┬──────────────┘ |
| 76 | + │ needs: |
| 77 | + ▼ |
| 78 | + ┌────────────────────────────┐ |
| 79 | + │ deploy │ (has modal/HF secrets) |
| 80 | + │ if mode != none (or manual │ |
| 81 | + │ / collector failed) │ |
| 82 | + │ checkout PR head │ |
| 83 | + │ restore ci/ from base │ |
| 84 | + │ download test_list.txt │ |
| 85 | + │ modal run ci.torch_latest │ |
| 86 | + └────────────────────────────┘ |
| 87 | +``` |
| 88 | + |
| 89 | +`mode` controls `deploy`: |
| 90 | + |
| 91 | +- **`none`** → `deploy` is skipped. The Required status is still satisfied (a |
| 92 | + skipped dependent job counts as success here). |
| 93 | +- **`subset`** → `deploy` runs pytest on exactly the impacted files. |
| 94 | +- **`all`** → `deploy` runs the whole scope (`tests/unit/v1`). |
| 95 | + |
| 96 | + |
| 97 | +## How a decision is made |
| 98 | + |
| 99 | +`TestSelector.select()` in `ci/tests_fetcher.py` runs these checks in order; the |
| 100 | +first that matches wins: |
| 101 | + |
| 102 | +1. **No base ref** (push / manual) → `all`. |
| 103 | +2. **Base ref unresolvable** → `all`. |
| 104 | +3. **No merge-base** with the base (e.g. shallow clone, unrelated history) → `all`. |
| 105 | + A diff here would be wrong, so we never narrow on it. |
| 106 | +4. **Commit message tag** `[test all]` / `[no filter]` anywhere on the branch → `all`. |
| 107 | +5. **A changed file matches a run-all glob** (`COMMON_RUN_ALL_GLOBS` + |
| 108 | + the workflow's `extra_run_all_globs`) → `all`. These are files too central or |
| 109 | + too dynamic to narrow safely: CI scripts, build system, `csrc/`, `op_builder/`, |
| 110 | + `accelerator/`, shared fixtures (`tests/unit/common.py`, `tests/conftest.py`, |
| 111 | + `pytest.ini`), and core runtime hubs (`deepspeed/__init__.py`, |
| 112 | + `deepspeed/runtime/engine.py`, `deepspeed/comm/**`, `deepspeed/accelerator/**`, …). |
| 113 | +6. **A deleted module is still imported** by a surviving file (a dangling import |
| 114 | + the graph can't follow) → `all`. A *clean* deletion (importers removed/updated |
| 115 | + in the same PR) does **not** trigger this. |
| 116 | +7. Otherwise, **narrow via the import graph** (below). If nothing is impacted → |
| 117 | + `none`; if everything is → `all`; else → `subset`. |
| 118 | + |
| 119 | +### The import graph |
| 120 | + |
| 121 | +- Nodes are Python files under the package roots: `deepspeed/**` and the `unit` |
| 122 | + test-helper package at `tests/unit/**`. |
| 123 | +- An edge `A → B` means "B imports A". The selector walks **backwards** from each |
| 124 | + changed file to every test that (transitively) imports it. |
| 125 | +- **Opaque hub modules** (`OPAQUE_MODULES`: `deepspeed`, `deepspeed.comm`, |
| 126 | + `deepspeed.accelerator`): almost every test imports `unit.common`, which imports |
| 127 | + these. Their `__init__.py` files eagerly pull in huge subtrees, so if treated as |
| 128 | + normal nodes *any* `deepspeed/**` change would fan out to the whole suite. We |
| 129 | + therefore don't expand their `__init__` imports; instead, changes to the hubs |
| 130 | + themselves are caught by the run-all globs in step 5. |
| 131 | +- **`conftest.py`** changes select every test under that conftest's directory. |
| 132 | +- **New test files** are selected directly (they have no importers yet). |
| 133 | + |
| 134 | +### Dynamic edges (`DYNAMIC_EDGES`) |
| 135 | + |
| 136 | +Some dependencies are wired at runtime (monkey-patching, plugin/registry lookup, |
| 137 | +JIT-loaded ops, `deepspeed.initialize()`-time `replace_module` injection), so a |
| 138 | +test can depend on code it never `import`s. `DYNAMIC_EDGES` is a curated map of |
| 139 | +`changed-file glob → extra test-path globs` that patches these blind spots. It is |
| 140 | +additive on top of the static graph (and is only consulted if step 5 didn't |
| 141 | +already short-circuit to `all`). |
| 142 | + |
| 143 | + |
| 144 | +## Driving it (as a contributor) |
| 145 | + |
| 146 | +### Preview what CI will run |
| 147 | + |
| 148 | +The fetcher is pure stdlib — no DeepSpeed/torch install needed, just `git`: |
| 149 | + |
| 150 | +```bash |
| 151 | +# Selection for your branch vs. the base branch |
| 152 | +python ci/tests_fetcher.py --base origin/master |
| 153 | +cat ci/.test_selection/test_list.txt |
| 154 | + |
| 155 | +# Explain *why* each test was selected (prints import chains) |
| 156 | +python ci/tests_fetcher.py --base origin/master --explain |
| 157 | +``` |
| 158 | + |
| 159 | +`--explain` output looks like: |
| 160 | + |
| 161 | +``` |
| 162 | +deepspeed/shared.py impacts: |
| 163 | + tests/unit/v1/test_shared.py <- deepspeed/shared.py |
| 164 | + tests/unit/v1/moe/test_moe.py <- tests/unit/v1/moe/test_moe.py <- deepspeed/shared.py |
| 165 | +``` |
| 166 | + |
| 167 | +### Escape hatches |
| 168 | + |
| 169 | +- **Force the full suite for a push:** include `[test all]` (or `[no filter]`) |
| 170 | + anywhere in a commit message on the branch. |
| 171 | +- **Touch an infra file:** any change to a run-all glob runs everything. |
| 172 | +- **Found a missed test?** It's likely a runtime/dynamic dependency the static |
| 173 | + graph can't see — add a `DYNAMIC_EDGES` entry (see below) and/or report it. |
| 174 | + |
| 175 | + |
| 176 | +## Changing it (as a maintainer) |
| 177 | + |
| 178 | +All knobs live in `ci/tests_fetcher.py`. After any change, run the self-tests: |
| 179 | + |
| 180 | +```bash |
| 181 | +python ci/test_tests_fetcher.py # standalone |
| 182 | +# or: pytest ci/test_tests_fetcher.py |
| 183 | +``` |
| 184 | + |
| 185 | +### Add a run-all trigger |
| 186 | + |
| 187 | +Add a glob to `TestSelector.COMMON_RUN_ALL_GLOBS` (shared across workflows) or to |
| 188 | +a specific workflow's `extra_run_all_globs`. Globs match repo-root-relative POSIX |
| 189 | +paths; `dir/**` matches everything under `dir/`. |
| 190 | + |
| 191 | +### Add a dynamic edge |
| 192 | + |
| 193 | +Add to `TestSelector.DYNAMIC_EDGES`: |
| 194 | + |
| 195 | +```python |
| 196 | +DYNAMIC_EDGES = { |
| 197 | + # changed-file glob : test-path globs to pull in when it changes |
| 198 | + "deepspeed/module_inject/**": ("tests/unit/v1/moe/**",), |
| 199 | +} |
| 200 | +``` |
| 201 | + |
| 202 | +Keep entries conservative: too broad just wastes GPU time; too narrow misses |
| 203 | +coverage. Prefer fixing it here over widening a run-all glob when only a slice of |
| 204 | +tests is truly affected. |
| 205 | + |
| 206 | +### Mark a module opaque |
| 207 | + |
| 208 | +If a new universal hub starts fanning every change out to the whole suite, add it |
| 209 | +to `OPAQUE_MODULES` and (usually) add the hub file itself to the run-all globs so |
| 210 | +real changes to it still run everything. |
| 211 | + |
| 212 | +### Add a new workflow |
| 213 | + |
| 214 | +The engine is config-driven (`WORKFLOWS` registry of `WorkflowConfig`). To drive |
| 215 | +another workflow: |
| 216 | + |
| 217 | +1. Add an entry: |
| 218 | + ```python |
| 219 | + WORKFLOWS["my-workflow"] = WorkflowConfig( |
| 220 | + name="my-workflow", |
| 221 | + test_scopes=("tests/unit/v2",), # dirs this workflow runs |
| 222 | + extra_run_all_globs=(".github/workflows/my-workflow.yml",), |
| 223 | + ) |
| 224 | + ``` |
| 225 | +2. In that workflow's YAML, mirror `modal-torch-latest.yml`'s `collect-tests` job |
| 226 | + and call the fetcher with `--workflow my-workflow`. |
| 227 | +3. Point the runner at `ci/.test_selection/test_list.txt`. |
| 228 | +4. Add coverage to `ci/test_tests_fetcher.py` if the scope/behavior differs. |
| 229 | + |
| 230 | +### Add a self-test |
| 231 | + |
| 232 | +`ci/test_tests_fetcher.py` builds throwaway git repos (`TmpRepo`) from a synthetic |
| 233 | +`BASELINE` tree and asserts the resulting `mode`/`tests`. Add a `test_*` function |
| 234 | +for any new behavior; it runs both standalone and under pytest, and executes in |
| 235 | +the `collect-tests` CI job, so a broken selector is caught before it mis-picks |
| 236 | +tests. |
| 237 | + |
| 238 | + |
| 239 | +## Security model |
| 240 | + |
| 241 | +The workflow triggers on **`pull_request_target`**, so it runs in the base repo's |
| 242 | +context and the `deploy` job has the modal/HF **secrets** in scope. To keep PRs |
| 243 | +from abusing that: |
| 244 | + |
| 245 | +- `collect-tests` holds **no secrets** and only **AST-parses** (never executes) |
| 246 | + the PR tree. |
| 247 | +- **Both jobs restore `ci/` from the base branch** before using it: |
| 248 | + - `collect-tests` runs the **base** selector + self-tests. This job decides |
| 249 | + whether the Required `deploy` runs, so it must not trust the PR's own |
| 250 | + `ci/tests_fetcher.py` — otherwise a PR could rewrite it to emit `mode=none` |
| 251 | + and skip CI while still going green. The diff is computed from git history, so |
| 252 | + a PR's `ci/` changes still appear in the diff and (via the base selector's |
| 253 | + `ci/**` run-all glob) force a full run. |
| 254 | + - `deploy` runs the **base** orchestration (which drives `modal run` with the |
| 255 | + secrets), so a PR can't repoint it at the secrets to exfiltrate them. |
| 256 | + In both jobs the PR's `deepspeed/` + `tests/` are what gets parsed/tested. |
| 257 | +- The `pull_request_target` trigger types are `review_requested`, |
| 258 | + `ready_for_review`, and `synchronize`. Because `synchronize` re-runs on every |
| 259 | + push to an open PR (not just on a maintainer action), the maintainer review is a |
| 260 | + mitigation, **not** an absolute barrier — the base-`ci/` restore above is the |
| 261 | + primary protection for the secrets. |
| 262 | + |
| 263 | +> **Consequence:** changes to `ci/*` (including `tests_fetcher.py` itself) take |
| 264 | +> effect under `pull_request_target` only after they're **merged**. Validate |
| 265 | +> `ci/*` changes via a `pull_request`-triggered run or the `modal` CLI locally. |
| 266 | +> |
| 267 | +> **Bootstrap:** when the base branch has no selector yet (the PR that introduces |
| 268 | +> it), the restored base `ci/` won't contain `tests_fetcher.py`; `collect-tests` |
| 269 | +> detects this and falls back to running the full suite. |
| 270 | +
|
| 271 | + |
| 272 | +## Failure modes & guarantees |
| 273 | + |
| 274 | +The selector is built to **fail safe — to `all`, never to `none`**: |
| 275 | + |
| 276 | +- Missing/unresolvable base, no merge-base, a parse error on a file, or **any |
| 277 | + unexpected exception** in the selector → it falls back to the full suite (the |
| 278 | + top-level handler in `main()` logs the traceback and sets `mode=all`). |
| 279 | +- If `collect-tests` fails entirely, `deploy` still runs (and the workflow has an |
| 280 | + explicit "fail if selection failed" guard) so a broken collector can't let a PR |
| 281 | + pass without testing. |
| 282 | +- The only way to run *fewer* tests is a clean, well-understood narrow decision; |
| 283 | + every uncertain case widens to everything. |
| 284 | + |
| 285 | +Every run writes a summary to the GitHub job summary (mode, reason, and the |
| 286 | +selected files) so the decision is auditable from the Actions UI. |
| 287 | + |
| 288 | + |
| 289 | +## FAQ / troubleshooting |
| 290 | + |
| 291 | +**My PR shows `mode=none` but I changed code.** |
| 292 | +Either the change is non-Python / out of the workflow's scope, or your edits |
| 293 | +aren't committed (the fetcher diffs *committed* history, `merge-base..HEAD`). |
| 294 | + |
| 295 | +**A relevant test wasn't selected.** |
| 296 | +Likely a runtime/dynamic dependency. Confirm with `--explain`, then add a |
| 297 | +`DYNAMIC_EDGES` entry. As a stop-gap, push with `[test all]`. |
| 298 | + |
| 299 | +**Everything runs even for a tiny change.** |
| 300 | +You touched a run-all glob (CI/build/shared fixture/core runtime), or a hub module |
| 301 | +fanned out. Check the job summary's `reason`. If a hub over-fans, consider |
| 302 | +`OPAQUE_MODULES`. |
| 303 | + |
| 304 | +**It ran the full suite and the summary says "shallow clone?" / "no merge-base".** |
| 305 | +The runner didn't have enough history to compute the diff. `collect-tests` uses |
| 306 | +`fetch-depth: 0`; if you changed that, restore full history for the base. |
0 commit comments