Commit 3696152

Merge pull request #12 from crowdsecurity/reroute_troubleshooting
Improve troubleshooting structure
2 parents b2fb554 + 668f06e commit 3696152

15 files changed

Lines changed: 167 additions & 52 deletions

CLAUDE.md

Lines changed: 41 additions & 0 deletions
@@ -3,6 +3,17 @@
 Conventions for authoring this skill. This governs how skill content is **written** and
 **validated**.
 
+# General rules
+
+Never open responses with filler phrases like "Great question!", "Of course!", "Certainly!", or similar warmups. Start every response with the actual answer. No preamble, no acknowledgment of the question.
+
+Match response length to task complexity. Simple questions get direct, short answers. Complex tasks get full, detailed responses. Never pad responses with restatements of the question or closing sentences that repeat what you just said.
+
+Before any significant task, show me 2-3 ways you could approach this work. Wait for me to choose before proceeding.
+
+If you are uncertain about any fact, statistic, date, or piece of technical information: say so explicitly before including it. Never fill gaps in your knowledge with plausible-sounding information. When in doubt, say so.
+
+
 ## Writing style
 
 - **Be concise.** Technical documentation, not an essay. Favor tables, command recipes, and short
@@ -20,6 +31,36 @@ Conventions for authoring this skill. This governs how skill content is **writte
 - **Anchor to canonical docs.** Each reference doc cites the upstream CrowdSec docs URL it derives
 from. Claims trace to canonical documentation, not to memory.
 
+## Content structure
+
+`SKILL.md` is the router — a symptom/intent-indexed table that points into `references/`.
+All depth lives in `references/<area>/`, organized by the axis that fits the area:
+
+| Dir | Organized by | Notes |
+|---|---|---|
+| `install/` | **platform** (one file each) | `bare-metal.md` (apt/dnf + systemd), `docker.md`, `kubernetes.md`, `console.md` (enrollment) — install mechanics genuinely diverge per platform. |
+| `configure/` | **config domain** | `acquisition`, `hub`, `profiles`, `notifications`, `allowlists`; platforms merged inline. `configure/bouncers/` nests one level by **service type** (`firewall`, `web-servers`). |
+| `operate/` | **task** | `health-check`, `upgrades`, `multi-server`. |
+| `appsec/` | **lifecycle** | `overview` → `deploy` → `configure` → `troubleshoot` (the WAF/AppSec feature silo). |
+| `debug/` | **kind** | `common/` (`triage`, `errors`, `platform-gotchas`) + `symptoms/` (`parsing`, `no-alerts`, `not-blocked`). Feature troubleshooting is *routed to* the feature's own dir (e.g. AppSec → `appsec/troubleshoot.md`), not duplicated under debug/. |
+| `migrate/` | **source product** | `from-fail2ban`. |
+| `scripts/` || helper scripts (`diagnose.sh`, `check-verification.py`); stdlib/bash only, runnable in static checks. |
+
+**Split files vs inline the prefix.** When deciding whether a platform variant gets its own file:
+
+- **Split into separate files** only when the *content itself* diverges — package managers, file
+paths, install/upgrade mechanics. `install/` is the canonical case.
+- **Keep one file with inline command-prefix notes** when the task is identical and only the
+invocation differs (`sudo cscli …` → `docker exec <name> …` → `kubectl exec -n <ns> <pod> -- …`).
+This is the default across `configure/`, `operate/`, `appsec/`, and `debug/`.
+- **Genuinely platform-specific *failure modes*** (not just prefixes — e.g. container mounts,
+SELinux/AppArmor, k8s RBAC) collect in one place (`debug/common/platform-gotchas.md`) rather than
+fragmenting a single symptom across per-platform files.
+
+**Keep this current.** When you add, move, or remove a `references/` directory — or change an
+area's organizing axis — update the table above in the *same* change. This section is the
+authoritative map of the layout; let it drift and it stops being trustworthy.
+
 ## Testing
 
 - **Nothing ships unverified.** Every command and every expected outcome must have been
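The **Keep this current** rule in the CLAUDE.md hunk above can be checked mechanically. Below is a minimal sketch, not a script from this repo: stdlib-only Python (matching the `scripts/` convention) that reports directories under `references/` with no row in the Content structure table. It assumes every row's first cell is written as `` `<dir>/` ``, as in the table shown here; `documented_dirs`, `undocumented`, and `check` are hypothetical names.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed row shape, as in the Content structure table: | `<dir>/` | ... |
ROW_RE = re.compile(r"^\|\s*`([\w-]+)/`\s*\|", re.MULTILINE)


def documented_dirs(claude_md: str) -> set[str]:
    """Directory names listed as the first cell of a table row."""
    return set(ROW_RE.findall(claude_md))


def undocumented(claude_md: str, actual_dirs: set[str]) -> set[str]:
    """Directories present on disk but missing from the table."""
    return actual_dirs - documented_dirs(claude_md)


def check(skill_dir: Path, claude_md: Path) -> int:
    """Print one line per undocumented directory; return a shell-style exit code."""
    actual = {p.name for p in (skill_dir / "references").iterdir() if p.is_dir()}
    missing = undocumented(claude_md.read_text(encoding="utf-8"), actual)
    for name in sorted(missing):
        print(f"references/{name}/ has no row in the Content structure table")
    return 1 if missing else 0
```

For example, `check(Path("skills/crowdsec"), Path("CLAUDE.md"))` run from the repo root (layout assumed from the file list on this page) returns `1` and names the offender if a `references/` directory was added without a table row.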

skills/crowdsec/SKILL.md

Lines changed: 8 additions & 6 deletions
@@ -75,11 +75,13 @@ Docker/k8s commands run inside the container/pod and do not need this.
 | "upgrade", "back up", "roll back", "new version", "tainted items after upgrade" | [references/operate/upgrades.md](./references/operate/upgrades.md) |
 | "multiple agents", "remote LAPI", "mTLS", "postgres backend" | [references/operate/multi-server.md](./references/operate/multi-server.md) *(TODO — stub)* |
 | "is it working?", "smoke test", "validate install", "verify setup", "did detection / WAF / blocking actually wire up?" | [references/operate/health-check.md](./references/operate/health-check.md) |
-| "it's broken" / "not working" / general diagnosis | [references/debug/triage.md](./references/debug/triage.md) → run `bash ${CLAUDE_SKILL_DIR}/scripts/diagnose.sh` |
-| "logs not parsed", "0 parsed" | [references/debug/parsing.md](./references/debug/parsing.md) |
-| "no alerts firing" | [references/debug/no-alerts.md](./references/debug/no-alerts.md) |
-| "decision exists but not blocked" | [references/debug/bouncer-not-blocking.md](./references/debug/bouncer-not-blocking.md) |
-| Specific error message | [references/debug/common-errors.md](./references/debug/common-errors.md) |
+| **Debug — common** · "it's broken" / "not working" / general diagnosis | [references/debug/common/triage.md](./references/debug/common/triage.md) → run `bash ${CLAUDE_SKILL_DIR}/scripts/diagnose.sh` |
+| **Debug — common** · specific error string | [references/debug/common/errors.md](./references/debug/common/errors.md) |
+| **Debug — common** · "container can't see logs", "mount", "SELinux/AppArmor denied", "k8s RBAC / DaemonSet" | [references/debug/common/platform-gotchas.md](./references/debug/common/platform-gotchas.md) |
+| **Debug — by symptom** · "logs not parsed", "0 parsed" | [references/debug/symptoms/parsing.md](./references/debug/symptoms/parsing.md) |
+| **Debug — by symptom** · "no alerts firing" | [references/debug/symptoms/no-alerts.md](./references/debug/symptoms/no-alerts.md) |
+| **Debug — by symptom** · "decision exists but not blocked" | [references/debug/symptoms/not-blocked.md](./references/debug/symptoms/not-blocked.md) |
+| **Debug — by feature** · AppSec/WAF not blocking, false positives, captcha | [references/appsec/troubleshoot.md](./references/appsec/troubleshoot.md) |
 | "switch from fail2ban" | [references/migrate/from-fail2ban.md](./references/migrate/from-fail2ban.md) *(TODO — stub)* |
 
 For anything debug-shaped, the first move is almost always:
@@ -134,7 +136,7 @@ Where things live on a default bare-metal install:
 Confirm with the user before any of these:
 
 - `cscli decisions delete --all` — wipes every active ban including CAPI-pulled blocklists. Use targeted `delete -i`, `delete -r`, `delete --id`, `delete --origin lists --scenario <name>`.
-- Editing hub-managed files under `/etc/crowdsec/{parsers,scenarios,collections,postoverflows,contexts}/` instead of the sibling `_custom/` directory — see [references/debug/triage.md](./references/debug/triage.md) § Hard don'ts.
+- Editing hub-managed files under `/etc/crowdsec/{parsers,scenarios,collections,postoverflows,contexts}/` instead of the sibling `_custom/` directory — see [references/debug/common/triage.md](./references/debug/common/triage.md) § Hard don'ts.
 - Disabling a signature collection wholesale to silence a false positive — pick the right suppression layer (allowlist / whitelist parser / postoverflow) per [references/configure/allowlists.md](./references/configure/allowlists.md) § Suppression mechanisms.
 - Mutating host firewall state (firewall bouncer install, `ipset` flush, iptables↔nftables switch) without confirming — the firewall bouncer can wipe rule chains other tools depend on.
 - Skipping `--reset-then-reuse-values` on `helm upgrade crowdsec` — silently drops values.
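Since `SKILL.md` routes everything into `references/`, a row left pointing at a pre-move path (such as the removed `references/debug/triage.md`) is a dead end. A hedged sketch that lists relative link targets in a markdown file that do not exist on disk: it handles only inline `[text](target)` links, strips `#fragments`, and skips URLs. The function names and the injectable `exists` callback (there so it can be tested without a filesystem) are assumptions for illustration, not an existing repo script.

```python
from __future__ import annotations

import posixpath
import re
from pathlib import Path
from typing import Callable

# Inline links only: [text](target). No reference-style links, no nested brackets.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def relative_targets(md_text: str) -> list[str]:
    """Relative link targets with any #fragment removed; URLs and pure anchors skipped."""
    targets = []
    for target in LINK_RE.findall(md_text):
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue
        targets.append(target.split("#", 1)[0])
    return targets


def broken_links(md_text: str, md_dir: str,
                 exists: Callable[[str], bool] = lambda p: Path(p).exists()) -> list[str]:
    """Targets that do not resolve relative to md_dir, the directory holding the file."""
    return [t for t in relative_targets(md_text)
            if not exists(posixpath.normpath(posixpath.join(md_dir, t)))]
```

For example, `broken_links(Path("skills/crowdsec/SKILL.md").read_text(), "skills/crowdsec")` returns an empty list when every router row points at a file that exists.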

skills/crowdsec/references/configure/acquisition.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ Acquisition tells the engine **what logs to read and how to label them**. Each s
 declares a `source:` (the datasource type) and a `labels.type:` (the parser hint). If the
 engine reads lines but they show up as **`Lines unparsed`**, acquisition is usually fine
 and the problem is the `type:` or the parser — debug that with
-[../debug/parsing.md](../debug/parsing.md). If a source shows **0 `Lines read`**, the
+[../debug/symptoms/parsing.md](../debug/symptoms/parsing.md). If a source shows **0 `Lines read`**, the
 problem is here.
 
 ## Where acquisition lives
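Each link edit in this commit follows the same path arithmetic: resolve the old target against the linking file's *old* directory, apply any move, then express the result relative to the linking file's *new* directory. That is why `acquisition.md` gains a `symptoms/` segment while the renamed `errors.md` gains an extra `../`. A minimal sketch with `posixpath`, assuming paths relative to the skill root; the `./` prefix for same-directory links mirrors the style in these files, and `relink`/`rewrite_links` are hypothetical names. Only link targets are rewritten, never link text.

```python
from __future__ import annotations

import posixpath
import re

# ](target) or ](target#fragment); targets starting with "#" are not matched.
LINK_TARGET_RE = re.compile(r"\]\(([^)\s#]+)(#[^)\s]*)?\)")


def relink(source_file: str, target_file: str) -> str:
    """Relative link from source_file's directory to target_file (both skill-root relative)."""
    rel = posixpath.relpath(target_file, posixpath.dirname(source_file))
    # House style seen in these files: same-directory links carry an explicit "./".
    return rel if rel.startswith("../") else "./" + rel


def rewrite_links(text: str, old_source: str, new_source: str,
                  moves: dict[str, str]) -> str:
    """Re-point every relative link after files move; link text is left untouched."""
    def fix(m: re.Match) -> str:
        target, fragment = m.group(1), m.group(2) or ""
        if "://" in target or target.startswith("mailto:"):
            return m.group(0)  # absolute URL: leave as-is
        # Where the link pointed before anything moved.
        resolved = posixpath.normpath(
            posixpath.join(posixpath.dirname(old_source), target))
        resolved = moves.get(resolved, resolved)  # apply the move, if any
        return "](" + relink(new_source, resolved) + fragment + ")"
    return LINK_TARGET_RE.sub(fix, text)
```

Passing the same path as `old_source` and `new_source` covers files that stayed put (like `acquisition.md`); passing both paths covers a renamed file like `common-errors.md` → `common/errors.md`, whose unmoved targets still need re-pointing.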

skills/crowdsec/references/configure/bouncers/firewall.md

Lines changed: 2 additions & 2 deletions
@@ -64,7 +64,7 @@ Only register manually when the bouncer runs on a **different host** than LAPI
 > `/var/log/crowdsec-firewall-bouncer.log` (and the dpkg `--configure` step errors).
 > Re-register: `cscli bouncers delete <name>`, `KEY=$(cscli bouncers add fw-local -o raw)`,
 > write it into the yaml's `api_key:`, `systemctl restart crowdsec-firewall-bouncer`.
-> See [../../debug/bouncer-not-blocking.md](../../debug/bouncer-not-blocking.md) § 3.
+> See [../../debug/symptoms/not-blocked.md](../../debug/symptoms/not-blocked.md) § 3.
 
 ## 3 — What it creates in nftables
 
@@ -140,7 +140,7 @@ sudo cscli decisions delete -i 192.0.2.66
 container-to-container blocking matters.
 - **"Banned but still reachable"** → almost always `update_frequency` not
 elapsed, `disable_ipv6` masking a v6 client, or the bouncer service stopped.
-Full decision tree: [../../debug/bouncer-not-blocking.md](../../debug/bouncer-not-blocking.md).
+Full decision tree: [../../debug/symptoms/not-blocked.md](../../debug/symptoms/not-blocked.md).
 
 ## Teardown
 
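The re-register recipe in the firewall context lines ends with writing the new key into the bouncer YAML's `api_key:`. A stdlib-only sketch of that one step, assuming the config holds a single unindented `api_key:` line; it edits text instead of parsing YAML so comments and key order survive. `set_api_key` is a hypothetical helper, not part of the bouncer package.

```python
import re

# Assumption: exactly one unindented "api_key: <value>" line in the bouncer config.
API_KEY_LINE = re.compile(r"^api_key:.*$", re.MULTILINE)


def set_api_key(yaml_text: str, key: str) -> str:
    """Return yaml_text with the api_key value replaced by key."""
    # A callable replacement stops re from treating backslashes in the key as escapes.
    new_text, count = API_KEY_LINE.subn(lambda _m: f"api_key: {key}", yaml_text)
    if count != 1:
        raise ValueError(f"expected one top-level api_key line, found {count}")
    return new_text
```

Feed it the key printed by `cscli bouncers add fw-local -o raw` and the current config text (commonly `/etc/crowdsec/bouncers/crowdsec-firewall-bouncer.yaml`, but confirm the path on your install), write the result back, then `systemctl restart crowdsec-firewall-bouncer` as the recipe says.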
skills/crowdsec/references/configure/bouncers/web-servers.md

Lines changed: 1 addition & 1 deletion
@@ -340,7 +340,7 @@ docker exec crowdsec cscli metrics show appsec # Processed/Blocked increment
 - **WAF off silently:** `crowdsecAppsecEnabled` defaults to `false`, and AppSec must listen on
 `0.0.0.0:7422` (not loopback) for a containerized Traefik to reach it.
 - **`stream` lag:** a fresh ban lands within `updateIntervalSeconds`; immediate ban-then-curl
-looks like a failure. (See [../../debug/bouncer-not-blocking.md](../../debug/bouncer-not-blocking.md).)
+looks like a failure. (See [../../debug/symptoms/not-blocked.md](../../debug/symptoms/not-blocked.md).)
 
 ### Kubernetes (Helm) — extra gotchas
 
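The hunk header above uses `docker exec crowdsec cscli …`, and the next section covers Kubernetes: this is the inline-prefix convention from CLAUDE.md, where the task is identical and only the invocation differs. A hypothetical helper that builds the argv per platform; the `crowdsec` container default follows the command shown here, and Kubernetes requires an explicit namespace and pod because neither can be assumed.

```python
from __future__ import annotations

import shlex


def cscli_argv(args: list[str], platform: str, *, container: str = "crowdsec",
               namespace: str | None = None, pod: str | None = None) -> list[str]:
    """Prefix a cscli invocation for where the engine runs (hypothetical helper)."""
    if platform == "bare-metal":
        prefix = ["sudo", "cscli"]
    elif platform == "docker":
        prefix = ["docker", "exec", container, "cscli"]
    elif platform == "kubernetes":
        if namespace is None or pod is None:
            raise ValueError("kubernetes needs both namespace and pod")
        prefix = ["kubectl", "exec", "-n", namespace, pod, "--", "cscli"]
    else:
        raise ValueError(f"unknown platform: {platform!r}")
    return prefix + list(args)


def render(argv: list[str]) -> str:
    """Shell-quoted string, for printing in docs or logs."""
    return shlex.join(argv)
```

Returning an argv list rather than a string keeps it usable with `subprocess.run` without shell quoting issues; `render` is only for display.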
skills/crowdsec/references/configure/hub.md

Lines changed: 1 addition & 1 deletion
@@ -101,7 +101,7 @@ editing them taints the item and your change is lost on the next `--force` upgra
 Instead, drop an override file in the sibling `_custom/` directory for that type
 (`scenarios/.../_custom/`, `parsers/.../_custom/`, etc.). Overrides are merged on top of the
 hub item by `name`, survive upgrades, and keep the hub item pristine. See
-[../debug/triage.md](../debug/triage.md) § Hard don'ts and the SKILL.md Hard don'ts list.
+[../debug/common/triage.md](../debug/common/triage.md) § Hard don'ts and the SKILL.md Hard don'ts list.
 
 To remove a collection and its pulled items:
 
skills/crowdsec/references/debug/common-errors.md renamed to skills/crowdsec/references/debug/common/errors.md

Lines changed: 10 additions & 10 deletions
@@ -30,32 +30,32 @@ Match the error string the engine/bouncer printed to the row below.
 
 | Error string | Cause | Fix |
 |---|---|---|
-| `datasource of type appsec: … cannot parse appsec configuration: [2:3] cannot unmarshal []interface {} into Go struct field Configuration.AppsecConfig of type string` | `appsec_config:` (singular) given a **list** | Use the **plural** key `appsec_configs:` for a list; singular takes one string. See [../appsec/configure.md](../appsec/configure.md). |
-| `unable to initialize inband engine : invalid WAF config from string: failed to compile the directive "secrule": duplicated rule id 100` | Two appsec-configs on one listener pull the **same** underlying rule (e.g. both include `base-config`/`vpatch-*`) | Use non-overlapping configs, or just `crowdsecurity/appsec-default` alone. See [../appsec/configure.md](../appsec/configure.md). |
-| `no appsec-rules found for pattern <name>` | A bare appsec-config was installed without its rules; engine expands globs at load, `cscli` does not | Install via the **collection** (`cscli collections install crowdsecurity/appsec-virtual-patching`), which pulls the rule graph. See [../appsec/deploy.md](../appsec/deploy.md). |
+| `datasource of type appsec: … cannot parse appsec configuration: [2:3] cannot unmarshal []interface {} into Go struct field Configuration.AppsecConfig of type string` | `appsec_config:` (singular) given a **list** | Use the **plural** key `appsec_configs:` for a list; singular takes one string. See [../appsec/configure.md](../../appsec/configure.md). |
+| `unable to initialize inband engine : invalid WAF config from string: failed to compile the directive "secrule": duplicated rule id 100` | Two appsec-configs on one listener pull the **same** underlying rule (e.g. both include `base-config`/`vpatch-*`) | Use non-overlapping configs, or just `crowdsecurity/appsec-default` alone. See [../appsec/configure.md](../../appsec/configure.md). |
+| `no appsec-rules found for pattern <name>` | A bare appsec-config was installed without its rules; engine expands globs at load, `cscli` does not | Install via the **collection** (`cscli collections install crowdsecurity/appsec-virtual-patching`), which pulls the rule graph. See [../appsec/deploy.md](../../appsec/deploy.md). |
 | `no such datasource` / source type unknown | `source:`/`labels.type:` typo or a datasource the build doesn't support | Fix the key in the `acquis.d/*.yaml`; `crowdsec -t` points at the file:line. |
-| Source reads lines but **0 parsed** | `type:` label doesn't match any installed parser | [parsing.md](./parsing.md). |
+| Source reads lines but **0 parsed** | `type:` label doesn't match any installed parser | [parsing.md](../symptoms/parsing.md). |
 
 ## Permissions / OS
 
 | Symptom | Cause | Fix |
 |---|---|---|
-| `permission denied` opening a log file; or source present but 0 lines read | `crowdsec` user can't read the file | `sudo -u crowdsec head <path>`; fix ownership/ACL. If that user *can* read it but the engine still can't, it's **SELinux/AppArmor** — `ausearch -m avc -ts recent` / `dmesg | grep DENIED`, then relabel/add policy (don't disable enforcement). |
+| `permission denied` opening a log file; or source present but 0 lines read | `crowdsec` user can't read the file | `sudo -u crowdsec head <path>`; fix ownership/ACL. If that user *can* read it but the engine still can't, it's **SELinux/AppArmor** → [platform-gotchas.md](./platform-gotchas.md). |
 | apt install of a bouncer hangs: `Failed to open terminal … debconf: whiptail output the above errors, giving up!` | A debconf dialog (e.g. pending-kernel notice) on a non-interactive shell | Re-run with `sudo DEBIAN_FRONTEND=noninteractive apt install -y …`. |
 
 ## LAPI / CAPI / auth
 
 | Error | Cause | Fix |
 |---|---|---|
 | Agent: `unable to authenticate … machine not validated` | Agent machine not registered/validated with LAPI | `cscli machines list`; validate with `cscli machines validate <name>` (or re-`cscli machines add` on the agent). |
-| Bouncer log: **HTTP 401** on decision pull | Bouncer key ≠ LAPI key (rotated, stale config, re-added) | `cscli bouncers list`; re-issue and paste the key into the bouncer config. [bouncer-not-blocking.md](./bouncer-not-blocking.md) §3. |
+| Bouncer log: **HTTP 401** on decision pull | Bouncer key ≠ LAPI key (rotated, stale config, re-added) | `cscli bouncers list`; re-issue and paste the key into the bouncer config. [not-blocked.md](../symptoms/not-blocked.md) §3. |
 | `cscli capi status` fails / CAPI register errors | Missing `online_api_credentials.yaml`, **clock skew**, or egress blocked to `api.crowdsec.net` | `cscli capi register` then reload; check `timedatectl` (TLS fails on skew); allow egress / set proxy. |
 
 ## Database
 
 | Error | Cause | Fix |
 |---|---|---|
-| `database is locked` (sqlite) | Concurrent writers / slow disk; sqlite single-writer | Reduce write pressure; move `crowdsec.db` to faster storage; for multi-agent or high volume switch the backend to PostgreSQL — see [../operate/multi-server.md](../operate/multi-server.md). |
+| `database is locked` (sqlite) | Concurrent writers / slow disk; sqlite single-writer | Reduce write pressure; move `crowdsec.db` to faster storage; for multi-agent or high volume switch the backend to PostgreSQL — see [../operate/multi-server.md](../../operate/multi-server.md). |
 | sqlite errors + `df` shows full `/var/lib/crowdsec` | Disk full → silent alert-write failure | Free space / rotate; alerts resume. |
 
 ## Hub
@@ -69,10 +69,10 @@ Match the error string the engine/bouncer printed to the row below.
 
 | Symptom | Likely cause | Confirm |
 |---|---|---|
-| Expected ban "not happening" for an IP | The IP matches an **allowlist** | `cscli allowlists check <ip>` → [../configure/allowlists.md](../configure/allowlists.md). |
-| Decision exists, traffic still passes | Bouncer latency / scope / key / IP family | Full ladder: [bouncer-not-blocking.md](./bouncer-not-blocking.md). |
+| Expected ban "not happening" for an IP | The IP matches an **allowlist** | `cscli allowlists check <ip>` → [../../configure/allowlists.md](../../configure/allowlists.md). |
+| Decision exists, traffic still passes | Bouncer latency / scope / key / IP family | Full ladder: [not-blocked.md](../symptoms/not-blocked.md). |
 
 When the string isn't here, capture the full forensic bundle with
-[`scripts/diagnose.sh`](../../scripts/diagnose.sh) and read the agent log around
+[`scripts/diagnose.sh`](../../../scripts/diagnose.sh) and read the agent log around
 the first `level=error`/`FATAL` — the *first* error is usually the root cause;
 later ones are fallout.
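The closing advice, read the agent log around the *first* `level=error`/`FATAL`, can be applied with a short scan. A minimal sketch that returns the first matching line number plus a few lines of context; the regex assumes logrus-style `level=` fields as in CrowdSec's text logs, and `first_error` is a hypothetical name.

```python
from __future__ import annotations

import re
from typing import Iterable

# Assumes logrus-style text logs: time="..." level=error msg="...".
# "FATAL" is matched case-sensitively so words like "non-fatal" are not hits.
ERROR_RE = re.compile(r"\blevel=(?:error|fatal)\b|\bFATAL\b")


def first_error(lines: Iterable[str], context: int = 5) -> tuple[int, list[str]] | None:
    """(1-based line number, surrounding lines) for the first error line, else None."""
    buf = [line.rstrip("\n") for line in lines]
    for i, line in enumerate(buf):
        if ERROR_RE.search(line):
            start = max(0, i - context)
            return i + 1, buf[start:i + context + 1]
    return None
```

For example, `with open("/var/log/crowdsec.log") as f: print(first_error(f))` on a bare-metal install (log path assumed; for Docker, save `docker logs crowdsec` to a file first).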
