Found your operating-production-services skill while browsing the registry—the way you've structured the progressive disclosure for such a dense topic (97/100 for a reason) makes me curious how you'd handle even more edge cases around observability and incident response.
The TL;DR
You're at 97/100, solidly in A-grade territory. This is based on Anthropic's skill best practices rubric. Your strongest area is Writing Style (10/10)—the skill reads like documentation written by someone who actually runs production systems, not a marketing pamphlet. Weakest spot is Spec Compliance (12/15), mostly because you're leaving discoverability points on the table with trigger phrases.
What's Working Well
- Blameless postmortem framework - The 5 Whys template and postmortem meeting checklist give Claude concrete structure for handling incidents. That's the kind of thing teams actually need.
- Token economy is chef's kiss - slo-alerting.md delegates heavy technical details while SKILL.md stays lean. You're not dumping a 200-line reference file on someone; you're layering it thoughtfully.
- Practical burn rate guidance - The multi-window alerting patterns with specific Prometheus queries and Grafana dashboard structure mean Claude can actually implement this, not just read philosophy.
- Clear scope boundaries - Your description explicitly calls out SLO alerting and postmortems while noting what you don't cover (deployment strategies, team structure). That's rare and helpful.
The Big One
slo-alerting.md (189 lines) is missing a table of contents. This hurts your navigation score because at 100+ lines, readers need an anchor point. Right now someone has to scroll through Prometheus rules, Grafana templates, and example YAMLs without knowing what's coming.
Add this at the top:
## Contents
- [Prometheus Recording Rules](#prometheus-recording-rules)
- [Multi-Window Burn Rate Alerts](#multi-window-burn-rate-alerts)
- [Burn Rate Reference](#burn-rate-reference)
- [Grafana Dashboard](#grafana-dashboard)
- [SLO Definition Template](#slo-definition-template)
- [Common Mistakes](#common-mistakes)

Impact: +1 point to PDA (gets you to 28/30).
Other Things Worth Fixing
- Expand trigger phrases in your frontmatter description - You're only hitting 1-2 right now. Add "error budget", "incident response", and "reliability metrics" to catch more discovery queries. This is where the 3 missing Spec Compliance points went, and it's an easy recovery.
- Add one more example template - You've got postmortem templates and SLO YAML. A quick Alertmanager config snippet showing how to route burn rate alerts would give Claude another angle to work from.
- Name-check the reference file in SKILL.md - slo-alerting.md is good, but the pointer to it is generic. A line in SKILL.md like "See references/slo-alerting.md for Prometheus query patterns and Grafana dashboard templates" would make the connection explicit.
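To make the Alertmanager suggestion concrete, here is a minimal routing sketch. It assumes your burn rate alerts carry a `severity` label (`page` or `ticket`) and an `alert_type` label; those label names, the receiver names, and the webhook URL are all placeholders I made up, so map them onto whatever your rules in slo-alerting.md actually emit.

```yaml
# Hypothetical Alertmanager routing for SLO burn rate alerts.
# Label names, receiver names, and endpoints are assumptions, not taken from the skill.
route:
  receiver: default
  group_by: ['alertname', 'service']
  routes:
    # Fast-burn alerts: page the on-call engineer.
    - matchers:
        - severity="page"
        - alert_type="slo_burn_rate"
      receiver: oncall-pager
      group_wait: 10s
      repeat_interval: 1h
    # Slow-burn alerts: open a ticket instead of paging.
    - matchers:
        - severity="ticket"
        - alert_type="slo_burn_rate"
      receiver: slo-tickets
      repeat_interval: 24h

receivers:
  - name: default
  - name: oncall-pager
    pagerduty_configs:
      - routing_key: <your-pagerduty-routing-key>
  - name: slo-tickets
    webhook_configs:
      - url: https://example.com/hooks/slo-tickets
```

Splitting page and ticket routes mirrors the usual fast-burn versus slow-burn distinction in multi-window alerting, so the snippet would sit naturally next to the existing Prometheus rules.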
Quick Wins
- Add TOC to slo-alerting.md → +1 point
- Expand trigger phrases (error budget, incident response, reliability metrics) → +2-3 points
- One more config example (Alertmanager routing) → +0-1 point
These three things could realistically push you to 99-100.
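For the trigger-phrase quick win, the frontmatter could look roughly like this. The description wording below is illustrative only, not your current text; the point is just where the extra phrases would sit alongside the scope boundaries you already state.

```yaml
---
name: operating-production-services
# Hypothetical description text; adapt to what the skill actually covers.
description: >-
  Guides SLO definition, multi-window burn rate alerting, and blameless
  postmortems. Use when the user asks about SLOs, error budgets, burn rate
  alerts, reliability metrics, incident response, or postmortems. Does not
  cover deployment strategies or team structure.
---
```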
Check out your skill here: [SkillzWave.ai](https://skillzwave.ai) | [SpillWave](https://spillwave.com). We have an agentic skill installer that installs skills on 14+ coding agent platforms. Check out this guide on how to improve your agentic skills.