This repository provides tooling to transform GitHub Security Advisories (GHSA) into Common Security Advisory Framework (CSAF) 2.0 advisories. Its primary focus is a converter that ingests GHSA JSON (from the GitHub API) and produces a valid CSAF advisory file.
This project is an exploratory, proof‑of‑concept attempt to see how far a direct GHSA → CSAF mapping can go without extensive manual curation. It is neither a complete nor an authoritative converter, and the output should be treated as a starting point for experimentation, review, and potential enrichment.
Why explore this?
- To surface the practical friction points when aligning an informal ecosystem advisory format (GHSA) with a formal standard (CSAF).
- To provide a lightweight sandbox for evaluating the viability of adopting CSAF for open source package advisories.
- To document gaps and edge cases rather than to claim seamless interoperability.
What this is NOT:
- A drop‑in production adapter guaranteeing schema completeness or semantic fidelity in all cases.
- Exhaustive coverage of all CSAF fields or advanced constructs (distribution, localization, signature handling, etc.).
- A normalization engine that infers missing business/vendor context.
Expectation management & disclaimers:
- Information loss is unavoidable: many CSAF fields lack source data in GHSA and are intentionally omitted.
- Some field representations are simplified or flattened (e.g., product taxonomy) to keep the prototype maintainable.
- No guarantee that version range formatting or score sets match best‑practice CSAF authoring guidelines.
- Consumers should perform validation and apply domain-specific post‑processing before relying on the output.
Value of the prototype:
- Makes differences concrete by producing tangible CSAF documents from real GHSA examples.
- Highlights where additional metadata or tooling would be required for a robust pipeline.
- Serves as a foundation others can iterate on (fill gaps, strengthen mappings, add validation).
Both `product_tree` and `vulnerabilities` are optional in the CSAF 2.0 specification; a syntactically valid document can consist solely of the mandatory `document` section. For clarity and experimentation, this prototype populates them when GHSA provides enough input. Heuristic choices are applied, e.g. the ecosystem becomes a top-level language/category branch. If these assumptions do not align with a consumer's taxonomy strategy or introduce a risk of misinterpretation, generation of these sections could be skipped or pruned in a future version, yielding a leaner CSAF advisory focused only on tracking metadata.
| Aspect | GHSA Source | CSAF Expectation | Result / Handling / Assumption |
|---|---|---|---|
| Acknowledgments | `credits_detailed` entries | `acknowledgments[]` optional | Mapped per entry: Names = `user.login`, Organization = `organizations_url`, URLs = `html_url`, Summary via credit type |
| Aggregate severity | `severity` string | `aggregate_severity.text` | Direct mapping to text; namespace omitted |
| Category | n/a | `document.category` required | Fixed constant from config (Security Advisory) |
| CSAF version | n/a | `csaf_version` required | Fixed to CSAF 2.0 |
| Distribution (TLP) | n/a | `document.distribution` | Set TLP to White by default |
| Language (`lang`) | n/a | Optional | Default to `en` (GHSA does not provide language) |
| Notes | `summary`, `description` | Optional `notes[]` | Two notes created: Summary + Description |
| Publisher: Category | n/a | Category (e.g., coordinator/discoverer/other) | Use Discoverer |
| Publisher: Issuing Authority | n/a | Issuer | Use GitHub |
| Publisher: Name | `user.login`, `user.name` | Single name | Use login because it is always set |
| Publisher: Namespace | `user.html_url` | URI/namespace | Use `html_url` as namespace |
| Publisher: Contact details | `user.html_url`, optional `user.email` | Optional contact string | Compose: `URL: <html_url>; email: <email>` (email only if present) |
| References | URLs in GHSA body | Optional references array | Not populated |
| Source language | n/a | Optional | Not populated |
| Title | `summary` | Required | Use summary; nil if empty |
| Tracking: Aliases | `identifiers[]` | Optional list | Map all GHSA identifiers to aliases |
| Tracking: ID | `ghsa_id` | Required | Use GHSA ID |
| Tracking: Initial release | `published_at` | Required (ISO 8601) | Use `published_at` |
| Tracking: Current release | `updated_at` if > `published_at` | Required (ISO 8601) | Use `updated_at` if newer, else `published_at` |
| Tracking: Revision history | `published_at`, `updated_at` | Required | Synthesized: 1 = published, 2 = updated (if newer); numbers via `strconv.Itoa` |
| Tracking: Status | n/a | Required | Fixed to `final` |
| Tracking: Version | n/a | Required | Length of revision history (as decimal string) |
| Digital signatures | n/a | Optional signing metadata | Not populated |
| Aspect | GHSA Source | CSAF Expectation | Result / Handling / Assumption |
|---|---|---|---|
| Branch hierarchy | Ecosystem → package name → vulnerable range | Hierarchical branches | Implemented as Language → ProductName → ProductVersionRange → Product. |
| Ecosystem category | `package.ecosystem` | Category label | Mapped to Language category. Assumption: the ecosystem corresponds to a programming language. Other categories are possible, e.g. Vendor, if the vendor is taken to be GitHub or the package owner. |
| Product name display | `package.name` and repository path | Full product name | Derived via `getRepositoryName`: third path segment (e.g., `github.com/org/repo` → `repo`), fallback to full package name. |
| Product ID | `package.name` | Stable identifier | Use full package name as `product_id`. |
| Version range formatting | `vulnerable_version_range` | Clean canonical ranges | Whitespace is normalized and comparison operators such as `<` and `<=` are replaced to avoid JSON HTML escaping (see `normalizeOperators`). |
| Multi-product relationships | Multiple packages per advisory | Cross-product mapping | Each package is a separate branch; no merging across packages. |
| Full product names | Consolidated list | `full_product_names[]` | Populated alongside branches for all products. |
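
The product name rule in the table (third path segment, otherwise the full package name) could look roughly like the following. This is a hypothetical re-implementation written only from that description; the project's `getRepositoryName` may handle more cases or behave differently.

```go
package main

import (
	"fmt"
	"strings"
)

// repositoryName returns the third path segment of a package name such as
// "github.com/org/repo" and falls back to the full name when there is none.
// Hypothetical sketch; not the project's getRepositoryName.
func repositoryName(pkg string) string {
	parts := strings.Split(pkg, "/")
	if len(parts) >= 3 && parts[2] != "" {
		return parts[2]
	}
	return pkg
}

func main() {
	for _, p := range []string{"github.com/org/repo", "github.com/org/repo/v2", "lodash"} {
		fmt.Printf("%s -> %s\n", p, repositoryName(p))
	}
}
```

Note that the display name is shortened while the `product_id` keeps the full package name, so two packages with the same repository segment still get distinct identifiers.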
| Aspect | GHSA Source | CSAF Expectation | Result / Handling / Assumption |
|---|---|---|---|
| Vulnerability count | Single advisory with multiple CWEs | One CWE per flaw | Single CSAF Vulnerability generated; CWEs reduced to the first (primary) one to respect 1:1 CWE constraint. |
| IDs | `ghsa_id` and `cve_id` | Multiple identifiers | `IDs[]` includes GHSA ID; CVE set if provided. |
| CWE mapping | `cwes[]` | One CWE per vulnerability | Map first CWE (id and name), omit others. |
| References | Advisory URL (`html_url`) | Typed references | One external reference pointing to GHSA HTML page with summary "Advisory HTML URL". |
| Product status | Affected packages | `known_affected`, `fixed`, etc. | All products derived from product tree marked as KnownAffected; no unaffected or fixed breakdown yet. |
| Scores (CVSS) | `cvss_severities`, `cvss` legacy | CVSS3 with version | Prefer CVSS v3.1/v3.0 vectors; legacy CVSS used if v3 absent; unsupported vectors are skipped. Severity derived from base score. |
| Remediations | `patched_versions` per vulnerability | Remediation entries | If patched versions exist, one VendorFix remediation with details "Upgrade to version: " and product IDs attached. |
| Discovery / release dates | Not distinct in GHSA | Optional fields | Omitted; document tracking covers publish/update. |
| Threats / VEX flags | Not present | Optional | Omitted. |
| Notes / Title | Advisory-level text | Optional per-vuln | Omitted to avoid duplication; document notes/title already describe the advisory. |
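
To make the CVSS row more concrete, here is a small sketch of the two decisions it describes: accepting only v3.1/v3.0 vectors and deriving the severity from the base score. The function names are hypothetical and the converter's real logic may differ; the score bands follow the qualitative severity rating scale of the CVSS v3.x specification.

```go
package main

import (
	"fmt"
	"strings"
)

// cvss3Version reports the CVSS v3 minor version encoded in a vector string.
// Vectors with any other prefix are treated as unsupported (ok == false).
func cvss3Version(vector string) (version string, ok bool) {
	for _, v := range []string{"3.1", "3.0"} {
		if strings.HasPrefix(vector, "CVSS:"+v+"/") {
			return v, true
		}
	}
	return "", false
}

// severityFromScore maps a CVSS v3.x base score to its qualitative rating.
func severityFromScore(score float64) string {
	switch {
	case score <= 0:
		return "NONE"
	case score < 4.0:
		return "LOW"
	case score < 7.0:
		return "MEDIUM"
	case score < 9.0:
		return "HIGH"
	default:
		return "CRITICAL"
	}
}

func main() {
	vector := "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"
	if v, ok := cvss3Version(vector); ok {
		fmt.Println("version", v, "severity", severityFromScore(9.8))
	}
}
```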
Not performed: semantic rewriting of version operators, inference of missing ecosystem data, enrichment of vendor/product naming conventions beyond what GHSA provides.
High-level pipeline:
- Input acquisition: GHSA advisory JSON loaded into internal model (`models/ghsa/...`).
- Conversion orchestrator (`service/converter/`) assembles CSAF structures.
- Product tree construction (`producttree.go`): branches categorized (language → product → version range → product instance).
- Vulnerability section (`vulnerabilities.go`): maps GHSA vulnerabilities, identifiers, scores, references.
- Revision history (`document.go`): timestamps converted into sequential revision numbers (fixed from earlier control character issue by using proper integer → string conversion).
- Tracking and metadata: populates minimal CSAF tracking fields from GHSA advisory context.
- Serialization: CSAF advisory saved (current path uses upstream library defaults; HTML escaping of `<` may appear as `\u003c`).
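
The control character issue mentioned for the revision history is a common Go pitfall. The sketch below assumes (this is not confirmed by the source) that the earlier bug came from converting an integer to a string via a rune conversion, which yields the code point rather than the decimal text. Both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// buggyRevisionNumber treats n as a Unicode code point, so 1 becomes the
// control character U+0001 instead of the text "1".
func buggyRevisionNumber(n int) string {
	return string(rune(n))
}

// revisionNumber formats n as decimal text, which is what a CSAF
// revision number needs.
func revisionNumber(n int) string {
	return strconv.Itoa(n)
}

func main() {
	fmt.Printf("%q %q\n", buggyRevisionNumber(1), revisionNumber(1))
}
```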
Development notes:
- Project layout mirrors responsibility (converter, downloader, store, models, schemas).
- Utilities (`internal/utils/ref.go`) help with pointer/value wrapping to reduce noise when assembling CSAF structs.
- Tests exist for product tree and downloader components to validate structure and basic behaviors.
- Incremental enhancements can extend mapping coverage without breaking existing schema usage.
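
Pointer/value wrapping helpers of the kind mentioned above are typically a few lines of generic Go. The sketch below shows the general idea only; the names `Ptr` and `Deref` and the `tracking` struct are assumptions for illustration and are not taken from `internal/utils/ref.go`.

```go
package main

import "fmt"

// Ptr returns a pointer to a copy of v, which allows literals to be used
// for pointer-typed struct fields. Hypothetical helper name.
func Ptr[T any](v T) *T {
	return &v
}

// Deref returns the pointed-to value, or the zero value of T for a nil pointer.
func Deref[T any](p *T) T {
	if p == nil {
		var zero T
		return zero
	}
	return *p
}

// tracking is a simplified stand-in for a struct with optional (pointer) fields.
type tracking struct {
	ID     *string
	Status *string
}

func main() {
	t := tracking{ID: Ptr("GHSA-mh63-6h87-95cp"), Status: Ptr("final")}
	fmt.Println(*t.ID, Deref(t.Status))
}
```

Without such a helper, each optional field needs a temporary variable just to take its address, which quickly clutters struct literals.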
Prerequisites: Go ≥ 1.21.
Install dependencies:

    go mod download

Run converter (example):

    go run ./cmd --input examples/repository_GHSA/GHSA-mh63-6h87-95cp.json --output out/csaf.json

(The flags are example placeholders; adjust them to the actual command interface.)

Inspect output:

    cat out/csaf.json | jq '.'

Mapped fields:
- GHSA advisory ID → `document.tracking.id` and vulnerability IDs list.
- CVE (if present) → `vulnerabilities[].cve`.
- Package ecosystem/name → Product tree branches.
- Severity / CVSS → `vulnerabilities[].scores[]` with score type set appropriately.
- References (URLs) → `vulnerabilities[].references[]`.
- Published / Updated timestamps → `document.tracking.revision_history[]` entries.
- Description / summary → `vulnerabilities[].notes[]` (if implemented; may be minimal).
Unsupported or partially mapped:
- Advisory aliases beyond CVE/GHSA ID (unless provided explicitly).
- Rich supplier, distributor, and release channel metadata.
- Full remediation guidance if GHSA lacks structured fix details.
We rely on the upstream gocsaf.SaveAdvisory function to serialize the CSAF advisory.
That helper internally creates its own json.Encoder with Go's default settings;
we cannot inject SetEscapeHTML(false).
As a consequence characters <, >, and & are HTML‑escaped in the emitted JSON (e.g. < becomes \u003c).
For version range fields this makes strict comparisons < and <= awkward.
To keep output ASCII‑only and avoid escaped sequences without changing the encoder, the converter now normalizes operators into descriptive English phrases:
- `<=` → less or equal
- `>=` → greater or equal
- `<` → less than
- `>` → greater than
This sidesteps HTML escaping while preserving the intended comparison semantics in a human‑readable form. The trade‑off is that downstream tooling expecting literal operators will need to adapt.
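
A minimal sketch of such an operator normalization, combined with whitespace collapsing, is shown below. It is an illustration under assumptions, not the project's `normalizeOperators`: the function name `normalizeRange` and the exact spacing are hypothetical. The two-character operators are listed first so that `<=` is matched before `<`.

```go
package main

import (
	"fmt"
	"strings"
)

// operatorPhrases replaces comparison operators with English phrases.
// strings.NewReplacer compares old strings in argument order, so the
// two-character operators must come before their one-character prefixes.
var operatorPhrases = strings.NewReplacer(
	"<=", " less or equal ",
	">=", " greater or equal ",
	"<", " less than ",
	">", " greater than ",
)

// normalizeRange rewrites the operators and collapses runs of whitespace.
func normalizeRange(r string) string {
	return strings.Join(strings.Fields(operatorPhrases.Replace(r)), " ")
}

func main() {
	fmt.Println(normalizeRange(">= 1.0.0,  < 1.2.3"))
	fmt.Println(normalizeRange("<=2.0"))
}
```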
A more robust long‑term solution would be either (a) bypassing gocsaf.SaveAdvisory and performing our own encoding with enc.SetEscapeHTML(false), or (b) post‑processing the JSON to unescape these characters.
See examples/ directory:
- `global_GHSA/GHSA-cpj6-fhp6-mr6j.json` (global advisory input).
- `repository_GHSA/GHSA-mh63-6h87-95cp.json` (repository advisory input).
- `repository_GHSA/csaf_example_output.json` (sample converted CSAF output).
You can diff the input vs. output to observe:
- Product tree hierarchy creation.
- Revision history entries.
- Identifier and reference mappings.
| Symptom | Cause | Resolution |
|---|---|---|
| Escaped `<` in version range | Default JSON encoder HTML escape | Accept as-is or post-process; custom encoder if allowed. |
| Missing product branches | Empty vulnerabilities list in GHSA | Validate input advisory content; ensure downloader acquired full data. |
| Lost CVSS vector | GHSA advisory lacks CVSS | No remediation; CSAF will omit score. |
| Unexpected whitespace in ranges | GHSA formatting quirks | Normalization collapses spaces automatically. |
Logging: the converter emits structured logs (via `slog`) for save operations; enable debug verbosity when extending the converter.
- Fix issues
- Perform extensive review
- Check global GHSA
- Check hidden GHSA (probably requires authentication)
- Add CLI functionality & configuration options
- Add more tests & validation
Licensed under the terms in LICENSE (refer to file). Built upon:
- `github.com/gocsaf/csaf` for CSAF model structures.
- GitHub Security Advisory data as source material.
Contributions, issues, and suggestions are welcome.