96 changes: 94 additions & 2 deletions docs/generic.md
@@ -1,6 +1,7 @@
# Generic fetcher

- [Specifying artifacts to fetch](#specifying-artifacts-to-fetch)
- [Authentication](#authentication)
- [Using fetched dependencies](#using-fetched-dependencies)
- [Full example walkthrough](#example)

@@ -24,12 +25,13 @@ Below are sections for each type of supported artifact. Several artifacts of
different types can be specified in a single lockfile.

The lockfile must always contain a `metadata` header and a list of `artifacts`.
Currently, the only supported version is 1.0:
Supported versions are 1.0 and 2.0. Version 2.0 is required when using
[authentication](#authentication):

```yaml
---
metadata:
version: "1.0"
version: "1.0" # or "2.0" for auth support
artifacts: []
```

@@ -129,6 +131,96 @@ These files will be reported with `pkg:maven` purl in the output SBOM, because
the URL is fully assembled from the provided attributes and therefore the file
can be assumed to be a maven artifact.

## Authentication

The generic fetcher supports per-artifact authentication for downloading from
private repositories and registries. Authentication requires lockfile version
`"2.0"`. Artifacts without an `auth` block are downloaded without per-artifact
authentication, but `.netrc` credentials are still used if available.

### Auth types

Each artifact can specify an `auth` block with exactly one auth type.

#### Bearer token

Header-based token authentication. Works with most platforms (GitHub, GitLab,
Gitea, JFrog Artifactory, etc.).

| Field | Required | Description |
|----------|----------|-----------------------------------------------------------------------------|
| `header` | No | HTTP header name. Defaults to `Authorization` |
| `value` | Yes | Header value, supports `$VAR` / `${VAR}` environment variable interpolation |
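
As a rough illustration of how the two fields could map onto a request header,
here is a minimal sketch. `bearer_headers` is a hypothetical helper, not
Hermeto's actual implementation, and it assumes that environment variable
interpolation has already been applied to `value`:

```python
def bearer_headers(value: str, header: str = "Authorization") -> dict[str, str]:
    """Build a one-entry header mapping; `header` falls back to Authorization."""
    return {header: value}


# GitLab-style custom header (token value is a placeholder).
gitlab = bearer_headers("glpat-example", header="PRIVATE-TOKEN")

# GitHub-style standard header; the "Bearer " prefix is part of the value.
github = bearer_headers("Bearer ghp-example")

print(gitlab)
print(github)
```

Note that the scheme prefix (for example `Bearer `) is written as part of
`value`, as in the GitHub example in this document.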

#### HTTP Basic

Username and password authentication, sent as a Base64-encoded `Authorization`
header.

| Field | Required | Description |
|------------|----------|----------------------------------------------------|
| `username` | Yes | Username, supports `$VAR` / `${VAR}` interpolation |
| `password` | Yes | Password, supports `$VAR` / `${VAR}` interpolation |
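
For reference, the standard HTTP Basic encoding can be sketched as follows.
`basic_auth_header` is a hypothetical name used only for illustration; how
Hermeto builds the header internally is not shown in this document:

```python
import base64


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Encode "username:password" as Base64 and wrap it in a Basic header."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("Aladdin", "open sesame"))
```

Base64 is an encoding, not encryption, so Basic credentials should only be
sent over HTTPS.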

### Environment variable interpolation

Secret values should be provided via environment variables using `$VAR` or
`${VAR}` syntax. Hermeto will fail with a clear error if any referenced variable
is not set. Use `\$` for a literal dollar sign.
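
The rules above can be sketched in a few lines. This is an illustrative
approximation, not Hermeto's implementation: the `interpolate` function, the
regular expression, and the `ValueError` are all assumptions, and the sketch
leaves a lone `$` that is not followed by a variable name unchanged:

```python
import os
import re

# Matches, in order: an escaped dollar (\$), ${VAR}, or $VAR.
_PATTERN = re.compile(
    r"\\\$|\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)"
)


def interpolate(value, env=None):
    """Expand $VAR / ${VAR} from env; turn \\$ into a literal dollar sign."""
    env = os.environ if env is None else env

    def replace(match):
        if match.group(0) == "\\$":
            return "$"  # escaped dollar becomes a literal one
        name = match.group(1) or match.group(2)
        if name not in env:
            # Fail loudly instead of silently substituting an empty string.
            raise ValueError(f"environment variable {name!r} is not set")
        return env[name]

    return _PATTERN.sub(replace, value)


print(interpolate("Bearer ${TOKEN}", {"TOKEN": "example"}))
```

Passing an explicit `env` mapping keeps the function easy to test without
touching the real process environment.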

### Examples

**GitLab** (custom `PRIVATE-TOKEN` header):

```yaml
metadata:
  version: "2.0"
artifacts:
  - download_url: "https://gitlab.example.com/api/v4/projects/123/repository/archive.tar.gz"
    checksum: "sha256:abc123..."
    auth:
      bearer:
        header: PRIVATE-TOKEN
        value: "$GITLAB_TOKEN"
```

**GitHub** (standard Bearer token):

```yaml
metadata:
  version: "2.0"
artifacts:
  - download_url: "https://api.github.com/repos/owner/repo/tarball/v1.0.0"
    checksum: "sha256:abc123..."
    auth:
      bearer:
        value: "Bearer $GITHUB_TOKEN"
```

**Mixed** (authenticated and public artifacts):

```yaml
metadata:
  version: "2.0"
artifacts:
  - download_url: "https://gitlab.example.com/api/v4/projects/123/repository/archive.tar.gz"
    checksum: "sha256:..."
    auth:
      bearer:
        header: PRIVATE-TOKEN
        value: "$GITLAB_TOKEN"

  - download_url: "https://example.com/public-file.zip"
    checksum: "sha256:..."
```

Then run Hermeto with the required environment variables set:

```shell
export GITLAB_TOKEN="glpat-xxxxxxxxxxxxxxxxxxxx"
export GITHUB_TOKEN="github_pat_xxxxxxxxxxxxxxxxxxxxx"
hermeto fetch-deps generic
```

## Using fetched dependencies

Hermeto downloads the files into the `deps/generic/` subpath of the output
12 changes: 6 additions & 6 deletions hermeto/core/package_managers/general.py
@@ -72,7 +72,7 @@ async def _async_download_binary_file(
session: aiohttp_retry.RetryClient,
url: str,
download_path: StrPath,
auth: aiohttp.BasicAuth | None = None,
headers: dict[str, str] | None = None,
ssl_context: ssl.SSLContext | None = None,
chunk_size: int = 8192,
) -> None:
@@ -95,7 +95,7 @@ async def _async_download_binary_file(
async with session.get(
url,
timeout=timeout,
auth=auth,
headers=headers,
raise_for_status=True,
ssl=ssl_context,
) as resp:
@@ -120,14 +120,14 @@ async def async_download_files(
files_to_download: Mapping[str, StrPath],
concurrency_limit: int,
ssl_context: ssl.SSLContext | None = None,
auth: aiohttp.BasicAuth | None = None,
headers: Mapping[str, dict[str, str]] | None = None,
) -> None:
"""Asynchronous function to download files.

:param files_to_download: Mapping of URLs and file paths to download.
:param files_to_download: Mapping of URLs to file paths to download.
:param concurrency_limit: Max number of concurrent tasks (downloads).
:param ssl_context: Optional SSL context for the requests.
:param auth: Optional authorization data for proxies.
:param headers: Optional per-URL headers mapping (URL -> headers dict).
"""
trace_config = aiohttp.TraceConfig()
num_attempts: int = int(DEFAULT_RETRY_OPTIONS["total"])
@@ -171,7 +171,7 @@ async def async_download_files(
url,
download_path,
ssl_context=ssl_context,
auth=auth,
headers=headers.get(url) if headers else None,
)
)
)
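
The per-URL lookup in this hunk (`headers.get(url) if headers else None`) can
be exercised in isolation with the sketch below. `fake_download` and
`download_all` are hypothetical stand-ins that perform no network I/O; they are
not the functions changed in this diff:

```python
import asyncio
from typing import Mapping, Optional


async def fake_download(url: str, headers: Optional[dict]) -> tuple:
    """Stand-in for a download coroutine; only reports which headers it got."""
    await asyncio.sleep(0)  # yield control, as a real download would
    return url, headers


async def download_all(files: Mapping, headers: Optional[Mapping] = None) -> dict:
    """Schedule one task per URL, passing only that URL's headers (or None)."""
    tasks = [
        fake_download(url, headers.get(url) if headers else None)
        for url in files
    ]
    return dict(await asyncio.gather(*tasks))


files = {
    "https://gitlab.example.com/private.tar.gz": "private.tar.gz",
    "https://example.com/public-file.zip": "public-file.zip",
}
auth = {"https://gitlab.example.com/private.tar.gz": {"PRIVATE-TOKEN": "example"}}
seen = asyncio.run(download_all(files, auth))
print(seen)
```

URLs with no entry in the mapping receive `None`, so credentials for one
artifact are not sent to another host.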
28 changes: 21 additions & 7 deletions hermeto/core/package_managers/generic/main.py
@@ -14,13 +14,16 @@
from hermeto.core.models.output import RequestOutput
from hermeto.core.models.sbom import Component, create_backend_annotation
from hermeto.core.package_managers.general import async_download_files
from hermeto.core.package_managers.generic.models import GenericLockfileV1
from hermeto.core.package_managers.generic.models import GenericLockfile, GenericLockfileAdapter
from hermeto.core.rooted_path import RootedPath

log = logging.getLogger(__name__)
DEFAULT_LOCKFILE_NAME = "artifacts.lock.yaml"
DEFAULT_DEPS_DIR = "deps/generic"

Url = str
AuthHeaders = dict[str, str]


def fetch_generic_source(request: Request) -> RequestOutput:
"""
@@ -82,22 +85,33 @@ def _resolve_generic_lockfile(lockfile_path: Path, output_dir: RootedPath) -> li

log.info(f"Reading generic lockfile: {lockfile_path}")
lockfile = _load_lockfile(lockfile_path, output_dir)
to_download: dict[str, str | os.PathLike[str]] = {}
to_download: dict[Url, str | os.PathLike[str]] = {}
auth_headers: dict[Url, AuthHeaders] = {}

for artifact in lockfile.artifacts:
# create the parent directory for the artifact
Path.mkdir(Path(artifact.filename).parent, parents=True, exist_ok=True)
to_download[str(artifact.download_url)] = artifact.filename

asyncio.run(async_download_files(to_download, get_config().runtime.concurrency_limit))
url = str(artifact.download_url)
to_download[url] = artifact.filename
auth = getattr(artifact, "auth", None)
if auth:
auth_headers[url] = auth.get_headers()

asyncio.run(
async_download_files(
to_download,
get_config().runtime.concurrency_limit,
headers=auth_headers or None,
)
)

# verify checksums
for artifact in lockfile.artifacts:
must_match_any_checksum(artifact.filename, [artifact.formatted_checksum])
return [artifact.get_sbom_component() for artifact in lockfile.artifacts]


def _load_lockfile(lockfile_path: Path, output_dir: RootedPath) -> GenericLockfileV1:
def _load_lockfile(lockfile_path: Path, output_dir: RootedPath) -> GenericLockfile:
"""
Load the generic lockfile from the given path.

@@ -115,7 +129,7 @@ def _load_lockfile(lockfile_path: Path, output_dir: RootedPath) -> GenericLockfi
)

try:
lockfile = GenericLockfileV1.model_validate(
lockfile = GenericLockfileAdapter.validate_python(
lockfile_data, context={"output_dir": output_dir}
)
except ValidationError as e: