
Optimize SBOM Fetching and Processing to Reduce Memory Usage #75

Open
viveksahu26 opened this issue Mar 6, 2025 · 1 comment

@viveksahu26
Contributor

Problem

The current SBOM (Software Bill of Materials) fetching and processing in our tool consumes excessive memory:

  • The FetchSBOMs method preloads all SBOMs into memory, which becomes inefficient with large SBOMs.
  • The sbomProcessing function holds both the original and the converted SBOMs in memory at the same time, doubling memory usage.
  • The upload step keeps all SBOMs in memory until the entire process finishes.
  • If something goes wrong while fetching, the tool throws an error and restarts fetching from the very beginning.

This inefficiency can cause performance degradation or crashes due to memory exhaustion when processing large numbers of repositories or SBOMs.

For example: the tool fetches 100 SBOMs at a time, then processes them all at once, and then uploads them all at once.

Suggested Solution

Refactor the pipeline to handle SBOMs one at a time using lazy loading and on-demand processing. Something like this is partially implemented for uploading, but the tool still deals with accumulated SBOMs in memory.

For example:
Sequential: fetch 1 SBOM (out of 100), process it, upload it, and then continue to the next one.
Parallel: run 3-4 such pipelines concurrently.

fetch --> process --> upload (1 SBOM), then repeat for the next SBOM, and so on.
So our goal should be: get the SBOM's metadata (filename, download URL, size, etc.) --> download that one SBOM --> process and convert it --> pass the converted SBOM on to the upload step.
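The sequential flow could be sketched roughly like this. Note that `Meta`, `fetchSBOM`, `processSBOM`, and `uploadSBOM` are hypothetical stand-ins for illustration, not the tool's actual types or API:

```go
package main

import "fmt"

// Meta is a hypothetical descriptor holding only lightweight
// metadata (name, download URL, size) -- not the SBOM content.
type Meta struct {
	Name string
	URL  string
	Size int64
}

// Hypothetical stages; each handles exactly one SBOM at a time.
func fetchSBOM(m Meta) []byte       { return []byte("raw:" + m.Name) }
func processSBOM(raw []byte) []byte { return append([]byte("converted:"), raw...) }
func uploadSBOM(converted []byte) error {
	fmt.Printf("uploaded %d bytes\n", len(converted))
	return nil
}

func main() {
	metas := []Meta{{Name: "a.json"}, {Name: "b.json"}, {Name: "c.json"}}

	// Sequential: fetch -> process -> upload one SBOM, then move on.
	// Only one SBOM's content is ever resident in memory.
	for _, m := range metas {
		raw := fetchSBOM(m)
		converted := processSBOM(raw)
		if err := uploadSBOM(converted); err != nil {
			// A failure affects only this SBOM; the loop can log it
			// and continue instead of restarting from the beginning.
			fmt.Println("upload failed:", err)
		}
	}
}
```

Because each iteration drops its references before the next one starts, peak memory is bounded by the largest single SBOM rather than the whole batch.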

Implementation in detail:

  • Get SBOMs one by one: the tool fetches only what it needs to locate each SBOM (such as its download URL) without grabbing everything up front.
  • Handle each SBOM separately: it processes one SBOM at a time, doing whatever is needed (such as converting it) without holding onto the others.
  • Send it off quickly: as soon as an SBOM is ready, the tool uploads it to its destination and moves on to the next one, keeping memory free.
@viveksahu26 viveksahu26 self-assigned this Mar 6, 2025
@viveksahu26
Contributor Author

Key Findings

Case 1 (folder to dtrack): 100 SBOMs, total size 77 MB

$ du -sh /home/linuzz/Downloads/sboms_cdx 
77M	/home/linuzz/Downloads/sboms_cdx

$ ls /home/linuzz/Downloads/sboms_cdx  | wc -l
100

$ go run main.go transfer --input-adapter=folder --in-folder-path="/home/linuzz/Downloads/sboms_cdx"  --output-adapter=dtrack --out-dtrack-url="http://localhost:8081" --out-dtrack-project-name="benchmark_cdx_3"
  • Uploading 100 CycloneDX SBOMs (total size 77 MB) from a folder consumes 219 MB of memory.
  • Usage rises to ~219 MB again, with multiple ups and downs along the way (e.g., ~98 MB, ~123 MB, ~151 MB).

Case 2 (folder to dtrack): 300 SBOMs, total size 116 MB

$ ls -al  /home/linuzz/Downloads/sboms_cdx  | wc -l
300

$ du -sh /home/linuzz/Downloads/sboms_cdx     
116M	/home/linuzz/Downloads/sboms_cdx

$ go run main.go transfer --input-adapter=folder --in-folder-path="/home/linuzz/Downloads/sboms_cdx"  --output-adapter=dtrack --out-dtrack-url="http://localhost:8081" --out-dtrack-project-name="benchmark_cdx_116mb"
  • Uploading 300 CycloneDX SBOMs (total size 116 MB) from a folder consumes 249 MB of memory.
  • Final usage is ~236 MB, with fluctuations along the way (e.g., dropping to ~142.7 MB and then rising again).

Case 3 (github --> dtrack)

API method to fetch SBOMs from all sigstore repos except docs

$ go run main.go transfer --input-adapter=github --in-github-url="https://github.com/sigstore/"  --in-github-exclude-repos=docs --in-github-method=api --output-adapter=dtrack --out-dtrack-url="http://localhost:8081/api/v1" --out-dtrack-project-name="benchmark_api"
  • Peak Memory: ~6.5 MB
  • Total 29 SBOMs
  • Total size 2.7 MB

Release method to fetch SBOMs from all sigstore repos except docs

$ go run main.go transfer --input-adapter=github --in-github-url="https://github.com/sigstore/"  --in-github-exclude-repos=docs --in-github-method=release --output-adapter=dtrack --out-dtrack-url="http://localhost:8081/api/v1" --out-dtrack-project-name="benchmark_release"
  • Peak Memory: ~21.6 MB
  • Total 29 SBOMs
  • Total size 8.9 MB

Tool method to fetch SBOMs from all sigstore repos except docs

$ go run main.go transfer --input-adapter=github --in-github-url="https://github.com/sigstore/"  --in-github-exclude-repos=docs --in-github-method=tool --output-adapter=dtrack --out-dtrack-url="http://localhost:8081/api/v1" --out-dtrack-project-name="benchmark_tool"
  • Peak Memory: ~11.5 MB
  • Total 29 SBOMs
  • Total size 7.3 MB
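For reproducibility, heap figures like the ones above can be sampled in-process with the standard library's `runtime.ReadMemStats`. This is a generic sketch of that technique, not the instrumentation actually used to collect these numbers, and the 32 MB allocation simulating accumulated SBOMs is made up for illustration:

```go
package main

import (
	"fmt"
	"runtime"
)

// heapAllocMB returns the bytes of live heap objects, in MB.
func heapAllocMB() float64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.HeapAlloc) / (1 << 20)
}

func main() {
	before := heapAllocMB()

	// Simulate holding many SBOMs in memory at once (~32 MB).
	buf := make([][]byte, 32)
	for i := range buf {
		buf[i] = make([]byte, 1<<20)
	}

	after := heapAllocMB()
	fmt.Printf("heap grew by roughly %.0f MB\n", after-before)
	_ = buf // keep the buffers live until after the measurement
}
```

Sampling this periodically during a transfer (or using `runtime/pprof` heap profiles) would make the "ups and downs" in the folder cases easier to attribute to specific pipeline stages.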
