
Optimize SBOM Fetching and Processing to Reduce Memory Usage #75

Open
viveksahu26 opened this issue Mar 6, 2025 · 1 comment

@viveksahu26
Contributor

Problem

The current SBOM (Software Bill of Materials) fetching and processing in our tool consumes excessive memory:

  • The FetchSBOMs method preloads all SBOMs into memory, which becomes inefficient with large SBOMs.
  • The sbomProcessing function holds both the original and the converted SBOMs in memory at the same time, doubling memory usage.
  • The upload step keeps all SBOMs in memory until the entire process finishes.
  • If something goes wrong while fetching, the tool throws an error and restarts fetching from the very beginning.

This inefficiency can cause performance degradation or crashes due to memory exhaustion when processing large numbers of repositories or SBOMs.

For example: the tool fetches 100 SBOMs at a time, then processes them all at once, and then uploads them all at once.

Suggested Solution

Refactor the pipeline to handle SBOMs one at a time using lazy loading and on-demand processing. Something like this is partially implemented for uploading, but the tool still deals with accumulated SBOMs in memory.

For example:
Sequential: fetch 1 SBOM (out of 100), process it, upload it, and then continue to the next one.
Parallel: run 3-4 such pipelines concurrently.

fetch --> process --> upload (1 SBOM), then repeat for the next SBOM, and so on.
So our goal should be: get the SBOM's metadata (filename, download URL, size, etc.) --> download that one SBOM --> process and convert it --> pass the converted SBOM on to the upload step.
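The sequential flow could be sketched roughly like this. Note that `Meta`, `fetchSBOM`, `processSBOM`, and `uploadSBOM` are hypothetical stand-ins for illustration, not the tool's actual types or API:

```go
package main

import "fmt"

// Meta is a hypothetical descriptor holding only lightweight
// metadata (name, download URL, size) -- not the SBOM content.
type Meta struct {
	Name string
	URL  string
	Size int64
}

// Hypothetical stages; each handles exactly one SBOM at a time.
func fetchSBOM(m Meta) []byte       { return []byte("raw:" + m.Name) }
func processSBOM(raw []byte) []byte { return append([]byte("converted:"), raw...) }
func uploadSBOM(converted []byte) error {
	fmt.Printf("uploaded %d bytes\n", len(converted))
	return nil
}

func main() {
	metas := []Meta{{Name: "a.json"}, {Name: "b.json"}, {Name: "c.json"}}

	// Sequential: fetch -> process -> upload one SBOM, then move on.
	// Only one SBOM's content is ever resident in memory.
	for _, m := range metas {
		raw := fetchSBOM(m)
		converted := processSBOM(raw)
		if err := uploadSBOM(converted); err != nil {
			// A failure affects only this SBOM; the loop can log it
			// and continue instead of restarting from the beginning.
			fmt.Println("upload failed:", err)
		}
	}
}
```

Because each iteration drops its references before the next one starts, peak memory is bounded by the largest single SBOM rather than the whole batch.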

Implementation in detail:

  • Get SBOMs one by one: the tool fetches only what it needs to locate each SBOM (such as its download URL) without grabbing everything up front.
  • Handle each SBOM separately: it processes one SBOM at a time, doing whatever is needed (such as converting it) without holding onto the others.
  • Send it off quickly: as soon as an SBOM is ready, the tool uploads it to its destination and moves on to the next one, keeping memory free.
@viveksahu26 viveksahu26 self-assigned this Mar 6, 2025
@viveksahu26
Contributor Author

Key Findings

Case 1 (folder to dtrack): 100 SBOMs, total size 77 MB

$ du -sh /home/linuzz/Downloads/sboms_cdx 
77M	/home/linuzz/Downloads/sboms_cdx

$ ls /home/linuzz/Downloads/sboms_cdx  | wc -l
100

$ go run main.go transfer --input-adapter=folder --in-folder-path="/home/linuzz/Downloads/sboms_cdx"  --output-adapter=dtrack --out-dtrack-url="http://localhost:8081" --out-dtrack-project-name="benchmark_cdx_3"
  • Uploading 100 CycloneDX SBOMs (total size 77 MB) from a folder consumes 219 MB of memory.
  • Usage rises to ~219 MB again, with multiple ups and downs along the way (e.g., ~98 MB, ~123 MB, ~151 MB).

Case 2 (folder to dtrack): 300 SBOMs, total size 116 MB

$ ls -al  /home/linuzz/Downloads/sboms_cdx  | wc -l
300

$ du -sh /home/linuzz/Downloads/sboms_cdx     
116M	/home/linuzz/Downloads/sboms_cdx

$ go run main.go transfer --input-adapter=folder --in-folder-path="/home/linuzz/Downloads/sboms_cdx"  --output-adapter=dtrack --out-dtrack-url="http://localhost:8081" --out-dtrack-project-name="benchmark_cdx_116mb"
  • Uploading 300 CycloneDX SBOMs (total size 116 MB) from a folder consumes 249 MB of memory.
  • Final usage is ~236 MB, with fluctuations along the way (e.g., dropping to ~142.7 MB and then rising again).

Case 3 (github --> dtrack)

API method to fetch SBOMs from all sigstore repos except docs

$ go run main.go transfer --input-adapter=github --in-github-url="https://github.com/sigstore/"  --in-github-exclude-repos=docs --in-github-method=api --output-adapter=dtrack --out-dtrack-url="http://localhost:8081/api/v1" --out-dtrack-project-name="benchmark_api"
  • Peak Memory: ~6.5 MB
  • Total 29 SBOMs
  • Total size 2.7 MB

Release method to fetch SBOMs from all sigstore repos except docs

$ go run main.go transfer --input-adapter=github --in-github-url="https://github.com/sigstore/"  --in-github-exclude-repos=docs --in-github-method=release --output-adapter=dtrack --out-dtrack-url="http://localhost:8081/api/v1" --out-dtrack-project-name="benchmark_release"
  • Peak Memory: ~21.6 MB
  • Total 29 SBOMs
  • Total size 8.9 MB

Tool method to fetch SBOMs from all sigstore repos except docs

$ go run main.go transfer --input-adapter=github --in-github-url="https://github.com/sigstore/"  --in-github-exclude-repos=docs --in-github-method=tool --output-adapter=dtrack --out-dtrack-url="http://localhost:8081/api/v1" --out-dtrack-project-name="benchmark_tool"
  • Peak Memory: ~11.5 MB
  • Total 29 SBOMs
  • Total size 7.3 MB
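For reproducibility, heap figures like the ones above can be sampled in-process with the standard library's `runtime.ReadMemStats`. This is a generic sketch of that technique, not the instrumentation actually used to collect these numbers, and the 32 MB allocation simulating accumulated SBOMs is made up for illustration:

```go
package main

import (
	"fmt"
	"runtime"
)

// heapAllocMB returns the bytes of live heap objects, in MB.
func heapAllocMB() float64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.HeapAlloc) / (1 << 20)
}

func main() {
	before := heapAllocMB()

	// Simulate holding many SBOMs in memory at once (~32 MB).
	buf := make([][]byte, 32)
	for i := range buf {
		buf[i] = make([]byte, 1<<20)
	}

	after := heapAllocMB()
	fmt.Printf("heap grew by roughly %.0f MB\n", after-before)
	_ = buf // keep the buffers live until after the measurement
}
```

Sampling this periodically during a transfer (or using `runtime/pprof` heap profiles) would make the "ups and downs" in the folder cases easier to attribute to specific pipeline stages.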
