Problem
The current SBOM (Software Bill of Materials) fetching and processing in our tool consumes excessive memory:
The FetchSBOMs method preloads all SBOMs into memory, which becomes inefficient with large SBOMs.
The sbomProcessing function holds both original and converted SBOMs in memory at the same time, doubling memory usage.
The upload step keeps all SBOMs in memory until the entire process finishes.
If something goes wrong while fetching, it throws an error and restarts fetching from the very beginning.
This inefficiency can cause performance degradation or crashes due to memory exhaustion when processing large numbers of repositories or SBOMs.
For example: it fetches 100 SBOMs at a time, then processes them all at once, and then uploads them all at once.
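For illustration, the batch flow described above looks roughly like the sketch below. The SBOM type and the fetchAll/convert/upload callbacks are hypothetical stand-ins to show the shape of the problem, not the tool's actual code.

// Hypothetical sketch of the current batch flow: every SBOM stays in memory at once.
type SBOM struct {
	Name string
	Data []byte
}

func transferAll(fetchAll func() ([]SBOM, error), convert func(SBOM) (SBOM, error), upload func(SBOM) error) error {
	sboms, err := fetchAll() // preloads all SBOMs (e.g. 100 of them) into memory
	if err != nil {
		return err // a single fetch failure restarts the whole run from the beginning
	}
	converted := make([]SBOM, 0, len(sboms))
	for _, s := range sboms {
		c, err := convert(s)
		if err != nil {
			return err
		}
		converted = append(converted, c) // originals and converted copies now coexist, roughly doubling usage
	}
	for _, c := range converted {
		if err := upload(c); err != nil {
			return err
		}
	}
	return nil // nothing was released until the entire process finished
}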
Suggested Solution
Refactor the pipeline to handle SBOMs one at a time using lazy loading and on-demand processing. Something like this is partially implemented for the upload step, but that step still deals with SBOMs that have already been accumulated in memory.
For example:
Sequential: fetch 1 SBOM (out of 100) at a time, process it, and upload it, then continue with the second one.
Parallel: run 3-4 of these per-SBOM pipelines concurrently (a sketch of this follows the implementation notes below).
fetch --> process --> upload (1 SBOM), then repeat for the next SBOM and continue.
So the flow we want is: get the metadata of the SBOM (filename, download URL, size, etc.) --> download that one SBOM --> process and convert it --> pass the converted SBOM on to the upload step.
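A minimal sketch of that sequential flow, assuming a lightweight SBOMMeta type and hypothetical fetchOne/convert/upload callbacks (not the tool's current API):

// Hypothetical sketch of the proposed streaming flow: only one SBOM is held in memory at a time.
type SBOMMeta struct {
	Filename    string
	DownloadURL string
	Size        int64
}

func transferStreaming(listMetadata func() ([]SBOMMeta, error), fetchOne func(SBOMMeta) ([]byte, error), convert func([]byte) ([]byte, error), upload func(name string, doc []byte) error) error {
	metas, err := listMetadata() // only cheap metadata is fetched up front
	if err != nil {
		return err
	}
	for _, m := range metas {
		raw, err := fetchOne(m) // download a single SBOM on demand
		if err != nil {
			return err // or log and continue, so one bad SBOM does not restart everything
		}
		doc, err := convert(raw) // process/convert just this one SBOM
		if err != nil {
			return err
		}
		if err := upload(m.Filename, doc); err != nil {
			return err
		}
		// raw and doc go out of scope here, so the GC can reclaim them before the next iteration
	}
	return nil
}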
Detailed implementation:
Get SBOMs One by One: The tool will only pick up what it needs to find each SBOM (like where it’s located) without grabbing everything right away.
Handle Each SBOM Separately: It will process one SBOM at a time—doing whatever needs to be done, like converting it—without holding onto the others.
Send It Off Quickly: As soon as an SBOM is ready, the tool will upload it to where it needs to go and then move on to the next one, keeping memory free.
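For the parallel variant mentioned above, a bounded worker pool keeps peak memory proportional to the number of workers rather than the number of SBOMs. Here is a sketch using golang.org/x/sync/errgroup, reusing the hypothetical SBOMMeta type from the sketch above:

import "golang.org/x/sync/errgroup"

// Hypothetical sketch: run the per-SBOM pipeline for at most `workers` SBOMs concurrently,
// so peak memory is bounded by roughly workers * largest SBOM size.
func transferConcurrent(metas []SBOMMeta, workers int, processOne func(SBOMMeta) error) error {
	var g errgroup.Group
	g.SetLimit(workers) // e.g. 3-4, as suggested above
	for _, m := range metas {
		m := m // capture the loop variable (needed before Go 1.22)
		g.Go(func() error {
			// fetch --> process --> upload for a single SBOM
			return processOne(m)
		})
	}
	return g.Wait() // returns the first error after all started workers finish
}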
Case 1 (folder to dtrack): 100 SBOMs, total size 77 MB
$ du -sh /home/linuzz/Downloads/sboms_cdx
77M /home/linuzz/Downloads/sboms_cdx
$ ls /home/linuzz/Downloads/sboms_cdx | wc -l
100
$ go run main.go transfer --input-adapter=folder --in-folder-path="/home/linuzz/Downloads/sboms_cdx" --output-adapter=dtrack --out-dtrack-url="http://localhost:8081" --out-dtrack-project-name="benchmark_cdx_3"
When uploading 100 CycloneDX (cdx) SBOMs totaling 77 MB from a folder, the tool consumes ~219 MB of memory.
Memory rises again to ~219 MB, with multiple ups and downs along the way (e.g., ~98 MB, ~123 MB, ~151 MB).
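(For reference, the sketch below shows one way such peak numbers could be sampled from inside a Go process with runtime.ReadMemStats; this is an assumption about methodology, and the figures here may well have been captured with an external tool instead. HeapAlloc tracks the live heap only, so RSS will be somewhat higher.)

import (
	"runtime"
	"time"
)

// Hypothetical helper: sample heap usage every 100ms and report the peak once stopped.
func trackPeakHeap(stop <-chan struct{}) <-chan uint64 {
	out := make(chan uint64, 1)
	go func() {
		var peak uint64
		ticker := time.NewTicker(100 * time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				var m runtime.MemStats
				runtime.ReadMemStats(&m)
				if m.HeapAlloc > peak {
					peak = m.HeapAlloc
				}
			case <-stop:
				out <- peak
				return
			}
		}
	}()
	return out
}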
Case 2 (folder to dtrack): 300 SBOMs, total size 116 MB
$ ls -al /home/linuzz/Downloads/sboms_cdx | wc -l
300
$ du -sh /home/linuzz/Downloads/sboms_cdx
116M /home/linuzz/Downloads/sboms_cdx
$ go run main.go transfer --input-adapter=folder --in-folder-path="/home/linuzz/Downloads/sboms_cdx" --output-adapter=dtrack --out-dtrack-url="http://localhost:8081" --out-dtrack-project-name="benchmark_cdx_116mb"
When uploading 300 CycloneDX (cdx) SBOMs totaling 116 MB from a folder, the tool consumes ~249 MB of memory.
Final usage is ~236 MB, with fluctuations along the way (e.g., dropping to ~142.7 MB, then rising again).
Case 3 (github --> dtrack)
API method to fetch SBOMs from all sigstore repos except docs
$ go run main.go transfer --input-adapter=github --in-github-url="https://github.com/sigstore/" --in-github-exclude-repos=docs --in-github-method=api --output-adapter=dtrack --out-dtrack-url="http://localhost:8081/api/v1" --out-dtrack-project-name="benchmark_api"
Peak Memory: ~6.5 MB
Total 29 SBOMs
Total size 2.7 MB
Release method to fetch SBOMs from all sigstore repos except docs
$ go run main.go transfer --input-adapter=github --in-github-url="https://github.com/sigstore/" --in-github-exclude-repos=docs --in-github-method=release --output-adapter=dtrack --out-dtrack-url="http://localhost:8081/api/v1" --out-dtrack-project-name="benchmark_release"
Peak Memory: ~21.6 MB
Total 29 SBOMs
Total size 8.9 MB
Tool method to fetch SBOMs from all sigstore repos except docs
$ go run main.go transfer --input-adapter=github --in-github-url="https://github.com/sigstore/" --in-github-exclude-repos=docs --in-github-method=tool --output-adapter=dtrack --out-dtrack-url="http://localhost:8081/api/v1" --out-dtrack-project-name="benchmark_tool"