This repository was archived by the owner on Jun 2, 2026. It is now read-only.

refactor: Split component manager registry wiring#530

Merged
jw-nvidia merged 1 commit into
NVIDIA:mainfrom
jw-nvidia:refactor/cm-providers
May 15, 2026

Conversation

@jw-nvidia

Contributor

Description

  • Separate component manager descriptors into a catalog package and split registry construction into manager, factory spec, and registry files.
  • Consolidate builtin service setup around a manifest, provider registry construction, and focused tests.

Type of Change

  • Feature - New feature or functionality (feat:)
  • Fix - Bug fixes (fix:)
  • Chore - Modification or removal of existing functionality (chore:)
  • Refactor - Refactoring of existing functionality (refactor:)
  • Docs - Changes in documentation or OpenAPI schema (docs:)
  • CI - Changes in GitHub workflows. Requires additional scrutiny (ci:)
  • Version - Issuing a new release version (version:)

Services Affected

  • API - API models or endpoints updated
  • Workflow - Workflow service updated
  • DB - DB DAOs or migrations updated
  • Site Manager - Site Manager updated
  • Cert Manager - Cert Manager updated
  • Site Agent - Site Agent updated
  • Flow - Flow service updated
  • Powershelf Manager - Powershelf Manager updated
  • NVSwitch Manager - NVSwitch Manager updated

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@jw-nvidia jw-nvidia requested a review from a team as a code owner May 14, 2026 20:18
@copy-pr-bot

copy-pr-bot Bot commented May 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented May 14, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 68863265-299c-4f3b-bf99-f0b9450b8915

📥 Commits

Reviewing files that changed from the base of the PR and between 1c6e1f2 and 217bff5.

📒 Files selected for processing (36)
  • flow/cmd/serve.go
  • flow/docs/component-manager-architecture.md
  • flow/docs/flow-architecture.md
  • flow/internal/task/componentmanager/builtin/builtin_test.go
  • flow/internal/task/componentmanager/builtin/component_manager_factories.go
  • flow/internal/task/componentmanager/builtin/config.go
  • flow/internal/task/componentmanager/builtin/config_test.go
  • flow/internal/task/componentmanager/builtin/helpers.go
  • flow/internal/task/componentmanager/builtin/manifest.go
  • flow/internal/task/componentmanager/builtin/provider_config_decoders.go
  • flow/internal/task/componentmanager/builtin/provider_config_decoders_test.go
  • flow/internal/task/componentmanager/builtin/setup.go
  • flow/internal/task/componentmanager/catalog/catalog.go
  • flow/internal/task/componentmanager/catalog/catalog_test.go
  • flow/internal/task/componentmanager/catalog/errors.go
  • flow/internal/task/componentmanager/componentmanager.go
  • flow/internal/task/componentmanager/componentmanager_test.go
  • flow/internal/task/componentmanager/compute/nico/nico.go
  • flow/internal/task/componentmanager/config/config.go
  • flow/internal/task/componentmanager/config/config_test.go
  • flow/internal/task/componentmanager/config/doc.go
  • flow/internal/task/componentmanager/config/errors.go
  • flow/internal/task/componentmanager/config/yaml.go
  • flow/internal/task/componentmanager/errors.go
  • flow/internal/task/componentmanager/factory_spec.go
  • flow/internal/task/componentmanager/factory_spec_test.go
  • flow/internal/task/componentmanager/manager.go
  • flow/internal/task/componentmanager/mock/mock.go
  • flow/internal/task/componentmanager/nvlswitch/nico/nico.go
  • flow/internal/task/componentmanager/nvlswitch/nvswitchmanager/nvswitchmanager.go
  • flow/internal/task/componentmanager/powershelf/nico/nico.go
  • flow/internal/task/componentmanager/powershelf/psm/psm.go
  • flow/internal/task/componentmanager/registry.go
  • flow/internal/task/componentmanager/registry_test.go
  • flow/internal/task/componentmanager/test_helpers_test.go
  • flow/internal/task/executor/temporalworkflow/activity/activity_test.go
💤 Files with no reviewable changes (8)
  • flow/internal/task/componentmanager/builtin/provider_config_decoders.go
  • flow/internal/task/componentmanager/builtin/provider_config_decoders_test.go
  • flow/internal/task/componentmanager/builtin/config.go
  • flow/internal/task/componentmanager/componentmanager_test.go
  • flow/internal/task/componentmanager/componentmanager.go
  • flow/internal/task/componentmanager/builtin/component_manager_factories.go
  • flow/internal/task/componentmanager/builtin/config_test.go
  • flow/internal/task/componentmanager/config/doc.go
✅ Files skipped from review due to trivial changes (2)
  • flow/internal/task/executor/temporalworkflow/activity/activity_test.go
  • flow/docs/flow-architecture.md
🚧 Files skipped from review as they are similar to previous changes (24)
  • flow/internal/task/componentmanager/powershelf/nico/nico.go
  • flow/internal/task/componentmanager/manager.go
  • flow/internal/task/componentmanager/test_helpers_test.go
  • flow/internal/task/componentmanager/factory_spec.go
  • flow/internal/task/componentmanager/powershelf/psm/psm.go
  • flow/internal/task/componentmanager/catalog/errors.go
  • flow/internal/task/componentmanager/config/yaml.go
  • flow/internal/task/componentmanager/builtin/setup.go
  • flow/internal/task/componentmanager/registry.go
  • flow/internal/task/componentmanager/builtin/helpers.go
  • flow/internal/task/componentmanager/nvlswitch/nvswitchmanager/nvswitchmanager.go
  • flow/docs/component-manager-architecture.md
  • flow/internal/task/componentmanager/nvlswitch/nico/nico.go
  • flow/internal/task/componentmanager/factory_spec_test.go
  • flow/internal/task/componentmanager/mock/mock.go
  • flow/internal/task/componentmanager/config/errors.go
  • flow/internal/task/componentmanager/catalog/catalog_test.go
  • flow/internal/task/componentmanager/catalog/catalog.go
  • flow/internal/task/componentmanager/config/config.go
  • flow/internal/task/componentmanager/config/config_test.go
  • flow/internal/task/componentmanager/compute/nico/nico.go
  • flow/internal/task/componentmanager/errors.go
  • flow/internal/task/componentmanager/registry_test.go
  • flow/internal/task/componentmanager/builtin/builtin_test.go

Summary by CodeRabbit

  • Bug Fixes

    • Improved provider and component manager initialization with stronger validation and clearer error reporting for missing or mismatched configurations.
  • Refactor

    • Reorganized component manager system to separate static descriptors from runtime factory specifications and to centralize registry construction.
  • Documentation

    • Updated architecture docs to reflect descriptor-based metadata and the new catalog/registry patterns.
  • Tests

    • Expanded coverage for catalog, registry, provider decoding, and config validation.

Walkthrough

Introduce a normalized component-manager catalog and typed errors, split static descriptors from runtime FactorySpec/factories, add a concurrency-safe Registry, wire builtin provider/manager initialization (config, decoder registry, provider registry, factory specs), update implementations and tests, and adjust serve initialization to use the new builtin provider registry.

Changes

Component Manager Catalog and Runtime Wiring

| Layer | File(s) | Summary |
|---|---|---|
| Catalog implementation and errors | `flow/internal/task/componentmanager/catalog/catalog.go`, `flow/internal/task/componentmanager/catalog/errors.go` | Add cmcatalog.Descriptor and cmcatalog.Catalog with normalization, selection, indexing, deterministic listing, and typed sentinel/structured errors. |
| Catalog unit tests | `flow/internal/task/componentmanager/catalog/catalog_test.go` | Comprehensive tests for normalization, equality, indexing, selection, copy/immutability, duplicate rejection, and typed error payloads. |
| FactorySpec types and selection | `flow/internal/task/componentmanager/factory_spec.go` | Introduce ManagerFactory and FactorySpec, normalize factory specs, and implement selectFactorySpecs to produce selected descriptors and factories mapping. |
| FactorySpec tests | `flow/internal/task/componentmanager/factory_spec_test.go` | Tests for selection behavior, empty selection, invalid factory specs, duplicate descriptor detection, and unknown implementation errors. |
| Public componentmanager interfaces | `flow/internal/task/componentmanager/manager.go` | Define exported ComponentManager, BringUpController, and FirmwareConsistencyChecker interfaces; switch to Descriptor() contract. |
| Registry implementation & tests | `flow/internal/task/componentmanager/registry.go`, `flow/internal/task/componentmanager/registry_test.go` | Concurrency-safe Registry, NewRegistry and createManager enforcing descriptor equality and manager creation, and extensive unit tests for initialization and lookup error paths. |
| Test helpers | `flow/internal/task/componentmanager/test_helpers_test.go` | Add test manager helper and factory helpers for building descriptors and FactorySpecs in tests. |
| Builtin manifest, helpers and setup | `flow/internal/task/componentmanager/builtin/manifest.go`, `.../builtin/helpers.go`, `.../builtin/setup.go` | Provide builtin descriptors, provider decoders, factory-spec builders, LoadConfig, NewProviderRegistry, and NewComponentManagerRegistry wiring. |
| Builtin tests | `flow/internal/task/componentmanager/builtin/builtin_test.go` | Tests for builtin defaults, config loading, provider registry construction and error cases, and decoder-registry assertions. |
| Config parsing/validation and errors | `flow/internal/task/componentmanager/config/*.go`, `flow/internal/task/componentmanager/config/*_test.go` | Change config completion and validation to use cmcatalog.Catalog.SelectedDescriptors; update signatures to accept the catalog; extend/alias error types for provider/manager identity context; update tests accordingly. |
| Implementation adapters | `flow/internal/task/componentmanager/*/*`, `.../mock/mock.go` | Update compute, nvlswitch, powershelf, psm, and mock implementations to export Descriptor() cmcatalog.Descriptor and FactorySpec() helpers; replace Type() with Descriptor() on Managers. |
| Docs | `flow/docs/component-manager-architecture.md`, `flow/docs/flow-architecture.md` | Replace Type() with Descriptor() in examples and document separation of catalog descriptors from runtime FactorySpec/factories and new internal layout. |
| Serve path | `flow/cmd/serve.go` | Remove local provider-registry init helper and call cmbuiltin.NewProviderRegistry(ctx, cmConfig) during serve initialization; drop unused provider import. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 34.65% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'refactor: Split component manager registry wiring' directly reflects the changeset's primary objective: separating component manager registry construction into distinct, composable units via catalog and factory spec patterns. |
| Description check | ✅ Passed | The description comprehensively addresses the PR's refactoring scope (catalog extraction, registry wiring restructuring, builtin manifest consolidation, and test additions) with explicit type classification (Refactor) and service impact (Flow), accurately representing the changeset. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
flow/internal/task/componentmanager/builtin/builtin_test.go (1)

96-105: ⚡ Quick win

Prefer implementation constants for component-manager assertions.

These assertions validate manager implementation selection but compare against provider-name constants. Using implementation constants keeps the test aligned with the actual contract (ComponentManagers maps component type → implementation name).

Proposed refactor
- assert.Equal(t, nicoprovider.ProviderName, componentManagers[devicetypes.ComponentTypeCompute])
- assert.Equal(t, nicoprovider.ProviderName, componentManagers[devicetypes.ComponentTypeNVLSwitch])
- assert.Equal(t, nicoprovider.ProviderName, componentManagers[devicetypes.ComponentTypePowerShelf])
+ assert.Equal(t, computenico.ImplementationName, componentManagers[devicetypes.ComponentTypeCompute])
+ assert.Equal(t, nvlswitchnico.ImplementationName, componentManagers[devicetypes.ComponentTypeNVLSwitch])
+ assert.Equal(t, powershelfnico.ImplementationName, componentManagers[devicetypes.ComponentTypePowerShelf])

  componentManagers[devicetypes.ComponentTypeCompute] = "mutated"
  assert.Equal(
      t,
-     nicoprovider.ProviderName,
+     computenico.ImplementationName,
      defaultServiceComponentManagers()[devicetypes.ComponentTypeCompute],
  )
- assert.Equal(t, nicoprovider.ProviderName, config.ComponentManagers[devicetypes.ComponentTypeCompute])
+ assert.Equal(t, computenico.ImplementationName, config.ComponentManagers[devicetypes.ComponentTypeCompute])

Also applies to: 167-167

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@flow/internal/task/componentmanager/builtin/builtin_test.go` around lines 96
- 105, The test asserts component manager selections against the provider-name
constant (nicoprovider.ProviderName); change those assertions to use each
implementation's name constant instead (computenico.ImplementationName,
nvlswitchnico.ImplementationName, powershelfnico.ImplementationName, as in the
proposed refactor) so the test verifies that the ComponentManagers mapping maps
component types to implementation names; update all assertions referencing
nicoprovider.ProviderName (including the assertion on
defaultServiceComponentManagers() after the mutation) so they reflect the
contract exposed by componentManagers and defaultServiceComponentManagers().
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@flow/docs/component-manager-architecture.md`:
- Around line 340-359: The example registration snippet is missing imports for
the packages that define cmconfig.Config and componentmanager.FactorySpec;
update the import block to include the correct package paths that provide
cmconfig (symbol: cmconfig.Config) and componentmanager (symbol:
componentmanager.FactorySpec) so the functions serviceFactorySpecs and
serviceDescriptors compile; specifically add the import entries for the packages
that declare componentmanager and cmconfig, then ensure the import aliases (if
any) match the usage in serviceFactorySpecs and serviceDescriptors.

In `@flow/internal/task/componentmanager/catalog/catalog.go`:
- Around line 77-88: The Catalog.Get method currently returns a Descriptor that
contains the RequiredProviders slice by reference, risking external mutation of
internal state; modify Catalog.Get to defensively clone the slice before
returning (use slices.Clone on Descriptor.RequiredProviders) so the returned
Descriptor contains an independent copy, ensuring callers cannot mutate the
catalog's internal backing array; update the return path in Catalog.Get to
create a copy of the descriptor with RequiredProviders set to
slices.Clone(descriptor.RequiredProviders) and then return that copy along with
the ok flag.
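A minimal sketch of the defensive copy this comment asks for, assuming a simplified catalog keyed by implementation name. The `Descriptor` and `Catalog` types, the `byName` field, and `NewCatalog` below are hypothetical; only `slices.Clone` (Go 1.21+ standard library) and the `RequiredProviders` field name come from the comment.

```go
package main

import (
	"fmt"
	"slices"
)

// Descriptor is a simplified, hypothetical stand-in for cmcatalog.Descriptor.
type Descriptor struct {
	Implementation    string
	RequiredProviders []string
}

// Catalog indexes descriptors by implementation name (illustrative layout).
type Catalog struct {
	byName map[string]Descriptor
}

// NewCatalog copies each descriptor's slice so later changes to the caller's
// input cannot alter the catalog.
func NewCatalog(descriptors ...Descriptor) *Catalog {
	c := &Catalog{byName: make(map[string]Descriptor, len(descriptors))}
	for _, d := range descriptors {
		d.RequiredProviders = slices.Clone(d.RequiredProviders)
		c.byName[d.Implementation] = d
	}
	return c
}

// Get returns a descriptor whose RequiredProviders slice is an independent copy,
// so callers cannot mutate the catalog's backing array.
func (c *Catalog) Get(name string) (Descriptor, bool) {
	d, ok := c.byName[name]
	if !ok {
		return Descriptor{}, false
	}
	d.RequiredProviders = slices.Clone(d.RequiredProviders)
	return d, true
}

func main() {
	catalog := NewCatalog(Descriptor{
		Implementation:    "example",
		RequiredProviders: []string{"provider-a"},
	})

	got, _ := catalog.Get("example")
	got.RequiredProviders[0] = "mutated" // only changes the caller's copy

	again, _ := catalog.Get("example")
	fmt.Println(again.RequiredProviders[0]) // prints "provider-a"
}
```

Returning the struct by value is not enough on its own, because the slice header inside it still points at the catalog's backing array; cloning the slice is what breaks the aliasing.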

In `@flow/internal/task/componentmanager/registry.go`:
- Around line 168-177: The GetAllManagers method currently dereferences r
without checking for a nil receiver, causing a panic on nil calls; modify
Registry.GetAllManagers to first check if r == nil and immediately return an
empty []ComponentManager (zero-length slice) in that case, otherwise proceed to
acquire r.mu.RLock(), defer r.mu.RUnlock(), iterate over r.active and append
managers as before; this keeps behavior consistent with other read APIs and
avoids locking a nil receiver.
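The nil-receiver guard described here might look like the sketch below. The `Registry` fields `mu` and `active` and the method name follow the comment; the `ComponentManager` interface body, the map key type, and the `namedManager` helper are assumptions made only to keep the example runnable.

```go
package main

import (
	"fmt"
	"sync"
)

// ComponentManager is reduced to one method for illustration (assumed shape).
type ComponentManager interface {
	Name() string
}

// Registry mirrors the fields named in the comment: a RWMutex and an active map.
type Registry struct {
	mu     sync.RWMutex
	active map[string]ComponentManager
}

// GetAllManagers returns every active manager. A nil *Registry yields an empty
// slice instead of panicking when the mutex would be dereferenced.
func (r *Registry) GetAllManagers() []ComponentManager {
	if r == nil {
		return []ComponentManager{}
	}
	r.mu.RLock()
	defer r.mu.RUnlock()

	managers := make([]ComponentManager, 0, len(r.active))
	for _, m := range r.active {
		managers = append(managers, m)
	}
	return managers
}

// namedManager is a hypothetical manager used only by this example.
type namedManager string

func (n namedManager) Name() string { return string(n) }

func main() {
	var missing *Registry
	fmt.Println(len(missing.GetAllManagers())) // prints 0

	reg := &Registry{active: map[string]ComponentManager{"compute": namedManager("example")}}
	fmt.Println(len(reg.GetAllManagers())) // prints 1
}
```

Go allows calling a pointer-receiver method on a nil pointer, so the check must come before the first field access, including the lock.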
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 00c2e358-bcbf-402b-8f04-b30200220512

📥 Commits

Reviewing files that changed from the base of the PR and between be71f81 and ae2e5b2.

📒 Files selected for processing (36)
  • flow/cmd/serve.go
  • flow/docs/component-manager-architecture.md
  • flow/docs/flow-architecture.md
  • flow/internal/task/componentmanager/builtin/builtin_test.go
  • flow/internal/task/componentmanager/builtin/component_manager_factories.go
  • flow/internal/task/componentmanager/builtin/config.go
  • flow/internal/task/componentmanager/builtin/config_test.go
  • flow/internal/task/componentmanager/builtin/helpers.go
  • flow/internal/task/componentmanager/builtin/manifest.go
  • flow/internal/task/componentmanager/builtin/provider_config_decoders.go
  • flow/internal/task/componentmanager/builtin/provider_config_decoders_test.go
  • flow/internal/task/componentmanager/builtin/setup.go
  • flow/internal/task/componentmanager/catalog/catalog.go
  • flow/internal/task/componentmanager/catalog/catalog_test.go
  • flow/internal/task/componentmanager/catalog/errors.go
  • flow/internal/task/componentmanager/componentmanager.go
  • flow/internal/task/componentmanager/componentmanager_test.go
  • flow/internal/task/componentmanager/compute/nico/nico.go
  • flow/internal/task/componentmanager/config/config.go
  • flow/internal/task/componentmanager/config/config_test.go
  • flow/internal/task/componentmanager/config/doc.go
  • flow/internal/task/componentmanager/config/errors.go
  • flow/internal/task/componentmanager/config/yaml.go
  • flow/internal/task/componentmanager/errors.go
  • flow/internal/task/componentmanager/factory_spec.go
  • flow/internal/task/componentmanager/factory_spec_test.go
  • flow/internal/task/componentmanager/manager.go
  • flow/internal/task/componentmanager/mock/mock.go
  • flow/internal/task/componentmanager/nvlswitch/nico/nico.go
  • flow/internal/task/componentmanager/nvlswitch/nvswitchmanager/nvswitchmanager.go
  • flow/internal/task/componentmanager/powershelf/nico/nico.go
  • flow/internal/task/componentmanager/powershelf/psm/psm.go
  • flow/internal/task/componentmanager/registry.go
  • flow/internal/task/componentmanager/registry_test.go
  • flow/internal/task/componentmanager/test_helpers_test.go
  • flow/internal/task/executor/temporalworkflow/activity/activity_test.go
💤 Files with no reviewable changes (8)
  • flow/internal/task/componentmanager/builtin/config.go
  • flow/internal/task/componentmanager/builtin/config_test.go
  • flow/internal/task/componentmanager/builtin/provider_config_decoders_test.go
  • flow/internal/task/componentmanager/componentmanager_test.go
  • flow/internal/task/componentmanager/config/doc.go
  • flow/internal/task/componentmanager/builtin/provider_config_decoders.go
  • flow/internal/task/componentmanager/componentmanager.go
  • flow/internal/task/componentmanager/builtin/component_manager_factories.go

Comment thread flow/docs/component-manager-architecture.md
Comment thread flow/internal/task/componentmanager/catalog/catalog.go
Comment thread flow/internal/task/componentmanager/registry.go
@jw-nvidia jw-nvidia force-pushed the refactor/cm-providers branch from ae2e5b2 to 1c6e1f2 Compare May 14, 2026 21:07
@jw-nvidia

Contributor Author

/ok to test 1c6e1f2

@jw-nvidia jw-nvidia requested a review from kunzhao-nv May 14, 2026 21:21
@github-actions

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-05-14 21:22:04 UTC | Commit: 1c6e1f2

@github-actions

github-actions Bot commented May 14, 2026

🔍 Container Scan Summary

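| Service | Total | Critical | High | Medium | Low | Other |
|---|---|---|---|---|---|---|
| nico-flow | 66 | 4 | 34 | 18 | 2 | 8 |
| nico-nsm | 82 | 2 | 28 | 43 | 9 | 0 |
| nico-psm | 67 | 4 | 35 | 18 | 2 | 8 |
| nico-rest-api | 100 | 6 | 53 | 30 | 3 | 8 |
| nico-rest-cert-manager | 65 | 4 | 34 | 18 | 1 | 8 |
| nico-rest-db | 66 | 4 | 34 | 18 | 2 | 8 |
| nico-rest-site-agent | 65 | 4 | 34 | 18 | 1 | 8 |
| nico-rest-site-manager | 65 | 4 | 34 | 18 | 1 | 8 |
| nico-rest-workflow | 67 | 4 | 35 | 18 | 2 | 8 |
| TOTAL | 643 | 36 | 321 | 199 | 23 | 64 |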

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

Comment thread flow/internal/task/componentmanager/builtin/manifest.go
Comment thread flow/internal/task/componentmanager/builtin/manifest.go
Comment thread flow/internal/task/componentmanager/manager.go
- Separate component manager descriptors into a catalog package and split
  registry construction into manager, factory spec, and registry files.
- Consolidate builtin service setup around a manifest, provider registry
  construction, and focused tests.

Signed-off-by: Jin Wang <jinwan@nvidia.com>
@jw-nvidia jw-nvidia force-pushed the refactor/cm-providers branch from 1c6e1f2 to 217bff5 Compare May 15, 2026 17:10
@jw-nvidia

Contributor Author

/ok to test 217bff5

@jw-nvidia jw-nvidia merged commit e0bee3b into NVIDIA:main May 15, 2026
53 checks passed
@jw-nvidia jw-nvidia deleted the refactor/cm-providers branch May 15, 2026 17:55
2 participants