KFLUXINFRA-2752: Centralize SCC RBAC for tekton-pipelines-controller #9865
I'm not sure we need to give the pipelines controller permission to access the SCC, since the build-service already gives this permission to the service accounts it creates for running the pipelineruns.
Are you sure this is true even in the staging cluster? @jinqi7, do you know which SA is used in the e2e tests that run in the staging cluster? Is it an SA created by the build-service?
The "appstudio-pipelines-scc" is used by managed release pipelineruns in some production clusters. I am not sure whether that is expected, since release-service does not have any related SCC settings. In the staging clusters, the SCC is not used. The PR seems to close that gap, but I don't know about the others.
The SA in the release catalog e2e tests is definitely not created by build-service, but it may be affected by some Tekton operator settings.
Thanks for the discussion. I think we have two different scenarios:
The original issue (KFLUXINFRA-2752) was triggered by release pipeline failures ( So as I see it, we can either apply:
I suggest going with the release-service fix (component-level): update release-service to grant SCC permissions to the SAs it creates, similar to how build-service does it. This would require changes to release-service, but it would give us finer control over who can use which SCC.
@gbenhaim, thanks for the suggestion.
They are no longer needed and should be removed. Instead, we should follow the recommendation I mentioned above.
Thanks for the answer. If we can remove them with this PR, that also works for release-service, because what confused us was the different behavior between the staging and production clusters.
@davidmogar @johnbieren, do we need to create a Jira ticket for adding SCC-related configuration to release-service, as recommended, to avoid a similar situation in the future?
I'm not really following this. We don't create SAs on the fly to run the release pipelineRuns with. They are long-lived SAs that are stored in git and maintained by releng. So "Update release-service to grant SCC permissions to the SAs it creates" doesn't make sense as a solution, since we don't create SAs.
/approve
/approve
@eisraeli, feel free to unhold when you are ready to merge it.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: eisraeli, enarha, gbenhaim, jinqi7
🤖 Pipeline Failure Analysis
Category: Test
The pipeline failed because an end-to-end test failed due to a mismatch in the enterprise contract check status, where a "WARNING" was received instead of the expected "SUCCESS."
📋 Technical Details
Immediate Cause
The
Contributing Factors
While not the direct cause of this specific failure, the provided additional context indicates that several ApplicationSets are in an 'OutOfSync' state, and snapshot creation failed due to a lack of signing with Chains. These could be underlying issues in the environment or deployment processes that might contribute to unexpected states or warnings during builds.
Impact
The failure of this critical end-to-end test prevented the successful completion of the
🔍 Evidence
appstudio-e2e-tests/redhat-appstudio-e2e
Category:
Logs:
🤖 Pipeline Failure Analysis
Category: Timeout
The AppStudio E2E tests failed due to a timeout caused by the
📋 Technical Details
Immediate Cause
The
Contributing Factors
Analysis of
Impact
The timeout in the E2E test execution step prevented the successful completion of the Prow job. This directly blocked the validation of infrastructure deployments for the AppStudio project in the CI pipeline.
🔍 Evidence
appstudio-e2e-tests/redhat-appstudio-e2e
Category:
Logs:
/test appstudio-e2e-tests
🤖 Pipeline Failure Analysis
Category: Infrastructure
The Prow job failed due to persistent DNS resolution errors preventing connectivity to the Kubernetes API server, which blocked essential data collection and test execution steps.
📋 Technical Details
Immediate Cause
Multiple
Contributing Factors
The
Impact
The inability to resolve the Kubernetes API server's DNS name prevented the execution of critical diagnostic and data collection steps. This ultimately blocked the Prow job from proceeding to run the e2e tests successfully, as the necessary environment connectivity and data gathering failed.
🔍 Evidence
appstudio-e2e-tests/gather-audit-logs
Category:
Logs:
🤖 Pipeline Failure Analysis
Category: Timeout
End-to-end tests timed out due to excessive execution duration, preventing the completion of the pipeline.
📋 Technical Details
Immediate Cause
The
Contributing Factors
Several environmental factors may have contributed to the prolonged test execution. Analysis of
Impact
The timeout in the end-to-end test execution prevented any further steps in the pipeline from running, effectively blocking the entire CI job. This failure indicates a potential problem with the stability or performance of the AppStudio environment or the application under test, requiring further investigation into the root cause of the extended test duration.
🔍 Evidence
appstudio-e2e-tests/redhat-appstudio-e2e
Category:
Logs:
/test appstudio-e2e-tests
🤖 Pipeline Failure Analysis
Category: Timeout
The e2e tests for AppStudio failed due to a timeout, likely caused by underlying infrastructure instability including degraded ArgoCD applications and unready Tekton components.
📋 Technical Details
Immediate Cause
The
Contributing Factors
Multiple issues within the cluster environment likely contributed to the test execution delay and subsequent timeout. Specifically, the
Impact
The timeout prevented the successful completion of the AppStudio e2e tests. This means the deployment and functionality of the AppStudio components have not been validated for this specific PR, potentially allowing integration issues or regressions to go undetected.
🔍 Evidence
appstudio-e2e-tests/redhat-appstudio-e2e
Category:
Analysis powered by prow-failure-analysis | Build:
/test appstudio-e2e-tests
🤖 Pipeline Failure Analysis
Category: Timeout
The end-to-end tests failed due to a timeout caused by the
📋 Technical Details
Immediate Cause
The
Contributing Factors
Several related issues contribute to this failure:
Impact
The failure of the
🔍 Evidence
appstudio-e2e-tests/redhat-appstudio-e2e
Category:
Logs:
🤖 Pipeline Failure Analysis
Category: Timeout
The Prow job
📋 Technical Details
Immediate Cause
The
Contributing Factors
Several pieces of evidence suggest underlying infrastructure instability that may have caused the test slowness:
Impact
The timeout prevented the completion of the end-to-end tests for this Prow job. This failure means that the Prow job could not validate the functionality and stability of the Red Hat AppStudio infrastructure in its current state, potentially allowing undetected issues to persist.
🔍 Evidence
appstudio-e2e-tests/redhat-appstudio-e2e
Category:
Logs:
Move the tekton-pipelines-controller-konflux-scc ClusterRole and
ClusterRoleBinding to the base rbac directory so it's automatically
applied to all clusters.
Previously, the SCC RBAC was only defined in some production clusters
(kflux-prd-rh02, kflux-prd-rh03, kflux-rhel-p01, kflux-osp-p01), causing
staging and other production clusters to use the wrong SCC
(restricted-v2 instead of appstudio-pipelines-scc). This mismatch caused
release pipeline failures in production that weren't caught during
staging testing.
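
For readers unfamiliar with how SCC access is granted through RBAC, the following is a minimal sketch of what such a ClusterRole and ClusterRoleBinding pair could look like. It is not copied from this PR's diff: the resource name and the SCC name come from the description above, while the ServiceAccount name and namespace in `subjects` are assumptions made for illustration.

```yaml
# Illustrative sketch only; the manifests in the PR may differ.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tekton-pipelines-controller-konflux-scc
rules:
  # Granting the "use" verb on a named SCC lets the bound subject use that SCC.
  - apiGroups: ["security.openshift.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["appstudio-pipelines-scc"]
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tekton-pipelines-controller-konflux-scc
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tekton-pipelines-controller-konflux-scc
subjects:
  - kind: ServiceAccount
    name: tekton-pipelines-controller   # assumed ServiceAccount name
    namespace: openshift-pipelines      # assumed namespace
```

Placing a pair like this in a shared base directory, rather than in per-cluster overlays, is what would make every cluster pick up the same binding.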
This change: