
Conversation

@eisraeli
Contributor

@eisraeli eisraeli commented Jan 6, 2026

Move the tekton-pipelines-controller-konflux-scc ClusterRole and
ClusterRoleBinding to the base rbac directory so it's automatically
applied to all clusters.

Previously, the SCC RBAC was only defined in some production clusters
(kflux-prd-rh02, kflux-prd-rh03, kflux-rhel-p01, kflux-osp-p01), causing
staging and other production clusters to use the wrong SCC
(restricted-v2 instead of appstudio-pipelines-scc). This mismatch caused
release pipeline failures in production that weren't caught during
staging testing.

This change:

  • Adds scc-rbac.yaml to components/pipeline-service/base/rbac/cluster-role/
  • Removes duplicate scc-rbac.yaml from individual cluster directories
  • Ensures all staging and production clusters use appstudio-pipelines-scc
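
For reviewers, here is a minimal sketch of the kind of manifest scc-rbac.yaml contains, reconstructed from the description above: the ClusterRole grants use of the SCC, and the binding attaches it to the controller's service account. The service account name and namespace shown are assumptions, not taken from this PR.

```yaml
# Sketch of scc-rbac.yaml, assuming the standard Tekton controller SA/namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tekton-pipelines-controller-konflux-scc
rules:
  - apiGroups: ["security.openshift.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["appstudio-pipelines-scc"]
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tekton-pipelines-controller-konflux-scc
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tekton-pipelines-controller-konflux-scc
subjects:
  - kind: ServiceAccount
    name: tekton-pipelines-controller  # assumed SA name
    namespace: openshift-pipelines     # assumed namespace
```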

@openshift-ci openshift-ci bot requested review from enarha and ramessesii2 January 6, 2026 21:04

@eisraeli eisraeli marked this pull request as draft January 6, 2026 21:04
@eisraeli eisraeli force-pushed the KFLUXINFRA-2752 branch 2 times, most recently from d44e6fd to da1771c on January 6, 2026 22:09
@eisraeli eisraeli changed the title KFLUXINFRA-2752: Add scc-rbac.yaml resource to kustomization files for staging environments KFLUXINFRA-2752: Centralize SCC RBAC for tekton-pipelines-controller Jan 6, 2026
@eisraeli eisraeli marked this pull request as ready for review January 6, 2026 22:12
@openshift-ci openshift-ci bot requested review from aThorp96 and mathur07 January 6, 2026 22:12
@gbenhaim
Member

gbenhaim commented Jan 6, 2026

I'm not sure we need to give the pipelines controller the permissions to access the SCC, since the build-service already gives this permission to the service accounts it creates for running the PipelineRuns.

@mmalina
Contributor

mmalina commented Jan 7, 2026

> I'm not sure we need to give the pipelines controller the permissions to access the SCC, since the build-service already gives this permission to the service accounts it creates for running the PipelineRuns.

Are you sure this is true even in the staging clusters? @jinqi7, do you know which SA is used in the e2e tests that run in the staging cluster? Is it an SA created by the build-service?

@jinqi7
Member

jinqi7 commented Jan 7, 2026

The "appstudio-pipelines-scc" is being used in managed release pipelineruns in some product clusters. I am not sure if it's expected since release-service does not have related scc settings. In the staging clusters, the scc is not used there. The PR seems to avoid the gap. But I don't know others.

@jinqi7
Member

jinqi7 commented Jan 7, 2026

> I'm not sure we need to give the pipelines controller the permissions to access the SCC, since the build-service already gives this permission to the service accounts it creates for running the PipelineRuns.

> Are you sure this is true even in the staging clusters? @jinqi7, do you know which SA is used in the e2e tests that run in the staging cluster? Is it an SA created by the build-service?

The SA used in the release catalog e2e tests is definitely not created by build-service, but it may be affected by some Tekton operator settings.

@eisraeli
Contributor Author

eisraeli commented Jan 11, 2026

Thanks for the discussion. I think we have two different scenarios:

  1. Build pipelines: pipelines created by build-service, which already grants appstudio-pipelines-scc access via appstudio-pipelines-runner ClusterRole.

  2. Release pipelines: pipelines created by release-service, which does not configure SCC permissions. This is the gap this PR addresses.

The original issue (KFLUXINFRA-2752) was triggered by release pipeline failures (rh-push-to-registry-redhat-io task), not build pipelines. Without the controller having SCC access, release pipeline pods fall back to restricted-v2 instead of appstudio-pipelines-scc.

So as I see it, we can either apply:

  1. This PR (platform-level fix): Give the Tekton controller SCC access so it can assign appstudio-pipelines-scc to any pod, regardless of whether the pod's SA has direct SCC permissions. This provides a centralized fix.

  2. Release-service fix (component-level): Update release-service to grant SCC permissions to the SAs it creates, similar to how build-service does it. This would require changes to the release-service.
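
To make option 2 concrete, here is a hedged sketch of what the component-level grant could look like: a RoleBinding that attaches an existing release SA to a ClusterRole permitting use of the SCC, mirroring the build-service pattern with appstudio-pipelines-runner. The namespace and SA name below are hypothetical placeholders.

```yaml
# Hypothetical component-level grant; namespace and SA name are placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: release-pipelines-scc
  namespace: example-release-tenant  # hypothetical tenant namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: appstudio-pipelines-runner   # ClusterRole granting "use" on the SCC
subjects:
  - kind: ServiceAccount
    name: release-pipeline           # hypothetical release SA
```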

Thoughts?
@gbenhaim @mmalina @jinqi7

@gbenhaim
Member

I suggest we go with:

> Release-service fix (component-level): Update release-service to grant SCC permissions to the SAs it creates, similar to how build-service does it. This would require changes to the release-service.

That way we will have finer control over who can use which SCC.
I also think a different SCC is needed for the release service; for example, I don't think it needs to run pods as the root user or to have the same set of privileges the build pipeline needs.
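
To illustrate that point, here is a rough sketch of what a narrower, release-specific SCC could look like: non-root, no privilege escalation, all capabilities dropped. The name and exact field values are assumptions for discussion, not a reviewed policy.

```yaml
# Hypothetical release-specific SCC, sketched for discussion only.
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: appstudio-release-scc   # assumed name
allowPrivilegedContainer: false
allowPrivilegeEscalation: false
requiredDropCapabilities: ["ALL"]
runAsUser:
  type: MustRunAsRange          # forbids running as root
seLinuxContext:
  type: MustRunAs
fsGroup:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - persistentVolumeClaim
  - projected
  - secret
```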

@jinqi7
Member

jinqi7 commented Jan 11, 2026

> I suggest we go with:
>
> Release-service fix (component-level): Update release-service to grant SCC permissions to the SAs it creates, similar to how build-service does it. This would require changes to the release-service.
>
> That way we will have finer control over who can use which SCC. I also think a different SCC is needed for the release service; for example, I don't think it needs to run pods as the root user or to have the same set of privileges the build pipeline needs.

@gbenhaim, thanks for the suggestion.
Can you please help us understand why the tekton-pipelines-controller-konflux-scc ClusterRole and ClusterRoleBinding are needed in production clusters but not in staging clusters? We need the environments to have identical configuration, right? Should they be kept in all the clusters or removed?

@gbenhaim
Member

> I suggest we go with:
> Release-service fix (component-level): Update release-service to grant SCC permissions to the SAs it creates, similar to how build-service does it. This would require changes to the release-service.
> That way we will have finer control over who can use which SCC. I also think a different SCC is needed for the release service; for example, I don't think it needs to run pods as the root user or to have the same set of privileges the build pipeline needs.
>
> @gbenhaim, thanks for the suggestion. Can you please help us understand why the tekton-pipelines-controller-konflux-scc ClusterRole and ClusterRoleBinding are needed in production clusters but not in staging clusters? We need the environments to have identical configuration, right? Should they be kept in all the clusters or removed?

They are not needed anymore and should be removed. Instead we should follow the recommendation I mentioned above.

@jinqi7
Member

jinqi7 commented Jan 12, 2026

> They are not needed anymore and should be removed.

Thanks for the answer. If we can remove them with this PR, that also works from the release-service side, because what confused us was the different behavior between the staging and production clusters.

> Instead we should follow the recommendation I mentioned above.

@davidmogar @johnbieren, do we need to create a Jira ticket for adding the recommended SCC configuration to release-service, to avoid a similar situation again?

@johnbieren
Member

I'm not really following this. We don't create SAs on the fly to run the release PipelineRuns with. They are long-lived SAs that are stored in git and maintained by releng. So "Update release-service to grant SCC permissions to the SAs it creates" doesn't make sense as a solution, since we don't create SAs.

@openshift-ci openshift-ci bot removed the lgtm label Jan 22, 2026
@openshift-ci

openshift-ci bot commented Jan 22, 2026

New changes are detected. LGTM label has been removed.

@eisraeli
Contributor Author

> @enarha: I see the pentest cluster has the same configuration: components/pipeline-service/production/pentest-p01/resources/kustomization.yaml, and I guess it should be as similar as possible to the production clusters.
> Here is another reference: hack/new-cluster/templates/pipeline-service/resources/kustomization.yaml; not sure if it has to be removed as well.

Hi,

  • The pentest cluster was removed (see commit b306477).
  • I've removed the configuration from the hack path as well.
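
In kustomize terms, the cleanup amounts to the per-cluster kustomization.yaml files dropping their scc-rbac.yaml entry while the base directory picks it up, roughly as sketched below (the surrounding entries are illustrative, not copied from the repo):

```yaml
# components/pipeline-service/base/rbac/cluster-role/kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # ...existing base RBAC resources (illustrative)...
  - scc-rbac.yaml  # now applied to every cluster via the base
```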

@gbenhaim @enarha PTAL.

@eisraeli eisraeli requested review from enarha and jinqi7 January 22, 2026 12:37
@enarha
Contributor

enarha commented Jan 26, 2026

/approve

@eisraeli
Contributor Author

^ @adambkaplan

@gbenhaim
Member

/approve
/hold

@eisraeli feel free to unhold when you are ready to merge it.

@openshift-ci

openshift-ci bot commented Jan 26, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eisraeli, enarha, gbenhaim, jinqi7

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@konflux-ci-qe-bot

🤖 Pipeline Failure Analysis

Category: Test

The pipeline failed because an end-to-end test failed due to a mismatch in the enterprise contract check status, where a "WARNING" was received instead of the expected "SUCCESS."

📋 Technical Details

Immediate Cause

The appstudio-e2e-tests/redhat-appstudio-e2e test suite failed because the verify-enterprise-contract check, a part of the HACBS pipelines scenario sample-python-basic-oci test, returned a "WARNING" status instead of the expected "SUCCESS" status. This discrepancy was identified by a test assertion that strictly required the "SUCCESS" outcome for the enterprise contract verification.

Contributing Factors

While not the direct cause of this specific failure, the provided additional context indicates that several ApplicationSets are in an 'OutOfSync' state, and snapshot creation failed due to a lack of signing with Chains. These could be underlying issues in the environment or deployment processes that might contribute to unexpected states or warnings during builds.

Impact

The failure of this critical end-to-end test prevented the successful completion of the pull-ci-redhat-appstudio-infra-deployments-main-appstudio-e2e-tests job. This blocked the validation of the application infrastructure's end-to-end deployment and functionality in the test environment.

🔍 Evidence

appstudio-e2e-tests/redhat-appstudio-e2e

Category: test
Root Cause: The root cause of the failure is a test assertion mismatch in the verify-enterprise-contract check within the HACBS pipelines scenario sample-python-basic-oci test. The test expected a "SUCCESS" status for the enterprise contract verification, but the actual output indicated a "WARNING".

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 1008
[FAIL] [62.201 seconds]
[build-service-suite Build templates E2E test] HACBS pipelines scenario sample-python-basic-oci when the container image for component with Git source URL https://github.com/redhat-appstudio-qe/devfile-sample-python-basic is created and pushed to container registry [It] verify-enterprise-contract check should pass [build, build-templates, HACBS, pipeline-service, pipeline, sbom, slow, build-templates-e2e]
/tmp/tmp.WYfEbzJUMj/tests/build/build_templates.go:546
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 1019
Expected
      <[]v1.TaskRunResult | len:1, cap:1>: [
          {
              Name: "TEST_OUTPUT",
              Type: "string",
              Value: {
                  Type: "string",
                  StringVal: "{\"timestamp\":\"1769597877\",\"namespace\":\"\",\"successes\":85,\"failures\":0,\"warnings\":1,\"result\":\"WARNING\"}
",
                  ArrayVal: nil,
                  ObjectVal: nil,
              },
          },
      ]
  to contain elements
      <[]*tekton.TaskRunResultMatcher | len:1, cap:1>: [
          {
              name: "TEST_OUTPUT",
              jsonPath: "{$.result}",
              value: nil,
              jsonValue: <string>"[\"SUCCESS\"]",
              jsonMatcher: <*matchers.MatchJSONMatcher | 0xc0012b51a0>{
                  JSONToMatch: <string>"[\"SUCCESS\"]",
                  firstFailurePath: [<int>0],
              },
          },
      ]
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 1042
In [It] at: /tmp/tmp.WYfEbzJUMj/tests/build/build_templates.go:627 @ 01/28/26 10:57:59.778

Analysis powered by prow-failure-analysis | Build: 2016458087434555392

@konflux-ci-qe-bot

🤖 Pipeline Failure Analysis

Category: Timeout

The AppStudio E2E tests failed due to a timeout caused by the build-service-in-cluster-local application not synchronizing, which prevented the test execution from completing.

📋 Technical Details

Immediate Cause

The appstudio-e2e-tests/redhat-appstudio-e2e step timed out. This timeout occurred because the build-service-in-cluster-local component remained in a "Degraded" state and did not synchronize within the allotted two-hour execution period.

Contributing Factors

Analysis of additional_context reveals several related issues that may have contributed to the instability of the build-service-in-cluster-local component:

  • The build-service Argo CD Application shows a 'Degraded' health status.
  • Multiple ApplicationSets, including application-api and build-service, are in an 'OutOfSync' state.
  • The tektonaddons.json indicates that Tekton Addons are not fully ready due to issues with 'openshift console resources' and the 'tkn-cli-serve' deployment.
  • The tektonconfigs.json shows that Tekton components are not ready, also citing issues with TektonAddon.

Impact

The timeout in the E2E test execution step prevented the successful completion of the Prow job. This directly blocked the validation of infrastructure deployments for the AppStudio project in the CI pipeline.

🔍 Evidence

appstudio-e2e-tests/redhat-appstudio-e2e

Category: timeout
Root Cause: The primary cause of the failure is a timeout during the end-to-end test execution. This appears to be related to the 'build-service-in-cluster-local' application not reaching a synchronized state within the allocated time, leading to subsequent process timeouts.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/step.log line 750
build-service-in-cluster-local Syncing Degraded
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/step.log line 1250
build-service-in-cluster-local Syncing Degraded
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/step.log line 7750
build-service-in-cluster-local Syncing Degraded
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/step.log line 7847
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:169","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 2h0m0s timeout","severity":"error","time":"2026-02-01T12:24:32Z"}
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/step.log line 7850
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:267","func":"sigs.k8s.io/prow/pkg/entrypoint.gracefullyTerminate","level":"error","msg":"Process did not exit before 15s grace period","severity":"error","time":"2026-02-01T12:24:47Z"}

Analysis powered by prow-failure-analysis | Build: 2017904434360619008

@eisraeli
Contributor Author

eisraeli commented Feb 1, 2026

/test appstudio-e2e-tests

@konflux-ci-qe-bot

🤖 Pipeline Failure Analysis

Category: Infrastructure

The Prow job failed due to persistent DNS resolution errors preventing connectivity to the Kubernetes API server, which blocked essential data collection and test execution steps.

📋 Technical Details

Immediate Cause

Multiple gather-* steps (gather-audit-logs, gather-extra, gather-must-gather, redhat-appstudio-gather) failed because they were unable to resolve the DNS name of the Kubernetes API server (api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com) using the DNS server 172.30.0.10:53. This indicates a network or DNS configuration issue within the Prow environment or the target cluster.

Contributing Factors

The redhat-appstudio-e2e step experienced a termination of its entrypoint process, reporting "terminated" and a failure to gracefully terminate within the grace period. This could be a secondary effect of the cluster instability caused by the DNS issues or an independent problem. The must-gather logs from the additional context also reinforce the DNS resolution failures and suggest the cluster might not be in a healthy state, further contributing to the inability to collect necessary diagnostic information.

Impact

The inability to resolve the Kubernetes API server's DNS name prevented the execution of critical diagnostic and data collection steps. This ultimately blocked the Prow job from proceeding to run the e2e tests successfully, as the necessary environment connectivity and data gathering failed.

🔍 Evidence

appstudio-e2e-tests/gather-audit-logs

Category: infrastructure
Root Cause: The failure is due to a DNS resolution error preventing the must-gather tool from connecting to the Kubernetes API server. This could be caused by network misconfiguration, DNS server issues, or the cluster endpoint being unreachable.

Logs:

artifacts/appstudio-e2e-tests/gather-audit-logs/build-log.txt line 4
[must-gather      ] OUT 2026-02-01T18:01:05.686704261Z Get "https://api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com:6443/apis/image.openshift.io/v1/namespaces/openshift/imagestreams/must-gather": dial tcp: lookup api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
artifacts/appstudio-e2e-tests/gather-audit-logs/build-log.txt line 14
error getting cluster version: Get "https://api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions/version": dial tcp: lookup api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
artifacts/appstudio-e2e-tests/gather-audit-logs/build-log.txt line 18
error getting cluster operators: Get "https://api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp: lookup api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
artifacts/appstudio-e2e-tests/gather-audit-logs/build-log.txt line 24
Error running must-gather collection:
    creating temp namespace: Post "https://api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com:6443/api/v1/namespaces": dial tcp: lookup api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
artifacts/appstudio-e2e-tests/gather-audit-logs/build-log.txt line 40
E0201 18:01:35.736623      49 memcache.go:265] couldn't get current server API group list: Get "https://api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com:6443/api?timeout=32s": dial tcp [REDACTED: Public IP (ipv4)]: i/o timeout
artifacts/appstudio-e2e-tests/gather-audit-logs/build-log.txt line 55
error running backup collection: Get "https://api.konflux-4-17-us-west-2-t927h.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com:6443/api?timeout=32s": dial tcp: lookup api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

appstudio-e2e-tests/gather-extra

Category: infrastructure
Root Cause: The CI environment failed to resolve the DNS name of the Kubernetes API server, preventing the 'gather-extra' step from connecting to the cluster to collect artifacts. This is likely due to a network or DNS configuration issue within the CI environment or the target cluster.

Logs:

artifacts/appstudio-e2e-tests/gather-extra/gather-extra-log.txt line 4
E0201 18:00:53.958920      29 memcache.go:265] couldn't get current server API group list: Get "https://api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com:6443/api?timeout=5s": dial tcp: lookup api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
artifacts/appstudio-e2e-tests/gather-extra/gather-extra-log.txt line 9
E0201 18:00:58.989038      29 memcache.go:265] couldn't get current server API group list: Get "https://api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com:6443/api?timeout=5s": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
artifacts/appstudio-e2e-tests/gather-extra/gather-extra-log.txt line 10
Unable to connect to the server: context deadline exceeded (Client.Timeout exceeded while awaiting headers)

appstudio-e2e-tests/gather-must-gather

Category: infrastructure
Root Cause: The oc adm must-gather command failed due to network connectivity issues, specifically "connection refused" and "no such host" errors when trying to reach the OpenShift API server, suggesting the cluster might be down or inaccessible.

Logs:

artifacts/appstudio-e2e-tests/gather-must-gather/must-gather.log line 16
Error running must-gather collection:
    creating temp namespace: Post "https://api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com:6443/api/v1/namespaces": dial tcp [REDACTED: Public IP (ipv4)]: connect: connection refused
artifacts/appstudio-e2e-tests/gather-must-gather/must-gather.log line 24
E0201 18:00:47.163434      56 memcache.go:265] couldn't get current server API group list: Get "https://api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com:6443/api?timeout=32s": dial tcp: lookup api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
artifacts/appstudio-e2e-tests/gather-must-gather/must-gather.log line 30
error running backup collection: Get "https://api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com:6443/api?timeout=32s": dial tcp: lookup api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
artifacts/appstudio-e2e-tests/gather-must-gather/must-gather.log line 31
error: creating temp namespace: Post "https://api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com:6443/api/v1/namespaces": dial tcp [REDACTED: Public IP (ipv4)]: connect: connection refused

appstudio-e2e-tests/redhat-appstudio-e2e

Category: infrastructure
Root Cause: The primary cause of failure appears to be an infrastructure-level issue where the entrypoint process was terminated by an external signal, likely due to the job being cancelled or an underlying resource issue. The subsequent failure to gracefully terminate indicates that the termination signal was not handled as expected by the running process.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/log.txt line 777
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:173","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Entrypoint received interrupt: terminated","severity":"error","time":"2026-02-01T17:54:55Z"}
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/log.txt line 781
make: *** [Makefile:25: ci/test/e2e] Terminated
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/log.txt line 783
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:267","func":"sigs.k8s.io/prow/pkg/entrypoint.gracefullyTerminate","level":"error","msg":"Process did not exit before 15s grace period","severity":"error","time":"2026-02-01T17:55:10Z"}
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/log.txt line 785
{"component":"entrypoint","error":"os: process already finished","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:269","func":"sigs.k8s.io/prow/pkg/entrypoint.gracefullyTerminate","level":"error","msg":"Could not kill process after grace period","severity":"error","time":"2026-02-01T17:55:10Z"}

appstudio-e2e-tests/redhat-appstudio-gather

Category: infrastructure
Root Cause: The oc client is unable to resolve the hostname for the Kubernetes API server, indicating a networking or DNS configuration issue within the environment. This prevents the tool from connecting to the cluster to gather necessary information.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-gather/build-log.txt:73
E0201 18:01:42.142932      35 memcache.go:265] couldn't get current server API group list: Get "https://api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com:6443/api?timeout=5s": dial tcp: lookup api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
artifacts/appstudio-e2e-tests/redhat-appstudio-gather/build-log.txt:296
Unable to connect to the server: dial tcp: lookup api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
artifacts/appstudio-e2e-tests/redhat-appstudio-gather/build-log.txt:474
Error running must-gather collection:
    creating temp namespace: Post "https://api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com:6443/api/v1/namespaces": dial tcp: lookup api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host
artifacts/appstudio-e2e-tests/redhat-appstudio-gather/build-log.txt:487
error running backup collection: Get "https://api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com:6443/api?timeout=32s": dial tcp: lookup api.konflux-4-17-us-west-2-t927h.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

Analysis powered by prow-failure-analysis | Build: 2017998987306471424

@konflux-ci-qe-bot

🤖 Pipeline Failure Analysis

Category: Timeout

End-to-end tests timed out due to excessive execution duration, preventing the completion of the pipeline.

📋 Technical Details

Immediate Cause

The appstudio-e2e-tests/redhat-appstudio-e2e step exceeded its 2-hour execution timeout. The process did not finish within the allotted time and was eventually terminated after a grace period.

Contributing Factors

Several environmental factors may have contributed to the prolonged test execution. Analysis of additional_context indicates potential issues with Tekton component readiness, specifically that the TektonAddon and TektonConfig resources are not in a ready state, with the tkn-cli-serve deployment not being ready. Additionally, a build-service-controller-manager deployment has exceeded its progress deadline, and numerous ApplicationSets are in an OutOfSync or have a Missing health status. These underlying system instabilities or performance degradations could be causing delays that lead to the end-to-end tests exceeding their allocated runtime.

Impact

The timeout in the end-to-end test execution prevented any further steps in the pipeline from running, effectively blocking the entire CI job. This failure indicates a potential problem with the stability or performance of the AppStudio environment or the application under test, requiring further investigation into the root cause of the extended test duration.

🔍 Evidence

appstudio-e2e-tests/redhat-appstudio-e2e

Category: timeout
Root Cause: The end-to-end tests failed because they exceeded the maximum allowed execution time. This suggests that the tests are either too long-running, or there is an issue within the test environment or the application under test that is causing excessive delays.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 1497
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:169","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 2h0m0s timeout","severity":"error","time":"2026-02-01T20:08:30Z"}
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 1499
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:267","func":"sigs.k8s.io/prow/pkg/entrypoint.gracefullyTerminate","level":"error","msg":"Process did not exit before 15s grace period","severity":"error","time":"2026-02-01T20:08:45Z"}

Analysis powered by prow-failure-analysis | Build: 2018020254394880000

@eisraeli
Contributor Author

eisraeli commented Feb 2, 2026

/test appstudio-e2e-tests

@konflux-ci-qe-bot

🤖 Pipeline Failure Analysis

Category: Timeout

The e2e tests for AppStudio failed due to a timeout, likely caused by underlying infrastructure instability including degraded ArgoCD applications and unready Tekton components.

📋 Technical Details

Immediate Cause

The appstudio-e2e-tests/redhat-appstudio-e2e step timed out, exceeding the maximum allowed execution time of 2 hours. No specific test failures were identified, indicating the entire test suite did not complete.

Contributing Factors

Multiple issues within the cluster environment likely contributed to the test execution delay and subsequent timeout. Specifically, the build-service-in-cluster-local Argo CD Application was in a Degraded state, and numerous other Applications managed by ApplicationSets were OutOfSync or Missing. Additionally, the Tekton Addon was not fully ready due to issues with tkn-cli-serve deployment, impacting OpenShift console resources. An active static-admission AdmissionCheck with a 15-minute retry delay may have also introduced delays. A large number of artifact analyses also failed, suggesting potential data integrity or collection issues within the environment.

Impact

The timeout prevented the successful completion of the AppStudio e2e tests. This means the deployment and functionality of the AppStudio components have not been validated for this specific PR, potentially allowing integration issues or regressions to go undetected.

🔍 Evidence

appstudio-e2e-tests/redhat-appstudio-e2e

Category: timeout
Root Cause: The e2e tests timed out because the process, likely the execution of the tests themselves or a dependency, exceeded the 2-hour time limit. This could be due to a slow test execution, resource constraints, or an underlying issue in the test environment or the application being tested.


Analysis powered by prow-failure-analysis | Build: 2018272292835954688

@eisraeli
Contributor Author

eisraeli commented Feb 3, 2026

/test appstudio-e2e-tests

@konflux-ci-qe-bot

🤖 Pipeline Failure Analysis

Category: Timeout

The end-to-end tests failed due to a timeout caused by the build-service-in-cluster-local application remaining in a degraded state, preventing synchronization of critical Kubernetes resources.

📋 Technical Details

Immediate Cause

The appstudio-e2e-tests/redhat-appstudio-e2e step timed out because the build-service-in-cluster-local Kubernetes application failed to synchronize and remained in a "Degraded" state. This directly blocked the progression of the e2e tests.

Contributing Factors

Several related issues contribute to this failure:

  • The build-service-controller-manager deployment within the build-service-in-cluster-local application is also reported as "Degraded".
  • Numerous Argo CD ApplicationSets, including application-api and build-service, are in an "OutOfSync" status, suggesting a systemic issue with Argo CD's ability to reconcile desired states.
  • The tkn-cli-serve deployment, a component of Tekton addons, is not ready, leading to an error state in the addon TektonAddon resource.

Impact

The failure of the build-service-in-cluster-local application to synchronize prevented the e2e tests from completing. This indicates a potential issue with the core AppStudio infrastructure deployment or configuration within the test environment, which is a critical blocker for validating the system's functionality.

🔍 Evidence

appstudio-e2e-tests/redhat-appstudio-e2e

Category: timeout
Root Cause: The end-to-end tests timed out because the Kubernetes resources, specifically the build-service-in-cluster-local application, remained in a "Degraded" state and did not synchronize within the allocated time. This indicates a potential issue with the deployment or configuration of these resources in the test environment.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/step.log line 595
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:169","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 2h0m0s timeout","severity":"error","time":"2026-02-03T10:40:03Z"}
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/step.log line 597
build-service-in-cluster-local                      Synced   Degraded
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/step.log line 609
build-service-in-cluster-local                      Synced   Degraded
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/step.log line 611
Waiting 10 seconds for application sync
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/step.log line 776
build-service-in-cluster-local                      Synced   Degraded
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/step.log line 778
Waiting 10 seconds for application sync
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/step.log line 1597
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:267","func":"sigs.k8s.io/prow/pkg/entrypoint.gracefullyTerminate","level":"error","msg":"Process did not exit before 15s grace period","severity":"error","time":"2026-02-03T10:40:18Z"}

Analysis powered by prow-failure-analysis | Build: 2018602560797020160

@konflux-ci-qe-bot

🤖 Pipeline Failure Analysis

Category: Timeout

The Prow job pull-ci-redhat-appstudio-infra-deployments-main-appstudio-e2e-tests failed due to a timeout because the end-to-end tests took longer than the allocated 2 hours to complete, likely due to underlying infrastructure issues.

📋 Technical Details

Immediate Cause

The appstudio-e2e-tests/redhat-appstudio-e2e step exceeded its 2-hour execution timeout. The process was terminated after failing to complete within the allotted time.

Contributing Factors

Several pieces of evidence suggest underlying infrastructure instability that may have caused the test slowness:

  • Multiple Argo CD ApplicationSets are in an 'OutOfSync' and 'Missing' health status, indicating deployment or synchronization problems for critical components like application-api, build-service, and others.
  • The Argo CD Application build-service-in-cluster-local is in a 'Degraded' state.
  • The tektonaddons.json and tektonconfigs.json artifacts indicate that Tekton Addons are not ready, specifically citing issues with "openshift console resources not yet ready" and "tkn-cli-serve deployment not ready". These issues could affect the proper functioning and responsiveness of the CI/CD environment.

Impact

The timeout prevented the completion of the end-to-end tests for this Prow job. This failure means that the Prow job could not validate the functionality and stability of the Red Hat AppStudio infrastructure in its current state, potentially allowing undetected issues to persist.

🔍 Evidence

appstudio-e2e-tests/redhat-appstudio-e2e

Category: timeout
Root Cause: The e2e tests exceeded the allocated timeout period, indicating a potential issue with test execution time or environment responsiveness.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 724
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:169","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 2h0m0s timeout","severity":"error","time":"2026-02-03T13:53:34Z"}
artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 727
{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:267","func":"sigs.k8s.io/prow/pkg/entrypoint.gracefullyTerminate","level":"error","msg":"Process did not exit before 15s grace period","severity":"error","time":"2026-02-03T13:53:49Z"}

Analysis powered by prow-failure-analysis | Build: 2018650650786664448

@openshift-ci

openshift-ci bot commented Feb 3, 2026

@eisraeli: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/appstudio-e2e-tests | f199097 | link | true | /test appstudio-e2e-tests |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
