[KFLUXINFRA-2805] Enable opentelemetry log collection on MPC Vm's on prod by mshaposhnik · Pull Request #10388 · redhat-appstudio/infra-deployments

mshaposhnik · 2026-02-05T08:37:17Z

Turn on MPC log collection on production.
Fixes: https://issues.redhat.com/browse/KFLUXINFRA-2805

Signed-off-by: Max Shaposhnyk <mshaposh@redhat.com>

openshift-ci · 2026-02-05T08:37:23Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mshaposhnik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~components/multi-platform-controller/OWNERS~~ [mshaposhnik]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

github-actions · 2026-02-05T08:37:29Z

🤖 Hi @mshaposhnik, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

konflux-ci-qe-bot · 2026-02-05T09:11:44Z

🤖 Pipeline Failure Analysis

Category: Infrastructure

The E2E tests failed because the cluster's Kubernetes API server was unreachable due to DNS resolution errors and network timeouts, preventing essential diagnostic and testing steps from executing.

📋 Technical Details

Immediate Cause

Multiple steps, including gather-audit-logs, gather-extra, gather-must-gather, and redhat-appstudio-gather, failed because they could not resolve the hostname of the Kubernetes API server (api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com). This is evidenced by repeated "dial tcp: lookup ... on 172.30.0.10:53: no such host" errors in the logs. Additionally, the gather-must-gather step experienced "i/o timeout" errors when attempting to connect to cluster APIs.
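The two failure signatures described above can be told apart mechanically. The following is a minimal sketch and not part of the analysis tooling: the names `classify_failure` and `failed_lookup`, and both regular expressions, are assumptions written against the error strings quoted in the Evidence section of this comment.

```python
import re

# Patterns modeled on the error strings quoted in the Evidence logs.
# They are illustrative assumptions, not an exhaustive grammar.
DNS_FAILURE = re.compile(
    r"dial tcp: lookup (?P<host>\S+) on (?P<resolver>\S+): no such host"
)
IO_TIMEOUT = re.compile(r"dial tcp .*: i/o timeout")


def classify_failure(line: str) -> str:
    """Return 'dns', 'timeout', or 'other' for a single log line."""
    if DNS_FAILURE.search(line):
        return "dns"
    if IO_TIMEOUT.search(line):
        return "timeout"
    return "other"


def failed_lookup(line: str):
    """Return (hostname, resolver) for a DNS failure line, else None."""
    match = DNS_FAILURE.search(line)
    if match is None:
        return None
    return match.group("host"), match.group("resolver")
```

Applied to the gather-* step logs, a classifier like this would separate the "no such host" lines (the name could not be resolved) from the "i/o timeout" lines (the name resolved but the connection did not complete).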

Contributing Factors

The appstudio-e2e-tests/redhat-appstudio-e2e step itself terminated unexpectedly with a "Terminated" signal, followed by the messages "Process did not exit before 15s grace period" and "Could not kill process after grace period". This termination is likely a downstream effect of the underlying network and DNS issues that prevented communication with the cluster, rather than an independent failure of the test execution logic. The supplemental context also points to network-related errors and potential unreachability of the cluster API endpoint.
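The termination sequence can be checked against the timestamps in the entrypoint log lines quoted in the Evidence section. This is a rough sketch only: the helper name `entrypoint_events` is hypothetical, and the three input lines are copied from the build.log excerpts in this comment.

```python
import json
from datetime import datetime


def entrypoint_events(lines):
    """Yield (timestamp, message) for each JSON line from the entrypoint component."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip plain-text lines such as the "make: ***" line
        if not isinstance(record, dict) or record.get("component") != "entrypoint":
            continue
        timestamp = datetime.strptime(record["time"], "%Y-%m-%dT%H:%M:%SZ")
        yield timestamp, record["msg"]


# Lines 650, 651 and 653 of build.log, as quoted in the Evidence section.
LOG_LINES = [
    '{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:173","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Entrypoint received interrupt: terminated","severity":"error","time":"2026-02-05T09:02:37Z"}',
    "make: *** [Makefile:25: ci/test/e2e] Terminated",
    '{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:267","func":"sigs.k8s.io/prow/pkg/entrypoint.gracefullyTerminate","level":"error","msg":"Process did not exit before 15s grace period","severity":"error","time":"2026-02-05T09:02:52Z"}',
]

events = list(entrypoint_events(LOG_LINES))
elapsed = (events[-1][0] - events[0][0]).total_seconds()
print(elapsed)  # 15.0
```

The 15-second gap between the interrupt and the grace-period message matches the grace period named in the log, which is consistent with the process being stopped from outside rather than exiting on its own.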

Impact

The inability to resolve the cluster API hostname and subsequent network timeouts prevented essential diagnostic tools (like must-gather and oc) from collecting necessary information and establishing a connection to the cluster. This fundamental infrastructure problem directly blocked the execution of the main E2E test suite (redhat-appstudio-e2e) and the collection of further diagnostic data, rendering the entire test run ineffective.

🔍 Evidence

appstudio-e2e-tests/gather-audit-logs

Category: infrastructure
Root Cause: The must-gather tool failed to resolve the hostname of the Kubernetes API server, indicating a DNS resolution problem within the cluster's network.

Logs:

artifacts/appstudio-e2e-tests/gather-audit-logs/run.log line 4

[must-gather      ] OUT 2026-02-05T09:08:09.285583307Z Get "https://api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com:6443/apis/image.openshift.io/v1/namespaces/openshift/imagestreams/must-gather": dial tcp: lookup api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

artifacts/appstudio-e2e-tests/gather-audit-logs/run.log line 12

error getting cluster version: Get "https://api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions/version": dial tcp: lookup api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

artifacts/appstudio-e2e-tests/gather-audit-logs/run.log line 17

error getting cluster operators: Get "https://api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp: lookup api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

artifacts/appstudio-e2e-tests/gather-audit-logs/run.log line 25

Error running must-gather collection:
    creating temp namespace: Post "https://api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com:6443/api/v1/namespaces": dial tcp: lookup api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

artifacts/appstudio-e2e-tests/gather-audit-logs/run.log line 60

error running backup collection: Get "https://api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com:6443/api?timeout=32s": dial tcp: lookup api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

appstudio-e2e-tests/gather-extra

Category: infrastructure
Root Cause: The failure is caused by a DNS resolution issue, where the cluster's API server hostname cannot be resolved by the configured DNS server. This prevents the step from connecting to the cluster to gather artifacts.

Logs:

artifacts/appstudio-e2e-tests/gather-extra/gather-extra.log line 3

E0205 09:08:03.210645      29 memcache.go:265] couldn't get current server API group list: Get "https://api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com:6443/api?timeout=5s": dial tcp: lookup api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

artifacts/appstudio-e2e-tests/gather-extra/gather-extra.log line 10

Unable to connect to the server: dial tcp: lookup api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

appstudio-e2e-tests/gather-must-gather

Category: infrastructure
Root Cause: The must-gather tool failed to connect to the OpenShift API server due to network timeouts and DNS resolution errors, indicating an issue with cluster accessibility or network configuration.

Logs:

artifacts/appstudio-e2e-tests/gather-must-gather/step.log line 18

Error running must-gather collection:
    creating temp namespace: Post "https://api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com:6443/api/v1/namespaces": dial tcp [REDACTED: Public IP (ipv4)]: i/o timeout

artifacts/appstudio-e2e-tests/gather-must-gather/step.log line 36

E0205 09:07:50.448054      53 memcache.go:265] couldn't get current server API group list: Get "https://api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com:6443/api?timeout=32s": dial tcp: lookup api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

artifacts/appstudio-e2e-tests/gather-must-gather/step.log line 45

error running backup collection: Get "https://api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com:6443/api?timeout=32s": dial tcp: lookup api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

artifacts/appstudio-e2e-tests/gather-must-gather/step.log line 46

error: creating temp namespace: Post "https://api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com:6443/api/v1/namespaces": dial tcp [REDACTED: Public IP (ipv4)]: i/o timeout

appstudio-e2e-tests/redhat-appstudio-e2e

Category: infrastructure
Root Cause: The e2e tests failed due to an unexpected termination of the main process, likely caused by an external signal or a problem within the Kubernetes cluster environment that led to the process being killed after a grace period.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build.log line 650

{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:173","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Entrypoint received interrupt: terminated","severity":"error","time":"2026-02-05T09:02:37Z"}

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build.log line 651

make: *** [Makefile:25: ci/test/e2e] Terminated

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build.log line 653

{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:267","func":"sigs.k8s.io/prow/pkg/entrypoint.gracefullyTerminate","level":"error","msg":"Process did not exit before 15s grace period","severity":"error","time":"2026-02-05T09:02:52Z"}

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build.log line 654

{"component":"entrypoint","error":"os: process already finished","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:269","func":"sigs.k8s.io/prow/pkg/entrypoint.gracefullyTerminate","level":"error","msg":"Could not kill process after grace period","severity":"error","time":"2026-02-05T09:02:52Z"}

appstudio-e2e-tests/redhat-appstudio-gather

Category: infrastructure
Root Cause: The oc client is unable to resolve the hostname of the Kubernetes API server, indicating a DNS resolution issue within the cluster network or environment. This prevents the necessary oc commands from executing successfully.

Logs:

artifacts/appstudio-e2e-tests__redhat-appstudio-gather/oc-logs.txt

E0205 09:08:17.394477      45 memcache.go:265] couldn't get current server API group list: Get "https://api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com:6443/api?timeout=5s": dial tcp: lookup api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

artifacts/appstudio-e2e-tests__redhat-appstudio-gather/oc-logs.txt

Unable to connect to the server: dial tcp: lookup api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

artifacts/appstudio-e2e-tests__redhat-appstudio-gather/oc-logs.txt

Error running must-gather collection:
    creating temp namespace: Post "https://api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com:6443/api/v1/namespaces": dial tcp: lookup api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

artifacts/appstudio-e2e-tests__redhat-appstudio-gather/oc-logs.txt

error running backup collection: Get "https://api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com:6443/api?timeout=32s": dial tcp: lookup api.konflux-4-17-us-west-2-jrr82.konflux-qe.devcluster.openshift.com on 172.30.0.10:53: no such host

Analysis powered by prow-failure-analysis | Build: 2019329481784692736

konflux-ci-qe-bot · 2026-02-05T11:43:27Z

🤖 Pipeline Failure Analysis

Category: Timeout

The end-to-end tests for AppStudio infrastructure failed due to a timeout caused by prolonged setup and synchronization of Kubernetes resources, including Argo CD.

📋 Technical Details

Immediate Cause

The appstudio-e2e-tests/redhat-appstudio-e2e step in the Prow job exceeded its 2-hour time limit. This timeout prevented the completion of the end-to-end test execution.
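As a rough cross-check, the time limit can be read out of the timeout message quoted in the Evidence section and subtracted from its timestamp to estimate when the step began. This is a sketch under assumptions: `parse_go_duration` is a hypothetical helper that handles only whole hours, minutes and seconds, and the estimate assumes the message was logged at the moment the limit expired.

```python
import re
from datetime import datetime, timedelta

_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_go_duration(text: str) -> timedelta:
    """Parse a simple Go-style duration such as '2h0m0s' or '15s'."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject anything the simple pattern does not fully cover.
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    total = timedelta()
    for num, unit in parts:
        total += timedelta(**{_UNITS[unit]: int(num)})
    return total


# Message and timestamp taken from step.log line 1279 in the Evidence section.
message = "Process did not finish before 2h0m0s timeout"
timeout_logged_at = datetime(2026, 2, 5, 11, 5, 47)

limit = parse_go_duration(re.search(r"before (\S+) timeout", message).group(1))
estimated_start = timeout_logged_at - limit
print(estimated_start)  # 2026-02-05 09:05:47
```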

Contributing Factors

Several factors indicated in the additional_context may have contributed to the extended setup and synchronization times:

Multiple Argo CD ApplicationSets, such as 'application-api', 'build-service', and 'crossplane-control-plane', were in an OutOfSync state.
The 'build-service-controller-manager' deployment in the 'build-service' namespace was Degraded due to exceeding its progress deadline.
The tektonaddons reported an InstallerSetReady status of False due to issues with the 'tkn-cli-serve' deployment.
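For the Degraded deployment in particular, Kubernetes records an exceeded progress deadline as a `Progressing` condition with status `"False"` and reason `ProgressDeadlineExceeded`. The sketch below shows how that could be detected in a Deployment object that has already been loaded as a dictionary. The function name and the sample object are hypothetical; the sample reuses only the deployment name and namespace mentioned above and is not taken from the job artifacts.

```python
def progress_deadline_exceeded(deployment: dict) -> bool:
    """Return True if the Deployment reports that its progress deadline was exceeded."""
    conditions = deployment.get("status", {}).get("conditions", [])
    return any(
        cond.get("type") == "Progressing"
        and cond.get("status") == "False"
        and cond.get("reason") == "ProgressDeadlineExceeded"
        for cond in conditions
    )


# Hypothetical sample shaped like a Deployment object; not real artifact data.
sample = {
    "metadata": {
        "name": "build-service-controller-manager",
        "namespace": "build-service",
    },
    "status": {
        "conditions": [
            {"type": "Available", "status": "False"},
            {
                "type": "Progressing",
                "status": "False",
                "reason": "ProgressDeadlineExceeded",
            },
        ]
    },
}

print(progress_deadline_exceeded(sample))  # True
```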

Impact

The timeout of the e2e test execution step prevented the successful validation of the AppStudio infrastructure deployment in the test environment, thereby blocking the CI pipeline.

🔍 Evidence

appstudio-e2e-tests/redhat-appstudio-e2e

Category: timeout
Root Cause: The e2e tests timed out because the setup and synchronization processes for various Kubernetes resources and operators, such as Argo CD, took longer than the allocated time limit for the test execution.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/step.log line 1279

{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:169","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 2h0m0s timeout","severity":"error","time":"2026-02-05T11:05:47Z"}

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/step.log line 1281

{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:267","func":"sigs.k8s.io/prow/pkg/entrypoint.gracefullyTerminate","level":"error","msg":"Process did not exit before 15s grace period","severity":"error","time":"2026-02-05T11:06:02Z"}

Analysis powered by prow-failure-analysis | Build: 2019335848314540032

mshaposhnik · 2026-02-05T11:57:27Z

/test appstudio-e2e-tests

konflux-ci-qe-bot · 2026-02-05T14:37:37Z

🤖 Pipeline Failure Analysis

Category: Timeout

The Prow job timed out during the execution of end-to-end tests for AppStudio.

📋 Technical Details

Immediate Cause

The appstudio-e2e-tests/redhat-appstudio-e2e step of the Prow job pull-ci-redhat-appstudio-infra-deployments-main-appstudio-e2e-tests failed due to a timeout. The job exceeded its 2-hour time limit, leading to termination.

Contributing Factors

Analysis of the additional_context indicates potential underlying infrastructure issues that may have contributed to the prolonged test execution. Specifically, the build-service-controller-manager deployment was in a 'Degraded' state, and several Argo CD ApplicationSets reported errors and out-of-sync resources. These issues could have slowed down the deployment or setup phases of the e2e tests, preventing them from completing within the allotted time. Additionally, the tektonaddons.json artifact shows that TektonAddon components were not ready due to issues with openshift console resources and the 'tkn-cli-serve' deployment.

Impact

The timeout prevented the completion of the end-to-end test suite for the AppStudio infrastructure. This means that the tests designed to validate the functionality and stability of the AppStudio deployment were not executed successfully, leaving the overall health of the deployment unverified for this build.

🔍 Evidence

appstudio-e2e-tests/redhat-appstudio-e2e

Category: timeout
Root Cause: The e2e tests timed out, likely due to a prolonged execution of setup or deployment steps, or an actual test scenario that took too long to complete. The "Process did not finish before 2h0m0s timeout" and "Process did not exit before 15s grace period" messages indicate the job was terminated due to exceeding its time limit.

Logs:

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 1384

{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:169","func":"sigs.k8s.io/prow/pkg/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 2h0m0s timeout","severity":"error","time":"2026-02-05T14:00:38Z"}

artifacts/appstudio-e2e-tests/redhat-appstudio-e2e/build-log.txt line 1386

{"component":"entrypoint","file":"sigs.k8s.io/prow/pkg/entrypoint/run.go:267","func":"sigs.k8s.io/prow/pkg/entrypoint.gracefullyTerminate","level":"error","msg":"Process did not exit before 15s grace period","severity":"error","time":"2026-02-05T14:00:53Z"}

Analysis powered by prow-failure-analysis | Build: 2019379860396314624

openshift-ci · 2026-02-05T14:37:48Z

@mshaposhnik: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/appstudio-e2e-tests | `f57d469` | link | true | `/test appstudio-e2e-tests` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

mshaposhnik added 2 commits February 5, 2026 10:35

Enable opentelemetry log collection on MPC Vm's on prod

57cec7f

Signed-off-by: Max Shaposhnyk <mshaposh@redhat.com>

Enable opentelemetry log collection on MPC Vm's on prod

0bd2d3f

Signed-off-by: Max Shaposhnyk <mshaposh@redhat.com>

openshift-ci bot requested review from hugares and meyrevived February 5, 2026 08:37

openshift-ci bot added the approved label Feb 5, 2026

mshaposhnik added the do-not-merge/hold label Feb 5, 2026

Fixup!

f57d469

Signed-off-by: Max Shaposhnyk <mshaposh@redhat.com>

mshaposhnik wants to merge 3 commits into redhat-appstudio:main from mshaposhnik:logcollector_prod
