@RyanRosario

@RyanRosario RyanRosario commented Nov 20, 2025

What type of PR is this?

kind/cleanup

What this PR does / why we need it:

Adds an E2E test for the multi-port InferencePool enhancement. verifyTrafficRouting is implemented; verifyMetrics will follow.

Which issue(s) this PR fixes:

Fixes #1768

Does this PR introduce a user-facing change?:

NONE


@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Nov 20, 2025
@linux-foundation-easycla

linux-foundation-easycla bot commented Nov 20, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@netlify

netlify bot commented Nov 20, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit c2bd9f3
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/6926f614d26c4300084d73c7
😎 Deploy Preview https://deploy-preview-1885--gateway-api-inference-extension.netlify.app

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: RyanRosario
Once this PR has been reviewed and has the lgtm label, please assign danehans for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 20, 2025
@k8s-ci-robot
Contributor

Hi @RyanRosario. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 20, 2025
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Nov 20, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 24, 2025
@RyanRosario RyanRosario changed the title [WIP] Add e2e test for multiport InferencePool enhancement Add e2e test for multiport InferencePool enhancement Nov 25, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 25, 2025
@RyanRosario
Author

Hey @danehans and @nirrozenbaum, my first PR is ready for review.

@nirrozenbaum
Contributor

nirrozenbaum commented Nov 25, 2025

/ok-to-test

Thanks @RyanRosario. It seems your PR needs a rebase.
It would be good to resolve the conflicts so we can see whether the tests pass.

Additionally, please note that your commits are not verified. If the PR is ready for review, it would be good to remove the /hold to let others know.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 25, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 25, 2025
@RyanRosario
Author

/retest

@k8s-ci-robot
Contributor

@RyanRosario: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-gateway-api-inference-extension-test-unit-main c2bd9f3 link true /test pull-gateway-api-inference-extension-test-unit-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@RyanRosario
Author

Thank you for your patience!

The failing test seems to be related to issue 1872. Can we continue with review or should 1872 be resolved first?

const (
// defaultCurlTimeout is the default timeout for the curl command to get a response.
- defaultCurlTimeout = 30 * time.Second
+ defaultCurlTimeout = 120 * time.Second
Contributor

A 2-minute curl timeout is a bit long.
Why do we need to increase the 30-second timeout?

Author

I think you're right, though 30s was too short. When generating traffic in verifyMetrics I was getting timeouts (curl exit code 28). I implemented retries but still hit timeouts with 30s; only increasing the timeout worked. I am not confident that increasing the number of retries alone would resolve it. I am open to suggestions.

fi
fi

kubectl delete crd inferencepools.inference.networking.k8s.io --ignore-not-found
Contributor

Why does this script delete the CRD?

Author

This may be a timing issue in my local environment. When a test run is aborted or fails (more so the former) and is immediately re-executed, the new run sees an InferencePool that hasn't been torn down yet. I imagine other devs experience this as well, so I added it, but I can remove it if it is not good practice.

@nirrozenbaum
Contributor

> Thank you for your patience!
>
> The failing test seems to be related to issue 1872. Can we continue with review or should 1872 be resolved first?

A failing test isn't blocking the review, but it is blocking the merge.
If it is failing due to a flake, triggering a /retest should (eventually) solve it.
If it's failing consistently, we might have a hidden issue here.

- if !cmp.Equal(got, expected, cmpopts.SortSlices(func(a, b string) bool { return a < b })) {
-     return fmt.Errorf("actual (%v) != expected (%v); resp=%q", got, expected, resp)
+ if !cmp.Equal(gotPort, expectedPort, cmpopts.SortSlices(func(a, b int) bool { return a < b })) {
+     return fmt.Errorf("collecting ports... have %v, want %v", gotPort, expectedPort)
Author

TODO: restore original message
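The comparison above ignores the order in which ports are observed. As a rough illustration of the same idea using only the standard library (a swapped-in technique, not the go-cmp call from the PR), the sketch below sorts copies of both slices before comparing them. `samePorts`, `checkPorts`, and the port numbers are hypothetical, and the error text simply follows the shape of the original message.

```go
package main

import (
	"fmt"
	"slices"
)

// samePorts reports whether got and want hold the same ports,
// ignoring order. It sorts copies so the callers' slices are untouched.
func samePorts(got, want []int) bool {
	g := slices.Clone(got)
	w := slices.Clone(want)
	slices.Sort(g)
	slices.Sort(w)
	return slices.Equal(g, w)
}

// checkPorts wraps the comparison in an error shaped like the
// original "actual != expected" message.
func checkPorts(got, want []int) error {
	if !samePorts(got, want) {
		return fmt.Errorf("actual ports (%v) != expected ports (%v)", got, want)
	}
	return nil
}

func main() {
	// Hypothetical port numbers, for illustration only.
	fmt.Println(checkPorts([]int{8002, 8000, 8004}, []int{8000, 8002, 8004}))
	fmt.Println(checkPorts([]int{8000}, []int{8000, 8002}))
}
```

Unlike `cmp.Equal`, `slices.Equal` treats a nil slice and an empty slice as equal, so the two approaches are not interchangeable in every case.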

// Recommended: 3 retries with backoff
const maxRetries = 3
const backoff = 1 * time.Second

Author

Consider adding to the top-level declaration.

TTY: false,
}, parameterCodec)

fmt.Printf("Executing command in pod %s/%s: %v\n", testConfig.NsName, podName, cmd)
Author

Delete this.

// ExecCommandInPod runs a command in a given container of a given Pod, returning combined stdout+stderr.
func ExecCommandInPod(testConfig *TestConfig, podName, containerName string, cmd []string) (string, error) {
parameterCodec := runtime.NewParameterCodec(testConfig.Scheme)

Author

Consider adding retry logic here.


var err error
// RETRY LOOP
for attempt := 0; attempt <= maxRetries; attempt++ {
Author

Consider moving the retry logic into ExecCommandInPod.


Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update E2E tests to include multiport case

3 participants