Skip to content

Conversation

venkataanil
Copy link
Contributor

When public vlan is true, user can't override the value for controlplane_network_gateway. In prow, we want to use the bastion host as the default gateway instead of vlan's default gateway for some workloads. This allows user to specify the default gateway for both the public and private networks. If the user not defines it, then jetlag defines it using the existing mechanism.

When public vlan is true, user can't override the value for
controlplane_network_gateway. In prow, we want to use the bastion
host as the default gateway instead of vlan's default gateway
for some workloads. This allows user to specify the default gateway
for both the public and private networks. If the user not defines
it, then jetlag defines it using the existing mechanism.

Signed-off-by: venkataanil <[email protected]>
Copy link

openshift-ci bot commented Sep 4, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign josecastillolema for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@akrzos
Copy link
Member

akrzos commented Sep 4, 2025

When public vlan is true, user can't override the value for controlplane_network_gateway. In prow, we want to use the bastion host as the default gateway instead of vlan's default gateway for some workloads. This allows user to specify the default gateway for both the public and private networks. If the user not defines it, then jetlag defines it using the existing mechanism.

I believe we discussed this in a google doc but this is already possible without using public_vlan. There is a short snippet on the public_vlan variable and its intent in the all.sample.yml that I will include here:

...
# Set to true ONLY if you have a public routable vlan in your scalelab or performancelab cloud.
# Autoconfigures cluster_name, base_dns_name, controlplane_network_interface_idx, controlplane_network,
# controlplane_network_prefix, and controlplane_network_gateway to the values required for your cloud's public VLAN.
# SNO configures only the first cluster on the api dns resolvable address
# MNO/SNO still requires the correct value for bastion_controlplane_interface
public_vlan: false
...

As mentioned, you can use public_vlan to autoconfigure many variables for you instead of configuring them yourself. That means you can absolutely run a public routable network and decidedly configure it however you please without setting public_vlan to true. To reiterate, you do not have to set public_vlan just because you have a public routable vlan. I believe the custom goal you have for usage with a public_vlan is outside the idea or goal of the original vars intent and makes the expected behavior unclear if we add these changes.

@venkataanil
Copy link
Contributor Author

thanks @akrzos
We originally wanted all openshift network related workloads (BGP, external traffic, egress IP, any future workloads) in CI to go with public_vlan=false as

  • we don't want the cluster traffic (tarffic from pods to external) to reach the lab infrstatructure routers
  • it gives flexibility for the test to create networks and routes at scale wihout impacting (or messing up) lab infrastructure
  • all our manual testing in scale/alias lab is done with public_vlan=false

However this is introducing lot of additional work for @josecastillolema in matainaing the jobs with and wihtout vlans. So he is mandating public vlans unless the test doesn't support it. We can adapt his suggestion and use private network only if it is necessary.

So this PR is required if we want to go with public vlan for our network workloads.

@mohit-sheth fyi

Copy link
Member

@akrzos akrzos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets try the adjustment I suggested and run a few CI runs to see nothing else break so we can get this merged.

@@ -9,7 +9,6 @@ controlplane_network_interface_idx: 0
controlplane_network_interface: eth0
controlplane_network: 198.18.0.0/16
controlplane_network_prefix: 16
controlplane_network_gateway: "{{ controlplane_network | ansible.utils.nthhost(1) }}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately removing this when the file is widely symlinked between many roles will result in potential breakage in at least 2 roles

  • bastion-gogs
  • bastion-disconnected-haproxy

Since those roles have vars that depend on controlplane_network_gateway. You might ask why do these roles "import" by symlink the variables in ansible/roles/create-inventory/defaults/main/networks.yml, well the reason is a long time ago when jetlag was first being developed we attempted to have a single vars file (all.yml) and for it to have every possible var required therefore all vars were global, however the length of all.yml became very large and folks would muck with vars that rarely needed to be adjusted or changed. There was also very few variables that needed to be modified per deployment for a basic deployment. The end result was we could "localize" vars more into each role but this meant repeating many variables across different roles. We left the most needed variables in all.yml and put a "spot" on the bottom of the file to override variables that had been "localized" to the role. The solution we landed on at the time was to make "related" vars files that could be symlinked into roles to reduce repetition of vars and ideally improve maintainability. The symlinked vars files have both pros and cons, as a pro it means we have a single file to modify for a variable's existent, and a con is that when it comes time to modify that variable, it can have a much more significant effect than intended if it is not well understood what consumes this variable when it is symlinked. If we stuck to repeating each variable in each role's default variables file, then we would end up have many spots to potential modify for a new default variable value. As you can see both of these solutions have pros and cons. To this day, no one has taken on the task to attempting to improve the vars maintenance further. This means there can be instances where it seems like the appropriate solution is to remove a variable from a vars file however it will effect many more roles than you may have intended.

That being said, one potential solution here would be to change which var is referenced here:

From gogs_host: "{{ controlplane_network_gateway }}" to gogs_host: "{{ bastion_controlplane_ip }}"

I think that would solve the dependency issue we see here however I do not have the extensive means to test this at this time. Another potential solution would be adding yet another special variable just for your use case of using public_vlan for auto configuration but running a custom controlplane_network_gateway which I would prefer not to add yet another variable just for a single edge case.

Copy link
Contributor Author

@venkataanil venkataanil Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately removing this when the file is widely symlinked between many roles will result in potential breakage in at least 2 roles

  • bastion-gogs
  • bastion-disconnected-haproxy

Since those roles have vars that depend on controlplane_network_gateway. You might ask why do these roles "import" by symlink the variables in ansible/roles/create-inventory/defaults/main/networks.yml, well the reason is a long time ago when jetlag was first being developed we attempted to have a single vars file (all.yml) and for it to have every possible var required therefore all vars were global, however the length of all.yml became very large and folks would muck with vars that rarely needed to be adjusted or changed. There was also very few variables that needed to be modified per deployment for a basic deployment. The end result was we could "localize" vars more into each role but this meant repeating many variables across different roles. We left the most needed variables in all.yml and put a "spot" on the bottom of the file to override variables that had been "localized" to the role. The solution we landed on at the time was to make "related" vars files that could be symlinked into roles to reduce repetition of vars and ideally improve maintainability. The symlinked vars files have both pros and cons, as a pro it means we have a single file to modify for a variable's existent, and a con is that when it comes time to modify that variable, it can have a much more significant effect than intended if it is not well understood what consumes this variable when it is symlinked. If we stuck to repeating each variable in each role's default variables file, then we would end up have many spots to potential modify for a new default variable value. As you can see both of these solutions have pros and cons. To this day, no one has taken on the task to attempting to improve the vars maintenance further. This means there can be instances where it seems like the appropriate solution is to remove a variable from a vars file however it will effect many more roles than you may have intended.

That being said, one potential solution here would be to change which var is referenced here:

From gogs_host: "{{ controlplane_network_gateway }}" to gogs_host: "{{ bastion_controlplane_ip }}"

Agree.

I think that would solve the dependency issue we see here however I do not have the extensive means to test this at this time.

We can wait for this change until someone can verify "gogs" roles. Is it possible to request any folks who use "gogs" role to verify later when they get thechance.

Another potential solution would be adding yet another special variable just for your use case of using public_vlan for auto configuration but running a custom controlplane_network_gateway which I would prefer not to add yet another variable just for a single edge case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can wait for this change until someone can verify "gogs" roles. Is it possible to request any folks who use "gogs" role to verify later when they get thechance.

I'm not sure I understand, you can or can't wait? I suggested a fix after reviewing the code and following the vars, did you want to implement it or keep the patch in limbo?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants