No. of CPU Cores in Compute Cluster doesnt display correctly when Cluster has oversubscription more than 1 #9693

btzq · 2024-09-17T14:38:58Z

ISSUE TYPE

Bug Report

COMPONENT NAME

Compute Cluster UI

CLOUDSTACK VERSION

4.19.1.1

CONFIGURATION

OS / ENVIRONMENT

SUMMARY

We have 2 Compute Clusters:

Cluster 1: Oversubscription = 1
Cluster 2: Oversubscription = 2

Referring to the screenshot below, Cluster 1 shows the correct number of allocated CPU cores.

For Cluster 2, however, the figure looks inaccurate: the allocated amount exceeds what is available, which results in a UI error.

The bigger problem is that both clusters have exceeded the allocation threshold, yet no notification was sent, and CloudStack did not stop users from creating new virtual machines in those clusters.

Without this, admins cannot ensure sufficient n+1 capacity in the event of a node failure.

In summary, there are 3 issues:

- The No. of Allocated CPU Cores should display the full available amount after oversubscription.
- The admin should get a notification when the global setting threshold is exceeded (cluster.cpu.allocated.capacity.notificationthreshold).
- Users should not be able to create resources in the cluster once the global setting threshold is exceeded (cluster.cpu.allocated.capacity.disablethreshold).

STEPS TO REPRODUCE

EXPECTED RESULTS

- The No. of Allocated CPU Cores should display the full available amount after oversubscription.
- The admin should get a notification when the global setting threshold is exceeded (cluster.cpu.allocated.capacity.notificationthreshold).
- Users should not be able to create resources in the cluster once the global setting threshold is exceeded (cluster.cpu.allocated.capacity.disablethreshold).

ACTUAL RESULTS

- The No. of Allocated CPU Cores does not display the total available cores correctly after oversubscription.
- The admin did not get a notification when the global setting threshold was exceeded (cluster.cpu.allocated.capacity.notificationthreshold).
- Users are still able to create resources in the cluster after the global setting threshold is exceeded (cluster.cpu.allocated.capacity.disablethreshold).


weizhouapache · 2024-09-17T14:44:08Z

@btzq
The # of CPU Cores value does not take the overprovisioning factor into consideration.

btzq · 2024-09-17T14:47:09Z

@weizhouapache I see, is this expected?

And if it doesn't take overprovisioning into account, does this mean the global settings will not work as intended for clusters with overprovisioning?

(cluster.cpu.allocated.capacity.notificationthreshold)
(cluster.cpu.allocated.capacity.disablethreshold)

weizhouapache · 2024-09-17T17:00:05Z

> @weizhouapache I see, is this expected?
>
> And if it doesn't take overprovisioning into account, does this mean the global settings will not work as intended for clusters with overprovisioning?
>
> (cluster.cpu.allocated.capacity.notificationthreshold)
> (cluster.cpu.allocated.capacity.disablethreshold)

@btzq
The CPU capacity used in resource calculation and VM allocation is
`cpu cores * cpu speed * overprovisioning factor`
so it does take the overprovisioning factor into account. No issues there.
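
For illustration, the formula above can be sketched in a few lines of Python. This is only a rough sketch, not CloudStack's actual code: the function names, the host figures, the allocated amount, the threshold values, and the strict `>` comparison are all assumptions made for the example.

```python
def cpu_capacity_mhz(cores: int, speed_mhz: int, overprovisioning_factor: float) -> float:
    """Total allocatable CPU in MHz: cores * speed * overprovisioning factor."""
    return cores * speed_mhz * overprovisioning_factor


def threshold_exceeded(allocated_mhz: float, capacity_mhz: float, threshold: float) -> bool:
    """True when the allocated share of capacity is above the threshold fraction.

    The strict '>' is an assumption; the real comparison may differ.
    """
    return allocated_mhz / capacity_mhz > threshold


# Hypothetical cluster: 64 cores at 2000 MHz, overprovisioning factor 2.
capacity = cpu_capacity_mhz(cores=64, speed_mhz=2000, overprovisioning_factor=2.0)
allocated = 200_000  # MHz already allocated to VMs (made-up figure)

print(capacity)                                        # 256000.0
print(threshold_exceeded(allocated, capacity, 0.75))   # True  (200000 / 256000 = 0.78125)
print(threshold_exceeded(allocated, capacity, 0.85))   # False
```

With a factor of 1 the same hosts would offer only 128000 MHz, so an allocation of 200000 MHz would look like more than 100% of capacity, which is the kind of mismatch the core counter shows when it ignores the factor.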

btzq · 2024-09-18T13:05:09Z

@weizhouapache Does this mean that:

(cluster.cpu.allocated.capacity.notificationthreshold)
(cluster.cpu.allocated.capacity.disablethreshold)

Only trigger if the 'CPU' field (not '# of CPU Cores') exceeds the threshold?

But not all CPUs are 2,000 MHz. The AMD 9554 is 3.1 GHz. And in a mixed cluster, it becomes even more complicated?

And when a node fails, how does CloudStack determine which remaining nodes the VM should fail over to?
Is it based on 'CPU' or on '# of CPU Cores'?

weizhouapache · 2024-09-18T13:16:22Z

> @weizhouapache Does this mean that:
>
> (cluster.cpu.allocated.capacity.notificationthreshold)
> (cluster.cpu.allocated.capacity.disablethreshold)
>
> Only trigger if the 'CPU' field (not '# of CPU Cores') exceeds the threshold?
>
> But not all CPUs are 2,000 MHz. The AMD 9554 is 3.1 GHz. And in a mixed cluster, it becomes even more complicated?
>
> And when a node fails, how does CloudStack determine which remaining nodes the VM should fail over to? Is it based on 'CPU' or on '# of CPU Cores'?

All operations are based on "CPU".
A host with a faster CPU (in MHz) is considered to have more CPU resources than a host with a slower CPU.

The # of CPU Cores is only returned in the listCapacity response and displayed on the dashboard. That's all.
It was introduced in commit 088cca2.
Nothing will change even if we remove the capacity type CAPACITY_TYPE_CPU_CORE and the related code.
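
As a small illustration of the point about faster CPUs, here is a hedged Python sketch comparing two hosts with the same core count but different clock speeds. The `Host` class, host names, and figures are hypothetical and only mirror the `cores * speed * overprovisioning factor` formula mentioned earlier in the thread; this is not how CloudStack models hosts internally.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical host record used only for this example."""
    name: str
    cores: int
    speed_mhz: int

    def cpu_mhz(self, overprovisioning_factor: float = 1.0) -> float:
        # "CPU" capacity in MHz: cores * speed * overprovisioning factor
        return self.cores * self.speed_mhz * overprovisioning_factor


hosts = [Host("host-a", 64, 2000), Host("host-b", 64, 3100)]

# Same number of cores, but the "CPU" figure differs with clock speed.
for host in hosts:
    print(host.name, host.cores, host.cpu_mhz())

largest = max(hosts, key=lambda h: h.cpu_mhz())
print(largest.name)  # host-b
```

Both hosts report 64 cores, yet in MHz terms the second one counts as roughly 1.5 times larger, which is why the core count alone says little about what the capacity checks see.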

btzq · 2024-09-18T13:46:29Z

@weizhouapache I went through this explanation:

#6743

In this case, would it make sense for us to set all CPUs to 1.0 GHz? This would mean all instances have the same share.

That way, the 'CPU' field will display the same value as the number of cores allocated and remaining.

We just have to make sure that 'CPU Cap' in the Compute Offering is disabled? That way, the number of remaining CPU is the same as the # of cores left, and there will be no change to guest VM performance?

weizhouapache · 2024-09-18T18:22:15Z

> @weizhouapache I went through this explanation:
>
> #6743
>
> In this case, would it make sense for us to set all CPUs to 1.0 GHz? This would mean all instances have the same share.

Yes, I think so.
Actually, I think we should have a global setting to indicate whether the CPU speed must be the same, along with the value of that CPU speed.
In many use cases, the CPU speed is totally useless.

> That way, the 'CPU' field will display the same value as the number of cores allocated and remaining.
>
> We just have to make sure that 'CPU Cap' in the Compute Offering is disabled? That way, the number of remaining CPU is the same as the # of cores left, and there will be no change to guest VM performance?
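
To make the arithmetic behind the 1.0 GHz idea concrete, here is a minimal Python sketch. It assumes every host and every compute offering is declared at 1000 MHz; the cluster size, offering size, VM count, and helper name are invented for the example and do not come from CloudStack.

```python
NORMALIZED_SPEED_MHZ = 1000  # assumption: all CPUs and offerings declared as 1.0 GHz


def core_equivalents(cpu_mhz: float, speed_mhz: int = NORMALIZED_SPEED_MHZ) -> float:
    """Convert a 'CPU' figure in MHz back into a number of cores."""
    return cpu_mhz / speed_mhz


# Hypothetical cluster: 64 physical cores, overprovisioning factor 2.
capacity_mhz = 64 * NORMALIZED_SPEED_MHZ * 2.0

# Hypothetical offering: 4 vCPUs at 1000 MHz each, deployed 10 times.
offering_mhz = 4 * NORMALIZED_SPEED_MHZ
allocated_mhz = 10 * offering_mhz
remaining_mhz = capacity_mhz - allocated_mhz

print(core_equivalents(capacity_mhz))   # 128.0 cores after overprovisioning
print(core_equivalents(allocated_mhz))  # 40.0 cores allocated
print(core_equivalents(remaining_mhz))  # 88.0 cores remaining
```

Under that assumption the 'CPU' figure divided by 1000 reads directly as a core count, with the overprovisioning factor already included.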

btzq changed the title ~~Compute Cluster # of CPU Cores doesnt seem accurate?~~ No. of CPU Cores in Compute Cluster doesnt display correctly when Cluster has oversubscription more than 1 Sep 17, 2024

DaanHoogland added the status:needs-functional-definition label Sep 19, 2024

DaanHoogland modified the milestones: 4.20.1.0, unplanned Sep 19, 2024
