kvm: ref-count storage pool usage #9498
base: 4.19
Conversation
Codecov Report — Attention: Patch coverage is …

@@             Coverage Diff              @@
##               4.19    #9498      +/-   ##
============================================
+ Coverage     15.08%   15.11%    +0.02%
+ Complexity    11192    11190        -2
============================================
  Files          5406     5406
  Lines        473215   473237       +22
  Branches      61680    58357     -3323
============================================
+ Hits          71386    71523      +137
- Misses       393880   393906       +26
+ Partials       7949     7808      -141

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
clgtm. Do you have a good test scenario for this, @rp- ? Or is it only intermittent (i.e. not automatable)?
I'm not sure it is easy to automate that reproducibly, as it is a timing/parallelism issue. But we have 2 customers who haven't reported any issues with this yet. |
@blueorangutan package |
@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10620 |
@blueorangutan test |
@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
[SF] Trillian test result (tid-11065)
@blueorangutan package |
@rohityadavcloud a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 10927 |
Packaging result [SF]: ✖️ el8 ✖️ el9 ✔️ debian ✖️ suse15. SL-JID 10950 |
I guess the failed packaging is not related to this PR? |
looks like a test was too slow, so it might have to do with a too-busy container. retrying @rp- |
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10988 |
@blueorangutan test |
@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
[SF] Trillian test result (tid-11364)
@blueorangutan package |
@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11033 |
@blueorangutan test |
@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
[SF] Trillian test result (tid-11408)
Are there more logs to this "needs access to storage pool"? |
sorry @rp- , missing context here. If you are referring to the smoke tests, the download does contain the management server logs. |
I see exceptions like this:
But here the agent logs would be interesting. |
ah, no, I don't have those. I'll run again without the teardown step and have a look. |
@blueorangutan test keepEnv |
@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
[SF] Trillian test result (tid-11452)
/**
 * Thread-safe increment storage pool usage refcount
 * @param uuid UUID of the storage pool to increment the count
 */
private void incStoragePoolRefCount(String uuid) {
    synchronized (storagePoolRefCounts) {
        int refCount = storagePoolRefCounts.computeIfAbsent(uuid, k -> 0);
        refCount += 1;
        storagePoolRefCounts.put(uuid, refCount);
    }
}

/**
 * Thread-safe decrement storage pool usage refcount for the given uuid and return if storage pool still in use.
 * @param uuid UUID of the storage pool to decrement the count
 * @return true if the storage pool is still used, else false.
 */
private boolean decStoragePoolRefCount(String uuid) {
    synchronized (storagePoolRefCounts) {
        Integer refCount = storagePoolRefCounts.get(uuid);
        if (refCount != null && refCount > 1) {
            s_logger.debug(String.format("Storage pool %s still in use, refCount %d", uuid, refCount));
            refCount -= 1;
            storagePoolRefCounts.put(uuid, refCount);
            return true;
        } else {
            storagePoolRefCounts.remove(uuid);
            return false;
        }
    }
}
@rp- , should inc and dec maybe both call the same bit of synchronised code, so that no inc and dec can happen at the same time either? The storagePoolRefCounts map can be a synchronised map to prevent strange things happening on the map, but we also want to prevent multiple access on the elements.
/**
 * adjust refcount
 */
private int adjustStoragePoolRefCount(String uuid, int adjustment) {
    int refCount;
    synchronized (uuid) {
        // some access on the storagePoolRefCounts.key(uuid) element
        refCount = storagePoolRefCounts.computeIfAbsent(uuid, k -> 0);
        refCount += adjustment;
        storagePoolRefCounts.put(uuid, refCount);
        if (refCount < 1) {
            storagePoolRefCounts.remove(uuid);
        } else {
            storagePoolRefCounts.put(uuid, refCount);
        }
    }
    return refCount;
}

/**
 * Thread-safe increment storage pool usage refcount
 * @param uuid UUID of the storage pool to increment the count
 */
private void incStoragePoolRefCount(String uuid) {
    adjustStoragePoolRefCount(uuid, 1);
}

/**
 * Thread-safe decrement storage pool usage refcount for the given uuid and return if storage pool still in use.
 * @param uuid UUID of the storage pool to decrement the count
 * @return true if the storage pool is still used, else false.
 */
private boolean decStoragePoolRefCount(String uuid) {
    return adjustStoragePoolRefCount(uuid, -1) > 0;
}
???
I can't fully follow you here; any map access is synchronized, so there can't be any concurrent access to elements?
Or it just struck me that maybe the LibVirtStorageAdaptor is no singleton object and there could be more instances, but then it would probably be good enough to make the map static?
two remarks:
- Can you consider a thread-safe collection? i.e. a concurrent map.
- the map can be guarded (as a concurrent map or by your own synchronised blocks), but I don't think that will guard its elements, only the structure of the map itself.
- (there is always a third) blocking access to the map globally may impact performance (though in this case that would only matter on very big hosts).
I think you are right about making the map static, btw.
- Sure, I can also use ConcurrentHashMap
- Sure, if you leak the elements out of the synchronization blocks they are not thread-safe anymore, but the current code doesn't do that.
- As far as I read, ConcurrentHashMap only locks on element level, so switching to that would already be good.
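For illustration, the per-element locking mentioned above can be used to make the whole adjust step atomic via `ConcurrentHashMap.compute`, which runs its remapping function atomically per key and removes the entry when the function returns null. This is only a sketch of the idea under discussion, not the PR's code; the class and method names are invented.

```java
import java.util.concurrent.ConcurrentHashMap;

public class PoolRefCounter {
    // one shared counter map; compute() runs atomically per key,
    // so no external synchronized block is needed (hypothetical sketch)
    static final ConcurrentHashMap<String, Integer> REF_COUNTS = new ConcurrentHashMap<>();

    /** Atomically add 'adjustment'; counts below 1 drop the entry. Returns the new count (0 if removed). */
    public static int adjust(String uuid, int adjustment) {
        Integer result = REF_COUNTS.compute(uuid, (k, v) -> {
            int next = (v == null ? 0 : v) + adjustment;
            return next < 1 ? null : next;   // returning null removes the mapping
        });
        return result == null ? 0 : result;
    }

    public static void main(String[] args) {
        if (adjust("pool-1", 1) != 1) throw new AssertionError();
        if (adjust("pool-1", 1) != 2) throw new AssertionError();
        if (adjust("pool-1", -1) != 1) throw new AssertionError();  // still in use
        if (adjust("pool-1", -1) != 0) throw new AssertionError();  // last user: entry removed
        if (REF_COUNTS.containsKey("pool-1")) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Because `compute` locks only the bin holding the key, two different pools can be adjusted concurrently while adjustments to the same pool serialize.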
If a storage pool is used by e.g. 2 concurrent snapshot->template actions, the first action to finish removed the netfs mount point from under the other action. Now the storage pool usage is ref-counted and the mount is only deleted when there are no more users.
Force-pushed from 669f29d to c599a58.
@blueorangutan package |
@rp- a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 11134 |
@blueorangutan test |
1 similar comment
@blueorangutan test |
@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
[SF] Trillian test result (tid-11507)
storagePoolRefCounts.put(uuid, refCount);
if (refCount < 1) {
    storagePoolRefCounts.remove(uuid);
} else {
    storagePoolRefCounts.put(uuid, refCount);
}
Suggested change (drop the unconditional first put):

if (refCount < 1) {
    storagePoolRefCounts.remove(uuid);
} else {
    storagePoolRefCounts.put(uuid, refCount);
}
Why put the refCount, only to remove it or put it again in the next instruction?
 * adjust refcount
 */
private int adjustStoragePoolRefCount(String uuid, int adjustment) {
    synchronized (uuid) {
Will this synchronized work as expected? Do we guarantee that every String with the same uuid as value is the same object?
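To illustrate the concern raised here: two `String` values that compare equal are in general distinct objects, so `synchronized (uuid)` only excludes other threads if every caller passes the very same instance (e.g. an interned one). A minimal demonstration, not project code:

```java
public class StringLockDemo {
    public static void main(String[] args) {
        String a = "pool-uuid";
        String b = new String("pool-uuid"); // equal value, but a distinct object
        // equal by value...
        System.out.println(a.equals(b));     // true
        // ...yet a different monitor: synchronized (a) and synchronized (b)
        // would NOT exclude each other
        System.out.println(a == b);          // false
        // intern() yields one canonical instance per value, so it could
        // serve as a shared lock object
        System.out.println(a == b.intern()); // true
    }
}
```

So unless the uuid strings are guaranteed to come from one shared source (or are interned first), the `synchronized (uuid)` block may not actually serialize concurrent inc/dec on the same pool.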
Description
If a storage pool is used by e.g. 2 concurrent snapshot->template actions, the first action to finish removed the netfs mount point from under the other action.
Now the storage pool usage is ref-counted and the mount is only deleted when there are no more users.
Fixes: #8899
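The failure mode described above can be sketched with a toy model: two concurrent actions share one "mount", and the last releaser tears it down only when a synchronized ref-count drops to zero. This is a hypothetical illustration, not CloudStack code; the flag and helper names are invented.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MountRefCountSketch {
    // toy stand-in for a shared NFS mount point (hypothetical)
    private static volatile boolean mounted = false;
    private static int refCount = 0;

    static synchronized void acquire() {
        if (refCount++ == 0) {
            mounted = true;          // first user creates the mount
        }
    }

    static synchronized void release() {
        if (--refCount == 0) {
            mounted = false;         // last user tears it down
        }
    }

    // a stand-in for one snapshot->template action; fails if the
    // mount disappears while it is still working
    static boolean runAction() {
        acquire();
        try {
            for (int i = 0; i < 100_000; i++) {
                if (!mounted) return false;  // the "mount point removed" failure
            }
            return true;
        } finally {
            release();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<Boolean> a = pool.submit(MountRefCountSketch::runAction);
        Future<Boolean> b = pool.submit(MountRefCountSketch::runAction);
        System.out.println(a.get() && b.get());  // true: the mount never vanished mid-action
        pool.shutdown();
    }
}
```

Without the ref-count (i.e. if `release()` unconditionally unmounted), the first action to finish could flip `mounted` to false while the other action is still inside its loop, which is exactly the race this PR addresses.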
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
Run several snapshot-to-template actions that are executed on the same host.
How did you try to break this feature and the system with this change?