Concurrency issue #1521
Comments
Hi @bitchecker, this seems to be similar to #995. You can try to play with the "parallelism" parameter, or perhaps update your …
Hi @bpg, Of course, setup …
@bitchecker, have you tried using the virtio disk interface as described here? Ultimately, this is not an issue with the provider but rather a bottleneck in the PVE I/O subsystem, which is exacerbated by Terraform's parallel provisioning of VMs. You could also try moving your VM source (template or disk image) to a different physical datastore, if you have that option. I have found that doing so drastically improves the performance of VM creation when I create and destroy dozens of them in acceptance tests.
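A rough sketch of those two suggestions combined, assuming a typical `proxmox_virtual_environment_vm` configuration; the node name, datastore name, disk size, and template ID below are placeholders, not values from this thread:

```hcl
# Illustrative only: node name, datastore, size, and template ID are assumptions.
resource "proxmox_virtual_environment_vm" "example" {
  name      = "example-vm"
  node_name = "pve-node-1"

  clone {
    vm_id = 123 # source template
  }

  disk {
    interface    = "virtio0"    # virtio disk interface, as suggested above
    datastore_id = "nvme-pool2" # a datastore different from the template's, if available
    size         = 20
  }
}
```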
Hi @bpg, I don't think this can be related to an I/O issue, because the server is running on NVMe drives. While I run …
Testing on a test VM, I get this error: …
Building a new machine, I get the same error also with …, so between …
Do VM templates have fixed vm_ids? …
The VM template is of course the same for all the machines, and it doesn't have a fixed ID: Proxmox assigned one when I created it. For the VMs we're using the same logic: via code I request 4 machines and let Proxmox assign the VM IDs.
Ah, that could be the reason. Under the hood, the provider retrieves the VM IDs from PVE before creating the VMs. This operation is not atomic, so when multiple VMs without IDs are being created in parallel, there's a chance of assigning the same ID to two or more VMs. There are a few reasons why the provider handles it this way instead of relying on PVE to allocate the IDs. Let me see if this can be improved.
Hi, |
Hm... the problem lies with PVE's … This requirement has driven the current implementation. As a result, any VM creation, whether new or clone, first obtains an ID (or uses the provided one if it's in the template) before calling …

From the initial error message, I assume you're cloning a template in your use case. So if you don't like to use …, you can assign explicit vm_ids to the clones, for example:

```hcl
resource "proxmox_virtual_environment_vm" "test_vm" {
  count = 3
  name  = "test-vm-${count.index}"
  vm_id = 100 + count.index

  clone {
    vm_id = 123
  }
  ...
}
```

The regular VM create can be improved to make it more reliable, and I'll address it at some point.
The duplicate IDs should be mitigated by #1557. Anyway, I'm going to close this ticket. Please test with the new 0.66 release and reopen if the issue is still there.
Hi @bpg, another point is that on your PR I can see that, with or without the …, the result is the same.
Hi @bitchecker, I've added a new section here, as this is a provider-wide feature.
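For reference, a minimal sketch of enabling that provider-wide behaviour; the option names follow the current provider documentation and are an assumption here, since the thread doesn't spell them out, and the endpoint/token values are placeholders:

```hcl
# Sketch only: option names are taken from the provider docs and assumed,
# not quoted from this thread; endpoint and token are placeholders.
provider "proxmox" {
  endpoint  = "https://pve.example.com:8006/"
  api_token = var.proxmox_api_token

  # Let the provider pick random VM IDs instead of asking PVE for the next
  # free ID, which avoids the duplicate-ID race under parallel creation.
  random_vm_ids      = true
  random_vm_id_start = 90000
  random_vm_id_end   = 99999
}
```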
Just to confirm, "result is the same" means you've tried the new version in your environment, and you see the same error `can't lock file '/var/lock/qemu-server/lock-XXXX.conf'`?
Hi @bpg,
oh, no... I was just reporting that the first output you reported (without …) …
I just tried adding the new option and I can confirm there are no regressions on existing guests!
Hi @bpg, Please re-open this issue. |
What error do you see? There were two different lock errors in the previous reports: a lock on the VM config, and a lock on the storage. There was also a "599 response" error, which is a different issue.
From what I can see, it seems that the "lock problem" has now moved to the template clone operation: …
hi @bpg, |
Hey @bitchecker, I was able to intermittently reproduce this storage lock error during tests with high concurrency, specifically when creating 8 or more machines from the same clone source on the same physical storage. However, this issue is mitigated when clones/target VMs are distributed across different storage devices or when using high-throughput Ceph storage, as in my case. In my opinion, this error points to an I/O bottleneck within the PVE storage subsystem. The most effective solution seems to be reducing the load on the storage devices that are experiencing the lock. I recommend utilizing the parallelism setting or moving the clone source to a different storage to alleviate the issue. While I could add retry logic to the provider's code for handling the clone operation, it won't guarantee reliability. It's difficult to predict the duration of the storage lock and determine an appropriate retry strategy. Another option would be to throttle parallel cloning within the provider, but that would essentially replicate the parallelism functionality that Terraform already offers.
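To illustrate the second suggestion, here is a sketch of sending the cloned disks to a different datastore via the `clone` block; node name, datastore name, and template ID are placeholders, and concurrency would be reduced separately on the CLI, e.g. with `terraform apply -parallelism=2`:

```hcl
# Sketch only: node name, datastore name, and template ID are placeholders.
resource "proxmox_virtual_environment_vm" "workers" {
  count     = 8
  name      = "worker-${count.index}"
  node_name = "pve-node-1"
  vm_id     = 200 + count.index

  clone {
    vm_id        = 123          # source template
    datastore_id = "nvme-pool2" # place the cloned disks on a different datastore than the template's
  }
}
```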
Describe the bug
Hi,
using the provider to create multiple resources at the same time from the same template returns a timeout error:
To Reproduce
I created a module that manages the `proxmox_virtual_environment_vm` resource, and in my final code I invoke that module 4 times. `plan` goes perfectly fine, showing all the resources that will be created, but when I run an `apply` I get the reported error. To "skip" the error, I need to run `apply` multiple times (getting the reported error N-1 times) before finishing with all the resources created.
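A minimal sketch of that layout, assuming a hypothetical local module that wraps the resource (module path, variable names, and values are illustrative, not taken from the issue):

```hcl
# Hypothetical wrapper module call: path, variables, and values are illustrative.
module "vm" {
  source = "./modules/proxmox-vm"
  count  = 4 # four invocations of the same module

  name           = "test-vm-${count.index}"
  template_vm_id = 123 # all four clones come from the same template
}
```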
Single node or cluster: single
Proxmox VE version: 8.2.4
Provider version: latest
Terraform version: v1.9.5
OS: Fedora 40