I've been noticing Terraform getting stuck while interacting with ECP a lot lately.
For example, I tried to destroy a cluster that had failed to deploy completely, using `make clean`. `terraform destroy` started running and eventually stalled (presumably due to a network/VPN hiccup). Four hours later nothing had progressed, the process could not be terminated without `kill -9`, and Terraform left its state locked.
I don't know a way to recover from this and have already wasted a lot of time trying to address it. Terraform does have a `force-unlock` subcommand, but running it yields `Local state cannot be unlocked by another process`. When this happened previously, I manually deleted the lockfile, but that didn't allow `terraform destroy` to run again either.
I ended up having to delete the build dir manually and spend about 2 hours drilling into resources in the OpenStack console to make sure everything was cleaned up. We should either find a graceful way to recover from this kind of scenario (which I've now hit again) that lets Terraform clean things up, or provide another catapult subcommand that cleans resources from ECP using the OpenStack CLIs instead of Terraform.
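One possible mitigation for the stall, sketched in Python under stated assumptions: rather than waiting hours and then reaching for `kill -9`, a wrapper could enforce a timeout and first send SIGINT, which Terraform treats as a request to shut down gracefully (giving it a chance to release its state lock), escalating to SIGKILL only after a grace period. `run_with_watchdog` and the timeout values are hypothetical and not part of catapult; this is POSIX-only and untested against a real stalled `terraform destroy`.

```python
import signal
import subprocess


def run_with_watchdog(cmd, timeout_s, grace_s=30.0):
    """Run cmd, interrupting it if it exceeds timeout_s (hypothetical helper).

    On timeout, send SIGINT first (Terraform's graceful-shutdown signal);
    if the process is still alive after grace_s seconds, send SIGKILL.
    Returns a (returncode, timed_out) tuple.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        # Ask politely first, so the child can clean up (e.g. release locks).
        proc.send_signal(signal.SIGINT)
        try:
            return proc.wait(timeout=grace_s), True
        except subprocess.TimeoutExpired:
            # Last resort: the equivalent of `kill -9`.
            proc.kill()
            return proc.wait(), True
```

A call might look like `run_with_watchdog(["terraform", "destroy", "-auto-approve"], timeout_s=3600)`; whether the graceful path actually leaves the lock released depends on Terraform being able to finish its shutdown.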
In the past I have used the following snippet to delete only LBs, security groups, and networks from ECP: https://gitlab.suse.de/snippets/338. Mind you, this is not easy, as there are several recursive dependencies and a specific deletion order. Personally, I think replicating Terraform ourselves is a bad idea here; we would be playing cat and mouse on our own for every cloud.
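The ordering problem can at least be made explicit. Below is a minimal Python sketch that assumes a hypothetical, simplified map of which OpenStack resource kinds reference which. It only computes a deletion order with the standard library's `graphlib` (Python 3.9+) and does not call any OpenStack API. The real dependencies on ECP, and the ones handled by the snippet, may well differ.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical, simplified map: resource kind -> kinds that still reference
# it and therefore must be deleted before it.
MUST_DELETE_FIRST = {
    "loadbalancer": set(),
    "server": set(),
    "port": {"loadbalancer", "server"},
    "security_group": {"port", "server"},
    "subnet": {"port", "loadbalancer"},
    "network": {"subnet", "port"},
}


def deletion_order(deps):
    """Return kinds ordered so each comes after everything that references it."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes forming the cycle.
        raise ValueError(f"dependency cycle: {err.args[1]}") from err


if __name__ == "__main__":
    print(deletion_order(MUST_DELETE_FIRST))
```

Even this toy version shows the maintenance cost: every cloud would need its own accurate dependency map, which is the part Terraform already tracks.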
I have opened SUSE/skuba#1051 to have the CaaSP Terraform files create an OpenStack Stack, which should be easier to delete.
It seems that the error you are facing is on the Terraform side. Catapult just calls `terraform destroy` and deletes the folder if it succeeds. One can run `make clean` as many times as needed, and if Terraform is failing, one can use Terraform knowledge to work around it.
I really see no way to work around this, and I don't think Catapult can be more intelligent than Terraform. If it were up to me, I would close this issue, as I see it as out of scope :/.