We need a way to delete resources when terraform gets stuck #179

Open
HartS opened this issue Apr 17, 2020 · 1 comment
HartS (Contributor) commented Apr 17, 2020

Lately I've been noticing Terraform getting stuck a lot while interacting with ECP.

For example, I used `make clean` to destroy a cluster that had failed to deploy completely. `terraform destroy` started running and eventually stalled (presumably due to a network/VPN hiccup). Four hours later nothing was progressing, the process couldn't be terminated without `kill -9`, and Terraform left the resources in a 'locked' state.

I don't know a way to recover from this and have already wasted a ton of time trying to address it. Terraform does have a `force-unlock` subcommand, but running it yields `Local state cannot be unlocked by another process`. When this happened previously, I manually deleted the lockfile, but that didn't allow `terraform destroy` to run again either.

I ended up having to delete the build directory manually and spend about 2 hours drilling into resources in the OpenStack console to make sure everything was cleaned up. We should either find a graceful way to recover from this kind of scenario (which I've now hit again) that still lets Terraform clean things up, or provide another subcommand in catapult that cleans resources from ECP using the OpenStack CLIs instead of Terraform.

viccuad (Member) commented Apr 20, 2020

I share your frustration.

In the past I have used the following snippet to delete only load balancers, security groups, and networks from ECP: https://gitlab.suse.de/snippets/338. Mind you, this is not easy, as there are several recursive dependencies and they must be deleted in a specific order. Personally, I think replicating Terraform ourselves is a bad idea here: we would end up playing cat and mouse on our own for every cloud.
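The ordering problem can be illustrated with a small Python sketch (Python 3.9+, standard-library `graphlib`): given which resource types must be deleted before which, a topological sort yields a safe deletion order. The dependency table and the `deletion_order` helper are hypothetical and for illustration only; the linked snippet is not reproduced here, and the real ECP dependencies may differ.

```python
from graphlib import TopologicalSorter

# ASSUMED, illustrative dependencies (not verified against ECP and not taken
# from the linked snippet): each resource type maps to the set of types that
# must already be deleted before it can be.
MUST_DELETE_FIRST = {
    "server": set(),
    "loadbalancer": set(),
    "router_interface": set(),
    "port": {"server", "loadbalancer"},
    "secgroup": {"server", "port"},
    "router": {"router_interface"},
    "subnet": {"port", "router_interface", "loadbalancer"},
    "network": {"subnet", "port"},
}


def deletion_order(deps):
    """Return resource types so that every type comes after all of deps[type]."""
    # TopologicalSorter treats deps[node] as node's predecessors, so
    # static_order() yields each node only after all of them. It raises
    # graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for kind in deletion_order(MUST_DELETE_FIRST):
        print(kind)
```

Each type in the resulting order would then be handed to the matching OpenStack CLI delete command. Keeping the table in one place makes the required order explicit, but maintaining such tables for every cloud is exactly the work Terraform already does.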

I have opened SUSE/skuba#1051 to have the CaaSP Terraform files create an OpenStack stack, which should be easier to delete as a single unit.

It seems that the error you are facing is on the Terraform side. Catapult just calls `terraform destroy` and deletes the folder if it succeeds. One can call `make clean` as many times as they want, and if Terraform is failing, one can use Terraform knowledge to work around it.
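The clean-up flow described here (run `terraform destroy`, delete the folder only if it succeeds) can be sketched in Python with one hypothetical addition: a timeout, so a stalled destroy is killed instead of hanging for hours. The function name, default timeout, and return strings are illustrative assumptions, not Catapult's actual code.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical default command; -auto-approve skips Terraform's interactive prompt.
DESTROY_CMD = ("terraform", "destroy", "-auto-approve")


def clean_build_dir(build_dir, cmd=DESTROY_CMD, timeout_s=3600):
    """Run cmd inside build_dir and remove build_dir only if cmd succeeds.

    Returns "destroyed", "failed", or "timed out". On failure or timeout the
    directory (and the Terraform state inside it) is kept for manual recovery.
    """
    build_dir = Path(build_dir)
    try:
        # On timeout, subprocess.run kills the child and raises TimeoutExpired.
        result = subprocess.run(cmd, cwd=build_dir, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timed out"
    if result.returncode != 0:
        return "failed"
    shutil.rmtree(build_dir)
    return "destroyed"
```

A timeout only bounds the wait: a killed `terraform destroy` can still leave the state locked, which is why the build directory is deliberately kept rather than deleted.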

I really see no way to work around this, and I don't think Catapult can get more intelligent than Terraform. If it were up to me, I would close this issue as out of scope :/.
