
High availability cloud switcher #37

Open
kbcbals wants to merge 34 commits into main from br_cloud_switcher

Conversation

@kbcbals
Contributor

@kbcbals kbcbals commented Jun 28, 2022

Description

High availability of the GZ:
A health-check instance is spun up in the AWS us-east-1 region and keeps an eye on the GZ. When the site goes down, it invokes the infra in Azure to keep the site running uninterrupted.
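The switching logic described above could be sketched roughly like this (illustrative only: the URL, failure threshold, polling interval, and Azure trigger command are assumptions, not the actual instance script from this PR):

```shell
#!/bin/sh
# Sketch of the cloud-switcher health-check loop.

# check_site URL -> exit 0 if the site answers, non-zero otherwise.
check_site() {
  curl --silent --fail --max-time 10 "$1" > /dev/null
}

# watch_site URL THRESHOLD INTERVAL [CHECK_CMD]
# Polls CHECK_CMD; after THRESHOLD consecutive failures, prints
# FAILOVER (the point where the Azure apply would be triggered)
# and returns. CHECK_CMD is injectable so the logic is testable.
watch_site() {
  url=$1; threshold=$2; interval=$3; check=${4:-check_site}
  failures=0
  while true; do
    if "$check" "$url"; then
      failures=0
    else
      failures=$((failures + 1))
      if [ "$failures" -ge "$threshold" ]; then
        echo "FAILOVER"
        # e.g. terraform -chdir=azure apply -auto-approve  (assumed layout)
        return 0
      fi
    fi
    sleep "$interval"
  done
}
```

Requiring several consecutive failures before failing over avoids spinning up the Azure cluster on a single transient timeout.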

Related Issue

Issue 594 in the web repo
Issue 36 in the infra repo

Motivation and Context

Provide much-needed high availability via orchestration of the multi-cloud setup.

How Has This Been Tested?

Spun up the cluster and tested it.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@kbcbals kbcbals added the enhancement New feature or request label Jun 28, 2022
@kbcbals kbcbals self-assigned this Jun 28, 2022
@kbcbals kbcbals linked an issue Jun 28, 2022 that may be closed by this pull request
1 task
@jamesgeddes
Contributor

I'm happy as long as @giulio-giunta is happy

@kbcbals
Contributor Author

kbcbals commented Jul 10, 2022

Code review for the infra branch br_cloud_switcher on geek.zone/infra

Reviewer: Giulio Giunta (GG)
Owner: Bala (BB)

Start time: 8.30pm, 10th July
End time: 9.30pm, 10th July

Items raised by GG:

  1. In .circleci/config.yml
    workflow: main-infra
    job: init-switcher-job
    - Squash all the run steps into a single run command (to speed up the build).
    BB: agreed to squash all the run steps into a single run.

  2. In .circleci/config.yml
    workflow: deploy_switcher_infra_aws
    job: deploy_infra_aws_job
    - Squash all the run steps into a single run command (to speed up the build).
    BB: agreed to squash all the run steps into a single run.

  3. In .circleci/config.yml
    workflow: deploy_infra_azure_job
    job: deploy_infra_azure_job
    - Squash all the run steps into a single run command (to speed up the build).
    BB: agreed to squash all the run steps into a single run.

  4. Security loophole: SSH port 22 is open.
    GG pointed out that the public IP was available.
    BB: there's no public IP in the VPC.
    GG suggested VPC peering.
    BB: VPC peering is unnecessary in this situation, as the only VPC involved is the one in the us-east-1 region, plus CircleCI.
    Agreed change: stop opening port 22. Done.

  5. GG: a Docker image can be used to accomplish this.
    BB: yes, one Docker image can be created and made available on Docker Hub to accomplish this.

    The next code review will happen with one reviewer and one observer along with the owner, as agreed by BB and GG.
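The squash suggested in items 1-3 might look roughly like this (job, image, and step names are illustrative, not copied from the repo's .circleci/config.yml):

```yaml
jobs:
  init-switcher-job:
    docker:
      - image: cimg/base:stable   # illustrative executor
    steps:
      - checkout
      # One run command replacing several separate run steps,
      # saving per-step startup overhead:
      - run:
          name: install-deps-and-init
          command: |
            sudo apt-get update
            sudo apt-get install -y curl unzip
            terraform init
```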

@giulio-giunta
Contributor

I think the folder you created and named "Terraform" should have a more specific name to avoid confusion with the terraform folders inside the aws and azure folders.

I don't think there is a need to create an extra image for dependencies when you already have geekzone/infra... which, as you said, makes the CI pipeline faster.
If you want to create an image just to have your own image, go for it, but I don't think it's the right approach, and it would add redundancy in the repo.

Regarding the next code review, no need to have one reviewer + one observer + the owner: I had already reviewed your work when you raised the same PR in the web repo, but you never came back. Then you asked for my review again, so today I decided to have a meeting with you to show that there was no test backing your PR and to explain my points to you.

However, if you don't like my suggestions, it doesn't have to be necessarily me reviewing your work!

@kbcbals
Contributor Author

kbcbals commented Jul 16, 2022

@giulio-giunta:
Please find my answers below:

<<< I think the folder you created and named "Terraform" should have a more specific name to avoid confusion with the
<<< terraform folders inside the aws and azure folders.
Changed the name of the folder.

<<< I don't think there is a need to create an extra image for dependencies when you already have geekzone/infra... which as
<<< you said makes the ci pipeline faster.
<<< If you wanna create an image just to have your own image, go for it, but don't think it's the right approach and would add
<<< redundancy in the repo.
I also think so. There's no need for a Docker image; the Terraform bits can live in the CI part of the pipeline, as they're not tied up with the AWS or Azure infra. The suggestion was from you in the first place.

<<< Regarding the next code review, no need to have one reviewer + one observer + the owner: I had already reviewed your
<<< work when you raised the same PR in the web repo, but you never came back. Then you asked for my review again, so
<<< today I decided to have a meeting with you to show that there was no test backing your PR and to explain my points to <<< you. However, if you don't like my suggestions, it doesn't have to be necessarily me reviewing your work!
I moved this PR from the web repo to the infra repo as per your suggestion, so I don't know why you are saying that. I have made changes per every reviewer's comments whenever they make sense and the reviewer explains the disadvantages of any part of my PR. I really do appreciate every reviewer's suggestions, make every reasonable change then and there, and submit for re-review.
I don't take anything personally; it's mainly about the quality of the process. In our company, we document every code review for future reference; that's the norm for breaking knowledge silos and spreading knowledge.
Anyway, I'm making the changes and will submit for review again :)

@giulio-giunta
Contributor

giulio-giunta commented Jul 17, 2022

@kbcbals @jamesgeddes

Sorry for coming back after a while, but I've had a very busy week!

I think building our own executor in a docker image, with all the dependencies needed for our cicd pipeline, is a great thing as that makes the pipeline faster and gives us full control of what we run our code on. We can even scan that image with Snyk and make sure there are no vulnerabilities!
On top of that, there might be cases where the URLs for downloading dependencies change over time, resulting in a pipeline failure. This applies to the AMI used by the cloud-switcher EC2 instance (a baked AMI, in my opinion, is preferable to one built on the spot).

Bala, it's true that using a Docker image was my suggestion in the first place, but you agreed it makes the pipeline faster. Why is having a faster pipeline no longer important to you?

You said there was no public IP address assigned to the ec2 instance where the health checks for the k8s cluster are performed, but I see the opposite in line 139 in cloud-switcher/main.tf and the Sonarcloud check failed just because of that: https://sonarcloud.io/project/security_hotspots?id=GeekZoneHQ_infra&pullRequest=37&resolved=false&types=SECURITY_HOTSPOT
Can you clarify the reason to SSH into that box? Also, why do we need to allow outbound traffic from port 22 on that box to anywhere? Do we need to SSH from that box somewhere else?
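For illustration, a locked-down security group along these lines might look like this (a sketch only; the resource and variable names are assumptions, not taken from cloud-switcher/main.tf):

```hcl
# Sketch: tighten SSH so it is not open to 0.0.0.0/0.
resource "aws_security_group" "cloud_switcher" {
  name   = "cloud-switcher-sg"
  vpc_id = var.vpc_id

  # SSH ingress only from a trusted admin CIDR, if SSH is needed at all.
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.admin_cidr]
  }

  # Outbound HTTPS is enough for health checks and cloud API calls;
  # no outbound port-22 rule unless the box must SSH elsewhere.
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```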

Finally, I don't take anything personal either, but I've been advised to reply to comments in a PR when it has been reviewed. I made 10 comments, but you replied only to 1. All the others were ignored: GeekZoneHQ/web#600

So, I think we don't need to have meetings with one reviewer + one observer + the owner, if we can use the tools Github offer to collaborate in a simple and communicative way.

@giulio-giunta
Contributor

@kbcbals @jamesgeddes

By the way, the pipeline succeeded, but I think to fully test the work we should see a new cluster being created when the health checks fail. Has it been created?

@kbcbals
Contributor Author

kbcbals commented Jul 17, 2022

@giulio-giunta:
Please find my responses below:
<<< I think building our own executor in a docker image, with all the dependencies
<<< needed for our cicd pipeline, is a great thing as that makes the pipeline faster
<<< and gives us full control of what we run our code on. We can even scan that image
<<< with Snyk and make sure there are no vulnerabilities!
<<< On top of that, there might me cases when the URLs to download dependencies with
<<< time change, resulting in a pipeline fail. This applies to the AMI used by the
<<< cloud-switcher ec2 instance (a baked AMI in my opinion is preferable to one built
<<< on the spot).
Have you created an AMI for me to use? Which baked AMI are you using? Can you point me to it, please?

<<< Bala, it's true that using a docker image was my suggestion in the first place,
<<< but you agreed it makes the pipeline faster. Why having a faster pipeline is not
<<< important for you anymore?
Following are my reasons:

  1. The problem with the Docker image: it's impossible to rotate the secrets.
  2. The Docker image is close to 577 MB, which is bloated in my opinion; it takes time to upload and to download. The meaning of CI is lost if we create one image for every change in the pipeline. In a couple of months we will have many Docker images lying around.
  3. There are many versions of the Docker image; which one should anyone download to spin up the infra?
  4. There is no documentation on how to run these Docker images, which version of the image to use to spin up which version of the infra, or what the contents of these images are.

<<< You said there was no public IP address assigned to the ec2 instance where the health checks for the k8s
<<< cluster are performed, but I see the opposite in line 139 in cloud-switcher/main.tf and the Sonarcloud check
<<< failed just because of that: https://sonarcloud.io/project/security_hotspots?id=GeekZoneHQ_infra&pullRequest=37&resolved=false&types=SECURITY_HOTSPOT
<<< Can you make clear the reason to ssh in that box? Also, why do we need to allow outbound traffic from port
<<< 22 in that box to anywhere? Do we need to ssh from that box somewhere else?
I'm still testing it and did not release it for re-review. I'm sorry; I'm using the AWS budget as well.

<<< Finally, I don't take anything personal either, but I've been advised to reply to comments in a PR when it
<<< has been reviewed. I made 10 comments, but you replied only to 1. All the others were ignored: GeekZoneHQ/web#600
I did go through the list, and I could find only one:
GeekZoneHQ/web#600
Can you point me to the other 9 as well, please?

<<< So, I think we don't need to have meetings with one reviewer + one observer + the owner, if we can use the tools
<<< Github offer to collaborate in a simple and communicative way.
Makes sense. I guess it's needed to have transparency in the review process.

@kbcbals
Contributor Author

kbcbals commented Jul 17, 2022

@kbcbals @jamesgeddes

By the way, the pipeline succeeded, but I think to fully test the work we should see a new cluster being created when the health checks fail. Has it been created?

Yes. I need your help in checking whether the cluster is spun up in Azure.

@kbcbals
Contributor Author

kbcbals commented Jul 17, 2022

[cloud-switcher diagram]
@giulio-giunta
BTW, this is what I'm trying to achieve. Let me know if you think it won't work.

@giulio-giunta
Contributor

@kbcbals @jamesgeddes,

I haven't created a baked AMI, but if you want I can do that.

Regarding the geekzone/infra image we need to clarify the following:

  • It currently contains all the k8s manifests and Terraform config files because I needed a way to create and destroy clusters whenever a new PR is raised in the "web" repo, just for testing. We don't need that once we go live and the cluster is running all the time, so the size will decrease a lot.
  • Sealed-secrets (encrypted) shouldn't go in the image, again this is a temporary solution to allow on demand creation and destruction of k8s clusters.
  • You won't have many infra Docker images lying around, because no tag is assigned when the image is built, so by default Docker assigns the "latest" tag. There is only one infra Docker image, "latest", which is used when it is downloaded as the executor for the CI/CD pipeline. I remember you asking me this question; maybe you forgot. Anyway, you can check that on DockerHub: https://hub.docker.com/repository/registry-1.docker.io/geekzone/infra/tags?page=1&ordering=last_updated
  • You're right, there is no documentation, but as I said above, the k8s manifests and Terraform files will be in the image just temporarily. Anyway, I'm sure you are able to understand from the Dockerfile what goes in that image, so you can find out easily by yourself; if in doubt just ask.
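If versioned images were ever wanted instead of a single "latest", a per-commit tagging scheme could look like this (illustrative only; the image name comes from this thread, but the tag scheme is an assumption, not current practice):

```shell
# image_tag NAME SHA -> prints "NAME:SHA"
image_tag() {
  printf '%s:%s\n' "$1" "$2"
}

# Build and push one image per commit alongside "latest" (sketch).
build_and_push() {
  sha=$(git rev-parse --short HEAD)
  tag=$(image_tag geekzone/infra "$sha")
  docker build -t "$tag" -t geekzone/infra:latest .
  docker push "$tag"
  docker push geekzone/infra:latest
}
```

A commit-pinned tag would also answer the "which version spins up which infra?" question raised above, at the cost of more images in the registry.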

If you're still testing, maybe the PR should be called "draft".
By the way, does that EC2 instance need a public IP address or not? Does it need to live in a public subnet, or can it stay in a private subnet instead?

For the other 9 unreplied comments you can check again here

@giulio-giunta
Contributor

@kbcbals @jamesgeddes
By the way, the pipeline succeeded, but I think to fully test the work we should see a new cluster being created when the health checks fail. Has it been created

Yes. I need your help in checking whether the cluster is spun up in Azure.

I think this should show in Circleci UI. Btw, do you have access to Azure?

@sonarqubecloud

SonarCloud Quality Gate failed.

Bugs: A (0 Bugs)
Vulnerabilities: A (0 Vulnerabilities)
Security Hotspots: E (1 Security Hotspot)
Code Smells: A (0 Code Smells)

No coverage information
0.0% Duplication


Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

High availability of the infra

3 participants