
High availability cloud switcher #37

Open
kbcbals wants to merge 34 commits into main from br_cloud_switcher

Conversation

@kbcbals
Contributor

@kbcbals kbcbals commented Jun 28, 2022

Description

High availability of the GZ:
A health-check instance is spun up in the AWS us-east-1 region and keeps an eye on the GZ. When the site goes down, it invokes the infra in Azure to keep the site running uninterrupted.
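The switching logic described above could be sketched roughly like this (illustrative only: the URL, failure threshold, polling interval, and Azure trigger command are assumptions, not the actual instance script from this PR):

```shell
#!/bin/sh
# Sketch of the cloud-switcher health-check loop.

# check_site URL -> exit 0 if the site answers, non-zero otherwise.
check_site() {
  curl --silent --fail --max-time 10 "$1" > /dev/null
}

# watch_site URL THRESHOLD INTERVAL [CHECK_CMD]
# Polls CHECK_CMD; after THRESHOLD consecutive failures, prints
# FAILOVER (the point where the Azure apply would be triggered)
# and returns. CHECK_CMD is injectable so the logic is testable.
watch_site() {
  url=$1; threshold=$2; interval=$3; check=${4:-check_site}
  failures=0
  while true; do
    if "$check" "$url"; then
      failures=0
    else
      failures=$((failures + 1))
      if [ "$failures" -ge "$threshold" ]; then
        echo "FAILOVER"
        # e.g. terraform -chdir=azure apply -auto-approve  (assumed layout)
        return 0
      fi
    fi
    sleep "$interval"
  done
}
```

Requiring several consecutive failures before failing over avoids spinning up the Azure cluster on a single transient timeout.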

Related Issue

Issue 594 in the web repo
Issue 36 in the infra repo

Motivation and Context

Provide much-needed high availability via orchestration of the multi-cloud setup.

How Has This Been Tested?

Spun up the cluster and tested it.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@kbcbals kbcbals added the enhancement New feature or request label Jun 28, 2022
@kbcbals kbcbals self-assigned this Jun 28, 2022
@kbcbals kbcbals linked an issue Jun 28, 2022 that may be closed by this pull request
1 task
@jamesgeddes
Contributor

I'm happy as long as @giulio-giunta is happy

@kbcbals
Contributor Author

kbcbals commented Jul 10, 2022

Code review for the infra branch br_cloud_switcher on geek.zone/infra

Reviewer: Giulio Giunta (GG)
Owner: Bala (BB)

Start time: 8.30pm, 10th July
End time: 9.30pm, 10th July

Items raised by GG:

  1. In .circleci/config.yml
    workflow: main-infra
    job: init-switcher-job
    - Squash all the run steps into a single run command (to speed up the build).
    BB: agreed to squash all the run steps into a single run.

  2. In .circleci/config.yml
    workflow: deploy_switcher_infra_aws
    job: deploy_infra_aws_job
    - Squash all the run steps into a single run command (to speed up the build).
    BB: agreed to squash all the run steps into a single run.

  3. In .circleci/config.yml
    workflow: deploy_infra_azure_job
    job: deploy_infra_azure_job
    - Squash all the run steps into a single run command (to speed up the build).
    BB: agreed to squash all the run steps into a single run.

  4. Security loophole: SSH port 22 is open.
    GG pointed out that the public IP was available.
    BB: there's no public IP in the VPC.
    GG suggested VPC peering.
    BB: VPC peering is unnecessary in this situation, as the only VPC involved is the one in the us-east-1 region, plus CircleCI.
    Agreed change: stop opening port 22. Done.

  5. GG: a Docker image can be used to accomplish this.
    BB: yes, one Docker image can be created and made available on Docker Hub to accomplish this.

    The next code review will happen with one reviewer and one observer along with the owner, as agreed by BB and GG.
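The squash suggested in items 1-3 might look roughly like this (job, image, and step names are illustrative, not copied from the repo's .circleci/config.yml):

```yaml
jobs:
  init-switcher-job:
    docker:
      - image: cimg/base:stable   # illustrative executor
    steps:
      - checkout
      # One run command replacing several separate run steps,
      # saving per-step startup overhead:
      - run:
          name: install-deps-and-init
          command: |
            sudo apt-get update
            sudo apt-get install -y curl unzip
            terraform init
```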

@giulio-giunta
Contributor

I think the folder you created and named "Terraform" should have a more specific name to avoid confusion with the terraform folders inside the aws and azure folders.

I don't think there is a need to create an extra image for dependencies when you already have geekzone/infra... which, as you said, makes the CI pipeline faster.
If you want to create an image just to have your own image, go for it, but I don't think it's the right approach, and it would add redundancy in the repo.

Regarding the next code review, no need to have one reviewer + one observer + the owner: I had already reviewed your work when you raised the same PR in the web repo, but you never came back. Then you asked for my review again, so today I decided to have a meeting with you to show that there was no test backing your PR and to explain my points to you.

However, if you don't like my suggestions, it doesn't have to be necessarily me reviewing your work!

@kbcbals
Contributor Author

kbcbals commented Jul 16, 2022

@giulio-giunta:
Please find my answers below:

<<< I think the folder you created and named "Terraform" should have a more specific name to avoid confusion with the
<<< terraform folders inside the aws and azure folders.
Changed the name of the folder.

<<< I don't think there is a need to create an extra image for dependencies when you already have geekzone/infra... which as
<<< you said makes the ci pipeline faster.
<<< If you wanna create an image just to have your own image, go for it, but don't think it's the right approach and would add
<<< redundancy in the repo.
I also think so. There's no need for a Docker image; the Terraform bits can live in the CI part of the pipeline, as they're not tied up with the AWS or Azure infra. The suggestion was from you in the first place.

<<< Regarding the next code review, no need to have one reviewer + one observer + the owner: I had already reviewed your
<<< work when you raised the same PR in the web repo, but you never came back. Then you asked for my review again, so
<<< today I decided to have a meeting with you to show that there was no test backing your PR and to explain my points to <<< you. However, if you don't like my suggestions, it doesn't have to be necessarily me reviewing your work!
I moved this PR from the web repo to the infra repo as per your suggestion, so I don't know why you are saying that. I have made changes per every reviewer's comments whenever they make sense and the reviewer explains the disadvantages of any part of my PR. I really do appreciate every reviewer's suggestions, make every reasonable change then and there, and submit for re-review.
I don't take anything personally; it's mainly about the quality of the process. In our company, we document every code review for future reference; that's the norm for breaking knowledge silos and spreading knowledge.
Anyway, I'm making the changes and will submit for review again :)

@giulio-giunta
Contributor

giulio-giunta commented Jul 17, 2022

@kbcbals @jamesgeddes

Sorry for coming back after a while, but I've had a very busy week!

I think building our own executor in a docker image, with all the dependencies needed for our cicd pipeline, is a great thing as that makes the pipeline faster and gives us full control of what we run our code on. We can even scan that image with Snyk and make sure there are no vulnerabilities!
On top of that, there might be cases where the URLs for downloading dependencies change over time, resulting in a pipeline failure. This applies to the AMI used by the cloud-switcher EC2 instance (a baked AMI, in my opinion, is preferable to one built on the spot).

Bala, it's true that using a Docker image was my suggestion in the first place, but you agreed it makes the pipeline faster. Why is having a faster pipeline no longer important to you?

You said there was no public IP address assigned to the ec2 instance where the health checks for the k8s cluster are performed, but I see the opposite in line 139 in cloud-switcher/main.tf and the Sonarcloud check failed just because of that: https://sonarcloud.io/project/security_hotspots?id=GeekZoneHQ_infra&pullRequest=37&resolved=false&types=SECURITY_HOTSPOT
Can you clarify the reason to SSH into that box? Also, why do we need to allow outbound traffic from port 22 on that box to anywhere? Do we need to SSH from that box somewhere else?
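For illustration, a locked-down security group along these lines might look like this (a sketch only; the resource and variable names are assumptions, not taken from cloud-switcher/main.tf):

```hcl
# Sketch: tighten SSH so it is not open to 0.0.0.0/0.
resource "aws_security_group" "cloud_switcher" {
  name   = "cloud-switcher-sg"
  vpc_id = var.vpc_id

  # SSH ingress only from a trusted admin CIDR, if SSH is needed at all.
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.admin_cidr]
  }

  # Outbound HTTPS is enough for health checks and cloud API calls;
  # no outbound port-22 rule unless the box must SSH elsewhere.
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```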

Finally, I don't take anything personal either, but I've been advised to reply to comments in a PR when it has been reviewed. I made 10 comments, but you replied only to 1. All the others were ignored: GeekZoneHQ/web#600

So, I think we don't need to have meetings with one reviewer + one observer + the owner, if we can use the tools Github offer to collaborate in a simple and communicative way.

@giulio-giunta
Contributor

@kbcbals @jamesgeddes

By the way, the pipeline succeeded, but I think to fully test the work we should see a new cluster being created when the health checks fail. Has it been created?

@kbcbals
Contributor Author

kbcbals commented Jul 17, 2022

@giulio-giunta:
Please find my responses below:
<<< I think building our own executor in a docker image, with all the dependencies
<<< needed for our cicd pipeline, is a great thing as that makes the pipeline faster
<<< and gives us full control of what we run our code on. We can even scan that image
<<< with Snyk and make sure there are no vulnerabilities!
<<< On top of that, there might me cases when the URLs to download dependencies with
<<< time change, resulting in a pipeline fail. This applies to the AMI used by the
<<< cloud-switcher ec2 instance (a baked AMI in my opinion is preferable to one built
<<< on the spot).
Have you created an AMI for me to use? Which baked AMI are you using? Can you point me to it, please?

<<< Bala, it's true that using a docker image was my suggestion in the first place,
<<< but you agreed it makes the pipeline faster. Why having a faster pipeline is not
<<< important for you anymore?
Following are my reasons:

  1. The problem with the Docker image: it's impossible to rotate the secrets.
  2. The Docker image is close to 577 MB, which is bloated in my opinion; it takes time to upload and to download. The meaning of CI is lost if we create one image for every change in the pipeline. In a couple of months we will have many Docker images lying around.
  3. There are many versions of the Docker image; which one should anyone download to spin up the infra?
  4. There is no documentation on how to run these Docker images, which version of the image to use to spin up which version of the infra, or what the contents of these images are.

<<< You said there was no public IP address assigned to the ec2 instance where the health checks for the k8s
<<< cluster are performed, but I see the opposite in line 139 in cloud-switcher/main.tf and the Sonarcloud check
<<< failed just because of that: https://sonarcloud.io/project/security_hotspots?id=GeekZoneHQ_infra&pullRequest=37&resolved=false&types=SECURITY_HOTSPOT
<<< Can you make clear the reason to ssh in that box? Also, why do we need to allow outbound traffic from port
<<< 22 in that box to anywhere? Do we need to ssh from that box somewhere else?
I'm still testing it and did not release it for re-review. I'm sorry; I'm using the AWS budget as well.

<<< Finally, I don't take anything personal either, but I've been advised to reply to comments in a PR when it
<<< has been reviewed. I made 10 comments, but you replied only to 1. All the others were ignored: GeekZoneHQ/web#600
I did go through the list, and I could find only one:
GeekZoneHQ/web#600
Can you point me to the other 9 as well, please?

<<< So, I think we don't need to have meetings with one reviewer + one observer + the owner, if we can use the tools
<<< Github offer to collaborate in a simple and communicative way.
Makes sense. I guess it's needed to have transparency in the review process.

@kbcbals
Contributor Author

kbcbals commented Jul 17, 2022

@kbcbals @jamesgeddes

By the way, the pipeline succeeded, but I think to fully test the work we should see a new cluster being created when the health checks fail. Has it been created?

Yes. I need your help in checking whether the cluster is spun up in Azure.

@kbcbals
Contributor Author

kbcbals commented Jul 17, 2022

[cloud-switcher diagram]
@giulio-giunta
BTW, this is what I'm trying to achieve. Let me know if you think it won't work.

@giulio-giunta
Contributor

@kbcbals @jamesgeddes,

I haven't created a baked AMI, but if you want I can do that.

Regarding the geekzone/infra image we need to clarify the following:

  • It currently contains all the k8s manifests and Terraform config files because I needed a way to create and destroy clusters whenever a new PR is raised in the "web" repo, just for testing. We don't need that once we go live and the cluster is running all the time, so the size will decrease a lot.
  • Sealed-secrets (encrypted) shouldn't go in the image, again this is a temporary solution to allow on demand creation and destruction of k8s clusters.
  • You won't have many infra Docker images lying around, because no tag is assigned when the image is built, so by default Docker assigns the "latest" tag. There is only one infra Docker image, "latest", which is used when it is downloaded as the executor for the CI/CD pipeline. I remember you asking me this question; maybe you forgot. Anyway, you can check that on DockerHub: https://hub.docker.com/repository/registry-1.docker.io/geekzone/infra/tags?page=1&ordering=last_updated
  • You're right, there is no documentation, but as I said above, the k8s manifests and Terraform files will be in the image just temporarily. Anyway, I'm sure you are able to understand from the Dockerfile what goes in that image, so you can find out easily by yourself; if in doubt just ask.
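If versioned images were ever wanted instead of a single "latest", a per-commit tagging scheme could look like this (illustrative only; the image name comes from this thread, but the tag scheme is an assumption, not current practice):

```shell
# image_tag NAME SHA -> prints "NAME:SHA"
image_tag() {
  printf '%s:%s\n' "$1" "$2"
}

# Build and push one image per commit alongside "latest" (sketch).
build_and_push() {
  sha=$(git rev-parse --short HEAD)
  tag=$(image_tag geekzone/infra "$sha")
  docker build -t "$tag" -t geekzone/infra:latest .
  docker push "$tag"
  docker push geekzone/infra:latest
}
```

A commit-pinned tag would also answer the "which version spins up which infra?" question raised above, at the cost of more images in the registry.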

If you're still testing, maybe the PR should be called "draft".
By the way, does that EC2 instance need a public IP address or not? Does it need to live in a public subnet, or can it stay in a private subnet instead?

For the other 9 unreplied comments you can check again here

@giulio-giunta
Contributor

@kbcbals @jamesgeddes
By the way, the pipeline succeeded, but I think to fully test the work we should see a new cluster being created when the health checks fail. Has it been created

Yes. I need your help in checking whether the cluster is spun up in Azure.

I think this should show in Circleci UI. Btw, do you have access to Azure?

@sonarqubecloud

SonarCloud Quality Gate failed.

Bugs: A (0 Bugs)
Vulnerabilities: A (0 Vulnerabilities)
Security Hotspots: E (1 Security Hotspot)
Code Smells: A (0 Code Smells)

No coverage information
0.0% Duplication


Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

High availability of the infra

3 participants