New Quartermaster fully-distributed architecture #539
Replies: 4 comments 9 replies
General remark: I did not find a way to comment on a single line, only to post a complete comment. I think it would be better to turn this into a normal document and open a pull request, so that we can comment on and discuss each line. That way every point gets its own discussion, and you don't have to keep quoting what you are referring to.
More distributed
The split of the client looks okay for starters.
More customizable
More robust
So what do you propose here? This is the difficult part, I think. How do you make sure that an assigned task is done and reported completely? Do you want to implement a sort of messaging system where pods get a task assigned by the master and report back to the master when they have finished?
Open points
Possible next steps
The previous paragraph looks like footnotes, but they refer to things that only come up here, which is confusing. Here I would need more technical knowledge of how QM works to provide comments.
Great work. I read it several times and looked up a couple of things that I didn't know. You seem to have put quite a lot of thought into this concept. The idea of using separate builders for each programming language is very interesting! I also like the separation of builder and analyser, but I think this could result in quite a number of analysers. Would this still be easy to handle? I wonder if it might be more practical to bundle, let's say, the Python builder with the Python analysers in one unit.
I would imagine Docker images reside in a dedicated (private) container registry. Modules and CI/CD in one repository. Is security (secret management, service mesh, encryption for intra-cluster communication) part of your concept? It might be useful to include security from the beginning. I will read it again tomorrow with a clear head and add thoughts if need be.
Lots of questions :) To come to a conclusion about this point we need to answer these questions: A : B:
I also agree with most of the ideas presented in the proposal with some exceptions :-)
The master should be just the coordinator, without also taking on roles such as synchronising modules or inserting data into the database.
Here I think the most suitable approach would be to have a dedicated microservice that takes care of all communication with the database. If the master or any other component/module needs data from the database, it should go through the dedicated microservice.
Every module should have its own repository. This would make sharing the module development/responsibility between teams easier.
For the microservices synchronization: Kafka
For service discovery, check out the Spring Cloud Netflix Eureka project from the Java world.
I don't know if it's a good idea to store the instance ID in the database. This seems like a workaround. What will happen when you get new instance IDs?
I think Docker Compose could be a solution for local testing. I also found this article.
This proposal intends to abandon Quartermaster's monolithic architecture in terms of cloud infrastructure and repository management.
Goals
These changes would make Quartermaster:
more distributed
easier to maintain
more customizable
more robust
These points are addressed individually below.
Current status
Besides initContainers and a StatefulSet for our DGraph database, Quartermaster runs its entire logic in two containers:
qmstr-client container: it executes commands according to Quartermaster's workflow. It is in charge of starting the so-called phases (e.g., build, analysis, etc.). Currently, it builds the project and then delegates the execution of the other phases to the qmstr-master container.
qmstr-master container: despite its name, it does not behave as such. It takes orders from the qmstr-client container and executes them.
Proposal
What follows is a more detailed plan on how to achieve the aforementioned goals.
More distributed
Some progress has already been made on the infrastructure side: Quartermaster's main branch already has a dedicated Kubernetes folder, making Quartermaster capable of running in a cluster.
Its logic, however, is not yet fully distributed, as all of Quartermaster's modules run in only two containers (i.e., everything in qmstr-master except for the build phase).
The biggest design change would consist of changing these containers' roles while maintaining a relatively similar workflow.
First, the master should behave as such, meaning that it shouldn't do anything other than:
Secondly, the qmstr-client container should be split into multiple entities so that each of them takes care of only one specific task.
More specifically, it would be split into the following entities:
It only builds the project using the proper Quartermaster wrapper so that build info will be stored into the database through the master¹.
There would be multiple builders, at least one for every programming language.
They would all have to be gRPC servers as they would need to wait for a signal from the master commanding them to start their execution.
Results would be sent to the master¹⁻⁷.
They would all have to be gRPC servers as well.
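To make the "wait for a signal, then report to the master" flow more concrete, here is a minimal sketch. It is not Quartermaster code and it does not use gRPC: threads and a queue stand in for the gRPC servers and the result channel, and every name in it (`Module`, `run_phase`, `builder-go`, and so on) is hypothetical.

```python
import queue
import threading


class Module(threading.Thread):
    """Hypothetical stand-in for a builder or analyzer gRPC server."""

    def __init__(self, name, work, results):
        super().__init__(name=name, daemon=True)
        self.work = work              # callable performing the module's task
        self.results = results        # queue shared with the master
        self.start_signal = threading.Event()

    def run(self):
        # Block until the master commands this module to start.
        self.start_signal.wait()
        # Report the outcome back to the master.
        self.results.put((self.name, self.work()))


def run_phase(modules, results, timeout=5.0):
    """Signal every module of a phase, then wait for one report per module."""
    for module in modules:
        module.start_signal.set()
    reports = {}
    for _ in modules:
        # Raises queue.Empty if a module does not report within the timeout.
        name, payload = results.get(timeout=timeout)
        reports[name] = payload
    return reports


results = queue.Queue()
builders = [
    Module("builder-go", lambda: "go build info", results),
    Module("builder-python", lambda: "python build info", results),
]
for builder in builders:
    builder.start()

reports = run_phase(builders, results)
print(reports)
```

Because the master counts one report per signalled module and fails on a timeout, it knows whether a phase has been reported completely before moving on to the next one.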
Easier to maintain
Separating those modules into multiple repositories can simplify maintenance and the overall project structure⁹.
Modules would be stored together with their corresponding Docker images and CI/CD stages, breaking down the current pipeline without having to go through the hassle of writing conditional stages⁴.
In particular, the following could end up in a dedicated repository:
Snapshots
Enhanced maintainability should also mean that developers are able to build and test their modules without having to run the modules they depend on (e.g., building a reporter without having to launch at least one builder and at least one analyzer first).
While the current implementation stores snapshots in between phases, to my understanding it doesn't make use of them.
Even if that turns out to be false, this functionality would have to be adapted to the new fully-distributed architecture anyway.
Snapshots would not be implementable without a long-running, production-ready, and multi-tenant⁸ DGraph instance available in the background.
More customizable
Currently, Quartermaster goes through the different phases (e.g., build, analysis, etc.) imperatively: it's the qmstr-client container that dictates the workflow and the phases to be executed.
Yet, the master requires a ConfigMap containing configuration for all phases to be run.
Essentially, Quartermaster follows an imperative approach when it comes to workflow definition, while at the same time having a declarative configuration for the qmstr-master container.
As emphasized in the "More distributed" paragraph, Quartermaster should follow a declarative approach exclusively.
The phases to be executed and their order should not be imposed by the qmstr-client container issuing commands, but rather by the configuration file given to the master.
The latter would orchestrate and synchronize modules, effectively behaving like a "master".
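A rough sketch of what "declarative" could mean here: the master derives the whole execution order from its configuration rather than from incoming commands. The configuration schema, the module names, and the `dispatch` callback are all assumptions made for illustration. The real configuration would live in a ConfigMap; JSON is used only to keep the example free of third-party dependencies.

```python
import json

# Hypothetical configuration: phases run in the order they are listed.
CONFIG = json.loads("""
{
  "phases": [
    {"name": "build",    "modules": ["builder-go"]},
    {"name": "analysis", "modules": ["analyzer-a", "analyzer-b"]},
    {"name": "report",   "modules": ["reporter-html"]}
  ]
}
""")


def plan(config):
    """Return the (phase, module) steps in the order the config declares."""
    steps = []
    for phase in config["phases"]:
        for module in phase["modules"]:
            steps.append((phase["name"], module))
    return steps


def run(config, dispatch):
    """Walk the declared phases; `dispatch` stands in for signalling a module."""
    for phase, module in plan(config):
        dispatch(phase, module)


log = []
run(CONFIG, lambda phase, module: log.append(f"{phase}:{module}"))
print(log)
```

Changing the workflow then means editing the configuration only; no client has to issue a different sequence of commands.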
More robust
So far, synchronization between the master, client, and database has been handled with simple sleep commands.
While this trivial solution works fine locally, a more complex synchronization mechanism is required when different entities may be scheduled at different times in the cluster⁶.
These entities should also take disruptions and evictions into account.
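One common alternative to a fixed sleep is to poll a readiness check with a timeout and exponential backoff. The sketch below shows the idea only; `wait_until_ready` and its parameters are hypothetical, and the `check` callable stands in for whatever probe the database or a module would actually expose.

```python
import time


def wait_until_ready(check, timeout=30.0, initial_delay=0.1, max_delay=2.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll `check` with exponential backoff until it returns True.

    Returns the number of attempts made; raises TimeoutError if the
    deadline would pass before the next attempt.
    """
    deadline = clock() + timeout
    delay = initial_delay
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts
        if clock() + delay > deadline:
            raise TimeoutError(f"not ready after {attempts} attempts")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # back off, but cap the wait


# Simulated dependency that becomes ready on the third probe.
probes = iter([False, False, True])
attempts = wait_until_ready(lambda: next(probes), sleep=lambda _: None)
print(attempts)
```

Unlike a fixed sleep, this returns as soon as the dependency is ready and fails loudly when it never becomes ready, for example after an eviction.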
Sequence diagram
WIP, just a sketch.
Open points
Related: softwareengineering.stackexchange.com/a/374033
A trivial solution may consist of adding a "Quartermaster instance ID" field to all objects inserted into the database and letting every Quartermaster instance operate only on its own objects.
This would also allow snapshots.
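The instance ID idea can be illustrated with a small in-memory stand-in for the database. This is not the DGraph API: the `Store` class, the `instance_id` field name, and the sample objects are assumptions made purely for illustration.

```python
import uuid


class Store:
    """In-memory stand-in for the shared database (not the DGraph API)."""

    def __init__(self):
        self._objects = []

    def insert(self, instance_id, obj):
        # Tag every object with the ID of the instance that created it.
        self._objects.append({**obj, "instance_id": instance_id})

    def query(self, instance_id):
        # An instance only ever sees its own objects.
        return [o for o in self._objects if o["instance_id"] == instance_id]


store = Store()
run_a, run_b = str(uuid.uuid4()), str(uuid.uuid4())

store.insert(run_a, {"kind": "file", "path": "main.go"})
store.insert(run_b, {"kind": "file", "path": "setup.py"})
store.insert(run_a, {"kind": "package", "name": "demo"})

print(len(store.query(run_a)), len(store.query(run_b)))
```

Under this scheme, a snapshot of one run would amount to the set of objects carrying that run's ID.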
Possible next steps
This paragraph outlines a possible plan for the next achievable steps, assuming that the open points above have already been addressed.