silarsis edited this page Oct 20, 2014 · 11 revisions

Built from one of the CoreOS sample CloudFormation templates, modified slightly. Running 3 servers on t2.micro instances.

Had some issues getting fleetd running. Deleting and re-creating the stack with a new discovery ID seems to have fixed it. Not sure why it didn't work the first time around; possibly the original ID had already been used and its list of servers then emptied?

fleetctl works as advertised - once you set FLEETCTL_TUNNEL to point at any one of the CoreOS servers, it'll manage docker containers on all of them. Where possible, I intend to use this as the backbone for setting up some other PaaS (e.g. Mesos).
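As a rough illustration of the tunnel setup described above, here is a minimal Python sketch that runs fleetctl with FLEETCTL_TUNNEL set. The helper names (`fleetctl_command`, `run_fleetctl`) and the example address are hypothetical; it assumes fleetctl is on the PATH and that your SSH agent holds a key the CoreOS servers accept.

```python
import os
import subprocess


def fleetctl_command(tunnel_host, *args):
    """Build the argv and environment for one fleetctl call.

    tunnel_host is the public address of any one CoreOS server in the
    cluster; fleetctl tunnels over SSH to it and talks to the whole cluster.
    """
    env = dict(os.environ)                 # copy, so the caller's env is untouched
    env["FLEETCTL_TUNNEL"] = tunnel_host   # the variable fleetctl reads
    return ["fleetctl", *args], env


def run_fleetctl(tunnel_host, *args):
    """Run fleetctl through the tunnel and return its stdout as text.

    Raises subprocess.CalledProcessError if fleetctl exits non-zero.
    """
    argv, env = fleetctl_command(tunnel_host, *args)
    result = subprocess.run(
        argv, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `run_fleetctl("203.0.113.10", "list-machines")` should list every machine in the cluster, and `run_fleetctl("203.0.113.10", "list-units")` the units scheduled across them (203.0.113.10 is a placeholder address).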

There are some logs - particularly around failure to run units - that only appear on the server via journalctl, not via fleetctl journal, at least as far as I can see.

Right now, I think rebuilding a stack loses everything that was added via fleetctl. Perhaps this is resolved by having two ASGs, or two stacks, so that some part of the cluster is always alive.

Questions

  • How do we deal with load balancing? Do we have to reimplement the concept at a PaaS level, or is there some inherent support for AWS LB?
  • Likewise security groups, and access to external resources?
  • What's the deployment process for the PaaS itself?
    • CloudFormation is easiest. Probably you'd run up a single ASG with at least a couple of servers just to keep the discovery ID alive (and maybe act as masters for anything else), then run the rest in another ASG (possibly another stack or multiple stacks?)
    • You could use the discovery ID as a form of stack id, as each server with the same ID will join the same managed cluster of machines.
  • What's the deployment process for containers once the PaaS is up?
    • fleetctl by preference - see the /coreos/mesos subdirectory for more details.
  • Can we do zero downtime updates of microservices?
  • What's the support, if any, for service discovery?
  • Logging - where does it go to, how is it supported? Ditto alarms, metrics - NewRelic, PagerDuty, Logstash/Splunk.
  • CI, if any - if not, how does it fit with existing systems?
  • Orchestration of multi-container installs?
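To make the "discovery ID as stack ID" idea from the deployment question above concrete, here is a minimal sketch that renders the cloud-config user data each server would boot with. It assumes the etcd/fleet layout used in the CoreOS sample cloud-configs of the time; the template actually used for this stack may differ, and `render_cloud_config` and the example token are hypothetical. Every server (in any ASG or stack) given the same discovery URL should join the same cluster.

```python
# Assumed layout, based on the CoreOS sample cloud-configs; not necessarily
# identical to the user data in the stack described on this page.
CLOUD_CONFIG_TEMPLATE = """#cloud-config
coreos:
  etcd:
    discovery: {discovery_url}
    addr: $private_ipv4:4001
    peer-addr: $private_ipv4:7001
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start
"""

DISCOVERY_PREFIX = "https://discovery.etcd.io/"


def render_cloud_config(discovery_url):
    """Return cloud-config text for a server joining the given cluster.

    discovery_url is a full discovery URL; a fresh one can be requested
    from https://discovery.etcd.io/new. Reusing the same URL across ASGs
    or stacks is what ties their servers into one cluster.
    """
    token = discovery_url[len(DISCOVERY_PREFIX):]
    if not discovery_url.startswith(DISCOVERY_PREFIX) or not token:
        raise ValueError("expected a URL like " + DISCOVERY_PREFIX + "<token>")
    # $private_ipv4 is left as-is: CoreOS substitutes it at boot.
    return CLOUD_CONFIG_TEMPLATE.format(discovery_url=discovery_url)
```

Rendering the same URL into the user data of both a small "keep-alive" ASG and a larger worker ASG would be one way to try the two-ASG layout, though whether that really preserves the cluster across rebuilds is still an open question here.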

Other things to Investigate

  • If we split the stack into two stacks, or even two ASGs, can we keep some servers in the cluster alive and thus avoid both burning the discovery ID and losing whatever we added via fleetctl?
  • On creating or updating the stack, we should really have a way to find an IP as an entry point. Do we want an elastic IP attached to something?

