diff --git a/docs/ha-docker-swarm/design.md b/docs/ha-docker-swarm/design.md
index d52f2a8..306ab40 100644
--- a/docs/ha-docker-swarm/design.md
+++ b/docs/ha-docker-swarm/design.md
@@ -1,7 +1,64 @@
-The "private cloud" platform is:
+# Introduction
 
-* **Highly-available** (can tolerate the failure of a single component)
-* **Scalable** (can add resource or capacity as required)
-* **Portable** (run it on your garage server today, run it in AWS tomorrow)
-* **Secure** (access protected with LetsEncrypt certificates)
-* **Automated** (requires minimal care and feeding)
+In the design described below, the "private cloud" platform is:
+
+* **Highly-available** (_can tolerate the failure of a single component_)
+* **Scalable** (_can add resource or capacity as required_)
+* **Portable** (_run it on your garage server today, run it in AWS tomorrow_)
+* **Secure** (_access protected with LetsEncrypt certificates_)
+* **Automated** (_requires minimal care and feeding_)
+
+## Design Decisions
+
+**Where possible, services will be highly available.**
+
+This means that:
+
+* At least 3 docker swarm manager nodes are required, to tolerate the failure of a single node.
+* GlusterFS is employed for the shared filesystem, because it too can be made tolerant of a single failure.
+
+**Where multiple solutions to a requirement exist, preference will be given to the most portable solution.**
+
+This means that:
+
+* Services are defined using docker-compose v3 YAML syntax.
+* Services are portable, meaning a particular stack could be shut down and moved to a new provider with minimal effort.
+
+## High availability
+
+### Normal function
+
+Assuming 3 nodes, under normal circumstances the following applies (as illustrated below):
+
+* All 3 nodes provide shared storage via GlusterFS, which runs in a docker container on each node (i.e., not in swarm mode).
+* All 3 nodes participate in the Docker Swarm as managers.
+* The various containers belonging to the application "stacks" deployed within Docker Swarm are automatically distributed amongst the swarm nodes.
+* Persistent storage for the containers is provided via a GlusterFS mount.
+* The **traefik** service (in swarm mode) receives incoming requests (on http and https), and forwards them to individual containers. Traefik knows the containers' names because it's able to access the docker socket.
+* All 3 nodes run keepalived, at different priorities. Since traefik is running as a swarm service and listening on TCP 80/443, requests made to the keepalived VIP and arriving at **any** of the swarm nodes will be forwarded to the traefik container (no matter which node it's on), and then onto the target backend.
+
+![HA function](images/docker-swarm-ha-function.png)
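+
+To sanity-check this normal state, something like the following can be run on any manager node (_a rough sketch only; the "traefik" service name and the example VIP are placeholders - substitute whatever your own stack uses_):
+
+```
+# All three managers should show as Ready, with one Leader and two Reachable
+docker node ls
+
+# The traefik service should be running, with ports 80/443 published
+docker service ls --filter name=traefik
+
+# A request to the keepalived VIP should reach traefik via any node
+curl -I http://192.0.2.10
+```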
+
+### Node failure
+
+In the case of a failure (or scheduled maintenance) of one of the nodes, the following applies (as illustrated below):
+
+* The failed node no longer participates in GlusterFS, but the remaining nodes provide enough fault-tolerance for the cluster to operate.
+* The remaining two Docker Swarm nodes retain quorum, and agree that the failed node is to be removed.
+* The (possibly new) leader manager node reschedules the containers which were running on the failed node onto the remaining nodes.
+* The **traefik** service is either restarted or unaffected, and as the backend containers stop/start and change IP, traefik detects the changes and updates its routing accordingly.
+* The keepalived VIP continues to function on the remaining nodes, and docker swarm continues to forward any traffic received on TCP 80/443 to the appropriate node.
+
+![Node failure](images/docker-swarm-node-failure.png)
+
+### Node restore
+
+When the failed (or upgraded) host is restored to service, the following applies (as illustrated below):
+
+* GlusterFS regains full redundancy
+* Docker Swarm managers become aware of the recovered node, and will use it for scheduling **new** containers
+* Existing containers which were migrated off the node are not automatically migrated back
+* The keepalived VIP regains full redundancy
+
+
+![Node restore](images/docker-swarm-node-restore.png)
diff --git a/docs/ha-docker-swarm/docker.md b/docs/ha-docker-swarm/docker.md
index 7db5a05..dba9513 100644
--- a/docs/ha-docker-swarm/docker.md
+++ b/docs/ha-docker-swarm/docker.md
@@ -12,8 +12,9 @@ We start building our cloud with virtual machines. You could use bare-metal mach
 
 ## Preparation
 
-1. Install Virtual machines
-
-* Hosts must be within the same subnet, and connected on a low-latency link (i.e., no WAN links)
-
-2. Setup super-user access for your admin user, as a member of the "docker" group
+### Install latest docker
+```
+systemctl disable docker --now
+systemctl enable docker-latest --now
+sed -i '/DOCKERBINARY/s/^#//g' /etc/sysconfig/docker
+```
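+
+A quick, optional way to confirm the switch (_a sketch only - exact output will vary_): the `sed` above un-comments the `DOCKERBINARY` line in `/etc/sysconfig/docker`, so the `docker` client should now use the docker-latest binary, and the docker-latest service should be active:
+
+```
+grep DOCKERBINARY /etc/sysconfig/docker
+systemctl is-active docker-latest
+docker version
+```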
diff --git a/docs/ha-docker-swarm/images/docker-swarm-ha-function.png b/docs/ha-docker-swarm/images/docker-swarm-ha-function.png
new file mode 100644
index 0000000..754c662
Binary files /dev/null and b/docs/ha-docker-swarm/images/docker-swarm-ha-function.png differ
diff --git a/docs/ha-docker-swarm/images/docker-swarm-node-failure.png b/docs/ha-docker-swarm/images/docker-swarm-node-failure.png
new file mode 100644
index 0000000..8114937
Binary files /dev/null and b/docs/ha-docker-swarm/images/docker-swarm-node-failure.png differ
diff --git a/docs/ha-docker-swarm/images/docker-swarm-node-restore.png b/docs/ha-docker-swarm/images/docker-swarm-node-restore.png
new file mode 100644
index 0000000..8ee2a0b
Binary files /dev/null and b/docs/ha-docker-swarm/images/docker-swarm-node-restore.png differ
diff --git a/docs/ha-docker-swarm/vms.md b/docs/ha-docker-swarm/vms.md
index 38ab896..8bbeaa4 100644
--- a/docs/ha-docker-swarm/vms.md
+++ b/docs/ha-docker-swarm/vms.md
@@ -1,28 +1,41 @@
 # Introduction
 
-We start building our cloud with virtual machines. You could use bare-metal machines as well, the configuration would be the same. Given that most readers (myself included) will be using virtual infrastructure, from now on I'll be referring strictly to VMs.
+Let's start building our cloud with virtual machines. You could use bare-metal machines as well; the configuration would be the same. Given that most readers (myself included) will be using virtual infrastructure, from now on I'll be referring strictly to VMs.
+
+I chose the "[Atomic](https://www.projectatomic.io/)" CentOS/Fedora image for the VM layer because:
+
+1. I want less responsibility for maintaining the system, including ensuring regular software updates and reboots. Atomic's idempotent nature means the OS is largely read-only, and updates/rollbacks are "atomic" (haha) procedures, easily rolled back if required.
+2. For someone used to administering servers individually, Atomic is a PITA. You have to employ [tricky][atomic-trick2] [tricks][atomic-trick1] to get it to install in a non-cloud environment. It's not designed for tweaking or customizing beyond what cloud-config is capable of. For my purposes, this is good, because it forces me to change my thinking - to consider every daemon as a container, and every config as code, to be checked in and version-controlled. Atomic forces this thinking on you.
+3. I want the design to be as "portable" as possible. While I run it on VPSs now, I may want to migrate it to a "cloud" provider in the future, and I'll want the most portable, reproducible design.
+
+[atomic-trick1]:https://spinningmatt.wordpress.com/2014/01/08/a-recipe-for-starting-cloud-images-with-virt-install/
+[atomic-trick2]:http://blog.oddbit.com/2015/03/10/booting-cloud-images-with-libvirt/
 
 ## Ingredients
 
 3 x Virtual Machines, each with:
+
 * CentOS/Fedora Atomic
 * At least 1GB RAM
-* At least 20GB disk space (but it'll be tight)
-* Connectivity to each other within the same subnet, and on a low-latency link (i.e., no WAN links)
+* At least 20GB disk space (_but it'll be tight_)
+* Connectivity to each other within the same subnet, and on a low-latency link (_i.e., no WAN links_)
 
 ## Preparation
 
 ### Install Virtual machines
 
-1. Install Virtual machines
-2. Setup super-user access for your admin user, as a member of the "docker" group
+1. Install / launch virtual machines.
+2. The default username on CentOS Atomic is "centos", and you'll need to have supplied your SSH key during the build process. If you're not using a platform with cloud-init support (i.e., you're building a VM manually, not provisioning it through a cloud provider), you'll need to refer to [trick #1][atomic-trick1] and [#2][atomic-trick2] for a means to override the automated setup, set a password on the "centos" account, and enable SSH password logins.
 
-I chose the "Atomic" CentOS/Fedora image because:
+### Upgrade Atomic
 
-1. I want less responsibility for maintaining the system, including ensuring regular software updates and reboots. Atomic's idempotent nature means the OS is largely real-only, and updates/rollbacks are "atomic" (haha) procedures, which can be easily rolled back if required.
-2. For someone used to administrating servers individually, Atomic is a PITA. You have to employ [tricky](http://blog.oddbit.com/2015/03/10/booting-cloud-images-with-libvirt/) [tricks](https://spinningmatt.wordpress.com/2014/01/08/a-recipe-for-starting-cloud-images-with-virt-install/) to get it to install in a non-cloud environment. It's not designed for tweaking or customizing beyond what cloud-config is capable of. For my purposes, this is good, because it forces me to change my thinking - to consider every daemon as a container, and every config as code, to be checked in and version-controlled. Atomic forces this thinking on you.
-3. I want the design to be as "portable" as possible. While I run it on VPSs now, I may want to migrate it to a "cloud" provider in the future, and I'll want the most portable, reproducible design.
+Run `atomic host upgrade`, and reboot if necessary.
 
-atomic host upgrade
+## Serving
+
+After completing the above, you should have:
+
+* [X] 3 fresh Atomic instances, at the latest release
+* [X] A user belonging to the docker group for administration
diff --git a/mkdocs.yml b/mkdocs.yml
index b6e72a4..7853dbd 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -20,7 +20,7 @@ pages:
 - HA Docker Swarm:
   - Index: ha-docker-swarm/index.md
   - Design: ha-docker-swarm/design.md
-  - VMs: ha-docker-swarm/design.md
+  - VMs: ha-docker-swarm/vms.md
   - Persistent Storage: beginner/beginner.md
   - Keepalived: advanced/keepalived.md
   - Docker Swarm Mode: advanced/keepalived.md
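
With the nav entry above corrected to point at `ha-docker-swarm/vms.md`, the docs can be checked locally (_a minimal sketch, assuming `mkdocs` is installed_) to catch any other nav entries that reference missing files, and to preview the rendered pages:

```
mkdocs build --strict
mkdocs serve
```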