
Updated details on VM builds

This commit is contained in:
David Young
2017-07-16 22:59:04 +12:00
parent 5548cccb24
commit d9897c5807
7 changed files with 93 additions and 22 deletions


@@ -1,7 +1,64 @@
The "private cloud" platform is:
# Introduction
In the design described below, the "private cloud" platform is:
* **Highly-available** (_can tolerate the failure of a single component_)
* **Scalable** (_can add resource or capacity as required_)
* **Portable** (_run it on your garage server today, run it in AWS tomorrow_)
* **Secure** (_access protected with LetsEncrypt certificates_)
* **Automated** (_requires minimal care and feeding_)
## Design Decisions
**Where possible, services will be highly available.**
This means that:
* At least 3 Docker Swarm manager nodes are required, to tolerate the failure of a single node.
* GlusterFS is employed for the shared filesystem, because it too can be made tolerant of a single node failure (a rough sketch of both follows below).
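A rough sketch of standing up both layers on a fresh set of hosts (the hostnames, IP address, and brick paths below are placeholders, not prescriptive):

```
# On the first node: initialise the swarm, and print a manager join token
docker swarm init --advertise-addr 192.168.1.11
docker swarm join-token manager

# On the other two nodes: join as managers, using the token printed above
docker swarm join --token <manager-token> 192.168.1.11:2377

# On any one gluster node: create and start a replica-3 volume spanning all three hosts
# (append "force" if the bricks live on the root filesystem)
gluster volume create gv0 replica 3 \
  node1:/data/glusterfs/gv0 \
  node2:/data/glusterfs/gv0 \
  node3:/data/glusterfs/gv0
gluster volume start gv0
```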
**Where multiple solutions to a requirement exist, preference will be given to the most portable solution.**
This means that:
* Services are defined using docker-compose v3 YAML syntax
* Services are portable, meaning a particular stack could be shut down and moved to a new provider with minimal effort (see the example below).
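For illustration only (the service name and image are arbitrary examples, not part of the design), a stack defined in this syntax is just a YAML file handed to `docker stack deploy`, and can be redeployed unchanged on any other swarm:

```
# Write a minimal docker-compose v3 stack definition
cat > whoami.yml <<'EOF'
version: "3"
services:
  whoami:
    image: emilevauge/whoami
    deploy:
      replicas: 2
EOF

# Deploy the stack to the current swarm...
docker stack deploy -c whoami.yml whoami

# ...and tear it down again just as easily
docker stack rm whoami
```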
## High availability
### Normal function
Assuming 3 nodes, under normal circumstances the following is illustrated:
* All 3 nodes provide shared storage via GlusterFS, which runs in a docker container on each node (i.e., not in swarm mode).
* All 3 nodes participate in the Docker Swarm as managers.
* The various containers belonging to the application "stacks" deployed within Docker Swarm are automatically distributed amongst the swarm nodes.
* Persistent storage for the containers is provided via a GlusterFS mount.
* The **traefik** service (in swarm mode) receives incoming requests (on http and https), and forwards them to individual containers. Traefik knows the container names because it's able to access the docker socket.
* All 3 nodes run keepalived, at different priorities. Since traefik is running as a swarm service and listening on TCP 80/443, requests made to the keepalived VIP and arriving at **any** of the swarm nodes will be forwarded to the traefik container (no matter which node it's on), and then on to the target backend (a sketch of such a traefik service follows the diagram below).
![HA function](images/docker-swarm-ha-function.png)
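As a sketch of such a service (the flags shown are for the traefik 1.x CLI, and the version tag is an assumption; adjust the details to your own setup):

```
# Traefik needs the docker socket of a manager node in order to watch swarm services
docker service create --name traefik \
  --constraint=node.role==manager \
  --publish 80:80 --publish 443:443 \
  --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
  traefik:1.7 \
  --docker --docker.swarmmode --docker.watch
```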
### Node failure
In the case of a failure (or scheduled maintenance) of one of the nodes, the following is illustrated:
* The failed node no longer participates in GlusterFS, but the remaining nodes provide enough fault-tolerance for the cluster to operate.
* The remaining two nodes in Docker Swarm achieve a quorum and agree that the failed node is to be removed.
* The (possibly new) leader manager node reschedules the containers known to be running on the failed node, onto other nodes.
* The **traefik** service is either restarted or unaffected, and as the backend containers stop/start and change IP, traefik is aware and updates accordingly.
* The keepalived VIP continues to function on the remaining nodes, and docker swarm continues to forward any traffic received on TCP 80/443 to the appropriate node.
![HA function](images/docker-swarm-node-failure.png)
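This failure mode can be rehearsed without pulling any cables, by draining a node (the node and service names below are illustrative):

```
# Take node2 out of service; the swarm reschedules its tasks onto the other nodes
docker node update --availability drain node2

# Watch the node states and the rescheduled tasks
docker node ls
docker service ps traefik
```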
### Node restore
When the failed (or upgraded) host is restored to service, the following is illustrated:
* GlusterFS regains full redundancy
* Docker Swarm managers become aware of the recovered node, and will use it for scheduling **new** containers
* Existing containers which were migrated off the node are not automatically migrated back (though they can be rebalanced manually, as sketched below)
* Keepalived VIP regains full redundancy
![HA function](images/docker-swarm-node-restore.png)
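A sketch of returning a drained node to service, and (optionally) nudging a service back onto it (names illustrative):

```
# Make the node schedulable again; existing tasks stay where they are
docker node update --availability active node2

# Optionally force a service to reschedule, which may place tasks on the restored node
docker service update --force traefik
```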


@@ -12,8 +12,9 @@ We start building our cloud with virtual machines. You could use bare-metal mach
## Preparation
1. Install Virtual machines
* Hosts must be within the same subnet, and connected on a low-latency link (i.e., no WAN links)
2. Set up super-user access for your admin user, as a member of the "docker" group (see the sketch below)
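On CentOS Atomic the admin account is typically "centos"; a minimal sketch of the group change, assuming that username:

```
# Add the admin user to the docker group (log out and back in for it to take effect)
sudo usermod -aG docker centos
```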
### Install latest docker
```
# Stop and disable the default docker service
systemctl disable docker --now
# Enable and start the "docker-latest" service in its place
systemctl enable docker-latest --now
# Uncomment the DOCKERBINARY line in /etc/sysconfig/docker, so the client uses docker-latest
sed -i '/DOCKERBINARY/s/^#//g' /etc/sysconfig/docker
```
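To confirm the switch took effect, something like the following should show the docker-latest daemon running and the client talking to it:

```
# Verify the docker-latest daemon is active, and check the version the client reports
systemctl status docker-latest --no-pager
docker version
```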

Binary file not shown (image added, 314 KiB)

Binary file not shown (image added, 333 KiB)

Binary file not shown (image added, 310 KiB)


@@ -1,28 +1,41 @@
# Introduction
Let's start building our cloud with virtual machines. You could use bare-metal machines as well, the configuration would be the same. Given that most readers (myself included) will be using virtual infrastructure, from now on I'll be referring strictly to VMs.
I chose the "[Atomic](https://www.projectatomic.io/)" CentOS/Fedora image for the VM layer because:
1. I want less responsibility for maintaining the system, including ensuring regular software updates and reboots. Atomic's idempotent nature means the OS is largely read-only, and updates/rollbacks are "atomic" (haha) procedures, which can be easily rolled back if required.
2. For someone used to administering servers individually, Atomic is a PITA. You have to employ [tricky][atomic-trick2] [tricks][atomic-trick1] to get it to install in a non-cloud environment. It's not designed for tweaking or customizing beyond what cloud-config is capable of. For my purposes, this is good, because it forces me to change my thinking - to consider every daemon as a container, and every config as code, to be checked in and version-controlled. Atomic forces this thinking on you.
3. I want the design to be as "portable" as possible. While I run it on VPSs now, I may want to migrate it to a "cloud" provider in the future, and I'll want the most portable, reproducible design.
[atomic-trick1]:https://spinningmatt.wordpress.com/2014/01/08/a-recipe-for-starting-cloud-images-with-virt-install/
[atomic-trick2]:http://blog.oddbit.com/2015/03/10/booting-cloud-images-with-libvirt/
## Ingredients
3 x Virtual Machines, each with:
* CentOS/Fedora Atomic
* At least 1GB RAM
* At least 20GB disk space (_but it'll be tight_)
* Connectivity to each other within the same subnet, and on a low-latency link (_i.e., no WAN links_)
## Preparation
### Install Virtual machines
1. Install / launch virtual machines.
2. The default username on CentOS Atomic is "centos", and you'll need to have supplied your SSH key during the build process. If you're not using a platform with cloud-init support (i.e., you're building a VM manually, not provisioning it through a cloud provider), you'll need to refer to [trick #1][atomic-trick1] and [#2][atomic-trick2] for a means to override the automated setup, apply a manual password to the CentOS account, and enable SSH password logins (a sketch of one such approach follows).
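For example, one way to apply those tricks on a plain libvirt host is to build a NoCloud cloud-init seed ISO carrying your SSH key, and attach it to the Atomic image at first boot. All names, paths, and sizes below are examples only:

```
# Build a NoCloud cloud-init seed ISO containing your SSH public key
cat > meta-data <<'EOF'
instance-id: node1
local-hostname: node1
EOF

cat > user-data <<'EOF'
#cloud-config
ssh_authorized_keys:
  - ssh-rsa AAAA... you@example.com
EOF

genisoimage -output seed.iso -volid cidata -joliet -rock user-data meta-data

# Boot the Atomic qcow2 image with the seed ISO attached, so cloud-init picks it up
virt-install --name node1 --ram 1024 --vcpus 2 \
  --disk path=centos-atomic.qcow2 \
  --disk path=seed.iso,device=cdrom \
  --os-variant centos7.0 --import \
  --network bridge=br0 --noautoconsole
```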
I chose the "Atomic" CentOS/Fedora image because:
### Upgrade Atomic
Run the following, and reboot if necessary:
```
atomic host upgrade
```
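One of the advantages mentioned above: if the new tree misbehaves, the upgrade can be reverted just as atomically:

```
# Show current and previous deployments, then roll back to the previous one
atomic host status
atomic host rollback
```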
## Serving
After completing the above, you should have:
* [X] 3 fresh Atomic instances, at the latest release
* [X] A user belonging to the docker group for administration
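A quick way to confirm the above (run on each instance; note that docker group membership only takes effect after logging back in):

```
# Confirm the OS tree version, that the docker daemon responds, and your group membership
atomic host status
docker info
groups
```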