Configure High Availability
- If you lose the Master Node, it is not the end of the world - as long as your Worker Nodes and their applications are up and running, the cluster keeps serving traffic.
- If, for example, a pod on a Worker Node crashes, however, the ReplicaSet controller on the Master Node would need to tell the Worker Node to create a new pod.
- However, the Master Node is not available, nor are any of the controllers and schedulers on the Master Node.
- The kube-apiserver would also not be available, so you could not access the cluster externally.
- Therefore, multiple Master Nodes are needed for a high availability configuration.
- Having High Availability means you have redundancy across every component in the cluster.
- Therefore, no single point of failure.
- Focus will be Master and Control Plane components.
- Remember, the Master Node hosts these components:
- ETCD
- API Server
- Controller Manager
- Scheduler
- In an HA setup, the same components from above would run on the new Master Node as well.
- How does running multiple instances with the same components work?
- They work on one request at a time, so the API Server instances on both Master Nodes can run at the same time (active-active).
- The kubectl (Kube Control) utility talks to the API Server to have things done.
- The kubectl utility needs to reach the Master Node at port 6443.
- With two Master Nodes in play, which one do we point kubectl to?
- We should not send the same request to both of them, however.
- Need a load balancer configured in front of the Master Nodes, which splits traffic between the API servers.
- We then point the Kubectl utility to that load balancer.
- Can use HAProxy, Nginx and so on as the load balancer.
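As a concrete illustration, a minimal HAProxy configuration for this could look like the sketch below. The backend names and IP addresses are hypothetical - substitute your own Master Node addresses:

```cfg
# Sketch: TCP load balancing across two kube-apiserver instances.
# Node names and addresses below are hypothetical.
frontend kube-apiserver
    bind *:6443
    mode tcp
    default_backend kube-masters

backend kube-masters
    mode tcp
    balance roundrobin
    server master-1 192.168.1.10:6443 check
    server master-2 192.168.1.11:6443 check
```

kubectl (via the kubeconfig `server:` field) would then be pointed at the load balancer's address on port 6443 instead of at either Master Node directly.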
- What about the Scheduler and Controller Manager?
- These are Controllers that watch the state of the cluster and take actions.
- The Controller Manager consists of things like the Replication Controller.
- This watches the state of pods and takes the necessary actions.
- Creating a new pod when one fails for example.
- If multiple instances run in parallel, they may duplicate actions and therefore create more pods than are actually required.
- Therefore, a pair of Master Nodes would run in an "Active" / "Standby" mode. One would be "Active" and the other would be in "Standby" mode.
- Who decides which of the two is active and which is passive? This is decided through a leader election process.
- When a Controller Manager process starts, it must specify the `--leader-elect` option, using a command like: `kube-controller-manager --leader-elect true [other options]`
- With `--leader-elect` set to `true`, a lock is placed on the Kube-Controller-Manager Endpoint. Whichever process updates the endpoint with its information first receives the lease and becomes the active one of the two. The other becomes passive.
- There is also the `--leader-elect-lease-duration 15s` option. This is set to 15 seconds by default.
- The lease is renewed every 10 seconds. This is the `--leader-elect-renew-deadline` measure.
- Both of the processes try to become the leader every 2 seconds, which is set via the `--leader-elect-retry-period` option.
- If the active process fails due to its Master Node crashing, the other process can then acquire the lock.
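Putting the leader-election options above together, a kube-controller-manager invocation might look like the following sketch. The values shown are the defaults described above; `[other options]` stands in for the rest of the configuration:

```shell
# Sketch: kube-controller-manager with leader election enabled.
# Values shown are the defaults; adjust per cluster.
kube-controller-manager \
  --leader-elect=true \
  --leader-elect-lease-duration=15s \
  --leader-elect-renew-deadline=10s \
  --leader-elect-retry-period=2s \
  [other options]
```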
- Regarding ETCD.
- There are two topologies.
- One is the Stacked Control Plane Nodes Topology, where ETCD runs on the same nodes as the Control Plane.
- Requires fewer nodes.
- Redundancy can be compromised, however, if Control Plane instances are lost, since the ETCD members are lost with them.
- The other is when ETCD is separated from the Control Plane nodes and run on its own set of servers. This topology is called the External ETCD Topology.
- Less risky - a failed Control Plane node does not affect the ETCD cluster.
- Requires twice the number of servers, however, for the ETCD nodes.
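With kubeadm, the External ETCD Topology is expressed in the ClusterConfiguration, where the API Server is pointed at the external ETCD cluster. A sketch, with hypothetical endpoints and certificate paths:

```yaml
# Sketch: kubeadm ClusterConfiguration for an external ETCD cluster.
# Endpoints and certificate paths are hypothetical.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
etcd:
  external:
    endpoints:
      - https://10.0.0.11:2379
      - https://10.0.0.12:2379
      - https://10.0.0.13:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
```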
- The API Server needs to point to the right addresses of the ETCD servers.
- Check under `/etc/systemd/system/kube-apiserver.service` and you'll see an option like `--etcd-servers=`; you can assign the ETCD servers that way.
- Data can be read and written through any of the available ETCD server instances.
- That's why a list of ETCD servers is specified in the kube-apiserver configuration.
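The relevant part of the kube-apiserver service file might look like the following sketch; the ETCD addresses are hypothetical, and `[other options]` stands in for the rest of the flags:

```cfg
# Sketch: excerpt from /etc/systemd/system/kube-apiserver.service.
# List every instance in the ETCD cluster; addresses are hypothetical.
ExecStart=/usr/local/bin/kube-apiserver \
  --etcd-servers=https://10.0.0.11:2379,https://10.0.0.12:2379,https://10.0.0.13:2379 \
  [other options]
```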