
K3s, a lightweight home Kubernetes cluster

Setting up a high-availability K3s cluster

This is a guide to provision a K3s cluster in a high availability configuration.

The cluster is quick and easy to set up, although if you're looking for the easiest method to set up the same type of cluster, I'd recommend checking out Techno Tim's Ansible playbook for k3s, which will bootstrap a cluster in minutes.

Installation

1. Provision virtual machines running Ubuntu Server 22.04.3

Provision the number of virtual machines (or bare-metal servers) that you'd like to use for your cluster, with the minimum number being 3. I use Proxmox to manage my homelab environment, so I'll be using virtual machines. I'm going to provision 3 virtual machines, which will operate as my control planes and nodes. I'm going to assume you're comfortable installing Ubuntu Server, and will not go into detail on how to do so.

When creating a high-availability cluster, whether it's with K3s or full Kubernetes, you must have an odd number of control planes: etcd needs a majority (quorum) of members to stay available, so three nodes tolerate one failure, while an even count adds no extra fault tolerance.

Generally I make my control-plane disk size around 30-50 GB, and my worker nodes around 20-30 GB. I also make sure to give the control planes more RAM and CPU than the worker nodes. But if you are planning to use Longhorn for storage, you may want to give the worker nodes more disk space, as Longhorn will use the worker nodes for storage. If you'd like to know the general node requirements for K3s, you can check the official documentation

2. Update and prepare each server.

This should be done for each server.

  1. Update the freshly installed machines

    sudo apt update; sudo apt upgrade -y;
    
  2. Reconfigure unattended-upgrades

    sudo dpkg-reconfigure --priority=low unattended-upgrades
    
  3. Verify unattended upgrades are enabled

    sudo nano /etc/apt/apt.conf.d/20auto-upgrades
    

    Ensure the file looks like this:

    APT::Periodic::Update-Package-Lists "1";
    APT::Periodic::Unattended-Upgrade "1";
    

    Disable automatic reboots by editing the file:

    sudo nano /etc/apt/apt.conf.d/50unattended-upgrades
    

    and ensuring this line is uncommented (remove the // at the beginning of the line) and set to "false":

    Unattended-Upgrade::Automatic-Reboot "false";
    
  4. Ensure the server has a static IP address

    For the steps below, replace *.yaml with the file that contains your network configuration. If you're unsure, you can check the contents of the directory with ls /etc/netplan/. Once you've made the changes, you can test them with sudo netplan try. If the changes are successful, you can apply them with sudo netplan apply.

     sudo nano /etc/netplan/*.yaml
    

    Ensure the file looks similar to this:

     network:
       ethernets:
         ens18:
           addresses:
             - 192.168.56.150/24
           nameservers:
             addresses:
               - 1.1.1.1
               - 1.0.0.1
             search: []
           routes:
             - to: default
               via: 192.168.56.1
       version: 2
    
  5. If using LVM for your storage, you may want to resize the root logical volume to use the entire disk. You can do this with the following commands:

    sudo lvm
    
    lvscan
    

    Find your root partition, and resize it with:

    lvextend -l +100%FREE /dev/ubuntu-vg/ubuntu-lv
    

    Replace /dev/ubuntu-vg/ubuntu-lv with the path to your root logical volume.

    exit
    

    Then resize the filesystem with:

     sudo resize2fs /dev/ubuntu-vg/ubuntu-lv
    
  6. Set the hostname (optional)

    sudo hostnamectl set-hostname k3s-control-plane-1
    
  7. Set the timezone

    sudo timedatectl set-timezone America/Chicago
    

    Replace America/Chicago with your timezone.
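
    If you're not sure of the exact timezone name, you can list the valid values first:

    timedatectl list-timezones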

  8. Setup the firewall (optional, but recommended)

    sudo ufw default deny incoming
    sudo ufw default allow outgoing
    sudo ufw allow ssh
    sudo ufw enable
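
    With incoming traffic denied by default, you will also need to open the ports K3s uses once you reach the installation steps below. The K3s documentation lists the required rules; a minimal ufw sketch, assuming the default K3s pod and service CIDRs, looks like this:

    sudo ufw allow 6443/tcp                   # Kubernetes API server
    sudo ufw allow 2379:2380/tcp              # etcd (HA control planes only)
    sudo ufw allow 8472/udp                   # flannel VXLAN between nodes
    sudo ufw allow 10250/tcp                  # kubelet metrics
    sudo ufw allow from 10.42.0.0/16 to any   # pod network (default CIDR)
    sudo ufw allow from 10.43.0.0/16 to any   # service network (default CIDR)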
    
  9. Setup fail2ban (optional, but recommended)

    sudo apt install fail2ban -y
    sudo cp /etc/fail2ban/fail2ban.{conf,local}
    sudo cp /etc/fail2ban/jail.{conf,local}
    

    Set the backend to systemd

    sudo nano /etc/fail2ban/jail.local
    
    backend = systemd
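
    For reference, a minimal jail.local sketch with the systemd backend set and the SSH jail enabled might look like this (adjust the rest of the file to taste):

    [DEFAULT]
    backend = systemd

    [sshd]
    enabled = true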
    

    Enable and start fail2ban

     sudo systemctl enable fail2ban
     sudo systemctl start fail2ban
    

3. Prepare the control plane servers for K3s

We are going to be using KubeVIP to create a virtual IP for the control plane servers. This will allow us to use a single IP to access the control plane, and if one of the control plane servers goes down, the virtual IP will move to another server. This is a requirement for a high-availability cluster.

  1. Install the Docker Engine

    I won’t go into detail on how to install Docker in this guide, as it’s well documented on their website. I recommend using the official Docker documentation to install Docker.
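
    For reference, on Ubuntu the quickest route is Docker's convenience script (the official documentation also covers the apt repository method):

    curl -fsSL https://get.docker.com -o get-docker.sh
    sudo sh get-docker.sh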

  2. Create the K3s manifests folder

    sudo mkdir -p /var/lib/rancher/k3s/server/manifests/
    
  3. Download the KubeVIP RBAC manifest

    sudo curl https://kube-vip.io/manifests/rbac.yaml > kube-vip-rbac.yaml
    

    Move the file to the K3s manifests folder

    sudo mv kube-vip-rbac.yaml /var/lib/rancher/k3s/server/manifests
    
  4. Generate the DaemonSet manifest for KubeVIP

    It’s a good idea to check the official KubeVIP documentation for the latest instructions for generating the DaemonSet manifest. We are using the ARP method for this guide.

    Export the IP address intended for the virtual IP (use an IP address that’s not in use on your network, and is within the same subnet as the control plane servers)

    export VIP=192.168.1.45
    

    Set the INTERFACE name to the name of the interface on the control plane(s) which will announce the VIP. In many Linux distributions this can be found with the ip a command.

    export INTERFACE=ens18
    

    Get the latest version of the kube-vip release by parsing the GitHub API. This step requires that jq and curl are installed.

    KVVERSION=$(curl -sL https://api.github.com/repos/kube-vip/kube-vip/releases | jq -r ".[0].name")
    

    Create the kube-vip alias

     alias kube-vip="sudo ctr image pull ghcr.io/kube-vip/kube-vip:$KVVERSION; sudo ctr run --rm --net-host ghcr.io/kube-vip/kube-vip:$KVVERSION vip /kube-vip"
    

    I've added sudo to the alias before each instance of ctr, as I have not configured my user to run container runtime commands without sudo. If you have, you can remove the sudo from the alias command.

    Generate the DaemonSet manifest

     kube-vip manifest daemonset \
     --interface $INTERFACE \
     --address $VIP \
     --inCluster \
     --taint \
     --controlplane \
     --services \
     --arp \
     --leaderElection > kube-vip-daemonset.yaml
    

    I've written the output to a file called kube-vip-daemonset.yaml by adding > kube-vip-daemonset.yaml, but you can name it whatever you'd like.

    Move the file to the K3s manifests folder

    sudo mv kube-vip-daemonset.yaml /var/lib/rancher/k3s/server/manifests
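
    At this point the manifests folder should contain both the RBAC and DaemonSet manifests; a quick check:

    sudo ls /var/lib/rancher/k3s/server/manifests/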
    

4. Install K3s on the control plane servers

  1. Run the K3s install script on the first control plane server

    curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION={VERSION} K3S_TOKEN={TOKEN} sh -s - server --flannel-iface={INTERFACE} --disable servicelb --disable traefik --node-taint node-role.kubernetes.io/master=true:NoSchedule --tls-san {KUBE_VIP_IP} --cluster-init --write-kubeconfig-mode 644 --node-ip {NODE_IP} --kube-controller-manager-arg bind-address=0.0.0.0 --kube-proxy-arg metrics-bind-address=0.0.0.0 --kube-scheduler-arg bind-address=0.0.0.0 --etcd-expose-metrics true --kubelet-arg containerd=/run/k3s/containerd/containerd.sock
    
    • Replace {VERSION} with the version of K3s you'd like to install. You can find the latest version on the K3s GitHub releases page; the version should be in the format v1.29.1+k3s1.
    • Replace {TOKEN} with a random string of characters; it cannot contain any special characters. This token will be used to join the other servers to the cluster. You can use openssl rand -hex 64 to generate a random string.
    • -s is used to run the script in silent mode.
    • server is used to install K3s as a server.
    • --flannel-iface={INTERFACE} is the interface that flannel will use for the overlay network. Replace {INTERFACE} with the name of the interface on your server.
    • --disable servicelb and --disable traefik are used because we are using KubeVIP for the virtual IP, and we will be using MetalLB for the load balancer.
    • --node-taint node-role.kubernetes.io/master=true:NoSchedule is used to prevent workloads from being scheduled on the control plane servers. If you'd like to schedule workloads on the control plane servers, you can remove this flag.
    • --tls-san {KUBE_VIP_IP} value should be the IP address of the virtual IP set by KubeVIP.
    • --cluster-init is used to initialize the cluster on the first control plane server.
    • --write-kubeconfig-mode 644 is used to set the permissions on the kubeconfig file.
    • --node-ip {NODE_IP} is the IP address of the server. Replace {NODE_IP} with the IP address of the server.
    • --kube-controller-manager-arg bind-address=0.0.0.0 is used to bind the controller manager to all interfaces, so its metrics endpoint can be scraped from outside the node.
    • --kube-proxy-arg metrics-bind-address=0.0.0.0 is used to bind the kube-proxy metrics endpoint to all interfaces. This is required for metrics to be exposed.
    • --kube-scheduler-arg bind-address=0.0.0.0 is used to bind the scheduler to all interfaces, so its metrics endpoint can be scraped from outside the node.
    • --etcd-expose-metrics true is used to expose etcd metrics.
    • --kubelet-arg containerd=/run/k3s/containerd/containerd.sock points the kubelet at the containerd socket managed by K3s. A fully filled-in example of the command follows below.
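
    To make the placeholders concrete, here is what the command might look like with example values (the version, token, node IP, and interface below are hypothetical; the VIP matches the 192.168.1.45 exported earlier):

    curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.29.1+k3s1 K3S_TOKEN=superSecretToken123 \
      sh -s - server \
      --flannel-iface=ens18 \
      --disable servicelb \
      --disable traefik \
      --node-taint node-role.kubernetes.io/master=true:NoSchedule \
      --tls-san 192.168.1.45 \
      --cluster-init \
      --write-kubeconfig-mode 644 \
      --node-ip 192.168.1.40 \
      --kube-controller-manager-arg bind-address=0.0.0.0 \
      --kube-proxy-arg metrics-bind-address=0.0.0.0 \
      --kube-scheduler-arg bind-address=0.0.0.0 \
      --etcd-expose-metrics true \
      --kubelet-arg containerd=/run/k3s/containerd/containerd.sock
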
  2. Verify the K3s server is running

    sudo k3s kubectl get nodes
    

    You should see the control plane server listed as a node.

  3. Run the K3s install script on the remaining control plane servers with the same token, so the command will look like this:

    curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION={VERSION} K3S_TOKEN={TOKEN} sh -s - server --flannel-iface={INTERFACE} --disable servicelb --disable traefik --node-taint node-role.kubernetes.io/master=true:NoSchedule --tls-san {KUBE_VIP_IP} --write-kubeconfig-mode 644 --node-ip {NODE_IP} --server https://{KUBE_VIP_IP}:6443 --kube-controller-manager-arg bind-address=0.0.0.0 --kube-proxy-arg metrics-bind-address=0.0.0.0 --kube-scheduler-arg bind-address=0.0.0.0 --etcd-expose-metrics true --kubelet-arg containerd=/run/k3s/containerd/containerd.sock
    
    • Remove the --cluster-init flag, as the remaining control plane servers will join the existing cluster rather than initialize it.
    • --node-ip {NODE_IP} is the IP address of the server. Replace {NODE_IP} with the IP address of the control-plane server you are currently installing.
    • Add --server https://{KUBE_VIP_IP}:6443 to the command, replacing {KUBE_VIP_IP} with the IP address of the KubeVIP virtual IP, so the new server joins through the virtual IP.
    • Keep --tls-san {KUBE_VIP_IP} so each control plane server's certificate is also valid for the virtual IP.
  4. Verify the other control plane servers have joined the cluster by running this command on the first control-plane.

    sudo k3s kubectl get nodes
    

    You should see the control plane servers listed.

  5. Copy the kubeconfig file from the first control plane server to your local machine

    sudo cat /etc/rancher/k3s/k3s.yaml
    

    Copy the contents of the file to your local machine into a file located at ~/.kube/config

    nano ~/.kube/config
    

    Before saving the file, replace the server value with the virtual IP address of KubeVIP that you set earlier, changing it from https://127.0.0.1:6443 to https://{KUBE_VIP_IP}:6443.
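
    If you prefer to make that change in one step on your local machine, a sed one-liner works (GNU sed shown); replace {KUBE_VIP_IP} with your virtual IP:

    sed -i 's/127.0.0.1/{KUBE_VIP_IP}/g' ~/.kube/config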

5. Connect the worker nodes to the cluster

  1. Run the K3s install script on the worker nodes

    If you plan to use Longhorn storage you can install the dependencies now.

    sudo apt install nfs-common open-iscsi -y; sudo systemctl enable open-iscsi --now
    

    Otherwise, continue with the K3s install script

    curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION={VERSION} K3S_TOKEN={TOKEN} K3S_URL=https://{KUBE_VIP_IP}:6443 sh -s - --node-ip {NODE_IP} --flannel-iface={INTERFACE}
    
    • Replace {VERSION} with the version of K3s you installed on the control plane servers.
    • Replace {TOKEN} with the token you used to join the control plane servers to the cluster.
    • Replace {KUBE_VIP_IP} with the IP address of the virtual IP set by KubeVIP.
    • Replace {NODE_IP} with the IP address of the worker node.
    • Replace {INTERFACE} with the name of the interface on the worker node.
  2. Verify the worker nodes have joined the cluster by running this command on the first control-plane, or with kubectl on your local machine with the copied kubeconfig.

    sudo k3s kubectl get nodes
    
    kubectl get nodes
    

    You should see the worker nodes listed.

6. Install MetalLB

MetalLB is a load balancer that will allow us to expose services to the network from our on-premises cluster. We will use MetalLB to expose any services we’d like to access from our network.

  1. Apply the MetalLB manifest

    kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.3/config/manifests/metallb-native.yaml
    

    You can install the latest version of MetalLB by checking the official documentation

  2. Define an address pool for MetalLB to use for assigning IP Addresses. This should be a set of IP Addresses that is outside of your DHCP server scope so other devices on your network don’t accidentally get assigned to any of these addresses. You can check the official documentation for more information on how to do this: Configuring MetalLB

    nano metallb_pool.yaml
    

    Insert the following, and modify as needed to suit your network:

    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
       name: first-pool
       namespace: metallb-system
    spec:
       addresses:
       - 192.168.9.1-192.168.9.5
    

    Apply the manifest:

    kubectl apply -f metallb_pool.yaml
    
  3. Now we will create an L2Advertisement so we can use these IP addresses on our network. We are setting up Layer 2 mode, which is the easiest to set up, but you can use BGP if you'd like. You can check the official documentation for more information on how to do this: Configuring MetalLB

    nano metallb_l2.yaml
    

    Insert the following:

    apiVersion: metallb.io/v1beta1
    kind: L2Advertisement
    metadata:
       name: example
       namespace: metallb-system
    

    Apply the manifest:

    kubectl apply -f metallb_l2.yaml
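
    To confirm MetalLB is handing out addresses from the pool, you can expose a throwaway deployment as a LoadBalancer service (nginx-test is just a hypothetical name), check that it receives an EXTERNAL-IP from the pool, and then clean it up:

    kubectl create deployment nginx-test --image=nginx
    kubectl expose deployment nginx-test --type=LoadBalancer --port=80
    kubectl get svc nginx-test
    kubectl delete svc,deployment nginx-test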
    

7. Install Cert-Manager

Cert-Manager is a Kubernetes add-on to automate the management and issuance of TLS certificates from various issuing sources. It will ensure that our services are secure by providing them with TLS certificates. I’ll be using the official documentation to install Cert-Manager, and using Helm to install it. If you don’t have Helm installed, you can install it by following the official documentation.

  1. Add the Jetstack Helm repository

    helm repo add jetstack https://charts.jetstack.io --force-update
    
  2. Update the Helm repositories

    helm repo update
    
  3. Install Cert-Manager

    helm install \
       cert-manager jetstack/cert-manager \
       --namespace cert-manager \
       --create-namespace \
       --version v1.14.3 \
       --set installCRDs=true
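
    Cert-Manager won't issue anything until at least one issuer is configured. As a minimal sketch to get started (the issuer name is arbitrary, and for real services you'll likely want an ACME/Let's Encrypt issuer instead), a self-signed ClusterIssuer looks like this; save it to a file and apply it with kubectl apply -f:

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
       name: selfsigned-issuer
    spec:
       selfSigned: {}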
    

8. Install Longhorn

Longhorn is a distributed block storage system for Kubernetes. It’s a great way to provide persistent storage for your workloads. I’ll be using the official documentation to install Longhorn, and using Helm to install it. If you don’t have Helm installed, you can install it by following the official documentation.

  1. Add the Longhorn Helm repository

    helm repo add longhorn https://charts.longhorn.io
    
  2. Update the Helm repositories

    helm repo update
    
  3. Install Longhorn

    helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace --version 1.6.0
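
    Once the Helm release is deployed, you can verify that the Longhorn pods are running and that the default longhorn StorageClass was created:

    kubectl -n longhorn-system get pods
    kubectl get storageclass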
    

9. Install Prometheus for cluster monitoring

Prometheus is a monitoring and alerting toolkit used to monitor the health of your cluster; we'll be installing the community Helm chart for Prometheus. For this section, I'd recommend checking out Techno Tim's guide on installing Prometheus with Helm, as it's a great guide that I've used in the past. You can find it here.
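
As a rough sketch, installing the community kube-prometheus-stack chart looks like this (see the linked guide for the values you'll want to customize, such as ingress and persistent storage):

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
       --namespace monitoring --create-namespace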

Conclusion

You should now have a high-availability K3s cluster that is ready to use. You can now deploy workloads to the cluster, and expose them to your network using MetalLB, and secure them with Cert-Manager. You can also use Longhorn to provide persistent storage for your workloads, and monitor the health of your cluster with Prometheus.

Thanks for reading, and I hope this guide was helpful to you.

This post is licensed under CC BY 4.0 by the author.