  • hosting
  • server
  • servers
  • infrastructure
  • kubernetes

Summary

Omni is a platform for installing Talos on VMs and bare-metal devices. It isn't required to run Talos, but it does make things easier. Setup isn't terribly difficult, although in my case I relied on DDNS to route traffic to it; the details of setting up local DDNS aren't covered in this post.

Talos is a minimal Linux distribution, designed to run Kubernetes, with only 12 binaries installed by default.

Configuration of Omni is also beyond the scope of this article, as it's well documented on the Sidero Labs website.

The only notes I have on the configuration are that I used the docker-compose version, moved the authentication strings directly into the compose file, and changed the default listening ports from 0.0.0.0:port to [::]:port so Omni listens on IPv6 as well as IPv4.
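
For reference, this is roughly the shape of that change in the compose file. The flag names follow the self-hosted Omni compose example as I remember it, so double-check them against the release you deploy; the auth values and image tag are placeholders:

    services:
      omni:
        image: ghcr.io/siderolabs/omni:v0.45.0   # pin to the release you actually run
        network_mode: host
        command:
          # auth strings moved straight into the compose file
          - --auth-auth0-enabled=true
          - --auth-auth0-domain=<tenant>.auth0.com
          - --auth-auth0-client-id=<client-id>
          # changed from 0.0.0.0:<port> so Omni listens on IPv6 as well as IPv4
          - --bind-addr=[::]:443
          - --machine-api-bind-addr=[::]:8090
          - --k8s-proxy-bind-addr=[::]:8100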

Configuring first cluster

After Omni was running and configured properly, I used it to download an ISO for the latest Talos stable release (1.9.2), with the Use SideroLink gRPC Tunnel option selected. I uploaded this ISO to Proxmox and made 6 virtual machines (3 control planes and 3 workers), giving 4 cores and 32 GB of RAM to each control plane, and 16 cores with 32 GB of RAM to each worker. Each of these booted off the ISO and automatically connected back to the Omni instance.

Next I opened the Machine Classes page in Omni and created two classes, matching omni.sidero.dev/cores==4 and omni.sidero.dev/cores==16 respectively.

After creating the two machine classes I went to the Clusters page and hit Create Cluster. From here I had the choice of manually setting each node as a control plane or worker with the icons next to each node, or of using the machine classes I created earlier. I chose the latter: I clicked Machine Class next to the control plane line and selected the 4-core class, then did the same for the workers, selecting the 16-core class. I also enabled all three of the features (Encrypt disks, Workload service proxying, Use embedded discovery service).
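
I built the cluster through the UI, but Omni can also do this declaratively with a cluster template applied via omnictl cluster template sync. Something along these lines should be roughly equivalent to what I clicked together; the cluster name, version numbers, and machine class names are my own placeholders, and the exact field names should be checked against the cluster template reference:

    kind: Cluster
    name: talos-cluster-1
    talos:
      version: v1.9.2
    kubernetes:
      version: v1.32.0
    features:
      diskEncryption: true        # the "Encrypt disks" feature
    ---
    kind: ControlPlane
    machineClass:
      name: cores-4               # class matching omni.sidero.dev/cores==4
      size: 3
    ---
    kind: Workers
    machineClass:
      name: cores-16              # class matching omni.sidero.dev/cores==16
      size: 3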

Then I spun up the new cluster. I did run into an issue where one of the nodes had some strange trouble grabbing an encryption key from Omni and stalled. After two control planes and three workers reached the Ready state while the third control plane sat stuck in the Installing state, I destroyed the cluster, force-rebooted the stuck node in my virtualization manager, and tried again. The second attempt worked without issue. I probably didn't need to destroy and recreate the cluster and could have fixed it by simply rebooting the one stuck node, but this is what I did.

Configuring the new cluster

From here things got a bit different. Because I used Omni instead of configuring Talos by hand, it set up the cluster to use OIDC. So I downloaded kubectl on my Windows machine, along with Krew. I followed the Krew documentation to install it on Windows via itself, added both kubectl and krew to the system path with setx /M ..., and was then able to run kubectl krew install oidc-login. Now I could download the kubeconfig from Omni and use kubectl oidc-login to pop open a browser window and log in, something I hadn't been able to figure out how to do from the CLI-only VMs I usually operate these things from.
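
The Windows-side steps looked roughly like this, run from an elevated prompt. The install path and the kubeconfig filename are just examples of where I happened to put things, not anything Omni requires:

    :: add kubectl and krew's bin directory to the system PATH
    setx /M PATH "%PATH%;C:\tools\kubectl;%USERPROFILE%\.krew\bin"

    :: install the oidc-login plugin via krew
    kubectl krew install oidc-login

    :: point kubectl at the kubeconfig downloaded from Omni; the first command
    :: that needs a token opens a browser window for the OIDC login
    set KUBECONFIG=%USERPROFILE%\Downloads\talos-cluster-1-kubeconfig.yaml
    kubectl get nodes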

The very first thing I did was connect the cluster to my Rancher instance because Rancher kind of kicks ass for cluster management and configuration.

Now what

Now I've got a Kubernetes cluster running with 3 control planes and 3 workers, and Rancher has full access to configure it. It's still not doing anything useful, though.

Next steps

▢ Install a load balancer - MetalLB

▢ Configure MetalLB with FRR mode to create IPv4 and IPv6 routes on my Cisco 3750X switch (see the sketch after this list)

▢ Install a persistent storage provider - Ceph

▢ Ensure Ceph is connected and working.

▢ Install an ingress - Typically I use Traefik, but this time I plan to try HAProxy

▢ Install a workload - For this cluster the first workload I will attempt to install will be a Matrix server.

▢ Test internal and external access to workload

▢ Begin to migrate existing VM based services to Kubernetes
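
For the MetalLB step, the rough plan looks like the following. The address pool ranges, peer address, and AS numbers are made-up placeholders, and FRR mode itself is chosen when installing MetalLB rather than in these resources, so treat this as a sketch of the intent rather than a working config:

    # hypothetical address pools and BGP peering toward the 3750X
    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: lb-pool
      namespace: metallb-system
    spec:
      addresses:
        - 10.0.50.0/24
        - 2001:db8:50::/64
    ---
    apiVersion: metallb.io/v1beta2
    kind: BGPPeer
    metadata:
      name: cisco-3750x
      namespace: metallb-system
    spec:
      peerAddress: 10.0.0.1
      peerASN: 64512
      myASN: 64513
    ---
    apiVersion: metallb.io/v1beta1
    kind: BGPAdvertisement
    metadata:
      name: lb-adv
      namespace: metallb-system
    spec:
      ipAddressPools:
        - lb-pool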