
Omni, Talos, and Rancher 2.10.x

  • hosting
  • server
  • servers
  • infrastructure
  • kubernetes

Setting up

I upgraded my Rancher instance from a pretty old version and for days I've been trying to get this Omni + Talos cluster connected to Rancher. For the life of me it would not work.

I'd install the agent via:

    kubectl --kubeconfig .\talos-default-kubeconfig.yaml apply -f https://XXXX/v3/import/XXXXXXXX.yaml

and it just wouldn't connect. To get the logs, I first found the pod name with:

    # kubectl --kubeconfig .\talos-default-kubeconfig.yaml --namespace cattle-system get pods
    NAME                                    READY   STATUS   RESTARTS     AGE
    cattle-cluster-agent-7b76956b97-vdbxk   0/1     Error    1 (5s ago)   12s

And then displayed the logs:

    # kubectl --kubeconfig .\talos-default-kubeconfig.yaml --namespace cattle-system logs cattle-cluster-agent-7b76956b97-vdbxk
    ..............................
    time="2025-02-11T02:23:47Z" level=info msg="Rancher agent version v2.10.2 is starting"
    time="2025-02-11T02:23:47Z" level=info msg="Testing connection to https://XXXX using trusted certificate authorities within: /etc/kubernetes/ssl/certs/serverca"
    time="2025-02-11T02:23:47Z" level=error msg="Could not securely connect to https://XXXX: Get \"https://XXXX\": tls: failed to verify certificate: x509: certificate signed by unknown authority"
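As an aside, the pod-name lookup can probably be skipped by asking kubectl for the logs of the deployment instead. This is a minimal sketch for a POSIX shell, assuming the agent's deployment is named cattle-cluster-agent (inferred from the pod name above, not checked against the Rancher docs). The helper name agent_logs is made up for illustration.

```shell
# Hypothetical helper: print the Rancher agent logs without knowing the pod name.
# $1 is the path to the downstream cluster's kubeconfig.
# Assumption: the deployment is called cattle-cluster-agent (inferred from the pod name).
agent_logs() {
  kubectl --kubeconfig "$1" --namespace cattle-system logs deployment/cattle-cluster-agent
}
```

Calling it as agent_logs ./talos-default-kubeconfig.yaml should print the same kind of output as the two-step version.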

What the heck? The certificate is signed by Let's Encrypt and works fine in browsers. Eventually I found that, according to the docs, running Rancher with the --no-cacerts argument should work better when an ingress creates the certificates (an external Traefik instance in my case). With that, I got a slightly different error. (And yes, I tried the curl command too; it did not work.)

    time="2025-02-11T02:34:35Z" level=info msg="Rancher agent version v2.10.2 is starting"
    time="2025-02-11T02:34:36Z" level=error msg="unable to read CA file from /etc/kubernetes/ssl/certs/serverca: open /etc/kubernetes/ssl/certs/serverca: no such file or directory"
    time="2025-02-11T02:34:36Z" level=error msg="Strict CA verification is enabled but encountered error finding root CA"

Interesting. After some googling I found someone who said to go into the Rancher UI, go to the global configuration page, and change the agent-tls-mode option. In Rancher this involves clicking the globe icon near the bottom left, searching for agent-tls-mode, and switching it from Strict to System Store.
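The same change can probably be made from the command line instead of the UI. This is only a sketch under a few assumptions I have not verified on this setup: that the setting is exposed as a settings.management.cattle.io resource named agent-tls-mode in Rancher's own (local) cluster, and that the value behind the "System Store" label is system-store. The helper name and the kubeconfig argument are hypothetical.

```shell
# Hypothetical helper: switch Rancher's agent-tls-mode to the system trust store.
# $1 is a kubeconfig for the cluster Rancher itself runs in (NOT the Talos cluster).
# Assumptions: the setting lives in a settings.management.cattle.io object named
# agent-tls-mode, and "system-store" is the value behind the UI's "System Store".
set_agent_tls_mode_system_store() {
  kubectl --kubeconfig "$1" patch settings.management.cattle.io agent-tls-mode \
    --type merge --patch '{"value":"system-store"}'
}
```

If the patch is rejected or the resource is not found, the UI route described above is the one that actually worked for me.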

One more time I did a delete -f and then an apply -f, and finally it worked. This took me 4 freaking days to figure out.
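For reference, that last delete-and-apply step looks roughly like this as a small POSIX shell function. The function name is made up, and both arguments are placeholders for your own kubeconfig path and the import URL Rancher gives you.

```shell
# Hypothetical helper: remove the previously imported agent resources,
# then apply the import manifest again.
# $1 = kubeconfig of the downstream cluster, $2 = Rancher import manifest URL.
reimport_agent() {
  kubeconfig="$1"
  manifest_url="$2"
  # Tear down what the earlier import created.
  kubectl --kubeconfig "$kubeconfig" delete -f "$manifest_url"
  # Re-create it so the agent starts under the new agent-tls-mode setting.
  kubectl --kubeconfig "$kubeconfig" apply -f "$manifest_url"
}
```

In my case the call would be reimport_agent ./talos-default-kubeconfig.yaml https://XXXX/v3/import/XXXXXXXX.yaml.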