Troubleshooting Talos Worker Node Issues in Kubernetes Cluster

Asked By CuriousCoder42 On

I'm a hobbyist experimenting with Talos and Kubernetes, and I'm having trouble adding a second worker node to my cluster. After I boot Talos and apply the worker configuration, the node gets stuck waiting for `service "apid" to be "up"`. Eventually a connection error appears, and the node goes back to waiting for `apid`. The error points to a transport problem: the authentication handshake failed because TLS certificate verification (x509) failed. I've already tried generating a new worker.yaml using the secrets from the existing control plane config, but that didn't work. Any debugging tips or insights would be appreciated!

3 Answers

Answered By ConfigGuru On

Do you still have the worker configuration you used for your first worker node? You could start by applying that same, unmodified configuration to the new node. A single generated worker configuration typically works across multiple nodes without any issues.

CuriousCoder42 -

I tried that first, and also several variations, including `certSANs` options and a new configuration generated from the existing secrets, but none of them helped.
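One thing worth ruling out when regenerating from existing secrets is that the worker and control plane configs ended up with different machine CAs. The snippet below is a rough sketch, not a Talos tool: it assumes you have copied the base64 value of `machine.ca.crt` out of each YAML file by hand, the function names are made up for this example, and it simply compares SHA-256 fingerprints of the two certificates.

```python
import base64
import hashlib
import ssl


def ca_fingerprint(crt_b64: str) -> str:
    """SHA-256 fingerprint of a base64-encoded PEM certificate.

    Assumes the input is the value copied from machine.ca.crt, i.e. a PEM
    certificate that has itself been base64-encoded onto one line.
    """
    pem = base64.b64decode(crt_b64.strip()).decode("ascii")
    der = ssl.PEM_cert_to_DER_cert(pem)  # strips PEM armor, returns raw DER bytes
    return hashlib.sha256(der).hexdigest()


def same_ca(control_plane_crt: str, worker_crt: str) -> bool:
    """True if both configs carry the same CA certificate (hypothetical helper)."""
    return ca_fingerprint(control_plane_crt) == ca_fingerprint(worker_crt)
```

If `same_ca(...)` returns False for the two files, the worker was most likely generated from a different secrets bundle than the control plane, which by itself would explain certificate verification failures.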

Answered By DataDynamo On

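What’s the system time on your new worker node? It’s worth checking, because clock skew can cause exactly these handshake failures: if the node's clock falls outside a certificate's validity window, x509 verification rejects the certificate as not yet valid or expired.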
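To illustrate why skew matters: a certificate is only accepted while the verifying clock falls between its notBefore and notAfter timestamps. This hypothetical Python sketch (made-up dates, no Talos APIs involved) shows a worker clock running two hours behind rejecting a freshly issued certificate as not yet valid.

```python
from datetime import datetime, timedelta, timezone


def validity_status(not_before: datetime, not_after: datetime, now: datetime) -> str:
    """Classify a certificate validity window against a (possibly skewed) clock.

    All datetimes are expected to be timezone-aware UTC values.
    """
    if now < not_before:
        return "not yet valid"
    if now > not_after:
        return "expired"
    return "valid"


# Hypothetical example: a certificate issued by a control plane with a correct
# clock, then checked on a worker whose clock lags by 2 hours.
issued = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
cert_not_before = issued
cert_not_after = issued + timedelta(days=365)
worker_clock = issued - timedelta(hours=2)

print(validity_status(cert_not_before, cert_not_after, worker_clock))
```

The same logic in reverse applies to a clock that runs far ahead: certificates start looking expired.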

CuriousCoder42 -

I made sure the system time is set correctly. Both nodes sync against Cloudflare's NTP server, so they should be in sync.
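Talos has no shell, so you can't run ad-hoc commands on the node itself. As a rough sanity check from a workstation on the same network, a single SNTP query shows whether the NTP server is reachable from there and how far the local clock is from it. This is a minimal sketch assuming outbound UDP port 123 is allowed; `sntp_offset` and `parse_transmit_time` are hypothetical helpers, and the result skips the finer points of full NTP (one sample, symmetric latency assumed).

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 (NTP epoch) and 1970-01-01


def parse_transmit_time(packet: bytes) -> float:
    """Return the server transmit timestamp of an NTP reply as Unix time."""
    if len(packet) < 48:
        raise ValueError("NTP packet too short")
    # Transmit timestamp: 32-bit seconds + 32-bit fraction at byte offset 40.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def sntp_offset(server: str = "time.cloudflare.com", timeout: float = 2.0) -> float:
    """Rough clock offset (server minus local), in seconds, from one SNTP query."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, VN=3, Mode=3 (client request)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(1024)
        received = time.time()
    # Compare against the round-trip midpoint to roughly cancel network latency.
    return parse_transmit_time(reply) - (sent + received) / 2
```

Calling `sntp_offset()` returns the offset in seconds; on a machine that is supposed to be synced, anything far from zero, or a timeout, would be worth looking into before blaming the certificates.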

Answered By TechieTinker On

Have you checked if you're using the correct talosconfig file? Make sure you're either using the `--talosconfig` flag or have it placed in `~/.talos/config`. Sharing the exact commands you've executed might also help us troubleshoot better. A good starting point for issues like these is the [Talos troubleshooting guide](https://docs.siderolabs.com/talos/v1.8/troubleshooting/troubleshooting).
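If it's unclear which file talosctl is actually picking up, the lookup order can be approximated as below. This sketch assumes the precedence is the `--talosconfig` flag, then the `TALOSCONFIG` environment variable, then `~/.talos/config`; check `talosctl --help` for the exact behavior of your version. It matters here because the client certificate in talosconfig presumably has to chain to the same CA the node trusts, and a mismatch could produce an x509 error similar to the one described.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_talosconfig(flag_value: Optional[str] = None,
                        env: Optional[Mapping[str, str]] = None,
                        home: Optional[Path] = None) -> Path:
    """Approximate which talosconfig talosctl would load (assumed precedence).

    1. the --talosconfig flag value, if given
    2. the TALOSCONFIG environment variable, if set
    3. ~/.talos/config
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    if flag_value:
        return Path(flag_value).expanduser()
    if env.get("TALOSCONFIG"):
        return Path(env["TALOSCONFIG"]).expanduser()
    return home / ".talos" / "config"
```

Passing no arguments uses the real environment and home directory, which is a quick way to spot a stale `TALOSCONFIG` left over from an earlier cluster.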

HobbyistHero -

Yes, I’ve tried both methods. I downloaded the image from the Image Factory and put it on a Ventoy drive. I updated the machine config with the necessary details, and everything seemed to boot fine at first. But after applying the config, I can no longer connect to the node on TCP port 50000 (the Talos API port), which seems related to my current issue.
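To separate "nothing is answering on 50000" from "the port answers but the TLS handshake fails", a plain TCP check helps. A minimal sketch in Python; the function name and the example address below are hypothetical, and a successful connect only shows that something is listening, not that apid is healthy.

```python
import socket

TALOS_API_PORT = 50000  # port the Talos API (apid) listens on


def port_open(host: str, port: int = TALOS_API_PORT, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds (no TLS)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, `port_open("192.168.1.50")` (substitute your worker's IP). If it returns False, look at networking, firewalls, or whether apid is running at all; if it returns True but talosctl still fails, the problem is more likely the certificates or the talosconfig in use.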
