📝 Author
Birat Aryal — birataryal.github.io
Created Date: 2026-03-11
Updated Date: Wednesday 11th March 2026 00:13:06
Website - birataryal.com.np
Repository - Birat Aryal
LinkedIn - Birat Aryal
DevSecOps Engineer | System Engineer | Cyber Security Analyst | Network Engineer
Cluster API initialization fails
Symptoms
clusterctl init fails or controllers do not start.
Possible causes
- Incorrect provider configuration
- Internet access issues
- Invalid clusterctl.yaml
Checks
clusterctl config repositories
kubectl get pods -A
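If the repository list looks wrong, re-running init with verbose logging usually pinpoints the failing provider. A sketch, assuming the vSphere infrastructure provider; adjust the provider and namespaces to your setup:

```shell
# Re-run initialization with verbose output to see which provider fails.
clusterctl init --infrastructure vsphere -v 5

# Inspect the provider controllers that should come up after init.
kubectl get pods -n capi-system
kubectl get pods -n capv-system
```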
VMs are not created in vCenter
Symptoms
Cluster YAML applied but no VMs appear.
Possible causes
- Incorrect template name
- Invalid resource pool
- Incorrect folder path
- Wrong datastore
Checks
kubectl get machines -A
kubectl get vspheremachines -A
kubectl -n capv-system logs deploy/capv-controller-manager
Verify vCenter objects:
govc ls "/${VSPHERE_DATACENTER}/vm"
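Beyond listing the VM folder, govc can confirm each object the cluster YAML references. The variable names below mirror the usual CAPV environment variables and are assumptions about this setup:

```shell
# Confirm the template, resource pool, datastore, and folder referenced
# in the cluster YAML actually exist in vCenter (paths are illustrative).
govc vm.info "${VSPHERE_TEMPLATE}"
govc pool.info "/${VSPHERE_DATACENTER}/host/*/Resources/${VSPHERE_RESOURCE_POOL}"
govc datastore.info "${VSPHERE_DATASTORE}"
govc folder.info "/${VSPHERE_DATACENTER}/vm/${VSPHERE_FOLDER}"
```

Any name that fails to resolve here is a strong candidate for why no VMs appear.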
VM created but node never joins cluster
Symptoms
VM boots but does not appear in Kubernetes nodes.
Possible causes
- cloud-init failure
- missing gateway
- DNS misconfiguration
- kubeadm failure
Check inside node
cloud-init status
journalctl -u kubelet -xe
ip route
cat /etc/resolv.conf
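When cloud-init reports an error, its logs usually show which stage failed, including the kubeadm join output. These are the standard cloud-init log locations:

```shell
# Show cloud-init's per-stage exit status.
sudo cloud-init status --long

# Full cloud-init output, including kubeadm join errors.
sudo less /var/log/cloud-init-output.log
sudo less /var/log/cloud-init.log

# Recent kubelet activity, in case the join succeeded but the kubelet is failing.
sudo journalctl -u kubelet --no-pager | tail -n 50
```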
Workers stuck in pending state
Symptoms
kubectl get machines shows workers stuck in the Provisioning phase.
Possible causes
- Control plane unreachable
- API VIP unreachable
- network misconfiguration
Checks
clusterctl describe cluster <cluster-name>
kubectl get kubeadmcontrolplanes
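The Machine object's status usually records why provisioning stalled. A drill-down sketch, with the cluster and machine names as placeholders:

```shell
# Show the full ownership tree with per-resource conditions.
clusterctl describe cluster <cluster-name> --show-conditions all

# Inspect a stuck worker's status for the failure reason and message.
kubectl describe machine <machine-name>
kubectl get machine <machine-name> -o jsonpath='{.status.conditions}'
```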
Cluster unreachable
Symptoms
CAPV logs show:
cluster is not reachable: connect: no route to host
Possible causes
- missing default route
- incorrect gateway
- kube-vip not running
Checks
ip route
ping <gateway>
curl -k https://<VIP>:6443/healthz
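kube-vip normally runs as a static pod on the control-plane nodes, so if the VIP is dead the manifest and container logs are the first things to check. A sketch, assuming the default static-pod path:

```shell
# On a control-plane node: confirm the static pod manifest exists.
ls /etc/kubernetes/manifests/kube-vip.yaml

# From a working kubeconfig: check the kube-vip pods and their logs.
kubectl -n kube-system get pods -o wide | grep kube-vip
kubectl -n kube-system logs <kube-vip-pod-name> --tail=50
```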
Calico does not start
Symptoms
Pods in calico-system not running.
Possible causes
- wrong pod CIDR
- invalid manifest patch
- CRS label mismatch
Checks
kubectl get clusterresourcesets
kubectl get configmap calico-manifest
kubectl get pods -n calico-system
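A pod-CIDR mismatch is the most common cause: the CIDR Calico uses must match the one the Cluster object declares. A comparison sketch, assuming the operator-based Calico install with the default Installation resource:

```shell
# CIDR the cluster was created with.
kubectl get cluster <cluster-name> \
  -o jsonpath='{.spec.clusterNetwork.pods.cidrBlocks}'

# CIDR Calico is actually using (path assumes the tigera-operator install).
kubectl get installation default \
  -o jsonpath='{.spec.calicoNetwork.ipPools[0].cidr}'
```

If the two values differ, fix the Calico manifest (or the CRS ConfigMap that carries it) rather than the cluster spec.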
vSphere CPI ImagePullBackOff
Symptoms
ImagePullBackOff
Cause
The CPI image version does not match the cluster's Kubernetes version.
Fix
Pin the cloud-provider-vsphere image tag to the cluster's Kubernetes minor version.
Example:
registry.k8s.io/cloud-pv-vsphere/cloud-provider-vsphere:v1.30.0
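To confirm a mismatch, compare each node's Kubernetes minor version against the image tag pinned in the CPI DaemonSet; the DaemonSet name below is the upstream default and may differ in your manifests:

```shell
# Kubernetes version of each node.
kubectl get nodes \
  -o custom-columns=NAME:.metadata.name,VERSION:.status.nodeInfo.kubeletVersion

# Image currently pinned in the CPI DaemonSet (name assumed).
kubectl -n kube-system get ds vsphere-cloud-controller-manager \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```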
Cluster deletion stuck
Symptoms
Cluster resources remain after deletion.
Possible causes
- finalizers
- orphaned VMs
Fix
Remove finalizers carefully:
kubectl patch vspheremachine <name> --type=merge -p '{"metadata":{"finalizers":[]}}'
Warning
Removing finalizers bypasses the controller's cleanup logic; use it only after normal deletion has failed and you have confirmed the backing VMs are already gone from vCenter.
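For orphaned VMs left behind in vCenter, govc can remove them once you have confirmed they belong to the deleted cluster; the folder path is an assumption and the destroy step is irreversible:

```shell
# List VMs still sitting in the cluster's folder.
govc ls "/${VSPHERE_DATACENTER}/vm/${VSPHERE_FOLDER}"

# Power off and delete a confirmed orphan (destructive; double-check the name).
govc vm.power -off -force <vm-name>
govc vm.destroy <vm-name>
```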