📝 Author

Birat Aryal (birataryal.github.io)
Created Date: 2026-03-11
Updated Date: Wednesday 11th March 2026 00:13:06
Website - birataryal.com.np
Repository - Birat Aryal
LinkedIn - Birat Aryal
DevSecOps Engineer | System Engineer | Cyber Security Analyst | Network Engineer

Cluster API initialization fails

Symptoms

clusterctl init fails or controllers do not start.

Possible causes

  • Incorrect provider configuration
  • No internet access to download provider manifests or pull controller images
  • Invalid clusterctl.yaml

Checks

Bash
clusterctl config repositories
kubectl get pods -A
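
A quick way to rule out an incomplete provider configuration is to confirm that the variables the vSphere provider expects are set before running clusterctl init. The following is a minimal sketch: the helper name missing_vars is hypothetical, it only inspects the current shell environment (not values stored in clusterctl.yaml), and the variable names in the usage comment are assumptions to adjust for your setup.

```bash
# missing_vars NAME...: print each named environment variable that is unset or empty.
# Hypothetical helper; it does not read clusterctl.yaml.
missing_vars() {
  for name in "$@"; do
    # Indirect lookup through eval keeps this POSIX-compatible.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Example usage (variable names are assumptions based on common CAPV setups):
#   missing_vars VSPHERE_SERVER VSPHERE_USERNAME VSPHERE_PASSWORD VSPHERE_DATACENTER
# After init, list pods that are not in the Running phase:
#   kubectl get pods -A --field-selector=status.phase!=Running
```

An empty result means every listed variable is set; any name printed is a candidate cause for the failed initialization.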

VMs are not created in vCenter

Symptoms

The cluster YAML is applied, but no VMs appear in vCenter.

Possible causes

  • Incorrect template name
  • Invalid resource pool
  • Incorrect folder path
  • Wrong datastore

Checks

Bash
kubectl get machines -A
kubectl get vspheremachines -A
kubectl -n capv-system logs deploy/capv-controller-manager

Verify vCenter objects:

Bash
govc ls "/${VSPHERE_DATACENTER}/vm"
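
Since most of the causes above are wrong inventory paths, it can help to check several paths in one pass. This is a sketch under assumptions: check_inventory_paths is a hypothetical helper, govc is assumed to be installed with GOVC_URL and credentials exported, and VSPHERE_FOLDER and VSPHERE_DATASTORE in the usage comment are assumed variable names.

```bash
# check_inventory_paths PATH...: report whether "govc ls" returns anything for each path.
# Empty output is treated as missing; note that an empty folder also prints nothing.
check_inventory_paths() {
  rc=0
  for inv_path in "$@"; do
    if [ -n "$(govc ls "$inv_path" 2>/dev/null)" ]; then
      printf 'found    %s\n' "$inv_path"
    else
      printf 'missing  %s\n' "$inv_path"
      rc=1
    fi
  done
  return "$rc"
}

# Example usage (folder and datastore variable names are assumptions):
#   check_inventory_paths \
#     "/${VSPHERE_DATACENTER}/vm/${VSPHERE_FOLDER}" \
#     "/${VSPHERE_DATACENTER}/datastore/${VSPHERE_DATASTORE}"
```

Compare any path reported as missing against the values in the cluster YAML; template, folder, resource pool, and datastore names must match vCenter exactly.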

VM created but node never joins cluster

Symptoms

The VM boots but never appears in the output of kubectl get nodes.

Possible causes

  • cloud-init failure
  • Missing gateway
  • DNS misconfiguration
  • kubeadm failure

Check inside node

Bash
cloud-init status
journalctl -u kubelet -xe
ip route
cat /etc/resolv.conf
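
The route and DNS checks above can be turned into pass/fail tests. The sketch below uses two hypothetical helpers, has_default_route and has_nameserver, that read command output on stdin; it assumes the usual "default via ..." format of ip route and a standard resolv.conf.

```bash
# has_default_route: read "ip route" output on stdin; succeed if a default route exists.
has_default_route() {
  grep -q '^default '
}

# has_nameserver: read resolv.conf content on stdin; succeed if a nameserver is configured.
has_nameserver() {
  grep -Eq '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+'
}

# Example usage on the node:
#   ip route | has_default_route || echo "no default route (check the gateway setting)"
#   has_nameserver < /etc/resolv.conf || echo "no nameserver configured"
#   sudo tail -n 50 /var/log/cloud-init-output.log   # standard cloud-init output log
```

If both checks pass, the cloud-init output log and the kubelet journal are the next places to look for a kubeadm join failure.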

Workers stuck in Provisioning state

Symptoms

Bash
kubectl get machines

shows worker machines stuck in the Provisioning phase.

Possible causes

  • Control plane unreachable
  • API VIP unreachable
  • Network misconfiguration

Checks

Bash
clusterctl describe cluster <cluster-name>
kubectl get kubeadmcontrolplanes -A
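
On a management cluster with many machines, it can be easier to list only those that have not reached Running. A minimal sketch, assuming a hypothetical not_running filter and the Machine .status.phase field:

```bash
# not_running: read "NAMESPACE NAME PHASE" rows on stdin and print rows
# whose third column is anything other than Running.
not_running() {
  awk '$3 != "Running" { print }'
}

# Example usage against the management cluster:
#   kubectl get machines -A --no-headers \
#     -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase \
#     | not_running
```

Machines that stay in Provisioning while the control plane machines are Running usually point to the workers being unable to reach the API endpoint.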

Cluster unreachable

Symptoms

CAPV logs show:

cluster is not reachable: connect: no route to host

Possible causes

  • Missing default route
  • Incorrect gateway
  • kube-vip not running

Checks

Bash
ip route
ping <gateway>
curl -k https://<VIP>:6443/healthz
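
The API endpoint can take a while to come up after the first control plane node boots, so a single curl may be misleading. The sketch below polls the health endpoint a few times; wait_for_healthz is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```bash
# wait_for_healthz URL [ATTEMPTS] [DELAY_SECONDS]: poll URL until it answers "ok".
wait_for_healthz() {
  url="$1"
  attempts="${2:-10}"
  delay="${3:-5}"
  i=1
  while :; do
    # -k skips TLS verification, -s hides progress, --max-time bounds each try.
    if [ "$(curl -ks --max-time 5 "$url" 2>/dev/null)" = "ok" ]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    [ "$i" -ge "$attempts" ] && break
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still unreachable after $attempts attempt(s)" >&2
  return 1
}

# Example usage:
#   wait_for_healthz "https://<VIP>:6443/healthz"
# On a control plane node, assuming crictl is available, check for kube-vip:
#   sudo crictl ps --name kube-vip
```

If the endpoint never becomes healthy but the node IP answers on port 6443, the VIP itself (kube-vip) is the likely culprit rather than routing.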

Calico does not start

Symptoms

Pods in the calico-system namespace are not running.

Possible causes

  • Wrong pod CIDR
  • Invalid manifest patch
  • CRS label mismatch

Checks

Bash
# On the management cluster
kubectl get clusterresourcesets -A
kubectl get configmap calico-manifest
# On the workload cluster
kubectl get pods -n calico-system
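
A CRS label mismatch means the ClusterResourceSet selector asks for a label the Cluster object does not carry. The sketch below compares the two; missing_labels is a hypothetical helper that takes newline-separated key=value lists, and the kubectl calls in the usage comment read the selector from .spec.clusterSelector.matchLabels and the labels from .metadata.labels.

```bash
# missing_labels SELECTOR LABELS: both arguments are newline-separated key=value lists.
# Prints every selector entry that is absent from the cluster labels.
missing_labels() {
  printf '%s\n' "$1" | while IFS= read -r entry; do
    [ -n "$entry" ] || continue
    printf '%s\n' "$2" | grep -qxF -e "$entry" || printf '%s\n' "$entry"
  done
}

# Example usage on the management cluster:
#   selector="$(kubectl get clusterresourceset <crs-name> \
#     -o go-template='{{range $k, $v := .spec.clusterSelector.matchLabels}}{{$k}}={{$v}}{{"\n"}}{{end}}')"
#   labels="$(kubectl get cluster <cluster-name> \
#     -o go-template='{{range $k, $v := .metadata.labels}}{{$k}}={{$v}}{{"\n"}}{{end}}')"
#   missing_labels "$selector" "$labels"
# Pod CIDR configured on the Cluster object, to compare with the Calico manifest:
#   kubectl get cluster <cluster-name> -o jsonpath='{.spec.clusterNetwork.pods.cidrBlocks}'
```

Any line printed is a label the cluster needs before the ClusterResourceSet will apply the Calico manifest to it.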

vSphere CPI ImagePullBackOff

Symptoms

vSphere CPI pods are stuck in ImagePullBackOff.

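Cause

The CPI image version is incorrect: the tag does not exist in the registry or does not correspond to the cluster's Kubernetes version.

Fix

Ensure the CPI image version matches the cluster's Kubernetes minor version.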

Example:

registry.k8s.io/cloud-pv-vsphere/cloud-provider-vsphere:v1.30.0
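
To compare the two versions mechanically, reduce both to major.minor and test for equality. This is a sketch with hypothetical helpers minor_of and same_minor; it assumes version strings of the form v1.30.2.

```bash
# minor_of VERSION: reduce a version such as v1.30.2 to its major.minor part (1.30).
minor_of() {
  printf '%s\n' "${1#v}" | cut -d. -f1,2
}

# same_minor A B: succeed when both versions share the same major.minor.
same_minor() {
  [ "$(minor_of "$1")" = "$(minor_of "$2")" ]
}

# Example usage on the management cluster:
#   k8s_version="$(kubectl get kubeadmcontrolplane <name> -o jsonpath='{.spec.version}')"
#   same_minor "$k8s_version" v1.30.0 && echo "CPI matches" || echo "CPI mismatch"
```

The second argument is the tag from the CPI image reference in your manifest.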


Cluster deletion stuck

Symptoms

Cluster resources remain after deletion.

Possible causes

  • Finalizers
  • Orphaned VMs

Fix

Remove finalizers carefully:

Bash
kubectl patch vspheremachine <name> -n <namespace> --type merge -p '{"metadata":{"finalizers":[]}}'

Warning

Remove finalizers only when normal deletion fails. Doing so skips the controller's cleanup, so VMs can be left orphaned in vCenter.
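
Before patching anything, it is worth confirming that the object is actually marked for deletion and seeing which finalizers are holding it. A minimal sketch, with show_deletion_state as a hypothetical helper:

```bash
# show_deletion_state KIND NAME NAMESPACE: print the deletion timestamp and the
# finalizers of an object, one per line, so you can confirm it is stuck.
show_deletion_state() {
  kubectl get "$1" "$2" -n "$3" \
    -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}'
}

# Example usage:
#   show_deletion_state vspheremachine <name> <namespace>
# To look for VMs left behind in vCenter, assuming VM names start with the cluster name:
#   govc find / -type m -name '<cluster-name>*'
```

An empty first line means no deletion has been requested yet, in which case removing finalizers is not the right step.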