proxmox delete and recreate cluster

posted on 2016-12-21 22:48

In case you have the questionable idea of renaming a hypervisor of your proxmox cluster, you are going to feel some pain. (It won't work and you will be scared about whether you fucked your system landscape up or not. Been there, done that.)

The only viable and reproducible approach I found was removing all cluster configurations from all HVs, rebooting them, then recreating the cluster on one HV and re-adding all the others.

read this, or continue at your own peril

To sum it up, some notes before you start:

  • tested with proxmox 4.4
  • do all the removal steps below on all hosts
  • rebooting all HVs afterwards is necessary. maybe not strictly, but at least the first node had to be rebooted
  • the sqlite output, if any is shown, should only appear once (for the select before the delete, not for the one after it)
  • working ssh between your HVs is necessary, errors or warnings will prevent you from re-adding nodes after the cluster recreation
  • backup /etc/pve before doing anything, as you will lose all your vm configurations in the process, but these can be copied back afterwards.
  • no guarantee that this post will cover everything
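
The backup from the notes can be made a bit safer by writing it to a timestamped directory and refusing to accept an empty copy. This is only a minimal sketch: the function name backup_pve and the pve-backup-... directory naming are my own assumptions, nothing proxmox ships.

```shell
#!/bin/sh
# Hypothetical helper: copy a config directory (default /etc/pve) into a
# timestamped folder below a target directory (default /root).
backup_pve() {
    src="${1:-/etc/pve}"
    dest_root="${2:-/root}"
    dest="$dest_root/pve-backup-$(date +%Y%m%d-%H%M%S)"

    [ -d "$src" ] || { echo "source $src not found" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # -a keeps permissions and timestamps; "$src/." also copies dotfiles
    cp -a "$src/." "$dest/" || return 1

    # refuse to call an empty copy a backup
    if [ -z "$(ls -A "$dest")" ]; then
        echo "backup $dest is empty" >&2
        return 1
    fi
    # print the backup location so it can be reused later
    echo "$dest"
}
```

Called without arguments it backs up /etc/pve below /root and prints the directory it created, e.g. BACKUP=$(backup_pve).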

howto remove all clusterconfigs

Let's go: (this is completely paste-able)

# backup
cp -va /etc/pve /root

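# stop everything that uses the cluster filesystem
systemctl stop pvestatd.service
systemctl stop pvedaemon.service
systemctl stop pve-cluster.service
systemctl stop corosync
# start the cluster filesystem in local mode, so /etc/pve is writable without a cluster
pmxcfs -l
# remove the corosync configuration and state
rm /etc/pve/corosync.conf
rm /etc/corosync/*
rm /var/lib/corosync/*
# this also removes the vm configurations, hence the backup above
rm -rf /etc/pve/nodes/*
# show, delete and re-check the corosync.conf entry in the database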
sqlite3 /var/lib/pve-cluster/config.db "select * from tree where name='corosync.conf'"
sqlite3 /var/lib/pve-cluster/config.db "delete from tree where name='corosync.conf'"
sqlite3 /var/lib/pve-cluster/config.db "select * from tree where name='corosync.conf'"
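
Instead of eyeballing the two select outputs, the result of the delete can also be checked by counting the remaining rows. A small sketch, assuming sqlite3 is installed and the database path is the one used above; the function names corosync_rows and corosync_removed are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print how many rows named corosync.conf are left in
# the database (default path as used in the commands above).
corosync_rows() {
    db="${1:-/var/lib/pve-cluster/config.db}"
    sqlite3 "$db" "select count(*) from tree where name='corosync.conf'"
}

# Succeeds only if no corosync.conf row is left after the delete.
corosync_removed() {
    [ "$(corosync_rows "$1")" = "0" ]
}
```

Usage could look like: corosync_removed || echo "corosync.conf is still in config.db"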

Check for error messages, then reboot the HV.

Recreate the cluster on the first HV: (or whichever one you see fit)

pvecm create CLUSTER-NAME

Then re-add all other HVs to your newly created cluster. From each of them, do:

# test ssh
ssh IP-OF-FIRST-HV

# if that works, add the node, else see below how to troubleshoot
pvecm add IP-OF-FIRST-HV
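
The ssh test can be made stricter by running ssh in batch mode, where it fails instead of asking for a password or for confirmation of an unknown host key. This is a sketch under assumptions: check_ssh and the SSH_CMD override are names I made up, and a clean exit code does not prove that ssh printed no warning at all.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a non-interactive root login works.
# SSH_CMD can be overridden (e.g. for testing); it defaults to plain ssh.
check_ssh() {
    host="$1"
    [ -n "$host" ] || { echo "usage: check_ssh HOST" >&2; return 2; }
    # BatchMode=yes makes ssh fail instead of prompting for a password or
    # asking to confirm a host key, which is exactly what we want to detect.
    ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true
}
```

Combined with the add step this becomes: check_ssh IP-OF-FIRST-HV && pvecm add IP-OF-FIRST-HV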

troubleshooting SSH issues

Adding nodes works best with key-based authentication. (I don't know whether I ever tried it without, to be honest, but I doubt it works.) In case you have reinstalled a node or something, try connecting via ssh from the host in question to your 'first' HV.

Read the error message closely, as known hosts are stored in /etc/ssh/ssh_known_hosts, not ~/.ssh/known_hosts:

# in case you have trouble on a certain host
> /root/.ssh/known_hosts
> /etc/ssh/ssh_known_hosts
ssh-copy-id FIRST_HV

As said before, ssh errors or warnings won't let you add nodes to a cluster.

browser not working

Once you have completed the stuff above, close all browser tabs you had opened to access your cluster. Simply refreshing them does not seem to work.

finishing touches (fix your vms before you become stressed out)

When looking at the webgui, you might become scared, as all your virtual hosts seem to be missing. This happens with VMs, but I guess the same happens with containers, too.

In fact, we worked on the proxmox cluster filesystem, where proxmox stores a lot of its settings and which gets mounted at /etc/pve afterwards. Its content happens to be stored completely in /var/lib/pve-cluster/config.db, a sqlite3 database.

That database holds all file contents (the actual characters that get written into the config files), the inode of each file that shall be created, the folder structure, and so on.

Once your cluster is running, try diff / colordiff to spot the exact differences. (I.e. colordiff -r /root/pve /etc/pve to see the file contents; without -r the subfolders are not compared.) Or a simple find /root/pve -iname "*conf" might also do.

Copy the configs back to their original locations, and everything should be fine.
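
Copying the configs back can be scripted. The sketch below assumes the backup has the same layout as /etc/pve (nodes/NODENAME/qemu-server for VMs, nodes/NODENAME/lxc for containers) and that the node may have a new name after the rename. The function name restore_guest_configs and its four arguments are hypothetical, and existing files are never overwritten.

```shell
#!/bin/sh
# Hypothetical helper: copy guest configs from a backup of /etc/pve back
# into the live tree. All four arguments are assumptions for illustration:
#   $1 backup dir (e.g. /root/pve)   $2 node name inside the backup
#   $3 live dir   (e.g. /etc/pve)    $4 node name used by the new cluster
restore_guest_configs() {
    backup="$1"; old_node="$2"; live="$3"; new_node="$4"
    # qemu-server holds VM configs, lxc holds container configs
    for kind in qemu-server lxc; do
        src="$backup/nodes/$old_node/$kind"
        dst="$live/nodes/$new_node/$kind"
        [ -d "$src" ] || continue
        mkdir -p "$dst" || return 1
        for conf in "$src"/*.conf; do
            [ -e "$conf" ] || continue    # no match: the glob stays literal
            target="$dst/$(basename "$conf")"
            if [ -e "$target" ]; then
                echo "skip (exists): $target" >&2
            else
                cp "$conf" "$target" && echo "restored: $target"
            fi
        done
    done
}
```

For a node that was renamed from OLDNAME to NEWNAME (placeholders), a call would be: restore_guest_configs /root/pve OLDNAME /etc/pve NEWNAME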

proxmox: manual cluster migration

posted on 2015-07-13 18:10:53

  1. change drbd on current node to 'active', if needed
  2. service pve-cluster stop
  3. pmxcfs -l
  4. start vm again (qm start <vm-id>)

To rebuild the cluster again:

  • unmount /etc/pve (umount /etc/pve)
  • service pve-cluster start

