Cleanup
Cleaning up a Cluster
If you want to tear down the cluster and bring up a new one, be aware of the following resources that will need to be cleaned up:
- rook-ceph namespace: The Rook operator and cluster created by operator.yaml and cluster.yaml (the cluster CRD)
- /var/lib/rook: Path on each host in the cluster where configuration is cached by the ceph mons and osds
Note that if you changed the default namespaces or paths such as dataDirHostPath in the sample yaml files, you will need to adjust these namespaces and paths throughout these instructions.
If you see issues tearing down the cluster, see the Troubleshooting section below.
If you are tearing down a cluster frequently for development purposes, it is instead recommended to use an environment such as Minikube that can easily be reset without worrying about any of these steps.
Delete the Block and File artifacts
First you will need to clean up the resources created on top of the Rook cluster.
These commands will clean up the resources from the block and file walkthroughs (unmount volumes, delete volume claims, etc). If you did not complete those parts of the walkthrough, you can skip these instructions:
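A minimal sketch of this step, assuming the resource names used by the Rook sample manifests. The manifest paths (wordpress.yaml, mysql.yaml, kube-registry.yaml), the pool name replicapool, and the storage class names are assumptions, and the function name is hypothetical. Substitute whatever you actually created.

```shell
# Deletes the sample block and file consumers, then the pool and storage classes.
# All file and resource names below are assumed sample names, not fixed values.
delete_block_and_file_artifacts() {
  ns="${1:-rook-ceph}"
  # Sample applications that mount Rook block volumes (assumed manifest paths).
  kubectl delete -f ../wordpress.yaml
  kubectl delete -f ../mysql.yaml
  # Block pool and storage class from the block walkthrough (assumed names).
  kubectl delete -n "$ns" cephblockpool replicapool
  kubectl delete storageclass rook-ceph-block
  # Consumer and storage class from the file walkthrough (assumed names).
  kubectl delete -f csi/cephfs/kube-registry.yaml
  kubectl delete storageclass csi-cephfs
}
```

Call it as `delete_block_and_file_artifacts` or pass a different namespace as the first argument.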
After those block and file resources have been cleaned up, you can then delete your Rook cluster. It is important to delete the cluster before removing the Rook operator and agent; otherwise resources may not be cleaned up properly.
Delete the CephCluster CRD
Edit the CephCluster and add the cleanupPolicy.
WARNING: DATA WILL BE PERMANENTLY DELETED AFTER DELETING THE CephCluster CR WITH cleanupPolicy.
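One way to add the cleanupPolicy is a merge patch on the CR. This is a sketch under assumptions: the cluster is named rook-ceph in the rook-ceph namespace, and the confirmation value is the one Rook expects for this field. The function name is hypothetical.

```shell
# Adds spec.cleanupPolicy.confirmation to the CephCluster CR via a merge patch.
# Namespace, cluster name, and the confirmation string are assumptions.
enable_cleanup_policy() {
  ns="${1:-rook-ceph}"
  cluster="${2:-rook-ceph}"
  kubectl -n "$ns" patch cephcluster "$cluster" --type merge \
    -p '{"spec":{"cleanupPolicy":{"confirmation":"yes-really-destroy-data"}}}'
}
```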
Once the cleanup policy is enabled, any new configuration changes in the CephCluster will be blocked. Nothing will happen until the deletion of the CR is requested, so this cleanupPolicy change can still be reverted if needed.
For more details, see the cleanupPolicy section of the CephCluster CRD documentation.
Delete the CephCluster CR.
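A minimal sketch of the delete, assuming the CR is named rook-ceph in the rook-ceph namespace (both are assumptions, and the function name is hypothetical):

```shell
# Requests deletion of the CephCluster CR.
# Arguments: namespace (default rook-ceph), cluster name (default rook-ceph).
delete_cephcluster() {
  kubectl -n "${1:-rook-ceph}" delete cephcluster "${2:-rook-ceph}"
}
```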
Verify that the cluster CR has been deleted before continuing to the next step.
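The check can be scripted by listing the remaining CephCluster objects and treating empty output as "deleted". This is only a sketch; the function name and its exit codes are hypothetical conventions, and the default namespace is an assumption.

```shell
# Exit status: 0 if no CephCluster remains in the namespace,
#              1 if at least one still exists,
#              2 if kubectl itself failed (result unknown).
cephcluster_is_gone() {
  remaining="$(kubectl -n "${1:-rook-ceph}" get cephcluster -o name)" || return 2
  [ -z "$remaining" ]
}
```

For example, `until cephcluster_is_gone; do sleep 5; done` would poll until the CR disappears, although a loop like that also keeps retrying when kubectl fails.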
If the cleanupPolicy was applied, then wait for the rook-ceph-cleanup jobs to be completed on all the nodes. These jobs will perform the following operations:
- Delete the directory /var/lib/rook (or the path specified by the dataDirHostPath) on all the nodes
- Wipe the data on the drives on all the nodes where OSDs were running in this cluster
Note: The cleanup jobs might not start if the resources created on top of the Rook cluster are not deleted completely. See the section on deleting the block and file artifacts above.
Delete the Operator and related Resources
The following commands begin the cleanup of the Rook Ceph operator and all other resources, including related resources such as the agent and discover daemonsets:
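A sketch of this step, assuming the operator was installed from the sample manifests. operator.yaml is named earlier on this page; common.yaml is an assumed companion manifest holding the namespace, RBAC, and CRDs, and may be named differently in your release. The function name is hypothetical.

```shell
# Deletes the operator first, then the shared resources it depended on.
# Run from the directory that holds the manifests used at install time.
delete_rook_operator() {
  kubectl delete -f operator.yaml
  kubectl delete -f common.yaml   # assumed manifest name
}
```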
If the cleanupPolicy was applied and the cleanup jobs have completed on all the nodes, then the cluster tear down has been successful. If you skipped adding the cleanupPolicy, then follow the manual steps mentioned below to tear down the cluster.
Delete the data on hosts
Attention
The final cleanup step requires deleting files on each host in the cluster. All files under the dataDirHostPath property specified in the cluster CRD will need to be deleted. Otherwise, inconsistent state will remain when a new cluster is started.
Connect to each machine and delete /var/lib/rook, or the path specified by the dataDirHostPath.
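On each host this amounts to a recursive delete. The sketch below wraps it in a hypothetical helper that defaults to /var/lib/rook and refuses to operate on the filesystem root; pass your own dataDirHostPath if you changed it.

```shell
# Removes the Rook data directory on the local host.
# Argument: directory to delete (default /var/lib/rook).
wipe_rook_data_dir() {
  dir="${1:-/var/lib/rook}"
  # Guard against an accidental delete of the root filesystem.
  if [ "$dir" = "/" ]; then
    echo "refusing to delete $dir" >&2
    return 1
  fi
  rm -rf -- "$dir"
}
```

This has to be run with sufficient privileges on every node that hosted Rook daemons.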
In the future this step will not be necessary when we build on the K8s local storage feature.
If you modified the demo settings, additional cleanup is up to you for devices, host paths, etc.
Zapping Devices
Disks on nodes used by Rook for osds can be reset to a usable state with the methods suggested below. Note that these scripts are not one-size-fits-all. Please use them with discretion to ensure you are not removing data unrelated to Rook and/or Ceph.
Disks can be zapped fairly easily. A single disk can usually be cleared with some or all of the steps below.
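A sketch of one such sequence for a single disk, using sgdisk, dd, blkdiscard, and partprobe. The function name and the "hdd"/"ssd" selector are hypothetical, and the amount of data overwritten by dd (100 MiB) is an arbitrary choice. Everything on the given device is destroyed.

```shell
# Usage: zap_disk /dev/sdX [hdd|ssd]
zap_disk() {
  disk="${1:-}"
  kind="${2:-hdd}"   # hypothetical selector: "hdd" (default) or "ssd"
  if [ ! -b "$disk" ]; then
    echo "not a block device: $disk" >&2
    return 1
  fi
  # Destroy the GPT and MBR partition structures.
  sgdisk --zap-all "$disk"
  if [ "$kind" = "ssd" ]; then
    # Discard all blocks on devices that support it.
    blkdiscard "$disk"
  else
    # Overwrite the first 100 MiB, bypassing the page cache.
    dd if=/dev/zero of="$disk" bs=1M count=100 oflag=direct,dsync
  fi
  # Ask the kernel to re-read the now empty partition table.
  partprobe "$disk"
}
```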
Ceph can leave LVM and device mapper data that can lock the disks, preventing the disks from being used again. These steps can help to free up old Ceph disks for re-use. Note that this only needs to be run once on each node and assumes that all Ceph disks are being wiped. If only some disks are being wiped, you will have to manually determine which disks map to which device mapper devices.
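A sketch of that per-node cleanup, assuming the leftover mappings follow the ceph-* naming under /dev/mapper and that every Ceph disk on the node is being wiped. The function name and the optional device-root argument (useful for a dry run against a scratch directory) are hypothetical.

```shell
# Usage: remove_ceph_device_mappings [dev_root]   (dev_root defaults to /dev)
remove_ceph_device_mappings() {
  dev_root="${1:-/dev}"
  # Remove device mapper entries that keep the disks locked.
  for dev in "$dev_root"/mapper/ceph-*; do
    [ -e "$dev" ] || continue   # the glob matched nothing
    dmsetup remove "$dev"
  done
  # Remove leftover ceph-<UUID> entries that would otherwise clutter /dev.
  rm -rf "$dev_root"/ceph-* "$dev_root"/mapper/ceph--*
}
```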
If disks are still reported locked, rebooting the node often helps clear LVM-related holds on disks.
Troubleshooting
If the cleanup instructions are not executed in the order above, or you otherwise have difficulty cleaning up the cluster, here are a few things to try.
The most common issue cleaning up the cluster is that the rook-ceph namespace or the cluster CRD remain indefinitely in the terminating state. A namespace cannot be removed until all of its resources are removed, so look at which resources are pending termination.
Look at the pods:
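For instance, assuming the default rook-ceph namespace (the function name is hypothetical):

```shell
# Lists the pods that still exist in the Rook namespace.
list_rook_pods() {
  kubectl -n "${1:-rook-ceph}" get pod
}
```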
If a pod is still terminating, you will need to wait or else attempt to forcefully terminate it (kubectl delete pod <name>).
Now look at the cluster CRD:
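A sketch that prints, for each CephCluster, whether deletion has been requested and which finalizers are still attached. The column layout and function name are illustrative choices, and the default namespace is an assumption.

```shell
# Shows name, deletion timestamp, and remaining finalizers of each CephCluster.
show_cephcluster_deletion_state() {
  kubectl -n "${1:-rook-ceph}" get cephcluster \
    -o custom-columns='NAME:.metadata.name,DELETION:.metadata.deletionTimestamp,FINALIZERS:.metadata.finalizers'
}
```

A row with a deletion timestamp and a non-empty finalizer list indicates that the delete was accepted but is waiting on a finalizer.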
If the cluster CRD still exists even though you have executed the delete command earlier, see the next section on removing the finalizer.
Removing the Cluster CRD Finalizer
When a Cluster CRD is created, a finalizer is added automatically by the Rook operator. The finalizer will allow the operator to ensure that before the cluster CRD is deleted, all block and file mounts will be cleaned up. Without proper cleanup, pods consuming the storage will be hung indefinitely until a system reboot.
The operator is responsible for removing the finalizer after the mounts have been cleaned up. If for some reason the operator is not able to remove the finalizer (e.g., the operator is not running anymore), you can delete the finalizer manually with the following command:
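One possible form of such a command is sketched below. It walks every CRD in the ceph.rook.io API group and clears the finalizer list on each custom resource found in the namespace. The function name is hypothetical, and matching on the group name is an assumption about how the CRDs are named.

```shell
# Clears metadata.finalizers on every ceph.rook.io custom resource in a namespace.
remove_rook_finalizers() {
  ns="${1:-rook-ceph}"
  kubectl get crd -o name | grep 'ceph\.rook\.io' | while read -r crd; do
    # Strip the "customresourcedefinition.apiextensions.k8s.io/" prefix.
    kind="${crd#*/}"
    kubectl -n "$ns" get "$kind" -o name | while read -r res; do
      # An empty list in a merge patch removes all finalizers from the object.
      kubectl -n "$ns" patch "$res" --type merge -p '{"metadata":{"finalizers":[]}}'
    done
  done
}
```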
This command will patch the following CRDs on v1.3:
Within a few seconds you should see that the cluster CRD has been deleted and will no longer block other cleanup such as deleting the rook-ceph namespace.
If the namespace is still stuck in the Terminating state, check which resources are holding up the deletion, then remove their finalizers and delete them.
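A sketch for finding what is left: enumerate every namespaced resource type that supports listing, then query each one in the namespace. The function name is hypothetical and the default namespace is an assumption.

```shell
# Prints every object that still exists in the namespace, one type at a time.
list_remaining_resources() {
  ns="${1:-rook-ceph}"
  kubectl api-resources --verbs=list --namespaced -o name | while read -r kind; do
    kubectl -n "$ns" get "$kind" --show-kind --ignore-not-found
  done
}
```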
Remove critical resource finalizers
Rook adds a finalizer ceph.rook.io/disaster-protection to resources critical to the Ceph cluster so that the resources will not be accidentally deleted.
The operator is responsible for removing the finalizers when a CephCluster is deleted. If for some reason the operator is not able to remove the finalizers (e.g., the operator is not running anymore), you can remove the finalizers manually with the following commands:
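A sketch of such commands, assuming the protected resources are the rook-ceph-mon-endpoints ConfigMap and the rook-ceph-mon Secret. Those two names are assumptions, as is the function name; confirm which objects carry the finalizer in your cluster before patching.

```shell
# Clears the finalizer list on the resources assumed to be protected by Rook.
remove_critical_finalizers() {
  ns="${1:-rook-ceph}"
  kubectl -n "$ns" patch configmap rook-ceph-mon-endpoints --type merge \
    -p '{"metadata":{"finalizers":[]}}'
  kubectl -n "$ns" patch secret rook-ceph-mon --type merge \
    -p '{"metadata":{"finalizers":[]}}'
}
```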