(NOTE: "Terminating" state is deprecated in versions > v3.0.0)
A drive with Terminating state indicates that an InUse drive was physically removed from the system and the volumes on that drive are unreachable. If a new drive is attached as a replacement for the removed drive, it is treated as a new drive; follow the steps mentioned here to make use of the new drive.
```sh
$ kubectl directpv drives list --status="terminating"
 DRIVE      CAPACITY  ALLOCATED  FILESYSTEM  VOLUMES  NODE        ACCESS-TIER  STATUS
 /dev/xvdb  8.0 GiB   1.0 GiB    xfs         2        directpv-2  -            Terminating
```
In such cases, the corresponding volumes will be indicated as follows:
```sh
$ kubectl directpv volumes list --drives /dev/xvdb --nodes directpv-2 --all
 VOLUME                                     CAPACITY  NODE        DRIVE  PODNAME  PODNAMESPACE
 pvc-ea019d52-673b-4715-a8a4-a913dd49166d  512 MiB   directpv-2  xvdb   minio-2  default       *[DRIVE LOST] Please refer https://github.com/minio/directpv/blob/master/docs/troubleshooting.md
 pvc-c973cc37-01bd-4e67-abea-b1905f19fc17  512 MiB   directpv-2  xvdb   minio-2  default       *[DRIVE LOST] Please refer https://github.com/minio/directpv/blob/master/docs/troubleshooting.md
```
To clean up the abandoned volumes and reschedule them, the respective PVCs have to be deleted.
```sh
$ kubectl delete pvc minio-data-3-minio-2 minio-data-1-minio-2
persistentvolumeclaim "minio-data-3-minio-2" deleted
persistentvolumeclaim "minio-data-1-minio-2" deleted
```
The deleted PVCs will be re-created and will provision volumes on the remaining "Ready" or "InUse" drives based on the requested topology specifications.
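For a quick sanity check after the PVCs are re-created, the bindings and the new volume placement can be inspected with standard kubectl and directpv listings. This is only a sketch reusing the PVC and node names from the example above:

```sh
# Confirm the re-created PVCs are Bound again
kubectl get pvc minio-data-3-minio-2 minio-data-1-minio-2

# Confirm the new volumes landed on the remaining Ready/InUse drives of the node
kubectl directpv volumes list --nodes directpv-2
```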
The `kubectl directpv volumes purge` command can be used to purge the lost, failed, or released volumes in the cluster. This command should be used only in special cases such as the following:
- When the pods and the corresponding PVCs were force-deleted. Force deletion might skip a few necessary volume cleanups and leave them stale.
- When the corresponding drive is removed or detached from the cluster. `kubectl directpv volumes list` will indicate such lost volumes with an error tag.
- When the volumes were deleted while the directpv pod running on that node was down.
- etc.
Please check `kubectl directpv volumes purge --help` for more details.
(NOTE: The PVs of these stale volumes should be in "released" or "failed" state in order to purge them)
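As a pre-check before purging, the phase of the backing PVs can be confirmed with plain kubectl. The snippet below is a sketch and assumes the default `kubectl get pv` column layout:

```sh
# PVs backing the stale volumes must be in the "Released" or "Failed" phase
kubectl get pv --no-headers | awk '$5 == "Released" || $5 == "Failed" {print $1, $5, $6}'

# Review the supported filters, then purge the stale volumes
kubectl directpv volumes purge --help
```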
After v3.0.0, removed or detached InUse drives will show up in the drives list with an error message indicating that the drive is lost. In such cases, the corresponding volumes need to be purged first, i.e. the corresponding PVCs have to be cleaned up before the drive can be purged.
(NOTE: before deleting the lost PVCs, please cordon the node to avoid any PVC conflicts)
Here is a step-by-step example procedure for handling a drive replacement.
STEP 1: After detaching the InUse drive and replacing it with a fresh drive, the new drive will show up in the list as "Available"
```sh
[root@control-plane ~] kubectl directpv drives list
 DRIVE     CAPACITY  ALLOCATED  FILESYSTEM  VOLUMES  NODE                             ACCESS-TIER  STATUS
 /dev/vdb  512 MiB   83 MiB     xfs         8        control-plane.minikube.internal  -            InUse
 /dev/vdc  512 MiB   -          xfs         8        control-plane.minikube.internal  -            InUse*     drive is lost or corrupted
 /dev/vdd  512 MiB   -          -           -        control-plane.minikube.internal  -            Available
```
Here, the drive `/dev/vdc` was detached and the new drive `/dev/vdd` was attached to the node.
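Optionally, the device change can also be confirmed on the node itself. This is a generic host-side sketch using the device names from this example:

```sh
# On the node: the detached /dev/vdc should no longer be listed,
# while the replacement /dev/vdd should show up without a filesystem yet
lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT
```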
STEP 2: Format the newly attached drive to make it "Ready" for workloads to utilize it
```sh
[root@control-plane ~] kubectl directpv drives format --drives /dev/vdd --nodes control-plane.minikube.internal
[root@control-plane ~] kubectl directpv drives list
 DRIVE     CAPACITY  ALLOCATED  FILESYSTEM  VOLUMES  NODE                             ACCESS-TIER  STATUS
 /dev/vdb  512 MiB   83 MiB     xfs         8        control-plane.minikube.internal  -            InUse
 /dev/vdc  512 MiB   -          xfs         8        control-plane.minikube.internal  -            InUse*  drive is lost or corrupted
 /dev/vdd  512 MiB   -          xfs         -        control-plane.minikube.internal  -            Ready
```
STEP 3: Cordon the node to stop Kubernetes from scheduling any workloads during the maintenance
```sh
[root@control-plane ~] kubectl cordon control-plane.minikube.internal
node/control-plane.minikube.internal cordoned
```
STEP 4: Check the lost volumes from the detached drive
```sh
[root@control-plane ~] kubectl directpv volumes list --all --drives /dev/vdc --nodes control-plane.minikube.internal --pvc
 VOLUME                                     CAPACITY  NODE                             DRIVE  PODNAME  PODNAMESPACE               PVC
 pvc-2b261763-bc5b-4a84-9d6d-33588e008dee  10 MiB    control-plane.minikube.internal  vdc    minio-2  default       *Drive Lost  minio-data-2-minio-2
 pvc-3bc3302a-2954-4f43-88f4-a081f68f9818  10 MiB    control-plane.minikube.internal  vdc    minio-3  default       *Drive Lost  minio-data-3-minio-3
 pvc-9a131013-3501-4540-bd10-7fa1a0f81bbf  10 MiB    control-plane.minikube.internal  vdc    minio-1  default       *Drive Lost  minio-data-3-minio-1
 pvc-a443e955-a77a-4b06-9649-49d7d17504cd  10 MiB    control-plane.minikube.internal  vdc    minio-3  default       *Drive Lost  minio-data-2-minio-3
 pvc-b80ea783-ba2a-44b9-8a16-77fe1f7a8537  10 MiB    control-plane.minikube.internal  vdc    minio-1  default       *Drive Lost  minio-data-2-minio-1
 pvc-b8b13ce8-10db-4274-908e-3480249a05e5  10 MiB    control-plane.minikube.internal  vdc    minio-0  default       *Drive Lost  minio-data-3-minio-0
 pvc-ba0b1c56-14cb-4a58-ba1c-4a2b9427ca18  10 MiB    control-plane.minikube.internal  vdc    minio-0  default       *Drive Lost  minio-data-1-minio-0
 pvc-e4f15027-0bc2-44c4-a271-b54141ebc42a  10 MiB    control-plane.minikube.internal  vdc    minio-2  default       *Drive Lost  minio-data-3-minio-2
```
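If you want to cross-check from the Kubernetes side, the affected PVs and PVCs can be inspected with plain kubectl. This is a sketch using one volume and claim name from the listing above:

```sh
# Inspect one of the affected PVs and its claim before deleting anything
kubectl get pv pvc-2b261763-bc5b-4a84-9d6d-33588e008dee
kubectl get pvc minio-data-2-minio-2 -n default
```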
STEP 5: Delete the corresponding pods and PVCs of the lost volumes
```sh
[root@control-plane ~] kubectl delete pods minio-0 minio-1 minio-2 minio-3 -n default
pod "minio-0" deleted
pod "minio-1" deleted
pod "minio-2" deleted
pod "minio-3" deleted
[root@control-plane ~]
```
Verify the PVCs to be deleted:
```sh
[root@control-plane ~] kubectl directpv volumes list --lost --drives /dev/vdc --nodes control-plane.minikube.internal --pvc | awk '{print $10}' | paste -s -d " " -
minio-data-2-minio-2 minio-data-3-minio-3 minio-data-3-minio-1 minio-data-2-minio-3 minio-data-2-minio-1 minio-data-3-minio-0 minio-data-1-minio-0 minio-data-3-minio-2
```
You can use the following one-liner to delete the lost PVCs:
```sh
[root@control-plane ~] kubectl directpv volumes list --lost --drives /dev/vdc --nodes control-plane.minikube.internal --pvc | awk '{print $10}' | paste -s -d " " - | xargs kubectl delete pvc
persistentvolumeclaim "minio-data-2-minio-2" deleted
persistentvolumeclaim "minio-data-3-minio-3" deleted
persistentvolumeclaim "minio-data-3-minio-1" deleted
persistentvolumeclaim "minio-data-2-minio-3" deleted
persistentvolumeclaim "minio-data-2-minio-1" deleted
persistentvolumeclaim "minio-data-3-minio-0" deleted
persistentvolumeclaim "minio-data-1-minio-0" deleted
persistentvolumeclaim "minio-data-3-minio-2" deleted
[root@control-plane ~]
```
Wait until the lost volumes are purged successfully:
```sh
[root@control-plane ~] kubectl directpv drives list
 DRIVE     CAPACITY  ALLOCATED  FILESYSTEM  VOLUMES  NODE                             ACCESS-TIER  STATUS
 /dev/vdb  512 MiB   83 MiB     xfs         8        control-plane.minikube.internal  -            InUse
 /dev/vdc  512 MiB   -          xfs         -        control-plane.minikube.internal  -            Ready*  drive is lost or corrupted
 /dev/vdd  512 MiB   -          xfs         -        control-plane.minikube.internal  -            Ready
```
You can now purge the lost drive:
```sh
[root@control-plane ~] kubectl directpv drives purge --drives /dev/vdc --nodes control-plane.minikube.internal
[root@control-plane ~] kubectl directpv drives list
 DRIVE     CAPACITY  ALLOCATED  FILESYSTEM  VOLUMES  NODE                             ACCESS-TIER  STATUS
 /dev/vdb  512 MiB   83 MiB     xfs         8        control-plane.minikube.internal  -            InUse
 /dev/vdd  512 MiB   -          xfs         -        control-plane.minikube.internal  -            Ready
```
STEP 6: Uncordon the node to resume scheduling
```sh
[root@control-plane ~] kubectl uncordon control-plane.minikube.internal
node/control-plane.minikube.internal uncordoned
```
STEP 7: New PVCs will be created and volumes will be allocated on the new drive based on the most-free-capacity approach.
(NOTE: here, you might want to restart the "pending" pod(s) once if there is a pod-PVC race conflict)
```sh
[root@control-plane ~] kubectl get pods -o wide
NAME     READY  STATUS   RESTARTS  AGE    IP          NODE                             NOMINATED NODE  READINESS GATES
minio-0  1/1    Running  0         2m25s  172.17.0.7  control-plane.minikube.internal  <none>          <none>
minio-1  1/1    Running  0         2m17s  172.17.0.6  control-plane.minikube.internal  <none>          <none>
minio-2  1/1    Running  0         2m8s   172.17.0.8  control-plane.minikube.internal  <none>          <none>
minio-3  1/1    Running  0         117s   172.17.0.9  control-plane.minikube.internal  <none>          <none>

[root@control-plane ~] kubectl directpv drives list
 DRIVE     CAPACITY  ALLOCATED  FILESYSTEM  VOLUMES  NODE                             ACCESS-TIER  STATUS
 /dev/vdb  512 MiB   83 MiB     xfs         8        control-plane.minikube.internal  -            InUse
 /dev/vdd  512 MiB   83 MiB     xfs         8        control-plane.minikube.internal  -            InUse
```
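If any pods are stuck in "Pending" because of the pod-PVC race mentioned in the note above, the following sketch finds and restarts them (the StatefulSet controller re-creates the deleted pods):

```sh
# Find pods stuck in Pending and restart them so they bind to the new PVCs
kubectl get pods -n default --field-selector=status.phase=Pending
kubectl delete pods -n default --field-selector=status.phase=Pending
```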
If the device FS attributes are not updated in the `/run/udev/data/b<maj>:<min>` file by the udev service, the following warnings will show up in the directpv pods in the `direct-csi-min-io` namespace.
```
W0615 11:17:08.484072   19851 utils.go:130] [name] ID_FS_TYPE not found in /run/udev/data/b200:2. Please refer https://github.com/minio/directpv/blob/master/docs/troubleshooting.md#troubleshooting
W0615 11:17:08.484123   19851 utils.go:139] [name] ID_FS_UUID not found in /run/udev/data/b200:2. Please refer https://github.com/minio/directpv/blob/master/docs/troubleshooting.md#troubleshooting
```
The following command triggers the udev service to sync the attribute values in `/run/udev/data/b<maj>:<min>`:
```sh
sudo udevadm control --reload-rules && sudo udevadm trigger
```
(NOTE: also verify that the systemd-udevd service is running on the host)
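A quick host-side sketch to verify both points; `<maj>:<min>` stands for the device's major:minor numbers (for example from `lsblk -o NAME,MAJ:MIN`):

```sh
# Verify that the udev service is running on the host
systemctl status systemd-udevd

# After the reload/trigger, the FS attributes should be present in the udev data file
grep -E 'ID_FS_TYPE|ID_FS_UUID' /run/udev/data/b<maj>:<min>
```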