We need a storage-pool evacuate / lost command

Currently,

  • if we want to remove a storage pool from a node, we must manually migrate resources to other nodes/storage pools,
  • if we want to permanently remove a failed storage pool, we must first create a new temporary storage pool with the same name so that we can migrate/delete the resources, as LINSTOR doesn’t allow changes on unavailable storage pools (a rough sketch of this workaround follows below).

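For reference, the workaround for the second case currently looks roughly like this (node, pool, and resource names are placeholders, and the backend arguments depend on your setup, so take this as a sketch rather than an exact recipe):

```
# Re-create a pool under the same name as the failed one, so LINSTOR can
# operate on its resources again ('lvm' and the VG are just placeholders
# for whatever backend the original pool used).
linstor storage-pool create lvm node-a pool-ssd vg_tmp

# Create a replacement replica elsewhere, then drop the one on the failed pool.
linstor resource create node-b my-res --storage-pool pool-ssd
linstor resource delete node-a my-res

# Once the temporary pool is empty, remove it.
linstor storage-pool delete node-a pool-ssd
```
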
Both of these cases could be significantly improved by introducing storage-pool evacuate / lost subcommands. This is a feature request, but it should not be hard to implement, as the functionality already exists as part of the node evacuate / lost subcommands; it just needs to be targeted at a specific storage pool instead of a whole node. A possible invocation is sketched below.
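
As a rough idea of what I have in mind (purely hypothetical syntax, mirroring the existing node subcommands; none of this exists today):

```
# Hypothetical commands, not implemented in LINSTOR:
linstor storage-pool evacuate node-a pool-ssd   # migrate all resources off this pool
linstor storage-pool lost node-a pool-ssd       # declare the pool unrecoverable and clean up
```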

Hello!

Thank you for your suggestion. However, I can only partially agree that this “should be an easy feature”. Even if it sounds counterintuitive, something like sp evac would be easier to implement than sp lost. Sure, in very simple cases both scenarios would be quite straightforward: just migrate + delete, or drop the resource that lives in the storage pool in question.

However, things get more complicated when a single resource has volumes in multiple storage pools. That can easily happen when using DRBD with external metadata, or Cache, Writecache, or similar layers that need dedicated devices (for caching, in this case), which are usually placed on much faster disks than the large data disk. How should LINSTOR react if one of those storage pools is “lost”? Simply dropping a resource as soon as the storage pool of just one of its disks is lost might be too harsh.
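
To illustrate, a single resource can easily span two pools, for example with external DRBD metadata (a sketch from memory; the property name should be double-checked against the LINSTOR user guide):

```
# Data on a big/slow pool, DRBD metadata on a small/fast pool.
linstor resource-definition create big-res
linstor volume-definition create big-res 500G
# Property name from memory: selects a separate pool for external DRBD metadata.
linstor resource-definition set-property big-res StorPoolNameDrbdMeta pool-nvme
linstor resource create node-a big-res --storage-pool pool-hdd
```

If pool-nvme is now lost, should LINSTOR drop the whole replica on node-a, even though the data in pool-hdd is still intact?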

We tried to implement an sp lost a few years ago but ran into quite difficult scenarios, which is why we postponed the feature (plus nobody had asked for it until now). I will bring this feature up in an internal discussion again and see if we can now figure out a meaningful way to deal with these kinds of situations.

Thanks for letting us know about this feature request!

Thank you for taking the time to reply.

I understand that while the functionality already exists as part of the “node lost” command, the cleanup for a hypothetical “sp lost” could be tricky if multiple layers are used and only one of them is affected by the lost pool.

In my limited view, if a user calls this command, the whole layer stack should be cleaned up. Why else would they call it? If the user just wants to replace a failed disk/pool, they can already do that with the existing procedure of creating a new pool under the same name.

I must add that I’m not familiar with LINSTOR’s failure modes when a multi-layer setup has a failed storage pool, be it for cache or data.