SPDK bdev

Hi

I wonder whether I understand SPDK support in LINSTOR correctly.

SPDK is listed as a storage provider. However, even though I compiled and installed it on the node, it still shows “red”.

What I am looking for is using the “local” SPDK driver to access the NVMe devices
(similar to fio with SPDK_BDEV driver).

Is this possible, or even intended? The SPDK fio example ships an ioengine (ioengine=/opt/spdk/build/fio/spdk_bdev) for testing, which made me think that it might be possible:

# ./scripts/rpc.py bdev_nvme_attach_controller -b NVMe0 -t PCIe -a 0000:03:00.0
NVMe0n1
# ./scripts/rpc.py bdev_get_bdevs
[
  {
    "name": "NVMe0n1",
    "aliases": [
      "00000000-0000-0000-8ce3-8ee225a6ef01"
    ],
    "product_name": "NVMe disk",
    "block_size": 512,
    "num_blocks": 7501476528,
    "uuid": "00000000-0000-0000-8ce3-8ee225a6ef01",
    "numa_id": 0,
...
    }
  }
]

The fio driver can then access the device using:

# cat fio_bdev.json 
{
"subsystems": [
{
"subsystem": "bdev",
"config": [
{
"method": "bdev_nvme_attach_controller",
"params": {
"trtype": "PCIe",
"name":"Nvme0",
"traddr":"0000:03:00.0"

The above is just to illustrate what I am looking for: direct access to the local device rather than using SPDK as an NVMe-oF backend.

Would be great to get some insight.

Thanks!
M.

If you run linstor node info --full, you should get explanations of why some of your providers and/or layers are not supported, for example this line:

  SPDK: IO exception occured when running 'rpc.py spdk_get_version': Cannot run program "rpc.py": error=2, No such file or directory

That means that LINSTOR is looking for the rpc.py executable but could not find it. Since you were referring to ./scripts/rpc.py, I guess you should try to make the rpc.py accessible from anywhere (include it in your PATH or move it to /usr/bin/ or the other usual places) and linstor node reconnect $node_name the satellite again.
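As a quick cross-check of the PATH lookup, here is a minimal Python sketch that reports which file a command name resolves to. The helper name resolve_executable is made up for illustration. Note that a service started by systemd may run with a different PATH than your login shell, so the optional search_path argument lets you test a specific PATH string.

```python
import os
import shutil


def resolve_executable(name, search_path=None):
    """Return the absolute path that `name` resolves to, or None.

    search_path is a PATH-style string; None means the current
    process's PATH is used.
    """
    found = shutil.which(name, path=search_path)
    return os.path.abspath(found) if found else None


if __name__ == "__main__":
    # Prints the resolved location, or None if rpc.py is not on PATH.
    print(resolve_executable("rpc.py"))
```

If this prints None under the PATH the satellite uses, that would match the "error=2, No such file or directory" message.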

Hi

Thanks for clarifying. However, is there a way to get more log output? rpc.py is available in the PATH:

:~# linstor node reconnect node-2
SUCCESS:
    Nodes [node-2] will be reconnected.
SUCCESS:
Description:
    Node 'node-2' authenticated
Details:
    Supported storage providers: [diskless, lvm, lvm_thin, zfs, zfs_thin, file, file_thin, remote_spdk, exos, ebs_init, ebs_target]
    Supported resource layers  : [drbd, luks, nvme, writecache, cache, storage]
    Unsupported storage providers:
        SPDK: 'rpc.py spdk_get_version' returned with exit code 1
        STORAGE_SPACES: This tool does not exist on the Linux platform.
        STORAGE_SPACES_THIN: This tool does not exist on the Linux platform.
    
    Unsupported resource layers:
        BCACHE: IO exception occured when running 'make-bcache -h': Cannot run program "make-bcache": error=2, No such file or directory
:~# ssh node-2 rpc.py spdk_get_version
{
  "version": "SPDK v25.01-pre git sha1 d0bb35429",
  "fields": {
    "major": 25,
    "minor": 1,
    "patch": 0,
    "suffix": "-pre",
    "commit": "d0bb35429"
  }
}
:~# ssh node-2 which rpc.py
/usr/bin/rpc.py

You see the command that LINSTOR tried to execute to determine the version: rpc.py spdk_get_version. Can you run that command manually and see the output?

root@node-1:~# ssh node-2 rpc.py spdk_get_version
{
  "version": "SPDK v25.01-pre git sha1 d0bb35429",
  "fields": {
    "major": 25,
    "minor": 1,
    "patch": 0,
    "suffix": "-pre",
    "commit": "d0bb35429"
  }
}

Sorry, I must have missed that part of your previous message. And I agree, this looks weird. Three things come to mind that could be checked:

  1. Restart the linstor-satellite.service instead of just reconnecting to it
  2. Run rpc.py spdk_get_version; echo $? to also check the exit code
  3. If nothing else helps, enable trace logging via linstor_satellite.toml, restart the satellite and check the logs for what LINSTOR gets as output from its attempt. Hopefully it is more than “no output but ‘exit code 1’”…

That did not change anything, and I did not get more information out of it …

I mean, I restarted the node many times and still have no clue. However, before we both spend too much time on this: can you just tell me whether the end result would be connecting storage directly to an SPDK bdev?

I tested SPDK and bdev with fio to make sure it works and that I get the desired performance gain. So it is installed and does work, it is just not detected by LINSTOR.

The exit code is 0, and restarting the service did not change anything.

Mar 07 22:32:49 node-2 Satellite[2662493]: 2025-03-07 22:32:49.381 [Main] INFO  LINSTOR/Satellite/ff6880 SYSTEM - Checking support for SPDK: NOT supported:
Mar 07 22:32:49 node-2 Satellite[2662493]: 2025-03-07 22:32:49.381 [Main] DEBUG LINSTOR/Satellite/ff6880 SYSTEM -    'rpc.py spdk_get_version' returned with exit code 1
root@node-2:~# rpc.py spdk_get_version; echo $?
{
  "version": "SPDK v25.01-pre git sha1 d0bb35429",
  "fields": {
    "major": 25,
    "minor": 1,
    "patch": 0,
    "suffix": "-pre",
    "commit": "d0bb35429"
  }
}
0

Interestingly, compared to a node where SPDK is not installed, I get a different message. So LINSTOR apparently can find rpc.py there, but it does not work either.

Does that hint at anything?

root@node-1:~# linstor node info --full
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node     ┊ Diskless ┊ LVM ┊ LVMThin ┊ ZFS/Thin ┊ File/Thin ┊ SPDK ┊ EXOS ┊ Remote SPDK ┊ Storage Spaces ┊ Storage Spaces/Thin ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ node-1   ┊ +        ┊ +   ┊ +       ┊ +        ┊ +         ┊ -    ┊ +    ┊ +           ┊ -              ┊ -                   ┊
┊ node-2   ┊ +        ┊ +   ┊ +       ┊ +        ┊ +         ┊ -    ┊ +    ┊ +           ┊ -              ┊ -                   ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Unsupported storage providers:
 node-1: 
  SPDK: IO exception occured when running 'rpc.py spdk_get_version': Cannot run program "rpc.py": error=2, No such file or directory
  STORAGE_SPACES: This tool does not exist on the Linux platform.
  STORAGE_SPACES_THIN: This tool does not exist on the Linux platform.
 node-2: 
  SPDK: 'rpc.py spdk_get_version' returned with exit code 1
  STORAGE_SPACES: This tool does not exist on the Linux platform.
  STORAGE_SPACES_THIN: This tool does not exist on the Linux platform.

I admit that I have not used SPDK in a while and I had a bit of trouble getting it to work again. When I tried to upgrade my outdated SPDK installation to v25.1.0-pre, for some reason it got installed into /usr/local/local/lib/... instead of /usr/local/lib/.... That led to some interesting behavior: when running rpc.py spdk_get_version, I got the Python error message No module named 'spdk' on import.
The interesting point was that this (obviously) caused an exit code 1, just as your LINSTOR satellite claims.

However, after fixing the weirdly broken installation (i.e. moving SPDK from /usr/local/local/lib/... to the correct directory), rpc.py spdk_get_version shows the same output as you have (except the git commit hash, but that does not really matter).
At this point, starting the satellite also detected that SPDK was (correctly) installed and reported SPDK to be supported.

LINSTOR also just runs rpc.py spdk_get_version, but right now I do not see why LINSTOR’s attempt to run this command should fail while running it manually works just fine.

As you also noticed the difference in error messages, I assume that node-2 can indeed find and run rpc.py but something after that goes wrong. Unfortunately not even LINSTOR’s TRACE logging would log those outputs.
I do not know why or how that should be the case, but just to double check, can you please run something like find / -name rpc.py on your node-2 to make sure you have only one version installed? This is just to make sure that you are running one (working) version but LINSTOR (somehow) tries a different, broken version…
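If you prefer a script over find, the same search can be sketched in Python. This is only an illustration; the function name find_copies is made up, and walking the whole filesystem from / can take a while.

```python
import os


def find_copies(root, filename="rpc.py"):
    """Return a sorted list of all paths below `root` named `filename`."""
    matches = []
    # os.walk skips directories it cannot read instead of raising.
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


if __name__ == "__main__":
    for path in find_copies("/"):
        print(path)
```

More than one hit would suggest checking which copy comes first on the PATH that the satellite uses.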

Hi

Thanks for your help! So after fixing the libraries I still could not make it work on all nodes, so I decided to write a small logging wrapper for rpc.py which gave me the error message (finally!)

2025-03-10 15:01:32,675 - ERR: Error while connecting to /var/tmp/spdk.sock
Is SPDK application running?
Error details: Invalid or non-existing address: '/var/tmp/spdk.sock'

So SPDK is indeed running, but then I figured out from looking into linstor-satellite.service that

[Service]
...
PrivateTmp=yes

prevents it from seeing the socket.
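With PrivateTmp=yes, systemd gives the service its own private /tmp and /var/tmp, so a socket created in the host's /var/tmp is not visible there. A small check like the following sketch can confirm whether a socket path is reachable from the process that runs it. The function name check_unix_socket is made up, and I am assuming the SPDK RPC socket is a Unix stream socket. The check is only meaningful when run from inside the service's environment; from a login shell it sees the host's /var/tmp.

```python
import os
import socket
import stat


def check_unix_socket(path):
    """Return (ok, reason) for a Unix stream socket at `path`."""
    try:
        mode = os.stat(path).st_mode
    except OSError as exc:
        return False, f"cannot stat {path}: {exc}"
    if not stat.S_ISSOCK(mode):
        return False, f"{path} exists but is not a socket"
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        try:
            client.connect(path)
        except OSError as exc:
            return False, f"cannot connect to {path}: {exc}"
    return True, "connected"


if __name__ == "__main__":
    print(check_unix_socket("/var/tmp/spdk.sock"))
```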

So I moved the socket to a different location:

spdk_tgt -r /var/run/spdk.sock

and also added:

def log_and_execute(*args):
    command = [sys.executable, "/opt/spdk/scripts/rpc.py", "-s", "/var/run/spdk.sock"] + list(args)

to my wrapper. Finally, I can see SPDK support :wink:
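For anyone hitting the same problem, a logging wrapper along these lines could look roughly like the sketch below. It is not the exact script used here: the script and socket paths are taken from this post, while LOG_FILE, build_command, run_logged and main are hypothetical names chosen for illustration.

```python
import logging
import subprocess
import sys

# Paths as mentioned in this thread; adjust to your installation.
RPC_SCRIPT = "/opt/spdk/scripts/rpc.py"
RPC_SOCKET = "/var/run/spdk.sock"
# Hypothetical log location, not something LINSTOR or SPDK defines.
LOG_FILE = "/var/log/rpc_wrapper.log"

log = logging.getLogger("rpc_wrapper")


def build_command(args, script=RPC_SCRIPT, sock=RPC_SOCKET):
    """Prefix the real rpc.py call with the explicit socket path (-s)."""
    return [sys.executable, script, "-s", sock] + list(args)


def run_logged(command):
    """Run `command`, pass its output through, and log any failure."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        log.error("%s exited with %d: %s",
                  command, result.returncode, result.stderr.strip())
    # Forward output unchanged so the caller sees what rpc.py printed.
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    return result.returncode


def log_and_execute(*args):
    return run_logged(build_command(args))


def main(argv):
    logging.basicConfig(
        filename=LOG_FILE, level=logging.INFO,
        format="%(asctime)s - %(levelname)s: %(message)s")
    return log_and_execute(*argv)
```

To use it as a drop-in replacement, it would be saved as an executable named rpc.py ahead of the real one on the PATH, with sys.exit(main(sys.argv[1:])) as its last line so the exit code is passed on.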

It is a bit hacky, though. Any idea what the real solution would be, if not this hack?

KR M.