Running drbd-reactorctl status returns an error message when using LINSTOR HA
I followed the documentation:
https://linbit.com/drbd-user-guide/linstor-guide-1_0-cn/#s-linstor_ha
I am not sure if this error caused linstor-gateway to fail to create the iSCSI target.
Here is the linstor-gateway output (command: linstor-gateway iscsi create iqn.2025-01.rcr.test:ss 172.17.0.0/8 2G -r iso_res_group --loglevel debug):
DEBU[0000] {"iqn":"iqn.2025-01.rcr.test:info","resource_group":"iso_res_group","volumes":[{"number":1,"size_kib":2097152,"file_system_root_owner":{"User":"","Group":""}}],"service_ips":["172.17.0.0/8"],"status":{"state":"Unknown","service":"Stopped","primary":"","nodes":null,"volumes":null},"gross_size":false,"implementation":""}
DEBU[0000] curl -X 'POST' -d '{"iqn":"iqn.2025-01.rcr.test:info","resource_group":"iso_res_group","volumes":[{"number":1,"size_kib":2097152,"file_system_root_owner":{"User":"","Group":""}}],"service_ips":["172.17.0.0/8"],"status":{"state":"Unknown","service":"Stopped","primary":"","nodes":null,"volumes":null},"gross_size":false,"implementation":""}
' -H 'Accept: application/json' -H 'Content-Type: application/json' -H 'User-Agent: linstor-gateway/1.7.0-g6e676b4f35e3e2b90cffb32637e44e16ae3c0559' 'http://localhost:8080/api/v2/iscsi'
DEBU[0000] Status code not within 200 to 400, but 400 (Bad Request)
ERRO[0000] failed to create iscsi resource: failed to retrieve existing configs: failed to fetch file list: Get "http://localhost:3370/v1/files?content=true&limit=0&offset=0": dial tcp [::1]:3370: connect: connection refused
And the journalctl -xe log:
It looks like the linstor-controller is not actually running.
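You can confirm that on the node that should currently be active, for example (a minimal sketch, assuming the default unit name and REST port 3370):
# check the controller unit and whether anything is listening on the REST port
systemctl status linstor-controller.service
ss -tlnp | grep 3370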
The error of drbd-reactorctl is interesting. Can you show the output of:
systemctl show --property=FreezerState drbd-promote@linstor_db.service
systemctl show --property=FreezerState var-lib-linstor.mount
For the gateway error: you need to add all possible controller URLs to the /etc/linstor-gateway/linstor-gateway.toml file:
[linstor]
controllers = ["10.10.1.1", "10.10.1.2", "10.10.1.3"]
(Use the right DNS names/IP addresses for your nodes), then restart the linstor-gateway service.
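For example (a sketch; the unit name may differ on your distribution, and check-health may not exist in older linstor-gateway versions):
# after editing /etc/linstor-gateway/linstor-gateway.toml
systemctl restart linstor-gateway.service
# optionally verify the gateway can reach a controller
linstor-gateway check-health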
Still the same error, regardless of whether /etc/linstor-gateway/linstor-gateway.toml is added or not. I can see the corresponding logs in the linstor-controller, but the final result is still a failure.
Have you seen this error?
Jan 10 16:09:43 node1 ocf-rs-wrapper[5496]: Jan 10 16:09:43 INFO: Running start for /dev/drbd/by-res/ss/0 on /srv/ha/internal/ss
Jan 10 16:09:43 node1 ocf-rs-wrapper[5496]: Jan 10 16:09:43 ERROR: There is one or more mounts mounted under /srv/ha/internal/ss.
Jan 10 16:09:43 node1 ocf-rs-wrapper[5492]: ERROR [ocf_rs_wrapper] Filesystem:fs_cluster_private_ss,s-a-m,start: FAILED with exit code 6
Is there already something mounted in /srv/ha/internal/ss?
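For example, you could check with (a minimal sketch; findmnt is part of util-linux):
# list the mount point and anything mounted below it
findmnt --submounts /srv/ha/internal/ss
# or look for the path directly in the kernel's mount table
grep /srv/ha/internal /proc/mounts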
I found that there are two ExecStop lines in the [Service] section of drbd-promote@linstor_db.service. Is this the reason for the abnormal output of drbd-reactorctl?
I’m noticing this earlier shared journal output is from node1:
Jan 10 16:09:26 node1 ocf-rs-wrapper[5362]: Jan 10 16:09:26 INFO: Running start for /dev/drbd/by-res/ss/0 on /srv/ha/internal/ss
Jan 10 16:09:26 node1 ocf-rs-wrapper[5362]: Jan 10 16:09:26 ERROR: There is one or more mounts mounted under /srv/ha/internal/ss.
Jan 10 16:09:26 node1 ocf-rs-wrapper[5358]: ERROR [ocf_rs_wrapper] Filesystem:fs_cluster_private_ss,s-a-m,start: FAILED with exit code 6
Jan 10 16:09:26 node1 systemd[1]: ocf.rs@fs_cluster_private_ss.service: Main process exited, code=exited, status=6/NOTCONFIGURED
But the screenshot you shared for /srv is from node2.
It seems like promotion was attempted on node1 based on this output (which would necessarily preclude promotion on node2), but I am curious: what might be mounted under /srv/ha/internal/ss on node1?
The results of node1 and node2 are the same. Sorry, I forgot to post the results related to node1 before.
If you guys need more logs or other information, feel free to contact me anytime.
In the source code for the Filesystem resource agent which is producing that error, I see that it checks /proc/mounts and /etc/mtab to determine whether there are existing mounts under the mount point.
After reconfirming that the "There is one or more mounts mounted" error appears in the journalctl output after attempting another linstor-gateway iscsi create command (using a different unique name in your IQN and a different IP address within your chosen subnet), do you find the specified path when you cat either /proc/mounts or /etc/mtab on the node that shows the error in its system journal?
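For example, right after the failing create attempt, something like this on the node that logged the error (a sketch):
# check both tables the resource agent consults for that path
grep /srv/ha/internal /proc/mounts /etc/mtab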
New linstor-gateway cmd: linstor-gateway iscsi create iqn.2025-02.rcr.test:info1 172.17.1.0/23 2G -r iso_res_group --loglevel debug
And journalctl output:
And /proc/mounts, with timestamps at 1-second intervals:
And /etc/mtab, also with timestamps at 1-second intervals:
Additionally, I found some logs in dmesg -kT; I don’t know if these will help.
I see that you’ve opened and closed an issue in the DRBD Reactor GitHub repository regarding the FreezerState; I wanted to link it back here for future forum users:
https://github.com/LINBIT/drbd-reactor/issues/14
Regarding troubleshooting the iscsi creation error, it doesn’t seem like there’s an obvious culprit in your mounts via the methods we’ve used so far, so I would suggest a more granular review of the Filesystem resource agent logic, which is where we are seeing the fatal error in your case.
I recommend modifying the resource agent script itself to temporarily add set -x to do this. Navigate to /usr/lib/ocf/resource.d/heartbeat (or wherever your resource agent directory is) on each node, and after making a backup copy of the Filesystem file there, modify the original file to add set -x on a new line at the beginning of the script; I have had success placing it above the #Defaults line.
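For example (a sketch, assuming the default resource agent path):
# on each node
cd /usr/lib/ocf/resource.d/heartbeat
cp Filesystem Filesystem.bak      # keep a backup copy first
# then edit the original Filesystem file and add a line containing only
# 'set -x' near the top of the script, e.g. above the '#Defaults' line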
After you’ve modified those files, run the linstor-gateway iscsi create command again so the actions of the resource agent are captured in the system journal. You can then find the resource agent’s output in the system journal, which should provide more insight into which step is failing and why.
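For example (a sketch; the syslog identifier is taken from the earlier journal lines):
# follow the resource agent wrapper messages during the next create attempt
journalctl -f -t ocf-rs-wrapper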
Thanks, I’ll try it later.
Finally, after manually creating the folder, everything was ok.
mkdir -p /srv/ha/internal
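A quick way to confirm the directory now exists before retrying the create (a sketch):
# verify the parent directory for the Gateway mount point is present
ls -ld /srv/ha/internal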