Abnormal Load average on first vm start from Cloudstack template

Hi everyone! Maybe someone has faced with similar problem and can help me.

First of all i’m using:
Cloudstack 4.19.1.1
Hypervisor KVM on ubuntu 24.04
Primary storage linstor
4 disks nodes
HA Linstor controller with DRBD reactor - virtual ip and linstor db

When I creating template or registering it from URL, and trying to start VM for firs time, after cloudstack ui shows this template as downloaded on primary storage, load average starts to grow, and instance creating takes too long. It’s can take about 40 mins, or even can totaly overload host and ony way to bring in to a health state its reboot. This load average is present on all disk nods where linstor creates related devices. After first instamce creation, all folowing tryes compliting without any problems, and vm’s starts immediately.

pc utility shows growing number of vgs and lvs processes. Like this:

393650 ? D 0:00 vgs -o vg_name,vg_extent_size,vg_size,vg_free --separator ; --units k --noheadings --nosuffix
394149 ? D 0:00 lvs -o lv_name,lv_path,lv_size,vg_name,pool_lv,data_percent,lv_attr,metadata_percent,chunk_size --separator ; --noheadings --units k --nosuffix

Nothing suspisies in Cloudstack agent logs, but in Linstor sattelite I have found many reports like this:

ERROR REPORT 66C5C167-EF4D9-000006

============================================================

Application: LINBIT® LINSTOR
Module: Satellite
Version: 1.29.0
Build ID: b2be7208a777f0743d4c7187062678cd5416fccf
Build time: 2024-07-31T11:02:51+00:00
Error time: 2024-08-22 09:59:58
Node: node2
Thread: MainWorkerPool-4

============================================================

# Reported error:

Category: LinStorException
Class name: StorageException
Class canonical name: com.linbit.linstor.storage.StorageException
Generated at: Method 'genericExecutor', Source file 'Commands.java', Line [#120](https://github.com/apache/cloudstack/pull/120)

Error message: Failed to query 'lvs' info

ErrorContext:
Cause: External command timed out
Details: External command: lvs -o lv_name,lv_path,lv_size,vg_name,pool_lv,data_percent,lv_attr,metadata_percent,chunk_size --separator ';' --noheadings --units k --nosuffix

Call backtrace:

Method                                   Native Class:Line number
genericExecutor                          N      com.linbit.linstor.storage.utils.Commands:120
genericExecutor                          N      com.linbit.linstor.storage.utils.Commands:63
lvs                                      N      com.linbit.linstor.layer.storage.lvm.utils.LvmCommands:99
lambda$getLvsInfoImpl$1                  N      com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:320
execWithRetry                            N      com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:669
getLvsInfoImpl                           N      com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:317
getLvsInfo                               N      com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:282
fetchAllocatedSizes                      N      com.linbit.linstor.layer.storage.lvm.LvmThinProvider:710
getVlmAllocatedCapacities                N      com.linbit.linstor.core.apicallhandler.StltApiCallHandlerUtils:185
executeInScope                           N      com.linbit.linstor.api.protobuf.ReqVlmAllocated:84
lambda$executeReactive$0                 N      com.linbit.linstor.api.protobuf.ReqVlmAllocated:69
doInScope                                N      com.linbit.linstor.core.apicallhandler.ScopeRunner:175
lambda$fluxInScope$0                     N      com.linbit.linstor.core.apicallhandler.ScopeRunner:100
call                                     N      reactor.core.publisher.MonoCallable:72
trySubscribeScalarMap                    N      reactor.core.publisher.FluxFlatMap:127
subscribeOrReturn                        N      reactor.core.publisher.MonoFlatMapMany:49
subscribe                                N      reactor.core.publisher.Flux:8759
onNext                                   N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:195
request                                  N      reactor.core.publisher.Operators$ScalarSubscription:2545
onSubscribe                              N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:141
subscribe                                N      reactor.core.publisher.MonoJust:55
subscribe                                N      reactor.core.publisher.MonoDeferContextual:55
subscribe                                N      reactor.core.publisher.Flux:8773
trySubscribeScalarMap                    N      reactor.core.publisher.FluxFlatMap:200
subscribeOrReturn                        N      reactor.core.publisher.MonoFlatMapMany:49
subscribe                                N      reactor.core.publisher.Flux:8759
onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:427
slowPath                                 N      reactor.core.publisher.FluxArray$ArraySubscription:127
request                                  N      reactor.core.publisher.FluxArray$ArraySubscription:100
onSubscribe                              N      reactor.core.publisher.FluxFlatMap$FlatMapMain:371
subscribe                                N      reactor.core.publisher.FluxMerge:70
subscribe                                N      reactor.core.publisher.Flux:8773
onComplete                               N      reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:258
subscribe                                N      reactor.core.publisher.FluxConcatArray:78
subscribe                                N      reactor.core.publisher.InternalFluxOperator:62
subscribe                                N      reactor.core.publisher.FluxDefer:54
subscribe                                N      reactor.core.publisher.Flux:8773
onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:427
drainAsync                               N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:453
drain                                    N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:724
onNext                                   N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:256
drainFused                               N      reactor.core.publisher.SinkManyUnicast:319
drain                                    N      reactor.core.publisher.SinkManyUnicast:362
tryEmitNext                              N      reactor.core.publisher.SinkManyUnicast:237
tryEmitNext                              N      reactor.core.publisher.SinkManySerialized:100
processInOrder                           N      com.linbit.linstor.netcom.TcpConnectorPeer:420
doProcessMessage                         N      com.linbit.linstor.proto.CommonMessageProcessor:228
lambda$processMessage$2                  N      com.linbit.linstor.proto.CommonMessageProcessor:165
onNext                                   N      reactor.core.publisher.FluxPeek$PeekSubscriber:185
runAsync                                 N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:440
run                                      N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:527
call                                     N      reactor.core.scheduler.WorkerTask:84
call                                     N      reactor.core.scheduler.WorkerTask:37
run                                      N      java.util.concurrent.FutureTask:317
run                                      N      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker                                N      java.util.concurrent.ThreadPoolExecutor:1144
run                                      N      java.util.concurrent.ThreadPoolExecutor$Worker:642
run                                      N      java.lang.Thread:1583

# Caused by:

Category: Exception
Class name: ChildProcessTimeoutException
Class canonical name: com.linbit.ChildProcessTimeoutException
Generated at: Method 'waitFor', Source file 'ChildProcessHandler.java', Line [#184](https://github.com/apache/cloudstack/pull/184)

Call backtrace:

Method                                   Native Class:Line number
waitFor                                  N      com.linbit.extproc.ChildProcessHandler:184
waitFor                                  N      com.linbit.extproc.ChildProcessHandler:124
syncProcess                              N      com.linbit.extproc.ExtCmd:160
exec                                     N      com.linbit.extproc.ExtCmd:92
genericExecutor                          N      com.linbit.linstor.storage.utils.Commands:79
genericExecutor                          N      com.linbit.linstor.storage.utils.Commands:63
lvs                                      N      com.linbit.linstor.layer.storage.lvm.utils.LvmCommands:99
lambda$getLvsInfoImpl$1                  N      com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:320
execWithRetry                            N      com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:669
getLvsInfoImpl                           N      com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:317
getLvsInfo                               N      com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:282
fetchAllocatedSizes                      N      com.linbit.linstor.layer.storage.lvm.LvmThinProvider:710
getVlmAllocatedCapacities                N      com.linbit.linstor.core.apicallhandler.StltApiCallHandlerUtils:185
executeInScope                           N      com.linbit.linstor.api.protobuf.ReqVlmAllocated:84
lambda$executeReactive$0                 N      com.linbit.linstor.api.protobuf.ReqVlmAllocated:69
doInScope                                N      com.linbit.linstor.core.apicallhandler.ScopeRunner:175
lambda$fluxInScope$0                     N      com.linbit.linstor.core.apicallhandler.ScopeRunner:100
call                                     N      reactor.core.publisher.MonoCallable:72
trySubscribeScalarMap                    N      reactor.core.publisher.FluxFlatMap:127
subscribeOrReturn                        N      reactor.core.publisher.MonoFlatMapMany:49
subscribe                                N      reactor.core.publisher.Flux:8759
onNext                                   N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:195
request                                  N      reactor.core.publisher.Operators$ScalarSubscription:2545
onSubscribe                              N      reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:141
subscribe                                N      reactor.core.publisher.MonoJust:55
subscribe                                N      reactor.core.publisher.MonoDeferContextual:55
subscribe                                N      reactor.core.publisher.Flux:8773
trySubscribeScalarMap                    N      reactor.core.publisher.FluxFlatMap:200
subscribeOrReturn                        N      reactor.core.publisher.MonoFlatMapMany:49
subscribe                                N      reactor.core.publisher.Flux:8759
onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:427
slowPath                                 N      reactor.core.publisher.FluxArray$ArraySubscription:127
request                                  N      reactor.core.publisher.FluxArray$ArraySubscription:100
onSubscribe                              N      reactor.core.publisher.FluxFlatMap$FlatMapMain:371
subscribe                                N      reactor.core.publisher.FluxMerge:70
subscribe                                N      reactor.core.publisher.Flux:8773
onComplete                               N      reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:258
subscribe                                N      reactor.core.publisher.FluxConcatArray:78
subscribe                                N      reactor.core.publisher.InternalFluxOperator:62
subscribe                                N      reactor.core.publisher.FluxDefer:54
subscribe                                N      reactor.core.publisher.Flux:8773
onNext                                   N      reactor.core.publisher.FluxFlatMap$FlatMapMain:427
drainAsync                               N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:453
drain                                    N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:724
onNext                                   N      reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:256
drainFused                               N      reactor.core.publisher.SinkManyUnicast:319
drain                                    N      reactor.core.publisher.SinkManyUnicast:362
tryEmitNext                              N      reactor.core.publisher.SinkManyUnicast:237
tryEmitNext                              N      reactor.core.publisher.SinkManySerialized:100
processInOrder                           N      com.linbit.linstor.netcom.TcpConnectorPeer:420
doProcessMessage                         N      com.linbit.linstor.proto.CommonMessageProcessor:228
lambda$processMessage$2                  N      com.linbit.linstor.proto.CommonMessageProcessor:165
onNext                                   N      reactor.core.publisher.FluxPeek$PeekSubscriber:185
runAsync                                 N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:440
run                                      N      reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:527
call                                     N      reactor.core.scheduler.WorkerTask:84
call                                     N      reactor.core.scheduler.WorkerTask:37
run                                      N      java.util.concurrent.FutureTask:317
run                                      N      java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
runWorker                                N      java.util.concurrent.ThreadPoolExecutor:1144
run                                      N      java.util.concurrent.ThreadPoolExecutor$Worker:642
run                                      N      java.lang.Thread:1583

END OF ERROR REPORT.

I have tryed to disable apparmor and firewall, but it’s not helped
I would be appreciate for any help with this.

Seting up global_filter = [ “r|^/dev/drbd|” ] in /etc/lvm/lvm.conf, solved this problem.

3 Likes

Yeah, we have the global_filter setting now several times in our Linstor user guide documentation.