Description
LocalAI version:
v3.12.1 (fcecc12), localai/localai:latest-gpu-intel docker image.
Environment, CPU architecture, OS, and Version:
Linux host 6.12.33-production+truenas #1 SMP PREEMPT_DYNAMIC Mon Feb 23 17:38:27 UTC 2026 x86_64 GNU/Linux, GPU - Intel Arc A380 DG2
Describe the bug
When I use the qwen3-asr-0.6b STT model with the intel-qwen-asr backend, inference runs on the CPU instead of the GPU (Intel Arc A380 DG2).
I determined this by watching GPU usage in intel_gpu_top (it does not spike while STT is processing) and CPU usage in htop (usage spikes on multiple cores while STT is processing).
During startup, no GPU and no VRAM are detected by the image:
Mar 08 08:12:04 DEBUG GPU vendor gpuVendor="" caller={caller.file="/build/pkg/system/state.go" caller.L=54 }
Mar 08 08:12:04 DEBUG Total available VRAM vram=0 caller={caller.file="/build/pkg/system/state.go" caller.L=56 }
However, sycl-ls does identify the GPU:
root@d9b579c8993c:/# sycl-ls
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) A380 Graphics 12.56.5 [1.6.33578+15]
[opencl:cpu][opencl:0] Intel(R) OpenCL, AMD Ryzen 7 3800X 8-Core Processor OpenCL 3.0 (Build 0) [2025.20.10.0.10_160000]
[opencl:gpu][opencl:1] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A380 Graphics OpenCL 3.0 NEO [25.18.33578]
Also, a python entry does show up in intel_gpu_top when the model is loaded, but its usage always stays at 0.
This issue seems specific to the qwen-asr backend: running the qwen3-4b LLM with the intel-sycl-f16-llama-cpp backend does show GPU usage in intel_gpu_top.
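One way to narrow this down is to check whether the Python runtime used by the backend can see the GPU at all. The sketch below assumes the intel-qwen-asr backend runs on PyTorch (the logs only confirm that Transformers is loaded from /backends/intel-qwen-asr/venv) and that its PyTorch build exposes the `torch.xpu` module; neither is verified here. `describe_xpu` is a hypothetical helper name.

```python
import importlib


def describe_xpu():
    """Report whether PyTorch in this interpreter can see an Intel XPU device."""
    info = {"torch": None, "xpu_available": False, "devices": []}
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        return info  # torch is missing or its native libraries failed to load
    info["torch"] = torch.__version__
    xpu = getattr(torch, "xpu", None)  # absent on older or non-XPU builds
    if xpu is None or not xpu.is_available():
        return info
    info["xpu_available"] = True
    info["devices"] = [xpu.get_device_name(i) for i in range(xpu.device_count())]
    return info


if __name__ == "__main__":
    print(describe_xpu())
```

For the result to be meaningful, this would need to run with the backend's own interpreter and environment (the exact interpreter path inside the image is not shown in the logs). If `xpu_available` came back False there while sycl-ls lists the device, that would point at the backend's Python environment rather than the driver stack.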
To Reproduce
- Install the qwen3-asr-0.6b model and the intel-qwen-asr backend on a system with an Intel GPU.
- Make a request using the following template:
curl http://localhost:8080/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F file="@<FILE_PATH>" -F model="qwen3-asr-0.6b"
As audio input, the sample from whisper.cpp can be used: https://github.com/ggml-org/whisper.cpp/blob/master/samples/jfk.wav
- Observe the lack of GPU activity and the heavy CPU usage during processing.
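The request from the steps above can also be sent from a script, which makes it easy to time each transcription while watching intel_gpu_top and htop. This is a minimal sketch using only the Python standard library; the helper names `build_multipart` and `transcribe` are hypothetical, and the URL and model name are the ones from the curl template.

```python
import sys
import time
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields and one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, model="qwen3-asr-0.6b",
               url="http://localhost:8080/v1/audio/transcriptions"):
    """POST an audio file and return (elapsed_seconds, raw_response_body)."""
    audio = Path(path)
    body, content_type = build_multipart(
        {"model": model}, "file", audio.name, audio.read_bytes()
    )
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        payload = resp.read().decode()
    return time.perf_counter() - start, payload


if __name__ == "__main__" and len(sys.argv) > 1:
    elapsed, payload = transcribe(sys.argv[1])
    print(f"{elapsed:.2f}s", payload)
```

Calling it in a loop on the same file gives a sustained load, which should make any GPU activity (or its absence) easier to spot than a single short request.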
Expected behavior
The intel-qwen-asr backend should actually use the Intel GPU.
Logs
CPU info:
model name : AMD Ryzen 7 3800X 8-Core Processor
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
CPU: AVX found OK
CPU: AVX2 found OK
CPU: no AVX512 found
Mar 08 08:31:38 DEBUG GPU vendor gpuVendor="" caller={caller.file="/build/pkg/system/state.go" caller.L=54 }
Mar 08 08:31:38 DEBUG Total available VRAM vram=0 caller={caller.file="/build/pkg/system/state.go" caller.L=56 }
Mar 08 08:31:38 INFO Using forced capability run file capabilityRunFile="/run/localai/capability" capability="intel\n" env="" caller={caller.file="/build/pkg/system/capabilities.go" caller.L=98 }
Mar 08 08:31:38 INFO Starting LocalAI threads=8 modelsPath="/models" caller={caller.file="/build/core/application/startup.go" caller.L=31 }
Mar 08 08:31:38 INFO LocalAI version version="v3.12.1 (fcecc12e57be39bad2ebf50cf729408b64409553)" caller={caller.file="/build/core/application/startup.go" caller.L=32 }
Mar 08 08:31:38 DEBUG agent_tasks.json not found, starting with empty tasks caller={caller.file="/build/core/services/agent_jobs.go" caller.L=129 }
Mar 08 08:31:38 DEBUG agent_jobs.json not found, starting with empty jobs caller={caller.file="/build/core/services/agent_jobs.go" caller.L=193 }
Mar 08 08:31:38 INFO AgentJobService started retention_days=30 caller={caller.file="/build/core/services/agent_jobs.go" caller.L=1347 }
Mar 08 08:31:38 DEBUG CPU capabilities capabilities=[3dnowprefetch abm adx aes aperfmperf apic arat avic avx avx2 bmi1 bmi2 bpext cat_l3 cdp_l3 clflush clflushopt clwb clzero cmov cmp_legacy constant_tsc cpb cpuid cqm cqm_llc cqm_mbm_local cqm_mbm_total cqm_occup_llc cr8_legacy cx16 cx8 de decodeassists extapic extd_apicid f16c flushbyasid fma fpu fsgsbase fxsr fxsr_opt ht hw_pstate ibpb ibs irperf lahf_lm lbrv lm mba mca mce misalignsse mmx mmxext monitor movbe msr mtrr mwaitx nonstop_tsc nopl npt nrip_save nx osvw overflow_recov pae pat pausefilter pclmulqdq pdpe1gb perfctr_core perfctr_llc perfctr_nb pfthreshold pge pni popcnt pse pse36 rapl rdpid rdpru rdrand rdseed rdt_a rdtscp rep_good sep sev sev_es sha_ni skinit smap smca smep ssbd sse sse2 sse4_1 sse4_2 sse4a ssse3 stibp succor svm svm_lock syscall tce topoext tsc tsc_scale umip v_spec_ctrl v_vmsave_vmload vgif vmcb_clean vme vmmcall wbnoinvd wdt xgetbv1 xsave xsavec xsaveerptr xsaveopt xtopology] caller={caller.file="/build/core/application/startup.go" caller.L=40 }
Mar 08 08:31:38 DEBUG No system backends found caller={caller.file="/build/core/gallery/backends.go" caller.L=335 }
Mar 08 08:31:38 DEBUG Registering backend name="intel-qwen-asr" runFile="/backends/intel-qwen-asr/run.sh" caller={caller.file="/build/core/gallery/backends.go" caller.L=445 }
Mar 08 08:31:38 DEBUG Registering backend name="qwen-asr" runFile="/backends/intel-qwen-asr/run.sh" caller={caller.file="/build/core/gallery/backends.go" caller.L=445 }
Mar 08 08:31:38 INFO Preloading models path="/models" caller={caller.file="/build/core/config/model_config_loader.go" caller.L=269 }
Model name: qwen3-asr-0.6b
Mar 08 08:31:38 DEBUG Config overrides overrides=map[backend:qwen-asr known_usecases:[transcript] parameters:map[model:Qwen/Qwen3-ASR-0.6B]] caller={caller.file="/build/core/gallery/models.go" caller.L=170 }
Mar 08 08:31:38 DEBUG Written config file file="/models/qwen3-asr-0.6b.yaml" caller={caller.file="/build/core/gallery/models.go" caller.L=276 }
Mar 08 08:31:38 DEBUG Written gallery file file="/models/._gallery_qwen3-asr-0.6b.yaml" caller={caller.file="/build/core/gallery/models.go" caller.L=286 }
Mar 08 08:31:38 DEBUG Installed model model="qwen3-asr-0.6b" caller={caller.file="/build/core/gallery/models.go" caller.L=136 }
Mar 08 08:31:38 DEBUG Installing backend backend="qwen-asr" caller={caller.file="/build/core/gallery/models.go" caller.L=138 }
Mar 08 08:31:38 DEBUG No system backends found caller={caller.file="/build/core/gallery/backends.go" caller.L=335 }
Mar 08 08:31:38 DEBUG Model name="qwen3-asr-0.6b" config={/models/qwen3-asr-0.6b.yaml {{Qwen/Qwen3-ASR-0.6B} false 0 0xc00011ab70 0xc00011ab78 0xc00011ab80 0xc00011abb0 false 0 false 0 0 0 0 0 0xc00011aba8 0xc00011aba0 0xc00011ab48 {false} <nil> map[] 0 0 0 0 } qwen3-asr-0.6b 0xc00011ab68 0xc00011ab60 0xc00011abb8 map[] 0xc00011abb9 qwen-asr { false <nil> } [FLAG_TRANSCRIPT] 0xc00011abd0 { } [] [] [] map[] {false {false false false false false false []} [] [] [] [] [] [] <nil>} {<nil> <nil> <nil> [] []} map[] { 0 0 false false 0xc00011ab98 0xc00011ab90 0xc00011ab88 <nil> 0xc00011abb8 0xc00011abb9 0xc00011abb9 0xc00011abb9 [] [] [] [] [] 0xc00011abc0 false [] [] 0 false 0 0 false false 0 0 0 false {0 0 0} <nil> false 0 0 0 0 0} {false false 0 } 0 {0 0} { } false [] [] [] { } {0 0 false false false false false 0 0 false}} caller={caller.file="/build/core/application/startup.go" caller.L=117 }
Mar 08 08:31:38 DEBUG runtime_settings.json not found, using defaults caller={caller.file="/build/core/application/startup.go" caller.L=214 }
Mar 08 08:31:38 DEBUG Auto loading model into memory from file model="qwen3-asr-0.6b" file="Qwen/Qwen3-ASR-0.6B" caller={caller.file="/build/core/application/startup.go" caller.L=148 }
Mar 08 08:31:38 INFO BackendLoader starting modelID="qwen3-asr-0.6b" backend="qwen-asr" model="Qwen/Qwen3-ASR-0.6B" caller={caller.file="/build/pkg/model/initializers.go" caller.L=159 }
Mar 08 08:31:38 DEBUG Loading model in memory from file file="/models/Qwen/Qwen3-ASR-0.6B" caller={caller.file="/build/pkg/model/loader.go" caller.L=218 }
Mar 08 08:31:38 DEBUG Loading Model with gRPC modelID="qwen3-asr-0.6b" file="/models/Qwen/Qwen3-ASR-0.6B" backend="qwen-asr" options={qwen-asr Qwen/Qwen3-ASR-0.6B qwen3-asr-0.6b {{}} 0xc0005e6c08 map[] 20 2 false} caller={caller.file="/build/pkg/model/initializers.go" caller.L=53 }
Mar 08 08:31:38 DEBUG Loading external backend uri="/backends/intel-qwen-asr/run.sh" caller={caller.file="/build/pkg/model/initializers.go" caller.L=77 }
Mar 08 08:31:38 DEBUG external backend is file file=&{run.sh 192 448 {0 63907278621 0x4f5f7e0} {131 24703 1 33216 0 0 0 0 192 512 9 {1771681821 0} {1771681821 0} {1772944781 583702502} [0 0 0]}} caller={caller.file="/build/pkg/model/initializers.go" caller.L=80 }
Mar 08 08:31:38 DEBUG Loading GRPC Process process="/backends/intel-qwen-asr/run.sh" caller={caller.file="/build/pkg/model/process.go" caller.L=112 }
Mar 08 08:31:38 DEBUG GRPC Service will be running id="qwen3-asr-0.6b" address="127.0.0.1:35089" caller={caller.file="/build/pkg/model/process.go" caller.L=114 }
Mar 08 08:31:38 DEBUG GRPC Service state dir dir="/tmp/go-processmanager2152813416" caller={caller.file="/build/pkg/model/process.go" caller.L=138 }
Mar 08 08:31:38 DEBUG GRPC Service Started caller={caller.file="/build/pkg/model/initializers.go" caller.L=92 }
Mar 08 08:31:38 DEBUG Wait for the service to start up caller={caller.file="/build/pkg/model/initializers.go" caller.L=105 }
Mar 08 08:31:38 DEBUG Options options=ContextSize:1024 Seed:774469262 NBatch:512 MMap:true NGPULayers:9999999 Threads:8 FlashAttention:"auto" caller={caller.file="/build/pkg/model/initializers.go" caller.L=106 }
Mar 08 08:31:38 DEBUG GRPC stdout id="qwen3-asr-0.6b-127.0.0.1:35089" line="Initializing libbackend for intel-qwen-asr" caller={caller.file="/build/pkg/model/process.go" caller.L=162 }
Mar 08 08:31:38 DEBUG GRPC stdout id="qwen3-asr-0.6b-127.0.0.1:35089" line="Using portable Python" caller={caller.file="/build/pkg/model/process.go" caller.L=162 }
Mar 08 08:31:38 DEBUG GRPC stdout id="qwen3-asr-0.6b-127.0.0.1:35089" line="Added /backends/intel-qwen-asr/lib to LD_LIBRARY_PATH for GPU libraries" caller={caller.file="/build/pkg/model/process.go" caller.L=162 }
Mar 08 08:31:40 DEBUG GRPC stderr id="qwen3-asr-0.6b-127.0.0.1:35089" line="/backends/intel-qwen-asr/venv/lib/python3.12/site-packages/transformers/utils/hub.py:110: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead." caller={caller.file="/build/pkg/model/process.go" caller.L=153 }
Mar 08 08:31:40 DEBUG GRPC stderr id="qwen3-asr-0.6b-127.0.0.1:35089" line=" warnings.warn(" caller={caller.file="/build/pkg/model/process.go" caller.L=153 }
Mar 08 08:31:44 DEBUG GRPC stderr id="qwen3-asr-0.6b-127.0.0.1:35089" line="Server started. Listening on: 127.0.0.1:35089" caller={caller.file="/build/pkg/model/process.go" caller.L=153 }
Mar 08 08:31:44 DEBUG GRPC Service Ready caller={caller.file="/build/pkg/model/initializers.go" caller.L=113 }
Mar 08 08:31:44 DEBUG GRPC: Loading model with options options={{{} [] [] 0xc00084b958} 0 [] Qwen/Qwen3-ASR-0.6B 1024 774469262 512 false false true false false false false 9999999 8 0 0 0 0 /models/Qwen/Qwen3-ASR-0.6B false 0 false 0 0 false 0 false false 0 0 0 false 0 0 0 0 0 0 0 auto false /models [] [] [] [] false []} caller={caller.file="/build/pkg/model/initializers.go" caller.L=136 }
Mar 08 08:31:44 DEBUG GRPC stderr id="qwen3-asr-0.6b-127.0.0.1:35089" line="Loading Qwen3-ASR from Qwen/Qwen3-ASR-0.6B" caller={caller.file="/build/pkg/model/process.go" caller.L=153 }
Mar 08 08:31:45 DEBUG GRPC stderr id="qwen3-asr-0.6b-127.0.0.1:35089" line="The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details." caller={caller.file="/build/pkg/model/process.go" caller.L=153 }
Mar 08 08:31:49 DEBUG GRPC stderr id="qwen3-asr-0.6b-127.0.0.1:35089" line="Qwen3-ASR model loaded successfully" caller={caller.file="/build/pkg/model/process.go" caller.L=153 }
Mar 08 08:31:49 DEBUG reading file for dynamic config update filename="/configuration/api_keys.json" caller={caller.file="/build/core/application/config_file_watcher.go" caller.L=65 }
Mar 08 08:31:49 DEBUG processing api keys runtime update numKeys=0 caller={caller.file="/build/core/application/config_file_watcher.go" caller.L=138 }
Mar 08 08:31:49 DEBUG no API keys discovered from dynamic config file caller={caller.file="/build/core/application/config_file_watcher.go" caller.L=152 }
Mar 08 08:31:49 DEBUG total api keys after processing numKeys=0 caller={caller.file="/build/core/application/config_file_watcher.go" caller.L=155 }
Mar 08 08:31:49 DEBUG reading file for dynamic config update filename="/configuration/external_backends.json" caller={caller.file="/build/core/application/config_file_watcher.go" caller.L=65 }
Mar 08 08:31:49 DEBUG processing external_backends.json caller={caller.file="/build/core/application/config_file_watcher.go" caller.L=164 }
Mar 08 08:31:49 DEBUG external backends loaded from external_backends.json caller={caller.file="/build/core/application/config_file_watcher.go" caller.L=181 }
Mar 08 08:31:49 DEBUG reading file for dynamic config update filename="/configuration/runtime_settings.json" caller={caller.file="/build/core/application/config_file_watcher.go" caller.L=65 }
Mar 08 08:31:49 DEBUG processing runtime_settings.json caller={caller.file="/build/core/application/config_file_watcher.go" caller.L=189 }
Mar 08 08:31:49 DEBUG runtime settings loaded from runtime_settings.json caller={caller.file="/build/core/application/config_file_watcher.go" caller.L=359 }
Mar 08 08:31:49 INFO core/startup process completed! caller={caller.file="/build/core/application/startup.go" caller.L=163 }
Mar 08 08:31:49 INFO LocalAI is started and running address=":8080" caller={caller.file="/build/core/cli/run.go" caller.L=293 }
Mar 08 08:31:53 INFO HTTP request method="GET" path="/readyz" status=200 caller={caller.file="/build/core/http/app.go" caller.L=118 }
Mar 08 08:32:04 DEBUG overriding empty model name in request body with value found earlier in middleware chain context localModelName="qwen3-asr-0.6b" caller={caller.file="/build/core/http/middleware/request.go" caller.L=138 }
Mar 08 08:32:04 DEBUG input.Input input="<nil>" caller={caller.file="/build/core/http/middleware/request.go" caller.L=412 }
Mar 08 08:32:04 DEBUG Audio file copied dst="/tmp/whisper2055781948/recording.wav" caller={caller.file="/build/core/http/endpoints/openai/transcription.go" caller.L=74 }
Mar 08 08:32:04 DEBUG Model already loaded in memory model="qwen3-asr-0.6b" caller={caller.file="/build/pkg/model/loader.go" caller.L=256 }
Mar 08 08:32:04 DEBUG Checking model availability model="qwen3-asr-0.6b" caller={caller.file="/build/pkg/model/loader.go" caller.L=259 }
Mar 08 08:32:04 DEBUG Model already loaded model="qwen3-asr-0.6b" caller={caller.file="/build/pkg/model/initializers.go" caller.L=246 }
Mar 08 08:32:05 DEBUG GRPC stderr id="qwen3-asr-0.6b-127.0.0.1:35089" line="Setting `pad_token_id` to `eos_token_id`:151645 for open-end generation." caller={caller.file="/build/pkg/model/process.go" caller.L=153 }
Mar 08 08:32:06 DEBUG Transcribed transcription=&{[{0 0s 0s Turn off kitchen lights. [] }] Turn off kitchen lights.} caller={caller.file="/build/core/http/endpoints/openai/transcription.go" caller.L=81 }
Mar 08 08:32:06 INFO HTTP request method="POST" path="/v1/audio/transcriptions" status=200 caller={caller.file="/build/core/http/app.go" caller.L=118 }
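The detection-related fields in a log like the one above can be pulled out with a short script, which helps when comparing runs. This is a sketch that assumes the `key=value` / `key="value"` layout seen in these lines; `parse_fields` and `detection_summary` are hypothetical helper names, not part of LocalAI.

```python
import re

# A field is key=value where the value is a double-quoted string or a bare token.
FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_fields(line):
    """Return the key=value fields of one log line as a dict of strings."""
    fields = {}
    for key, raw in FIELD.findall(line):
        if raw.startswith('"'):
            raw = raw[1:-1]  # drop the surrounding quotes, keep escapes as-is
        fields.setdefault(key, raw)  # first occurrence of a key wins
    return fields


def detection_summary(lines):
    """Collect the last seen gpuVendor, vram and capability values."""
    summary = {}
    for line in lines:
        fields = parse_fields(line)
        for key in ("gpuVendor", "vram", "capability"):
            if key in fields:
                summary[key] = fields[key]
    return summary
```

On the startup lines in this log, the summary would show an empty `gpuVendor` and `vram` of 0 alongside a `capability` of `intel` forced through /run/localai/capability.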
Additional context