Bug Report: Server crashes after generating first response with Falcon3-10B-Instruct-1.58bit #532
Description
Describe the bug
I am running the official bitnet.cpp server (run_inference_server.py or equivalent) with the Falcon3-10B-Instruct-1.58bit model (GGUF I2_S from tiiuae/Falcon3-10B-Instruct-1.58bit-GGUF).
Update: the same crash also occurs with the official BitNet model.
Steps to reproduce:
1. Start the server on port 8080.
2. Connect via Open WebUI (OpenAI-compatible API).
3. Send the first chat message → the model generates a normal response.
4. Send the second message → the server immediately crashes with exit code 3221225725 (0xC00000FD, stack overflow?).
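The steps above can also be reproduced without Open WebUI. A minimal sketch (endpoint path, port, and model name are assumptions matching this setup; it sends two consecutive turns to the server's OpenAI-compatible API):

```python
# Hypothetical two-turn repro against the bitnet.cpp server.
# BASE_URL and the model name are assumptions for this setup.
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1/chat/completions"

def build_payload(history, user_msg):
    """Build an OpenAI-style chat request including prior turns."""
    return {
        "model": "Falcon3-10B-Instruct-1.58bit",
        "messages": history + [{"role": "user", "content": user_msg}],
    }

def send(payload):
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    history = []
    # First turn succeeds, per the report.
    first = send(build_payload(history, "Hello"))
    reply = first["choices"][0]["message"]
    history += [{"role": "user", "content": "Hello"}, reply]
    # Second turn is where the server exits with 0xC00000FD.
    send(build_payload(history, "And a second message?"))
```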
Logs during model load:
```
llm_load_vocab: control token: 2019 '>>UNUSED_1893<<' is not marked as EOG
llm_load_vocab: control token: 2022 '>>UNUSED_1896<<' is not marked as EOG
...
llm_load_print_meta: arch             = llama
llm_load_print_meta: model type       = 13B
llm_load_print_meta: model ftype      = I2_S - 2 bpw ternary
llm_load_print_meta: general.name     = Falcon3-10B-Instruct-1.58bit
llm_load_print_meta: BOS/EOS/EOT/EOG token = 11 '<|endoftext|>'
llm_load_print_meta: PAD token        = 2023 '<|pad|>'
```
The crash happens right after the second prompt processing (n_prompt_tokens = 269 in the log).
Environment
- OS: Windows 11
- bitnet.cpp version: latest (cloned from microsoft/BitNet)
- Model: Falcon3-10B-Instruct-1.58bit (I2_S)
- Frontend: Open WebUI
- Context: 4096
Expected behavior
The server should continue working for multiple turns without crashing.
Additional context
This looks like a vocabulary/special-token handling issue specific to the Falcon3 1.58-bit models: many >>UNUSED_xxxx<< control tokens are not marked as EOG. Similar warnings appear in other issues, but a crash immediately after the first successful response seems new.
I can provide the full log file (server_20260405_122502.log) if needed.
Thank you!