Bug Report: Server crashes after generating first response with Falcon3-10B-Instruct-1.58bit #532
Description
Describe the bug
I am running the official bitnet.cpp server (run_inference_server.py or equivalent) with the Falcon3-10B-Instruct-1.58bit model (GGUF I2_S from tiiuae/Falcon3-10B-Instruct-1.58bit-GGUF).
Update: the same crash also occurs with the official BitNet model.
Steps to reproduce:
1. Start the server on port 8080.
2. Connect via Open WebUI (OpenAI-compatible API).
3. Send the first chat message → the model generates a normal response.
4. Send the second message → the server immediately crashes with exit code 3221225725 (0xC00000FD, stack overflow?).
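The steps above can also be reproduced without Open WebUI. A minimal sketch (endpoint path, port, and model name are assumptions matching this setup; it sends two consecutive turns to the server's OpenAI-compatible API):

```python
# Hypothetical two-turn repro against the bitnet.cpp server.
# BASE_URL and the model name are assumptions for this setup.
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1/chat/completions"

def build_payload(history, user_msg):
    """Build an OpenAI-style chat request including prior turns."""
    return {
        "model": "Falcon3-10B-Instruct-1.58bit",
        "messages": history + [{"role": "user", "content": user_msg}],
    }

def send(payload):
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    history = []
    # First turn succeeds, per the report.
    first = send(build_payload(history, "Hello"))
    reply = first["choices"][0]["message"]
    history += [{"role": "user", "content": "Hello"}, reply]
    # Second turn is where the server exits with 0xC00000FD.
    send(build_payload(history, "And a second message?"))
```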
Logs during model load:
```
llm_load_vocab: control token: 2019 '>>UNUSED_1893<<' is not marked as EOG
llm_load_vocab: control token: 2022 '>>UNUSED_1896<<' is not marked as EOG
...
llm_load_print_meta: arch             = llama
llm_load_print_meta: model type       = 13B
llm_load_print_meta: model ftype      = I2_S - 2 bpw ternary
llm_load_print_meta: general.name     = Falcon3-10B-Instruct-1.58bit
llm_load_print_meta: BOS/EOS/EOT/EOG token = 11 '<|endoftext|>'
llm_load_print_meta: PAD token        = 2023 '<|pad|>'
```
The crash happens right after the second prompt processing (n_prompt_tokens = 269 in the log).
Environment
- OS: Windows 11
- bitnet.cpp version: latest (cloned from microsoft/BitNet)
- Model: Falcon3-10B-Instruct-1.58bit (I2_S)
- Frontend: Open WebUI
- Context: 4096
Expected behavior
The server should continue working for multiple turns without crashing.
Additional context
This looks like a vocabulary/special-token handling issue specific to the Falcon3 1.58-bit models: many >>UNUSED_xxxx<< control tokens are not marked as EOG. Similar warnings appear in other issues, but a crash immediately after the first successful response seems new.
I can provide the full log file (server_20260405_122502.log) if needed.
Thank you!