Open WebUI support #527

@arlm

Description

I can connect Open WebUI to the server, but chat requests fail with errors:

To run the server:

python run_inference_server.py -m models/Falcon3-10B-Instruct-1.58bit/ggml-model-i2_s.gguf --host 127.0.0.1 --port 8080

In Open WebUI, add an OpenAI API connection with the URL http://127.0.0.1:8080/v1 and enable bearer-token auth; any value works as the token.
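For reference, what Open WebUI sends to that connection is a standard OpenAI-style chat completion request; the failure can be reproduced without Open WebUI by issuing one directly. A minimal sketch, assuming the host/port/token values above (the model id shown is a placeholder; use whatever the server's /v1/models endpoint reports):

```python
import json
import urllib.request

# Values from the setup above; the token content is arbitrary,
# the server only checks that a bearer token is present.
base_url = "http://127.0.0.1:8080/v1"
token = "anything"

payload = {
    # Hypothetical model id for illustration only.
    "model": "ggml-model-i2_s.gguf",
    "messages": [{"role": "user", "content": "Hello"}],
}

req = urllib.request.Request(
    f"{base_url}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    },
)

# Sending the request requires the server to be running; when it hits
# this bug, the response body is truncated mid-transfer.
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

If this standalone request triggers the same truncated response and SIGBUS, the problem is in the server rather than in Open WebUI's client.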

The model then shows up in the chat model list, but when I send a message I receive:

Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>

And the server dies with the error:

Error occurred while running command: Command '['build/bin/llama-server', '-m', 'models/Falcon3-10B-Instruct-1.58bit/ggml-model-i2_s.gguf', '-c', '2048', '-t', '2', '-n', '4096', '-ngl', '0', '--temp', '0.8', '--host', '127.0.0.1', '--port', '8080', '--no-mmap', '-np', '1', '-b', '1', '-nocb']' died with <Signals.SIGBUS: 10>.
