Open WebUI support #527
I was able to connect to Open WebUI, but it returns errors.
To run the server:
python run_inference_server.py -m models/Falcon3-10B-Instruct-1.58bit/ggml-model-i2_s.gguf --host 127.0.0.1 --port 8080

Then connect from Open WebUI as an OpenAI API with the URL http://127.0.0.1:8080/v1, enable bearer-token auth, and type anything as the token.
The model then appears in the chat model list, but when I try to talk to it I receive:
Response payload is not completed: <TransferEncodingError: 400, message='Not enough data to satisfy transfer length header.'>
And the server dies with the error:
Error occurred while running command: Command '['build/bin/llama-server', '-m', 'models/Falcon3-10B-Instruct-1.58bit/ggml-model-i2_s.gguf', '-c', '2048', '-t', '2', '-n', '4096', '-ngl', '0', '--temp', '0.8', '--host', '127.0.0.1', '--port', '8080', '--no-mmap', '-np', '1', '-b', '1', '-nocb']' died with <Signals.SIGBUS: 10>.
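To rule out Open WebUI itself, it can help to send a single request straight to the server's OpenAI-compatible endpoint. A minimal sketch below, assuming the standard /v1/chat/completions path exposed by llama-server; the model name is a guess based on the GGUF filename, and the token can be any string since the server does not validate it:

```python
import json
import urllib.request

def build_chat_request(base_url, token, prompt,
                       model="ggml-model-i2_s.gguf"):
    """Build an OpenAI-style chat completion request for the local server.

    The model name default is an assumption taken from the GGUF filename
    in the issue; use whatever name the server reports at /v1/models.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Non-streaming keeps the response in one chunk, which sidesteps
        # the chunked transfer encoding that the error message points at.
        "stream": False,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # any value is accepted
        },
    )

if __name__ == "__main__":
    req = build_chat_request("http://127.0.0.1:8080", "anything", "Hello")
    with urllib.request.urlopen(req, timeout=120) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

If this request also kills the server with SIGBUS, the problem is in the llama-server process rather than in Open WebUI's client.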