chatbees-dev-llm

Run small LLMs locally alongside the ChatBees test container to develop and test LLM applications. This repo describes how to run the small LLMs locally.

The gte-multilingual-base model is used for embedding. Chat completion supports three models:

Google Gemma 2 2B-Instruct, using Hugging Face local-gemma.
Meta Llama 3.2 1B-Instruct.
Meta Llama 3.2 3B-Instruct.

Run python start_server.py to start a simple server that hosts the models. To choose the chat completion model, set ENV_LOCAL_COMPLETION_MODEL before running python start_server.py:

# Pick exactly one of the following:
export ENV_LOCAL_COMPLETION_MODEL=google/gemma-2-2b-it
export ENV_LOCAL_COMPLETION_MODEL=meta-llama/Llama-3.2-1B-Instruct
export ENV_LOCAL_COMPLETION_MODEL=meta-llama/Llama-3.2-3B-Instruct
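As a rough illustration of how a script could read and validate this variable before loading a model, here is a minimal sketch. The function name resolve_completion_model and the fallback to google/gemma-2-2b-it when the variable is unset are assumptions made for this example; the repo's actual default and selection logic (presumably in llm_factory.py) may differ.

```python
import os
from typing import Mapping

# The three model ids listed in this README.
SUPPORTED_COMPLETION_MODELS = (
    "google/gemma-2-2b-it",
    "meta-llama/Llama-3.2-1B-Instruct",
    "meta-llama/Llama-3.2-3B-Instruct",
)


def resolve_completion_model(
    env: Mapping[str, str] = os.environ,
    default: str = "google/gemma-2-2b-it",  # assumed fallback, not confirmed by the repo
) -> str:
    """Return the model id from ENV_LOCAL_COMPLETION_MODEL, or the fallback if unset."""
    model_id = env.get("ENV_LOCAL_COMPLETION_MODEL", default).strip()
    if model_id not in SUPPORTED_COMPLETION_MODELS:
        raise ValueError(
            f"Unsupported completion model {model_id!r}; "
            f"expected one of {SUPPORTED_COMPLETION_MODELS}"
        )
    return model_id


if __name__ == "__main__":
    print(resolve_completion_model())
```

Failing fast on an unknown id avoids starting a large download for a mistyped model name.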

On the first run, you need a read-only Hugging Face token to download the models to local disk. Either write your token to ~/.cache/huggingface/token, or call the code below:

from huggingface_hub import login

# Replace the placeholder with your own read-only token.
login(token="your_hf_read_only_token")
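To avoid hard-coding the token in source, the two options can be combined into one helper. The sketch below is an illustration only: ensure_hf_login is a hypothetical helper, not part of this repo, and reading the token from the HF_TOKEN environment variable is an assumption about your setup. It checks the token path named above and calls login only when that file is missing.

```python
import os
from pathlib import Path
from typing import Mapping

# Token location named in this README; it may differ if HF_HOME is customized.
DEFAULT_TOKEN_PATH = Path.home() / ".cache" / "huggingface" / "token"


def ensure_hf_login(
    env: Mapping[str, str] = os.environ,
    token_path: Path = DEFAULT_TOKEN_PATH,
) -> bool:
    """Return True if a token file already exists or a login was performed."""
    if token_path.is_file():
        return True  # a token was saved by an earlier login
    token = env.get("HF_TOKEN")  # assumed source of the read-only token
    if not token:
        return False  # nothing to log in with; the caller decides what to do
    # Imported lazily so the check above works without the package installed.
    from huggingface_hub import login

    login(token=token)
    return True


if __name__ == "__main__":
    if not ensure_hf_login():
        print("No Hugging Face token found; set HF_TOKEN or write the token file.")
```

Run it once before python start_server.py; later runs return early because the token file already exists.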
