fix: Q1_0_g128 CPU kernel - correct output and AVX-512 SIMD #3
jordankzf wants to merge 1 commit into PrismML-Eng:prism from
Conversation
Worked like a charm on my Ryzen 5700U.

rafaelfrequiao@ideapad:~$
echo "=== 1. Preparing the repository with PR #3 ==="
cd ~/ai-lab
rm -rf llama.cpp-bonsai
git clone https://github.com/PrismML-Eng/llama.cpp.git llama.cpp-bonsai
cd llama.cpp-bonsai

# Here is the magic: fetching the exact fix from Pull Request 3
git fetch origin pull/3/head:correcao-cpu
git checkout correcao-cpu

echo "=== 2. Building with the fix ==="
cmake -B build
cmake --build build -j$(nproc) --target llama-cli llama-server

echo "=== 3. Checking the 8B model ==="
if [ ! -f ~/ai-lab/Bonsai-demo/models/gguf/8B/Bonsai-8B.gguf ]; then
  echo "Model not found. Downloading Bonsai 8B..."
  mkdir -p ~/ai-lab/Bonsai-demo/models/gguf/8B
  curl -L -o ~/ai-lab/Bonsai-demo/models/gguf/8B/Bonsai-8B.gguf "https://huggingface.co/prism-ml/Bonsai-8B-gguf/resolve/main/Bonsai-8B.gguf"
fi

echo "=== 4. The acid test ==="
./build/bin/llama-cli \
  -m ~/ai-lab/Bonsai-demo/models/gguf/8B/*.gguf \
  -p "A capital do Brasil é " \
  -n 50 \
  -t 8
=== 1. Preparing the repository with PR #3 ===
Cloning into 'llama.cpp-bonsai'...
remote: Enumerating objects: 66862, done.
remote: Counting objects: 100% (39/39), done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 66862 (delta 8), reused 2 (delta 2), pack-reused 66823 (from 2)
Receiving objects: 100% (66862/66862), 307.07 MiB | 3.32 MiB/s, done.
Resolving deltas: 100% (47439/47439), done.
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 9 (delta 0), reused 0 (delta 0), pack-reused 2 (from 1)
Unpacking objects: 100% (9/9), 31.66 KiB | 810.00 KiB/s, done.
From https://github.com/PrismML-Eng/llama.cpp
 * [new ref]         refs/pull/3/head -> correcao-cpu
Switched to branch 'correcao-cpu'
=== 2. Building with the fix ===
-- The C compiler identification is GNU 14.2.0
-- The CXX compiler identification is GNU 14.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMAKE_BUILD_TYPE=Release
-- Found Git: /usr/bin/git (found version "2.47.3")
-- The ASM compiler identification is GNU
-- Found assembler: /usr/bin/cc
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- ccache found, compilation results will be cached. Disable with GGML_CCACHE=OFF.
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- GGML_SYSTEM_ARCH: x86
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
-- ggml version: 0.9.7
-- ggml commit: aec184c6b
-- Found OpenSSL: /usr/lib/x86_64-linux-gnu/libcrypto.so (found version "3.5.4")
-- Performing Test OPENSSL_VERSION_SUPPORTED
-- Performing Test OPENSSL_VERSION_SUPPORTED - Success
-- OpenSSL found: 3.5.4
-- Generating embedded license file for target: common
-- Configuring done (3.9s)
-- Generating done (0.3s)
-- Build files have been written to: /home/rafaelfrequiao/ai-lab/llama.cpp-bonsai/build
[ 0%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o
[ 1%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o
[ 1%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o
[ 1%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o
[ 1%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml.cpp.o
[ 3%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.o
[ 3%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o
[ 3%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o
[ 3%] Building CXX object vendor/cpp-httplib/CMakeFiles/cpp-httplib.dir/httplib.cpp.o
[ 3%] Building CXX object common/CMakeFiles/build_info.dir/build-info.cpp.o
[ 3%] Built target build_info
[ 3%] Linking CXX static library libcpp-httplib.a
[ 3%] Built target cpp-httplib
[ 3%] Linking CXX shared library ../../bin/libggml-base.so
[ 3%] Built target ggml-base
[ 5%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/unary-ops.cpp.o
[ 5%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o
[ 5%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/quants.c.o
[ 7%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o
[ 7%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/repack.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/vec.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/traits.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ops.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o
[ 9%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/arch/x86/quants.c.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/binary-ops.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/hbm.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/arch/x86/repack.cpp.o
[ 9%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/llamafile/sgemm.cpp.o
[ 9%] Linking CXX shared library ../../bin/libggml-cpu.so
[ 9%] Built target ggml-cpu
[ 11%] Building CXX object ggml/src/CMakeFiles/ggml.dir/ggml-backend-reg.cpp.o
[ 11%] Building CXX object ggml/src/CMakeFiles/ggml.dir/ggml-backend-dl.cpp.o
[ 11%] Linking CXX shared library ../../bin/libggml.so
[ 11%] Built target ggml
[ 13%] Building CXX object src/CMakeFiles/llama.dir/llama.cpp.o
[ 13%] Building CXX object src/CMakeFiles/llama.dir/llama-arch.cpp.o
[ 15%] Building CXX object src/CMakeFiles/llama.dir/llama-chat.cpp.o
[ 15%] Building CXX object src/CMakeFiles/llama.dir/llama-batch.cpp.o
[ 15%] Building CXX object src/CMakeFiles/llama.dir/llama-cparams.cpp.o
[ 15%] Building CXX object src/CMakeFiles/llama.dir/llama-adapter.cpp.o
[ 15%] Building CXX object src/CMakeFiles/llama.dir/llama-context.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-kv-cache.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-io.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-hparams.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-graph.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-impl.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-kv-cache-iswa.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-grammar.cpp.o
[ 17%] Building CXX object src/CMakeFiles/llama.dir/llama-memory.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-memory-hybrid.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-quant.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-mmap.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-memory-hybrid-iswa.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-model.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-model-saver.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-memory-recurrent.cpp.o
[ 19%] Building CXX object src/CMakeFiles/llama.dir/llama-sampler.cpp.o
[ 23%] Building CXX object src/CMakeFiles/llama.dir/llama-vocab.cpp.o
[ 23%] Building CXX object src/CMakeFiles/llama.dir/llama-model-loader.cpp.o
[ 23%] Building CXX object src/CMakeFiles/llama.dir/unicode.cpp.o
[ 25%] Building CXX object src/CMakeFiles/llama.dir/models/arcee.cpp.o
[ 25%] Building CXX object src/CMakeFiles/llama.dir/models/apertus.cpp.o
[ 25%] Building CXX object src/CMakeFiles/llama.dir/unicode-data.cpp.o
[ 25%] Building CXX object src/CMakeFiles/llama.dir/models/arctic.cpp.o
[ 25%] Building CXX object src/CMakeFiles/llama.dir/models/afmoe.cpp.o
[ 25%] Building CXX object src/CMakeFiles/llama.dir/models/arwkv7.cpp.o
[ 26%] Building CXX object src/CMakeFiles/llama.dir/models/baichuan.cpp.o
[ 26%] Building CXX object src/CMakeFiles/llama.dir/models/bert.cpp.o
[ 26%] Building CXX object src/CMakeFiles/llama.dir/models/bailingmoe2.cpp.o
[ 26%] Building CXX object src/CMakeFiles/llama.dir/models/bailingmoe.cpp.o
[ 26%] Building CXX object src/CMakeFiles/llama.dir/models/chameleon.cpp.o
[ 26%] Building CXX object src/CMakeFiles/llama.dir/models/chatglm.cpp.o
[ 26%] Building CXX object src/CMakeFiles/llama.dir/models/bitnet.cpp.o
[ 26%] Building CXX object src/CMakeFiles/llama.dir/models/codeshell.cpp.o
[ 28%] Building CXX object src/CMakeFiles/llama.dir/models/bloom.cpp.o
[ 30%] Building CXX object src/CMakeFiles/llama.dir/models/cohere2-iswa.cpp.o
[ 30%] Building CXX object src/CMakeFiles/llama.dir/models/cogvlm.cpp.o
[ 30%] Building CXX object src/CMakeFiles/llama.dir/models/dbrx.cpp.o
[ 30%] Building CXX object src/CMakeFiles/llama.dir/models/command-r.cpp.o
[ 32%] Building CXX object src/CMakeFiles/llama.dir/models/deci.cpp.o
[ 32%] Building CXX object src/CMakeFiles/llama.dir/models/deepseek.cpp.o
[ 32%] Building CXX object src/CMakeFiles/llama.dir/models/deepseek2.cpp.o
[ 32%] Building CXX object src/CMakeFiles/llama.dir/models/delta-net-base.cpp.o
[ 32%] Building CXX object src/CMakeFiles/llama.dir/models/dots1.cpp.o
[ 34%] Building CXX object src/CMakeFiles/llama.dir/models/dream.cpp.o
[ 34%] Building CXX object src/CMakeFiles/llama.dir/models/ernie4-5-moe.cpp.o
[ 34%] Building CXX object src/CMakeFiles/llama.dir/models/ernie4-5.cpp.o
[ 34%] Building CXX object src/CMakeFiles/llama.dir/models/exaone4.cpp.o
[ 34%] Building CXX object src/CMakeFiles/llama.dir/models/eurobert.cpp.o
[ 36%] Building CXX object src/CMakeFiles/llama.dir/models/exaone-moe.cpp.o
[ 36%] Building CXX object src/CMakeFiles/llama.dir/models/falcon-h1.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/gemma-embedding.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/exaone.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/gemma2-iswa.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/gemma3.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/gemma.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/falcon.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/gpt2.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/glm4.cpp.o
[ 38%] Building CXX object src/CMakeFiles/llama.dir/models/glm4-moe.cpp.o
[ 40%] Building CXX object src/CMakeFiles/llama.dir/models/gemma3n-iswa.cpp.o
[ 40%] Building CXX object src/CMakeFiles/llama.dir/models/grok.cpp.o
[ 40%] Building CXX object src/CMakeFiles/llama.dir/models/grovemoe.cpp.o
[ 42%] Building CXX object src/CMakeFiles/llama.dir/models/gptneox.cpp.o
[ 44%] Building CXX object src/CMakeFiles/llama.dir/models/granite.cpp.o
[ 44%] Building CXX object src/CMakeFiles/llama.dir/models/hunyuan-dense.cpp.o
[ 44%] Building CXX object src/CMakeFiles/llama.dir/models/granite-hybrid.cpp.o
[ 46%] Building CXX object src/CMakeFiles/llama.dir/models/jais.cpp.o
[ 46%] Building CXX object src/CMakeFiles/llama.dir/models/jais2.cpp.o
[ 46%] Building CXX object src/CMakeFiles/llama.dir/models/internlm2.cpp.o
[ 46%] Building CXX object src/CMakeFiles/llama.dir/models/kimi-linear.cpp.o
[ 46%] Building CXX object src/CMakeFiles/llama.dir/models/jamba.cpp.o
[ 46%] Building CXX object src/CMakeFiles/llama.dir/models/hunyuan-moe.cpp.o
[ 46%] Building CXX object src/CMakeFiles/llama.dir/models/lfm2.cpp.o
[ 48%] Building CXX object src/CMakeFiles/llama.dir/models/llada-moe.cpp.o
[ 48%] Building CXX object src/CMakeFiles/llama.dir/models/llada.cpp.o
[ 48%] Building CXX object src/CMakeFiles/llama.dir/models/maincoder.cpp.o
[ 48%] Building CXX object src/CMakeFiles/llama.dir/models/llama.cpp.o
[ 48%] Building CXX object src/CMakeFiles/llama.dir/models/llama-iswa.cpp.o
[ 50%] Building CXX object src/CMakeFiles/llama.dir/models/mamba-base.cpp.o
[ 50%] Building CXX object src/CMakeFiles/llama.dir/models/mamba.cpp.o
[ 51%] Building CXX object src/CMakeFiles/llama.dir/models/mimo2-iswa.cpp.o
[ 51%] Building CXX object src/CMakeFiles/llama.dir/models/minicpm3.cpp.o
[ 51%] Building CXX object src/CMakeFiles/llama.dir/models/minimax-m2.cpp.o
[ 51%] Building CXX object src/CMakeFiles/llama.dir/models/mpt.cpp.o
[ 51%] Building CXX object src/CMakeFiles/llama.dir/models/modern-bert.cpp.o
[ 51%] Building CXX object src/CMakeFiles/llama.dir/models/nemotron.cpp.o
[ 51%] Building CXX object src/CMakeFiles/llama.dir/models/mistral3.cpp.o
[ 53%] Building CXX object src/CMakeFiles/llama.dir/models/nemotron-h.cpp.o
[ 53%] Building CXX object src/CMakeFiles/llama.dir/models/neo-bert.cpp.o
[ 53%] Building CXX object src/CMakeFiles/llama.dir/models/olmo2.cpp.o
[ 53%] Building CXX object src/CMakeFiles/llama.dir/models/olmo.cpp.o
[ 55%] Building CXX object src/CMakeFiles/llama.dir/models/olmoe.cpp.o
[ 55%] Building CXX object src/CMakeFiles/llama.dir/models/openelm.cpp.o
[ 55%] Building CXX object src/CMakeFiles/llama.dir/models/openai-moe-iswa.cpp.o
[ 55%] Building CXX object src/CMakeFiles/llama.dir/models/pangu-embedded.cpp.o
[ 55%] Building CXX object src/CMakeFiles/llama.dir/models/orion.cpp.o
[ 55%] Building CXX object src/CMakeFiles/llama.dir/models/plamo.cpp.o
[ 57%] Building CXX object src/CMakeFiles/llama.dir/models/paddleocr.cpp.o
[ 57%] Building CXX object src/CMakeFiles/llama.dir/models/phi2.cpp.o
[ 57%] Building CXX object src/CMakeFiles/llama.dir/models/phi3.cpp.o
[ 59%] Building CXX object src/CMakeFiles/llama.dir/models/plamo2.cpp.o
[ 59%] Building CXX object src/CMakeFiles/llama.dir/models/plm.cpp.o
[ 59%] Building CXX object src/CMakeFiles/llama.dir/models/qwen.cpp.o
[ 61%] Building CXX object src/CMakeFiles/llama.dir/models/qwen2moe.cpp.o
[ 61%] Building CXX object src/CMakeFiles/llama.dir/models/qwen2.cpp.o
[ 61%] Building CXX object src/CMakeFiles/llama.dir/models/qwen2vl.cpp.o
[ 61%] Building CXX object src/CMakeFiles/llama.dir/models/plamo3.cpp.o
[ 61%] Building CXX object src/CMakeFiles/llama.dir/models/qwen3.cpp.o
[ 63%] Building CXX object src/CMakeFiles/llama.dir/models/qwen35.cpp.o
[ 63%] Building CXX object src/CMakeFiles/llama.dir/models/qwen35moe.cpp.o
[ 63%] Building CXX object src/CMakeFiles/llama.dir/models/qwen3moe.cpp.o
[ 63%] Building CXX object src/CMakeFiles/llama.dir/models/qwen3next.cpp.o
[ 65%] Building CXX object src/CMakeFiles/llama.dir/models/qwen3vl-moe.cpp.o
[ 65%] Building CXX object src/CMakeFiles/llama.dir/models/qwen3vl.cpp.o
[ 65%] Building CXX object src/CMakeFiles/llama.dir/models/refact.cpp.o
[ 65%] Building CXX object src/CMakeFiles/llama.dir/models/rnd1.cpp.o
[ 67%] Building CXX object src/CMakeFiles/llama.dir/models/rwkv6.cpp.o
[ 67%] Building CXX object src/CMakeFiles/llama.dir/models/rwkv6-base.cpp.o
[ 67%] Building CXX object src/CMakeFiles/llama.dir/models/rwkv6qwen2.cpp.o
[ 67%] Building CXX object src/CMakeFiles/llama.dir/models/rwkv7.cpp.o
[ 67%] Building CXX object src/CMakeFiles/llama.dir/models/rwkv7-base.cpp.o
[ 69%] Building CXX object src/CMakeFiles/llama.dir/models/seed-oss.cpp.o
[ 69%] Building CXX object src/CMakeFiles/llama.dir/models/smallthinker.cpp.o
[ 69%] Building CXX object src/CMakeFiles/llama.dir/models/stablelm.cpp.o
[ 69%] Building CXX object src/CMakeFiles/llama.dir/models/step35-iswa.cpp.o
[ 69%] Building CXX object src/CMakeFiles/llama.dir/models/starcoder.cpp.o
[ 69%] Building CXX object src/CMakeFiles/llama.dir/models/smollm3.cpp.o
[ 71%] Building CXX object src/CMakeFiles/llama.dir/models/starcoder2.cpp.o
[ 71%] Building CXX object src/CMakeFiles/llama.dir/models/t5-dec.cpp.o
[ 71%] Building CXX object src/CMakeFiles/llama.dir/models/t5-enc.cpp.o
[ 73%] Building CXX object src/CMakeFiles/llama.dir/models/xverse.cpp.o
[ 73%] Building CXX object src/CMakeFiles/llama.dir/models/wavtokenizer-dec.cpp.o
[ 73%] Linking CXX shared library ../bin/libllama.so
[ 73%] Built target llama
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/glm4v.cpp.o
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/clip.cpp.o
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/conformer.cpp.o
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd.cpp.o
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/nemotron-v2-vl.cpp.o
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd-audio.cpp.o
[ 75%] Building CXX object common/CMakeFiles/common.dir/arg.cpp.o
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/cogvlm.cpp.o
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/kimik25.cpp.o
[ 75%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/internvl.cpp.o
[ 76%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/llava.cpp.o
[ 76%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/mtmd-helper.cpp.o
[ 76%] Building CXX object common/CMakeFiles/common.dir/chat-parser-xml-toolcall.cpp.o
[ 76%] Building CXX object common/CMakeFiles/common.dir/chat-parser.cpp.o
[ 78%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/kimivl.cpp.o
[ 78%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/llama4.cpp.o
[ 80%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/minicpmv.cpp.o
[ 80%] Building CXX object common/CMakeFiles/common.dir/chat.cpp.o
[ 80%] Building CXX object common/CMakeFiles/common.dir/chat-peg-parser.cpp.o
[ 80%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/paddleocr.cpp.o
[ 80%] Building CXX object common/CMakeFiles/common.dir/common.cpp.o
[ 80%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/pixtral.cpp.o
[ 80%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/qwen3vl.cpp.o
[ 80%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/siglip.cpp.o
[ 80%] Building CXX object common/CMakeFiles/common.dir/console.cpp.o
[ 80%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/whisper-enc.cpp.o
[ 80%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/mobilenetv5.cpp.o
[ 82%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/youtuvl.cpp.o
[ 82%] Building CXX object common/CMakeFiles/common.dir/json-partial.cpp.o
[ 82%] Building CXX object common/CMakeFiles/common.dir/download.cpp.o
[ 84%] Building CXX object tools/mtmd/CMakeFiles/mtmd.dir/models/qwen2vl.cpp.o
[ 86%] Building CXX object common/CMakeFiles/common.dir/debug.cpp.o
[ 86%] Building CXX object common/CMakeFiles/common.dir/json-schema-to-grammar.cpp.o
[ 86%] Building CXX object common/CMakeFiles/common.dir/llguidance.cpp.o
[ 86%] Building CXX object common/CMakeFiles/common.dir/ngram-map.cpp.o
[ 88%] Building CXX object common/CMakeFiles/common.dir/log.cpp.o
[ 88%] Building CXX object common/CMakeFiles/common.dir/ngram-cache.cpp.o
[ 88%] Building CXX object common/CMakeFiles/common.dir/preset.cpp.o
[ 88%] Building CXX object common/CMakeFiles/common.dir/ngram-mod.cpp.o
[ 90%] Building CXX object common/CMakeFiles/common.dir/peg-parser.cpp.o
[ 92%] Building CXX object common/CMakeFiles/common.dir/speculative.cpp.o
[ 92%] Building CXX object common/CMakeFiles/common.dir/unicode.cpp.o
[ 92%] Building CXX object common/CMakeFiles/common.dir/sampling.cpp.o
[ 92%] Linking CXX shared library ../../bin/libmtmd.so
[ 92%] Building CXX object common/CMakeFiles/common.dir/jinja/lexer.cpp.o
[ 92%] Building CXX object common/CMakeFiles/common.dir/regex-partial.cpp.o
[ 92%] Building CXX object common/CMakeFiles/common.dir/jinja/runtime.cpp.o
[ 92%] Building CXX object common/CMakeFiles/common.dir/jinja/parser.cpp.o
[ 92%] Building CXX object common/CMakeFiles/common.dir/jinja/caps.cpp.o
[ 94%] Building CXX object common/CMakeFiles/common.dir/jinja/value.cpp.o
[ 94%] Building CXX object common/CMakeFiles/common.dir/__/license.cpp.o
[ 94%] Building CXX object common/CMakeFiles/common.dir/jinja/string.cpp.o
[ 96%] Linking CXX static library libcommon.a
[ 96%] Built target mtmd
[ 96%] Built target common
[ 96%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-queue.cpp.o
[ 96%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-task.cpp.o
[ 96%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-context.cpp.o
[ 98%] Building CXX object tools/server/CMakeFiles/server-context.dir/server-common.cpp.o
[ 98%] Linking CXX static library libserver-context.a
[ 98%] Built target server-context
[100%] Building CXX object tools/cli/CMakeFiles/llama-cli.dir/cli.cpp.o
[100%] Linking CXX executable ../../bin/llama-cli
[100%] Built target llama-cli
[ 0%] Built target build_info
[ 0%] Built target cpp-httplib
[ 3%] Built target ggml-base
[ 9%] Built target ggml-cpu
[ 11%] Built target ggml
[ 73%] Built target llama
[ 82%] Built target mtmd
[ 96%] Built target common
[ 98%] Built target server-context
[ 98%] Generating index.html.gz.hpp
[ 98%] Generating loading.html.hpp
[ 98%] Building CXX object tools/server/CMakeFiles/llama-server.dir/server-http.cpp.o
[100%] Building CXX object tools/server/CMakeFiles/llama-server.dir/server.cpp.o
[100%] Building CXX object tools/server/CMakeFiles/llama-server.dir/server-models.cpp.o
[100%] Linking CXX executable ../../bin/llama-server
[100%] Built target llama-server
=== 3. Checking the 8B model ===
=== 4. The acid test ===
Loading model...
▄▄ ▄▄
██ ██
██ ██ ▀▀█▄ ███▄███▄ ▀▀█▄ ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██ ██ ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
██ ██
▀▀ ▀▀
build : b8195-aec184c6b
model : Bonsai-8B.gguf
modalities : text
available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read add a text file
> A capital do Brasil é
A capital do Brasil é **Brasília**.
[ Prompt: 0.2 t/s | Generation: 0.2 t/s ]
>
Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
llama_memory_breakdown_print: | - Host | 10627 = 1099 + 9216 + 312 |
rafaelfrequiao@ideapad:~/ai-lab/llama.cpp-bonsai$
This looks great, thanks. There were a few CPU kernel fixes that I did not see until I pushed my changes. For now I removed the buggy x86 path; I will merge one of the correct AVX ones. Could you run the KL divergence tests described here: #8
@khosravipasha I can't run the KL divergence tests.
The Q1_0_g128 vec_dot kernel had a bug where `sumi` was declared as `int` but accumulated `float` partial products (`d1 * sumi_block`), causing float-to-int truncation that destroyed dot-product results and produced gibberish output on CPU. Additionally, the x86 kernel was purely scalar (one bit at a time). This adds an AVX-512BW path that processes 32 elements per iteration using mask_sub + madd + fma, with a single horizontal reduction at the end.

Benchmarks (Bonsai-8B, CPU-only, AVX-512):
Before: 0.73 t/s prompt, 0.65 t/s generation (gibberish output)
After: 23.2 t/s prompt, 13.5 t/s generation (coherent output)
Force-pushed aec184c to 082e830.
The f16 GGUF isn't available on HuggingFace, so I converted it from prism-ml/Bonsai-1.7B-unpacked (safetensors) using convert_hf_to_gguf.py --outtype f16.

Setup:
f16 reference: converted from prism-ml/Bonsai-1.7B-unpacked safetensors

Same top p: 0.075 +/- 0.017 %
@jordankzf Might mean some issue with the kernels; can you run the same command without your changes? In the meantime I will check the 1.7B unpacked weights to see if they are good. Also, you might not need 100 chunks for this test; a few chunks are okay (at least until you get close to 0). Is the output from the model coherent? Try a few complicated prompts to see if the kernels are working. Two options to get the fp16 GGUF:
Made the changes, @khosravipasha. Please have a look! KL divergence and coherence test results below.

Setup:
f16 reference: dequantized from Bonsai-1.7B.gguf via llama-quantize --allow-requantize ... F16

Same top p: 97.843 +/- 0.288 %

Coherence test (Bonsai-1.7B, complex prompts):
Q: Explain the difference between TCP and UDP in networking
Q: Write a haiku about programming
Q: What causes ocean tides? Explain briefly.

All responses are coherent and factually correct. The 1.7B model runs at 33-40 t/s on CPU (AVX-512).
Summary

Bug

`sumi` was declared `int` but accumulated `float` partial products (`d1 * sumi_block`), silently truncating to zero for small scale values. Affects both the x86 and generic fallback kernels.

Changes

ggml/src/ggml-cpu/arch/x86/quants.c

- `int sumi` -> `float sumi` in scalar fallback
- `#if defined(__AVX512BW__)` path: sign-extend int8 -> int16, mask-negate via `_mm512_mask_sub_epi16`, pairwise reduce via `_mm512_madd_epi16`, float accumulate via `_mm512_fmadd_ps`, single horizontal sum at the end
ggml/src/ggml-cpu/quants.c

- `int sumi` -> `float sumi` in generic fallback

Benchmarks (Bonsai-8B, CPU-only, Intel Ice Lake AVX-512)