feat(openllama): support openllama-3B #25
Open
xingchensong wants to merge 1 commit into MegEngine:main from
Conversation
chenqy4933 reviewed May 31, 2023
```
./build/bin/quantize ${PATH_TO_HUGGINGFACE_OPENLLAMA}/ggml-model-f16.bin ggml-model-q4_0.bin q4_0
```
- After cloning the repo, you need to roll the commit back to b608b55, because InferLLM only supports models up to the ggjt.v1 format, while llama.cpp (as of commit 7552ac586380f202b75b18aa216ecfefbd438d94) has already moved to ggjt.v3, which is not backward compatible.
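As a side note on what the `q4_0` step above conceptually does, here is a simplified sketch of block-wise symmetric 4-bit quantization. This is only an illustration of the idea, not the real ggml `q4_0` layout (which packs two 4-bit values per byte and stores the scale as f16); block size and rounding here are assumptions.

```python
BLOCK = 32  # q4_0-style block size (assumption for the sketch)

def quantize_block(values):
    """Quantize one block of floats to signed 4-bit ints plus a scale.

    Simplified illustration only; real ggml q4_0 packs nibbles and
    stores the scale in half precision.
    """
    amax = max(abs(v) for v in values)
    scale = amax / 7.0 if amax else 1.0
    # Clamp to the signed 4-bit range [-8, 7].
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, q

def dequantize_block(scale, q):
    """Recover approximate floats from the quantized block."""
    return [scale * v for v in q]

vals = [0.1 * i - 1.6 for i in range(BLOCK)]
scale, q = quantize_block(vals)
restored = dequantize_block(scale, q)
err = max(abs(a - b) for a, b in zip(vals, restored))
```

The reconstruction error per value is bounded by roughly half the block scale, which is why larger-magnitude outliers in a block hurt everything else in that block.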
Collaborator
The model format can be customized; ChatGLM already uses a custom model format. A custom format only needs a corresponding parsing method added in the graph.
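The idea of hooking a custom format into the loader can be sketched as a magic-number dispatch table. Everything here is hypothetical: the magic values, parser names, and registry are made up to illustrate "add a parsing method for your format", not taken from InferLLM's actual code.

```python
import struct

# Hypothetical registry mapping a format's magic number to its parser.
PARSERS = {}

def register_parser(magic):
    """Decorator that registers a parse function for a given magic."""
    def deco(fn):
        PARSERS[magic] = fn
        return fn
    return deco

@register_parser(0x67676A74)  # stand-in magic for a ggjt-style format
def parse_ggjt(buf):
    return "ggjt model"

@register_parser(0x12345678)  # a made-up custom format, as ChatGLM might add
def parse_custom(buf):
    return "custom model"

def load_model(buf):
    """Read the leading magic and dispatch to the registered parser."""
    (magic,) = struct.unpack("<I", buf[:4])
    if magic not in PARSERS:
        raise ValueError(f"unknown model magic: {magic:#x}")
    return PARSERS[magic](buf[4:])
```

With this shape, supporting a new model format is just one more `@register_parser` entry plus its parse function, which matches the comment's point.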
diff --git a/convert.py b/convert.py
@@ -0,0 +1,142 @@
Collaborator
Could we add a convert.py and the quantization .cpp directly to InferLLM? That way it wouldn't depend on the llama.cpp project at all.
Contributor
Author
OK, that was actually the original plan, but it takes some extra work, so I took the lazy route with the current approach.
Collaborator
Nice, a very good idea.
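A stand-alone convert.py inside InferLLM could, at its core, just serialize named tensors into a self-describing binary container. The sketch below is purely hypothetical: the `infl` magic, header layout, and f32-only payload are invented for illustration and are not the real ggml/ggjt or InferLLM layout.

```python
import struct

MAGIC = b"infl"  # made-up magic for this sketch

def write_model(tensors):
    """Serialize (name, shape, f32 data) triples into one binary blob."""
    out = bytearray(MAGIC)
    out += struct.pack("<I", len(tensors))
    for name, shape, data in tensors:
        raw = name.encode()
        out += struct.pack("<I", len(raw)) + raw
        out += struct.pack("<I", len(shape))
        out += struct.pack(f"<{len(shape)}I", *shape)
        out += struct.pack(f"<{len(data)}f", *data)
    return bytes(out)

def read_model(buf):
    """Parse the blob back into (name, shape, data) triples."""
    assert buf[:4] == MAGIC, "bad magic"
    off = 4
    (n,) = struct.unpack_from("<I", buf, off); off += 4
    tensors = []
    for _ in range(n):
        (ln,) = struct.unpack_from("<I", buf, off); off += 4
        name = buf[off:off + ln].decode(); off += ln
        (nd,) = struct.unpack_from("<I", buf, off); off += 4
        shape = list(struct.unpack_from(f"<{nd}I", buf, off)); off += 4 * nd
        count = 1
        for d in shape:
            count *= d
        data = list(struct.unpack_from(f"<{count}f", buf, off)); off += 4 * count
        tensors.append((name, shape, data))
    return tensors
```

A real converter would read the HuggingFace checkpoint and also store dtype and vocab metadata, but the write/read symmetry above is the part that removes the llama.cpp dependency.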
Brief Intro
This PR provides a guide to quantize and run openllama-3B.
The reasons why we need openllama-3B:
Initial Result
The answer looks good to me.
TODO