
issue/1124: minicpm-sala model #1125

Open
Ceng23333 wants to merge 5 commits into main from minicpm_sala_patches

Conversation


@Ceng23333 Ceng23333 commented Apr 7, 2026

@Ceng23333 Ceng23333 requested a review from a team April 7, 2026 09:08
@Ceng23333 Ceng23333 changed the title squash for rebase issue/1124: minicpm-sala model Apr 7, 2026
@Ceng23333 Ceng23333 force-pushed the minicpm_sala_patches branch from c99c78a to fb8eba4 Compare April 8, 2026 02:55

@wooway777 wooway777 left a comment


Besides addressing the review comments, the merge conflicts also need to be resolved.


These three tests are better not placed in this folder. Everything under infinicore/ops is written against the test framework and can be run uniformly via run.py.

Standalone tests like these are also easy for other people to overlook when placed here.


Same as above.


Same as above.


Is the difference between these and the same-named interfaces in infinicore/adaptors just a few extra parameters?
If they serve a similar purpose, shouldn't they be placed in the adaptor instead?


Do these need to be registered as graph operators?


namespace {

__device__ __forceinline__ float bf16_to_f32(__nv_bfloat16 x) { return __bfloat162float(x); }

Written this way, this will likely break on Hygon and similar platforms.
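A possible mitigation, sketched below under the assumption that the build distinguishes nvcc from other toolchains via `__CUDACC__` (the portable fallback branch is what compiles as plain host C++ here; the macro split itself is an assumption, not the repo's actual guard):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__CUDACC__)
// NVIDIA path: keep the intrinsic, but only where nvcc is the compiler.
#include <cuda_bf16.h>
__device__ __forceinline__ float bf16_to_f32(__nv_bfloat16 x) {
    return __bfloat162float(x);
}
#else
// Portable fallback for non-NVIDIA toolchains (e.g. Hygon): a bf16 value is
// simply the high 16 bits of an IEEE-754 binary32 value.
inline float bf16_to_f32(std::uint16_t x) {
    std::uint32_t bits = static_cast<std::uint32_t>(x) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
#endif
```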


template <>
struct Convert<__half> {
__device__ static float to_f32(__half x) { return f16_to_f32(x); }

It feels like the repo already has quite a few conversion functions, and many kernels define their own... Not sure whether or when these should be consolidated.
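For illustration, such a consolidation could take the shape of one shared conversion-traits header that every kernel includes. The block below is a host-only sketch using plain float/double; a real header would additionally specialize for `__half` and `__nv_bfloat16`:

```cpp
#include <cassert>

// Primary template left undefined: converting an unsupported type
// becomes a compile-time error instead of a silent per-kernel redefinition.
template <typename T>
struct Convert;

template <>
struct Convert<float> {
    static float to_f32(float x) { return x; }
};

template <>
struct Convert<double> {
    static float to_f32(double x) { return static_cast<float>(x); }
};
```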

infiniopTensorDescriptor_t v_desc,
infiniopTensorDescriptor_t g_gamma_desc) {

#define CREATE_CUDA(CASE, NAMESPACE) \

I defined CREATE_CUDA in swiglu to distinguish the CUDA/CUDA-like backends from the generic implementation.
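As a loose illustration of that split (the device enum and string results below are invented for the sketch, not the repo's actual types): one macro expands per-backend cases, so CUDA and CUDA-like devices share a code path while everything else falls through to the generic implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical device tags for the sketch.
enum Device { DEV_NVIDIA, DEV_CUDA_LIKE, DEV_CPU };

static std::string create_impl(Device dev) {
#define CREATE_CUDA_CASE(CASE, PATH) \
    case CASE:                       \
        return PATH;
    switch (dev) {
        CREATE_CUDA_CASE(DEV_NVIDIA, "cuda")
        CREATE_CUDA_CASE(DEV_CUDA_LIKE, "cuda") // CUDA-like backend reuses the CUDA path
    default:
        return "generic"; // generic fallback implementation
    }
#undef CREATE_CUDA_CASE
}
```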

.gitmodules Outdated
path = third_party/nlohmann_json
url = https://github.com/nlohmann/json.git
branch = master
[submodule "third_party/infllmv2_cuda_impl"]

If this is NVIDIA-only, shouldn't it be fetched manually like fla and cutlass, instead of being added as a submodule that every platform clones whether it uses it or not?

end

-- InfLLM-V2 direct kernels (requires aten; link against infllmv2_cuda_impl .so)
option("infllmv2")

This should be reflected in the README.

@wooway777 wooway777 requested a review from PanZezhong1725 April 9, 2026 07:55
//
// Returns:
// [total_q, nheads, head_dim]
Tensor infllmv2_varlen(const Tensor &q,

The function name should make clear that this is an attention operator; same below.

}
auto cpu_lens = seqlens_k.to(at::kCPU);
int32_t len0 = cpu_lens.numel() > 0 ? cpu_lens.data_ptr<int32_t>()[0] : -1;
f << "[infinicore][infllmv2][" << op_name << "]"

Why not use spdlog?

#include <stdexcept>
#include <vector>

namespace infinicore::op {

Shouldn't this operator implementation be moved into infiniop?

xmake.lua Outdated
add_files("src/infinicore/pybind11/**.cc")

set_installdir("python/infinicore")
after_build(function (target)

This should be the job of the install step.

Signed-off-by: Ceng23333 <441651826@qq.com>
Signed-off-by: Ceng23333 <441651826@qq.com>
Signed-off-by: Ceng23333 <441651826@qq.com>
@Ceng23333 Ceng23333 force-pushed the minicpm_sala_patches branch from 0435097 to b2c5e1b Compare April 10, 2026 12:40
Signed-off-by: Ceng23333 <441651826@qq.com>
Signed-off-by: Ceng23333 <441651826@qq.com>

3 participants