Conversation
| // - HyPE (RoPE on linear layers; NoPE on sparse layers) | ||
| class MiniCPMSALAForCausalLM : public InfinilmModel { | ||
| public: | ||
| MiniCPMSALAForCausalLM(std::shared_ptr<infinilm::config::ModelConfig> model_config, |
There was a problem hiding this comment.
https://github.com/pengcheng888/InfiniLM/blob/main/csrc/models/minicpm_sala/minicpm_sala_for_causal_lm.hpp 请参考接口。移除 rank_info 和 attention_backend 参数。
| private: | ||
| INFINICORE_NN_MODULE(MiniCPMSALAModel, model); | ||
| INFINICORE_NN_MODULE(infinilm::layers::linear::ReplicatedLinear, lm_head); | ||
| INFINICORE_NN_MODULE(infinicore::nn::Linear, lm_head); |
There was a problem hiding this comment.
使用 infinilm::layers::linear::ReplicatedLinear;infinicore::nn::Linear 不再使用。
| std::unique_ptr<cache::CacheConfig> cache_config_; | ||
| }; | ||
|
|
||
| std::shared_ptr<infinilm::config::ModelConfig> create_minicpm_sala_model_config(std::shared_ptr<infinilm::config::ModelConfig> model_config); |
There was a problem hiding this comment.
实现这个create_minicpm_sala_model_config函数。
| const cache::CacheConfig *MiniCPMSALAForCausalLM::get_cache_config() const { | ||
| return cache_config_.get(); | ||
| } | ||
|
|
There was a problem hiding this comment.
kvcache 的创建放在 minicpm_sala_allocate_kv_cache_tensors.cpp 文件中。
|
|
||
| } // namespace infinilm::models::minicpm_sala | ||
|
|
||
| namespace { |
|
|
||
| class MiniCPMSALADecoderLayer : public infinicore::nn::Module { | ||
| public: | ||
| MiniCPMSALADecoderLayer(std::shared_ptr<infinilm::config::ModelConfig> model_config, |
There was a problem hiding this comment.
There was a problem hiding this comment.
移除 MiniCPMSALADecoderLayer 的 rank_info 和 attention_backend 参数。
| std::optional<infinicore::Tensor> cu_seqlens, | ||
| std::optional<infinicore::Tensor> block_tables, | ||
| std::optional<infinicore::Tensor> slot_mapping) const; | ||
|
|
There was a problem hiding this comment.
移除多余的参数,forward只需要(const infinicore::Tensor &positions,
infinicore::Tensor &hidden_states,
infinicore::Tensor &residual);
| std::optional<infinicore::Tensor> block_tables, | ||
| std::optional<infinicore::Tensor> slot_mapping) const; | ||
|
|
||
| void set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb); |
There was a problem hiding this comment.
移除set_rotary_emb和reset_cache函数
| #include "../../backends/attention_backends.hpp" | ||
| #include "../../cache/kv_cache.hpp" | ||
| #include "../../config/model_config.hpp" | ||
| #include "../../engine/distributed/distributed.hpp" |
| #include "models_registry.hpp" | ||
| #include "llama/llama.hpp" | ||
| #include "minicpm_sala/minicpm_sala_for_causal_lm.hpp" | ||
|
|
|
|
||
| #include "../global_state/global_state.hpp" | ||
| #include "../models/model_factory.hpp" | ||
| #include "../models/models_registry.hpp" |
There was a problem hiding this comment.
新增模型,不要修改框架层面上的代码。不能修改该文件
| const std::string model_type = model_config->get<std::string>("model_type"); | ||
| const auto &config_map = models::get_model_config_map(); | ||
| auto it = config_map.find(model_type); | ||
| if (it != config_map.end()) { |
There was a problem hiding this comment.
新增模型,不要修改框架层面上的代码。不能修改该文件
|
|
||
| #include <algorithm> | ||
| #include <limits> | ||
| #include <memory> |
|
|
||
| void MiniCPMSALAModel::reset_cache(const cache::CacheConfig *cache_config) { | ||
| if (cache_config == nullptr) { | ||
| kv_cache_minicpm4_ = nullptr; |
There was a problem hiding this comment.
kvcache创建的代码在csrc/models/minicpm_sala/minicpm_sala_allocate_kv_cache_tensors.cpp中
| if (auto static_cfg = dynamic_cast<const cache::StaticKVCacheConfig *>(cache_config)) { | ||
| // Allocate separate caches by KV shape to avoid per-layer padding copies. |
| INFINICORE_NN_MODULE_INIT(o_gate, hidden_size_, num_attention_heads * head_dim_, | ||
| model_config->get_quantization_method(), use_bias_, dtype, device); | ||
| } | ||
| void MiniCPMSALAAttention::set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb) { |
| std::optional<infinicore::Tensor> cu_seqlens, | ||
| std::optional<infinicore::Tensor> block_tables, | ||
| std::optional<infinicore::Tensor> slot_mapping) const; | ||
|
|
| INFINICORE_NN_MODULE_INIT(mlp, model_config, device); | ||
| } | ||
|
|
||
| void MiniCPMSALADecoderLayer::set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb) { |
| void MiniCPMSALADecoderLayer::reset_cache() { | ||
| self_attn_->reset_cache(); |
|
|
||
| auto to_device = [&](const std::optional<infinicore::Tensor> &t) | ||
| -> std::optional<infinicore::Tensor> { | ||
| return t.has_value() ? t.value()->to(device) : t; |
Signed-off-by: Ceng23333 <441651826@qq.com>
Signed-off-by: Ceng23333 <441651826@qq.com>
Signed-off-by: Ceng23333 <441651826@qq.com>
Signed-off-by: Ceng23333 <441651826@qq.com>
Signed-off-by: Ceng23333 <441651826@qq.com>
| void reset_cache(const cache::CacheConfig *cache_config) override; | ||
|
|
||
| protected: | ||
| const cache::CacheConfig *get_cache_config() const override; |
There was a problem hiding this comment.
get_cache_config() 属于 InfinilmModel 抽象基类,移除具体模型中的 get_cache_config 函数。
| INFINICORE_NN_MODULE(MiniCPMSALAModel, model); | ||
| INFINICORE_NN_MODULE(infinilm::layers::linear::ReplicatedLinear, lm_head); | ||
| INFINICORE_NN_MODULE(infinicore::nn::Linear, lm_head); | ||
| std::unique_ptr<cache::CacheConfig> cache_config_; |
| MiniCPMSALAModel(std::shared_ptr<infinilm::config::ModelConfig> model_config, | ||
| const infinicore::Device &device, | ||
| engine::distributed::RankInfo rank_info = engine::distributed::RankInfo(), | ||
| backends::AttentionBackend attention_backend = backends::AttentionBackend::Default); |
There was a problem hiding this comment.
移除MiniCPMSALAModel的rank_info和attention_backend参数
There was a problem hiding this comment.
attention_backend_ = infinilm::global_state::get_infinilm_config().attention_backend;
There was a problem hiding this comment.
const engine::distributed::RankInfo &rank_info = infinilm::global_state::get_tensor_model_parallel_rank_info();
| engine::distributed::RankInfo rank_info = engine::distributed::RankInfo(), | ||
| backends::AttentionBackend attention_backend = backends::AttentionBackend::Default); | ||
|
|
||
| infinicore::Tensor forward(const infinicore::Tensor &input_ids, |
There was a problem hiding this comment.
移除 past_sequence_lengths、total_sequence_lengths、input_offsets、cu_seqlens、block_tables、slot_mapping 这些参数。上面是 attn_metadata 的数据,只在 attn 计算时用到,不再一层一层地传递。
There was a problem hiding this comment.
移除forward的attn_meta参数
| std::optional<infinicore::Tensor> block_tables, | ||
| std::optional<infinicore::Tensor> slot_mapping) const; | ||
|
|
||
| void reset_cache(const cache::CacheConfig *cache_config); |
There was a problem hiding this comment.
reset_cache 属于 CausalLM类,移除。
| INFINICORE_NN_MODULE(infinicore::nn::Embedding, embed_tokens); | ||
| INFINICORE_NN_MODULE_VEC(MiniCPMSALADecoderLayer, layers); | ||
| INFINICORE_NN_MODULE(infinicore::nn::RMSNorm, norm); | ||
| INFINICORE_NN_MODULE(infinicore::nn::RoPE, rotary_emb); |
There was a problem hiding this comment.
移除rotary_emb。 infinicore::nn::RoPE的对象在 minicpm_sala_attention类中,通过get_rope创建
| infinicore::Tensor forward(const infinicore::Tensor &hidden_states, | ||
| const infinicore::Tensor &position_ids, | ||
| std::shared_ptr<infinilm::cache::Cache> kv_cache, | ||
| std::optional<infinicore::Tensor> past_sequence_lengths, |
There was a problem hiding this comment.
移除forward的这些 attn_metadata参数
| std::optional<infinicore::Tensor> slot_mapping) const; | ||
|
|
||
| void set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb); | ||
| void reset_cache(); |
|
|
||
| kv_cache_minicpm4_ = (minicpm4_layer_count > 0) |
There was a problem hiding this comment.
根据minicpm_sala_allocate_kv_cache_tensors.cpp文件创建kvcache。 kv_cache_minicpm4_和kv_cache_lightning_两个变量可以合并成一个
| MiniCPMSALAModel(std::shared_ptr<infinilm::config::ModelConfig> model_config, | ||
| const infinicore::Device &device, | ||
| engine::distributed::RankInfo rank_info = engine::distributed::RankInfo(), | ||
| backends::AttentionBackend attention_backend = backends::AttentionBackend::Default); |
There was a problem hiding this comment.
attention_backend_ = infinilm::global_state::get_infinilm_config().attention_backend;
| MiniCPMSALAModel(std::shared_ptr<infinilm::config::ModelConfig> model_config, | ||
| const infinicore::Device &device, | ||
| engine::distributed::RankInfo rank_info = engine::distributed::RankInfo(), | ||
| backends::AttentionBackend attention_backend = backends::AttentionBackend::Default); |
There was a problem hiding this comment.
const engine::distributed::RankInfo &rank_info = infinilm::global_state::get_tensor_model_parallel_rank_info();
| engine::distributed::RankInfo rank_info = engine::distributed::RankInfo(), | ||
| backends::AttentionBackend attention_backend = backends::AttentionBackend::Default); | ||
|
|
||
| infinicore::Tensor forward(const infinicore::Tensor &input_ids, |
There was a problem hiding this comment.
移除forward的attn_meta参数
| infinicore::Tensor forward(const infinicore::Tensor &position_ids, | ||
| const infinicore::Tensor &hidden_states) const; | ||
|
|
||
| void set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb); |
There was a problem hiding this comment.
通过get_rope()创建 RoPE模块的对象
python/infinilm/modeling_utils.py
Outdated
| for k in f.keys(): | ||
| state_dict[k] = f.get_tensor(k).to(device=device) | ||
| # Explicitly cast dtype: some ops (e.g. embedding) may not support BF16 on all backends. | ||
| state_dict[k] = f.get_tensor(k).to(device=device, dtype=dtype) |
| scale_down = 1.0 | ||
| scale_lm_head = 1.0 | ||
| try: | ||
| with open(os.path.join(model_path, "config.json")) as f: |
There was a problem hiding this comment.
TODO: 后续config_json会从 model变量中读,而不是读取文件
| scale_down = 1.0 | ||
| scale_lm_head = 1.0 | ||
| try: | ||
| with open(os.path.join(model_path, "config.json")) as f: |
There was a problem hiding this comment.
TODO: 后续config_json会从 model变量中读,而不是读取文件
|
|
||
| # Apply MiniCPM scaling to loaded tensors (in torch space). | ||
| if scale_input != 1.0 and "model.embed_tokens.weight" in model_param: | ||
| model_param["model.embed_tokens.weight"] = ( |
| MiniCPMSALAForCausalLM(std::shared_ptr<infinilm::config::ModelConfig> model_config, | ||
| const infinicore::Device &device); | ||
| const infinicore::Device &device, | ||
| engine::distributed::RankInfo rank_info = engine::distributed::RankInfo(), |
| MiniCPMSALAForCausalLM::MiniCPMSALAForCausalLM( | ||
| std::shared_ptr<infinilm::config::ModelConfig> model_config, | ||
| const infinicore::Device &device, | ||
| engine::distributed::RankInfo rank_info, |
There was a problem hiding this comment.
engine::distributed::RankInfo rank_info,
backends::AttentionBackend attention_backend) 移除这两个参数
| const Input &input) const { | ||
| auto input_ids = input.input_ids.value(); | ||
| auto position_ids = input.position_ids.value(); | ||
|
|
| auto block_tables = input.block_tables; | ||
| auto slot_mapping = input.slot_mapping; | ||
|
|
||
| infinilm::global_state::get_forward_context().attn_metadata = |
There was a problem hiding this comment.
删除 infinilm::global_state::get_forward_context().attn_metadata 的赋值. 全局变量的 attn_metadata只能由框架赋值
|
|
||
| private: | ||
| std::shared_ptr<infinilm::config::ModelConfig> model_config_; | ||
| std::shared_ptr<infinicore::nn::RoPE> rotary_emb_; |
| INFINICORE_NN_MODULE_INIT(embed_tokens, vocab_size, hidden_size_, std::nullopt, dtype, device); | ||
| INFINICORE_NN_MODULE_INIT(norm, hidden_size_, model_config_->get<double>("rms_norm_eps"), dtype, device); | ||
|
|
||
| // Shared rotary embedding (used by lightning layers only) — match `get_rope` pattern. |
There was a problem hiding this comment.
Model 类中的 rotary_emb_ 变量没有被用到,删除。
| compute_device_ = device; | ||
| const engine::distributed::RankInfo &rank_info = infinilm::global_state::get_tensor_model_parallel_rank_info(); | ||
| const backends::AttentionBackend attention_backend = infinilm::global_state::get_infinilm_config().attention_backend; | ||
|
|
| const infinicore::Tensor &position_ids) const; | ||
|
|
||
| private: | ||
| friend class MiniCPMSALAModel; |
|
请将范围限定在 minicpm_sala 文件夹中。先让 AI 帮你移除多余的、未使用到的头文件和未使用到的变量,然后根据最新的评论修改。
Signed-off-by: Ceng23333 <441651826@qq.com>
|
infinicore那边的是不是也得改改 |
是, 需要先合并infinicore的pr |
#294