
issue/294: minicpm-sala model #295

Open
Ceng23333 wants to merge 11 commits into main from minicpm-sala

Conversation

@Ceng23333
Contributor

@Ceng23333 Ceng23333 requested a review from a team April 8, 2026 08:12
// - HyPE (RoPE on linear layers; NoPE on sparse layers)
class MiniCPMSALAForCausalLM : public InfinilmModel {
public:
MiniCPMSALAForCausalLM(std::shared_ptr<infinilm::config::ModelConfig> model_config,
Collaborator: Remove the rank and backend parameters from this interface.

Collaborator: Remove these interface parameters.

private:
INFINICORE_NN_MODULE(MiniCPMSALAModel, model);
INFINICORE_NN_MODULE(infinilm::layers::linear::ReplicatedLinear, lm_head);
INFINICORE_NN_MODULE(infinicore::nn::Linear, lm_head);
Collaborator: Use infinilm::layers::linear::ReplicatedLinear; infinicore::nn::Linear is no longer used.

Collaborator: This needs to be changed as well.

std::unique_ptr<cache::CacheConfig> cache_config_;
};

std::shared_ptr<infinilm::config::ModelConfig> create_minicpm_sala_model_config(std::shared_ptr<infinilm::config::ModelConfig> model_config);
Collaborator: Implement this create_minicpm_sala_model_config function.

const cache::CacheConfig *MiniCPMSALAForCausalLM::get_cache_config() const {
return cache_config_.get();
}

Collaborator: KV cache creation belongs in the minicpm_sala_allocate_kv_cache_tensors.cpp file.


} // namespace infinilm::models::minicpm_sala

namespace {
Collaborator: Add the model registration.
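The registration the reviewer asks for typically follows a self-registering factory pattern: a static registrar object in an anonymous namespace inserts the model's factory into a global map at load time. The sketch below is illustrative only; the `Model`, `ModelFactory`, and registrar names are hypothetical stand-ins, not InfiniLM's actual API.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for the real InfiniLM model types.
struct Model {
    virtual ~Model() = default;
    virtual std::string type() const = 0;
};
using ModelFactory = std::function<std::unique_ptr<Model>()>;

// Registry keyed by the config's "model_type" string.
std::map<std::string, ModelFactory> &get_model_registry() {
    static std::map<std::string, ModelFactory> registry;
    return registry;
}

// Anonymous-namespace registrar: constructing it at static-init time
// inserts the factory, so merely linking the model's .cpp registers it.
namespace {
struct MiniCPMSALARegistrar {
    MiniCPMSALARegistrar() {
        get_model_registry()["minicpm_sala"] = [] {
            struct MiniCPMSALA : Model {
                std::string type() const override { return "minicpm_sala"; }
            };
            return std::make_unique<MiniCPMSALA>();
        };
    }
} minicpm_sala_registrar;
} // namespace
```

With this pattern, framework files such as model_factory.cpp never need to change when a new model is added, which is consistent with the later review comments that forbid touching framework-level code.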


class MiniCPMSALADecoderLayer : public infinicore::nn::Module {
public:
MiniCPMSALADecoderLayer(std::shared_ptr<infinilm::config::ModelConfig> model_config,
Collaborator: Remove the rank_info and attention_backend parameters from MiniCPMSALADecoderLayer.

std::optional<infinicore::Tensor> cu_seqlens,
std::optional<infinicore::Tensor> block_tables,
std::optional<infinicore::Tensor> slot_mapping) const;

Collaborator: Remove the redundant parameters; forward only needs (const infinicore::Tensor &positions, infinicore::Tensor &hidden_states, infinicore::Tensor &residual);

std::optional<infinicore::Tensor> block_tables,
std::optional<infinicore::Tensor> slot_mapping) const;

void set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb);
Collaborator: Remove the set_rotary_emb and reset_cache functions.

Collaborator: Remove the set_rotary_emb function.

#include "../../backends/attention_backends.hpp"
#include "../../cache/kv_cache.hpp"
#include "../../config/model_config.hpp"
#include "../../engine/distributed/distributed.hpp"
Collaborator: Split attention into two classes.

#include "models_registry.hpp"
#include "llama/llama.hpp"
#include "minicpm_sala/minicpm_sala_for_causal_lm.hpp"

Collaborator: Do not modify any code in this file.


#include "../global_state/global_state.hpp"
#include "../models/model_factory.hpp"
#include "../models/models_registry.hpp"
Collaborator: When adding a new model, do not modify framework-level code. This file must not be changed.

const std::string model_type = model_config->get<std::string>("model_type");
const auto &config_map = models::get_model_config_map();
auto it = config_map.find(model_type);
if (it != config_map.end()) {
Collaborator: When adding a new model, do not modify framework-level code. This file must not be changed.


#include <algorithm>
#include <limits>
#include <memory>
Collaborator: Create the KV cache using the new approach.


void MiniCPMSALAModel::reset_cache(const cache::CacheConfig *cache_config) {
if (cache_config == nullptr) {
kv_cache_minicpm4_ = nullptr;
Collaborator: The KV cache creation code lives in csrc/models/minicpm_sala/minicpm_sala_allocate_kv_cache_tensors.cpp.

Comment on lines +77 to +78
if (auto static_cfg = dynamic_cast<const cache::StaticKVCacheConfig *>(cache_config)) {
// Allocate separate caches by KV shape to avoid per-layer padding copies.
Collaborator: Create the KV cache using the new approach.

INFINICORE_NN_MODULE_INIT(o_gate, hidden_size_, num_attention_heads * head_dim_,
model_config->get_quantization_method(), use_bias_, dtype, device);
}
void MiniCPMSALAAttention::set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb) {
Collaborator: Delete the set_rotary_emb function.

std::optional<infinicore::Tensor> cu_seqlens,
std::optional<infinicore::Tensor> block_tables,
std::optional<infinicore::Tensor> slot_mapping) const;

Collaborator: Delete the set_rotary_emb function.

INFINICORE_NN_MODULE_INIT(mlp, model_config, device);
}

void MiniCPMSALADecoderLayer::set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb) {
Collaborator: Delete the set_rotary_emb function.

Comment on lines +41 to +42
void MiniCPMSALADecoderLayer::reset_cache() {
self_attn_->reset_cache();
@pengcheng888 (Collaborator), Apr 8, 2026: Delete the reset_cache function.


auto to_device = [&](const std::optional<infinicore::Tensor> &t)
-> std::optional<infinicore::Tensor> {
return t.has_value() ? t.value()->to(device) : t;
Collaborator: Do not modify this.

Signed-off-by: Ceng23333 <441651826@qq.com>
void reset_cache(const cache::CacheConfig *cache_config) override;

protected:
const cache::CacheConfig *get_cache_config() const override;
Collaborator: get_cache_config() now belongs to the infinimodel abstract class; remove the get_cache_config function from the concrete model.

INFINICORE_NN_MODULE(MiniCPMSALAModel, model);
INFINICORE_NN_MODULE(infinilm::layers::linear::ReplicatedLinear, lm_head);
INFINICORE_NN_MODULE(infinicore::nn::Linear, lm_head);
std::unique_ptr<cache::CacheConfig> cache_config_;
Collaborator: Remove the cache_config_ member.

MiniCPMSALAModel(std::shared_ptr<infinilm::config::ModelConfig> model_config,
const infinicore::Device &device,
engine::distributed::RankInfo rank_info = engine::distributed::RankInfo(),
backends::AttentionBackend attention_backend = backends::AttentionBackend::Default);
Collaborator: Remove the rank_info and attention_backend parameters from MiniCPMSALAModel.

Collaborator: Use attention_backend_ = infinilm::global_state::get_infinilm_config().attention_backend;

Collaborator: Use const engine::distributed::RankInfo &rank_info = infinilm::global_state::get_tensor_model_parallel_rank_info();

engine::distributed::RankInfo rank_info = engine::distributed::RankInfo(),
backends::AttentionBackend attention_backend = backends::AttentionBackend::Default);

infinicore::Tensor forward(const infinicore::Tensor &input_ids,
Collaborator: Remove the past_sequence_lengths, total_sequence_lengths, input_offsets, cu_seqlens, block_tables, and slot_mapping parameters. These are attn_metadata fields that are only needed during attention computation; they should no longer be passed down layer by layer.
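The pattern the reviewer describes, batch metadata stored once in a per-forward context and read only at the attention call site, can be sketched roughly as below. The struct fields and function names are hypothetical simplifications, not InfiniLM's real types.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sketch: attention metadata lives in one per-forward
// context instead of being threaded through every layer's signature.
struct AttnMetadata {
    std::vector<int32_t> cu_seqlens;
    std::vector<int32_t> slot_mapping;
};

struct ForwardContext {
    std::optional<AttnMetadata> attn_metadata; // set by the framework only
};

// One context per thread; the framework fills it before model.forward().
ForwardContext &get_forward_context() {
    thread_local ForwardContext ctx;
    return ctx;
}

// Attention is the only consumer: it reads the metadata at the point
// of use, so decoder layers and the model no longer carry it as params.
std::size_t attention_token_count() {
    const auto &md = get_forward_context().attn_metadata;
    return md ? md->slot_mapping.size() : 0;
}
```

This also explains the later comment that model code must not assign to the global attn_metadata: only the framework writes the context, and model layers only read it.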

Collaborator: Remove the attn_meta parameter from forward.

std::optional<infinicore::Tensor> block_tables,
std::optional<infinicore::Tensor> slot_mapping) const;

void reset_cache(const cache::CacheConfig *cache_config);
Collaborator: reset_cache belongs to the CausalLM class; remove it here.

INFINICORE_NN_MODULE(infinicore::nn::Embedding, embed_tokens);
INFINICORE_NN_MODULE_VEC(MiniCPMSALADecoderLayer, layers);
INFINICORE_NN_MODULE(infinicore::nn::RMSNorm, norm);
INFINICORE_NN_MODULE(infinicore::nn::RoPE, rotary_emb);
Collaborator: Remove rotary_emb. The infinicore::nn::RoPE object lives in the minicpm_sala_attention class and is created via get_rope.

infinicore::Tensor forward(const infinicore::Tensor &hidden_states,
const infinicore::Tensor &position_ids,
std::shared_ptr<infinilm::cache::Cache> kv_cache,
std::optional<infinicore::Tensor> past_sequence_lengths,
Collaborator: Remove these attn_metadata parameters from forward.

std::optional<infinicore::Tensor> slot_mapping) const;

void set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb);
void reset_cache();
Collaborator: Remove reset_cache.

Comment on lines +89 to +90

kv_cache_minicpm4_ = (minicpm4_layer_count > 0)
Collaborator: Create the KV cache according to minicpm_sala_allocate_kv_cache_tensors.cpp. The kv_cache_minicpm4_ and kv_cache_lightning_ variables can be merged into one.

Signed-off-by: Ceng23333 <441651826@qq.com>
MiniCPMSALAModel(std::shared_ptr<infinilm::config::ModelConfig> model_config,
const infinicore::Device &device,
engine::distributed::RankInfo rank_info = engine::distributed::RankInfo(),
backends::AttentionBackend attention_backend = backends::AttentionBackend::Default);
Collaborator: Use attention_backend_ = infinilm::global_state::get_infinilm_config().attention_backend;

MiniCPMSALAModel(std::shared_ptr<infinilm::config::ModelConfig> model_config,
const infinicore::Device &device,
engine::distributed::RankInfo rank_info = engine::distributed::RankInfo(),
backends::AttentionBackend attention_backend = backends::AttentionBackend::Default);
Collaborator: Use const engine::distributed::RankInfo &rank_info = infinilm::global_state::get_tensor_model_parallel_rank_info();

engine::distributed::RankInfo rank_info = engine::distributed::RankInfo(),
backends::AttentionBackend attention_backend = backends::AttentionBackend::Default);

infinicore::Tensor forward(const infinicore::Tensor &input_ids,
Collaborator: Remove the attn_meta parameter from forward.

infinicore::Tensor forward(const infinicore::Tensor &position_ids,
const infinicore::Tensor &hidden_states) const;

void set_rotary_emb(const std::shared_ptr<infinicore::nn::RoPE> &rotary_emb);
Collaborator: Remove the set_rotary_emb function.

Collaborator: Create the RoPE module object via get_rope().
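A `get_rope()`-style helper is usually a caching factory: attention layers with identical RoPE settings share one module instead of the model owning a `rotary_emb_` member. This sketch assumes a simplified `RoPE` struct and key; InfiniCore's actual `get_rope` signature may differ.

```cpp
#include <map>
#include <memory>
#include <utility>

// Hypothetical simplified RoPE module (real one precomputes cos/sin tables).
struct RoPE {
    int head_dim;
    double theta;
};

// Caching factory: layers asking for the same (head_dim, theta) share
// one instance; the weak_ptr lets the cache drop modules nobody holds.
std::shared_ptr<RoPE> get_rope(int head_dim, double theta) {
    static std::map<std::pair<int, double>, std::weak_ptr<RoPE>> cache;
    auto key = std::make_pair(head_dim, theta);
    if (auto existing = cache[key].lock()) {
        return existing; // reuse the shared module
    }
    auto rope = std::make_shared<RoPE>(RoPE{head_dim, theta});
    cache[key] = rope;
    return rope;
}
```

Under this pattern the `set_rotary_emb` plumbing the other comments ask to delete becomes unnecessary: each attention layer calls `get_rope(...)` in its own constructor.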

for k in f.keys():
state_dict[k] = f.get_tensor(k).to(device=device)
# Explicitly cast dtype: some ops (e.g. embedding) may not support BF16 on all backends.
state_dict[k] = f.get_tensor(k).to(device=device, dtype=dtype)
Collaborator: Drop the , dtype=dtype argument.

scale_down = 1.0
scale_lm_head = 1.0
try:
with open(os.path.join(model_path, "config.json")) as f:
Collaborator: Please double-check this.

Collaborator: TODO: config_json will later be read from the model variable instead of from a file.

scale_down = 1.0
scale_lm_head = 1.0
try:
with open(os.path.join(model_path, "config.json")) as f:
Collaborator: TODO: config_json will later be read from the model variable instead of from a file.


# Apply MiniCPM scaling to loaded tensors (in torch space).
if scale_input != 1.0 and "model.embed_tokens.weight" in model_param:
model_param["model.embed_tokens.weight"] = (
Collaborator: Make this conditional on model_type.

Signed-off-by: Ceng23333 <441651826@qq.com>
MiniCPMSALAForCausalLM(std::shared_ptr<infinilm::config::ModelConfig> model_config,
const infinicore::Device &device);
const infinicore::Device &device,
engine::distributed::RankInfo rank_info = engine::distributed::RankInfo(),
Collaborator: Remove the last two parameters.

MiniCPMSALAForCausalLM::MiniCPMSALAForCausalLM(
std::shared_ptr<infinilm::config::ModelConfig> model_config,
const infinicore::Device &device,
engine::distributed::RankInfo rank_info,
Collaborator: Remove these two parameters: engine::distributed::RankInfo rank_info and backends::AttentionBackend attention_backend.

const Input &input) const {
auto input_ids = input.input_ids.value();
auto position_ids = input.position_ids.value();

@pengcheng888 (Collaborator), Apr 10, 2026: Remove these unused variables.

auto block_tables = input.block_tables;
auto slot_mapping = input.slot_mapping;

infinilm::global_state::get_forward_context().attn_metadata =
Collaborator: Delete the assignment to infinilm::global_state::get_forward_context().attn_metadata. The global attn_metadata may only be set by the framework.


private:
std::shared_ptr<infinilm::config::ModelConfig> model_config_;
std::shared_ptr<infinicore::nn::RoPE> rotary_emb_;
Collaborator: Delete the rotary_emb_ member.

INFINICORE_NN_MODULE_INIT(embed_tokens, vocab_size, hidden_size_, std::nullopt, dtype, device);
INFINICORE_NN_MODULE_INIT(norm, hidden_size_, model_config_->get<double>("rms_norm_eps"), dtype, device);

// Shared rotary embedding (used by lightning layers only) — match `get_rope` pattern.
Collaborator: The rotary_emb_ member in the Model class is unused; delete it.

compute_device_ = device;
const engine::distributed::RankInfo &rank_info = infinilm::global_state::get_tensor_model_parallel_rank_info();
const backends::AttentionBackend attention_backend = infinilm::global_state::get_infinilm_config().attention_backend;

@pengcheng888 (Collaborator), Apr 10, 2026: Delete variables that are declared but never used.

const infinicore::Tensor &position_ids) const;

private:
friend class MiniCPMSALAModel;
Collaborator: ???

@pengcheng888 (Collaborator) commented Apr 10, 2026:

Please limit the scope of changes to the minicpm_sala folder. First have the AI help you remove redundant unused header includes and unused variables, then revise according to the latest review comments.

Signed-off-by: Ceng23333 <441651826@qq.com>
@wooway777 (Collaborator): Does the infinicore side also need changes?

@pengcheng888 (Collaborator), replying to "Does the infinicore side also need changes?":

Yes; the infinicore PR needs to be merged first.

3 participants