This repo provides extended environments for CoMLRL.
Install CoMLRL:

```bash
pip install comlrl
# Install PyTorch compatible with your device
```

Or via conda-forge:

```bash
conda install -c conda-forge comlrl
# Install PyTorch compatible with your device
```

- Contents: each sample includes a class skeleton, method stubs (with docstrings or `pass`), and canonical hidden tests.
- Splitting: `train/train_magrpo.py` loads explicit HF slices from `dataset.train_split` and `dataset.eval_split` (e.g., `test[:50]` and `test[50:]`).
- Subsetting: if a split name is missing (e.g., ClassEval only has `test`), the loader falls back to the first available split before slicing (see the loading sketch after this list).
- Prompting: prompts include the sanitized class skeleton plus per-agent method assignments. The default strategy assigns 1-parameter methods to agent 0 and all other methods to agent 1 (see the assignment sketch below).
- Testing: reward code merges agent completions back into the skeleton and runs the provided unit tests inside a temporary directory to isolate state.
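A minimal sketch of the split-loading-with-fallback behavior, assuming the Hugging Face `datasets` library; the helper name `load_split_with_fallback` and the example dataset path are illustrative, not taken from the repo:

```python
from datasets import load_dataset, get_dataset_split_names

def load_split_with_fallback(name: str, split_spec: str):
    """Load an explicit HF slice such as "test[:50]".

    If the base split name is missing (e.g., the dataset only ships "test"),
    fall back to the first available split and keep the slice suffix.
    """
    base, _, slice_part = split_spec.partition("[")
    available = get_dataset_split_names(name)
    if base not in available:
        base = available[0]  # fall back before slicing
    spec = base + ("[" + slice_part if slice_part else "")
    return load_dataset(name, split=spec)

# e.g., train = load_split_with_fallback("FudanSELab/ClassEval", "test[:50]")
```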
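And an illustrative version of the default assignment strategy (1-parameter methods to agent 0, everything else to agent 1); this is a sketch of the described rule, not the repo's actual implementation, and whether `self` counts toward the parameter tally is an assumption:

```python
import ast

def assign_methods(class_source: str) -> dict[int, list[str]]:
    """Split a class skeleton's methods between two agents by parameter count.

    Methods with exactly one parameter (counting `self`; an assumption)
    go to agent 0; all other methods go to agent 1.
    """
    tree = ast.parse(class_source)
    assignments: dict[int, list[str]] = {0: [], 1: []}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            agent = 0 if len(node.args.args) == 1 else 1
            assignments[agent].append(node.name)
    return assignments
```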
Key sections in `configs/magrpo_classeval_config.yaml`:

- `model`: base checkpoint (`Qwen/Qwen2.5-Coder-3B-Instruct` by default), tokenizer/model kwargs, and device mapping.
- `dataset`: dataset name and split strings (`train_split`, `eval_split`) for ClassEval sub-slices or local mirrors.
- `external`: feedback configuration (use `code_feedback` for syntax/test diagnostics).
- `magrpo`: forwarded to `comlrl.trainers.magrpo.MAGRPOTrainer`. Includes collaboration (`num_agents`, param-count assignment), sampling settings (`num_generations`, `num_turns`, temperature/top_p), rollout buffering (`rollout_buffer_size`), optimization hyperparameters, and IO controls.
- `reward_processor`: optional post-processing for rewards (scale, shift).
- `output`: persistence knobs (save final model, output paths, verbose debug prints).
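An illustrative shape for that config; all values are placeholders, and any key not named above (`name`, `save_final_model`, the `code_feedback` boolean form) is an assumption about the file's layout:

```yaml
model:
  name: Qwen/Qwen2.5-Coder-3B-Instruct
dataset:
  name: FudanSELab/ClassEval   # illustrative dataset path
  train_split: "test[:50]"
  eval_split: "test[50:]"
external:
  code_feedback: true          # syntax/test diagnostics
magrpo:
  num_agents: 2
  num_generations: 4
  num_turns: 2
  temperature: 0.8
  top_p: 0.95
  rollout_buffer_size: 8
reward_processor:
  shift: 0.0
output:
  save_final_model: true
```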
`rewards/CE_reward.py` computes structured rewards (a combined-score sketch follows this list):

- `lv1`: syntax score proportional to valid method outputs (range [0, 2]).
- `lv2`: unit-test bonus based on pass rate (passed/total), scaled to [0, 4].
- `lv3`: overlap penalty normalized by total methods (range [-1, 0]).
- Reward shift: optional post-processing shift via `reward_processor.shift`.
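A minimal sketch of how those three levels could combine into one scalar, assuming the components simply sum and that `total_methods > 0`; this is illustrative, not the repo's exact scoring code:

```python
def combined_reward(valid_methods: int, total_methods: int,
                    passed: int, total_tests: int,
                    overlapping: int, shift: float = 0.0) -> float:
    """Sum the three reward levels, then apply the optional shift."""
    lv1 = 2.0 * valid_methods / total_methods                  # syntax score in [0, 2]
    lv2 = 4.0 * passed / total_tests if total_tests else 0.0   # test bonus in [0, 4]
    lv3 = -overlapping / total_methods                         # overlap penalty in [-1, 0]
    return lv1 + lv2 + lv3 + shift                             # in [-1, 6] plus shift
```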
- Tests execute inside per-sample temporary directories to avoid polluted state and are automatically terminated on timeout.
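A sketch of that isolation pattern using only the standard library; the file names and the timeout value are assumptions:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_tests_isolated(merged_source: str, test_source: str, timeout_s: int = 30):
    """Write the merged class and its tests into a fresh temp dir, then run
    the tests in a subprocess that is killed on timeout."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(merged_source)
        Path(tmp, "test_solution.py").write_text(test_source)
        try:
            proc = subprocess.run(
                [sys.executable, "-m", "unittest", "test_solution"],
                cwd=tmp, capture_output=True, text=True, timeout=timeout_s,
            )
            return proc.returncode == 0, proc.stdout + proc.stderr
        except subprocess.TimeoutExpired:
            return False, "timeout"
```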
- Loggers are inherited from CoMLRL. Enable Weights & Biases by filling `wandb.entity`, or disable it for offline debugging (see the snippet below).
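For example, in the config (the `enabled` switch is an assumption; the repo may use a different flag to turn logging off):

```yaml
wandb:
  entity: your-entity   # fill in to enable W&B logging
  enabled: false        # illustrative switch for offline debugging
```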
