
Yanjun Chen

PhD student in the Department of Computing at The Hong Kong Polytechnic University.

My research focuses on reinforcement learning with human feedback, reward modeling, reasoning in large language models, and embodied AI. I am interested in building learning-based agents that are more reliable, adaptive, and practically useful.

About

I use this GitHub profile as a concise index to my research code, publications, and academic homepage.

For a fuller academic profile, including publications, scholarly metrics, writing, and contact information, please visit my personal website.

Research Interests

  • Reinforcement learning with human feedback
  • Reward modeling and feedback-driven learning
  • Reasoning in large language models
  • Embodied AI and learning-based agents

Selected Repositories

  • C3: Contextual Counterfactual Credit Assignment for multi-agent reinforcement learning in LLM collaboration.

  • AccuracyParadox-RLHF: Research code for studying when stronger reward models do not necessarily yield better RLHF outcomes.

  • battam1111.github.io: Source code for my academic homepage and writing.

Pinned Repositories

  1. AccuracyParadox-RLHF (Python)

    [EMNLP 2024 Main] Official implementation of the paper "The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models".

  2. C3 (Python)

    Personal implementation of the paper "Contextual Counterfactual Credit Assignment for Multi-Agent Reinforcement Learning in LLM Collaboration".