Automatic Paper Digest (Updated Monthly)

Each month, this project automatically fetches the latest papers from arXiv and filters them by keyword.

Click the 'Watch' button to receive the automatic monthly email notifications.

Last updated: 2026-04-02 00:24

Command used for this update

D:\a\MyAutoPapers\MyAutoPapers\target\release\my_auto_papers.exe --keywords=
             efficient RL,
             partial observable markov decision process/pomdp,sparse reward reinforcement learning,
             casual RL/counterfactual RL/casual reinforcement learning,
             causal inference/causal discovery/counterfactual reasoning,
             video super resolution,
             knowledge graph/knowledge distillation/knowledge representation/knowledge transfer/knowledge embedding,
             combinatorial game theory/xiangqi/chinese chess,
             code llm,
             speech recognition,
             zero shot tracking/few shot tracking/pose tracking/pose estimation,
             text to 3d/image to 3d/text to texture,
             automated theorem proving/interactive theorem proving/formal verification
              --exclude-keywords=multi-agent,multiagent --per-keyword-max-result=8

Parameter details

Keywords: efficient RL, partial observable markov decision process/pomdp, sparse reward reinforcement learning, casual RL/counterfactual RL/casual reinforcement learning, causal inference/causal discovery/counterfactual reasoning, video super resolution, knowledge graph/knowledge distillation/knowledge representation/knowledge transfer/knowledge embedding, combinatorial game theory/xiangqi/chinese chess, code llm, speech recognition, zero shot tracking/few shot tracking/pose tracking/pose estimation, text to 3d/image to 3d/text to texture, automated theorem proving/interactive theorem proving/formal verification
Excluded keywords: multi-agent, multiagent
Max results per keyword: 8
Target categories: cs, stat
Retries per keyword: 3
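
The parameters above suggest a simple pipeline: split the keyword string into groups, drop papers that match an excluded term, and cap each group's results. The following is a minimal Python sketch of that idea, not the tool's actual implementation. The names `Paper`, `parse_keyword_groups`, and `filter_papers` are hypothetical, and two behaviors are assumptions inferred from the section titles below: that `,` separates keyword groups while `/` separates alternatives within a group, and that exclusion is a case-insensitive substring match on title and abstract.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record type; the real tool's data model is not shown in this document.
    title: str
    abstract: str


def parse_keyword_groups(raw: str) -> list[list[str]]:
    """Split a --keywords value into groups.

    Assumption: ',' separates groups and '/' separates alternatives in a group.
    """
    groups = []
    for chunk in raw.split(","):
        alternatives = [alt.strip() for alt in chunk.split("/") if alt.strip()]
        if alternatives:
            groups.append(alternatives)
    return groups


def filter_papers(papers: list[Paper], excluded: list[str], max_results: int) -> list[Paper]:
    """Drop papers mentioning any excluded term, then keep at most max_results.

    Assumption: matching is a case-insensitive substring test on title + abstract.
    """
    lowered = [term.lower() for term in excluded]
    kept = []
    for paper in papers:
        text = f"{paper.title} {paper.abstract}".lower()
        if any(term in text for term in lowered):
            continue
        kept.append(paper)
    return kept[:max_results]


if __name__ == "__main__":
    groups = parse_keyword_groups("efficient RL, partial observable markov decision process/pomdp")
    print(groups)
```

Under these assumptions, each group maps to one numbered section of the summary, and `max_results` corresponds to `--per-keyword-max-result=8`.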

Paper summary (115 papers)

1. efficient RL

No.	Title	Date	Abstract
1	Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model	2026-03-26	Reinforcement learning (RL) has become essential for post-training large language models (LLMs) in reasoning tasks. While scaling rollouts can stabilize training and enhance performance, the computational overhead is a critical issue. In algorithms like GRPO, multiple rollouts per prompt incur prohibitive costs, as a large portion of prompts provide negligible gradients and are thus of low utility. To address this problem, we investigate how to select high-utility prompts before the rollout phase. Our experimental analysis reveals that sample utility is non-uniform and evolving: the strongest learning signals concentrate at the "learning edge", the intersection of intermediate difficulty and high uncertainty, which shifts as training proceeds. Motivated by this, we propose HIVE (History-Informed and online-VErified prompt selection), a dual-stage framework for data-efficient RL. HIVE utilizes historical reward trajectories for coarse selection and employs prompt entropy as a real-time proxy to prune instances with stale utility. By evaluating HIVE across multiple math reasoning benchmarks and models, we show that HIVE yields significant rollout efficiency without compromising performance.
2	End-to-End Efficient RL for Linear Bellman Complete MDPs with Deterministic Transitions	2026-03-24	We study reinforcement learning (RL) with linear function approximation in Markov Decision Processes (MDPs) satisfying \emph{linear Bellman completeness} -- a fundamental setting where the Bellman backup of any linear value function remains linear. While statistically tractable, prior computationally efficient algorithms are either limited to small action spaces or require strong oracle assumptions over the feature space. We provide a computationally efficient algorithm for linear Bellman complete MDPs with \emph{deterministic transitions}, stochastic initial states, and stochastic rewards. For finite action spaces, our algorithm is end-to-end efficient; for large or infinite action spaces, we require only a standard argmax oracle over actions. Our algorithm learns an $\varepsilon$-optimal policy with sample and computational complexity polynomial in the horizon, feature dimension, and $1/\varepsilon$.
3	Reward-Zero: Language Embedding Driven Implicit Reward Mechanisms for Reinforcement Learning	2026-03-10	We introduce Reward-Zero, a general-purpose implicit reward mechanism that transforms natural-language task descriptions into dense, semantically grounded progress signals for reinforcement learning (RL). Reward-Zero serves as a simple yet sophisticated universal reward function that leverages language embeddings for efficient RL training. By comparing the embedding of a task specification with embeddings derived from an agent's interaction experience, Reward-Zero produces a continuous, semantically aligned sense-of-completion signal. This reward supplements sparse or delayed environmental feedback without requiring task-specific engineering. When integrated into standard RL frameworks, it accelerates exploration, stabilizes training, and enhances generalization across diverse tasks. Empirically, agents trained with Reward-Zero converge faster and achieve higher final success rates than conventional methods such as PPO with common reward-shaping baselines, successfully solving tasks that hand-designed rewards could not in some complex tasks. In addition, we develop a mini benchmark for the evaluation of completion sense during task execution via language embeddings. These results highlight the promise of language-driven implicit reward functions as a practical path toward more sample-efficient, generalizable, and scalable RL for embodied agents. Code will be released after peer review.
4	MAPO: Mixed Advantage Policy Optimization for Long-Horizon Multi-Turn Dialogue	2026-03-06	Subjective multi-turn dialogue tasks, such as emotional support, require conversational policies that adapt to evolving user states and optimize long-horizon interaction quality. However, reinforcement learning (RL) for such settings remains challenging due to the absence of reliable process supervision. Outcome-only training collapses credit assignment across turns into a single trajectory-level reward, while naïve turn-level group sampling incurs prohibitive rollout costs in interactive environments. We propose a critic-free and efficient RL algorithm named MAPO that leverages dense process feedback from a judge model and propagates long-horizon effects through Monte Carlo returns. To stabilize optimization, we introduce a mixed advantage estimator that combines turn-level normalization with batch-level normalization, enabling fine-grained yet scalable credit assignment. Across multiple subjective dialogue benchmarks, including EMPA, EmoBench, and EQ-Bench, and model scales ranging from 7B to 32B, our method consistently improves both training stability and final performance over outcome-only GRPO and single-level normalization baselines. On EMPA, we improve rates by up to 9 points and increase dialogue scores by as much as +43.2 over the 7B base model. Despite training only on EMPA-style environments, our approach generalizes well, yielding consistent improvements on unseen emotional-intelligence benchmarks, including up to +4 points on EmoBench and +3.5 on EQ-Bench. Together, these results demonstrate that dense process supervision combined with mixed-level normalization enables effective and scalable RL for subjective, open-ended multi-turn dialogue.
5	PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling	2025-12-02	Consistent image generation requires faithfully preserving identities, styles, and logical coherence across multiple images, which is essential for applications such as storytelling and character design. Supervised training approaches struggle with this task due to the lack of large-scale datasets capturing visual consistency and the complexity of modeling human perceptual preferences. In this paper, we argue that reinforcement learning (RL) offers a promising alternative by enabling models to learn complex and subjective visual criteria in a data-free manner. To achieve this, we introduce PaCo-RL, a comprehensive framework that combines a specialized consistency reward model with an efficient RL algorithm. The first component, PaCo-Reward, is a pairwise consistency evaluator trained on a large-scale dataset constructed via automated sub-figure pairing. It evaluates consistency through a generative, autoregressive scoring mechanism enhanced by task-aware instructions and CoT reasons. The second component, PaCo-GRPO, leverages a novel resolution-decoupled optimization strategy to substantially reduce RL cost, alongside a log-tamed multi-reward aggregation mechanism that ensures balanced and stable reward optimization. Extensive experiments across the two representative subtasks show that PaCo-Reward significantly improves alignment with human perceptions of visual consistency, and PaCo-GRPO achieves state-of-the-art consistency performance with improved training efficiency and stability. Together, these results highlight the promise of PaCo-RL as a practical and scalable solution for consistent image generation. The project page is available at https://x-gengroup.github.io/HomePage_PaCo-RL/.
6	Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments	2025-11-30	Group symmetries provide a powerful inductive bias for reinforcement learning (RL), enabling efficient generalization across symmetric states and actions via group-invariant Markov Decision Processes (MDPs). However, real-world environments almost never realize fully group-invariant MDPs; dynamics, actuation limits, and reward design usually break symmetries, often only locally. Under group-invariant Bellman backups for such cases, local symmetry-breaking introduces errors that propagate across the entire state-action space, resulting in global value estimation errors. To address this, we introduce Partially group-Invariant MDP (PI-MDP), which selectively applies group-invariant or standard Bellman backups depending on where symmetry holds. This framework mitigates error propagation from locally broken symmetries while maintaining the benefits of equivariance, thereby enhancing sample efficiency and generalizability. Building on this framework, we present practical RL algorithms -- Partially Equivariant (PE)-DQN for discrete control and PE-SAC for continuous control -- that combine the benefits of equivariance with robustness to symmetry-breaking. Experiments across Grid-World, locomotion, and manipulation benchmarks demonstrate that PE-DQN and PE-SAC significantly outperform baselines, highlighting the importance of selective symmetry exploitation for robust and sample-efficient RL. Project page: https://pranaboy72.github.io/perl_page/
7	Masked Auto-Regressive Variational Acceleration: Fast Inference Makes Practical Reinforcement Learning	2025-11-19	Masked auto-regressive diffusion models (MAR) benefit from the expressive modeling ability of diffusion models and the flexibility of masked auto-regressive ordering. However, vanilla MAR suffers from slow inference due to its hierarchical inference mechanism: an outer AR unmasking loop and an inner diffusion denoising chain. Such decoupled structure not only harm the generation efficiency but also hinder the practical use of MAR for reinforcement learning (RL), an increasingly critical paradigm for generative model post-training. To address this fundamental issue, we introduce MARVAL (Masked Auto-regressive Variational Acceleration), a distillation-based framework that compresses the diffusion chain into a single AR generation step while preserving the flexible auto-regressive unmasking order. Such a distillation with MARVAL not only yields substantial inference acceleration but, crucially, makes RL post-training with verifiable rewards practical, resulting in scalable yet human-preferred fast generative models. Our contributions are twofold: (1) a novel score-based variational objective for distilling masked auto-regressive diffusion models into a single generation step without sacrificing sample quality; and (2) an efficient RL framework for masked auto-regressive models via MARVAL-RL. On ImageNet 256*256, MARVAL-Huge achieves an FID of 2.00 with more than 30 times speedup compared with MAR-diffusion, and MARVAL-RL yields consistent improvements in CLIP and image-reward scores on ImageNet datasets with entity names. In conclusion, MARVAL demonstrates the first practical path to distillation and RL of masked auto-regressive diffusion models, enabling fast sampling and better preference alignments.
8	Symmetry-Guided Memory Augmentation for Efficient Locomotion Learning	2025-02-03	Training reinforcement learning (RL) policies for legged locomotion often requires extensive environment interactions, which are costly and time-consuming. We propose Symmetry-Guided Memory Augmentation (SGMA), a framework that improves training efficiency by combining structured experience augmentation with memory-based context inference. Our method leverages robot and task symmetries to generate additional, physically consistent training experiences without requiring extra interactions. To avoid the pitfalls of naive augmentation, we extend these transformations to the policy's memory states, enabling the agent to retain task-relevant context and adapt its behavior accordingly. We evaluate the approach on quadruped and humanoid robots in simulation, as well as on a real quadruped platform. Across diverse locomotion tasks involving joint failures and payload variations, our method achieves efficient policy training while maintaining robust performance, demonstrating a practical route toward data-efficient RL for legged robots.

2. partial observable markov decision process/pomdp

No.	Title	Date	Abstract
1	Optimal Control of a Mesoscopic Information Engine	2026-03-31	We analytically solve the finite-time control problem of driving an overdamped particle via an optical trap under costly measurement. By formulating this mesoscopic information engine within the Partially Observable Markov Decision Process (POMDP) framework, we demonstrate that the underlying Linear-Quadratic-Gaussian (LQG) dynamics reduce the optimal measurement and driving protocols to a one-dimensional algebraic Riccati recurrence. From this reduction, we derive the optimal feedback control law for the trap placement, which recovers the discontinuous Schmiedl-Seifert driving protocol in the continuous-time, open-loop limit. We map the operational phase space of the engine, deriving explicit physical bounds on the maximum power that can be extracted from thermal fluctuations. Taking the infinite-horizon limit, we find the exact periodic measurement schedules for the steady-state and derive the macroscopic velocity envelopes beyond which viscous drag forces the engine into a net-dissipative regime. We prove the emergence of deadline-induced blindness, a phenomenon where all measurement ceases as the deadline approaches regardless of their cost. Finally, we generalize the results to a variable-precision sensor.
2	Semantic Interaction Information mediates compositional generalization in latent space	2026-03-28	Are there still barriers to generalization once all relevant variables are known? We address this question via a framework that casts compositional generalization as a variational inference problem over latent variables with parametric interactions. To explore this, we develop the Cognitive Gridworld, a stationary Partially Observable Markov Decision Process (POMDP) where observations are generated jointly by multiple latent variables, yet feedback is provided for only a single goal variable. This setting allows us to define Semantic Interaction Information (SII): a metric measuring the contribution of latent variable interactions to task performance. Using SII, we analyze Recurrent Neural Networks (RNNs) provided with these interactions, finding that SII explains the accuracy gap between Echo State and Fully Trained networks. Our analysis also uncovers a theoretically predicted failure mode where confidence decouples from accuracy, suggesting that utilizing interactions between relevant variables is a non-trivial capability. We then address a harder regime where the interactions must be learned by an embedding model. Learning how latent variables interact requires accurate inference, yet accurate inference depends on knowing those interactions. The Cognitive Gridworld reveals this circular dependence as a core challenge for continual meta-learning. We approach this dilemma via Representation Classification Chains (RCCs), a JEPA-style architecture that disentangles these processes: variable inference and variable embeddings are learned by separate modules through Reinforcement Learning and self-supervised learning, respectively. Lastly, we demonstrate that RCCs facilitate compositional generalization to novel combinations of relevant variables. Together, these results establish a grounded setting for evaluating goal-directed generalist agents.
3	The Myhill-Nerode Theorem for Bounded Interaction: Canonical Abstractions via Agent-Bounded Indistinguishability	2026-03-22	Any capacity-limited observer induces a canonical quotient on its environment: two situations that no bounded agent can distinguish are, for that agent, the same. We formalise this for finite POMDPs. A fixed probe family of finite-state controllers induces a closed-loop Wasserstein pseudometric on observation histories and a probe-exact quotient merging histories that no controller in the family can distinguish. The quotient is canonical, minimal, and unique, a bounded-interaction analogue of the Myhill-Nerode theorem. For clock-aware probes, it is exactly decision-sufficient for objectives that depend only on the agent's observations and actions; for latent-state rewards, we use an observation-Lipschitz approximation bound. The main theorem object is the clock-aware quotient; scalable deterministic-stationary experiments study a tractable coarsening with gap measured on small exact cases and explored empirically at larger scale. We validate theorem-level claims on Tiger and GridWorld. We also report operational case studies on Tiger, GridWorld, and RockSample as exploratory diagnostics of approximation behavior and runtime, not as theorem-facing evidence when no exact cross-family certificate is available; heavier stress tests are archived in the appendix and artifact package.
4	GammaZero: Learning To Guide POMDP Belief Space Search With Graph Representations	2025-10-15	We introduce an uncertainty-aware graph representation framework for learning to guide planning in Partially Observable Markov Decision Processes (POMDPs). Unlike existing approaches that require domain or problem size specific neural architectures, GammaZero leverages a unified graph-based belief representation that enables generalization across problem sizes within a domain. Our key insight is that belief states can be systematically transformed into uncertainty-aware graphs where structural patterns learned on small problems transfer to larger instances. We employ a graph neural network with a decoder architecture to learn value functions and policies from expert demonstrations on computationally tractable problems, then apply these learned heuristics to guide Monte Carlo tree search on larger problems. Experimental results on standard POMDP benchmarks demonstrate that GammaZero achieves comparable performance to BetaZero when trained and tested on the same-sized problems, while enabling zero-shot generalization to problems 2-6x larger than those seen during training.
5	Active Digital Twins via Active Inference	2025-06-17	Digital twins are transforming engineering and applied sciences by enabling real-time monitoring, simulation, and predictive analysis of physical systems and processes. However, conventional digital twins rely primarily on passive data assimilation, which limits their adaptability in uncertain and dynamic environments. This paper introduces the active digital twin paradigm, based on active inference. Active inference is a neuroscience-inspired Bayesian framework for probabilistic reasoning and predictive modeling that unifies inference, decision-making, and learning under a single free energy minimization objective. By modeling the dynamics of the coupled physical--digital system as a partially observable Markov decision process, active digital twins autonomously balance pragmatic exploitation (maximizing goal-directed utility) and epistemic exploration (actively resolving uncertainty). As action becomes an integral part of the inference process, active digital twins actively seek information to maintain synchronization with, and learn from their physical counterparts. The proposed framework is assessed through virtual experiments of structural health monitoring and predictive maintenance of a railway bridge. The application showcases the step-by-step construction of a generative model enabling bidirectional perception--action interaction. The results demonstrate that active digital twins exhibit superior exploration capabilities compared to traditional reactive approaches, enabling enhanced autonomy and resilience.

3. sparse reward reinforcement learning

No.	Title	Date	Abstract
1	Adaptive Correlation-Weighted Intrinsic Rewards for Reinforcement Learning	2026-02-27	We propose ACWI (Adaptive Correlation Weighted Intrinsic), an adaptive intrinsic reward scaling framework designed to dynamically balance intrinsic and extrinsic rewards for improved exploration in sparse reward reinforcement learning. Unlike conventional approaches that rely on manually tuned scalar coefficients, which often result in unstable or suboptimal performance across tasks, ACWI learns a state dependent scaling coefficient online. Specifically, ACWI introduces a lightweight Beta Network that predicts the intrinsic reward weight directly from the agent state through an encoder based architecture. The scaling mechanism is optimized using a correlation based objective that encourages alignment between the weighted intrinsic rewards and discounted future extrinsic returns. This formulation enables task adaptive exploration incentives while preserving computational efficiency and training stability. We evaluate ACWI on a suite of sparse reward environments in MiniGrid. Experimental results demonstrate that ACWI consistently improves sample efficiency and learning stability compared to fixed intrinsic reward baselines, achieving superior performance with minimal computational overhead.
2	What Fundamental Structure in Reward Functions Enables Efficient Sparse-Reward Learning?	2025-09-04	Sparse-reward reinforcement learning (RL) remains fundamentally hard: without structure, any agent needs $Ω(
3	LLM-Driven Intrinsic Motivation for Sparse Reward Reinforcement Learning	2025-08-25	This paper explores the combination of two intrinsic motivation strategies to improve the efficiency of reinforcement learning (RL) agents in environments with extreme sparse rewards, where traditional learning struggles due to infrequent positive feedback. We propose integrating Variational State as Intrinsic Reward (VSIMR), which uses Variational AutoEncoders (VAEs) to reward state novelty, with an intrinsic reward approach derived from Large Language Models (LLMs). The LLMs leverage their pre-trained knowledge to generate reward signals based on environment and goal descriptions, guiding the agent. We implemented this combined approach with an Actor-Critic (A2C) agent in the MiniGrid DoorKey environment, a benchmark for sparse rewards. Our empirical results show that this combined strategy significantly increases agent performance and sampling efficiency compared to using each strategy individually or a standard A2C agent, which failed to learn. Analysis of learning curves indicates that the combination effectively complements different aspects of the environment and task: VSIMR drives exploration of new states, while the LLM-derived rewards facilitate progressive exploitation towards goals.
4	SuperRL: Reinforcement Learning with Supervision to Boost Language Model Reasoning	2025-06-01	Large language models are increasingly used for complex reasoning tasks where high-quality offline data such as expert-annotated solutions and distilled reasoning traces are often available. However, in environments with sparse rewards, reinforcement learning struggles to sample successful trajectories, leading to inefficient learning. At the same time, these offline trajectories that represent correct reasoning paths are not utilized by standard on-policy reinforcement learning methods. We introduce SuperRL, a unified training framework that adaptively alternates between RL and SFT. Whenever every rollout for a given instance receives zero reward, indicating the absence of a learning signal, SuperRL falls back to SFT on the curated offline data. Extensive experiments across diverse reasoning benchmarks show that SuperRL surpasses vanilla RL by delivering higher sample efficiency, stronger generalization, and improved robustness under sparse rewards.
5	DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning	2025-05-26	Sparse-reward reinforcement learning (RL) can model a wide range of highly complex tasks. Solving sparse-reward tasks is RL's core premise, requiring efficient exploration coupled with long-horizon credit assignment, and overcoming these challenges is key for building self-improving agents with superhuman ability. Prior work commonly explores with the objective of solving many sparse-reward tasks, making exploration of individual high-dimensional, long-horizon tasks intractable. We argue that solving such challenging tasks requires solving simpler tasks that are relevant to the target task, i.e., whose achieval will teach the agent skills required for solving the target task. We demonstrate that this sense of direction, necessary for effective exploration, can be extracted from existing RL algorithms, without leveraging any prior information. To this end, we propose a method for directed sparse-reward goal-conditioned very long-horizon RL (DISCOVER), which selects exploratory goals in the direction of the target task. We connect DISCOVER to principled exploration in bandits, formally bounding the time until the target task becomes achievable in terms of the agent's initial distance to the target, but independent of the volume of the space of all tasks. We then perform a thorough evaluation in high-dimensional environments. We find that the directed goal selection of DISCOVER solves exploration problems that are beyond the reach of prior state-of-the-art exploration methods in RL.
6	STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs	2025-05-21	Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities across diverse tasks, yet they lag significantly behind humans in spatial reasoning. We investigate this gap through Transformation-Driven Visual Reasoning (TVR), a challenging task requiring identification of object transformations across images under varying viewpoints. While traditional Supervised Fine-Tuning (SFT) fails to generate coherent reasoning paths in cross-view settings, sparse-reward Reinforcement Learning (RL) suffers from inefficient exploration and slow convergence. To address these limitations, we propose STAR-R1, a novel framework that integrates a single-stage RL paradigm with a fine-grained reward mechanism tailored for TVR. Specifically, STAR-R1 rewards partial correctness while penalizing excessive enumeration and passive inaction, enabling efficient exploration and precise reasoning. Comprehensive evaluations demonstrate that STAR-R1 achieves state-of-the-art performance across all 11 metrics, outperforming SFT by 23% in cross-view scenarios. Further analysis reveals STAR-R1's anthropomorphic behavior and highlights its unique ability to compare all objects for improving spatial reasoning. Our work provides critical insights in advancing the research of MLLMs and reasoning models. The codes, model weights, and data will be publicly available at https://github.com/zongzhao23/STAR-R1.
7	Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model	2025-03-14	Uncertainty quantification is a critical aspect of reinforcement learning and deep learning, with numerous applications ranging from efficient exploration and stable offline reinforcement learning to outlier detection in medical diagnostics. The scale of modern neural networks, however, complicates the use of many theoretically well-motivated approaches such as full Bayesian inference. Approximate methods like deep ensembles can provide reliable uncertainty estimates but still remain computationally expensive. In this work, we propose contextual similarity distillation, a novel approach that explicitly estimates the variance of an ensemble of deep neural networks with a single model, without ever learning or evaluating such an ensemble in the first place. Our method builds on the predictable learning dynamics of wide neural networks, governed by the neural tangent kernel, to derive an efficient approximation of the predictive variance of an infinite ensemble. Specifically, we reinterpret the computation of ensemble variance as a supervised regression problem with kernel similarities as regression targets. The resulting model can estimate predictive variance at inference time with a single forward pass, and can make use of unlabeled target-domain data or data augmentations to refine its uncertainty estimates. We empirically validate our method across a variety of out-of-distribution detection benchmarks and sparse-reward reinforcement learning environments. We find that our single-model method performs competitively and sometimes superior to ensemble-based baselines and serves as a reliable signal for efficient exploration. These results, we believe, position contextual similarity distillation as a principled and scalable alternative for uncertainty quantification in reinforcement learning and general deep learning.
8	Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations	2024-12-02	Many continuous control problems can be formulated as sparse-reward reinforcement learning (RL) tasks. In principle, online RL methods can automatically explore the state space to solve each new task. However, discovering sequences of actions that lead to a non-zero reward becomes exponentially more difficult as the task horizon increases. Manually shaping rewards can accelerate learning for a fixed task, but it is an arduous process that must be repeated for each new environment. We introduce a systematic reward-shaping framework that distills the information contained in 1) a task-agnostic prior data set and 2) a small number of task-specific expert demonstrations, and then uses these priors to synthesize dense dynamics-aware rewards for the given task. This supervision substantially accelerates learning in our experiments, and we provide analysis demonstrating how the approach can effectively guide online learning agents to faraway goals.

4. casual RL/counterfactual RL/casual reinforcement learning

No.	Title	Date	Abstract
1	Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL	2025-02-18	An increasingly common socio-technical problem is people being taken in by offers that sound "too good to be true", where persuasion and trust shape decision-making. This paper investigates how AI can help detect these deceptive scenarios. We analyze how humans strategically deceive each other in Diplomacy, a board game that requires both natural language communication and strategic reasoning. This requires extracting logical forms of proposed agreements in player communications and computing the relative rewards of the proposal using agents' value functions. Combined with text-based features, this can improve our deception detection. Our method detects human deception with a high precision when compared to a Large Language Model approach that flags many true messages as deceptive. Future human-AI interaction tools can build on our methods for deception detection by triggering friction to give users a chance of interrogating suspicious proposals.
2	Sample-Efficient Reinforcement Learning via Counterfactual-Based Data Augmentation	2020-12-16	展开 Reinforcement learning (RL) algorithms usually require a substantial amount of interaction data and perform well only for specific tasks in a fixed environment. In some scenarios such as healthcare, however, usually only few records are available for each patient, and patients may show different responses to the same treatment, impeding the application of current RL algorithms to learn optimal policies. To address the issues of mechanism heterogeneity and related data scarcity, we propose a data-efficient RL algorithm that exploits structural causal models (SCMs) to model the state dynamics, which are estimated by leveraging both commonalities and differences across subjects. The learned SCM enables us to counterfactually reason what would have happened had another treatment been taken. It helps avoid real (possibly risky) exploration and mitigates the issue that limited experiences lead to biased policies. We propose counterfactual RL algorithms to learn both population-level and individual-level policies. We show that counterfactual outcomes are identifiable under mild conditions and that Q- learning on the counterfactual-based augmented data set converges to the optimal value function. Experimental results on synthetic and real-world data demonstrate the efficacy of the proposed approach.

5. causal inference/causal discovery/counterfactual reasoning

No.	Title	Date	Abstract
1	Counterfactual Analysis of Brain Network Dynamics	2026-03-31	Causal inference in brain networks has traditionally relied on regression-based models such as Granger causality, structural equation modeling, and dynamic causal modeling. While effective for identifying directed associations, these methods remain descriptive and acyclic, leaving open the fundamental question of intervention: what would the causal organization become if a pathway were disrupted or externally modulated? We introduce a unified framework for counterfactual causal analysis that models both pathological disruptions and therapeutic interventions as an energy-perturbation problem on network flows. Grounded in Hodge theory, directed communication is decomposed into dissipative and persistent (harmonic) components, enabling systematic analysis of how causal organization reconfigures under hypothetical perturbations. This formulation provides a principled foundation for quantifying network resilience, compensation, and control in complex brain systems.
2	Linear models for causal inference under network interference	2026-03-31	In causal inference, interference occurs when the treatment of one unit may affect the outcomes of other units. The goal of this work is to serve as a guide to the use of linear outcome modeling for estimating causal effects in settings where interference may pose a challenge to identification and estimation, such as spatial and network data. We demonstrate that, under a linear model, causal effects of binary and continuous treatments can be identified in terms of regression coefficients under totally and partially known interference structures. Our work constructs unbiased and consistent point and variance estimators for these effects under one or more possible fixed or random interference networks. A chief advantage is that this approach can be implemented using standard linear regression software, and is easily augmented with random effects and heteroscedastic or autocorrelation consistent standard errors. Numerical experiments and an example data analysis demonstrate the efficacy of this approach in eliminating interference bias.
3	Bell's Inequality, Causal Bounds, and Quantum Bayesian Computation: A Unified Framework	2026-03-30	Bell inequalities characterize the boundary of the local-realist correlation polytope -- the set of joint probability distributions achievable by classical hidden-variable models. Quantum mechanics exceeds this boundary through non-commutativity, reaching the Tsirelson bound $2\sqrt{2}$ for CHSH. We show that this polytope structure is not specific to quantum foundations: it appears identically in the causal inference literature, where the instrumental inequality, the Balke--Pearl linear programming bounds, and the Tian--Pearl probabilities of causation all arise as facets of the same marginal compatibility polytope. Fine's theorem -- that CHSH inequalities hold if and only if a joint distribution exists -- is precisely the pivot: the instrumental variable model in causal inference is structurally equivalent to the Bell local hidden-variable model, with the instrument playing the role of the measurement setting and the latent confounder playing the role of the hidden variable $λ$. We develop this correspondence in detail, extending it to algorithmic (Kolmogorov complexity) and entropic formulations of Bell inequalities, the NPA semidefinite programming hierarchy, and the MIP$^*$=RE undecidability result. We further show that the Born-rule / Bayes-rule duality underlying quantum Bayesian computation exploits the same non-commutativity that enables Bell violation, providing polynomial speedups for posterior inference. The framework yields a concrete dictionary between quantum information theory, causal econometrics, and Bayesian computation, and suggests new directions including NPA-based quantum causal inference algorithms and quantum architectures for function approximation.
4	Position: Explainable AI is Causality in Disguise	2026-03-30	The demand for Explainable AI (XAI) has triggered an explosion of methods, producing a landscape so fragmented that we now rely on surveys of surveys. Yet, fundamental challenges persist: conflicting metrics, failed sanity checks, and unresolved debates over robustness and fairness. The only consensus on how to achieve explainability is a lack of one. This has led many to point to the absence of a ground truth for defining "the" correct explanation as the main culprit. This position paper posits that the persistent discord in XAI arises not from an absent ground truth but from a ground truth that exists, albeit as an elusive and challenging target: the causal model that governs the relevant system. By reframing XAI queries about data, models, or decisions as causal inquiries, we prove the necessity and sufficiency of causal models for XAI. We contend that without this causal grounding, XAI remains unmoored. Ultimately, we encourage the community to converge around advanced concept and causal discovery to escape this entrenched uncertainty.
5	Diagnosing and Repairing Unsafe Channels in Vision-Language Models via Causal Discovery and Dual-Modal Safety Subspace Projection	2026-03-28	Large Vision-Language Models (LVLMs) have achieved impressive performance across multimodal understanding and reasoning tasks, yet their internal safety mechanisms remain opaque and poorly controlled. In this work, we present a comprehensive framework for diagnosing and repairing unsafe channels within LVLMs (CARE). We first perform causal mediation analysis to identify neurons and layers that are causally responsible for unsafe behaviors. Based on these findings, we introduce a dual-modal safety subspace projection method that learns generalized safety subspaces for both visual and textual modalities through generalized eigen-decomposition between benign and malicious activations. During inference, activations are dynamically projected toward these safety subspaces via a hybrid fusion mechanism that adaptively balances visual and textual corrections, effectively suppressing unsafe features while preserving semantic fidelity. Extensive experiments on multiple safety benchmarks demonstrate that our causal-subspace repair framework significantly enhances safety robustness without degrading general multimodal capabilities, outperforming prior activation steering and alignment-based baselines. Additionally, our method exhibits good transferability, defending against unseen attacks.
6	Socioeconomic Drivers of Physical Morbidity Across U.S. Counties: A Spatial Causal Inference Approach	2026-03-28	Identifying the causal effects of socioeconomic determinants on population health is of many great interests - from statistical methodology development to public health practitioners and policy developments. The statistical side of the problem needs to address several questions: spatial autocorrelation in both exposures and outcomes, confounding between treatments and covariates, and the need for geographically logical inference. We address these jointly by using spectral basis functions - Moran Eigenvector Maps and ICAR precision matrix eigenvectors - within a doubly robust generalized propensity score estimator for continuous treatments. Applied to 2022 county health data across the U.S. counties, the framework identifies the effect of six chosen predictors on the average physically unhealthy days per month. Possible further applications and methodological extensions are also discussed as future directions from this research.
7	The relative value of interventional and observational samples in Bayesian Causal Linear Gaussian Models	2026-03-27	We investigate the asymptotic properties of Bayesian bivariate causal discovery for Gaussian Linear Structural Equation Models (SEMs) with heteroscedastic noise. We demonstrate that with purely observational data, the posterior distribution over the models fails to consistently identify the true causal structure - a consequence of the fundamental non-identifiability within the Markov Equivalence Class. Specifically, if the true generating mechanism corresponds to a connected graph (A -> B or B -> A), the asymptotic behavior of the posterior is given by the ratio between the prior on the true model and the push-forward prior of the alternative. In contrast, for the independence model, we establish that the posterior concentrates at a stochastic polynomial rate of O_p(n^{-1/2}). To resolve this non-identifiability, we incorporate m interventional samples and characterize the concentration rates as a function of the observational-to-total sample ratio, η. We identify a sharp concentration dichotomy: while the independence graph maintains a polynomial O_p(N^{-1/2}) rate (where N = n+m), connected graphs undergo a phase transition to exponentially fast convergence. This highlights an exponential relative importance between the two data types, as altering the amount of one data type directly changes the exponent governing the concentration speed. We derive explicit formulae for the exponential decay rates and provide precise conditions under which mixing observational and interventional data optimizes concentration speed. Finally, our theoretical findings are validated through empirical simulations in Bayesian Gaussian equivalent (BGe)-style prior specifications offering a principled foundation for experimental design in Bayesian causal discovery.
8	SpatialAnt: Autonomous Zero-Shot Robot Navigation via Active Scene Reconstruction and Visual Anticipation	2026-03-27	Vision-and-Language Navigation (VLN) has recently benefited from Multimodal Large Language Models (MLLMs), enabling zero-shot navigation. While recent exploration-based zero-shot methods have shown promising results by leveraging global scene priors, they rely on high-quality human-crafted scene reconstructions, which are impractical for real-world robot deployment. When encountering an unseen environment, a robot should build its own priors through pre-exploration. However, these self-built reconstructions are inevitably incomplete and noisy, which severely degrade methods that depend on high-quality scene reconstructions. To address these issues, we propose SpatialAnt, a zero-shot navigation framework designed to bridge the gap between imperfect self-reconstructions and robust execution. SpatialAnt introduces a physical grounding strategy to recover the absolute metric scale for monocular-based reconstructions. Furthermore, rather than treating the noisy self-reconstructed scenes as absolute spatial references, we propose a novel visual anticipation mechanism. This mechanism leverages the noisy point clouds to render future observations, enabling the agent to perform counterfactual reasoning and prune paths that contradict human instructions. Extensive experiments in both simulated and real-world environments demonstrate that SpatialAnt significantly outperforms existing zero-shot methods. We achieve a 66% Success Rate (SR) on R2R-CE and 50.8% SR on RxR-CE benchmarks. Physical deployment on a Hello Robot further confirms the efficiency and efficacy of our framework, achieving a 52% SR in challenging real-world settings.
9	Enes Causal Discovery	2026-03-25	Enes The proposed architecture is a mixture of experts, which allows for the model entities, such as the causal relationships, to be further parameterized. More specifically, an attempt is made to exploit a neural net as implementing neurons poses a great challenge for this dataset. To explain, a simple and fast Pearson coefficient linear model usually achieves good scores. An aggressive baseline that requires a really good model to overcome that is. Moreover, there are major limitations when it comes to causal discovery of observational data. Unlike the sachs one did not use interventions but only prior knowledge; the most prohibiting limitation is that of the data which is addressed. Thereafter, the method and the model are described and after that the results are presented.
10	Causal Transfer in Medical Image Analysis	2026-03-25	Medical imaging models frequently fail when deployed across hospitals, scanners, populations, or imaging protocols due to domain shift, limiting their clinical reliability. While transfer learning and domain adaptation address such shifts statistically, they often rely on spurious correlations that break under changing conditions. On the other hand, causal inference provides a principled way to identify invariant mechanisms that remain stable across environments. This survey introduces and systematises Causal Transfer Learning (CTL) for medical image analysis. This paradigm integrates causal reasoning with cross-domain representation learning to enable robust and generalisable clinical AI. We frame domain shift as a causal problem and analyse how structural causal models, invariant risk minimisation, and counterfactual reasoning can be embedded within transfer learning pipelines. We studied spanning classification, segmentation, reconstruction, anomaly detection, and multimodal imaging, and organised them by task, shift type, and causal assumption. A unified taxonomy is proposed that connects causal frameworks and transfer mechanisms. We further summarise datasets, benchmarks, and empirical gains, highlighting when and why causal transfer outperforms correlation-based domain adaptation. Finally, we discuss how CTL supports fairness, robustness, and trustworthy deployment in multi-institutional and federated settings, and outline open challenges and research directions for clinically reliable medical imaging AI.
11	CounterScene: Counterfactual Causal Reasoning in Generative World Models for Safety-Critical Closed-Loop Evaluation	2026-03-22	Generating safety-critical driving scenarios requires understanding why dangerous interactions arise, rather than merely forcing collisions. However, existing methods rely on heuristic adversarial agent selection and unstructured perturbations, lacking explicit modeling of interaction dependencies and thus exhibiting a realism-adversarial trade-off. We present CounterScene, a framework that endows closed-loop generative BEV world models with structured counterfactual reasoning for safety-critical scenario generation. Given a safe scene, CounterScene asks: what if the causally critical agent had behaved differently? To answer this, we introduce causal adversarial agent identification to identify the critical agent and classify conflict types, and develop a conflict-aware interactive world model in which a causal interaction graph is used to explicitly model dynamic inter-agent dependencies. Building on this structure, stage-adaptive counterfactual guidance performs minimal interventions on the identified agent, removing its spatial and temporal safety margins while allowing risk to emerge through natural interaction propagation. Extensive experiments on nuScenes demonstrate that CounterScene achieves the strongest adversarial effectiveness while maintaining superior trajectory realism across all horizons, improving long-horizon collision rate from 12.3% to 22.7% over the strongest baseline with better realism (ADE 1.88 vs. 2.09). Notably, this advantage further widens over longer rollouts, and CounterScene generalizes zero-shot to nuPlan with state-of-the-art realism.
12	Efficient Counterfactual Reasoning in ProbLog via Single World Intervention Programs	2026-03-20	Probabilistic Logic Programming (PLP) languages, like ProbLog, naturally support reasoning under uncertainty, while maintaining a declarative and interpretable framework. Meanwhile, counterfactual reasoning (i.e., answering "what if" questions) is critical for ensuring AI systems are robust and trustworthy; however, integrating this capability into PLP can be computationally prohibitive and unstable in accuracy. This paper addresses this challenge, by proposing an efficient program transformation for counterfactuals as Single World Intervention Programs (SWIPs) in ProbLog. By systematically splitting ProbLog clauses to observed and fixed components relevant to a counterfactual, we create a transformed program that (1) does not asymptotically exceed the computational complexity of existing methods, and is strictly smaller in common cases, and (2) reduces counterfactual reasoning to marginal inference over a simpler program. We formally prove the correctness of our approach, which relies on a weaker set independence assumptions and is consistent with conditional independencies, showing the resulting marginal probabilities match the counterfactual distributions of the underlying Structural Causal Model in wide domains. Our method achieves a 35% reduction in inference time versus existing methods in extensive experiments. This work makes complex counterfactual reasoning more computationally tractable and reliable, providing a crucial step towards developing more robust and explainable AI systems. The code is at https://github.com/EVIEHub/swip.
13	Using large language models for sensitivity analysis in causal inference: cases studies on Cornfield inequality and E-value	2026-03-15	Sensitivity analysis methods such as the Cornfield inequality and the E-value were developed to assess the robustness of observed associations against unmeasured confounding -- a major challenge in observational studies. However, the calculation and interpretation of these methods can be difficult for clinicians and interdisciplinary researchers. Recent advances in large language models (LLMs) offer accessible tools that could assist sensitivity analyses, but their reliability in this context has not been studied. We assess four widely used LLMs, ChatGPT, Claude, DeepSeek, and Gemini, on their ability to conduct sensitivity analyses using Cornfield inequalities and E-values. We first extract study-specific information (exposures, outcomes, measured confounders, and effect estimates) from four published observational studies in different fields. Using those information, we develop structured prompts to assess the performance of the LLMs in three aspects: (1) accuracy of E-value calculation, (2) qualitative interpretation of robustness to unmeasured confounding, and (3) suggestion of possible unmeasured confounders. To our knowledge, this is the first study to investigate the use of LLMs for sensitivity analysis. The results show that ChatGPT, Claude, and Gemini accurately reproduce the E-values, whereas DeepSeek shows small biases. Qualitative conclusions from all the LLMs align with the magnitude of the E-values and the reported effect sizes, and all models identify biologically and epidemiologically plausible unmeasured confounders. These findings suggest that, when guided by structured prompting, LLMs can effectively assist in evaluating unmeasured confounding, and thereby can support study design and decision-making in observational studies.
14	Scaling Test-Time Robustness of Vision-Language Models via Self-Critical Inference Framework	2026-03-08	The emergence of Large Language Models (LLMs) has driven rapid progress in multi-modal learning, particularly in the development of Large Vision-Language Models (LVLMs). However, existing LVLM training paradigms place excessive reliance on the LLM component, giving rise to two critical robustness challenges: language bias and language sensitivity. To address both issues simultaneously, we propose a novel Self-Critical Inference (SCI) framework that extends Visual Contrastive Decoding by conducting multi-round counterfactual reasoning through both textual and visual perturbations. This process further introduces a new strategy for improving robustness by scaling the number of counterfactual rounds. Moreover, we also observe that failure cases of LVLMs differ significantly across models, indicating that fixed robustness benchmarks may not be able to capture the true reliability of LVLMs. To this end, we propose the Dynamic Robustness Benchmark (DRBench), a model-specific evaluation framework targeting both language bias and sensitivity issues. Extensive experiments show that SCI consistently outperforms baseline methods on DRBench, and that increasing the number of inference rounds further boosts robustness beyond existing single-step counterfactual reasoning methods.
15	Causal Graph Neural Networks for Healthcare	2025-11-04	Healthcare artificial intelligence systems often degrade in performance when deployed across institutions, with documented performance drops and perpetuation of discriminatory patterns embedded in data. This brittleness comes, in part, from learning statistical associations rather than causal mechanisms. Causal graph neural networks address this by combining graph-based representations of biomedical data with causal inference to learn invariant mechanisms instead of just spurious correlations. This Perspective reviews the methodology of structural causal models, disentangled causal representation learning, and techniques for interventional prediction and counterfactual reasoning on graphs. We discuss applications across psychiatric diagnosis and brain network analysis, cancer subtyping with multi-omics causal integration, continuous physiological monitoring, and drug recommendations. These methods provide building blocks for patient-specific Causal Digital Twins that could support in silico clinical experimentation. Remaining challenges include computational costs that preclude real-time deployment, validation challenges that go beyond standard cross-validation, and the risk of causal-washing where methods adopt causal terminology without rigorous evidentiary support. We propose a tiered framework distinguishing causally-inspired architectures from causally-validated discoveries and outline future directions, including scalable causal discovery, multi-modal data integration, and regulatory pathways for these methods. Making practical Causal Digital Twins possible will require an honest assessment of what current methods deliver, sustained collaboration across disciplines, and validation standards that match the strength of the causal claims being made.
16	Local Causal Discovery for Statistically Efficient Causal Inference	2025-10-16	Causal discovery methods can identify valid adjustment sets for causal effect estimation for a pair of target variables, even when the underlying causal graph is unknown. Global causal discovery methods focus on learning the whole causal graph and therefore enable the recovery of optimal adjustment sets, i.e., sets with the lowest asymptotic variance, but they quickly become computationally prohibitive as the number of variables grows. Local causal discovery methods offer a more scalable alternative by focusing on the local neighborhood of the target variables, but are restricted to statistically suboptimal adjustment sets. In this work, we propose Local Optimal Adjustments Discovery (LOAD), a sound and complete causal discovery approach that combines the computational efficiency of local methods with the statistical optimality of global methods. First, LOAD identifies the causal relation between the targets and tests if the causal effect is identifiable by using only local information. If it is identifiable, it finds the possible descendants of the treatment and infers the optimal adjustment set as the parents of the outcome in a modified forbidden projection. Otherwise, it returns the locally valid parent adjustment sets. In our experiments on synthetic and realistic data LOAD outperforms global methods in scalability, while providing more accurate effect estimation than local methods.
17	CausalARC: Abstract Reasoning with Causal World Models	2025-09-03	On-the-fly reasoning often requires adaptation to novel problems under limited data and distribution shift. This work introduces CausalARC: an experimental testbed for AI reasoning in low-data and out-of-distribution regimes, modeled after the Abstraction and Reasoning Corpus (ARC). Each CausalARC reasoning task is sampled from a fully specified causal world model, formally expressed as a structural causal model. Principled data augmentations provide observational, interventional, and counterfactual feedback about the world model in the form of few-shot, in-context learning demonstrations. As a proof-of-concept, we illustrate the use of CausalARC for four language model evaluation settings: (1) abstract reasoning with test-time training, (2) counterfactual reasoning with in-context learning, (3) program synthesis, and (4) causal discovery with logical reasoning. Within- and between-model performance varied heavily across tasks, indicating room for significant improvement in language model reasoning.
18	Flow IV: Counterfactual Inference In Nonseparable Outcome Models Using Instrumental Variables	2025-08-02	To reach human level intelligence, learning algorithms need to incorporate causal reasoning. But identifying causality, and particularly counterfactual reasoning, remains elusive. In this paper, we make progress on counterfactual inference in nonseparable outcome models by utilizing instrumental variables (IVs). IVs are a classic tool for mitigating bias from unobserved confounders when estimating causal effects. While IV methods for effect estimation have been extended to nonseparable outcome models under different assumptions, existing IV approaches to counterfactual prediction typically assume one-dimensional outcomes and additive noise. In this paper, we show that under standard IV assumptions, along with the assumption that the outcome function is invertible and has a triangular structure, then the treatment-outcome relationship becomes identifiable from observed data. We furthermore propose a method to learn the outcome function utilizing normalizing flows. This outcome function estimator can then be used to perform counterfactual inference. We refer to the method as Flow IV.
19	Constructive Instrumental Variable Identification and Inference with Many Weak Interaction Moments	2025-04-18	Instrumental variable methods are widely used for causal inference, but identification becomes especially challenging when instruments are weak and potentially invalid. These challenges are particularly pronounced in Mendelian randomization, where genetic variants serve as instruments and violations of exclusion restriction or independence assumptions are common. We propose MAGIC, a constructive and assumption-lean framework that achieves identification even when all candidate instruments may be invalid. The method exploits pairwise and higher-order interactions among mutually independent instruments to construct moment conditions orthogonal to both unmeasured confounding and direct effects under a linear structural model. The resulting estimation problem involves many potentially weak interaction moments with unknown nuisance parameters. We develop a semiparametric generalized method of moments estimator and introduce a global Neyman orthogonality condition to ensure robustness of both the moment function and its derivative to nuisance estimation under many weak moment asymptotics. We establish consistency and asymptotic normality when the number of moments diverges with sample size and characterize the semiparametric efficiency bound under fixed dimension. Simulations and an application to UK Biobank data illustrate the method.
20	Retrieving Classes of Causal Orders with Inconsistent Knowledge Bases	2024-12-18	Traditional causal discovery methods often depend on strong, untestable assumptions, making them unreliable in real-world applications. In this context, Large Language Models (LLMs) have emerged as a promising alternative for extracting causal knowledge from text-based metadata, effectively consolidating domain expertise. However, LLMs are prone to hallucinations, necessitating strategies that account for these limitations. One effective approach is to use a consistency measure as a proxy of reliability. Moreover, LLMs do not clearly distinguish direct from indirect causal relationships, complicating the discovery of causal Directed Acyclic Graphs (DAGs), which are often sparse. This ambiguity is evident in the way informal sentences are formulated in various domains. For this reason, focusing on causal orders provides a more practical and direct task for LLMs. We propose a new method for deriving abstractions of causal orders that maximizes a consistency score obtained from an LLM. Our approach begins by computing pairwise consistency scores between variables, from which we construct a semi-complete partially directed graph that consolidates these scores into an abstraction. Using this structure, we identify both a maximally oriented partially directed acyclic graph and an optimal set of acyclic tournaments that maximize consistency across all configurations. We further demonstrate how both the abstraction and the class of causal orders can be used to estimate causal effects. We evaluate our method on a wide set of causal DAGs extracted from scientific literature in epidemiology and public health. Our results show that the proposed approach can effectively recover the correct causal order, providing a reliable and practical LLM-assisted causal framework.

6. video super resolution

No.	Title	Date	Abstract
1	InstaVSR: Taming Diffusion for Efficient and Temporally Consistent Video Super-Resolution	2026-03-27	Video super-resolution (VSR) seeks to reconstruct high-resolution frames from low-resolution inputs. While diffusion-based methods have substantially improved perceptual quality, extending them to video remains challenging for two reasons: strong generative priors can introduce temporal instability, and multi-frame diffusion pipelines are often too expensive for practical deployment. To address both challenges simultaneously, we propose InstaVSR, a lightweight diffusion framework for efficient video super-resolution. InstaVSR combines three ingredients: (1) a pruned one-step diffusion backbone that removes several costly components from conventional diffusion-based VSR pipelines, (2) recurrent training with flow-guided temporal regularization to improve frame-to-frame stability, and (3) dual-space adversarial learning in latent and pixel spaces to preserve perceptual quality after backbone simplification. On an NVIDIA RTX 4090, InstaVSR processes a 30-frame video at 2K$\times$2K resolution in under one minute with only 7 GB of memory usage, substantially reducing the computational cost compared to existing diffusion-based methods while maintaining favorable perceptual quality with significantly smoother temporal transitions.
2	ScrollScape: Unlocking 32K Image Generation With Video Diffusion Priors	2026-03-25	While diffusion models excel at generating images with conventional dimensions, pushing them to synthesize ultra-high-resolution imagery at extreme aspect ratios (EAR) often triggers catastrophic structural failures, such as object repetition and spatial fragmentation. This limitation fundamentally stems from a lack of robust spatial priors, as static text-to-image models are primarily trained on image distributions with conventional dimensions. To overcome this bottleneck, we present ScrollScape, a novel framework that reformulates EAR image synthesis into a continuous video generation process through two core innovations. By mapping the spatial expansion of a massive canvas to the temporal evolution of video frames, ScrollScape leverages the inherent temporal consistency of video models as a powerful global constraint to ensure long-range structural integrity. Specifically, Scanning Positional Encoding (ScanPE) distributes global coordinates across frames to act as a flexible moving camera, while Scrolling Super-Resolution (ScrollSR) leverages video super-resolution priors to circumvent memory bottlenecks, efficiently scaling outputs to an unprecedented 32K resolution. Fine-tuned on a curated 3K multi-ratio image dataset, ScrollScape effectively aligns pre-trained video priors with the EAR generation task. Extensive evaluations demonstrate that it significantly outperforms existing image-diffusion baselines by eliminating severe localized artifacts. Consequently, our method overcomes inherent structural bottlenecks to ensure exceptional global coherence and visual fidelity across diverse domains at extreme scales.
3	DUO-VSR: Dual-Stream Distillation for One-Step Video Super-Resolution	2026-03-23	Diffusion-based video super-resolution (VSR) has recently achieved remarkable fidelity but still suffers from prohibitive sampling costs. While distribution matching distillation (DMD) can accelerate diffusion models toward one-step generation, directly applying it to VSR often results in training instability alongside degraded and insufficient supervision. To address these issues, we propose DUO-VSR, a three-stage framework built upon a Dual-Stream Distillation strategy that unifies distribution matching and adversarial supervision for one-step VSR. Firstly, a Progressive Guided Distillation Initialization is employed to stabilize subsequent training through trajectory-preserving distillation. Next, the Dual-Stream Distillation jointly optimizes the DMD and Real-Fake Score Feature GAN (RFS-GAN) streams, with the latter providing complementary adversarial supervision leveraging discriminative features from both real and fake score models. Finally, a Preference-Guided Refinement stage further aligns the student with perceptual quality preferences. Extensive experiments demonstrate that DUO-VSR achieves superior visual quality and efficiency over previous one-step VSR approaches.
4	ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation	2026-03-18	展开 Recent video diffusion models achieve high-quality generation through recurrent frame processing where each frame generation depends on previous frames. However, this recurrent mechanism means that training such models in the pixel domain incurs prohibitive memory costs, as activations accumulate across the entire video sequence. This fundamental limitation also makes fine-tuning these models with pixel-wise losses computationally intractable for long or high-resolution videos. This paper introduces ChopGrad, a truncated backpropagation scheme for video decoding, limiting gradient computation to local frame windows while maintaining global consistency. We provide a theoretical analysis of this approximation and show that it enables efficient fine-tuning with frame-wise losses. ChopGrad reduces training memory from scaling linearly with the number of video frames (full backpropagation) to constant memory, and compares favorably to existing state-of-the-art video diffusion models across a suite of conditional video generation tasks with pixel-wise losses, including video super-resolution, video inpainting, video enhancement of neural-rendered scenes, and controlled driving video generation.
5	SparkVSR: Interactive Video Super-Resolution via Sparse Keyframe Propagation	2026-03-17	Video Super-Resolution (VSR) aims to restore high-quality video frames from low-resolution (LR) estimates, yet most existing VSR approaches behave like black boxes at inference time: users cannot reliably correct unexpected artifacts and can only accept whatever the model produces. In this paper, we propose a novel interactive VSR framework dubbed SparkVSR that makes sparse keyframes a simple and expressive control signal. Specifically, users can first super-resolve a small set of keyframes using any off-the-shelf image super-resolution (ISR) model, then SparkVSR propagates the keyframe priors to the entire video sequence while remaining grounded by the original LR video motion. Concretely, we introduce a keyframe-conditioned latent-pixel two-stage training pipeline that fuses LR video latents with sparsely encoded HR keyframe latents to learn robust cross-space propagation and refine perceptual details. At inference time, SparkVSR supports flexible keyframe selection (manual specification, codec I-frame extraction, or random sampling) and a reference-free guidance mechanism that continuously balances keyframe adherence and blind restoration, ensuring robust performance even when reference keyframes are absent or imperfect. Experiments on multiple VSR benchmarks demonstrate improved temporal consistency and strong restoration quality, surpassing baselines by up to 24.6%, 21.8%, and 5.6% on CLIP-IQA, DOVER, and MUSIQ, respectively, enabling controllable, keyframe-driven video super-resolution. Moreover, we demonstrate that SparkVSR is a generic interactive, keyframe-conditioned video processing framework as it can be applied out of the box to unseen tasks such as old-film restoration and video style transfer. Our project page is available at: https://sparkvsr.github.io/
6	TextOVSR: Text-Guided Real-World Opera Video Super-Resolution	2026-03-16	展开 Many classic opera videos exhibit poor visual quality due to the limitations of early filming equipment and long-term degradation during storage. Although real-world video super-resolution (RWVSR) has achieved significant advances in recent years, directly applying existing methods to degraded opera videos remains challenging. The difficulties are twofold. First, accurately modeling real-world degradations is complex: simplistic combinations of classical degradation kernels fail to capture the authentic noise distribution, while methods that extract real noise patches from external datasets are prone to style mismatches that introduce visual artifacts. Second, current RWVSR methods, which rely solely on degraded image features, struggle to reconstruct realistic and detailed textures due to a lack of high-level semantic guidance. To address these issues, we propose a Text-guided Dual-Branch Opera Video Super-Resolution (TextOVSR) network, which introduces two types of textual prompts to guide the super-resolution process. Specifically, degradation-descriptive text, derived from the degradation process, is incorporated into the negative branch to constrain the solution space. Simultaneously, content-descriptive text is incorporated into a positive branch and our proposed Text-Enhanced Discriminator (TED) to provide semantic guidance for enhanced texture reconstruction. Furthermore, we design a Degradation-Robust Feature Fusion (DRF) module to facilitate cross-modal feature fusion while suppressing degradation interference. Experiments on our OperaLQ benchmark show that TextOVSR outperforms state-of-the-art methods both qualitatively and quantitatively. The code is available at https://github.com/ChangHua0/TextOVSR.
7	Rethinking Diffusion Model-Based Video Super-Resolution: Leveraging Dense Guidance from Aligned Features	2025-11-21	展开 Diffusion model (DM) based Video Super-Resolution (VSR) approaches achieve impressive perceptual quality. However, they suffer from error accumulation, spatial artifacts, and a trade-off between perceptual quality and fidelity, primarily caused by inaccurate alignment and insufficient compensation between video frames. In this paper, within the DM-based VSR pipeline, we revisit the role of alignment and compensation between adjacent video frames and reveal two crucial observations: (a) the feature domain is better suited than the pixel domain for information compensation due to its stronger spatial and temporal correlations, and (b) warping at an upscaled resolution better preserves high-frequency information, but this benefit is not necessarily monotonic. Therefore, we propose a novel Densely Guided diffusion model with Aligned Features for Video Super-Resolution (DGAF-VSR), with an Optical Guided Warping Module (OGWM) to maintain high-frequency details in the aligned features and a Feature-wise Temporal Condition Module (FTCM) to deliver dense guidance in the feature domain. Extensive experiments on synthetic and real-world datasets demonstrate that DGAF-VSR surpasses state-of-the-art methods in key aspects of VSR, including perceptual quality (35.82% DISTS reduction), fidelity (0.20 dB PSNR gain), and temporal consistency (30.37% tLPIPS reduction).
8	Time-Correlated Video Bridge Matching	2025-10-14	Diffusion models excel in noise-to-data generation tasks, providing a mapping from a Gaussian distribution to a more complex data distribution. However, they struggle to model translations between complex distributions, limiting their effectiveness in data-to-data tasks. While Bridge Matching (BM) models address this by finding the translation between data distributions, their application to time-correlated data sequences remains unexplored. This is a critical limitation for video generation and manipulation tasks, where maintaining temporal coherence is particularly important. To address this gap, we propose Time-Correlated Video Bridge Matching (TCVBM), a framework that extends BM to time-correlated data sequences in the video domain. TCVBM explicitly models inter-sequence dependencies within the diffusion bridge, directly incorporating temporal correlations into the sampling process. We compare our approach to classical methods based on bridge matching and diffusion models for three video-related tasks: frame interpolation, image-to-video generation, and video super-resolution. TCVBM achieves superior performance across multiple quantitative metrics, demonstrating enhanced generation quality and reconstruction fidelity.

7. knowledge graph/knowledge distillation/knowledge representation/knowledge transfer/knowledge embedding

No.	Title	Date	Abstract
1	Multi-paradigm Logic Programming in the ${\cal E}$rgoAI System	2026-03-31	展开 ErgoAI is a high level, multi-paradigm logic programming language and system developed by Coherent Knowledge Systems as an enhancement of and a successor to the popular Flora-2 system. ErgoAI is oriented towards scalable knowledge representation and reasoning, and can exploit both structured knowledge as well as knowledge derived from external sources such as vector embeddings. From the start, ErgoAI (and Flora-2 before it) were designed to exploit the well-founded semantics for reasoning in a multi-paradigm environment, including object-based logic (F-logic) with non-monotonic inheritance; higher order syntax in the style of HiLog; defeasibility of rules; semantically clean transactional updates; extensive use of subgoal delay for handling unsafe queries and for better performance; and optional support for bounded rationality at a module level. Although Flora-2 programs are compiled into XSB and adopt many Prolog features, ErgoAI is altogether a different language and system. Under consideration in Theory and Practice of Logic Programming (TPLP).
2	ENEIDE: A High Quality Silver Standard Dataset for Named Entity Recognition and Linking in Historical Italian	2026-03-31	This paper introduces ENEIDE (Extracting Named Entities from Italian Digital Editions), a silver standard dataset for Named Entity Recognition and Linking (NERL) in historical Italian texts. The corpus comprises 2,111 documents with over 8,000 entity annotations semi-automatically extracted from two scholarly digital editions: Digital Zibaldone, the philosophical diary of the Italian poet Giacomo Leopardi (1798--1837), and Aldo Moro Digitale, the complete works of the Italian politician Aldo Moro (1916--1978). Annotations cover multiple entity types (person, location, organization, literary work) linked to Wikidata identifiers, including NIL entities that cannot be mapped to the knowledge graph. To the best of our knowledge, ENEIDE represents the first multi-domain, publicly available NERL dataset for historical Italian with training, development, and test splits. We present a methodology for semi-automatic annotation extraction from manually curated scholarly digital editions, including quality control and annotation enhancement procedures. Baseline experiments using state-of-the-art models demonstrate how challenging the dataset is for NERL and the gap between zero-shot approaches and fine-tuned models. The dataset's diachronic coverage spanning two centuries makes it particularly suitable for temporal entity disambiguation and cross-domain evaluation. ENEIDE is released under a CC BY-NC-SA 4.0 license.
3	Big2Small: A Unifying Neural Network Framework for Model Compression	2026-03-31	With the development of foundational models, model compression has become a critical requirement. Various model compression approaches have been proposed such as low-rank decomposition, pruning, quantization, ergodic dynamic systems, and knowledge distillation, which are based on different heuristics. To elevate the field from fragmentation to a principled discipline, we construct a unifying mathematical framework for model compression grounded in measure theory. We further demonstrate that each model compression technique is mathematically equivalent to a neural network subject to a regularization. Building upon this mathematical and structural equivalence, we propose an experimentally-verified data-free model compression framework, termed Big2Small, which translates Implicit Neural Representations (INRs) from data domain to the domain of network parameters. Big2Small trains compact INRs to encode the weights of larger models and reconstruct the weights during inference. To enhance reconstruction fidelity, we introduce Outlier-Aware Preprocessing to handle extreme weight values and a Frequency-Aware Loss function to preserve high-frequency details. Experiments on image classification and segmentation demonstrate that Big2Small achieves competitive accuracy and compression ratios compared to state-of-the-art baselines.
4	StereoVGGT: A Training-Free Visual Geometry Transformer for Stereo Vision	2026-03-31	Driven by the advancement of 3D devices, stereo vision tasks including stereo matching and stereo conversion have emerged as a critical research frontier. Contemporary stereo vision backbones typically rely on either monocular depth estimation (MDE) models or visual foundation models (VFMs). Crucially, these models are predominantly pretrained without explicit supervision of camera poses. Given that such geometric knowledge is indispensable for stereo vision, the absence of explicit spatial constraints constitutes a significant performance bottleneck for existing architectures. Recognizing that the Visual Geometry Grounded Transformer (VGGT) operates as a foundation model pretrained on extensive 3D priors, including camera poses, we investigate its potential as a robust backbone for stereo vision tasks. Nevertheless, empirical results indicate that its direct application to stereo vision yields suboptimal performance. We observe that VGGT suffers from a more significant degradation of geometric details during feature extraction. Such characteristics conflict with the requirements of binocular stereo vision, thereby constraining its efficacy for related tasks. To bridge this gap, we propose StereoVGGT, a feature backbone specifically tailored for stereo vision. By leveraging the frozen VGGT and introducing a training-free feature adjustment pipeline, we mitigate geometric degradation and harness the latent camera calibration knowledge embedded within the model. A StereoVGGT-based stereo matching network achieved the 1st rank among all published methods on the KITTI benchmark, validating that StereoVGGT serves as a highly effective backbone for stereo vision.
5	VACP: Visual Analytics Context Protocol	2026-03-31	展开 The rise of AI agents introduces a fundamental shift in Visual Analytics (VA), in which agents act as a new user group. Current agentic approaches - based on computer vision and raw DOM access - fail to perform VA tasks accurately and efficiently. This paper introduces the Visual Analytics Context Protocol (VACP), a framework designed to make VA applications "agent-ready" that extends generic protocols by explicitly exposing application state, available interactions, and mechanisms for direct execution. To support our context protocol, we contribute a formal specification of AI agent requirements and knowledge representations in VA interfaces. We instantiate VACP as a library compatible with major visualization grammars and web frameworks, enabling augmentation of existing systems and the development of new ones. Our evaluation across representative VA tasks demonstrates that VACP-enabled agents achieve higher success rates in interface interpretation and execution compared to current agentic approaches, while reducing token consumption and latency. VACP closes the gap between human-centric VA interfaces and machine perceivability, ensuring agents can reliably act as collaborative users in VA systems.
6	Semantic Communication for 6G Networks: A Trade-off between Distortion Criticality and Information Representability	2026-03-31	In this work, a self-attention based conditional generative adversarial network (SA-cGAN) framework for the sixth generation (6G) semantic communication system is proposed, explicitly designed to balance the trade-off between distortion criticality and information representability under varying channel conditions. The proposed SA-cGAN model continuously learns compact semantic representations by jointly considering semantic importance, reconstruction distortion, and channel quality, enabling adaptive selection of semantic tokens for transmission. A knowledge graph is integrated to preserve contextual relationships and enhance semantic robustness, particularly in low signal-to-noise ratio (SNR) regimes. The resulting optimization framework incorporates continuous relaxation, submodular semantic selection, and principled constraint handling, allowing efficient semantic resource allocation under bandwidth and multi-constraint conditions. Simulation results show that, although SA-cGAN achieves only modest syntactic bilingual evaluation understudy (BLEU) scores, which are low at low SNR and reach approximately 0.72 at 20 dB, it significantly outperforms conventional and JSCC-based schemes in semantic metrics, with semantic similarity, semantic accuracy, and semantic completeness consistently improving above 0.90 with SNR. Additionally, the model exhibits adaptive compression behavior, aggressively reducing redundant content while preserving critical semantic information to maintain fidelity. The convergence of training loss further validates stable and efficient learning of semantic representations. Overall, the results confirm that the proposed SA-cGAN model effectively captures distortion-invariant semantic representations and dynamically adapts transmitted content based on distortion criticality and information representability for meaning-centric communication in future 6G networks.
7	On the limited utility of parallel data for learning shared multilingual representations	2026-03-30	展开 Shared multilingual representations are essential for cross-lingual tasks and knowledge transfer across languages. This study looks at the impact of parallel data, i.e. translated sentences, in pretraining as a signal to trigger representations that are aligned across languages. We train reference models with different proportions of parallel data and show that parallel data seem to have only a minimal effect on the cross-lingual alignment. Based on multiple evaluation methods, we find that the effect is limited to potentially accelerating the representation sharing in the early phases of pretraining, and to decreasing the amount of language-specific neurons in the model. Cross-lingual alignment seems to emerge on similar levels even without the explicit signal from parallel data.
8	Zero-shot Cross-domain Knowledge Distillation: A Case study on YouTube Music	2026-03-30	展开 Knowledge Distillation (KD) has been widely used to improve the quality of latency sensitive models serving live traffic. However, applying KD in production recommender systems with low traffic is challenging: the limited amount of data restricts the teacher model size, and the cost of training a large dedicated teacher may not be justified. Cross-domain KD offers a cost-effective alternative by leveraging a teacher from a data-rich source domain, but introduces unique technical difficulties, as the features, user interfaces, and prediction tasks can significantly differ. We present a case study of using zero-shot cross-domain KD for multi-task ranking models, transferring knowledge from a (100x) large-scale video recommendation platform (YouTube) to a music recommendation application with significantly lower traffic. We share offline and live experiment results and present findings evaluating different KD techniques in this setting across two ranking models on the music app. Our results demonstrate that zero-shot cross-domain KD is a practical and effective approach to improve the performance of ranking models on low traffic surfaces.
9	Graphilosophy: Graph-Based Digital Humanities Computing with The Four Books	2026-03-30	The Four Books have shaped East Asian intellectual traditions, yet their multi-layered interpretive complexity limits their accessibility in the digital age. While traditional bilingual commentaries provide a vital pedagogical bridge, computational frameworks are needed to preserve and explore this wisdom. This paper bridges AI and classical philosophy by introducing Graphilosophy, an ontology-guided, multi-layered knowledge graph framework for modeling and interpreting The Four Books. Integrating natural language processing, multilingual semantic embeddings, and humanistic analysis, the framework transforms a bilingual Chinese-Vietnamese corpus into an interpretively grounded resource. Graphilosophy encodes linguistic, conceptual, and interpretive relationships across interconnected layers, enabling cross-lingual retrieval and AI-assisted reasoning while explicitly preserving scholarly nuance and interpretive plurality. Through an interactive interface, the system also enables non-expert users to trace the evolution of ethical concepts across borders and languages, ensuring that ancient wisdom remains a living resource for modern moral discourse rather than a static relic of the past. A preliminary user study suggests the system's capacity to enhance conceptual understanding and cross-cultural learning. By linking algorithmic representation with ethical inquiry, this research exemplifies how AI can serve as a methodological bridge, accommodating the ambiguity of cultural heritage rather than reducing it to static data. The source code and data are released at https://github.com/ThuDoMinh1102/confucian-texts-knowledge-graph.
10	ChemCLIP: Bridging Organic and Inorganic Anticancer Compounds Through Contrastive Learning	2026-03-30	展开 The discovery of anticancer therapeutics has traditionally treated organic small molecules and metal-based coordination complexes as separate chemical domains, limiting knowledge transfer despite their shared biological objectives. This disparity is particularly pronounced in available data, with extensive screening databases for organic compounds compared to only a few thousand characterized metal complexes. Here, we introduce ChemCLIP, a dual-encoder contrastive learning framework that bridges this organic-inorganic divide by learning unified representations based on shared anticancer activities rather than structural similarity. We compiled complementary datasets comprising 44,854 unique organic compounds and 5,164 unique metal complexes, standardized across 60 cancer cell lines. By training parallel encoders with activity-aware hard negative mining, we mapped structurally distinct compounds into a shared 256-dimensional embedding space where biologically similar compounds cluster together regardless of chemical class. We systematically evaluated four molecular encoding strategies: Morgan fingerprints, ChemBERTa, MolFormer, and Chemprop, through quantitative alignment metrics, embedding visualizations, and downstream classification tasks. Morgan fingerprints achieved superior performance with an average alignment ratio of 0.899 and downstream classification AUCs of 0.859 (inorganic) and 0.817 (organic). This work establishes contrastive learning as an effective strategy for unifying disparate chemical domains and provides empirical guidance for encoder selection in multi-modal chemistry applications, with implications extending beyond anticancer drug discovery to any scenario requiring cross-domain chemical knowledge transfer.
11	GraphWalker: Agentic Knowledge Graph Question Answering via Synthetic Trajectory Curriculum	2026-03-30	Agentic knowledge graph question answering (KGQA) requires an agent to iteratively interact with knowledge graphs (KGs), posing challenges in both training data scarcity and reasoning generalization. Specifically, existing approaches often restrict agent exploration: prompting-based methods lack autonomous navigation training, while current training pipelines usually confine reasoning to predefined trajectories. To this end, this paper proposes GraphWalker, a novel agentic KGQA framework that addresses these challenges through Automated Trajectory Synthesis and Stage-wise Fine-tuning. GraphWalker adopts a two-stage SFT training paradigm: first, the agent is trained on structurally diverse trajectories synthesized from constrained random-walk paths, establishing a broad exploration prior over the KG; second, the agent is further fine-tuned on a small set of expert trajectories to develop reflection and error recovery capabilities. Extensive experiments demonstrate that our stage-wise SFT paradigm unlocks a higher performance ceiling for a lightweight reinforcement learning (RL) stage, enabling GraphWalker to achieve state-of-the-art performance on CWQ and WebQSP. Additional results on GrailQA and our constructed GraphWalkerBench confirm that GraphWalker enhances generalization to out-of-distribution reasoning paths. The code is publicly available at https://github.com/XuShuwenn/GraphWalker
12	TIEG-Youpu Solution for NeurIPS 2022 WikiKG90Mv2-LSC	2026-03-30	WikiKG90Mv2 in NeurIPS 2022 is a large encyclopedic knowledge graph. Embedding knowledge graphs into continuous vector spaces is important for many practical applications, such as knowledge acquisition, question answering, and recommendation systems. Compared to existing knowledge graphs, WikiKG90Mv2 is a large-scale knowledge graph composed of more than 90 million entities. Both efficiency and accuracy should be considered when building graph embedding models for knowledge graphs at scale. To this end, we follow the retrieve-then-re-rank pipeline and make novel modifications in both the retrieval and re-ranking stages. Specifically, we propose a priority infilling retrieval model to obtain candidates that are structurally and semantically similar. Then we propose an ensemble-based re-ranking model with neighbor-enhanced representations to produce final link prediction results among retrieved candidates. Experimental results show that our proposed method outperforms existing baseline methods and improves the MRR on the validation set from 0.2342 to 0.2839.
13	JW-VL: A Vision-Language Model for Solar Physics	2026-03-30	展开 Vision-Language Models (VLMs) have achieved breakthrough progress in general knowledge domains, yet adaptation to specialized scientific fields remains challenging due to multimodal representation shifts and the limited integration of domain-specific knowledge. To address the limitations of general-purpose VLMs when applied to solar physics image recognition, analysis, and reasoning, we propose JinWu Vision-Language (JW-VL), a fine-tuned foundation model tailored for solar physics. The model integrates multi-wavelength observational data from both space-based and ground-based telescopes, encompassing representative spectral bands spanning the photosphere, chromosphere, and corona. Built upon a cross-modal alignment knowledge distillation framework, JW-VL learns a joint visual-semantic embedding that enables end-to-end modeling from raw solar observational data to downstream tasks, including solar image recognition, solar activity analysis via image-based question answering, and optical character recognition (OCR), while also supporting the construction of a multi-band, cross-instrument solar image benchmark dataset. Furthermore, as a demonstration of interdisciplinary applicability, we developed a "Daily Solar Activity Reports" agent comprising core modules for solar activity level assessment, significant active region characterization, magnetic field complexity analysis, potential space weather impact assessment, and identifying active regions for targeted observation. While JW-VL may not yet meet the rigorous, high-precision demands of operational solar physics, it bridges raw observations and diverse downstream tasks, establishing a valuable methodological framework for applying multimodal deep learning to the field.
14	Building evidence-based knowledge graphs from full-text literature for disease-specific biomedical reasoning	2026-03-30	展开 Biomedical knowledge resources often either preserve evidence as unstructured text or compress it into flat triples that omit study design, provenance, and quantitative support. Here we present EvidenceNet, a framework and dataset for building disease-specific knowledge graphs from full-text biomedical literature. EvidenceNet uses a large language model (LLM)-assisted pipeline to extract experimentally grounded findings as structured evidence nodes, normalize biomedical entities, score evidence quality, and connect evidence records through typed semantic relations. We release two resources: EvidenceNet-HCC with 7,872 evidence records, 10,328 graph nodes, and 49,756 edges, and EvidenceNet-CRC with 6,622 records, 8,795 nodes, and 39,361 edges. Technical validation shows high component fidelity, including 98.3% field-level extraction accuracy, 100.0% high-confidence entity-link accuracy, 87.5% fusion integrity, and 90.0% semantic relation-type accuracy. In downstream evaluation, EvidenceNet improves internal and external retrieval-augmented question answering and retains structural signal for future link prediction and target prioritization. These results establish EvidenceNet as a disease-specific resource for evidence-aware biomedical reasoning and hypothesis generation.
15	Uniform Interpolation in Distributed Knowledge Modal Logics	2026-03-30	展开 Uniform interpolation is the property that, for any formula and set of atoms, there exists the strongest consequence omitting those atoms. It plays a central role in knowledge representation and reasoning tasks such as knowledge update and information hiding. This paper studies the uniform interpolation property in epistemic modal logics with distributed knowledge, which captures agents' collective reasoning abilities. Building on the bisimulation-quantifier perspective, we extend the canonical-formula and literal-elimination framework of Fang, Liu, and van Ditmarsch to distributed knowledge settings and introduce the concept of collective $p$-bisimulation. We show that, for distributed knowledge modal logics $\mathsf{K}_n\mathbf{D}$, $\mathsf{D}_n\mathbf{D}$, and $\mathsf{T}_n\mathbf{D}$, every satisfiable canonical formula's uniform interpolant omitting an atom $p$ is exactly its remainder of eliminating $p$. Then, we provide a finer analysis for the transitive and Euclidean systems $\mathsf{K45}_n\mathbf{D}$, $\mathsf{KD45}_n\mathbf{D}$, and $\mathsf{S5}_n\mathbf{D}$, and prove that every formula of modal depth $k + 1$ has a uniform interpolant of modal depth $2 k + 1$. Thus, we prove the uniform interpolation property in all the six distributed knowledge modal logics. Finally, we generalize the results to some variants with propositional common knowledge and discuss the method's limitations.
16	From Foundation ECG Models to NISQ Learners: Distilling ECGFounder into a VQC Student	2026-03-28	展开 Foundation models have recently improved electrocardiogram (ECG) representation learning, but their deployment can be limited by computational cost and latency constraints. In this work, we fine-tune ECGFounder as a high-capacity teacher for binary ECG classification on PTB-XL and the MIT-BIH Arrhythmia Database, and investigate whether knowledge distillation can transfer its predictive behavior to compact students. We evaluate two classical 1D students (ResNet-1D and a lightweight CNN-1D) and a quantum-ready pipeline that combines a convolutional autoencoder, which compresses 256-sample ECG windows into a low-dimensional latent representation, with a 6-qubit variational quantum circuit implemented in Qiskit and executed in a simulated backend. Across both datasets, the teacher provides the strongest overall performance, while distillation yields competitive students under a considerable reduction in trainable parameters. We further analyze the sensitivity of student performance to distillation settings, highlighting consistent accuracy--efficiency trade-offs when compressing a foundation ECG model into classical and quantum-ready learners under a unified evaluation protocol.
17	Multimodal Dataset Distillation via Phased Teacher Models	2026-03-26	展开 Multimodal dataset distillation aims to construct compact synthetic datasets that enable efficient compression and knowledge transfer from large-scale image-text data. However, existing approaches often fail to capture the complex, dynamically evolving knowledge embedded in the later training stages of teacher models. This limitation leads to degraded student performance and compromises the quality of the distilled data. To address critical challenges such as pronounced cross-stage performance gaps and unstable teacher trajectories, we propose Phased Teacher Model with Shortcut Trajectory (PTM-ST) -- a novel phased distillation framework. PTM-ST leverages stage-aware teacher modeling and a shortcut-based trajectory construction strategy to accurately fit the teacher's learning dynamics across distinct training phases. This enhances both the stability and expressiveness of the distillation process. Through theoretical analysis and comprehensive experiments, we show that PTM-ST significantly mitigates optimization oscillations and inter-phase knowledge gaps, while also reducing storage overhead. Our method consistently surpasses state-of-the-art baselines on Flickr30k and COCO, achieving up to 13.5% absolute improvement and an average gain of 9.53% on Flickr30k. Code: https://github.com/Previsior/PTM-ST.
18	Steering LLMs for Culturally Localized Generation	2026-03-24	展开 LLMs are deployed globally, yet produce responses biased towards cultures with abundant training data. Existing cultural localization approaches such as prompting or post-training alignment are black-box, hard to control, and do not reveal whether failures reflect missing knowledge or poor elicitation. In this paper, we address these gaps using mechanistic interpretability to uncover and manipulate cultural representations in LLMs. Leveraging sparse autoencoders, we identify interpretable features that encode culturally salient information and aggregate them into Cultural Embeddings (CuE). We use CuE both to analyze implicit cultural biases under underspecified prompts and to construct white-box steering interventions. Across multiple models, we show that CuE-based steering increases cultural faithfulness and elicits significantly rarer, long-tail cultural concepts than prompting alone. Notably, CuE-based steering is complementary to black-box localization methods, offering gains when applied on top of prompt-augmented inputs. This also suggests that models do benefit from better elicitation strategies, and don't necessarily lack long-tail knowledge representation, though this varies across cultures. Our results provide both diagnostic insight into cultural representations in LLMs and a controllable method to steer towards desired cultures.
19	Conformal Cross-Modal Active Learning	2026-03-24	展开 Foundation models for vision have transformed visual recognition with powerful pretrained representations and strong zero-shot capabilities, yet their potential for data-efficient learning remains largely untapped. Active Learning (AL) aims to minimize annotation costs by strategically selecting the most informative samples for labeling, but existing methods largely overlook the rich multimodal knowledge embedded in modern vision-language models (VLMs). We introduce Conformal Cross-Modal Acquisition (CCMA), a novel AL framework that bridges vision and language modalities through a teacher-student architecture. CCMA employs a pretrained VLM as a teacher to provide semantically grounded uncertainty estimates, conformally calibrated to guide sample selection for a vision-only student model. By integrating multimodal conformal scoring with diversity-aware selection strategies, CCMA achieves superior data efficiency across multiple benchmarks. Our approach consistently outperforms state-of-the-art AL baselines, demonstrating clear advantages over methods relying solely on uncertainty or diversity metrics.
20	Dual-Teacher Distillation with Subnetwork Rectification for Black-Box Domain Adaptation	2026-03-24	展开 Assuming that neither source data nor the source model is accessible, black box domain adaptation represents a highly practical yet extremely challenging setting, as transferable information is restricted to the predictions of the black box source model, which can only be queried using target samples. Existing approaches attempt to extract transferable knowledge through pseudo label refinement or by leveraging external vision language models (ViLs), but they often suffer from noisy supervision or insufficient utilization of the semantic priors provided by ViLs, which ultimately hinder adaptation performance. To overcome these limitations, we propose a dual teacher distillation with subnetwork rectification (DDSR) model that jointly exploits the specific knowledge embedded in black box source models and the general semantic information of a ViL. DDSR adaptively integrates their complementary predictions to generate reliable pseudo labels for the target domain and introduces a subnetwork driven regularization strategy to mitigate overfitting caused by noisy supervision. Furthermore, the refined target predictions iteratively enhance both the pseudo labels and ViL prompts, enabling more accurate and semantically consistent adaptation. Finally, the target model is further optimized through self training with classwise prototypes. Extensive experiments on multiple benchmark datasets validate the effectiveness of our approach, demonstrating consistent improvements over state of the art methods, including those using source data or models.
21	KG-Hopper: Empowering Compact Open LLMs with Knowledge Graph Reasoning via Reinforcement Learning	2026-03-22	Large Language Models (LLMs) demonstrate impressive natural language capabilities but often struggle with knowledge-intensive reasoning tasks. Knowledge Base Question Answering (KBQA), which leverages structured Knowledge Graphs (KGs), exemplifies this challenge due to the need for accurate multi-hop reasoning. Existing approaches typically perform sequential reasoning steps guided by predefined pipelines, restricting flexibility and causing error cascades due to isolated reasoning at each step. To address these limitations, we propose KG-Hopper, a novel Reinforcement Learning (RL) framework that empowers compact open LLMs with the ability to perform integrated multi-hop KG reasoning within a single inference round. Rather than reasoning step-by-step, we train a Reasoning LLM that embeds the entire KG traversal and decision process into a unified "thinking" stage, enabling global reasoning over cross-step dependencies and dynamic path exploration with backtracking. Experimental results on eight KG reasoning benchmarks show that KG-Hopper, based on a 7B-parameter LLM, consistently outperforms larger multi-step systems (up to 70B) and achieves competitive performance with proprietary models such as GPT-3.5-Turbo and GPT-4o-mini, while remaining compact, open, and data-efficient. The code is publicly available at: https://github.com/Wangshuaiia/KG-Hopper.
22	PCA-Based Interpretable Knowledge Representation and Analysis of Geometric Design Parameters	2026-03-18	展开 In many CAD-based applications, complex geometries are defined by a high number of design parameters. This leads to high-dimensional design spaces that are challenging for downstream engineering processes like simulations, optimization, and design exploration tasks. Therefore, dimension reduction methods such as principal component analysis (PCA) are used. The PCA identifies dominant modes of geometric variation and yields a compact representation of the geometry. While classical PCA excels in the compact representation part, it does not directly recover underlying design parameters of a generated geometry. In this work, we deal with the problem of estimating design parameters from PCA-based representations. Analyzing a recent modification of the PCA dedicated to our field of application, we show that the results are actually identical to the standard PCA. We investigate limitations of this approach and present reasonable conditions under which accurate, interpretable parameter estimation can be obtained. With the help of dedicated experiments, we take a more in-depth look at every stage of the PCA and the possible changes of the geometry during these processes.
23	Deconfounded Lifelong Learning for Autonomous Driving via Dynamic Knowledge Spaces	2026-03-15	End-to-End autonomous driving (E2E-AD) systems face challenges in lifelong learning, including catastrophic forgetting, difficulty in knowledge transfer across diverse scenarios, and spurious correlations between unobservable confounders and true driving intents. To address these issues, we propose DeLL, a Deconfounded Lifelong Learning framework that integrates a Dirichlet process mixture model (DPMM) with the front-door adjustment mechanism from causal inference. The DPMM is employed to construct two dynamic knowledge spaces: a trajectory knowledge space for clustering explicit driving behaviors and an implicit feature knowledge space for discovering latent driving abilities. Leveraging the non-parametric Bayesian nature of DPMM, our framework enables adaptive expansion and incremental updating of knowledge without predefining the number of clusters, thereby mitigating catastrophic forgetting. Meanwhile, the front-door adjustment mechanism utilizes the DPMM-derived knowledge as valid mediators to deconfound spurious correlations, such as those induced by sensor noise or environmental changes, and enhances the causal expressiveness of the learned representations. Additionally, we introduce an evolutionary trajectory decoder that enables non-autoregressive planning. To evaluate the lifelong learning performance of E2E-AD, we propose new evaluation protocols and metrics based on Bench2Drive. Extensive evaluations in the closed-loop CARLA simulator demonstrate that our framework significantly improves adaptability to new driving scenarios and overall driving performance, while effectively retaining previously acquired knowledge.
24	Deep Expert Injection for Anchoring Retinal VLMs with Domain-Specific Knowledge	2026-03-07	展开 Large Vision Language Models (LVLMs) show immense potential for automated ophthalmic diagnosis. However, their clinical deployment is severely hindered by lacking domain-specific knowledge. In this work, we identify two structural deficiencies hindering reliable medical reasoning: 1) the Perception Gap, where general-purpose visual encoders fail to resolve fine-grained pathological cues (e.g., microaneurysms); and 2) the Reasoning Gap, where sparse visual evidence is progressively overridden by massive language priors in deeper transformer layers, leading to ungrounded hallucinations. To bridge these gaps, we propose EyExIn, a data-efficient framework designed to anchor retinal VLMs with expert knowledge via a Deep Expert Injection mechanism. Our architecture employs an Expert-Aware Dual-Stream encoding strategy that decouples visual representation into a general stream for anatomical context and a specialized expert stream for pathological semantics. To ensure high-fidelity integration, we design a Semantic-Adaptive Gated Fusion module, which dynamically amplifies subtle lesion signals while filtering irrelevant background noise. Furthermore, we introduce Adaptive Deep Expert Injection to embed persistent "Vision Anchors" by integrating fused visual features as residual biases directly into intermediate LLM layers. This mechanism creates a visual shortcut that forces the reasoning stack to remain strictly grounded in visual evidence. Extensive experiments across four benchmarks demonstrate that our model consistently outperforms massive proprietary systems. EyExIn significantly enhances domain-specific knowledge embedding and achieves state-of-the-art precision in ophthalmic visual question answering, advancing the development of trustworthy ophthalmic AI.
25	Multi-Modal Representation Learning via Semi-Supervised Rate Reduction for Generalized Category Discovery	2026-02-23	Generalized Category Discovery (GCD) aims to identify both known and unknown categories, with only partial labels given for the known categories, posing a challenging open-set recognition problem. State-of-the-art approaches for the GCD task are usually built on multi-modality representation learning, which is heavily dependent upon inter-modality alignment. However, few of them enforce a proper intra-modality alignment to generate a desired underlying structure of representation distributions. In this paper, we propose a novel and effective multi-modal representation learning framework for GCD via Semi-Supervised Rate Reduction, called SSR$^2$-GCD, to learn cross-modality representations with desired structural properties by emphasizing the proper alignment of intra-modality relationships. Moreover, to boost knowledge transfer, we integrate prompt candidates by leveraging the inter-modal alignment offered by Vision Language Models. We conduct extensive experiments on generic and fine-grained benchmark datasets demonstrating superior performance of our approach.
26	Enhancing Automatic Chord Recognition via Pseudo-Labeling and Knowledge Distillation	2026-02-23	展开 Automatic Chord Recognition (ACR) is constrained by the scarcity of aligned chord labels, as well-aligned annotations are costly to acquire. At the same time, open-weight pre-trained models are currently more accessible than their proprietary training data. In this work, we present a two-stage training pipeline that leverages pre-trained models together with unlabeled audio. The proposed method decouples training into two stages. In the first stage, we use a pre-trained BTC model as a teacher to generate pseudo-labels for over 1,000 hours of diverse unlabeled audio and train a student model solely on these pseudo-labels. In the second stage, the student is continually trained on ground-truth labels as they become available. To prevent catastrophic forgetting of the representations learned in the first stage, we apply selective knowledge distillation (KD) from the teacher as a regularizer. In our experiments, two models (BTC, 2E1D) were used as students. In stage 1, using only pseudo-labels, the BTC student achieves over 99% of the teacher's performance, while the 2E1D model achieves about 97% across seven standard mir_eval metrics. After a single training run for both students in stage 2, the resulting BTC student model surpasses the traditional supervised learning baseline by 2.5% and the original pre-trained teacher model by 1.1-3.2% across all metrics. The resulting 2E1D student model improves over the traditional supervised learning baseline by 2.67% on average and achieves almost the same performance as the teacher. Both cases show large gains on rare chord qualities.
27	Symphonym: Universal Phonetic Embeddings for Cross-Script Name Matching	2026-01-11	展开 Matching place names across writing systems is a persistent obstacle to the integration of multilingual geographic sources, whether modern gazetteers, medieval itineraries, or colonial-era surveys. Existing approaches depend on language-specific phonetic algorithms or romanisation steps that discard phonetic information, and none generalises across script boundaries. This paper presents Symphonym, a neural embedding system which maps toponyms from twenty writing systems into a unified 128-dimensional phonetic space, enabling direct cross-script similarity comparison without language identification or phonetic resources at inference time. A Teacher-Student knowledge distillation architecture first learns from articulatory phonetic features derived from IPA transcriptions, then transfers this knowledge to a character-level Student model. Trained on 32.7 million triplet samples drawn from 67 million toponyms spanning GeoNames, Wikidata, and the Getty Thesaurus of Geographic Names, the Student achieves the highest Recall@1 (85.2%) and Mean Reciprocal Rank (90.8%) on the MEHDIE cross-script benchmark -- medieval Hebrew and Arabic toponym matches curated by domain experts and entirely independent of the training data -- demonstrating cross-temporal generalisation from modern training material to pre-modern sources. An ablation using raw articulatory features alone yields only 45.0% MRR, confirming the contribution of the neural training curriculum. The approach naturally handles pre-standardisation orthographic variation characteristic of historical documents, and transfers effectively to personal names in archival sources, suggesting broad applicability to name resolution tasks in digital humanities and linked open data contexts.
28	CMV-Fuse: Cross Modal-View Fusion of AMR, Syntax, and Knowledge Representations for Aspect Based Sentiment Analysis	2025-12-07	展开 Natural language understanding inherently depends on integrating multiple complementary perspectives spanning from surface syntax to deep semantics and world knowledge. However, current Aspect-Based Sentiment Analysis (ABSA) systems typically exploit isolated linguistic views, thereby overlooking the intricate interplay between structural representations that humans naturally leverage. We propose CMV-Fuse, a Cross-Modal View fusion framework that emulates human language processing by systematically combining multiple linguistic perspectives. Our approach systematically orchestrates four linguistic perspectives: Abstract Meaning Representations, constituency parsing, dependency syntax, and semantic attention, enhanced with external knowledge integration. Through hierarchical gated attention fusion across local syntactic, intermediate semantic, and global knowledge levels, CMV-Fuse captures both fine-grained structural patterns and broad contextual understanding. A novel structure aware multi-view contrastive learning mechanism ensures consistency across complementary representations while maintaining computational efficiency. Extensive experiments demonstrate substantial improvements over strong baselines on standard benchmarks, with analysis revealing how each linguistic view contributes to more robust sentiment analysis.
29	Manual2Skill++: Connector-Aware General Robotic Assembly from Instruction Manuals via Vision-Language Models	2025-10-18	展开 Assembly hinges on reliably forming connections between parts; yet most robotic approaches plan assembly sequences and part poses while treating connectors as an afterthought. Connections represent the foundational physical constraints of assembly execution; while task planning sequences operations, the precise establishment of these constraints ultimately determines assembly success. In this paper, we treat connections as explicit, primary entities in assembly representation, directly encoding connector types, specifications, and locations for every assembly step. Drawing inspiration from how humans learn assembly tasks through step-by-step instruction manuals, we present Manual2Skill++, a vision-language framework that automatically extracts structured connection information from assembly manuals. We encode assembly tasks as hierarchical graphs where nodes represent parts and sub-assemblies, and edges explicitly model connection relationships between components. A large-scale vision-language model parses symbolic diagrams and annotations in manuals to instantiate these graphs, leveraging the rich connection knowledge embedded in human-designed instructions. We curate a dataset containing over 20 assembly tasks with diverse connector types to validate our representation extraction approach, and evaluate the complete task understanding-to-execution pipeline across four complex assembly scenarios in simulation, spanning furniture, toys, and manufacturing components with real-world correspondence. More detailed information can be found at https://nus-lins-lab.github.io/Manual2SkillPP/
30	Unlocking 3D Affordance Segmentation with 2D Semantic Knowledge	2025-10-09	展开 Affordance segmentation aims to decompose 3D objects into parts that serve distinct functional roles, enabling models to reason about object interactions rather than mere recognition. Existing methods, mostly following the paradigm of 3D semantic segmentation or prompt-based frameworks, struggle when geometric cues are weak or ambiguous, as sparse point clouds provide limited functional information. To overcome this limitation, we leverage the rich semantic knowledge embedded in large-scale 2D Vision Foundation Models (VFMs) to guide 3D representation learning through a cross-modal alignment mechanism. Specifically, we propose Cross-Modal Affinity Transfer (CMAT), a pretraining strategy that compels the 3D encoder to align with the semantic structures induced by lifted 2D features. CMAT is driven by a core affinity alignment objective, supported by two auxiliary losses, geometric reconstruction and feature diversity, which together encourage structured and discriminative feature learning. Built upon the CMAT-pretrained backbone, we employ a lightweight affordance segmentor that injects text or visual prompts into the learned 3D space through an efficient cross-attention interface, enabling dense and prompt-aware affordance prediction while preserving the semantic organization established during pretraining. Extensive experiments demonstrate consistent improvements over previous state-of-the-art methods in both accuracy and efficiency.
31	Multi-Level Knowledge Distillation and Dynamic Self-Supervised Learning for Continual Learning	2025-08-18	Class-incremental with repetition (CIR), where previously trained classes are repeatedly introduced in future tasks, is a more realistic scenario than the traditional class-incremental setup, which assumes that each task contains unseen classes. CIR assumes that we can easily access abundant unlabeled data from external sources, such as the Internet. Therefore, we propose two components that efficiently use the unlabeled data to ensure high stability and plasticity of models trained in the CIR setup. First, we introduce multi-level knowledge distillation (MLKD) that distills knowledge from multiple previous models across multiple perspectives, including features and logits, so the model can retain a wide variety of previous knowledge. Moreover, we implement a dynamic self-supervised loss (SSL) to utilize the unlabeled data, which accelerates the learning of new classes, while dynamic weighting of SSL keeps the focus of training on the primary task. Both of our proposed components significantly improve the performance in the CIR setup, achieving 2nd place in the CVPR 5th CLVISION Challenge.
32	Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments	2025-05-25	展开 The deployment of pre-trained perception models in novel environments often leads to performance degradation due to distributional shifts. Although recent artificial intelligence approaches for metacognition use logical rules to characterize and filter model errors, improving precision often comes at the cost of reduced recall. This paper addresses the hypothesis that leveraging multiple pre-trained models can mitigate this recall reduction. We formulate the challenge of identifying and managing conflicting predictions from various models as a consistency-based abduction problem, building on the idea of abductive learning (ABL) but applying it to test-time instead of training. The input predictions and the learned error detection rules derived from each model are encoded in a logic program. We then seek an abductive explanation--a subset of model predictions--that maximizes prediction coverage while ensuring the rate of logical inconsistencies (derived from domain constraints) remains below a specified threshold. We propose two algorithms for this knowledge representation task: an exact method based on Integer Programming (IP) and an efficient Heuristic Search (HS). Through extensive experiments on a simulated aerial imagery dataset featuring controlled, complex distributional shifts, we demonstrate that our abduction-based framework outperforms individual models and standard ensemble baselines, achieving, for instance, average relative improvements of approximately 13.6% in F1-score and 16.6% in accuracy across 15 diverse test datasets when compared to the best individual model. Our results validate the use of consistency-based abduction as an effective mechanism to robustly integrate knowledge from multiple imperfect models in challenging, novel scenarios.
33	Disentangling Knowledge Representations for Large Language Model Editing	2025-05-24	Knowledge Editing has emerged as a promising solution for efficiently updating embedded knowledge in large language models (LLMs). While existing approaches demonstrate effectiveness in integrating new knowledge and preserving the original capabilities of LLMs, they fail to maintain fine-grained irrelevant knowledge, namely facts that share the same subject as edited knowledge but differ in relation and object. This challenge arises because subject representations inherently encode multiple attributes, causing the target and fine-grained irrelevant knowledge to become entangled in the representation space, and thus vulnerable to unintended alterations during editing. To address this, we propose DiKE, a novel approach that Disentangles Knowledge representations for LLM Editing. DiKE consists of two key components: a Knowledge Representation Disentanglement (KRD) module that decomposes the subject representation into target-knowledge-related and -unrelated components, and a Disentanglement-based Knowledge Edit (DKE) module that updates only the target-related component while explicitly preserving the unrelated one. We further derive a closed-form, rank-one parameter update based on matrix theory to enable efficient and minimally invasive edits. To rigorously evaluate fine-grained irrelevant knowledge preservation, we construct FINE-KED, a new benchmark comprising fine-grained irrelevant knowledge at different levels of relational similarity to the edited knowledge. Extensive experiments across multiple LLMs demonstrate that DiKE substantially improves fine-grained irrelevant knowledge preservation while maintaining competitive general editing performance.
34	OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models	2025-04-30	展开 Audio-visual segmentation aims to separate sounding objects from videos by predicting pixel-level masks based on audio signals. Existing methods primarily concentrate on closed-set scenarios and direct audio-visual alignment and fusion, which limits their capability to generalize to new, unseen situations. In this paper, we propose OpenAVS, a novel training-free language-based approach that, for the first time, effectively aligns audio and visual modalities using text as a proxy for open-vocabulary Audio-Visual Segmentation (AVS). Equipped with multimedia foundation models, OpenAVS directly infers masks through 1) audio-to-text prompt generation, 2) LLM-guided prompt translation, and 3) text-to-visual sounding object segmentation. The objective of OpenAVS is to establish a simple yet flexible architecture that relies on the most appropriate foundation models by fully leveraging their capabilities to enable more effective knowledge transfer to the downstream AVS task. Moreover, we present a model-agnostic framework OpenAVS-ST that enables the integration of OpenAVS with any advanced supervised AVS model via pseudo-label based self-training. This approach enhances performance by effectively utilizing large-scale unlabeled data when available. Comprehensive experiments on three benchmark datasets demonstrate the superior performance of OpenAVS. It surpasses existing unsupervised, zero-shot, and few-shot AVS methods by a significant margin, achieving absolute performance gains of approximately 9.4% and 10.9% in mIoU and F-score, respectively, in challenging scenarios.
35	Unleashing the Potential of Mamba: Boosting a LiDAR 3D Sparse Detector by Using Cross-Model Knowledge Distillation	2024-09-17	A LiDAR 3D object detector that strikes a balance between accuracy and speed is crucial for achieving real-time perception in autonomous driving. However, many existing LiDAR detection models depend on complex feature transformations, leading to poor real-time performance and high resource consumption, which limits their practical effectiveness. In this work, we propose a faster LiDAR 3D object detector, a framework that adaptively aligns sparse voxels to enable efficient heterogeneous knowledge distillation, called FASD. We aim to distill the Transformer sequence modeling capability into Mamba models, significantly boosting accuracy through knowledge transfer. Specifically, we first design the architecture for cross-model knowledge distillation to impart the global contextual understanding capabilities of the Transformer to Mamba. The Transformer-based teacher model employs a scale-adaptive attention mechanism to enhance multiscale fusion. In contrast, the Mamba-based student model leverages feature alignment through spatial-based adapters, supervised with latent space feature and span-head distillation losses, leading to improved performance and efficiency. We evaluated FASD on the Waymo and nuScenes datasets, achieving a 4x reduction in resource consumption and a 1-2% performance improvement over the baseline, while also delivering significant gains in accuracy and efficiency in real deployment.
36	Towards Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models	2024-09-06	展开 Evolutionary multi-task optimization (EMTO) is an advanced optimization paradigm that improves search efficiency by enabling knowledge transfer across multiple tasks solved in parallel. Accordingly, a broad range of knowledge transfer methods (KTMs) have been developed as integral components of EMTO algorithms, most of which are tailored to specific problem settings. However, the design of effective KTMs typically relies on substantial domain expertise and careful manual customization, as different EMTO scenarios require distinct transfer strategies to achieve performance gains. Meanwhile, recent advances in large language models (LLMs) have demonstrated strong capabilities in autonomous programming and algorithm synthesis, opening up new possibilities for automating the design of optimization solvers. Motivated by this, in this paper, we propose a Self-guided Knowledge Transfer Design (SKTD) framework that leverages LLMs to autonomously generate knowledge transfer methods (KTMs) as algorithmic components within EMTO. By enabling data-driven and self-adaptive construction of transfer strategies, SKTD facilitates effective knowledge reuse across heterogeneous tasks and diverse EMTO scenarios. To the best of our knowledge, this work represents the first attempt to automate the generation of KTMs for EMTO. Extensive experiments on well-established EMTO benchmarks with varying degrees of task similarity demonstrate that the proposed SKTD consistently achieves superior or highly competitive performance compared with both the state-of-the-art program search approach and manually designed EMTO methods, in terms of optimization effectiveness and cross-scenario generalization.
37	Embedding Ontologies via Incorporating Extensional and Intensional Knowledge	2024-01-20	Ontologies contain rich knowledge within a domain, which can be divided into two categories, namely extensional knowledge and intensional knowledge. Extensional knowledge provides information about the concrete instances that belong to specific concepts in the ontology, while intensional knowledge details inherent properties, characteristics, and semantic associations among concepts. However, existing ontology embedding approaches fail to take both extensional knowledge and intensional knowledge into fine consideration simultaneously. In this paper, we propose a novel ontology embedding approach named EIKE (Extensional and Intensional Knowledge Embedding) by representing ontologies in two spaces, called extensional space and intensional space. EIKE presents a unified framework for embedding instances, concepts and their relations in an ontology, applying a geometry-based method to model extensional knowledge and a pretrained language model to model intensional knowledge, which can capture both structure information and textual information. Experimental results show that EIKE significantly outperforms state-of-the-art methods on three datasets for both triple classification and link prediction, indicating that EIKE provides a more comprehensive and representative perspective of the domain.

8. combinatorial game theory/xiangqi/chinese chess

No.	Title	Date	Abstract
1	Additive sink subtraction	2026-01-26	Subtraction games are a classical topic in Combinatorial Game Theory. A result of Golomb (1966) shows that every subtraction game with a finite move set has an eventually periodic nim-sequence, but the known proof yields only an exponential upper bound on the period length. Flammenkamp (1997) conjectures a striking classification for three-move subtraction games: non-additive rulesets exhibit linear period lengths of the form "the sum of two moves", where the choice of which two moves displays fractal-like behavior, while additive sets $S=\{a,b,a+b\}$ have purely periodic outcomes with linear or quadratic period lengths. Despite early attention in Winning Ways (1982), the general additive case remains open. We introduce and analyze a dual winning convention, which we call sink subtraction. Unlike the standard wall convention, where moves to negative positions are forbidden, the sink convention declares a player the winner upon moving to a non-positive position. We show that additive sink subtraction admits a complete solution: the nim-sequence is purely periodic with an explicit linear or quadratic period formula, and we conjecture a duality between additive sink subtraction and classical wall subtraction. Keywords: Additive Subtraction Game, Nimber, Periodicity, Sink Convention.
2	Variants of Wythoff game with terminal positions or blocking maneuvers	2025-12-12	We show how the software Walnut can be used to obtain concise proofs of results concerning variants of the famous Wythoff game, in which blocking maneuvers or terminal positions are added, as discussed respectively by Larsson (2011) and Komak et al. (2025). Our approach provides automatic proofs that both confirm and extend their results, and the same techniques apply to newly introduced variants as well. Then, using classic techniques, we obtain new recursive and morphic characterizations of Wythoff-type games where the set of terminal positions $(x,y)$ satisfy $x+y\le\ell$. The use of Walnut in combinatorial game theory is relatively recent, and only a few examples have been explored so far. The Wythoff game, being directly connected to the Fibonacci numeration system, proves especially well-suited to this kind of approach. It permits us to solve instances for a fixed value of a parameter.
3	Impartial Games with Activeness	2025-11-26	A combinatorial game is a two-player game without hidden information or chance elements. The main object of combinatorial game theory is to obtain the outcome, which player has a winning strategy, of a given combinatorial game. Positions of many well-known combinatorial games are naturally decomposed into a disjunctive sum of multiple components and can be analyzed independently for each component. Therefore, the study of disjunctive sums is a major topic in combinatorial game theory. Combinatorial games in which both players have the same set of possible moves for every position are called impartial games. In the normal-play convention, it is known that the outcome of a disjunctive sum of impartial games can be obtained by computing the Grundy number of each term. The theory of impartial games is generalized in various forms. This paper proposes another generalization of impartial games to a new framework, impartial games with activeness: each game is assigned a status of either "active" or "inactive"; the status may change by moves; a disjunctive sum of games ends immediately, not only when no further moves can be made, but also when all terms become inactive. We formally introduce impartial games with activeness and investigate their fundamental properties.
4	$\mathcal{L}\mathcal{R}$-Ending partisan rulesets	2025-11-18	In this paper, we consider $\mathcal{L}\mathcal{R}$-ending partisan rulesets as a branch of combinatorial game theory. In these rulesets, the sets of options of both players are the same. However, there are two kinds of terminal positions. If the game ends in one of the terminal positions, then a player wins and if the game ends in the other terminal position, the other player wins. We introduce notations for positions in $\mathcal{L}\mathcal{R}$-ending partisan rulesets and show their algebraic structures. We also introduce some examples of $\mathcal{L}\mathcal{R}$-partisan rulesets and show how our results can be used for analyzing the rulesets.
5	Various Diamond Properties in Combinatorial Game Theory	2025-09-26	We investigate conditions under which positions in combinatorial games admit simple values. We introduce a unified diamond framework, the $\Diamond_A$-property ($A\in\{\mathbb{Z},\mathbb{D}\}$), for sets of positions closed under options. Under certain conditions, this framework guarantees that all values are integers, dyadic rationals, or pairs ${m
6	Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning	2025-07-16	Game playing has long served as a fundamental benchmark for evaluating Artificial General Intelligence. While Large Language Models (LLMs) have demonstrated impressive capabilities in general reasoning, their effectiveness in spatial strategic reasoning, which is critical for complex and fully observable board games, remains insufficiently explored. In this work, we adopt Chinese Chess (Xiangqi) as a challenging and rich testbed due to its intricate rules and spatial complexity. To advance LLMs' strategic competence in such environments, we propose a training framework tailored to Xiangqi, built upon a large-scale dataset of five million board-move pairs enhanced with expert annotations and engine evaluations. Building on this foundation, we introduce Xiangqi-R1, a 7B-parameter model trained in a multi-stage manner. Our experimental results indicate that, despite their size and power, general-purpose LLMs struggle to achieve satisfactory performance in these tasks. Compared to general-purpose LLMs, Xiangqi-R1 greatly advances with an 18% rise in move legality and a 22% boost in analysis accuracy. Our results point to a promising path for creating general strategic intelligence in complex areas.
7	On 3-terminal positions in Hex	2025-07-11	This paper is about 3-terminal regions in Hex. A 3-terminal region is a region of the Hex board that is completely surrounded by black and white stones, in such a way that the black boundary stones form 3 connected components. We characterize Hex as the universal planar Shannon game of degree 3. This ensures that every Hex position can be decomposed into 3-terminal regions. We then investigate the combinatorial game theory of 3-terminal regions. We show that there are infinitely many distinct Hex-realizable values for such regions. We introduce an infinite family of 3-terminal positions called superswitches and investigate their properties. We also present a database of Hex-realizable 3-terminal values, and illustrate its utility as a problem-solving tool by giving various applications. The applications include the automated verification of connects-both templates and pivoting templates, a new handicap strategy for $11\times 11$ Hex, and a method for constructing witnesses for the non-inferiority of probes in many Hex templates. These methods allow us to disprove a conjecture by Henderson and Hayward.
8	A number game reconciliation	2025-07-07	Number games play a central role in alternating normal play combinatorial game theory due to their real-number-like properties (Conway 1976). Here we undertake a critical re-examination: we begin with integer and dyadic games and identify subtle inconsistencies and oversights in the established literature (e.g. Siegel 2013), most notably, the lack of distinction between a game being a number and a game being equal to a number. After addressing this, we move to the general theory of number games. We analyze Conway's original definition and a later refinement by Siegel, and highlight conceptual gaps that have largely gone unnoticed. Through a careful dissection of these issues, we propose a more coherent and robust formulation. Specifically, we develop a refined characterization of numbers, via several subclasses, dyadics, canonical forms, their group theoretic closure and zugzwangs, that altogether better capture the essence of number games. This reconciliation not only clarifies existing ambiguities but also uncovers several open problems.
9	Deep Reinforcement Learning Xiangqi Player with Monte Carlo Tree Search	2025-06-18	This paper presents a Deep Reinforcement Learning (DRL) system for Xiangqi (Chinese Chess) that integrates neural networks with Monte Carlo Tree Search (MCTS) to enable strategic self-play and self-improvement. Addressing the underexplored complexity of Xiangqi, including its unique board layout, piece movement constraints, and victory conditions, our approach combines policy-value networks with MCTS to simulate move consequences and refine decision-making. By overcoming challenges such as Xiangqi's high branching factor and asymmetrical piece dynamics, our work advances AI capabilities in culturally significant strategy games while providing insights for adapting DRL-MCTS frameworks to domain-specific rule systems.
10	Computational and Algebraic Structure of Board Games	2025-02-18	We provide two methodologies in the area of computation theory to solve optimal strategies for board games such as Xi Gua Qi and Go. From experimental results, we find relevance to graph theory, matrix representation, and mathematical consciousness. We prove that the decision strategy of movement for Xi Gua Qi and Chinese checker games belongs to a subset that is neither a ring nor a group over set Y={-1,0,1}. Additionally, the movement for any board game with two players belongs to a subset that is neither a ring nor a group from the razor of Occam. We derive the closed form of the transition matrix for any board game with two players such as chess and Chinese chess. We discover that the element of the transition matrix belongs to a rational number. We propose a different methodology based on algebra theory to analyze the complexity of board games in their entirety, instead of being limited solely to endgame results. It is probable that similar decision processes of people may also belong to a matrix representation that is neither a ring nor a group.
11	RemoteChess: Enhancing Older Adults' Social Connectedness via Designing a Virtual Reality Chinese Chess (Xiangqi) Community	2025-02-17	The decline of social connectedness caused by distance and physical limitations severely affects older adults' well-being and mental health. While virtual reality (VR) is promising for older adults to socialize remotely, existing social VR designs primarily focus on verbal communication (e.g., reminiscent, chat). Actively engaging in shared activities is also an important aspect of social connection. We designed RemoteChess, which constructs a social community and a culturally relevant activity (i.e., Chinese chess) for older adults to play while engaging in social interaction. We conducted a user study with groups of older adults interacting with each other through RemoteChess. Our findings indicate that RemoteChess enhanced participants' social connectedness by offering familiar environments, culturally relevant social catalysts, and asymmetric interactions. We further discussed design guidelines for designing culturally relevant social activities in VR to promote social connectedness for older adults.
12	Complete Implementation of WXF Chinese Chess Rules	2024-12-23	Unlike repetitions in Western Chess where all repetitions are draws, repetitions in Chinese Chess could result in a win, draw, or loss depending on the kind of repetition being made by both players. One of the biggest hurdles facing Chinese Chess application development is a proper system for judging games correctly. This paper introduces a complete algorithm for ruling the WXF rules correctly in all 110 example cases found in the WXF manual. We introduce several novel optimizations for speeding up the repetition handling without compromising the program correctness. This algorithm is usable in engines, and we saw an increase in playing strength of +10 rating points, or a 5% increase in win rate, when integrating this approach into our prototype engine.
13	Mastering Chinese Chess AI (Xiangqi) Without Search	2024-10-07	We have developed a high-performance Chinese Chess AI that operates without reliance on search algorithms. This AI has demonstrated the capability to compete at a level commensurate with the top 0.1% of human players. By eliminating the search process typically associated with such systems, this AI achieves a Queries Per Second (QPS) rate that exceeds those of systems based on the Monte Carlo Tree Search (MCTS) algorithm by over a thousandfold and surpasses those based on the AlphaBeta pruning algorithm by more than a hundredfold. The AI training system consists of two parts: supervised learning and reinforcement learning. Supervised learning provides an initial human-like Chinese chess AI, while reinforcement learning, based on supervised learning, elevates the strength of the entire AI to a new level. Based on this training system, we carried out sufficient ablation experiments and discovered that: (1) with the same parameter count, the Transformer architecture has a higher performance than CNN on Chinese chess; (2) using the possible moves of both sides as features can greatly improve the training process; (3) a selective opponent pool, compared to pure self-play training, results in a faster improvement curve and a higher strength limit; and (4) Value Estimation with Cutoff (VECT) improves the original PPO algorithm training process, and we will give the explanation.
14	XQSV: A Structurally Variable Network to Imitate Human Play in Xiangqi	2024-07-05	In this paper, we introduce an innovative deep learning architecture, termed Xiangqi Structurally Variable (XQSV), designed to emulate the behavioral patterns of human players in Xiangqi, or Chinese Chess. The unique attribute of XQSV is its capacity to alter its structural configuration dynamically, optimizing performance for the task based on the particular subset of data on which it is trained. We have incorporated several design improvements to significantly enhance the network's predictive accuracy, including a local illegal move filter, an Elo range partitioning, a sequential one-dimensional input, and a simulation of imperfect memory capacity. Empirical evaluations reveal that XQSV attains a predictive accuracy of approximately 40%, with its performance peaking within the trained Elo range. This indicates the model's success in mimicking the play behavior of individuals within that specific range. A three-terminal Turing Test was employed to demonstrate that the XQSV model imitates human behavior more accurately than conventional Xiangqi engines, rendering it indistinguishable from actual human opponents. Given the inherent nondeterminism in human gameplay, we propose two supplementary relaxed evaluation metrics. To our knowledge, XQSV represents the first model to mimic Xiangqi players.
15	Shogi and Frieze group	2023-11-15	Shogi is a traditional Japanese strategy board game in the same family as chess, chaturanga, and xiangqi, and has been theoretically studied from various aspects. The research on recommended sequences of moves in each opening of shogi is called joseki; how to use a rook (Static Rook and Ranging Rook), or how to develop a castle, etc. Also, many pieces of tsume shogi, artistic shogi miniature problems, in which the opponent's king is checkmated by a series of checks, have been created involving various beautiful techniques such as "saw" and "puzzle ring". In addition, the rapid development of AI in recent years has led to the pursuit of the best possible moves in shogi. In this paper, we move away from the study of winning and losing in shogi and focus on the mathematical aspects of the movement of shogi pieces. We propose to map movements of shogi pieces to a set of geometrical patterns constructed by the shape of shogi pieces and representing the Frieze group through the condition regarding the neighborhood of arrangements of given shogi pieces. Although the discovery of this correspondence does not lead to a winning strategy for shogi, it does demonstrate a curious involvement between the traditional Japanese board game and Western mathematics.
16	JiangJun: Mastering Xiangqi by Tackling Non-Transitivity in Two-Player Zero-Sum Games	2023-08-09	This paper presents an empirical exploration of non-transitivity in perfect-information games, specifically focusing on Xiangqi, a traditional Chinese board game comparable in game-tree complexity to chess and shogi. By analyzing over 10,000 records of human Xiangqi play, we highlight the existence of both transitive and non-transitive elements within the game's strategic structure. To address non-transitivity, we introduce the JiangJun algorithm, an innovative combination of Monte-Carlo Tree Search (MCTS) and Policy Space Response Oracles (PSRO) designed to approximate a Nash equilibrium. We evaluate the algorithm empirically using a WeChat mini program and achieve a Master level with a 99.41% win rate against human players. The algorithm's effectiveness in overcoming non-transitivity is confirmed by a plethora of metrics, such as relative population performance and visualization results. Our project site is available at https://sites.google.com/view/jiangjun-site/.
17	Niel's Chess -- Rules for Xiangqi	2023-06-27	In this paper, the rules of Niel's Chess are adapted to the game of Xiangqi, following the idea that the River and the Palaces play an important role in restricting and enabling chess pieces in their movements.
18	On the complexity of Dark Chinese Chess	2021-12-06	This paper provides a complexity analysis for the game of dark Chinese chess (a.k.a. "JieQi"), a variation of Chinese chess. Dark Chinese chess combines some of the most complicated aspects of board and card games, such as long-term strategy or planning, large state space, stochasticity, and imperfect information, which make it closer to real-world decision-making problems and pose great challenges to game AI. Here we design a self-play program to calculate the game tree complexity and average information set size of the game, and propose an algorithm to calculate the number of information sets.
19	Cumulative Games: Who is the current player?	2020-05-13	Combinatorial Game Theory (CGT) is a branch of Game Theory that has developed largely independently of Economic Game Theory (EGT), and is concerned with deep mathematical properties of two-player zero-sum games recursively defined over various combinatorial structures. The aim of this work is to lay the foundations for bridging the conceptual and technical gaps between CGT and EGT, here interpreted as multiplayer Extensive Form Games, so that they can be treated within a unified framework. More specifically, we introduce a class of $n$-player, general-sum games, called Cumulative Games, which can be analyzed using tools from both CGT and EGT. We show how two of the most fundamental definitions of CGT, the outcome function and the disjunctive sum operator, naturally extend to the class of Cumulative Games. The outcome function allows for efficient equilibrium computation under certain restrictions, while the disjunctive sum operator lets us define a partial order over games according to the advantage that a given player has. Finally, we show that any Extensive Form Game can be written as a Cumulative Game.

9. code llm

No.	Title	Date	Abstract
1	Compiling Code LLMs into Lightweight Executables	2026-03-31	The demand for better prediction accuracy and higher execution performance in neural networks continues to grow. The emergence and success of Large Language Models (LLMs) have led to the development of many cloud-based tools for software engineering tasks such as code suggestion. While effective, cloud deployment raises concerns over privacy, latency, and reliance on connectivity. Running LLMs locally on personal devices such as laptops would address these issues by enabling offline use and reducing response time. However, local deployment is challenging: commodity devices lack high-performance accelerators like GPUs and are constrained by limited memory and compute capacity, making it difficult to execute large models efficiently. We present Ditto, a novel method for optimizing both the model size of Code LLMs and their inference programs, particularly for statically-typed programming languages such as C. Our approach integrates two key components: (1) a model compression technique inspired by product quantization, which clusters model parameters into codebooks and quantizes them to lower bit widths while ensuring that outputs remain within a bounded error, as well as synthesizing the inference program for the quantized model; and (2) a compilation pass integrated into LLVM that automatically detects and replaces unoptimized General Matrix-Vector Multiplication (GEMV) operations with implementations from Basic Linear Algebra Subprograms (BLAS) libraries, which are highly optimized for runtime performance. The output of Ditto is an optimized and compiled executable for running selected Code LLMs. We evaluate Ditto on three popular Code LLMs, achieving up to 10.5$\times$ faster inference and 6.4$\times$ lower memory usage compared with their original inference pipeline, while maintaining accuracy close to that of the full-precision models (with an average loss of only 0.27% in pass@1).
2	Steering Code LLMs with Activation Directions for Language and Library Control	2026-03-24	Code LLMs often default to particular programming languages and libraries under neutral prompts. We investigate whether these preferences are encoded as approximately linear directions in activation space that can be manipulated at inference time. Using a difference-in-means method, we estimate layer-wise steering vectors for five language/library pairs and add them to model hidden states during generation. Across three open-weight code LLMs, these interventions substantially increase generation toward the target ecosystem under neutral prompts and often remain effective even when prompts explicitly request the opposite choice. Steering strength varies by model and target, with common ecosystems easier to induce than rarer alternatives, and overly strong interventions can reduce output quality. Overall, our results suggest that code-style preferences in LLMs are partly represented by compact, steerable structure in activation space.
3	GASP: Guided Asymmetric Self-Play For Coding LLMs	2026-03-16	Asymmetric self-play has emerged as a promising paradigm for post-training large language models, where a teacher continually generates questions for a student to solve at the edge of the student's learnability. Although these methods promise open-ended data generation bootstrapped from no human data, they suffer from one major problem: not all problems that are hard to solve are interesting or informative to improve the overall capabilities of the model. Current asymmetric self-play methods are goal-agnostic with no real grounding. We propose Guided Asymmetric Self-Play (GASP), where grounding is provided by real-data goalpost questions that are identified to pose a hard exploration challenge to the model. During self-play, the teacher first generates an easier variant of a hard question, and then a harder variant of that easier question, with the goal of gradually closing the gap to the goalpost throughout training. Doing so, we improve pass@20 on LiveCodeBench (LCB) by 2.5% over unguided asymmetric self-play, and through the curriculum constructed by the teacher, we manage to solve hard goalpost questions that remain out of reach for all baselines.
4	Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning	2026-03-16	Reinforcement learning for code generation relies on verifiable rewards from unit test pass rates. Yet high-quality test suites are scarce, existing datasets offer limited coverage, and static rewards fail to adapt as models improve. Recent self-play methods unify code and test generation in a single model, but face an inherent dilemma: white-box access leads to self-collusion where the model produces trivial tests for easy rewards, yet black-box restriction yields generic tests that miss implementation-specific bugs. We introduce Code-A1, an adversarial co-evolution framework that jointly optimizes a Code LLM and a Test LLM with opposing objectives. The Code LLM is rewarded for passing more tests, while the Test LLM is rewarded for exposing more defects. This architectural separation eliminates self-collusion risks and safely enables white-box test generation, where the Test LLM can inspect candidate code to craft targeted adversarial tests. We further introduce a Mistake Book mechanism for experience replay and a composite reward balancing test validity with adversarial difficulty. Experiments on Qwen2.5-Coder models demonstrate that Code-A1 achieves code generation performance matching or exceeding models trained on human-annotated tests, while significantly improving test generation capability.
5	ExecVerify: White-Box RL with Verifiable Stepwise Rewards for Code Execution Reasoning	2026-03-11	Code LLMs still struggle with code execution reasoning, especially in smaller models. Existing methods rely on supervised fine-tuning (SFT) with teacher-generated explanations, primarily in two forms: (1) input-output (I/O) prediction chains and (2) natural-language descriptions of execution traces. However, intermediate execution steps cannot be explicitly verified during SFT, so the training objective can reduce to merely matching teacher explanations. Moreover, training data is typically collected without explicit control over task difficulty. We introduce ExecVerify, which goes beyond text imitation by incorporating verifiable white-box rewards derived from execution traces, including next-statement prediction and variable value/type prediction. Our work first builds a dataset with multiple difficulty levels via constraint-based program synthesis. Then, we apply reinforcement learning (RL) to reward correct answers about both intermediate execution steps and final outputs, aligning the training objective with semantic correctness at each execution step. Finally, we adopt a two-stage training pipeline that first enhances execution reasoning and then transfers to code generation. Experiments demonstrate that a 7B model trained with ExecVerify achieves performance comparable to 32B models on code reasoning benchmarks and improves pass@1 by up to 5.9% on code generation tasks over strong post-training baselines.
6	Neuron-Guided Interpretation of Code LLMs: Where, Why, and How?	2025-12-23	Code language models excel on code intelligence tasks, yet their internal interpretability is underexplored. Existing neuron interpretability techniques from NLP are suboptimal for source code due to programming languages' formal, hierarchical, and executable nature. We empirically investigate code LLMs at the neuron level, localizing language-specific neurons (selectively responsive to one language) and concept layers (feed-forward layers encoding language-agnostic code representations). We analyze Llama-3.1-8B and Qwen2.5-Coder-32B on multilingual inputs in C++, Java, Python, Go, and JavaScript, measuring neuron selectivity and layerwise contributions during generation. We find (1) neurons specialized for individual languages alongside a universal subset supporting general-purpose generation; and (2) lower layers mainly encode language-specific syntax, while middle layers capture semantic abstractions shared across languages, emerging as concept layers. We demonstrate utility on three tasks: neuron-guided fine-tuning for code generation, clone detection via concept-layer embeddings, and concept-layer-guided transfer for code summarization, each yielding consistent gains in multilingual settings.
7	Do Code LLMs Do Static Analysis?	2025-05-17	This paper investigates code LLMs' capability of static analysis during code intelligence tasks such as code summarization and generation. Code LLMs are now household names for their abilities to do some programming tasks that have heretofore required people. The process that people follow to do programming tasks has long been understood to require static analysis. For example, human programmers navigate the call graph of large programs to comprehend the different parts of those programs. Education in programming includes static analysis under the assumption that better static analysis skills beget better programming. While popular culture is replete with anthropomorphic references such as LLM "reasoning", in fact code LLMs could exhibit a wholly alien thought process to humans. This paper studies the specific question of static analysis by code LLMs. We use three different static analysis tasks (callgraph generation, AST generation, and dataflow generation) and three different code intelligence tasks (code generation, summarization, and translation) with two different closed-source models (Gemini and GPT-4o) and open-source models (CodeLlaMA and Jam) as our experiments. We found that LLMs show poor performance on static analysis tasks and that pretraining on the static analysis tasks does not generalize to better performance on the code intelligence tasks and vice versa.
8	Is Compression Really Linear with Code Intelligence?	2025-05-16	Understanding the relationship between data compression and the capabilities of Large Language Models (LLMs) is crucial, especially in specialized domains like code intelligence. Prior work posited a linear relationship between compression and general intelligence. However, it overlooked the multifaceted nature of code that encompasses diverse programming languages and tasks, and struggled with fair evaluation of modern Code LLMs. We address this by evaluating a diverse array of open-source Code LLMs on comprehensive multi-language, multi-task code benchmarks. To address the challenge of efficient and fair evaluation of pre-trained LLMs' code intelligence, we introduce Format Annealing, a lightweight, transparent training methodology designed to assess the intrinsic capabilities of these pre-trained models equitably. Compression efficacy, measured as bits-per-character (BPC), is determined using a novel, large-scale, and previously unseen code validation set derived from GitHub. Our empirical results reveal a fundamental logarithmic relationship between measured code intelligence and BPC. This finding refines prior hypotheses of linearity, which we suggest are likely observations of the logarithmic curve's tail under specific, limited conditions. Our work provides a more nuanced understanding of compression's role in developing code intelligence and contributes a robust evaluation framework in the code domain.
